Wikidata: Events SPARQL Query¶

Alexander Dunkel, Institute of Cartography, TU Dresden

Visualization of events (Nevada example) queried from Wikidata using SPARQL.

Originally, we intended to discuss the idea of "Event Inventories" in the conference paper. However, this part was cut due to limited space and the additional work required. The following is the draft text that was removed; it describes the results shown below:

Finally, we explored the idea of "event inventories" based on the explicit inclusion of structured VGI. We queried all Wikidata entities of the type "occurrence" with an explicit spatial reference within two areas covering Nevada, USA, and the state of Saxony, Germany. A large number of classical event types were found, such as festivals, sports events, and city fairs, or related sites of historical sieges. However, the results also include many events that would be difficult to consider as part of a temporal landscape scenic resource inventory, such as accidents, wildfire sites, or homicides and school shootings. We have not included these results here because more work is needed to integrate these data.

Arthur et al. (1977) first used the category of "descriptive inventory" to describe methods in which features that are thought to contribute to the visual character of a landscape are first systematically recorded and then aggregated and related to each other to estimate an overall value. These methods are still used in practice for landscape character assessment, mainly because of their ease of use. As a temporal counterpart, we propose "event inventories" as a solution to filter for known temporal features of landscapes at different levels of specificity, such as the waterfalls in Yosemite that are most impressive in spring, or the regular pattern of California poppies or Nevada deserts in bloom. As a starting point to mitigate the challenge of varying levels of specificity in temporal landscape scenic resources, curated event inventories, such as those derived from Wikipedia, can be used. Such positive filter lists can then be explored and monitored using customized workflows and integrated social media and VGI data. In the fields of landscape and urban planning, event inventories can help to better understand the unique transient characteristics of places, areas and landscapes, to protect and develop specific ephemeral scenic values, or to propose actions to change negative influences.

Preparations¶

Create environment

!python -m venv /envs/wikidata_venv

Install qwikidata in a venv and link the Python Kernel to Jupyter Lab.

%%bash
if [ ! -d "/envs/wikidata_venv/lib/python3.10/site-packages/qwikidata" ]; then
    /envs/wikidata_venv/bin/python -m pip install qwikidata ipykernel pandas > /dev/null 2>&1
else
  echo "Already installed."
fi
# link
if [ ! -d "/root/.local/share/jupyter/kernels/qwikidata" ]; then
    echo "Linking environment to jupyter"
    /envs/wikidata_venv/bin/python -m ipykernel install --user --name=qwikidata
else
  echo "Already linked."
fi

Already installed.
Linking environment to jupyter
Installed kernelspec qwikidata in /root/.local/share/jupyter/kernels/qwikidata

Press F5 to reload the page, then select the qwikidata kernel in the top-right corner of Jupyter Lab.

See the package versions used below.

Query wikidata using SPARQL¶

Import dependencies

import pandas as pd
from qwikidata.sparql import return_sparql_query_results

Define query:

Use a distance query around the centroid of the reference entity (here, Nevada).
Filtering by the actual state geometry is done later in GeoPandas.
see SPARQL examples here and here

Parameters¶

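Two parameters need to be adjusted: the entity (Wikidata ID), whose coordinate location is used as the centroid, and the geodistance, which is the search radius around that centroid used for filtering. The loc_name is only used for the query title and for output file names.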

## Example 1:
loc_name = "Nevada"
entity = "Q1227"
geodistance = 400

## Example 2:
# loc_name = "Leipzig"
# geodistance = 80
# entity = "Q2079" # Leipzig, Germany

sparql_query = f"""
#title: All events in {loc_name}, based on distance query ({geodistance})
SELECT ?event ?eventLabel ?date ?location ?eventDescription
WITH {{
  SELECT DISTINCT ?event ?date ?location
  WHERE {{
    # find events
    wd:{entity} wdt:P625 ?loc_ref. 
    ?event wdt:P31/wdt:P279* wd:Q1190554.
           # wdt:P17 wd:Q30;
    # with a point in time or start date
    OPTIONAL {{ ?event wdt:P585 ?date. }}
    OPTIONAL {{ ?event wdt:P580 ?date. }}
    ?event wdt:P625 ?location.
    FILTER(geof:distance(?location, ?loc_ref) < {geodistance}).
  }}
  LIMIT 5000
}} AS %i
WHERE {{
  INCLUDE %i
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en,de" .}}
}}
"""

%%time
result = return_sparql_query_results(sparql_query)

CPU times: user 26.6 ms, sys: 0 ns, total: 26.6 ms
Wall time: 39.8 s
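The FILTER clause in the query above keeps events whose coordinates lie within geodistance of the reference point; to our understanding, the Wikidata Query Service reports geof:distance in kilometers. The following is a minimal offline sketch of the same test using the haversine formula. The function name haversine_km and the reference coordinates are assumptions made for illustration (the reference point is a rough central-Nevada location, not the value stored in Wikidata); the event coordinates are taken from the query output.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometers between two lon/lat points (degrees)."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Assumed reference point, roughly central Nevada (illustrative only)
ref_lon, ref_lat = -117.0, 39.0
# Event location from the query output (a hotel in Las Vegas)
event_lon, event_lat = -115.161261, 36.140165

geodistance = 400  # same radius as in Example 1
distance = haversine_km(ref_lon, ref_lat, event_lon, event_lat)
within_radius = distance < geodistance
print(f"{distance:.0f} km from the reference point, within radius: {within_radius}")
```

Because the radius is measured from a single point, the result set also contains events in neighboring states, which is why the state boundary is applied as a second filter later on.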

Format and convert to pandas DataFrame

event_list = []
for event in result["results"]["bindings"]:
    # ?date is OPTIONAL in the query, so the key may be missing
    date_val = event.get('date')
    if date_val:
        # unparseable or out-of-range dates become NaT instead of raising
        date_val = pd.to_datetime(
            date_val.get('value'), errors='coerce', utc=True)
    # not every entity has a description
    event_desc = event.get('eventDescription')
    if event_desc:
        event_desc = event_desc['value']
    event_tuple = (
        event['event']['value'],
        event['eventLabel']['value'],
        date_val,
        event['location']['value'],
        event_desc)
    event_list.append(event_tuple)

df = pd.DataFrame(event_list, columns=result['head']['vars'])
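To make the shape of the result easier to follow, here is a small standard-library sketch that flattens two hand-written bindings into tuples. The bindings mimic the SPARQL JSON results layout (each variable maps to a dict with a "value" key); their values are taken from the output tables, while the raw timestamp format and the helper names parse_wikidata_date and binding_to_row are assumptions for illustration.

```python
from datetime import datetime


def parse_wikidata_date(value):
    """Parse a timestamp such as '2004-09-25T00:00:00Z'; return None on failure."""
    if value is None:
        return None
    try:
        # fromisoformat() only accepts a trailing 'Z' from Python 3.11 on
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None


def binding_to_row(binding):
    """Flatten one binding into (event, label, date, location, description)."""
    def val(key):
        field = binding.get(key)  # OPTIONAL variables may be absent
        return field["value"] if field else None
    return (
        val("event"),
        val("eventLabel"),
        parse_wikidata_date(val("date")),
        val("location"),
        val("eventDescription"),
    )


# Hand-written mock bindings (not a real query response)
mock_bindings = [
    {
        "event": {"value": "http://www.wikidata.org/entity/Q4602566"},
        "eventLabel": {"value": "2004 Bridgestone 400"},
        "date": {"value": "2004-09-25T00:00:00Z"},
        "location": {"value": "Point(-115.01112 36.27134)"},
        "eventDescription": {"value": "motor car race"},
    },
    {   # no date bound for this event
        "event": {"value": "http://www.wikidata.org/entity/Q104786210"},
        "eventLabel": {"value": "2021 NHL Outdoor Games"},
        "location": {"value": "Point(-119.949 38.968)"},
        "eventDescription": {"value": "outdoor National Hockey League game"},
    },
]

rows = [binding_to_row(b) for b in mock_bindings]
for row in rows:
    print(row)
```

Returning None for missing or unparseable dates keeps every event in the list, so that undated events can still be mapped.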

df.head()

print(len(df))

328

Store to disk

from pathlib import Path
OUTPUT = Path.cwd().parents[0] / "out"
# create the output folder if it does not exist yet
OUTPUT.mkdir(exist_ok=True)
df.to_pickle(OUTPUT / f"wikidata_events_{loc_name.lower()}.pkl")

Visualize on a map¶

Select worker_env as the visualization environment.

%load_ext autoreload
%autoreload 2

Load dependencies

import sys
import pandas as pd
import geopandas as gp
from pathlib import Path
from shapely.geometry import Point
from shapely import wkt
module_path = str(Path.cwd().parents[0] / "py")
if module_path not in sys.path:
    sys.path.append(module_path)
from modules.base import tools

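# variables are not shared between kernels: after switching to worker_env,
# set loc_name again to the value used in the query section
loc_name = "Nevada"
OUTPUT = Path.cwd().parents[0] / "out"
df = pd.read_pickle(OUTPUT / f"wikidata_events_{loc_name.lower()}.pkl")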

CRS_WGS = "epsg:4326"

df['geometry'] = df.location.apply(wkt.loads)
gdf = gp.GeoDataFrame(df, crs=CRS_WGS)
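The location column holds WKT point literals such as Point(-115.161261 36.140165), with longitude first and latitude second. The notebook relies on shapely's wkt.loads for parsing; the sketch below only illustrates the coordinate order with a standard-library regular expression. The pattern and the function name parse_wkt_point are assumptions for illustration and handle plain 2D points only.

```python
import re

# Matches 2D points such as "Point(-115.161261 36.140165)", case-insensitively
WKT_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(text):
    """Return (lon, lat) from a 2D WKT point string, or raise ValueError."""
    match = WKT_POINT.match(text)
    if match is None:
        raise ValueError(f"Not a 2D WKT point: {text!r}")
    lon, lat = (float(group) for group in match.groups())
    return lon, lat


lon, lat = parse_wkt_point("Point(-115.161261 36.140165)")
print(f"longitude={lon}, latitude={lat}")
```

Keeping the longitude/latitude order in mind matters when the coordinates are compared with sources that list latitude first.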

Get the shapefile for US states or German federal states

if loc_name == "Nevada":
    source_zip = "https://www2.census.gov/geo/tiger/GENZ2018/shp/"
    filename = "cb_2018_us_state_5m.zip"
    shapes_name = "cb_2018_us_state_5m.shp"
elif loc_name == "Leipzig":
    source_zip = "https://daten.gdz.bkg.bund.de/produkte/vg/vg2500/aktuell/"
    filename = "vg2500_12-31.utm32s.shape.zip"
    shapes_name = "vg2500_12-31.utm32s.shape/vg2500/VG2500_LAN.shp"

SHAPE_DIR = (OUTPUT / "shapes")
SHAPE_DIR.mkdir(exist_ok=True)

if not (SHAPE_DIR / shapes_name).exists():
    tools.get_zip_extract(uri=source_zip, filename=filename, output_path=SHAPE_DIR)
else:
    print("Already exists")

Already exists

shapes = gp.read_file(SHAPE_DIR / shapes_name)
shapes = shapes.to_crs("EPSG:4326")

ax = shapes.plot(color='none', edgecolor='black', linewidth=0.5)
ax = gdf.plot(ax=ax)
ax.set_axis_off()
buffer = 0.5
minx, miny, maxx, maxy = gdf.total_bounds
ax.set_xlim(minx-buffer, maxx+buffer)
ax.set_ylim(miny-buffer, maxy+buffer)

(35.117, 43.0)

Highlight/Select all in Region¶

We want to keep only those events whose location falls within the state boundary (Nevada or Saxony).

if loc_name == "Nevada":
    state_name = "Nevada"
    col_name = "NAME"
elif loc_name == "Leipzig":
    state_name = "Sachsen"
    col_name = "GEN"

sel_geom = shapes[shapes[col_name]==state_name].copy()

tools.drop_cols_except(df=sel_geom, columns_keep=["geometry", col_name])
sel_geom.rename(columns={col_name: "country"}, inplace=True)

gdf_overlay = gp.overlay(
    gdf, sel_geom,
    how='intersection')
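Conceptually, the intersection above keeps each point that lies inside the selected state polygon. As a rough illustration of that test, the following standard-library sketch applies ray casting to a hand-drawn, five-vertex outline of Nevada. The outline is a coarse assumption for demonstration only and is not the Census boundary used in the notebook; the function name point_in_polygon is likewise hypothetical. The two test points are taken from the query output.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how often a ray cast from (x, y) crosses the outline."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Very coarse outline of Nevada as (lon, lat) pairs (illustrative assumption)
nevada_outline = [
    (-120.0, 42.0),
    (-114.04, 42.0),
    (-114.05, 36.2),
    (-114.63, 35.0),
    (-120.0, 39.0),
]

las_vegas = (-115.161261, 36.140165)  # "Hilton Grand Vacations Club"
sacramento = (-121.49633, 38.575783)  # "California Revealed"

print(point_in_polygon(*las_vegas, nevada_outline))
print(point_in_polygon(*sacramento, nevada_outline))
```

With such a crude outline, points close to the border can be misclassified, which is one reason to rely on the detailed shapefile geometry for the actual selection.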

ax = shapes.plot(color='none', edgecolor='black', linewidth=0.5)
ax = gdf.plot(ax=ax)
ax = gdf_overlay.plot(ax=ax, color='red')
ax.set_axis_off()
buffer = 1
minx, miny, maxx, maxy = gdf.total_bounds
ax.set_xlim(minx-buffer, maxx+buffer)
ax.set_ylim(miny-buffer, maxy+buffer)

(34.617, 43.5)

print(f'{len(gdf_overlay)} events queried from wikidata that are located in {loc_name}')

117 events queried from wikidata that are located in Nevada

gdf_overlay.head(20)

Store results as CSV

gdf_overlay.to_csv(OUTPUT / f"wikidata_events_{loc_name.lower()}.csv")

Create notebook HTML¶

!jupyter nbconvert --to html_toc \
    --output-dir=../resources/html/ ./03_wikidata_event_query.ipynb \
    --output 03_wikidata_event_query_{loc_name.lower()} \
    --template=../nbconvert.tpl \
    --ExtractOutputPreprocessor.enabled=False >&- 2>&-

Output of df.head():

	event	eventLabel	date	location	eventDescription
0	http://www.wikidata.org/entity/Q116448291	California Revealed	NaT	Point(-121.49633 38.575783)	online project of archival resources
1	http://www.wikidata.org/entity/Q29098186	Hilton Grand Vacations Club	NaT	Point(-115.161261 36.140165)	hotel in Las Vegas, Nevada
2	http://www.wikidata.org/entity/Q29098186	Hilton Grand Vacations Club	NaT	Point(-115.160386 36.140174)	hotel in Las Vegas, Nevada
3	http://www.wikidata.org/entity/Q4602566	2004 Bridgestone 400	2004-09-25 00:00:00+00:00	Point(-115.01112 36.27134)	motor car race
4	http://www.wikidata.org/entity/Q16274840	1964 LPGA Championship	1964-01-01 00:00:00+00:00	Point(-115.125 36.128)	golf tournament

Output of gdf_overlay.head(20):

	event	eventLabel	date	location	eventDescription	country	geometry
0	http://www.wikidata.org/entity/Q29098186	Hilton Grand Vacations Club	NaT	Point(-115.161261 36.140165)	hotel in Las Vegas, Nevada	Nevada	POINT (-115.16126 36.14017)
1	http://www.wikidata.org/entity/Q29098186	Hilton Grand Vacations Club	NaT	Point(-115.160386 36.140174)	hotel in Las Vegas, Nevada	Nevada	POINT (-115.16039 36.14017)
2	http://www.wikidata.org/entity/Q4602566	2004 Bridgestone 400	2004-09-25 00:00:00+00:00	Point(-115.01112 36.27134)	motor car race	Nevada	POINT (-115.01112 36.27134)
3	http://www.wikidata.org/entity/Q16274840	1964 LPGA Championship	1964-01-01 00:00:00+00:00	Point(-115.125 36.128)	golf tournament	Nevada	POINT (-115.12500 36.12800)
4	http://www.wikidata.org/entity/Q4571929	1965 LPGA Championship	1965-01-01 00:00:00+00:00	Point(-115.125 36.128)	golf tournament	Nevada	POINT (-115.12500 36.12800)
5	http://www.wikidata.org/entity/Q4570360	1961 LPGA Championship	1961-01-01 00:00:00+00:00	Point(-115.125 36.128)	golf tournament	Nevada	POINT (-115.12500 36.12800)
6	http://www.wikidata.org/entity/Q4572336	1966 LPGA Championship	1966-01-01 00:00:00+00:00	Point(-115.125 36.128)	golf tournament	Nevada	POINT (-115.12500 36.12800)
7	http://www.wikidata.org/entity/Q4570751	1962 LPGA Championship	1962-01-01 00:00:00+00:00	Point(-115.125 36.128)	golf tournament	Nevada	POINT (-115.12500 36.12800)
8	http://www.wikidata.org/entity/Q4571127	1963 LPGA Championship	1963-01-01 00:00:00+00:00	Point(-115.125 36.128)	golf tournament	Nevada	POINT (-115.12500 36.12800)
9	http://www.wikidata.org/entity/Q111021622	3-Cushion World Cup 2022-2	2022-01-01 00:00:00+00:00	Point(-115.18708 36.116869)	Internationales Karambolageturnier	Nevada	POINT (-115.18708 36.11687)
10	http://www.wikidata.org/entity/Q24906942	Real World: Go Big or Go Home	NaT	Point(-115.140444444 36.170972222)	thirty-first season of Real World	Nevada	POINT (-115.14044 36.17097)
11	http://www.wikidata.org/entity/Q7759664	The Real World: Las Vegas, 2002 season	2002-09-17 00:00:00+00:00	Point(-115.194 36.1139)	twelth season of The Real World	Nevada	POINT (-115.19400 36.11390)
12	http://www.wikidata.org/entity/Q7759665	The Real World: Las Vegas, 2011 season	2011-03-09 00:00:00+00:00	Point(-115.154 36.11)	twenty-fifth season of The Real World	Nevada	POINT (-115.15400 36.11000)
13	http://www.wikidata.org/entity/Q104786210	2021 NHL Outdoor Games	NaT	Point(-119.949 38.968)	outdoor National Hockey League game	Nevada	POINT (-119.94900 38.96800)
14	http://www.wikidata.org/entity/Q25316469	1954 NCAA Skiing Championships	1954-01-01 00:00:00+00:00	Point(-119.872 39.318)	None	Nevada	POINT (-119.87200 39.31800)
15	http://www.wikidata.org/entity/Q15092916	Sparks Middle School shooting	NaT	Point(-119.76838889 39.55191667)	Shooting in Sparks, Nevada, on October 21, 2013	Nevada	POINT (-119.76839 39.55192)
16	http://www.wikidata.org/entity/Q15806674	Dreiband-Weltmeisterschaft 1978	1978-01-01 00:00:00+00:00	Point(-115.172816 36.114646)	33. Turnier des Karambolagebillards	Nevada	POINT (-115.17282 36.11465)
17	http://www.wikidata.org/entity/Q15806682	1986 UMB World Three-cushion Championship	1986-01-01 00:00:00+00:00	Point(-115.172816 36.114646)	41. Turnier des Karambolagebillards	Nevada	POINT (-115.17282 36.11465)
18	http://www.wikidata.org/entity/Q15806666	Dreiband-Weltmeisterschaft 1970	1970-01-01 00:00:00+00:00	Point(-115.172816 36.114646)	25. Turnier des Karambolagebillards	Nevada	POINT (-115.17282 36.11465)
19	http://www.wikidata.org/entity/Q6492580	Las Vegas Grind	NaT	Point(-115.193 36.1166)	ls Vegas Grind Festival	Nevada	POINT (-115.19300 36.11660)