Wikidata: Events SPARQL Query

Alexander Dunkel, Institute of Cartography, TU Dresden

Visualization of events queried from Wikidata using SPARQL, with Nevada and Leipzig as example regions.

Preparations

Create environment

!python -m venv /envs/wikidata_venv

Install qwikidata in a venv and link the Python Kernel to Jupyter Lab.

%%bash
if [ ! -d "/envs/wikidata_venv/lib/python3.10/site-packages/qwikidata" ]; then
    /envs/wikidata_venv/bin/python -m pip install qwikidata ipykernel pandas > /dev/null 2>&1
else
  echo "Already installed."
fi
# link
if [ ! -d "/root/.local/share/jupyter/kernels/qwikidata" ]; then
    echo "Linking environment to jupyter"
    /envs/wikidata_venv/bin/python -m ipykernel install --user --name=qwikidata
else
  echo "Already linked."
fi

Already installed.
Linking environment to jupyter
Installed kernelspec qwikidata in /root/.local/share/jupyter/kernels/qwikidata

Press F5 to reload the page, then select the qwikidata kernel in the top-right corner of JupyterLab.

See the package versions used below.

Query Wikidata using SPARQL

Import dependencies

import csv
import pandas as pd
from qwikidata.sparql import return_sparql_query_results

Define query:

use a distance query around the centroid of the selected entity (e.g. Nevada)
filtering by the region's geometry is done later in GeoPandas
see SPARQL examples here and here

Parameters

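Two parameters need to be adjusted: the Wikidata entity ID, whose coordinates (P625) serve as the centroid of the search, and the geodistance, the maximum distance from that centroid used to filter events. The loc_name is only used for the query title and the output file names.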

## Example 1:
# loc_name = "Nevada"
# entity = "Q1227"
# geodistance = 400

## Example 2:
loc_name = "Leipzig"
geodistance = 80
entity = "Q2079" # Leipzig, Germany

sparql_query = f"""
#title: All events in {loc_name}, based on distance query ({geodistance})
SELECT ?event ?eventLabel ?date ?location ?eventDescription
WITH {{
  SELECT DISTINCT ?event ?date ?location
  WHERE {{
    # find events
    wd:{entity} wdt:P625 ?loc_ref. 
    ?event wdt:P31/wdt:P279* wd:Q1190554.
           # wdt:P17 wd:Q30;
    # with a point in time or start date
    OPTIONAL {{ ?event wdt:P585 ?date. }}
    OPTIONAL {{ ?event wdt:P580 ?date. }}
    ?event wdt:P625 ?location.
    FILTER(geof:distance(?location, ?loc_ref) < {geodistance}).
  }}
  LIMIT 5000
}} AS %i
WHERE {{
  INCLUDE %i
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en,de" .}}
}}
"""
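To make the FILTER step more tangible, here is a minimal sketch of a great-circle distance check in plain Python. It assumes that geof:distance returns kilometres, as described in the Wikidata Query Service documentation, and it uses the haversine formula on a spherical Earth, which may differ slightly from what the query service computes. The Leipzig coordinates are an approximate, assumed centre and are not taken from Wikidata; haversine_km is a hypothetical helper name.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Assumed, approximate centre of Leipzig (lon, lat); not queried from Wikidata.
leipzig = (12.37, 51.34)
# Location of "Staatliche Studienakademie Riesa" as listed in the query result.
riesa = (13.28919, 51.31631)

distance = haversine_km(*leipzig, *riesa)
# Same threshold as geodistance in Example 2
within_radius = distance < 80
```

Under these assumptions the point lies inside the 80 km radius, which is consistent with it appearing in the result set.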

%%time
result = return_sparql_query_results(sparql_query)

CPU times: user 26.7 ms, sys: 3.29 ms, total: 30 ms
Wall time: 47.2 s

Format and convert to pandas DataFrame

import dateutil.parser

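event_list = []
for event in result["results"]["bindings"]:
    # date is optional in the query; keep None if it is missing
    date_val = event.get('date')
    if date_val:
        try:
            date_val = pd.to_datetime(
                dateutil.parser.parse(date_val.get('value')), errors='coerce')
        except (ValueError, OverflowError):
            # dates that dateutil cannot parse are treated as missing
            date_val = pd.NaT
    # the description is only present if Wikidata has one in a requested language
    event_desc = event.get('eventDescription')
    if event_desc:
        event_desc = event_desc['value']
    # tuple order must match the SELECT clause (result['head']['vars'])
    event_tuple = (
        event['event']['value'],
        event['eventLabel']['value'],
        date_val,
        event['location']['value'],
        event_desc)
    event_list.append(event_tuple)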

df = pd.DataFrame(event_list, columns=result['head']['vars'])

df.head()

print(len(df))

69
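For readers unfamiliar with the structure that the conversion loop above walks through, the following sketch shows the SPARQL JSON results layout on a small hand-written sample. The two rows reuse values from the result tables in this notebook, but the raw date string (with a trailing "Z") is an assumption about the wire format, and binding_value and parse_date are hypothetical helper names. It uses only the standard library instead of pandas.

```python
from datetime import datetime

# Hand-written sample in the SPARQL JSON results layout (not a real response).
sample_result = {
    "head": {"vars": ["event", "eventLabel", "date", "location", "eventDescription"]},
    "results": {
        "bindings": [
            {
                "event": {"type": "uri", "value": "http://www.wikidata.org/entity/Q1082822"},
                "eventLabel": {"type": "literal", "value": "2008 German motorcycle Grand Prix"},
                # assumed raw format of the timestamp
                "date": {"type": "literal", "value": "2008-07-13T00:00:00Z"},
                "location": {"type": "literal", "value": "Point(12.6887 50.7915)"},
            },
            {
                "event": {"type": "uri", "value": "http://www.wikidata.org/entity/Q2311733"},
                "eventLabel": {"type": "literal", "value": "Splash!"},
                "location": {"type": "literal", "value": "Point(12.81416667 50.83694444)"},
                "eventDescription": {"type": "literal",
                                     "value": "hip hop and reggae festival in Germany"},
            },
        ]
    },
}


def binding_value(binding, name):
    """Return the plain value of a variable, or None if it is unbound."""
    cell = binding.get(name)
    return cell["value"] if cell else None


def parse_date(value):
    """Parse an ISO 8601 timestamp; return None if missing or unparseable."""
    if value is None:
        return None
    try:
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None


columns = sample_result["head"]["vars"]
rows = []
for binding in sample_result["results"]["bindings"]:
    # Unbound OPTIONAL variables are simply absent from the binding
    row = {name: binding_value(binding, name) for name in columns}
    row["date"] = parse_date(row["date"])
    rows.append(row)
```

Variables that were not bound, such as a missing date or description, do not appear in a binding at all, which is why the lookup has to tolerate missing keys.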

Store to disk

from pathlib import Path
OUTPUT = Path.cwd().parents[0] / "out"
# create the output folder if it does not exist yet
OUTPUT.mkdir(exist_ok=True)
df.to_pickle(OUTPUT / f"wikidata_events_{loc_name.lower()}.pkl")

Visualize on a map

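Select worker_env as the kernel for the visualization. Switching kernels clears all variables, so re-run the Parameters cell (loc_name) before continuing.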

%load_ext autoreload
%autoreload 2

Load dependencies

import sys
import pandas as pd
import geopandas as gp
from pathlib import Path
from shapely.geometry import Point
from shapely import wkt
module_path = str(Path.cwd().parents[0] / "py")
if module_path not in sys.path:
    sys.path.append(module_path)
from modules.base import tools

OUTPUT = Path.cwd().parents[0] / "out" 
df = pd.read_pickle(OUTPUT / f"wikidata_events_{loc_name.lower()}.pkl")

CRS_WGS = "epsg:4326"

df['geometry'] = df.location.apply(wkt.loads)
gdf = gp.GeoDataFrame(df, crs=CRS_WGS)
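The location column holds WKT strings such as "Point(12.368194444 51.3325)", with longitude first and latitude second. As a small illustration of what wkt.loads extracts from these strings, here is a standard-library sketch; parse_wkt_point is a hypothetical helper, and it only handles plain decimal 2D points, unlike the full WKT parser in shapely.

```python
import re

# Matches "Point(<lon> <lat>)" with plain decimal numbers, case-insensitive
POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(text):
    """Return (lon, lat) from a WKT point string, or raise ValueError."""
    match = POINT_RE.match(text)
    if match is None:
        raise ValueError(f"Not a simple WKT point: {text!r}")
    lon, lat = (float(group) for group in match.groups())
    return lon, lat


# Location of "adlr.link" from the query result
lon, lat = parse_wkt_point("Point(12.368194444 51.3325)")
```

Keeping the longitude/latitude order in mind helps when comparing these values with map coordinates, which are often written latitude first.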

Get the shapefile for US states or German federal states

if loc_name == "Nevada":
    source_zip = "https://www2.census.gov/geo/tiger/GENZ2018/shp/"
    filename = "cb_2018_us_state_5m.zip"
    shapes_name = "cb_2018_us_state_5m.shp"
elif loc_name == "Leipzig":
    source_zip = "https://daten.gdz.bkg.bund.de/produkte/vg/vg2500/aktuell/"
    filename = "vg2500_12-31.utm32s.shape.zip"
    shapes_name = "vg2500_12-31.utm32s.shape/vg2500/VG2500_LAN.shp"

SHAPE_DIR = OUTPUT / "shapes"
# parents=True also creates the output folder if it is missing
SHAPE_DIR.mkdir(parents=True, exist_ok=True)

if not (SHAPE_DIR / shapes_name).exists():
    tools.get_zip_extract(uri=source_zip, filename=filename, output_path=SHAPE_DIR)
else:
    print("Already exists")

Already exists

shapes = gp.read_file(SHAPE_DIR / shapes_name)
# reproject to the same CRS as the event locations
shapes = shapes.to_crs(CRS_WGS)

ax = shapes.plot(color='none', edgecolor='black', linewidth=0.5)
ax = gdf.plot(ax=ax)
ax.set_axis_off()
buffer = 0.5
minx, miny, maxx, maxy = gdf.total_bounds
ax.set_xlim(minx-buffer, maxx+buffer)
ax.set_ylim(miny-buffer, maxy+buffer)

(50.28028, 52.395277777)

Highlight/Select all in Region

We want to keep only those events whose location falls within the state boundary (Nevada or Saxony).

if loc_name == "Nevada":
    state_name = "Nevada"
    col_name = "NAME"
elif loc_name == "Leipzig":
    state_name = "Sachsen"
    col_name = "GEN"

sel_geom = shapes[shapes[col_name]==state_name].copy()

tools.drop_cols_except(df=sel_geom, columns_keep=["geometry", col_name])
sel_geom.rename(columns={col_name: "country"}, inplace=True)

gdf_overlay = gp.overlay(
    gdf, sel_geom,
    how='intersection')
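The overlay above keeps the event points that fall inside the selected polygon. As a rough illustration of the underlying point-in-polygon idea, here is a minimal ray-casting sketch in plain Python. It is not how GeoPandas implements the overlay, point_in_polygon is a hypothetical helper, and the rectangle is an invented stand-in, not the real Saxony boundary.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) lies inside the closed ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings to the right of the point
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangle used for illustration only (NOT the Saxony boundary)
box = [(12.2, 50.2), (15.0, 50.2), (15.0, 51.7), (12.2, 51.7)]

# Two event locations (lon, lat) taken from the query result
events = {
    "adlr.link": (12.368194444, 51.3325),
    "MENALIB": (11.97, 51.49),
}

selected = [
    name for name, (lon, lat) in events.items()
    if point_in_polygon(lon, lat, box)
]
```

With this made-up rectangle only the first event is selected; with the real state geometry, the selection depends on the actual boundary.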

ax = shapes.plot(color='none', edgecolor='black', linewidth=0.5)
ax = gdf.plot(ax=ax)
ax = gdf_overlay.plot(ax=ax, color='red')
ax.set_axis_off()
buffer = 1
minx, miny, maxx, maxy = gdf.total_bounds
ax.set_xlim(minx-buffer, maxx+buffer)
ax.set_ylim(miny-buffer, maxy+buffer)

(49.78028, 52.895277777)

print(f'{len(gdf_overlay)} events queried from Wikidata that are located in {state_name}')

41 events queried from Wikidata that are located in Sachsen

gdf_overlay.head(20)

Store results as CSV

gdf_overlay.to_csv(OUTPUT / f"wikidata_events_{loc_name.lower()}.csv")

Create notebook HTML

!jupyter nbconvert --to html_toc \
    --output-dir=../resources/html/ ./03_wikidata_event_query.ipynb \
    --output 03_wikidata_event_query_{loc_name.lower()} \
    --template=../nbconvert.tpl \
    --ExtractOutputPreprocessor.enabled=False >&- 2>&-

Output of df.head():

	event	eventLabel	date	location	eventDescription
0	http://www.wikidata.org/entity/Q16854674	MENALIB	NaT	Point(11.97 51.49)	Webportal des Fachinformationsdienstes Nahost-...
1	http://www.wikidata.org/entity/Q24730444	adlr.link	NaT	Point(12.368194444 51.3325)	web portal of the Specialised Information Serv...
2	http://www.wikidata.org/entity/Q28245008	Specialised Information Service Middle East, N...	NaT	Point(11.97 51.49)	MENALIB – Web Portal of the Specialised Inform...
3	http://www.wikidata.org/entity/Q65952781	Staatliche Studienakademie Leipzig	NaT	Point(12.30261 51.31028)	None
4	http://www.wikidata.org/entity/Q96623670	Staatliche Studienakademie Riesa	NaT	Point(13.28919 51.31631)	None

Output of gdf_overlay.head(20):

	event	eventLabel	date	location	eventDescription	country	geometry
0	http://www.wikidata.org/entity/Q24730444	adlr.link	NaT	Point(12.368194444 51.3325)	web portal of the Specialised Information Serv...	Sachsen	POINT (12.36819 51.33250)
1	http://www.wikidata.org/entity/Q65952781	Staatliche Studienakademie Leipzig	NaT	Point(12.30261 51.31028)	None	Sachsen	POINT (12.30261 51.31028)
2	http://www.wikidata.org/entity/Q96623670	Staatliche Studienakademie Riesa	NaT	Point(13.28919 51.31631)	None	Sachsen	POINT (13.28919 51.31631)
3	http://www.wikidata.org/entity/Q828773	Berufsakademie Glauchau	NaT	Point(12.5567 50.8228)	educational institution	Sachsen	POINT (12.55670 50.82280)
4	http://www.wikidata.org/entity/Q19963896	N’Ostalgiemuseum	NaT	Point(12.37845 51.342030555)	museum in Germany	Sachsen	POINT (12.37845 51.34203)
5	http://www.wikidata.org/entity/Q1082822	2008 German motorcycle Grand Prix	2008-07-13 00:00:00+00:00	Point(12.6887 50.7915)	None	Sachsen	POINT (12.68870 50.79150)
6	http://www.wikidata.org/entity/Q1682931	Leipziger Kleinmesse	NaT	Point(12.34305556 51.34055556)	volksfestartige Veranstaltung in Leipzig	Sachsen	POINT (12.34306 51.34056)
7	http://www.wikidata.org/entity/Q14544300	Nachtdigital	NaT	Point(13.09080833 51.40503611)	Musikfestival für Techno und House	Sachsen	POINT (13.09081 51.40504)
8	http://www.wikidata.org/entity/Q15060352	Th!nk?	NaT	Point(12.33527778 51.26944444)	music festival near Leipzig, Germany	Sachsen	POINT (12.33528 51.26944)
9	http://www.wikidata.org/entity/Q2311733	Splash!	NaT	Point(12.81416667 50.83694444)	hip hop and reggae festival in Germany	Sachsen	POINT (12.81417 50.83694)
10	http://www.wikidata.org/entity/Q836514	Wave-Gotik-Treffen	NaT	Point(12.37472222 51.34027778)	music festival in Leipzig. Germany	Sachsen	POINT (12.37472 51.34028)
11	http://www.wikidata.org/entity/Q1340529	Endless Summer Open Air	1996-01-01 00:00:00+00:00	Point(12.967601 51.5465796)	music festival	Sachsen	POINT (12.96760 51.54658)
12	http://www.wikidata.org/entity/Q60524551	Siege of Gana	NaT	Point(13.216666666 51.25)	929 CE German-Slavic military conflict	Sachsen	POINT (13.21667 51.25000)
13	http://www.wikidata.org/entity/Q815212	Siege of Torgau	1813-10-18 00:00:00+00:00	Point(13.005555555 51.560277777)	1813 siege during the War of the Sixth Coalition	Sachsen	POINT (13.00556 51.56028)
14	http://www.wikidata.org/entity/Q1069580	Chemnitz Linux Days	NaT	Point(12.92972222 50.81305556)	event sequence	Sachsen	POINT (12.92972 50.81306)
15	http://www.wikidata.org/entity/Q107157137	1995 Breitenau rail accident	1995-05-23 00:00:00+00:00	Point(13.1585209 50.8387995)	Kollision zweier Reisezüge mit einem Bagger im...	Sachsen	POINT (13.15852 50.83880)
16	http://www.wikidata.org/entity/Q571730	Leipzig Book Fair	NaT	Point(12.40277778 51.39666667)	recurring event	Sachsen	POINT (12.40278 51.39667)
17	http://www.wikidata.org/entity/Q15110205	Eisenbahnunfall von Braunsdorf	1913-12-14 00:00:00+00:00	Point(13.0241 50.8884)	Eisenbahnunfall nach Bergrutsch im Jahr 1913 b...	Sachsen	POINT (13.02410 50.88840)
18	http://www.wikidata.org/entity/Q228536	Eisenbahnunfall von Schweinsburg-Culten	1972-10-30 00:00:00+00:00	Point(12.36544 50.78028)	train wreck	Sachsen	POINT (12.36544 50.78028)
19	http://www.wikidata.org/entity/Q1312143	Bornitz train collision	1956-02-25 00:00:00+00:00	Point(13.176 51.3027)	train wreck	Sachsen	POINT (13.17600 51.30270)