Tag Maps rendering with Python and Mapnik ¶

Alexander Dunkel, TU Dresden; Institute of Cartography

TL;DR This notebook illustrates the process for rendering of tag maps shapefiles in Mapnik. This notebook can be run with the jupyter docker for cartography (docker tag jupyterlab:mapnik), which contains the stable Mapnik installation and Python bindings, but must be accessed through the system Python installation (/usr/bin/python3), not the conda environment.

The source files for this notebook are available in https://gitlab.vgiscience.de/ad/tagmaps-mapnik-jupyter.

Preparations¶

Import dependencies

from numba.core.errors import NumbaDeprecationWarning, NumbaPendingDeprecationWarning
import warnings

warnings.simplefilter('ignore', category=NumbaDeprecationWarning)
warnings.simplefilter('ignore', category=NumbaPendingDeprecationWarning)

import sys
import math
import fiona
import rasterio
import numpy as np
import geopandas as gp
import contextily as cx
import mapclassify as mc
import libpysal as lps
import geopandas as gp
import matplotlib.pyplot as plt
from IPython import display
from contextily import Place
from typing import Tuple, Optional
from rasterio.plot import show as rioshow
from pathlib import Path
from esda.getisord import G_Local
from shapely.geometry import shape

We are using a number of helper functions from py/modules/tools.py to classify values.

module_path = str(Path.cwd().parents[0] / "py")
if module_path not in sys.path:
    sys.path.append(module_path)
from modules.base import tools

%load_ext autoreload
%autoreload 2

Test Mapnik bindings

!/usr/bin/python3 -c "import mapnik;print(mapnik.__file__)"

/usr/lib/python3/dist-packages/mapnik/__init__.py

Global Parameters

Some parameters are used throughout this notebook. Adjust to your needs.

INPUT: Path = Path.cwd().parents[0] / "input" # define path to input and working directory (shapefiles, stylesheets etc.)
OUTPUT: Path = Path.cwd().parents[0] / "output" # define path to output directory (map, notebook html etc.)
TMP: Path = INPUT / "intermediate"
MAP_X: int = 2500 # x dimensions of the final map, in pixels
MAP_Y: int = 1400 # y dimensions of the final map, in pixels

for folder in [OUTPUT, INPUT, TMP / "bg"]:
    folder.mkdir(exist_ok=True, parents=True)

Download font set for Segoe UI Symbol Regular, Aerial Bold, and Noto Sans Regular

!wget https://raw.githubusercontent.com/mrbvrz/segoe-ui-linux/master/font/seguisym.ttf \
    https://raw.githubusercontent.com/foursquare/foursquair/master/src/assets/fonts/Arial%20Bold.ttf \
    https://github.com/openmaptiles/fonts/raw/master/noto-sans/NotoSans-Regular.ttf \
    --directory-prefix {TMP}/fonts -nc --quiet
!rm /fonts && ln -s {TMP}/fonts /fonts

Set global font path variable for mapnik.

!export MAPNIK_FONT_PATH=/fonts

Test creating a map¶

Data source: Shapefiles in input/shapefiles, created using TagMaps
- Original data from Flickr, Twitter and Instagram for the TUD Campus
- Clustered and aggregated to illustrate collective use of terms (allTagCluster.shp)
- and overall frequentation of places in the area (allLocationCluster.shp)
see this notebook for how to generate these files
A mapnik stylesheet input/tagmap_style.xml, containing the style information for how the data should be rendered
A python script input/tagmap_xml.py, using Mapnik Python bindings to process data and generate the map

The mapnik renderer can be accessed through Python bindings, available in the system python installation. Below, we use a python script that provides specific instructions to render a map in Mapnik.

%%time
!cd {INPUT} && /usr/bin/python3 ../input/tagmap_xml.py

CPU times: user 2.72 s, sys: 372 ms, total: 3.09 s
Wall time: 2min 21s

Load the generated image to jupyter

display.Image(f"{INPUT}/intermediate/tagmap_style.png")

Add Basemap¶

Get bounds from shapefile:

CRS_WGS = "epsg:4326"

gdf = gp.read_file(INPUT / "shapefiles_gg" / "allLocationCluster.shp")

BOUNDS = list(gdf.to_crs(CRS_WGS).total_bounds)

Contextily allows to plot existing basemaps in Python.

dresden = Place("Grosser Garten, Dresden", source=cx.providers.Esri.WorldImagery)
ax = dresden.plot()

It also provides options to store results as geotiff, see the docs.

buf = 0.006
# filename = f"{TMP}/bg/dresden_carto.tif"
filename = f"{TMP}/bg/dresden_meingruen.tif"
img, ext = cx.bounds2raster(
    dresden.w-buf, dresden.s, dresden.e+buf, dresden.n+0.005,
    # filename, source=cx.providers.CartoDB.PositronNoLabels,
    filename, source='https://maptiles.geo.tu-dresden.de/tiles/styles/meinGruen/{z}/{x}/{y}.png',
    ll=True)

fig, ax = plt.subplots(figsize=(8, 8))
ax.set_axis_off()
with rasterio.open(f"{TMP}/bg/dresden_carto.tif") as r:
    rioshow(r, ax=ax)

fig, ax = plt.subplots(figsize=(8, 8))
ax.set_axis_off()
with rasterio.open(f"{TMP}/bg/dresden_meingruen.tif") as r:
    rioshow(r, ax=ax)

Get the basemap at a higher resolution, using the zoom parameter, to get a higher level of detail.

zoom = 17
bg_name = f"{TMP}/bg/grossergarten_carto_{zoom}.tif"
if not Path(bg_name).exists():
    img, ext = cx.bounds2raster(
        dresden.w-buf, dresden.s, dresden.e+buf, dresden.n+0.005,
        bg_name, source=cx.providers.CartoDB.PositronNoLabels,
        ll=True, zoom=zoom)

bg_name = f"{TMP}/bg/grossergarten_meingruen_{zoom}.tif"
if not Path(bg_name).exists():
    img, ext = cx.bounds2raster(
        dresden.w-buf, dresden.s, dresden.e+buf, dresden.n+0.005,
        bg_name, source='https://maptiles.geo.tu-dresden.de/tiles/styles/meinGruen/{z}/{x}/{y}.png',
        ll=True, zoom=zoom)

Get bbox for map from cluster data

gdf = gp.read_file(INPUT / "shapefiles_gg" / "allTagCluster.shp", encoding='utf-8')
CRS_PROJ = gdf.crs
bbox_map = gdf.total_bounds.squeeze()
minx, miny = bbox_map[0], bbox_map[1]
maxx, maxy = bbox_map[2], bbox_map[3]

print(f'Projected bounds: {(minx, miny, maxx, maxy)}')

Projected bounds: (412002.4948451808, 5653846.735233658, 414409.2816068467, 5655987.42471083)

def add_buffer_bbox(
        bbox: Tuple[float,float,float,float], buffer: int) -> Tuple[float,float,float,float]:
    """Add buffer to bbox tuple (Meters)"""
    return (bbox[0]-buffer, bbox[1]-buffer, bbox[2]+buffer, bbox[3]+buffer)

bounds = add_buffer_bbox((minx, miny, maxx, maxy), -200)

print(bounds)

(412202.4948451808, 5654046.735233658, 414209.2816068467, 5655787.42471083)

get bbox from raster:

west, south, east, north = BOUNDS

project:

from pyproj import Transformer
transformer = Transformer.from_crs("EPSG:4326", "epsg:32633", always_xy=True)
points = [(west, south), (west, north), (east, north),  (east, south)]
points_proj = []
for pt in transformer.itransform(points):
    points_proj.append(pt)
west, south, east, north = points_proj[0][0], points_proj[3][1], points_proj[2][0], points_proj[1][1]
print(f'Projected bounds: {(west, south, east, north)}')

Projected bounds: (412003.78932344244, 5653872.362963473, 414386.1015206523, 5655951.9933490055)

Render tagmap with basemap¶

Below, use mapnik-cli to use map generation with parameters.

stylesheet = "tagmap_bg_style.xml"
output_name = "tagmap_bg_style.png"

%%time
!/usr/bin/python3 -m mapnik_cli \
    --stylesheet_name {stylesheet} \
    --output_name {output_name} \
    --map_dimensiony_x {MAP_X} \
    --map_dimensiony_y {MAP_Y} \
    --input_path {INPUT} \
    --output_path {OUTPUT}

CPU times: user 5.11 s, sys: 773 ms, total: 5.88 s
Wall time: 2min 52s

display.Image(f'{OUTPUT}/{output_name}')

Add Cluster Points¶

TagMaps puts out two shapefiles, one for Emoji+Tag Clusters (allTagCluster.shp), and one for Location clusters (allLocationCluster), which can be used to visualize overall frequentation patterns, regardless of the tags/emoji used.

The steps to visualize these clusters in ESRI ArcMap are explained in the docs. The process requires additional tools, such as Incremental Spatial Autocorrelation and the Getis-Ord GI* Statistic.

Below, the same approach is followed using Open Source Software in Python.

stylesheet = "tagmap_points_style.xml"
output_name = "tagmap_points_style.png"

For the actual Point, we need to create an svg.

Create the svg first:

the radius r defines how 'smooth' the svg curve is (how many vertices are used), adapt to your final output map dimension

point_svg = \
"""<?xml version="1.0" standalone="yes"?>
<svg height="128" width="128" version="2"
     xmlns="http://www.w3.org/2000/svg">
  <circle cx="0" cy="0" r="100" stroke="white" stroke-width="10" fill="#849EB9" />
</svg>
"""

Write svg to file:

with open(TMP / "circle.svg", "w") as svg_file:
    svg_file.write(point_svg)

%%time
!/usr/bin/python3 -m mapnik_cli \
    --stylesheet_name {stylesheet} \
    --output_name {output_name} \
    --map_dimensiony_x {MAP_X} \
    --map_dimensiony_y {MAP_Y} \
    --input_path {INPUT} \
    --output_path {OUTPUT}

CPU times: user 35.1 ms, sys: 42 ms, total: 77.1 ms
Wall time: 2.16 s

display.Image(f'{OUTPUT}/{output_name}')

The Mapnik PointSymbolizer has little ability to modify size of points. Directly Mapping the [Join_Count] or [Weights] attribute from the Location cluster shapefile provides no useful output: The largest cluster dominates in the map above.

We can read in the shapefile using Fiona and add a field, with values to be used as the svg scale factor for points in Mapnik's PointSymbolizer. This direction provides more flexibility to influence the style of the location cluster layer. Below is mainly following the docs.

Read allLocationCluster.shp

data_src = Path(INPUT / "shapefiles_gg" / "allLocationCluster.shp")
locations = fiona.open(data_src, encoding='UTF-8', mode="r")

The Join_Count refers to the number of user that took photos or posted from a particular cluster. Weights is a normalized version of Join_Count, to the values range 1..1000.

print(locations.schema)
print(locations.crs)

{'properties': OrderedDict([('Join_Count', 'int:18'), ('Weights', 'float:24.15')]), 'geometry': 'Point'}
{'init': 'epsg:32633'}

Copy the source schema and add a new property, for point size.

updated_schema = locations.schema
updated_schema["properties"]["point_size"] = "float"

Write a new shapefile with the same format and coordinate reference system as the source, with the new field.

The formula used for point size below:

4+(math.sqrt(feature["properties"]["Join_Count"])*8)

The use of square root means that the largest values will be affected most, which helps a bit reducing the dominant largest clusters on the map.

with fiona.open(
        Path(TMP / "allLocationCluster_Size.shp"), "w",
        crs=locations.crs,
        driver=locations.driver,
        schema=updated_schema,
    ) as shapefile:
        for feature in locations:
            _size = 4+(math.sqrt(feature["properties"]["Join_Count"])*8)
            feature["properties"].update(point_size=_size)
            shapefile.write(feature)

Create map

stylesheet = "tagmap_pointssize_style.xml"
output_name = "tagmap_pointssize_style.png"

%%time
!/usr/bin/python3 -m mapnik_cli \
    --stylesheet_name {stylesheet} \
    --output_name {output_name} \
    --map_dimensiony_x {MAP_X} \
    --map_dimensiony_y {MAP_Y} \
    --input_path {INPUT} \
    --output_path {OUTPUT}

CPU times: user 30.6 ms, sys: 51.4 ms, total: 82 ms
Wall time: 2.04 s

display.Image(f'{OUTPUT}/{output_name}')

Create a map with all labels

stylesheet = "tagmap_pointssize_style_v2.xml"
output_name = "tagmap_pointssize_style_v2.png"

%%time
!/usr/bin/python3 -m mapnik_cli \
    --stylesheet_name {stylesheet} \
    --output_name {output_name} \
    --map_dimensiony_x {MAP_X} \
    --map_dimensiony_y {MAP_Y} \
    --input_path {INPUT} \
    --output_path {OUTPUT}

CPU times: user 4.39 s, sys: 581 ms, total: 4.97 s
Wall time: 2min 51s

display.Image(f'{OUTPUT}/{output_name}')

Local Autocorrelation: Hot Spots, Cold Spots, and Spatial Outliers¶

The process below is based on the PySal Notebook Exploratory Analysis of Spatial Data: Spatial Autocorrelation and the API docs for esda.G_Local.

Loading location clusters and preparing the point data set for spatial analysis. We are using Join_Count field as the weight for clusters in our spatial autocorrelation analysis.

points = []
weights = []
with fiona.open(data_src, encoding='UTF-8', mode="r") as shapefile:
    for feature in locations:
        point = (feature["geometry"]['coordinates'][0], feature["geometry"]['coordinates'][1])
        points.append(point)
        weights.append(float(feature["properties"]["Join_Count"]))

The points are already projected to a UTM coordinate system (33N), meaning that the values are given in Meters.

points[:5]

[(412506.50846066093, 5654658.360449937),
 (412557.2575476704, 5655713.798559217),
 (412721.0145564849, 5654101.871548808),
 (414032.1407540008, 5654755.54839422),
 (412093.73301671044, 5654235.220379112)]

weights[:5]

[1.0, 1.0, 1.0, 1.0, 1.0]

Below, libpysal is used to create a weights object from points.

The threshold value referc to the distance threshold and is specified in Meters (based on the UTM Zone 33 Coordinate System). Smaller values will include less points, larger values will include more points in the distance matrix of the statistical calculation. The result is a smooth transition between colors (larger values), or more spatially fine grained color classification (smaller values).

distance_threshold = 350

w = lps.weights.DistanceBand(points, threshold=distance_threshold, binary=False)

/opt/conda/envs/worker_env/lib/python3.9/site-packages/scipy/sparse/_data.py:119: RuntimeWarning: divide by zero encountered in reciprocal
  return self._with_data(data ** n)

Preparing the weights variable

y = np.array(weights)

Applying Getis and Ord local G* test using a binary weights object

There are a number of other examples in the pysal/esda docs for parameters. We are using the Getis Ord Gi*, where the star means focal observation.

%%time
lg_star = G_Local(y, w, transform='B', star=True)

lg_star.Zs

array([-0.36711081,  0.28126067, -0.42871257, ...,  0.09073237,
        0.34636949,  1.91501171])

p-value based on standard normal approximation from permutations. Smaller p-values mean less significance.

round(lg_star.p_sim[0], 3)

0.449

Classify values¶

PySal mapclassify provides classification schemes such as Quantiles, HeadTailBreaks, or NaturalBreaks.

print(lg_star.Zs.min())
print(lg_star.Zs.max())

-1.0226904925168754
3.7987134732781214

scheme_breaks = tools.classify_data(
    values=lg_star.Zs, scheme="Quantiles")

scheme_breaks

Quantiles

   Interval      Count
----------------------
[-1.02, -0.56] |   240
(-0.56, -0.48] |   240
(-0.48, -0.37] |   240
(-0.37, -0.03] |   239
(-0.03,  0.30] |   240
( 0.30,  0.66] |   240
( 0.66,  3.80] |   240

cat_nr = scheme_breaks.find_bin(
    lg_star.Zs)

cat_nr

array([3, 4, 2, ..., 4, 5, 6])

Update schema and write to shapefile

updated_schema["properties"]["cat_nr"] = "int"

with fiona.open(
        Path(TMP / "allLocationCluster_SizeClass.shp"), "w",
        crs=locations.crs,
        driver=locations.driver,
        schema=updated_schema,
    ) as shapefile:
        for ix, feature in enumerate(locations):
            _size = 4+(math.sqrt(feature["properties"]["Join_Count"])*8)
            feature["properties"].update(point_size=_size)
            feature["properties"].update(cat_nr=int(cat_nr[ix]))
            shapefile.write(feature)

Create SVG markers for classes¶

The last remaining step is to assign colors to classes. Since Mapnik style sheets provide no parameter for color values, everything must be defined in the svg. We will create a separate svg for each of the 7 classes, each having a different color defined.

Define list of colors (hot and cold spots):

hex_list = ['#4576b5', '#849eba', '#c0ccbe', '#ffffc0', '#fab984', '#ed7551', '#d62f27']

tools.display_hex_colors(hex_list)

def get_svg_circle(color: str) -> str:
    """Returns a circle as svg with the given color"""
    return \
    f"""<?xml version="1.0" standalone="yes"?>
    <svg height="128" width="128" version="2"
         xmlns="http://www.w3.org/2000/svg">
      <circle cx="0" cy="0" r="100" stroke="white" stroke-width="10" fill="{color}" />
    </svg>
    """

for ix, hex_color in enumerate(hex_list):
    with open(TMP / f"circle_{ix}.svg", "w") as svg_file:
        point_svg = get_svg_circle(color=hex_color)
        svg_file.write(point_svg)

Produce Map¶

stylesheet = "tagmap_production.xml"
output_name = "tagmap_production.png"

%%time
!/usr/bin/python3 -m mapnik_cli \
    --stylesheet_name {stylesheet} \
    --output_name {output_name} \
    --map_dimensiony_x {MAP_X} \
    --map_dimensiony_y {MAP_Y} \
    --input_path {INPUT} \
    --output_path {OUTPUT}

CPU times: user 5.48 s, sys: 803 ms, total: 6.29 s
Wall time: 2min 44s

display.Image(f'{OUTPUT}/{output_name}')

Focus regions¶

Commonly, these maps would be interactively explored using a tile server with multiple zoom levels. In order to zoom in to certain clusters, we can also create static close-ups of focus regions.

To locate elements, select a hashtag that is written bold (e.g. PALAIS). Bold tells us that this is the most important cluster of the hashtag PALAIS. In the data table, this entry is highlighted with himp=1. We can use this information to get the latitude and longitude of the element and tell Mapnik to zoom into this location.

data_src = Path(INPUT / "shapefiles_gg" / "allTagCluster.shp")
focus_tag = 'PALAIS'

def find_feature(data_src: Path, feature_name: str, add_buffer: Optional[int] = None) -> Tuple[float,float,float,float]:
    """Returns bounding box of a feature (x, y), if found"""
    with fiona.open(data_src, encoding='UTF-8', mode="r") as shapefile:
        for feature in shapefile:
            properties = feature["properties"]
            if properties["HImpTag"] == 1 and properties["ImpTag"] == feature_name.lower():
                bounds = shape(feature["geometry"]).bounds
                if add_buffer:
                    bounds = add_buffer_bbox(bounds, buffer = add_buffer)
                return str(bounds).lstrip("(").rstrip(")").replace(" ","")

focus_bbox = find_feature(data_src=data_src, feature_name=focus_tag)

focus_bbox

'412941.9033526781,5654496.468149946,413639.09149143315,5654977.47920824'

Create map and zoom in to bounding box

output_name = f"tagmap_production_{focus_tag.lower()}.png"

%%time
!/usr/bin/python3 -m mapnik_cli \
    --stylesheet_name {stylesheet} \
    --output_name {output_name} \
    --map_dimensiony_x 500 \
    --map_dimensiony_y 250 \
    --input_path {INPUT} \
    --output_path {OUTPUT} \
    --bbox {focus_bbox}

CPU times: user 456 ms, sys: 102 ms, total: 558 ms
Wall time: 22.1 s

display.Image(f'{OUTPUT}/{output_name}')

Sometimes, the bounding box returned from the cluster is small. We can add a buffer to crop the map to a larger area, for the cluster under investigation.

focus_tag = 'ZOODRESDEN'
focus_bbox = find_feature(data_src=data_src, feature_name=focus_tag, add_buffer=100)
output_name = f"tagmap_production_{focus_tag.lower()}.png"

%%time
!/usr/bin/python3 -m mapnik_cli \
    --stylesheet_name {stylesheet} \
    --output_name {output_name} \
    --map_dimensiony_x 500 \
    --map_dimensiony_y 250 \
    --input_path {INPUT} \
    --output_path {OUTPUT} \
    --bbox {focus_bbox}

CPU times: user 946 ms, sys: 126 ms, total: 1.07 s
Wall time: 41.6 s

display.Image(f'{OUTPUT}/{output_name}')

Experiment¶

Now it is time to experiment. For instance, changing the background map to aerial imagery and filtering only emoji.

focus_tag = 'BOTANISCHERGARTEN'

focus_bbox

'412283.89823593496,5654253.254016886,413112.26139851037,5655040.910832429'

zoom = 17
bg_name = f"{TMP}/bg/grossergarten_aerial_{zoom}.tif"
if not Path(bg_name).exists():
    img, ext = cx.bounds2raster(
        dresden.w-buf, dresden.s, dresden.e+buf, dresden.n+0.003,
        bg_name, source=cx.providers.Esri.WorldImagery,
        ll=True, zoom=zoom)

Create a white svg circle

with open(TMP / f"circle_white.svg", "w") as svg_file:
    point_svg = get_svg_circle(color='#FFFFFF')
    svg_file.write(point_svg)

Zoom to Campus

focus_bbox = find_feature(data_src=data_src, feature_name=focus_tag, add_buffer=450)

Visualize

stylesheet = "tagmap_aerial_emoji.xml"
output_name = f"tagmap_production_aerial_{focus_tag.lower()}.png"

%%time
!/usr/bin/python3 -m mapnik_cli \
    --stylesheet_name {stylesheet} \
    --output_name {output_name} \
    --map_dimensiony_x 1600 \
    --map_dimensiony_y 900 \
    --input_path {INPUT} \
    --output_path {OUTPUT} \
    --bbox {focus_bbox}

CPU times: user 148 ms, sys: 66.5 ms, total: 214 ms
Wall time: 9.86 s

display.Image(f'{OUTPUT}/{output_name}')

Further work¶

Experiment further, e.g.:

Emoji Font: Replace Segoe UI Symbol Regular in input/*.xml Stylesheets with another Emoji font, e.g. Twitter Color Emoji Regular, Segoe UI Symbol Regular, Segoe UI Emoji Regular, Noto Emoji Regular, Symbola Regular etc.
Use gray location markers and use color to classify different types of emoji, e.g. all emoji of the unicode group "activity emoji" as red
Adjust Rules in XML styles to affect label placement
Create a custom basemap from vector/osm data
Use of different svg markers to classify different types of places (e.g. based on the majority of posts from Instagram, Twitter, or Flickr)
Output a tile map with mapnik, for different resolutions and areas

Create Notebook HTML¶

!jupyter nbconvert --to html_toc \
    --output-dir=../resources/html/ ./01_tagmaps.ipynb \
    --template=../nbconvert.tpl \
    --ExtractOutputPreprocessor.enabled=False >&- 2>&- # create single output file and suppress output

Tag Maps rendering with Python and Mapnik¶