Mapnik Generative AI workflow: Processing¶

Alexander Dunkel, Institute of Cartography, TU Dresden


Mapnik rendering based on stable diffusion generative AI and social media data.

This notebook is a continuation from the previous notebook (01_mapnik_generativeai.html).

Prepare environment¶

Load base dependencies:

import os, sys
import re
import shutil
import geopandas as gp
import pandas as pd
import matplotlib.pyplot as plt
import rasterio as rio
from pathlib import Path
from rasterio.plot import show

Install temporary package rembg

!../py/modules/base/pkginstall.sh "rembg"

Installed rembg 2.0.50.

Symlink font folder

!ln -s {TMP}/fonts /

Import every cell from the previous notebook, except those tagged with active-ipynb. This makes all variables and methods from the previous notebook available in the current runtime, so we can continue where we left off.

module_path = str(Path.cwd().parents[0] / "py")
if module_path not in sys.path:
    sys.path.append(module_path)
from modules.base import raster
from _01_mapnik_generativeai import *

Activate autoreload of changed python files:

%load_ext autoreload
%autoreload 2

Parameters¶

APIURL = "http://127.0.0.1:7861"
BASE_PROMPT_POS: str = \
    "white background,simple outline,masterpiece,best quality,high quality," \
    "<lora:Japanese_style_Minimalist_Line_Illustrations:0.2>"
BASE_PROMPT_NEG: str = \
    "(bad-artist:1),(worst quality, low quality:1.4),lowres,bad anatomy,bad hands," \
    "((text)),(watermark),error,missing fingers,extra digit,fewer digits,cropped,worst quality," \
    "low quality,normal quality,((username)),blurry,(extra limbs),bad-artist-anime," \
    "(three hands:1.6),(three legs:1.2),(more than two hands:1.4),(more than two legs,:1.2)," \
    "label,(isometric), (square)"

Set global SD-settings

payload = {
    "CLIP_stop_at_last_layers": 1,
    "sd_vae":"vae-ft-mse-840000-ema-pruned.safetensors",
    "sd_model_checkpoint":"hellofunnycity_V14.safetensors",
}
requests.post(url=f'{APIURL}/sdapi/v1/options', json=payload)

<Response [200]>

Have a look at our per-job basis settings, loaded from the last notebook:

SD_CONFIG

{'steps': 20, 'batch_size': 4, 'sampler_name': 'DPM++ 2M SDE Exponential'}

For this notebook, increase steps to 28

SD_CONFIG["steps"] = 28

Test image generation for tags and emoji¶

The next step is to process social media metadata (tags, emoji) in descending order of importance (cluster size), generate images for the clusters, and place the images on the map at the center of gravity of each cluster shape from the tagmaps package.

Test API for selected tags¶

PROMPT = "(Grosser Garten, Palais, Nature)"

output_name = "test_image_palais_default"
KWARGS = {
    "prompt": concat_prompt(PROMPT),
    "negative_prompt": BASE_PROMPT_NEG,
    "save_name": output_name,
    "sd_config": SD_CONFIG,
    "show": False
}
DKWARGS = {
    "resize":(350, 350),
    "figsize":(22, 60),
}

if not (OUTPUT / "images" / f'{output_name}.png').exists():
    generate(**KWARGS)

imgs = list((OUTPUT / "images").glob(f'{output_name}*'))
tools.image_grid(imgs, **DKWARGS)

We have to think about a way to better incorporate these square images in the map. Maybe we could add "A thought bubble of" to our prompt?

def generate_samples(
        prompt: str, save_name: str, kwargs=KWARGS, output=OUTPUT,
        dkwargs=DKWARGS, print_prompt: bool = None, rembg: bool = None):
    """Generate and show 4 sample images for prompt"""
    # work on a copy, so that the shared default dict (KWARGS) is not modified
    kwargs = dict(kwargs)
    kwargs["prompt"] = concat_prompt(prompt)
    if print_prompt:
        print(kwargs["prompt"][:50])
    kwargs["save_name"] = save_name
    if not (output / "images" / f'{kwargs["save_name"]}.png').exists():
        if rembg:
            generate_rembg(**kwargs)
        else:
            generate(**kwargs)
    imgs = list((output / "images").glob(f'{kwargs["save_name"]}*'))
    tools.image_grid(imgs, **dkwargs)

generate_samples("(thought bubble of Grosser Garten, Palais, Nature)", save_name="test_image_palais_bubble")

Or maybe an icon?

generate_samples("(A map icon of Grosser Garten, Palais, Nature)", save_name="test_image_palais_icon")

Let's keep "A map icon of" as the pre-prompt.

Some more tests for other tags and terms

generate_samples("(A map icon of Botanischergarten), flower, grün, 🌵 🌿 🌱", save_name="test_image_botan_icon")

generate_samples("(A map icon of Gläsernemanufaktur), volkswagen, building", save_name="test_image_vw_icon")

generate_samples("(A map icon of zoo), zoodresden, animals", save_name="test_image_zoo_icon")

generate_samples("(A map icon of fussball stadion), dynamo, stadion", save_name="test_image_fussball_icon")

generate_samples("(people 🏃), activity", save_name="test_image_running_activity")

Enough tests. Now we can collect tag and emoji clusters and move on to batch generation.

Process clustered data¶

The overall workflow looks like this:

Find all clusters above a weight of x
Walk through clusters, get cluster centroid
Select all other cluster-shapes that can be found at this location
Concat prompt based on ascending importance
Generate image, remove background, save
Create Mapnik Stylesheet to place images as either symbols or raster images
Render map
(Adjust parameters and repeat, until map quality is acceptable)

data_src = Path(INPUT / "shapefiles_gg" / "allTagCluster.shp")

gdf = gp.read_file(INPUT / "shapefiles_gg" / "allTagCluster.shp", encoding='utf-8')
CRS_PROJ = gdf.crs

def sel_cluster(gdf: gp.GeoDataFrame, min_weight: int) -> gp.GeoSeries:
    """Return GeoSeries of cluster geometries above min_weight"""
    # select by the "Weights" column, as in the cluster selections below
    return gdf.loc[gdf["Weights"] > min_weight, "geometry"]

OUTPUT_MAPS = Path.cwd().parents[1] / "tagmaps-mapnik-jupyter" / "input" / "bg"

Reproject raster

%%time
raster.reproject_raster(
    raster_in=f"{OUTPUT_MAPS}/grossergarten_carto_17.tif", 
    raster_out=f"{OUTPUT_MAPS}/grossergarten_carto_17_proj.tif",
    dst_crs=f'epsg:{CRS_PROJ.to_epsg()}')

CPU times: user 860 ms, sys: 102 ms, total: 962 ms
Wall time: 963 ms

basemap = rio.open(f"{OUTPUT_MAPS}/grossergarten_carto_17_proj.tif")

bbox_map = gdf.total_bounds.squeeze()
minx, miny = bbox_map[0], bbox_map[1]
maxx, maxy = bbox_map[2], bbox_map[3]
x_lim=(minx, maxx)
y_lim=(miny, maxy)

Plot all cluster shapes

fig, ax = plt.subplots(figsize=(10, 10))
rio.plot.show(basemap, ax=ax)
gdf.plot(ax=ax, facecolor='none', edgecolor='red', linewidth=0.1)
ax.set_xlim(*x_lim)
ax.set_ylim(*y_lim)
ax.set_axis_off()

Plot only cluster shapes above a certain weight

cluster_sel = gdf[gdf["Weights"]>300]

def plot_clustermap(cluster_sel: gp.GeoDataFrame, basemap: rio.DatasetReader, label: bool = None):
    """Plot a map with clusters, basemap, and cluster labels"""
    if label is None:
        label = True
    fig, ax = plt.subplots(figsize=(7, 10))
    rio.plot.show(basemap, ax=ax)
    cmap=plt.get_cmap('Paired')
    cluster_sel.plot(ax=ax, facecolor='none', cmap=cmap, linewidth=1)
    if label:
        tools.annotate_locations_fit(
            gdf=cluster_sel, ax=ax,
            text_col="ImpTag", arrowstyle='-', arrow_col='black', fontsize=10,
            font_path="/fonts/seguisym.ttf")
    ax.set_xlim(*x_lim)
    ax.set_ylim(*y_lim)
    ax.set_axis_off()
    with warnings.catch_warnings():
        # Ignore emoji "Variation-Selector" not found in font
        warnings.filterwarnings("ignore", category=UserWarning)
        plt.show()

plot_clustermap(cluster_sel=cluster_sel, basemap=basemap)

There are several clusters visible. On the upper left, we can see the Dynamo Dresden stadium. Several tag and emoji cluster shapes can be found in this area. There is also a big shape covering the Großer Garten. Two smaller shapes can be found hovering over the Dresden Zoo and the Gläserne Manufaktur.

Process Emoji¶

We start with processing emoji. This seems like the easier part, since emoji are already highly abstracted concepts that can convey many meanings in a simplified form.

Some emoji, however, are very generic and used in arbitrary contexts. We use a broad positive filter list of 693 emoji, a subset of all available emoji, to focus on emoji that refer to specific activities and environments.

emoji_filter_list = pd.read_csv(
    INPUT / 'SelectionList_EmojiLandscapePlanning.txt', header=None, names=["emoji"], encoding="utf-8", on_bad_lines='skip')
emoji_filter_list = emoji_filter_list.set_index("emoji").index

print(emoji_filter_list[:20])

Index(['🌊', '🌅', '🍻', '🎡', '📸', '🎢', '🎶', '💪', '📷', '🐶', '🍁', '🍂', '🌸', '💦',
       '👭', '🍀', '🏖', '👫', '🎈', '🍃'],
      dtype='object', name='emoji')

cluster_sel = gdf[(gdf["Weights"]>100) & (gdf["emoji"]==1) & (gdf["ImpTag"].isin(emoji_filter_list))].copy()

plot_clustermap(cluster_sel=cluster_sel, basemap=basemap)

We can see four spatial groups of emoji clusters: the football stadium (upper left), the Zoo (below), the botanical garden (upper group), and the Junge Garde (lower right), an outdoor music venue.

Concat emoji based on cluster group/spatial intersection¶

intersects = cluster_sel.sjoin(cluster_sel[["geometry"]], how="left", predicate="intersects").reset_index()
cluster_groups = intersects.dissolve("index_right", aggfunc="min")

Join back the group IDs

cluster_sel["group"] = cluster_groups["index"]
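The spatial join and dissolve above give every cluster the smallest index among all clusters it intersects. A minimal sketch of this rule in plain Python, using axis-aligned boxes as a stand-in for the real cluster polygons (the helper names and all coordinates are hypothetical, for illustration only):

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)


def boxes_intersect(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def group_by_min_intersecting(boxes: Dict[int, Box]) -> Dict[int, int]:
    """Assign each box the smallest index among all boxes it intersects.

    Every box intersects itself, so the group id is never larger
    than the box's own index.
    """
    return {
        idx: min(
            other for other, other_box in boxes.items()
            if boxes_intersect(box, other_box))
        for idx, box in boxes.items()
    }


# Hypothetical coordinates, not taken from the shapefile
boxes = {
    1: (0, 0, 2, 2),
    5: (1, 1, 3, 3),        # overlaps box 1
    26: (10, 10, 12, 12),
    30: (11, 11, 13, 13),   # overlaps box 26
    71: (20, 0, 21, 1),     # isolated
}
groups = group_by_min_intersecting(boxes)
print(groups)
```

Note that this rule is not transitive: in a chain of shapes where A touches B and B touches C, but A does not touch C, shape C receives the index of B, not of A. If fully connected groups are needed, a union-find pass over the intersecting pairs would be one alternative.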

cluster_lists = cluster_sel.groupby("group")["ImpTag"].apply(list)

cluster_lists

group
1     [⚽, 💪, 🍻, 💪🏻, 🏈, 🏃, 💪🏼, 🏆, 📸]
26                        [🌵, 🌿, 🌱]
62                     [🐒, 🦁, 🐘, 🐨]
71                              [🎶]
Name: ImpTag, dtype: object

Generate images for cluster-groups¶

emoji_cluster_1 = list(cluster_sel[cluster_sel["group"]==1]["ImpTag"])
emoji_cluster_1

['⚽', '💪', '🍻', '💪🏻', '🏈', '🏃', '💪🏼', '🏆', '📸']

Generate sample images for clusters

for ix, cluster_list in enumerate(cluster_lists):
    print(cluster_list)
    generate_samples(
        f"A map icon of happy ({cluster_list[0]}), {''.join(cluster_list[1:])}", save_name=f"emoji_{ix:03d}", rembg=True)

Removed bg from 4 of 4 images.
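For groups with a single emoji (such as group 71), the f-string in the loop above ends with a trailing ", " because there is nothing left to append. Whether this affects image generation is not tested here. A small sketch of a prompt builder that only adds the separator when needed (emoji_prompt is a hypothetical helper, not part of the previous notebook):

```python
from typing import List


def emoji_prompt(
        cluster_list: List[str],
        pre_prompt: str = "A map icon of happy") -> str:
    """Put the first emoji in parentheses and append the remaining ones."""
    head = cluster_list[0]
    rest = "".join(cluster_list[1:])
    prompt = f"{pre_prompt} ({head})"
    # only add the separator if there are further emoji to append
    return f"{prompt}, {rest}" if rest else prompt


print(emoji_prompt(["🌵", "🌿", "🌱"]))
print(emoji_prompt(["🎶"]))
```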

Test placement on map in rasterio

Get bounds of cluster group 1

bounds = cluster_sel[cluster_sel["group"]==1]["geometry"].total_bounds

in_img = OUTPUT / "images" / "000.png"
out_img = OUTPUT / "images_gis" / "000_geo.png"

Convert to GeoPng to place cluster on the map

raster.georeference_raster(
    raster_in=in_img,
    raster_out=out_img, bbox=bounds, crs_out=CRS_PROJ)

Preview in rasterio

cluster_raster = rio.open(out_img)
fig, ax = plt.subplots(figsize=(4, 4))
rio.plot.show(basemap, ax=ax)
rio.plot.show(cluster_raster, ax=ax, alpha=0.5)
ax.set_axis_off()

TODO: Display image with alpha channel in rio, e.g. ^1

Create shapefile and Mapnik stylesheet¶

In order to place multiple images on the map, we create a shapefile that stores the cluster groups as features. For each feature, we add a column with the [reference] to its generated image and a [scale], to emphasize weights.

TODO: Maybe use GroupSymbolizer?

Use Geopandas to write gdf to point shapefile

df = cluster_lists.to_frame()
df.reset_index(inplace=True)

df.head()

Prepare conversion to geodataframe

Two options here:

use centroid for symbol placement
or use dissolved geometry [x]

def cluster_id(row):
    return row.name
    
def centroid_geom(row, cluster_sel):
    return cluster_sel[cluster_sel["group"] == row.group]["geometry"].to_frame().dissolve().centroid

def bounds_geom(row, cluster_sel):
    return cluster_sel[cluster_sel["group"] == row.group]["geometry"].to_frame().dissolve().geometry
    
df["cluster_id"] = df.apply(cluster_id, axis=1)
df["geometry"] = df.apply(bounds_geom, axis=1, cluster_sel=cluster_sel)
df["ImpTag"] = df.ImpTag.map(' '.join)

df.head(6)

gdf = gp.GeoDataFrame(df, crs=CRS_PROJ, geometry=df.geometry)
gdf.to_file(filename=INPUT / 'shapefiles_gg' / 'gen_img.shp', driver="ESRI Shapefile")

Prepare Mapnik Plot¶

Copy generated images to input path for mapnik

def _copy_generated_tomapnik(
        input_path: Path = OUTPUT / "images", output_path: Path = INPUT / "cluster_img", batch: int = None, emoji: bool = None):
    """Copy files from image gen folder to mapnik plot folder, rename to standard"""
    if batch is None:
        batch = 0
    cluster_img = []
    emoji_pre = ""
    if emoji:
        emoji_pre= "emoji_"
    for fname in input_path.glob(f"{emoji_pre}*.png"):
        if batch == 0:
            # escape the dot, so that it only matches a literal "."
            if re.match(rf"{emoji_pre}[0-9][0-9][0-9]\.png", fname.name):
                cluster_img.append(fname)
        else:
            if re.match(rf"[0-9][0-9][0-9]_{batch:02}\.png", fname.name):
                cluster_img.append(fname)
    for file in cluster_img:
        shutil.copy(file, output_path / file.name.replace(emoji_pre, "").replace(f"_{batch:02}", ""))
    # report after copying, so that the count reflects completed work
    print(f'Copied {len(cluster_img)} files.')

_copy_generated_tomapnik(emoji=True)

Copied 4 files.
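The file selection by batch can be checked without touching the file system. The sketch below assumes the naming pattern implied by the copy function (three digits, optionally followed by a two-digit batch suffix); select_batch and the example file names are hypothetical. It uses re.fullmatch and an escaped dot, so a name must match the pattern completely:

```python
import re
from typing import List


def select_batch(
        names: List[str], batch: int = 0, prefix: str = "") -> List[str]:
    """Return the file names that belong to the given image batch."""
    if batch == 0:
        pattern = re.compile(rf"{re.escape(prefix)}\d{{3}}\.png")
    else:
        pattern = re.compile(rf"{re.escape(prefix)}\d{{3}}_{batch:02}\.png")
    return [name for name in names if pattern.fullmatch(name)]


# Hypothetical file names, for illustration only
names = [
    "emoji_000.png", "emoji_000_01.png", "emoji_001.png",
    "emoji_0001png", "000_03.png"]
print(select_batch(names, batch=0, prefix="emoji_"))
print(select_batch(names, batch=3))
```

The malformed name "emoji_0001png" is rejected here. With an unescaped dot and re.match, the dot would match the digit "1" and the name would pass.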

output_name = "tagmap_production_cluster_gg_emoji.png"
stylesheet = "tagmap_production_testraster_points_gg_emoji.xml"

%%time
!/usr/bin/python3 -m mapnik_cli \
    --stylesheet_name {stylesheet} \
    --output_name {output_name} \
    --map_dimensiony_x 1000 \
    --map_dimensiony_y 1000 \
    --input_path {INPUT} \
    --output_path {OUTPUT}

CPU times: user 16.7 ms, sys: 192 ms, total: 209 ms
Wall time: 1.52 s

display.Image(f'{OUTPUT}/{output_name}')

Process Tags¶

For processing tags, there are several additional challenges (compare image below):

some clusters have a large number of tags (e.g. Dynamo Dresden Stadium, on the left)
some clusters have diverging concepts (e.g. rammsteinlive and football at the same location)
one cluster covers a large area (Großer Garten), which includes other smaller clusters (Junge Garde, Botanischer Garten)

The workflow below tries to solve these issues:

first, select the largest cluster, area-wise; this will be our "background" prompt ("Großer Garten") that we can add to all smaller clusters in the area
select only a number of tags from clusters with many tags (e.g. Dynamo Dresden Stadium)
try to select recursively clusters that cover different areas, so that we can get an even coverage, filling gaps in the map
(identify similarity of concepts based on NLP/BART/Cosine Similarity and generate separate images for different concepts)

Parameter:

CLUSTER_WEIGHT_CUTOFF = 10

gdf = gp.read_file(INPUT / "shapefiles_gg" / "allTagCluster.shp", encoding='utf-8')

cluster_sel = gdf[(gdf["Weights"]>CLUSTER_WEIGHT_CUTOFF) & (gdf["emoji"]==0)].copy()

Get preview (limit to >100 weights)

plot_clustermap(cluster_sel=cluster_sel[cluster_sel["Weights"]>100], basemap=basemap)

Get preview (limit to <=100 weights)

plot_clustermap(cluster_sel=cluster_sel[cluster_sel["Weights"]<=100], basemap=basemap, label=False)

ToDo: Separate clusters¶

As is visible, many cluster shapes are concentrated in a few dense areas, overlapping each other. For our map, we want maximum coverage without overlapping symbols. For this, we first dissolve all cluster shapes into a single MultiPolygon and then separate areas that do not touch.

gdf["area"] = gdf.geometry.area / 1000

gdf.area.max()

2293772.865577627

gdf.area.min()

3823.4700000020302

import mapclassify as mc
def get_scheme_breaks(series_nan: pd.Series, scheme: str = None):
    """Classify series of values
    
    Notes: some classification schemes (e.g. HeadTailBreaks)
        do not support specifying the number of classes returned
        construct optional kwargs with k == number of classes
    """
    optional_kwargs = {"k":9}
    if scheme is None:
        scheme = "NaturalBreaks"
    if scheme == "HeadTailBreaks":
        optional_kwargs = {}
    scheme_breaks = mc.classify(
        y=np.abs(series_nan.values), scheme=scheme, **optional_kwargs)
    return scheme_breaks

breaks = get_scheme_breaks(gdf["area"], scheme="HeadTailBreaks")

bins = np.flip(breaks.bins[:-1])

bins

array([2064.10384691, 1743.28175478, 1049.06408188,  413.11115637,
        130.45049109,   52.03500257,   19.53215571,    7.34789744])
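The HeadTailBreaks scheme is suited to heavy-tailed values such as the cluster areas: values are split at the mean, and the split is repeated on the part above the mean. A minimal sketch of this idea in plain Python follows. The stopping rule (max_head_share) is a hypothetical parameter chosen for illustration, and the sketch is not claimed to reproduce mapclassify's implementation or the break values above:

```python
from typing import List, Sequence


def head_tail_breaks(
        values: Sequence[float], max_head_share: float = 0.4) -> List[float]:
    """Return break values by repeatedly splitting at the mean."""
    breaks = []
    current = list(values)
    while len(current) > 1:
        mean = sum(current) / len(current)
        # the "head" are the few large values above the mean
        head = [value for value in current if value > mean]
        if not head:
            break
        breaks.append(mean)
        # stop once the head is no longer a clear minority
        if len(head) / len(current) > max_head_share:
            break
        current = head
    return breaks


# Hypothetical values: many small items, few large ones
sample = [1, 1, 1, 1, 1, 1, 2, 2, 3, 10]
print(head_tail_breaks(sample))
```

The number of classes follows from the data, which is why the function above passes no k to this scheme.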

gdf["area"].max()

2293.772865577627

gdf[(gdf["area"]>=bins[0])]

cmap = tools.get_cmap(len(bins), 'Paired')

SUBPLOTS = len(bins)
fig, axes = plt.subplots(nrows=int(round(SUBPLOTS/4)), ncols=4, figsize=(8, 4))
for ix, ax in enumerate(axes.reshape(-1)):
    if ix >= SUBPLOTS:
        break
    if ix >= len(bins)-1:
        mask = (gdf["area"]<=bins[ix])
    else:
        mask = (gdf["area"]>=bins[ix+1]) & (gdf["area"]<=bins[ix])
    gdf[mask].plot(facecolor='none', edgecolor=cmap(ix), ax=ax)
    ax.set_axis_off()

Process Clusters¶

Select top-cluster based on area/coverage/percentage
Go through each level and select clusters based on distinct areas;
Exclude previous cluster areas (except top-level) from the follow-up levels
until all levels are processed and a maximum coverage is achieved.

Process Top-Cluster

Area in units of 1,000 m² (geometry area multiplied by 0.001, assuming the projected CRS uses metres):

cluster_sel["area"] = cluster_sel.area * 0.001
cluster_sel.sort_values("area", ascending=False).head()

Our whole area is:

total_area = cluster_sel["geometry"].to_frame().dissolve().area[0] * 0.001
print(total_area)

2755.3831500480974

Calculate each cluster's area as a percentage of the total area, and filter all clusters above a certain percentage.

cluster_sel["percs"] = cluster_sel.sort_values("area", ascending=False)["area"] / (total_area/100)

cluster_sel.sort_values("area", ascending=False)["percs"][:10]

11      83.246966
641     66.576397
82      59.169919
76      44.079581
51      21.757337
249     18.162850
426     17.846108
1706    15.524580
114     10.626592
1061     8.883660
Name: percs, dtype: float64

The largest gap lies between the fourth cluster (park, 44.1%) and the fifth (garden, 21.8%); below that, the values decrease gradually. We use 20% as the cutoff value, which still includes the fifth cluster.
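The percentage calculation can be retraced by hand. The sketch below uses the five largest cluster areas and the total area printed in this notebook (both in units of 1,000 m²); the variable names are chosen for illustration:

```python
# Areas of the five largest clusters, keyed by cluster index
areas = {
    11: 2293.772866,
    641: 1834.434828,
    82: 1630.357989,
    76: 1214.561336,
    51: 599.498009,
}
total_area = 2755.3831500480974

# same formula as above: area / (total_area / 100)
percs = {idx: area / (total_area / 100) for idx, area in areas.items()}
for idx, perc in percs.items():
    print(f"{idx:>4}: {perc:.2f}%")

# clusters at or above the 20% cutoff
top_clusters = [idx for idx, perc in percs.items() if perc >= 20]
print(top_clusters)
```

The percentages add up to more than 100 because the cluster shapes overlap, while the total area is taken from the dissolved geometry.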

top_cluster_mask = cluster_sel["percs"] >= 20

top_cluster_mask

0       False
4       False
5       False
9       False
11       True
        ...  
2037    False
2038    False
2039    False
2040    False
2041    False
Name: percs, Length: 1145, dtype: bool

plot_clustermap(cluster_sel=cluster_sel[top_cluster_mask], basemap=basemap)

Get Cluster Groups

Below, we use a simple approach to best-coverage, by first selecting the top cluster, and then selecting a limited number of non-intersecting cluster areas afterwards.

def get_cluster_groups(gdf: gp.GeoDataFrame) -> gp.GeoSeries:
    """Get cluster groups based on spatial self-intersection,
    and return list of tags/emoji sorted by ascending importance
    """
    intersects = gdf.sjoin(
        gdf[["geometry"]], how="left", predicate="intersects"
        ).reset_index()
    cluster_groups = intersects.dissolve("index_right", aggfunc="min")
    cluster_sel["group"] = cluster_groups["index"]
    cluster_lists = cluster_sel.groupby("group")["ImpTag"].apply(list)
    return cluster_lists

top_cluster_group = get_cluster_groups(cluster_sel[cluster_sel["percs"] >= 20])
top_cluster_geom = cluster_sel[cluster_sel["percs"] >= 20]["geometry"].to_frame().dissolve().geometry[0]

top_cluster_group

group
11.0    [garten, garden, park, großer, grosergarten]
Name: ImpTag, dtype: object

other_cluster_groups = get_cluster_groups(cluster_sel[cluster_sel["percs"] < 20])

other_cluster_groups

group
0.0       [dynamo, rammstein, stadion, football, fussbal...
9.0       [zoo, zoodresden, animals, love, nature, flami...
15.0      [volkswagen, manufaktur, gläsernemanufaktur, f...
19.0      [jungegarde, konzert, concert, annenmaykantere...
21.0      [großergarten, palais, nature, palaisteich, gr...
22.0                                               [winter]
24.0      [art, exhibition, architecture, skatepark, aus...
73.0                                                [natur]
114.0             [großergarten, brunnen, mosaik, fountain]
119.0     [carolaschlösschen, großergarten, nature, love...
195.0     [parkeisenbahn, großergarten, park, parkeisenb...
250.0     [travel, love, oldtown, дрезден, town, beautif...
254.0     [breakfast, milchmädchen, frühstück, cafemilch...
426.0                                          [volkswagen]
743.0                                            [strehlen]
837.0           [großergarten, nature, love, grossergarten]
861.0     [nature, travel, hiking, saxonyswitzerland, be...
1243.0                                    [estancia, steak]
1957.0                                              [party]
Name: ImpTag, dtype: object

Concat the two series

cluster_groups = pd.concat([top_cluster_group, other_cluster_groups])

cluster_groups

group
11.0           [garten, garden, park, großer, grosergarten]
0.0       [dynamo, rammstein, stadion, football, fussbal...
9.0       [zoo, zoodresden, animals, love, nature, flami...
15.0      [volkswagen, manufaktur, gläsernemanufaktur, f...
19.0      [jungegarde, konzert, concert, annenmaykantere...
21.0      [großergarten, palais, nature, palaisteich, gr...
22.0                                               [winter]
24.0      [art, exhibition, architecture, skatepark, aus...
73.0                                                [natur]
114.0             [großergarten, brunnen, mosaik, fountain]
119.0     [carolaschlösschen, großergarten, nature, love...
195.0     [parkeisenbahn, großergarten, park, parkeisenb...
250.0     [travel, love, oldtown, дрезден, town, beautif...
254.0     [breakfast, milchmädchen, frühstück, cafemilch...
426.0                                          [volkswagen]
743.0                                            [strehlen]
837.0           [großergarten, nature, love, grossergarten]
861.0     [nature, travel, hiking, saxonyswitzerland, be...
1243.0                                    [estancia, steak]
1957.0                                              [party]
Name: ImpTag, dtype: object

Some tags repeat in lower cluster groups (e.g. "großergarten"); we want to remove these, to make space for more specific terms.

def pop_recursive(cluster_groups: pd.Series, lim_terms: int = 3) -> pd.Series:
    """Remove recursive terms that repeat at lower levels; return new Series
    Further, limit to the list of terms per cluster to n items; n=3
    """
    terms = set()
    d = {}
    for idx, cluster_group in cluster_groups.items():
        new_words = [term for term in cluster_group if not term in terms]
        terms.update(set(new_words))
        if len(new_words) > 0:
            d[idx] = new_words[:lim_terms]
    series = pd.Series(d)
    series.rename_axis('group', inplace=True)
    series.rename("ImpTag", inplace=True)
    return series

cleaned_groups = pop_recursive(cluster_groups)
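To see the effect on a small scale, here is the same logic applied to a plain dict. Only three of the groups listed above are used (those whose tag lists are shown in full), so the result differs from the full run, where more terms have already been seen in earlier groups. The function name dedupe_terms is hypothetical:

```python
from typing import Dict, List


def dedupe_terms(
        groups: Dict[int, List[str]], lim_terms: int = 3
        ) -> Dict[int, List[str]]:
    """Drop terms already seen in earlier groups, keep at most lim_terms."""
    seen = set()
    cleaned = {}
    for idx, terms in groups.items():
        new_terms = [term for term in terms if term not in seen]
        # remember all new terms, also those cut off by the limit
        seen.update(new_terms)
        if new_terms:
            cleaned[idx] = new_terms[:lim_terms]
    return cleaned


# Subset of the cluster groups above (dicts keep insertion order)
groups = {
    11: ["garten", "garden", "park", "großer", "grosergarten"],
    114: ["großergarten", "brunnen", "mosaik", "fountain"],
    837: ["großergarten", "nature", "love", "grossergarten"],
}
cleaned = dedupe_terms(groups)
print(cleaned)
```

The order of groups matters: a term is kept for the first group in which it appears, so the most important group should come first.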

The function also limits each list to the first n items (n=3).

Pop single cluster from a wrongly georeferenced Instagram place:

cleaned_groups.pop(861.0)

['hiking', 'saxonyswitzerland']

cleaned_groups

group
11.0                                 [garten, garden, park]
0.0                            [dynamo, rammstein, stadion]
9.0                              [zoo, zoodresden, animals]
15.0           [volkswagen, manufaktur, gläsernemanufaktur]
19.0      [jungegarde, annenmaykantereit, jungegardedres...
21.0                          [palais, palaisteich, palace]
24.0                   [exhibition, skatepark, ausstellung]
114.0                                     [brunnen, mosaik]
119.0                [carolaschlösschen, afterwork, schwan]
195.0                         [parkeisenbahndresden, train]
250.0                               [oldtown, town, prague]
254.0                  [breakfast, milchmädchen, frühstück]
743.0                                            [strehlen]
1243.0                                    [estancia, steak]
Name: ImpTag, dtype: object

from shapely.geometry.point import Point
from IPython.display import display as ipydisplay

def get_scale(geom_series: pd.Series, min_scale: float = 0.2, max_scale: float = 0.4) -> np.ndarray:
    """Get scale values for Mapnik symbol placement from cluster area
    
    1. Take the minimum cluster area
    2. Take the maximum cluster area
    3. Interpolate linearly between min_scale (default: 0.2) and max_scale (default: 0.4)
    """
    areas = geom_series.area
    series_max = areas.max()
    series_min = areas.min()
    series_interp = np.interp(
        areas, (series_min, series_max), (min_scale, max_scale))
    # format for Mapnik and return
    # return [f'{x:.2},{x:.2}' for x in series_interp]
    return series_interp
    
def offset_points(points: List[Point]):
    """Try to minimize overlap by offsetting points a limited number of times
    TODO: Not yet implemented; ideally look into adjustText and how this is
    solved with bioframe.core.arrops.overlap_intervals()
    """
    ipydisplay(points)
    ipydisplay(type(points[0]))
    for pt in points:
        distance_between_pts = points[0].distance(pt)
        print(distance_between_pts)
        
def offset_points_manual(points: List[Point]):
    # garden
    points[0] = Point(points[0].x-50, points[0].y+100)
    # stadium
    points[1] = Point(points[1].x - 500, points[1].y)
    # zoo
    points[2] = Point(points[2].x - 300, points[2].y)
    # vw
    points[3] = Point(points[3].x - 100, points[3].y+100)
    # palais
    points[5] = Point(points[5].x, points[5].y)
    # schwan
    points[8] = Point(points[8].x+100, points[8].y)
    # train
    points[9] = Point(points[9].x, points[9].y+300)
    # old town
    points[10] = Point(points[10].x-150, points[10].y+150)
    return points
    
def create_clustergroups_shape(
        cluster_series: pd.Series, top_cluster_geom: "Point", cluster_gdf: gp.GeoDataFrame = cluster_sel,
        output_folder: Path = None, crs_proj: str = CRS_PROJ, input: Path = INPUT):
    """Prepare cluster shapefile for Mapnik, store to output_folder"""
    if output_folder is None:
        output_folder = input / 'shapefiles_gg'
    df = cluster_series.to_frame()
    df.reset_index(inplace=True)
    df["cluster_id"] = df.apply(cluster_id, axis=1)
    df["geometry"] = df.apply(bounds_geom, axis=1, cluster_sel=cluster_gdf)
    # update top cluster geom
    df["geometry"][0] = top_cluster_geom
    df["ImpTag"] = df.ImpTag.map(' '.join)
    gdf = gp.GeoDataFrame(df, crs=crs_proj, geometry=df.geometry)
    gdf["scale"] = get_scale(gdf.geometry)
    # use point for symbol placement, as Mapnik will assume centroid of polygons anyway
    gdf["geometry"] = [geom.centroid for geom in df["geometry"]]
    gdf["geometry"] = offset_points_manual(gdf["geometry"])
    gdf.to_file(filename = output_folder / 'gen_img_tags.shp', driver="ESRI Shapefile", encoding='utf-8')

create_clustergroups_shape(cluster_series=cleaned_groups, top_cluster_geom=top_cluster_geom)
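The symbol scale written to the shapefile is a linear interpolation of cluster area between a minimum and a maximum scale. A sketch of this mapping without NumPy, using hypothetical area values (interpolate_scale is an illustrative name; the handling of identical areas is an assumption of this sketch):

```python
from typing import List, Sequence


def interpolate_scale(
        areas: Sequence[float], min_scale: float = 0.2,
        max_scale: float = 0.4) -> List[float]:
    """Map areas linearly to the range min_scale..max_scale."""
    smallest, largest = min(areas), max(areas)
    if largest == smallest:
        # assumption: fall back to min_scale if all areas are equal
        return [min_scale for _ in areas]
    span = largest - smallest
    return [
        min_scale + (area - smallest) / span * (max_scale - min_scale)
        for area in areas]


# Hypothetical areas: smallest, medium, and largest cluster
scales = interpolate_scale([100.0, 200.0, 300.0])
print([round(scale, 2) for scale in scales])
```

The smallest cluster receives min_scale and the largest max_scale, so a single very large cluster (such as the park as a whole) compresses the scale range available to all others.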

Generate images for clusters¶

Note: To re-generate images, first delete the existing files in output/images

%%time
import random
for ix, cluster_list in enumerate(cleaned_groups):
    # pre_prompt = random.choice(["A map icon", "An icon", "A thought bubble"])
    pre_prompt = "A map icon"
    generate_samples(
        # f"{pre_prompt} of ({', '.join(cluster_list[0:2])}), {''.join(cluster_list[2:])}",
        f"{pre_prompt} of ({', '.join(cluster_list)})",
        save_name=f"{ix:03d}", print_prompt=True)

A map icon of (garten, garden, park),,white backgr
A map icon of (dynamo, rammstein, stadion),,white 
A map icon of (zoo, zoodresden, animals),,white ba
A map icon of (volkswagen, manufaktur, gläserneman
A map icon of (jungegarde, annenmaykantereit, jung
A map icon of (palais, palaisteich, palace),,white
A map icon of (exhibition, skatepark, ausstellung)
A map icon of (brunnen, mosaik),,white background,
A map icon of (carolaschlösschen, afterwork, schwa
A map icon of (parkeisenbahndresden, train),,white
A map icon of (oldtown, town, prague),,white backg
A map icon of (breakfast, milchmädchen, frühstück)
A map icon of (strehlen),,white background,simple 
A map icon of (estancia, steak),,white background,
CPU times: user 4.44 s, sys: 96.7 ms, total: 4.54 s
Wall time: 1min 16s

Use batch n=0-3 to select a different image batch for map generation.

_copy_generated_tomapnik(output_path = INPUT / "cluster_img_tags", batch=3)

Copied 14 files.

for file in (INPUT / "cluster_img_tags").glob('*.png'):
    remove_background(file)

Render map

output_name = "tagmap_production_cluster_gg.png"
stylesheet = "tagmap_production_testraster_points_gg.xml"

%%time
!/usr/bin/python3 -m mapnik_cli \
    --stylesheet_name {stylesheet} \
    --output_name {output_name} \
    --map_dimensiony_x 2000 \
    --map_dimensiony_y 2000 \
    --input_path {INPUT} \
    --output_path {OUTPUT}

CPU times: user 7.73 ms, sys: 56.1 ms, total: 63.8 ms
Wall time: 1.57 s

display.Image(f'{OUTPUT}/{output_name}')

img2img¶

The last test is to use a final img2img pass to merge the overlaid images with the background and produce a combined image, reducing the overlay effect of the icons.

def img2img(
        text, image_path, steps: int = 50, denoising_strength: float = 0.05, 
        api: str = APIURL, output=OUTPUT / "img2img"):
    api_url = f"{api}/sdapi/v1/img2img"
    with open(image_path, 'rb') as file:
        image_data = file.read()
    encoded_image = base64.b64encode(image_data).decode('utf-8')
    payload = {
        "init_images": [encoded_image],
        'prompt' : text,
        "steps": steps,
        "denoising_strength": denoising_strength
    }
    response = requests.post(api_url, json=payload)
    name = 'GENimg2img_'
    for i in range(random.randint(15, 25)):
        name += random.choice('QAZXfrSWEDCVFRTqazxswgbnhyujmkiolpGBNHYUJedcvtMKIOLP')
    print(name)
    if response.status_code == 200:
        response_data = response.json()
        encoded_result = response_data["images"][0]
        result_data = base64.b64decode(encoded_result)
        output_path = output / f'{name}.jpg'
        with open(output_path, 'wb') as file:
            file.write(result_data)
        return name

name = img2img("A tourist city map with points of interests, wimmelbild",  image_path=OUTPUT / output_name)

GENimg2img_ehJYfRWhXYpJRDjVgecCCpF

display.Image(OUTPUT / "img2img" / f'{name}.jpg')
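The request body and the response handling of the function above can be separated from the network call, which makes them easy to check offline. A sketch with the same payload keys as used above; the helper names are hypothetical and the image bytes are a placeholder, not a real image:

```python
import base64
from typing import Any, Dict


def build_img2img_payload(
        image_bytes: bytes, prompt: str, steps: int = 50,
        denoising_strength: float = 0.05) -> Dict[str, Any]:
    """Build the JSON body with the image encoded as a base64 string."""
    encoded_image = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "init_images": [encoded_image],
        "prompt": prompt,
        "steps": steps,
        "denoising_strength": denoising_strength,
    }


def decode_first_image(response_data: Dict[str, Any]) -> bytes:
    """Decode the first base64 image of a response dictionary."""
    return base64.b64decode(response_data["images"][0])


# Placeholder bytes instead of a rendered map
fake_image = b"\x89PNG placeholder"
payload = build_img2img_payload(fake_image, "A tourist city map")
# simulate a response that returns the input image unchanged
roundtrip = decode_first_image({"images": payload["init_images"]})
print(roundtrip == fake_image)
```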

The result here is not convincing. Even using a very small denoising_strength of 0.05 produces a map with distorted icons.

One solution could be to tile the image, and use ControlNet, together with Upscaler, to produce a more fine-grained result.

We cannot use this currently through the API, as ControlNet and Upscaler are extensions, and these extensions are not available through the /sdapi endpoint. Try in the native webui.
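If the tiling approach is tried, a first step would be to compute overlapping tile windows for the rendered map. The sketch below only calculates pixel boxes. The tile size and overlap are hypothetical values, and nothing here is specific to ControlNet or to any upscaler:

```python
from typing import List, Tuple

PixelBox = Tuple[int, int, int, int]  # (left, upper, right, lower)


def tile_boxes(
        width: int, height: int, tile: int = 512,
        overlap: int = 64) -> List[PixelBox]:
    """Return overlapping tile boxes that cover the whole image."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be >= 0 and smaller than tile")
    stride = tile - overlap

    def starts(size: int) -> List[int]:
        # a single tile is enough for small images
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        # place the last tile flush with the image edge
        positions.append(size - tile)
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height) for x in starts(width)]


# 2000 x 2000 px, the size of the map rendered above
tiles = tile_boxes(2000, 2000)
print(len(tiles), tiles[0], tiles[-1])
```

The boxes use the (left, upper, right, lower) order that common image libraries expect for cropping. The overlap leaves room for blending the tiles when they are merged again.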

Create notebook HTML¶

!jupyter nbconvert --to html_toc \
    --output-dir=../resources/html/ ./02_map_processing.ipynb \
    --template=../nbconvert.tpl \
    --ExtractOutputPreprocessor.enabled=False >&- 2>&-

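Table output of cluster_sel.sort_values("area", ascending=False).head() (five largest clusters by area):

	Join_Count	Views	COUNT_User	ImpTag	TagCountG	HImpTag	Weights	WeightsV2	WeightsV3	geometry	area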
11	2070	395179	873	garten	180	1	356.005051	502.799194	1000.000000	POLYGON ((412138.621 5654816.052, 412129.875 5...	2293.772866
641	874	285012	112	grosergarten	117	1	26.160288	65.377445	652.170310	POLYGON ((412481.210 5655490.237, 412498.156 5...	1834.434828
82	830	129065	300	großer	31	1	114.133745	173.439586	635.712462	POLYGON ((412781.017 5655281.688, 412788.698 5...	1630.357989
76	736	149270	295	park	321	1	118.169352	170.565593	599.020796	POLYGON ((413391.976 5655282.814, 413392.946 5...	1214.561336
51	357	173653	279	garden	233	1	155.863724	161.368815	419.158162	POLYGON ((412680.698 5655566.619, 412675.251 5...	599.498009

Table output of df.head() (emoji cluster groups):

	group	ImpTag
0	1	[⚽, 💪, 🍻, 💪🏻, 🏈, 🏃, 💪🏼, 🏆, 📸]
1	26	[🌵, 🌿, 🌱]
2	62	[🐒, 🦁, 🐘, 🐨]
3	71	[🎶]

Table output of df.head(6) (emoji cluster groups with cluster_id and geometry):

	group	ImpTag	cluster_id	geometry
0	1	⚽ 💪 🍻 💪🏻 🏈 🏃 💪🏼 🏆 📸	0	POLYGON ((412166.430 5655287.283, 412186.390 5...
1	26	🌵 🌿 🌱	1	POLYGON ((412853.641 5655338.984, 412854.179 5...
2	62	🐒 🦁 🐘 🐨	2	POLYGON ((412539.530 5654672.537, 412534.747 5...
3	71	🎶	3	POLYGON ((413836.501 5654093.424, 413822.028 5...