Finding the most concentrated areas of bakeries in Lyon

In this blog post, we’ll use Python to analyze the distribution of bakeries in Lyon, France. Our goal is to identify the areas with the highest concentration of bakeries in the city, potential hotspots for bread lovers. We will be using data from OpenStreetMap to achieve this.

Outline

Imports
Bounding box
Fetch the data from Open Street Map
Data cleanup
Building a density surface
The Bakery hotspots

Imports

Here we import the necessary libraries and modules that will help us fetch, process, and visualize the bakery data.

import contextily as cx
import matplotlib.pyplot as plt
import numpy as np
import osmnx as ox
import xyzservices.providers as xyz
from scipy.stats import gaussian_kde
from shapely import get_coordinates
from shapely.geometry import Polygon
from skimage.feature import peak_local_max

FS = (12, 12)  # figure size
KDE_FACTOR = 0.1  # kernel-density estimate factor
N_HOTSPOTS = 6  # fixed number of hotspots

Bounding box

We start by defining a bounding box encompassing the city of Lyon to query the points of interest from OpenStreetMap within this specific geographic area.

# A bounding box is a rectangular area defined by its minimum and maximum coordinates.
bbox = (4.78, 45.71, 4.92, 45.80)

# Extract coordinates for better readability
min_lon, min_lat, max_lon, max_lat = bbox

# Create points representing the corners of the bounding box
bottom_left = (min_lon, min_lat)
bottom_right = (max_lon, min_lat)
top_right = (max_lon, max_lat)
top_left = (min_lon, max_lat)

# Create a Shapely Polygon object from the bounding box coordinates
polygon = Polygon([bottom_left, bottom_right, top_right, top_left])
polygon

bounding box

Fetch the data from Open Street Map

We use the osmnx library to fetch bakery data directly from OpenStreetMap. We’ll use the shop tag with a value of ‘bakery’ to filter for relevant locations.

%%time
tags = {"shop": "bakery"}
bakeries_gdf = ox.features.features_from_polygon(polygon, tags)

CPU times: user 43.2 ms, sys: 13 ms, total: 56.2 ms
Wall time: 70.8 ms

bakeries_gdf = bakeries_gdf.reset_index(drop=False)
bakeries_gdf.shape

(394, 123)

We successfully retrieved 394 rows. Each entry includes 123 data columns containing information from OpenStreetMap.

To ensure accurate spatial calculations and analysis, we reproject the bakeries GeoDataFrame from the original geographic coordinate system to the projected coordinate system: EPSG:2154.

bakeries_gdf = bakeries_gdf.to_crs("EPSG:2154")

The following visualization depicts the raw distribution of bakeries (represented by blue markers) within the bounding box.

ax = bakeries_gdf.plot(markersize=4, figsize=FS)
cx.add_basemap(
    ax,
    source=xyz.CartoDB.VoyagerNoLabels,
    crs=bakeries_gdf.crs.to_string(),
    alpha=0.8,
)
_ = plt.axis("off")

bakeries

Data cleanup

To focus on essential information, we filter the data to retain only points representing bakeries (excluding lines and polygons).

bakeries_gdf = bakeries_gdf.loc[bakeries_gdf.element_type == "node"]

The dataset contains a substantial number of columns, due to the rich metadata available from OpenStreetMap.

cols = bakeries_gdf.columns.to_list()
print(cols[:20])
print(f"column count : {len(cols)}")

['element_type', 'osmid', 'name', 'opening_hours', 'payment:cash', 'payment:credit_cards', 'phone', 'shop', 'wheelchair', 'geometry', 'access:covid19', 'opening_hours:covid19', 'payment:contactless', 'payment:visa', 'ref:FR:SIRET', 'fax', 'check_date', 'delivery:covid19', 'description:covid19', 'email']
column count : 123

We reduce the dataset to include solely the name and geometry columns, discarding unnecessary attributes.

bakeries_gdf = bakeries_gdf[["name", "geometry"]]

To maintain data integrity, we eliminate records lacking a name attribute, as these entries are likely incomplete and unreliable for our analysis.

bakeries_gdf.isna().sum(axis=0)

name        8
geometry    0
dtype: int64

bakeries_gdf = bakeries_gdf.dropna()
bakeries_gdf.shape

(372, 2)

The final dataset comprises 372 bakeries with their corresponding geographic coordinates, ready for further exploration.

bakeries_gdf.head(3)

	name	geometry
0	Boulangerie Patisserie des Gratte Ciel	POINT (846015.870 6520210.859)
1	Boulangerie Pâtisserie Maison Champeaud	POINT (846751.582 6518543.900)
2	L'atelier de Xavier	POINT (845016.256 6515517.100)

Let’s look for my personal favorite bakery, Partisan:

bakeries_gdf.loc[bakeries_gdf.name.str.contains("Partisan")]

	name	geometry
242	Partisan Boulanger	POINT (842382.257 6521422.948)

Partisan

Building a density surface

To estimate the density function of bakery locations, we employ the Gaussian kernel density estimation (KDE) method provided by scipy.stats.gaussian_kde. This approach allows us to create a smooth 2D density function from the given point cloud data. The KDE method is particularly effective for visualizing the spatial distribution of bakery locations, and is allowing us to easily look for maxima.

We first extract the coordinates of the bakery locations and convert them into a numpy array:

points = np.array(get_coordinates(bakeries_gdf.geometry.values))

We then create a KDE object. Here we do not use the automatic bandwidth algorithms but give a scalar value to the bw_method argument, in order to control the bandwidth of the kernel, which affects the smoothness of the estimated density.

kde = gaussian_kde(points.T, bw_method=KDE_FACTOR)

Next, we define a grid over which we will evaluate the density. The grid is created using np.mgrid, which generates a mesh grid of points within the range of the input coordinates. Note that we give complex integer values for the step lengths:

if the step length is a complex number (e.g. 5j), then the integer part of its magnitude is interpreted as specifying the number of points to create between the start and stop values, where the stop value is inclusive.

x_min, x_max = points[:, 0].min(), points[:, 0].max()
y_min, y_max = points[:, 1].min(), points[:, 1].max()
x_grid, y_grid = np.mgrid[x_min:x_max:400j, y_min:y_max:400j]
positions = np.vstack([x_grid.ravel(), y_grid.ravel()])

We then evaluate the density at each point in the grid using the KDE object:

density = np.reshape(kde(positions).T, x_grid.shape).T

Finally, we find the location of the maximum density by identifying the indices of the maximum value in the density array. This gives us the coordinates of the point with the highest density of bakeries.

j_max, i_max = = np.unravel_index(np.argmax(density), density.shape)
x_max_density, y_max_density = x_grid[i_max, j_max], y_grid[i_max, j_max]

plt.figure(figsize=FS)
plt.imshow(density, extent=[x_min, x_max, y_min, y_max], origin="lower", cmap="YlOrBr")
ax = plt.gca()
ax = bakeries_gdf.plot(markersize=4, ax=ax, alpha=0.25)
plt.plot(x_max_density, y_max_density , marker="*", markersize=10, color="black")
cx.add_basemap(
    ax,
    source=xyz.CartoDB.Positron,
    crs=bakeries_gdf.crs.to_string(),
    alpha=0.6,
)
_ = plt.axis("off")

bakery density

The maximum density of bakeries is found to be located between the Guillotière and Saxe-Gambetta districts.

The Bakery hotspots

To identify the hotspots where bakeries are most densely concentrated, we use image local maxima detection and select the N_HOTSPOTS first values:

coordinates = peak_local_max(density, min_distance=1)

coords = [] 
for i in range(N_HOTSPOTS): 
    j_max, i_max = coordinates[i, :]
    x, y = x_grid[i_max, j_max], y_grid[i_max, j_max]
    coords.append([x,y])

plt.figure(figsize=FS)
plt.imshow(density, extent=[x_min, x_max, y_min, y_max], origin="lower", cmap="YlOrBr")
ax = plt.gca()
ax = bakeries_gdf.plot(markersize=4, ax=ax, alpha=0.25)
for i in range(n_hotspots): 
    plt.plot(coords[i][0], coords[i][1], marker="*", markersize=8, color="grey")
cx.add_basemap(
    ax,
    source=xyz.CartoDB.Positron,
    crs=bakeries_gdf.crs.to_string(),
    alpha=0.6,
)
_ = plt.axis("off")

hotspots