A quick study of air quality in Lyon with Python
The aim of this post is to use Python to fetch air quality data from a web service and to create a few plots.
We are going to:
- look at some daily data from earlier this year, before and after the lockdown, in the city of Lyon, France
- compare 2020 data with that of previous years
- and finally look at some specific pollutants
The data is provided by Atmo Auvergne-Rhône-Alpes, an organization monitoring air quality over the Auvergne-Rhône-Alpes region. They also provide a convenient web API (HTTP GET method). Note that an API token is required; you can easily get one from their website after registration.
Imports
from datetime import datetime
import requests
import pandas as pd
from fastprogress.fastprogress import progress_bar
from matplotlib import pyplot as plt
plt.style.use("seaborn")
%load_ext lab_black
FS = (16, 9) # figure size
# location: LYON 4EME ARRONDISSEMENT (4th district)
CODE_INSEE = 69384
# api-token
API_TOKEN = "8xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx7"
# api-endpoint
URL = f"https://api.atmo-aura.fr/communes/{CODE_INSEE}/indices?api_token={API_TOKEN}"
First try of the API
PARAMS = {"date": "2020-05-24"}
r = requests.get(url=URL, params=PARAMS)
data = r.json()
data
{'licence': 'https://opendatacommons.org/licenses/odbl/',
'commune': 'LYON-4E-ARRONDISSEMENT',
'code_insee': '69384',
'indices': {'date': '2020-05-24',
'valeur': '37.5560109609897',
'couleur_html': '#99E600',
'qualificatif': 'Bon',
'type_valeur': 'prévision'}}
Since we will not change the location (4th district of Lyon), we only need to keep the date and valeur items from this dictionary. valeur is the air pollution index value. It ranges from 0 to 100, but can also rise above 100, which means that the warning threshold is exceeded. This would imply a serious risk to the population and the environment, justifying the implementation of emergency measures.
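As an illustration, a small helper (a sketch based on the response shown above) can pull out just these two items:

def extract_index(payload):
    # keep only the date and the pollution index value from an API response
    indices = payload["indices"]
    return indices["date"], float(indices["valeur"])

extract_index(data)  # ('2020-05-24', 37.5560109609897)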
Note that you can get historical values, but also forecasts if you enter a future date in the GET params (a 2-day horizon at most when I tried).
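For example, requesting the next day returns a forecast (assuming, as above, that today is 2020-05-24):

# a future date yields a forecast instead of a measured value
r = requests.get(url=URL, params={"date": "2020-05-25"})
r.json()["indices"]["type_valeur"]  # 'prévision' instead of 'réelle'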
So let’s fetch the daily pollution index over a time range starting at the beginning of 2020 and ending today.
Daily air quality in 2020 so far
First we create a date range:
start = datetime(2020, 1, 1).date()
end = datetime.now().date()
day_range = pd.date_range(start=start, end=end, freq="D")
Now we can loop over these days and fetch the air quality data for each day. Since each item in day_range is a Timestamp object, we convert it to str with strftime:
daily_pollution_index = []
for day in progress_bar(day_range):
    day_str = day.strftime("%Y-%m-%d")
    params = {"date": day_str}
    r = requests.get(url=URL, params=params)
    data = r.json()
    daily_pollution_index.append(data["indices"])
The list is given to Pandas to create a dataframe:
df = pd.DataFrame(daily_pollution_index)
df.head(2)
| | date | valeur | couleur_html | qualificatif | type_valeur |
|---|---|---|---|---|---|
| 0 | 2020-01-01 | 50.4253716534476 | #FFFF00 | Moyen | réelle |
| 1 | 2020-01-02 | 48.5803761861679 | #C3F000 | Bon | réelle |
We clean up the dataframe a little bit and create a DatetimeIndex:
df.rename(columns={"valeur": "value"}, inplace=True)
df.drop(["type_valeur", "couleur_html", "qualificatif"], axis=1, inplace=True)
df["value"] = df["value"].astype(float)
df["date"] = pd.to_datetime(df["date"])
df.set_index("date", inplace=True)
df.head(2)
| date | value |
|---|---|
| 2020-01-01 | 50.425372 |
| 2020-01-02 | 48.580376 |
And we can now plot the daily historical air pollution index.
ax = df["value"].plot(
ms=10, linewidth=3, alpha=0.7, style="o-", figsize=FS, c="#1f77b4",
)
_ = plt.vlines("2020-03-17", 0, 100, colors="r") # lockdown start
_ = plt.vlines("2020-05-11", 0, 100, colors="r") # lockdown end
_ = ax.set_ylim(0, 100)
_ = ax.set(
    title="Daily air quality in Lyon 4E",
    xlabel="Date",
    ylabel="Air pollution index (smaller is better)",
)
Well, we do not see a significant drop in air pollution during the lockdown, which took place between March 17 and May 11. The reason is that the air pollution index combines 3 different kinds of pollutants:
- NO2: Nitrogen dioxide,
- O3: Ozone,
- PM10: particulate matter, i.e. coarse particles with a diameter between 2.5 and 10 micrometers (μm).
From what I understand, an index is computed for each of these three pollutants, and the global air pollution index corresponds to a combination of these three values. So the pollution index remains large if any one of these pollutant concentrations remains large.
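To make this concrete, here is a purely hypothetical illustration, assuming the combination rule is a simple maximum (the actual Atmo formula is not documented here):

# made-up sub-index values: a single high pollutant keeps the overall index high
sub_indices = {"NO2": 20.0, "O3": 65.0, "PM10": 30.0}
overall_index = max(sub_indices.values())
print(overall_index)  # 65.0, driven by ozone alone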
Now these pollutants have very different origins: NO2 is mainly related to road traffic and other fossil fuel combustion processes, PM10 to wood heating (in the suburbs), agriculture, etc. O3 is more complex; here is a quote from Wikipedia:
Ozone precursors are a group of pollutants, predominantly those emitted during the combustion of fossil fuels. Ground-level ozone pollution (tropospheric ozone) is created near the Earth’s surface by the action of daylight UV rays on these precursors. The ozone at ground level is primarily from fossil fuel precursors, but methane is a natural precursor, and the very low natural background level of ozone at ground level is considered safe.
And of course these levels of pollution also depend on the weather conditions. Cold weather may yield a larger PM10 concentration due to heating, sunny weather may increase the ozone concentration, while some rain may “clean” the particles, I guess. Many factors contribute to air pollution.
Concerning the first part of 2020, let’s try to compare the above pollution values with the ones recorded in the 2 previous years.
Monthly air quality from 2018 to 2020
We already have the data for 2020, so we now fetch the data for the same range of days (January 1 to May 22), but for 2018 and 2019. The data from years prior to 2018 does not seem to be available on the web service.
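As a quick check, a request for a pre-2018 date appears to come back with an empty indices entry (the exact behavior for missing dates is an assumption on my part), which is why the loop below guards against None values:

# probing a date before 2018 (assumed to return no index data)
r = requests.get(url=URL, params={"date": "2017-06-01"})
r.json()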
daily_pollution_index = []
for year in range(2018, 2020):
    for day in progress_bar(day_range):
        try:
            # raises a ValueError on February 29, which does not exist in 2018/2019
            day = day.replace(year=year).date()
            day_str = day.strftime("%Y-%m-%d")
            params = {"date": day_str}
            r = requests.get(url=URL, params=params)
            data = r.json()
            if data is not None and data["indices"] is not None:
                daily_pollution_index.append(data["indices"])
        except ValueError:
            print(f"No data for the date: {day.date()}")
No data for the date: 2020-02-29
No data for the date: 2020-02-29
Similar to what we did before, we create a dataframe with a DatetimeIndex:
df_hist = pd.DataFrame(daily_pollution_index)
df_hist.rename(columns={"valeur": "value"}, inplace=True)
df_hist.drop(["type_valeur", "couleur_html", "qualificatif"], axis=1, inplace=True)
df_hist["value"] = df_hist["value"].astype(float)
df_hist["date"] = pd.to_datetime(df_hist["date"])
df_hist.set_index("date", inplace=True)
df_hist.head(2)
| date | value |
|---|---|
| 2018-01-01 | 32.346865 |
| 2018-01-02 | 31.257633 |
And we merge the data corresponding to the 3 different years:
df.rename(columns={"value": "2020"}, inplace=True)
df["month"] = df.index.month
df["day"] = df.index.day
df.reset_index(drop=False, inplace=True)
df_hist["month"] = df_hist.index.month
df_hist["day"] = df_hist.index.day
df_all = pd.merge(df, df_hist["2018"], on=["month", "day"], how="left").rename(
    columns={"value": "2018"}
)
df_all = pd.merge(df_all, df_hist["2019"], on=["month", "day"], how="left").rename(
    columns={"value": "2019"}
)
df_all.drop(["month", "day"], axis=1, inplace=True)
df_all.set_index("date", inplace=True)
df_all.head(2)
| date | 2020 | 2018 | 2019 |
|---|---|---|---|
| 2020-01-01 | 50.425372 | 32.346865 | 21.538545 |
| 2020-01-02 | 48.580376 | 31.257633 | 31.926375 |
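A note on df_hist["2018"] in the merge above: since "2018" is not a column name, pandas falls back to partial string indexing on the DatetimeIndex and returns all the rows of year 2018. The more explicit equivalent would be:

# explicit equivalent of df_hist["2018"]: select the rows of year 2018
df_hist.loc["2018"].head(2)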
Let’s plot the monthly average:
monthly = df_all.resample("M").mean()
monthly["month"] = monthly.index.month_name()
monthly.set_index("month", inplace=True, drop=True)
ax = monthly.plot.bar(figsize=FS, rot=10)
_ = ax.set_ylim(0, 100)
_ = ax.set(
    title="Monthly air quality in Lyon 4E",
    xlabel="Month",
    ylabel="Air pollution index (smaller is better)",
)
Well, there is no significant difference between the three years.
Unfortunately, the web service does not offer separate data for each pollutant. However, these data are available from their website here, as CSV files.
Daily concentration of 3 pollutants
We download the data for the 3 pollutants seen earlier, on the same time range and with the same frequency. Let’s look at the first file:
NO2 = pd.read_csv("NO2.csv", sep=";")
NO2
| | Station | Polluant | Mesure | Unité | 01/01/2020 | ... | 22/05/2020 |
|---|---|---|---|---|---|---|---|
| 0 | Lyon Centre | Dioxyde d'azote | Dioxyde d'azote | microg/m3 | 38,2 | ... | 18,3 |

1 rows × 147 columns
We need to transpose the dataframe in order to create a DatetimeIndex, and take care of the French number and date formats:
NO2.drop(["Station", "Polluant", "Mesure", "Unité"], axis=1, inplace=True)
NO2 = NO2.T
NO2.index = pd.to_datetime(NO2.index, format="%d/%m/%Y")
NO2.columns = ["NO2"]
NO2["NO2"] = NO2["NO2"].astype(str).map(lambda s: s.replace(",", ".")).astype(float)
NO2.head(2)
| | NO2 |
|---|---|
| 2020-01-01 | 38.2 |
| 2020-01-02 | 42.6 |
Now we do the same with the two other files and concatenate the 3 columns:
O3 = pd.read_csv("O3.csv", sep=";")
O3.drop(["Station", "Polluant", "Mesure", "Unité"], axis=1, inplace=True)
O3 = O3.T
O3.index = pd.to_datetime(O3.index, format="%d/%m/%Y")
O3.columns = ["O3"]
O3["O3"] = O3["O3"].astype(str).map(lambda s: s.replace(",", ".")).astype(float)
PM10 = pd.read_csv("PM10.csv", sep=";")
PM10.drop(["Station", "Polluant", "Mesure", "Unité"], axis=1, inplace=True)
PM10 = PM10.T
PM10.index = pd.to_datetime(PM10.index, format="%d/%m/%Y")
PM10.columns = ["PM10"]
PM10["PM10"] = PM10["PM10"].astype(str).map(lambda s: s.replace(",", ".")).astype(float)
pol = pd.concat([NO2, O3, PM10], axis=1)
pol.head(2)
| | NO2 | O3 | PM10 |
|---|---|---|---|
| 2020-01-01 | 38.2 | 0.8 | 38.2 |
| 2020-01-02 | 42.6 | 0.9 | 35.1 |
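Since the three CSV files share exactly the same layout, the loading steps above could be factored into a small helper, for instance (a sketch reusing the same file names):

def load_pollutant(path, name):
    # load one Atmo CSV export and return a single-column dataframe indexed by date
    df = pd.read_csv(path, sep=";")
    df = df.drop(["Station", "Polluant", "Mesure", "Unité"], axis=1).T
    df.index = pd.to_datetime(df.index, format="%d/%m/%Y")
    df.columns = [name]
    # convert the French decimal commas to floats
    df[name] = df[name].astype(str).str.replace(",", ".").astype(float)
    return df

pol = pd.concat([load_pollutant(f"{name}.csv", name) for name in ("NO2", "O3", "PM10")], axis=1)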
Here is a little function to plot each pollutant:
def plot_pollutant(df, pollutant, unit):
    ax = df[pollutant].plot(
        ms=10,
        linewidth=3,
        alpha=0.6,
        style="-o",
        legend=False,
        c="#1f77b4",
        figsize=FS,
    )
    max_val = df[pollutant].max()
    _ = plt.vlines("2020-03-17", 0, max_val, colors="r")  # lockdown start
    _ = plt.vlines("2020-05-11", 0, max_val, colors="r")  # lockdown end
    _ = ax.set(
        title=f"Daily {pollutant} concentration in Lyon centre",
        xlabel="Date",
        ylabel=f"{pollutant} concentration ({unit})",
    )
that we can call 3 times:
plot_pollutant(pol, "NO2", "microg/m3")
plot_pollutant(pol, "O3", "microg/m3")
plot_pollutant(pol, "PM10", "microg/m3")
So the NO2 concentration clearly dropped at the beginning of the lockdown, alongside the intensity of road traffic, and remained low afterwards (as did road traffic). The levels of the two other pollutants during the lockdown have a lot to do with the specific weather conditions (see this page from Atmo).
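As a quick sanity check of that first claim, we can compare the average NO2 concentration before and during the lockdown, reusing the pol dataframe built above:

# average NO2 concentration before vs during the lockdown
no2_before = pol.loc[:"2020-03-16", "NO2"].mean()
no2_during = pol.loc["2020-03-17":"2020-05-10", "NO2"].mean()
print(f"NO2 mean: {no2_before:.1f} before, {no2_during:.1f} during (microg/m3)")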