In this short post, we are using the Global Human Settlement Layer from the European Commission:

This spatial raster dataset depicts the distribution of population, expressed as the number of people per cell.

The downloaded file has a worldwide resolution of 250m, with a World Mollweide coordinates reference system. Values are expressed as decimals (float32) and represent the absolute number of inhabitants of the cell. A value of -200 is found whenever there is no data (e.g. in the oceans). Also, it corresponds to the 2015 population estimates.

We are going to load the data into a xarray DataArray and make some plots with Datashader.

Datashader is a graphics pipeline system for creating meaningful representations of large datasets quickly and flexibly.

Imports

import rioxarray
import xarray as xr
import datashader as ds
from datashader import transfer_functions as tf
from colorcet import palette

FP = "GHS_POP_E2015_GLOBE_R2019A_54009_250_V1_0.tif"  # file path

Loading the dataset

Let’s start by opening the file using rioxarray, and dask as backend. rioxarray is a geospatial xarray extension powered by rasterio.

da = rioxarray.open_rasterio(
    FP,
    chunks=True,
    lock=False,
)
type(da)
xarray.core.dataarray.DataArray
da
<xarray.DataArray (band: 1, y: 72000, x: 144328)>
dask.array<open_rasterio-95bb17bcac9421dbddd850e4aabcecb4<this-array>, shape=(1, 72000, 144328), dtype=float32, chunksize=(1, 5632, 5632), chunktype=numpy.ndarray>
Coordinates:
  * band         (band) int64 1
  * x            (x) float64 -1.804e+07 -1.804e+07 ... 1.804e+07 1.804e+07
  * y            (y) float64 9e+06 9e+06 8.999e+06 ... -8.999e+06 -9e+06 -9e+06
    spatial_ref  int64 0
Attributes:
    STATISTICS_COVARIANCES:  6079.753709536097
    STATISTICS_MAXIMUM:      442590.9375
    STATISTICS_MEAN:         3.402693912331
    STATISTICS_MINIMUM:      0
    STATISTICS_SKIPFACTORX:  1
    STATISTICS_SKIPFACTORY:  1
    STATISTICS_STDDEV:       77.972775438201
    _FillValue:              -200.0
    scale_factor:            1.0
    add_offset:              0.0
da.spatial_ref
<xarray.DataArray 'spatial_ref' ()>
array(0)
Coordinates:
    spatial_ref  int64 0
Attributes:
    crs_wkt:       PROJCS["World_Mollweide",GEOGCS["WGS 84",DATUM["WGS_1984",...
    spatial_ref:   PROJCS["World_Mollweide",GEOGCS["WGS 84",DATUM["WGS_1984",...
    GeoTransform:  -18041000.0 250.0 0.0 9000000.0 0.0 -250.0

Total population

Let’s compute the total population count:

%%time
total_pop = da.where(da[0] > 0).sum().compute()
total_pop = float(total_pop.values)
CPU times: user 4min 45s, sys: 22.5 s, total: 5min 8s
Wall time: 40.8 s
print(f"Total population : {total_pop}")
Total population : 7349329920.0

World population was indeed around 7.35 billion in 2015.

Europe

Let’s focus on Europe with a bounding box in World_Mollweide coordinates:

minx = float(da.x.min().values)
maxx = float(da.x.max().values)
miny = float(da.y.min().values)
maxy = float(da.y.max().values)
print(f"minx : {minx}, maxx : {maxx}, miny : {miny}, maxy : {maxy}")
minx : -18040875.0, maxx : 18040875.0, miny : -8999875.0, maxy : 8999875.0

So let’s clip the data array using a bounding box:

dac = da.rio.clip_box(
    minx=-1_000_000.0,
    miny=4_250_000.0,
    maxx=2_500_000.0,
    maxy=7_750_000.0,
)

And plot this selection:

dac0 = xr.DataArray(dac)[0]
dac0 = dac0.where(dac0 > 0)
dac0 = dac0.fillna(0.0).compute()
size = 1200
cvs = ds.Canvas(plot_width=size, plot_height=size)
raster = cvs.raster(dac0)

We are using the default mean downsampling operation to produce the image.

cmap = palette["fire"]
img = tf.shade(
    raster, how="eq_hist", cmap=cmap
)
img

Europe

France

We are now going to focus on France, by cliping /re-projecting/re-cliping the data:

dac = da.rio.clip_box(
    minx=-450_000.0,
    miny=5_000_000.0,
    maxx=600_000.0,
    maxy=6_000_000.0,
)
dacr = dac.rio.reproject("EPSG:2154")
minx = float(dacr.x.min().values)
maxx = float(dacr.x.max().values)
miny = float(dacr.y.min().values)
maxy = float(dacr.y.max().values)
print(f"minx : {minx}, maxx : {maxx}, miny : {miny}, maxy : {maxy}")
minx : 3238.8963631442175, maxx : 1051199.0429940927, miny : 6088320.296559229, maxy : 7160193.962105454
dacrc = dacr.rio.clip_box(
    minx=80_000,
    miny=6_150_000,
    maxx=1_100_000,
    maxy=7_100_000,
)
dac0 = xr.DataArray(dacrc)[0]
dac0 = dac0.where(dac0 > 0)
dac0 = dac0.fillna(0.0).compute()
cvs = ds.Canvas(plot_width=size, plot_height=size)
raster = cvs.raster(dac0)
cmap = palette["fire"]
img = tf.shade(raster, how="eq_hist", cmap=cmap)
img

France

We can notice that some areas are not detailed up to the 250m accuracy, but rather averaged over larger regions, exhibiting a uniform color (e.g. in the southern Alps).