Working with categorical data: land cover

This example builds upon the sample data for São Miguel introduced here to demonstrate the use of exactextract with categorical land cover data.

The following plot shows a sample of the CORINE 2018 landcover dataset that is distributed with exactextractr.

[1]:

import geopandas as gpd
import rasterio
import rasterio.plot
from matplotlib import pyplot

landcov = 'sao_miguel/clc2018_v2020_20u1.tif'
concelhos = 'sao_miguel/concelhos.gpkg'

fig, ax = pyplot.subplots()

with rasterio.open(landcov) as r:
    rasterio.plot.show(r, ax=ax, cmap='tab20c')

admin = gpd.read_file(concelhos)
admin.plot(ax=ax, facecolor='none', edgecolor='black')

[1]:

<Axes: >

Summarizing land cover classifications

One of the most basic questions we might ask is which land cover type is predominant in each concelho. We can do this with the built-in mode summary operation. The minority and variety operations are also applicable to categorical data and provide the least-common classification and number of distinct classifications, respectively.

[2]:

from exactextract import exact_extract

df = exact_extract(landcov, concelhos, ["variety", "mode", "minority"], include_cols='name', output='pandas')
display(df)

	name	variety	mode	minority
0	Lagoa	11	21	9
1	Nordeste	12	24	44
2	Ponta Delgada	21	18	5
3	Povoação	15	23	26
4	Ribeira Grande	16	21	44
5	Vila Franca do Campo	12	21	29

If needed, we can load the attribute table distributed with the landcover dataset and replace the numeric codes above with their descriptions.

[3]:

import geopandas as pd

classes = gpd.read_file('sao_miguel/clc2018_v2020_20u1.tif.vat.dbf').set_index('Value')
for col in ('mode', 'minority'):
    df[col] = df[col].map(classes['LABEL3'])
display(df)

	name	variety	mode	minority
0	Lagoa	11	Land principally occupied by agriculture, with...	Construction sites
1	Nordeste	12	Coniferous forest	Sea and ocean
2	Ponta Delgada	21	Pastures	Port areas
3	Povoação	15	Broad-leaved forest	Natural grasslands
4	Ribeira Grande	16	Land principally occupied by agriculture, with...	Sea and ocean
5	Vila Franca do Campo	12	Land principally occupied by agriculture, with...	Transitional woodland-shrub

Calculating the fraction of each land cover type

If we want more detailed information, the unique operation provides an array of distinct landcover types within each polygon. The frac operation provides a matching array with fraction of the polygon’s Cartesian area that is covered by each type.

[4]:

df = exact_extract(landcov, concelhos, ["unique", "frac"], include_cols='name', output='pandas')
display(df)

	name	unique	frac
0	Lagoa	[9, 2, 27, 21, 12, 7, 20, 44, 18, 23, 3]	[0.0063711393558363635, 0.06296995715158163, 0...
1	Nordeste	[26, 29, 24, 21, 2, 20, 23, 25, 12, 27, 18, 44]	[0.011951440831186277, 0.03741980964021535, 0....
2	Ponta Delgada	[5, 1, 10, 6, 8, 7, 3, 11, 27, 25, 44, 18, 20,...	[0.0006163751291919294, 0.0011556906409785125,...
3	Povoação	[25, 27, 26, 29, 24, 11, 21, 18, 44, 23, 10, 2...	[0.015132325263604073, 0.08214924261333095, 7....
4	Ribeira Grande	[3, 29, 44, 18, 41, 12, 25, 27, 2, 20, 7, 21, ...	[0.01728772697537766, 0.015469030136234001, 0....
5	Vila Franca do Campo	[20, 12, 25, 23, 2, 41, 29, 44, 18, 21, 27, 24]	[0.12116542049248848, 0.049864986045216775, 0....

To join these codes with the description, we can unnest the data using the pandas explode function and then remap the fields as done previously.

[5]:

df = df.explode(['unique', 'frac'])
df['unique'] = df['unique'].map(classes['LABEL3'])
display(df)

	name	unique	frac
0	Lagoa	Construction sites	0.006371
0	Lagoa	Discontinuous urban fabric	0.06297
0	Lagoa	Moors and heathland	0.066856
0	Lagoa	Land principally occupied by agriculture, with...	0.268145
0	Lagoa	Non-irrigated arable land	0.169704
...	...	...	...
5	Vila Franca do Campo	Sea and ocean	0.01868
5	Vila Franca do Campo	Pastures	0.154672
5	Vila Franca do Campo	Land principally occupied by agriculture, with...	0.359993
5	Vila Franca do Campo	Moors and heathland	0.149568
5	Vila Franca do Campo	Coniferous forest	0.020959

87 rows × 3 columns

Condensed output with `frac_as_map`

If we are working with JSON output, it is also possible to view the same information in a map format:

[6]:

exact_extract(landcov, concelhos, 'frac', include_cols='name', output='geojson', output_options={'frac_as_map': True})[:2]

[6]:

[{'type': 'Feature',
  'properties': {'frac': {9: 0.0063711393558363635,
    2: 0.06296995715158163,
    27: 0.06685635659192689,
    21: 0.2681454914912148,
    12: 0.16970419227699762,
    7: 0.006594688105163955,
    20: 0.15615313755220891,
    44: 0.03141814277570226,
    18: 0.1543907769765085,
    23: 0.0638116264431592,
    3: 0.013584491279699783},
   'name': 'Lagoa'}},
 {'type': 'Feature',
  'properties': {'frac': {26: 0.011951440831186277,
    29: 0.03741980964021535,
    24: 0.24256525747655305,
    21: 0.14168078769862463,
    2: 0.011527241099523834,
    20: 0.07714274446896201,
    23: 0.08590323108419894,
    25: 0.008761123929946161,
    12: 0.02967795033103945,
    27: 0.1277189861975002,
    18: 0.2246235200631785,
    44: 0.0010279071790714376},
   'name': 'Nordeste'}}]

Summarizing population land cover

One extension of the analysis above is to see which land covers are associated with human population in a given concelho. Is the population primary urban or rural?

As described in the basic usage example, the population density raster provides the most robust results in the presence of partially-covered pixels.

We are able to perform this analysis because the CORINE sample distributed with exactextractr has been reprojected from its native Lambert Equal Area projection into geographic coordinates consistent with GPW. Otherwise, working with multiple rasters in different projections requires transformation to a common grid using tools such as GDAL.

Very little about the call to exact_extract requires changing to incorporate population. We swap weighted_frac for frac and set weights = pop_density:

[7]:

pop_density = 'sao_miguel/gpw_v411_2020_density_2020.tif'

df = exact_extract(landcov, concelhos, ["unique", "weighted_frac"], weights=pop_density, include_cols='name', output='pandas')
df = df.explode(['unique', 'weighted_frac']).astype({'weighted_frac':'float64'})
df['unique'] = df['unique'].map(classes['LABEL3'])
df.sort_values('weighted_frac', ascending=False).drop_duplicates('name')

[7]:

	name	unique	weighted_frac
0	Lagoa	Discontinuous urban fabric	0.414716
2	Ponta Delgada	Discontinuous urban fabric	0.381365
5	Vila Franca do Campo	Discontinuous urban fabric	0.377407
4	Ribeira Grande	Discontinuous urban fabric	0.373551
1	Nordeste	Complex cultivation patterns	0.282158
3	Povoação	Complex cultivation patterns	0.261401

Looking at the highest-population land cover type in each concelho, we can can see that the western/central concelhos of Lagoa, Ponta Delgada, Ribeira Grande, and Vila Franca do Campo have a more urban population than Nordeste or Povoação to the east.

Working with categorical data: land cover

Summarizing land cover classifications

Calculating the fraction of each land cover type

Condensed output with frac_as_map

Summarizing population land cover

Condensed output with `frac_as_map`