readme.md
+71 -5 (71 additions & 5 deletions)
@@ -1,4 +1,4 @@
1
-
## **Data4Land** tool - enriching land-use/land-cover with historical vector data
1
+
## **Data4Land** tool - enriching land-use/land-cover with historical vector(s) data
2
2
3
3
The Data4Land tool enriches various land-use/land-cover (LULC) spatial datasets with data from other sources to increase the consistency and reliability of LULC datasets for multiple purposes. This software currently includes four separate Jupyter Notebooks:
4
4
@@ -18,12 +18,12 @@ Rectification of commonly produced land-use/land-cover (LULC) raster data with a
18
18
19
19
Calculates 'landscape impedance' datasets based on user-defined biodiversity stressors. This block is useful for researchers to proceed with habitat connectivity studies.
20
20
21
-
Detailed documentation on each nested workflow is given at the beginnings of Jupyter Notebooks and includes descriptions of all input and output datasets.
21
+
Detailed documentation on each nested workflow is given at the beginning of each Jupyter Notebook and includes descriptions of all input and output datasets; a general user guide is given below.
22
22
The sample input dataset is extracted from [ESA Sentinel-2](https://collections.sentinel-hub.com/impact-observatory-lulc-map/) remote sensing spatial datasets and is located [here](src/data/input/).
23
23
All configuration to execute Jupyter Notebooks is pre-defined for the sample dataset in the [configuration YAML file](src/config/config.yaml): paths, filenames, API parameters, user-defined numerical parameters.
24
24
Data flow within the Data4Land tool can also be explored in the overarching diagram.
25
25
26
-
### Installation
26
+
### INSTALLATION
27
27
It is recommended to execute each Jupyter Notebook inside a built Docker container to ensure that all required dependencies are installed correctly. Otherwise, some of the libraries used by Data4Land may cause issues. There are multiple ways to configure and install Docker:
@@ -42,10 +42,76 @@ If you are experiencing issues with building Docker image, try to replace in `Do
42
42
43
43
Now, everything is prepared to execute the Jupyter Notebooks within a Docker environment.
44
44
45
-
#### Impact
45
+
### **USER GUIDE**
46
+
47
+
Aside from library imports and configuration file loading, each Notebook contains a few Python classes created to modularise the code. Don't worry: a detailed description of what each class does is provided before the corresponding code cell in the Notebooks. \
48
+
Each Notebook also includes timing code to measure performance.
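
The timing code itself is not reproduced in this README. As a minimal sketch only, a Notebook cell could be timed with a small context manager from the standard library; the `timed` helper and the `timings` dictionary are hypothetical names, and the Notebooks may measure performance differently:

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Measure the wall-clock time of the enclosed block (hypothetical helper)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed  # keep the value for later comparison
        print(f"{label}: {elapsed:.3f} s")


# Stand-in workload; in a Notebook this would be a processing step.
timings = {}
with timed("dummy workload", timings):
    total = sum(i * i for i in range(100_000))
```

Collecting the elapsed times in a dictionary makes it easy to compare steps across runs with different input extents.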
49
+
50
+
As a rule of thumb, it is recommended to use input land-use/land-cover (LULC) datasets of reasonable extents, covering areas similar to Catalonia, Spain or Northern England (69 925 km<sup>2</sup> and 20 650 km<sup>2</sup>, respectively) to avoid issues related to API throttling.
51
+
52
+
**ATTENTION: before running the Notebook, make sure that:**
53
+
1. You have at least one land-use/land-cover dataset in GeoTIFF format [here](data/input/lulc), whose filename ends with a year, for example `lulc_esa_2017.tif`. You have also specified the LULC dataset filename in the [configuration file](config/config.yaml) under the `lulc` key.
54
+
55
+
56
+
2. You have a corresponding landscape impedance/resistance dataset in GeoTIFF format [here](data/input/impedance), whose filename ends with a year, for example `impedance_lulc_esa_2017.tif`. You have also entered the filename of the impedance dataset in the [configuration file](config/config.yaml) under the `impedance_tif` key.
57
+
58
+
59
+
3. You also have a table [here](data/input/impedance) that maps all LULC categories from the 1st input to the values from the 2nd input. Do not rename its columns, as this might break some parts of the Notebooks. \
- `lulc` is the LULC category code from the input LULC raster file.
66
+
- `impedance` defines the value of landscape impedance for this LULC category, derived from expert knowledge or taken by users from the literature.
67
+
- `type` is what this LULC category describes.
68
+
- `edge_effect` is a boolean parameter: if 1 (`True`), this LULC category is considered a 'biodiversity stressor' in the [4th Notebook](src/4_impedance.ipynb). If 0 (`False`), it is not considered a stressor and does not update the initial impedance dataset.
69
+
- `vector_refine` is a boolean parameter: if 1 (`True`), this LULC category is enriched with vector data from OpenStreetMap. If 0 (`False`), it is simply kept as in the input LULC raster file, unless overwritten by overlapping features from the fetched data.
70
+
71
+
You have also entered the filename of the impedance dataset in the [configuration file](config/config.yaml) under the `impedance` key.
72
+
73
+
74
+
4. You have defined which LULC categories the roads, railways and water features from the vector data correspond to. For example, in the sample input dataset roads and railways are described by LULC=7, and water features by LULC=1. \
75
+
Check `lulc_codes` in the [configuration file](config/config.yaml).
76
+
77
+
78
+
5. You have received a Protected Planet API token and entered it in the [configuration file](config/config.yaml) under the `token` key. You will only need it to run the [First Notebook](src/1_pas.ipynb).
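
Some of the checks in the list above can be automated before running a Notebook. The sketch below is illustrative only: the helper names `year_from_filename` and `read_mapping_table` are hypothetical, the table is assumed to be comma-delimited, and the sample rows use made-up impedance values rather than those from the sample dataset.

```python
import csv
import io
import re

# Column names listed in the checklist above; they must not be renamed.
REQUIRED_COLUMNS = {"lulc", "impedance", "type", "edge_effect", "vector_refine"}


def year_from_filename(filename):
    """Return the four-digit year that ends the file stem, or None."""
    match = re.search(r"(\d{4})\.tif$", filename)
    return int(match.group(1)) if match else None


def read_mapping_table(handle):
    """Read the LULC/impedance table and check that no column is missing."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


# Illustrative rows only (impedance values are invented for this sketch).
sample_csv = io.StringIO(
    "lulc,impedance,type,edge_effect,vector_refine\n"
    "1,100,water,0,1\n"
    "7,1000,built area,1,1\n"
)
rows = read_mapping_table(sample_csv)

# LULC codes flagged as biodiversity stressors (edge_effect == 1).
stressors = [int(row["lulc"]) for row in rows if row["edge_effect"] == "1"]
print(year_from_filename("lulc_esa_2017.tif"), stressors)
```

Failing early with a clear message about a missing column is cheaper than discovering a renamed column halfway through a long raster-processing run.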
79
+
80
+
81
+
**NOTEBOOK DESCRIPTIONS**
82
+
83
+
1. [**First Notebook**](src/1_pas.ipynb) enriches initial land-use/land-cover (LULC) data with auxiliary data on protected areas (PA) from [the World Database on Protected Areas (WDPA)](https://www.protectedplanet.net/en/thematic-areas/wdpa). \
84
+
Protected areas may significantly reduce the resistance that LULC categories pose to species migrating between habitats. Therefore, landscapes intersected by PAs should be treated differently from those without protected status. \
85
+
This workflow describes the enrichment of LULC data with protected areas. It provides two main outputs:
86
+
- LULC data enriched with protected areas (recorded as updated LULC value) for wide usage.
87
+
- Impedance and affinity values for habitat connectivity calculations, used to compute follow-up indicators in specific software (for example, MiraMon and Graphab).
88
+
89
+
**Attention!** You first need to obtain personal credentials for the Protected Planet API. Access to the API is not granted automatically; requests are reviewed by the Protected Planet team.
90
+
91
+
2. [**Second Notebook**](src/2_vector.ipynb) extracts and cleans up data from the OpenStreetMap (OSM) database at various timestamps. \
92
+
This block uses the [Overpass Turbo API](https://wiki.openstreetmap.org/wiki/Overpass_API) to fetch specific types of OSM features, including human-built infrastructure (roads and railways) and mostly natural features (inland waters: waterways and water bodies). Compared to v.1.0.0, bridges and tunnels are now excluded from the Overpass Turbo API queries, as they do not act as ecological barriers between habitats.
93
+
94
+
This block has multiple limitations, which can be explored in the Notebook, but the most important one concerns the timeline: OSM data are available through the Overpass Turbo API only from the early 2010s onwards, and early snapshots may lack a significant number of features compared to the current state.
95
+
96
+
This block provides the user with OSM vector data for each year specified in the configuration file.
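
To illustrate the kind of request described above, here is a minimal sketch that only builds an Overpass QL query string for a historical snapshot, without sending it. The function name `build_overpass_query`, the bounding box, the year-end timestamp, the `timeout` and `maxsize` values, and the exact way bridges and tunnels are filtered out are all assumptions for illustration; the actual queries are defined in the Notebook.

```python
def build_overpass_query(key, bbox, year, timeout=900, maxsize=1_073_741_824):
    """Build an Overpass QL query for ways tagged with `key` (hypothetical helper).

    bbox is (south, west, north, east) in decimal degrees.
    """
    south, west, north, east = bbox
    # Global settings: output format, time limit, memory limit, snapshot date.
    settings = (
        f"[out:json][timeout:{timeout}][maxsize:{maxsize}]"
        f'[date:"{year}-12-31T23:59:59Z"];'
    )
    # Ways with the given key, skipping any way that carries a bridge or
    # tunnel tag (assumed filter; the Notebook may use a different one).
    body = f'way["{key}"][!"bridge"][!"tunnel"]({south},{west},{north},{east});'
    # Recurse down to the nodes of the selected ways so geometries can be built.
    return f"{settings}\n{body}\n(._;>;);\nout body;"


# Hypothetical bounding box and year, for illustration only.
query = build_overpass_query("highway", (41.0, 1.5, 41.5, 2.5), 2017)
print(query)
```

Setting `maxsize` explicitly matters for large extents, because the server rejects queries that would exceed the memory limit.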
97
+
98
+
3. [**Third Notebook**](src/3_enrichment.ipynb)
99
+
100
+
This tool enriches input land-use/land-cover (LULC) raster data with data from OpenStreetMap.
101
+
102
+
Currently, this workflow has been successfully applied to enrich [MUCSC maps of Catalonia, Spain](https://www.mcsc.creaf.cat/index_usa.htm), [LCM (Land Cover Maps) by UKCEH (UK Centre for Ecology and Hydrology)](https://www.ceh.ac.uk/data/ukceh-land-cover-maps) and [Sentinel LULC](https://collections.sentinel-hub.com/impact-observatory-lulc-map/), with spatial resolutions of 30, 25 and 10 m, respectively. A sample of the last dataset is provided along with the Data4Land tool.
2. Vector data (GPKG) to enrich and refine LULC data (currently, roads, railways, water bodies and waterways are processed) derived either from OSM or user-specified data. ***MANDATORY***
107
+
3. Tabular (CSV) data mapping LULC types to their specifications: (1) whether a particular LULC type should be refined by vector data or not (***MANDATORY***) and (2) whether the negative "edge effect" of a particular LULC type should be considered, for instance, roads affect the suitability of adjacent habitats (***OPTIONAL***). This reclassification table is used in the [first block](1_pas.ipynb) of the Data4Land tool.
108
+
109
+
4. [**Fourth Notebook**](src/4_impedance.ipynb)
110
+
111
+
#### IMPACT
46
112
An example of follow-up habitat connectivity calculations based on non-enriched and enriched LULC datasets is given [here](stats/) to illustrate the significant impact of enriched raster pixels.
47
113
48
-
#### Further development
114
+
#### FURTHER DEVELOPMENT
49
115
This tool is mostly complete, but a few improvements are planned:
50
116
51
117
- **Design GUI tool (command-line interface) as an addition to separate Notebooks/scripts.**
src/2_vector.ipynb
+1 -1 (1 addition & 1 deletion)
@@ -120,7 +120,7 @@
120
120
"id": "2e06e0ce-55a9-4eab-8d9b-7d703b90e557",
121
121
"metadata": {},
122
122
"source": [
123
-
"Let's define Overpass query for roads. Queries below extract ways and correspoding nodes automatically as it is crucial to record geometries of spatial features. It is also important to define a maximum size of memory consumption, otherwise issue of memory runout might arise for big queries.\n",
123
+
"Let's define Overpass query for roads. Queries below extract ways and corresponding nodes automatically as it is crucial to record geometries of spatial features. It is also important to define a maximum size of memory consumption, otherwise issue of memory runout might arise for big queries.\n",
124
124
"\n",
125
125
"Four main categories of OSM features are fetched:\n",
src/3_enrichment.ipynb
+2 -2 (2 additions & 2 deletions)
@@ -27,9 +27,9 @@
27
27
"\n",
28
28
"#### Input data\n",
29
29
"\n",
30
-
"Firstly, it is vital to define input data, file names and paths to them. This block also defines Open Street Map (OSM) data or user-specified vector data to refine raster data.\n",
30
+
"It is crucial to define input data, file names and paths to them. This block also defines Open Street Map (OSM) data or user-specified vector data to refine raster data.\n",
31
31
"The following types of input data are exploited:\n",
32
-
"1. Raster land-use/land-cover (LULC) data, GeoTIFF format. Cloud Optimised GeoTiff (COG) is preferable (COG with LZW compression is used to optimise storaging data). ***MANDATORY***\n",
"2. Vector data (GPKG) to enrich and refine LULC data (currently, roads, railways, water bodies and waterways are processed) derived either from OSM or user-specified data. ***MANDATORY***\n",
34
34
"3. Tabular (CSV) data mapping LULC types to their specifications: (1) whether concrete LULC type should be refined by vector data or not (***MANDATORY***) and (2) whether negative \"edge effect\" of concrete LULC type should be considered, for instance, roads affect suitability of habitats alongside roads (***OPTIONAL***). This reclassification table is being used in the [first block](1_pas.ipynb) of the Data4Land tool."