Free & OpenSource Software Documentation for BigGeoData Processing
Welcome to Spatial Ecology’s documentation! The content of this documentation is free and open source (CC-BY-SA license): it can be used, but WITHOUT ANY WARRANTY. You can remix, tweak, and build upon our work as long as you credit us and license your new creations under identical terms. The software we use is released under the GNU General Public License (GPL) or GPL/MIT-compatible licenses.
Table of Contents
- Python environments or how to survive your journey in the geodata space
- Introduction to Python
- Python & GeoComputation
- RasterIO for dummies: a brief intro to a pythonic raster library
- 2. Preparing the dataset for next ML exercises via rasterio
- 3. Beyond the basics
- Generalities about OGC Geospatial extensions for SQL
- SDM1 : Montane woodcreeper - Geocomputation
- SDM1 : Montane woodcreeper - Model
- SDM2 : Varied Thrush - Model
- Manipulate GSIM files
- Data type in GTiff
- Temporal interpolation of Landsat images
- Dynamic Time Warping
- Estimating nitrogen and phosphorus concentrations in streams and rivers
- Estimating nitrogen concentrations in streams and rivers using NN
- Autoencoder (AE), Variational Autoencoder (VAE) and Generative Adversarial Network (GAN)
- LSTM Network
- Estimation of tree height using GEDI dataset - Data explore
- Estimation of tree height using GEDI dataset - Predictors extraction at point location
- Estimation of tree height using GEDI dataset - Random Forest prediction
- Estimation of tree height using GEDI dataset - Support Vector Machine for Regression (SVR) - 2022
- Estimation of tree height using GEDI dataset - Support Vector Machine for Regression (SVR) - 2023
- Estimation of tree height using GEDI dataset - Support Vector Machine for Regression (SVR) - 2024
- Exercise: explore the other parameters offered by the SVM library and try to make the model better. Some suggestions:
- Estimation of tree height using GEDI dataset - Perceptron 1 - 2022
- Estimation of tree height using GEDI dataset - Perceptron 1 - 2023
- Estimation of tree height using GEDI dataset - Perceptron - 2024
- Estimation of tree height using GEDI dataset - Perceptron tree prediction - 2023
- Estimation of tree height using GEDI dataset - Perceptron complete - 2024
- Estimation of tree height using GEDI dataset - Clean Data - Perceptron 2 - 2022
- Estimation of tree height using GEDI dataset - Neural Network 1
- Estimation of tree height using GEDI dataset - Neural Network 1 - 2024
- Neural Nets (pt.3), Interpretability and Convolutional Neural Networks
- Using Multi-layer Perceptron and Convolutional Neural Networks for Satellite image classification - 2022
- Using Multi-layer Perceptron and Convolutional Neural Networks for Satellite image classification - 2023
- Using CNNs for an image dataset
- Prithvi 100M model
- Proposed exercises
- Autoencoder (AE), Variational Autoencoder (VAE)
- Implementing an Autoencoder
- Autoencoding MNIST
- Section 2
- Section 3 - Generative Models
- Using LSTM for time-series predictions
- Using GPT to implement a Convolutional Neural Network for Satellite image classification
- Classification in Python using pyjeo and sklearn
- Google Earth Engine use via Python, containers and other mythical beasts
- 1. 2021 SWEDEN
- 1.1. Calculating landcover distribution & vegetation extraction
- 1.2. Compiling OTB from source
- 1.3. Observed and simulated internal variability climate feedbacks comparison.
- 1.3.1. 1. Project description
- 1.3.2. 2. Data set and Methods
- 1.3.3. 2.1 Data
- 1.3.4. 2.1.1 Observations
- 1.3.5. 2.1.2 Simulations
- 1.3.6. 2.2 Methods
- 1.3.7. 2.2.1 Preprocessing
- 1.3.8. Bash script to preprocess observations (detrend and deseasonalize)
- 1.3.9. Bash script to preprocess simulations CMIP6 historical (detrend and deseasonalize)
- 1.3.10. Bash script to preprocess simulations AMIP (detrend and deseasonalize)
- 1.3.11. Bash script to preprocess simulations CMIP6 piControl and Abrupt
- 1.3.12. 2.2.2 Feedbacks
- 1.3.13. 3. Results
- 1.3.14. 3.1 Observed feedbacks
- 1.3.15. 3.2 Simulated feedbacks
- 1.4. Statistical comparison global gridded climate datasets and their influence on LPJ-GUESS model outputs
- 1.5. Emulating FLEXPART with a Multi-Layer Perceptron
- 1.5.1. Carlos Gómez-Ortiz
- 1.5.2. Department of Physical Geography and Ecosystem Science
- 1.5.3. Lund University
- 1.5.4. Inverse modeling is a commonly used, formal approach to estimating the variables driving the evolution of a system, e.g. greenhouse gas (GHG) sources and sinks, based on the observable manifestations of that system, e.g. GHG concentrations in the atmosphere. It has been developed and applied for decades and covers a wide range of techniques and mathematical approaches as well as topics in the field of biogeochemistry. It involves multiple models, such as a CTM for generating background concentrations, a Lagrangian transport model to generate regional concentrations, and multiple flux models to generate prior emissions. All these models require considerable computational time, in addition to the computational time of the inversion itself. Replacing one of these steps with a tool that emulates its functioning at a lower computational cost could facilitate testing and benchmarking tasks. LUMIA (Lund University Modular Inversion Algorithm) (Monteil & Scholze, 2019) is a variational atmospheric inverse modeling system developed within the regional European atmospheric transport inversion comparison (EUROCOM) project (Monteil et al., 2020) for optimizing terrestrial surface CO2 fluxes over Europe using ICOS in-situ observations. In this Jupyter Notebook, I will apply an AI tool to emulate the Lagrangian model FLEXPART to simulate the regional concentrations at one of the stations within the EUROCOM project.
- 1.5.4.1. Import packages
- 1.5.4.2. Download data
- 1.5.4.3. Retrieve data from swestore
- 1.5.4.4. Plot fluxes and observations
- 1.5.4.4.1. Read the fluxes
- 1.5.4.4.2. Read the observations
- 1.5.4.4.3. Determine the spatial and temporal coordinates
- 1.5.4.4.4. Plot the observation sites
- 1.5.4.4.5. Compute monthly and daily fluxes in PgC
- 1.5.4.4.6. Plot daily fluxes aggregated over the domain
- 1.5.4.4.7. Plot the fit to observations
- 1.5.4.5. Modeling observations
- 1.5.4.5.1. Time-series
- 1.5.4.5.2. Preparing input data
- 1.5.4.5.3. Importing additional packages
- 1.5.4.5.4. MLP model
- 1.5.4.5.5. Function to calculate results
- 1.5.4.5.6. Generating training and validation datasets
- 1.5.4.5.7. Training the MLP
- 1.5.4.5.8. Plotting results
- 1.5.4.5.9. Training MLPs for all sites
- 1.5.4.5.10. Plotting results for all sites
- 1.5.4.5.11. Conclusions
- 1.5.4.5.12. References
- 1.5.4.5.13. LSTM model
- 1.6. Processing Elmer/Ice output
- 1.7. pan-Arctic classified slope and aspect maps (Geo computation only)
- 1.8. Seasonal Analysis of discharges in the Mälaren catchment
- 1.9. Mapping of soil organic carbon stocks with Random Forest
- 1.10. NDVI Computation
- 1.11. Phase Change Analysis
- 1.12. Relationship between continental-scale patterns of fire activity and modes of climate variability
- 2. 2022 MATERA
- 2.1. Janusz Godziek: Damaged vs undamaged trees - Random Forest classification
- 2.2. Alonso Gonzalez: Stream Network Abstraction
- 2.3. Sebastian Walter: Images shadow removal
- 2.4. Jaime García: Modelling freshwater biodiversity: setting the scene, from geo-data to text-data
- 2.4.1. First step: environmental information
- 2.4.2. Data preparation
- 2.4.3. Sub-catchments as unit of analysis for freshwater biodiversity analysis
- 2.4.4. Computational Units as units of data processing
- 2.4.5. Scripting procedures
- 2.4.5.1. Observations
- 2.4.5.2. Snapping observations to stream network
- 2.4.5.3. Calculating upstream basins from observation points
- 2.4.5.4. gdallocationinfo: pixel value extraction
- 2.4.5.5. Stream order tables from vector .gpkg
- 2.4.5.6. r.univar for zonal statistics
- 2.4.5.7. Land cover
- 2.4.5.8. Deriving new data
- 2.4.5.9. The final set of tables
- 2.5. Maria Üblacker: Spectral clustering of freshwater habitats
- 2.5.1. Project description
- 2.5.2. Setup R Markdown Python Engine
- 2.5.3. R and Python packages
- 2.5.4. Study region
- 2.5.5. Example code for filtering data from text files by sub-catchment ID
- 2.5.6. Load the data
- 2.5.7. Principal Component analysis (PCA)
- 2.5.8. Evaluation of the best number of clusters
- 2.5.9. Spectral clustering
- 2.5.10. Predict clusters for the whole dataset
- 2.5.11. Exporting sub-catchment IDs and k for reclassification
- 2.5.12. Example code for reclassifying a raster file
- 2.5.13. Spatial cluster plots
- 2.5.14. Conclusion
- 2.5.15. Reference
- 2.6. Afroditi Grigoropoulou: Species Distribution Model with Random Forest
- 2.6.1. Taxon: genus Prebaetodes (Mayfly), occurring in a drainage basin in Colombia
- 2.6.2. Goal: Predict which subcatchments provide suitable habitats for this genus
- 2.6.3. Classify the subcatchments of the basin as 1 or 0, based on whether their habitat is suitable or not
- 2.6.4. Random forest classification
- 2.6.5. 1. Extract predictors for each subcatchment of this drainage basin
- 2.6.6. Computational unit
- 2.6.7. Basin
- 2.6.8. Subcatchments
- 2.6.9. Extract BIOCLIM data
- 2.6.10. Extract land cover mean proportion per subcatchment
- 2.6.11. Extract mean and sd of elevation, flow accumulation and slope gradient per subcatchment
- 2.6.12. Join all predictors in R
- 2.6.13. 2. Extract the predictors for the subcatchments where the genus occurs, plus for 10,000 random subcatchments that will serve as pseudoabsence data
- 2.6.14. For the prediction, we need the subcatchment raster to be cropped to the extent of the basin
- 2.6.15. 3. Run SDM with random forest
- 2.6.16. Predict with fitted model
- 2.6.17. Reclassify subcatchment raster based on predicted values
- 2.6.18. Predicted habitat suitability
- 2.7. Gidske L. Andersen: Topographic and hydrological influence on vegetation in an arid environment
- 2.8. Yusdiel Torres-Cambas: Distribution of freshwater biodiversity across Cuba
- 2.8.1. Introduction
- 2.8.2. 1. Predictor variables
- 2.8.3. 2. Stream network and sub-basins
- 2.8.3.1. 2.1. Make a GRASS GIS database
- 2.8.3.2. 2.2. Extract flow direction, flow accumulation, stream network, basins and sub-basins
- 2.8.3.3. 2.3. Aggregate predictors by sub-basin
- 2.8.3.4. 3.2. Make maps with presence and pseudoabsences
- 2.8.3.5. 3.3. Make inputs required to create an SSN object, a kind of R object necessary to fit Spatial Linear Models for Stream Networks (Hoef et al. 2014, Peterson et al. 2020).
- 2.8.4. 5. Species distribution modelling
- 2.9. Txomin Bornaetxea: Modeling debris flow source areas
- 2.10. Ritwika Mukhopadhyay: Comparative Analysis of the prediction of AGB using Random Forest Regression, Support Vector Machine for Regression & FeedForward Neural Network
- 2.11. Hyeyoung Sim: Clustering Electric Vehicle Charging Pattern with DTW and EV Charging energy demand Prediction with LSTM
- 2.12. Hemalatha Velappan: Classification of different tree species plantations using deep learning
- 2.12.1. The goal of this work is to develop a model to identify planted forests and the tree species growing there. The model is developed using the
- 2.12.1.1. (1) known locations of planted forests based on literature and personal communications,
- 2.12.1.2. (2) image analysis and feature extraction of planted trees
- 2.12.1.3. (3) spectral signatures unique to each species
- 2.12.1.3.1. Performing zonal statistics on the polygon shapefile with respect to the Sentinel satellite image
- 2.12.1.3.2. The following are the tree species and the corresponding label numbers
- 2.12.1.3.2.1. Acrocarpus fraxinifolius 1
- 2.12.1.3.2.2. Calycophyllum spruceanum 2
- 2.12.1.3.2.3. Cedrela Mixed 3
- 2.12.1.3.2.4. Guazuma crinita 4
- 2.12.1.3.2.5. Miconia barbeyana 5
- 2.12.1.3.2.6. Ochroma pyramidale 6
- 2.12.1.3.2.7. Other Mixed 7
- 2.12.1.3.2.8. Swietenia Cedrela Mixed 8
- 2.12.1.3.2.9. Swietenia macrophylla 9
- 2.12.1.3.2.10. Swietenia Mixed 10
- 2.12.1.3.3. The input and output variables are split between training and testing by 70:30
- 2.12.1.4. All the 14 columns are X variables. The final column that has categorical numbers is the target or Y variable
- 2.12.2. Results:
- 2.13. Myriam Marending: Ships and economic activity: a starter
- 2.14. Florian Ellsäßer: Using an LSTM network and SHAP to determine the impact of drought and season on winter wheat