The project was developed by CIN Researcher Sahba El-Shaw.
Project Summary
- Developed the Environmental Feature Clusters (EFC) framework, which integrates geospatial data, remote sensing environmental indices, and AI-driven clustering techniques for eco-geographical delineation.
- Leveraged Earth Observation (EO) data from Copernicus Sentinel-2 satellites to analyze ecosystem variability across diverse ecoregions in the Eastern Mediterranean Coastal Region of the Middle East.
- Implemented unsupervised clustering algorithms to identify spatial patterns in environmental variables such as vegetation, water, and land health.
- Evaluated EFCs against existing ecoregion datasets to assess model accuracy and ecological correlations.
- Future work includes incorporating additional time-series analyses, refining clustering parameter optimization strategies, and expanding the framework’s applicability for biodiversity conservation, climate impact studies, and sustainable land management.
Development Tools
- Geospatial, Remote Sensing & Machine Learning Libraries:
  - GDAL, Rasterio, GeoPandas for geospatial data manipulation
  - PyTorch, scikit-learn, TensorFlow for machine learning and clustering
- EO Sources:
  - Copernicus Sentinel-2 data
- Computational Infrastructure:
  - ESA Φ-lab Cloud Computing resources, CREODIAS
  - GPU acceleration for ML model training and clustering
- Visualization:
  - Matplotlib and Shapely for cluster exploration
Development Outputs
- Validation reports comparing EFC-generated clusters with global ecoregion datasets
- Upcoming Developments:
  - Releasing code and datasets following publication
  - Expansion of the framework to include time-series clustering for detecting ecosystem changes
  - Refinement of feature selection techniques using explainable AI (XAI) methods
  - Development of an open-access web-based visualization tool for broader scientific use
List of publications developed during this collaboration:
- Living Planet Symposium abstract submitted
- Inclusion of study in PhD in Sustainable Development and Climate Change thesis
Project Description
This project introduces Environmental Feature Clusters (EFCs) as an alternative to traditional ecoregion classifications. Ecoregions are areas with similar ecosystems, vegetation types, and biodiversity. Unlike predefined boundaries based on expert delineation, EFCs are generated using unsupervised clustering of Sentinel-2 spectral indices, allowing for a dynamic, objective, and scalable approach to ecoregion mapping.
The Environmental Feature Clusters (EFC) framework offers a data-driven approach to boundary definition, making it neutral and objective, particularly in geopolitically sensitive regions where territorial classifications can be contentious. Its scalability and reproducibility allow for application across different geographic regions, facilitating global and regional pattern analysis while maintaining methodological consistency. Additionally, EFCs can complement established ecoregions by identifying previously unrecognized environmental patterns, providing deeper ecological context and enhancing environmental monitoring efforts.
The collaboration with ESA Φ-lab focused on developing, refining, and validating this methodology, leveraging GPU-accelerated computation on the CREODIAS platform to process large-scale EO data efficiently.
Focus Region
The Eastern Mediterranean Coastal Region in the Middle East was chosen for this analysis due to its significance in studying sustainability and climate dynamics, as it is highly sensitive to both environmental and anthropogenic pressures. The region is projected to experience accelerated temperature increases, with annual warming rates surpassing global averages by 20% and summer warming rates exceeding them by 50%. Additionally, its diverse ecosystems, spanning from coastal wetlands to semi-arid landscapes, face heightened vulnerability to climate change impacts, making it a crucial area for environmental monitoring and adaptation strategies.
Selected Baseline for Comparison
The study initially considered World Wide Fund for Nature (WWF) ecoregions, One Earth ecoregions, and FAO agro-ecological zones as potential baselines. However, One Earth ecoregions were ultimately selected for two key reasons. First, they offer a more comprehensive ecological framework, refining traditional WWF ecoregions by integrating new ecological data. Second, they provide higher-resolution classifications, making them better suited for AI-driven clustering comparisons. The goal is to evaluate whether AI-based unsupervised clustering can produce similar or more specific ecological divisions, helping to validate this data-driven approach.
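One way such a comparison could be scored is with partition-agreement metrics computed pixel by pixel. The sketch below is illustrative only: the choice of adjusted Rand index (ARI) and normalized mutual information (NMI), the function name `compare_partitions`, the `nodata` value of -1, and the toy rasters are all assumptions, not the metrics or data used in the study. It also assumes both maps have already been rasterized onto the same grid.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score


def compare_partitions(cluster_map, ecoregion_map, nodata=-1):
    """Score agreement between two label rasters on their shared valid pixels."""
    cluster_map = np.asarray(cluster_map)
    ecoregion_map = np.asarray(ecoregion_map)
    # Only compare pixels that carry a label in both maps.
    valid = (cluster_map != nodata) & (ecoregion_map != nodata)
    reference = ecoregion_map[valid]
    predicted = cluster_map[valid]
    return {
        "ari": adjusted_rand_score(reference, predicted),
        "nmi": normalized_mutual_info_score(reference, predicted),
        "n_pixels": int(valid.sum()),
    }


# Toy rasters (hypothetical): the cluster map is a relabelled copy of the
# ecoregion map, with one nodata pixel in the bottom-right corner.
ecoregions = np.array([[0, 0, 1], [0, 1, 1], [2, 2, -1]])
clusters = np.array([[5, 5, 3], [5, 3, 3], [7, 7, -1]])
scores = compare_partitions(clusters, ecoregions)
```

Both metrics depend only on how pixels are grouped, not on the label values themselves, so arbitrary cluster IDs do not affect the score. They measure similarity of the two partitions and would not by themselves show whether a finer split is ecologically meaningful.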
Methodology
The approach followed a structured methodology consisting of three key steps: data acquisition and preprocessing, unsupervised spatial clustering, and analysis.
- Data Acquisition
First, Sentinel-2 imagery was acquired to cover the selected region and time points.
- Data Preprocessing
To ensure data quality, preprocessing involved selecting the images with the lowest available cloud cover and full tile coverage. Environmental indices were then computed from the Sentinel-2 spectral bands of the selected images.
- Unsupervised spatial clustering
Next, for unsupervised spatial clustering, Principal Component Analysis (PCA) was applied to reduce the dimensionality of the dataset from nine indices to three principal components while retaining spatial coordinates. These features were then processed using a ResNet model, followed by k-means clustering to group the landscape based on similar environmental characteristics. The optimal number of clusters was determined using the elbow method, and water bodies were filtered out of the output clusters.
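The steps above can be sketched on synthetic data as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the ResNet feature-processing step is omitted entirely, NDVI and NDWI stand in for two of the nine indices (the source does not list them), the remaining seven layers are random placeholders, and the standardization, the coordinate rescaling, the NDWI > 0 water threshold, and all function names are hypothetical choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def normalized_difference(a, b):
    """Return (a - b) / (a + b), using 0 where the denominator is 0."""
    a = np.asarray(a, dtype="float64")
    b = np.asarray(b, dtype="float64")
    denom = a + b
    out = np.zeros_like(denom)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def build_features(index_stack, n_components=3):
    """Reduce an (n_indices, H, W) stack with PCA and append pixel coordinates."""
    n_indices, height, width = index_stack.shape
    flat = index_stack.reshape(n_indices, -1).T  # (pixels, n_indices)
    scaled = StandardScaler().fit_transform(flat)  # assumption: standardize first
    components = PCA(n_components=n_components).fit_transform(scaled)
    rows, cols = np.mgrid[0:height, 0:width]
    # Assumption: coordinates rescaled to [0, 1] so they do not dominate distances.
    coords = np.column_stack([
        rows.ravel() / max(height - 1, 1),
        cols.ravel() / max(width - 1, 1),
    ])
    return np.hstack([components, coords])


def elbow_inertias(features, k_values, seed=0):
    """Fit k-means for each k and return {k: within-cluster sum of squares}."""
    return {
        k: KMeans(n_clusters=k, n_init=10, random_state=seed).fit(features).inertia_
        for k in k_values
    }


# Synthetic reflectance bands stand in for Sentinel-2 green, red, and NIR.
rng = np.random.default_rng(0)
height, width = 20, 20
green, red, nir = rng.uniform(0.01, 0.5, size=(3, height, width))

ndvi = normalized_difference(nir, red)    # vegetation index
ndwi = normalized_difference(green, nir)  # open-water index
# Seven random placeholder layers bring the stack to nine indices.
placeholders = rng.normal(size=(7, height, width))
index_stack = np.concatenate([ndvi[None], ndwi[None], placeholders])

features = build_features(index_stack)            # shape (400, 5)
inertias = elbow_inertias(features, range(2, 7))  # inspect for the elbow

labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(features)
label_map = labels.reshape(height, width).copy()
label_map[ndwi > 0] = -1  # assumption: NDWI > 0 marks water; -1 = filtered out
```

Plotting `inertias` against k and looking for the point where the decrease flattens is the usual way to read the elbow. On random data like this there is no clear elbow, so the value of 5 is simply taken from the number of clusters reported in the results.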
Preliminary Results
Fig. 1. Initial output of the EFC method, with output clusters (left) and One Earth ecoregions (right) for comparison
Five unique environmental clusters were identified (Fig. 1). Clusters aligned with known biome boundaries (e.g., Mediterranean forests transitioning to semi-arid shrublands), demonstrating their ecological validity. The framework successfully captured ecotones, or transition zones, where ecosystems shift due to climate and geography. Traditional ecoregions do not typically consider ecotones.
On the other hand, divergence between the clusters and predefined ecoregions highlighted potential misclassifications or underrepresented environmental patterns in existing datasets. EFCs also captured fine-scale environmental variations, highlighting heterogeneous landscapes that static ecoregions often overlook. In particular, the clusters distinguished between types of desert landscape (mountainous vs. flatlands), a difference that may influence flood risks, water retention, and land resource management.
Advantages of the EFC Framework
The Environmental Feature Clusters (EFCs) framework offers several key advantages, making it a robust tool for ecological classification and analysis. By relying purely on remote sensing data, EFCs provide a data-driven and objective approach to environmental classification, uncovering hidden patterns without the biases of predefined ecoregions. The method also enables high-resolution spatial mapping, allowing for localized and precise classification that captures small-scale environmental variations.
Additionally, EFCs are highly adaptable, capable of scaling across different geographic regions and spatial scales, making them suitable for both regional and global applications. Unlike static ecoregion classifications, EFCs can be automatically updated with new satellite data, integrating machine learning and AI-driven clustering for real-time environmental monitoring. Lastly, the framework offers flexibility in classification and interpretation, as researchers and policymakers can adjust input indices, clustering methods, and spatial resolutions to align with specific environmental and sustainability goals.
EFCs are intended to complement traditional ecoregion classifications, with each approach better suited to particular applications depending on the objective. Table 1 compares the two across application scenarios.
Table 1. EFCs vs. Ecoregions Application Comparison
Scenario | Ecoregions | EFCs |
Conservation & Biodiversity | Best suited for designing protected areas and ecological reserves. | May not fully capture biodiversity but can detect changes in habitat conditions. |
Ecosystem Monitoring | Less responsive to rapid environmental change. | Tracks real-time shifts in soil, vegetation, and water indices. |
Land Use & Sustainability Planning | Broad ecological insights for general land management. | Identifies regions with shared environmental pressures for targeted interventions. |
Climate Change Adaptation | Predicts long-term ecosystem shifts. | Detects short-term environmental stressors and changing boundaries. |
Data Availability & Objectivity | Requires expert interpretation and historical data. | Fully data-driven, minimizing subjective classification biases. |
Future Development & Limitations
The EFCs framework is continuously evolving, with several key areas identified for improvement and future integration:
- Temporal Integration
  - The current EFCs operate as static classifications, meaning they do not yet account for seasonal shifts or long-term climate trends. Future iterations will incorporate time-series analysis, enabling dynamic tracking of environmental changes over time.
- Cloud Masking
  - The present methodology does not filter out cloud-covered pixels, which can lead to misclassifications and artifacts in the dataset. Improved preprocessing techniques will be implemented to address this limitation, ensuring cleaner and more reliable classifications.
- Ground Truthing
  - While initial results show correlations with known bioregions, further on-the-ground validation is necessary to refine cluster interpretations and ensure the accuracy of the classifications in real-world applications.
- Automation & AI Integration
  - Future development will focus on deep learning-based feature extraction, allowing for automated classification updates as new Earth Observation (EO) data becomes available. This will enhance the framework’s ability to continuously adapt to evolving environmental conditions.
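As an illustration of the cloud-masking item above, one possible approach is to use the Scene Classification Layer (SCL) distributed with Sentinel-2 Level-2A products. This is a sketch under assumptions rather than the planned implementation: the source does not state which product level was used, the function name `mask_clouds` and the choice of classes to exclude are hypothetical, and the SCL raster is assumed to be already resampled to the same grid as the index layer.

```python
import numpy as np

# SCL classes treated as unusable in this sketch:
# 3 = cloud shadow, 8 = cloud medium probability,
# 9 = cloud high probability, 10 = thin cirrus
CLOUD_CLASSES = (3, 8, 9, 10)


def mask_clouds(index_layer, scl, cloud_classes=CLOUD_CLASSES):
    """Return a float copy of index_layer with cloudy pixels set to NaN,
    together with the boolean mask of pixels that were removed."""
    cloudy = np.isin(scl, cloud_classes)
    masked = np.array(index_layer, dtype="float64")  # copies the input
    masked[cloudy] = np.nan
    return masked, cloudy


# Toy 2 x 2 example (hypothetical values): two of the four pixels are cloudy.
ndvi = np.array([[0.2, 0.8], [0.5, 0.1]])
scl = np.array([[4, 9], [3, 5]])
masked, cloudy = mask_clouds(ndvi, scl)
```

Pixels set to NaN would then need to be dropped or filled from another acquisition date before PCA and k-means, since neither accepts missing values.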
Conclusion
The ESA Φ-lab collaboration demonstrated the feasibility of EFCs as a new paradigm for environmental classification. By leveraging machine learning and EO data, this framework can support data-driven ecological research, land use policy, and climate resilience planning. The next phase of development will focus on expanding automation, improving classification accuracy, and integrating multi-temporal datasets to track environmental changes in near real-time.