Environmental Feature Clusters (EFCs) Framework: Unsupervised Clustering of Sentinel-2 Data for Data-Driven Ecoregion Mapping

The project was developed by CIN Researcher Sahba El-Shaw.

Project Summary

Developed an Environmental Feature Clustering framework, integrating geospatial data, remote sensing environmental indices, and AI-driven clustering techniques for eco-geographical delineation.
Leveraged Earth Observation (EO) data from Copernicus Sentinel-2 satellites to analyze ecosystem variability across diverse ecoregions in the Eastern Mediterranean Coastal Region of the Middle East.
Implemented unsupervised clustering algorithms to identify spatial patterns in environmental variables such as vegetation, water, and land health.
Evaluated EFCs against existing ecoregion datasets to assess model accuracy and ecological correlations.
Future work includes incorporating additional time-series analyses, refining clustering parameter optimization strategies, and expanding the framework’s applicability for biodiversity conservation, climate impact studies, and sustainable land management.

Development Tools

Geospatial, Remote Sensing & Machine Learning Libraries:

GDAL, Rasterio, Geopandas for geospatial data manipulation
PyTorch, Scikit-Learn, TensorFlow for machine learning and clustering

EO Sources:

Copernicus Sentinel-2 data

Computational Infrastructure:

ESA Φ-lab Cloud Computing resources, CREODIAS
GPU acceleration for ML model training and clustering

Visualization:

Matplotlib and Shapely for cluster exploration

Development Outputs

Validation reports comparing EFC-generated clusters with global ecoregion datasets

Upcoming Developments:

Releasing code and datasets following publication
Expansion of the framework to include time-series clustering for detecting ecosystem changes
Refinement of feature selection techniques using explainable AI (XAI) methods
Development of an open-access web-based visualization tool for broader scientific use

List of publications developed during this collaboration:

Living Planet Symposium abstract submitted
Inclusion of study in PhD in Sustainable Development and Climate Change thesis

Project Description

This project introduces Environmental Feature Clusters (EFCs) as an alternative to traditional ecoregion classifications. Ecoregions are areas with similar ecosystems, vegetation types, and biodiversity. Unlike predefined boundaries based on expert delineation, EFCs are generated using unsupervised clustering of Sentinel-2 spectral indices, allowing for a dynamic, objective, and scalable approach to ecoregion mapping.

The Environmental Feature Clusters (EFC) framework offers a data-driven approach to boundary definition, making it neutral and objective, particularly in geopolitically sensitive regions where territorial classifications can be contentious. Its scalability and reproducibility allow for application across different geographic regions, facilitating global and regional pattern analysis while maintaining methodological consistency. Additionally, EFCs can complement established ecoregions by identifying previously unrecognized environmental patterns, providing deeper ecological context and enhancing environmental monitoring efforts.

The collaboration with ESA Φ-lab focused on developing, refining, and validating this methodology, leveraging GPU-accelerated computation on the CREODIAS platform to process large-scale EO data efficiently.

Focus Region

The Eastern Mediterranean Coastal Region in the Middle East was chosen for this analysis due to its significance in studying sustainability and climate dynamics, as it is highly sensitive to both environmental and anthropogenic pressures. The region is projected to experience accelerated temperature increases, with annual warming rates surpassing global averages by 20% and summer warming rates exceeding them by 50%. Additionally, its diverse ecosystems, spanning from coastal wetlands to semi-arid landscapes, face heightened vulnerability to climate change impacts, making it a crucial area for environmental monitoring and adaptation strategies.

Selected Baseline for Comparison

The study initially considered World Wide Fund for Nature (WWF) ecoregions, One Earth ecoregions, and FAO agro-ecological zones as potential baselines. However, One Earth ecoregions were ultimately selected for two key reasons. First, they offer a more comprehensive ecological framework, refining traditional WWF ecoregions by integrating new ecological data. Second, they provide higher-resolution classifications, making them better suited for AI-driven clustering comparisons. The goal is to evaluate whether AI-based unsupervised clustering can produce similar or more specific ecological divisions, helping to validate this data-driven approach.
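As an illustration of how such a comparison could be quantified, the sketch below scores agreement between two label maps on a shared grid. The source does not state which metrics were used, so the adjusted Rand index, normalized mutual information, and the contingency table are assumptions, and the label arrays are made-up toy data.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Hypothetical per-pixel labels, flattened from two rasters on the same grid.
ecoregion = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])  # baseline classes
efc = np.array([5, 5, 5, 5, 7, 7, 8, 8, 9, 9, 9, 9])        # cluster ids

# Both scores are invariant to how the cluster ids are numbered.
ari = adjusted_rand_score(ecoregion, efc)
nmi = normalized_mutual_info_score(ecoregion, efc)

# Contingency table: rows are ecoregions, columns are clusters.
eco_ids, eco_idx = np.unique(ecoregion, return_inverse=True)
efc_ids, efc_idx = np.unique(efc, return_inverse=True)
table = np.zeros((eco_ids.size, efc_ids.size), dtype=int)
np.add.at(table, (eco_idx, efc_idx), 1)

print(f"ARI={ari:.2f}  NMI={nmi:.2f}")
print(table)
```

In this toy case the second ecoregion is split across two clusters. That is the "more specific divisions" situation: the overall scores drop below 1, while the contingency table shows that the clusters still nest inside the baseline classes.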

Methodology

The approach followed a structured methodology consisting of three key steps: data acquisition and preprocessing, unsupervised spatial clustering, and analysis.

Data Acquisition & Preprocessing

First, Sentinel-2 imagery was acquired to cover the selected region and time points. To ensure data quality, preprocessing selected the images with the lowest available cloud cover and full tile coverage. Environmental indices were then computed from the Sentinel-2 spectral bands of the selected images.
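The index computation can be sketched as follows. The source does not name the nine indices, so NDVI and NDWI are used here only as familiar examples of normalized-difference indices. The helper name and the toy reflectance arrays are hypothetical.

```python
import numpy as np

def normalized_difference(a, b):
    """Return (a - b) / (a + b), with NaN where the denominator is zero."""
    a = a.astype("float64")
    b = b.astype("float64")
    denom = a + b
    out = np.full(a.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

# Toy 2x2 reflectance patches standing in for Sentinel-2 bands.
green = np.array([[0.10, 0.08], [0.05, 0.00]])  # B03
red = np.array([[0.05, 0.10], [0.04, 0.00]])    # B04
nir = np.array([[0.45, 0.30], [0.02, 0.00]])    # B08

ndvi = normalized_difference(nir, red)    # vegetation: (NIR - Red) / (NIR + Red)
ndwi = normalized_difference(green, nir)  # open water: (Green - NIR) / (Green + NIR)
```

In practice the band arrays would be read from the selected tiles, for example with Rasterio, rather than typed in by hand.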

Unsupervised Spatial Clustering

Next, Principal Component Analysis (PCA) was applied to reduce the dimensionality of the dataset from nine indices to three principal components while retaining spatial coordinates. These features were then processed using a ResNet model, followed by k-means clustering to group the landscape by similar environmental characteristics. The optimal number of clusters was determined using the elbow method, and water bodies were filtered out of the output clusters.
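A minimal sketch of the PCA, k-means, and elbow steps is given below using scikit-learn on synthetic data. It is not the project's implementation: the ResNet feature-extraction step and the water filtering are omitted. The standardization step, the array names, the random data, and the range of k are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: 600 pixels with nine index values, drawn around 3 centres.
centres = rng.normal(scale=3.0, size=(3, 9))
indices = np.vstack([c + rng.normal(scale=0.5, size=(200, 9)) for c in centres])
coords = rng.uniform(0.0, 1.0, size=(600, 2))  # hypothetical normalised x, y

# Standardise the indices (an assumption), then reduce nine indices to three components.
scaled = StandardScaler().fit_transform(indices)
components = PCA(n_components=3).fit_transform(scaled)

# Keep the spatial coordinates alongside the principal components.
features = np.hstack([components, coords])

# Elbow method: record the within-cluster sum of squares (inertia) for each k.
inertias = {}
for k in range(1, 9):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    inertias[k] = model.inertia_

# After inspecting the curve, fit the final model at the chosen k (3 for this toy data).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
```

Plotting `inertias` against k, for instance with Matplotlib, shows where adding clusters stops reducing inertia appreciably. That bend is the value the elbow method selects.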

Preliminary Results

Fig. 1. Initial output of the EFC method, with output clusters (left) and One Earth ecoregions (right) for comparison.

Five unique environmental clusters were identified (Fig. 1). Clusters aligned with known biome boundaries (e.g., Mediterranean forests transitioning to semi-arid shrublands), demonstrating their ecological validity. The framework successfully captured ecotones, or transition zones, where ecosystems shift due to climate and geography. Traditional ecoregions do not typically consider ecotones.

On the other hand, where clusters diverged from predefined ecoregions, the differences highlighted potential misclassifications or underrepresented environmental patterns in existing datasets. EFCs also captured fine-scale environmental variations, revealing heterogeneous landscapes that static ecoregions often overlook. In particular, the clusters distinguished mountainous from flat desert landscapes, a difference that may influence flood risks, water retention, and land resource management.

Advantages of the EFC Framework

The Environmental Feature Clusters (EFCs) framework offers several key advantages, making it a robust tool for ecological classification and analysis. By relying purely on remote sensing data, EFCs provide a data-driven and objective approach to environmental classification, uncovering hidden patterns without the biases of predefined ecoregions. The method also enables high-resolution spatial mapping, allowing for localized and precise classification that captures small-scale environmental variations.

Additionally, EFCs are highly adaptable, capable of scaling across different geographic regions and spatial scales, making them suitable for both regional and global applications. Unlike static ecoregion classifications, EFCs can be automatically updated with new satellite data, integrating machine learning and AI-driven clustering for real-time environmental monitoring. Lastly, the framework offers flexibility in classification and interpretation, as researchers and policymakers can adjust input indices, clustering methods, and spatial resolutions to align with specific environmental and sustainability goals.

EFCs are seen as complementary to traditional ecoregion classifications, with each approach better suited to particular applications depending on the objective. Table 1 below compares how the two apply across scenarios.

Table 1. EFCs vs. Ecoregions Application Comparison

Scenario	Ecoregions	EFCs
Conservation & Biodiversity	Best suited for designing protected areas and ecological reserves.	May not fully capture biodiversity but can detect changes in habitat conditions.
Ecosystem Monitoring	Less responsive to rapid environmental change.	Tracks real-time shifts in soil, vegetation, and water indices.
Land Use & Sustainability Planning	Broad ecological insights for general land management.	Identifies regions with shared environmental pressures for targeted interventions.
Climate Change Adaptation	Predicts long-term ecosystem shifts.	Detects short-term environmental stressors and changing boundaries.
Data Availability & Objectivity	Requires expert interpretation and historical data.	Fully data-driven, minimizing subjective classification biases.

Future Development & Limitations

The EFCs framework is continuously evolving, with several key areas identified for improvement and future integration:

Temporal Integration

The current EFCs operate as static classifications, meaning they do not yet account for seasonal shifts or long-term climate trends. Future iterations will incorporate time-series analysis, enabling dynamic tracking of environmental changes over time.

Cloud Masking

Although images are selected for the lowest available cloud cover, the present methodology does not filter out the cloud-covered pixels that remain, which can lead to misclassifications and artifacts in the dataset. Improved preprocessing techniques will be implemented to address this limitation, ensuring cleaner and more reliable classifications.
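One possible way to mask cloudy pixels is sketched below. It assumes Level-2A products, which ship a Scene Classification Layer (SCL); the source does not say which processing level is used or which masking technique is planned. The function name and toy arrays are hypothetical.

```python
import numpy as np

# SCL classes treated as contaminated in this sketch:
# 3 = cloud shadow, 8 = cloud (medium probability),
# 9 = cloud (high probability), 10 = thin cirrus.
CLOUD_CLASSES = (3, 8, 9, 10)

def mask_clouds(index, scl):
    """Replace index values with NaN wherever the SCL flags cloud or shadow."""
    cloudy = np.isin(scl, CLOUD_CLASSES)
    return np.where(cloudy, np.nan, index.astype("float64"))

# Toy 2x2 example: SCL classes and an index raster on the same grid.
scl = np.array([[4, 9], [6, 3]])
ndvi = np.array([[0.7, 0.1], [-0.2, 0.3]])

masked = mask_clouds(ndvi, scl)
```

The SCL is delivered at 20 m, so it would need resampling to match indices computed from 10 m bands. Masked pixels would also have to be excluded or filled before clustering, since k-means does not accept NaN values.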

Ground Truthing

While initial results show correlations with known bioregions, further on-the-ground validation is necessary to refine cluster interpretations and ensure the accuracy of the classifications in real-world applications.

Automation & AI Integration

Future development will focus on deep learning-based feature extraction, allowing for automated classification updates as new Earth Observation (EO) data becomes available. This will enhance the framework’s ability to continuously adapt to evolving environmental conditions.

Conclusion

The ESA Φ-lab collaboration demonstrated the feasibility of EFCs as a new paradigm for environmental classification. By leveraging machine learning and EO data, this framework can support data-driven ecological research, land use policy, and climate resilience planning. The next phase of development will focus on expanding automation, improving classification accuracy, and integrating multi-temporal datasets to track environmental changes in near real-time.