📆 Project Period | June - September, 2025 |
👤 CIN Visiting Researcher | David Șeu |
Project Summary
- Developed a continental-scale modeling framework to estimate key soil properties (SOC, N, P, K, pH) using remote sensing and environmental covariates.
- Integrated physics-informed radiative transfer models (RTMs) with machine learning and foundation models, enabling a hybrid approach that combines interpretability and scalability.
- Built a harmonized European soil dataset based on LUCAS and national surveys, covering diverse agro-ecological zones.
- Designed a robust spatial-block validation framework, ensuring realistic generalization and quantified model uncertainty via conformal prediction.
- Achieved state-of-the-art accuracy for SOC (MAE = 5.12 g/kg, CCC = 0.77) and N (MAE = 0.44 g/kg, CCC = 0.77).
- Collaboration with ESA Φ-Lab accelerated access to expert supervision, validation practices, and scientific visibility within Europe’s soil-monitoring ecosystem.
- Outcome: a validated framework supporting scalable digital soil mapping, with direct applications for sustainable precision agriculture.
Development Tools
- Earth observation datasets: HLS (Landsat + Sentinel-2), MODIS, ERA5, CERES SYN1Deg, CHELSA, GAEZ, EcoTapestry.
- Machine learning: XGBoost with randomized stability selection, conformal prediction for uncertainty calibration.
- Physics-based modeling: PROSAIL RTM inversion to derive canopy biochemical and structural traits (LAI, Cab, Cw, Cm).
- Representation learning: Presto Sentinel-2 foundation model embeddings for high-order temporal dependencies.
- Infrastructure: HPC (EuroCC Netherlands) and cloud computing (AWS, Azure) hosting scalable pipelines for feature generation and validation.
- Languages and frameworks: Python (Polars, Xarray, scikit-learn, XGBoost, pandas, NumPy, SciPy, Matplotlib); data access via AWS S3, earthaccess, the Copernicus Climate Data Store (CDS), and Microsoft Planetary Computer.
Development Outputs
- Reports & papers: preprint “Seeing Soil from Space: Towards Robust and Scalable Remote Soil Nutrient Analysis” (Oct 2025).
- Demo Maps: https://co2angels.com/select-map
- Upcoming outputs: SOC/N/P/K/pH prediction maps for Romanian parcels; operational API integration in the CO2 Angels platform.
Project Description
Overview
The project Seeing Soil from Space aims to build a robust, scalable, and interpretable system for monitoring soil health from space, specifically targeting key indicators: soil organic carbon (SOC), total nitrogen (N), available phosphorus (P), extractable potassium (K), and soil pH. Developed in collaboration with the European Space Agency Φ-Lab, the system combines physics-informed machine learning and foundation models to derive nutrient estimates from multisource Earth observation (EO) and environmental data, enabling reliable soil assessment at a continental scale.
This effort directly supports Europe’s transition toward sustainable agriculture by offering a data-driven alternative to costly field sampling, in line with the objectives of the EU Soil Monitoring Law and the European Green Deal.
Methodology
The framework follows the SCORPAN soil-forming factor model, formalized as S=f(c, o, r, p, a, n, t), and operationalizes it by coupling long-term, time-invariant descriptors with dynamic monthly trajectories up to the sampling year. Static covariates encode the environmental baseline: CHELSA bioclimatic normals capture multi-decadal climate constraints; GAEZ agro-ecological strata contextualize processes within crop–environment suitability; EcoTapestry landforms and lithology, together with global soil type layers, represent parent materials and pedogenic structure; and a 30 m terrain model provides elevation and derivatives.
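One way to couple the static baseline with monthly trajectories is to flatten both into a single feature row per soil sample. The sketch below rests on a few assumptions. It uses a window of the 24 calendar months before the sampling month, excluding that month. The helpers `months_before` and `assemble_row` and the `<variable>_lag<k>` naming scheme are hypothetical and are not the project's code.

```python
from datetime import date
from typing import Dict, List, Optional


def months_before(sampling: date, n: int) -> List[str]:
    """Return the n calendar months preceding the sampling month as 'YYYY-MM', most recent first."""
    months = []
    year, month = sampling.year, sampling.month
    for _ in range(n):
        month -= 1
        if month == 0:  # wrap from January back to December of the previous year
            year, month = year - 1, 12
        months.append(f"{year:04d}-{month:02d}")
    return months


def assemble_row(
    static: Dict[str, float],
    dynamic: Dict[str, Dict[str, float]],
    sampling: date,
    n_months: int = 24,
) -> Dict[str, Optional[float]]:
    """Flatten static covariates plus lagged monthly values into one feature row.

    `dynamic` maps a variable name to {'YYYY-MM': value}. Months without data become
    None, to be converted to NaN later (XGBoost handles missing values natively).
    """
    row: Dict[str, Optional[float]] = dict(static)
    for lag, month in enumerate(months_before(sampling, n_months), start=1):
        for var, series in dynamic.items():
            row[f"{var}_lag{lag}"] = series.get(month)
    return row


# Hypothetical point sampled in March 2018 with a 2-month window (illustrative values only)
row = assemble_row({"elevation_m": 120.0}, {"ndvi": {"2018-02": 0.41}}, date(2018, 3, 15), n_months=2)
```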
On top of this baseline, the system integrates dynamic covariates that make the temporal component explicit. Harmonized Landsat–Sentinel (HLS, 30 m, ~2–3-day revisit) and MODIS (500 m, 8-day) surface reflectance are cloud-masked, fused, and assembled into monthly sequences. From these gap-free sequences, we derive descriptors of vegetation productivity and surface exposure: classical indices such as the Normalized Difference Vegetation Index (NDVI), phenology metrics, and soil-exposure proxies such as the Normalized Difference Tillage Index (NDTI). These descriptors let the model link above-ground biomass dynamics, residue deposition, and surface-exposure events to below-ground nutrient cycling. To move beyond index-level proxies, we retrieve canopy biochemical and structural traits through PROSAIL inversion (leaf area index, LAI; leaf chlorophyll content, Cab; equivalent water thickness, Cw; and dry matter content, Cm). These traits provide a mechanistic bridge between plant physiology, stress, and nutrient assimilation. Complementing these physics-informed traits, we incorporate representation learning by extracting embeddings from the Presto foundation model on Sentinel-2 time series. During bare-soil windows, spectral mixture analysis and soil-line criteria yield bare-soil reflectance composites, furnishing direct optical signatures of the soil surface.
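The index and compositing steps can be sketched per pixel as follows. This minimal illustration assumes each observation carries a boolean cloud flag and that each month is reduced to a median. It does not reproduce the HLS/MODIS fusion or the gap-filling. NDTI is computed from the two shortwave-infrared bands (SWIR1, SWIR2), following its usual definition. All function names are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple


def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b); returns NaN when the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else math.nan


def ndvi(nir: float, red: float) -> float:
    return normalized_difference(nir, red)


def ndti(swir1: float, swir2: float) -> float:
    # Normalized Difference Tillage Index: sensitive to crop residue versus exposed soil
    return normalized_difference(swir1, swir2)


def monthly_median_composite(obs: Iterable[Tuple[str, bool, float]]) -> Dict[str, float]:
    """obs: (ISO date 'YYYY-MM-DD', is_cloudy, index value) for one pixel.

    Cloud-flagged or NaN observations are discarded. The rest are reduced to a per-month
    median. Months with no clear observation are absent; gap-filling is out of scope here.
    """
    by_month = defaultdict(list)
    for day, cloudy, value in obs:
        if not cloudy and not math.isnan(value):
            by_month[day[:7]].append(value)
    return {m: median(v) for m, v in sorted(by_month.items())}


series = monthly_median_composite([
    ("2024-05-03", False, ndvi(0.45, 0.05)),
    ("2024-05-10", True, 0.9),  # cloudy observation, dropped
    ("2024-05-20", False, ndvi(0.50, 0.10)),
    ("2024-06-01", False, ndvi(0.40, 0.10)),
])
```

A median is a simple, outlier-resistant choice for residual clouds or shadows that slip past the mask. Other reducers, such as a maximum-NDVI composite, are equally plausible.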
Climate and energy-flux drivers add the forcing context at monthly cadence. ERA5 reanalysis (~5 km) provides temperature, precipitation, and other meteorological variables. CERES shortwave fluxes (1°) characterize incoming radiation, and MODIS land-surface temperature (~1 km) further resolves thermal conditions. Together, these dynamic layers instantiate the ‘t’ component of SCORPAN in interaction with c, o, and p, allowing the model to learn spatio-temporal mappings from environmental drivers to soil properties.
Trajectories of canopy traits estimated from PROSAIL inversion over 24 months prior to sampling, stratified by AEZ. LAI is shown with the median and interquartile range; Cab, Cw, and Cm are normalized for comparability. Trait phenology differs by AEZ, reflecting contrasts in productivity, stress response, and allocation strategies that mediate soil–plant nutrient exchange.
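The trait trajectories in this figure come from PROSAIL inversion, whose exact scheme is not detailed here. A common approach is look-up-table (LUT) inversion. It simulates spectra for many parameter draws, then averages the parameters of the simulations closest to an observed spectrum. The sketch below shows only that search. `toy_forward_model` is a HYPOTHETICAL two-parameter stand-in for PROSAIL, which couples the PROSPECT leaf model with the SAIL canopy model. The parameter ranges and band set are also assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Params = Tuple[float, float]           # (LAI, Cab in µg/cm²)
Spectrum = Tuple[float, float, float]  # (green, red, NIR) reflectance


def toy_forward_model(lai: float, cab: float) -> Spectrum:
    """HYPOTHETICAL stand-in for PROSAIL, only so the example runs.

    Canopy cover grows with LAI (Beer-Lambert-like), NIR rises with cover, and
    chlorophyll lowers green and red reflectance of the vegetated fraction.
    """
    cover = 1.0 - math.exp(-0.5 * lai)
    green = 0.10 * (1 - cover) + (0.12 - 0.0008 * cab) * cover
    red = 0.12 * (1 - cover) + (0.08 - 0.0006 * cab) * cover
    nir = 0.25 * (1 - cover) + 0.45 * cover
    return (green, red, nir)


def build_lut(n: int = 5000, seed: int = 0) -> List[Tuple[Params, Spectrum]]:
    """Sample parameters uniformly (assumed ranges) and simulate a spectrum for each draw."""
    rng = random.Random(seed)
    lut = []
    for _ in range(n):
        lai, cab = rng.uniform(0.0, 7.0), rng.uniform(10.0, 80.0)
        lut.append(((lai, cab), toy_forward_model(lai, cab)))
    return lut


def invert(observed: Sequence[float], lut: List[Tuple[Params, Spectrum]], k: int = 10) -> Params:
    """Average the parameters of the k LUT entries whose spectra have the lowest RMSE to the observation."""
    def rmse(sim: Spectrum) -> float:
        return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, sim)) / len(observed))

    best = sorted(lut, key=lambda entry: rmse(entry[1]))[:k]
    lai = sum(p[0] for p, _ in best) / len(best)
    cab = sum(p[1] for p, _ in best) / len(best)
    return (lai, cab)


lai_hat, cab_hat = invert(toy_forward_model(3.0, 40.0), build_lut())
```

Averaging the k best matches, rather than keeping only the single best, is a common way to damp the ill-posedness of radiative-transfer inversion. Real retrievals also face sensor noise and more free parameters, such as Cw, Cm, soil background, and viewing geometry.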
Median and interquartile range of the Normalized Difference Vegetation Index (NDVI), the Normalized Difference Tillage Index (NDTI), and total precipitation (TP) in the 24 months preceding sampling, stratified by AEZ. Temperate systems show regular annual cycles, while subtropical and irrigated systems exhibit water-driven lags and extended greenness. These dynamics act as indirect proxies of nutrient-cycling processes.
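The statistics plotted in these figures are the median and interquartile range for each AEZ and each month before sampling. They reduce to grouped quartiles. The sketch below uses the standard library and assumes records arrive as `(aez, lag_month, value)` tuples, a hypothetical layout.

```python
from collections import defaultdict
from statistics import quantiles
from typing import Dict, Iterable, Tuple


def stratified_profile(
    records: Iterable[Tuple[str, int, float]],
) -> Dict[Tuple[str, int], Tuple[float, float, float]]:
    """Group (aez, lag_month, value) records and return (Q1, median, Q3) per (aez, lag_month)."""
    groups = defaultdict(list)
    for aez, lag, value in records:
        groups[(aez, lag)].append(value)
    profile = {}
    for key, values in sorted(groups.items()):
        if len(values) < 2:
            continue  # skip groups too small for a meaningful IQR
        # With method="inclusive", the middle cut point equals the sample median
        q1, q2, q3 = quantiles(values, n=4, method="inclusive")
        profile[key] = (q1, q2, q3)
    return profile


profile = stratified_profile(
    [("temperate", 1, v) for v in [1.0, 2.0, 3.0, 4.0, 5.0]] + [("arid", 1, 0.5)]
)
```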
Given the breadth of inputs (initially >15k variables across static and dynamic sources), we employ a two-stage reduction pipeline tailored to each target (SOC, N, P, K, pH). First, aggressive collinearity filtering prunes redundant predictors while retaining a representative member of each highly correlated family. Second, we run randomized stability selection with Extreme Gradient Boosting (XGB) over 64 resampling/perturbation iterations. Variables are ranked by how consistently they are selected under repeated subsampling. This procedure concentrates the predictive signal into a compact, stable subset (177–199 features per target after reduction). It shows that relatively few covariates carry robust information across heterogeneous European landscapes.
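A minimal sketch of the two-stage reduction makes the following assumptions:

- Collinearity is judged by a Pearson |r| above 0.95.
- Subsamples are random halves of the data.
- A feature counts as "selected" when it ranks in the top k.

To stay dependency-free, `abs_corr_score` is a stand-in scorer. In the project, the ranking comes from XGBoost, and a function returning per-feature XGBoost importances could be passed as `score_fn` instead. All names here are hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

Columns = Dict[str, Sequence[float]]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def prune_collinear(columns: Columns, threshold: float = 0.95) -> List[str]:
    """Stage 1: keep a feature only if |r| <= threshold against every feature kept so far.

    Iteration order decides which member of a correlated family survives.
    """
    kept: List[str] = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def stability_selection(
    columns: Columns,
    y: Sequence[float],
    score_fn: Callable[[Columns, Sequence[float]], Dict[str, float]],
    n_iter: int = 64,
    frac: float = 0.5,
    top_k: int = 10,
    seed: int = 0,
) -> Dict[str, float]:
    """Stage 2: fraction of random subsamples in which each feature ranks in the top_k of score_fn."""
    rng = random.Random(seed)
    n = len(y)
    hits = {name: 0 for name in columns}
    for _ in range(n_iter):
        idx = rng.sample(range(n), int(frac * n))
        sub_cols = {k: [v[i] for i in idx] for k, v in columns.items()}
        sub_y = [y[i] for i in idx]
        scores = score_fn(sub_cols, sub_y)
        for name in sorted(scores, key=scores.get, reverse=True)[:top_k]:
            hits[name] += 1
    return {name: h / n_iter for name, h in hits.items()}


def abs_corr_score(cols: Columns, y: Sequence[float]) -> Dict[str, float]:
    # Stand-in scorer (assumption); XGBoost feature importances would replace this
    return {k: abs(pearson(v, y)) for k, v in cols.items()}
```

Features whose selection frequency exceeds a chosen cut-off would then form the reduced set. The cut-off value itself is a tuning choice not specified here.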
Category-level attribution shows the dominance of satellite-derived signals, especially bare-soil composites and temporal vegetation descriptors, while RTM traits and foundation-model embeddings contribute complementary yet still maturing signals that are expected to strengthen with hyperspectral inputs.
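Category-level attribution amounts to summing per-feature importances within each covariate family and normalizing. The input could be, for example, the dictionary returned by XGBoost's `Booster.get_score(importance_type="gain")`. The `<category>__<feature>` naming convention used to recover categories is a hypothetical assumption.

```python
from collections import defaultdict
from typing import Callable, Dict


def prefix_category(feature: str) -> str:
    # HYPOTHETICAL naming convention: '<category>__<feature>', e.g. 'baresoil__swir1_median'
    return feature.split("__", 1)[0]


def category_shares(
    importances: Dict[str, float],
    category_of: Callable[[str], str] = prefix_category,
) -> Dict[str, float]:
    """Sum per-feature importances by category and normalize to shares adding up to 1, largest first."""
    totals: Dict[str, float] = defaultdict(float)
    for feature, score in importances.items():
        totals[category_of(feature)] += score
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return {cat: t / grand_total for cat, t in ranked}


shares = category_shares({
    "baresoil__b11": 0.3,
    "baresoil__b12": 0.2,
    "veg__ndvi_lag3": 0.3,
    "rtm__lai_lag1": 0.1,
    "presto__emb17": 0.1,
})
```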
Modeling is performed with XGB on target-specific feature sets, prioritizing interpretability and efficiency for tabular, mixed-scale inputs. Calibration is stratified by Agro-Ecological Zones (AEZs) to respect environmental contrasts and mitigate global bias. To ensure realistic generalization, the evaluation design combines 100 km spatial blocking with AEZ stratification so that training and testing pools are both geographically disjoint and compositionally distinct. Uncertainty is quantified through split-conformal prediction.
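The blocked, AEZ-stratified fold assignment can be sketched under the following assumptions:

- Point coordinates are in a metric projected CRS, for example EPSG:3035 (the ETRS89 Lambert Azimuthal Equal-Area grid commonly used for European data).
- Blocks are square, 100 km on a side.
- Each block is labelled with its majority AEZ.

Whole blocks are dealt to folds round-robin within each AEZ stratum, so no block is split between training and testing. The function names are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Block = Tuple[int, int]


def block_id(x_m: float, y_m: float, size_m: float = 100_000.0) -> Block:
    """Grid cell (column, row) containing a point given in a metric projected CRS."""
    return (math.floor(x_m / size_m), math.floor(y_m / size_m))


def spatial_block_folds(
    points: Sequence[Tuple[float, float, str]],
    n_folds: int = 5,
    size_m: float = 100_000.0,
    seed: int = 0,
) -> List[int]:
    """Assign each (x_m, y_m, aez) point a fold index such that all points of a block share a fold."""
    members: Dict[Block, List[int]] = defaultdict(list)
    votes: Dict[Block, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for i, (x, y, aez) in enumerate(points):
        b = block_id(x, y, size_m)
        members[b].append(i)
        votes[b][aez] += 1

    # Stratify blocks by their majority AEZ so each zone's blocks are spread across folds
    strata: Dict[str, List[Block]] = defaultdict(list)
    for b, counts in votes.items():
        strata[max(counts, key=counts.get)].append(b)

    rng = random.Random(seed)
    folds = [0] * len(points)
    for aez in sorted(strata):
        blocks = sorted(strata[aez])
        rng.shuffle(blocks)  # random order within the stratum
        for j, b in enumerate(blocks):  # deal whole blocks round-robin to folds
            for i in members[b]:
                folds[i] = j % n_folds
    return folds
```

Each fold then serves once as the test pool while the others train the model. Points near a block edge can still lie close to training points in a neighbouring block. A buffer zone around test blocks would tighten this, at the cost of discarding some training data.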
Results
The models achieved state-of-the-art accuracy for continental-scale nutrient estimation:
- SOC – MAE 5.12 g/kg, CCC 0.77
- N – MAE 0.44 g/kg, CCC 0.77
- P – MAE 14.96 mg/kg, CCC 0.53
- K – MAE 76.99 mg/kg, CCC 0.58
- pH – MAE 0.55, CCC 0.67
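For reference, the two reported metrics can be computed as below. CCC here is Lin's concordance correlation coefficient with population (1/n) moments. Unlike Pearson's r, it also penalizes systematic bias, as the final example shows.

```python
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the target's units (g/kg, mg/kg, or pH units)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def lins_ccc(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Lin's concordance correlation: 2*cov / (var_t + var_p + (mean_t - mean_p)**2)."""
    n = len(y_true)
    mt = sum(y_true) / n
    mp = sum(y_pred) / n
    var_t = sum((t - mt) ** 2 for t in y_true) / n
    var_p = sum((p - mp) ** 2 for p in y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred)) / n
    return 2 * cov / (var_t + var_p + (mt - mp) ** 2)


# A constant +1 offset keeps Pearson's r at 1.0 but lowers CCC to 4/7 (about 0.571)
offset_mae = mae([1, 2, 3], [2, 3, 4])
offset_ccc = lins_ccc([1, 2, 3], [2, 3, 4])
```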
Predictive performance remained stable across major Agro-Ecological Zones (AEZs), demonstrating robustness even under spatially disjoint evaluation.
The split-conformal prediction intervals achieved 90% empirical coverage, confirming that the reported uncertainty is reliable.
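Split-conformal prediction, as named in the methodology, can be sketched for a regression target using absolute-residual scores. The ⌈(n+1)(1−α)⌉-th smallest residual on a held-out calibration set becomes the half-width of every prediction interval. The 90% level (α = 0.1) mirrors the reported coverage. The function names and the synthetic check are illustrative assumptions. A per-AEZ (Mondrian) variant would run the same computation inside each zone.

```python
import math
import random
from typing import Sequence, Tuple


def conformal_halfwidth(cal_true: Sequence[float], cal_pred: Sequence[float], alpha: float = 0.1) -> float:
    """Half-width q from a calibration set that was not used to train the model."""
    scores = sorted(abs(t - p) for t, p in zip(cal_true, cal_pred))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    return math.inf if k > n else scores[k - 1]


def interval(y_hat: float, q: float) -> Tuple[float, float]:
    return (y_hat - q, y_hat + q)


def empirical_coverage(y_true: Sequence[float], y_pred: Sequence[float], q: float) -> float:
    """Share of targets falling inside [y_hat - q, y_hat + q]."""
    inside = sum(1 for t, p in zip(y_true, y_pred) if p - q <= t <= p + q)
    return inside / len(y_true)


# Synthetic check with Gaussian errors: 1,000 calibration and 5,000 test points
rng = random.Random(0)
cal_pred = [rng.uniform(0, 50) for _ in range(1000)]
cal_true = [p + rng.gauss(0, 5) for p in cal_pred]
q = conformal_halfwidth(cal_true, cal_pred, alpha=0.1)
test_pred = [rng.uniform(0, 50) for _ in range(5000)]
test_true = [p + rng.gauss(0, 5) for p in test_pred]
coverage = empirical_coverage(test_true, test_pred, q)
```

The coverage guarantee is marginal: it holds on average over exchangeable data, not for every AEZ or every value range individually.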
Scientific and Operational Impact
This project establishes a new benchmark for soil property mapping by integrating physics-based interpretability with foundation-model scalability. It demonstrates that a single system can generalize across heterogeneous European landscapes while maintaining statistical rigor and quantified uncertainty.
The collaboration with ESA Φ-Lab was instrumental in:
- refining the methodological design through expert mentorship,
- ensuring scientific reproducibility and transparency,
- connecting the project with the Collaborative Innovation Network (CIN), and
- preparing its transition toward operational deployment.
These results form the core of CO2 Angels’ monitoring, reporting, and verification (MRV) platform, which empowers farmers and agribusinesses to track soil health and nutrient efficiency using remote sensing.
Future Developments
Building on this collaboration, the next phase will:
- Integrate hyperspectral data for improved bare-soil retrieval,
- Expand to additional soil properties and integrate crop growth models, and
- Scale the model globally.
Ultimately, the framework provides a scientific and technological foundation for continental-scale soil property monitoring, bridging research and practice toward a more sustainable and resilient agricultural future.
D. Șeu, N. Longépé, G. Cioltea, E. Maidik, and C. Andrei, “Seeing Soil from Space: Towards Robust and Scalable Remote Soil Nutrient Analysis,” preprint, European Space Agency Φ-Lab & CO2 Angels, Oct 2025.