📆 Project Period | February - August, 2025 |
👤 CIN Visiting Researcher |
Project Summary
- Objective: Develop an AI-powered framework for estimating carbon credits from reforestation projects using Earth Observation (EO) data, with a focus on predicting Above-Ground Biomass (AGB) in a no-project baseline scenario.
- Innovation: Shift from conventional canopy height metrics to AGB as a direct and reliable proxy for carbon storage, enabling more accurate and transparent credit estimation.
- Dataset Creation: Built a multivariate, spatiotemporal dataset from 74 Verra-registered reforestation projects (2020 start), integrating ESA CCI Biomass (2015–2020), climate variables, vegetation indices, biomass loss data, soil carbon, and static geospatial features.
- Methodology: Implemented a multi-modal deep learning model combining CNN encoders and ConvLSTM to predict 2020 baseline AGB using control patches selected for similarity in historical AGB trends and soil carbon levels.
- Performance: Achieved an RMSE of 21.97 Mg/ha on the test set and a Pearson correlation of 0.4967 between AI-predicted and Verra-issued credits.
- Key Insights: Identified challenges in data resolution, temporal coverage, and model generalization; highlighted need for ecozone-specific models.
- Outcomes: Delivered a proof of concept, a complete EO-to-credit calculation pipeline, and a foundation for a future decentralized verification system. The work will be further developed into a public GitHub repository, Hugging Face dataset, and a Master’s thesis with the University of Trento.
Development Tools
During the project, several research and development tools were used to manage large-scale geospatial data, develop predictive models, and evaluate performance. The ESA CCI Biomass dataset was downloaded directly from the official CCI data repository, while additional datasets such as climate variables, vegetation indices, and biomass loss products were sourced and preprocessed using Google Earth Engine (GEE). Geospatial processing and alignment tasks were handled with GDAL, Rasterio, and GeoPandas, enabling the extraction of patches and the selection of control areas from project shapefiles.
Model development was carried out in Python, using PyTorch to implement a multi-modal architecture combining CNN encoders and ConvLSTM layers. NumPy and Pandas supported data handling, while Matplotlib and Seaborn facilitated visualization and exploratory analysis. Model training was executed on a virtual machine with GPU acceleration and expanded storage capacity to manage the computational and memory demands of multi-channel spatiotemporal data.
Development Outputs
At present, no public repositories or datasets have been released from the project. Planned outputs include a GitHub repository containing the main code developed during the research: the data preprocessing pipeline, the multi-modal deep learning architecture, and the biomass-to-carbon credit estimation framework. The custom dataset built during the project (processed patches of ESA CCI Biomass, climate variables, vegetation indices, and biomass loss data) will be published on Hugging Face to make it more widely accessible for research and benchmarking.
In addition, a Master’s thesis will be written in collaboration with the University of Trento, providing a detailed account of the methodologies, experimental results, and key findings from the ESA Φ-lab partnership. This thesis will serve as a comprehensive reference document and will complement the public code and dataset releases, ensuring that the work developed during the project can be reproduced, validated, and further extended by the scientific community.
Project Description
1. Introduction and Context
The global carbon market is a trading scheme that enables the buying and selling of carbon credits. The underlying logic is straightforward: if an initiative removes CO₂ from the atmosphere, it can issue one carbon credit for each metric ton removed and sell it to entities that need credits to cover their own emissions. This mechanism, created in 1997 under the Kyoto Protocol, has become a cornerstone in the fight against climate change.
There are two main types of carbon markets:
- The Compliance Market: regulated by governments and international agreements, such as the European Union Emissions Trading System (EU ETS), operating under a cap-and-trade approach. In this system, certain companies are obliged to purchase carbon credits in order to emit, thereby creating a direct economic incentive to reduce their emissions.
- The Voluntary Carbon Market: allows companies and individuals to offset their carbon footprints voluntarily, for reasons ranging from corporate sustainability commitments to reputation enhancement. In some cases, entities also launch sustainable initiatives with the goal of generating and selling credits to companies operating within the compliance market.
Within the voluntary market, carbon credits are issued for two main categories of intervention:
- Reduction projects, which aim to lower emissions through measures such as energy efficiency improvements, forest conservation, or improved land-use practices.
- Removal projects, which actively capture CO₂ from the atmosphere through strategies such as reforestation, biochar application, or direct air capture technologies.
One of the most widely implemented removal strategies is reforestation, which restores degraded lands, enhances carbon sequestration, and fosters biodiversity. However, despite its environmental benefits, the current process of measuring and verifying carbon credits from reforestation projects faces significant challenges. These include lengthy verification timelines, high operational costs, and, in some cases, a lack of transparency and objectivity in determining the actual carbon sequestered.
This combination of factors undermines trust in reforestation projects, ultimately limiting their adoption and reducing their presence in the carbon market.
2. Problem Statement and Motivation
The main challenge in the current carbon credit system lies in its verification framework, which is still largely manual and highly project-specific. The process typically requires extensive on-site measurements, long and complex bureaucratic procedures, and a strong reliance on declared project baselines. These baselines may not accurately reflect the “what-if” scenario, that is, what the biomass and carbon stocks would have been if the project had never been implemented.
Without a reliable and independent method for modeling these baselines, the system is exposed to risks of both over-crediting and under-crediting, which in turn undermines trust in the market and reduces its overall efficiency. These difficulties, combined with the high costs of verification, limit the diffusion of the carbon market and prevent it from reaching many locations with strong reforestation potential but insufficient financial resources to initiate projects.
My research period at Φ-lab was designed to directly address this gap by introducing AI-powered baseline prediction. The core idea was to apply advanced spatiotemporal deep learning techniques to Earth Observation (EO) data to predict Above-Ground Biomass (AGB) in the “no-project” scenario, and then compare it to the biomass actually observed after the reforestation project.
Another novelty of this approach is the shift from using only canopy growth, the standard metric until now, to using AGB, which provides a more direct and reliable proxy for carbon storage. Thanks to remote sensing, this method can deliver consistent, scalable monitoring across diverse regions and ecosystems.
The expected benefits of this approach were clear:
- Fairness: baselines are derived from independent EO data and machine learning, rather than from project-declared figures.
- Speed: automation drastically shortens the verification cycle.
- Cost-efficiency: fewer on-site measurements are required, without sacrificing accuracy.
- Transparency: methodologies can be fully documented, audited, and standardized for open verification.
3. Dataset Creation and Preprocessing
The first phase of the project focused on building a robust and diverse dataset to train and evaluate the AI models.
- Project Selection: 74 reforestation project areas were sourced from the Verra Verified Carbon Units (VCU) Registry. All projects began in 2020 and issued credits in the same year, providing a common temporal anchor for comparison. Projects were located across multiple continents and ecosystems, ensuring that the model would learn patterns from varied climates, vegetation types, and management approaches.
- Data Sources:
  - Biomass Data: ESA Biomass CCI products (2015–2020).
  - Climate Data: Precipitation, temperature, wind, humidity.
  - Static Data: Digital Elevation Model (DEM), geographic coordinates (latitude, longitude).
  - Vegetation Indices and Land Cover: NDVI, RVI, and land cover datasets.
  - Biomass Loss Data: Fire and deforestation datasets.
All datasets were aligned spatially and temporally, though the process highlighted challenges with resolution mismatches and incomplete coverage, which later influenced the model’s accuracy.
4. Methodology and Model Design
By studying the various methodologies for carbon credit calculation defined by Verra, we derived the following summary formula for estimating issued credits:
In this formula, the fundamental component is ΔAGB, defined as the change in Above-Ground Biomass between the baseline scenario (no project) and the observed scenario (with project).
To obtain this ΔAGB, we followed the procedure described below.
Phase 1: Multivariate Biomass Prediction
The main objective was to build a model capable of predicting 2020 AGB assuming no reforestation intervention had taken place.
To train this model, we used patches surrounding the shapefiles of reforestation project areas. From these, we selected the 100 most similar patches for each project area, based on a similarity score combining:
- The historical AGB trend in the years prior to the project start.
- The soil organic carbon level in the year before the project began.
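The selection step above can be sketched as a simple ranking problem. This is a minimal illustration, not the project's actual scoring function: the Euclidean distance on the AGB series, the additive combination, the `soc_weight` parameter, and the dictionary layout are all assumptions made for the example.

```python
import math

def similarity_score(project, candidate, soc_weight=1.0):
    """Return a dissimilarity score (lower means more similar).

    Assumed form: Euclidean distance between the pre-project AGB series
    plus a weighted absolute difference in soil organic carbon (SOC).
    """
    trend_dist = math.sqrt(
        sum((p - c) ** 2 for p, c in zip(project["agb"], candidate["agb"]))
    )
    soc_diff = abs(project["soc"] - candidate["soc"])
    return trend_dist + soc_weight * soc_diff

def select_control_patches(project, candidates, k=100, soc_weight=1.0):
    """Rank candidate patches by score and keep the k most similar ones."""
    ranked = sorted(
        candidates, key=lambda c: similarity_score(project, c, soc_weight)
    )
    return ranked[:k]

# Toy data (hypothetical values): AGB in Mg/ha for 2015-2019, SOC in arbitrary units.
project = {"agb": [50.0, 52.0, 53.0, 55.0, 56.0], "soc": 40.0}
candidates = [
    {"id": "a", "agb": [49.0, 51.0, 53.0, 54.0, 56.0], "soc": 41.0},
    {"id": "b", "agb": [90.0, 91.0, 92.0, 93.0, 95.0], "soc": 40.0},
    {"id": "c", "agb": [50.0, 52.0, 53.0, 55.0, 56.0], "soc": 70.0},
]
controls = select_control_patches(project, candidates, k=2)
```

In practice the two terms would likely need normalization so that neither AGB nor SOC dominates the score purely because of its units.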
On these selected patches, we implemented a multi-modal deep learning framework to predict 2020 biomass from historical data (2015–2019).
The model processed multiple input channels, each first encoded by a dedicated CNN encoder. All encoded features were then concatenated into a single tensor and passed into a ConvLSTM.
At each time step t, the model encoded the following inputs:
- AGB at time t
- Vegetation indices (NDVI, RVI) at time t
- Land cover at time t
- Static variables (DEM, geographic coordinates)
- Climate variables for t (past) and t+1 (future)
- Deforestation data for t and t+1
- Fire occurrence for t and t+1
This approach enabled the model to learn the relationship between current forest state and environmental conditions and anticipate biomass changes by incorporating future climate conditions (t+1).
The ConvLSTM maintained hidden and cell states across time steps to generate the predicted biomass map.
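To make the input layout concrete, the sketch below lists which (variable, year) layers would feed each time step when predicting 2020 from 2015–2019. It only illustrates the t / t+1 pairing described above. The variable names and groupings are hypothetical, and the CNN encoders and ConvLSTM themselves are not shown.

```python
# Hypothetical variable groups; names are illustrative, not the project's identifiers.
CURRENT_ONLY = ["agb", "ndvi", "rvi", "land_cover"]      # observed at t only
CURRENT_AND_NEXT = ["climate", "deforestation", "fire"]  # taken at t and t+1
STATIC = ["dem", "lat", "lon"]                           # time-invariant layers

def timestep_inputs(t):
    """List the (variable, year) layers encoded at time step t.

    Static layers carry no year, so they are tagged with None.
    """
    layers = [(name, t) for name in CURRENT_ONLY]
    layers += [(name, None) for name in STATIC]
    for name in CURRENT_AND_NEXT:
        layers += [(name, t), (name, t + 1)]
    return layers

def build_sequence(first_year=2015, last_year=2019):
    """Map each input year to the layers the model would see at that step."""
    return {t: timestep_inputs(t) for t in range(first_year, last_year + 1)}

sequence = build_sequence()
target_year = max(sequence) + 1  # the year whose AGB map is predicted
```

Note that the last step (2019) includes 2020 climate, deforestation, and fire layers but never the 2020 AGB map, which remains the prediction target.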
Performance:
- RMSE on the test set: 21.97 Mg/ha (a 5.56% improvement over the baseline model, which used only the AGB time series)
- Accurate estimation of total biomass per area (see Figure 3).
Figure 3: Comparison between predicted and real total biomass per test area
Below are two examples of predicted biomass maps compared to observed values:
Figure 4: Examples of observed AGB, predicted AGB, and absolute error
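For reference, the two metrics reported under Performance can be computed as in the sketch below. It assumes the 5.56% figure is a relative reduction in RMSE with respect to the AGB-only baseline. The pixel values are toy numbers, not project data.

```python
import math

def rmse(predicted, observed):
    """Root mean squared error between two equally long sequences (Mg/ha)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

def relative_improvement(model_rmse, baseline_rmse):
    """Percentage reduction in RMSE relative to the baseline model."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse

# Toy AGB values in Mg/ha (hypothetical).
observed = [100.0, 120.0, 80.0, 60.0]
multimodal_pred = [110.0, 115.0, 85.0, 50.0]  # full multi-modal model
agb_only_pred = [115.0, 110.0, 90.0, 45.0]    # baseline using AGB series only

model_rmse = rmse(multimodal_pred, observed)
baseline_rmse = rmse(agb_only_pred, observed)
gain = relative_improvement(model_rmse, baseline_rmse)
```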
Phase 2: Counterfactual Inference
Once trained on no-project areas, the model was applied to project areas to estimate what the 2020 biomass would have been without reforestation interventions.
This counterfactual estimate served as the project’s baseline and was compared to the actual observed biomass from remote sensing products. The difference between the two provided the ΔAGB, which was then inserted into the Verra credit calculation formula to estimate the number of carbon credits attributable to each project.
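A minimal sketch of this comparison step follows. It assumes ΔAGB is taken as observed minus predicted baseline, averaged over valid pixels, and that missing pixels are flagged with a nodata sentinel. The sentinel value and the 2×2 maps are hypothetical.

```python
NODATA = -9999.0  # assumed sentinel for missing pixels

def mean_delta_agb(observed, baseline, nodata=NODATA):
    """Mean per-pixel difference (observed - baseline) in Mg/ha.

    Both inputs are 2D grids (lists of rows) on the same grid.
    Pixels flagged as nodata in either map are skipped.
    """
    diffs = []
    for obs_row, base_row in zip(observed, baseline):
        for obs, base in zip(obs_row, base_row):
            if obs == nodata or base == nodata:
                continue
            diffs.append(obs - base)
    if not diffs:
        raise ValueError("no valid pixels to compare")
    return sum(diffs) / len(diffs)

# Toy 2x2 maps in Mg/ha (hypothetical values).
observed_2020 = [[60.0, 62.0], [58.0, NODATA]]
baseline_2020 = [[55.0, 60.0], [57.0, 50.0]]  # counterfactual model output
delta_agb = mean_delta_agb(observed_2020, baseline_2020)
```

A positive value indicates more biomass than the no-project baseline predicts, and a negative value indicates less.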
5. Conversion from Biomass to Carbon Credits
Once the biomass difference was established, the conversion to carbon credits followed IPCC guidelines and Verra AFOLU methodologies:
- Carbon Fraction of Biomass: Using default IPCC physical constants.
- CO₂ Equivalent Calculation: Factoring in molecular weight ratios.
- Credit Issuance: Expressed as Verified Carbon Units (VCUs), where 1 VCU = 1 metric ton of CO₂ equivalent removed or reduced.
This allowed direct benchmarking of AI-predicted credits against the credits actually issued by Verra for each project.
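The conversion chain can be illustrated with the short sketch below. It is a simplified reading of the steps listed above, not the exact formula used in the project: it assumes the IPCC default carbon fraction of 0.47 for woody biomass, the 44/12 molecular weight ratio of CO₂ to carbon, and a known project area. It ignores any further Verra adjustments (for example below-ground biomass, leakage, or buffer deductions).

```python
CARBON_FRACTION = 0.47   # assumed IPCC default carbon fraction of dry biomass
CO2_PER_C = 44.0 / 12.0  # molecular weight of CO2 divided by atomic weight of C

def estimate_vcus(delta_agb_mg_per_ha, area_ha, carbon_fraction=CARBON_FRACTION):
    """Convert a mean biomass difference into tonnes of CO2 equivalent.

    delta_agb_mg_per_ha: mean ΔAGB over the project area (Mg/ha)
    area_ha: project area in hectares
    Returns tCO2e, where 1 VCU corresponds to 1 tCO2e (1 Mg = 1 metric ton).
    """
    biomass_mg = delta_agb_mg_per_ha * area_ha  # total biomass difference
    carbon_mg = biomass_mg * carbon_fraction    # carbon stored in that biomass
    return carbon_mg * CO2_PER_C                # CO2 equivalent

# Example: a 10 Mg/ha gain over 100 ha (hypothetical numbers).
credits = estimate_vcus(10.0, 100.0)
```

With these toy inputs, 1,000 Mg of additional biomass corresponds to 470 Mg of carbon, or roughly 1,723 tCO₂e.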
Results
The model achieved a Pearson correlation of 0.4967 between predicted biomass-based credits and actual issued credits. Although moderate, this correlation supports the feasibility of AI-based estimation in a complex, data-limited context. It also suggests that results broadly in line with those of a resource-intensive procedure can be obtained with a much faster, cheaper, and more objective method.
Figure 5: Comparison between estimated and Verra’s issued VCUs
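For completeness, the Pearson correlation used for this comparison can be computed as in the sketch below. The credit values are toy numbers chosen for illustration and do not reproduce the reported 0.4967.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Toy per-project credits in VCUs (hypothetical values).
estimated_vcus = [1200.0, 800.0, 1500.0, 400.0]
issued_vcus = [1000.0, 900.0, 1300.0, 700.0]
r = pearson_r(estimated_vcus, issued_vcus)
```

Because Pearson correlation measures only linear association, it does not reveal systematic over- or under-estimation. An error metric on the credit values themselves would complement it.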
6. Conclusions
The ESA Φ-lab project demonstrated that AI and Earth Observation data can independently and transparently estimate carbon credits, delivering a proof-of-concept and a complete data-to-credit pipeline. Results highlighted the gradual nature of biomass change, the importance of high-quality and well-aligned data, the limitations of small training datasets, and the need for ecozone-specific models. Key challenges included integrating heterogeneous datasets, ensuring fair baseline comparisons, and limited temporal coverage. Future work will focus on higher-resolution data, regional models, residual error modelling, integration of ESA BIOMASS mission data, and blockchain-based automation to build a fully decentralized, scalable verification system.