Researchers
Project Summary
- Show how crop yield prediction using Earth Observation data presents challenges due to the diverse data modalities and the limited availability of relevant datasets, which are often proprietary or private.
- Decentralised federated learning has been proposed as a way to address these privacy concerns, but its performance is significantly influenced by the number of clients and the distribution of data among them. This study investigates the impact of aggregation levels on federated learning using a proxy model trained on crop type data derived from Copernicus Sentinel-2 images.
- The analysis also includes an examination of the current and future distributions of crop yield datasets to determine the optimal aggregation levels for effective federated learning.
- Findings highlight that dataset size directly affects the learning outcomes and the degree of privacy that can be maintained.
- Differential privacy techniques are also discussed in relation to the challenges posed by varying dataset sizes.
- Worked with the Φ-lab to create federated learning software for Earth Observation data.
- Produced a journal paper on the above topics, currently in review for the Nature Scientific Reports special issue on federated learning.
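To make the aggregation question above more concrete, the following is a minimal sketch of how merging clients into coarser groups changes each participant's weight in a federated average. It assumes the standard size-weighted federated averaging rule, which is not confirmed here as the method used in the project. The farm and region names, the sample counts, and the helper functions `fed_avg` and `aggregate_clients` are hypothetical and only for illustration.

```python
from collections import defaultdict


def fed_avg(client_weights, client_sizes):
    """Average per-client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


def aggregate_clients(client_sizes, group_of):
    """Merge clients into coarser groups and return the sample count per group."""
    merged = defaultdict(int)
    for client, size in client_sizes.items():
        merged[group_of[client]] += size
    return dict(merged)


# Hypothetical sample counts per farm and a hypothetical farm-to-region mapping.
farm_sizes = {"farm_a": 10, "farm_b": 30, "farm_c": 60}
farm_region = {"farm_a": "region_1", "farm_b": "region_1", "farm_c": "region_2"}

# Aggregating from farm level to region level gives fewer, larger clients.
region_sizes = aggregate_clients(farm_sizes, farm_region)

# Toy two-parameter models, one per region, combined by dataset size.
region_models = {"region_1": [1.0, 0.0], "region_2": [0.0, 1.0]}
regions = sorted(region_sizes)
global_weights = fed_avg(
    [region_models[r] for r in regions],
    [region_sizes[r] for r in regions],
)
```

In this toy setup, a coarser aggregation level leaves fewer clients, each holding more samples. That is the trade-off between learning outcome and maintainable privacy that the summary describes.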
Development Tools
- Copernicus Sentinel-2 Images from SentinelHub – Used as satellite imagery input into federated learning.
- ESA WorldCereal – Used as data labels for satellite imagery in federated learning.
- Crop production in EU standard humidity by NUTS 2 region from EUROSTAT – Used as data labels for satellite imagery in federated learning.
- PyTorch and other assorted libraries – Used to develop code for federated learning.
- Hugging Face and GitHub – Used to publish data and code.
- The Φ-lab EOHPC – Used to train the federated learning model.
Development Outputs
- GitHub repository storing all code and links to the original unprocessed datasets - https://github.com/strath-ace/smart-dao
- Hugging Face account storing all pre-processed datasets - https://huggingface.co/0x365
- Article currently in review in Nature Scientific Reports Special Issue on Federated Learning (therefore no citation yet) (https://www.nature.com/collections/jfabbbcjbg)
Project Description
Abstract: Crop yield prediction using Earth Observation data presents challenges due to the diverse data modalities and the limited availability of relevant datasets, which are often proprietary or private. Decentralised federated learning has been proposed as a solution to address these privacy concerns. However, the performance of federated learning is significantly influenced by the number of clients and the distribution of data among them. This study investigates the impact of aggregation levels on federated learning using a proxy model trained on crop type data derived from Copernicus Sentinel-2 images. The analysis also includes an examination of the current and future distributions of crop yield datasets to determine the optimal aggregation levels for effective federated learning. The findings highlight that dataset size directly affects the learning outcomes and the degree of privacy that can be maintained. Differential privacy techniques are also discussed in relation to the challenges posed by varying dataset sizes.
The submitted publication and the images within it are attached to the email. Because the paper has only been submitted, it is not to be shared beyond what is required for this report.
I worked with Nicolas Longépé and other members of the Φ-lab. I gained invaluable skills in EO and machine learning, and was able to produce code for decentralised federated learning for EO, something that I believe had not been done before within the Φ-lab. I was also able to assist with topics in distributed systems and decentralised technologies, such as those related to Web3 and federated learning, all in the context of EO and EO-based disaster response. In addition, I assisted as an “expert” on Web3 in Φ-lab meetings with other companies.