Project Summary
- This project is a part of Giga, a global initiative to connect every school to the internet by 2030
- To achieve this ambitious goal, government agencies and connectivity providers require accurate school internet connectivity data to better estimate the costs of digitally connecting schools and to plan the strategic allocation of their financial resources
- My work investigates the use of open-source satellite imagery, electricity transmission network information, and internet speed test data to predict the internet connectivity status of schools, evaluating our models in five pilot countries: Bosnia and Herzegovina, Belize, Botswana, Guinea, and Rwanda
- My work also covers the creation of our multi-modal, freely available satellite imagery and survey information dataset, leverages the latest geographically-aware location encoders, and presents the first results of using a new geographically-aware foundation model to predict internet connectivity in Botswana and Rwanda
- We also investigate the use of MapBox Mobility data in our ML feature space to support school mapping
Tools Used
- Python was used for all code development
Development outputs
- Research code, internet connectivity (GitHub repository): https://github.com/kelsdoerksen/giga-connectivity
- Research code, MapBox Mobility analysis (GitHub repository): https://github.com/kelsdoerksen/giga-mobility
- Publication (IEEE format): K. Doerksen*, I. Tingzon*, and D. Kim, “AI-powered school mapping and connectivity status prediction using Earth Observation,” ICLR 2024 Machine Learning for Remote Sensing Workshop.
Project Description
Our work targets the United Nations' Sustainable Development Goal (SDG) 4 of Quality Education: ensuring inclusive and equitable quality education and promoting lifelong learning opportunities for all. According to a joint report by UNICEF and the International Telecommunication Union (ITU), approximately two-thirds of the world’s school-age children do not have access to the internet (Diallo, 2020). The lack of internet connectivity not only limits children’s access to online learning resources, it also prevents them from developing the digital skills needed to thrive in the modern economy. Worldwide, schools provide critical online learning infrastructure to communities, and the digital divide between lower-income countries and more developed nations exacerbates existing inequalities, causing children to fall even further behind.
In response to these challenges, UNICEF and ITU jointly launched Giga, a global initiative to connect every school to the internet by 2030. To achieve this ambitious goal, government agencies and connectivity providers require accurate and complete school location and internet connectivity data to better estimate the costs of digitally connecting schools and to plan the strategic allocation of their financial resources. However, while governments generally have comprehensive records of schools within their national register, the corresponding geographical coordinates and internet connectivity status information are often incomplete, inaccurate, invalid, or missing entirely, especially in developing nations.
Recent advances in artificial intelligence (AI) and Earth Observation (EO) have opened promising new opportunities to fill data gaps in education infrastructure. In support of the Giga initiative, we leverage machine learning and remote sensing data to accelerate school mapping and enable internet connectivity prediction.
We investigate the use of open-source satellite imagery, electricity transmission network information, and internet speed test data to predict the internet connectivity status of schools, evaluating our initial methodology in five pilot countries: Bosnia and Herzegovina, Belize, Botswana, Guinea, and Rwanda.
Following our initial investigation in our pilot countries, we develop a multi-modal, freely available satellite imagery and survey information dataset, leverage the latest geographically-aware location encoders, and introduce the first results of using a new geographically-aware foundation model to predict internet connectivity in Botswana and Rwanda. Our work showcases a practical approach to supporting data-driven digital infrastructure development in low-resource settings using freely available information, and provides cleaned and labelled datasets to the community for future studies.
Dataset development
For our dataset, we leverage open-source, satellite-based measurements. Taking each location with connectivity information in our dataset as the center point, we use the airPy data processing package to extract a 1,000 m radius extent of high-resolution satellite data from Google Earth Engine (GEE). The extracted data products are MODIS landcover (Sulla-Menashe & Friedl), VIIRS Nightlight (Elvidge et al.), Global Human Modification (Kennedy et al., 2019), Gridded Population of the World (CIESIN, 2018), and Global Human Settlement Layer (M. & Panagiotis, 2023). We use a subset of the official school dataset from Project Connect, which contains school connectivity information, as our ground truth labels. We also explore vector embeddings extracted from location encoders as inputs to our ML classifiers for connectivity prediction. These come from three pre-existing models, SatCLIP, GeoCLIP, and CSP, and from our newly developed PhilEO Very High Resolution (VHR) Pre-cursor model, with embedding sizes of 256, 512, 256, and 1024, respectively. Each location encoder used in our study is trained with a different dataset, which lets us explore performance differences between imagery sources.
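To make the 1,000 m radius extent concrete, the following is a minimal sketch of how a latitude/longitude bounding box around a school location could be computed before requesting data. It is an illustration only, not the airPy or GEE implementation: the function name `bounding_box`, the spherical-Earth approximation, and the example coordinate are all assumptions introduced here.

```python
import math

# Mean Earth radius in metres (spherical approximation, assumed for this sketch)
EARTH_RADIUS_M = 6_371_000.0


def bounding_box(lat_deg, lon_deg, radius_m=1000.0):
    """Return (min_lat, min_lon, max_lat, max_lon) enclosing a circle of
    radius_m metres centred on the given point.

    Hypothetical helper: valid away from the poles and the antimeridian.
    """
    # Angular size of radius_m along a meridian, in degrees
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    # Circles of latitude shrink with cos(latitude), so the same distance
    # spans more degrees of longitude away from the equator
    dlon = math.degrees(
        radius_m / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg)))
    )
    return (lat_deg - dlat, lon_deg - dlon, lat_deg + dlat, lon_deg + dlon)


# Illustrative coordinate only, not taken from the project dataset
box = bounding_box(-1.95, 30.06)
print(box)
```

A box like this could then be used to clip each raster product to the area around the school; the actual extraction in this project is handled by airPy.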
Methodology
We model school connectivity as a binary classification task wherein we classify each school, represented by an n-featured vector, based on its internet connectivity status (yes/no). The value of n varies by country because it depends on the number of one-hot encoded administrative boundaries we include as features. We leverage shallow ML classifier architectures including Random Forest (RF), Gradient Boosting (GB), Support Vector Machines (SVM), Logistic Regression (LR), Extreme Gradient Boosting (XGB), and the Multi-Layer Perceptron (MLP) neural network. We selected these models over more complex deep learning architectures due to the small size of our dataset and the track record of tree-based models outperforming deep learning on tabular data.
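The feature-vector construction described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the project's pipeline: the field names (`nightlight`, `population`, `admin_region`, `connected`), the helper `build_feature_vectors`, and the three example records are hypothetical.

```python
def build_feature_vectors(schools, numeric_keys):
    """Turn school records into (X, y, regions).

    Each row of X holds the numeric features followed by a one-hot encoding
    of the school's administrative region, so the vector length n grows with
    the number of distinct regions in the country.
    """
    # Fix a stable column order for the one-hot block
    regions = sorted({s["admin_region"] for s in schools})
    index = {region: i for i, region in enumerate(regions)}

    X, y = [], []
    for s in schools:
        one_hot = [0.0] * len(regions)
        one_hot[index[s["admin_region"]]] = 1.0
        X.append([float(s[k]) for k in numeric_keys] + one_hot)
        # Binary label: 1 = connected, 0 = not connected
        y.append(1 if s["connected"] else 0)
    return X, y, regions


# Made-up records for illustration; values are not from the project dataset
schools = [
    {"nightlight": 12.4, "population": 830.0, "admin_region": "North", "connected": True},
    {"nightlight": 0.6, "population": 95.0, "admin_region": "South", "connected": False},
    {"nightlight": 4.1, "population": 310.0, "admin_region": "North", "connected": True},
]

X, y, regions = build_feature_vectors(schools, ["nightlight", "population"])
print(len(X[0]), regions, y)
```

Here n is 4: two numeric features plus two region columns. The resulting `X` and `y` are in the shape that a shallow classifier expects, for example scikit-learn's `RandomForestClassifier().fit(X, y)`.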
Outcomes
Our work investigates, for the first time, ML for internet connectivity prediction in schools. We provide a new, multi-modal geospatial dataset and feature generation pipeline, openly available to the community, which can be used to extend this study to other countries. We highlight the performance differences between hand-crafted features and geographically-aware location encoders, and show that incorporating auxiliary school information greatly improves predictive capabilities. We present early results from the PhilEO Very High Resolution (VHR) Pre-cursor model, and demonstrate that freely available data can act as a sufficient starting point for Machine Learning in these types of socio-economic applications. Our work was selected for inclusion in the ICLR 2024 Machine Learning for Remote Sensing Workshop.