SatExtractor

📆 Project Period
March, 2022
📍 GitHub
gitlab.esa.int

Build, deploy and extract public satellite constellations with a single command.

About The Project

TL;DR: SatExtractor gets all revisits in a date range for a given GeoJSON region from any public satellite constellation and stores them in a cloud-friendly format.

The large amount of image data makes it difficult to create datasets to train models quickly and reliably. Existing methods for extracting satellite images take a long time to process and have user quotas that restrict access. Therefore, we created an open-source extraction tool, SatExtractor, to perform worldwide dataset extractions using serverless providers such as Google Cloud Platform or AWS, based on a common existing standard: STAC. The tool scales horizontally as needed, extracting revisits and storing them in Zarr format so that they can be easily used by deep learning models. It is fully configurable using Hydra.

Getting Started

SatExtractor needs a cloud provider to work. Before you start using it, you'll need to create and configure a cloud provider account. We provide the implementation to work with Google Cloud, but SatExtractor is implemented to be easily extensible to other providers.

Structure

The package is modular and configurable. It is essentially a pipeline of six steps, each implemented as a separate module.

Builder: Contains the logic to build the container that will run the extraction.

Stac: Converts a public constellation to the STAC standard.

Tiler: Creates tiles of the given region to perform the extraction.

Scheduler: Decides how those tiles are scheduled, creating the extraction tasks.

Preparer: Prepares the files in cloud storage.

Deployer: Deploys the extraction tasks created by the scheduler to perform the extraction.
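
To make the Tiler step more concrete, here is a minimal sketch of covering a region's bounding box with fixed-size tiles. It is an illustration only: the function name split_bbox and the plain coordinate grid are assumptions made for this example, whereas the default configuration selects a UTM-based tiler whose logic is not shown here.

```python
import math
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def split_bbox(bbox: BBox, tile_size: float) -> List[BBox]:
    """Cover bbox with square tiles of side tile_size (hypothetical helper)."""
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    min_x, min_y, max_x, max_y = bbox
    # Number of tiles needed along each axis, rounding up for partial tiles.
    n_cols = math.ceil((max_x - min_x) / tile_size)
    n_rows = math.ceil((max_y - min_y) / tile_size)
    tiles = []
    for row in range(n_rows):
        for col in range(n_cols):
            x0 = min_x + col * tile_size
            y0 = min_y + row * tile_size
            # Clip the last row/column so tiles never extend past the region.
            tiles.append((x0, y0, min(x0 + tile_size, max_x), min(y0 + tile_size, max_y)))
    return tiles
```

Each tile can then be handled independently, which is what allows the extraction to be split into separate tasks.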

All the steps are optional; the user decides which ones to run in the main config file.
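
The idea of selecting steps from the config can be sketched as a small dispatcher. The step functions and the run_pipeline name below are hypothetical stand-ins, not SatExtractor's actual code; only the task names come from the example config. The sketch also assumes the steps always run in a fixed pipeline order, whatever order they are listed in.

```python
from typing import Callable, Dict, List

TASK_NAMES = ("build", "stac", "tile", "schedule", "prepare", "deploy")


def make_step(name: str) -> Callable[[List[str]], None]:
    """Build a placeholder step that only records that it ran."""
    def step(log: List[str]) -> None:
        log.append(name)
    return step


# Hypothetical registry; dicts preserve insertion order, i.e. pipeline order.
STEPS: Dict[str, Callable[[List[str]], None]] = {n: make_step(n) for n in TASK_NAMES}


def run_pipeline(tasks: List[str]) -> List[str]:
    """Run only the requested steps and return the names of those executed."""
    unknown = [t for t in tasks if t not in STEPS]
    if unknown:
        raise ValueError(f"unknown tasks: {unknown}")
    log: List[str] = []
    for name, step in STEPS.items():
        if name in tasks:
            step(log)
    return log
```

For instance, once the container has been built, calling run_pipeline with the build task left out skips that step and runs the rest.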

Prerequisites

To run SatExtractor, we recommend using a virtual environment. A cloud provider user should already have been created.

Clone the repo:

git clone https://github.com/FrontierDevelopmentLab/sat-extractor

Install the Python packages:

pip install .

Usage

🔴🔴🔴

  • WARNING: Running SatExtractor will use your billable cloud provider services. We strongly recommend testing it with a small region first, to get acquainted with the process and to get a first sense of your cloud provider costs for the datasets you want to generate. Be sure to run all your cloud provider services in the same region to avoid extra costs.

🔴🔴🔴

Once a cloud provider user is set up and the package is installed, you'll need to grab the GeoJSON region you want (you can get it from the super-cool tool geojson.io) and change the config files.

  1. Choose a region name (e.g. cordoba below) and create an output directory for it:
mkdir -p output/cordoba
  2. Save the region GeoJSON as aoi.geojson and store it in the folder you just created.
  3. Open the config.yaml (an example is reproduced at the end of this page).

The important thing here is to set dataset_name to <your_region_name> and to define the start_date and end_date for your revisits, your constellations, and the tasks to be run (you will probably want to run the build task only once and then comment it out).

Important: the token.json contains the credentials needed to access your cloud provider. In this example it contains the GCP credentials. Instructions for getting it are in the Authentication section below.

  4. Open cloud/<provider>.yaml and add your account info there, following the default file provided.
  5. (Optional) Choose different configurations by changing the module configs: builder, stac, tiler, scheduler, preparer, etc. There you can change settings such as patch_size and chunk_size.
  6. Run python src/satextractor/cli.py and enjoy!
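
The first two steps (creating the output directory and saving the region) can also be scripted. The sketch below uses only the Python standard library to create the directory and write a small rectangular GeoJSON polygon as aoi.geojson. The helper name write_aoi is an assumption made for this example; in practice you would more likely paste the GeoJSON exported from geojson.io.

```python
import json
from pathlib import Path
from typing import Tuple


def write_aoi(root: str, region: str, bounds: Tuple[float, float, float, float]) -> Path:
    """Create <root>/<region>/ and write a rectangular aoi.geojson into it."""
    min_lon, min_lat, max_lon, max_lat = bounds
    # GeoJSON polygon rings are closed: the first and last points coincide.
    ring = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],
    ]
    feature_collection = {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {},
                "geometry": {"type": "Polygon", "coordinates": [ring]},
            }
        ],
    }
    out_dir = Path(root) / region
    out_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    aoi_path = out_dir / "aoi.geojson"
    aoi_path.write_text(json.dumps(feature_collection, indent=2))
    return aoi_path
```

Calling write_aoi("output", "cordoba", bounds) with your own (min_lon, min_lat, max_lon, max_lat) tuple writes output/cordoba/aoi.geojson, the path the example config expects for gpd_input.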

See the open issues for a full list of proposed features (and known issues).

Authentication

Google Cloud

To get the token.json for Google Cloud, the recommended approach is to create a service account:

  1. Go to Credentials
  2. Click Create Credentials and choose Service account
  3. Enter a name (e.g. sat-extractor) and click Done (you may also want to modify permissions and users)
  4. Choose the account from the list and then go to the Keys tab
  5. Click Add key -> Create new key -> JSON and save the file that gets downloaded
  6. Rename to token.json and you're done!
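
Before running an extraction, it can help to sanity-check the downloaded key file. The sketch below only verifies that token.json looks like a Google Cloud service-account key by checking a few fields that such key files contain. The function name missing_token_fields is made up for this example and is not part of SatExtractor, and the check says nothing about whether the account has the right permissions.

```python
import json
from pathlib import Path
from typing import List

# Fields found in a Google Cloud service-account key file (not exhaustive).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def missing_token_fields(path: str) -> List[str]:
    """Return the problems found in the key file; an empty list means none."""
    data = json.loads(Path(path).read_text())
    problems = [field for field in REQUIRED_FIELDS if field not in data]
    # A key created through the steps above has type "service_account".
    if data.get("type") not in (None, "service_account"):
        problems.append("type is not 'service_account'")
    return problems
```

With the example config, the file to check would be output/cordoba/token.json.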

To build the sat-extractor service, you may also need to configure the credentials used by the cloud provider's command-line devkit. Permissions at the project-owner level are recommended. If you are using Google Cloud Platform, you can authorize the gcloud devkit to access it with your Google credentials by running the command gcloud auth login. You may also need to run gcloud config set project your-proj-name for sat-extractor to work properly.

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated. If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the BSD 2 License. See LICENSE.txt for more information.

Citations

If you want to use this repo, please cite it (the BibTeX entry is reproduced at the end of this page):

Acknowledgments

This work is the result of the 2021 ESA Frontier Development Lab World Food Embeddings team. We are grateful to all organisers, mentors and sponsors for providing us with this opportunity. We thank Google Cloud for providing computing and storage resources to complete this work.

Copyright 2025 @ European Space Agency. All rights reserved.

Example config.yaml (referenced in the Usage section):

dataset_name: cordoba
output: ./output/${dataset_name}

log_path: ${output}/main.log
credentials: ${output}/token.json
gpd_input: ${output}/aoi.geojson
item_collection: ${output}/item_collection.geojson
tiles: ${output}/tiles.pkl
extraction_tasks: ${output}/extraction_tasks.pkl

start_date: 2020-01-01
end_date: 2020-02-01

constellations:
  - sentinel-2
  - landsat-5
  - landsat-7
  - landsat-8

defaults:
  - stac: gcp
  - tiler: utm
  - scheduler: utm
  - deployer: gcp
  - builder: gcp
  - cloud: gcp
  - preparer: gcp
  - _self_
tasks:
  - build
  - stac
  - tile
  - schedule
  - prepare
  - deploy

hydra:
  run:
    dir: .
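
The ${...} entries in the config above are interpolations: Hydra builds on OmegaConf, which replaces each reference with the value of the named key when the value is accessed. The sketch below imitates that behaviour for flat string values using only the standard library, to show where paths such as credentials end up pointing. The resolve helper is an illustrative assumption and is far simpler than OmegaConf's real resolver.

```python
import re
from typing import Dict

_REF = re.compile(r"\$\{(\w+)\}")


def resolve(config: Dict[str, str]) -> Dict[str, str]:
    """Expand ${key} references between top-level string values."""
    def expand(value: str, depth: int = 0) -> str:
        if depth > 10:  # guard against circular references
            raise ValueError("interpolation too deep")
        if not _REF.search(value):
            return value
        # Substitute every reference once, then expand any nested ones.
        replaced = _REF.sub(lambda m: config[m.group(1)], value)
        return expand(replaced, depth + 1)

    return {key: expand(value) for key, value in config.items()}


# A few keys copied from the example config.
config = {
    "dataset_name": "cordoba",
    "output": "./output/${dataset_name}",
    "credentials": "${output}/token.json",
    "gpd_input": "${output}/aoi.geojson",
}
resolved = resolve(config)
print(resolved["credentials"])  # ./output/cordoba/token.json
```

This is why changing dataset_name alone is enough to redirect the log, credentials, region file and intermediate outputs to a new folder.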
BibTeX entry (referenced in the Citations section):

@software{dorr_francisco_2021_5609657,
  author       = {Dorr, Francisco and
                  Kruitwagen, Lucas and
                  Ramos, Raúl and
                  García, Dolores and
                  Gottfriedsen, Julia and
                  Kalaitzis, Freddie},
  title        = {SatExtractor},
  month        = oct,
  year         = 2021,
  publisher    = {Zenodo},
  version      = {v0.1.0},
  doi          = {10.5281/zenodo.5609657},
  url          = {https://doi.org/10.5281/zenodo.5609657}
}