Logo
  • Home
  • About ESA Φ-lab CIN
  • CIN People
  • Opportunities
  • Projects
  • Φ-talks
  • News
→ THE EUROPEAN SPACE AGENCY
Machine Learning for Floods

Machine Learning for Floods

📆 Project Period
October, 2023
📍 GitHub
gitlab.esa.int
image

WorldFloods 2.0

In this notebook we will show the data available in the WorldFloods data repository and how it is derived.

Overview

In this notebook we will:

  1. Examine a specific flood event
  2. Build Flood MetaData and Floodmap using
    • area of interest (AOI)
    • observed flood event
    • hydrography areas
    • hydrography lines
  3. Retrieve Sentinel-2 and JRC Permanent Water for a Flood AOI

1 - Flood Events

Copernicus Emergency Management Service (EMS) provides warnings and risk assessment of floods and other natural disasters through geospatial data derived from satellite imagery. It provides data before, during, and after events in Rapid Mapping format for emergencies and Risk and Recovery format for prevention. In WorldFloods we are primarily interested in the satellite derived imagery with a focus on fluvial, or river, floods.

Copernicus EMS Rapid Activations

The Copernicus EMS Activation Mappings for individual events given by unique EMSR codes may be accessed here. Linked to each EMSR code per severe event in the Rapid Activations table is the individual webpage for the EMSR code featuring vector zip files from various sources. Each vector zip file may contain multiple products with respect to an event. For this notebook, we will walk through a single EMSR flood event from the alert code, to its associated url.

We may retrieve a pandas DataFrame format of the EMS Rapid Activations table using the function <code>table_floods_ems</code> from the activations module in ml4floods/src/data. In this function you may specify how far back in time you would like to retrieve flood emergency mappings for. Be careful when choosing dates prior to 23 June 2015 when Sentinel-2 was launched into space.

from pyprojroot import here
import sys
import os
root = here(project_files=[".here"])
sys.path.append(str(root))
from src.data.copernicusEMS import activations
from src.data import utils
from pathlib import Path

Let's take a look at the flood events that have occurred since the start of 2021.

table_activations_ems = activations.table_floods_ems(event_start_date="2021-01-01")
table_activations_ems
image

EMSR 501: Flood in Shkodra, Albania

During heavy rains 4000 hectares of land was affected by flooding impacting 200 people. We take this event as an example to show how we retrieve and ingest data as an entry point to the preprocessed data used in the WorldFloods machine learning pipeline.

Retrieve the Zip File URLs for a given EMSR Code

The Copernicus EMS Activation Mapping URL location for the January 6th flood event in Shkodra, Albania can be fetched using the activation code "EMSR501" and the function <code>fetch_zip_files</code> from <code>activation.py</code> which outputs the url locations as a list of strings. Each of these zip file urls allow us to retrieve the vector shapefiles for different areas of interest within a single activation code.

emsr_code = "EMSR501"
zip_files_activation_url_list = activations.fetch_zip_file_urls(emsr_code)
zip_files_activation_url_list

Notice that this particular EMSR flood code has several area of interest (AOI) files. These files are present in the ml4cc_data_lake Google Cloud Storage bucket repository in their unzipped raw format as well as their zipped format. They can be found in the following locations:

  • zipped files: gs://ml4cc_data_lake/0_DEV/0_Raw/WorldFloods/copernicus_ems/copernicus_ems_zip
  • unzipped files: gs://ml4cc_data_lake/0_DEV/0_Raw/WorldFloods/copernicus_ems/copernicus_ems_unzip

Saving the Zip Files and Unzipping Locally

If you would like to save the zip files locally, you may do so by downloading them directly from ml4cc_data_lake or by downloading the zip files directly from the Copernicus EMS rapid activations. Since this notebook goes over data ingestion from start to finish, we explain the process for both cases. To save the files locally we do the following:

2 - Building EMSR Flood Metadata and Floodmaps

Following the scraping and downloading of Copernicus EMS Rapid Mapping Products in, we unzipped the files in our local file directory or to the Google Cloud Storage Bucket.

Once unzipped, multiple .shp files can be associated with a single EMSR Flood Activation Code. These files represent different activation layer data available as Copernicus EMS rapid mapping products. For more details with respect to each of the layers see below.

The layers of particular value to WorldFloods are the Metadata (A), Crisis (B), and Base Data (D) layers. More specifically, we are interested retrieving:

  • A: Source and The Area of Interest (AOI)
  • B: Observed Event data for Floods
  • D: Hydrography areas and lines indicative of lakes and rivers

With these data we can create a composite floodmask over an AOI. We will later obtain Sentinel-2 images associated with an AOI which we will overlay with a floodmap layer derived from A, B, and D Copernicus EMS products.

These shapefiles will be used to compose a floodmap for use in the generation of ground truth labels to be used in a supervised learning model pipeline.

2a - Process Shapefiles

In function filter_register_copernicusems we check that all the .shp follow the expected conventions with respect to timestamp and data availability.

2b - Populate Copernicus EMSR Metadata Dictionary

filter_register_copernicusems upon extracting the source, AOI, observed event, and hydrography information, will populate a dictionary with keys associated with the source data and hold the filenames and paths of specific .shp files as values.

2c - Generate Floodmap

We then process the .shp files area of interest, observed event, and the hydrography into a single geopandas.GeoDataFrame object using generate_floodmap. This information will be uploaded to the WorldFloods data bucket for new events.

import numpy as np
np.unique(floodmap.source)
import matplotlib.pyplot as plt
# initialize figure
fig, ax = plt.subplots(figsize=(16,16))
floodmap[floodmap["source"] == "flood"].plot(ax=ax, facecolor="None", edgecolor="blue", label="Flood Maps", linewidth=0.8)
ax.set(
    title="Flood"
)
plt.show()
image
import matplotlib.pyplot as plt
# initialize figure
fig, ax = plt.subplots(figsize=(16,16))
floodmap[floodmap["source"] == "hydro_l"].plot(ax=ax, facecolor="None", edgecolor="blue", label="Flood Maps", linewidth=0.8)
ax.set(
    title="Hydrography"
)
plt.show()
image
import matplotlib.pyplot as plt
# initialize figure
fig, ax = plt.subplots(figsize=(16,16))
floodmap[floodmap["source"] != "hydro"].plot(ax=ax, facecolor="None", edgecolor="blue", label="Flood Maps", linewidth=0.8)
ax.set(
    title="Floodmap"
)
plt.show()
image
metadata_floodmap["area_of_interest_polygon"]
image

3 - Sentinel-2 and JRC Permanent Water Imagery Using Google Earth Engine

Sentinel-2 Image Retrieval Using Google Earth Engine

To retrieve Sentinel-2 images from Google Earth Engine, make sure you create an account and authenticate prior to running the cell below. The link to sign up is here.

To download Sentinel-2 images we will be using the module ee_download and interact with the map using geemap.eefolium.

We use the AOI polygons retrieved from CopernicusEMS to select the location and time that we are interested in and use Google Earth Engine to do the georeferencing. Notice that we may also render the hydrography and observed flood event.

from datetime import timedelta
from datetime import datetime
import geopandas as gpd
import pandas as pd
import ee
import geemap.eefolium as geemap
import folium
from src.data import ee_download, create_gt
Found 8 S2 images between 2021-02-12T16:32:21 and 2021-03-04T16:32:21
file_path = "gs://ml4cc_data_lake/0_DEV/1_Staging/WorldFloods/S2/EMSR501/AOI01/EMSR501_AOI01_DEL_MONIT01_r1_v1.tif"
import numpy as np
import rasterio
from rasterio import plot as rasterioplt
with rasterio.open(file_path) as src:
    # read image
    image = src.read()
    # convert to RBG Image
    rgb = np.clip(image[(3,2,1),...]/3000.,0,1)
    # plot image
    rasterioplt.show(rgb, transform=src.transform)
image
Logo

About ESA EO

About CIN

About Pi School

ESA Φ-lab Website

ESA Φ-lab Linkedin community

Copyright 2025 @ European Space Agency. All rights reserved.

LinkedInXGitHubInstagramFacebookYouTube
from tqdm import tqdm
folder_out = f"{root}/datasets/Copernicus_EMS_raw/{emsr_code}"
utils.create_folder(folder_out)

unzip_files_activation = []
for zip_file in tqdm(zip_files_activation_url_list):
    local_zip_file = activations.download_vector_cems(zip_file,
                                                      folder_out=folder_out)
    unzipped_file = activations.unzip_copernicus_ems(local_zip_file,
                                                     folder_out=folder_out)
    unzip_files_activation.append(unzipped_file)
# folder_out=f"Copernicus_EMS_raw/{activation}"
folder_out = f"{root}/datasets/Copernicus_EMS_raw/{emsr_code}"

code_date = table_activations_ems.loc[emsr_code]["CodeDate"]

registers = []
for unzip_folder in unzip_files_activation:
    metadata_floodmap = activations.filter_register_copernicusems(unzip_folder, code_date)
    if metadata_floodmap is not None:
        floodmap = activations.generate_floodmap(metadata_floodmap, folder_files=folder_out)
        registers.append({"metadata_floodmap": metadata_floodmap, "floodmap": floodmap})
        print(f"File {unzip_folder} processed correctly")
    else:
        print(f"File {unzip_folder} does not follow the expected format. It won't be processed")

metadata_floodmap 

{'event id': 'EMSR501_AOI01_DEL_PRODUCT_r1_VECTORS_v1_vector',
 'layer name': 'EMSR501_AOI01_DEL_PRODUCT_observedEventA_r1_v1',
 'event type': 'Flash flood',
 'satellite date': Timestamp('2021-02-12 16:32:21+0000', tz='UTC'),
 'country': 'NaN',
 'satellite': 'Sentinel-1',
 'bounding box': {'west': 19.238301964000073,
  'east': 19.710555657000043,
  'north': 42.095451798000056,
  'south': 41.873487114000056},
 'reference system': {'code space': 'epsg', 'code': '4326'},
 'abstract': 'NaN',
 'purpose': 'NaN',
 'source': 'CopernicusEMS',
 'area_of_interest_polygon': <shapely.geometry.polygon.Polygon at 0x7fe63eb86fd0>,
 'observed_event_file': 'EMSR501_AOI01_DEL_PRODUCT_observedEventA_r1_v1.shp',
 'area_of_interest_file': 'EMSR501_AOI01_DEL_PRODUCT_areaOfInterestA_r1_v1.shp',
 'ems_code': 'EMSR501',
 'satellite_pre_event': 'Open Street Map',
 'timestamp_pre_event': Timestamp('2021-02-12 00:00:00+0000', tz='UTC'),
 'hydrology_file': 'EMSR501_AOI01_DEL_PRODUCT_hydrographyA_r1_v1.shp',
 'hydrology_file_l': 'EMSR501_AOI01_DEL_PRODUCT_hydrographyL_r1_v1.shp'}
ee.Initialize()

bounds_pol = activations.generate_polygon(metadata_floodmap["area_of_interest_polygon"].bounds)
pol_2_clip = ee.Geometry.Polygon(bounds_pol)


x, y = metadata_floodmap["area_of_interest_polygon"].exterior.coords.xy
pol_list = list(zip(x,y))
pol = ee.Geometry.Polygon(pol_list)

date_event = datetime.utcfromtimestamp(metadata_floodmap["satellite date"].timestamp())

date_end_search = date_event + timedelta(days=20)

img_col = ee_download.get_s2_collection(date_event, date_end_search, pol)
permanent_water_img= ee_download.download_permanent_water(date_event, pol_2_clip)


n_images_col = img_col.size().getInfo()
print(f"Found {n_images_col} S2 images between {date_event.isoformat()} and {date_end_search.isoformat()}")
Map = geemap.Map()

imgs_list = img_col.toList(n_images_col, 0)
for i in range(n_images_col):
    img_show = ee.Image(imgs_list.get(i))
    Map.addLayer(img_show.clip(pol_2_clip),
                 {"min":0, "max":3000, "bands":["B4","B3","B2"]},f"S2 {i}", True)
    Map.addLayer(permanent_water_img, name="permanent water")

geojson_geojson = "geojson_show.geojson"
geojson_filepath = f"{root}/datasets/Copernicus_EMS_raw/{geojson_geojson}"
floodmap.to_file(geojson_filepath, driver="GeoJSON")
Map.add_geojson(geojson_filepath, name="FloodMap")

Map.centerObject(pol)
folium.LayerControl(collapsed=False).add_to(Map)
Map