📆 Project Period | December 2023 - December 2025 |
👤 CIN Visiting Researcher | Patrick Ebel |
Project Summary
- An initial activity provided a novel framework for forecasting storm surges by using neural networks to combine sparse in situ tide gauge data with coarse ocean and atmospheric reanalysis data.
- A related project introduces the Global Flood Forecasting (GFF) dataset, a new resource for training and benchmarking machine learning models to predict flood inundation. The key outcome is a publicly available, comprehensive platform enabling the development and evaluation of flood forecasting solutions, with a special focus on (near-)coastal regions. This work aims to bridge the gap between weather prediction and post-event flood mapping by enabling direct, ahead-of-time forecasting of flood extent.
- A final project proposes a method for improving the ERA5 precipitation product by using graph-based neural networks. The outcome is a global, post-processed precipitation dataset with improved accuracy, which is crucial for various climate and hydrological studies.
Development Tools
- The research utilised a variety of computational tools. Spatio-temporal data were managed as NetCDF files with the xarray and dask Python libraries for efficient processing. The deep learning models, including LSTM, ConvLSTM, and transformer-based architectures such as MaxViT U-Net and FiLM U-TAE, were trained using the Adam optimiser.
- For dataset and benchmark curation on global flood forecasting, a variety of tools and datasets were used. Data processing and management were handled using rasterised TIFF files. The deep learning models included LSTMs, 3DConv U-Nets, and vision transformers. The project also utilised several existing datasets and tools for data acquisition and preprocessing, including ESA's SNAP toolbox, NASA's HydroSAR package, and various Copernicus and ERA5 datasets.
- The precipitation research utilises ERA5 and MSWEP as the base data. The core of the development is a graph-based neural network model, which is used for the post-processing. The implementation is based on PyTorch.
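The xarray-based handling of spatio-temporal NetCDF data mentioned in the first bullet can be sketched as follows. This is a minimal illustration only: the synthetic field, the variable name `msl`, and the coordinate layout are assumptions for demonstration, not the project's actual data or code.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for an hourly ERA5-style field (names are illustrative).
time = pd.date_range("2020-01-01", periods=48, freq="h")
lat = np.linspace(-10, 10, 5)
lon = np.linspace(0, 20, 5)
msl = xr.DataArray(
    np.random.default_rng(0).normal(101325, 500, (48, 5, 5)),
    coords={"time": time, "lat": lat, "lon": lon},
    name="msl",
)
ds = xr.Dataset({"msl": msl})

# Typical operations on such data: label-based spatial subsetting
# and temporal aggregation to a coarser resolution.
patch = ds.sel(lat=slice(-5, 5), lon=slice(5, 15))
daily_mean = patch["msl"].resample(time="1D").mean()
print(daily_mean.shape)  # (2, 3, 3): 2 days, 3x3 spatial patch
```

In practice such datasets are opened lazily (e.g. with dask-backed chunks via `xr.open_mfdataset`) so that multi-decade archives never need to fit in memory at once.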
Development Outputs
- Code: https://github.com/PatrickESA/StormSurgeCastNet, https://github.com/PatrickESA/GFF
- Data: https://zenodo.org/records/12067776, https://zenodo.org/records/14184289
Project Description

Map of in situ data points and exemplary cyclone best-track estimates in 2024-19. The global distribution shows a clear geospatial bias in favor of well-monitored countries over developing ones.
The first study focuses on the short-to-medium range forecasting of storm surges and extreme sea levels. This is an important issue as coastal flooding and storm surges are among the most serious natural hazards affecting lives and infrastructure, and their risk is growing under climate change (sea-level rise, more intense storms). While operational storm surge forecasting solutions are available, their application is limited for ungauged sites where no tide‐gauge measurements exist, and achieving global, dense spatial coverage is hard with conventional numerical modelling or assimilation systems. The project addresses this gap by exploring how deep neural networks can implicitly assimilate sparse in situ tide-gauge data together with coarse atmospheric and ocean state reanalysis to produce densified forecasts of storm surge globally, including at previously unseen locations.
To do this the work introduces a novel global dataset spanning multiple decades at hourly resolution. This dataset pairs: (a) in situ tide-gauge measurements drawn from the GESLA-3 collection (3,553 locations globally) after preprocessing (detrending, harmonic decomposition, de-noising); (b) atmospheric reanalysis from ERA5 (mean sea level pressure, 10 m winds); and (c) outputs from a coarse global tide-and-surge ocean model, the Global Tide and Surge Model (GTSM), forced by ERA5 meteorology and obtained via the Copernicus Climate Data Store. The methodology formulates a spatio-temporal deep network: for each sample, the proposed approach takes as input a time series of multiple channels (in situ gauge values where available, plus reanalysis atmosphere and ocean state) across a local spatial patch, and forecasts, for a given lead time, a dense map of the surge residual over that patch together with a coarse ocean-state output. A key novelty is the “densification” step: even though only sparse gauges exist, the network uses convolutional architectures and 1×1 convolution “broadcasting” to fill in predictions at unobserved pixels (ungauged locations), encouraged by auxiliary supervision on the coarse model output and by dropout of some in situ gauge inputs to force the network to learn extrapolation. The work compares several architectures (LSTM, ConvLSTM, the vision transformer backbone MaxViT U-Net, and a temporal attention network, U-TAE, conditioned on lead time via FiLM, i.e., feature-wise linear modulation) and benchmarks them against baselines (seasonal averages, linear extrapolation, GTSM numerical model output). The experimental setup uses both a “hyperlocal” evaluation (forecasting at previously unseen gauges whose measurements are still provided as input) and a “densification” evaluation (forecasting at gauges not included in the input at all, i.e., ungauged locations).
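The densification idea, sparse gauge inputs with random dropout feeding 1×1 convolutions that predict at every pixel, can be sketched in a few lines of PyTorch. This is a hedged toy version, not the paper's implementation: the layer sizes, channel counts, and dropout rate are assumptions.

```python
import torch
import torch.nn as nn

class DensifyHead(nn.Module):
    """Toy densification head: map sparse-gauge + coarse-state channels
    to a dense surge map via per-pixel (1x1) convolutions."""
    def __init__(self, in_ch=4, hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, hidden, kernel_size=1),
            nn.ReLU(),
            nn.Conv2d(hidden, 1, kernel_size=1),
        )

    def forward(self, x):
        return self.net(x)

def drop_gauges(gauge, mask, p=0.5, training=True):
    """Randomly hide a fraction of the in situ gauges during training,
    forcing the network to extrapolate to ungauged pixels."""
    if not training:
        return gauge, mask
    keep = (torch.rand_like(mask) > p).float() * mask
    return gauge * keep, keep

# Toy batch: one gauge channel + 3 reanalysis channels on an 8x8 patch.
gauge = torch.zeros(2, 1, 8, 8)
mask = torch.zeros(2, 1, 8, 8)
mask[:, :, 2, 3] = 1.0   # a single gauged pixel per sample
gauge[:, :, 2, 3] = 0.4  # its surge residual, in metres
state = torch.randn(2, 3, 8, 8)

g, m = drop_gauges(gauge, mask, p=0.5)
dense = DensifyHead()(torch.cat([g, state], dim=1))
print(dense.shape)  # torch.Size([2, 1, 8, 8]): a prediction at every pixel
```

Because the 1×1 convolutions share weights across all pixels, supervision at the few gauged locations (plus the auxiliary coarse-model target) still constrains the output everywhere.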
The key outcomes are that the best network (FiLM U-TAE) achieves around 0.158 m MAE (mean absolute error) in the hyperlocal setting (i.e., ~16 cm) and around 0.190 m MAE in the densification (ungauged) setting, outperforming the GTSM baseline (which exhibited much larger error and variance). The ablation studies reveal that including both the coarse ocean model (GTSM) input and the ERA5 atmospheric input improves performance; that including the dropout of in situ gauges (to force extrapolation) is important; and that longer input sequences (e.g., T = 18 or 24 h) slightly help, but with diminishing returns. The spatial error analysis shows that tropical regions, e.g., the Caribbean, Gulf of Mexico, Indian Ocean, are more challenging (higher errors), reflecting more extreme surge dynamics and under-representation in training data.
The take-home message is that it is feasible to use neural networks to implicitly assimilate sparse in situ gauge data together with coarse reanalysis to produce dense, global storm surge forecasts — even at previously ungauged coastal locations. This suggests a path forward for operational surge forecasting that is more globally inclusive (benefiting under-served coastal communities) than traditional methods reliant on dense instrumentation or local numerical modelling. The dataset and code are made publicly available, which paves the way for further research. Future work may incorporate satellite altimetry, move from retrospective reanalysis to real-time forecasting, and transition from surge predictions to flood inundation maps.
Schematic illustration of the two-stage global-local processing pipeline for flood inundation forecasting. Temporal information is provided at a coarse scale and serves as forcing for the local modelling.
The second study is relevant because floods remain among the most common and damaging natural hazards worldwide, especially in coastal and near-coastal regions where sea-level rise, more extreme storms, and rapid urbanisation increase risk and vulnerability. While there has been major progress in weather forecasting and satellite-based flood mapping, these capabilities are seldom tied together in a way that enables forecasting flood inundation extent ahead of time. Moreover, existing datasets and benchmarks are largely lacking for this core forecasting task, particularly at global scale and in the (near-)coastal context. The absence of such data and standardised evaluation hampers the development of machine-learning models for flood-extent forecasting, and this study aims to fill that gap by offering a new dataset and evaluation framework.
In terms of data and methodology, the core contribution is the GFF dataset: a multimodal, multitemporal global dataset focused on (near-)coastal flood events, covering 298 regions of interest (ROIs) around the world (2014-2020) that span six continents and 13 climate zones. Each ROI includes inputs such as atmospheric reanalysis from ERA5 and ERA5-Land, basin attributes from HydroATLAS, topography (via CopDEM30) and drainage-related maps (Height Above Nearest Drainage, HAND), synthetic-aperture radar (SAR) images from Sentinel-1 (pre-flood), and derived flood-extent labels at event time. The labels are created by ensembling state-of-the-art rapid-mapping models on SAR imagery, with careful post-processing to remove artefacts and ensure quality. The dataset defines two benchmark tracks: (1) general global flood-extent forecasting, and (2) a coastal versus near-coastal/inland separation to highlight distinct dynamics. The evaluation uses a 5-fold cross-validation split in which level-4 basins partition the ROIs so that test regions are hydrologically distinct from training ones, thereby stressing generalisation. For benchmarking, a two-stage baseline architecture is proposed: a ‘context network’ processes coarse spatio-temporal inputs, and its embedding is fed into a ‘local network’ along with high-resolution local data to output a binary flood segmentation map conditioned on lead time (in days) via FiLM (feature-wise linear modulation). Several backbone architectures (LSTM, 3DConv-UNet, MaxViT-UNet, temporal attention U-TAE) and logistic regression as a simple baseline are tested, with performance evaluated using the F1 score for flooded versus background pixels.
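FiLM conditioning, used here (and in the storm surge work) to make predictions depend on the lead time, amounts to a small network mapping the conditioning value to a per-channel scale and shift. The sketch below is a generic FiLM layer under assumed layer sizes, not the benchmark's exact code.

```python
import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Feature-wise linear modulation: an MLP maps a conditioning value
    (e.g. lead time in days) to per-channel scale (gamma) and shift (beta),
    applied to a convolutional feature map."""
    def __init__(self, n_channels, cond_dim=1, hidden=32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(cond_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2 * n_channels),
        )

    def forward(self, feats, cond):
        # feats: (B, C, H, W); cond: (B, cond_dim)
        gamma, beta = self.mlp(cond).chunk(2, dim=-1)
        return gamma[:, :, None, None] * feats + beta[:, :, None, None]

feats = torch.randn(4, 8, 16, 16)                 # backbone features
lead_time = torch.tensor([[1.0], [2.0], [3.0], [4.0]])  # days ahead
out = FiLM(n_channels=8)(feats, lead_time)
print(out.shape)  # torch.Size([4, 8, 16, 16])
```

A single trained model can then produce forecasts for any requested lead time, rather than one model per horizon.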
The key outcomes of the study show that on the global track, the best-performing model (U-TAE) achieved an average F1 ≈ 0.77 (±0.04) overall, with strong performance on the background class (~0.97) but more modest performance on the flooded class (~0.57), illustrating the challenge of predicting the minority flooded pixels. The benchmark also showed that performance is higher in coastal regions (F1 ≈ 0.80) than in near-coastal/inland regions (F1 ≈ 0.76). Logistic regression performed substantially worse (≈ 0.66), underlining the value of spatio-temporal neural architectures. These results underscore that while forecasting flood-extent maps ahead of time is feasible, there remains significant room for improvement, especially for flooded-pixel detection and for non-coastal or less-monitored regions.
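The per-class F1 scores quoted above are standard binary segmentation metrics; a minimal reference computation (with made-up toy label maps, not the benchmark's data) looks like this:

```python
import numpy as np

def f1_per_class(pred, target, cls):
    """Binary F1 for one class from predicted and reference label maps."""
    tp = np.sum((pred == cls) & (target == cls))
    fp = np.sum((pred == cls) & (target != cls))
    fn = np.sum((pred != cls) & (target == cls))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy maps: 1 = flooded, 0 = background.
target = np.array([[0, 0, 1, 1],
                   [0, 1, 1, 0]])
pred   = np.array([[0, 1, 1, 1],
                   [0, 1, 0, 0]])
print(f1_per_class(pred, target, 1))  # 0.75 (tp=3, fp=1, fn=1)
```

Averaging the flooded-class and background-class F1 gives the kind of overall score reported for the benchmark; because flooded pixels are rare, the flooded-class F1 dominates the difficulty.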
The main contributions of the paper are the first large-scale dataset and evaluation benchmark explicitly aimed at ahead-of-time flood-extent forecasting (not just flood mapping) at global scale and with a coastal focus. By making data, code, and models publicly available under a CC0 licence, the authors hope to accelerate research in this important area. Models must handle multimodal data (weather, terrain, drainage, satellite) and multiscale resolution differences, and must generalise across very diverse geographies. It should be noted that the dataset uses reanalysis rather than real-time forecast forcings (which may overestimate performance when deployed operationally), and that the flood labels, while high quality, are derived via automated mapping rather than full manual annotation, so both model development and operational translation will require further work. Overall, the study lays a foundation for future work in machine-learning-based flood forecasting, and highlights that bridging the gap between weather/meteorology and actionable inundation-map prediction is both timely and essential for disaster-risk reduction.
Schematic illustration of the graph neural network architecture for precipitation post-processing. Component-wise sufficient statistics are learned via a likelihood loss, and the component weights via a separate loss.
The third activity post-processes global precipitation fields for bias-correction purposes. Total precipitation is a key variable of the weather state, accumulated over a given period. Beyond its direct relevance, high-quality precipitation data are important for driving downstream applications in hydrology, e.g. river streamflow and runoff forecasting. However, common measurements of precipitation are either precise but sparse (as for in situ recordings) or global but uncertain (as for spaceborne observations). Though reanalysis products such as ECMWF’s ERA5 provide a best estimate of the state of the atmosphere, the quality of their total precipitation reconstruction is imperfect. Following reports that ERA5 is prone to overestimating the occurrence of drizzle at the cost of underestimating extreme precipitation, prior work explored data-driven models for local post-processing to address the latter. However, the local models employed in preceding work do not easily extend to a global post-processing setup, and their exclusive emphasis on outliers prevents them from representing the full distribution of precipitation intensity, which restricts their practical relevance.
This work proposes a novel approach to precipitation post-processing that models the entire globe in a single forward pass and represents dryness, light rain, and heavy rain alike. The post-processor is based on a graph neural network architecture, trained on decades of gauge-calibrated, multi-source weighted estimates of precipitation. The study demonstrates that the model learns to bias-correct ERA5 total precipitation and consistently improves upon the baseline while maintaining global applicability.
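To make the graph-based idea concrete, here is a deliberately tiny numpy sketch of one message-passing step: each grid node aggregates its neighbours' features and combines them with its own through learned linear maps. The ring graph, feature dimensions, and random weights are all synthetic assumptions; the actual post-processor is a trained PyTorch GNN over a global grid graph.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "globe": 6 grid nodes connected in a ring.
n_nodes, n_feat = 6, 4
adj = np.zeros((n_nodes, n_nodes))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]:
    adj[i, j] = adj[j, i] = 1.0
deg = adj.sum(axis=1, keepdims=True)

x = rng.normal(size=(n_nodes, n_feat))   # node features (e.g. ERA5 inputs)
w_self = rng.normal(size=(n_feat, n_feat))
w_neigh = rng.normal(size=(n_feat, n_feat))

# One message-passing layer:
#   h = ReLU(x @ W_self + mean_of_neighbours(x) @ W_neigh)
neigh_mean = (adj @ x) / deg
h = np.maximum(x @ w_self + neigh_mean @ w_neigh, 0.0)
print(h.shape)  # (6, 4): updated features for every node at once
```

Stacking such layers lets information propagate across the whole grid, which is what allows a single forward pass to post-process the entire globe rather than fitting one local model per location.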
Publications
- P. Ebel, B. Victor, P. Naylor, G. Meoni, F. Serva, and R. Schneider, "Implicit Assimilation of Sparse In Situ Data for Dense & Global Storm Surge Forecasting," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recog., 2024, pp. 471-480.
- B. Victor et al., "Off to new Shores: A Dataset & Benchmark for (near-) coastal Flood Inundation Forecasting," Adv. Neural Inf. Process. Syst., vol. 37, pp. 114797-114811, 2024.
- P. Ebel, L. Magnusson, and R. Schneider, "Global post-processing of ERA5 precipitation product via graph-based neural networks," presented at the EGU General Assembly 2025, Vienna, Austria, Apr. 27-May 2, 2025, Art. no. EGU25-6965. DOI: 10.5194/egusphere-egu25-6965.