May 27, 2026, 14:00 CEST
Live on Microsoft Teams.
On May 27 at 14:00 CEST, the ESA Φ-Lab Collaborative Innovation Network will host a new Φ-talk. Details are below.
Meet the speaker
Johannes Jakubik is a Research Scientist in the AI for Climate Impact team at IBM Research Europe. He leads projects on multimodal generative modeling, such as TerraMind, and co-leads IBM Research's activities on deep learning for planetary observations. He focuses in particular on pretraining and scaling multimodal deep learning models in collaboration with ESA, NASA, and partners within the EU Horizon program. In addition, Johannes co-leads work at the intersection of deep learning and quantum graph optimization. His work on deep learning for planetary observations and weather modeling has received the NASA Agency Group Award, the NASA Marshall Space Flight Center Honor Award, and several IBM accomplishment awards, and has been featured in national and international media. He also co-supervises several exceptionally bright Ph.D. students at ETH Zurich.
Talk abstract
Earth observation (EO) combines heterogeneous data sources spanning modalities, scales, and physical processes, posing fundamental challenges for multimodal deep learning. This talk argues that learned quantization provides a unifying representation layer for scalable and physically meaningful EO foundation models. I first revisit TerraMind [1], an any‑to‑any generative multimodal model that leverages discrete token spaces to align nine geospatial modalities and enable strong downstream performance. I then show how quantization generalizes beyond images in Quantizing Space and Time [2], where learned tokenization of weather time series and imagery supports task‑agnostic fusion and cross‑modal generation. Building on this, TerraFlow [3] introduces temporal objectives over quantized representations, enabling robust multitemporal learning from irregular EO sequences. I further present Phaedra [4], a high‑fidelity tokenizer for physical data that factorizes morphology and amplitude to preserve dynamic range and physical consistency. Finally, I outline early results connecting quantized multimodal representations with open‑vocabulary segmentation and visual prompting. Together, these works position learned quantization as a central abstraction for multimodal EO intelligence.
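For readers new to the topic, the sketch below illustrates the core mechanism behind such discrete token spaces: continuous features are snapped to the nearest entry of a learned codebook, so patches from any modality can be expressed as sequences of shared token ids. This is a generic VQ-style illustration, not the actual TerraMind or Phaedra tokenizer; the codebook size, feature dimension, and function names are assumptions, and a real codebook would be learned (e.g., with VQ-VAE-style objectives) rather than randomly initialized.

# Minimal sketch of learned quantization: a VQ-style codebook lookup.
# Illustrative only; shapes and names are assumptions, not the speaker's code.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a codebook of K embeddings of dimension D.
# In a trained tokenizer these entries are learned, not random.
K, D = 512, 64
codebook = rng.normal(size=(K, D))

def quantize(features: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Map each D-dim feature vector to its nearest codebook entry.

    Returns the discrete token ids and the quantized (snapped) vectors.
    """
    # Pairwise squared distances between features (N, D) and codes (K, D).
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    tokens = d2.argmin(axis=1)      # discrete token ids, shape (N,)
    return tokens, codebook[tokens]

# Example: patch features from any modality (optical, SAR, weather time
# series, ...) land in one shared token space, which is what makes
# any-to-any fusion and generation over discrete tokens possible.
feats = rng.normal(size=(16, D))
ids, quantized = quantize(feats)
print(ids[:8])                      # a handful of discrete token ids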
Reading list:
[1] https://arxiv.org/pdf/2504.11171
[2] https://arxiv.org/pdf/2510.23118
[3] https://arxiv.org/pdf/2603.12762
[4] https://arxiv.org/pdf/2602.03915
Register here!