Final Report of the IntelliAQ Project
The IntelliAQ grant project was active from October 2018 to September 2023 and aimed to explore novel deep learning methods for the analysis of air quality data. An important part of the project was the development of a suitable data infrastructure to make global air quality data more easily accessible for machine learning tasks. For this, we teamed up with the international Tropospheric Ozone Assessment Report (TOAR) initiative. Five successful years ended with the following results:
1. Data Infrastructure Developments:
Core Trust Seal Certification:
The TOAR database has achieved Core Trust Seal certification, signifying its trustworthiness and commitment to open and FAIR (Findable, Accessible, Interoperable, and Reusable) data principles.
Recognition and Awards:
The TOAR data infrastructure won second prize in the German “Open Data Impact Award” competition and was shortlisted for the finals of the “Falling Walls” competition in 2022.
Extended Global Air Quality Data Repository:
The TOAR database was enhanced by including a large set of meteorological variables from the ERA5 reanalysis of ECMWF, thus adding the capability for meaningful attribution of air quality trends.
Upcoming TOAR Assessment Report:
The TOAR database has been complemented with web-based analysis services to support the upcoming second TOAR assessment report (planned for 2025).
2. Machine Learning Developments:
Spatial Mapping and Spatiotemporal Interpolation:
- Betancourt et al., 2023 present a novel method using a combination of random forest and a graph network for filling temporal gaps in time series and imputing data at missing locations.
- Betancourt et al., 2022 develop a global, high-resolution map of tropospheric ozone using explainable machine learning and assess the impact of uncertainties.
- Stadtler et al., 2022 show that explainable machine learning reveals capabilities, redundancy, and limitations of a geospatial air quality benchmark dataset.
- Betancourt et al., 2021 provide the AQ-Bench dataset, thus contributing a first formal benchmark for machine learning on air quality data.
Time Series Forecasting:
- Leufen et al., 2021 present MLAir, a comprehensive tool for fast and flexible machine learning on air data time series including domain-specific validation methods. MLAir forms the basis of all other works listed here.
- Leufen et al., 2023 develop a deep learning model for ozone point forecasts up to four days into the future, which demonstrates smaller errors and faster execution compared to conventional air quality ensemble models.
- Kleinert et al., 2022 investigate the impact of representing chemical history in ozone time-series predictions.
- Leufen et al., 2022 explore how decomposing temporal patterns can help neural networks learn to predict the ground-level daily maximum 8-hour average ozone.
- Falco Weichselbaum, Master thesis, University of Bonn, 2022, Deep Neural Network techniques for Weather Forecasting
- Kleinert et al., 2021 present IntelliO3-ts v1.0, a first deep neural network to predict near-surface ozone concentrations at hundreds of stations in Germany.
- Vincent Gramlich, Master thesis, University of Cologne, 2021, Deep learning methods for forecasting of extreme ambient ozone values
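Several of the forecasting studies above target the daily maximum 8-hour average (MDA8) ozone. As a minimal illustration of that metric, not taken from MLAir or from any of the cited papers, the following Python sketch computes a simplified MDA8 for a single day of hourly values. The function name `mda8`, the restriction to windows lying entirely within the day, and the `min_valid` threshold of 6 valid hours per window are assumptions made for illustration; operational definitions differ in details such as window alignment and data-coverage rules.

```python
from typing import Optional, Sequence


def mda8(hourly: Sequence[Optional[float]], min_valid: int = 6) -> Optional[float]:
    """Simplified daily maximum 8-hour average for one day of hourly values.

    Assumptions (illustrative only): exactly 24 values, missing hours are
    given as None, and only 8-hour windows fully inside the day are used.
    """
    if len(hourly) != 24:
        raise ValueError("expected 24 hourly values")

    best: Optional[float] = None
    # 17 candidate windows, starting at hours 0..16
    for start in range(24 - 8 + 1):
        window = [v for v in hourly[start:start + 8] if v is not None]
        # Skip windows with too few valid measurements
        if len(window) < min_valid:
            continue
        mean = sum(window) / len(window)
        if best is None or mean > best:
            best = mean
    # None means that no window had enough valid data
    return best


# Hypothetical example day: low at night, high during the day
example_day = [10.0] * 8 + [50.0] * 8 + [20.0] * 8
print(mda8(example_day))  # 50.0
```

The function returns `None` when no window has enough valid data, so days with large gaps can be excluded rather than silently averaged.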
Video Prediction Methods + AtmoRep:
- Ji et al., 2022 document CLGAN: a generative adversarial network (GAN)-based video prediction model for precipitation nowcasting.
- Gong et al., 2022 explore temperature forecasting by deep learning methods from video prediction with the SAVP model showing best results.
- Schultz et al., 2021 ask the big question “Can deep learning beat numerical weather prediction?” and conclude that this may become possible. Only two years later, three AI-based weather forecasting models show better skill than the world’s best operational forecasting systems.
- Hußmann, S., Master thesis, Humboldt University Berlin, 2019, Deep Learning for Future Frame Prediction of Weather Maps
3. Software Developments:
- MLAir, or Machine Learning on Air data, is a software environment designed to simplify and accelerate the exploration of new machine learning (ML) models, particularly shallow and deep neural networks. It focuses on the analysis and forecasting of meteorological and air quality time series and is aimed at scientists with a background in either meteorology or ML, reflecting the growing interest in neural networks and ML methods within weather and air quality research. Rather than being built around abstract workflows, MLAir is developed in tandem with real scientific questions, which makes it more accessible to researchers. Despite numerous internet resources on how to build ML models, newcomers often struggle to adapt these concepts to their own research. MLAir lowers these hurdles, making it easier for scientists from both domains to get started quickly with ML applications. MLAir can be deployed on various computing environments, from desktop workstations to high-end supercomputers, with or without GPUs.
- A number of deep learning models from video prediction were applied to weather forecasting. Novel deep learning architectures, including Generative Adversarial Networks (Wasserstein GAN) and U-Nets, were tested for temperature and precipitation forecasts. IntelliAQ collaborated with the EuroHPC project MAELSTROM to advance the use of deep learning in meteorological applications.
- The knowledge gained during IntelliAQ enabled us to team up with strong partners to develop AtmoRep, the world’s first foundational model of the atmosphere (Lessig et al., 2023).
4. Research Output:
a) Workshops and Collaboration:
- IntelliAQ sponsored and co-organized several major international workshops:
  - A dedicated IntelliAQ workshop to discuss the potential of deep learning for air quality research and applications
  - A workshop of the TOAR initiative to obtain feedback on and improve the TOAR data infrastructure and plan the second TOAR assessment
  - A workshop on “Transformers for the environmental sciences” (Magdeburg, September 2022)
  - The first workshop on “Large-scale deep learning for the Earth system” (Bonn, September 2023)
b) International Presence:
- IntelliAQ achievements were presented at several international conferences, showcasing the global impact of the project.
c) Publication Output:
- During the reporting period, the project produced a total of 50 publications, a substantial contribution to the scientific community.
The IntelliAQ project pushed the frontiers of deep learning applications for air quality and weather and contributed to the recognition of the revolutionary potential of machine learning for research and operations in these fields. This was possible because the ERC grant enabled the formation of a strong team of researchers with expertise in both deep learning and atmospheric sciences.