Gaps in the measurement series of atmospheric pollutants can impede the reliable assessment of their impacts and trends. We propose a new method for missing data imputation of the air pollutant tropospheric ozone by using the graph machine learning algorithm “correct and smooth”. This algorithm uses auxiliary data that characterize the measurement location and, in addition, ozone observations at neighboring sites to improve the imputations of simple statistical and machine learning models. We apply our method to 2011 data from 278 stations of the German Environment Agency (Umweltbundesamt – UBA) monitoring network. The preliminary version of these data exhibits three gap patterns: shorter gaps in the range of hours, longer gaps of up to several months in length, and gaps occurring at multiple stations at once. For short gaps of up to 5 h, linear interpolation is most accurate. Longer gaps at single stations are most effectively imputed by a random forest combined with correct and smooth. For longer gaps at multiple stations, the correct and smooth algorithm improved the random forest despite a lack of data in the neighborhood of the missing values. We therefore suggest a hybrid of linear interpolation and graph machine learning for the imputation of tropospheric ozone time series.
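The first stage of the hybrid described above, linear interpolation restricted to short gaps, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `interpolate_short_gaps`, the use of `None` for missing values, and the assumption of hourly samples (so that 5 h corresponds to five consecutive missing values) are all hypothetical choices made for illustration. Longer gaps are deliberately left unfilled so that another model, such as the random forest with correct and smooth, could handle them.

```python
from typing import List, Optional


def interpolate_short_gaps(
    values: List[Optional[float]], max_gap: int = 5
) -> List[Optional[float]]:
    """Linearly interpolate runs of None that are at most max_gap long.

    Assumes evenly spaced (e.g. hourly) samples. Longer runs, and runs that
    touch the start or end of the series, are left as None.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i  # first missing index of this gap
        while i < n and filled[i] is None:
            i += 1
        end = i  # first index after the gap
        gap_len = end - start
        has_both_neighbors = start > 0 and end < n
        if has_both_neighbors and gap_len <= max_gap:
            left, right = filled[start - 1], filled[end]
            step = (right - left) / (gap_len + 1)
            for k in range(gap_len):
                filled[start + k] = left + step * (k + 1)
    return filled


# Hypothetical hourly ozone values: a 2 h gap and a 6 h gap.
series = [40.0, None, None, 46.0, None, None, None, None, None, None, 60.0]
result = interpolate_short_gaps(series, max_gap=5)
```

In this example the 2 h gap is filled by a straight line between its neighbors, while the 6 h gap exceeds `max_gap` and stays missing for a later imputation step.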
Clara Betancourt has completed her dissertation within the IntelliAQ project and successfully defended her thesis at the University of Bonn. Her thesis “Mapping and Interpolation of Tropospheric Ozone Data with Machine Learning Methods” develops spatio-temporal mapping and interpolation methods based on machine learning, using ozone data as the example application. The machine learning models are trained on a large number of ozone measurements available in the Tropospheric Ozone Assessment Report (TOAR) database.
The synthesis of this work is that an interplay of physically sound data selection, uncertainty quantification, and explainability in machine learning can produce trustworthy environmental data products. Another finding is that the accuracy of the data products in a specific region is mainly dependent on good coverage with ozone measurements in that region. Therefore, this work contributes not only to the gapless quantification of ozone concentrations but also to trustworthy machine learning in the environmental sciences.
We are proud to announce that Lukas Hubert Leufen has completed his dissertation within the IntelliAQ project and successfully defended his thesis at the University of Bonn. His thesis “Time Filter Assisted Deep Learning to Predict Air Pollution” builds on a time series filtering approach to separate long-term and short-term variations and uses several deep learning networks to accurately predict ground-level ozone air pollution. The neural networks have been trained on large amounts of data from air quality monitoring stations distributed across Central Europe, climatological statistics on air pollutants, and meteorological data from numerical weather models. The deep learning models have been integrated into a well-defined workflow for training and validation called MLAir, which ensures the reproducibility of the findings. The results substantiate that the combination of sophisticated deep learning architectures and time series filtering enables accurate ozone predictions, which are superior to state-of-the-art numerical modelling results.
We are very proud to present the following abstracts at IGARSS 2023 in Pasadena, California:
- END-TO-END PROCESS ORCHESTRATION OF EARTH OBSERVATION DATA WORKFLOWS WITH APACHE AIRFLOW ON HIGH PERFORMANCE COMPUTING
- MULTIMODAL SELF-SUPERVISED LEARNING FOR BOOSTING CROP CLASSIFICATION USING SENTINEL2 AND PLANETSCOPE
More information will follow at https://2023.ieeeigarss.org/index.php
IntelliAQ is sponsoring two workshops in Cologne in early March 2023. The “IntelliAQ workshop on machine learning for air quality” (https://indico3-jsc.fz-juelich.de/event/68/) aims to bring together researchers from the air quality and machine learning communities for discussion of recent research progress and future priorities. Machine learning (ML) is rapidly gaining momentum as a new toolbox for analysing atmospheric data. While there are now several workshops, fora and conferences to discuss ML applications in the weather and climate domain, discussions on ML applications for air quality remain fragmented. The ERC project IntelliAQ has explored several modern ML concepts for air quality research and we would like to engage in a discussion with the international community about the potential and limitations of ML in this field. The second workshop, immediately following the first one, is a “Tropospheric Ozone Assessment Report (TOAR-II) science workshop” (https://indico3-jsc.fz-juelich.de/event/69/). IntelliAQ supports the development of the TOAR data infrastructure and draws on data and scientific insights from this global initiative.
JSC, the Otto von Guericke University of Magdeburg, and the Technical University of Munich jointly organised a workshop on “Transformers for Environmental Science” on September 22 and 23, 2022. The workshop was co-sponsored by the ERC grant IntelliAQ and brought together about 40 participants in Magdeburg and up to 20 additional online participants, who discussed the potential of this new AI technology for environmental applications. The program included lectures on recent advances in transformer architectures and transfer learning as well as on prototype developments focusing largely on atmospheric research and remote sensing. Keynote presentations were given by Peter Düben (ECMWF), Pedram Hassanzadeh (Rice University, Houston), Duncan Watson-Parris (Oxford University), Lucas Beyer (Google Brain) and Jonathan Godwin (Google DeepMind). A poster session and panel discussion provided opportunities for an exchange of ideas. Large transformer models require huge amounts of data and constitute an attractive application for accelerated supercomputers such as JUWELS Booster. Within the atmorep compute time project, first steps towards training such a model for atmospheric research are being taken.
Lukas H. Leufen, Felix Kleinert (both FZ Jülich and University of Bonn) and Martin G. Schultz (FZ Jülich) have published their study “Exploring decomposition of temporal patterns to facilitate learning of neural networks for ground-level daily maximum 8-hour average ozone prediction” in the journal Environmental Data Science. The study shows how the accuracy of deep neural networks for forecasting ground-level ozone can be improved by splitting long-term and short-term weather patterns. The article is available at https://www.doi.org/10.1017/eds.2022.9 .
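The general idea of splitting a time series into long-term and short-term components can be sketched with a simple centered moving average. This is only an illustration under stated assumptions, not the filter used in the study: the function name `split_long_short`, the window length, and the choice of a moving average as the low-pass filter are hypothetical.

```python
from typing import List, Tuple


def split_long_short(
    series: List[float], window: int = 3
) -> Tuple[List[float], List[float]]:
    """Split a series into a smooth long-term part and a short-term residual.

    The long-term part is a centered moving average (an assumed stand-in for
    a low-pass filter); near the edges the window is truncated to the
    available samples. The short-term part is the original minus the
    long-term part, so the two components always sum back to the input.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(series)
    long_term = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        chunk = series[lo:hi]
        long_term.append(sum(chunk) / len(chunk))
    short_term = [x - m for x, m in zip(series, long_term)]
    return long_term, short_term


# Hypothetical input values, for illustration only.
example = [1.0, 2.0, 3.0, 4.0, 5.0]
long_part, short_part = split_long_short(example, window=3)
```

Each component could then be passed to a neural network as a separate input, which is the kind of setup the decomposition is meant to support.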
Felix Kleinert et al. submitted their article “Representing chemical history in ozone time-series predictions – a model experiment study building on the MLAir (v1.5) deep learning framework” to the journal Geoscientific Model Development. It is now available as a preprint for public discussion and review at https://gmd.copernicus.org/preprints/gmd-2022-122/ until July 6, 2022.
Scarlet Stadtler, Clara Betancourt (FZ Jülich) and Ribana Roscher (University of Bonn) published their study “Explainable Machine Learning Reveals Capabilities, Redundancy, and Limitations of a Geospatial Air Quality Benchmark Dataset” in the journal Machine Learning and Knowledge Extraction. In this study, they used explainable machine learning to gain insights into the AQ-Bench air quality dataset. The article is available at http://dx.doi.org/10.3390/make4010008.
The results achieved so far within the IntelliAQ project have been summarised in the article “Artificial intelligence for air quality”. It has been published in The Project Repository journal, ISSN 2632-4067, volume 12, January 2022, pp. 70-74.