Motivation
Artificial Intelligence (AI) is experiencing a second wave of enthusiasm, driven by ground-breaking results on cognitive problems such as image and speech recognition, automated language translation, robotics, and strategic games. These results became possible through recent advances in massive data processing capabilities (“Big Data”) and the development of deep learning (DL) architectures, which “learn” millions of parameters. Both, in turn, build on progress in computer hardware, namely storage capacity and bandwidth, as well as massively parallel computing on specialized hardware such as graphics processing units (GPUs).
The fundamental concept of machine learning (ML) dates back to the 1940s, and the first algorithms and applications were built in the 1980s. Environmental scientists experimented with machine learning approaches early on and have made good use of classical ML concepts such as decision trees and random forests. Several studies tried to use neural networks, but their results were often not very convincing: small improvements (if any) over plain statistical approaches, together with a loss of explainability, prevented the widespread adoption of neural networks for environmental data analysis.
Air quality is determined by several factors, including air pollutant emissions, chemical transformations, transport processes, and weather. To analyse and understand air quality data and to assess changes in air pollution levels, all of these factors must be taken into account. Air pollutant concentrations exhibit complex, time-dependent spatial patterns. It can therefore be expected that complex DL architectures and comprehensive datasets are needed to use AI for the analysis of air quality and to build air quality predictions based on modern machine learning.
The IntelliAQ project aims to exploit new DL methods for the analysis of environmental data, specifically global air quality measurements and analyses. It builds on the large data collection effort at the Jülich Supercomputing Centre during the first phase of the Tropospheric Ozone Assessment Report (TOAR) and on the substantial DL expertise and resources built up at Forschungszentrum Jülich.
Objectives
The project applies Deep Learning for Interpolation, Prediction, and Quality Control of Air Quality Data. We aim to:
- fill observation gaps in space and time,
- provide short-term forecasts of air quality, and
- assess the quality of air pollutant information from diverse measurements.
IntelliAQ will thereby raise the analysis of global air pollutant observations to a new level and provide a basis for the future development of innovative air quality services with robust scientific underpinning. The project aims to improve global knowledge of air pollution by further extending the TOAR data collection efforts and by applying modern artificial intelligence tools to the analysis of air quality and related weather and geospatial data.
The specific project objectives are:
- to develop novel spatial and temporal interpolation methods using deep neural networks in order to expand the coverage of historic and recent data while preserving fine-scale structures down to the street level,
- to develop an innovative air quality forecasting concept based on deep learning, and
- to explore the use of deep neural networks to assess the quality of air pollution data and establish new, robust techniques for automated outlier detection and data screening.
In addition to the main objectives, the project aims:
- to expand the collection of surface ozone measurements, and
- to contribute to the development of new deep learning techniques by applying current methods and neural network architectures and analysing their strengths and weaknesses.
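As a point of reference for the interpolation and data-screening objectives above, the following minimal Python sketch shows a classical, non-DL baseline of the kind that deep learning methods would typically be compared against: linear interpolation across gaps in a single time series, and outlier flagging with a robust z-score based on the median absolute deviation (MAD). This is not the IntelliAQ method. The function names, the sample values, and the threshold of 3.5 are illustrative assumptions.

```python
from statistics import median


def fill_gaps_linear(values):
    """Fill interior gaps (None) by linear interpolation between the
    nearest valid neighbours. Leading and trailing gaps stay None,
    because this sketch does not extrapolate."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        # Constant increment per time step between two valid samples.
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


def flag_outliers_mad(values, threshold=3.5):
    """Return one boolean per sample: True where the robust z-score
    0.6745 * |x - median| / MAD exceeds the (assumed) threshold.
    Missing samples (None) are never flagged."""
    valid = [v for v in values if v is not None]
    if not valid:
        return [False] * len(values)
    center = median(valid)
    mad = median([abs(v - center) for v in valid])
    if mad == 0:
        # No spread in the data: the score is undefined, flag nothing.
        return [False] * len(values)
    return [
        v is not None and 0.6745 * abs(v - center) / mad > threshold
        for v in values
    ]


# Hypothetical hourly ozone values with gaps and one suspicious spike.
series = [30.0, None, None, 36.0, 34.0, 250.0, 33.0, None]
flags = flag_outliers_mad(series)
filled = fill_gaps_linear(series)
print(flags)
print(filled)
```

In practice, flagged values would be screened out before any gap filling, since an undetected spike would otherwise propagate into the interpolated samples next to it. A baseline like this uses only one station's own history; the objectives above target methods that also draw on spatial, meteorological, and geospatial information.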
Data
The foundation of this project is the world’s largest collection of surface air quality measurements, which was recently assembled by the IntelliAQ principal investigator, Dr. Martin Schultz of Forschungszentrum Jülich. This dataset played a pivotal role in the first comprehensive Tropospheric Ozone Assessment Report (TOAR). For the first time, more than sixty air quality agencies, research organisations, and individuals contributed multi-year time series of ground-level ozone measurements from over 10,000 locations to one central data hub. This allowed TOAR to perform consistent statistical analyses on all of these data and to evaluate over 30 different air quality metrics and their trends over time. Despite this effort, large gaps remain on the world map where no air quality observations have been made or from where they could not be integrated into the TOAR analysis.
Air quality observations are complemented with high-resolution geospatial datasets that provide information about surface orography, population density, and emission sources of air pollutants. To capture air pollutant transport patterns and meteorological factors, we also use information from numerical weather prediction models, specifically the ERA5 reanalysis of the European Centre for Medium-Range Weather Forecasts (ECMWF; see https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5) and a COSMO model reanalysis performed at the University of Bonn and the German Weather Service (DWD) (Bollmeyer et al., 2015). Furthermore, we plan to examine satellite data products that measure air pollutant column densities from space.
Reference: Bollmeyer, C. et al.: Towards a high-resolution regional reanalysis for the European CORDEX domain, Q. J. R. Meteorol. Soc. 141: 1–15, January 2015 A, DOI: 10.1002/qj.2486