Sabine Schroeder – IntelliAQ https://intelliaq.eu Air Quality forecasting with machine learning. Thu, 24 Mar 2022 08:42:27 +0000

Canonical Analysis Workflows – reproducibility and reusability on air quality data https://intelliaq.eu/2022/03/24/canonical-analysis-workflows-reproducibility-and-reusability-on-air-quality-data/ Thu, 24 Mar 2022 08:42:26 +0000

Scientific research suffers from a reproducibility crisis: a Nature survey from 2016 (https://www.nature.com/articles/d41586-019-00067-3) found that more than 70% of the researchers surveyed had tried and failed to reproduce another scientist's experiments.

Since then, considerable effort has gone into making workflows more reusable and, in turn, results more reproducible. Our TOAR-II (Tropospheric Ozone Assessment Report, phase II) database infrastructure supports this effort. We redesigned the database and its related services to implement more of the FAIR principles (https://www.go-fair.org/fair-principles/), raising the infrastructure to a new level of FAIRness. In addition, we developed new concepts to achieve reproducibility and reusability via standardized workflows and objects.

Canonical workflows consist of automated workflows or workflow fragments that can be reused in different contexts. Developing reusable workflows and software for scientific data analysis depends on reusable data, which must be appropriately described and standardized to ensure reliable and meaningful analysis results.

We therefore developed a concept that focuses on two indispensable and inseparable prerequisites for workflow sharing: data harmonization and documentation.

In our concept paper, we show that the data harmonization needed to establish online data analysis services goes much deeper than the obvious issues of common data formats, variable names, and measurement units. We also explore how the generation of FAIR Digital Objects (FDO) and Research Objects (RO), together with automatically generated documentation, may support Canonical Analysis Workflows for air quality and related data. We are convinced that our experiences with the TOAR database will show that data harmonization combined with documentation is a big step towards realizing the potential of canonical workflows.
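
To make the "obvious" layer of harmonization mentioned above concrete, here is a minimal Python sketch that maps provider-specific variable names to one canonical name and converts ozone values to a single unit. The alias table, the record layout, the target unit (ppb), and the reference conditions (20 °C, 1013.25 hPa) are all assumptions made for illustration; this is not the TOAR implementation, and the deeper harmonization issues discussed in the paper are not covered here.

```python
# Hypothetical illustration only: alias table, record layout, target unit
# and reference conditions are assumptions, not the TOAR implementation.

R = 8.314462618   # universal gas constant, J mol^-1 K^-1
M_O3 = 47.998     # molar mass of ozone, g mol^-1

# Assumed mapping from provider-specific names to one canonical name.
VARIABLE_ALIASES = {"o3": "o3", "ozone": "o3"}


def canonical_name(name: str) -> str:
    """Map a provider-specific variable name to the assumed canonical name."""
    key = name.strip().lower()
    if key not in VARIABLE_ALIASES:
        raise KeyError(f"unknown variable name: {name!r}")
    return VARIABLE_ALIASES[key]


def ugm3_to_ppb(value: float,
                temperature_k: float = 293.15,
                pressure_pa: float = 101325.0) -> float:
    """Convert an ozone mass concentration (µg m^-3) to a mixing ratio (ppb)
    using the ideal gas law at the given reference conditions."""
    molar_volume_l = R * temperature_k / pressure_pa * 1000.0  # L mol^-1
    return value * molar_volume_l / M_O3


def harmonize(record: dict) -> dict:
    """Return a copy of the record with canonical name and unit (ppb)."""
    unit = record["unit"]
    if unit == "ppb":
        value = record["value"]
    elif unit in ("ug/m3", "µg/m3"):
        value = ugm3_to_ppb(record["value"])
    else:
        raise ValueError(f"unsupported unit: {unit!r}")
    return {"variable": canonical_name(record["variable"]),
            "value": value,
            "unit": "ppb"}


raw = {"variable": "Ozone", "value": 100.0, "unit": "ug/m3"}
harmonized = harmonize(raw)
```

Keeping the reference temperature and pressure as explicit parameters is deliberate: the conversion factor depends on them, so they are exactly the kind of detail that needs to be documented alongside the data.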

Integrating FDOs/ROs into the TOAR data ingestion workflow.

For the TOAR database, an RO-Crate is used. Each new data ingestion is registered as an FDO package within the RO. As new FDOs are created over time, the RO is updated with a pointer to the latest FDO and its associated data, while earlier FDOs remain accessible. Each FDO thus acts as a kind of snapshot, which makes the states of the database traceable.
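
The snapshot idea can be sketched as a small data structure. The following Python example is a simplified model under our own assumptions: the class names, fields, and identifiers are hypothetical, and it does not use the RO-Crate format or any RO-Crate library. It only shows the behaviour described above, namely that registering a new FDO moves the "latest" pointer while earlier FDOs stay retrievable.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class FDO:
    """One data ingestion, treated as an immutable snapshot (hypothetical)."""
    identifier: str    # placeholder for a persistent identifier
    created: str       # ISO 8601 date of the ingestion
    description: str


@dataclass
class ResearchObject:
    """Simplified stand-in for the RO that collects all FDOs."""
    name: str
    fdos: List[FDO] = field(default_factory=list)
    latest: Optional[str] = None   # identifier of the most recent FDO

    def register(self, fdo: FDO) -> None:
        """Add a new FDO and move the 'latest' pointer to it."""
        if any(f.identifier == fdo.identifier for f in self.fdos):
            raise ValueError(f"FDO already registered: {fdo.identifier}")
        self.fdos.append(fdo)
        self.latest = fdo.identifier

    def get(self, identifier: str) -> FDO:
        """Retrieve any FDO, including earlier snapshots."""
        for fdo in self.fdos:
            if fdo.identifier == identifier:
                return fdo
        raise KeyError(identifier)


# Example identifiers and descriptions are invented for illustration.
ro = ResearchObject(name="example-database-ro")
ro.register(FDO("fdo-2021-11", "2021-11-15", "first example ingestion"))
ro.register(FDO("fdo-2022-03", "2022-03-01", "second example ingestion"))
```

After the second call to `register`, `ro.latest` points to `fdo-2022-03`, while `ro.get("fdo-2021-11")` still returns the earlier snapshot.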

Schröder et al., Enabling Canonical Analysis Workflows – Documented data harmonization on global air quality data, Data Intelligence Journal. 2022; in print

The TOAR database and its interfaces https://intelliaq.eu/2019/04/29/the-toar-database-and-its-interfaces/ Mon, 29 Apr 2019 08:26:00 +0000

The foundation of the IntelliAQ project is the TOAR database, the world’s largest collection of surface observation data of ozone, ozone precursor gases, meteorological variables, selected tracers for pollution source attribution, and selected results from numerical models of atmospheric dynamics and chemical composition. Most of these data are time series of measurements at specific point locations, the “stations”. Various researchers submit their data directly to the TOAR data centre, where they are reformatted, quality controlled, and inserted into the TOAR database. If requested by the data submitters, the reformatted and augmented files from these direct submissions will also be published in a FAIR data service, including a DOI for reference in journal publications, presentations, and elsewhere. However, the majority of data in the TOAR database is not “primary data” but a copy of data from other databases and repositories.

Particular strengths of this globally harmonized database are unified access to the data and the application of consistent statistical methods everywhere, which makes the results comparable.
The data and metrics are made available via a graphical web interface (for documentation see: https://join.fz-juelich.de/static/documentation/JOIN_FAQ.pdf), a REST service (for documentation see: https://join.fz-juelich.de/services/rest/surfacedata/) and as aggregated products on PANGAEA (https://doi.pangaea.de/10.1594/PANGAEA.876108). A collection of software tools to facilitate access to and processing of data can be found in the GitLab repository https://jugit.fz-juelich.de/m.schultz/toar-public-utilities.
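
As a rough illustration of how the REST service could be addressed from a script, the following Python sketch builds a query URL from the base address given above. The endpoint name `search` and the parameter names `station_country` and `format` are placeholders chosen for this example; the actual endpoints and parameters should be taken from the REST documentation. The sketch only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode, urljoin

# Base address of the REST service as given in the text.
BASE_URL = "https://join.fz-juelich.de/services/rest/surfacedata/"


def build_query_url(endpoint: str, **params: str) -> str:
    """Build a query URL for an endpoint below BASE_URL.

    Endpoint and parameter names are supplied by the caller; the ones
    used in the example below are placeholders, not confirmed API names.
    """
    url = urljoin(BASE_URL, endpoint.strip("/") + "/")
    # Sort parameters so the resulting URL is deterministic.
    query = urlencode(sorted(params.items()))
    return f"{url}?{query}" if query else url


# Placeholder endpoint and parameters, for illustration only.
url = build_query_url("search", station_country="DE", format="json")
```

The resulting string can then be passed to any HTTP client, for example `urllib.request.urlopen(url)`, once the endpoint and parameters have been checked against the service documentation.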

Figure 2: The JOIN web interface to TOAR data, showing ozone data from one station over a time span of 23 years
Figure 3: REST API, available parameters for surfacedata
Figure 4: REST API, queries for detailed information