Publication Date:
2022-05-26
Description:
Presented at the USGS Data Management Working Group, 9 November 2020
Description:
At domain-specific data repositories, curation that strives for FAIR principles often entails transforming data submissions to improve understanding and reuse. The Biological and Chemical Oceanography Data Management Office (BCO-DMO, https://www.bco-dmo.org) has been adopting the data containerization specification of the Frictionless Data project (https://frictionlessdata.io) to make its data curation process more efficient. In doing so, BCO-DMO has been using the Frictionless Data Package Pipelines library (https://github.com/frictionlessdata/datapackage-pipelines) to define the processing steps that transform original submissions into final data products. Because these pipelines are defined in a declarative language, they can be serialized into formal provenance data structures using the Provenance Ontology (PROV-O, https://www.w3.org/TR/prov-o/). Although some curation steps may still resist automation, this method is a step towards reproducible transforms that link the original data submission to its published state in machine-actionable ways, benefiting the research community through transparency in the data curation process. BCO-DMO has built a user interface on top of these modular tools that makes it easier for data managers to process submissions, reuse existing workflows, and make the added value of domain-specific data curation transparent.
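As a minimal sketch of how a declarative pipeline could be serialized into PROV-O, the Python below walks a list of steps shaped like datapackage-pipelines entries (each with a "run" processor name and optional "parameters") and emits N-Triples that chain them as prov:Activity and prov:Entity nodes. It assumes a strictly linear pipeline. The example.org namespace, the ex:parameters property, the function name pipeline_to_prov, and the step names and parameters in the demo are hypothetical; this is an illustration, not BCO-DMO's actual serialization.

```python
import json

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/curation/"  # hypothetical namespace, for illustration only


def iri(value):
    """Wrap an IRI in angle brackets for N-Triples."""
    return f"<{value}>"


def literal(value):
    """Encode a string as an N-Triples literal (JSON string escapes are valid there)."""
    return json.dumps(value)


def pipeline_to_prov(pipeline_id, steps):
    """Describe a linear pipeline as a PROV-O chain; returns N-Triples lines.

    Assumption: each step consumes the previous step's output entity.
    """
    current = f"{EX}{pipeline_id}/submission"  # the original submission
    triples = [(iri(current), iri(RDF_TYPE), iri(PROV + "Entity"))]
    previous_activity = None
    for index, step in enumerate(steps):
        activity = f"{EX}{pipeline_id}/step/{index}"
        output = f"{EX}{pipeline_id}/data/{index}"
        params = json.dumps(step.get("parameters", {}), sort_keys=True)
        triples += [
            (iri(activity), iri(RDF_TYPE), iri(PROV + "Activity")),
            (iri(activity), iri(RDFS_LABEL), literal(step["run"])),
            # Hypothetical property recording the step's declarative parameters.
            (iri(activity), iri(EX + "parameters"), literal(params)),
            (iri(activity), iri(PROV + "used"), iri(current)),
            (iri(output), iri(RDF_TYPE), iri(PROV + "Entity")),
            (iri(output), iri(PROV + "wasGeneratedBy"), iri(activity)),
            (iri(output), iri(PROV + "wasDerivedFrom"), iri(current)),
        ]
        if previous_activity is not None:
            # Each activity consumed what the previous one produced.
            triples.append((iri(activity), iri(PROV + "wasInformedBy"), iri(previous_activity)))
        previous_activity = activity
        current = output
    return [f"{s} {p} {o} ." for s, p, o in triples]


if __name__ == "__main__":
    # Hypothetical steps loosely modeled on a pipeline-spec entry.
    demo_steps = [
        {"run": "load", "parameters": {"from": "submission.csv"}},
        {"run": "set_type", "parameters": {"name": "depth", "type": "number"}},
        {"run": "dump_to_path", "parameters": {"out-path": "final"}},
    ]
    for line in pipeline_to_prov("example-dataset", demo_steps):
        print(line)
```

Because every step is plain data rather than ad hoc code, the same spec that drives the transformation can be walked again to produce its provenance record.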
Description:
NSF #1924618
Keywords:
Data Curation; Provenance; Workflows; Frictionless Data; Data management; Data repository
Repository Name:
Woods Hole Open Access Server
Type:
Presentation