Publication Date:
2016-08-30
Description:
As the amount of scientific data continues to grow at ever-faster rates, the research community increasingly needs flexible computational infrastructure that supports the entire data science life cycle: long-term data storage, data exploration and discovery services, and compute capabilities for data analysis and reanalysis as new data is added and scientific pipelines are refined. The authors describe their experience developing data commons: interoperable infrastructure that collocates data, storage, and compute with common analysis tools. Across the presented case studies, several common requirements emerge, including persistent digital identifier and metadata services, APIs, data portability, pay-for-compute capabilities, and data peering agreements between data commons. Although many challenges remain, including sustainability and the development of appropriate standards, interoperable data commons bring us one step closer to effective data science as a service for the scientific research community.
Print ISSN:
1521-9615
Electronic ISSN:
1558-366X
Topics:
Computer Science, Natural Sciences in General, Technology