Data management and sharing: from policy to practice
In today’s high-tech world, vast amounts of data are generated, and they must be stored and managed appropriately. At a seminar hosted by Dataracks and Cambridge Cleantech, Marta Teperek, Research Data Facility Manager at the Cambridge University Library/Research Operations Office, outlined the University’s perspective on data management and sharing, from policy to practice.
The University of Cambridge is a complex entity, with researchers dispersed across the city. Each laboratory generates large quantities of active research data. This data is managed by different organisations within the University, each with its own data management team. The University Information Services (UIS) supports active research data while researchers are generating and modifying it. The Cambridge University Library is the home for the final archived data at the end of a research project. The Library ensures that this data is properly collated, preserved and made available for sharing.
Data management and sharing demands
Cambridge is generally considered an open environment. The main requirements placed on researchers come from funding organisations, which expect centralised support for data management and sharing. Research data resulting from publicly funded projects is considered a public good and must be made openly available to others. Archived data must be accurately described and understandable by others. It should be deposited in suitable (ideally discipline-specific) repositories, so that publications are persistently linked to their supporting data. A particular challenge is the need to store research data for long periods. The minimum is typically 10 years, depending on the funder’s policy and the type of study. Data from longitudinal studies, for example, may need to be stored indefinitely, and some funders require data to be kept for 10 years from the date it was last accessed.
Implementing data policies can have knock-on effects that leave researchers with unanswered questions. What is research data? Which data should be shared? Who owns it? At Cambridge, primary IP ownership usually lies with the researcher. However, if a third-party contract is involved, it may impose specific terms and conditions on data sharing. This can cause conflict when a project is co-funded by a public body, such as Research Councils UK, and an industry partner: one party may want to share the data, while the other fears that sharing would put it at a competitive disadvantage. There is also concern that the UK’s data sharing policies may discourage non-UK institutions from entering into collaborations. Data sharing agreements cannot always be planned in advance; for example, another partner may join after the study has begun, and this can lead to disagreements later on. Finally, the obligation to share data creates a risk that poor-quality data will be released into the public domain without peer review.
Data must be stored sustainably for many years, which might cost more than the project itself. It should also be available for anybody to download, anywhere in the world. Describing the data properly can be resource-intensive. The large number of data formats in use is a further concern, because data becomes useless if the software needed to read it becomes obsolete. Funders recommend sharing the software along with the data, but this is not possible with proprietary software. Access control must be well managed, particularly when personal or sensitive data is involved. Personal data must also be anonymised so that it cannot be traced back to its original source, without losing important information.
A sea change for data sharing
Successful data sharing requires a robust infrastructure and a cultural change within the research community. Researchers should be judged by the quality of their data and how well they manage and share it, not just by the number of high-impact publications they produce. This shift requires open discussions involving all stakeholders, and institutions, commercial partners and national bodies all have a role to play. Data sharing should be rewarded, and researchers should embrace change and develop community standards for sharing. Current policies for data management and sharing are highly aspirational, but many questions remain unanswered.