Visualizing open science with DataSeer

DataSeer was founded in June 2020 to solve a very specific problem: mandated data-sharing policies were becoming increasingly common—but compliance remained frustratingly low. Four years on, the expectation of public access to data and other raw research outputs has only increased. Both the United States and Europe have open data policies and recommendations that include monitoring and reporting.

Fortunately, DataSeer has developed a suite of solutions to help funders, publishers, and institutions meet the practical challenges of fulfilling policy requirements and encouraging open science adoption. Our Open Science Metrics provide top-level insight into open practices; Compliance Checks deliver in-depth reports that gauge policy compliance and coach authors to succeed; and Open Science SnapShot offers efficient sense checks that flag potential issues and inform automated workflows.


Why is data sharing such a priority for funders?

Our founder and CEO, Dr. Tim Vines, frames the problem this way: if a research story is a layer cake, the article is just the icing, the smooth and accessible upper layer supported by everything that lies beneath. An article without its underlying research outputs, such as data, raw images, methods, protocols, and code, is like a hollow cake: all icing, no substance.

And, like the hollowed-out cake, an article without the rest of the research story is also tremendously wasteful—both in terms of real cost and opportunity cost. It’s estimated that funders spend on average between $200,000 and $500,000 on each piece of research. No one benefits when the most valuable components remain hidden. Without publicly available data, research cannot be validated, reanalyzed, repurposed, or reused. Building upon the work in future research becomes more challenging, less efficient, and more expensive. For funders, a research publication that includes raw research outputs is more impactful, and offers a better return on investment.

Despite the benefits of sharing data, mandates alone have proven insufficient to change author behavior, and the reasons are not hard to find. Expectations are often unclear. Sharing raw research outputs is a relatively new practice, and it can mean different things across disciplines. Few researchers receive formal training in scientific communication or in open science best practices. And written policies necessarily use broad, inclusive language, without field- or study-specific details.


The DataSeer vision

To address the gap between policy and performance, DataSeer has developed Natural Language Processing (NLP) technologies to assess articles and identify associated datasets. Our reporting enables funders, publishers, and institutions to easily track policy compliance with different degrees of granularity, and to introduce systematic interventions to help authors successfully fulfill their data-sharing responsibilities. And we’ve gone beyond data alone, expanding our service to provide insight into a range of open science practices.


Three ways of measuring open science with DataSeer

DataSeer offers three distinct lenses through which to view open science policy compliance.


Open Science Metrics 

Get the view from the top.

An aggregate view of open science practices across a corpus of articles, so you can benchmark, set goals, and measure progress over time. Available metrics include data, code, and protocol sharing; types of data; preprint posting; repository use; the presence of unique identifiers (e.g., RRID or ORCID); and more.

Partners like PLOS, The University of Manchester, and IOP use DataSeer’s Open Science Metrics to monitor open practices among their authors, and to understand how they stack up against their peers.

Of the new partnership, Daniel Keirs, Head of Journal Strategy and Performance at IOP Publishing, says: “Our partnership with DataSeer will provide an important insight into open science practices across the physical sciences and support our future efforts to help accelerate scientific discovery and promote a culture of transparency and reproducibility in scientific research.”


Compliance Checks

Let authors know exactly how to fulfill the requirements for their specific study.

A detailed, real-time, article-specific analysis that produces an actionable report tailored to your requirements. Each article's Compliance Report lists every instance where data or code were generated, whether they were shared, and, if so, where. It even recommends the most suitable repository for each item.

Compliance Checks save countless editorial hours, and provide a personalized guide for authors describing exactly how to meet the requirements of their grant or journal. Checks can also be rerun to verify that the necessary changes have been implemented. Our partners report a noticeable improvement in results for authors who go through Compliance Checks across multiple revisions, as well as better initial performance for authors submitting a second manuscript.

For Sonya Dumanis, Deputy Director for Aligning Science Across Parkinson’s (ASAP), the most pressing issue was “how do we track, monitor and enforce open science policies?” Working together, ASAP and DataSeer “created a solution that didn’t exist…a personalized report that goes specifically to our grantees that reflects our open science policies.”


NEW: Open Science SnapShot

Gauge open science practices in seconds.

A light-touch, instantaneous, and scalable solution that lets journals scan submitted manuscripts for indicators of open science adoption—and by extension, indicators of research integrity. Short summary results can direct editorial attention to submissions that need it, or trigger efficient automated interventions or expressions of support. For example, if the authors provide data as a supporting information file, an automated message might encourage them to use a repository and provide an appropriate suggestion. If they provide data in a repository from the start, a message of thanks might generate goodwill. And when authors can see that data sharing policies are being checked and enforced, they are more likely to take those requirements seriously. We’re excited to roll this program out to our first stakeholders later this year.


Using data science to advance data sharing

Four years ago, it was practically (perhaps even actually) impossible for funders, publishers, and institutions to monitor policy compliance at scale. Tasks like verifying the accuracy of every data availability statement or validating every dataset link were simply too staff-intensive.

Today, we have ways of doing just that. Using NLP, DataSeer compares practice against expectations and requirements to accurately measure policy compliance and open science adoption. We also provide clarity and guidance to help researchers fulfill their communication goals and the requirements of their grants.

Together with partners and stakeholders like The American Naturalist, ASAP (Aligning Science Across Parkinson’s), Center for Open Science, GigaScience, IOP Publishing, PLOS, Royal Society, UC Santa Cruz, The University of Manchester, UK Reproducibility Network, and others, we are using data science to increase the rate and thoroughness of data sharing and improve research integrity.
