DataSeer powers code sharing audit at PLOS Computational Biology

Code sharing rates (% of research articles that share code) of PLOS Computational Biology articles, based on the time of publication. Sharing rates hover around 60% between 2019 and Q1 2021, then rise to 85%.

PLOS journals have long been leaders in promoting Open Science, so we jumped at the opportunity to help them quantify code sharing at their journal PLOS Computational Biology. PLOS mandated code sharing in PLOS CB in March 2021, so they were keen to a) have historical data on the proportion of authors sharing code before the policy was introduced, and b) quantify how code sharing changed in the year after the policy came into force.

The results have just appeared in an Editorial with Lauren Cadwallader – Open Research Manager at PLOS – as lead author. She also describes the work in this video.

To conduct the audit, DataSeer gathered articles published in PLOS CB between January 2019 and March 2022 and evaluated two criteria. First, did the authors generate any shareable code? This step is crucial to ensure that the proportion has the correct denominator: articles that do not generate any shareable code cannot be expected to share it, so calculating the proportion from the total number of articles assessed would underestimate the true proportion of articles sharing their code.
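To make the denominator point concrete, here is a minimal sketch in Python. It is not the code we used for the audit; the `Article` record, the function names and the ten example articles are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Article:
    """Hypothetical audit record: one flag per criterion."""
    generated_code: bool  # criterion 1: did the work produce shareable code?
    shared_code: bool     # criterion 2: was any of that code shared?


def sharing_rate(articles: List[Article]) -> Optional[float]:
    """Share of code-generating articles that shared code."""
    eligible = [a for a in articles if a.generated_code]
    if not eligible:
        return None  # no eligible articles, so the rate is undefined
    return sum(a.shared_code for a in eligible) / len(eligible)


def naive_rate(articles: List[Article]) -> Optional[float]:
    """Same numerator, but divided by every article assessed."""
    if not articles:
        return None
    shared = sum(a.shared_code for a in articles if a.generated_code)
    return shared / len(articles)


# Invented example: 10 articles, 8 generated code, 6 of those shared it.
sample = (
    [Article(True, True)] * 6
    + [Article(True, False)] * 2
    + [Article(False, False)] * 2
)

print(sharing_rate(sample))  # uses the 8 eligible articles as denominator
print(naive_rate(sample))    # uses all 10 articles as denominator
```

With these made-up numbers, the naive calculation comes out lower than the eligible-only rate, which is the underestimate described above.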

Articles using any command line program (such as R) or a programming language (such as Python), or conducting any sort of simulation, were assumed to have written custom scripts that could be shared. Since all modern theoretical work makes use of programs like Mathematica or MATLAB, we also counted articles presenting equations or describing models as having generated code. Perhaps unsurprisingly, over 99% of articles in PLOS Computational Biology were assessed as generating shareable code of some sort.

We next assessed whether any code had been shared, either on a public repository like GitHub or – less commonly – as a supplemental text file. We parsed the Data Accessibility section for mentions of code sharing, and assessed the titles of the Supplemental files for clues that the file contained executable scripts. The code we developed for this work is available on Figshare, and the dataset is there too.
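For readers curious what this kind of screening can look like, below is a rough keyword-based sketch in Python. It is an illustration only and not the code we deposited on Figshare: the patterns, function names and example strings are all assumptions, and a real audit would need more careful rules and manual checking.

```python
import re

# Hypothetical patterns, chosen for illustration only.
REPO_HOSTS = re.compile(r"github\.com|gitlab\.com|bitbucket\.org", re.IGNORECASE)
CODE_WORDS = re.compile(r"\b(source code|code|scripts?|software)\b", re.IGNORECASE)
SCRIPT_EXTENSIONS = re.compile(r"\.(py|r|m|ipynb|jl|sh)\b", re.IGNORECASE)


def mentions_code_sharing(statement: str) -> bool:
    """Flag a Data Accessibility statement that names a code host or code."""
    return bool(REPO_HOSTS.search(statement) or CODE_WORDS.search(statement))


def supplement_looks_like_code(title: str) -> bool:
    """Flag a Supplemental file title that hints at an executable script."""
    return bool(SCRIPT_EXTENSIONS.search(title) or CODE_WORDS.search(title))


# Invented example inputs.
statement = "All code is available at https://github.com/example/repo"
title = "S1 File. analysis.R"

print(mentions_code_sharing(statement))
print(supplement_looks_like_code(title))
```

Simple patterns like these will produce false positives (a statement that mentions "the genetic code", for example), so any flagged article would still need a human check.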

Please get in touch with us if you’ve got similar questions about your own articles!
