4. Tools and Platforms for Reproducibility and Transparency in research

4.1. R And The History of Reproducible Research

(information from the Data CampL Jupyter And R Markdown)

In his talk, J.J Allaire, confirms that the efforts in R itself for reproducible research, the efforts of Emacs to combine text code and input, the Pandoc, Markdown and knitr projects, and computational notebooks have been evolving in parallel and influencing each other for a lot of years. He confirms that all of these factors have eventually led to the creation and development of notebooks for R.

Firstly, computational notebooks have quite a history: since the late 80s, when Mathematica’s front end was released, there have been a lot of advancements. In 2001, Fernando Pérez started developing IPython, but only in 2011 the team released the 0.12 version of IPython was realized. The SageMath project began in 2004. After that, there have been many notebooks. The most notable ones for the data science community are the Beaker (2013), Jupyter (2014) and Apache Zeppelin (2015).

Then, there are also the markup languages and text editors that have influenced the creation of RStudio’s notebook application, namely, Emacs, Markdown, and Pandoc. Org-mode was released in 2003. It’s an editing and organizing mode for notes, planning and authoring in the free software text editor Emacs. Six years later, Emacs org-R was there to provide support for R users. Markdown, on the other hand, was released in 2004 as a markup language that allows you to format your plain text in such a way that it can be converted to HTML or other formats. Fast forward another couple of years, and Pandoc was released. It’s a writing tool and as a basis for publishing workflows.

Lastly, the efforts of the R community to make sure that research can be reproducible and transparent have also contributed to the rise of a notebook for R. 2002, Sweave was introduced in 2002 to allow the embedding of R code within LaTeX documents to generate PDF files. These pdf files combined the narrative and analysis, graphics, code, and the results of computations. Ten years later, knitr was developed to solve long-standing problems in Sweave and to combine features that were present in other add-on packages into one single package. It’s a transparent engine for dynamic report generation in R. Knitr allows any input languages and any output markup languages.

Also in 2012, R Markdown was created as a variant of Markdown that can embed R code chunks and that can be used with knitr to create reproducible web-based reports. The big advantage was and still is that it isn’t necessary anymore to use LaTex, which has a learning curve to learn and use. The syntax of R Markdown is very similar to the regular Markdown syntax but does have some tweaks to it, as you can include, for example, LaTex equations.

4.2. The Open Science Framework

The OSF gives you free accounts for collaboration around files and other research artifacts.

Create an account and log in at: http://osf.io/

Each account can have up to 5 GB of files without any problem, and it remains private until you make it public.

You can add other authorized users as well.

Once you are ready to make the project public, you can do so in two ways:

  • you can “freeze” the project for publication (e.g. if you want to make available exactly the data and analysis you put in a paper); this is called registration, and you can give access to just the reviewers, too.
  • you can make the entire project public, in which case anyone can browse to it if they know the URL.

You can also cut DOIs for resources so that you can put them in papers.

The OSF is archival (they have a sustainability fund that guarantees their data will remain around for something like 30 years) and many publications will accept them as a place to make data public if you use a registration as above.

4.2.1. Other things the OSF supports

OSF Meetings - free poster and presentation sharing

OSF for institutions - branded for institutions, with logins/passwords tied into your institutional login.

4.3. Dryad

The Dryad Digital Repository (aka Dryad) is a curated resource that makes the data underlying scientific publications discoverable, freely reusable, and citable. Dryad provides a general-purpose home for a wide diversity of datatypes.

Dryad’s vision is to promote a world where research data is openly available, integrated with the scholarly literature, and routinely re-used to create knowledge. They provide the infrastructure for, and promote the re-use of, data underlying the scholarly literature.

Dryad is governed by a nonprofit membership organization. Membership is open to any stakeholder organization, including but not limited to journals, scientific societies, publishers, research institutions, libraries, and funding organizations.

Publishers are encouraged to facilitate data archiving by coordinating the submission of manuscripts with submission of data to Dryad. Learn more about submission integration.

Dryad originated from an initiative among a group of leading journals and scientific societies in evolutionary biology and ecology to adopt a joint data archiving policy (JDAP) for their publications, and the recognition that easy-to-use, sustainable, community-governed data infrastructure was needed to support such a policy.

4.4. Figshare

Figshare is an online digital repository where researchers can preserve and share their research outputs, including figures, datasets, images, and videos. It is free to upload content and free to access, in adherence to the principle of open data. Figshare is a portfolio company of Digital Science, operated by Macmillan Publishers.

Figshare is a repository where users can make all of their research outputs available in a citable, shareable and discoverable manner. It allows users to upload any file format to be previewed in the browser so that any research output, from posters and presentations to datasets and code, can be disseminated in a way that the current scholarly publishing model does not allow.

4.5. Zenodo

Zenodo is derived from Zenodotus, the first librarian of the Ancient Library of Alexandria and father of the first recorded use of metadata, a landmark in library history.

The OpenAIRE project, in the vanguard of the open access and open data movements in Europe was commissioned by the EC to support their nascent Open Data policy by providing a catch-all repository for EC funded research. CERN, an OpenAIRE partner and pioneer in open source, open access and open data, provided this capability and Zenodo was launched in May 2013.

4.6. References / Sources