Walking the talk – reflections on working ‘openly’

As part of Open Access Week 2016, the University of Cambridge Office of Scholarly Communication published a series of blog posts on open access and open research. In this post, Dr Lauren Cadwallader discusses her experience of researching openly. Earlier this year I was awarded the first Annual Research grant to carry out a proof-of-concept study looking at using altmetrics as a way of identifying journal articles that eventually get included into a policy document. As part of the grant condition I am required to share this work openly. “No problem!” I thought, “My job is all about being open. I know exactly what to do.” However, it’s been several years since I last carried out an academic research project and my previous work was carried out with no idea of the concept of open research (although I’m now sharing lots of it here!). Throughout my project I kept a diary documenting my reflections on being open (and researching in general) – mainly the mistakes I made along the way and the lessons I learnt. This blog post summarises those lessons. To begin at the beginning I carried out a PhD at Cambridge not really aware of scholarly best practice. […]

ReScience: ensuring that the original research is reproducible

Reproducibility is a cornerstone of science: the results obtained by researcher A must be identical to the results obtained by researcher B provided they follow identical protocols and use identical reagents. In reality, multiple factors can lead to irreproducible results. They include poor training of researchers in experimental design; increased emphasis on making provocative statements rather than presenting technical details; and publications that do not report basic elements of experimental design. Therefore, initiatives working on reproducibility issues are indispensable for scientific progress. We are happy to present this guest post by Nicolas Rougier from ReScience – a peer-reviewed journal that targets computational research and encourages the explicit replication of already published research, promoting new and open-source implementations in order to ensure that the original research is reproducible. The ReScience initiative In March 2015, Nicolas Rougier and his colleagues published a commentary in the “Frontiers in Computational Neuroscience” journal that highlighted the difficulties they encountered when trying to replicate a model from the literature. Sources were not available on a public repository (they needed to be requested from one of the authors), code was not under version control, there were some factual errors and ambiguities in the description […]

When Counting is Hard: the Making Data Count project

This is a guest post by Jennifer Lin, project manager for the Making Data Count project. Originally published here. Counting is hard. But when it comes to research data, not in the way we thought it was (example 1, example 2, example 3). The Making Data Count (MDC) project aims to go further – measurement. But to do so, we must start with basic counting: 1, 2, 3… uno, dos, tres… MDC is an NSF-funded project to design and develop metrics that track and measure data use, “data-level metrics” (DLM). DLM are a multi-dimensional suite of indicators, measuring the broad range of activities surrounding the reach and use of data as a research output. Our team, made up of staff from the University of California Curation Center at California Digital Library, PLOS, and DataONE, investigated the validity and feasibility of using metrics by collecting and investigating the use of harvested data to power discovery and reporting of datasets that are part of scholarly outputs. To do this, we extended Lagotto, an open source application, to track datasets and collect a host of online activity surrounding datasets from usage to references, social shares, discussions, and citations. During this pilot phase we […]

How doing Open Science has helped advance my career

Last week we sent details of how to win $1,000 in The Winnower open science writing competition. This week we bring you a blog post from Bastian Greshake, one of the participants in the competition. Bastian’s story shows how supporting open genetic data access had a lasting impact on his academic career, contributed to lots of new skills, led to winning awards and helped him find jobs and collaborators. Bastian Greshake, co-founder of OpenSNP. What Have I Done?! There are many firm believers in the different kinds of openness: open access, open source, open data, open science, open you-name-it. And at least to me, some of the most interesting things happen at the intersection of those different opens. Which probably is where openSNP – the project I co-founded in 2011 – can be located. It’s an open source project which tries to crowdsource collecting open genetic data. This is done by enabling people to donate their personal genetic information into the public domain, alongside phenotypic annotations. And for good measure we also factor in open access, by text mining the Public Library of Science and other open databases for primary literature. What started as a somewhat freakish idea in 2011 has by mid–2015 […]

DNAdigest symposium: summary

Last Friday, 21/08, Wayra hosted the DNAdigest symposium “Incentives for data sharing”. On a hot summer Friday in London, we asked the attendees: “How can we create incentives for data sharing in genomics research?” Despite the summer break, we had excellent speakers and a very engaged audience. Both attendees and speakers applauded the great quality of the discussions and the cosy atmosphere of the event. Morning session In the first part, Natalie Banner from the Wellcome Trust, Neil Walker from the University of Cambridge and Shahid Hanif from the Association of the British Pharmaceutical Industry presented multiple perspectives on data sharing. Natalie Banner made it clear that the objective of the Wellcome Trust as a funder is to maximise the benefits for health and society, and to gain the best possible impact from their funding for research. Best possible impact also means maximising data use and utility for reuse, which is why the EAGDA is investigating best practices for data sharing. Natalie’s presentation is available here. Neil Walker presented how funder policies can be difficult to implement for the individual researcher and shared many anecdotes on data sharing and how he uses data management plans to outline for funders how data will […]

YAAC conference

YAAC – Leveraging Open Science to facilitate interdisciplinary cancer research

The Young Alliance Against Cancer (YAAC) held its second conference on May 22-23 in Copenhagen. Fiona Nielsen attended the conference and had the organisers Benito Campos and Lars Rønn Olsen answer some of her questions. Fiona: The Young Alliance is led by a group of young cancer researchers, but how and why did the Young Alliance start? Benito: It all started out in 2011 when a group of us (Aaron, Lars, Kunal, Benito) realised that cancer research is so complex and a large challenge for young researchers, and the narrow focus of each researcher brings a risk of doing redundant work because you are missing developments in the field. For instance, we saw that publicly available data and tools from our lab could easily be unknown to the researchers in the next lab. We decided that it is necessary to help the young researchers know what data and tools are available. For just a small effort on increasing knowledge sharing, we could see a potential for making a big impact. We also saw the need for interdisciplinary knowledge exchange, so we decided to create an organisation to bring together young scientists with diverse backgrounds but with a common interest in […]

The culture of scientific research in the UK

The Nuffield Council on Bioethics. Ethics is concerned with what is good and what is bad for individuals and society. Bioethics is a branch of ethics studying the issues arising from the biological and medical sciences. In 1991, the Nuffield Foundation established the Nuffield Council on Bioethics as an independent body that examines and reports on ethical issues in biology and medicine. Since 1994 it has been funded jointly by the Nuffield Foundation, the Wellcome Trust and the Medical Research Council. The Council has achieved an international reputation for advising policy makers and stimulating debate in bioethics. The reports of the Nuffield Council on Bioethics cover multiple topics including public health, research in developing countries, animal research, biofuels, genetically modified crops in developing countries, neonatal medicine, emerging biotechnologies and so on. The main characteristic of these reports is their impartiality: they are based on exhaustive research work conducted under the supervision of independent renowned researchers. An unfortunate fact is that most of these reports are quite lengthy and use “high Academian” – a language not easily understood by people without research experience and scientific background. At DNAdigest we pay close attention to research reports published by the Nuffield Council on Bioethics. Today we present the highlights […]

Genomic Data

Genomic Data Sharing – Ethical and Scientific Imperative

This is a guest blog post written by Mahsa Shabani (@Mahsashabani). Genomic data sharing has become an ethical and scientific imperative in recent years. Funding organizations, research institutes and journals, among others, have endorsed the significance of data sharing practices to the progress of research and an optimal use of community resources. Consequently, researchers all around the world are extensively involved in the data sharing process, ranging from data production to data use. As sharing practices do involve individuals’ data, the associated ethical and legal concerns should receive thorough attention in order to respect individuals’ rights and maintain public trust. Sharing data via controlled-access public databases has been seen as an answer to the identified concerns at the moment. Data Access Committees (DACs) constructed locally or in a central fashion control access to these datasets according to defined criteria. Evaluating the qualification/eligibility of data users, ethical and scientific grounds of proposed uses and oversight on downstream data uses are considered as the main responsibilities of DACs. While the structure, membership and procedure of access review vary across DACs, some similarities in approaches and mechanisms are observed. A requirement of preparing a summary of data use and signing a data access agreement […]

Centre for Open Science

The Reproducibility Project: Cancer Biology

The Reproducibility Project: Cancer Biology has continued to make steady progress over the last few months. Since December, they have published four new Registered Reports with eLife, and one more has been accepted and is on the way. Now that these protocols and analysis plans have been reviewed, the replication experiments themselves can begin. All of the protocols, analyses, and data are freely available on the Open Science Framework (OSF). In total, eleven replications have begun or are poised to begin in the coming weeks. You can keep track of the Reproducibility Project progress for all these Registered Reports and all of the rest of the 50 studies included in the project on the Open Science Framework. Take a look at their most recent Science Exchange blog post and read the completed description of their progress so far.

DNAdigest interviews Free the Data – Part 2

As promised last week, we are publishing the second part of our intriguing interview with Sharon Terry on the Free the Data Project. Last week you learned about the aims and goals of the project and what Sharon did to support and launch it. What about the future plans? Here it is, enjoy every bit of the interview as well as all the interesting video. Did you miss the first part of the interview? Read the first part of the interview with FreeTheData. Sharon F. Terry, President and CEO of Genetic Alliance 4. Who is the intended audience for your campaign and how are you reaching out to them? Free the Data’s target audience is ultimately the public. As we move into an era of genomic medicine, it’s essential that we pool as many of our resources as possible to understand genetic variation and its effect on human health – and who has more information to offer than the men and women who have these mutations? It’s important to me that consumers understand the value of shared genetic data, and that they have the tools they need to share their data if they choose… whether this means simply knowing that a […]

Data Sharing Game

Research Data Sharing Game

As we all know, the reuse of research data definitely benefits the scientific community as a whole, but the decision whether to archive and share these data or not depends primarily on individual researchers. For individuals, it is less obvious that the advantages of sharing data outweigh the associated costs, i.e. time and money. In this sense, the problem of data sharing is like a typical game in interactive decision theory, more commonly known as game theory. By definition, game theory is a study of mathematical models of conflict and cooperation between intelligent rational decision-makers. An obvious assumption herein is that an individual will always try to maximize his or her gains relative to the gains of others. In the paper “A Research Data Sharing Game”, Pronk et al. create a framework in order to investigate the community gains versus the advantages of the individual researcher in the competitive world of scientific research. For the analysis, they have designed a simple model of a scientific community where researchers publish a certain number of papers in a given year and have the choice either to share or not. Via this model, the effect of sharing policies, exploration of several cost scenarios, […]
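To make the tension between individual and community gains concrete, here is a minimal toy sketch of a sharing game. It is not the model from Pronk et al.; the payoff formula and every parameter (`base_papers`, `share_cost`, `reuse_gain`, the community size `n`) are assumptions chosen purely for illustration. Each researcher produces a baseline number of papers per year, sharing costs the sharer some output, and each shared dataset gives every other researcher a small boost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    """Hypothetical parameters for a toy sharing game (illustrative values only)."""
    n: int = 10                # researchers in the community
    base_papers: float = 2.0   # papers per year without any reuse
    share_cost: float = 0.2    # output lost by a researcher who shares
    reuse_gain: float = 0.05   # extra output per dataset shared by someone else


def payoff(shares: bool, other_sharers: int, p: Params) -> float:
    """Yearly output of one researcher, given how many of the others share."""
    cost = p.share_cost if shares else 0.0
    return p.base_papers - cost + p.reuse_gain * other_sharers


def community_output(n_sharers: int, p: Params) -> float:
    """Total yearly output when exactly n_sharers researchers share."""
    sharers = n_sharers * payoff(True, n_sharers - 1, p)
    non_sharers = (p.n - n_sharers) * payoff(False, n_sharers, p)
    return sharers + non_sharers


if __name__ == "__main__":
    p = Params()
    for k in (0, p.n // 2, p.n):
        print(f"{k} sharers: community output = {community_output(k, p):.2f}")
```

Under these assumed numbers, not sharing is always the better choice for an individual (it saves `share_cost` whatever the others do), yet the community as a whole produces more when everyone shares, because `reuse_gain * (n - 1)` exceeds `share_cost`. That is the kind of gap a sharing policy would aim to close.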


Hamza Wahid presented his DNAdigest research project

Nuffield Research Placements (previously Nuffield Science Bursaries) provide over 1,000 students each year with the opportunity to work by the side of professional scientists, technologists, engineers and mathematicians. And this is exactly how our team met Hamza, a Sixth Form student at the Perse School in Cambridge, with an interest in molecular biology and genetics, studying Biology, Chemistry, Double Maths and Philosophy. Over the summer Hamza worked at DNAdigest as a part of the Nuffield Student Research Placement on the Genomic Data Sharing Project where his main task was assisting with the ongoing User Interaction Research. The aim of the project was to investigate how people working with human genetics access, use and store genetic data, how they share it or make it publicly available. During his work with our team, Hamza managed to successfully complete 9 face-to-face interviews with people that work with human genomic data from various different fields as well as help out with the completion of an online survey which was also important for the project. On the 23rd of October, Hamza presented a poster on the Genomic Data Sharing project at the Nuffield Celebration Event Gold CREST Awards, where Nuffield students report how their experience […]

Pistoia Alliance

Webinar: Genomics, Pharmaceutical R&D and Healthcare

As various genomics initiatives have promised to revolutionize healthcare for over 20 years, it is important to ask whether these have had the impact they were expected to make, and how we might take greater advantage of the technology available. The Pistoia Alliance, a global, not-for-profit alliance of life science companies, vendors, publishers, and academic groups that work together to lower barriers to innovation in R&D, is hosting a ‘Pistoia Alliance Debates’ webinar which will look at whether the economic, technical and regulatory barriers have been addressed, and how to overcome any that may remain. The webinar will see Gordon Baxter, CSO at Instem, chair a panel comprised of experts in the field including Abel Ureta-Vidal, CEO at Eagle Genomics, Fiona Nielsen, CEO at DNA Digest, Dan Housman, Director at ConvergeHealth by Deloitte, and Etzard Stolte, former CIO of the Jackson Laboratory. As well as exploring the remaining obstacles to greater adoption of genomics technology in healthcare, the panel will look at the success of recent genomics initiatives, the impact they have had, and will consider what future genomic innovation may bring to the industry. Following the discussion, webinar attendees will be able to ask questions of the panel. You […]

Code for Genomic and Health-Related Data Sharing

The sharing of scientific, genomic and health-related data for the sake of research is of fundamental importance in order to provide continuous progress in our understanding of human health and wellbeing. While collaboration for data sharing is increasingly embraced by policymakers and the international biomedical community, we still lack a common ethical and legal framework to connect regulators, funders, consortia, and research projects to facilitate genomic and clinical data linkage, global science collaboration, and responsible research conduct. Such a framework will definitely assist in the progress of global science and responsible research conduct. This is why BioSHaRE researchers in collaboration with P3G, the Global Alliance for Genomics and Health, IRDiRC (International Rare Diseases Research Consortium), H3Africa and other organizations started to work on the development of an International Code of Conduct for Genomic and Health-Related Data Sharing. This international code will give us guidance on how to responsibly share genomic and health-related data. It also pushes for better access to the shared data, knowledge, and resources in presently under-served regions. Discussions on the topic started back in 2013 and are currently continuing. The Code is built around a set of foundational principles and guidelines. It: interprets the right […]


DNAdigest Symposium 2014 Summary

This past weekend, DNAdigest organized a Symposium on the topic “Open Science in human genomics research – challenges and inspirations”. The event brought together enthusiastic people with a strong interest in the topic, along with the DNAdigest team. We are very pleased to say that this day turned out to be a success, where both participants and organizers enjoyed the amazing talks of our speakers and the discussion sessions. The day started with a short introduction on the topic by Fiona Nielsen. Then our first speaker, Manuel Corpas, was a source of inspiration to all participants, talking us through the process he experienced in order to fully sequence the whole genomes of his family and himself and to share this data widely with the whole world. Here is a link to the presentation he introduced on the day. The Symposium was organized in the format of an Open Space conference, where everybody got to suggest different topics related to Open Science or choose to join the one that sounded most interesting. Again, we used HackPad to capture notes and interesting thoughts throughout the discussions. You can take a look at it here. We had three more speakers invited to our Symposium: Tim Hubbard (slides) talked about how Genomics […]

Best practices for Genomic analysis

Rare genetic variants are being discovered more and more often, yet no clear guidelines are available for distinguishing disease-causing sequence variants from the many potentially functional variants present in any human genome. Without accurate standards, there is a high probability that false-positive reports of causality will accelerate, obstructing the translation of genomic research findings into the clinical diagnostic setting and hindering biological understanding of disease. So what are the best practices for genomic analysis? In the paper Guidelines for investigating causality of sequence variants in human disease, D. G. MacArthur et al. discuss the primary challenges of assessing sequence variants in human disease, integrating both gene-level and variant-level support for causality, introduce guidelines for summarizing confidence in variant pathogenicity, and highlight several areas that require further resource development. For us the most interesting part of the paper is the emphasis on the value of sharing sequence and phenotype data from clinical and research samples to the fullest possible extent. D. G. MacArthur’s team recognises that many investigators and research funders look at data sharing as a moral and professional imperative; nevertheless, sharing of sequence data among testing laboratories has often been blocked, so that many potentially pathogenic […]


Open Science in human genomics research

UPDATE: only a few tickets left – do not forget to register. This November 22nd, DNAdigest is organizing a collaborative symposium. The topic of the event will be “Open Science in human genomics research – challenges and inspirations”. It will take place at the Future Business Centre, Cambridge. You can take a look at this map for directions. At this upcoming collaborative symposium, we will introduce topics like open science, access to sequencing data, privacy concerns around human genomic data, etc., and the schedule of the day will be prepared as a combination of short presentations from invited speakers followed by interactive discussion groups. Join us at the Symposium by signing up here. You can look forward to inspirational talks to spur excitement and discussions: Manuel Corpas will talk about how he as a citizen scientist has crowdfunded and crowdsourced the analysis of his personal genome. Linda Briceno will share her thoughts on legal and ethical implications of data sharing in genomics. Nick Sireau will talk about how scientists and patients can engage in collaborations, and how Open Science may be either beneficial or challenging in this context. Tim Hubbard will present how Genomics England is engaging the research community in the 100k […]

research data

Giving research data the credit it’s due

Guest post by Sarah H Carl (@sarahhcarl). In many ways, the currency of the scientific world is publications. Published articles are seen as proof – often by colleagues and future employers – of the quality, relevance and impact of a researcher’s work. Scientists read papers to familiarize themselves with new results and techniques, and then they cite those papers in their own publications, increasing the recognition and spread of the most useful articles. However, while there is undoubtedly a role for publishing a nicely-packaged, (hopefully) well-written interpretation of one’s work, are publications really the most valuable product that we as scientists have to offer one another? As biology moves more and more towards large-scale, high-throughput techniques – think all of the ‘omics – an increasingly large proportion of researchers’ time and effort is spent generating, processing and analyzing datasets. In genomics, large sequencing consortia like the Human Genome Project or ENCODE were funded in part to generate public resources that could serve as roadmaps to guide future scientists. However, in smaller labs, all too often after a particular set of questions is answered, large datasets end up languishing on a dusty server somewhere. Even for projects whose express purpose is […]