Month: September 2015

When Counting is Hard: the Making Data Count project

This is a guest post by Jennifer Lin, project manager for the Making Data Count project. Originally published here. Counting is hard. But when it comes to research data, not in the way we thought it was (example 1, example 2, example 3). The Making Data Count (MDC) project aims to go further – measurement. But to do so, we must start with basic counting: 1, 2, 3… uno, dos, tres… MDC is an NSF-funded project to design and develop metrics that track and measure data use, “data-level metrics” (DLM). DLM are a multi-dimensional suite of indicators measuring the broad range of activities surrounding the reach and use of data as a research output. Our team, made up of staff from the University of California Curation Center at the California Digital Library, PLOS, and DataONE, investigated the validity and feasibility of these metrics by harvesting data on dataset use to power discovery and reporting of datasets that are part of scholarly outputs. To do this, we extended Lagotto, an open source application, to track datasets and collect a host of online activity surrounding them, from usage to references, social shares, discussions, and citations. During this pilot phase we […]
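To make the idea of a "multi-dimensional suite of indicators" concrete, here is a minimal sketch of rolling harvested events up into per-dataset metrics. The event records, field names, and DOIs below are purely illustrative assumptions, not the actual Lagotto schema or API.

```python
from collections import defaultdict

# Hypothetical event records of the kind a Lagotto-style harvester might
# collect: (dataset_id, source, count). All values here are made up for
# illustration.
events = [
    ("doi:10.5061/dryad.example1", "views", 120),
    ("doi:10.5061/dryad.example1", "citations", 3),
    ("doi:10.5061/dryad.example1", "social_shares", 14),
    ("doi:10.5061/dryad.example2", "views", 40),
    ("doi:10.5061/dryad.example2", "citations", 1),
]


def data_level_metrics(events):
    """Aggregate raw events into a per-dataset suite of indicators."""
    metrics = defaultdict(dict)
    for dataset, source, count in events:
        metrics[dataset][source] = metrics[dataset].get(source, 0) + count
    return dict(metrics)


print(data_level_metrics(events)["doi:10.5061/dryad.example1"])
# {'views': 120, 'citations': 3, 'social_shares': 14}
```

The point is that no single number summarises data reach: each dataset ends up with a vector of counts across sources, which is what the DLM suite reports.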

How doing Open Science has helped advance my career

Last week we sent details of how to win $1,000 in The Winnower open science writing competition. This week we bring you a blog post from Bastian Greshake, one of the participants in the competition. Bastian’s story shows how supporting open access to genetic data had a lasting impact on his academic career, contributed to lots of new skills, led to winning awards and helped him find jobs and collaborators. Bastian Greshake, co-founder of openSNP. What Have I Done?! There are many firm believers in the different kinds of openness: open access, open source, open data, open science, open you-name-it. And at least to me, some of the most interesting things happen at the intersection of those different opens – which is probably where openSNP, the project I co-founded in 2011, can be located. It’s an open source project that aims to crowdsource the collection of open genetic data, by enabling people to donate their personal genetic information into the public domain, alongside phenotypic annotations. And for good measure we also factor in open access, by text mining the Public Library of Science and other open databases for primary literature. What started as a somewhat freakish idea in 2011 has by mid–2015 […]

DNAdigest interviews Intel

Big Data Solutions is the leading big data initiative of Intel that aims to empower businesses with the tools, technologies, software and hardware for managing big data. Big Data Solutions is at the forefront of big data analytics, and today we talk to Bob Rogers, Chief Data Scientist, about his role, big data for genomics and his contributions to the BioData World Congress 2015. 1. What is your background and your current role? Chief Data Scientist for Big Data Solutions. My mission is to put powerful analytics tools in the hands of every business decision maker. My responsibility is to ensure that Intel is leading in big data analytics in the areas of empowerment, efficiency, education and technology roadmap. I help customers ask the right questions to ensure that they are successful with their big data analytics initiatives. I began with a PhD in physics. During my postdoc, I got interested in artificial neural networks, which are systems that compute the way the brain computes. I co-wrote a book on time series forecasting using artificial neural networks, which resulted in a number of people asking me if I could forecast the stock market. I ended up forming a quantitative futures fund with three other […]

Blockchain and Digital Health – First Impressions

Guest Post by Rodrigo Barnes, Chief Technology Officer at Aridhia. This blog post was originally published on the Aridhia website on 25 August 2015. The blog post was inspired by the Ethereum Workshop at the Turing Festival in Edinburgh. Among the many great Edinburgh festivals, the Turing Festival is the most important to the tech start-up scene locally and beyond. This weekend, I attended the Ethereum Workshop to learn about a type of “blockchain” technology and to think about how it might facilitate innovation in digital health. There’s even interest in this for genomic data sharing, as the Global Alliance and Kaiser Permanente’s John Mattison have suggested. Most people in tech have heard of Bitcoin, the cryptocurrency that is exciting libertarians and central bankers alike. One thing I learned this weekend is that, at its heart, Bitcoin and related technologies can be seen as essentially ‘open ledgers’ where transactions are recorded in a very public way and can’t be repudiated. The gist is that the open ledger can be trusted – because of the way it is implemented – even though there is no central authority vouching for it. The ledger is maintained by the decentralised processing of the blockchain. The question I asked myself is “how could this be applied to digital […]
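The non-repudiation property comes from hash-chaining: each entry commits to the hash of the one before it, so altering any past record is detectable. Below is a minimal sketch of that idea only — it is illustrative, not how Bitcoin or Ethereum are actually implemented (real blockchains add Merkle trees, signatures, and a consensus mechanism such as proof of work), and the transaction fields are made up.

```python
import hashlib
import json


def block_hash(contents):
    """Hash block contents deterministically (sorted keys, UTF-8)."""
    payload = json.dumps(contents, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, transaction):
    """Append a transaction, linking it to the previous block's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    contents = {"prev_hash": prev_hash, "transaction": transaction}
    block = dict(contents, hash=block_hash(contents))
    chain.append(block)
    return chain


def verify(chain):
    """Recompute every hash and back-link; any tampering breaks the chain."""
    for i, block in enumerate(chain):
        contents = {"prev_hash": block["prev_hash"],
                    "transaction": block["transaction"]}
        if block["hash"] != block_hash(contents):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


# Hypothetical health-data transactions, for illustration only.
ledger = []
append_block(ledger, {"from": "clinic_a", "to": "registry", "record": "consent_granted"})
append_block(ledger, {"from": "registry", "to": "lab_b", "record": "dataset_access"})
print(verify(ledger))   # True
ledger[0]["transaction"]["record"] = "consent_revoked"
print(verify(ledger))   # False: the tampered record no longer matches its hash
```

In a real open ledger this verification is performed independently by every participant, which is why no central authority is needed to vouch for the history.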

DNAdigest interviews Biopeer

Biopeer is a data sharing tool for small- to medium-scale collaborative sequencing efforts, and it began its journey with a group of senior students from Bilkent University, Turkey. Today, DNAdigest interviews Can Alkan, an Assistant Professor in the Department of Computer Engineering at Bilkent University and one of the minds behind Biopeer. 1. Please introduce yourself; what is your background, position? I am an Assistant Professor in the Department of Computer Engineering at Bilkent University, Ankara, Turkey. I’m a computer scientist by training; I finished my PhD at Case Western Reserve University, where I worked on algorithms for the analysis of centromere evolution, and then RNA folding and RNA-RNA interactions. Later, I did a lengthy postdoc at the Genome Sciences Department of the University of Washington. I was lucky that next-generation sequencing took off a few months after I joined UW, and suddenly I found myself in many large-scale sequencing projects such as the 1000 Genomes Project. Since NGS was entirely new, we needed to develop many novel algorithms to analyze the data. Together with my colleagues I developed read mappers (mrFAST/mrsFAST) specifically for segmental duplication analysis, which we used to generate the first personalized segmental duplication and copy number polymorphism […]