Last year we interviewed Can Alkan, an Assistant Professor in the Department of Computer Engineering at Bilkent University, about Biopeer – a data sharing tool for small- to medium-scale collaborative sequencing efforts. Today we are talking to Can about his new project – Coinami.
Please tell us more about Coinami. How did it start?
Coinami is essentially a volunteer grid computing platform that generates a new multi-centralised cryptocurrency, which uses high-throughput sequencing (HTS) read mapping as its proof-of-work.
After Bitcoin gained popularity, many different currencies, called “altcoins”, emerged around the same structure: decentralised, secure transactions recorded in a public ledger called the blockchain. All cryptocurrencies are essentially composed of two parts: mining, which generates new coins (i.e. “printing banknotes”), and transactions, which involve spending and receiving coins. To provide integrity and prevent “overprinting”, miners must perform a computationally intensive task called proof-of-work. Different cryptocurrency systems use different proof-of-work schemes, but all current proof-of-work tasks, including Bitcoin’s, serve no practical purpose other than maintaining the currency.
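To make the idea of a purpose-free proof-of-work concrete, here is a minimal hashcash-style sketch in Python. It is a deliberate simplification, not Bitcoin’s actual scheme (Bitcoin applies double SHA-256 to a structured block header and adjusts its difficulty over time); the names `mine`, `verify` and `difficulty_bits` are illustrative. The miner searches for a nonce whose SHA-256 digest falls below a target, and the resulting digest is useless for anything except proving that the work was done.

```python
import hashlib


def _digest(block_data: str, nonce: int) -> str:
    # Hash the block contents together with a candidate nonce.
    return hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()


def mine(block_data: str, difficulty_bits: int) -> tuple[int, str]:
    """Brute-force a nonce whose digest has at least `difficulty_bits`
    leading zero bits, i.e. is numerically below the target."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = _digest(block_data, nonce)
        if int(digest, 16) < target:
            return nonce, digest
        nonce += 1


def verify(block_data: str, nonce: int, difficulty_bits: int) -> bool:
    """Checking a claimed solution costs a single hash."""
    return int(_digest(block_data, nonce), 16) < (1 << (256 - difficulty_bits))


if __name__ == "__main__":
    nonce, digest = mine("Alice pays Bob 5 coins", 16)
    print(nonce, digest, verify("Alice pays Bob 5 coins", nonce, 16))
```

With `difficulty_bits=16`, finding a nonce takes about 2^16 = 65,536 hash attempts on average, while `verify` needs only one. This asymmetry (expensive to find, cheap to check) is what lets a network agree that work was done; Coinami’s proposal keeps the costly work but replaces the discarded hashing with read mapping.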
Here, we suggest a different approach to proof-of-work. We propose that instead of performing impractical calculations, miners should use their computational power for scientific computing. The idea is very similar to the Berkeley Open Infrastructure for Network Computing (BOINC), which was popularised by the Search for Extraterrestrial Intelligence at Home (SETI@home) project and then adapted to other scientific problems, including protein folding. BOINC volunteers download problem sets from servers, solve them in their spare time (e.g. when the screen saver is activated), and upload the results back to the server. Coinami’s novelty is merging the volunteer grid computing platform with a cryptocurrency to increase volunteer motivation. This extra motivation is needed for multiple reasons. First, HTS read mapping is not only compute-intensive but also memory- and I/O-intensive, which may keep volunteers away. Second, we always want a very quick turnaround in HTS data analysis, so mapping only in volunteers’ spare time would create a bottleneck. Additionally, we want to make sure that volunteers upload correct results (i.e. BAM files) and that data privacy is maintained.
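The volunteer cycle described above (download a work unit, map it, upload the result, have the server check it before paying out) can be sketched as a toy model. Everything here is a hypothetical illustration, not Coinami’s implementation: `WorkUnit`, `toy_map`, `volunteer_solve`, `server_validate` and `reward` are invented names; the “mapper” only finds exact matches with `str.find`, whereas real HTS mappers tolerate mismatches and indels and write BAM files; and validating by re-mapping a random sample of reads is one assumed strategy among several possible ones.

```python
import random
from dataclasses import dataclass


@dataclass
class WorkUnit:
    """Hypothetical batch of reads handed to one volunteer."""
    unit_id: int
    reads: list[str]


def toy_map(reference: str, read: str) -> int:
    # Stand-in for a real aligner: first exact-match position, or -1.
    return reference.find(read)


def volunteer_solve(reference: str, unit: WorkUnit) -> list[int]:
    # The "useful" proof-of-work: map every read in the unit.
    return [toy_map(reference, read) for read in unit.reads]


def server_validate(reference: str, unit: WorkUnit, positions: list[int],
                    sample_size: int, rng: random.Random) -> bool:
    # Reject incomplete submissions outright.
    if len(positions) != len(unit.reads):
        return False
    # Re-map only a random sample of reads, which is cheaper than redoing the unit.
    k = min(sample_size, len(unit.reads))
    sampled = rng.sample(range(len(unit.reads)), k)
    return all(toy_map(reference, unit.reads[i]) == positions[i] for i in sampled)


def reward(valid: bool, coins_per_unit: int = 1) -> int:
    # Coins are minted only for results that pass validation.
    return coins_per_unit if valid else 0


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTA"
    unit = WorkUnit(1, ["CGTTAG", "GATAGG", "TTTTTT"])
    positions = volunteer_solve(reference, unit)
    ok = server_validate(reference, unit, positions, 2, random.Random(0))
    print(positions, ok, reward(ok))
```

Spot-checking keeps the server’s cost low, because it re-maps only `sample_size` reads per unit. The trade-off is probabilistic: a volunteer who fabricates a fraction f of the results passes a check of k sampled reads with probability at most (1 − f)^k, so larger samples buy more confidence at the price of more server-side work.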
Where and by whom can it be used? Who is using it already? How can one start using it?
The source code is available on GitHub, but it is still in alpha. We are currently the only ones using it, for testing purposes, but we will deploy the system this summer for wider use. We will first use it within the university campus for stress tests, and then recruit more beta users. Although most of the work is done by the volunteers, the server side still has work to do: it validates the returned BAM files and ensures data privacy through obfuscation. Our current server infrastructure is therefore not yet sufficient for worldwide use.
How could the research community benefit from using Coinami?
It was previously estimated that 1 million genomes would be sequenced by the end of 2017. I believe we will surpass this number. The HiSeq X Ten platform, for instance, can sequence about 18,000 genomes per year, and many sequencing centers have installed these systems in addition to systems with comparatively lower throughput. Many “100K Genome” projects have started, such as the UK100K, the Genome Asia, and others. We keep generating more and more data that need to be processed, but no data center, cluster, or cloud computing platform has infinite capacity. In addition to this new-data problem, we often need to re-align existing data to new reference genome versions, which are released every 3-4 years. There is also a shift towards non-linear reference genomes (“pan-genomes”), which will require further remapping of old data. We believe Coinami can help with the remapping problem, lifting a substantial compute burden from data centers and clouds so that they can focus on keeping up with new data.
Do you have any plans to promote Coinami?
We started with a preprint, which is available on arXiv. The preprint was picked up by an Italian newspaper, so I guess Coinami is better known in Italy than in Turkey right now! The blockchain technology behind Bitcoin and Coinami has many other use cases, and there is a small group within the Global Alliance for Genomics and Health (GA4GH) that aims to use this technology for federated sharing of cancer somatic variant data. We have been in touch with this group to further develop and eventually fully deploy Coinami. We also have a simple web page, which we will improve when we are ready for beta testing, and even a Twitter profile.
Are you part of a project that facilitates data sharing for genomics or other related research?
Are you directly or indirectly involved in the Open Science movement?
Would you like to be featured on our blog?
We would love to hear from you.