Interview with Dr Clare Turnbull – Clinical Lead for 100,000 Genomes Cancer Programme, Genomics England. Clare is one of the many great speakers at the BioData World Congress in Hinxton, UK (26-27 October 2016); her presentation will be “Next-Generation genomics for germline cancer susceptibility”.


Please introduce yourself, tell us about your background and your current role.

My role in the 100,000 Genomes Project is clinical lead for cancer data. In the cancer programme, we are sequencing tumour-normal pairs acquired from a variety of tumour types and clinical settings.

In addition, across the programme, we are reporting on relevant variants in germline cancer susceptibility genes. I work with the bioinformatics teams on the analysis and clinical interpretation of these data and on the return of findings to laboratories and clinicians in the NHS.

I am a clinical geneticist by training and manage patients with inherited susceptibility to cancer at Guy’s and St Thomas’ Hospital. My research background is in cancer genomics and genetic susceptibility to cancer, and I have a small research team at the Institute of Cancer Research in London.


Please tell us more about Genomics England and the 100,000 Genomes Project.

The 100,000 Genomes Project is a government-funded programme sequencing 100,000 genomes of NHS patients. Genomics England is a company owned in its entirety by the Department of Health, established to deliver the 100,000 Genomes Project and to set up longer-term infrastructure for delivering large-scale sequencing within the NHS.

The 100,000 Genomes Project has two main components: the Cancer Programme and the Rare Disease Programme. In the Cancer Programme, we are sequencing matched tumour-normal pairs. In addition, where informative for clinical research, further longitudinal, multi-region and multi-site tumour samples will be obtained from some patients. Detailed longitudinal clinical and treatment data are also being collected for these patients.

In the Rare Disease Programme, we are recruiting patients with one of >160 rare diseases, with the intention of identifying the causative variant underlying their disease. To understand the genome of the patient, we collect samples across the family: typically we seek to obtain a parent-offspring trio, as this is usually most informative. We therefore anticipate including at least 15,000-20,000 families in the Rare Disease Programme.

To date, we have sequenced over 12,000 genomes in total, and this number is rising every day as more participants join the Project.


What are the biggest hurdles in this project?

To derive long-term value from the 100,000 Genomes Project, we need to develop infrastructure to deliver genome sequencing at scale to the NHS. To do this, we need to establish end-to-end processes and pipelines for obtaining, storing and processing high-quality samples, clinical data and sample metadata collected from our hospitals in the NHS Genomic Medicine Centres.

There are many logistical hurdles and challenges to establishing and embedding such processes across >80 different NHS Trusts.

Some of these challenges pertain to the different informatics systems used across the NHS and to developing central data structures that work with these different systems. For the Cancer Programme, there are many interesting challenges in Molecular Pathology around acquiring and processing tumour samples from which to generate DNA of sufficiently high quality and quantity. There are also exciting challenges around setting up the best systems for the return, interpretation and implementation of results by NHS laboratories and clinicians. Such challenges are inherent to this being an implementation project across multiple centres rather than a pure research project.

Currently, NHS genetics services comprise a set of regional diagnostic laboratories and clinical genetic services, essentially functioning independently of each other, with different testing eligibilities, processes and sequencing techniques, and with their data stored locally. Through the 100,000 Genomes Project we are creating the central structure and harmonising the ways in which samples and data are handled across different clinical units and laboratories, and then putting all the data generated in a single central place to be accessed by all these different clinical units.


Once you solve all these problems and collect all the data, what will happen to it? Who will be able to access and analyse it?

Only NHS clinicians will have access to identifiable data, which will enable them to feed back results to their patients once they have validated the reports they receive from Genomics England.

Partners from academia and industry will be able to access de-identified data for specific research purposes via the Genomics England data centre.

In late 2014, we put out a call for expressions of interest for researchers to come together in groups by theme, and from this we have inaugurated about 40 domains under GeCIP, the Genomics England Clinical Interpretation Partnership. These domains include rare disease themes such as vision and hearing, inherited cancer, cardiovascular and immunological; cancer themes such as breast cancer and colorectal cancer; and cross-cutting themes such as population genomics, analytical methodologies, ethics and health economics. The detailed research applications from the GeCIP domains are scrutinised by our Science Committee and Access Review Committee. Following approval, the domains gain access to a data embassy in which to analyse the de-identified clinical and genome data relevant to their area.


What about industry? How will they be involved?

In the first instance, we brought together eleven companies involved in diagnostics, pharma or bioanalytics relating to genomics, and they have formed the Genomics Expert Network for Enterprises (GENE) consortium. GENE have worked closely with us during the setup of the programme and have also analysed the de-identified data through their data embassy. With partners from both industry and the clinical/academic community working on the same data in a pre-competitive environment, this co-working will facilitate earlier translation of any new findings into the development of therapies or diagnostics.


What impact do you expect to have in 5-10 years?

The overarching aim of this programme is to develop the structures that will enable large-scale NGS to be used as a routine test in the health service and the data generated therein to be used for productive clinical research.

Extensive training and development of skills, building the NHS sequencing centre in Hinxton, developing the data architecture and pipelines and developing infrastructure for storing, sharing and returning genomic findings to the NHS are all part of establishing these structures.

Finally, through this programme we are generating a central repository of linked clinical and molecular data that faces both the NHS and, in a de-identified form, the research community. In the first instance this central repository will contain the data generated by the 100,000 Genomes Project, but this is only the first step in centralising all the molecular data generated from NHS patients. Rare diseases are rare and cancer is extremely complex; it is only through putting all our molecular data on all our patients in one place, with good clinical data, that we can really make serious inroads in research on these conditions in order to improve how we manage patients in the future.



To get a 20% discount for the registration for BioData World Congress in Hinxton on 26-27 October 2016, use the discount code ECQS.


Are you part of a project that facilitates data sharing for genomics or other related research?

Are you directly or indirectly involved in the Open Science movement?

Would you like to be featured on our blog?

We would love to hear from you.

Write to us or use our contact page to get in touch.