Data sharing is crucial for genomics research!

Our ongoing ‘Open Science Stars’ series has highlighted some of the vast variety of views, experiences, and facets of open science, and a cadre of great people working to drive real and positive change. This week, we spoke with Fiona Nielsen, who has founded two companies dedicated to the sharing of genomics data! Here’s her amazing story.

Hi Fiona! Thanks for joining us at the ScienceOpen blog. Could you start off by letting us know a bit about your background?

Pleased to join your blog series. 🙂

I am a bioinformatics researcher with a background in computer science. My first degree was a short computer science degree, which I then expanded by studying bioinformatics at the University of Southern Denmark, where I gradually moved more and more into genetics and DNA sequence analysis. After my master's I moved to Nijmegen, the Netherlands, where I studied for a PhD in bioinformatics at the NCMLS. During my time as a PhD student, my mother was diagnosed with cancer, and I lost my motivation to work on scientific topics far removed from patient impact. I moved to Cambridge, UK to work for Illumina, and after two years I decided to leave my 9-5 job to start my own project: I first founded the charity DNAdigest and later the company Repositive to enable better data sharing within genomics research.

Cute! Had to share this from the Repositive site (https://repositive.io/)
 

When did you first become interested in Open Access and Open Science? What was your initial reaction when you heard about it?

I do not recall when I first came across the terms Open Access and Open Science, but I do recall repeatedly hearing anecdotes from colleagues who could not access data or results from published papers. I also recall how I looked up to the progressive researchers who would “go all the way” and make all data and results available immediately, even before publication of a paper.

 

How much of a struggle was access to data in your experience as a researcher?

My bioinformatics research has always depended on having access to the right type of data for testing my hypotheses. For small-scale (algorithm) experiments, the data from colleagues or from an internal project was sufficient for me, but I always ended up looking for external data to validate my findings. I soon realised that 1) the right type of data is not easy to find and 2) the data sources are not easy to access. When I realised the bigger picture of how this data bottleneck is inhibiting progress in genetics research, I started thinking about how to make more data available and accessible for research.

How did you start the not-for-profit DNAdigest?

I got so frustrated with the lack of data access that I decided to leave my job at Illumina to found the charity DNAdigest to develop a community initiative to enable more efficient and ethical data sharing for the benefit of patients. This was in March 2013 and I immediately started running public events, workshops and hackdays.

 

Why did you choose to found Repositive as a spin-out of this company?

DNAdigest soon made lots of progress in building community and public engagement, but it was extremely difficult to raise funding to start building solutions. So in August 2014 Repositive was spun out of DNAdigest as a social enterprise: a commercial company driven by the same impact mission as the charity, to enable efficient and ethical data sharing for the benefit of patients. As a regular company, Repositive raised investment from angel investors to start building software to help researchers access more genomic data faster.

 

How important is data sharing in the field of genomics research?

Genomics is a data science. The human genome is vast – 3 billion base pairs. To make any significant findings from the data, you need lots of data for validation and comparison. It is impossible for any one research group, lab or institute to generate all the data that is relevant for any one disease, so data sharing and collaboration across institutions is paramount for the quality of genomics research.

Why is there such a huge gap between data generation and data accessibility in genomics research?

While virtually all of the research community recognises the need for data sharing and collaboration, it is an uphill battle to change the culture and incentive structure of academia. The most recognised metric for research output is the number of papers published in high-profile journals. As long as data sharing is not part of the agenda or incentive structure, it remains a side project – a nuisance – which is only taken care of at the last minute, if at all, when publishing papers. Researchers would rather spend their time writing papers and grants than spend time and effort making their data available and accessible.

What are some of the potential barriers to data sharing?

Data sharing in biomedical science, including genomics, is complicated further by the potentially personally identifiable information (PII) in the data. This means that data needs to be sufficiently de-identified, and access processes need to be put in place to ensure that data is used in accordance with the consent given by the data/sample donor.

 

Whose responsibility do you think it is to lead this change?

I think all researchers who are using biomedical data should think twice about the impact of the data. The individuals, often patients with serious diseases, gave consent for their samples and data to be used for research in order to make a difference and help future patients. Not sharing data, not making the most of data, is not meeting the expectations of the data donors. So even if it takes time and effort, it is, in my opinion, part of the obligation of the research community to enable data sharing to maximise research impact. Next to incorporating this mindset among researchers, I think the power to change the incentive system lies with the research funders. The day that funders require that all your biomedical research data must be made available for reuse by the research community before they will give you another research grant, I am sure that you will start taking data sharing seriously. 🙂

Do you think we need to give more consideration to data as a measurable output from research, rather than focussing almost exclusively on papers?

I think data should be cited and data usage measured with as much emphasis as we currently give to papers and paper citations. Today the tools for doing this are already available: data deposited in repositories are given digital object identifiers (DOIs), and it is possible to make specific data publications to draw attention to the effort that was put into creating the data and to suggest possible reuse. All of these research outputs can and should be cited to encourage more data to be made available.

 

If you could give one piece of advice to students looking to pursue a research career, what would it be?

I usually give three tips to young researchers who work with biomedical or genomics data:

  1. Cite data – to encourage more data sharing
  2. Publish data – you will get more citations and visibility the more data you publish
  3. Understand consent – you must always understand the consent for the data you use.

The more you share, the more the research community will benefit, and the more visibility and credibility you will get for your further research career.

Thanks, Fiona!

Thanks for inviting me to contribute 🙂

 

Credit: Fiona Nielsen

Fiona Nielsen, founder and CEO of DNAdigest and Repositive, is a bioinformatics scientist turned entrepreneur. She used to work at Illumina developing tools for interpretation of next-generation sequencing data and analysing cancer and FFPE samples sequenced on the latest technology. There she realised how difficult it is to find and get access to genomics data for research, which led to DNAdigest being founded as a charity to promote best practices for efficient and ethical data sharing – aligning the interests of patients and researchers. In August 2014 Repositive Ltd was spun out of DNAdigest as an entity to develop and provide novel software tools and mechanisms for sharing data, including a data brokering mechanism that enables easy access to anonymised aggregated data. Fiona was nominated for the 2013 WiSE award for Entrepreneurship and Innovation.