Scientists are using machine learning to interpret “dark matter” DNA

Scientists at Gladstone Institutes are using machine learning to target genetic disorders in so-called genomic “dark matter”.

The computational method being used, called TargetFinder, predicts where non-coding DNA – the DNA that does not code for proteins – interacts with genes. By analysing big data, researchers are able to connect mutations in genomic “dark matter” with the genes they affect, potentially revealing new targets for genetic disorders.

In the study, published in Nature Genetics, the team from Gladstone Institutes looked at fragments of non-coding DNA called enhancers, which act like an instruction manual for a gene, dictating when and where it is turned on.

“Most genetic mutations that are associated with disease occur in enhancers, making them an incredibly important area of study,” said the study’s senior author, Katherine Pollard. “Before now, we struggled to understand how enhancers find the distant genes they act upon.”

The new study revealed that, on a strand of DNA, enhancers can be millions of letters away from the gene they influence.

However, using machine learning technology, the researchers were able to analyse hundreds of existing datasets to look for patterns in the genome and identify where a gene and enhancer interact.

They discovered that when an enhancer is far away from the gene it affects, the two connect by forming a three-dimensional loop, like a bow on the genome.

“It’s remarkable that we can predict complex three-dimensional interactions from relatively simple data,” said Sean Whalen, a biostatistician at Gladstone. “No one had looked at the information stored on loops before, and we were surprised to discover how important that information is.”
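
To give a feel for what predicting interactions from simple data can look like, here is a minimal Python sketch. It is not TargetFinder's code or algorithm: it trains a plain logistic regression on synthetic data, and the three features (signal at the enhancer, in the window between enhancer and promoter, and at the promoter), the labelling rule and all function names are invented purely for illustration.

```python
import math
import random


def make_synthetic_pairs(n, seed=0):
    """Generate hypothetical enhancer-promoter pairs (not real genomic data).

    Each pair has three made-up features in [0, 1]:
    [enhancer_signal, window_signal, promoter_signal].
    The label rule is an assumption for this demo only: a pair "interacts"
    when the window (loop) signal, plus a little enhancer signal, is high.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        enhancer, window, promoter = rng.random(), rng.random(), rng.random()
        X.append([enhancer, window, promoter])
        y.append(1 if 0.8 * window + 0.2 * enhancer > 0.5 else 0)
    return X, y


def predict_proba(w, b, x):
    """Probability of an interaction under the logistic model."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias with full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    """Share of pairs whose predicted class matches the label."""
    hits = sum((predict_proba(w, b, xi) >= 0.5) == (yi == 1) for xi, yi in zip(X, y))
    return hits / len(X)


X, y = make_synthetic_pairs(200)
w, b = train_logistic(X, y)
print("training accuracy:", round(accuracy(w, b, X, y), 2))
print("weights [enhancer, window, promoter]:", [round(v, 2) for v in w])
```

Because the synthetic labels are driven mostly by the window feature, the model should learn to weight it most heavily, a toy echo of the finding that the DNA between an enhancer and its gene carries useful signal.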

The new computational approach is a much cheaper and less time-consuming way to identify gene-enhancer connections in the genome, as performing experiments in the lab can cost millions of dollars and take years of research.

The technology also gives an insight into how DNA loops form and how they might break in disease.

“Our ability to predict the gene targets of enhancers so accurately enables us to link mutations in enhancers to the genes they target,” said Pollard. “Having that link is the first step towards using these connections to treat diseases.”

Gladstone is set to offer all of the code and data from TargetFinder online for free.

Researchers believe they can predict how long you will live using big data

Researchers from the University of East Anglia (UEA) claim they can predict life expectancy by studying data collected by healthcare providers.

Statisticians, computer scientists and medics from UEA have launched a four-year project that will test how factors such as lifestyle, medical conditions and medical interventions affect “mortality and longevity”.

“People around the world are living longer. We want to develop software tools that use big data routinely collected by healthcare providers to forecast longevity,” said lead researcher, Prof Elena Kulinskaya.

“When we talk about Big Data what we mean is data that is vast, complex and difficult to analyse. We want to be able to use it to see statistical life expectancy trends, based on large-scale population-based data collected over the long term.”
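
As a rough illustration of the arithmetic behind a life expectancy figure, the Python sketch below sums survival probabilities over a simple life table. The mortality curve is a made-up Gompertz-style formula, and the 10 per cent risk reduction stands in for a hypothetical treatment effect; neither the numbers nor the function names come from the UEA project.

```python
import math


def remaining_life_expectancy(death_probs):
    """Expected remaining years of life from a list of yearly death probabilities.

    death_probs[k] is the chance of dying during year k, given survival
    to the start of that year. People who die within a year are assumed
    to live half of it on average.
    """
    alive = 1.0           # probability of still being alive at the start of the year
    expected_years = 0.0
    for q in death_probs:
        expected_years += alive * (1.0 - 0.5 * q)
        alive *= 1.0 - q
    return expected_years


def hypothetical_death_probs(start_age, max_age=110, risk_multiplier=1.0):
    """Illustrative mortality curve only; the constants are invented."""
    probs = []
    for age in range(start_age, max_age):
        q = 0.00005 * math.exp(0.09 * age) * risk_multiplier
        probs.append(min(q, 1.0))
    probs.append(1.0)     # close the table: nobody is assumed to survive past max_age
    return probs


baseline = remaining_life_expectancy(hypothetical_death_probs(65))
treated = remaining_life_expectancy(hypothetical_death_probs(65, risk_multiplier=0.9))
print("baseline years remaining at 65:", round(baseline, 1))
print("with 10% lower yearly risk:", round(treated, 1))
```

A real forecast would replace the invented curve with death rates estimated from population data, which is where large healthcare datasets come in.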

The research team imagine that using big data to predict lifespan will benefit both patients and healthcare providers.

Knowing how long you are likely to live benefits patients by helping them plan for retirement, while healthcare providers will enjoy savings from knowing how particular drugs such as statins or beta-blockers affect longevity.

“Pension contributions were recently freed, so now people can take their pension pots out and use them as they wish. But to be able to plan for retirement, and to understand how much you can spend, it is good to have some idea of your life expectancy,” said Prof Kulinskaya.

“As well as being useful for people planning retirement, it is also important for GPs deciding whether and when to prescribe particular drugs or how to advise their patients. It could also benefit local health authorities planning resources, and insurance companies deciding on the size of pension you can buy with your pension pot.”

The research project, called ‘Use of Big Health and Actuarial Data for Understanding Longevity and Morbidity Risks’, is particularly focused on finding out how various chronic diseases and their treatments impact life expectancy.

However, the team from UEA aren’t the only researchers trying to unlock the value of big data for healthcare.

Big Data Partnership are currently involved in a project that aims to use big data analysis to reduce complications associated with type 2 diabetes.