Genome World Window - Stephen Brobst



Started ‎09-19-2017 by
Modified ‎09-19-2017 by





Stephen Brobst


1000 genome project, art of analytics, healthcare, advanced analytics


‘Genome World Window’ uses data from the 1000 Genomes Project (2008-2015, now being extended) to show genetic variations (and similarities) across multiple human populations and geographies.

The 1000 Genomes Project sequenced the genomes of at least 1,000 people, creating the most detailed and medically useful catalog of human genetic variation to date. It has helped researchers discover more than 100 regions of the genome that contain genetic variants associated with common diseases such as diabetes, coronary artery disease, prostate and breast cancer, rheumatoid arthritis, inflammatory bowel disease, and age-related macular degeneration.

In ‘Genome World Window,’ each frame shows a different community or geography within the 1000 Genomes Project and was built purely from the genome data. The observer can clearly see the variations between communities, which shows that large-scale genome data provides clear insight into geographic communities across the globe.

The goal of the project is to demonstrate the value of large-scale genome analytics, using high-intensity super graphic methods to better understand the genetic patterns of cancer and to develop personalized medical treatments aligned to the genetic composition of individuals.

This visualization shows a collection of Quartal Super Graphics created by VizExplorer, sitting on top of a Teradata Database using query pushdown for large-scale data processing.

The large-scale processing began with an in-database, recursive implementation of the Quartal tree algorithm, which mapped the positional information for the entire 1000 Genomes population onto a common, hierarchical Quartal grid. A database query then built the subset of data for each community within the total population, and each subset was rendered as a heat map in its own frame.
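The steps above can be illustrated with a minimal sketch. The details of the Quartal tree algorithm are not given here, so this example substitutes a generic quadtree-style recursive subdivision as a stand-in, and it runs in plain Python rather than in the database. The function names, the assumption that each record already carries normalized (x, y) coordinates in the unit square, and the sample coordinates are all hypothetical; only the population codes GBR and YRI come from the 1000 Genomes Project.

```python
def quad_cell(x, y, depth):
    """Recursively locate the cell containing (x, y) in the unit square.

    Returns a tuple of quadrant indices (0..3), one per level of the
    hierarchy. This is a generic quadtree descent, used here only as a
    stand-in for the Quartal tree algorithm (an assumption).
    """
    def descend(x0, y0, size, level):
        if level == depth:
            return ()
        half = size / 2
        qx = 1 if x >= x0 + half else 0  # right half?
        qy = 1 if y >= y0 + half else 0  # upper half?
        quadrant = qy * 2 + qx
        return (quadrant,) + descend(x0 + qx * half, y0 + qy * half,
                                     half, level + 1)

    return descend(0.0, 0.0, 1.0, 0)


def path_to_rowcol(path):
    """Convert a quadrant path into (row, col) on the finest grid level."""
    row = col = 0
    for quadrant in path:
        row = row * 2 + (quadrant >> 1)
        col = col * 2 + (quadrant & 1)
    return row, col


def community_heatmaps(records, depth):
    """Build one count grid (heat map) per community on a shared grid.

    `records` is an iterable of (community, x, y) tuples. Every community
    uses the same grid, so the resulting frames are directly comparable.
    """
    side = 2 ** depth
    grids = {}
    for community, x, y in records:
        grid = grids.setdefault(community,
                                [[0] * side for _ in range(side)])
        row, col = path_to_rowcol(quad_cell(x, y, depth))
        grid[row][col] += 1
    return grids


# Illustrative records only: the coordinates are made up, not real data.
records = [
    ("GBR", 0.10, 0.20),
    ("GBR", 0.12, 0.22),
    ("GBR", 0.60, 0.10),
    ("YRI", 0.80, 0.90),
]
heatmaps = community_heatmaps(records, depth=2)
```

Placing every community on one shared hierarchical grid is what makes the frames comparable side by side; in the pipeline described above, the per-community subsetting was done by a database query rather than by an in-memory loop.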

Finally, the frames were assembled into a single graphic so the pattern of sequence data could be observed across communities for the entire 1000 Genomes Project. Genomic data is extremely large; a database of just 25,000 tumors contains more than 75 trillion data records.
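As a rough sanity check on those figures, the short calculation below divides the total record count by the number of tumors. It uses only the two numbers quoted above; the variable names are illustrative, and reading the result as roughly one record per base pair is an assumption, not something stated in the source.

```python
# Figures quoted in the text above.
tumors = 25_000
total_records = 75 * 10**12  # "more than 75 trillion" taken as a lower bound

# Average number of records per tumor (integer division is exact here).
records_per_tumor = total_records // tumors

print(f"{records_per_tumor:,} records per tumor (lower bound)")
```

That works out to about 3 billion records per tumor, the same order of magnitude as the roughly 3 billion base pairs in a human genome.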

The analysis helps decode the human genome, using the data to accelerate the development of treatments and medication. It’s a prime example of how analytics can be used in modern medical research to identify cause and effect, correlations, and links: vital keys that unlock disease management and cures. The genome analysis also helps map historical migration patterns (the slave trail, for instance) and shows how demographics correlate with social factors.




The Art of Analytics
To learn more about the Art of Analytics business case visualization initiative, please visit us on
Aster, Teradata, and the Teradata logo are registered trademarks of Teradata Corporation and/or its affiliates in the U.S. and worldwide
