Art of Analytics: Genome World Window

About the Insights

This data visualization shows genetic variations (and similarities) across multiple human populations and geographies, using data from the 1000 Genomes Project. Each frame shows a different community or geography within the 1000 Genomes Project and was built from the raw genome data. The observer can clearly see the variations between communities, showing that large-scale genome data provides clear insight into geographic communities across the globe. The goal of the project is to demonstrate the value of large-scale genome analytics, using high-intensity super graphic methods, to better understand the genetic patterns of cancer and how to develop personalized medical treatments aligned to the genetic composition of individuals.

About the Analytics

This visualization is a collection of Quartal Super Graphics created with VizExplorer, running on top of a Teradata relational database and using query pushdown so that large-scale data processing happens inside the database.
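
Query pushdown means the heavy aggregation runs inside the database, and only the summarized result travels to the visualization layer. The sketch below illustrates the idea under stated assumptions: Python's built-in `sqlite3` module stands in for a Teradata connection, and the table `variants` with columns `community`, `cell_x`, and `cell_y` is a hypothetical schema, not VizExplorer's actual query.

```python
import sqlite3


def pushdown_cell_counts(rows):
    """Aggregate (community, cell_x, cell_y) rows inside the database engine.

    Hypothetical schema for illustration only; sqlite3 stands in for Teradata.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE variants (community TEXT, cell_x INTEGER, cell_y INTEGER)"
        )
        conn.executemany("INSERT INTO variants VALUES (?, ?, ?)", rows)
        # The GROUP BY executes in the database, so the client receives one
        # row per (community, cell) instead of every raw record.
        cur = conn.execute(
            """
            SELECT community, cell_x, cell_y, COUNT(*) AS n
            FROM variants
            GROUP BY community, cell_x, cell_y
            ORDER BY community, cell_x, cell_y
            """
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    raw = [("A", 0, 0), ("A", 0, 0), ("A", 1, 0), ("B", 0, 1)]  # made-up rows
    for row in pushdown_cell_counts(raw):
        print(row)
```

With a real warehouse the raw rows would already live in the database; the point of the design is that the client only ever sees the small aggregated result.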

Large-scale processing begins with the quartal tree algorithm, an in-database recursive algorithm that maps the positional information for the entire 1000 Genomes population onto a common hierarchical quartal grid. A database query then extracts the subset of data for each community within the total population, and each subset is rendered as the heatmap shown in one frame. Finally, the frames are assembled into a graphic made up of 'small multiples' so that patterns in the sequence data can be compared across all the communities in the 1000 Genomes Project. Genomic data is extremely large: a database of just 25,000 tumors implies more than seventy-five trillion data records.
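
The text does not spell out how the quartal tree algorithm works. A minimal sketch, assuming it behaves like a quadtree (each level splits a cell into four quadrants) and that positions have already been normalized to the unit square, is shown below. The functions `quartal_key` and `community_heatmaps` are hypothetical and run in plain Python rather than inside the database; they only illustrate the shape of the pipeline: bin every record onto one shared grid, then count cells per community to get one heatmap frame each.

```python
from collections import Counter


def quartal_key(x, y, depth):
    """Return the quadrant path of point (x, y) in the unit square [0, 1) x [0, 1).

    Each level splits the current cell into four quadrants, numbered
    0 (lower-left), 1 (lower-right), 2 (upper-left), 3 (upper-right),
    and appends that digit, so a key of length `depth` names one grid cell.
    """
    if depth == 0:
        return ""
    qx = 1 if x >= 0.5 else 0
    qy = 1 if y >= 0.5 else 0
    # Rescale the point into the chosen quadrant and recurse one level deeper.
    return str(2 * qy + qx) + quartal_key(2 * x - qx, 2 * y - qy, depth - 1)


def community_heatmaps(records, depth):
    """Build one cell-count heatmap per community (one frame per small multiple).

    `records` is an iterable of (community, x, y) tuples. Every community is
    binned at the same grid depth, so the frames can be compared side by side.
    """
    frames = {}
    for community, x, y in records:
        frames.setdefault(community, Counter())[quartal_key(x, y, depth)] += 1
    return frames


if __name__ == "__main__":
    sample = [("A", 0.1, 0.1), ("A", 0.2, 0.2), ("B", 0.9, 0.9)]  # made-up points
    for community, frame in sorted(community_heatmaps(sample, depth=2).items()):
        print(community, dict(frame))
```

Because every community shares the same key space, two frames differ only where their counts differ, which is what makes the small-multiples comparison meaningful. As a rough check on the volume figure: 75 trillion records divided by 25,000 tumors is 3 billion records per tumor, which is about the number of base positions in one human genome; that reading of the figure is an interpretation, not something the text states.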

About the Analyst

Andrew is the Chief Technology Officer for VizExplorer. He holds a Bachelor of Surveying from Otago University and a Diploma of Computer Science from Victoria University. A cartographer by training, he is credited with more than 60 patents and inventions in the areas of cartography, data visualization, and high-performance database design. He and his team have received two Smithsonian Laureate awards for heroism in information technology related to data visualization. Andrew has co-authored a book on mathematical gaming analytics as well as more than 60 articles on data visualization and advanced analytics. Andrew was born and raised in the South Island of New Zealand. He now lives in California with his wife and four children.

Stephen Brobst

Stephen is the Chief Technology Officer for Teradata Corporation. Stephen did his graduate work in Computer Science at the Massachusetts Institute of Technology, where his Master's and PhD research focused on high-performance parallel processing. He also completed an MBA with joint course and thesis work at the Harvard Business School and the MIT Sloan School of Management. During Barack Obama's first term, he was appointed to the Presidential Council of Advisors on Science and Technology (PCAST) in the working group on Networking and Information Technology Research and Development (NITRD). He was recently ranked by ExecRank as the #4 CTO in the United States (behind the CTOs of Amazon.com, Tesla Motors, and Intel) out of a pool of more than 10,000 CTOs.

Stephen is the data guy. Andrew is the visualization guy. Together they have been teaching advanced data visualization for more than ten years at The Data Warehousing Institute and in other forums. Their course includes an in-depth examination of the patterns in the genomics super graphic. Stephen and Andrew are both avid outdoorsmen and have backpacked together in New Zealand and across the USA.