Learn Data Science

Looking Glass

Explore and Discover

The latest articles, videos and blog posts speak to those interested in analytics-based decision making. Browse all content in the Teradata Data Science Community to gain valuable insights.


The value of mapping customer journeys and how it’s possible to do so accurately and efficiently, even in a B2C industry.


To learn more, click HERE


Vantage is not just for the data scientist. Although we use that term as if it described a monolith, which the profession laughably is not, we built Vantage precisely because not everyone is, or wants to be, a data scientist.


To learn more, click HERE




About the Insights

This data visualization shows genetic variations (and similarities) across multiple human populations and geographies using data from the 1000 Genomes Project. Each frame shows a different community or geography within the 1000 Genomes Project and was built from the pure genome data. The observer can clearly see the variations between communities, showing that large-scale genome data provides clear insight into geographic communities across the globe. The goal of the project is to prove the value of large-scale genome analytics using high-intensity super graphic methods to better understand the genetic patterns of cancer and how to develop personalized medical treatments aligned to the genetic composition of individuals.


About the Analytics

This visualization shows a collection of Quartal Super Graphics created by VizExplorer sitting on top of a Teradata relational database using query pushdown for large-scale data processing.


The large-scale processing starts by applying the quartal tree algorithm, implemented as an in-database recursive algorithm, which processes the positional information of the entire 1000 Genomes population into a common hierarchical quartal grid. A database query is then used to build the subset of data for each of the corresponding communities within the total population. The subset of data is used to render the heatmap shown on each frame. Finally, the frames are assembled into a graphic made up of 'small multiples' so the pattern of sequence data can be observed across the communities in the entire 1000 Genomes Project. Genomic data is extremely large: a database of just 25,000 tumors implies over seventy-five trillion data records.
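The page does not describe the quartal tree algorithm in detail. As a rough illustration only, the sketch below assumes it behaves like a quadtree-style recursive subdivision of a unit square, and that each record has already been mapped to a hypothetical (x, y) position. The function names and the sample records are invented for the example; the real processing runs in-database.

```python
from collections import Counter

def quartal_path(x, y, depth):
    """Return the quadrant digits (0-3) locating (x, y) in the unit square, one per level."""
    if depth == 0:
        return ""
    # Pick the quadrant at this level: qx = 1 for the right half, qy = 1 for the upper half.
    qx, qy = int(x >= 0.5), int(y >= 0.5)
    # Rescale the point into the chosen quadrant and recurse one level deeper.
    return str(qx + 2 * qy) + quartal_path(2 * x - qx, 2 * y - qy, depth - 1)

def heatmap_counts(records, community, depth):
    """Count records per grid cell for one community (the subset behind one frame)."""
    return Counter(
        quartal_path(x, y, depth)
        for comm, x, y in records
        if comm == community
    )

# Hypothetical (community, x, y) records, not real genome data.
records = [
    ("A", 0.10, 0.10),
    ("A", 0.20, 0.20),
    ("A", 0.90, 0.90),
    ("B", 0.60, 0.10),
]
counts_a = heatmap_counts(records, "A", depth=2)
counts_b = heatmap_counts(records, "B", depth=2)
```

Because every community is binned into the same grid, the per-cell counts of different frames can be compared directly, which is what makes the small-multiples layout readable.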


About the Analyst

Andrew is the Chief Technology Officer for VizExplorer. He holds a Bachelor of Surveying from Otago University and a Diploma of Computer Science from Victoria University. He is a cartographer by training and has created over 60 patents and inventions in the areas of cartography, data visualization, and high performance database design. He and his team have been awarded two Smithsonian laureates for heroism in information technology related to data visualization. Andrew has co-authored a book on mathematical gaming analytics as well as over sixty articles in areas related to data visualization and advanced analytics. Andrew was born and raised in the South Island of New Zealand. He now lives in California with his wife and four children.


Stephen Brobst


Stephen is the Chief Technology Officer for Teradata Corporation. Stephen performed his graduate work in Computer Science at the Massachusetts Institute of Technology, where his master's and PhD research focused on high-performance parallel processing. He also completed an MBA with joint course and thesis work at the Harvard Business School and the MIT Sloan School of Management. During Barack Obama's first term he was also appointed to the President's Council of Advisors on Science and Technology (PCAST) in the working group on Networking and Information Technology Research and Development (NITRD). He was recently ranked by ExecRank as the #4 CTO in the United States (behind the CTOs from Amazon.com, Tesla Motors, and Intel) out of a pool of 10,000+ CTOs.


Stephen is the data guy. Andrew is the visualization guy. Together they have been teaching advanced data visualization for over ten years at The Data Warehousing Institute and in other forums. Included in this course is a deep examination of the patterns in the genomics super graphic. Stephen and Andrew are both avid fans of the outdoors and have backpacked together in New Zealand and across the USA.


The latest episode of the Teradata Datacast podcast is now available. To listen and subscribe, just search for “Teradata” in your favorite podcast app. Direct links for select podcast apps include:


Links: Apple Podcasts (iTunes) | Google Play | Spotify | Stitcher




About the Insights

This anonymized visualization is one of a series of analytics mapping the money flows between large Chinese companies for a Corporate Banking Risk Analysis project at a large Chinese bank. The analysis uses Fund Transfer transaction data to understand risk and uncover market opportunities.


In this graph, the dots (nodes) represent the companies, via their account holdings. The lines (edges) represent a transfer of funds between the companies, so each line shows a movement of money from one account to another.


The chart shows all the money flows between the different colored companies. We can map flows through 2, 3 and 4 subsequent transactions, starting for example from the light green company, to understand upstream supply chains and the interdependency companies have on each other.
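Tracing flows through a fixed number of subsequent transactions can be sketched as a breadth-first search over the transfer graph. This is a minimal sketch, not the Aster implementation: the function name, the (payer, payee) tuple layout and the company labels are assumptions made for the example.

```python
from collections import defaultdict

def flows_within_hops(transfers, start, max_hops):
    """Map each company reachable from `start` to the fewest transfers needed to reach it."""
    outgoing = defaultdict(set)
    for payer, payee in transfers:
        outgoing[payer].add(payee)

    hops = {start: 0}
    frontier = [start]
    for hop in range(1, max_hops + 1):
        next_frontier = []
        for company in frontier:
            for payee in outgoing[company]:
                if payee not in hops:  # keep the shortest chain only
                    hops[payee] = hop
                    next_frontier.append(payee)
        frontier = next_frontier
    del hops[start]  # report counterparties only, not the starting company
    return hops

# Hypothetical transfers as (payer, payee) pairs.
transfers = [
    ("green", "a"), ("a", "b"), ("b", "c"),
    ("c", "d"), ("d", "e"), ("x", "green"),
]
chain = flows_within_hops(transfers, "green", max_hops=4)
print(chain)  # {'a': 1, 'b': 2, 'c': 3, 'd': 4}
```

Company "e" is five transfers away and "x" only pays into "green", so neither appears in the result. Reversing each pair would trace where funds come from instead of where they go.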


To manage risk, the bank can identify any large exposure concentrations to groups of highly interdependent companies, where a single failure may bring down all the companies. It allows the bank to identify the critical companies in the supply chains and independently cross-check a company's cash flow to verify its health. It also helps identify fraud. The bank can check the true business activity of a company and can verify that loaned funds are used for their stated purpose. For example, it can flag a manufacturer that is investing funds speculatively in the stock market rather than paying suppliers, or one that took out a loan to build a factory but actually used the funds for short-term residential real estate trades.
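The page does not say how interdependent groups were identified. One possible reading, shown here purely as an assumption, is to treat companies as interdependent when money can flow from each to the other (a strongly connected group) and then total the bank's exposure per group. The function names and the exposure figures are hypothetical, and the simple reachability approach below only suits small graphs.

```python
def reachable(outgoing, start):
    """Return every company that funds from `start` can reach, including itself."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in outgoing.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def interdependent_groups(transfers):
    """Group companies that can each reach the other through fund transfers."""
    outgoing = {}
    for payer, payee in transfers:
        outgoing.setdefault(payer, set()).add(payee)
        outgoing.setdefault(payee, set())
    reach = {company: reachable(outgoing, company) for company in outgoing}

    groups, assigned = [], set()
    for company in sorted(outgoing):
        if company in assigned:
            continue
        # Mutual reachability defines the group this company belongs to.
        group = {other for other in reach[company] if company in reach[other]}
        assigned |= group
        groups.append(group)
    return groups

def exposure_by_group(groups, exposure):
    """Total exposure for each group of two or more companies, largest first."""
    return sorted(
        ((sum(exposure.get(c, 0) for c in g), sorted(g)) for g in groups if len(g) > 1),
        reverse=True,
    )

# Hypothetical transfers and loan exposures.
transfers = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
exposure = {"a": 5, "b": 3, "c": 2, "d": 10}
concentrations = exposure_by_group(interdependent_groups(transfers), exposure)
print(concentrations)  # [(10, ['a', 'b', 'c'])]
```

Company "d" receives funds but sends none back, so it is not part of the interdependent group even though its own exposure is large.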


For marketing, it highlights gaps in the bank's servicing. Points where high volumes of funds flow out of (or into) the chains identify high-value prospect companies. For existing clients, it reveals any high-value gaps in service provision for wider financial services such as financing, clearing and risk management.


About the Analytics

This analysis uses Teradata Aster and Aster Lens. The transaction data loaded was very large: 60,802,990 records for over 670,000 companies. The company records contain industry classification codes so we can understand their business activity. For this chart, PageRank was used to select the 32 most important customers, and we included all the relevant counterparties with total transactions greater than or equal to CNY 700,000 (USD 115k).
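To make the selection step concrete, here is a minimal PageRank sketch using plain power iteration. It is unweighted and uses the common damping factor of 0.85; both are assumptions, since the page does not give the settings used in Aster. The edge list and the `top_k` helper are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Unweighted PageRank by power iteration over (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank

def top_k(rank, k):
    """Return the k highest-ranked nodes, best first."""
    return sorted(rank, key=rank.get, reverse=True)[:k]

# Hypothetical fund transfers: three companies pay "hub", which pays "a".
edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")]
rank = pagerank(edges)
print(top_k(rank, 2))  # ['hub', 'a']
```

In the chart described above, the same idea would be applied with k = 32, followed by a filter that keeps counterparties whose total transactions meet the CNY 700,000 threshold.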


In this graph, there are 3883 nodes and 3943 edges. The nodes represent the companies while the edges represent the cash flows between the companies.


About the Analyst

Qiling (Mary) Shi is part of a pioneering group of Chinese Data Scientists that have been partnering with the banks in China to experiment with large-scale risk analytics using high-intensity super graphic methods. Their goal is to uncover new ways of managing risk in China's highly complex commercial system. Their work on corporate customers, including 'Fund Fountains', is just one example of a series of innovations that this talented group have given to the wider banking world, to help de-risk our financial systems.


Qiling is a presales consultant for Teradata China's Aster & Hadoop Big Data Center Of Excellence (COE). Qiling received her PhD in Applied Mathematics from the University of Central Florida. She is currently doing her MBA part-time at the University of Delaware. Prior to Teradata, she worked in the risk management department of PNC Bank in Pittsburgh for over 2 years. During that time, she developed many algorithms to fight fraud and money laundering, several of which were reported to the Office of the Comptroller of the Currency by PNC. She has also developed computer programs and presented them at SAS conferences while working for the Computer Sciences Corporation.



To find answers to their toughest challenges, companies seek pervasive data intelligence, which encompasses all data, all the time, while also assuming that data alone is not enough.


To learn more, click HERE


If you build on Teradata, the industry’s North Star, we de-risk your investment—and deliver real answers, not just analytics.


To learn more, click HERE




About the Insights

If the Vietnam war was a 'Television War', the on-going conflict in Iraq and Syria involving the organisation commonly referred to as the Islamic State (aka ISIS or ISIL) is a 'Social Media War'. Members of ISIS have regularly exploited social media such as Twitter to recruit followers, spread propaganda, and as a weapon of terror. They have been able to exert enormous influence on how the world perceives them by posting images and videos of extreme violence.


Through their activities they have been highly successful in radicalising disenfranchised Muslim youths and inspiring numerous terrorist attacks around the world. The ISIS Twitter machine is highly organised and tech-savvy, making it a difficult foe to combat. This problem is magnified by the dynamic nature of the ISIS social media network. Users are regularly suspended but continuously reconnect under new accounts, making them very difficult to track.


This visualisation shows a small part of the ISIS Twitter network, demonstrating the complexity of the social interactions and the difficulty faced in identifying and tracking individual people of interest. The problem is exacerbated by connections between ISIS members and news sources, political activists and academic researchers, all of which in turn have thousands of connections that are mostly benign.


The highlighted nodes in this visualisation represent a sample of users who all have a history of tweeting messages of hate and violence in support of the activities of ISIS. These users were identified using a sample of 33 supporters and members of ISIS, including known recruiters and propagandists. Graph analysis techniques were employed to analyse the connections between these radicals and their friends and followers, in order to identify the most influential users in the network. Advanced analytics can thus provide clarity where before there was only chaos and confusion. It allows us to study the social network connections of ISIS supporters and followers and ultimately identify the movers and shakers in their web.


About the Analytics

This Teradata Aster visualization shows a Gephi representation of an Aster AppCenter produced graph using the list of friends and followers of the 33 Twitter users who are members or supporters of ISIS and, in turn, their friends and followers. The total number of Twitter users analysed is 334,370, though only the top 10% of nodes are shown here. The plot uses eigenvector centrality to measure the influence of each user in the network, with the dots (nodes) representing the users and the lines (edges) indicating the other users each node is connected to. The highlighted nodes indicate the users that are known or suspected supporters/members of ISIS, while the highlighted edges trace the connections between these users and other users in the network.
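Eigenvector centrality scores a user highly when that user is connected to other highly scored users. The sketch below computes it by power iteration on a tiny undirected graph and then keeps a top fraction of nodes, mirroring the "top 10%" cut described above. Treating friend/follower links as undirected is an assumption, and the function names and sample edges are hypothetical.

```python
def eigenvector_centrality(edges, iterations=100):
    """Eigenvector centrality of an undirected graph, scaled so the top score is 1.0."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    score = {n: 1.0 for n in neighbours}
    for _ in range(iterations):
        # Adding the node's own score (iterating on A + I) keeps the same ranking
        # but avoids oscillation on bipartite graphs.
        new = {n: score[n] + sum(score[m] for m in neighbours[n]) for n in neighbours}
        norm = max(new.values())
        score = {n: value / norm for n, value in new.items()}
    return score

def top_fraction(score, fraction):
    """Return the highest-scoring share of nodes (at least one), best first."""
    keep = max(1, int(len(score) * fraction))
    return sorted(score, key=score.get, reverse=True)[:keep]

# Hypothetical connections between four accounts.
edges = [("hub", "u1"), ("hub", "u2"), ("hub", "u3"), ("u1", "u2")]
score = eigenvector_centrality(edges)
print(top_fraction(score, 0.25))  # ['hub']
```

Here "u3" scores lowest because its only connection is to "hub", while "u1" and "u2" reinforce each other as well.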


About the Analyst

Unfortunately it is a sign of the times we live in that the analyst for this piece must remain anonymous. Analytics plays a vital role in society across a wide variety of sensitive topics such as helping maintain our peace, medical research, prevention of the spread of disease, border integrity and managing risk deep within our financial system. The analysts that support this work, by its nature, are often working in secrecy due to sensitivity or to the need to protect them from becoming targets. Although we are unable to credit the analyst in this case, we do appreciate their daily work and the role all such analysts provide behind the scenes on behalf of our global society.







About the Insights

This data visualization represents claims made by a Service Provider against an Employer. The nodes in the middle of each small "explosion" represent the Service Provider, the nodes at the periphery represent Employers and the edges between them represent the relationships. The thickness of the edge is proportional to the value claimed.


The visualization was used to look at the relationship structures between service providers and employers. Service Providers help people find employment and also provide job seekers with ongoing support to retain their jobs. To be effective and to provide personalized and flexible services to job seekers, a Service Provider would typically have strong connections with a large number of Employers.


The visualization was used to look for unusual types of connections. For example:

  • An isolated group where a Service Provider is connected to many Employers but in a network separated from the rest of the graph.
  • One Service Provider connected to only one employer.
  • Loops where a Service Provider is also an employer.
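The three patterns listed above can also be checked programmatically. This is a minimal sketch under stated assumptions: claims are reduced to (provider, employer) pairs, an "isolated group" is taken to be any connected component other than the largest one, and all names are hypothetical.

```python
from collections import defaultdict

def unusual_patterns(claims):
    """Flag the three connection patterns in (provider, employer) claim pairs."""
    employers_of = defaultdict(set)
    neighbours = defaultdict(set)
    for provider, employer in claims:
        employers_of[provider].add(employer)
        neighbours[provider].add(employer)
        neighbours[employer].add(provider)

    # Connected components of the undirected claims network.
    components, seen = [], set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:
            for nxt in neighbours[stack.pop()]:
                if nxt not in component:
                    component.add(nxt)
                    stack.append(nxt)
        seen |= component
        components.append(component)
    components.sort(key=len, reverse=True)

    all_employers = set().union(*employers_of.values())
    return {
        # Every component except the largest is separated from the main graph.
        "isolated_groups": [sorted(c) for c in components[1:]],
        "single_employer_providers": sorted(
            p for p, employers in employers_of.items() if len(employers) == 1
        ),
        "provider_is_employer": sorted(set(employers_of) & all_employers),
    }

# Hypothetical claims; "P1" also appears as an employer of "P2".
claims = [
    ("P1", "E1"), ("P1", "E2"), ("P2", "E2"), ("P2", "E3"),
    ("P3", "E4"), ("P3", "E5"), ("P4", "E6"), ("P2", "P1"),
]
patterns = unusual_patterns(claims)
```

A flag from a check like this is a starting point for investigation rather than evidence of non-compliance, since small providers can legitimately work with a single employer.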

The driver behind this business case is a government body, the Department of Employment, which is responsible for monitoring the way employment services are delivered. The providers liaise with local employers and registered training organizations to provide the right mix of support for job seekers. The goal of this project was to investigate significant and systemic non-compliance in the claims made.


About the Analytics

This visualization shows a network graph created using Teradata Aster Lens. Claims data from the Department of Employment were loaded into the Teradata Aster discovery platform.


Claims were classified then tested for veracity by chronology, geolocation and variation; and analysed longitudinally for processing and event anomalies. Network graphs were generated to observe patterns of collusion. This provided a quick way to see which Service Providers claimed money against which Employers.


The visualization was also used for comparison between different time periods. Similar graphs can be constructed on a periodic basis to see if new isolations or patterns appear over time in the network.

About the Analyst

Tatiana Bokareva is a Data Scientist with the Teradata Advanced Analytics team in Australia and New Zealand. Originally from Moscow, Tatiana now lives in Sydney with her 2 young Australians, and she works with key clients in the financial services, government and telecommunications industries. Tatiana is a Teradata blogger and particularly enjoys combining her passion for analytics with her practical, hands-on experience in fashion and retail! She has published extensively and has presented at many international conferences.


Tatiana has a Bachelor's degree with Honours and a PhD in Computer Science from the University of New South Wales. Tatiana was runner-up for the Dean Postgraduate Research Award in the Information and Communication sector. She was awarded the prestigious Women in Engineering Scholarship. During her PhD she was trained at NICTA (National ICT Australia).


Tatiana's PhD was in the area of the Internet of Things (IoT); her work revolved around building self-reliant, self-healing, fault-tolerant sensor networks. She also held part-time research appointments at the university.





Are you ready to unlock the digital world of retail? Style & Statistics is an approachable, innovative guide to analytics in retail – encompassing everything from merchandising and pricing to consumer marketing and promotions. Brittany Bullard, a retail aficionado and analytics guru for SAS Retail, lays out how you can enhance your organization’s value and create your own success story.




Data Science Informative Articles and Blog Posts

Our blogs allow customers, prospects, partners, third-party influencers and analysts to share thoughts on a range of product and industry topics. We also have lots of content in the community, allowing you to gain valuable insights from Teradata data scientists.