Learn Data Science


Explore and Discover

The latest articles, videos, and blog posts speak to those interested in analytics-based decision making. Browse all content in the Teradata Data Science Community to gain valuable insights.


There’s a slew of data and a variety of analytics. How do companies make sense of the mess and embark on a clear path?

 


Technology alone is never enough; success requires a balance of technological and human capabilities. The same thinking should be extended to analytics, whether a rules engine, machine learning, deep learning, or artificial intelligence.  

 


The value of mapping customer journeys and how it’s possible to do so accurately and efficiently, even in a B2C industry.

 


Vantage is not just for the data scientist. Although that term suggests a laughable monolith, which the profession is not, we built Vantage precisely because not everyone is or wants to be a data scientist.

 


GenomeWorldWindow-StephenBrobst-Web-650.png

 

About the Insights

This data visualization shows genetic variations (and similarities) across multiple human populations and geographies using data from the 1000 Genomes Project. Each frame shows a different community or geography within the 1000 Genomes Project and was built from the pure genome data. The observer can clearly see the variations between communities, which shows that large-scale genome data provides clear insight into geographic communities across the globe. The goal of the project is to demonstrate the value of large-scale genome analytics using high-intensity super graphic methods to better understand the genetic patterns of cancer and to develop personalized medical treatments aligned to the genetic composition of individuals.

 

About the Analytics

This visualization shows a collection of Quartal Super Graphics created by VizExplorer sitting on top of a Teradata relational database using query pushdown for large scale data processing.

 

The large-scale processing starts by applying the quartal tree algorithm, implemented as an in-database recursive algorithm, which maps the positional information of the entire 1000 Genomes population onto a common hierarchical quartal grid. A database query then builds the subset of data for each community within the total population, and each subset is rendered as the heatmap shown in one frame. Finally, the frames are assembled into a graphic made up of 'small multiples' so that the pattern of sequence data can be observed across the communities in the entire 1000 Genomes Project. Genomic data is extremely large: a database of just 25,000 tumors implies over seventy-five trillion data records.
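The grid-then-subset pipeline described above can be illustrated with a small sketch. It is not the VizExplorer or Teradata implementation: it assumes the quartal grid behaves like a quadtree (each level splits a cell into four quadrants) and that each record already carries a two-dimensional position scaled to the unit square. The function names `quad_key` and `heatmap_counts` and the sample records are hypothetical.

```python
from collections import Counter

def quad_key(x, y, depth):
    """Return a hierarchical cell key for a point in the unit square.

    Assumption: a quadtree-style grid. Each level splits the current cell
    into four quadrants (0-3), so a key of length `depth` names one cell
    in a 2**depth by 2**depth grid.
    """
    if not (0.0 <= x < 1.0 and 0.0 <= y < 1.0):
        raise ValueError("x and y must lie in [0, 1)")
    key = []
    for _ in range(depth):
        x, y = x * 2, y * 2
        qx, qy = int(x), int(y)        # which half on each axis (0 or 1)
        key.append(str(qy * 2 + qx))   # quadrant index 0..3
        x, y = x - qx, y - qy          # re-normalise inside the chosen quadrant
    return "".join(key)

def heatmap_counts(records, community, depth):
    """Count records per grid cell for one community (one frame's subset)."""
    return Counter(
        quad_key(x, y, depth)
        for comm, x, y in records
        if comm == community
    )

# Hypothetical records: (community label, x, y), coordinates already in [0, 1).
records = [
    ("A", 0.10, 0.10),
    ("A", 0.12, 0.20),
    ("A", 0.80, 0.90),
    ("B", 0.60, 0.10),
]
frame_a = heatmap_counts(records, "A", depth=1)
frame_b = heatmap_counts(records, "B", depth=1)
```

Because every community is binned into the same grid, the per-cell counts of different frames are directly comparable, which is what makes a small-multiples layout readable.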

 

About the Analyst

Andrew is the Chief Technology Officer for VizExplorer. He holds a Bachelor of Surveying from Otago University and a Diploma of Computer Science from Victoria University. He is a cartographer by training and has created over 60 patents and inventions in the areas of cartography, data visualization, and high performance database design. He and his team have been awarded two Smithsonian laureates for heroism in information technology related to data visualization. Andrew has co-authored a book on mathematical gaming analytics as well as over sixty articles in areas related to data visualization and advanced analytics. Andrew was born and raised in the South Island of New Zealand. He now lives in California with his wife and four children.

 

Stephen Brobst

 

Stephen is the Chief Technology Officer for Teradata Corporation. Stephen performed his graduate work in Computer Science at the Massachusetts Institute of Technology, where his Master's and PhD research focused on high-performance parallel processing. He also completed an MBA with joint course and thesis work at the Harvard Business School and the MIT Sloan School of Management. During Barack Obama's first term he was also appointed to the President's Council of Advisors on Science and Technology (PCAST) in the working group on Networking and Information Technology Research and Development (NITRD). He was recently ranked by ExecRank as the #4 CTO in the United States (behind the CTOs from Amazon.com, Tesla Motors, and Intel) out of a pool of 10,000+ CTOs.

 

Stephen is the data guy. Andrew is the visualization guy. Together they have been teaching advanced data visualization for over ten years at The Data Warehousing Institute and in other forums. Included in this course is a deep examination of the patterns in the genomics super graphic. Stephen and Andrew are both avid fans of the outdoors and have backpacked together in New Zealand and across the USA.


The latest episode of the Teradata Datacast podcast is now available. To listen and subscribe, just search for “Teradata” in your favorite podcast app. Direct links for select podcast apps include:

 

Links: Apple Podcasts (iTunes) | Google Play | Spotify | Stitcher


FundingFountains-QilingShi-Web-650.png

 

About the Insights

This anonymized visualization is one of a series of analytics mapping the money flows between large Chinese companies for a Corporate Banking Risk Analysis project at a large Chinese bank. The analysis uses Fund Transfer transaction data to understand risk and uncover market opportunities.

 

In this graph, the dots (nodes) represent the companies, via their account holdings. The lines (edges) represent a transfer of funds between the companies, so each line shows a movement of money from one account to another.

 

The chart shows all the money flows between the different colored companies. We can map flows through two, three, and four subsequent transactions, for example starting from the light green company, to understand upstream supply chains and the interdependencies between companies.
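Following money through a fixed number of subsequent transactions amounts to a bounded breadth-first search over the directed transfer graph. The sketch below is illustrative only and is not the Aster implementation; `trace_flows` and the company labels are hypothetical, and each transfer is reduced to a (sender, receiver) pair.

```python
from collections import defaultdict, deque

def trace_flows(transfers, start, max_hops):
    """Return {company: hops} for every company that money leaving `start`
    can reach through at most `max_hops` consecutive transfers."""
    payees = defaultdict(set)              # sender -> direct receivers
    for sender, receiver in transfers:
        payees[sender].add(receiver)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        company = queue.popleft()
        if hops[company] == max_hops:
            continue                       # do not walk past the hop limit
        for payee in payees[company]:
            if payee not in hops:          # first visit is the shortest chain
                hops[payee] = hops[company] + 1
                queue.append(payee)
    del hops[start]
    return hops

# Hypothetical transfers: "green" pays s1, s1 pays s2, and so on.
transfers = [
    ("green", "s1"), ("s1", "s2"), ("s2", "s3"),
    ("s3", "s4"), ("s4", "s5"), ("other", "s1"),
]
reached = trace_flows(transfers, "green", max_hops=4)
```

Swapping sender and receiver when building the adjacency map traces where a company's incoming funds originate instead of where its payments go.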

 

To manage risk, the bank can identify any large exposure concentrations to groups of highly interdependent companies, where a single failure may bring down all the companies. It allows the bank to identify the critical companies in the supply chains and independently cross-check a company's cash flow to verify its health. It also helps identify fraud. The bank can check the true business activity of a company and can verify that loaned funds are used for their stated purpose. For example, it can reveal a manufacturer that is speculating in the stock market rather than paying suppliers, or one that took out a loan to build a factory but actually used the funds for short-term residential real estate trades.

 

For marketing, it highlights gaps in the bank's servicing. Points where high volumes of funds flow out of (or into) the chains identify high-value prospect companies. For existing clients, it reveals any high-value gaps in the provision of wider financial services such as financing, clearing, and risk management.

 

About the Analytics

This analysis uses Teradata Aster and Aster Lens. The transaction data loaded was very large: 60,802,990 records for over 670,000 companies. The company records contain industry classification codes so we can understand their business activity. For this chart, PageRank was used to select the 32 most important customers, and we included all the relevant counterparties with total transactions greater than or equal to CNY 700,000 (about USD 115,000).
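The selection step can be sketched as a PageRank ranking followed by a threshold filter. This is a plain power-iteration PageRank written for illustration, not the Aster function. The damping factor, the iteration count, and the reading of "total transactions" as the sum of amounts exchanged in either direction between a top company and a counterparty are all assumptions, and the sample data is hypothetical.

```python
from collections import defaultdict

def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on a directed edge list of (src, dst) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)     # repeated edges act as extra weight

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank

def select_subgraph(transfers, top_n, min_total):
    """Pick the top_n companies by PageRank, then keep counterparties whose
    total transacted amount with a top company is at least min_total."""
    rank = pagerank([(s, r) for s, r, _ in transfers])
    top = set(sorted(rank, key=rank.get, reverse=True)[:top_n])

    totals = defaultdict(float)        # (top company, counterparty) -> amount
    for sender, receiver, amount in transfers:
        if sender in top:
            totals[(sender, receiver)] += amount
        if receiver in top:
            totals[(receiver, sender)] += amount
    counterparties = {cp for (_, cp), total in totals.items() if total >= min_total}
    return top, counterparties - top

# Hypothetical transfers: (sender, receiver, amount in CNY).
transfers = [
    ("a", "hub", 500_000), ("a", "hub", 250_000),
    ("b", "hub", 300_000), ("c", "hub", 900_000),
]
top, counterparties = select_subgraph(transfers, top_n=1, min_total=700_000)
```

In this toy data the company that receives funds from everyone ranks first, and only the two counterparties whose combined transfers reach the threshold are kept alongside it.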

 

In this graph, there are 3883 nodes and 3943 edges. The nodes represent the companies while the edges represent the cash flows between the companies.

 

About the Analyst

Qiling (Mary) Shi is part of a pioneering group of Chinese Data Scientists that have been partnering with the banks in China to experiment with large-scale risk analytics using high-intensity super graphic methods. Their goal is to uncover new ways of managing risk in China's highly complex commercial system. Their work on corporate customers, including 'Fund Fountains', is just one example of a series of innovations that this talented group have given to the wider banking world, to help de-risk our financial systems.

 

Qiling is a presales consultant for Teradata China's Aster & Hadoop Big Data Center of Excellence (COE). She earned her PhD in Applied Mathematics at the University of Central Florida and is currently pursuing a part-time MBA at the University of Delaware. Prior to Teradata, she worked in the risk management department of PNC Bank in Pittsburgh for over two years. During that time, she developed many algorithms to fight fraud and money laundering, several of which were reported by PNC to the Office of the Comptroller of the Currency. She has also developed computer programs and published them at SAS conferences while working for the Computer Sciences Corporation.

 


To find answers to their toughest challenges, companies seek pervasive data intelligence, which encompasses all data, all the time, while also assuming that data alone is not enough.

 


If you build on Teradata, the industry’s North Star, we de-risk your investment—and deliver real answers, not just analytics.

 


ExtremeNetworking-Anonymous-Web-650.png

 

About the Insights

If the Vietnam War was a 'Television War', the ongoing conflict in Iraq and Syria involving the organisation commonly referred to as the Islamic State (aka ISIS or ISIL) is a 'Social Media War'. Members of ISIS have regularly exploited social media such as Twitter to recruit followers, to spread propaganda, and as a weapon of terror. They have been able to exert enormous influence on how the world perceives them by posting images and videos of extreme violence.

 

Through their activities they have been highly successful in radicalising disenfranchised Muslim youths and inspiring numerous terrorist attacks around the world. The ISIS Twitter machine is highly organised and tech-savvy, making it a difficult foe to combat. This problem is magnified by the dynamic nature of the ISIS social media network. Users are regularly suspended but continuously reconnect under new accounts, making them very difficult to track.

 

This visualisation shows a small part of the ISIS Twitter network, demonstrating the complexity of the social interactions and the difficulty faced in identifying and tracking individual people of interest. The problem is exacerbated by connections between ISIS members and news sources, political activists and academic researchers, all of which in turn have thousands of connections that are mostly benign.

 

The highlighted nodes in this visualisation represent a sample of users who all have a history of tweeting messages of hate and violence in support of the activities of ISIS. These users were identified using a sample of 33 supporters and members of ISIS, including known recruiters and propagandists. Graph analysis techniques were employed to analyse the connections between these radicals and their friends and followers, in order to identify the most influential users in the network. Advanced analytics can thus provide clarity where before there was only chaos and confusion. It allows us to study the social network connections of ISIS supporters and followers and ultimately identify the movers and shakers in their web.

 

About the Analytics

This Teradata Aster visualization shows a Gephi rendering of a graph produced in Aster AppCenter from the friends and followers of the 33 Twitter users who are members or supporters of ISIS and, in turn, their friends and followers. The total number of Twitter users analysed is 334,370, though only the top 10% of nodes are shown here. The plot uses eigenvector centrality to measure the influence of each user in the network, with the dots (nodes) representing the users and the lines (edges) indicating the other users each node is connected to. The highlighted nodes indicate the users that are known or suspected supporters/members of ISIS, while the highlighted edges trace the connections between these users and other users in the network.
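Eigenvector centrality scores a node highly when its neighbours also score highly, and it can be approximated by power iteration. The sketch below is illustrative and is not the Aster or Gephi implementation. It assumes connections are treated as undirected, and it adds each node's own score on every step (a common shift that keeps the iteration from oscillating on bipartite graphs without changing the ranking). The function names and the sample graph are hypothetical.

```python
import math
from collections import defaultdict

def eigenvector_centrality(edges, iterations=100, tol=1e-9):
    """Power iteration on an undirected graph given as (u, v) pairs."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    score = {n: 1.0 for n in neighbours}
    for _ in range(iterations):
        # New score = own score plus the sum of the neighbours' scores.
        new = {n: score[n] + sum(score[m] for m in neighbours[n])
               for n in neighbours}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        converged = max(abs(new[n] - score[n]) for n in new) < tol
        score = new
        if converged:
            break
    return score

def top_fraction(score, fraction):
    """Return the highest-scoring nodes, keeping `fraction` of them
    (rounded down, but always at least one)."""
    keep = max(1, int(len(score) * fraction))
    return sorted(score, key=score.get, reverse=True)[:keep]

# Hypothetical network: one hub account connected to nine other users.
edges = [("hub", f"u{i}") for i in range(9)]
scores = eigenvector_centrality(edges)
top_users = top_fraction(scores, 0.10)
```

Applied to the 334,370 users in the analysis, a 10% cut of this kind would keep roughly 33,437 nodes.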

 

About the Analyst

Unfortunately, it is a sign of the times we live in that the analyst for this piece must remain anonymous. Analytics plays a vital role in society across a wide variety of sensitive areas, such as maintaining peace, medical research, preventing the spread of disease, border integrity, and managing risk deep within our financial system. The analysts who support this work often operate in secrecy, either because of its sensitivity or to protect them from becoming targets. Although we are unable to credit the analyst in this case, we do appreciate their daily work and the role all such analysts play behind the scenes on behalf of our global society.

 

 

 

Bloggers

Data Science Informative Articles and Blog Posts

Our blogs allow customers, prospects, partners, third-party influencers, and analysts to share thoughts on a range of product and industry topics. The community also holds plenty of other content, so you can gain valuable insights from Teradata data scientists.