Learn Data Science

Looking Glass

Explore and Discover

The latest articles, videos and blog posts speak to those interested in analytics-based decision making. Browse all content in the Teradata Data Science Community to gain valuable insights.


GenomeWorldWindow-StephenBrobst-Web-650.png

 

About the Insights

This data visualization shows genetic variations (and similarities) across multiple human populations and geographies using data from the 1000 Genomes Project. Each frame shows a different community or geography within the 1000 Genomes Project and was built from the pure genome data. The observer can clearly see the variations between communities, demonstrating that large-scale genome data provides clear insight into geographic communities across the globe. The goal of the project is to prove the value of large-scale genome analytics, using high-intensity super graphic methods to better understand the genetic patterns of cancer and how to develop personalized medical treatments aligned to the genetic composition of individuals.

 

About the Analytics

This visualization shows a collection of Quartal Super Graphics created by VizExplorer sitting on top of a Teradata relational database using query pushdown for large scale data processing.

 

The large-scale processing starts by applying the quartal tree algorithm, an in-database recursive algorithm that maps the positional information of the entire 1000 Genomes population onto a common hierarchical quartal grid. A database query then builds the subset of data for each of the corresponding communities within the total population. Each subset is used to render the heatmap shown on its frame. Finally, the frames are assembled into a graphic made up of 'small multiples' so that the pattern of sequence data can be observed across the communities in the entire 1000 Genomes Project. Genomic data is extremely large: a database of just 25,000 tumors implies over seventy-five trillion data records.
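The source does not spell out the quartal tree algorithm, so the following is only a minimal sketch. It assumes the algorithm behaves like a quadtree-style recursive subdivision, in which each level splits a square cell into four quadrants. The function names, the unit-square coordinates and the `(community, x, y)` record layout are all hypothetical, and the real processing runs in-database rather than in Python.

```python
def quartal_key(x, y, depth, x0=0.0, y0=0.0, size=1.0):
    """Recursively locate (x, y) in a square cell, returning one quadrant digit (0-3) per level."""
    if depth == 0:
        return ""
    half = size / 2.0
    qx = 1 if x >= x0 + half else 0  # right half of the current cell?
    qy = 1 if y >= y0 + half else 0  # upper half of the current cell?
    digit = str(qy * 2 + qx)
    # Recurse into the chosen quadrant, which becomes the new current cell.
    return digit + quartal_key(x, y, depth - 1, x0 + qx * half, y0 + qy * half, half)


def community_heatmaps(records, depth):
    """Count records per grid cell, separately for each community (hypothetical record layout)."""
    heatmaps = {}
    for community, x, y in records:
        cell = quartal_key(x, y, depth)
        cells = heatmaps.setdefault(community, {})
        cells[cell] = cells.get(cell, 0) + 1
    return heatmaps
```

Because every community is binned onto the same grid, the per-cell counts are directly comparable from frame to frame, which is what makes a 'small multiples' layout readable.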

 

About the Analyst

Andrew is the Chief Technology Officer for VizExplorer. He holds a Bachelor of Surveying from Otago University and a Diploma of Computer Science from Victoria University. He is a cartographer by training and has created over 60 patents and inventions in the areas of cartography, data visualization, and high performance database design. He and his team have been awarded two Smithsonian laureates for heroism in information technology related to data visualization. Andrew has co-authored a book on mathematical gaming analytics as well as over sixty articles in areas related to data visualization and advanced analytics. Andrew was born and raised in the South Island of New Zealand. He now lives in California with his wife and four children.

 

Stephen Brobst

 

Stephen is the Chief Technology Officer for Teradata Corporation. Stephen performed his graduate work in Computer Science at the Massachusetts Institute of Technology, where his Master's and PhD research focused on high-performance parallel processing. He also completed an MBA with joint course and thesis work at the Harvard Business School and the MIT Sloan School of Management. During Barack Obama's first term he was also appointed to the President's Council of Advisors on Science and Technology (PCAST) in the working group on Networking and Information Technology Research and Development (NITRD). He was recently ranked by ExecRank as the #4 CTO in the United States (behind the CTOs from Amazon.com, Tesla Motors, and Intel) out of a pool of 10,000+ CTOs.

 

Stephen is the data guy. Andrew is the visualization guy. Together they have been teaching advanced data visualization for over ten years at The Data Warehousing Institute and in other forums. Included in this course is a deep examination of the patterns in the genomics super graphic. Stephen and Andrew are both avid fans of the outdoors and have backpacked together in New Zealand and across the USA.


FundingFountains-QilingShi-Web-650.png

 

About the Insights

This anonymized visualization is one of a series of analytics mapping the money flows between large Chinese companies for a Corporate Banking Risk Analysis project at a large Chinese bank. The analysis uses Fund Transfer transaction data to understand risk and uncover market opportunities.

 

In this graph, the dots (nodes) represent the companies, via their account holdings. The lines (edges) represent a transfer of funds between the companies, so each line shows a movement of money from one account to another.

 

The chart shows all the money flows between the different colored companies. We can map flows through 2, 3 and 4 subsequent transactions, such as those of the light green company, to understand upstream supply chains and the interdependency companies have on each other.
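Following money through a fixed number of subsequent transactions amounts to a bounded breadth-first search over the transfer graph. The sketch below is an illustration only. The function name and the `(payer, payee)` pair layout are assumptions, and the project itself ran on Teradata Aster rather than in Python.

```python
from collections import deque


def reachable_within(transfers, start, max_hops):
    """Return {company: hop count} for companies reached from `start` in at most `max_hops` transfers."""
    graph = {}
    for payer, payee in transfers:
        graph.setdefault(payer, []).append(payee)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        company = queue.popleft()
        if hops[company] == max_hops:
            continue  # do not follow the money any further than requested
        for payee in graph.get(company, []):
            if payee not in hops:  # keep the shortest hop count only
                hops[payee] = hops[company] + 1
                queue.append(payee)

    del hops[start]  # report counterparties only, not the starting company
    return hops
```

Calling it with `max_hops` set to 2, 3 or 4 gives the successive rings of counterparties around a chosen company.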

 

To manage risk, the bank can identify any large exposure concentrations to groups of highly interdependent companies, where a single failure may bring down all the companies. It allows the bank to identify the critical companies in the supply chains and independently cross-check a company's cash flow to verify its health. It also helps identify fraud. The bank can check the true business activity of a company and can verify that loaned funds are used for their stated purpose. Examples include a manufacturer that invests speculative funds in the stock market rather than paying suppliers, or one that took out a loan to build a factory but really used the funds for short-term residential real estate trades.

 

For marketing, it highlights gaps in the bank's servicing. Points where high volumes of funds flow out of (or into) the chains identify high-value prospect companies. For existing clients, it reveals any high-value gaps in service provision for wider financial services such as financing, clearing and risk management.

 

About the Analytics

This analysis uses Teradata Aster and Aster Lens. The transaction data loaded was very large in size: 60,802,990 records for over 670,000 companies. The company records contain industry classification codes so we can understand their business activity. For this chart, PageRank was used to select the top 32 important customers, and we included all the relevant counterparties with total transactions greater than or equal to CNY 700,000 (USD 115k).
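The selection step can be sketched as follows. This is a textbook power-iteration PageRank written for illustration, not the Aster implementation. The damping factor, iteration count, function names and `(payer, payee, amount)` layout are assumptions, as is the reading of "total transactions" as the summed amount per payer and payee pair.

```python
def pagerank(transfers, damping=0.85, iterations=50):
    """Unweighted PageRank by power iteration over (payer, payee, amount) transfers."""
    nodes = sorted({c for payer, payee, _ in transfers for c in (payer, payee)})
    targets = {n: set() for n in nodes}
    for payer, payee, _ in transfers:
        targets[payer].add(payee)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by companies with no outgoing transfers is spread evenly.
        dangling = sum(rank[n] for n in nodes if not targets[n])
        base = (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for payer in nodes:
            for payee in targets[payer]:
                new_rank[payee] += damping * rank[payer] / len(targets[payer])
        rank = new_rank
    return rank


def select_subgraph(transfers, top_k, min_total):
    """Keep the top_k ranked companies and their edges whose summed amount is >= min_total."""
    rank = pagerank(transfers)
    top = set(sorted(rank, key=rank.get, reverse=True)[:top_k])
    totals = {}
    for payer, payee, amount in transfers:
        if payer in top or payee in top:
            totals[(payer, payee)] = totals.get((payer, payee), 0) + amount
    return top, {pair: total for pair, total in totals.items() if total >= min_total}
```

With the figures quoted above, the call would look like `select_subgraph(transfers, top_k=32, min_total=700_000)`.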

 

In this graph, there are 3883 nodes and 3943 edges. The nodes represent the companies while the edges represent the cash flows between the companies.

 

About the Analyst

Qiling (Mary) Shi is part of a pioneering group of Chinese Data Scientists who have been partnering with the banks in China to experiment with large-scale risk analytics using high-intensity super graphic methods. Their goal is to uncover new ways of managing risk in China's highly complex commercial system. Their work on corporate customers, including 'Fund Fountains', is just one example of a series of innovations that this talented group has given to the wider banking world, to help de-risk our financial systems.

 

Qiling is a presales consultant for Teradata China's Aster & Hadoop Big Data Center Of Excellence (COE). Qiling received her PhD in Applied Mathematics from the University of Central Florida. She is currently doing her MBA part-time at the University of Delaware. Prior to Teradata, she worked in the risk management department of PNC Bank in Pittsburgh for over 2 years. During that time, she developed many algorithms to fight fraud and money laundering, several of which were reported to the Office of the Comptroller of the Currency by PNC. She has also developed computer programs and presented them at SAS conferences while working for the Computer Sciences Corporation.

 


ExtremeNetworking-Anonymous-Web-650.png

 

About the Insights

If the Vietnam war was a 'Television War', the on-going conflict in Iraq and Syria involving the organisation commonly referred to as the Islamic State (aka ISIS or ISIL) is a 'Social Media War'. Members of ISIS have regularly exploited social media such as Twitter to recruit followers, spread propaganda, and as a weapon of terror. They have been able to exert enormous influence on how the world perceives them by posting images and videos of extreme violence.

 

Through their activities they have been highly successful in radicalising disenfranchised Muslim youths and inspiring numerous terrorist attacks around the world. The ISIS Twitter machine is highly organised and tech-savvy, making them a difficult foe to combat. This problem is magnified by the dynamic nature of the ISIS social media network. Users are regularly suspended but continuously reconnect under new accounts, making them very difficult to track.

 

This visualisation shows a small part of the ISIS Twitter network, demonstrating the complexity of the social interactions and the difficulty faced in identifying and tracking individual people of interest. The problem is exacerbated by connections between ISIS members and news sources, political activists and academic researchers, all of which in turn have thousands of connections that are mostly benign.

 

The highlighted nodes in this visualisation represent a sample of users who all have a history of tweeting messages of hate and violence in support of the activities of ISIS. These users were identified using a sample of 33 supporters and members of ISIS, including known recruiters and propagandists. Graph analysis techniques were employed to analyse the connections between these radicals and their friends and followers, in order to identify the most influential users in the network. Advanced analytics can thus provide clarity where before there was only chaos and confusion. It allows us to study the social network connections of ISIS supporters and followers and ultimately identify the movers and shakers in their web.

 

About the Analytics

This Teradata Aster visualization shows a Gephi representation of an Aster AppCenter produced graph using the list of friends and followers of the 33 Twitter users who are members or supporters of ISIS and, in turn, their friends and followers. The total number of Twitter users analysed is 334,370, though only the top 10% of nodes are shown here. The plot uses eigenvector centrality to measure the influence of each user in the network, with the dots (nodes) representing the users and the lines (edges) indicating the other users each node is connected to. The highlighted nodes indicate the users that are known or suspected supporters/members of ISIS, while the highlighted edges trace the connections between these users and other users in the network.
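Eigenvector centrality scores a user highly when that user is connected to other high-scoring users. A minimal power-iteration sketch is shown below. It treats connections as undirected, which is a simplifying assumption because friend and follower links have a direction, and the function names and iteration count are hypothetical. The original graph was produced with Aster AppCenter and Gephi, not with this code.

```python
def eigenvector_centrality(edges, iterations=100):
    """Power iteration on an undirected graph, normalised so the top score is 1.0."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    score = {n: 1.0 for n in neighbours}
    for _ in range(iterations):
        # Adding the node's own score (a shift by the identity matrix) leaves the
        # eigenvectors unchanged but stops the iteration oscillating on bipartite graphs.
        new = {n: score[n] + sum(score[m] for m in neighbours[n]) for n in neighbours}
        norm = max(new.values())
        score = {n: value / norm for n, value in new.items()}
    return score


def top_fraction(score, fraction=0.10):
    """Return the highest-scoring share of nodes, for example the top 10%."""
    keep = max(1, int(len(score) * fraction))
    return sorted(score, key=score.get, reverse=True)[:keep]
```

Filtering to the top 10% before plotting keeps the picture readable when the full network has hundreds of thousands of users.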

 

About the Analyst

Unfortunately, it is a sign of the times we live in that the analyst for this piece must remain anonymous. Analytics plays a vital role in society across a wide variety of sensitive topics such as helping maintain our peace, medical research, prevention of the spread of disease, border integrity and managing risk deep within our financial system. The analysts who support this work, by its nature, are often working in secrecy due to sensitivity or to the need to protect them from becoming targets. Although we are unable to credit the analyst in this case, we do appreciate their daily work and the role all such analysts play behind the scenes on behalf of our global society.

 

 

 


EmploymentFlares-TatianaBokareva-Web-650.png

 

About the Insights

This data visualization represents claims made by a Service Provider against an Employer. The nodes in the middle of each small "explosion" represent the Service Provider, the nodes at the periphery represent Employers and the edges between them represent the relationships. The thickness of the edge is proportional to the value claimed.

 

The visualization was used to look at the relationship structures between service providers and employers. Service Providers help people find employment and also provide job seekers with ongoing support to retain their jobs. To be effective and to provide personalized and flexible services to job seekers, a Service Provider would typically have strong connections with a large number of Employers.

 

The visualization was used to look for unusual types of connections. For example:

  • An isolated group where a Service Provider is connected to many Employers but in a network separated from the rest of the graph.
  • One Service Provider connected to only one employer.
  • Loops where a Service Provider is also an employer.
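The three patterns listed above can be checked mechanically on a list of claims. The sketch below is an illustration under stated assumptions: claims are hypothetical `(provider, employer)` pairs, the same identifier is used when one organisation appears in both roles, and an "isolated group" is taken to mean any connected component other than the largest one.

```python
def unusual_patterns(claims):
    """Return (single-employer providers, dual-role organisations, isolated groups)."""
    employers_of = {}
    adjacency = {}
    for provider, employer in claims:
        employers_of.setdefault(provider, set()).add(employer)
        adjacency.setdefault(provider, set()).add(employer)
        adjacency.setdefault(employer, set()).add(provider)

    # Pattern: one Service Provider connected to only one Employer.
    single = sorted(p for p, emps in employers_of.items() if len(emps) == 1)

    # Pattern: loops, where a Service Provider also appears as an Employer.
    all_employers = set()
    for emps in employers_of.values():
        all_employers |= emps
    dual_role = sorted(set(employers_of) & all_employers)

    # Pattern: groups separated from the rest of the graph (connected components).
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for other in adjacency[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        components.append(component)
    components.sort(key=len, reverse=True)
    isolated = components[1:]  # everything except the largest component

    return single, dual_role, isolated
```

None of these flags proves non-compliance on its own. They only narrow down which relationships deserve a closer look.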

The driver behind this business case is a government body, the Department of Employment, which is responsible for monitoring the way employment services are delivered. The providers liaise with local employers and registered training organizations to provide the right mix of support for job seekers. The goal of this project was to investigate significant and systemic non-compliance in the claims made.

 

About the Analytics

This visualization shows a network graph created using Teradata Aster Lens. Claims data from the Department of Employment were loaded into the Teradata Aster discovery platform.

 

Claims were classified then tested for veracity by chronology, geolocation and variation; and analysed longitudinally for processing and event anomalies. Network graphs were generated to observe patterns of collusion. This provided a quick way to see which Service Providers claimed money against which Employers.

 

The visualization was also used for comparison between different time periods. Similar graphs can be constructed on a periodic basis to see if new isolations or patterns appear over time in the network.

About the Analyst

Tatiana Bokareva is a Data Scientist with the Teradata Advanced Analytics team in Australia and New Zealand. Originally from Moscow, Tatiana now lives in Sydney with her 2 young Australians and she works with key clients in the financial services, government and telecommunications industries. Tatiana is a Teradata blogger and particularly enjoys combining her passion for analytics with her practical, hands on experience in fashion and retail! She is published extensively and has presented at many international conferences.

 

Tatiana has a Bachelor's degree with Honours and a PhD in Computer Science from the University of New South Wales. Tatiana was runner-up for the Dean Postgraduate Research Award in the Information and Communication sector. She was awarded the prestigious Women in Engineering Scholarship. During her PhD she was trained at NICTA (National ICT Australia).

 

Tatiana's PhD was in the area of the Internet of Things (IoT); her work revolved around building self-reliant, self-healing, fault-tolerant sensor networks. She also held part-time research appointments at the university.

 

 

 


ConnectedNetworks-YasmeenAhmad-Web-650.png

 

About the Insights

This anonymized visualization was created for a Telco operator analyzing residential Telco lines. The project aimed to identify linkages between line and network hardware performance that may impact customer experience.

 

The dots (nodes) represent DSLAMs (Digital Subscriber Line Access Multiplexers) on the Telco's network. DSLAMs provide a vital service that can impact customer call experience; they connect customer lines to the main network. DSLAM service levels were measured by metrics such as attenuation, bit rate, noise margin and output power, and clustered into three performance categories for each line. The purple nodes show DSLAMs with excellent performance, orange nodes good performance and white nodes poor performance.

 

In the chart, only a small number of DSLAMs experienced a high quality of service (purple). These DSLAMs were co-located in the same building as the main network infrastructure, hence their proximity to the central network hub results in a premium service. The majority of customers achieve a good experience (orange); however, a large number of DSLAMs deliver a poor service (white) and were found to be located outside of the main city.

 

Customer experience and satisfaction suffer most when customers receive variable network quality. The Telco's primary concern is to ensure customers receive a consistent experience, even if that may be consistently poor because their location is outside of the main city. The chart pinpoints every DSLAM that delivers variable service levels, represented by the shared nodes between the good (orange) and poor (white) clusters. Armed with this data, the Telco can now investigate and optimize the variable DSLAMs.
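Finding the shared nodes is a simple grouping exercise once every line has a performance category. The sketch below assumes hypothetical `(dslam, category)` pairs, one per line, with the category labels "excellent", "good" and "poor" standing in for the three clusters.

```python
def variable_dslams(line_categories):
    """Return DSLAMs whose lines fall into both the good and the poor cluster."""
    categories_seen = {}
    for dslam, category in line_categories:
        categories_seen.setdefault(dslam, set()).add(category)
    # A DSLAM is 'variable' when it serves at least one good line and one poor line.
    return sorted(d for d, cats in categories_seen.items() if {"good", "poor"} <= cats)
```

A DSLAM that only ever appears in one cluster delivers a consistent experience, whether good or poor, and so is not returned.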

 

About the Analytics

This sigma visualization was created using the in-built analytics and visualizations found in the Teradata Aster platform.

 

Data attributes from residential lines across the city were gathered, such as attenuation, bit rate etc. These attributes were clustered to identify performance bands indicating customer network experience.

 

These clusters formed a basis for correlation and regression analyses to determine how the network performance varied in conjunction with factors such as: line technology and length, modem type and configuration, DSLAM, card technology, geographic location etc.

 

The sigma visualization shows only one part of the overall analysis, namely the linkage between DSLAMs and network performance.

 

About the Analyst

Yasmeen is one of the most creative and insightful Data Scientists at Teradata. Yasmeen grew up in Scotland, where she enjoys the great outdoors, in particular hiking the Scottish Munros and sea kayaking.

 

Her work has seen her traverse many countries, including the UK, Ireland, Netherlands, Turkey, Belgium and Denmark, where she covers the finance, telecommunications, retail and utilities industries. Yasmeen specializes in working with businesses to identify their challenges and translate them into an analytical context. She has a unique ability to focus on how businesses can leverage new or untapped sources of data, alongside novel techniques, to enhance their competitive capabilities.

 

Yasmeen has worked with many analytical teams, providing leadership, training, guidance and hands-on support to deliver actionable insights and business outcomes. She uses various analytical approaches, including text analytics, predictive modelling, development of attribution strategies and time series analysis. She believes strongly in the power of visualizations and their ability to communicate complex findings to business users in a way that makes taking action easy.

 

Prior to Teradata, Yasmeen worked as a Data Scientist in the life sciences industry, building analytical pipelines for complex, multi-dimensional data types. Yasmeen also holds a PhD in Data Management, Mining and Visualization, which was carried out at the Wellcome Trust Centre for Gene Regulation & Expression. She has published several papers internationally and is a speaker at International conferences and events. In addition she has taught on MSc courses related to Data Science and Business Intelligence.

 

Yasmeen developed a keen passion for data analytics and visualization through her studies, having always been curious to ask questions and learn more. These skills have allowed Yasmeen to explore many opportunities in multiple disciplines, providing her with an endless world of new challenges!


Fusing business acumen, data science, and creative visualization, the Burning Leaf of Spending enabled a major bank to detect anomalies in customer spending patterns that indicate major life events, and provided artful insights into the personalized service required to enhance the customer experience, improving lifetime value.

 

 


TheLeaf-AlexanderHeidl-Web-650.png

About the Insights

'The Leaf' fuses real life imagery with a data visualization to provide a vivid demonstration of where the future of analytics may be going. As technology improves both the graphics and the speed and ease with which data can be visualized, one emerging form is using real life imagery to replace the technical diagrams of the past.

 

The implications are huge. Free of imposing technical diagrams, visualizations using real life imagery allow insights to be easily consumed by anyone, even small children. Marketers can translate product benefits using real life representations. For example, showing farmers the physical benefits of fertilizers and chemical protectants by using real life images of their farms, with the different crop growth they can achieve, may convey a sales message with an insight that not many farmers would get from graphs alone.

 

The Leaf image was created using Kailash Purang's 'Single Malt Sampler' data set. In this graph the dots (nodes) that form the spine of the leaf are the whisky brands; similar tasting whiskies appear closest together. The lines (edges) link each brand to other brands that share a flavour characteristic. The result was this near perfect leaf image.

 

Thus 'The Leaf' adapts to what Kevin Slavin refers to in his brilliant TED talk about a world run by algorithms — it is a metaphor to encourage us to think about data and maths from a contemporary point of view.

 

About the Analytics
The underlying data set has been extracted from the Teradata Aster Lens environment and processed with Gephi, an open source tool for visual data analysis and exploration.

 

"The Leaf" applies a Radial Axis Layout, which distributes the nodes on linear axes radiating from a circle. Grouping and ordering the nodes on an axis by degree produces the straight line of nodes along the centre of the graph (leaf). The actual leaf is then automatically drawn by curving the edges between the nodes and applying a greenish colour range to nodes and edges. Et voilà, here is "The Leaf" shown in the bottom right of the picture.
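The ordering step that produces the spine can be sketched in a few lines. This is not Gephi's layout code. It is only an illustration of placing nodes along a single axis in order of degree, and the function name, `angle_degrees` and `spacing` parameters are hypothetical.

```python
import math


def axis_positions(edges, angle_degrees=90.0, spacing=1.0):
    """Place nodes along one straight axis, highest degree first."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1

    # Order by descending degree; ties are broken by name so the result is stable.
    ordered = sorted(degree, key=lambda n: (-degree[n], n))

    theta = math.radians(angle_degrees)
    return {
        node: (round(i * spacing * math.cos(theta), 6),
               round(i * spacing * math.sin(theta), 6))
        for i, node in enumerate(ordered)
    }
```

With every node on one straight line, edges have to be drawn as curves to be visible at all, and those curves are what trace the outline of the leaf.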

 

The single leaf created by the data visualization was added to the real world photograph of the plant using Photoshop. This allows us to see how lifelike the digitally created leaf appears next to the real world leaves of the plant.

 

About the Analyst
Alexander is a founding contributor to The Art Of Analytics project. He has an unusually strong design eye matched with the technical proficiency to manipulate complex analytical images to emphasise their insights. Alexander is the producer of all The Art Of Analytics images, working with Teradata's Data Scientist Community. He specializes in manipulating Aster Lens and Gephi images to produce the exceptionally high-quality, high-resolution 'Art' pieces found in the collection.

 

Alexander is currently based in Zurich, having grown up near Frankfurt, Germany and graduated from Kingston University in London.

 

Shortly after, he began his analytics career working as a Business Intelligence Project Manager across various industries and geographical regions. During this time Alexander developed a keen understanding of the importance different visual imagery can have on the ability to effectively communicate a message. In particular, when dealing with mixed audiences, no matter the organizational hierarchy, expertise level or language skills; he found that pictures and visualizations were instrumental in forming a common understanding among the audience. Thus Alexander took an early interest in the importance of the form and structure of the different visual elements that aid communication. Today Alexander is working as a cross industry Account Executive for Teradata in Switzerland, looking after and supporting a variety of Teradata customers as well as prospects. His passion for visual representation plays a major role in his current job, as he shares complex concepts and analytic insights with his clients.

 

And when Alexander is not out and about with his customers or prospects, or working late at night creating amazing pieces of 'Analytic Art' you might find him cruising on his motorbike through the Alps or traveling the world with his camera — always on the hunt for the next geocache and picture.

 

 


Using analytic techniques that normally follow the "Customer Journey," Teradata Think Big consultants and data scientists use data and analytics to visualize & identify "The Human Journey," allowing Buttle UK to identify and fulfill needs for at risk

 

 


TrappingAnaomalies-YasmeenAhmad-Web-650.png

 

About the Insights

This visualisation represents the detection of anomalous broker behaviours found by an insurance provider. The visual representation of the data highlights how quickly these anomalies become apparent when looking at connections in a graphical format.

 

The dots (nodes) represent quotes that are created by brokers using a platform provided by the insurer. Links between nodes indicate quotes that are associated, i.e. a broker used a previously generated quote (node) to build a new quote (linked node) after making some changes. Typical broker behavior indicates that once a broker has generated a quote, it would only be accessed and refreshed if the quote lifespan ends before a customer has taken a decision to accept the quote. The two clusters in the centre (bluish) depict anomalous behavior, where a broker is continuously returning and refreshing the same quote after changing a small number of attributes on that quote. This indicates the broker is gaming the insurer's system in an attempt to determine how the pricing engine works. This is undesired behavior and a fraudulent use of the insurer's system.

 

The goal of this analysis was to identify how brokers use the insurer's system and to understand positive broker behaviours that lead to product sales. The aim was to identify how the system could be improved to support brokers and provide a better experience, as well as to find preferential behaviours that support the insurer's business and could be promoted to less successful brokers. The fraud finding was a byproduct of this analysis. The insurer can use this visual as evidence when holding follow-up conversations with the broker involved.

 

About the Analytics

This sigma visualization depicts analysis of data generated by a platform provided by an insurer for their brokers. This system logs all actions carried out by a broker on the platform. The initial part of the analysis involved identification of broker sessions on the platform and matching of sessions to a specific broker and customer. Within these sessions, this analysis focused on the quote related actions logged by the broker platform. These actions were captured and modeled as nodes.

 

Each node represents a quote generated for a customer in a distinct session. Links were created between nodes if the broker accessed the same quote and generated a refreshed quote in a new session. Graph analysis identified two large unexpected clusters of highly interconnected nodes that were anomalous from the other nodes in the dataset.
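One simple way to surface such clusters is to group quotes into connected components and flag the unusually large ones. The sketch below uses a union-find structure for this. Component size is only a proxy for "highly interconnected", and the function names, the `(earlier_quote, refreshed_quote)` link layout and the `min_size` threshold are all hypothetical rather than taken from the original analysis.

```python
def quote_clusters(links):
    """Group quotes joined by refresh links into connected components, largest first."""
    parent = {}

    def find(quote):
        parent.setdefault(quote, quote)
        while parent[quote] != quote:
            parent[quote] = parent[parent[quote]]  # path halving keeps lookups fast
            quote = parent[quote]
        return quote

    for earlier, refreshed in links:
        root_a, root_b = find(earlier), find(refreshed)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two groups

    clusters = {}
    for quote in list(parent):
        clusters.setdefault(find(quote), set()).add(quote)
    return sorted(clusters.values(), key=len, reverse=True)


def anomalous_clusters(links, min_size):
    """Flag clusters with at least `min_size` quotes as candidates for review."""
    return [cluster for cluster in quote_clusters(links) if len(cluster) >= min_size]
```

A quote that is refreshed once or twice forms a small chain, while a broker who keeps returning to the same quote builds a component far larger than the rest.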

 

About the Analyst

Yasmeen is one of the most creative and insightful Data Scientists at Teradata. Yasmeen grew up in Scotland, where she enjoys the great outdoors, in particular hiking the Scottish Munros and sea kayaking.

 

Her work has seen her traverse many countries, including the UK, Ireland, Netherlands, Turkey, Belgium and Denmark, where she covers the finance, telecommunications, retail and utilities industries. Yasmeen specializes in working with businesses to identify their challenges and translate them into an analytical context. She has a unique ability to focus on how businesses can leverage new or untapped sources of data, alongside novel techniques, to enhance their competitive capabilities.

 

Yasmeen has worked with many analytical teams, providing leadership, training, guidance and hands-on support to deliver actionable insights and business outcomes. She uses various analytical approaches, including text analytics, predictive modelling, development of attribution strategies and time series analysis. She believes strongly in the power of visualizations and their ability to communicate complex findings to business users in a way that makes taking action easy.

 

Prior to Teradata, Yasmeen worked as a Data Scientist in the life sciences industry, building analytical pipelines for complex, multi-dimensional data types. Yasmeen also holds a PhD in Data Management, Mining and Visualization, which was carried out at the Wellcome Trust Centre for Gene Regulation & Expression. She has published several papers internationally and is a speaker at International conferences and events. In addition she has taught on MSc courses related to Data Science and Business Intelligence.

 

Yasmeen developed a keen passion for data analytics and visualization through her studies, having always been curious to ask questions and learn more. These skills have allowed Yasmeen to explore many opportunities in multiple disciplines, providing her with an endless world of new challenges!


Combining the collaborative expertise of data scientists, geophysicists and data visualization, an integrated oil company developed new understandings of complex reservoir management with data and analytics. This business case easily transcends multiple industries focused on asset utilization and optimization.

 

 


Data Science Informative Articles and Blog Posts

Our blogs allow customers, prospects, partners, third-party influencers and analysts to share thoughts on a range of product and industry topics. We also have lots of content in the community, allowing you to gain valuable insights from Teradata data scientists.