Learn Data Science

Looking Glass

Explore and Discover

Latest articles, videos and blog posts for anyone interested in analytics-based decision making. Browse all content in the Teradata Data Science Community to gain valuable insights.


Using analytic techniques that normally follow the "Customer Journey," Teradata Think Big consultants and data scientists use data and analytics to visualize and identify "The Human Journey," allowing Buttle UK to identify and fulfill needs for at risk




Seattle has seven drawbridges that frequently close to traffic so that boats can enjoy the beautiful Pacific Northwest. These bridges have sensors that tweet every time they open or close, giving us a well-formatted dataset to explore and play with. In this post we begin by prepping and profiling the tweets from the last month.





Don’t miss this opportunity to see how the new Teradata Analytics Platform modernizes an analytics environment and drives insights that produce high-impact, trusted business outcomes. Register for the webinar on Wednesday, September 12, at 11am PT (2pm ET).






About the Insights

This visualisation represents the detection of anomalous broker behaviours found by an insurance provider. The visual representation of the data highlights how quickly these anomalies become apparent when looking at connections in a graphical format.


The dots (nodes) represent quotes that are created by brokers using a platform provided by the insurer. Links between nodes indicate quotes that are associated, i.e. a broker used a previously generated quote (node) to build a new quote (linked node) after making some changes. Typical broker behavior indicates that once a broker has generated a quote, it would only be accessed and refreshed if the quote lifespan ends before a customer has taken a decision to accept the quote. The two clusters in the centre (bluish) depict anomalous behavior, where a broker is continuously returning and refreshing the same quote after changing a small number of attributes on that quote. This indicates the broker is gaming the insurer's system in an attempt to determine how the pricing engine works. This is undesired behavior and a fraudulent use of the insurer's system.


The goal of this analysis was to identify how brokers use the insurer's system and to understand positive broker behaviours that lead to product sales. The aim was to identify how the system could be improved to support brokers and provide a better experience, as well as to find preferential behaviours that support the insurer’s business and could be promoted to less successful brokers. The fraudulent activity was found as a byproduct of this analysis. The insurer can use this visual as evidence when holding follow-up conversations with the broker involved.


About the Analytics

This sigma visualization depicts analysis of data generated by a platform provided by an insurer for their brokers. This system logs all actions carried out by a broker on the platform. The initial part of the analysis involved identifying broker sessions on the platform and matching each session to a specific broker and customer. Within these sessions, the analysis focused on the quote-related actions logged by the broker platform. These actions were captured and modeled as nodes.


Each node represents a quote generated for a customer in a distinct session. Links were created between nodes if the broker accessed the same quote and generated a refreshed quote in a new session. Graph analysis identified two large, unexpected clusters of highly interconnected nodes that were anomalous compared with the other nodes in the dataset.
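
The linking described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis the team ran: the quote IDs, the edge list and the `min_size` threshold are all hypothetical, and plain connected components stand in for whatever graph clustering was actually used.

```python
from collections import defaultdict


def connected_components(edges):
    """Group nodes into connected components with an iterative depth-first search."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def large_clusters(edges, min_size):
    """Keep only components with at least min_size quotes (threshold is an assumption)."""
    return [c for c in connected_components(edges) if len(c) >= min_size]


# Hypothetical data: each pair links a quote to the refreshed quote built from it.
refresh_links = [
    ("q1", "q2"),                                # a normal single refresh
    ("q3", "q4"), ("q4", "q5"), ("q5", "q6"),    # one quote refreshed again and again
    ("q6", "q3"), ("q3", "q5"),
]
suspicious = large_clusters(refresh_links, min_size=4)
```

With this toy edge list, the single refresh `q1`/`q2` is ignored and only the heavily interlinked group of four quotes is returned for review.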


About the Analyst

Yasmeen is one of the most creative and insightful Data Scientists at Teradata. Yasmeen grew up in Scotland, where she enjoys the great outdoors, in particular hiking the Scottish Munros and sea kayaking.


Her work has seen her traverse many countries, including the UK, Ireland, the Netherlands, Turkey, Belgium and Denmark, where she covers the finance, telecommunications, retail and utilities industries. Yasmeen specializes in working with businesses to identify their challenges and translate them into an analytical context. She has a unique ability to focus on how businesses can leverage new or untapped sources of data, alongside novel techniques, to enhance their competitive capabilities.


Yasmeen has worked with many analytical teams, providing leadership, training, guidance and hands-on support to deliver actionable insights and business outcomes. She uses various analytical approaches, including text analytics, predictive modelling, development of attribution strategies and time series analysis. She believes strongly in the power of visualizations and their ability to communicate complex findings to business users in a way that makes taking action easy.


Prior to Teradata, Yasmeen worked as a Data Scientist in the life sciences industry, building analytical pipelines for complex, multi-dimensional data types. Yasmeen also holds a PhD in Data Management, Mining and Visualization, which was carried out at the Wellcome Trust Centre for Gene Regulation & Expression. She has published several papers internationally and is a speaker at international conferences and events. In addition, she has taught on MSc courses related to Data Science and Business Intelligence.


Yasmeen developed a keen passion for data analytics and visualization through her studies, having always been curious to ask questions and learn more. These skills have allowed Yasmeen to explore many opportunities in multiple disciplines, providing her with an endless world of new challenges!


Combining the collaborative expertise of data scientists, geophysicists and data visualization specialists, an integrated oil company developed new understandings of complex reservoir management with data and analytics. This business case easily transcends multiple industries focused on asset utilization and optimization.






About the Insights

The mobile phones that we use every day and carry everywhere with us create huge amounts of data that trace the daily patterns of our behavior. The interactions we have with others through calls or messages map out our social relationships, business dealings and interactions with the wider community as complex interconnected circles of calls.


This data visualization is created using mobile phone subscriber calling patterns. Each dot (or node) represents a phone number that is called by a subscriber; the larger the node, the more often it is called. The lines (or edges) between nodes represent a call from one number to another.


Each subscriber has a unique calling pattern that can be used to develop pricing plans, to identify the subscriber and even to predict his or her behavior. For instance, a subscriber who is in the process of switching to a different network provider will show up as two similar patterns, one from an on-net number and one from an off-net number.


This particular chart was produced at an early stage in a series of analytics and was used to filter out the first level of calling pattern types. The data used here represents a very short period of time, just a few seconds. At the top right-hand side of the graph, large loops show numbers that have been called many times in this short period. These are likely to be machines, such as auto-dialer systems that play pre-recorded messages when answered, Interactive Voice Response (IVR) systems, security systems and alarms. Humans would not be able to make so many calls so quickly. These numbers were isolated as a separate segment, and subsequent analysis focused on the detailed individual human calling patterns.


About the Analytics

This visualization shows a representation of a graph, although the layout parameters have been used to create a format unlike those typically used to display graphs. An issue commonly faced in this area is that connected graphs quickly become huge and are almost impossible to visualize due to the sheer number of callers and interactions. Sampling from a highly connected graph is a difficult problem, as we need to decide which connections to ignore. In this case a very short period of time is used to cut the output down to a manageable size.


The underlying data format is rather simple: calling number, called number, time of day and duration. The data is first clustered using a machine-learning algorithm to create the groups and then displayed as a graph using Aster Lens.
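
As a rough sketch of how records in this four-field format could be screened for machine-like numbers, the Python below counts how often each number is called within the sample window and sets aside those above a threshold. The `CallRecord` type, the sample records and the `max_human_calls` threshold are illustrative assumptions, and a simple count stands in for the machine-learning clustering mentioned above.

```python
from collections import Counter
from typing import NamedTuple


class CallRecord(NamedTuple):
    calling_number: str
    called_number: str
    time_of_day: float   # seconds since midnight (assumed unit)
    duration: float      # seconds (assumed unit)


def calls_received(records):
    """Count how many times each number is called; this drives node size."""
    return Counter(r.called_number for r in records)


def likely_machines(records, max_human_calls):
    """Numbers called more often than the assumed human limit within the window."""
    counts = calls_received(records)
    return {number for number, count in counts.items() if count > max_human_calls}


# Hypothetical records covering a window of a few seconds.
records = [
    CallRecord("A", "IVR", 36000.0, 4.0),
    CallRecord("B", "IVR", 36001.0, 3.0),
    CallRecord("C", "IVR", 36002.0, 5.0),
    CallRecord("D", "IVR", 36002.5, 2.0),
    CallRecord("A", "B", 36003.0, 60.0),
]

machines = likely_machines(records, max_human_calls=3)

# Keep only calls that do not involve a suspected machine at either end.
human_records = [
    r for r in records
    if r.called_number not in machines and r.calling_number not in machines
]
```

In this toy sample the number `IVR` receives four calls in under three seconds and is segmented out, leaving the single call from `A` to `B` for the detailed analysis.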


About the Analyst

Christopher Hillman is based in London, UK, with his wife and two kids. He is a Principal Data Scientist in the Advanced Analytics team at Teradata and travels extensively in the International Region.


His passion for analytics spans 20 years of experience working in the business intelligence and advanced analytics industries. Prior to Teradata, Chris specialized in the Retail and CPGN vertical, working as Solution Architect, Principal Consultant and Technology Director. Chris currently works with the Teradata Aster Centre of Expertise and is involved in start-up analytics for Big Data projects, helping customers to unlock insights from their data, including understanding where MapReduce or SQL is the appropriate technique to use.


As well as working for Teradata, Christopher is currently studying part-time for a PhD in Data Science at the University of Dundee, applying Big Data analytics to the data produced from experimentation into the Human Proteome. His research area involves real-time analysis of Mass Spectrometer data using parallel algorithms. Part of his duties at the University includes lecturing on Hadoop and MapReduce coding.




AI started a race in the automotive industry. This race will change the whole industry. Souma Das, Teradata India, talks about the importance of Analytics and AI in this race.






The most difficult aspect to realizing success with analytics has been, and continues to be, the organizational challenges in getting stakeholders to embrace an innovative mindset.






About the Insights

As of Jan 2012 there were roughly 60K direct flights between 3000+ airports with 500+ airlines recorded on the open source website OPENFLIGHTS.ORG.


Seen through the lens of advanced analytics, the airline carriers of the world appear together like a beautiful nebula (an interstellar cloud formation). Similar color groupings of nodes with thick edges provide insight into airlines that have common routes, exposing competition and also potential synergies in different regions.


This Sigma graph-based data visualization shows airline carrier similarity measured by the common cities they serve. The nodes, or circles, in the graph are the airline carriers, and the edge thickness and proximity of the nodes indicate the degree of similarity. The thicker the edge and the closer the nodes, the more cities the carriers serve in common. The visual has multiple clusters of airlines, which map intuitively to the geographical regions they serve. Among the key insights are the similarities, or overlaps, between China Southern and China Eastern Airlines, Emirates and Qatar, British Airways and Lufthansa, and American and Delta, each indicating a competitive situation. Ryanair seems to have carved a niche by serving cities with potential synergies with Lufthansa and British Airways. Air France has more similarity with US carriers like United than with other European carriers such as Alitalia and Lufthansa, which can probably be explained by co-branding. In essence the visualization is a multi-dimensional Venn diagram that exposes the complex relationships rather succinctly.


Overall, the graph makes it possible to study similarities with competing players, whether to identify a potential partnership or to grow market share and coverage. Similar insights can be developed for any problem that involves multiple players in an ecosystem who touch common variables.


About the Analytics

The visualization was created in Aster App Center. The analytics fall into the category of associative mining, where we look at co-occurrences of items within a context. The associative mining algorithm used is Collaborative Filtering, applied to the airline and city data, which was treated like retail basket data. The basket is the city and the airline carriers are the items. The commonality of any two airlines is determined by a score that weighs the cities each airline flies into independently against the cities they have in common. The pair-wise affinity score is then treated as an edge weight, with the two airlines treated as nodes, and fed into a visualizer to create beautiful clusters using the force-atlas algorithm with modularity coloring.
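
To make the pair-wise scoring concrete, here is a small Python sketch that turns each airline's set of served cities into weighted edges. The exact score used in the analysis is not given, so Jaccard similarity (cities in common divided by cities served by either airline) is swapped in as an assumption; the airline names, city codes and the `affinity_edges` helper are hypothetical.

```python
from itertools import combinations


def affinity_edges(cities_by_airline, min_score=0.0):
    """Return (airline_a, airline_b, score) edges for every pair scoring above min_score.

    The score is Jaccard similarity: shared cities / cities served by either airline.
    """
    edges = []
    for a, b in combinations(sorted(cities_by_airline), 2):
        cities_a = cities_by_airline[a]
        cities_b = cities_by_airline[b]
        union = cities_a | cities_b
        if not union:
            continue  # neither airline serves any city, so there is nothing to compare
        score = len(cities_a & cities_b) / len(union)
        if score > min_score:
            edges.append((a, b, score))
    return edges


# Hypothetical route data: airline -> set of city codes served.
routes = {
    "Airline X": {"LHR", "FRA", "JFK"},
    "Airline Y": {"LHR", "FRA", "CDG"},
    "Airline Z": {"SYD"},
}
edges = affinity_edges(routes)
```

Here `Airline X` and `Airline Y` share two of four cities and receive a score of 0.5, while `Airline Z` overlaps with neither and produces no edge. An edge list of this shape is what a force-directed layout would consume, with the score as the edge weight.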


About the Analyst

Karthik Guruswamy is based in the San Francisco Bay Area, where he lives with his wife Vidhya and two daughters. Karthik works as a Principal Consultant with Big Data & Advanced Analytics, Americas for Teradata.


Karthik's passion for Data Science and Analytics spans 25+ years, starting out as an RDBMS developer at Informix. Karthik has worked with several startups in Silicon Valley as Data and Server Architect. Karthik joined Teradata through the acquisition of Aster Data, where he was a Senior Consultant working with almost all of Aster's marquee customers. While in Teradata & Aster Professional Services, Karthik was engaged with social media customers such as LinkedIn and Edmodo. Most recently Karthik has been advising Fortune customers such as Dell, Big Automotive, Overstock and Wells Fargo Bank.


Karthik specializes in MPP / MapReduce / Graph and works to unravel hidden patterns in customer data and create powerful visualizations that bring the insight to business users. Karthik uses a wide variety of algorithms around Time Series & Pattern Detection, Data Mining, Machine Learning, Neural Nets, Text Disambiguation and Statistical Analysis in his projects. Karthik is also a data science blogger on LinkedIn. He has written a number of blog posts on data science concepts, primarily targeted at a business audience.





About the Insights

This visualization captures the journey of Sundara Raman as he rides the commuter train corridors in Sydney, Australia. Armed with his mobile phone and special software, Sundara's train ride through Sydney can be traced via his mobile phone cell tower connections, represented by the colored dots (or nodes) on the chart, as his train hurtles through the city.


It's part of a new form of analytics that uses mobile phone data to study the traffic patterns caused by movement and mass congregations of people. Its primary purpose is to optimize the cell tower network to avoid performance issues and improve customer experience. However, it also supports emerging data monetization initiatives where detailed traffic flows can be used for urban planning, retail store location analysis and marketing offers.


In this analysis Sundara is looking for cell signal 'storms' that can overwhelm towers and impact performance. As crowded commuter trains run down the lines and pool at stations, they send out hundreds to thousands of signals that move rapidly across towers and can overwhelm them. This visualization is part of a series of charts that overlay tower performance data, commuter traffic volumes and tower hand-offs to pinpoint cell signal 'storm surges', enabling detailed recommendations to optimize the network.


The chart also highlights specific customer experience issues caused by the transfer between 4G cell towers (darker shade dots) and lower speed 3G cell towers (lighter shade dots) and 'ping pong' impacts from the to-ing and fro-ing of signals between towers, represented by close clusters of connected towers near Lindfield, Killara, Waitara, North Sydney and Chatswood stations.


About the Analytics

This visualization was created using Teradata Aster and Aster Lens. Smartphone signaling data was collected from simultaneous use of 3G and 4G mobile phones, using special-purpose software, while travelling on crowded public transport along the North Shore and Strathfield Lines in Sydney, Australia. Geospatial analytics were included, using the geolocation data for train stations and cell towers to isolate the cell towers located within a 1km radius of the train stations.
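
The 1km radius filter can be illustrated with a great-circle distance check in Python. This is only a sketch of the idea, not the Aster geospatial functions used in the analysis: the haversine formula, the mean Earth radius of 6371 km and the coordinates below are assumptions chosen for illustration (the coordinates are made up and are not real station or tower positions).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def towers_near_stations(towers, stations, radius_km=1.0):
    """Return the IDs of towers within radius_km of at least one station."""
    return {
        tower_id
        for tower_id, (t_lat, t_lon) in towers.items()
        if any(
            haversine_km(t_lat, t_lon, s_lat, s_lon) <= radius_km
            for s_lat, s_lon in stations.values()
        )
    }


# Made-up coordinates for illustration only.
stations = {"Station A": (-33.800, 151.180)}
towers = {
    "T1": (-33.805, 151.180),  # about 0.56 km south of Station A
    "T2": (-33.820, 151.180),  # about 2.2 km south of Station A
}
nearby = towers_near_stations(towers, stations)
```

Only `T1` falls inside the radius in this example. For a large tower inventory, a spatial index or a database geospatial function would be a better fit than this all-pairs comparison.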


This approach was used to measure the impact of signal propagation among cell towers within a short, defined range of the train stations. Color codes were added to the sigma chart in the GEXF file using Visual Basic scripts to uniformly distinguish between 4G and 3G cell tower areas. Each color signifies the network coverage area to which a group of cell towers belongs. Statistics published by Sydney City Rail, covering peak-time train traffic loads for each train station, were used to correlate cell site performance.
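
The original color coding was done with Visual Basic scripts that are not shown. As a hedged equivalent, the Python sketch below adds a `viz:color` element to each node of a GEXF document using the standard library. It assumes the GEXF 1.2draft namespaces; the sample document, the tower IDs, the RGB values and the `colour_nodes` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed for GEXF 1.2draft files; other GEXF versions use different URIs.
GEXF_NS = "http://www.gexf.net/1.2draft"
VIZ_NS = "http://www.gexf.net/1.2draft/viz"


def colour_nodes(gexf_text, colour_by_node):
    """Return GEXF text with a viz:color element set on every node found in colour_by_node."""
    ET.register_namespace("", GEXF_NS)
    ET.register_namespace("viz", VIZ_NS)
    root = ET.fromstring(gexf_text)
    for node in root.iter(f"{{{GEXF_NS}}}node"):
        rgb = colour_by_node.get(node.get("id"))
        if rgb is None:
            continue  # leave nodes without an assigned colour untouched
        # Drop any existing colour so each node ends up with exactly one.
        for old in node.findall(f"{{{VIZ_NS}}}color"):
            node.remove(old)
        r, g, b = rgb
        ET.SubElement(node, f"{{{VIZ_NS}}}color",
                      {"r": str(r), "g": str(g), "b": str(b)})
    return ET.tostring(root, encoding="unicode")


# Hypothetical two-tower graph and colour scheme (darker = 4G, lighter = 3G).
sample = (
    '<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">'
    '<graph><nodes>'
    '<node id="t1" label="Tower 1"/>'
    '<node id="t2" label="Tower 2"/>'
    '</nodes></graph></gexf>'
)
colours = {"t1": (0, 0, 139), "t2": (173, 216, 230)}
coloured = colour_nodes(sample, colours)
```

Assigning colors from a single lookup table keeps every tower in the same network coverage area in the same shade, which is the uniformity the scripts were used to achieve.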


About the Analyst

Sundara is a Senior Telecom Industry Consultant by day and an aspiring Data Scientist by night. He has a Master's degree in Business and Administration from Massey University, New Zealand. He lives in Sydney, Australia with his wife and two children.


Sundara is an inventor and a joint holder of an Australian patent with his wife on Computer Assisted Psychological Assessment and Treatment that applies the principles of Cognitive Behavioural Therapy (CBT). So now, if during your next daily commute you happen to catch a glimpse of Sundara, juggling his multiple mobile phones, then you will know he is not crazy. He is just using analytics to gain insights that can help his Telecom clients improve their mobile network customer experience.


Data Science Informative Articles and Blog Posts

Our blogs allow customers, prospects, partners, third-party influencers and analysts to share thoughts on a range of product and industry topics. We also have plenty of content in the community, allowing you to gain valuable insights from Teradata data scientists.