Art of Analytics: Air Nebula




About the Insights

As of January 2012, roughly 60K direct flights between 3000+ airports, operated by 500+ airlines, were recorded on the open source website OPENFLIGHTS.ORG.


Seen through the lens of advanced analytics, the different airline carriers of the world appear together like a beautiful nebula (an interstellar cloud formation). Similarly colored groupings of nodes with thick edges provide insight into airlines that have common routes, exposing competition and also potential synergies in different regions.


This Sigma-graph-based data visualization shows airline carrier similarity measured by the common cities they serve. The nodes (circles) in the graph are the airline carriers, and the edge thickness and proximity of the nodes indicate the degree of similarity: the thicker the edge and the closer the nodes, the more cities the carriers serve in common. The visual contains multiple clusters of airlines, which map intuitively to the geographical regions they serve. Key insights include the overlap between China Southern and China Eastern Airlines, Emirates and Qatar, British Airways and Lufthansa, and American and Delta, each indicating a competitive situation. Ryanair seems to have carved out a niche by serving cities with potential synergies with Lufthansa and British Airways. Air France has more similarity with US carriers like United than with other European carriers such as Alitalia and Lufthansa, which can probably be explained by co-branding. In essence, the visualization is a multi-dimensional Venn diagram that exposes complex relationships succinctly.


Overall, the graph lets an airline study how similar it is to its competitors, whether to identify a potential partnership or to grow market share and coverage. Similar insights can be developed for any problem that involves multiple players in an ecosystem who touch common variables.


About the Analytics

The visualization was created in Aster App Center. The analytics fall into the category of associative mining, where we look at co-occurrences of items within a context. The associative mining algorithm used is Collaborative Filtering, applied to the airline and city data as if it were retail basket data: each city is a basket, and the airline carriers serving it are the items. The commonality of any two airlines is determined by a score that weighs the cities each airline flies into on its own against the cities they have in common. The pair-wise affinity score is then treated as an edge weight, with the two airlines as nodes, and fed into a visualizer that creates the clusters using the force-atlas layout algorithm with modularity coloring.
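To make the basket analogy concrete, here is a minimal Python sketch of how a pair-wise affinity score could be turned into weighted edges. It is not the Aster Collaborative Filtering function, whose exact scoring formula is not given here. As a stand-in, it assumes a Jaccard-style score (cities in common divided by cities served by either airline), and the carrier names, city list, and function names are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: one (city, airline) pair per served city.
# City = basket, airline = item, as in the retail basket analogy.
ROUTES = [
    ("London", "Carrier A"), ("Paris", "Carrier A"), ("Frankfurt", "Carrier A"),
    ("London", "Carrier B"), ("Frankfurt", "Carrier B"), ("Rome", "Carrier B"),
    ("Beijing", "Carrier C"), ("Shanghai", "Carrier C"),
]


def cities_by_airline(pairs):
    """Group the (city, airline) pairs into airline -> set of cities served."""
    served = {}
    for city, airline in pairs:
        served.setdefault(airline, set()).add(city)
    return served


def affinity_edges(pairs, min_score=0.0):
    """Return (airline_a, airline_b, score) edges for every pair of airlines.

    Assumption: the score is Jaccard similarity, used here only as a
    stand-in for the collaborative filtering score described above.
    Pairs scoring at or below min_score are dropped.
    """
    served = cities_by_airline(pairs)
    edges = []
    for a, b in combinations(sorted(served), 2):
        common = served[a] & served[b]   # cities both airlines serve
        either = served[a] | served[b]   # cities at least one of them serves
        score = len(common) / len(either)
        if score > min_score:
            edges.append((a, b, round(score, 3)))
    return edges


if __name__ == "__main__":
    for a, b, weight in affinity_edges(ROUTES):
        print(f"{a} -- {b}: weight {weight}")
```

In this toy data, Carrier A and Carrier B share two of the four cities served between them, so they get a single edge with weight 0.5, while Carrier C shares no cities with either and stays unconnected. An edge list of this shape is what a graph visualizer would take as input for layout and coloring.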


About the Analyst

Karthik Guruswamy is based in the San Francisco Bay Area and lives with his wife Vidhya and two daughters. Karthik works as a Principal Consultant with Big Data & Advanced Analytics, Americas for Teradata.


Karthik's passion for data science and analytics spans 25+ years, starting out as an RDBMS developer in Informix. Karthik has worked with several startups in Silicon Valley as a Data and Server Architect. Karthik joined Teradata through the acquisition of Aster Data, where he was a Senior Consultant working with almost all of Aster's marquee customers. While in Teradata & Aster Professional Services, Karthik was engaged with social media customers such as LinkedIn and Edmodo. Most recently, Karthik has been advising Fortune customers such as Dell, Big Automotive, Overstock and Wells Fargo Bank.


Karthik specializes in MPP / Map Reduce / Graph and works to unravel hidden patterns in customer data and create powerful visualizations that bring the insight to business users. Karthik uses a wide variety of algorithms around Time Series & Pattern Detection, Data Mining, Machine Learning, Neural Nets, Text Disambiguation and Statistical Analysis in his projects. Karthik is also a data science blogger on LinkedIn. He has written a number of blog posts on data science concepts, primarily targeted at a business audience.