Data Science - Quantitative Graph Measures on Complex Networks

Learn Data Science
Teradata Employee

Whether we are doing Social Network Analysis (SNA),  Supply Chain Optimization or solving a Routing Problem, Connection or Graph Analytics can provide solid insights into how networks function and can help make decisions on a complex mesh of relationships. The analysis can  tell us who the influential players & community connectors are, lynch pin nodes etc.,. Take airline routing for example. Imagine if you can draw a line between two cities if there is at-least 1 airline connection. The resultant map and/or graph would look like below :). The data is from Openflights.org and the visual was created by Tableau pointing to an Edge Table in Aster.

In this map/graph mashup, the vertices are the cities and the edges are direct flights. The edges also carry weights in this diagram. If there is only 1 carrier that flies between two cities then the edge weight is 1. In this graph the edge weights go up to 22 (Atlanta to New York has 22 different carriers). One could also construct the map/graph with # of miles, # of flights, # of passengers, freight volume etc., as edge weights. How about creating a graph per country or region ! See below for a US only airline route map/graph:

If we use # of miles as the length of the edges, the graph should mimic a geographical network (without require a map to support it).  But it doesn't have to be ...

Once we construct a graph after deciding if it's directed, undirected, edge weighted or not, we will be able to answer a wide range of questions like:

  1. What top N cities or metros are "central" to the network in terms of 'prominence', 'least # of hops from any city' ?
  2. What top N cities or metros are gateway nodes to connect communities farther from the center ?
  3. What top N airlines are similar to each other in terms of the common cities they fly to ?
  4. What top N airports are similar to each other in terms of the common airlines that connect to it ?
  5. What is the shortest # of hops between any two cities in the network ?
  6. What is the most optimal freight cost  between two cities ?

If the graph has really really small # of vertices and edges, one can visually answer these questions However even for a moderately small graph hairball like the airline network , you can see how difficult it is to answer the questions by just looking at it. This is where the quantitative graph measures come into play - even though you may guess the answers in this particular airline use case - always good to compare with the results the analytics finds.

Centrality Measures like EigenVectorCentrality, PageRank, Betweenness, Closeness etc., can be calculated real quickly by pointing a Aster SQL/GR function to a "distributed" edge table with a handful of params. These functions can also linearly scale with the # of nodes in the Aster cluster ..
Here are the results from a SQL/GR query that was run using US routes only:

To read more about the other measures and interesting results obtained from the graph, check out the slides from  Aster Community Portal. The results are from analytics on both US & Worldwide routes. The SQL/GR queries ran in couple of seconds to at-most a minute on views on an Aster table that had data for the three columns :

city1, city2, #of unique airlines connecting city1 and city2

Graph Measures can also be applied to graphs from social network data such as twitter feeds, emails, facebook connection data etc., to find key nodes of interest!