Aster Database includes a native graph processing engine (SQL-GR) designed to perform large-scale graph analysis on distributed graph (or network) structures. This data still lives in Aster database and, hence, is also available for processing with SQL and SQL-MR functions. In combination with SQL-GR it makes Aster a powerful platform for the best of breed approach to solving analytical problems on graphs and other network-like big data.
Aster SQL-GR functions offer wide range of graph analytics including analysis, discovery, and navigation. The names of the functions though sometimes look like a Wikipedia page on graph theory, which makes it harder answering questions like these:
Below we demonstrate a novel approach to analyzing and working with graphs in Aster using R and toaster (version 0.5.1 or later) and show how to answer the questions like above not just simpler - but more natural.
R package toaster
Please see my previous post for the overview of toaster, then post on k-means clustering with toaster, and also find more detailed discussions and examples on toaster wiki. If you are still here or already back, then this short description may be helpful: toaster is designed to script Aster SQL and analytical functions into higher level tasks (as toaster functions) executed in-database and instantly available for visualizations (also as toaster functions) and analysis in R.
Teradata Aster is relational data store utilizing relational tables for all data structures. By convention, a graph in Aster is represented using two tables:
In turn, every SQL-GR function at minimum expects the following:
To simplify and streamline graph operations toaster introduces specialized graph object toagraph. The toagraph object encapsulates all information about graph in Aster without requiring database connection when created. In essence, it holds all metadata necessary to call up any of Aster SQL-GR functions, so every toaster graph function will rely on it. toagraph contains the following metadata:
toaster Graph Functions
Designed around concrete questions about networks and by convention starting with verb compute the 4 toaster graph functions are (as of version 0.5.1 that introduced graph features):
We will use Dallas Open data set that includes the graph of police officers. The edges between them were derived from the market basket analysis on police offense reports, where each report has up to two officers assigned to it. We will use the graph in all examples just by defining once this toagraph object::
Next, we answer the questions listed above about this graph. To run examples in R you will need to install toaster and also use functions from the packages network and GGally (installed as dependencies with toaster). For more details on how to configure packages, ODBC, and more information on toaster please see its wiki.
How does Dallas police officer graph look?
This usually doesn't target any concrete problems but offers the comfort and guidance with bird's-eye view of the network:
What is the average shortest path for police graph and how these values are distributed?
This is famous Six degree of separation concept (or Six Degree of Kevin Bacon) that introduces Small World graphs in social networks. computeGraphHistogram computes distributions over graph metrics, in particular, for the shortest paths calculated between all connected vertices. Then function createHistogram shows that police graph is indeed a small world network:
which yields this distribution:
What are the most prominent officers based on police graph?
The question doesn't have a singular answer since "most prominent" doesn't have a singular definition in graph theory. There are different metrics that may answer the question, in particular, various centrality measures in graphs. Aster has functions for many of them, but with toaster we can retrieve top vertices for any graph metric using computeGraphMetric. For example, betweenness indicates node's centrality in a network and so if by "prominent" we mean "central" then 30 most central officers are:
What are relationships between all neighbors for a certain node?
Though there is no function for this in Aster it has all necessary parts to answer it. And with toaster function computeEgoGraph the assembly is not required - it was designed to build neighborhoods (ego-graphs) of given order in Aster and materialize results as subgraphs in R. Below we find the officer of the highest vertex degree (actually we find top 50 but use only the 3 highest officers to run faster) and materialize their neighborhoods of 2d order (that includes every vertex within distance of 2 and all edges between them):
How are certain nodes related to each other?
This question can usually be reduced to a problem of extracting a subgraph with given vertices. Versatility of computeGraph function makes it perfect fit to find subgraphs in Aster - in this example subgraph contains only officers having a letter in their name (key):
While Aster offers rich and advanced set of graph functions the above approach targets higher level tasks and problems with graphs and not intended to replace or encapsulate all of Aster functionality. It intends on providing effective way to visualize and analyze graphs in R while is still powered by Aster in-database performance. In particular, toaster functions can serve as robust graph query interface to Aster database among other applications. You can find more examples on using toaster graph functions on Rpubs. We appreciate your feedback and suggestions on where to take the toaster functions next.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.