Learn Data Science

turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Teradata
- :
- Data Science Blog Posts
- :
- Learn Data Science
- :
- Fast Track to Aster Graph Database with R and toas...

03-27-2016
07:21 PM

- Subscribe to RSS Feed
- Mark as New
- Mark as Read
- Bookmark
- Subscribe
- Email to a Friend
- Printer Friendly Page
- Report Inappropriate Content

03-27-2016
07:21 PM

Aster Database includes a native graph processing engine (SQL-GR) designed to perform large-scale graph analysis on distributed graph (or network) structures. This data still lives in Aster database and, hence, is also available for processing with SQL and SQL-MR functions. In combination with SQL-GR it makes Aster a powerful platform for the best of breed approach to solving analytical problems on graphs and other network-like big data.

Aster SQL-GR functions offer wide range of graph analytics including analysis, discovery, and navigation. The names of the functions though sometimes look like a Wikipedia page on graph theory, which makes it harder answering questions like these:

- How does the network look overall?
- What is the average shortest path within the network and how these values are distributed?
- What are the most prominent nodes of the network?
- What are relationships between all neighbors of a certain node?
- How are the certain nodes related to each other?

Below we demonstrate a novel approach to analyzing and working with graphs in Aster using R and **toaster** (version 0.5.1 or later) and show how to answer the questions like above not just simpler - but more natural.

R package **toaster**

Please see my previous post for the overview of **toaster**, then post on k-means clustering with **toaster**, and also find more detailed discussions and examples on **toaster **wiki. If you are still here or already back, then this short description may be helpful: **toaster** is designed to script Aster SQL and analytical functions into higher level tasks (as **toaster** functions) executed in-database and instantly available for visualizations (also as **toaster** functions) and analysis in R.

Graph Object

Teradata Aster is relational data store utilizing relational tables for all data structures. By convention, a graph in Aster is represented using two tables:

- the vertices table must contain unique vertex key and optional columns with vertex attributes
- the edges table must contain source and target columns (referencing vertex keys) and optional columns with edge attributes

In turn, every SQL-GR function at minimum expects the following:

- the vertex table, view or SQL query with the vertex key
- the edges table, view or SQL query with target and source keys
- if graph is directed or not
- if edges have weight attribute

To simplify and streamline graph operations **toaster** introduces specialized graph object *toagraph*. The *toagraph* object encapsulates all information about graph in Aster without requiring database connection when created. In essence, it holds all metadata necessary to call up any of Aster SQL-GR functions, so every **toaster** graph function will rely on it. *toagraph* contains the following metadata:

*vertices*the name of table or view, or SQL query representing the vertices*edges*the name of table or view, or SQL query representing the edges*directed*logical flag indicating if graph is directed or not*key*the name of a unique column (key) in the vertex table or query*source*the name of a source column in the edge table or query*target*the name of a target column in the edge table or query*vertexAttrnames*vector with the names of vertex attributes (optional)*edgeAttrnames*vector with the names of edge attributes (optional)*vertexWhere*a SQL expression to filter vertices inside WHERE clause (optional)*edgeWhere*a SQL expression to filter edges inside WHERE clause (optional)

**toaster** Graph Functions

Designed around concrete questions about networks and by convention starting with verb *compute* the 4 **toaster** graph functions are (as of version 0.5.1 that introduced graph features):

*computeGraph*: materializes Aster graph or its subgraph as a*network*object in R (for more about**network**package see here)*computeEgoGraph*: finds and materializes the neighborhoods of given order for each of given vertices.*computeGraphHistogram*: finds distributions of various graph metrics, such as shortest path, degree, centrality measures, etc.- computeGraphMetric: finds top vertices (or all of them) ranked by the graph metrics (e.g. degree, clustering, centrality measures, etc.)

Examples

We will use Dallas Open data set that includes the graph of police officers. The edges between them were derived from the market basket analysis on police offense reports, where each report has up to two officers assigned to it. We will use the graph in all examples just by defining once this *toagraph* object::

policeGraphUn = toaGraph(vertices = "dallaspolice_officer_vertices",

edges = "dallaspolice_officer_edges_un",

directed = FALSE,

key = "officer", source = "officer1", target = "officer2",

vertexAttrnames = c("offense_count"),

edgeAttrnames = c("weight"))

Next, we answer the questions listed above about this graph. To run examples in R you will need to install **toaster** and also use functions from the packages **network** and **GGally** (installed as dependencies with **toaster**). For more details on how to configure packages, ODBC, and more information on **toaster** please see its wiki.

How does Dallas police officer graph look?

This usually doesn't target any concrete problems but offers the comfort and guidance with bird's-eye view of the network:

netPoliceUn = computeGraph(conn, policeGraphUn)

GGally::ggnet2(netPoliceUn, node.label="vertex.names", node.size="offense_count",

legend.position="none", label.size=3)

# or

showGraph(conn, policeGraphUn, node.label="vertex.names", node.size="offense_count",

legend.position="none", label.size=3)

What is the average shortest path for police graph and how these values are distributed?

This is famous Six degree of separation concept (or Six Degree of Kevin Bacon) that introduces Small World graphs in social networks. c*omputeGraphHistogram* computes distributions over graph metrics, in particular, for the shortest paths calculated between all connected vertices. Then function *createHistogram* shows that police graph is indeed a small world network:

hshortestpathPolice = computeGraphHistogram(conn, policeGraphUn, type='shortestpath',

numbins = 10)

createHistogram(hshortestpathPolice,

title = "Dallas Police Shortest Path Distribution",

xlab = "Distance", ylab = "Count",

themeExtra = theme(axis.text.x = element_text(angle = 0)))

which yields this distribution:

What are the most prominent officers based on police graph?

The question doesn't have a singular answer since "most prominent" doesn't have a singular definition in graph theory. There are different metrics that may answer the question, in particular, various centrality measures in graphs. Aster has functions for many of them, but with **toaster** we can retrieve top vertices for any graph metric using *computeGraphMetric*. For example, betweenness indicates node's centrality in a network and so if by "prominent" we mean "central" then 30 most central officers are:

topbetweenness = computeGraphMetric(conn, policeGraphUn, type='betweenness', top=30)

createTopMetricPlot(topbetweenness, 'betweenness', ylab='Betweenness',

title='Top 30 Officers (Betweenness)')

What are relationships between all neighbors for a certain node?

Though there is no function for this in Aster it has all necessary parts to answer it. And with **toaster** function *computeEgoGraph* the assembly is not required - it was designed to build neighborhoods (ego-graphs) of given order in Aster and materialize results as subgraphs in R. Below we find the officer of the highest vertex degree (actually we find top 50 but use only the 3 highest officers to run faster) and materialize their neighborhoods of 2d order (that includes every vertex within distance of 2 and all edges between them):

library(network)

library(GGally)

topDegree = computeGraphMetric(conn, policeGraphUn, type="degree", top=50)

topDegreeEgo = computeEgoGraph(conn, policeGraphUn, ego=as.list(as.character(topDegree$key[1:3])), order=2)

topDegreeEgo[[1]] %v% "color" =

ifelse(get.vertex.attribute(topDegreeEgo[[1]], "vertex.names") == as.character(topDegree$key[[1]]),

"red", "grey")

ggnet2(topDegreeEgo[[1]], node.label="vertex.names", node.size="offense_count", node.color="color", legend.position="none")

How are certain nodes related to each other?

This question can usually be reduced to a problem of extracting a subgraph with given vertices. Versatility of *computeGraph* function makes it perfect fit to find subgraphs in Aster - in this example subgraph contains only officers having a letter in their name (key):

subGraphOfficerLetters = computeGraph(conn, policeGraphUn,

v = "SELECT officer FROM dallaspolice_officer_vertices WHERE officer ~ '[A-Z].*'")

subGraphOfficerLetters %v% "color" =

substr(get.vertex.attribute(subGraphOfficerLetters, "vertex.names"), 1, 1)

GGally::ggnet2(subGraphOfficerLetters, node.label="vertex.names", label.size=3,

node.size="offense_count", size.cut=TRUE, node.color="color",

palette = "Set2", legend.position="none")

Conclusion

While Aster offers rich and advanced set of graph functions the above approach targets higher level tasks and problems with graphs and not intended to replace or encapsulate all of Aster functionality. It intends on providing effective way to visualize and analyze graphs in R while is still powered by Aster in-database performance. In particular, **toaster** functions can serve as robust graph query interface to Aster database among other applications. You can find more examples on using toaster graph functions on Rpubs. We appreciate your feedback and suggestions on where to take the **toaster** functions next.

Labels:

You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.