Visualizing View Dependencies with Named Entity Recognition and gTree (Part 2)

Learn Data Science
Teradata Employee

This is part 2 of my original post, which focused on extracting object names from the DDL of database views using Aster's NER function.  In this post I'll talk about how to visualize the sub-dependencies of a view that references multiple other views.  The ultimate goal of this exercise is to be able to identify, for each view, the base tables that store the data being retrieved.

At the end of the last post we were left with a table (objectlinks) that had 4 columns.  Objectlinks holds one row for each object that is referenced in a given view's DDL.  Let's take a look at another DBC view, DBC.ChildrenX:

   select * from objectlinks where objectfullname = 'DBC.CHILDRENX'

objectfullnameobjecttypeentitytype
DBC.CHILDRENXVDBC.ACCESSRIGHTST
DBC.CHILDRENXVDBC.OWNERST
DBC.CHILDRENXVDBC.ROLEST
DBC.CHILDRENXVDBC.DBASET
DBC.CHILDRENXVDBC.ROLEGRANTST
DBC.CHILDRENXVDBC.OWNERDBV
DBC.CHILDRENXVDBC.VIEW_2ORV
DBC.CHILDRENXVDBC.CHILDRENX

V

Here we see that DBC.ChildrenX is a view and its DDL contains references to 7 other objects, 5 of which are tables and 2 are other views.  We know that some of the data contained within DBC.ChildrenX comes from those 5 physical tables, but what other references exist within the two other views that are dependents of ChildenX?  In order to do this, we need to self-join the table objectlinks every time that we see a dependency of type "V".  We also need to do this dynamically since there is no limit to the number of nested views that can exist as dependencies of other views.  This type of dynamic recursive logic is very difficult to implement with standard SQL.

Image result for tree structure

Luckily, we have a SQL-GR function in our Aster toolkit called gTree.  gTree, which runs on the Aster graph engine and is used for hierarchical analysis, can be used to build and traverse tree structures.  Let's take a look at the syntax:

create table viewpaths
distribute by hash(path)
as (
SELECT *
FROM gTree (
ON (select distinct dict as objectfullname from tblvw_dict) AS vertices PARTITION BY objectfullname
ON (select objectfullname as objectfullname,entity as entity from objectlinks
where objectfullname <> entity) AS edges PARTITION BY objectfullname
ON (select dict as objectfullname from tblvw_dict where objecttype = 'V') AS root PARTITION BY objectfullname
TargetKey ('entity')
AllowCycles ('t')
Results (
'Propagate (objectfullname) AS start_vertex',
'Current (objectfullname) AS end_vertex',
'Path (objectfullname)',
'Cycle()',
'Leaf()'
)
Output ('all')
)
where "leaf()" = 'true'
order by start_vertex);

We need to provide gTree with three inputs - the edges, the vertices, and the roots of our tree.  The roots of our tree are the individual views that we want to see the dependencies for.  We want to tree to start propagating from the view and then branch/leaf out to show the dependencies underneath that view.  The vertices are the different objects that can possibly exist in the tree - in our case, this is just a list of all the tables and views that could possibly appear as dependencies of the views.  Finally, the edges are rows that indicate the relationships between all of the vertices and the roots - this is just think information from our objectlinks table.

As I'm sure you've noticed, there are a couple of other arguments that we need to pass the gTree function:

  • TargetKey - indicates which column we want to use to join our edges table to our vertices.
  • AllowCycles - allows for cycles, or visiting a specific vertex a second time
  • Output - specifies whether the gTree function outputs all paths ('all') or only paths that end by reaching a leaf vertex, a cycle, or the maximum number of iterations ('end').
  • Results - any aggregate functions that the function calculates along each vertex in each path.  Each of these will be returned as a column in our result set:
    • Propagate(objectfullname) - returns objectfullname of the root vertex
    • Current(objectfullname) - returns objectfullname of the final vertex in the path
    • Path(objectfullname) - a string of concatenated objectfullname's that represent the path from the root vertex to the last vertex visited
    • Cycle - returns 'true' if the gTree function ends the path by completing a cycle (that is, by visiting a vertex a secondtime), 'false' otherwise.
    • Leaf - returns 'true' if the gTree function ends the path by reaching a leaf vertex (that is, a vertex with no outgoing edges), 'false' otherwise.

Let's take a look a the gTree output from our function call above.  Since we're looking at the view DBC.ChildrenX, we want to see all rows that started with the DBC.ChildrenX root:

SELECT * FROM viewpaths WHERE start_vertex = 'DBC.CHILDRENX'

start_vertexend_vertexpathcycle()leaf()
DBC.CHILDRENXDBC.ACCESSRIGHTSDBC.CHILDRENX->DBC.ACCESSRIGHTSfalsetrue
DBC.CHILDRENXDBC.DBASEDBC.CHILDRENX->DBC.DBASEfalsetrue
DBC.CHILDRENXDBC.OWNERSDBC.CHILDRENX->DBC.OWNERSfalsetrue
DBC.CHILDRENXDBC.ACCESSRIGHTSDBC.CHILDRENX->DBC.VIEW_2OR->DBC.ACCESSRIGHTSfalsetrue
DBC.CHILDRENXDBC.DBASEDBC.CHILDRENX->DBC.OWNERDB->DBC.DBASEfalsetrue
DBC.CHILDRENXDBC.DBASEDBC.CHILDRENX->DBC.VIEW_2OR->DBC.DBASEfalsetrue
DBC.CHILDRENXDBC.OWNERSDBC.CHILDRENX->DBC.VIEW_2OR->DBC.OWNERSfalsetrue
DBC.CHILDRENXDBC.ROLESDBC.CHILDRENX->DBC.ROLESfalsetrue
DBC.CHILDRENXDBC.DBASEDBC.CHILDRENX->DBC.VIEW_2OR->DBC.USERDB->DBC.DBASEfalsetrue
DBC.CHILDRENXDBC.ROLEGRANTSDBC.CHILDRENX->DBC.ROLEGRANTSfalsetrue

These result rows represent all of the branches of a tree with DBC.ChildrenX as the root vertex.  With this output, we can easily use Aster's Visualizer function to show the view dependencies as a sankey chart like below.  I'll demonstrate how to build an App Center app for quick view dependency visualization in a later blog post.