Unifying SQL


Unifying SQL


Unified SQL is a method I developed  to transform the historized  DBQL SQL Statements  into a pattern and assigning  a surrogate unique key named unifiedsqlid to that pattern. This key with the DBQL sql_id is uploaded to a reference table.  Via the reference table iit is possible to compare and measure resource usage of this sql regardless of repetitions with different  values which for ex. might be useful in finding deterioration of callcenter query performance. The reference is updated daily, so sql patterns can be tracked over time.

In the system I administer the DBQL daily data  kept for 3 months with all details and around a year with loss of detail in history.

 TheDBQL  SQL gets  normalized, byte wise  identical SQLS are stored once with a sql_id  and linked with the queryid.

Here is where the unifiedsql  approach comes into play.

The unified SQL patterns are created by unloading the DBQL Sql history data to a flat file via fastexport. The single SQL’S are processed by a perl script, which replaces all occurences of constants in where ,order by, group by,in clauses with a % sign. This end up in an unique pattern for identical sqls which differ only in the constants replaced. By the way the SQL is hashed against reference data which is build up by the actual database and tablenames. Recognized tablenames are stored with the unifiedsqlid  and uploaded to a reference table too.  The procedure runs every night and new sql from dbql is matched to the patterns caught so far. In case of match the reference is written otherwise the new pattern is stored and linked.

Since pattern matching and replacement with perls  regular expressions might be a hard job, not every item is yet recognized and incorrect replacements have been found and corrected.  To come up with details I currently  probe the detection of database objects  in analyzing the objectaccess data and linking the hereby found tables to the unifiedsqlid for comparision. Multline SQLS , SQL exceeding 31000 bytes in lenght and therefore creating 2 entries,  are treared as single ones. This might be improved in the future.

The unified executions are linked to the department and user “executing” that pattern, providing number of pattern execution, sum ampcputime, min and max logdate, number of detections. date of insert and update of this numbers.

For example the linkage between SQL and table makes  it quiet  easy to perform analytics for improving collect stats which is often performed without knowledge of what  sql (pattern) might benefit most  from proper stats.

Some figures.

Queries per month avg…………..: 86.402.221

Number of sql_ids (DBQL)……….: 103.682.663 (2011-07-17 – 2013-04-11)

Number of unified patterns…….: 1.966.010 ( 2011-12-08 . 2014-04-10)

 Number of unified links to sql_id:.. 66.057.776

The unified structure in conjunction with the DBQL builds a more and more consistent analytical platform to measure the ressource usage on sql and helps identifying hot spots of table access and most important repetition with different assignments and value lists.

Guido Schasiepen

Technical Specialist Data Warehouse

Vodafone D2 Gmbh, Germany