FPGrowth - Finding Deeper Patterns with Aster

Learn Aster
Teradata Employee

Uncovering relationships within data has long been a cornerstone analytic with Aster. As many of our CFilter tutorials have shown, we've found relationships in retail market baskets, online web searching and shopping, and sensor data, and have even uncovered social networks in online gaming. With the Aster Analytics 6.20 library, we can now uncover even deeper relationships within these data sets using the FPGrowth function. This Frequent Pattern Growth algorithm performs association rule mining well beyond the one-to-one pair affinity of CFilter: FPGrowth detects combinations of items (called Antecedents) that lead to other combinations of items (called Consequences).
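
Aster's implementation isn't shown in this post, so to make "frequent pattern growth" concrete, here is a minimal single-machine Python sketch of the textbook FP-Growth idea: count items, insert each basket's frequent items into a prefix tree (the FP-tree) in descending-frequency order, then recursively mine each item's conditional pattern base. This is an illustration under stated assumptions, not Aster's distributed implementation; the basket data and the names `Node`, `build_tree` and `fp_growth` are hypothetical, and `min_count` is an absolute basket count rather than the fractional MinSupport used by the Aster function.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, how many transactions pass through it, and links."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(transactions, min_count):
    """Build an FP-tree from (items, multiplicity) pairs.

    Returns a header table (item -> list of tree nodes holding that item)
    and a dict of the items whose total count reaches min_count.
    """
    counts = defaultdict(int)
    for items, multiplicity in transactions:
        for item in set(items):
            counts[item] += multiplicity
    frequent = {item: n for item, n in counts.items() if n >= min_count}

    root = Node(None, None)
    header = defaultdict(list)
    for items, multiplicity in transactions:
        # Keep only frequent items, most frequent first (ties broken by name),
        # so baskets that share popular items share a prefix in the tree.
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += multiplicity
            node = child
    return header, frequent


def fp_growth(transactions, min_count, suffix=()):
    """Yield (sorted itemset tuple, support count) for every frequent itemset."""
    header, frequent = build_tree(transactions, min_count)
    for item, n in frequent.items():
        itemset = suffix + (item,)
        yield tuple(sorted(itemset)), n
        # Conditional pattern base: the path above each node holding `item`,
        # weighted by that node's count.
        conditional_base = []
        for node in header[item]:
            prefix = []
            parent = node.parent
            while parent.item is not None:
                prefix.append(parent.item)
                parent = parent.parent
            if prefix:
                conditional_base.append((prefix, node.count))
        # Mine the conditional base recursively, growing the pattern by `item`.
        yield from fp_growth(conditional_base, min_count, itemset)


# Hypothetical baskets, invented for illustration.
baskets = [
    ["milk", "bread", "diapers"],
    ["milk", "diapers", "beer"],
    ["bread", "butter"],
    ["milk", "bread", "butter", "diapers"],
    ["beer", "chips"],
]
patterns = dict(fp_growth([(basket, 1) for basket in baskets], min_count=2))
for itemset, n in sorted(patterns.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(n, ",".join(itemset))
```

With `min_count=2` these five baskets yield ten frequent itemsets, including 'bread,diapers,milk' (found in 2 baskets). Association rules such as 'bread,diapers -> milk' are then split out of itemsets like these.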

To demonstrate the power of FPGrowth, here's a very simple grocery basket example that visualizes the output using a pattern of '1-2 Antecedents -> 1 Consequence'. We'll highlight the pattern parameters when we look at the function syntax. Notice the nodes that are made up of pairs of items, for example 'milk,butter' or 'cereal,paper towels'.

(Figure: FPGrowth example 1 - Grocery Items)

With FPGrowth, these combination relationships, which may have been hidden before, are now easily uncovered. In the graph above, milk and diapers together have an affinity score of 3.4 to bread, whereas milk alone has a lower score of 2.0 to bread, and diapers alone only 2.1. The interpretation is 'baskets with both milk and diapers are much more likely to include bread than baskets with only milk or only diapers'. We can go deeper into these relationships simply by changing the search pattern parameters. For example, let's look for combinations of up to 3 Antecedents that lead to combinations of 1 or 2 Consequences.

Here we now see more complex combinations, such as 'milk,chips,diapers' as a 3-item Antecedent and 'milk,cereal' as a 2-item Consequence. This is very exciting! I hope this simple example shows how Aster can dig deep into these data sets to uncover potentially valuable insights from these more complex affinity relationships.
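
The post doesn't state which statistic the graphs use as their affinity score, so as an assumption the sketch below uses lift (one of the Rules-table statistics) on a handful of invented baskets. It reproduces the effect described above: baskets with both milk and diapers are more likely to include bread than baskets with either item alone. `BASKETS`, `count` and `lift` are hypothetical names for illustration, and the numbers are not the data behind the graphs.

```python
from fractions import Fraction

# Invented baskets for illustration only (not the data behind the graphs).
BASKETS = [
    {"milk", "diapers", "bread"},
    {"milk", "diapers", "bread"},
    {"milk", "cereal"},
    {"milk", "chips"},
    {"diapers", "beer"},
    {"diapers", "chips"},
    {"bread", "butter"},
    {"cereal", "chips"},
]


def count(items, baskets=BASKETS):
    """Number of baskets that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for basket in baskets if wanted <= basket)


def lift(antecedent, consequence, baskets=BASKETS):
    """lift(A -> C) = confidence(A -> C) / support(C), as an exact Fraction."""
    both = count(set(antecedent) | set(consequence), baskets)
    confidence = Fraction(both, count(antecedent, baskets))
    support_consequence = Fraction(count(consequence, baskets), len(baskets))
    return confidence / support_consequence


for antecedent in (["milk", "diapers"], ["milk"], ["diapers"]):
    value = round(float(lift(antecedent, ["bread"])), 2)
    print(",".join(antecedent), "-> bread: lift =", value)
```

With these baskets, milk and diapers together give a lift of 8/3 (about 2.67) toward bread, versus 4/3 (about 1.33) for milk alone or diapers alone.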

Here is the syntax for FPGrowth. It's not much different from CFilter, with the addition of some threshold filters and the Antecedent/Consequence patterns. It can also output two result tables: rules, patterns, or both. I tend to use the Rules table, as it splits Antecedents and Consequences into separate columns and includes the usual statistical output, very much like CFilter. The pattern in this example is '1-2 Antecedents leading to 1 Consequence'.

SELECT * FROM FPGrowth
(
    ON (SELECT 1)
    PARTITION BY 1
    InputTable ('grocery_baskets')
    OutputRuleTable ('grocery_baskets_fpgrowth_out_rule')
    OutputPatternTable ('grocery_baskets_fpgrowth_out_pattern')
    Compress ('low')
    DropTable ('true')
    TranItemColumns ('item')
    TranIDColumns ('entity_id','basket_id')
    MinSupport (0.001)
    MinConfidence (0.1)
    --MaxPatternLength (5)  -- optional cap on the combined Antecedent + Consequence pattern size
    AntecedentCountRange ('1-2')  -- size range for the Antecedent pattern, as 'min-max'
    ConsequenceCountRange ('1-1')  -- size range for the Consequence pattern, as 'min-max'
    --PatternsOrRules ('rules')  -- 'rules', 'patterns', or 'both' (the default)
);
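
The threshold and range parameters above act as filters on candidate rules. As a hedged sketch only, the Python below assumes (not confirmed against the Aster documentation) that MinSupport applies to the support of the whole Antecedent + Consequence itemset, that MinConfidence is the usual rule confidence, and that AntecedentCountRange / ConsequenceCountRange are inclusive 'min-max' item counts. `parse_range`, `generate_rules` and the counts are hypothetical, and MaxPatternLength is not modeled.

```python
from itertools import combinations


def parse_range(spec):
    """Parse a 'min-max' string such as '1-2' into an inclusive (min, max) pair."""
    low, high = (int(part) for part in spec.split("-"))
    return low, high


def generate_rules(itemset_counts, n_transactions, min_support, min_confidence,
                   antecedent_range="1-2", consequence_range="1-1"):
    """Yield (antecedent, consequence, support, confidence) for qualifying rules.

    itemset_counts maps frozenset(items) -> number of transactions containing
    all of them; every subset of a listed itemset must also be listed.
    """
    a_min, a_max = parse_range(antecedent_range)
    c_min, c_max = parse_range(consequence_range)
    for itemset, both in itemset_counts.items():
        support = both / n_transactions
        if support < min_support:
            continue
        for a_size in range(a_min, a_max + 1):
            c_size = len(itemset) - a_size
            if not c_min <= c_size <= c_max:
                continue  # this split violates the Consequence size range
            for antecedent in combinations(sorted(itemset), a_size):
                confidence = both / itemset_counts[frozenset(antecedent)]
                if confidence >= min_confidence:
                    consequence = tuple(sorted(itemset - set(antecedent)))
                    yield antecedent, consequence, support, confidence


# Hypothetical pattern counts from 8 invented baskets.
COUNTS = {
    frozenset({"milk"}): 4,
    frozenset({"diapers"}): 4,
    frozenset({"bread"}): 3,
    frozenset({"milk", "diapers"}): 2,
    frozenset({"milk", "bread"}): 2,
    frozenset({"diapers", "bread"}): 2,
    frozenset({"milk", "diapers", "bread"}): 2,
}

rules = list(generate_rules(COUNTS, 8, min_support=0.2, min_confidence=0.6,
                            antecedent_range="1-2", consequence_range="1-1"))
for antecedent, consequence, support, confidence in rules:
    print(",".join(antecedent), "->", ",".join(consequence),
          f"support={support:.2f} confidence={confidence:.2f}")
```

With a Consequence range of '1-1' this yields five rules, including 'diapers,milk -> bread'; widening it to '1-2' adds 'bread -> diapers,milk'.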

Here is the output from the Rules table:

select * from grocery_baskets_fpgrowth_out_rule limit 10;

Note a few new statistics columns, notably 'conviction'. A word of warning: this column is NULL for rules with a confidence of 1.00, which are often the strongest matches, because the conviction formula divides by (1 - confidence), and that is a divide-by-zero. So don't assume that sorting by conviction in descending order will give you the best matches.

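antecedent_item | consequence_item | count_of_antecedent | count_of_consequence | cntb | cnt_antecedent | cnt_consequence | score | support | confidence | lift | conviction | leverage | coverage | chi_square | z_score
----------------+------------------+---------------------+----------------------+------+----------------+-----------------+-------+---------+------------+------+------------+----------+----------+------------+--------
bread,butter    | cereal           |                   2 |                    1 |    2 |              2 |               9 |  0.22 |    0.08 |       1.00 | 2.67 |            |     0.05 |     0.08 |       3.64 |   -0.03
bread,butter    | milk             |                   2 |                    1 |    2 |              2 |              12 |  0.17 |    0.08 |       1.00 | 2.00 |            |     0.04 |     0.08 |       2.18 |   -0.03
cereal          | bread            |                   1 |                    1 |    4 |              9 |               7 |  0.25 |    0.17 |       0.44 | 1.52 |       1.28 |     0.06 |     0.38 |       1.63 |    1.27
milk,chips      | bread            |                   2 |                    1 |    1 |              6 |               7 |  0.02 |    0.04 |       0.17 | 0.57 |       0.85 |    -0.03 |     0.25 |       0.61 |   -0.68
milk,cheese     | beer             |                   2 |                    1 |    2 |              2 |              13 |  0.15 |    0.08 |       1.00 | 1.85 |            |     0.04 |     0.08 |       1.85 |   -0.03
milk,cheese     | chips            |                   2 |                    1 |    2 |              2 |              10 |  0.20 |    0.08 |       1.00 | 2.40 |            |     0.05 |     0.08 |       3.05 |   -0.03
beer,diapers    | bread            |                   2 |                    1 |    1 |              3 |               7 |  0.05 |    0.04 |       0.33 | 1.14 |       1.06 |     0.01 |     0.13 |       0.03 |   -0.68
cereal          | cheese           |                   1 |                    1 |    2 |              9 |               9 |  0.05 |    0.08 |       0.22 | 0.59 |       0.80 |    -0.06 |     0.38 |       1.43 |   -0.03
milk,chips      | cheese           |                   2 |                    1 |    2 |              6 |               9 |  0.07 |    0.08 |       0.33 | 0.89 |       0.94 |    -0.01 |     0.25 |       0.06 |   -0.03
beer,diapers    | chips            |                   2 |                    1 |    2 |              3 |              10 |  0.13 |    0.08 |       0.67 | 1.60 |       1.75 |     0.03 |     0.13 |       0.88 |   -0.03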
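
Every statistic in the Rules table except z_score can be reproduced from three counts. The counts are consistent with 24 baskets in total; for example, lift for 'bread,butter -> cereal' is 1.00 / (9/24) = 2.67. The Python sketch below uses standard association-rule formulas, plus a score of cntb² / (cnt_antecedent × cnt_consequence) and a Pearson chi-square on the 2×2 contingency table, which together match the rounded values above. It is our reconstruction under those assumptions, not Aster's source code; `rule_stats` is a hypothetical helper, and z_score is left out because its formula isn't stated here. It also shows why conviction comes back NULL: with a confidence of 1.00 its denominator is zero.

```python
def rule_stats(cntb, cnt_antecedent, cnt_consequence, n):
    """Association-rule statistics for antecedent -> consequence.

    cntb: baskets containing the antecedent AND the consequence
    cnt_antecedent: baskets containing the antecedent
    cnt_consequence: baskets containing the consequence
    n: total number of baskets
    """
    support = cntb / n
    coverage = cnt_antecedent / n  # support of the antecedent alone
    support_consequence = cnt_consequence / n
    confidence = cntb / cnt_antecedent
    lift = confidence / support_consequence
    leverage = support - coverage * support_consequence
    # Product of both conditional probabilities: cntb^2 / (cnt_antecedent * cnt_consequence)
    score = cntb ** 2 / (cnt_antecedent * cnt_consequence)
    # Conviction divides by (1 - confidence), so it is undefined (None here,
    # NULL in the Rules table) when every antecedent basket has the consequence.
    if cntb == cnt_antecedent:
        conviction = None
    else:
        conviction = (1 - support_consequence) / (1 - confidence)
    # Pearson chi-square on the 2x2 contingency table, no continuity correction.
    a = cntb                      # antecedent and consequence
    b = cnt_antecedent - cntb     # antecedent only
    c = cnt_consequence - cntb    # consequence only
    d = n - a - b - c             # neither
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return {
        "score": score, "support": support, "confidence": confidence,
        "lift": lift, "conviction": conviction, "leverage": leverage,
        "coverage": coverage, "chi_square": chi_square,
    }


# The 'cereal -> bread' row: cntb=4, cnt_antecedent=9, cnt_consequence=7.
stats = rule_stats(4, 9, 7, n=24)
print({name: None if v is None else round(v, 2) for name, v in stats.items()})
print(rule_stats(2, 2, 9, n=24)["conviction"])  # 'bread,butter -> cereal' has confidence 1.00
```

Python's None plays the role of SQL NULL here, and just as in SQL, rules with a confidence of 1.00 need explicit handling (for example, ranking by confidence or lift) before you sort by conviction.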

So there it is: a much more powerful affinity detection function in the Aster library. It can be used in many scenarios beyond these shopping examples, from life-saving pattern detection in healthcare to quality analytics on manufacturing sensor data. And all with the ease and performance of Aster. Have fun!