Modeling Behavior with Hidden Markov Models


Behavioral analytics is hard.

Unlike objective classification problems such as churn/stay or fraud/not fraud, behavioral analytics relies on interpretation and judgment.

Beauty, like madness, is in the eye of the beholder, and a person’s satire is another person’s fake news.

 

A central challenge of behavioral analytics is inferring a perceptual or psychological behavior from a series of quantifiable observations[1] under two constraints:

  • The available data is not “designed for purpose”, requiring a model that maps observations to higher-order features
  • The quantity and quality of the data and ground truth (when available) are too low for certain types of learning systems[2]

 

To understand behavior from data, Hidden Markov Models (HMMs) are a proven machine learning approach. In essence, an HMM infers a specified number of (hidden) states from a sequence of observations.

Hidden Markov Models offer non-negligible advantages for behavioral analytics:

  • Provide classification without the need to specify higher-order behaviors a priori
  • Provide evidence-based likelihood outputs that:
    • allow for lifetime behavioral analysis (including change in behavior over time)
    • are easily adjusted to classify behaviors when business outcomes change (e.g., prioritizing precision over recall)
    • let competing hypotheses be compared, by training one model per hypothesis and testing the likelihood of an observation sequence under each

Example

To illustrate the potential of HMMs for behavior prediction, we describe a problem-gambling use case.

Problem Gambling is a behavioral disorder recognized by the American Psychiatric Association and characterized by an addictive relationship to gambling. Problem gambling is a serious issue in some jurisdictions; Australian government sources estimate that up to 2% of the Australian population may have a gambling problem.

 

Problem gambling cannot be identified directly from transactional data: one person’s addictive behavior may look similar to another person’s pastime, and the distinction can be subjective. Moreover, gamblers are likely to frequent more than one gambling venue, so any single venue’s dataset is incomplete.

In this example, the data available comprises betting transactions, bet outcomes, and account transactions. Because addiction-type behaviors develop over time, we design features relative to an individual’s baseline rather than using global variables such as “dollars wagered”. Customer activity is then aggregated over a time period (e.g., weekly), and each aggregate serves as an observation.
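As a sketch of what baseline-relative features could look like: the function below maps one week of activity to a discrete observation symbol. The thresholds, labels, and choice of baseline statistic here are illustrative assumptions, not the actual feature design.

```python
# Sketch: turn weekly wager totals into baseline-relative observation
# symbols. Thresholds and labels are illustrative assumptions.

def weekly_observation(week_wagered, baseline_wagered):
    """Map one week's wagering to a discrete symbol relative to the
    customer's own baseline (e.g. their median weekly wager)."""
    if week_wagered == 0:
        return "No Activity"
    ratio = week_wagered / baseline_wagered
    if ratio > 1.5:
        return "Increased money"
    if ratio < 0.5:
        return "Decreased bets"
    return "Base Line Profile"

baseline = 100.0  # hypothetical: this customer's median weekly wager
symbols = [weekly_observation(w, baseline) for w in [0.0, 95.0, 240.0, 30.0]]
```

Because every symbol is relative to the individual’s own history, a high-roller and a casual bettor can emit the same “Increased money” observation, which is what lets a single model generalize across customers.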

| Customer | Week   | Observation       |
|----------|--------|-------------------|
| CID1234  | 201445 | One Deposit       |
| CID1234  | 201446 | Base Line Profile |
| CID1234  | 201447 | Small Loss        |
| CID1234  | 201448 | No Activity       |
| CID1234  | …      | …                 |
| CID1234  | 201510 | No Activity       |
| CID1234  | 201511 | One Deposit       |
| CID1234  | 201512 | Decreased bets    |
| CID1234  | 201513 | Small Loss        |
| CID1234  | 201514 | Increased money   |
| CID1234  | 201515 | No Activity       |
| CID1234  | 201516 | Increased markets |
| CID1234  | 201517 | Big Loss          |

 

We build two HMMs from these observations: one for customers who eventually “Self-Exclude” from the platform, and another for those who stay active. Self-exclusion is the best available proxy for problem gambling but is far from optimal, as only a small proportion of problem gamblers seek help or self-exclude. This has important implications: we cannot (and should not) optimize for a 100% classification rate on the training data.

 

Hidden Markov Models in Aster are easy to implement. A model is created with the HMMUnsupervisedLearner function, and new data is scored with the HMMEvaluator function. The learner function calculates the composition of states and transition probabilities while the evaluator function expresses how likely a certain observation sequence is for a specific model.

--SELF EXCLUDING MODEL

SELECT * FROM HMMUnsupervisedLearner(
       ON self_exclude_train AS vertices
       PARTITION BY customer_id ORDER BY week
       HiddenStateNum('30') --# of hidden states; can be optimised
       MaxIterNum('100') --convergence criterion
       Epsilon('0.01') --convergence criterion
       InitMethods('random')
       SeqColumn('customer_id')
       ObsColumn('keyword')
       OutputTables(
              'init_state_prob'
              ,'transition_prob'
              ,'emission_prob'
       )
);

Code for creating the self-excluding Hidden Markov Model

 

Hidden states and transitions for “Active” customers (with manual labels)

Customers are “scored” by both the self-excluding and the active model on a weekly basis, and each week is labeled according to the model with the highest likelihood. This allows us to see how a customer’s behavior changes over time. In the example below, there are five weeks during which the customer alternates between behaviors before consistently looking like an addictive gambler.

 

 

| Customer | Week ID | Prediction     | Likelihood  |
|----------|---------|----------------|-------------|
| CID1234  | 1       | Active         | 0.477027359 |
| CID1234  | 2       | Active         | 0.147745976 |
| CID1234  | …       | Active         | …           |
| CID1234  | 13      | Active         | 1.99E-04    |
| CID1234  | 14      | Self Excluding | 4.61E-05    |
| CID1234  | 15      | Active         | 2.60E-05    |
| CID1234  | 16      | Active         | 1.83E-05    |
| CID1234  | 17      | Self Excluding | 3.87E-06    |
| CID1234  | 18      | Active         | 2.40E-06    |
| CID1234  | 19      | Self Excluding | 6.26E-07    |
| CID1234  | …       | Self Excluding | …           |
| CID1234  | 114     | Self Excluding | 5.53E-82    |

 

We can further refine the classification by using the likelihood of a classification to implement thresholds. Specifically, we can decline to classify individuals whose likelihood under either model falls below a specific threshold, implicitly treating them as non-problem gamblers until more activity is recorded. The rationale is that addictive behaviors are more constrained than general behavior.
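This thresholded decision rule can be sketched as follows (the function and parameter names are hypothetical; the likelihood values echo the weekly-scoring table above):

```python
# Sketch: label a week only when at least one model's likelihood clears
# a threshold; otherwise withhold classification.

def classify_week(lik_active, lik_self_exclude, threshold=1e-100):
    """Return a weekly label, or withhold one when neither model
    assigns the observation sequence enough likelihood."""
    if max(lik_active, lik_self_exclude) < threshold:
        return "Unclassified"  # too little evidence either way
    return "Self Excluding" if lik_self_exclude > lik_active else "Active"

labels = [
    classify_week(2.40e-06, 6.26e-07),  # active model wins
    classify_week(4.61e-05, 1.99e-04),  # self-excluding model wins
    classify_week(1e-120, 1e-110),      # both below threshold
]
```

Raising the threshold trades recall for precision, which is exactly the kind of business-driven adjustment the likelihood outputs make easy.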

 

 

Overall, despite severe class imbalance (98% of the ground truth is “Active”, only 2% is “Self Excluding”) and the noise in the data, the results are remarkable: an F-score of 0.79 when a 1e-100 likelihood threshold is applied. Again, we are not aiming for an F-score of 1.0, as many problem gamblers have likely not self-excluded and some self-excluders have done so for other reasons.

 

 

Classification rates for various thresholds

Conclusion

Hidden Markov Models offer an easy, effective method to quantify and classify behaviors from noisy transactional data. With Aster’s efficient implementation, we can accurately score millions of customer interactions over years’ worth of data.

 

For more information on behavioral analytics, contact Clement Fredembach or Michelle Tanco.