The Friday Night Project #3 - TZA-Insurance

UDA
The UDA channel is for Teradata’s Unified Data Architecture including the Analytical Ecosystem and other UDA influences. This channel provides information specific to the integration and co-existence of multiple systems, in particular when a mix of Aster, Teradata, and Hadoop are present. It is also meant to support information around the UDA enabling technologies so products like Viewpoint, Data Mover, Connectors, QueryGrid, etc.
Teradata Employee

The Friday Night Project #3 - TZA-Insurance

After a couple of Weeks of Introduction to the “What” and “Why” of Active Integration plus a start at the “How” around creating a Solid Architecture, this episode of the Friday Night Project introduces the Teradata Sample Application (TZA-Insurance).

Property or House Insurance

The Friday Night Project will build a number of Integrated Applications (using Web, Portal and SOA presentation veneers), based upon the architectural principals highlighted previously. In order to provide a realistic example against which to work, we use a very simple Business Problem that everyone should have some understanding of, whether they own their own home, rent or merely have to listen to their parents complain about the cost i.e. Home Insurance. Using this example we will show how we might model the data and build up the various layers of Business Services, Business Objects and Data Access Objects.

TZA-Insurance

Based upon this simple Business Problem we define a fictitious company called TZA-Insurance (after Teradata Sample Application) and will build their Enterprise Applications. Note: ‘TZA’ is a deliberate spelling mistake as all of the code presented within the Friday Night Project is taken from example code that sits in the same Subversion repository as the real Client and Enterprise Applications developed within Teradata, such as Viewpoint or TRM. Therefore it was appropriate to create the artificial Subversion short of name TZA to make sure there was no conflict with any current or possible future application short name requirements.

Core Business Algorithm

Based upon the premise that TZA-Insurance is an Insurance Underwriter providing protection (Insurance) against Property Damage the core Business Algorithm (Quotation Algorithm) will operate against the details about the property / house that we wish to insure and its contents (their value, the level of cover required). The property details will include artifacts like how many rooms are in the property and what type of property it is (house, apartment, flat, condominium or duplex). However, the most important artifact, for this simple example, will be the Physical Location of the property / house within a given country based on its Postal Code or Region.

So consider the TZA Quotation Algorithm, which while unlikely to be troubling anyone from the Insurance Industry, sets out a simple combination of algorithmic processes that ultimately rely upon Data held within the Enterprise Data Warehouse.

This core Business Algorithm is based upon the “Property” Object that some Presentation Veneer (as yet to be defined) will provide to the algorithm. From this “Property” Business Object we can obtain the amount we want to insure the building for and take say 0.1% of that as the base Quote amount.

// calculate the Quotation based on 0.1% of property value
buildingsQuote = propertyDetails.getBuildingsAmountInsured();
buildingsQuote = (buildingsQuote * 0.0001);

The algorithm then adds and subtracts some percentage based on artifacts of the “Property” Business Object such as if it has an alarm reduce the quotation by 5%.

….
// if Alarm Fitted reduce quote by -5%,
if (alarmed)
buildingsQuote -= (buildingsQuote * 0.05);
….

Using this base “Property” agnostic quotation all equivalent properties (i.e. same number of rooms, alarmed, property type, etc) would end up with the same quote amount.

Location Based Algorithm

Finally within the algorithm the concept of Physical Location is introduced and used to modify the quote in line with the Risk Factors associated that location.

….
// modify the quote based upon the Risks for this Location
buildingsQuote = buildingsQuote * FireRisk(Location);
buildingsQuote = buildingsQuote * FloodRisk(Location);
buildingsQuote = buildingsQuote * TheftRisk(Location);
buildingsQuote = buildingsQuote * SubsidenceRisk(Location);
buildingsQuote = buildingsQuote * OtherRisk(Location);
.…

Location is Everything

So clearly Location is Everything Toto. Different countries employ a range of techniques, typically as part of their Postal System, to identify specific regions of their physical country. Insurance companies typically use these “Postal Oriented” techniques to determine the Region and the associated Risk Factors.

For example the United Kingdom Postcode System [http://en.wikipedia.org/wiki/UK_postcodes] allows a specific region, such as DD (for Dundee, Tayside and Angus) to be identified.

UK Postal Code Regions

Alternatively a more specific geographic area such as DD2 (representing the West End of Dundee, Scotland) might be used to define a certain set of associated Risk Factors.

DD2 Area of Scotland

The UK Post Code systems actually allows for the identification of a group of up to 12 houses based upon a fully qualified Post Code.

DD2 1JS Postal Code Area

Typically, however, for property or house insurance it is not necessary to mange risk at the individual property level, hence the region (DD) or area (DD2) based approach is more applicable.

The Canadian Postal Districts [http://en.wikipedia.org/wiki/Canadian_postal_code] map below illustrates another region based approach (with the ability to get down to areas through the Forward Sortation Area i.e. K1A of K1A 0B1).

Canadian Postal Districts

For the sake of simplicity (and because Toto wants to get back to Kansas) the Friday Night Project will use the US ZIP Code system [http://en.wikipedia.org/wiki/ZIP_code]. The map below illustrates how numerical ranges of Zip Codes are allocated across the continental US and beyond.

US Zip Codes

From this diagram it is clear how ranges of Zip Codes apply to specific, state wide, geographical regions of the physical country and as with the examples this “Postal Code” designation can be used to associate a specific set of risk factors against the general region or area that a property or house is in.

From the various “Location” definition schemes available the Numerical oriented Zip Code approach of the USA represent the easiest implementation and so follows our “Keep it Simple” principle.

Mapping the Risk Factors

While individual Zip Codes are not as focused as a fully defined as UK or Canadian Post Codes they still, potentially, represent an overly focused location definition. The numerical nature of the Zip Code “Location” definition allows for a “Range” based approach to mapping Regions or Areas to Risk Factors. Therefore instead of requiring the “Expert” user to mange the Risk Factors associated with every single possible ZIP code, we define a Control Table (called ZipCodeRiskFactors) where we use the Range based aspect of the US Zip Code approach to “Location” in order to minimize the amount of data but more importantly the amount of data management the Expert is required to do (imagine if you had to manage individual Risk Factors for all 900,000 primary Zip Codes within the US). The Start and End of a given range of Zip Codes can be used as the index into a database table that holds the Risk Factors associated with each range. The Risk Factors themselves are associated with elements such as Fire, Flood, Theft, Subsidence (earth quake), etc.

Using this Range based approach Risk Factors can be associated with entire US States (which typically represent geographical regions), through Cities or even individual Zip Codes (where Start and End are the same value). In the example below three Zip Code Ranges and their associated Risk Factors are represented:

StartZip EndZip Fire Flood Theft Subsidence Other
32000 34999 0.5 2.0 1.0 1.0 1.5
35000 89999 1.0 1.0 1.0 1.0 1.0
90000 96199 2.0 0.5 1.0 1.5 1.

The first range represents Florida where the risk of Fire is seen as lower than average while the risk of Flood is seen as much higher, due to the state being on the hurricane path. The third range represents California where the Fire risk is higher although the risk of Flood is seen as lower, than average. The middle range, in this example, represents everything in between but you understand how we can split larger ranges into more dedicated sections without having to get down to managing the individual Zip Code elements. A table definition to support this might look like this.

Create Table ZipCodeRiskFactors
(
ID integer NOT NULL PRIMARY KEY,
StartZipRange integer NOT NULL,
EndZipRange integer NOT NULL,
FireRisk float DEFAULT 1.0,
FloodRisk float DEFAULT 1.0,
TheftRisk float DEFAULT 1.0,
SubsidenceRisk float DEFAULT 1.0,
OtherRisk float DEFAULT 1.0
)
UNIQUE SECONDARY INDEX (StartZipRange, EndZipRange);

Introductions Over, Time to Dance

With the Introductions around the “What”, “Why” and “How” of Active Integration, Solid Architecture and the Teradata Sample Application itself completed, it is now time to dance Toto and start to create the TZA Database and User identity.

2 REPLIES
Teradata Employee

Re: The Friday Night Project #3 - TZA-Insurance

great stuff, a "nit" but, "....UNIQUE SECONDARY INDEX (StartZipRange, EndZipRange);" "SECONDARY" is not valid DDL in Teradata
Teradata Employee

Re: The Friday Night Project #3 - TZA-Insurance

nevermind, my bad. it is correct in the following modules