How to efficiently join GEO IP RANGE data - How to avoid product JOIN

Database
Enthusiast

How to efficiently join GEO IP RANGE data - How to avoid product JOIN

Problem Description

A typical lookup kind of JOIN between FACT table with hundreds of millions of web clicks, and DIM table with 55M IP ranges in numeric style (e.g. downloaded from MaxMind).

The explain plan choose to replicate DIM to all AMP, which is reasonable; it scan and filter the FACT table in parallel; but then it will use "product join" to find the matching GEO_IP_LOCATION_ID. Quickly, sun query runs out of spool or gets killed by DBA.

Not sure if value-ordered unique secondary index on (start_ip_num, end_ip_num) will help, because it can only be created in TD 14.xx due to the BIGINT data type. But this is such a typical JOIN, there should be a more efficient way to do it, right?

-- DIM_GEO_IP_RANGE
create set table DIM_GEO_IP_RANGE (
START_IP_NUM BIGINT NOT NULL,
END_IP_NUM BIGINT NOT NULL,
GEO_IP_LOCATION_ID INT
) UNIQUE PRIMARY INDEX(START_IP_NUM)
;

-- FACT_WEB_CLICK
create multiset table FACT_WEB_CLICK (
CLICK_TIME_UTC TIMESTAMP(3) NOT NULL,
LOCAL_TIME_ZONE_OFFSET DECIMAL(4,2),
WEBSITE_ID SMALLINT NOT NULL,
SESSION_ID BIGINT NOT NULL,
CLICK_ID INT NOT NULL,
CLICK_TYPE SMALLINT,
IP_NUM BIGINT NOT NULL,
PAGE_ID INT NOT NULL,
USER_ID INT,
REF_STRING VARCHAR(256) CHAR SET LATIN,
) PRIMARY INDEX (SESSION_ID, CLICK_ID)
PARTITION BY (
RANGE_N( CAST(CLICK_TIME_UTC as DATE) BETWEEN
DATE '2013-01-01' AND DATE '2013-12-13' EACH INTERVAL '1' DAY)
);

-- LOOKUP JOIN (explain plan has product join)
explain
select c.*, g.geo_ip_location_id
from fact_web_click c join dim_geo_ip_range g
on c.ip_num between g.start_ip_num and g.end_ip_num
where c.date_sk = current_date - 1
and user_id is not null
and website_id = 1234
;
1 REPLY
Enthusiast

Re: How to efficiently join GEO IP RANGE data - How to avoid product JOIN

Any suggestion?