As Teradata customers discover and begin to use the native Teradata database geospatial capabilities, one of the first questions that inevitably comes up is, how do I “Geocode” my data? In fact, Geocoding will often be an important first phase of any Geospatial implementation project and sometimes even a barrier to starting the project altogether. The purpose of this article is to discuss what Geocoding is, how it works, Geocoding options, precision, and sources available today for Geocoded information.
Geocoding is the process of calculating geographical coordinates (longitude and latitude – x and y) of various business entities (customers, suppliers, stores, assets, etc.) based on each entity's location on the surface of the earth. It is often defined as an address latitude/longitude append operation. Once the coordinates of these entities are acquired, they can be used in geospatial proximity and location-focused analytics within the Teradata database.
For example, here is the address of the Teradata R&D facility in San Diego the way we are typically used to identifying a location:
17095 Via Del Campo
San Diego, CA 92127
This is the associated latitude and longitude after this location is Geocoded:
Latitude: 33° 01’20.90” N
Longitude: 117° 05’33.75” W
When we talk about Geocoding we are talking about latitudes and longitudes (remember 5th grade geography?). There are three generally accepted formats for representing coordinates in latitude and longitude:
Degrees, Minutes, Seconds is the most common format that we are used to seeing on charts and maps. Decimal Degrees expresses coordinates as decimal fractions and is the most convenient for analysis in the database. Ultimately, all Geocoded locations, whether they represent an address, zip code, zip+3, zip+4, city center, etc., must be converted to this format in order to be analyzed with the in-database SQL spatial functions in Teradata.
There are different levels of Geocoding granularity that you should take into consideration. You can Geocode to the roof-top, parcel centroid, street address, zip, zip+4, city center, state center, etc. The level you Geocode to depends on your business need, analytics requirements, and of course the data available to you.
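As a minimal sketch of the conversion to Decimal Degrees, the Python below turns the San Diego example above into signed decimal values. The function name dms_to_decimal is illustrative and not part of any Teradata tooling; it assumes the usual convention that south latitudes and west longitudes are negative.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus a hemisphere letter to decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    # One minute is 1/60 of a degree; one second is 1/3600 of a degree.
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    # Assumed convention: south and west are expressed as negative numbers.
    return -value if hemisphere in ("S", "W") else value


# The San Diego coordinates from the example above.
lat = dms_to_decimal(33, 1, 20.90, "N")
lon = dms_to_decimal(117, 5, 33.75, "W")
print(round(lat, 6), round(lon, 6))  # 33.022472 -117.092708
```

Note that the hemisphere letter and the sign carry the same information, so a decimal value should use one or the other, not both.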
If the postal code is sufficient spatial information or if the business is reasonably well aligned to postal code boundaries, then it may not be necessary to do any Geocoding at a lower level of granularity (e.g. address, roof-top, parcel, etc.).
On the other hand, if the postal code information is not granular enough, does not align with the spatial pattern you are trying to analyze (e.g. insurance flood zones, cell signal coverage areas, sales regions, or other man-made boundaries, etc.), or there are no static addresses because the object is moving (rail cars, trucks, ships, parcels, RFID tags, etc.), then Geocoding to a more granular unit may add significant value to the data and the resulting analysis.
Once you’ve determined whether Geocoding is indeed necessary for your analytical purposes, the next step is to figure out how to go about Geocoding your data.
Here are some options:
The specific Geocoding method to be used will largely depend on the application requirements, data volumes, and data update frequency (large batches, small batches, real-time, etc.). It will also depend on how stringent the data security requirements are, as well as on the data latency, data accuracy, and service level requirements and the budget available.
Not all Geocoding is equal and it can be a very complex activity. Many Geocoder vendors have proprietary and patented approaches and algorithms to provide very accurate, precise, and fast Geocoded results. Depending on the technique used to collect the coordinates, the precision of the location results can differ widely from approach to approach and vendor to vendor. The price of the Geocoding solution will also vary. In fact, you can send the same address to 10 different vendors and Geocoding processes and get 10 different answers. Here’s a summary of some of the most common Geocoding methods used by vendors today and their relative accuracy:
It is recommended to ask the Geocoding vendor about the method used for Geocoding and the precision. As mentioned, the precision (and price) of the Geocoding solution will depend on your requirements and the application that is being designed.
Country and regional coverage requirements will have to be taken into consideration before starting any Geocoding process. The most extensive coverage will typically be found in North America (US, Canada) and in some areas of Western Europe. Coverage of other areas will vary from country to country and from area to area. For instance, urban areas usually have better coverage than rural areas. The type of geocoding available will also vary, with very precise geocoding available for some countries and less precise or even very approximate interpolated geocoding solutions available for others.
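To illustrate why interpolated results are only approximate, here is a hedged sketch of street-segment address interpolation, one common way such geocodes are estimated. Everything in it is hypothetical: the function name, the house-number range, and the segment coordinates are made up for illustration, and real vendor algorithms are proprietary and more sophisticated.

```python
def interpolate_address(house_number, range_start, range_end, start_point, end_point):
    """Estimate a (longitude, latitude) by placing a house number along a street segment.

    start_point and end_point are (longitude, latitude) tuples for the two ends
    of the segment; range_start and range_end are the house numbers at those ends.
    """
    if range_start == range_end:
        fraction = 0.5  # no usable range, so fall back to the segment midpoint
    else:
        fraction = (house_number - range_start) / (range_end - range_start)
    # Keep the estimate on the segment even if the number is outside the range.
    fraction = min(max(fraction, 0.0), 1.0)
    lon = start_point[0] + fraction * (end_point[0] - start_point[0])
    lat = start_point[1] + fraction * (end_point[1] - start_point[1])
    return (lon, lat)


# Hypothetical segment: house numbers 100 to 200 between two made-up points.
estimate = interpolate_address(150, 100, 200, (-117.100, 33.020), (-117.090, 33.024))
print(estimate)
```

The estimate assumes houses are evenly spaced along a straight segment, which is rarely true in practice. That assumption is the main reason interpolated geocodes can land some distance from the actual building.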
Teradata does not have a native in-database Geocoding solution or provide a Geocoding subscription service. However, Teradata does partner with some of the industry’s premier Geocoding technology providers. If you are interested in learning more about Teradata’s Geocoding software and data partners and how they integrate with Teradata, please contact your local Teradata account team or Professional Services for more information.
There are a number of free Geocoders, which can usually be accessed through web-service APIs. These Geocoders usually allow a certain number of records to be Geocoded per day from the same IP address. If you want to Geocode a larger number of records, a commercial license is typically available. As of March 2010, the most popular are Yahoo (up to 5,000 Geocodes per day per IP address) and Google (15,000 per day per IP address), although others are also available. These Geocoders will usually be adequate for testing and development purposes or even for some low-volume geocoding maintenance applications. However, if you want to use some of these for business-critical applications, you should do thorough research and testing first, because some of them may not offer acceptable service level guarantees or accuracy for your production operations.
Once you have the location coordinate data (latitudes and longitudes), it's time to load it into the Teradata ST_Geometry data type in your database tables. My colleague, Mike Riordan, has put together another good article on how to load and convert Geocoded location data into the Teradata ST_Geometry data type to begin your Geospatial analytics.
For more information about Teradata Geospatial and/or your Geocoding options, please contact your local Teradata Account Team or Professional Services.
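Because of these daily limits, a large address file has to be spread over several days when a free Geocoder is used. The sketch below only plans the batches; it does not call any Geocoding service. The function name plan_daily_batches and the 12,000-record workload are hypothetical, and the 5,000-per-day limit is simply the Yahoo figure quoted above.

```python
def plan_daily_batches(records, daily_limit):
    """Split records into consecutive batches no larger than the daily quota."""
    if daily_limit <= 0:
        raise ValueError("daily_limit must be a positive integer")
    return [records[i:i + daily_limit] for i in range(0, len(records), daily_limit)]


# Hypothetical workload: 12,000 address IDs against a 5,000-per-day quota.
address_ids = list(range(12000))
batches = plan_daily_batches(address_ids, 5000)
print(len(batches), [len(batch) for batch in batches])  # 3 [5000, 5000, 2000]
```

A plan like this also makes it easy to estimate how long an initial load would take, which is one input into deciding whether a commercial license is worth the cost.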
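As a rough illustration of preparing coordinates for loading, the sketch below formats a decimal latitude and longitude as a well-known text (WKT) point string, a common text representation for geometry values. The function name to_wkt_point is hypothetical, and how the resulting string is passed into an ST_Geometry column is not covered here; see the loading article mentioned above and the Teradata documentation for the exact syntax.

```python
def to_wkt_point(latitude, longitude):
    """Format decimal-degree coordinates as a WKT point string."""
    if not -90.0 <= latitude <= 90.0:
        raise ValueError("latitude must be between -90 and 90")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must be between -180 and 180")
    # WKT lists x before y, so longitude comes first and latitude second.
    return f"POINT({longitude:.6f} {latitude:.6f})"


# Decimal-degree form of the San Diego example used earlier in this article.
wkt = to_wkt_point(33.0224722, -117.0927083)
print(wkt)  # POINT(-117.092708 33.022472)
```

Swapping latitude and longitude is one of the most common loading mistakes, which is why the function validates both ranges before formatting.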