In my previous article we explored Teradata's spatial features using geometric shapes on a 10x10 grid. I hope this was a useful introduction into the new ST_Geometry data type and its powerful set of functions. And now that we've covered the basics, let's take a step into the world of geospatial. By geospatial, we're now talking about locations based on geographic reference points on Earth; for example longitude and latitude. (This can and does get more complicated, but we'll try to keep this as light as possible).
So just like we did in the previous article, we'll start with an example using ST_Geometry for point data. Here's a table to store city locations as points on a map.
CREATE TABLE cities
id INTEGER NOT NULL PRIMARY KEY,
This table includes an ID column, a name column and an ST_Geometry column for storing the location. This of course is the field that we'll be using to flex Teradata's geospatial powers.
Now to add some data.
INSERT INTO cities VALUES (1, 'New York','POINT( -73.960257 40.768823)');
INSERT INTO cities VALUES (2, 'San Francisco', 'POINT(-122.420782 37.795219 )');
INSERT INTO cities VALUES (3, 'San Diego', 'POINT(-117.074830 32.995484 )');
INSERT INTO cities VALUES (4, 'Los Angeles', 'POINT(-118.243 34.05219 )');
INSERT INTO cities VALUES (5, 'Atlanta','POINT( -84.3897 33.75449)');
INSERT INTO cities VALUES (6, 'Dallas','POINT(-96.7971 32.78108 )');
The first difference you should notice is the value of the X,Y coordinates. We're using geospatial references; longitude and latitude.
Longitude are the verticle lines on a map, representing the X coordinate. And latitude are the horizontal lines, representing the Y coordinate.
OK, so far nothing too complex here. Let's try our distance query just like we did with our simply geometry example.
SELECT C1.name, C2.name, C1.location.ST_Distance(C2.location) AS Distance
FROM cities C1, cities C2
WHERE C1.id != C2.id
ORDER BY 1,3;
name name Distance
Atlanta New York 12.57
Atlanta San Diego 32.69
Atlanta Los Angeles 33.85
Atlanta San Francisco 38.25
Los Angeles San Diego 1.58
Los Angeles San Francisco 5.61
Los Angeles Atlanta 33.85
Los Angeles New York 44.79
New York Atlanta 12.57
New York San Diego 43.81
New York Los Angeles 44.79
New York San Francisco 48.55
San Diego Los Angeles 1.58
San Diego San Francisco 7.18
San Diego Atlanta 32.69
San Diego New York 43.81
San Francisco Los Angeles 5.61
San Francisco San Diego 7.18
San Francisco Atlanta 38.25
San Francisco New York 48.55
Hmm, what's going on here? Atlanta to New York is 12.57? What units are those - they can't be miles or kilometers?
What do these numbers mean?
OK, let's back up a little and talk through this. We've entered our coordinates as longitude and latitude, right. So what should expect from the distance calculation. To Teradata these are arbitrary units. We haven't done anything to tell the database that these now represent geospatial coordinates (long/lat). We created the table and added data just the same way we did for our geometry shapes, so how is the database to know anything differently?
The answer to this problem lies with an ST_Geometry attribute called SRS - Spatial Reference Systems. Until we do something to indicate that we are using a geographic SRS, all the ST_Distance function is going to do is assume that these are simple units. And in that simple world, yes, Atlanta and New York are only 12.57 units apart.
This is where the complexity comes in; converting longitude and latitude calculations into units like miles or meters. Not nearly as easy as it sounds. A quick web search brings us information like "on Earth, the length of an arcdegree of north-south latitude difference is about 60 nautical miles, 111 kilometres or 69 statute miles at any latitude."
Let's try this simple rule of thumb value and try our calculation using each unit of latitude as 69 miles. If we take our Atlanta to New York value of 12.57 and multiply by 69 we then have 867.33 miles. That seems more reasonable. A web search confirms that the driving distance at about 850 miles. But wait, another web search has the directly flying distance between these cities at 748 miles. That's the one we should be using to compare with our database results. Why then is our calculation off by over a hundred miles? Well, let's go back to latitude and longitude.
Latitudes are parallel lines running horizontal across the globe. They are always the same distance apart (69 miles). Longitude lines, though, are a much different story. At the equator they too are roughly 69 miles apart, but as they reach the North and South poles, the converge and the distance between them goes to zero. If we are going to accurately calculate distances, we're going to need to account for these narrowing longitude uints (Atlanta and New York are separated by almost 11 degrees longitude). Doing this will involve telling Teradata about our coordinates when we ask it to perform any calculations. The method we'll use for this is the ST_Transform() method. ST_Transform allows us to convert ST_Geometry data between different SRS units. In our example, we need to transform our coordinates into geodetic units so that the ST_Distance() calculation can accurately account for the narrowing longitude lines.
This topic does start to get more and more complex, so let's skip ahead and take a look at the answer to this query. That should make it a little easier to work our way backwards to explain how we create this query.
SELECT C1.name, C2.name,
).ST_Distance(C2.location.ST_Transform(SRS_1.srtex t,SRS_2.srtext))*.6/1000 as Miles
FROM cities C1 ,cities C2,
WHERE C1.id != C2.id
AND SRS_1.AUTH_SRID = 32616 -- UTM 16 / WGS84
AND SRS_2.AUTH_SRID = 4326 -- WGS84
ORDER BY 1,3;
name name Miles
Atlanta New York 728.80
Atlanta San Diego 1872.56
Atlanta Los Angeles 1929.26
Atlanta San Francisco 2142.69
Los Angeles San Diego 105.88
Los Angeles San Francisco 375.60
Los Angeles Atlanta 1929.26
Los Angeles New York 2430.39
New York Atlanta 728.80
New York San Diego 2396.82
New York Los Angeles 2430.39
New York San Francisco 2560.22
San Diego Los Angeles 105.88
San Diego San Francisco 481.32
San Diego Atlanta 1872.56
San Diego New York 2396.82
San Francisco Los Angeles 375.60
San Francisco San Diego 481.32
San Francisco Atlanta 2142.69
San Francisco New York 2560.22
First let's look at the new distance calculation between Atlanta and New York - 728.80 miles. Yeah, that looks like the right answer. And another check on the San Diego to New York distance of 2396.82 checks out as well. Looks like we have the right query. Now let's take a closer look to understand what we did to get these results.
We can see that our select statement now includes two calls to ST_Transform() for the distance calculation. That makes sense; we're converting both the C1.location and the C2.location for this new calculation. We also see a unit conversion factor here, "*.6/1000". That's a pretty standard way of converting from meters to miles, so apparently ST_Distance() is returning meters. All right, we're good to go, right?
Well, before we accept this query as our general solution to geospatial calculations, let's dig a little deeper into these two SRS transformations. While we do seem to have a query that is working, we need to make sure we have a solution that will work for all locations, not just these US coordinates. Looking back at the SQL and those commented lines of the query, we should be asking "What is UTM 16 and WGS84".
WGS stands for World Geodetic System and is a standard for use in mapping the Earth's spheroidal surface. The latest revision is WGS 84, which is also the reference coordinate system used by Global Positioning System (GPS) devices. Geodetic references are needed because the earth is not a perfect sphere.
UTM, or Universal Transverse Mercator, is a modern mapping coordinate system developed by the US Army Corps of Engineers in the 1940s. UTM is a grid-based coordinate system that differs from traditional longitude and latitude in that it divides the world into a series of sixty zones. This allows distance calculations to be done more easily with less distortion than the trigonometric formulas needed for longitude/latitude. These calculations were particularly important during WWII and it was this work during the war that led to adoption of UTM in the post-war years. See http://egsc.usgs.gov/isb/pubs/factsheets/fs07701.h
So back to our query. What we've done here is taken the location data for our cities, which are UTM based locations, and converted them to geodetic values for more accurate distance calculations. But the key here is to 'know your data'. This query works here for US data. Other regions will be using different coordinate reference systems for their locations. Teradata does provide over a thousand SRS references to be used by ST_Transform(). Take a look at the table sysspatial.SPATIAL_REF_SYS table:
SELECT AUTH_SRID, SRTEXT
This returns 1338 rows from my database. Finding the right one for your data may take some research and trial and error.
In these past two articles we've had a hands-on tour of simple spatial objects and the powerful functional capabilities of Teradata and a good introduction to the concepts of geospatial data and the considerations required for using our calculations. In the next articles, we'll continue working with geospatial data and methods, learn how to import external spatial reference data and finally how to visualize our data by creating viewable maps. Stay tuned - this is great stuff!
a question here,
I want to calcuate the distance for longitude east 121.516 latitude north 25.041 (Taiwan/Asia)
so, I insert one additional row into table you mentioned, and try to query the result.
But it always shows ST_Transform: Error in call to transform...
Some could help me looking this problem? I doubt it maybe need to change the number of
"SRS_1.AUTH_SRID = 32616 -- UTM 16 / WGS84 or SRS_2.AUTH_SRID = 4326 -- WGS84 "
thanks in advance !
Taiwan is not on the same UTM 16, it is different,
Did you figure this out? If not do you still need any help? Please post your SQL details for any help/error.