In parts 1 and 2, we looked at how to load and retrieve large objects using Teradata BLOBs and CLOBs. In part 1, I put forward my unbreakable golden rule that you shouldn’t simply store Object Models as Large Objects. Rather, always map your objects’ attributes to columns in a table; don’t just serialize a bunch of classes and store them in a BLOB.
In part 3, I present another of my unbreakable golden rules; which is “Rules were made to be broken”. The focus of this article is how to serialize and persist (store) an Object Model into a BLOB on Teradata and reinstate it.
Given rule number 1, why would we want to serialise an Object model and store it in a BLOB which essentially makes it unusable in terms of SQL operations? The answer is that sometimes you will be in a situation where you need to manage a whole bunch of data that really is not useful to query and possibly not convenient to convert to its component elements and store in relational attributes. By way of example, an image is simply a sequence of bytes that when appropriately interpreted can be displayed or printed as a picture. You could; if you really, really wanted to store the bytes of an image stored in a table consisting of one row per row byte in the image. Most people don't store an image as a series of integers in a table. I wouldn't do that either, I like most, would store it as a BLOB. But you could do that if you really wanted to do so.
Consider the diagram. It is a diagram of an airfield (airport). This diagram could be stored as an image in a BLOB column. In this web page, it is just that, an image - a GIF image to be precise. However, the diagram is intended to be part of an application that allows the user to interact with the it. One example is of a user being given taxi instructions could enter them and have the taxi path highlighted on the diagram. This would not be easy to do if the diagram was stored as an image.
A custom object model is required to maintain the diagram and support the interactivity. This custom object model is the Airfield object model.
The Airfield object model needs to be able to be stored and retrieved to/from my database. I could store the Airfield Object Model as individual attributes in a multitude of tables or simply serialise it and store it as a BLOB. In this article I will attempt to justify and do the latter.
The Airfield object model consists of a number of classes representing all aspects of the diagram plus some additional features not shown (e.g. Radio Services).
In this initial version of the object model there are 14 classes representing runways, taxiways, aprons and other aspects of the airfield. Once could assume from this that to store the airfield class structure in a database, up to 14 different tables would be required, one for each class.
The actual number of tables required to store an airfield in a Teradata would likely be different to the number of classes in the Object model. The reasons are twofold:
There is nothing in the object model that would prevent the mapping from classes to tables and attributes to columns for storage in a 3NF (or denormalised) data model. The number of records that would need to be maintained (i.e. save and load) would however be considerable. Similarly the amount of code required to be built (whether O-R mapping configuration files or SQL queries loading an object model) would also not be insignificant. Once we are done, we would have a data model full of attributes which, quite frankly, wouldn't be useful for analytical work (and certainly not useful for the applications I have in mind). Following are two examples that hopefully better explain this point.
Consider a runway. It consists of designators (the numbers at the end of the runway) and points that define the runway’s ends relative to the airfield’s reference point. None of these are terribly useful to be used in a query. For example, it is unlikely that anyone is going to run a query listing airfields that consist of a runway that has an end point a certain distance from the origin or runway 35. Of course there will always be the exception, for example, someone concerned with the integrity of the data may wish to do this, but for the applications I have in mind, such a query does not make sense.
Taxiways also consist of designators (the letter identifying the taxiway) and a series of points relative to the airfield’s reference point that map out the path of the taxiway. In most, if not all, countries the naming of taxiways has not been formalized. As such taxiway designations can be somewhat arbitrary. Consequently, none of the taxiway attributes are particularly useful to use in queries. For example a query listing airfields that consist of taxiway “A” will pretty much return all airfields in the database; which is not terribly useful. Additionally, the location of each designator and the points that map out the path of the taxiway would need to be stored as individual entries in dependent tables having a 1:M relationship with Taxiway.
What is useful to use in a query is the airports IATA code (or ICAO code) and/or location (latitude/longitude). The IATA code is an abbreviation allocated to airfields around the world. Here are some examples of IATA codes that you may be familiar with together with the ICAO equivalent code:
So the IATA code will be one of the attributes that we store in the table containing our BLOB.
For the reasons above, I’ve decided that I will serialize my object model and store it as a Binary Large Object in Teradata. Obviously I will also need to restore my object model when reading it back from the database.
As we’ve seen in the other 2 articles, storing a large object from a file into Teradata is pretty easy. The problem that we have here is that any object model we want to work with isn’t one of the Java types that we can store in a BLOB. That is, the Airfield Object Model is not an array of byte (byte ) nor is it a Blob (which at the end of the day is basically just a wrapper for an array of byte).
So the trick is to convert our object model into a form that we can use; that is we need to convert our object model to an array of byte.
Note that in order to serialise an Object model, each and every class in the model must implement the marker interface java.io.Serializable.
The only way that I know how to convert an object model to an array of bytes is to serialize our object then get an InputStream to that serialized data. This (2 step process) sounds very convoluted because most people would think of serializing in terms of saving to a file which must then be read back in – not to mention figuring out how to keep track of a location to store the file, cleanup afterwards and so on!
Fortunately there are 2 classes that make the conversion of an object model to a byte array and back again very straightforward. These are java.io. ByteArrayOutputStream and java.io.ByteArrayInputStream. Consider the following code snippet which converts a Blob retrieved from the database into an Airfield Object structure:
Blob airfieldBlob = rs.getBlob("diagram");
InputStream is = airfieldBlob.getBinaryStream();
ObjectInputStream ois = new ObjectInputStream (is);
Airfield airfield = (Airfield) ois.readObject();
At lines 1 and 2 we extract the airfield BLOB from our result set and obtain its InputStream. At this point it is just an array of bytes that aren’t terribly useful.
At line 3, we use the Blob’s InputStream to construct an ObjectInputStream. This is the class that does all the magic of converting the collection of bytes into a useable Airfield object (line 4). Finally we close all of the InputStreams and we are done. Simple!
Apart from I/O errors, the only things that really can go wrong with this are Class Cast Exceptions and Invalid Class Exceptions. The former (Class Cast Exception) will occur if the BLOB isn’t an instance of an Airfield object. The later (Invalid Class Exception) will occur if the signature of the classes has been changed. The Invalid Class Exception can be managed by placing SerialVersionUID in your class definitions. Refer to the serialver command in the JDK for information on generating these - your IDE may have a convenient plugin to generate them for you (mine does).
Saving the object model is essentially the reverse of the retrieval:
Airfield airfield = /* obtain an airfield from somewhere */
ByteArrayOutputStream bos = new ByteArrayOutputStream ();
ObjectOutputStream oos = new ObjectOutputStream (bos);
byte  airfieldData = bos.toByteArray();
PreparedStatement pstmt = connection.prepareStatement(sqlText);
.../* setting of other attributes */
pstmt.setObject (8, airfieldData);
To begin the save process we need to obtain a reference to our airfield which is performed at line 1.
Next we convert the object to a byte array using an ObjectOutputStream. This is performed at lines 3-8. Note that there is a need to flush the output stream (line 6). If you do not, it is possible that you will end up with an empty byte array; an empty byte array would be undesirable.
The final step is to extract the byte array and store it in our BLOB. This is achieved at lines 10-13. Note that the extraction of the byte array (line 10) and the setting of the Object (line 13) could be combined into a single call if you so desire. I’ve just separated it to hopefully make the overall process a little clearer.
The attached sample program is an extremely simple program that prompts you for the airfield’s ICAO code. It then retrieves the airfield from the database, converts it to an object and passes it to the airfield renderer which draws the airfield.
To get the program working, you will need to:
When you run the program, it will prompt you for the code of the airfield to retrieve. It will then attempt to connect to the database, retrieve and display the diagram. There may be a delay between entry of the code and display of the diagram while the database connection is established. Any exceptions that occur will be printed to System.out.
The airfield editor is quite a large application and I haven’t included it here. Consequently the save logic isn’t included in the test application. To explore saving Large Objects, you could use the sample application to modify an airfield’s attribute and save it. Alternatively you could use your own object model and save it in the database.
When the airfield is loaded from Teradata, the sample program overwrites certain attributes in the object model. The reason for overwriting them is that someone may modify these values that are exposed as columns in the table using SQL. For example, someone may modify the value of the elevation attribute using SQL. To provide some degree of data integrity, the values of the exposed attributes (city, state, latitude, longitude and elevation) have precedence to the values stored in the BLOB. Should you do this for your application? I can’t answer that because the answer would depend upon factors specific to what you are trying to achieve and how you design your object model and your data model (but you probably should).
In consideration of my two rules at the top of this article, should you serialize objects and store them in a BLOB? As with most things in life there is a very clear and definitive answer which is: it depends.
Well perhaps “it depends” isn’t definitive, but it is clear. You will need to evaluate all the pros and cons of storing objects as BLOB’s or in tables/columns and make a decision that works best for you.
So with the airfield model in mind, what are the attributes of serializing the Airfield Object or any object and storing it in a BLOB?
By way of alternative, some of the attributes of storing the Airfield object in multiple tables (and not using a BLOB) are:
Hopefully the above check list of factors will help you to decide the correct approach for you.
As a rule of thumb, I would say that if the data you are storing is complex and largely graphical in nature (e.g. an airfield diagram, 3D coordinate data for parts, 3D coordinate information for how parts assemble into modules etc) and requires a dedicated tool to manipulate them via point/click and drag and drop types of user interaction, then you might consider serializing the object model and storing it in a BLOB.
On the other hand if the data you are storing is simple (including simple graphics data - e.g. photos) or primarily data that can be displayed and edited as text; including things like account numbers and amounts displayed using digits (which is also just text) it should be stored as attributes in one or more tables.
I hope this article gives you some options (as opposed to created confusion) that may help you in building applications around Teradata that leverage complex data structures.
The diagrams and data supplied in this article are for illustration purposes only.
As such, the diagrams and data supplied in this article should not be used for operational purposes.
For operational purposes, you should consult current versions of the appropriate approved operational documentation such as ERSA.