Hi, I'm an end user, mainly running queries on our database and using reporting services.
My company uses SQL Assistant as the default querying tool, and some of the data I pull is in Hebrew. SQL Assistant is configured to use the Teradata.net connection with Session Character Set set to ASCII and Session Mode set to DEFAULT. With this configuration I have no problems at all retrieving results and seeing Hebrew and English characters in the appropriate fields.
I've tried switching to a better, more fully featured SQL editor, installing Teradata Studio and also trying the Teradata plug-in for Eclipse. Unfortunately, with both tools, when I query a table with Hebrew data, I receive Latin characters in the field contents. I assume this has to do with JDBC and the way the results are parsed for the client.
I've tried different settings, changing the Charset value from UTF8 to UTF16 and to ASCII. With ASCII, which is the same setting as in SQL Assistant, I receive question marks instead of letters in the results. I also tried different combinations of fonts on the client, but to no avail.
Is this a limitation of JDBC? Does this have to do with how our Teradata DB is configured and how the table columns are defined (I have no control over that as an end user...)? If so, why are the ODBC and Teradata.net connections able to handle that, but not JDBC?
I would appreciate any help with the subject.
To properly support Hebrew, the data should be stored as CharacterSet UNICODE, and both loads and query tools should specify an appropriate Session Character Set (NOT specify/default to ASCII).
Your description of tool behavior makes me think that the columns are improperly defined with CharacterSet LATIN and the data is being loaded with session character set ASCII - but it's really Windows-1255 or something like that. Pretending other single-byte character sets are Teradata LATIN may appear to work, as long as no one requests a translation. But it's not a good idea in the long run. If this is the case, you can't really fix it as an end user.
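To illustrate Fred's point with a sketch (not from the thread, and the byte values are just an example): the same bytes read completely differently depending on which encoding you assume. The bytes below are the Windows-1255 encoding of a Hebrew word; decoded as Latin-1 (which is roughly what a LATIN column implies), they turn into unrelated accented Latin characters, matching the symptom described.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Windows1255Demo {
    public static void main(String[] args) {
        // "Shalom" (שלום) encoded in Windows-1255 -- one byte per letter
        byte[] raw = {(byte) 0xF9, (byte) 0xEC, (byte) 0xE5, (byte) 0xED};

        // Decoded with the true encoding, the Hebrew text comes back intact
        String hebrew = new String(raw, Charset.forName("windows-1255"));

        // Decoded as Latin-1, the same bytes become accented Latin
        // characters -- the "Latin characters in the fields" symptom
        String mojibake = new String(raw, StandardCharsets.ISO_8859_1);

        System.out.println(hebrew);   // שלום
        System.out.println(mojibake); // ùìåí
    }
}
```

As long as every layer passes the raw bytes through untranslated, the Windows client happens to display them correctly; as soon as any layer translates, the mismatch surfaces.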
It isn't that ODBC or .NET "handle" the situation, exactly. It's that they don't translate the data when session character set is ASCII, and SQL Assistant then leaves it up to the OS to display properly.
Thank you for your very thorough reply.
Is there a way to 'simulate' that behavior with JDBC and prevent that translation from taking place? I'm going to try a few more tables; I hope it's just the ones I was querying that had misconfigured character sets, and not a general practice of our DB programmers :) Perhaps I'll contact our DBA...
Fred gave a complete, and correct, answer. I don't really have anything to add from a technical point of view.
Character data in Java is stored as Java String objects, and the Teradata JDBC Driver uses String methods to encode characters to bytes before sending them to the Teradata Database, and also to decode bytes into characters after receiving them from the Teradata Database.
The question mark characters that you get are an artifact from the String encode methods when converting non-ASCII characters into bytes. There is no way to obtain the same behavior that you get on Windows with the ODBC Driver or the .NET Data Provider.
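A minimal sketch of that artifact, independent of any Teradata connection: encoding Hebrew text to bytes through the US-ASCII charset substitutes '?' for every character ASCII cannot represent, so the original data is irreversibly lost before it ever leaves the client.

```java
import java.nio.charset.StandardCharsets;

public class AsciiEncodeDemo {
    public static void main(String[] args) {
        String hebrew = "שלום";

        // String.getBytes replaces each unmappable character with the
        // charset's replacement byte, which is '?' (0x3F) for US-ASCII
        byte[] ascii = hebrew.getBytes(StandardCharsets.US_ASCII);
        String roundTrip = new String(ascii, StandardCharsets.US_ASCII);

        System.out.println(roundTrip); // ????
    }
}
```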
The Teradata JDBC Driver does provide a STRICT_ENCODE=ON connection parameter for applications that prefer an exception instead of the question marks. But STRICT_ENCODE=ON will not help you access non-ASCII data via the ASCII session character set.
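For reference, connection parameters like CHARSET and STRICT_ENCODE go in the JDBC URL. A sketch (the hostname and credentials are placeholders, not from the thread):

```java
public class ConnectSketch {
    // Build a Teradata JDBC URL: CHARSET selects the session character
    // set, and STRICT_ENCODE=ON makes unmappable characters raise an
    // exception instead of silently becoming '?'
    static String buildUrl(String host) {
        return "jdbc:teradata://" + host + "/CHARSET=UTF8,STRICT_ENCODE=ON";
    }

    public static void main(String[] args) {
        String url = buildUrl("dbhost.example.com"); // placeholder host
        System.out.println(url);
        // A real connection would then be obtained with:
        // Connection con = DriverManager.getConnection(url, "user", "password");
    }
}
```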