.Net Provider - Column Character Set

Connectivity

.Net Provider - Column Character Set

Hi

Is there a way to tell the character set from a char/varchar column in a TdDataReader?

The DBC tables contain a field "CharType" which is 1 for "character set latin" and 2 for "character set unicode" columns. The "ColumnLength" is also doubled for unicode.

Using the GetSchemaTable command produces identical rows for both character sets.

There is a workarounds by accessing the dbc table (which does not work in every case for generic queries), but is there a more elegant way?

Thanks,
Helmut
11 REPLIES
Teradata Employee

Re: .Net Provider - Column Character Set

TdDataReader.GetSchemaTable method does not return CharType (Character Set not to be confused with Session Character Set). However Columns Schema Collection (TdConnection.GetSchema('Columns' ...) method) returns CharType.

That said there is another way; for example given the following table

CREATE TABLE GetCharType(
C01 INTEGER,
C02 CHAR(10) CHARACTER SET LATIN,
C03 VARCHAR(10) CHARACTER SET UNICODE);

TYPE function return the CharType: Select type(GetCharType.C02), type(GetCharType.C03)

Type(C02) Type(C03)
CHAR(10) VARCHAR(10) CHARACTER SET UNICODE

Re: .Net Provider - Column Character Set

Hi

This requires that the columns exist in a physical table.
I'd like to let an user enter a statement, which is then executed and the result exported to a mssql server. The problem is, that I don't know wheter to use char or nchar for the destination table.

My current workaround consists of:
- parsing the statement to remove an "order by" clause
- creating a (volatile) table with no data
- retrieving the metadata of the table
- submitting the original query

What gives me a headache is parsing the statement, since this is something I'd always try to avoid (and any "order by" may be part of analytic functions).

Is there no better way?

Thanks,
Helmut
Teradata Employee

Re: .Net Provider - Column Character Set

I agree that Char_type can help in some scenarios. But Teradata Database has a concept of Session Character Set. All Char/VarChar data is converted to Session Character Set before it is send to the Data Provider. For example assume Table FOO has two columns:

Col1 CHAR(10) CHARACTER SET KANJISJIS
Col2 CHAR(10) CHARACTER SET UNICODE

If application sets the Session Character set to KANJISJIS_0S then Unicode data (Col2) is always converted to KanjiSJIS. Therefore in some scenarios the Session Character set is very important and it overshadows the underlying Column-Character-Set. I recommend using SQL Server NCHAR data type and set the Session Character Set to UTF8 or UTF16.

Re: .Net Provider - Column Character Set

Hi

In some (or most?) cases, this is a complete waste of space - a typical "flag" column (char(1) character set latin not null compress ('Y', 'N')) takes about 2 bit on Teradata and 2 byte as nchar(1) on SQL Server.

SSIS also maps everything to nchar/nvarchar by default :(

- Helmut
Teradata Employee

Re: .Net Provider - Column Character Set

It is dbc. columns, not dbc.tables

If you look in the dbc.columns table you will find what you looking for.  If you find a chartype = 1 then it character set latin and if you find chartype = 2, it is unicode.

SELECT *

FROM DBC.COLUMNS

WHERE TABLENAME = 'mytable'

AND DATABASENAME = 'db_TB'

AND COLUMNTYPE IN ('CF','CV')

ORDER BY 3;

Re: .Net Provider - Column Character Set

It is now almost 4 years since Helmut's post, and I have his exact same issue.  I need to know whether a column is using double byte characters or not so I know whether to declare a char/varchar vs. an nchar/nvarchar in SqlServer.  It would be extremely useful for GetSchemaTable() to return a column that would indicate whether the column used double byte characters.  Even if it returned a ByteSize column, then you could compare ColumnSize to ByteSize and determine if DoubleByte characters are being used.  The fact that ColumnSize returns different values, either maximum number of bytes or characters depending on character set means that it must know, so why not give us a column in the SchemaTable that will let us in on the secret, because otherwise there is no way to tell from the SchemaTable.

----------

From the Teradata.Net 15.0.0.0 documentation for ColumnSize:

The maximum possible length of the column in bytes or characters.

  • Numeric Data Types: This is the size of the data type in bytes (for example, 4 for INTEGER).

  • Character and CLOB Data Types: The maximum length of the column in characters for LATIN, UNICODE and GRAPHIC. The maximum length of the column in bytes (single-byte character) for KANJI1 and KANJISJIS.

  • Byte and BLOB Data Types: The maximum length of the column in bytes.

  • DateTime and Interval Data Types: The maximum length of the column in characters.

  • JSON Data Type: The maximum length of the column in characters.

----------------

As Helmut said, currently the only way to get whether a column is Unicode is to try and query the dbc.columns data and check for CharType=2, but that is inefficient and problematic for processing the schema of adhoc queries where determining the database, table and column that was the source of a particular column in order to attempt to query dbc.columns would require a full blown query parser to do it correctly.   The alternative to that as Helmut also suggested is creating an empty table based on the query then getting it's schema info, but that has challenges as well.

Bottom line all the alternatives are ugly and error prone requiring way too much effort for something that should be easy to determine from the SchemaTable returned by TdDataReader.GetSchemaTable()

If anyone is listening, can you throw us a bone? :)

Thanks,

Rick

Teradata Employee

Re: .Net Provider - Column Character Set

A workaround that does not require parsing. The TdDataReader.GetSchemaTable() returns "BaseSchemaName", "BaseTableName" and "BaseColumnName"; you can compose a "HELP COLUMN BaseSchemaName.BaseTableName.BaseColumnName" command; it will reutrn the chartype and it does not require parsing.

We will try to add it to the next release.

Re: .Net Provider - Column Character Set

Thank you for the quick response, it is very much appreciated.  The use of the Base* values in the schema table does seem to work nicely to get the necessary values for the query to retrieve the CharType, though if you can add it to the SchemaTable that would be awesome, and way more efficient than running additional queries.  

Currently I'm using the Base* values to query against dbc.columns instead of using the HELP COLUMN so I can limit the information returned to just what I need, however can you tell me if there is any reason why I should use HELP COLUMN over querying against dbc.columns?

Querying dbc.columns also allows me to retrieve the CharType and any other info need for all the tables and columns I need in one call limiting it to only characters columns.  If I used HELP TABLE BaseSchemaName.BaseTableName, that would not let me limit to just character columns.

Again, I don't know if there is an advantage of use the HELP statements vs querying dbc.columns that I'm unaware of.  If so, let me know.

Thanks again...

Rick

Teradata Employee

Re: .Net Provider - Column Character Set

DBC.Columns[x] view are obsolete; the replacment system views are DBC.ColumnsV[x]. However you should use HELP COLUMN because DBS.ColumnsV[X] does not return CharType (or any other data type metadata) for View-Columns (vs. Table-Columns). Note that HELP TABLE does not return CharType for View-Columns either. You can either use "HELP COLUMN BaseSchema.BaseTable.*" or "HELP COLUMN BaseSchema.BaseTable.BaseColumnName". Also you can compose a multi-statement request; for example "HELP COLUMN X.Y.C01; HELP COLUMN X.Y.C222".