Our production tables have all been created using "CHARACTER SET LATIN NOT CASESPECIFIC". However, we have a tool that has created some tables using "CHARACTER SET UNICODE NOT CASESPECIFIC". When running an EXPLAIN on queries that join a table of each character set, I see many "translate unicode to latin" phrases. I would like to understand what kind of performance impact this has, or what other issues it can cause.
I would say it depends, but yes, it is generally a bad idea to mix two different character sets.
For example, since the LATIN columns are going to be converted to UNICODE, the optimizer will not be able to make good use of statistics on the LATIN columns. Another, more significant downside: LATIN and UNICODE values generate different rowhashes. If tables T1 and T2 are joined on their corresponding PI columns C1 and C2, and C1 is UNICODE while C2 is LATIN, Teradata will still have to compute a rowhash on the UNICODE translation of C2. That can change an otherwise AMP-local PI-to-PI join into a plan where T2's rows are redistributed, which is certainly not a good thing from a performance standpoint.
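To make the scenario concrete, here is a minimal sketch (assuming Teradata SQL; the table and column names T1/T2/C1/C2 follow the example above, and the VARCHAR lengths are made up for illustration):

```sql
-- C1 is the PI of T1, stored as UNICODE
CREATE TABLE T1 (
    C1 VARCHAR(30) CHARACTER SET UNICODE NOT CASESPECIFIC
) PRIMARY INDEX (C1);

-- C2 is the PI of T2, stored as LATIN
CREATE TABLE T2 (
    C2 VARCHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC
) PRIMARY INDEX (C2);

-- Even though both sides join on their PI, the plan cannot rely on the
-- stored rowhash of C2: equal strings hash differently in LATIN vs UNICODE,
-- so C2 must be translated to UNICODE and re-hashed first.
EXPLAIN
SELECT *
FROM T1
JOIN T2
  ON T1.C1 = T2.C2;
```

In the EXPLAIN output you would look for a translate step and a redistribution of T2 by the translated value of C2, instead of an AMP-local merge join. The usual fix is to standardize the character set, e.g. recreate the tool-generated tables with LATIN columns (or migrate everything to UNICODE) so both PI columns hash identically.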