Hash Index

Database
Enthusiast

Hash Index

I would like to know the purpose of Hash Indexes. Iam confused regarding the purpose/usage of Hash Indexes.Can some one explain me the clear distinction between a Hash Index and Join index, hash Index and secondary index. My question here is why do we need a seperate (Hash) Index when the same purpose can be acheived using single table join indexes or a secondary indexes. Under what circumstances can usage of Hash Index benefit us more than a Join Index or secondary index.

I refered through manual, but, I hope I could not get the essence of what is explained there. Can someone explain this to me in simple sentences?
4 REPLIES
Enthusiast

Re: Hash Index

A hash index is simply a restricted form of join index. It was introduced mainly to satisfy some rules in the TPC-H benchmark specification that did not allow regular join indexes.

The syntax for creating a hash index is simpler than for creating an equivalent join index. As far as I know, that is the only reason one might prefer to use a hash index instead of a join index. The underlying technology is the same. There should be no difference in performance between a hash index and an equivalent join index.

Enthusiast

Re: Hash Index

There is only one small benefit that a hash-index gives you that you don't get with a single-table join index. The hash-index will automatically provide the pointer back to the native data row. In order to have this in the single table join index, you have to specify the ROWID in the join index. So, as long as you include the ROWID as one of your columns in the single-table join index, they are equivalent and just two different ways to accomplish the same thing.

The join index provides for more options than does the hash index.

There is a big difference between the single table join index and a secondary index. The single table join index is hash distributed (just like a normal table), while a non-unique secondary index (NUSI) is not. So, if you have a somewhat highly unique secondary index (but not totally unique), a single table join index would probably be a better option since it can avoid the all-AMP operation that is required when accessing a NUSI. It can go right to the AMP containing the index row and then redistribute the ROWID's that it gets and join them to the data table. When it redistributes the ROWID's, that may be a "few AMP" operation depending on how many rows qualify. So, this again, avoids having all of the AMPs getting involved, which helps your system to support more queries.

Of course, there are consequences as well to using a single-table join index. One of the biggest ones is that some utilities (like Multiload) do not support the single-table join index, whereas it does support the NUSI. Also, if your index is not very unique (i.e. an index entry points to many rows in your table), you would be better off with the NUSI so that all AMPs do get involved in finding the rows initially.
N/A

Re: Hash Index

One advantage of hash index over Single table join index is that you can define "order by values (column_name)" in the hash index definition where "column_name" is defined in the "by" clause (i.e. primary index for hash index). It will make sure that rows are sequenced based upon the "values" rather than "hash" of the column_name. This feature is not available in single table join index where you cannot have an "order by" clause on the primary index defined (surely you can have any order by on any other column. default is order by hash of primary index). this difference is subtle.

Senior Apprentice

Re: Hash Index

Of course you can define an ordered Primary Index on a Join Index, simply use ORDER BY in the SELECT.