I am a developer hoping to get some help with ideas to devise an elegant way of grabbing the next available Unique Primary Index for inserts to a table which is quite old and large that drives an ASP.Net application which I am maintaining. Details for the change I have to make are below:
The UPI is an integer and can be possible values from 000,000,001 to 999,999,999. Previously, the first 3 digits were determined by a code from another column (001-999), therefore the lowest existing value is 001,000,000. I am making a change to decouple the code from the UPI, so for future values of this UPI after I make the change the first value will be 000,000,001, and after the first 1,000,000 inserts we will run into duplicates if we simply keep incrementing by 1.
Therefore I need a way of grabbing the next available value.
This table will be written to by the ASP.Net application which calls a stored procedure in Teradata to make the inserts.
I have been told that the best way to do this is a function to grab the value and call it from ASP.Net and then pass the value to the stored procedure, but I wanted to see if I could get any other ideas to consider as well, because I don't like the idea of looping through with a function. The highest block of consecutive numbers I found is 250 which would have to be looped through.
I thought of perhaps having a reference table that just holds values 000,000,001-999,999,999 and then do an outer join on the UPI and grab the top 1 null from the existing tables, but I'm not sure if this could be set up to avoid being too expensive. Also, my team is telling me to avoid making a new table at all costs.
Thanks for any help or ideas from out there.
Since you are dynamically generating unique values in UPI column, it is better if you can re-create your table by defining UPI column as an identity column.
However, the number sequence may not always guaranteed with identity feature of Teradata.
--create a test table by defining UPI column as an identity column
create table my_table
upi_column integer generated by default as identity (start with 1 increment by 1 minvalue 1 maxvalue 2147483647 no cycle) --generate a value if not specified with min value 1 and max value as 2147483647 (integer limit), without repeating values
unique primary index(upi_column)
--if you don't specify a value for upi_column, the Teradata system automatically generates a value for upi_column
insert my_table (upi_column, other_column) values (, 'dummy1');
insert my_table (upi_column, other_column) values (, 'dummy2');
insert my_table (upi_column, other_column) values (, 'dummy3');
insert my_table (upi_column, other_column) values (, 'dummy4');
--if you specifiy a value for upi_column, it will be populated as it is (because we have used "generated by default as identity" option while creating the upi_column).
insert my_table (upi_column, other_column) values (5, 'dummy5');
select * from my_table order by 1;
If you cannot re-create your table in the production environment for some reason, there is another option of generating the next consecutive UPI value using SQL
--create a volatile temporary table for testing purpose
create volatile table my_table ( upi_column integer, other_column varchar(10) ) unique primary index (upi_column) on commit preserve rows;
--pouplate the test table
insert into my_table (upi_column, other_column) values (1, 'dummy1');
insert into my_table (upi_column, other_column) values (2, 'dummy2');
insert into my_table (upi_column, other_column) values (3, 'dummy3');
insert into my_table (upi_column, other_column) values (4, 'dummy4');
--SQL that automatically generates the next highest consecutive value in UPI column
insert into my_table (upi_column, other_column)
row_number() over (partition by other_column) + max_upi_value --generate the next higher consecutive value after the max value in upi_column
from (select 'dummy5' as other_column) as derived_table --derived table used for testing purpose
cross join (select max(upi_column) as max_upi_value from my_table) as max_value_table --get the max value out of existing data in upi_column
select * from my_table order by 1;
Hello, thanks for your help.
Apparently my DBA does not like the idea of re creating the table.
Is there any option that will take the next available number instead of using the highest value from existing data? So if the table has UPI values 1,2,4,5 and we make a new insert the new row will have a UPI of 3, not 6?
okay so I found this page: http://stackoverflow.com/questions/4340793/how-to-find-gaps-in-sequential-numbering-in-mysql
and modified their query like so
select top 1 gap_start
SELECT (t1.upi + 1) as gap_start
FROM db.table t1
WHERE NOT EXISTS
FROM db.table t2
WHERE t2.upi = t1.upi + 1
) as gaps
order by 1;
and this always gives me the next available UPI.
If anyone reads this though who can provide me some info, I am wondering:
If a stored procedure uses this select to perform an insert, how will Teradata handle concurrent users if there are multiple calls to the stored procedure at the same time?
In the explain for the select it says it locks the table for read, and I am being told that's not good. But doesn't the stored procedure lock the table for the big insert which this is a part of anyways??
Thanks for any help.
If the gaps are truly random, then it's going to be difficult to use INS-SEL without using a fancy UDF.
Assuming the gaps are created by DELETEs, you can have a trigger that captures the values being deleted into a "gap" table, which can then be consumed by a subsequent inserts before you resort to incrementing UPI value.
For the concurrency, you'll need to serialize the access by locking table in write mode to ensure no two users end up using the same gap value.
Creating a list of all gaps is easy using OLAP-functions:
id - GapSize AS GapStart
,id - 1 AS GapEnd
,COALESCE(id - MIN(id)
OVER (ORDER BY id
ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING)
,id) - 1 AS GapSize
SELECT id FROM tab
UNION ALL -- needed to get a row even if the table is empty or there's no gap
SELECT 1000000000 AS id FROM sys_calendar.CALENDAR WHERE calendar_date = DATE
) AS x
QUALIFY GapSize > 0
Then wrap this in a Derived Table to change it to sequential values:
COALESCE(SUM(GapSize) OVER (ORDER BY GapStart ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),0)+1 AS rnStart
,SUM(GapSize) OVER (ORDER BY GapStart ROWS UNBOUNDED PRECEDING) AS rnEnd
) AS x
QUALIFY SUM (GapSize) OVER (ORDER BY GapStart ROWS UNBOUNDED PRECEDING) < GapSize + "number of rows in your data"
Finally join this to your data (including a row number):
x.GapStart + dt.rn - x.rnStart AS newID
ROW_NUMBER() OVER (ORDER BY ...) AS rn
FROM yourdata) AS dt
) AS x
ON dt.rn BETWEEN x.rnStart AND x.rnEnd
Regarding performance the "list of gaps" query will need most of the resources. If performance is not good enough you better materialize the gaps once in a permanent table and add some logic in your SP to remove the used newIDs using DELETE/UPDATE.
Thanks Dieter. I love your solution. This is a refined version of what I wanted to do, and would prefer to do.
The project team decided that we're going with a loop in the stored procedure combined with a reference table to store last used value. The ref table will start with 000000001, then the loop will start with a counter of 1 and add that to the reference table's value to see if it exists, and if it exists then continue to increment by +1 until we find a value that doesn't exist in the main table and then update the reference table with that value. Finally, select the value from the reference table for the insert to the main table's index column.
It's simple and should work fine, but I like elegant solutions like yours.