Relatively new to programming, so this might be a dumb question. I need to automatically sync data from an Excel file to a Teradata server. This means adding all of the values from the Excel file to the server, but also removing any changed or duplicate rows. Basically, I think I need to remove everything (except the column headings) and then insert all of the new data. Is there any way to do this? The source will likely be a local Excel file, and the Teradata server will be accessed by Tableau dashboards.
Start with what happens to the data under different circumstances.
Questions about the data
Does the data in an Excel file need to completely replace any existing data?
If you're loading data from multiple Excel files, does the data in file #1 only replace existing data from a previous version of file #1?
When loading data from an Excel file, what happens if the existing data contains a 'key' value that is not in the new file? Do you want to keep that data?
- as an example of this, assume that your Excel file contains data about 'customers' and that for each customer it holds the customer_id, customer_name and customer_credit_limit.
- if the current data in Teradata contains information for customer_id's 1, 2, 3, 4, 5 and your new Excel file contains information for customer_id's 1, 2, 3, 6, do you want the data in Teradata at the end of this process to still contain information for customer_id's 4 and 5 (the customer_id's that are not in the new file)?
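To make that concrete: the answer to this question usually shows up as one extra SQL statement at the end of the load. If customer_id's 4 and 5 should disappear, you delete any target rows whose keys are absent from the newly loaded data; if they should stay, you simply skip that step. A minimal sketch, assuming the new file has been loaded into a staging table named stg_customers and the final table is customers (both names are illustrative):

```
/* Only needed if rows missing from the new file must be removed.
   Table and column names are assumptions for illustration. */
DELETE FROM customers
WHERE customer_id NOT IN (SELECT customer_id FROM stg_customers);
```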
Questions about how/when to run the process
How often are you going to need to load data from your Excel files?
What would you expect the average and maximum number of rows to be in each Excel file that has to be processed?
Are the data loads going to be started by people or do you need them to be automated/scheduled?
Typically, whatever the answers to the above are, they can be handled, but you need those answers before you can create a sensible design.
To start with simple designs:
1) If you are completely replacing the data in Teradata on each load, with small volumes of rows and a small number of Excel files, I'd extract the data from Excel to a CSV and load it using BTEQ or TPT (Teradata utilities). This could even load directly into the final table.
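For option 1, a BTEQ script for a full replace can be quite short. A minimal sketch (server, credentials, database, table, column, and file names are all placeholders; note that with .IMPORT VARTEXT every field in the USING clause must be declared VARCHAR, so cast numeric columns on insert):

```
.LOGON tdserver/myuser,mypassword;

/* Full replace: empty the table, then reload every row from the CSV */
DELETE FROM mydb.customers ALL;

.IMPORT VARTEXT ',' FILE = customers.csv;
.QUIET ON
.REPEAT *
USING (c_id VARCHAR(10), c_name VARCHAR(100), c_limit VARCHAR(20))
INSERT INTO mydb.customers (customer_id, customer_name, customer_credit_limit)
VALUES (:c_id, :c_name, CAST(:c_limit AS DECIMAL(12,2)));

.LOGOFF;
.QUIT;
```

You'd save this as a script and run it with `bteq < load_customers.bteq`, which also makes it easy to schedule later.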
2) (Somewhere near the other end of the scale) With multiple files, only processing 'changes', and thousands of rows in each file, you're probably looking at a TPT load into a work table (usually referred to as a 'staging table'), then using plain SQL to apply the changes to the final table.
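The "staging table then SQL" step in option 2 often boils down to a single MERGE (upsert) against the final table. A sketch under the assumptions that stg_customers has already been loaded by TPT, that customer_id is the primary index of customers (Teradata requires the ON clause of a MERGE to cover the target's primary index), and that all names are illustrative:

```
MERGE INTO customers tgt
USING stg_customers src
  ON tgt.customer_id = src.customer_id
WHEN MATCHED THEN UPDATE SET
    customer_name = src.customer_name,
    customer_credit_limit = src.customer_credit_limit
WHEN NOT MATCHED THEN INSERT
    (customer_id, customer_name, customer_credit_limit)
  VALUES (src.customer_id, src.customer_name, src.customer_credit_limit);
```

Rows whose key already exists are updated in place and new keys are inserted; whether rows missing from the staging table should also be deleted is exactly the question about customer_id's 4 and 5 above.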
Don't worry too much if some of the utility names in the designs above don't make sense yet; start with the "Questions about the data".