In this blog, we look at several lesser known features of the ncluster_loader utility that will help you manage your data loading.
You already know several options which are necessary when loading your data:
|Connect to database using this username rather than default||-U||--username|
|Database to connect to||-d||--dbname|
|Host name/IP Address of Aster database queen||-h||--hostname|
|Specifies how many rows of the file to skip before starting to load data||--skip-rows|
The simplest ncluster_loader call, loading test.txt into public.table1 in the mydata database on host 10.15.10.15, is:
> ncluster_loader -U jdoe –w jdoe -h 10.15.10.15 –d mydata public.table1 test.txt
ANALYZE and VACUUM
If you want to analyze any tables that you create, you can use the –z or –-auto-analyze option in your call. Remember that ANALYZE collects statistics about the table and stores the details in internal tables. This information is use to determine the most efficient plans for queries.
If your load fails, and you want to free up any dead space, you can use the –v or –-vacuum-table option to run VACUUM on the target table. (It also runs VACUUM on any child tables or partitions.)
DIFFERENT FILE FORMATS
The default file format for ncluster_loader is a tab-delimited text file. However that doesn’t mean you can’t load other types of files with different delimiters. Using the –c or –-csv option, you can specify what delimiter you are using with the –D or –-delimiter option. For example, I’m going to load a text file that uses a $ as a delimiter into public.table1 in the mydata database. (I’m also going to analyze the table after it’s loaded, so I need to include the -z option.) The syntax would be:
> ncluster_loader -U jdoe –w jdoe -h 10.15.10.15 –d mydata public.table1 test.txt –c –D “$” –z
> ncluster_loader -U jdoe –w jdoe -h 10.15.10.15 –d mydata public.table1 test.txt –-csv –-delimiter “$” –-auto-analyze
Be careful to not put spaces between the – (or --) and the option you are calling. Also the order that you are calling your options doesn’t matter (make sure you are attaching the correct arguments to the options, tho), but it’s a good practice to put them nearest the object you are acting on. Further, you can mix using flags and verbose options, ie you can use:
> ncluster_loader -U jdoe –w jdoe -h 10.15.10.15 –d mydata public.table1 test.txt –c –-delimiter “$” –z -vacuum-table
LOAD FILES THAT CONTAIN HEADER ROWS
If your file contains column names in the first row of your load file, the --header-included option will use these to map the columns to your target table. The column names of the load table and the target table must match, but can be in a different order. (Which is nice, and allows you to change the order of the columns on loading.)
By using this option, your are instructing ncluster_loader that the first line of the load table is a list of column names. An example of using this option is:
> ncluster_loader -U jdoe –w jdoe -h 10.15.10.15 –d mydata public.table1 test.csv –c –header-included –z -v
In future postings, we’ll look at other options that are part of the ncluster_loader utitlity, such error logging and using a map file to load your data.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.