Speed up NaiveBayesTextPredict on large prediction data sets

Hi, does anyone know some tips & tricks for speeding up the run time of NaiveBayesTextPredict when it has to score about 5 million text rows against a token model table of about 160K tokens? A sample of 10K text rows took about 2 hours (Aster Express, 8 GB RAM worker + 4 GB RAM queen, SSD, i7 2.7 GHz processor). I tried to run it on 1 million rows and cancelled it after 2 days.

My syntax was a CREATE TABLE AS SELECT from NaiveBayesTextPredict, so I don't know whether the lag comes from the table creation or from the prediction step. I created an index on all the columns of the dimension (model) table, but there was still no improvement.

What could be done to speed up the process? Is the delay due to the dimension table (structure, distribution, etc.) or to the input table (the prediction text)? Indexing 5 million text rows would be disastrous, I suppose, from a space and performance point of view.
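To make the question concrete, this is roughly what I mean by separating the steps and controlling distribution. It is only a sketch: the table and column names (raw_tokens, doc_id, token) and the USING parameters (InputTokenColumn, ModelType, DocIDColumns) are assumptions based on the typical Aster SQL-MR call, and the CTAS / DISTRIBUTE BY clause ordering may need adjusting for your Aster version.

-- 1) Stage the tokens to score as a fact table, hash-distributed on the
--    document id, so each vWorker scores its own share of documents.
--    (raw_tokens, doc_id and token are assumed names.)
CREATE TABLE tokens_to_score
DISTRIBUTE BY HASH (doc_id)
AS SELECT doc_id, token FROM raw_tokens;

-- 2) Keep the model as a replicated dimension table instead of relying on
--    indexes; the function reads it as a DIMENSION input anyway.
CREATE TABLE nb_model_dim
DISTRIBUTE BY REPLICATION
AS SELECT * FROM nb_text_model;

-- 3) Time the prediction step on its own, with no table creation involved...
SELECT COUNT(*)
FROM NaiveBayesTextPredict (
    ON tokens_to_score AS predicts PARTITION BY doc_id
    ON nb_model_dim    AS model DIMENSION
    USING
    InputTokenColumn ('token')
    ModelType ('Multinomial')   -- or 'Bernoulli', matching the trained model
    DocIDColumns ('doc_id')
);

-- 4) ...then materialize the scored output separately, to see how much of
--    the total run time the CREATE TABLE AS step itself adds.
CREATE TABLE nb_predictions
DISTRIBUTE BY HASH (doc_id)
AS SELECT *
FROM NaiveBayesTextPredict (
    ON tokens_to_score AS predicts PARTITION BY doc_id
    ON nb_model_dim    AS model DIMENSION
    USING
    InputTokenColumn ('token')
    ModelType ('Multinomial')
    DocIDColumns ('doc_id')
);

Is this the right general approach, or is there something else about the model table structure or distribution that matters more?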


Thanks