KMeans Clustering problem in Aster Analytics Foundation 5.11

Teradata Employee


I tried to cluster the iris dataset using the KMeans function in Aster Analytics Foundation 5.11. These are the problems I am facing:

1. For K = 2, 3, 5, 10, and so on up to 50, I get only one cluster with non-zero values.

2. For K = 80, I do get non-zero clusters (215 rows in total), but it is not possible to make sense of 80 clusters.

Using Teradata Warehouse Miner, the same iris dataset is clustered with K = 2, 3, 5, and 15 and gives accurate results, but Aster does not work for low values of K. I am using a threshold of 0.001, but I have also tried larger and smaller thresholds, with no effect.
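For comparison, here is a minimal sketch of plain Lloyd's k-means in Python, showing what a healthy run looks like for a small K: every one of the K centroids ends up as the mean of real input points. This is not Aster's implementation. Treating `threshold` as "stop once no centroid moves farther than this" and `max_iter` as the counterpart of MaxIterNum is an assumption, as is the deterministic farthest-first (maximin) seeding, which is used here instead of random seeding so the result is reproducible. The toy data is illustrative, not the iris table.

```python
import math


def farthest_first_init(points, k):
    """Deterministic maximin seeding (an assumption, not Aster's method):
    start at points[0], then repeatedly add the point farthest from every
    centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, threshold=0.001, max_iter=50):
    """Plain Lloyd's k-means; returns (centroids, cluster_sizes, iterations).

    threshold: stop once no centroid moves farther than this distance
               (assumed analogue of Aster's threshold argument).
    max_iter:  iteration cap (assumed analogue of MaxIterNum).
    """
    centroids = farthest_first_init(points, k)
    sizes = [0] * k
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        sizes = [len(members) for members in clusters]
        # Update step: move each centroid to the mean of its members.
        shift = 0.0
        for c, members in enumerate(clusters):
            if not members:
                continue  # leave an empty cluster's centroid unchanged
            new = [sum(col) / len(members) for col in zip(*members)]
            shift = max(shift, math.dist(new, centroids[c]))
            centroids[c] = new
        if shift < threshold:
            return centroids, sizes, iteration
    return centroids, sizes, max_iter


# Two well-separated toy groups (illustrative data, not iris).
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
        (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
centroids, sizes, iters = kmeans(data, k=2)
print(sizes, iters)  # [3, 3] 2
```

Because every centroid is seeded at a data point and then only ever replaced by a mean of data points, no centroid can drift to all zeros or to values far outside the range of the input columns.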

Here is the SQL-MR query I am using:

beehive=> Select * from kmeans(
ON (Select 1)
PARTITION BY 1
database('beehive')
userid('beehive')
PASSWORD('beehive')
inputtable('iris')
outputtable('iris_centroid2')
numberK(5)
threshold(0.001)
MaxIterNum(50)
);

And here is the result, which gives just one non-zero cluster:

Successful!
Algorithm converged.
Iterations: 0.
The final means are stored in the table "iris_centroid2", and you can use kmeansplot to assign the point to its nearest centroid.
(4 rows)

beehive=> select * from iris_centroid2;
clusterid | means
-----------+---------------------------------------------------
0 | 3392456.0 3392456.0 3392455.0 3392455.0 3392455.0
1 | 0.0 0.0 0.0 0.0 0.0
2 | 0.0 0.0 0.0 0.0 0.0
3 | 0.0 0.0 0.0 0.0 0.0
4 | 0.0 0.0 0.0 0.0 0.0
(5 rows)
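A quick way to confirm that a centroid table like this one is degenerate: the mean of a set of points always lies within the per-column minimum and maximum of those points, so any centroid outside the data's range cannot be a real cluster mean. The sketch below parses psql-style `clusterid | means` rows and flags such centroids, plus all-zero ones (a heuristic that may indicate a cluster that never received any points). The function names are hypothetical, and the column bounds in the usage lines are placeholders; in practice they would come from SELECT MIN(...) and MAX(...) over the input columns.

```python
def parse_centroids(lines):
    """Parse psql-style 'clusterid | m1 m2 ...' rows into {id: [floats]}."""
    centroids = {}
    for line in lines:
        cid, _, means = line.partition("|")
        centroids[int(cid)] = [float(v) for v in means.split()]
    return centroids


def suspicious_clusters(centroids, lows, highs):
    """Return {clusterid: reason} for centroids that cannot be genuine means."""
    flagged = {}
    for cid, means in centroids.items():
        if all(m == 0.0 for m in means):
            # Heuristic only: zeros can be legitimate if the data contains them.
            flagged[cid] = "all-zero (possibly an empty cluster)"
        elif any(m < lo or m > hi for m, lo, hi in zip(means, lows, highs)):
            # A mean always lies inside the [min, max] range of its points.
            flagged[cid] = "outside data range"
    return flagged


# Centroid rows as printed in the post above.
rows = [
    "0 | 3392456.0 3392456.0 3392455.0 3392455.0 3392455.0",
    "1 | 0.0 0.0 0.0 0.0 0.0",
    "2 | 0.0 0.0 0.0 0.0 0.0",
    "3 | 0.0 0.0 0.0 0.0 0.0",
    "4 | 0.0 0.0 0.0 0.0 0.0",
]
# Hypothetical per-column bounds; replace with the real MIN/MAX of each column.
lows, highs = [0.0] * 5, [10.0] * 5
flagged = suspicious_clusters(parse_centroids(rows), lows, highs)
```

With these placeholder bounds, cluster 0 is flagged as outside the data range and clusters 1 to 4 as all-zero, so none of the five rows looks like a usable centroid.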

Does anybody have an idea how to get more non-zero clusters in Aster?


Re: KMeans Clustering problem in Aster Analytics Foundation 5.11

Hi

I am also having a similar issue. Is this a problem with Aster?

Teradata Employee

Re: KMeans Clustering problem in Aster Analytics Foundation 5.11

This is a bug in the kmeans function, and it has been fixed in the upgrade to 6.10.