Counting NULLs in columns for TOP N records

Database

Counting NULLs in columns for TOP N records

select top 1000 count(column_name) from db_name.table_name where column_name is NULL

This query returns the number of NULLs in the entire database for this column, not just the number of NULLs for the TOP 1000 records as requested.

Is there something about the aggregate count() function that I don't know about?
5 REPLIES
Teradata Employee

Re: Counting NULLs in columns for TOP N records

Try replacing with count(*) and see if you get the result.

Re: Counting NULLs in columns for TOP N records

Using count(*) instead of count(column_name) produces the same result; all NULLs in the column are counted, not just the NULLs in the TOP n rows.
N/A

Re: Counting NULLs in columns for TOP N records

The result will be different:
count(columnname) counts only non-NULL rows, but count(*) counts all rows regardless of NULLs.
In your case "count(columnname) where columnname is null" will return 0.

But your main problem is the order of execution:
First WHERE is processed , then COUNT and finally TOP

select count(*) from
(select top 1000 column_name from db_name.table_name) dt

But TOP is optimized in a very strange way in Derived Tables, so better use
select count(*) from
(select column_name from db_name.table_name sample 1000) dt

But what are you really trying to achieve?

Dieter

Re: Counting NULLs in columns for TOP N records

I'm writing a data validation script in Perl for data analysts. They receive raw data as part of their ETL process and want to "pre-process" the data using a script. They'd like to know for each column in a table: how many NULLs are present in total and as a percentage of the records selected....

In english: "For the top N records in a table, tell me how many NULLs there are for each column".
N/A

Re: Counting NULLs in columns for TOP N records

The data is already loaded to the Teradata system and they need that information for further processing?

select
count(*) - count(col1) as col1nulls,
100 * (col1nulls - count(col1)) / cnt as col1percentnull,
count(*) - count(col2) as col2nulls,
100 * (col2nulls - count(col2)) / cnt as col2percentnull,
...
from
(select * from db_name.table_name sample 1000) dt

Dieter