How To Calculate Correlations on Big Data with Aster R


Whether 2 features are correlated, and how strongly, seems like an easy question to answer. But what if the features belong to big data and are distributed across millions of rows in your cluster? Or what if there are 200 features instead of 2? Or what if correlations across various subsets of the data matter just as much? All of these are typical "complications" to the simple question of correlation when dealing with big data.

Before I send you to the page discussing various features of correlations with Aster, I have summarized the 3 solutions discussed there and their features in the table below:

Method / Solution features    | Variable (columns) Permutations | Calculating for Groups | SQL-MR | in-database R
Aster R ta.cor                | N                               | N                      | Y      | N
Aster R in-database ta.tapply | N                               | Y                      | N      | Y
toaster computeCorrelations   | Y                               | Y                      | Y      | N

where Y means that the feature is supported / implemented and N means that it is not.
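To make the first two feature columns concrete, here is a minimal sketch in plain Python. It is not Aster R or toaster code, and it runs on a small in-memory list rather than on a cluster. It assumes that "Variable (columns) Permutations" means correlating every pair of columns and that "Calculating for Groups" means repeating this separately for each value of a grouping column. The function names, the `team` column, and the toy data are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlations_by_group(rows, columns, by):
    """Correlate every pair of `columns` separately within each value of `by`."""
    # Split the rows into groups keyed by the grouping column.
    groups = {}
    for row in rows:
        groups.setdefault(row[by], []).append(row)

    # For each group, compute one coefficient per unordered column pair.
    result = {}
    for key, members in groups.items():
        for a, b in combinations(columns, 2):
            xs = [r[a] for r in members]
            ys = [r[b] for r in members]
            result[(key, a, b)] = pearson(xs, ys)
    return result


# Hypothetical toy data: three numeric features and one grouping column.
rows = [
    {"team": "A", "x": 1, "y": 2, "z": 3},
    {"team": "A", "x": 2, "y": 4, "z": 2},
    {"team": "A", "x": 3, "y": 6, "z": 1},
    {"team": "B", "x": 1, "y": 3, "z": 1},
    {"team": "B", "x": 2, "y": 2, "z": 2},
    {"team": "B", "x": 3, "y": 1, "z": 3},
]

result = correlations_by_group(rows, ["x", "y", "z"], by="team")
for (team, a, b), r in sorted(result.items()):
    print(team, a, b, round(r, 3))
```

With 3 columns and 2 groups this yields 6 coefficients. With 200 columns there are 200 * 199 / 2 = 19,900 pairs per group, which is why built-in support for both features matters.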

And the page with in-depth discussion can be found here.