CLUSTERING

AXCEL.CLUSTERING function

Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups.

Syntax

AXCEL.CLUSTERING(data, nClusters, [KeepOut], [Method])


The AXCEL.CLUSTERING function syntax has the following arguments:

data Required. A table of the attributes to be used for clustering. For example, to group the customers of a grocery store by their expenditure on each type of grocery, the table would look like this:

Fruits   Dairy   Drinks   Frozen   Detergents
12.66    9.65    7.56     21.40    26.74
7.05     9.81    9.56     17.62    32.93
6.35     8.80    7.68     24.05    35.16

If you add “.n” to the end of a column header, that column is dropped from the analysis. For instance, to drop Frozen from the table above, rename Frozen to Frozen.n:

Frozen.n
21.40
17.62
24.05
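
The column-dropping rule can be pictured with a short Python sketch. This is only an illustration of the idea, not Axcel's own code: the helper name drop_excluded_columns is hypothetical, and treating the suffix as case-sensitive is an assumption.

```python
def drop_excluded_columns(header, rows, suffix=".n"):
    """Return (header, rows) without the columns whose header ends with `suffix`.

    Hypothetical helper for illustration; case-sensitive matching is an assumption.
    """
    keep = [i for i, name in enumerate(header) if not name.endswith(suffix)]
    kept_header = [header[i] for i in keep]
    kept_rows = [[row[i] for i in keep] for row in rows]
    return kept_header, kept_rows


# The grocery table from above, with Frozen renamed to Frozen.n.
header = ["Fruits", "Dairy", "Drinks", "Frozen.n", "Detergents"]
rows = [
    [12.66, 9.65, 7.56, 21.40, 26.74],
    [7.05, 9.81, 9.56, 17.62, 32.93],
    [6.35, 8.80, 7.68, 24.05, 35.16],
]

kept_header, kept_rows = drop_excluded_columns(header, rows)
print(kept_header)  # ['Fruits', 'Dairy', 'Drinks', 'Detergents']
```

Only the four remaining columns would then take part in the clustering.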

nClusters Required. The number of clusters to group your data into. For instance, to group your data into 4 clusters, set this number to 4.

KeepOut Optional. To hold some rows out of the clustering for backtesting or cross-validation, set the number of rows here. It must be an integer that indicates how many rows at the end of the dataset are excluded from training. The cluster output for the kept-out rows is the value predicted by the model.
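
To make the keep-out idea concrete, here is a minimal plain-Python sketch of K-Means with held-out rows. It does not claim to reproduce Axcel's internals: the function names, the seeding from the first k rows, the fixed iteration count, the 0-based cluster labels, and the toy data are all assumptions made for illustration.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(rows, k, iterations=50):
    """Plain Lloyd's algorithm; the first k rows seed the centroids (an assumption)."""
    centroids = [list(r) for r in rows[:k]]
    for _ in range(iterations):
        labels = [nearest(r, centroids) for r in rows]
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_with_keep_out(rows, k, keep_out=0):
    """Fit on all rows except the last `keep_out`, then label every row."""
    train = rows[:len(rows) - keep_out]
    centroids = kmeans(train, k)
    # Training rows and kept-out rows are both assigned to the nearest centroid;
    # for the kept-out rows this assignment plays the role of the prediction.
    return [nearest(r, centroids) for r in rows]


# Hypothetical toy data: two obvious groups, with the last two rows kept out.
toy_rows = [[1.0, 1.0], [9.0, 9.0], [1.2, 0.8], [8.8, 9.1], [0.9, 1.1], [9.2, 8.9]]
labels = cluster_with_keep_out(toy_rows, k=2, keep_out=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The last two labels belong to rows the model never saw during fitting, which is what makes them usable for backtesting.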

Method Optional. The clustering method to use. The default is K-Means. Axcel supports the following methods:

"K" or "k" for K-Means (default)
"G" or "g" for Gaussian Mixture
"B" or "b" for balanced iterative reducing and clustering using hierarchies (Birch)
"S" or "s" for Spectral Clustering
"A" or "a" for Agglomerative Clustering
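
Because each code is accepted in upper or lower case, the lookup amounts to a case-insensitive mapping. The Python sketch below only illustrates that rule: the codes and names come from the list above, while resolve_method and its error on unknown codes are hypothetical.

```python
# Codes and names as listed above; the lookup helper itself is hypothetical.
METHODS = {
    "K": "K-Means",
    "G": "Gaussian Mixture",
    "B": "Birch",
    "S": "Spectral Clustering",
    "A": "Agglomerative Clustering",
}


def resolve_method(code="K"):
    """Map a one-letter Method code to its name, ignoring case.

    Raising on an unknown code is an assumption, not documented behavior.
    """
    try:
        return METHODS[code.upper()]
    except KeyError:
        raise ValueError(f"unknown Method code: {code!r}") from None


print(resolve_method())     # K-Means
print(resolve_method("g"))  # Gaussian Mixture
```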

When you type =AXCEL.CLUSTERING in an Excel cell, IntelliSense guides you through the required and optional (shown in [] brackets) inputs:

In the example above, we have:

=AXCEL.CLUSTERING(A1:E441, 5)

This means that our data is located in cells A1 through E441, the number of clusters is set to 5, no observations are kept out, and the default K-Means method is used. Here is the output of this function:

As you can see, the shoppers are clustered into 5 groups, and the cluster number is shown for each grocery shopper.