CLASSIFIER

AXCEL.CLASSIFIER function

This function applies a statistical classifier to a dataset. In other words, it identifies which of a set of categories a new observation belongs to, based on the attributes extracted from the training dataset.

Syntax

AXCEL.CLASSIFIER(data, [KeepOut], [Method], [probability])


The AXCEL.CLASSIFIER function syntax has the following arguments:

data Required. A table with the known group (class) of each instance in the first column and, in the remaining columns, the attributes to be used for classification. For instance, to use the well-known Iris dataset in the classifier, the table would look like this:

Species   Sepal.Length   Sepal.Width   Petal.Length   Petal.Width
setosa    5.1            3.5           1.4            0.2
setosa    4.9            3             1.4            0.2
setosa    4.7            3.2           1.3            0.2
setosa    4.6            3.1           1.5            0.2

The Species column contains the class of each instance, and the remaining columns are the attributes of those instances.

If you add ".n" at the end of a column name, that column is dropped from the analysis. For instance, to drop Sepal.Width from the table above, rename Sepal.Width to Sepal.Width.n:

Sepal.Width.n
3.5
3
3.2

Also, if you have binary data in a non-numeric format such as TRUE/FALSE or YES/NO, add ".b" at the end of the column name to identify it as binary (boolean) data.
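The two column-name conventions above can be illustrated with a short Python sketch. It is not Axcel's implementation. It only mimics the documented behaviour under stated assumptions: the function name prepare_columns, the Fragrant.b column, and the set of values treated as "true" are all hypothetical.

```python
# Illustrative only: mimics the documented ".n" / ".b" header conventions.
TRUTHY = {"TRUE", "YES"}  # assumed spellings of a positive binary value

def prepare_columns(header, rows):
    """Drop columns whose name ends in '.n'; map '.b' columns to 0/1."""
    keep = [i for i, name in enumerate(header) if not name.endswith(".n")]
    new_header = [header[i] for i in keep]
    new_rows = []
    for row in rows:
        out = []
        for i in keep:
            value = row[i]
            if header[i].endswith(".b"):
                # Non-numeric binary text becomes 1 (true) or 0 (false).
                value = 1 if str(value).strip().upper() in TRUTHY else 0
            out.append(value)
        new_rows.append(out)
    return new_header, new_rows

# "Fragrant.b" is a made-up binary column used only for this example.
header = ["Species", "Sepal.Length", "Sepal.Width.n", "Fragrant.b"]
rows = [["setosa", 5.1, 3.5, "YES"], ["setosa", 4.9, 3.0, "NO"]]
new_header, new_rows = prepare_columns(header, rows)
```

In this sketch the Sepal.Width.n column disappears from the result, and the YES/NO values are replaced by 1 and 0.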

KeepOut Optional. If you would like to keep a number of rows out of training for backtesting or cross-validation, set that number here. It must be an integer that indicates how many rows at the end of the dataset are kept out of training. The output for the kept-out rows is the class predicted by the model.
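As a rough illustration of what keeping out the last rows means, the following Python sketch splits a table into training rows and held-out rows. The function name split_keep_out is hypothetical, and the sketch only reflects the description above, not Axcel's internal code.

```python
def split_keep_out(rows, keep_out=0):
    """Return (training_rows, held_out_rows); the last `keep_out` rows are held out."""
    if keep_out < 0 or keep_out > len(rows):
        raise ValueError("keep_out must be between 0 and the number of rows")
    cut = len(rows) - keep_out
    return rows[:cut], rows[cut:]

rows = [["setosa", 5.1], ["setosa", 4.9], ["setosa", 4.7], ["setosa", 4.6]]
train, held_out = split_keep_out(rows, keep_out=1)
```

The model would be fitted on train only, and the labels in held_out can then be compared with the predictions to judge accuracy.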

Method Optional. Default is K-nearest neighbors. Axcel supports several classification methods that you can choose here. Here is the list of available methods:

"NN" or "nn" for K-nearest neighbors (default). Axcel uses 3 nearest neighbors in this classification.
"RF" or "rf" for Random Forest
"AB" or "ab" for AdaBoost
"GB" or "gb" for Gaussian Process Classifier
"RBF" or "rbf" for Support Vector Machine classifier with RBF kernel
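To make the default method concrete, here is a minimal Python sketch of classification by 3 nearest neighbors. It is an illustration of the general technique, not Axcel's code: the function name knn_predict, the Euclidean distance metric, and the toy data points are assumptions.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Rank training rows by Euclidean distance to the query (assumed metric).
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Made-up toy points (not taken from the Iris data): two attributes per row.
train_X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
train_y = ["a", "a", "a", "b", "b", "b"]
label = knn_predict(train_X, train_y, [1.1, 1.0])
```

The three training points closest to the query all carry the label "a", so the vote assigns the query to that class.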

probability Optional. Default is FALSE. If set to TRUE, the function reports the probability assigned to each class for each observation. Otherwise, only the final assigned class is reported.
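The relationship between the two output modes can be sketched in Python as follows. This assumes the reported class is simply the one with the highest probability, which the description above does not state explicitly. The function name decide_class and the probability values are hypothetical.

```python
def decide_class(probabilities):
    """Return the class label with the largest probability (assumed decision rule)."""
    return max(probabilities, key=probabilities.get)

# Hypothetical probabilities for one observation (not real Axcel output).
row = {"setosa": 0.9, "versicolor": 0.08, "virginica": 0.02}
label = decide_class(row)
```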

When you type =AXCEL.CLASSIFIER in an Excel cell, IntelliSense guides you through the required and optional (shown in [] brackets) inputs:

In the example above, we have:

=AXCEL.CLASSIFIER(A1#,,"RF",TRUE)

This means that our data is the array that starts at cell A1 (A1# refers to the whole spilled range), no observations are kept out, and the Random Forest (RF) method is used. Here is the output of this function:

As you can see, Axcel reports the probability of each class for each observation because probability is set to TRUE in the function. If we set it to FALSE:

=AXCEL.CLASSIFIER(A1#,,"RF",FALSE)

Here is the output:

As shown, Axcel now reports the assigned class instead of the probabilities.