GLM

AXCEL.GLM function

GLM is used to fit generalized linear models, specified by giving a table of the linear predictors and a description of the error distribution (family).

Syntax

AXCEL.GLM(data, [family], [predict], [intercept], [deployment], [plot])


The AXCEL.GLM function syntax has the following arguments:

data Required. The data must be a table whose first column is the dependent (y) variable and whose remaining columns are the independent (explanatory) variables. Here is an example:

mpg    hp    gears  cyl
22.1   150   5      6
19.7   120   5      4
20.5   125   6      4

In this example, if we use this table as an input, the model is structured in this way:

mpg = β0 + β1 × hp + β2 × gears + β3 × cyl + ε

where β0 is the intercept, β1, β2, and β3 are the coefficient estimates, and ε is the error term (the residuals).

If a column contains strings instead of numbers, Axcel treats it as a categorical variable. For instance, if your data table has a column like this:

color
blue
red
blue

Axcel treats the variable color as categorical and assigns a dummy variable for each type. For any categorical variable in your data, whether it is stored as strings or as numbers, you can add “.f” to the end of the column name to tell Axcel that it is categorical. For instance, if a column in your dataset looks like this:

gears
3
4
4
5
3

Axcel treats gears as a numerical variable and reports a single coefficient for it in the regression results:

variables     Estimate   Std.Error   t.value    p.value      vif
(Intercept)   34.6595    4.9369      7.0205     1.0136E-07   NA
gear          0.6519     0.9041      0.7211     0.4766       1.3206
cyl           -2.7431    0.3735      -7.3444    4.3241E-08   1.3206
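
As a quick way to read the results table above, the t.value column should equal Estimate divided by Std.Error. This is the standard definition of a t statistic, not something the Axcel documentation states, and the function name below is only illustrative. A minimal Python sketch that checks it against the reported numbers:

```python
# Rows copied from the results table above: (variable, Estimate, Std.Error, t.value).
rows = [
    ("(Intercept)", 34.6595, 4.9369, 7.0205),
    ("gear", 0.6519, 0.9041, 0.7211),
    ("cyl", -2.7431, 0.3735, -7.3444),
]


def t_statistic(estimate, std_error):
    """Standard t statistic: a coefficient estimate divided by its standard error."""
    return estimate / std_error


for name, estimate, std_error, reported_t in rows:
    computed = t_statistic(estimate, std_error)
    # The reported values are rounded to four decimals, so expect small differences.
    print(f"{name}: computed {computed:.4f}, reported {reported_t}")
```

A t.value far from zero (with a correspondingly small p.value) suggests the variable contributes to the model; here that is the case for cyl but not for gear.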

However, if you add “.f” at the end of the name of the variable:

gears.f
3
4
4
5
3

Axcel treats it as a categorical variable and reports separate coefficients for gear.f4, gear.f5, and so on:

variables     Estimate   Std.Error   t.value    p.value
(Intercept)   36.3203    3.703       9.8083     1.4729E-10
gear.f4       0.8466     1.8571      0.4558     0.652
gear.f5       1.3028     1.8397      0.7082     0.4847
cyl           -2.7072    0.4827      -5.6081    0.00000528
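
The effect of “.f” can be pictured as dummy coding. The Python sketch below is an illustration under assumptions, not Axcel's implementation: the function name dummy_code is hypothetical, and the choice of the smallest level as the baseline is inferred from the results above, where gear.f4 and gear.f5 appear but gear.f3 does not.

```python
def dummy_code(prefix, values):
    """Return one 0/1 indicator column per level, skipping the baseline level.

    Assumption: the baseline is the smallest level, which matches the table
    above (gear.f4 and gear.f5 are reported, gear.f3 is not).
    """
    levels = sorted(set(values))
    non_baseline = levels[1:]
    return {
        f"{prefix}{level}": [1 if v == level else 0 for v in values]
        for level in non_baseline
    }


gears = [3, 4, 4, 5, 3]  # the example column shown above
# The prefix "gear.f" is chosen to match the labels in the results table.
columns = dummy_code("gear.f", gears)
for column_name, indicator in columns.items():
    print(column_name, indicator)
# gear.f4 [0, 1, 1, 0, 0]
# gear.f5 [0, 0, 0, 1, 0]
```

Under this coding, each reported coefficient is read relative to the baseline level, so gear.f4 would be the estimated difference between 4 gears and 3 gears with cyl held fixed.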

Besides “.f”, you can use several other name extensions in this function. They are listed below:

Name Extension   Operation
.abs             Use the absolute value of the variable ( ABS(X) )
.f               Treat the variable as categorical
.ln              Use the natural log of the variable ( LN(X) )
.n               Drop the variable from the estimates
.sq              Use the square of the variable ( X² )
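
To make the table concrete, here is a minimal Python sketch of how such extensions could be interpreted. It is an assumption-laden illustration: the names TRANSFORMS and apply_extension are hypothetical, and how Axcel names the transformed column internally is not documented here. The “.f” extension is left out because it changes the variable type rather than its values.

```python
import math

# Extension -> transformation, following the table above. None means "drop".
TRANSFORMS = {
    ".abs": abs,
    ".ln": math.log,          # natural log; values must be positive
    ".sq": lambda x: x * x,
    ".n": None,
}


def apply_extension(column_name, values):
    """Return (base_name, transformed_values), or None if the column is dropped.

    Returning the base name without the extension is an assumption made for
    this sketch only.
    """
    for suffix, func in TRANSFORMS.items():
        if column_name.endswith(suffix):
            if func is None:
                return None
            return column_name[: -len(suffix)], [func(v) for v in values]
    # No recognized extension: the column is used unchanged.
    return column_name, list(values)


print(apply_extension("hp.sq", [2, 3]))    # ('hp', [4, 9])
print(apply_extension("x.abs", [-1, 2]))   # ('x', [1, 2])
print(apply_extension("cyl.n", [4, 6]))    # None
```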

family Optional. Default value is “gaussian”. The family describes the error distribution and the link function used in the model. The supported distributions and their corresponding link functions are listed below:

gaussian(link = "identity")
binomial(link = "logit")
Gamma(link = "inverse")
inverse.gaussian(link = "1/mu^2")
poisson(link = "log")
quasi(link = "identity", variance = "constant")
quasibinomial(link = "logit")
quasipoisson(link = "log")

For instance, to run a logistic regression model, set the family to “binomial”.

predict Optional. Default is FALSE. If set to TRUE or 1, Axcel generates predictions of the dependent variable instead of regression results. Please note that the predicted values are reported on the final “response” scale. For instance, for a model with a binomial family (a logistic regression), the probability of an outcome (a number between 0 and 1) is reported for each input.
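
The relationship between the family's link function and the “response” values returned by predict can be sketched in Python. This is an illustration under assumptions rather than Axcel's code: the link names are taken from the family list above, the formulas are the standard definitions of those links, and LINKS, DEFAULT_LINK, and to_response are hypothetical names.

```python
import math

# Each link g maps the mean mu to the linear predictor eta; its inverse maps back.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "logit": (lambda mu: math.log(mu / (1 - mu)),
              lambda eta: 1 / (1 + math.exp(-eta))),
    "inverse": (lambda mu: 1 / mu, lambda eta: 1 / eta),
    "1/mu^2": (lambda mu: 1 / mu ** 2, lambda eta: 1 / math.sqrt(eta)),
    "log": (math.log, math.exp),
}

# Default link per family, as listed above.
DEFAULT_LINK = {
    "gaussian": "identity", "binomial": "logit", "Gamma": "inverse",
    "inverse.gaussian": "1/mu^2", "poisson": "log", "quasi": "identity",
    "quasibinomial": "logit", "quasipoisson": "log",
}


def to_response(family, eta):
    """Map a linear predictor value back to the response scale."""
    _, inverse_link = LINKS[DEFAULT_LINK[family]]
    return inverse_link(eta)


print(to_response("binomial", 0.0))  # 0.5, a probability
print(to_response("poisson", 0.0))   # 1.0, an expected count
```

For a binomial model, any linear predictor value maps to a number strictly between 0 and 1, which is why the prediction can be read as a probability.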

intercept Optional. Default is TRUE. If set to FALSE or 0, the regression estimates are produced without an intercept. For instance, for the regression example above:

with intercept = TRUE: mpg = β0 + β1 × hp + β2 × gears + β3 × cyl + ε
with intercept = FALSE: mpg = β1 × hp + β2 × gears + β3 × cyl + ε
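
One way to picture the intercept option is as a column of ones in the predictor matrix: with the intercept on, β0 multiplies that column; with it off, the column is absent and the remaining coefficients are re-estimated, so they generally differ from the with-intercept fit. The Python sketch below only builds the matrix and does not fit anything; design_matrix is a hypothetical name, and the rows are the hp, gears, and cyl values as read from the example table.

```python
def design_matrix(rows, intercept=True):
    """Build the predictor matrix, prepending a column of ones for the intercept."""
    leading = [1.0] if intercept else []
    return [leading + [float(x) for x in row] for row in rows]


# hp, gears, cyl from the example table (mpg is the response, so it is excluded).
predictors = [(150, 5, 6), (120, 5, 4), (125, 6, 4)]

for row in design_matrix(predictors, intercept=True):
    print(row)   # four columns: ones, hp, gears, cyl
for row in design_matrix(predictors, intercept=False):
    print(row)   # three columns: hp, gears, cyl
```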

deployment Optional. You can define a deployment name to deploy your model. After deployment, you can use the deployed model in the AXCEL.GLM.PREDICT function. Please note that the deployment name is case sensitive and may contain only letters, numbers, and non-repeating underline. You cannot use underline at the beginning or end of the name. For instance, “abc-123” or “a-b-c-123” are allowed, but “abc--123”, “abc-123-”, “-abc-123”, or “abc-$123” are not allowed. Depending on your subscription, you can view and restrict access to the deployed model through the Axcel web application.
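
The naming rules above can be checked before calling the function. The Python sketch below is a hedged illustration, not Axcel's validator: is_valid_deployment_name is a hypothetical helper, letters are assumed to be ASCII, and because the rule text and the examples appear to use different separator characters, the separator is left as a parameter that defaults to the hyphen seen in the examples.

```python
import re


def is_valid_deployment_name(name, separator="-"):
    """Check a name against the rules described above.

    Assumption: `separator` defaults to the hyphen used in the examples;
    pass "_" if your deployment names use underscores instead.
    """
    sep = re.escape(separator)
    # One or more alphanumeric runs joined by single separators, which rules out
    # a leading, trailing, or repeated separator and any other symbol.
    pattern = rf"[A-Za-z0-9]+(?:{sep}[A-Za-z0-9]+)*"
    return re.fullmatch(pattern, name) is not None


for candidate in ["abc-123", "a-b-c-123", "abc--123", "abc-123-", "-abc-123", "abc-$123"]:
    print(candidate, is_valid_deployment_name(candidate))
```

The first two names pass and the remaining four fail, matching the allowed and disallowed examples in the paragraph above.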

plot Optional. Default is FALSE. When set to TRUE, Axcel produces model diagnostic plots inside the sidebar. You can expand a plot and view it in your browser. Producing plots usually delays the results, so we recommend using this option only when it is needed. Please note that no plot is produced when deployment is requested. Here is an example of the plot:

Model Diagnostics

When you type =AXCEL.GLM in an Excel cell, IntelliSense guides you through the required and optional (shown in [] brackets) inputs:

In the example above, we have:

=AXCEL.GLM(A1:D401,"binomial",,,,TRUE)

This means that our data is located in cells A1 through D401, the family is binomial, we do not want prediction (skipped for default), we want to keep the intercept (skipped for default), we want no deployment (skipped), and finally, we want to see the diagnostic plots:

At the same time, model specifications, performance and variable importance are reported in the console as shown below:

With the same kind of command, but with predict set to TRUE and the remaining arguments left at their default values, we have:

=AXCEL.GLM(A1:K33, "binomial", TRUE)

This reports the predicted values in a new column called “predict”, as shown below: