# GLM

**AXCEL.GLM function**

AXCEL.GLM fits generalized linear models. A model is specified by a data table that holds the response and the predictor variables, together with a description of the error distribution (family).

**Syntax**

AXCEL.GLM(data, [family], [predict], [intercept], [deployment], [plot])

The AXCEL.GLM function syntax has the following arguments:

**data** Required. A table whose first column is the dependent (y) variable and whose remaining columns are the independent (explanatory) variables. Here is an example:

| mpg | hp | gears | cyl |
| --- | --- | --- | --- |
| 22.1 | 150 | 5 | 6 |
| 19.7 | 120 | 5 | 4 |
| 20.5 | 125 | 6 | 4 |
| … | … | … | … |

With this table as input, the model has the following structure:

$$\text{mpg} = \beta_0 + \beta_1 \times \text{hp} + \beta_2 \times \text{gears} + \beta_3 \times \text{cyl} + \varepsilon$$

where $\beta_0$ is the intercept, $\beta_1$, $\beta_2$ and $\beta_3$ are the coefficient estimates, and $\varepsilon$ is the error term (or residual).
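The column convention can be illustrated with a short Python sketch. It is not Axcel code: `split_table` is a hypothetical helper, and the R-style formula string is used only as a compact way to write the model above.

```python
# Hypothetical helper, not part of Axcel. It only mirrors the documented
# convention: first column = response, remaining columns = predictors.
def split_table(header, rows):
    """Return (response name, predictor names, y values, X rows)."""
    response, predictors = header[0], header[1:]
    y = [row[0] for row in rows]
    X = [row[1:] for row in rows]
    return response, predictors, y, X


header = ["mpg", "hp", "gears", "cyl"]
rows = [
    [22.1, 150, 5, 6],
    [19.7, 120, 5, 4],
    [20.5, 125, 6, 4],
]

response, predictors, y, X = split_table(header, rows)

# R-style formula string, used here only as shorthand for the model.
formula = f"{response} ~ " + " + ".join(predictors)
print(formula)  # mpg ~ hp + gears + cyl
```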

If a column contains strings instead of numbers, Axcel treats it as a categorical variable. For instance, suppose your data table has a column like this:

| color |
| --- |
| blue |
| red |
| blue |
| … |

Axcel treats the variable color as categorical and creates dummy variables for its levels. For any categorical variable, whether it is stored as strings or as numbers, you can append “.f” to the column name to tell Axcel that it is categorical. For instance, if a column in your dataset looks like this:

| gears |
| --- |
| 3 |
| 4 |
| 4 |
| 5 |
| 3 |
| … |

Axcel treats gears as a numerical variable and reports a single coefficient for it in the regression results:

| variables | Estimate | Std.Error | t.value | p.value | vif |
| --- | --- | --- | --- | --- | --- |
| (Intercept) | 34.6595 | 4.9369 | 7.0205 | 1.0136E-07 | NA |
| gear | 0.6519 | 0.9041 | 0.7211 | 0.4766 | 1.3206 |
| cyl | -2.7431 | 0.3735 | -7.3444 | 4.3241E-08 | 1.3206 |
| … | … | … | … | … | … |

However, if you append “.f” to the variable name:

| gears.f |
| --- |
| 3 |
| 4 |
| 4 |
| 5 |
| 3 |
| … |

Axcel treats it as a categorical variable and reports separate coefficients for gear.f4, gear.f5, etc.:

| variables | Estimate | Std.Error | t.value | p.value |
| --- | --- | --- | --- | --- |
| (Intercept) | 36.3203 | 3.703 | 9.8083 | 1.4729E-10 |
| gear.f4 | 0.8466 | 1.8571 | 0.4558 | 0.652 |
| gear.f5 | 1.3028 | 1.8397 | 0.7082 | 0.4847 |
| cyl | -2.7072 | 0.4827 | -5.6081 | 0.00000528 |
| … | … | … | … | … |
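The expansion of a categorical column into dummy variables can be sketched in Python. This is an illustration, not Axcel's implementation. `dummy_code` is a hypothetical helper, and it assumes that the smallest level serves as the baseline, which is what the output above suggests (coefficients for levels 4 and 5 but none for level 3).

```python
def dummy_code(name, values):
    """Expand one categorical column into 0/1 indicator columns.

    Assumption: the first level in sorted order is the baseline and gets
    no column of its own; every other level gets one indicator column.
    """
    levels = sorted(set(values))
    non_baseline = levels[1:]
    return {
        f"{name}{level}": [1 if v == level else 0 for v in values]
        for level in non_baseline
    }


columns = dummy_code("gears.f", [3, 4, 4, 5, 3])
for column_name, indicator in columns.items():
    print(column_name, indicator)
# gears.f4 [0, 1, 1, 0, 0]
# gears.f5 [0, 0, 0, 1, 0]
```

A row with gears equal to 3 has zeros in both indicator columns, so its effect is absorbed by the intercept.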

Besides “.f”, this function supports several other name extensions, listed below:

| Name Extension | Operation |
| --- | --- |
| .abs | Use the absolute value of the variable ( ABS(X) ) |
| .f | Treat the variable as categorical |
| .ln | Use the natural log of the variable ( Ln(X) ) |
| .n | Drop the variable from the estimates |
| .sq | Use the square of the variable ( X² ) |
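As a rough illustration of how these name extensions could be interpreted, the Python sketch below maps each suffix to its operation. `apply_extension` and the role labels are hypothetical. How Axcel names transformed columns in its output is not documented here, so the sketch simply returns the base name.

```python
import math

# Suffix -> value transformation, following the table above. ".f" and ".n"
# do not change values, so they are handled separately below.
TRANSFORMS = {
    ".abs": abs,
    ".ln": math.log,          # natural log
    ".sq": lambda x: x * x,
}


def apply_extension(column_name, values):
    """Return (base name, role, values) for one column.

    role is "numeric", "categorical" or "dropped". Hypothetical helper.
    """
    for suffix, func in TRANSFORMS.items():
        if column_name.endswith(suffix):
            base = column_name[: -len(suffix)]
            return base, "numeric", [func(v) for v in values]
    if column_name.endswith(".f"):
        return column_name[:-2], "categorical", list(values)
    if column_name.endswith(".n"):
        return column_name[:-2], "dropped", []
    return column_name, "numeric", list(values)


print(apply_extension("hp.sq", [2, 3]))      # ('hp', 'numeric', [4, 9])
print(apply_extension("gears.f", [3, 4, 5]))  # ('gears', 'categorical', [3, 4, 5])
```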

**family** Optional. Default is “gaussian”. The family describes the error distribution and the link function used in the model. The supported distributions and their corresponding link functions are listed below:

- gaussian(link = "identity")
- binomial(link = "logit")
- Gamma(link = "inverse")
- inverse.gaussian(link = "1/mu^2")
- poisson(link = "log")
- quasi(link = "identity", variance = "constant")
- quasibinomial(link = "logit")
- quasipoisson(link = "log")

For instance, to run a logistic regression model, set family to “binomial”.

**predict** Optional. Default is FALSE. If set to TRUE or 1, Axcel generates predictions of the dependent variable instead of the regression results. Predicted values are reported on the “response” scale. For instance, for a model with a binomial family (a logistic regression), the probability of the outcome (a number between 0 and 1) is reported for each input row.
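The “response” scale means that the linear predictor has been passed through the inverse of the link function. The Python sketch below shows this for the links listed under **family**. It is an illustration only, and `INVERSE_LINK` and `response` are hypothetical names rather than part of Axcel.

```python
import math

# Inverse of each link listed for the family argument: maps the linear
# predictor eta back to the mean, i.e. the "response" scale.
INVERSE_LINK = {
    "gaussian": lambda eta: eta,                            # identity
    "binomial": lambda eta: 1.0 / (1.0 + math.exp(-eta)),   # logit
    "poisson": lambda eta: math.exp(eta),                   # log
    "Gamma": lambda eta: 1.0 / eta,                         # inverse
    "inverse.gaussian": lambda eta: 1.0 / math.sqrt(eta),   # 1/mu^2
}


def response(family, eta):
    """Convert a linear predictor value to the response scale."""
    return INVERSE_LINK[family](eta)


print(response("binomial", 0.0))  # 0.5
print(response("poisson", 0.0))   # 1.0
```

For the binomial family, any real-valued linear predictor maps to a probability strictly between 0 and 1.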

**intercept** Optional. Default is TRUE. If set to FALSE or 0, the regression estimates are produced without an intercept. For the regression example mentioned before:

with intercept = TRUE:

$$\text{mpg} = \beta_0 + \beta_1 \times \text{hp} + \beta_2 \times \text{gears} + \beta_3 \times \text{cyl} + \varepsilon$$

with intercept = FALSE:

$$\text{mpg} = \beta_1 \times \text{hp} + \beta_2 \times \text{gears} + \beta_3 \times \text{cyl} + \varepsilon$$
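The difference between the two forms can be seen in a small Python sketch of the linear predictor. The coefficient values are made up for illustration and are not fitted estimates. `linear_predictor` is a hypothetical helper.

```python
def linear_predictor(x, slopes, intercept=None):
    """Sum of slope * value, plus beta_0 when an intercept is present."""
    eta = sum(b * v for b, v in zip(slopes, x))
    return eta if intercept is None else intercept + eta


# Made-up coefficients, for illustration only.
slopes = [-0.125, 0.5, -2.0]   # beta_1 (hp), beta_2 (gears), beta_3 (cyl)
row = [150, 5, 6]              # hp, gears, cyl

with_b0 = linear_predictor(row, slopes, intercept=50.0)
without_b0 = linear_predictor(row, slopes)
print(with_b0, without_b0)  # 21.75 -28.25
```

In practice the slope estimates themselves also change when the model is refitted without an intercept. The sketch holds them fixed only to isolate the role of the intercept term.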

**deployment** Optional. A name under which to deploy your model. After deployment, you can use the deployed model in the AXCEL.GLM.PREDICT function. The deployment name is case sensitive and may contain only letters, numbers, and non-repeating underlines. An underline cannot appear at the beginning or end of the name. For instance, “abc-123” and “a-b-c-123” are allowed, but “abc--123”, “abc-123-”, “-abc-123” and “abc-$123” are not. Depending on your subscription, you can view and restrict access to the deployed model through the Axcel web application.
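A name can be checked before it is passed to the function. The Python sketch below is one possible check and is not Axcel's own validation. `is_valid_deployment_name` is a hypothetical helper. It assumes ASCII letters and digits, and it takes the separator as a parameter, defaulting to the hyphen that appears in the examples above.

```python
import re


def is_valid_deployment_name(name, sep="-"):
    """Check for letters/digits joined by single separator characters.

    Assumptions: ASCII letters and digits only; `sep` defaults to the
    hyphen used in the examples (pass sep="_" for underscores).
    """
    s = re.escape(sep)
    # One or more alphanumeric runs, each pair joined by exactly one sep,
    # so the name cannot start or end with sep or contain it twice in a row.
    pattern = rf"[A-Za-z0-9]+(?:{s}[A-Za-z0-9]+)*"
    return re.fullmatch(pattern, name) is not None


for candidate in ["abc-123", "a-b-c-123", "abc-123-", "-abc-123", "abc-$123"]:
    print(candidate, is_valid_deployment_name(candidate))
```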

**plot** Optional. Default is FALSE. When set to TRUE, Axcel produces model diagnostic plots inside the sidebar. You can expand a plot and open it in your browser. Producing plots usually adds latency before the results are shown, so we recommend using this option only when it is needed. No plot is produced when **deployment** is requested. Here is an example of the plot:

When you type *=AXCEL.GLM* in an Excel cell, IntelliSense guides you through the required and optional (shown in [] brackets) inputs:

In the example above, we have:

=AXCEL.GLM(A1:D401,"binomial",,,,TRUE)

This means that our data is located in cells A1 through D401, we do not want predictions (skipped, so the default applies), we want to keep the intercept (skipped), we do not want a deployment (skipped), and finally, we want to see the diagnostic plots:
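Because the arguments are positional, each skipped argument still occupies its place in the list. The Python sketch below mimics that mapping using the parameter order and defaults documented above. `resolve_args` is a hypothetical helper, and `None` stands in for a skipped argument and for "no deployment", which is an assumption since no default is documented for deployment.

```python
# Parameter order and defaults as documented for AXCEL.GLM. None is used
# for "no deployment" because no default is documented (an assumption).
PARAMS = [
    ("data", None),
    ("family", "gaussian"),
    ("predict", False),
    ("intercept", True),
    ("deployment", None),
    ("plot", False),
]


def resolve_args(*args):
    """Map positional arguments to names; None means the slot was skipped."""
    if len(args) > len(PARAMS):
        raise TypeError("too many arguments")
    resolved = {name: default for name, default in PARAMS}
    for (name, _), value in zip(PARAMS, args):
        if value is not None:
            resolved[name] = value
    return resolved


# data, family, then three skipped slots, then plot.
settings = resolve_args("A1:D401", "binomial", None, None, None, True)
print(settings)
```

Here predict, intercept and deployment keep their defaults, and only the sixth position turns on the plots.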

At the same time, model specifications, performance and variable importance are reported in the console as shown below:

With the same kind of command, but with predict set to TRUE and the remaining arguments left at their default values, we have:

=AXCEL.GLM(A1:K33,"binomial",TRUE)

This reports the predicted values in a new column called “predict”, as shown below: