OLS

You are here:

AXCEL.OLS function

OLS is used to fit linear models. It can be used to carry out regression analysis with numerical or categorical explanatory variables.

Syntax

AXCEL.OLS(data, [predict], [intercept], [deployment], [plot])


The AXCEL.OLS function syntax has the following arguments:

data Required. data must be a table where its first column is the dependent (or y) variable and the rest are independent or explanatory variables. Here is an example:

mpghpgearscyl
22.115056
19.712054
20.512564

In this example, if we use this table as an input, the model is structured in this way:

mpg = β0 + β1 × hp + β2 × gears + β3 × cyl + ε

where β0 is the intercept and β1 , β2 and β3 are coefficient estimates and ε is the error term (or residuals).

If a column includes strings instead of numbers, Axcel considers them as a categorical variable. For instance, if you have columns like this in your data table:

color
blue
red
blue

Axcel considers variable color as a categorical variable and assigns a dummy for each type. If you have categorical variables in your data, whether it is in string or number format, you can add “.f” at the end of the column name to let Axcel know that it is a categorical variable. For instance, if a column in your dataset is like this:

gears
3
4
4
5
3

Axcel considers gears as a numerical variable and reports a coefficient for this variable in the regression results:

variablesEstimateStd.Errort.valuep.valuevif
(Intercept)34.65954.93697.02051.0136E-07NA
gear0.65190.90410.72110.47661.3206
cyl-2.74310.3735-7.34444.3241E-081.3206

However, if you add “.f” at the end of the name of the variable:

gears.f
3
4
4
5
3

Axcel considers it as a categorical variable and reports different coefficients for gear.f4, gear.f5, etc:

variablesEstimateStd.Errort.valuep.value
(Intercept)36.32033.7039.80831.4729E-10
gear.f40.84661.85710.45580.652
gear.f51.30281.83970.70820.4847
cyl-2.70720.4827-5.60810.00000528

Beside “.f”, there are several other operators that you can use in this function presented below:

Name Extension Operation
.abs Use absolute value (abs) of variable ( ABS(X) )
.f Transfer variable to categorical
.ln Use natural log of variable ( Ln(X) )
.n Drop the variable from estimates
.sq Use square of variable ( X2 )

predict Optional. Default is FALSE. If set to TRUE or 1, Axcel generates a prediction of the variable instead of regression results.

intercept Optional. Default is TRUE. If set to FALSE or 0, regression estimates are produced without intercept. For instance, for the regression example mentioned before :

with intercept = TRUE: mpg = β0 + β1 × hp + β2 × gears + β3 × cyl + ε
with intercept = FALSE: mpg = β1 × hp + β2 × gears + β3 × cyl + ε

deployment Optional. You can define a deployment name to deploy your model. After deployment, you can use the deployed function in AXCEL.OLS.PREDICT function. Please note that the deployment name is case sensitive and should include alphabets, numbers, and non-repeating underline. You cannot use underline at the beginning or end of the filename. For instance “abc-123” or “a-b-c-123” are allowed but “abc–123”, “abc-123-“, “-abc-123” or “abc-$123” are not allowed. Depending on your subscription, you can view and restrict access to the deployed model through Axcel web application.

plot Optional. Default is FALSE. When it is TRUE, Axcel produces model diagnosis plots inside the sidebar. You can expand the plot and show it in your browser. Producing plots usually creates latency in showing the results. So, we recommend that you use this option when it is needed. Please note that no plot is produced when deployment is requested. Here is an example of the plot:

Model Diagnostics
Variable Importance

when you type =AXCEL.OLS in an Excel cell, the IntelliSense guides you through required and optional (shows in [] brackets) inputs:

In the example above, we have:

=AXCEL.OLS(A1:K33,,,,TRUE)

This means that our data is located at cell A1 through K33, we do not want prediction (skipped for default), we want to keep intercept (skipped for default), no deployment and finally, we want to see the diagnostic plots:


At the same time, model specifications, performance and variable importance are reported in the console as shown below:

-------- OLS Summary --------- 
Call: lm(formula = formula, data = df) 
Residuals: 
           Min      1Q  Median     3Q    Max 
       -3.2142 -1.6117 -0.2902 1.1974 4.6322 
Coefficients: 
Estimate          Std.      Error      t value      Pr(>|t|) 
(Intercept)   14.51324   18.88856        0.768       0.4513 
gear.f4        0.10225    2.78999        0.037       0.9711 
gear.f5        0.74084    3.14294        0.236       0.8160 
cyl           -0.09363    1.10044       -0.085       0.9330 
disp           0.01505    0.01825        0.824       0.4194 
hp            -0.02950    0.02577       -1.145       0.2658 
drat           0.89878    2.01117        0.447       0.6598 
wt            -3.92788    1.93376       -2.031       0.0557 . 
qsec           0.83705    0.74488        1.124       0.2744 
vs             0.12934    2.19158        0.059       0.9535 
am             2.75565    1.96702        1.401       0.1766 
carb           0.08170    0.91785        0.089       0.9300 
--- Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1 
Residual standard error: 2.682 on 20 degrees of freedom 
Multiple R-squared: 0.8723, Adjusted R-squared: 0.802 
F-statistic: 12.42 on 11 and 20 DF, p-value: 1.17e-06 
-----
Min.max.accuracy     MAE      MAPE    MSE    RMSE     NRMSE.mean NRMSE.median                                    0.918               1.71    0.0876   4.49    2.12      0.106             0.11
NRMSE.mean.accuracy   NRMSE.median.accuracy   Efron.r.squared   CV.prcnt 
              0.894                    0.89             0.872       10.6
------ 
Variable Importance Sign 
1 wt 1.9611887 NEG 
2 am 1.2254035 POS 
3 qsec 1.1234133 POS 
4 hp 0.9868407 NEG 
5 disp 0.7467585 POS 
6 drat 0.4813036 POS 
7 gear 0.4389142 POS 
8 carb 0.2406258 NEG 
9 vs 0.1509915 POS 
10 cyl 0.1066392 NEG

With the same command but setting prediction TRUE (leave the rest for default values), we have:

=AXCEL.OLS(A1:K33,TRUE)

Which reports the predicted values in a new column called “predict” as shown below: