# OLS

**AXCEL.OLS function**

OLS is used to fit linear models. It can be used to carry out regression analysis with numerical or categorical explanatory variables.

**Syntax**

AXCEL.OLS(data, [predict], [intercept], [deployment], [plot])

The AXCEL.OLS function syntax has the following arguments:

**data** Required. data must be a table where its first column is the dependent (or y) variable and the rest are independent or explanatory variables. Here is an example:

mpg | hp | gears | cyl |

22.1 | 150 | 5 | 6 |

19.7 | 120 | 5 | 4 |

20.5 | 125 | 6 | 4 |

… | … | … | … |

In this example, if we use this table as an input, the model is structured in this way:

*mpg* = β_{0} + β_{1} × *hp* + β_{2} × *gears* + β_{3} × *cyl* + *ε*

where β_{0} is the intercept and β_{1} , β_{2} and β_{3} are coefficient estimates and *ε* is the error term (or residuals).

If a column includes strings instead of numbers, Axcel considers them as a categorical variable. For instance, if you have columns like this in your data table:

color |

blue |

red |

blue |

… |

Axcel considers variable color as a categorical variable and assigns a dummy for each type. If you have categorical variables in your data, whether it is in string or number format, you can add “.f” at the end of the column name to let Axcel know that it is a categorical variable. For instance, if a column in your dataset is like this:

gears |

3 |

4 |

4 |

5 |

3 |

… |

Axcel considers gears as a numerical variable and reports a coefficient for this variable in the regression results:

variables | Estimate | Std.Error | t.value | p.value | vif |

(Intercept) | 34.6595 | 4.9369 | 7.0205 | 1.0136E-07 | NA |

gear | 0.6519 | 0.9041 | 0.7211 | 0.4766 | 1.3206 |

cyl | -2.7431 | 0.3735 | -7.3444 | 4.3241E-08 | 1.3206 |

… | … | … | … | … | … |

However, if you add “.f” at the end of the name of the variable:

gears.f |

3 |

4 |

4 |

5 |

3 |

… |

Axcel considers it as a categorical variable and reports different coefficients for gear.f4, gear.f5, etc:

variables | Estimate | Std.Error | t.value | p.value |

(Intercept) | 36.3203 | 3.703 | 9.8083 | 1.4729E-10 |

gear.f4 | 0.8466 | 1.8571 | 0.4558 | 0.652 |

gear.f5 | 1.3028 | 1.8397 | 0.7082 | 0.4847 |

cyl | -2.7072 | 0.4827 | -5.6081 | 0.00000528 |

… | … | … | … | … |

Beside “.f”, there are several other operators that you can use in this function presented below:

Name Extension | Operation |

.abs | Use absolute value (abs) of variable ( ABS(X) ) |

.f | Transfer variable to categorical |

.ln | Use natural log of variable ( Ln(X) ) |

.n | Drop the variable from estimates |

.sq | Use square of variable ( X^{2} ) |

**predict** Optional. Default is FALSE. If set to TRUE or 1, Axcel generates a prediction of the variable instead of regression results.

**intercept** Optional. Default is TRUE. If set to FALSE or 0, regression estimates are produced without intercept. For instance, for the regression example mentioned before :

with intercept = TRUE:mpg= β_{0}+ β_{1}×hp+ β_{2}×gears+ β_{3}×cyl+εwith intercept = FALSE:mpg= β_{1}×hp+ β_{2}×gears+ β_{3}×cyl+ε

**deployment** Optional. You can define a deployment name to deploy your model. After deployment, you can use the deployed function in AXCEL.OLS.PREDICT function. Please note that the deployment name is case sensitive and should include alphabets, numbers, and non-repeating underline. You cannot use underline at the beginning or end of the filename. For instance “abc-123” or “a-b-c-123” are allowed but “abc–123”, “abc-123-“, “-abc-123” or “abc-$123” are not allowed. Depending on your subscription, you can view and restrict access to the deployed model through Axcel web application.

**plot** Optional. Default is FALSE. When it is TRUE, Axcel produces model diagnosis plots inside the sidebar. You can expand the plot and show it in your browser. Producing plots usually creates latency in showing the results. So, we recommend that you use this option when it is needed. Please note that no plot is produced when **deployment** is requested. Here is an example of the plot:

when you type *=AXCEL.OLS* in an Excel cell, the IntelliSense guides you through required and optional (shows in [] brackets) inputs:

In the example above, we have:

=AXCEL.OLS(A1:K33,,,,TRUE)

This means that our data is located at cell A1 through K33, we do not want prediction (skipped for default), we want to keep intercept (skipped for default), no deployment and finally, we want to see the diagnostic plots:

At the same time, model specifications, performance and variable importance are reported in the console as shown below:

-------- OLS Summary --------- Call: lm(formula = formula, data = df) Residuals: Min 1Q Median 3Q Max -3.2142 -1.6117 -0.2902 1.1974 4.6322 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 14.51324 18.88856 0.768 0.4513 gear.f4 0.10225 2.78999 0.037 0.9711 gear.f5 0.74084 3.14294 0.236 0.8160 cyl -0.09363 1.10044 -0.085 0.9330 disp 0.01505 0.01825 0.824 0.4194 hp -0.02950 0.02577 -1.145 0.2658 drat 0.89878 2.01117 0.447 0.6598 wt -3.92788 1.93376 -2.031 0.0557 . qsec 0.83705 0.74488 1.124 0.2744 vs 0.12934 2.19158 0.059 0.9535 am 2.75565 1.96702 1.401 0.1766 carb 0.08170 0.91785 0.089 0.9300 --- Signif. codes: 0 "***" 0.001 "**" 0.01 "*" 0.05 "." 0.1 " " 1 Residual standard error: 2.682 on 20 degrees of freedom Multiple R-squared: 0.8723, Adjusted R-squared: 0.802 F-statistic: 12.42 on 11 and 20 DF, p-value: 1.17e-06 ----- Min.max.accuracy MAE MAPE MSE RMSE NRMSE.mean NRMSE.median 0.918 1.71 0.0876 4.49 2.12 0.106 0.11 NRMSE.mean.accuracy NRMSE.median.accuracy Efron.r.squared CV.prcnt 0.894 0.89 0.872 10.6 ------ Variable Importance Sign 1 wt 1.9611887 NEG 2 am 1.2254035 POS 3 qsec 1.1234133 POS 4 hp 0.9868407 NEG 5 disp 0.7467585 POS 6 drat 0.4813036 POS 7 gear 0.4389142 POS 8 carb 0.2406258 NEG 9 vs 0.1509915 POS 10 cyl 0.1066392 NEG

With the same command but setting prediction TRUE (leave the rest for default values), we have:

=AXCEL.OLS(A1:K33,TRUE)

Which reports the predicted values in a new column called “predict” as shown below: