The Great Regression – Where it All Starts

All About Statistical Analysis, Linear and Logistic Regression

By AcingAI     |     Updated May 16th, 2021
Learning the arts and crafts of AI is not a simple task: the field is overflowing with strange-looking acronyms and complex mathematical formulas, and there are very few resources for those who are just starting out.

For all of the above reasons, it is important to lay strong foundations and understand the basics before jumping ahead to advanced models; doing so will not only accelerate your learning, but also make you a better professional.

Statistical Analysis and the Role of a Statistical Model

Yes, statistical! In case you have not yet made peace with this fact, machine learning and AI are all about statistical analysis. This doesn’t mean you must be familiar with Inverse Wishart Distributions (although it would help), but you do need to understand what a statistical model is.

What is Statistical Analysis?

Statistical Analysis is the branch of science that examines large amounts of data in order to discover patterns and trends. It also covers presenting the discovered conclusions in an understandable, often graphical way.

What is a Statistical Model?

After we've transformed our problem into a mathematical representation, a Statistical Model applies a set of assumptions to it. Those assumptions are what allow us to (theoretically) calculate the probability of certain events.

What does all of this have to do with AI?

Our set of inputs is one sample drawn from the larger group (or population) of all possible inputs to the system. When training an AI algorithm (or "fitting a model"), we assume that the sample of data we use represents the entire population of possible inputs. In simpler terms, we use our algorithm on data that is similar to the data it was trained on.
For example, if we built an algorithm that predicts future housing prices using data collected from Chicago’s housing prices over the past 30 years, we won’t use it to predict future housing prices in Tokyo (obviously…).

We use our given data sample to infer the probability distribution of the whole population, then we find the most probable value and output it as the prediction.
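To make this idea concrete, here is a minimal sketch in Python, assuming (purely for illustration) that the population follows a normal distribution: we infer the distribution's parameters from the sample and output its most probable value as the prediction.

```python
import numpy as np

# A hypothetical sample drawn from the (unknown) larger population.
sample = np.array([2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3])

# The statistical model's assumption (ours, for this sketch): the population
# is normally distributed. We infer its parameters from the sample.
mu = sample.mean()
sigma = sample.std(ddof=1)

# For a normal distribution the most probable value (its mode) equals the mean,
# so that is what we output as the prediction.
prediction = mu
print(f"Inferred N(mu={mu:.2f}, sigma={sigma:.2f}); prediction = {prediction:.2f}")
```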

Linear Regression

Before we get into actual linear regression, a few words about dependent and independent variables.

Linear Regression models the linear relationship between a dependent variable (the height of any blue dot in Figure 1) and an independent variable (the distance between any blue dot and the y axis in Figure 1). In its simplest form, Linear Regression describes the linear mapping from a single independent variable to a scalar (quantity) response, as seen in Figure 1.

Figure 1: Linear fit mapping a scalar value to a scalar value

Fitting the model is done by minimizing a loss function, the most common choice being the Least Squares Loss, to find the set of parameters resulting in the best fit to the training data. Eventually, the model ties the behavior of the response, or result, to the behavior of the input variables:

ŷ = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ

Formula 1: Basic Linear Regression

Or in vector form:

ŷ = θᵀx̂

Formula 2: Vector Form Linear Regression

Here ŷ is the model’s output (the dependent variable), θᵢ are the model’s parameters, and x̂ is the vector of the input’s components xᵢ (the independent variables), with a constant first component x₀ = 1 so that θ₀ acts as the intercept.

Figure 2: Illustration of a linear regression output
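As a rough sketch of what such a fit looks like in code (this is just one way to do it, using NumPy's least squares solver on made-up toy data, with a leading 1 added to every input so that θ₀ acts as the intercept):

```python
import numpy as np

# Toy training data: 20 noisy samples of a single independent variable x.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=20)
y = 3.0 * x + 5.0 + rng.normal(scale=1.0, size=20)   # noisy linear response

# Design matrix with a column of ones so that theta_0 acts as the intercept.
X = np.column_stack([np.ones_like(x), x])

# Least Squares fit: find the theta minimizing ||X @ theta - y||^2.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict the response for a new input: y_hat = theta^T x_hat (Formula 2).
x_new = np.array([1.0, 4.2])   # [1, x] with the leading 1 for the intercept
y_hat = x_new @ theta
print(f"theta = {theta}, prediction at x = 4.2: {y_hat:.2f}")
```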

One common use for linear regression is finding the linear relationship between the variables, either to prove that such a relationship exists or to quantify its strength. A second common goal is to predict the response for new sets of inputs.

Going back to our Chicago housing prices predictor, the input vector’s components might be the size of the house, the number of bedrooms and the number of floors. After fitting the model on data gathered across Chicago, the model will be able to predict the cost or value of a house based on those inputs.*

(*Housing prices usually vary over time, meaning a linear model will not be a very good fit. The above model can be used to predict housing prices in a certain year, or a more complex model that accounts for sequential data can be used.)
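A hypothetical version of that predictor might look like the sketch below. The feature values and prices are made up purely for illustration, and scikit-learn's LinearRegression is just one convenient way to perform the least squares fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: [size in sq. ft., bedrooms, floors] per house.
X_train = np.array([
    [ 850, 2, 1],
    [1200, 3, 1],
    [1600, 3, 2],
    [2100, 4, 2],
    [2500, 4, 3],
])
# Made-up sale prices (USD) for the houses above.
y_train = np.array([210_000, 280_000, 340_000, 420_000, 480_000])

model = LinearRegression()
model.fit(X_train, y_train)

# Predict the price of a 1,800 sq. ft. house with 3 bedrooms on 2 floors.
new_house = np.array([[1800, 3, 2]])
print(f"Predicted price: ${model.predict(new_house)[0]:,.0f}")
```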

Logistic Regression

Now that we understand what a linear regression is, we can complicate things a little. While linear regression can be used to predict values of a certain response, a logistic regression is used to predict the probability of a binary event, i.e. whether the outcome will be ‘0’ or ‘1’. One example might be using a logistic regression model to build a binary cat classifier, judging whether a cat exists in a given image, based on the dataset that was used to fit (or train) the model.

Logistic regression can be thought of as a single-layer neural network, and its output is given by:

ŷ = σ(θᵀx̂)

Formula 3: Basic Logistic Regression

Where σ is the logistic function:

σ(z) = 1 / (1 + e^(−z))

Formula 4: The Logistic Function σ

From the above formula, it can be seen that the model’s output will always satisfy ŷ∈(0,1), which makes sense since it is a probability. In fact, the output of the logistic model is simply the logistic function applied to the output of a linear model.

Figure 3: Illustration of a logistic regression output
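Here is a minimal sketch of Formulas 3 and 4 in Python; the parameter and input values are made up, chosen only to show the shape of the computation.

```python
import numpy as np

def logistic(z):
    """The logistic function from Formula 4: sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Made-up parameters theta and an input x_hat (the leading 1 acts as the bias term).
theta = np.array([-1.5, 0.8, 2.0])
x_hat = np.array([1.0, 0.5, 1.2])

# Formula 3: the logistic model applies sigma to the output of a linear model.
linear_output = theta @ x_hat
y_hat = logistic(linear_output)

# y_hat always lies in (0, 1), so it can be read as the probability of class '1'.
print(f"linear output = {linear_output:.2f}, probability y_hat = {y_hat:.2f}")
```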

Thanks to the non-linear response of the logistic function, several of those “logistic blocks” can be put together sequentially to construct a neural network, allowing it to learn complex relationships. This is in contrast to the “linear blocks”, which only allow linear relationships, even when several of them are used in a sequence.
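A quick numerical way to see the difference, using arbitrary weight matrices in NumPy: two linear blocks in a row collapse into a single linear block, while inserting the logistic function between them prevents that collapse.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 3))   # first block: maps 3 inputs to 4 outputs
W2 = rng.normal(size=(2, 4))   # second block: maps those 4 values to 2 outputs
x = rng.normal(size=3)

# Two linear blocks in sequence are equivalent to one linear block (W2 @ W1).
stacked_linear = W2 @ (W1 @ x)
single_linear = (W2 @ W1) @ x
print(np.allclose(stacked_linear, single_linear))   # True: still just linear

# With the logistic non-linearity in between, the mapping can no longer be
# written as a single matrix, which is what lets stacked blocks learn
# more complex relationships.
stacked_logistic = W2 @ logistic(W1 @ x)
print(stacked_logistic)
```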