Let's say we are given some $x$ values with their associated $y$ values:
We can take a guess at what the best-fit line could look like:
Let $y_i$ be the observed output for $x_i$, and let $h_i$ be the predicted output, $h_i = \theta x_i + \alpha$, where $\theta$ is the gradient of the best-fit line and $\alpha$ is the bias.
We will introduce another quantity, the loss, which is the sum of squared errors: $L = \sum_{i=1}^{n} (y_i - h_i)^2$, where $n$ is the number of data points.
The smaller the difference between $y_i$ and $h_i$, the smaller the loss. Our goal is to find the $\theta$ that gives the smallest possible loss. We can start by differentiating the loss with respect to $\theta$, but before we do that, we must first expand $\alpha$:
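$$L = \sum_{i=1}^{n} \bigl(y_i - \theta x_i - (y_m - \theta x_m)\bigr)^2 = \sum_{i=1}^{n} \bigl((y_i - y_m) - \theta (x_i - x_m)\bigr)^2$$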
In the above equation, $y_m$ represents the mean $y$ value and $x_m$ represents the mean $x$ value. The line of best fit is expected to pass through $(x_m, y_m)$, meaning $y_m = \theta x_m + \alpha$; rearranging gives $\alpha = y_m - \theta x_m$, which is the substitution used above. Now let's differentiate the loss:
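$$\frac{dL}{d\theta} = -2 \sum_{i=1}^{n} (x_i - x_m)\bigl((y_i - y_m) - \theta (x_i - x_m)\bigr)$$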
With different values of $\theta$, we get different values of the loss. The loss as a function of $\theta$ is an upward-opening parabola, so we can find its global minimum by setting the gradient to 0:
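$$-2 \sum_{i=1}^{n} (x_i - x_m)\bigl((y_i - y_m) - \theta (x_i - x_m)\bigr) = 0$$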
We can simplify this by dividing both sides by $-2$ and splitting the sum:
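$$\sum_{i=1}^{n} (x_i - x_m)(y_i - y_m) - \theta \sum_{i=1}^{n} (x_i - x_m)^2 = 0$$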
Now let's make $\theta$ the subject:
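$$\theta \sum_{i=1}^{n} (x_i - x_m)^2 = \sum_{i=1}^{n} (x_i - x_m)(y_i - y_m)$$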
This will give us the regression coefficient:
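$$\theta = \frac{\sum_{i=1}^{n} (x_i - x_m)(y_i - y_m)}{\sum_{i=1}^{n} (x_i - x_m)^2}$$

To see the result in action, here is a minimal Python sketch of the formula; the $x$ and $y$ values below are made up purely for illustration, not the data from the plots above:

```python
import numpy as np

# Illustrative data only -- any x/y pairs would do.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

x_m, y_m = x.mean(), y.mean()  # mean x and mean y

# Regression coefficient (slope), straight from the derived formula:
# theta = sum((x_i - x_m)(y_i - y_m)) / sum((x_i - x_m)^2)
theta = np.sum((x - x_m) * (y - y_m)) / np.sum((x - x_m) ** 2)

# Bias, using the fact that the best-fit line passes through (x_m, y_m):
alpha = y_m - theta * x_m

print(f"theta = {theta:.4f}, alpha = {alpha:.4f}")

# Cross-check against NumPy's own least-squares polynomial fit.
assert np.allclose([theta, alpha], np.polyfit(x, y, 1))
```

The assert at the end cross-checks our slope and bias against NumPy's least-squares fit of a degree-1 polynomial, which should agree to floating-point tolerance.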