
Proof (part 4) minimizing squared error to regression line

Created by Sal Khan.

Video transcript

So if you've gotten this far, you've been waiting for several videos to get to the optimal line that minimizes the squared distance to all of those points. So let's just get to the punch line. Let's solve for the optimal m and b. And just based on what we did in the last videos, there are two ways to do that. We actually now know two points that lie on that line. So we can literally find the slope of that line and then the y-intercept, the b there. Or we could just say it's the solution to this system of equations. And they're actually mathematically equivalent.

So let's solve for m first. And if we want to solve for m, we want to cancel out the b's. So let me rewrite this top equation just the way it's written over here. We have m times the mean of the x squareds plus b times the mean of-- actually, we could do even better than that. One step better, based on the work we did in the last video, is to just subtract this bottom equation from this top equation. So let me subtract it. Or let's add the negatives. So if I make this negative, this is negative, this is negative. What do we get? We get m times the mean of the x's minus the mean of the x squareds over the mean of the x's. The plus b and the negative b cancel out. That is equal to the mean of the y's minus the mean of the xy's over the mean of the x's.

And then we can divide both sides of the equation by this. And so we get m is equal to the mean of the y's minus the mean of the xy's over the mean of the x's, all over this: the mean of the x's minus the mean of the x squareds over the mean of the x's. Now notice, this is the exact same thing that you would get if you found the slope between these two points over here. The change in y, the difference between that y and that y, is that right over there, over the change in x's. That x minus that x is exactly this over here. Now, to simplify it, we can multiply both the numerator and the denominator by the mean of the x's.
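The elimination Sal just talked through can be written out symbolically. A sketch of the same steps, using the shorthand (mine, not from the video) of a bar for each sample mean: x-bar for the mean of the x's, y-bar for the mean of the y's, and so on:

```latex
% Normal equations from the earlier videos:
%   m \overline{x^2} + b\,\bar{x} = \overline{xy}
%   m \bar{x} + b = \bar{y}
% Divide the first equation by \bar{x}, then subtract it from the second;
% the b terms cancel:
m\left(\bar{x} - \frac{\overline{x^2}}{\bar{x}}\right)
  = \bar{y} - \frac{\overline{xy}}{\bar{x}}
\qquad\Longrightarrow\qquad
m = \frac{\bar{y} - \dfrac{\overline{xy}}{\bar{x}}}
         {\bar{x} - \dfrac{\overline{x^2}}{\bar{x}}}
```

Multiplying numerator and denominator by x-bar, as described next, clears the nested fractions.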
And I do that just so we don't have this in the denominator in both places. So if we multiply the numerator by the mean of the x's, we get the mean of the x's times the mean of the y's minus the mean of the xy's (this and this cancel out). All of that over: the mean of the x's times the mean of the x's is just going to be the mean of the x's squared, minus, over here, you have the mean of the x squareds. And that's what we get for m.

And if we want to solve for b, we can literally just substitute back into either equation, but this equation right here is simpler. And if we want to solve for b there, we can solve for b in terms of m. We just subtract m times the mean of the x's from both sides. We get b is equal to the mean of the y's minus m times the mean of the x's.

So what you do is you take your data points. You find the mean of the x's, the mean of the y's, the mean of the xy's, and the mean of the x squareds. You find your m. Once you find your m, you can substitute back in here and find your b. And then you have your actual optimal line. And we're done. So these are the two big formula takeaways for our optimal line.

What I'm going to do in the next video-- and this is where, if anyone was skipping up to this point, they should re-engage-- is actually use these formulas for the best-fitting line. At least, when you measure the error by the squared distances from the points. We're going to use these formulas to actually find the best line for some data.
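The two takeaway formulas can be coded directly as a sanity check. A minimal sketch in plain Python (the function name best_fit_line and the sample points are my own illustration, not from the video):

```python
def best_fit_line(xs, ys):
    """Return the slope m and intercept b that minimize the total
    squared vertical distance from the points to the line y = mx + b,
    using the formulas derived in the video:
        m = (mean(x)*mean(y) - mean(xy)) / (mean(x)**2 - mean(x**2))
        b = mean(y) - m*mean(x)
    """
    n = len(xs)
    mean_x = sum(xs) / n                                  # mean of the x's
    mean_y = sum(ys) / n                                  # mean of the y's
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n      # mean of the xy's
    mean_x2 = sum(x * x for x in xs) / n                  # mean of the x squareds
    m = (mean_x * mean_y - mean_xy) / (mean_x ** 2 - mean_x2)
    b = mean_y - m * mean_x
    return m, b

# Example points chosen (by me) to lie exactly on y = 2x + 1, so the
# minimum squared error is zero and the formulas recover the line itself.
m, b = best_fit_line([0, 1, 2, 3], [1, 3, 5, 7])
# m is 2.0 and b is 1.0
```

Because these points are collinear, any sensible fit must return the line through them; for scattered data the same formulas give the compromise line that the next video computes.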