Reasoning behind second partial derivative test

For those of you who want to see why the second partial derivative test works, I cover a sketch of a proof here.

Background

In the last article, I gave the statement of the second partial derivative test, but I only gave a loose intuition for why it's true. This article is for those who want to dig a bit more into the math, but it is not strictly necessary if you just want to apply the second partial derivative test.

What we're building to

  • To test whether a stable point of a multivariable function is a local minimum/maximum, take a look at the quadratic approximation of the function at that point. It is easier to analyze whether this quadratic approximation has a maximum/minimum.
  • For two-variable functions, this boils down to studying expressions that look like this:
    $ax^2 + 2bxy + cy^2$
    These are known as quadratic forms. The rule for when a quadratic form is always positive or always negative translates directly to the second partial derivative test.

Single variable case via quadratic approximation

First, I'd like to walk through the formal reasoning behind why the single-variable second derivative test works. By formal, I mean capturing the idea of concavity into more of an airtight argument.
In single-variable calculus, when $f'(a) = 0$ for some function $f$ and some input $a$, here's what the second derivative test looks like:
  • $f$ has a local maximum at $a$ if $f''(a) < 0$
  • $f$ has a local minimum at $a$ if $f''(a) > 0$
  • If $f''(a) = 0$, the second derivative alone cannot determine whether $f$ has a maximum, minimum or inflection point at $a$.
To think about why this test works, start by approximating the function with a Taylor polynomial out to the quadratic term, also known as a quadratic approximation.
$$f(x) \approx f(a) + f'(a)(x - a) + \frac{1}{2} f''(a)(x - a)^2$$
Since $f'(a) = 0$, this quadratic approximation simplifies like this:
$$f(a) + \frac{1}{2} f''(a)(x - a)^2$$
[Figure: The quadratic approximation at a local minimum.]
Notice, $(x - a)^2 \ge 0$ for all possible $x$ since squares are always positive or zero. That simple fact tells us everything we need to know! Why?
It means that when $f''(a) > 0$, we can read our approximation like this:
$$f(a) + \underbrace{\frac{1}{2} f''(a)(x - a)^2}_{\text{This is } \ge 0 \text{ for all values of } x \text{, and equals } 0 \text{ only when } x = a}$$
Therefore a is a local minimum of our approximation. In fact, it is a global minimum, but we only care about the fact that it is a local minimum. When the quadratic approximation of a function has a local minimum at the point of approximation, the function itself must also have a local minimum there. I'll say more on this in the last section, but for now the intuition should be clear since the function and its approximation "hug" one another around the point of approximation a.
[Figure: The quadratic approximation at a local maximum.]
Similarly, if $f''(a) < 0$, we can read the approximation as
$$f(a) + \underbrace{\frac{1}{2} f''(a)(x - a)^2}_{\text{This is } \le 0 \text{ for all values of } x \text{, and equals } 0 \text{ only when } x = a}$$
In this case, the approximation has a local maximum at $x = a$, indicating that the function itself also has a local maximum there.
[Figure: The quadratic approximation at the inflection point is flat.]
When $f''(a) = 0$, our quadratic approximation always equals the constant $f(a)$, meaning our function is in some sense too flat to be analyzed by the second derivative alone.
What to take away from this:
When $f'(a) = 0$, studying whether $f$ has a local maximum or minimum at $a$ comes down to whether the quadratic term of the Taylor approximation, $\frac{1}{2} f''(a)(x - a)^2$, is always positive or always negative.
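The single-variable test can be sketched numerically. The snippet below is only an illustration, not part of the argument: the helper names, the step size `h`, and the tolerance `tol` are all assumptions chosen for this example, and the second derivative is estimated with a central finite difference rather than computed exactly.

```python
import math


def second_derivative(f, a, h=1e-3):
    """Estimate f''(a) with a central finite difference (an approximation)."""
    return (f(a + h) - 2.0 * f(a) + f(a - h)) / (h * h)


def classify_critical_point(f, a, tol=1e-4):
    """Second derivative test at a point a where f'(a) = 0 is assumed to hold."""
    curvature = second_derivative(f, a)
    if curvature > tol:
        return "local minimum"      # quadratic term is >= 0 near a
    if curvature < -tol:
        return "local maximum"      # quadratic term is <= 0 near a
    return "inconclusive"           # approximation is (nearly) flat


# Three functions that all have a zero first derivative at 0.
print(classify_critical_point(lambda x: x * x, 0.0))   # local minimum
print(classify_critical_point(math.cos, 0.0))          # local maximum
print(classify_critical_point(lambda x: x ** 4, 0.0))  # inconclusive
```

Note that $x^4$ does have a minimum at $0$; the test reports "inconclusive" because the second derivative alone cannot see it.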

Two variable case, visual warmup

Now suppose you have a function $f(x, y)$ with two inputs and one output, and you find a stable point. That is, a point where both its partial derivatives are $0$,
$$f_x(x_0, y_0) = 0 \qquad f_y(x_0, y_0) = 0$$
which is more succinctly written as
$$\nabla f(x_0, y_0) = \underbrace{\mathbf{0}}_{\text{Zero vector}}$$
[Figure: $\nabla f(x_0, y_0) = \mathbf{0}$ indicates that the tangent plane at $(x_0, y_0)$ is flat.]
In order to determine whether this is a local maximum, local minimum, or neither, we look to its quadratic approximation. Let's start with a visual preview of what we want to do:
  • $f$ will have a local minimum at a stable point $(x_0, y_0)$ if the quadratic approximation at that point is a concave-up paraboloid.
    [Figure: Local min]
  • $f$ will have a local maximum there if the quadratic approximation is a concave-down paraboloid.
    [Figure: Local max]
  • If the quadratic approximation is saddle-shaped, $f$ has neither a maximum nor a minimum, but a saddle point.
    [Figure: Saddle point]
  • If the quadratic approximation is flat in one or all directions, we do not have enough information to make conclusions about $f$.
    [Figure: Quadratic approximation is flat in one direction.]
    [Figure: Quadratic approximation is constant.]

Analyzing the quadratic approximation

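The formula for the quadratic approximation of $f$, in vector form, looks like this:
$$Q_f(\mathbf{x}) = \underbrace{f(\mathbf{x}_0)}_{\text{Constant}} + \underbrace{\nabla f(\mathbf{x}_0) \cdot (\mathbf{x} - \mathbf{x}_0)}_{\text{Linear term}} + \underbrace{\frac{1}{2} (\mathbf{x} - \mathbf{x}_0)^T \mathbf{H}_f(\mathbf{x}_0) (\mathbf{x} - \mathbf{x}_0)}_{\text{Quadratic term}}$$
Since we care about points where the gradient is zero, we can get rid of that gradient term:
$$Q_f(\mathbf{x}) = f(\mathbf{x}_0) + \frac{1}{2} (\mathbf{x} - \mathbf{x}_0)^T \mathbf{H}_f(\mathbf{x}_0) (\mathbf{x} - \mathbf{x}_0)$$
To see this spelled out for the two-variable case, let's expand out the Hessian term,
$$Q_f(x, y) = f(x_0, y_0) + \frac{1}{2} f_{xx}(x_0, y_0)(x - x_0)^2 + f_{xy}(x_0, y_0)(x - x_0)(y - y_0) + \frac{1}{2} f_{yy}(x_0, y_0)(y - y_0)^2$$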
(Note, if this approximation or any of the notation feels shaky or unfamiliar, consider reviewing the article on quadratic approximations).
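To get a feel for the expanded formula, here is a small sketch that builds the two-variable quadratic approximation at a stable point and compares it with the function itself. The example function $f(x, y) = \cos(x)\cos(y)$ and the helper name `quadratic_approximation` are assumptions picked for illustration; the partial derivatives at $(0, 0)$ were worked out by hand and are passed in as plain numbers.

```python
import math


def quadratic_approximation(f0, fxx, fxy, fyy, x0, y0):
    """Build Q_f at a stable point (x0, y0), where the gradient term vanishes."""
    def q(x, y):
        dx, dy = x - x0, y - y0
        return f0 + 0.5 * fxx * dx * dx + fxy * dx * dy + 0.5 * fyy * dy * dy
    return q


# Hypothetical example: f(x, y) = cos(x) * cos(y) has a stable point at (0, 0),
# where f = 1, f_xx = -1, f_xy = 0 and f_yy = -1.
def f(x, y):
    return math.cos(x) * math.cos(y)


q = quadratic_approximation(f0=1.0, fxx=-1.0, fxy=0.0, fyy=-1.0, x0=0.0, y0=0.0)

# The approximation hugs f near the stable point and drifts away farther out.
error_near = abs(f(0.1, 0.1) - q(0.1, 0.1))
error_far = abs(f(1.0, 1.0) - q(1.0, 1.0))
print(error_near, error_far)
```

The error near $(0, 0)$ is far smaller than the error at $(1, 1)$, which is the "hugging" behavior the argument relies on.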
As I showed with the single variable case, the strategy is to study if the quadratic term of this approximation is always positive or always negative.
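$$Q_f(x, y) = f(x_0, y_0) + \underbrace{\frac{1}{2} f_{xx}(x_0, y_0)(x - x_0)^2 + f_{xy}(x_0, y_0)(x - x_0)(y - y_0) + \frac{1}{2} f_{yy}(x_0, y_0)(y - y_0)^2}_{\text{Is this always } \ge 0 \text{? Is it always } \le 0 \text{? Can it be either?}}$$
Right now, this term is a lot to write down, but we can distill its essence by studying expressions of the following form:
$$ax^2 + 2bxy + cy^2$$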
Such expressions are often fancifully called "quadratic forms".
  • The word "quadratic" indicates that the terms are of order two, meaning they involve the product of two variables.
  • The word "form" always threw me off here, and it makes the idea of a quadratic form sound more complicated than it really is. Mathematicians say "quadratic form" instead of "quadratic expression" to emphasize that all terms are of order 2, and there are no linear or constant terms mucking up the expression. A phrase like "purely quadratic expression" would have been much too reasonable and understandable to adopt.
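To make the notation for quadratic forms easier to generalize into higher dimensions, they are often written with respect to a symmetric matrix $M$:
$$\mathbf{x}^T M \mathbf{x} = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} a & b \\ b & c \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$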
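As a quick sanity check that the matrix notation really is the same expression, the sketch below evaluates the form both ways. The function names and the sample values are made up for this example.

```python
def quadratic_form(a, b, c, x, y):
    """Evaluate a*x^2 + 2*b*x*y + c*y^2 directly."""
    return a * x * x + 2 * b * x * y + c * y * y


def quadratic_form_matrix(a, b, c, x, y):
    """Evaluate [x y] M [x y]^T with the symmetric matrix M = [[a, b], [b, c]]."""
    m = [[a, b], [b, c]]
    v = [x, y]
    # First compute the matrix-vector product M v ...
    mv = [m[0][0] * v[0] + m[0][1] * v[1],
          m[1][0] * v[0] + m[1][1] * v[1]]
    # ... then take the dot product of v with M v.
    return v[0] * mv[0] + v[1] * mv[1]


# Both evaluations agree: each prints -3 for a=1, b=3, c=5 at (x, y) = (2, -1).
print(quadratic_form(1, 3, 5, 2, -1))
print(quadratic_form_matrix(1, 3, 5, 2, -1))
```

The off-diagonal entry $b$ appears twice in the matrix, which is where the factor of $2$ in $2bxy$ comes from.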
Here is the crucial question:
  • How can we tell whether the expression $ax^2 + 2bxy + cy^2$ is always positive, always negative, or neither, just by analyzing the constants $a$, $b$ and $c$?

Analyzing quadratic forms

If we plug in a constant value $y_0$ for $y$, we get some single variable quadratic function:
$$ax^2 + 2bxy_0 + c(y_0)^2$$
The graph of this function is a parabola, and it will only cross the x-axis if this quadratic function has real roots.
[Figure: A quadratic with two real roots can be both positive and negative.]
Otherwise, it either stays entirely positive or entirely negative, depending on the sign of a.
[Figure: A quadratic with no real roots can either be entirely positive or entirely negative.]
We can apply the quadratic formula to this expression to see whether its roots are real or complex.
$$ax^2 + 2bxy_0 + c(y_0)^2$$
  • The leading coefficient is $a$.
  • The linear coefficient is $2by_0$.
  • The constant term is $cy_0^2$.
Applying the quadratic formula looks like this:
$$\begin{aligned}
x &= \frac{-2by_0 \pm \sqrt{(2by_0)^2 - 4acy_0^2}}{2a} \\
  &= \frac{-2by_0 \pm 2y_0\sqrt{b^2 - ac}}{2a} \\
  &= y_0 \left( \frac{-b \pm \sqrt{b^2 - ac}}{a} \right)
\end{aligned}$$
If $y_0 = 0$, the quadratic has a double root at $x = 0$, meaning the parabola barely kisses the $x$-axis at that point. Otherwise, whether or not these roots are real depends only on the sign of the expression $b^2 - ac$.
  • If $b^2 - ac \ge 0$, there are real roots, so the graph of $ax^2 + 2bxy_0 + c(y_0)^2$ crosses the $x$-axis (or just touches it when $b^2 - ac = 0$).
  • Otherwise, if $b^2 - ac < 0$, there are no real roots, so the graph of $ax^2 + 2bxy_0 + c(y_0)^2$ either stays entirely positive or entirely negative.
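The root formula above can be sketched in a few lines. This is a minimal illustration that assumes $a \ne 0$; the function name `roots_for_fixed_y` is invented for this example.

```python
import math


def roots_for_fixed_y(a, b, c, y0):
    """Real roots in x of a*x^2 + 2*b*y0*x + c*y0^2, or None if there are none.

    Assumes a != 0. Uses x = y0 * (-b +/- sqrt(b^2 - a*c)) / a.
    """
    if y0 == 0:
        return (0.0, 0.0)           # double root at x = 0
    disc = b * b - a * c            # only the sign of b^2 - ac matters
    if disc < 0:
        return None                 # no real roots: the parabola never reaches the axis
    root = math.sqrt(disc)
    return tuple(sorted((y0 * (-b - root) / a, y0 * (-b + root) / a)))


print(roots_for_fixed_y(1, 3, 5, 1.0))   # (-5.0, -1.0), since x^2 + 6x + 5 = (x + 5)(x + 1)
print(roots_for_fixed_y(2, 2, 3, 1.0))   # None
```

Scaling $y_0$ only scales the roots, so whether they exist at all is decided by $b^2 - ac$ alone.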
For example, consider the case
  • $a = 1$
  • $b = 3$
  • $c = 5$
In this case, $b^2 - ac = 3^2 - (1)(5) = 4 > 0$, so the graph of $f(x) = x^2 + 6xy_0 + 5y_0^2$ always crosses the $x$-axis. Here is a video showing how that graph moves around as we let the value of $y_0$ slowly change.
[Video: the graph of $f(x) = x^2 + 6xy_0 + 5y_0^2$ as $y_0$ varies.]
This corresponds with the fact that the graph of $f(x, y) = x^2 + 6xy + 5y^2$ can be both positive and negative.
[Video: the graph of $f(x, y) = x^2 + 6xy + 5y^2$.]
In contrast, consider the case
  • $a = 2$
  • $b = 2$
  • $c = 3$
Now, $b^2 - ac = 2^2 - (2)(3) = -2 < 0$. This means the graph of $f(x) = 2x^2 + 4xy_0 + 3y_0^2$ never crosses the $x$-axis, although it kisses it if the constant $y_0$ is zero. Here is a video showing how that graph changes as we let the constant $y_0$ vary:
[Video: the graph of $f(x) = 2x^2 + 4xy_0 + 3y_0^2$ as $y_0$ varies.]
This corresponds with the fact that the multivariable function $f(x, y) = 2x^2 + 4xy + 3y^2$ is positive everywhere except at the origin, where it equals $0$.
[Video: the graph of $f(x, y) = 2x^2 + 4xy + 3y^2$.]

Rule for the sign of quadratic forms

As if to confuse students who are familiar with the quadratic formula, rules regarding quadratic forms are often phrased with respect to $ac - b^2$ instead of $b^2 - ac$. Since one is the negative of the other, this requires switching when you say $\ge 0$ and when you say $\le 0$. The reason mathematicians prefer $ac - b^2$ is that this is the determinant of the matrix describing the quadratic form:
$$\det\left( \begin{bmatrix} a & b \\ b & c \end{bmatrix} \right) = ac - b^2$$
As a reminder, this is how the quadratic form looks using the matrix.
$$ax^2 + 2bxy + cy^2 = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} a & b \\ b & c \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}$$
Tying this convention together with what we found in the previous section, we write the rule for the sign of a quadratic form as follows:
  • If $ac - b^2 < 0$, the quadratic form can attain both positive and negative values, and it's possible for it to equal $0$ at values other than $(x, y) = (0, 0)$.
  • If $ac - b^2 > 0$, the form is either always positive or always negative depending on the sign of $a$, but in either case it only equals $0$ at $(x, y) = (0, 0)$.
    • If $a > 0$, the form is always positive, so $(0, 0)$ is a global minimum point of the form.
    • If $a < 0$, the form is always negative, so $(0, 0)$ is a global maximum point of the form.
  • If $ac - b^2 = 0$, the form again never changes sign (it is either always $\ge 0$ or always $\le 0$), but now it's possible for it to equal $0$ at values other than $(x, y) = (0, 0)$.
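The rule above translates almost line for line into code. The following is a sketch under the assumption that $a$, $b$ and $c$ are exact values (integers or fractions), so that comparing against zero is meaningful; the function name and the returned labels are invented for this example.

```python
def classify_quadratic_form(a, b, c):
    """Classify the sign behavior of a*x^2 + 2*b*x*y + c*y^2 from a, b, c alone."""
    det = a * c - b * b             # determinant of [[a, b], [b, c]]
    if det < 0:
        return "both signs"         # positive in some directions, negative in others
    if det > 0:
        # det > 0 forces a != 0, so the sign of a settles the question.
        return "always positive" if a > 0 else "always negative"
    return "zero away from the origin"  # never changes sign, but vanishes along a line


print(classify_quadratic_form(1, 3, 5))    # both signs
print(classify_quadratic_form(2, 2, 3))    # always positive
print(classify_quadratic_form(-2, 2, -3))  # always negative
print(classify_quadratic_form(1, 1, 1))    # zero away from the origin
```

The last example is $x^2 + 2xy + y^2 = (x + y)^2$, which is never negative but equals $0$ along the whole line $y = -x$.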

Some terminology:

When $ax^2 + 2bxy + cy^2 > 0$ for all $(x, y)$ other than $(x, y) = (0, 0)$, the quadratic form and the matrix associated with it are both called positive definite.
When $ax^2 + 2bxy + cy^2 < 0$ for all $(x, y)$ other than $(x, y) = (0, 0)$, they are both negative definite.
If you replace the $>$ and $<$ with $\ge$ and $\le$, the corresponding properties are positive semi-definite and negative semi-definite.

Applying this to Qf

Okay, zooming back out to where we started, let's write down our quadratic approximation again:
$$Q_f(x, y) = f(x_0, y_0) + \frac{1}{2} f_{xx}(x_0, y_0)(x - x_0)^2 + f_{xy}(x_0, y_0)(x - x_0)(y - y_0) + \frac{1}{2} f_{yy}(x_0, y_0)(y - y_0)^2$$
The quadratic portion of $Q_f$ is written with respect to $(x - x_0)$ and $(y - y_0)$ instead of simply $x$ and $y$, so wherever the rule for the sign of quadratic forms references the point $(0, 0)$, we apply it instead to the point $(x_0, y_0)$.
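It may help to see the matching of coefficients written out. Comparing the quadratic portion of $Q_f$ with $ax^2 + 2bxy + cy^2$, where $(x - x_0)$ and $(y - y_0)$ play the roles of $x$ and $y$, one way to spell out the correspondence is:

```latex
\begin{aligned}
a &= \tfrac{1}{2} f_{xx}(x_0, y_0), \qquad
b = \tfrac{1}{2} f_{xy}(x_0, y_0), \qquad
c = \tfrac{1}{2} f_{yy}(x_0, y_0) \\
ac - b^2 &= \tfrac{1}{4} \left( f_{xx}(x_0, y_0)\, f_{yy}(x_0, y_0) - \left( f_{xy}(x_0, y_0) \right)^2 \right)
\end{aligned}
```

The factor of $\tfrac{1}{4}$ is positive, so $ac - b^2$ has the same sign as $f_{xx} f_{yy} - (f_{xy})^2$, and $a$ has the same sign as $f_{xx}$.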
As with the single-variable case, when the quadratic approximation $Q_f$ has a local maximum (or minimum) at $(x_0, y_0)$, it means $f$ has a local maximum (or minimum) at that point. This means we can translate the rule for the sign of a quadratic form directly to get the second derivative test:
Suppose $\nabla f(x_0, y_0) = \mathbf{0}$, then
  • If $f_{xx}(x_0, y_0) f_{yy}(x_0, y_0) - (f_{xy}(x_0, y_0))^2 < 0$, $f$ has neither a minimum nor a maximum at $(x_0, y_0)$, but instead has a saddle point.
    [Figure: Saddle point]
  • If $f_{xx}(x_0, y_0) f_{yy}(x_0, y_0) - (f_{xy}(x_0, y_0))^2 > 0$, $f$ definitely has either a maximum or a minimum at $(x_0, y_0)$, and we must look at the sign of $f_{xx}(x_0, y_0)$ to figure out which one it is.
    • If $f_{xx}(x_0, y_0) > 0$, $f$ has a local minimum.
      [Figure: Local min]
    • If $f_{xx}(x_0, y_0) < 0$, $f$ has a local maximum.
      [Figure: Local max]
  • If $f_{xx}(x_0, y_0) f_{yy}(x_0, y_0) - (f_{xy}(x_0, y_0))^2 = 0$, the second derivatives alone cannot tell us whether $f$ has a local minimum or maximum.
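The test as stated can be sketched as a short function. This is only an illustration: it assumes the gradient is already known to be zero at the point, and that the three second partial derivatives there have been computed by hand and are exact (with floating-point estimates, the comparisons against zero would need a tolerance). The function name and the example functions in the comments are assumptions for this sketch.

```python
def second_partial_derivative_test(fxx, fyy, fxy):
    """Classify a stable point from its second partial derivatives."""
    det = fxx * fyy - fxy ** 2
    if det < 0:
        return "saddle point"
    if det > 0:
        # det > 0 forces fxx != 0, so its sign decides between min and max.
        return "local minimum" if fxx > 0 else "local maximum"
    return "inconclusive"


# Hypothetical examples, each evaluated at the stable point (0, 0):
print(second_partial_derivative_test(2, -2, 0))    # x^2 - y^2      -> saddle point
print(second_partial_derivative_test(2, 2, 0))     # x^2 + y^2      -> local minimum
print(second_partial_derivative_test(-1, -1, 0))   # cos(x) cos(y)  -> local maximum
print(second_partial_derivative_test(0, 0, 0))     # x^4 + y^4      -> inconclusive
```

As in the single-variable case, "inconclusive" does not mean there is no extremum: $x^4 + y^4$ has a minimum at the origin that the second derivatives cannot detect.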

Our current tools are lacking

Everything presented here almost constitutes a full proof, except for one final step.
Intuitively, it might make sense that when a quadratic approximation bends and curves in a certain way, the function should bend and curve in that same way near the point of approximation. But how do we formalize this beyond intuition?
Unfortunately, we will not do that here. Making arguments about derivatives fully rigorous requires using real analysis, the theoretical backbone of calculus.
Furthermore, you might be wondering how this generalizes to functions with more than two inputs. There is a notion of quadratic forms with multiple variables, but phrasing the rule for when such forms are always positive or always negative uses various ideas from linear algebra.

Summary

  • To test whether a stable point of a multivariable function is a local minimum/maximum, take a look at the quadratic approximation of the function at that point. It is easier to analyze whether this quadratic approximation has a maximum/minimum.
  • For two-variable functions, this boils down to studying expressions that look like this:
    $ax^2 + 2bxy + cy^2$
    These are known as quadratic forms. The rule for when a quadratic form is always positive or always negative translates directly to the second partial derivative test.