Real Analysis – Differentiation Under the Integral Sign

Notation. In everything that follows, we will use an absolute value-type notation to indicate the measure (or length, or area, if those are the appropriate words) of a set. That is, we will write {|E|} instead of {m(E)} or {\mu(E)} to mean the Lebesgue measure of {E}. The notation has somewhat grown on me and it helps unclutter the work.

The Fundamental Theorem of Calculus. What gets referred to in calculus books as the Fundamental Theorem of Calculus is really two different (albeit closely related) theorems, each capable of being independently proved. The two theorems are:

  • The Integral of the Derivative Theorem. If {f} is continuously differentiable on {[a, b]}, then {\int_a^b f'(x)\,dx = f(b) - f(a)}.
  • The Derivative of the Integral Theorem. Suppose {f} is integrable on {[a, b]} and define {F(x) = \int_a^x f(t)\,dt}. If {f} is continuous at {x}, then {F} is differentiable at {x} and {F'(x) = f(x)}.

It is common for calculus books to prove the Derivative of the Integral Theorem first and then to obtain the Integral of the Derivative Theorem as a corollary. This order is not necessary, as the Integral of the Derivative Theorem may be separately proved. The Integral of the Derivative Theorem has great immediate impact in calculus – it promises us that we can calculate integrals if only we can find antiderivatives. The Derivative of the Integral Theorem is more subtle. It does at least assure us that every continuous function has an antiderivative. This paper concerns the Derivative of the Integral Theorem.

If we examine the specific statement of the Derivative of the Integral Theorem and use the definition of the derivative, it comes down to:

If {f} is continuous at {x}, then

{\lim_{h \rightarrow 0} {1\over{h}} \int_x^{x+h} f = f(x)}.

It is only a small step to find that this is equivalent to the following. If {f} is continuous at {x}, then

{\lim_{h, k \rightarrow 0^+} {1\over{h+k}} \int_{x-k}^{x+h} f = f(x)}.

We then rewrite this as

{\lim_{|I| \rightarrow 0, x \in I} {1\over{|I|}} \int_I f = f(x)},

where we understand in the notation that {I} is an interval. Our final version of this is in {\mathbb{R}^n}. If {f} is continuous at {x}, then

{\lim_{|Q| \rightarrow 0, x \in Q} {1\over{|Q|}} \int_Q f = f(x)},

where {Q} must be a cube.

We will not include the proof of this theorem. That is an item from a first analysis course that readers should be able to prove themselves.
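
The statement can at least be checked numerically. The following is a minimal sketch, not a proof: the helper names `average` and `interval_averages`, the choice {f = \cos}, the point {x = 1} and the off-center placement of the intervals are all assumptions made purely for illustration.

```python
import math

def average(f, a, b, n=1000):
    """Midpoint-rule approximation of (1/(b-a)) * integral of f over [a, b]."""
    width = (b - a) / n
    total = sum(f(a + (i + 0.5) * width) for i in range(n))
    return total * width / (b - a)

def interval_averages(f, x, sizes, left_fraction=0.3):
    """Averages of f over intervals I of the given sizes that contain x.

    The intervals are deliberately not centered at x: a fraction
    `left_fraction` of each interval lies to the left of x.
    """
    return [
        average(f, x - left_fraction * s, x + (1.0 - left_fraction) * s)
        for s in sizes
    ]

f = math.cos          # illustrative choice of a continuous function
x = 1.0               # illustrative choice of the point
sizes = [10.0 ** (-j) for j in range(1, 6)]   # |I| = 0.1, 0.01, ..., 0.00001

# Distance between each average and f(x); it should shrink with |I|.
errors = [abs(avg - f(x)) for avg in interval_averages(f, x, sizes)]
```

For a smooth function such as this one, the error shrinks roughly in proportion to {|I|}.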

Can this theorem be extended to a Lebesgue context? For us to do that, what would the weakest possible hypotheses and strongest possible conclusions be? Since our formula involves an integral, our hypotheses must include something about the integrability of {f}. Since we are always just integrating {f} over some bounded interval, local integrability is all we need – that statement means that {f}, when restricted to any bounded interval, is an integrable function. To understand this condition, note that both {{1\over{\sqrt{|x|}}}} and {e^x} are locally integrable but {{1\over{x}}} is not. In our dreams, the strongest possible conclusion would be that the limit equals {f(x)} everywhere – but we know we cannot have that since {f(x)} is not even well-defined everywhere – if we change {f} on a set of measure zero, we get a function with exactly the same integral over any set, and we get the same element of {L^1}. The most we can possibly hope for is that the result holds almost everywhere. Let us combine the weakest possible hypothesis and strongest possible conclusion.

Lebesgue’s Theorem on the Differentiation of the Integral. Suppose {f} is a locally integrable real-valued function on {\mathbb{R}^n}. Then for almost every {x \in \mathbb{R}^n}, {\lim_{|Q| \rightarrow 0, x \in Q} {1\over{|Q|}} \int_Q f = f(x)} where the limit is taken over cubes {Q} that contain {x}.
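
To make the local integrability hypothesis concrete, here is a short worked computation for the two singular examples mentioned earlier, taken over the interval {[-1, 1]}. The choice of interval is an assumption made for illustration; any bounded interval containing {0} behaves the same way.

```latex
\int_{-1}^{1} \frac{dx}{\sqrt{|x|}}
  = 2 \lim_{\varepsilon \to 0^+} \int_{\varepsilon}^{1} x^{-1/2}\,dx
  = 2 \lim_{\varepsilon \to 0^+} \left( 2 - 2\sqrt{\varepsilon} \right)
  = 4 < \infty,
\qquad
\int_{-1}^{1} \frac{dx}{|x|}
  = 2 \lim_{\varepsilon \to 0^+} \int_{\varepsilon}^{1} \frac{dx}{x}
  = 2 \lim_{\varepsilon \to 0^+} \ln\frac{1}{\varepsilon}
  = \infty.
```

So {{1\over{\sqrt{|x|}}}} satisfies the hypothesis of the theorem even though it is unbounded near {0}, while {{1\over{x}}} does not.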

We will attempt to prove this theorem from back to front. We would have this theorem if only we had… and we could have that if only we in turn had…

An Important WLOG. Without loss of generality, we may take {f} to be integrable and zero outside some bounded interval. We can do this because the result is essentially local. If we let {g} be the product of our original {f} and the characteristic function of some bounded interval, then for every {x} in the interior of the interval, {g(x) = f(x)} and the limit of integrals in our theorem is exactly the same for {g} as for {f}. If we can prove the theorem for all functions such as {g}, then we can patch together the domain of {f} out of the countable union of such intervals to get the full theorem.

Deciding What the Enemy Is. So we are trying to prove that a certain limit is {f(x)} almost everywhere. Which is the bigger problem – the possibility that the limit might not exist or the possibility that the limit might exist but be the wrong thing? The experience of analysts suggests that whenever a limit is “formally correct,” meaning in practice that it works for all “nice enough” functions, then it is not going to converge to the wrong thing – the only way it could fail is for the limit to fail to exist at all. The limit is formally correct – if {f} is continuous, its truth is the truth of the Fundamental Theorem of Calculus. Of course the “experience of analysts” is based in no small part on this very problem.

We will assume that the major enemy is the possible nonexistence of the limit and we will concentrate on proving that the limit exists almost everywhere, saving for the very end the question of the limit being the right number. We know we can define the {\limsup} and the {\liminf} and that the limit exists if and only if the {\limsup} and the {\liminf} agree. That leads to the following definition.

An Auxiliary Definition – Oscillation. For our function {f} and for any {x} define the oscillation of {f} to be

\displaystyle \text{osc}(f)(x) = \limsup_{|Q| \rightarrow 0, x \in Q} {1\over{|Q|}} \int_Q f - \liminf_{|Q| \rightarrow 0, x \in Q} {1\over{|Q|}} \int_Q f.

It would be enough to prove that the set of points on which {\text{osc}(f) > 0} has measure zero. How can one possibly prove that a set has measure zero? By proving that its measure is smaller than any positive number. And for a function like this? We have the following plan:

Last Step. If {(*)} holds: for any {\eta > 0} and any {\delta > 0}, {\left|\{\text{osc}(f) > \eta\}\right| < \delta}, then {\text{osc}(f) = 0} almost everywhere and hence our limit exists almost everywhere.

The proof of this step proceeds as follows. Let {E_k = \left\{\text{osc}(f) > {1\over{k}}\right\}}. Then for every positive {\delta}, {|E_k| < \delta}. This implies that {|E_k| = 0}. But then the set on which {\text{osc}(f)} is anything other than {0} is the union of the {E_k}, namely the union of countably many sets of measure zero, hence of measure zero.

How can we obtain condition {(*)}? Since size is involved, this may perhaps involve the size of {f}. But {f} is not necessarily especially small in any way. However, any integrable function can be closely approximated by a continuous function, and the oscillation of any continuous function is zero everywhere. We are going to break down {f} into a sum of functions, so we need to know something about the oscillation of a sum:

Sublemma. {\text{osc}(f+g)(x) \le \text{osc}(f)(x) + \text{osc}(g)(x)}.

The proof is a grubby little inequality chase of a type familiar to students in an undergraduate analysis course.
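
For readers who want the chase written out, here is one way it can go. The shorthand {A_Q(f)} for the average of {f} over {Q} is introduced only for this sketch and is not notation used elsewhere in this post; every {\limsup} and {\liminf} is taken as {|Q| \rightarrow 0} with {x \in Q}, and all of them are assumed finite so that the subtraction makes sense.

```latex
% A_Q(f) is shorthand introduced for this sketch only.
A_Q(f) = \frac{1}{|Q|} \int_Q f,
\qquad
A_Q(f+g) = A_Q(f) + A_Q(g).

% Subadditivity of limsup and superadditivity of liminf:
\limsup A_Q(f+g) \le \limsup A_Q(f) + \limsup A_Q(g),
\qquad
\liminf A_Q(f+g) \ge \liminf A_Q(f) + \liminf A_Q(g).

% Subtract the second inequality from the first:
\text{osc}(f+g)(x)
  = \limsup A_Q(f+g) - \liminf A_Q(f+g)
  \le \bigl( \limsup A_Q(f) - \liminf A_Q(f) \bigr)
    + \bigl( \limsup A_Q(g) - \liminf A_Q(g) \bigr)
  = \text{osc}(f)(x) + \text{osc}(g)(x).
```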

Now, let {\epsilon > 0} be arbitrary and assume that {f = g+ b}, where {g} (the good function) is continuous and {b} (the bad function) is small in the sense that {\|b\|_1 < \epsilon}. Then {\text{osc}(f) \le \text{osc}(g) + \text{osc}(b) = \text{osc}(b)}. This leads to:

Second From Last Step. Suppose we can prove {(**)}: {\left|\{\text{osc}(b) > \alpha\}\right| \le {{C\|b\|_1}\over{\alpha}}} for all {\alpha > 0}. Then we can prove {(*)}.

The proof is this. Given {\eta} and {\delta}, choose {\epsilon = {{\eta\delta}\over{C}}} and decompose {f} into {g + b}, where {g} is continuous and {\|b\|_1 < \epsilon}. Since {\text{osc}(f) \le \text{osc}(b)}, we get {\left|\{\text{osc}(f) > \eta\}\right| \le \left|\{\text{osc}(b) > \eta\}\right| \le {{C\|b\|_1}\over{\eta}} < {C\over\eta} \cdot {{\eta\delta}\over{C}} = \delta}.

How are we going to prove {(**)}? With the {L^1} norm in there, this now seems to be about size. If size is the name of the game, {\sup}’s are easier to work with than {\limsup}’s. This leads us naturally to the next definition and the next theorem. The names of G.H. Hardy and J.E. Littlewood are on this; legend has it that Hardy first thought of the maximal function and its attendant inequality while concocting various averages of cricket scores.

The Hardy-Littlewood Maximal Function and the Hardy-Littlewood Maximal Theorem. If {f} is (locally) integrable, then the Hardy-Littlewood maximal function of {f} is

\displaystyle M(f)(x) = \sup_{x \in Q} {1\over{|Q|}} \int_Q |f|.

The Hardy-Littlewood Maximal Theorem is this: for {f \in L^1} and {\alpha > 0},

\displaystyle \left|\{M(f) > \alpha\}\right| \le {{C\|f\|_1}\over{\alpha}}.

Clearly, {\text{osc}(f)(x) \le 2M(f)(x)} for all {x}, so the Hardy-Littlewood Maximal Theorem (often phrased as “The Hardy-Littlewood maximal operator is a weak-type {(1,1)} sublinear operator”) implies condition {(**)}, with the constant {C} replaced by {2C}. Hence, it is enough to prove the Hardy-Littlewood Maximal Theorem.
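
The two claims in the previous paragraph can be spelled out as follows. This is a sketch; the only thing to notice is that the constant in {(**)} comes out as twice the constant in the maximal theorem, which is harmless because {(**)} only asks for some constant.

```latex
% Every average over a cube Q containing x is dominated by the maximal function:
\left| \frac{1}{|Q|} \int_Q f \right|
  \le \frac{1}{|Q|} \int_Q |f|
  \le M(f)(x)
\qquad \text{for every cube } Q \ni x.

% Hence both the limsup and the liminf of the averages lie in [-M(f)(x), M(f)(x)]:
\text{osc}(f)(x) \le M(f)(x) - \bigl( -M(f)(x) \bigr) = 2 M(f)(x).

% Apply this to b and use the maximal theorem at height alpha/2:
\left| \{ \text{osc}(b) > \alpha \} \right|
  \le \left| \{ M(b) > \alpha/2 \} \right|
  \le \frac{C \|b\|_1}{\alpha/2}
  = \frac{2C \|b\|_1}{\alpha}.
```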

A False Proof of the Hardy-Littlewood Maximal Theorem. For every point {x} there is some cube containing {x}, call it {Q_x}, for which the {\sup} is nearly attained. Let {E_\alpha} be the set {E_\alpha = \{M(f) > \alpha\}}. Then for every {x} in {E_\alpha}, there is a cube {Q_x} containing {x} for which

\displaystyle {1\over{|Q_x|}} \int_{Q_x} |f| > \alpha.

Rearrange this to {|Q_x| < {1\over{\alpha}} \int_{Q_x} |f|}.

Now the {Q_x}’s cover {E_\alpha}, so

\displaystyle |E_\alpha| \le \sum_x |Q_x| < {1\over{\alpha}} \sum_x \int_{Q_x} |f| \overset{(\dagger)}{=} {1\over\alpha} \int |f| = {{\|f\|_1}\over{\alpha}}.

That was too easy. Where did it go wrong? It went wrong at the step marked {(\dagger)}. In that step we assumed that we could recombine the sum of the integrals over {Q_x}’s and have it bounded by the integral of {|f|} over everything. But that kind of recombination requires that these cubes be mutually disjoint, which they are clearly not. What if we took a subcollection of the {Q_x} that is disjoint? Then we must give up the idea that the {Q_x}’s cover {E_\alpha}, and if we do that, how are we going to estimate the measure of {E_\alpha}? What we need is some disjoint subcollection of the {Q_x} that still covers at least some fixed fraction of {E_\alpha}. We need a covering lemma.

This is where some textbook proofs of the theorem start – with some lemma such as the Vitali Covering Lemma. Stare at that up front, and one has got to be thinking, “This is a very strange-looking result, and why do we want it?” Since we arranged the proof in this order, the need for the covering lemma is now clear. We also do not need full strength of the Vitali lemma; we can live with a weaker result, a simpler covering lemma.

One other comment is in order: could this work on any measure space whatsoever? In everything we have done so far, apparently so, but not for the covering lemma itself. The covering lemma is very specifically geometric, and will work only in {\mathbb{R}^n}. (Or in some metric space with an {\mathbb{R}^n}-like relationship between the measures of balls of different radii.)

A Covering Lemma. Suppose {E} is a measurable set in {\mathbb{R}^n} and suppose each {x} in {E} is contained in some cube {Q_x}. Further suppose that {|Q_x| < M} for some bound {M}. Then we can select a finite or countable subcollection {Q_k} of these cubes such that the {Q_k} are disjoint but {\sum_{k=1}^\infty |Q_k| \ge {1\over{5^n}}|E|}.

A True Proof of the Hardy-Littlewood Maximal Theorem. Assume that the covering lemma is true. Let {E_\alpha = \{M(f) > \alpha\}}. Then for every {x} in {E_\alpha}, there is a cube {Q_x} containing {x} for which

\displaystyle {1\over{|Q_x|}} \int_{Q_x} |f| > \alpha.

Rearrange this to {|Q_x| < {1\over\alpha} \int_{Q_x} |f|} and note that each {|Q_x| < {1\over\alpha} \int |f|}.

Since the side lengths of the {Q_x} are bounded, the covering lemma then says we can find a mutually disjoint subcollection {Q_k} covering at least {{1\over{5^n}}} of {E_\alpha}. Hence,

\displaystyle |E_\alpha| \le 5^n \sum_{k=1}^\infty |Q_k| \le 5^n \sum_{k=1}^\infty {1\over{\alpha}} \int_{Q_k} |f| \le {{5^n}\over\alpha} \int |f|.
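
The inequality can be sanity-checked on a computer in one dimension, where {5^n = 5}. The sketch below is only an illustration under stated assumptions: {f} is a step function on a uniform grid (zero elsewhere), the supremum is taken only over intervals whose endpoints are grid points, and the function name `grid_maximal_function` is invented for this example. Restricting the supremum can only make the maximal function smaller, so the bound with {C = 5} must still hold for this restricted version.

```python
import random

def grid_maximal_function(values, cell):
    """Restricted maximal function of a step function on a uniform grid.

    `values[k]` is the value of f on the k-th cell of width `cell`.
    Returns (best, l1_norm), where best[k] is the largest average of |f|
    over grid-aligned intervals containing cell k, and l1_norm is the
    integral of |f|.
    """
    n = len(values)
    prefix = [0.0]                      # prefix[k] = integral of |f| over cells 0..k-1
    for v in values:
        prefix.append(prefix[-1] + abs(v) * cell)

    best = [0.0] * n
    for i in range(n):                  # left end of the interval: cell i
        suffix_best = 0.0
        for j in range(n - 1, i - 1, -1):
            average = (prefix[j + 1] - prefix[i]) / ((j - i + 1) * cell)
            # suffix_best = best average over intervals [i, j'] with j' >= j;
            # every such interval contains cell j.
            suffix_best = max(suffix_best, average)
            best[j] = max(best[j], suffix_best)
    return best, prefix[-1]

random.seed(0)                          # fixed seed so the run is reproducible
cell = 0.01
values = [random.uniform(-1.0, 1.0) for _ in range(200)]
values[100] = 50.0                      # a tall spike, to make the level sets interesting

maximal, l1_norm = grid_maximal_function(values, cell)
alphas = [0.25, 0.5, 1.0, 2.0, 5.0, 10.0]
level_set_measures = [cell * sum(1 for m in maximal if m > a) for a in alphas]
bounds = [5 * l1_norm / a for a in alphas]   # the weak-type bound with C = 5
```

Comparing `level_set_measures` with `bounds` entry by entry shows the distribution of {M(f)} sitting under the {1/\alpha} envelope.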

A Proof of the Covering Lemma. This proof requires a construction, and the construction will proceed by induction. We adopt the notation {\widetilde{Q}} for the cube that has the same center as {Q} but {5} times the side length. Let {M_0} be the supremum of the side lengths of the {Q_x}. Choose one of the {Q_x} with length {>{1\over2} M_0} and call it {Q_1}. Discard from the collection of cubes all {Q_x} such that {Q_x \cap Q_1 \neq \emptyset}. Let {M_1} be the supremum of the lengths of the remaining cubes. Choose one of the remaining cubes whose length is greater than {{1\over2}M_1} and call it {Q_2}, and we then discard from the collection of cubes all {Q_x} such that {Q_x \cap Q_2 \neq \emptyset}. We continue in this fashion. It is possible that at some point we will have discarded all remaining cubes from the collection. If that happens, we have a finite subcollection. Else, we have a sequence {Q_k}, where {\text{length}(Q_k) > {1\over2} M_{k-1}}. If {\sum_{k=1}^\infty |Q_k| = \infty}, then the conclusion of the lemma is trivially true, whereas if {\sum_{k=1}^\infty |Q_k| < \infty} then {|Q_k| \rightarrow 0}, and since {M_{k-1} < 2\,\text{length}(Q_k)}, also {M_k \rightarrow 0}. Assume either that {M_k \rightarrow 0} or that the subcollection is finite. We claim that {\bigcup_{k=1}^\infty \widetilde{Q}_k} covers {E} and hence

\displaystyle |E| \le \left|\bigcup_{k=1}^\infty \widetilde{Q}_k\right| \le \sum_{k=1}^\infty \left|\widetilde{Q}_k\right| \le 5^n \sum_{k=1}^\infty |Q_k|,

which is what we want.

We prove this claim by noting that every {x} in {E} is contained in its own {Q_x}. This cube either belongs to our subcollection or it is a cube that we discarded along the way. It cannot survive every stage: either the process terminated, or {M_k \rightarrow 0} and a surviving cube would need side length at most every {M_k}. If we included it, we are fine. If we discarded it, then there is a first {Q_k} such that {Q_k \cap Q_x \neq \emptyset}. Furthermore, because {Q_x} was still in the collection when {Q_k} was picked, its side length is at most {M_{k-1}}, which is less than twice the side length of {Q_k}. Drawing appropriate pictures should convince one that {Q_x \subset \widetilde{Q}_k}; hence, every {x} lies in some {\widetilde{Q}_k}.
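
The selection procedure is easy to try out on a finite family of intervals in {\mathbb{R}}. The sketch below makes some simplifying assumptions that are not in the proof above: the family is finite, so we can always pick the longest remaining interval (which certainly has length greater than half the supremum), and the intervals are closed and generated at random. The function names are invented for this example.

```python
import random

def select_disjoint(intervals):
    """Greedy selection: take the longest remaining interval, then discard
    every interval that meets it. Intervals are closed pairs (a, b)."""
    remaining = sorted(intervals, key=lambda ab: ab[1] - ab[0], reverse=True)
    chosen = []
    while remaining:
        a, b = remaining[0]             # longest remaining interval
        chosen.append((a, b))
        # Keep only the intervals disjoint from [a, b] (this drops [a, b] too).
        remaining = [(c, d) for (c, d) in remaining if d < a or c > b]
    return chosen

def dilate(interval, factor=5.0):
    """Interval with the same center and `factor` times the length."""
    a, b = interval
    center, half = (a + b) / 2.0, (b - a) / 2.0
    return (center - factor * half, center + factor * half)

def union_length(intervals):
    """Total length of the union of a finite family of intervals."""
    total, current_end = 0.0, float("-inf")
    for a, b in sorted(intervals):
        if b > current_end:
            total += b - max(a, current_end)
            current_end = b
    return total

random.seed(1)                          # fixed seed so the run is reproducible
family = []
for _ in range(200):
    left = random.uniform(0.0, 10.0)
    family.append((left, left + random.uniform(0.01, 1.0)))

chosen = select_disjoint(family)
dilated = [dilate(q) for q in chosen]

# Each interval of the family should sit inside some dilated chosen interval.
covered = all(
    any(lo <= a and b <= hi for (lo, hi) in dilated) for (a, b) in family
)
selected_total = sum(b - a for (a, b) in chosen)
```

Here the set {E} plays the role of the union of the family, so the conclusion of the lemma reads `selected_total >= union_length(family) / 5`.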

Final Cleanup. We claimed that the main enemy was the possibility of the limit not existing. We still have to show that it equals the right thing. We will be sketchier with this part. Assume {f \in L^1}. Let {Q_{k, x}} be the cube centered at {x} with side length {{1\over{k}}}. Note that {\left|Q_{k, x}\right| = k^{-n}}. Define {f_k(x) = k^n \int_{Q_{k, x}} f}. This is the result of a convolution with an approximate identity. By our approximate identity theorem, {f_k \rightarrow f} in {L^1}. There is a theorem that then shows the existence of a subsequence of {f_k} that tends to {f} pointwise almost everywhere. But the theorem we have just proved shows that {f_k(x)} converges pointwise almost everywhere to some limit. Except possibly for a set of measure zero, this must be the limit of the subsequence, namely {f(x)}.