A convex functions inequality

by Stanislav Sýkora, Extra Byte, Via R.Sanzio 22C, Castano Primo, Italy 20022
in Stan's Library, Ed.S.Sykora, Vol.III. First release September 15, 2009
Permalink via DOI:  10.3247/SL3Math09.003
Other math articles in Stan's Library | Math Links Math Books | Stan's HUB


Convex real functions of real variable satisfy many inequalities. The one proved here might be novel, though it is difficult to be sure. In any case, a proof is given and it is hoped that, even if the inequality itself were not new, the proof might contain some innovative aspects. The inequality can be expressed verbally as follows:

Let f(x) be a finite convex function defined on some real numbers interval X. Divide the interval evenly into K subintervals by means of K+1 equidistant points (including the interval edges) and take the arithmetic average A(K;f) of the f(x) values at these points. Then the values A(K;f), K = 1,2,..., form a monotonous, non-increasing sequence.


1. Introduction

In the realm of real functions of real variable, convex functions constitute a conspicuous body both because they are frequently encountered in practical applications, and because they satisfy a number of useful inequalities and theorems. The most important of the inequalities is of course the defining one which states that a real function f(x) defined on a real-numbers interval X = [a,b] is convex if, for any three elements x1, x, x2 of X such that x1 ≤ x ≤ x2,

(1)   f(x) ≤ f(x1)*[(x2 - x)/(x2-x1)] + f(x2)*[(x - x1)/(x2-x1)]

Graphically, as shown in Figure 1A, this means that the point {x,f(x)} never falls above the straight line segment connecting the points {x1,f(x1)} and {x2,f(x2)}.

The definition (1) can be easily extended to any kind of real-numbers interval, though in this article we will be only concerned with the closed and finite ones. Convex functions satisfy a number of theorems, the most important of which are:

Theorem 1) f(x) is continuous at all inner points of X.
Theorem 2) f(x) has in X at most one local minimum which, if it exists, is also its absolute minimum in X.
Theorem 3) When f(x) has a second derivative at a point x of X, its value is non-negative.

Other theorems can be found in references [1] and scattered over a vast literature. Notice that while convexity implies continuity, it does not imply the existence of first or higher derivatives, nor does it imply monotonicity (see the examples in Figure 1B).

Convex functions
Figure 1. Illustration of basic concepts

A) Definition of a convex function. B) Examples of convex functions.

2. The theorem

Let X = [a,b] be a finite, closed real numbers interval and let f(x) be a finite-valued real function defined everywhere on X.
Divide the interval [a,b] into K equally wide sub-interval by means of K+1 points {xk, k = 0,1,...,K},
where x0 = a, xK = b, and xk - xk-1 = (b-a)/K. Define further the quantity

(2)   A(K;f) = Σk=0:Kf(xk)/(K+1).

The theorem we want to prove can be now stated as follows:

When f(x) is convex then, for any K ≥ 1,
(3)   A(K+1;f) ≤ A(K;f).

Before facing the proof, let us spend a few words about the meaning of inequality (3):
A(K;f) is simply the arithmetic average of the values {f(xk)}. Under the specified conditions, the integral J of f(x) over X exists and, for large K, can be approximated by (b-a)A(K;f) and the sequence A(K;f) converges to J/(b-a). The theorem implies that when f(x) is convex, the sequence A(K;f) not only converges to a limit, but it does so in a monotonous way.

3. The proof

First, let g(x) = f(xc-(x-xc)), with xc = (a+b)/2 being the central point of the interval X. Then g(x) is a horizontal reflection of the function f(x) about xc and s(x) = [f(x) + g(x)]/2 is symmetric about xc (see Figure 2A for an illustration).

It is evident that the symmetrized function s(x) satisfies the pre-requisites of the theorem and that

(4)   A(K;f) = A(K;s).

The latter identity follows from the fact that the points {f(xk)} are also distributed symmetrically about xc.
Consequently, for any k, f(xk)+f(xK-k) = f(xk)+g(xk) = 2.s(xk) = s(xk) + s(xK-k).

This entitles us, without any loss of generality, to limit the discussion to convex functions s(x) symmetric with respect to the center xc of X. In other words, if inequality (3) holds for any finite, symmetric convex function s(x) than it holds also for any finite and convex function f(x), regardless of whether it is symmetric or not. Notice that all convex functions which are symmetric on X necessarily have at xc a unique local minimum which is also their global minimum in X.

Figure 2. Illustrations for the proof for K = 4 (see the text for details)

Consider now the following sets of ordinates (for an illustrative example, see Fig.2B):
(5a)   {xi, i = 0,1,...,K}, where xi = a + i(b-a)/K, appearing in A(K;s) and
(5b)   {x'j, j = 0,1,...,K+1}, where x'j = a + j(b-a)/(K+1), appearing in A(K+1;s).

For any K, the two sets have exactly two elements in common, namely x0 = x'0 = a and xK = x'K+1 = b. The nonius-like behaviour associated with the fact that we consider division of X into K and K+1 intervals (rather than some generic K and K') is essential in this step and makes the inner points of the two grids intercalate in a regular way. The union of the two sets therefore contains exactly 2K+1 distinct ordinates
(6)   {zi, i = 0,1,...,2K} such that for even i, zi = xi/2 and for odd i, zi = x'(i+1)/2.
Notice that z0 = a, zK = xc, and z2K = b.

Consider now the symmetric convex polygon P(x) formed by the points [zi,f(zi)] as shown in Figure 2B. Clearly

(7)   A(K;s) = A(K;P) and A(K+1;s) = A(K+1;P).

This, together with Eq.(4), indicates that if we prove the validity of inequality (3) for the symmetric convex polygon P(x), its validity for any convex function f(x) will be proved.

This reduction process can be extended further exploiting the nearly obvious fact (elementary algebra is sufficient to verify it) that when f(x) and g(x) both satisfy the assumptions of the theorem and v, w are any non-negative real numbers, then the combination v.f(x)+w.g(x) also satisfies the requisites (in particular, it is finite and convex) and A(K,v.f+w.g) = v.A(K,f) + w.A(K,g). We will refer to this property as the superposition Lemma.

In particular, the polygon P0(x) = P(x) can be expressed as a sum of K+1 special symmetric convex polygons pk(x), such as those shown in Figure 2C. These are constructed as follows:

  • Set p0(x) = P0(zK) and P1(x) = P0(x) - p0(x). The constant function p0(x) is the first of our special polygons (black).
    Since P0(zK) is the minimum of P0(x) in X, P1(x) is a symmetric convex polygon whose value at the central point is zero.
  • Now define the second of our special polygons (brown), p1(x) as being composed of two linear segments, one of which passes through the points [zK-1,P1(zK-1)] and [zK,0] and the other one is positioned symmetrically. Due to the convexity of P1(x), the difference P2(x) = P1(x) - p1(x) is a symmetric convex polygon which is zero at the points zK-1, zK and zK+1.
  • The subsequent special symmetric convex polygons pk(x), k = 2 (red), 3 (blue), ..., are all composed of three linear segments: two symmetrically placed sloping ones and a central section where their value is zero. More specifically, pk(x) is zero at points zK-k+1 through zK+k-1 and its left linear segment passes through the points [zK-k,Pk(zK-k)] and [zK-k+1,0]. The residual polygons Pk(x) are defined by the recursive formula Pk+1(x) = Pk(x) - pk(x) and are all symmetric and convex.
  • The reduction process clearly terminates with pK(x) because the residual polygon PK+1(x) is everywhere zero.
    Consequently, we have P(x) = Σk=0:K pk(x).

According to the superposition Lemma, it is now sufficient to prove that each of the special polygons pk(x) satisfies inequality (3). Since this is obvious for k = 0 (any mean of a constant is always equal to the constant itself), we need to show that it holds for any k ≥ 1. One can carry the reduction one step further by taking any symmetric special convex polygon pk(x) such as the one in Figure 2C, middle, and splitting it into two components, one of which is the function qk(x) shown in Figure 2C, bottom, and the other is its symmetric reflection. In virtue of the superposition, if inequality (3) holds for qk(x), it holds also for its symmetric reflection, and therefore also for pk(x) which is the sum of the two, multiplied by some non-negative constant.

The proof of the validity of inequality (3) for functions of the form of qk(x) will be carried out explicitely. We have

   q(x) = (u-x)/(u-a) = 1-(x-a)/(u-a) for x ≤ u and q(x) = 0 for x ≥ u, where u coincides with one of the points zk, k = 1, 2, ..., K.

To evaluate A(K;q) and A(K+1;q) we will consider two cases:

I) Let k = 2m be even with 1 ≤ m ≤ K/2. In this case u = zk = xm and we have



In this case, therefore, A(K+1;q) = A(K;q), which is compatible with (3).

II) Now let k = 2m-1 be odd, again with 1 ≤ m ≤ (K+1)/2. Then u = zk = x'm and we obtain



Since m < K+1, we see that in this case A(K+1;q) < A(K;q) and inequality (3) is again satisfied.

This completes the proof.

4. Final remarks

The theorem proved above may appear intuitive and therefore little exciting, and yet its proof is relatively complex, showing that inequality (3) is cutting it very close in some cases. The following considerations illustrate this point and, in a way, explain why the inequality is not trivial.

Figure 3. Definite integral approximations (see text for explanations)

Consider a bounded convex function f(x) over an interval X = [a,b] such as the one shown in Figure 3A. If the interval is divided evenly into K parts then the integral I(f) of f(x) over X can be approximated by the area I(K;f)) of the K trapezoids shown in yellow. Elementary geometry gives

(10)   I(K;f) = (d/K)[Σi=0:Kf(xi) - (f(x0)+f(xK))/2], where d = b-a and d/K is the width of the trapezoids.

Which, of course, is just one of the Simpson numerical integration rules.

Notice that, while the values I(K;f) are all greater than their limit I(f), their sequence is NOT necessarily monotonous. An example is shown in Figure 3B; for the v-shaped convex polygonal function v(x) we have I(K;v) = I(f) for any even K, but I(K;v) > I(f) for any odd K. In particular, I(3,v) > I(2,v). According to the theorem, however, the arithmetic averages A(K;f), as well as A(K;v), are guaranteed to form a monotonous, non-increasing sequence.

Substituting Σi=0:Kf(xi) with (K+1)A(K;f), one obtains

(11)   I(K;f) = d (1+1/K) A(K;f) - I(1,f)/K, where I(1,f) = d (f(x0)+f(xK))/2 embodies the interval edge effect.

For positive convex functions f(x) the non-monotonous behaviour of I(K;f), when present, is therefore due essentially to the edge effects. The first term is in fact a product of two decreasing factors (1+1/K) and A(K;f), while the edge-effects term -I(1,f)/K is increasing and may sometimes prevail. When f(x) is convex but negative, the situation is a bit more complex since A(K;f) is negative and decreasing (which makes it increasing in absolute value) while (1+1/K) is a decreasing positive factor. This makes the behavior of their product somewhat unpredictable.

In any case, combining equation (11) with inequality (3), one obtains an interesting general inequality for I(K;f) which, among other things, places a bound upon the I(K;f) non-monotonicity:

(12)   I(K+1;f) - I(K;f) ≤ [I(1,f) - I(K;f)]/(K+1)2

Notice that the quantity [I(1,f) - I(K;f)] is graphically represented by the light blue area in Figure 3A and converges rapidly to the area enclosed between the straight line connecting the end points of f(x) and the function f(x) itself. The v-shaped polygon of Figure 3B constitutes an example where, for even values of K, the equality in (12) is actually attained.

Let us now return to inequality (3) and see some of its properties and implications. First of all, it is clear that it implies

(13)   A(K;f) ≤ A(K';f) whenever K' < K.

Another trivial extension regards concave functions. When f(x) is concave then -f(x) is convex and must satisfy inequalities (3) and (13). This implies that for concave functions the ≤ relation sign is simply replaced by the ≥ sign.

The question now is when does the inequality reduce to equality. One obvious case is when f(x) is constant since then all the f(xi) are the same and all their arithmetic averages equal the common value. From the proof, however, we have seen that for some specially constructed convex polygons p(x) and some particular K values the equality A(K+1;p) = A(K;p) can arise. For K > 2 there exist more than one such polygon and therefore the equality occurs also for any of their linear combinations. On the other hand, a closer inspection reveals that in such cases the equality happens only for some values of K, never for every K. The situation can be summed up as follows:

When f(x) is a constant, inequality (3) becomes equality. Otherwise, the inequality is always sharp, unless f(x) is a convex polygon, in which case it might in some special cases become an equality for special values of K (in this article, no detailed analysis of the special cases is carried out).

Another question is whether the implication of the theorem could be inverted, i.e., whether the validity of (3) for any K implicates that f(x) is convex. This seems highly unlikely. One can imagine a function f(x) which is convex everywhere in the interval X except in a narrow sub-interval C. Then, as far as inequality (3) is concerned, the regions where f(x) is convex could always outweigh the opposite behavior in the interval C.

5. Extensions

As a final exercise, let us try and extend inequality (3) to means other than the arithmetic one [reference 2]. Since a positive convex function f(x) remains convex when elevated to a power p ≥ 1, inequality (3) can be applied to fp(x). Both sides being positive, we can elevate them to the power 1/p without violating the inequality. Consequently, given a finite interval X = [a,b], a function f(x) which is in X positive and convex, and any real p ≥ 1, we have Hp(K+1;f) ≤ Hp(K;f), where Hp(K;f) denotes the Hölder mean with parameter p of the values {f(xi); i = 0,1,2,...,K; xi = a+i(b-a)/K}.

For positive convex f(x) and p ≤ -1, the function fp(x) is concave. Applying to it the concave version of the theorem and elevating both sides to 1/p (attention: the exponent being negative, this operation reverses the inequality!) we obtain the same result. Hence, combining the two insights, we conclude that

(14)   for positive convex f(x) and any real p such that |p| ≥ 1, we have Hp(K+1;f) ≤ Hp(K;f).

The most important special cases of (14) are the quadratic mean, obtained for p = 2, and the harmonic mean (p = -1).

An altogether analogous reasoning can be carried out for positive concave functions and Hölder means with -1 ≤ p ≤ 1.
This leads to the realization that

(15)   for positive concave f(x) and any real p such that |p| ≤ 1, we have Hp(K+1;f) ≥ Hp(K;f).

Since limit transitions do not reverse inequalities, this applies also to the geometric mean, obtained for p = 0.

The generalized inequalities (14) and (15) also stress the special standing of the arithmetic and harmonic means which are the only ones to which both of them are applicable.

References and Links


TOP | Other math articles in Stan's Library | Math Links Math Books | Stan's HUB |
Copyright ©2009 Stanislav Sykora    DOI: 10.3247/SL3Math09.003 Designed by Stan Sýkora