Optimization problem

All optimization problems can be written as:

Optimization Categories

  1. convex v.s. non-convex Deep Neural Network is non-convex

  2. continuous v.s.discrete Most are continuous variable; tree structure is discrete

  3. constrained v.s. non-constrained We add prior to make it a constrained problem

  4. smooth v.s.non-smooth Most are smooth optimization

Different initialization brings different optimum (if not convex)

Idea: Give up global optimal and find a good local optimal.

  • Purpose of pre-training: Find a good initialization to start training, and then find a better local optimal.

  • Relaxation: Convert to a convex optimization problem.

  • Brute force: If a problem is small, we can use brute force.

Affine sets

A set CRn is affine if the line through any two distinct points in C lies in C, i.e., if for any x1, x2C and θR, we have θx1+(1θ)x2C.

Note: The line passing throught x1 and x2: y=θx1+(1θ)x2.

Affine combination

We refer to a point of the form θ1x1+θ2x2++θkxk, where θ1+θ2++θk=1 as an affine combination of the points x1,x2,,xk. An affine set contains every affine combination of its points.

Affine hull

The set of all affine combinations of points in some set CRn is called the affine hull of C, and denoted aff,C:

aff,C=θ1x1+θ2x2++θkxk,|x1,x2,,xkC,θ1+θ2++θk=1.

The affine hull is the smallest affine set that contains C, in the following sense: if S is any affine set with CS, then affCS.

Affine dimension: We define the affine dimension of a set C as the dimension of its affine hull.

Convex Sets

A set C is convex if the line segment between any two points in C lies in C, i.e., if for any x1, x2C and any θ with 0θ1, we have θx1+(1θ)x2C.

Roughly speaking, a set is convex if every point in the set can be seen by every other point. Every affine set is also convex, since it contains the entire line between any two distinct points in it, and therefore also the line segment between the points.

Convex combination

We call a point of the form θ1x1+θ2x2++θkxk, where θ1+θ2++θk=1 and θi0,i=1,2,k, a convex combination of the points x1,,xk.

Convex hull

The convex hull of a set C, denoted conv,C, is the set of all convex combinations of points in C:

conv,C=θ1x1+θ2x2++θkxk,|xiC,θi0,i=1,,k,θ1+θ2++θk=1.

The convex hull convC is always convex. It is the smallest convex set that contains C: If B is any convex set that contains C, then convCB.

Cones

A set C is called a cone, or nonnegative homogeneous, if for every xC and θ0 we have θxC. A set C is a convex cone if it is convex and a cone, which means that for any x1,x2C and θ1,θ20, we have

θ1x1+θ2x2C

Hyperplanes and halfspaces

A hyperplane is a set of the form x,|aTx=b,

where aRn,a0, and bR.

This geometric interpretation can be understood by expressing the hyperplane in the form

x,|aT(xx0)=0, where x0 is any point in the hyperplane.

A hyperplane divides Rn into two halfspaces. A (closed) halfspace is a set of the form

x,|aTxb.

where x00. Halfspaces are convex but not affine. The set x|aT<b is called an open halfspace.

Polyhedra

A polyhedron is defined as the solution set of a finite number of linear equalities and inequalities:

P=x,|ajTbj,j=1,,m,cjTx=dj,j=1,,p

A polyhedron is thus the intersection of a finite number of halfspaces and hyperplanes. Here is the compact notations:

P=x,|Axb,Cx=d

Linearly Independent v.s. Affinely Independent

Consider the vectors (1,0), (0,1) and (1,1). These are affinely independent, but not independent. If you remove any one of them, their affine hull has dimension one. In contrast, the span of any two of them is all of R2, and hence these are not independent.

Simplexes

Suppose the k+1 points v0,,vkRn are affinely independent, which means v1v0,,vkv0 are linearly independent. The simplex determined by them is given by

C=convv0,,vk=θ0v0++θkvk,|θ0,1Tθ=1

Note:

  • The affine dimension of this simplex is k.

A 1-dimensional simplex is a line segment; a 2-dimensional simplex is a triangle (including its interior); and a 3-dimensional simplex is a tetrahedron.

What is the key distinction between a convex hull and a simplex?

If the elements of the set on which the convex hull is defined are affinely independent, then the convex hull and the simplex defined on this set are the same. Otherwise, simplex can’t be defined on this set, but convex hull can.

Convex Functions

  • A function f:RnR is convex if dom f is a convex set and if for all x, ydom,f, and θ with 0θ1, we have

f(θx+(1θ)y)θf(x)+(1θ)f(y).

  • We say f is concave is f is convex, and strictly concave if f is strictly convex.

  • A function is convex if and only if it is convex when restricted to any line that intersects its domain. In other words f is convex if and only if for all xdom,f and all v, the function g(t)=f(x+tv) is convex (on its domain, t,|,x+tvdom,f).

First-order conditions

  • Suppose f is differentiable, then f is convex if and only if dom,f is convex and f(y)f(x)+f(x)T(yx) holds for all x,ydom,f
  • For a convex function, the first-order Taylor approximation is in fact a global underestimator of the function. Conversely, if the first-order Taylor approximation of a function is always a global underestimator of the function, then the function is convex.

  • The inequality shows that from local information about a convex function (i.e., its value and derivative at a point) we can derive global information (i.e., a global underestimator of it).

Second-order conditions

  • Suppose that f is twice differentiable. The f is convex if and only if dom,f is convex and its Hessian is positive semidefinite: for all xdomf, 2f(x)0.

  • f is concave if and only if domf is convex and 2f(x)0 for all xdom,f.

  • If 2f(x)0 for all xdom,f, then f is strictly convex. The converse is not true. e.x. f(x)=x4 has zero second derivative at x=0 but is strictly convex.

  • Quadratic functions: Consider the quadratic function f:RnR, with dom,f=Rn, given by f(x)=(1/2)xTPx+qTx+r, with PSn,qRn, and rR. Since 2f(x)=P for all x, f is convex if and only if P0 (and concave if and only if P0).

Examples of Convex and Concave Functions

  • Exponential. eax is convex on R, for any aR.

  • Powers. xa is convex on R++ when a1 or a0, and concave for 0a1.

  • Powers of absolute value. |x|p, for p1, is convex on R.

  • Logarithm. log,x is concave on R++.

  • Negative Entropy. x,log,x (either on $\mathbf{R}{++},oron\mathbf R+,definedas0forx = 0$) is convex.

  • Norms. Every norm on Rn is convex.

  • Max function. f(x)=maxx1,,xn is convex on Rn.

  • Quadratic-over-linear function. The function f(x,y)=x2/y, with dom,f=R×R++=(x,y)R2,|y>0, is convex.

  • Log-sum-exp. The function f(x)=log(ex1+···+exn) is convex on Rn.

  • Geometric mean. The geometric mean f(x)=(i=1nxi)1/n is concave on dom,f=S++n

  • Log-determinant. The function f(X)=log,det,X is concave.


Reference:

  • Convex Optimization* by Stephen Boyd and Lieven Vandenberghe.