Information Geometry: Duality, Convexity, and Divergences


1 Information Geometry: Duality, Convexity, and Divergences
Jun Zhang*, University of Michigan, Ann Arbor, Michigan 48104
*Currently on leave to AFOSR under IPA

2 Lecture Plan
1) A revisit to Bregman divergence
2) Generalization (α-divergence on R^n) and α-Hessian geometry
3) Embedding into infinite-dimensional function space
4) Generalized Fisher metric and α-connection on Banach space
Clarify two senses of duality in information geometry:
Reference duality: choice of the reference vs. comparison point on the manifold;
Representational duality: choice of a monotonic scaling of the density function.

3 Bregman Divergence
i) Quadrilateral relation:
Triangular relation (generalized cosine) as a special case:
ii) Reference-representation biduality:
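The relations named on this slide can be checked numerically. The sketch below assumes the standard definition B_F(x, y) = F(x) - F(y) - <x - y, ∇F(y)> and an illustrative convex function F(x) = Σ exp(x_i). The helper names, the choice of F, and the exact form of the four-point identity are assumptions made here, not taken from the slide.

```python
import numpy as np

def F(x):
    # Illustrative strictly convex function (an assumption, not from the slide).
    return np.sum(np.exp(x))

def grad_F(x):
    return np.exp(x)

def F_star(u):
    # Convex conjugate of F for u > 0: sum(u log u - u).
    return np.sum(u * np.log(u) - u)

def bregman(x, y):
    # B_F(x, y) = F(x) - F(y) - <x - y, grad F(y)>
    return F(x) - F(y) - np.dot(x - y, grad_F(y))

def bregman_star(u, v):
    # Bregman divergence of the conjugate F*, whose gradient is log(v).
    return F_star(u) - F_star(v) - np.dot(u - v, np.log(v))

def four_point_gap(x, w, y, z):
    # Quadrilateral relation (assumed form):
    # B(x,y) + B(w,z) - B(x,z) - B(w,y) = <x - w, grad F(z) - grad F(y)>
    lhs = bregman(x, y) + bregman(w, z) - bregman(x, z) - bregman(w, y)
    rhs = np.dot(x - w, grad_F(z) - grad_F(y))
    return lhs - rhs

rng = np.random.default_rng(0)
x, w, y, z = rng.normal(size=(4, 3))

quadrilateral_gap = four_point_gap(x, w, y, z)
# Setting w = y gives the triangular (generalized cosine) relation.
triangular_gap = four_point_gap(x, y, y, z)
# Biduality: swapping the arguments and passing to conjugate coordinates.
biduality_gap = bregman(x, y) - bregman_star(grad_F(y), grad_F(x))
```

All three gaps should vanish up to floating-point error, which is one way to see that the triangular relation is the w = y case of the quadrilateral one.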

4 Canonical Divergence and Fenchel Inequality
An alternative expression of the Bregman divergence is the canonical divergence
or explicitly:
That A is non-negative is a direct consequence of the Fenchel inequality for a strictly convex function:
where equality holds if and only if
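For reference, the standard forms of these statements can be written as follows. This is a sketch assuming the usual convex-conjugate definitions; the symbols x, v, F^* and B_F are chosen here for illustration and may differ from the slide's notation.

```latex
\begin{align}
F^*(v) &= \sup_{x}\,\bigl\{\langle x, v\rangle - F(x)\bigr\}, \\
A_F(x, v) &= F(x) + F^*(v) - \langle x, v\rangle \;\ge\; 0, \\
A_F(x, v) &= 0 \iff v = \nabla F(x), \\
A_F\bigl(x, \nabla F(y)\bigr) &= F(x) - F(y) - \langle x - y, \nabla F(y)\rangle = B_F(x, y).
\end{align}
```

The last line uses F^*(\nabla F(y)) = \langle y, \nabla F(y)\rangle - F(y), which is the equality case of the Fenchel inequality.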

5 Convex Inequality and α-Divergence Induced by It
By the definition of a strictly convex function F,
It is easy to show that the following is non-negative for all α:
Conjugate-symmetry:
Easily verifiable:
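A numerical sketch of a divergence induced by the convex inequality follows. It assumes the parametrization D^(α)(x, y) = 4/(1-α²) · [ (1-α)/2 · F(x) + (1+α)/2 · F(y) - F((1-α)/2 · x + (1+α)/2 · y) ], which is the form used in Zhang (2004); the slide's own formula is not reproduced in the transcript, so this form, the test function F, and the helper names are assumptions.

```python
import numpy as np

def F(x):
    # Illustrative strictly convex function (an assumption).
    return np.sum(np.exp(x))

def grad_F(x):
    return np.exp(x)

def bregman(x, y):
    return F(x) - F(y) - np.dot(x - y, grad_F(y))

def alpha_divergence(x, y, alpha):
    # Assumed form of the convex-inequality-induced divergence, alpha != +/-1.
    a = (1.0 - alpha) / 2.0
    b = (1.0 + alpha) / 2.0
    gap = a * F(x) + b * F(y) - F(a * x + b * y)
    return 4.0 / (1.0 - alpha**2) * gap

x = np.array([0.3, -0.5, 1.0])
y = np.array([-0.2, 0.4, 0.1])

inside = alpha_divergence(x, y, 0.3)     # |alpha| < 1
outside = alpha_divergence(x, y, 2.5)    # |alpha| > 1: both factors change sign
symmetry_gap = alpha_divergence(x, y, 0.3) - alpha_divergence(y, x, -0.3)
near_bregman = alpha_divergence(x, y, 1.0 - 1e-6)  # approaches B_F(x, y)
```

Under this parametrization, swapping the arguments while negating α leaves the value unchanged, and the limits α → ±1 recover the Bregman divergence with its arguments in one order or the other.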

6 Significance of Bregman Divergence Among the α-Divergence Family
Proposition: For a smooth function F: R^n -> R, the following are equivalent:

7 Statistical Manifold Structure Induced From Divergence Function (Eguchi, 1983)
Given a divergence D(x,y) with D(x,x)=0, one can derive a Riemannian metric and a pair of conjugate connections by expanding D(x,y) around x=y:
i) 2nd order: one (and the same) metric;
ii) 3rd order: a pair of conjugate connections.
In essence, the conjugacy condition between the two connections is satisfied by such identification of derivatives of D.
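The identification of derivatives referred to here is usually written as below. This is a sketch of the standard Eguchi relations under one common sign and index convention, which may differ from the convention on the slide.

```latex
\begin{align}
g_{ij}(x) &= -\left.\frac{\partial^2 D(x,y)}{\partial x^i\,\partial y^j}\right|_{y=x}, \\
\Gamma_{ij,k}(x) &= -\left.\frac{\partial^3 D(x,y)}{\partial x^i\,\partial x^j\,\partial y^k}\right|_{y=x}, \qquad
\Gamma^{*}_{ij,k}(x) = -\left.\frac{\partial^3 D(x,y)}{\partial y^i\,\partial y^j\,\partial x^k}\right|_{y=x}, \\
\partial_k\, g_{ij} &= \Gamma_{ki,j} + \Gamma^{*}_{kj,i}.
\end{align}
```

The last line follows by differentiating the first along the diagonal y = x: the derivative falls once on the x-slot and once on the y-slot, producing one term of each connection.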

8 α-Hessian Geometry (of Finite-Dimensional Vector Space)
Theorem. D^(α) induces the α-Hessian manifold, i.e.
i) The metric and conjugate affine connections are given by:
ii) Riemann curvature is given by:

9 iii) The manifold is equi-affine, with the Tchebychev potential given by:
and the α-parallel volume form given by
iv) There exist biorthogonal coordinates:
with

10 From Vector Space to Function Space
Question: How to extend the above analysis to infinite-dimensional function space?
A General Divergence Function(al):
for any two functions in some function space, and an arbitrary, strictly increasing function.
Remark: Induced by convex inequality

11 A Special Case of D^(α): Classic α-Divergence
For parameterized pdf’s, such divergence induces an α-independent metric, but α-dependent dual connections:
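A small numerical sketch of the classic α-divergence on a finite sample space follows. It assumes Amari's form D^(α)(p, q) = 4/(1-α²) · (1 - Σ p^((1-α)/2) q^((1+α)/2)) for normalized p and q; which sign of α corresponds to KL(p‖q) varies between authors, so the convention below is an assumption, as are the function names and the example distributions.

```python
import numpy as np

def classic_alpha_divergence(p, q, alpha):
    # Assumed Amari-style form for normalized discrete p, q and alpha != +/-1.
    a = (1.0 - alpha) / 2.0
    b = (1.0 + alpha) / 2.0
    return 4.0 / (1.0 - alpha**2) * (1.0 - np.sum(p**a * q**b))

def kl(p, q):
    return np.sum(p * np.log(p / q))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

d_half = classic_alpha_divergence(p, q, 0.5)
symmetry_gap = d_half - classic_alpha_divergence(q, p, -0.5)
# alpha = 0 is symmetric and equals 2 * sum (sqrt(p) - sqrt(q))^2.
d_zero = classic_alpha_divergence(p, q, 0.0)
hellinger_like = 2.0 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)
# With this convention, alpha -> -1 approaches KL(p || q).
near_kl = classic_alpha_divergence(p, q, -1.0 + 1e-6)
```

The α = 0 member is the only symmetric one in this family; the end points α → ±1 give the two Kullback-Leibler divergences.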

12 Other Examples of D^(α)
Jensen Difference
U-Divergence (α=1)

13 A Short Detour: Monotone Scaling
Define the monotone embedding (“scaling”) of a measurable function p as the transformation ρ(p), where ρ is a strictly monotone function.
Observe:
i) ρ is strictly monotone iff ρ^{-1} is strictly monotone;
ii) ρ(t) = t is the identity element;
iii) if ρ1 and ρ2 are strictly monotone, so is their composition.
Therefore, monotone embeddings of a given probability density function form a group, with functional composition as the group operation.
We recall that for a strictly convex function f:

14 DEFINITION: The ρ-embedding is said to be conjugate to the τ-embedding with respect to a strictly convex function f (whose conjugate is f*) if:
Example: α-embedding
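The conjugacy condition can be illustrated numerically. The sketch below assumes the condition reads τ(t) = f'(ρ(t)), and takes the α-embedding to be l^(α)(t) = 2/(1-α) · t^((1-α)/2) (with log t at α = 1), paired as ρ = l^(α), τ = l^(-α) with f(s) = 2/(1+α) · ((1-α)/2 · s)^(2/(1-α)). These forms follow Zhang (2004) as recalled here and are assumptions; the slide's own formulas are not in the transcript.

```python
import numpy as np

def alpha_embedding(t, alpha):
    # Assumed alpha-embedding l^(alpha)(t) for t > 0.
    if alpha == 1:
        return np.log(t)
    return 2.0 / (1.0 - alpha) * t ** ((1.0 - alpha) / 2.0)

def f(s, alpha):
    # Assumed strictly convex function for |alpha| < 1 and s > 0.
    return 2.0 / (1.0 + alpha) * ((1.0 - alpha) / 2.0 * s) ** (2.0 / (1.0 - alpha))

def f_prime(s, alpha):
    # Analytic derivative of f with respect to s.
    return 2.0 / (1.0 + alpha) * ((1.0 - alpha) / 2.0 * s) ** ((1.0 + alpha) / (1.0 - alpha))

alpha = 0.4
p = np.array([0.1, 0.5, 2.0])        # sample values of a positive function

rho = alpha_embedding(p, alpha)      # rho-representation
tau = alpha_embedding(p, -alpha)     # candidate conjugate representation
conjugacy_gap = f_prime(rho, alpha) - tau
```

If the assumed forms are right, `conjugacy_gap` is zero up to rounding, so the (-α)-embedding is the representation conjugate to the α-embedding with respect to this f.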

15 Parameterized Functions as Forming a Submanifold under Monotone Scaling
A submanifold is said to be ρ-affine if there exists a countable set of linearly independent functions λ_i(z) over a measurable space such that:
Here, θ is called the “natural parameter”.
The “expectation parameter” is defined by projecting the conjugated τ-embedding onto the λ_i(z):
Example: For the log-linear model (exponential family)
The expectation parameter is:
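For the log-linear example, the natural and expectation parameters can be computed directly on a finite sample space. The sketch below assumes ρ = log and τ = identity, so that log p(z) = Σ θ^i λ_i(z) - F(θ) and η_i = Σ_z p(z) λ_i(z). The sample space, the statistics `lam`, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical setup: 4 sample points, 2 statistics; lam[i, z] = lambda_i(z).
lam = np.array([[0.0, 1.0, 2.0, 3.0],
                [1.0, 0.0, 1.0, 0.0]])

def log_partition(theta):
    # F(theta) = log sum_z exp(sum_i theta^i lambda_i(z))
    return np.log(np.sum(np.exp(theta @ lam)))

def density(theta):
    # Log-linear model: log p is affine in theta up to the normalizer F(theta).
    return np.exp(theta @ lam - log_partition(theta))

def expectation_parameter(theta):
    # eta_i: projection of tau(p) = p onto lambda_i, i.e. the mean of lambda_i.
    return lam @ density(theta)

theta = np.array([0.3, -0.7])        # natural parameter
eta = expectation_parameter(theta)   # expectation parameter
```

In this case the expectation parameter coincides with the gradient of the log-partition function, which is the finite-dimensional instance of the duality between θ and η.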

16 Proposition. For the ρ-affine submanifold:
i) The following potential function is strictly convex:
F(θ) is called the generating (partition) functional.
ii) Define, under the conjugate representations,
then F*(η) is the Fenchel conjugate of F(θ). F*(η) is called the generalized entropy functional.
Theorem. The ρ-affine submanifold is an α-Hessian manifold.
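In the familiar log-linear case the two potentials can be written out explicitly. The following is a sketch assuming ρ = log, τ the identity, and a base measure μ; it illustrates the proposition for the exponential family only and is not the general ρ-affine formula.

```latex
\begin{align}
\log p(z;\theta) &= \sum_i \theta^i \lambda_i(z) - F(\theta), &
F(\theta) &= \log \int \exp\Bigl(\sum_i \theta^i \lambda_i(z)\Bigr)\, d\mu(z), \\
\eta_i &= \frac{\partial F}{\partial \theta^i} = \int p(z;\theta)\,\lambda_i(z)\, d\mu(z), &
F^*(\eta) &= \sum_i \theta^i \eta_i - F(\theta) = \int p \log p \, d\mu .
\end{align}
```

Here F is the log-partition function and F^* is the negative Shannon entropy, which motivates the names "generating (partition) functional" and "generalized entropy functional".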

17 An Application: the (α,β)-Divergence
α: parameter reflecting reference duality
β: parameter reflecting representation duality
Take f=ρ-(β), where:
called “alpha-embedding”, now denoted by β.
They reduce to the α-divergence proper A(α) and to the Jensen difference E(α):

18 Information Geometry on Banach Space
Proposition 1. Consider tangent vector fields which are, at a given p on the manifold, themselves functions in the Banach space. The metric and dual connections induced by the divergence functional take the forms:
Written in dually symmetric form:

19 Corollary 1a. For a finite-dimensional submanifold (parametric model), the metric and dual connections associated with the divergence functional are given by:
with
Remark: A particular choice reduces these to the forms of the Fisher metric and the α-connections in classical parametric information geometry, where

20 Proposition 2. The curvature R^(α) and torsion tensors T^(α) associated with any α-connection on the infinite-dimensional function space B are identically zero.
Remark: The ambient space B is flat, so it embeds, as proper submanifolds,
the manifold M_μ of probability density functions (constrained to be positive-valued and normalized to unit measure);
the finite-dimensional manifold M_θ of parameterized probability models.
[Diagram: B (ambient manifold), M_μ, M_θ]
CAVEAT: Topology? (G. Pistone and his colleagues)

21 Proposition 3. The (α,β)-divergence for the parametric models gives rise to the Fisher metric proper and alpha-connections proper:
Remark: The (α,β)-divergence is the homogeneous f-divergence. As such, it should reproduce the standard Fisher metric and the dual alpha-connections in their proper form. Again, it is the αβ that takes the role of the conventional “alpha” parameter.
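How a divergence reproduces the Fisher metric can be illustrated numerically. Since the (α,β)-divergence formula is not preserved in the transcript, the sketch below swaps in the classic α-divergence (assumed Amari form) on a Bernoulli model and recovers the metric as minus the mixed second derivative of the divergence on the diagonal, estimated by finite differences. The model, step size, and function names are assumptions.

```python
import numpy as np

def bernoulli(theta):
    # Two-point distribution with success probability theta.
    return np.array([1.0 - theta, theta])

def classic_alpha_divergence(p, q, alpha):
    # Assumed Amari-style form for normalized discrete p, q and alpha != +/-1.
    a = (1.0 - alpha) / 2.0
    b = (1.0 + alpha) / 2.0
    return 4.0 / (1.0 - alpha**2) * (1.0 - np.sum(p**a * q**b))

def induced_metric(theta, alpha, h=1e-4):
    # g(theta) = -d^2 D(s, t) / (ds dt) at s = t = theta, by central differences.
    def D(s, t):
        return classic_alpha_divergence(bernoulli(s), bernoulli(t), alpha)
    mixed = (D(theta + h, theta + h) - D(theta + h, theta - h)
             - D(theta - h, theta + h) + D(theta - h, theta - h)) / (4.0 * h**2)
    return -mixed

theta = 0.3
fisher = 1.0 / (theta * (1.0 - theta))   # Fisher information of the Bernoulli model
metrics = {alpha: induced_metric(theta, alpha) for alpha in (-0.5, 0.0, 0.5)}
```

Each entry of `metrics` should agree with `fisher` up to discretization error, regardless of α, which is the sense in which the induced metric is α-independent while the connections are not.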

22 Summary of Current Approach
Divergence:
α-divergence: equivalent to δ-divergence (Zhu & Rohwer, 1995); includes KL divergence as a special case
f-divergence (Csiszár)
Bregman divergence: equivalent to the canonical divergence
U-divergence (Eguchi)
Geometry:
Riemannian metric: Fisher information
Conjugate connections: α-connection family
Equi-affine structure: cubic form, Tchebychev 1-form
Curvature
Convex-based α-divergence for: vector space of finite dimension; function space of infinite dimension
Generalized expressions of: Fisher metric; α-connections

23 References
Zhang, J. (2004). Divergence function, duality, and convex analysis. Neural Computation, 16.
Zhang, J. (2005). Referential duality and representational duality in the scaling of multidimensional and infinite-dimensional stimulus space. In Dzhafarov, E. and Colonius, H. (Eds.), Measurement and representation of sensations: Recent progress in psychological theory. Lawrence Erlbaum Associates, Mahwah, NJ.
Zhang, J. and Hästö, P. (2006). Statistical manifold as an affine space: A functional equation approach. Journal of Mathematical Psychology, 50.
Zhang, J. (2006). Referential duality and representational duality on statistical manifolds. Proceedings of the Second International Symposium on Information Geometry and Its Applications, Tokyo (pp. 58-67).
Zhang, J. (2007). A note on curvature of α-connections of a statistical manifold. Annals of the Institute of Statistical Mathematics, 59.
Zhang, J. and Matsuzoe, H. (in press). Dualistic differential geometry associated with a convex function. To appear in a special volume in the Springer series Advances in Mechanics and Mathematics.
Zhang, J. (under review). Nonparametric information geometry: Referential duality and representational duality on statistical manifolds.

24 Questions?

