On Voronoi Diagrams on the Information-Geometric Cauchy Manifolds

We study the Voronoi diagrams of a finite set of Cauchy distributions and their dual complexes from the viewpoint of information geometry by considering the Fisher-Rao distance, the Kullback-Leibler divergence, the chi square divergence, and a flat divergence derived from Tsallis entropy related to the conformal flattening of the Fisher-Rao geometry. We prove that the Voronoi diagrams of the Fisher-Rao distance, the chi square divergence, and the Kullback-Leibler divergence all coincide with a hyperbolic Voronoi diagram on the corresponding Cauchy location-scale parameters, and that the dual Cauchy hyperbolic Delaunay complexes are Fisher orthogonal to the Cauchy hyperbolic Voronoi diagrams. The dual Voronoi diagrams with respect to the dual flat divergences amount to dual Bregman Voronoi diagrams, and their dual complexes are regular triangulations. The primal Bregman Voronoi diagram is the Euclidean Voronoi diagram and the dual Bregman Voronoi diagram coincides with the Cauchy hyperbolic Voronoi diagram. In addition, we prove that the square root of the Kullback-Leibler divergence between Cauchy distributions yields a metric distance which is Hilbertian for the Cauchy scale families.


Introduction
Let P = {P_1, ..., P_n} be a finite set of points in a space X equipped with a measure of dissimilarity D(·, ·) : X × X → R_+. The Voronoi diagram [1] of P partitions X into elementary Voronoi cells Vor(P_1), ..., Vor(P_n) (also called Dirichlet cells [2]) such that Vor_D(P_i) := {X ∈ X : D(P_i, X) ≤ D(P_j, X), ∀j ∈ {1, ..., n}} denotes the proximity cell of the point generator P_i (also called Voronoi site), i.e., the locus of points X ∈ X closer with respect to D to P_i than to any other generator P_j. When the dissimilarity D is chosen as the Euclidean distance ρ_E, we recover the ordinary Voronoi diagram [1]. The Euclidean distance ρ_E(P, Q) between two points P and Q is defined as ρ_E(P, Q) := ‖p − q‖_2, where p and q denote the Cartesian coordinates of points P and Q, respectively, and ‖·‖_2 the ℓ2-norm. Figure 1 (left) displays the Voronoi cells of an ordinary Voronoi diagram for a given set of generators. The Voronoi diagram and its dual Delaunay complex [3] are fundamental data structures of computational geometry [4]. These core geometric data structures find many applications in robotics, 3D reconstruction, geographic information systems (GISs), etc. See the textbook [1] for some of their applications. When the dissimilarity is oriented or asymmetric, i.e., D(P, Q) ≠ D(Q, P), one can define the reverse or dual dissimilarity D*(P, Q) := D(Q, P). This duality is termed reference duality in [6], and is an involution: (D*)*(P, Q) = D(P, Q).
The dissimilarity D(P : Q) is called the forward dissimilarity.
That is, the dual Voronoi cell Vor * D (P i ) with respect to a dissimilarity D is the primal Voronoi cell Vor D * (P i ) for the dual (reverse) dissimilarity D * .
In general, we can build a Voronoi diagram as a minimization diagram [8] by defining the n functions f i (X):=D(P i : X). Then X ∈ Vor D (P i ) iff f i (X) ≤ f j (X) for all j ∈ {1, . . . , n}. Thus, by building the lower envelope [8] of the n functions f 1 (X), . . . , f n (X), we can retrieve the Voronoi diagram.
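The minimization-diagram view above lends itself to a direct (if naive) nearest-site computation. Below is a minimal sketch, not from the paper; the function names (`voronoi_cell_index`, `euclidean`) are ours for illustration:

```python
import math

# A Voronoi diagram as a minimization diagram: a query point X belongs to
# Vor_D(P_i) iff f_i(X) = D(P_i, X) attains the lower envelope, i.e., is
# minimal among f_1(X), ..., f_n(X).

def voronoi_cell_index(generators, X, D):
    """Return the index i minimizing D(P_i, X) (argmin of the lower envelope)."""
    return min(range(len(generators)), key=lambda i: D(generators[i], X))

def euclidean(p, q):
    # With D = rho_E, we recover the ordinary Euclidean Voronoi diagram.
    return math.dist(p, q)

generators = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]
print(voronoi_cell_index(generators, (0.5, 0.2), euclidean))  # prints 0 (cell of P_1)
```

This brute-force membership test takes O(n) per query; the point of the lower-envelope construction in the text is that the full diagram can be built far more efficiently.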
In this paper, we study the Voronoi diagrams induced by the Fisher-Rao distance [17][18][19], the Kullback-Leibler (KL) divergence [12], and the chi square divergence [20] for the family C of Cauchy distributions. Cauchy distributions are also called Lorentzian distributions in the literature [21,22].
The paper is organized with our main contributions as follows: In Section 2, we concisely review the information geometry of the Cauchy family: We first describe the hyperbolic Fisher-Rao geometry in Section 2.1 and make a connection between the Fisher-Rao distance and the chi square divergence, then we point out the remarkable fact that any α-geometry coincides with the Fisher-Rao geometry (Section 2.2), and we finally present dually flat geometric structures on the Cauchy manifold related to Tsallis' quadratic entropy [23,24] which amount to a conformal flattening of the Fisher-Rao geometry (Section 2.4). Section 3.3 proves that the square root of the KL divergence between any two Cauchy distributions yields a metric distance (Theorem 3), and that this metric distance can be isometrically embedded in a Hilbert space for the case of Cauchy scale families (Theorem 4). Section 4 shows that the Cauchy Voronoi diagrams induced either by the Fisher-Rao distance, the chi-square divergence, or the Kullback-Leibler divergence (and its square root metrization) all coincide with a hyperbolic Voronoi diagram [25] calculated on the Cauchy 2D location-scale parameters. This result yields a practical and efficient construction algorithm of hyperbolic Cauchy Voronoi diagrams [25,26] (Theorem 5) and their dual hyperbolic Cauchy Delaunay complexes (explained in detail in Section 6). We prove that the hyperbolic Cauchy Voronoi diagrams are Fisher orthogonal to the dual Cauchy Delaunay complexes (Theorem 6). In Section 4.2, we show that the primal Voronoi diagram with respect to the flat divergence coincides with the hyperbolic Voronoi diagram, and that the Voronoi diagram with respect to the reverse flat divergence matches the ordinary Euclidean Voronoi diagram. Finally, we conclude this work in Section 5.

Information Geometry of the Cauchy Family
We start by reporting the Fisher-Rao geometry of the Cauchy manifold (Section 2.1), then show that all α-geometries coincide with the Fisher-Rao geometry (Section 2.2). Then we recall that we can associate an information-geometric structure to any parametric divergence (Section 2.3), and finally dually flatten this Fisher-Rao curved geometry using Tsallis's quadratic entropy [23,24] (Section 2.4) and a conformal Fisher metric.

Fisher-Rao Geometry of the Cauchy Manifold
Information geometry [7,10,11] investigates the geometry of families of probability measures. The 2D family C of Cauchy distributions is a location-scale family [27] (and also a univariate elliptical distribution family [28]) with densities p_{l,s}(x) := (1/s) p((x − l)/s), where l ∈ R and s > 0 denote the location parameter and the scale parameter, respectively, and p(x) := 1/(π(1 + x²)) is the standard Cauchy distribution.
Let l_λ(x) := log p_λ(x) denote the log-density. The parameter space H := R × R_+ of the Cauchy family is called the upper plane. The Fisher-Rao geometry [17,19,29] of C consists in modeling C as a Riemannian manifold (C, g_FR) by choosing the Fisher information metric [7] (FIm) as the Riemannian metric tensor, g_FR,ij(λ) := E_λ[∂_i l_λ ∂_j l_λ], where ∂_m := ∂/∂λ^m for m ∈ {1, 2} (i.e., ∂_1 = ∂/∂l and ∂_2 = ∂/∂s). The matrix [g_FR,ij] is called the Fisher Information Matrix (FIM), and is the expression of the FIm tensor in a local coordinate system {e_1, e_2}. The Fisher-Rao distance ρ_FR is then defined as the Riemannian geodesic length distance on the Cauchy manifold (C, g_FR). The Fisher information metric tensor for the Cauchy family [28] is g_FR(λ) = (1/(2s²)) diag(1, 1), where λ = (l, s) ∈ H. A generic formula for the Fisher-Rao distance between two univariate elliptical distributions is reported in [28]; instantiated for the Cauchy distributions, this formula yields a closed-form expression for the Fisher-Rao distance. By noticing that the metric tensor for the Cauchy family (Equation (14)) is equal to the scaled metric tensor g_P of the Poincaré (P) hyperbolic upper plane [30], we get a relationship between the squared infinitesimal lengths (line elements) ds²_FR = (dl² + ds²)/(2s²) and ds²_P = (dx² + dy²)/y², namely ds²_FR = (1/2) ds²_P. It follows that the Fisher-Rao distance between two Cauchy distributions is simply obtained by rescaling the 2D hyperbolic distance expressed in the Poincaré upper plane [30]: ρ_FR[p_{l1,s1}, p_{l2,s2}] = (1/√2) ρ_P(l1, s1; l2, s2), where ρ_P(l1, s1; l2, s2) := arccosh(1 + δ(l1, s1; l2, s2)), with δ(l1, s1; l2, s2) := ((l1 − l2)² + (s1 − s2)²)/(2 s1 s2). This latter term δ shall naturally appear in Section 2.4 when studying the dually flat space obtained by conformally flattening the Fisher-Rao geometry. The expression δ(l1, s1; l2, s2) of Equation (23) can be interpreted as a conformal divergence for the squared Euclidean distance [31][32][33].
We may also write the delta term using the 2D Cartesian coordinates λ = (λ^(1), λ^(2)) as δ(λ_1, λ_2) = ‖λ_1 − λ_2‖²_2/(2 λ_1^(2) λ_2^(2)), where λ ∈ H. In particular, when l_1 = l_2, we get the simplified Fisher-Rao distance for Cauchy scale families: ρ_FR[p_{l,s1}, p_{l,s2}] = (1/√2) |log(s2/s1)|.

Proposition 1. The Fisher-Rao distance between two Cauchy distributions is ρ_FR[p_{l1,s1}, p_{l2,s2}] = (1/√2) arccosh(1 + δ(l1, s1; l2, s2)).

The Fisher-Rao manifold of Cauchy distributions has constant negative scalar curvature κ = −2; see [28] for detailed calculations.
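The closed form of Proposition 1 is easy to check numerically. The following is a minimal sketch (helper names `delta` and `rho_FR` are ours), verifying in particular the scale-family simplification:

```python
import math

# Fisher-Rao distance between Cauchy distributions (Proposition 1):
# rho_FR = (1/sqrt(2)) * arccosh(1 + delta), with
# delta(l1,s1;l2,s2) = ((l1-l2)^2 + (s1-s2)^2) / (2 s1 s2).

def delta(l1, s1, l2, s2):
    return ((l1 - l2) ** 2 + (s1 - s2) ** 2) / (2.0 * s1 * s2)

def rho_FR(l1, s1, l2, s2):
    return math.acosh(1.0 + delta(l1, s1, l2, s2)) / math.sqrt(2.0)

# For a scale family (l1 == l2), the distance reduces to |log(s2/s1)|/sqrt(2):
print(rho_FR(0.0, 1.0, 0.0, 3.0), abs(math.log(3.0)) / math.sqrt(2.0))
```

Both printed values agree, illustrating that arccosh(1 + (s1 − s2)²/(2 s1 s2)) = |log(s2/s1)|.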

Remark 1.
It is well known that the Fisher-Rao geometry of location-scale families amounts to a hyperbolic geometry [27]. For d-variate scale-isotropic Cauchy distributions p_λ(x) with λ = (l, s) ∈ R^d × R_+, the Fisher information metric is g_FR(λ) = (1/(2s²)) I, where I denotes the (d + 1) × (d + 1) identity matrix. It follows that ρ_FR[p_{l1,s1}, p_{l2,s2}] = (1/√2) arccosh(1 + (‖l1 − l2‖²_2 + (s1 − s2)²)/(2 s1 s2)), where ‖·‖_2 is the d-dimensional Euclidean ℓ2-norm: ‖x‖_2 = √(x^⊤x). That is, ρ_FR[p_{l1,s1}, p_{l2,s2}] is the scaled (d + 1)-dimensional real hyperbolic distance [30] expressed in the Poincaré upper space model.
Let us mention that recently the Riemannian geometry of location-scale models was also studied from the complementary viewpoint of warped metrics [34,35].

Remark 2.
Li and Zhao [36] proposed to use the Wasserstein Information metric (WIm), expressed using the distribution parameter coordinates by the Wasserstein Information Matrix (WIM). They reported the explicit formula of the WIM for generic location-scale families. In particular, the WIM of the Gaussian family (a location-scale family) is the identity matrix and yields the Euclidean geometry (see the Wasserstein geometry of Gaussians [37]). Although the WIM can be calculated for the Gaussian location-scale family, notice that the moments of order greater than or equal to one (i.e., E[X] and E[X²]) are not finite for Cauchy distributions. Thus, the WIM is not well-defined for the Cauchy family since Equation (28) makes sense only for finite moments.

The Dualistic α-Geometry of the Statistical Cauchy Manifold
A statistical manifold [38] is a triplet (M, g, T) where g is a Riemannian metric tensor and T is a cubic totally symmetric tensor (i.e., T_{σ(i)σ(j)σ(k)} = T_{ijk} for any permutation σ). For a parametric family of probability densities M = {p_λ(x)}, the cubic tensor is called the skewness tensor [7], and is defined by T_{ijk}(λ) := E_λ[∂_i l_λ ∂_j l_λ ∂_k l_λ]. A statistical manifold structure (M, g, T) allows one to construct Amari's dualistic α-geometry [7] for any α ∈ R: namely, a quadruplet (M, g_FR, ∇^{−α}, ∇^{α}) where ∇^{−α} and ∇^{α} are dual torsion-free affine connections coupled to the Fisher metric g_FR (i.e., ∇^{−α} = (∇^{α})*). We refer the reader to the textbook [7] and the overview [11] for further details.
That is, we have (C, g_FR) = (C, g_FR, ∇^0, ∇^0). In information geometry, the invariance principle states that the geometry should be invariant under a transformation of a random variable X into Y provided that Y = t(X) is a sufficient statistic [7] of X. The α-geometry (M, g_FR, ∇^{−α}, ∇^{α}) and its special case of the Fisher-Rao geometry are invariant geometries [7,11] for any α ∈ R.
A remarkable fact is that all the α-geometries of the Cauchy family coincide with the Fisher-Rao geometry since the cubic skewness tensor T vanishes everywhere [28], i.e., T_{ijk} = 0. The non-zero coefficients of the Christoffel symbols of the α-connections (including those of the Levi-Civita metric connection derived from the Fisher metric tensor) can be calculated explicitly. Thus, all α-geometries coincide and have constant negative scalar curvature κ = −2. In other words, we cannot choose a value of α to make the Cauchy manifold dually flat [7]. To contrast with this result, Mitchell [28] reported values of α for which the α-geometry is dually flat for some parametric location-scale families of distributions: For example, it is well known that the manifold N of univariate Gaussian distributions is ±1-flat [7]. The manifold S_k of t-Student's distributions with k degrees of freedom is dually flat when α = ±(k + 5)/(k − 1) [28]. Dually flat manifolds are Hessian manifolds [39] with dual geodesics being straight lines in one of the two dual global affine coordinate systems. On a global Hessian manifold, the canonical divergences are Bregman divergences. Thus, these dually flat Bregman manifolds are computationally friendly [15], as many techniques of computational geometry [4] can be naturally extended to these Hessian spaces (e.g., the smallest enclosing balls [40]).

Dualistic Structures Induced by a Divergence
A divergence or contrast function [13] is a smooth parametric dissimilarity. Let M denote the manifold of its parameter space. Eguchi [13] showed how to associate to any divergence D a canonical information-geometric structure (M, D g, D ∇, D ∇*). Moreover, the construction allows one to prove that D ∇* = D* ∇. That is, the dual connection D ∇* for the divergence D corresponds to the primal connection for the reverse divergence D* (see [7,11] for details).
Conversely, Matsumoto [41] proved that given an information-geometric structure (M, g, ∇, ∇*), one can build a divergence D such that (M, g, T) = (M, D g, D T), from which we can derive the structure (M, D g, D ∇, D ∇*). Thus, when calculating the Voronoi diagram Vor_D for an arbitrary divergence D, we may use the induced information-geometric structure (M, D g, D ∇, D ∇*) to investigate some of the properties of the Voronoi diagram: For example, is the bisector Bi_D autoparallel with respect to D ∇? Or is the bisector Bi_D of two generators orthogonal, with respect to the metric D g, to their D ∇-geodesic? Section 4 will study these questions in particular cases.
A dually flat structure construction for q-Gaussians is reported in [7] (Section 4.3, pp. 84-89). We instantiate this construction for the Cauchy distributions (2-Gaussians): Let exp_q(u) := (1 + (1 − q) u)^{1/(1−q)} denote the deformed q-exponential, and log_q(u) := (u^{1−q} − 1)/(1 − q) its compositional inverse, the deformed q-logarithm.
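For the Cauchy case q = 2, the deformed pair specializes to log_2(u) = 1 − 1/u and exp_2(u) = 1/(1 − u). A small sketch (ours, not from the paper) verifies they are compositional inverses:

```python
# q-deformed logarithm/exponential for q = 2 (the Cauchy / 2-Gaussian case):
# log_q(u) = (u^(1-q) - 1)/(1-q) -> log_2(u) = 1 - 1/u,
# exp_q(u) = (1 + (1-q) u)^(1/(1-q)) -> exp_2(u) = 1/(1 - u).

def log_2(u):
    return 1.0 - 1.0 / u

def exp_2(u):
    return 1.0 / (1.0 - u)

# exp_2 and log_2 are compositional inverses on their domains:
for u in (0.1, 0.5, 2.0, 10.0):
    assert abs(exp_2(log_2(u)) - u) < 1e-12
```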
The probability density of a 2-Gaussian can be factorized using the deformed exponential, where θ = (θ_1, θ_2) denotes the 2D natural parameter. In general, we obtain a strictly convex and C³ function F_q(θ), called the q-free energy of the q-Gaussian family. Here, we let F(θ) := F_2(θ) for the Cauchy family: F(θ) is the Cauchy free energy.
We can convert back the natural parameter θ ∈ Θ to the ordinary parameter λ ∈ H. The gradient of the deformed log-normalizer, ∇F(θ), defines the dual global affine coordinate system η := ∇F(θ), where η ∈ H = R × R_+ is the dual parameter space.
We used a computer algebra system (CAS, see Section 7) to calculate closed-form formulas for the required definite integrals. Here, observe that the equivalent Bregman divergence does not involve a swapped parameter order, as is the case for ordinary exponential families where D_KL[p_{θ1} : p_{θ2}] = B_F(θ2 : θ1) with F the cumulant function of the exponential family; see [7,11].
We term the divergence D_flat the flat divergence because its induced affine connection [13] has zero curvature (i.e., the Riemann-Christoffel curvature tensor induced by the connection vanishes; see [7], p. 134). The flat divergence can be interpreted as a conformal squared Euclidean distance [33] with conformal factor π/s_1. In general, the Fisher-Rao geometry of q-Gaussians has a scalar curvature reported in [44]; thus, we recover the scalar curvature κ = −2 for the Fisher-Rao Cauchy manifold since q = 2.

Theorem 1. The flat divergence D_flat[p_{λ1} : p_{λ2}] between two Cauchy distributions is equivalent to a Bregman divergence B_F(θ_1 : θ_2) on the corresponding natural parameters, and admits a closed-form formula using the ordinary location-scale parameterization.

The η-coordinates are converted into θ-coordinates via the Legendre-Fenchel convex conjugate [7] F*(η) := sup_{θ ∈ Θ} {⟨θ, η⟩ − F(θ)}, which is independent of the location parameter l. Moreover, we have [7] θ = ∇F*(η), so that we can convert the dual parameter η back to the ordinary parameter λ ∈ H. It follows that the flat divergence admits the equivalent mixed-coordinate expression D_flat[p_{λ1} : p_{λ2}] = Y_{F,F*}(θ_1 : η_2), where Y_{F,F*}(θ_1 : η_2) := F(θ_1) + F*(η_2) − ⟨θ_1, η_2⟩ is the Legendre-Fenchel divergence measuring the inequality gap of the Fenchel-Young inequality. The Hessian metrics of the dual convex potential functions F(θ) and F*(η) are ∇²F(θ) and ∇²F*(η), and they satisfy the Crouzeix identity [11,48]: ∇²F(θ) ∇²F*(η) = I, where I denotes the 2 × 2 identity matrix. The Hessian metric ∇²F(θ) is also called the q-Fisher metric [44] (for q = 2). Let g^λ_FR(λ) and g^θ_FR(θ) denote the Fisher information metric expressed using the λ-coordinates and the θ-coordinates, respectively.
Then, we have g^θ_FR(θ) = Jac_λ(θ)^⊤ g^λ_FR(λ(θ)) Jac_λ(θ), where Jac_λ(θ) denotes the Jacobian matrix of the coordinate conversion λ(θ). Similarly, we can express the Hessian metric g_F := ∇²F(θ) using the λ-coordinate system. We calculated explicitly the corresponding Jacobian matrices, and checked that the Riemannian metric tensors g^λ_FR(λ) and g^λ_F(λ) (or g^θ_F(θ) and g^θ_FR(θ)) are conformally equivalent. That is, there exists a smooth function u(λ) = log(2/(πs)) such that g^λ_F(λ) = e^{u(λ)} g^λ_FR(λ). This dually flat space construction of the Cauchy manifold can thus be interpreted as a conformal flattening of the curved α-geometry [7,44,49]. The relationships between the curvature tensors of the dual ±α-connections are studied in [50]. Notice that this dually flat geometry can be recovered from the divergence-based structure of Section 2.3 by considering the Bregman-Tsallis divergence. Figure 2 illustrates the relationships between the invariant α-geometry and the dually flat geometry of the Cauchy manifold. The q-Gaussians can be further generalized by the χ-family with corresponding deformed logarithm and exponential functions [7,45]. The χ-family unifies the dually flat exponential family with the dually flat mixture family [45].
A statistical dissimilarity D[p_{λ1} : p_{λ2}] between two parametric distributions p_{λ1} and p_{λ2} amounts to an equivalent dissimilarity D(θ_1 : θ_2) between their parameters: D(θ_1 : θ_2) := D[p_{λ1} : p_{λ2}]. When the parametric dissimilarity is smooth, one can construct the divergence-based α-geometry [11,51]. Thus, the dually flat space structure of the Cauchy manifold can also be obtained from the divergence-based ±α-geometry of the flat divergence D_flat (see Figure 2). It can be shown that the dually flat q-geometry is the unique geometry in the intersection of the conformal Fisher-Rao geometry with the deformed χ-geometry (Theorem 13 of [45]) when the manifold is the positive orthant R_+^{d+1}. Please note that a dually flat space in information geometry is usually not Riemannian flat with respect to the Levi-Civita connection (e.g., the Gaussian manifold). In particular, Matsuzoe proved in [52] that the Riemannian manifold (C, ∇²F(θ)) induced by the q-Fisher metric is of constant curvature −1 when q = 2.
There are many alternative ways to build a dually flat space from a q-Gaussian family once a convex Bregman generator F(θ) has been built from the density p_q(θ) of a q-Gaussian. The method presented above is a natural generalization of the dually flat space construction for exponential families.
To give another approach, let us mention that Matsuzoe [52] also introduced another Hessian metric g_M(θ) = [g_{M,ij}(θ)]. This metric is conformal to both the Fisher metric and the q-Fisher metric, and is obtained by generalizing equivalent representations of the Fisher information matrix (see the α-representations in [7]).

Invariant Divergences in Information Geometry
The f-divergence [20,53] between two densities p(x) and q(x) is defined for a positive convex function f, strictly convex at 1 and with f(1) = 0, as: I_f[p : q] := ∫ p(x) f(q(x)/p(x)) dx. The KL divergence is the f-divergence obtained for the generator f(u) = −log(u). An invariant divergence is a divergence D which satisfies the information monotonicity [7]: D[p_X : q_X] ≥ D[p_Y : q_Y] for Y = t(X), with equality when t is a sufficient statistic. The invariant divergences are the f-divergences for the simplex sample space [7]. Moreover, the standard f-divergences (calibrated with f(1) = 0, f'(1) = 0 and f''(1) = 1) induce the Fisher information metric (FIm) as their metric tensor when the sample space is the probability simplex: I_f g = g_FR; see [7].
For the location-scale families, which include the normal family N, the Cauchy family C, and the t-Student families S_k with fixed degrees of freedom k, the α-divergences are not symmetric in general (e.g., the α-divergences between two normal distributions). However, we have shown that the chi square divergences and the KL divergence are symmetric when the densities belong to the Cauchy family. Thus, it is of interest to determine whether the α-divergences between Cauchy densities are symmetric or not, and to report their closed-form formulas for all α ∈ R.
Using the symbolic integration described in Section 7, we calculated the Chernoff similarity coefficient and checked that it is symmetric. Therefore the 3-divergence I_3 between two Cauchy distributions is symmetric; in particular, it admits a simplified closed form when l_1 = l_2 = l. In Section 7, we prove by symbolic calculations that the α-divergences are symmetric for α ∈ {0, 1, 2, 3, 4}.

Remark 3.
The Cauchy family can also be interpreted as a family of univariate elliptical distributions [28]. A univariate elliptical distribution has canonical parametric density q_{µ,σ}(x) := (1/σ) h(((x − µ)/σ)²) for some function h(u). For example, the Gaussian distributions are elliptical distributions obtained for h(u) = (1/√(2π)) exp(−u/2). Location-scale densities p_{l,s} with standard density p_{0,1} can be interpreted as univariate elliptical distributions q_{µ,σ} with h(u) = p_{0,1}(√u) and (µ, σ) = (l, s): p_{l,s} = q_{µ,σ}. It follows that the Cauchy densities are elliptical distributions for h(u) = 1/(π(1 + u)). By doing a change of variable in the KL divergence integral, we find again the symmetry identity reported above.

Metrization of the Kullback-Leibler Divergence
The Kullback-Leibler divergence [12] D_KL[p : q] between two continuous probability densities p and q defined over the real line support is an oriented dissimilarity measure defined by: D_KL[p : q] := ∫ p(x) log(p(x)/q(x)) dx. Obtaining the closed-form formula for the KL divergence between two Cauchy distributions requires performing a non-trivial integration task. The following closed-form expression has been reported in [58] using advanced symbolic integration: D_KL[p_{l1,s1} : p_{l2,s2}] = log( ((s1 + s2)² + (l1 − l2)²) / (4 s1 s2) ). Although the KL divergence is usually asymmetric, it is a remarkable fact that it is symmetric between any two Cauchy densities. However, the KL divergence of Equations (92) and (93) does not satisfy the triangle inequality; thus, although symmetric, it is not a metric distance.
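The closed form above can be cross-checked against direct numerical integration. In the sketch below (our code; the substitution trick is standard calculus, not from the paper), the change of variable x = l1 + s1 tan(u) maps p1(x) dx to du/π, turning the integral over the real line into a well-behaved integral over (−π/2, π/2):

```python
import math

# Closed-form KL divergence between Cauchy densities (reported in [58]):
# D_KL = log( ((s1+s2)^2 + (l1-l2)^2) / (4 s1 s2) ), checked by quadrature.

def cauchy_pdf(x, l, s):
    return s / (math.pi * (s * s + (x - l) ** 2))

def kl_closed_form(l1, s1, l2, s2):
    return math.log(((s1 + s2) ** 2 + (l1 - l2) ** 2) / (4.0 * s1 * s2))

def kl_numerical(l1, s1, l2, s2, n=100000):
    # Substitute x = l1 + s1*tan(u), so that p1(x) dx = du/pi; then
    # D_KL = (1/pi) * integral of log(p1/p2)(x(u)) over u in (-pi/2, pi/2).
    total, h = 0.0, math.pi / n
    for k in range(n):
        u = -math.pi / 2 + (k + 0.5) * h  # midpoint rule avoids the endpoints
        x = l1 + s1 * math.tan(u)
        total += math.log(cauchy_pdf(x, l1, s1) / cauchy_pdf(x, l2, s2)) * h
    return total / math.pi

print(kl_closed_form(0.0, 1.0, 1.0, 2.0), kl_numerical(0.0, 1.0, 1.0, 2.0))
```

The two printed values agree to several digits; note also that swapping the arguments leaves the closed form unchanged, illustrating the symmetry stated above.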
The KL divergence between two Cauchy distributions is related to the Pearson D_{χ²_P}[p : q] and Neyman D_{χ²_N}[p : q] chi square divergences [20]. Indeed, the formulas for the Pearson and Neyman chi square divergences between two Cauchy distributions coincide, and (surprisingly) amount to the δ term: D_{χ²_P}[p_{l1,s1} : p_{l2,s2}] = D_{χ²_N}[p_{l1,s1} : p_{l2,s2}] = δ(l1, s1; l2, s2). Since the Pearson and Neyman chi square divergences between Cauchy densities are symmetric, let us write D_{χ²}[p : q] = D_{χ²_P}[p : q] in the remainder. We can rewrite the Fisher-Rao distance between two Cauchy distributions using the D_{χ²} divergence as follows: ρ_FR[p : q] = (1/√2) arccosh(1 + D_{χ²}[p : q]). Figure 3 plots the strictly increasing chi-to-Fisher-Rao conversion function u → (1/√2) arccosh(1 + u). Since the Cauchy family is a location-scale family, we have the following general invariance property of f-divergences:

Theorem 2. The f-divergence [53] between two location-scale densities p_{l1,s1} and p_{l2,s2} can be reduced to the calculation of the f-divergence between the standard density and another location-scale density: I_f[p_{l1,s1} : p_{l2,s2}] = I_f[p_{0,1} : p_{(l2−l1)/s1, s2/s1}].

Since the KL divergence is expressed by D_KL[p_{l1,s1} : p_{l2,s2}] = log(1 + (1/2) δ(l1, s1; l2, s2)), we also check that δ(l1, s1; l2, s2) = δ(0, 1; (l2 − l1)/s1, s2/s1). It follows the corresponding corollary for scale families: I_f[p_{0,s1} : p_{0,s2}] = I_f[p_{0,1} : p_{0,s2/s1}]. Many algorithms and data structures can be designed efficiently when dealing with metric distances: For example, the metric ball tree [59] and the vantage point tree [60,61] are two such data structures for efficient nearest-neighbor queries in metric spaces. Thus, it is of interest to consider statistical dissimilarities which are metric distances. The total variation distance [12] and the square root of the Jensen-Shannon divergence [62] are two common examples of statistical metric distances often met in the literature. In general, the metrization of f-divergences was investigated in [63,64].
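The location-scale reduction of δ used above is a one-line algebraic identity, which the following short sketch (ours) spot-checks on random parameters:

```python
import random

# Spot-check of the reduction delta(l1,s1; l2,s2) = delta(0,1; (l2-l1)/s1, s2/s1),
# the delta-instance of the location-scale invariance of Theorem 2.

def delta(l1, s1, l2, s2):
    return ((l1 - l2) ** 2 + (s1 - s2) ** 2) / (2.0 * s1 * s2)

random.seed(0)
for _ in range(1000):
    l1, l2 = random.uniform(-5, 5), random.uniform(-5, 5)
    s1, s2 = random.uniform(0.1, 5), random.uniform(0.1, 5)
    lhs = delta(l1, s1, l2, s2)
    rhs = delta(0.0, 1.0, (l2 - l1) / s1, s2 / s1)
    assert abs(lhs - rhs) < 1e-9 * max(1.0, lhs)
```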
We shall prove the following theorem:

Theorem 3. The square root of the Kullback-Leibler divergence between two Cauchy densities p_{l1,s1} and p_{l2,s2} is a metric distance: ρ_KL[p_{l1,s1}, p_{l2,s2}] := √(D_KL[p_{l1,s1} : p_{l2,s2}]).

Proof. The proof consists in showing that the square root of the conversion function of the Fisher-Rao distance to the KL divergence is a metric transform [65]. A metric transform t(u) : R_+ → R_+ is a transform which preserves the metric property of a distance ρ, i.e., such that (t ∘ ρ)(p, q) = t(ρ(p, q)) is a metric distance. The following are sufficient conditions for a function t(u) to be a metric transform: 1. t is a strictly increasing function, 2. t(0) = 0, 3. t satisfies the subadditivity property: t(a + b) ≤ t(a) + t(b) for all a, b ≥ 0.
For example, strictly concave functions t(u) with t(0) = 0 are metric transforms. In general, one can check that t(u) is subadditive by verifying that the ratio t(u)/u is non-increasing. The following transform t_{FR→KL}(u) converts the Fisher-Rao distance ρ_FR into the Kullback-Leibler divergence D_KL: t_{FR→KL}(u) := log(1 + (cosh(√2 u) − 1)/2), where cosh(x) := (e^x + e^{−x})/2. The square root of this conversion function is subadditive since √(t_{FR→KL}(u))/u is non-increasing (see Figure 4) and t_{FR→KL}(0) = 0. Since the Fisher-Rao distance is a metric distance and √(t_{FR→KL}(u)) is a metric transform, we conclude that ρ_KL = √(D_KL) is a metric distance.
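The two ingredients of the proof, the monotonicity of the ratio √(t_{FR→KL}(u))/u and the resulting triangle inequality for ρ_KL, are easy to probe numerically. A minimal sketch (our helper names), assuming the closed-form KL formula given earlier:

```python
import math, random

# t_FR->KL(u) = log(1 + (cosh(sqrt(2)*u) - 1)/2): the ratio sqrt(t(u))/u
# should be non-increasing on (0, inf), which gives subadditivity of sqrt(t).

def t_FR_to_KL(u):
    return math.log(1.0 + (math.cosh(math.sqrt(2.0) * u) - 1.0) / 2.0)

def ratio(u):
    return math.sqrt(t_FR_to_KL(u)) / u

us = [0.01 * k for k in range(1, 2000)]
ratios = [ratio(u) for u in us]
assert all(a >= b - 1e-12 for a, b in zip(ratios, ratios[1:]))

# Spot-check the triangle inequality for rho_KL = sqrt(D_KL) on random triples.
def rho_KL(l1, s1, l2, s2):
    return math.sqrt(math.log(((s1 + s2) ** 2 + (l1 - l2) ** 2) / (4.0 * s1 * s2)))

random.seed(1)
for _ in range(1000):
    a, b, c = [(random.uniform(-3, 3), random.uniform(0.1, 3)) for _ in range(3)]
    assert rho_KL(*a, *b) + rho_KL(*b, *c) >= rho_KL(*a, *c) - 1e-9
```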
A metric distance ρ(p, q) is said to be Hilbertian if there exists an embedding φ(·) into a Hilbert space such that ρ(p, q) = ‖φ(p) − φ(q)‖_H, where ‖·‖_H is the Hilbert norm. A metric is said to be Euclidean if there exists such an embedding with the associated ℓ2 Euclidean norm. For example, the square root of the celebrated Jensen-Shannon divergence is a Hilbertian distance [62].
Let us prove the following:

Theorem 4. The square root of the KL divergence between two Cauchy densities of the same scale family is a Hilbertian distance.
Proof. For Cauchy distributions with a fixed location parameter l, the KL divergence of Equation (93) simplifies to: D_KL[p_{l,s1} : p_{l,s2}] = log( (s1 + s2)² / (4 s1 s2) ). We can rewrite this KL divergence as D_KL[p_{l,s1} : p_{l,s2}] = 2 log( A(s1, s2)/G(s1, s2) ), where A(s1, s2) = (s1 + s2)/2 and G(s1, s2) = √(s1 s2) are the arithmetic mean and the geometric mean of s1 and s2, respectively. Then we use Lemma 3 of [66] to conclude that √(D_KL[p_{l,s1} : p_{l,s2}]) is a Hilbertian metric distance.
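The rewriting used in the proof is an exact identity, checked below in a short sketch (our helper names):

```python
import math

# For a common location l: D_KL[p_{l,s1} : p_{l,s2}] = log((s1+s2)^2/(4 s1 s2))
# = 2 log(A/G), with A the arithmetic and G the geometric mean of s1, s2.

def kl_scale(s1, s2):
    return math.log((s1 + s2) ** 2 / (4.0 * s1 * s2))

def two_log_A_over_G(s1, s2):
    A = (s1 + s2) / 2.0
    G = math.sqrt(s1 * s2)
    return 2.0 * math.log(A / G)

for s1, s2 in [(1.0, 2.0), (0.5, 4.0), (3.0, 3.0)]:
    assert abs(kl_scale(s1, s2) - two_log_A_over_G(s1, s2)) < 1e-12
```

By the AM-GM inequality, A/G ≥ 1, so this KL divergence is non-negative and vanishes iff s1 = s2.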
Another proof consists in rewriting the KL divergence as a scaled Jensen-Bregman divergence [66,67]: D_KL[p_{l,s1} : p_{l,s2}] = 2 JB_F(s1, s2), where JB_F(θ_1, θ_2) := (F(θ_1) + F(θ_2))/2 − F((θ_1 + θ_2)/2) for a strictly convex generator F. We use F(θ) = −log θ, i.e., the Burg information, yielding the Jensen-Burg divergence JB_F. Then we use Corollary 1 of [66] (i.e., F is the cumulant of an infinitely divisible distribution) to conclude that √(JB_F(θ_1, θ_2)) is a metric distance (and hence that ρ_KL(p_{l,s1}, p_{l,s2}) is a metric distance). The α-skewed Jensen-Bregman divergence is defined by JB^α_F(θ_1 : θ_2) := α F(θ_1) + (1 − α) F(θ_2) − F(α θ_1 + (1 − α) θ_2) for α ∈ (0, 1), and the maximal α-skewed Jensen-Bregman divergence is called the Jensen-Chernoff divergence: JC_F(θ_1 : θ_2) := max_{α ∈ (0,1)} JB^α_F(θ_1 : θ_2). The maximal exponent α* corresponds to the error exponent in Bayesian hypothesis testing on exponential family manifolds [57]. In general, the metrization of Jensen-Bregman divergences (and of the Jensen-Chernoff divergence) was studied in [68].
Furthermore, by combining Corollary 1 of [66] with Theorem 3 of [67], we get the following proposition: Proposition 2. The square root of the Bhattacharyya divergence between two densities of an exponential family is a metric distance when the exponential family is infinitely divisible.

This proposition holds because the Bhattacharyya divergence D_Bhat[p : q] := −log ∫ √(p(x) q(x)) dx between two parametric densities p(x) = p_{θ1}(x) and q(x) = p_{θ2}(x) of an exponential family with cumulant function F amounts to a Jensen-Bregman divergence (Theorem 3 of [67]): D_Bhat[p_{θ1} : p_{θ2}] = JB_F(θ_1, θ_2). Notice that Proposition 2 recovers the fact that the square root of the Bhattacharyya divergence between two zero-centered normal distributions is a metric (proved differently in [69]), since the set of normal distributions forms an infinitely divisible exponential family.

Cauchy Voronoi Diagrams and Dual Cauchy Delaunay Complexes
Let us consider the Voronoi diagram [1] of a finite set P = {p λ 1 , . . . p λ n } of n Cauchy distributions with the location-scale parameters λ i = (l i , s i ) ∈ H for i ∈ {1, . . . , n}. We shall consider the Fisher-Rao distance ρ FR , the KL divergence D KL and its square root metrization ρ KL , the chi square divergence D χ 2 , and the flat divergence D flat .

The Hyperbolic Cauchy Voronoi Diagrams
Observe that the Voronoi diagram does not change under any strictly increasing function t of the dissimilarity measure (e.g., the square root function): Vor_{t∘D}(P) = Vor_D(P). Thus, we get the following theorem:

Theorem 5. The Cauchy Voronoi diagrams under the Fisher-Rao distance, the chi square divergence, and the Kullback-Leibler divergence all coincide, and amount to a hyperbolic Voronoi diagram on the corresponding location-scale parameters.
Proof. The KL divergence can be expressed as D_KL[p_{l1,s1} : p_{l2,s2}] = log(1 + (1/2) δ(l1, s1; l2, s2)), and the Fisher-Rao distance as ρ_FR = (1/√2) arccosh(1 + δ). Thus, both the D_KL and ρ_FR dissimilarities are expressed as strictly increasing functions of δ (a synonym for the D_{χ²} divergence). Therefore the Voronoi bisectors between two Cauchy distributions p_{l1,s1} and p_{l2,s2} for D ∈ {ρ_FR, D_KL, √(D_KL), D_{χ²}} all amount to the same set: Bi_D(p_{l1,s1} : p_{l2,s2}) = {(l, s) ∈ H : δ(l, s; l1, s1) = δ(l, s; l2, s2)}.
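The coincidence of the diagrams can be illustrated directly: since every dissimilarity is a strictly increasing function of δ, a nearest-site query returns the same generator under any of them. A minimal sketch (our helper names), assuming the closed forms given earlier:

```python
import math, random

# Theorem 5 illustrated: rho_FR, D_KL and D_chi2 are strictly increasing
# functions of delta, so they induce the same nearest-site assignment and
# hence the same Voronoi diagram.

def delta(l1, s1, l2, s2):
    return ((l1 - l2) ** 2 + (s1 - s2) ** 2) / (2.0 * s1 * s2)

def rho_FR(g, x):  return math.acosh(1.0 + delta(*g, *x)) / math.sqrt(2.0)
def D_KL(g, x):    return math.log(1.0 + delta(*g, *x) / 2.0)
def D_chi2(g, x):  return delta(*g, *x)

def nearest(generators, x, D):
    return min(range(len(generators)), key=lambda i: D(generators[i], x))

random.seed(2)
gens = [(random.uniform(-3, 3), random.uniform(0.2, 3)) for _ in range(8)]
for _ in range(500):
    x = (random.uniform(-3, 3), random.uniform(0.2, 3))
    cells = {nearest(gens, x, D) for D in (rho_FR, D_KL, D_chi2)}
    assert len(cells) == 1  # all three dissimilarities pick the same cell
```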
It follows that we can calculate the Cauchy Voronoi diagram of n Cauchy distributions in optimal Θ(n log n) time by calculating the 2D hyperbolic Voronoi diagram [25,26] on the location-scale parameters (see Section 6 for details). Figure 5 displays the Voronoi diagram of a set of Cauchy distributions by its equivalent parameter hyperbolic Voronoi diagram in the Poincaré upper plane model, the Poincaré disk model, and the Klein disk model. Figure 6 shows the hyperbolic Voronoi diagram in the upper plane with colored Voronoi cells. A model of hyperbolic geometry is said to be conformal if it preserves angles, i.e., if its underlying Riemannian metric tensor is a positive scalar function times the Euclidean metric tensor. The Poincaré disk model and the Poincaré upper plane model are both conformal models [30]. The Klein model is not conformal, except at the disk origin. Let D = {p : ‖p‖ < 1} denote the open unit disk domain for the Poincaré and Klein disk models. Indeed, the Riemannian metric corresponding to the Klein disk model is ds²_Klein = ds²_Eucl/(1 − ‖p‖²) + ⟨p, dp⟩²/(1 − ‖p‖²)², where ⟨p, dp⟩ = x dx + y dy and ds²_Eucl = dx² + dy² denotes the squared Euclidean line element. Since ds²_Klein(0) = ds²_Eucl, we deduce that the Klein model is conformal at the origin (when measuring the angles between two vectors v_1 and v_2 of the tangent plane T_0). The dual of the Voronoi diagram is called the Delaunay (simplicial) complex [4,5]: We build the Delaunay complex by drawing an edge between generators whose Voronoi cells are adjacent. For the ordinary Euclidean Delaunay complex with points in general position (i.e., no d + 2 cospherical points in dimension d), the Delaunay complex triangulates the convex hull of the points [8,70]; therefore it is called the Delaunay triangulation [1,3,8]. Figure 7 displays a Euclidean Voronoi diagram with its dual Delaunay triangulation.
Similarly, for the hyperbolic Voronoi diagram, we construct the hyperbolic Delaunay complex by drawing a hyperbolic geodesic edge between any two generators whose Voronoi cells are adjacent. However, we do not necessarily obtain a geodesic triangulation of the hyperbolic geodesic convex hull anymore, but rather a simplicial complex, hence the name hyperbolic Delaunay complex [5,71,72]. In extreme cases, the hyperbolic Delaunay complex has a tree structure. See Figure 8 for examples of a hyperbolic Delaunay triangulation and of a hyperbolic Delaunay complex which is not a triangulation. In fact, hyperbolic geometry is very well suited for embedding weighted tree graphs isometrically with low distortion [73]. Hyperbolic embeddings of hierarchical structures [74] have become a hot topic in machine learning.
Let us now prove that these Cauchy hyperbolic Voronoi/Delaunay structures are Fisher orthogonal:

Theorem 6. The Cauchy Voronoi diagram is Fisher orthogonal to the Cauchy Delaunay complex.
Proof. It is enough to prove that the corresponding hyperbolic geodesic γ(p_{λ_1}, p_{λ_2}) is orthogonal to the bisector Bi(p_{λ_1} : p_{λ_2}). We work in the Klein disk model, where the distance is

$$\rho_{\mathrm{Klein}}(p,q) = \operatorname{arccosh}\left(\frac{1-\langle p,q\rangle}{\sqrt{(1-\|p\|^2)(1-\|q\|^2)}}\right),$$

and where the hyperbolic bisector is an affine equation clipped to the disk [25]. Using a Möbius transformation [25] (i.e., a hyperbolic "rigid motion"), we may consider without loss of generality that p_{λ_1} = −p_{λ_2}. Since ‖p_{λ_1}‖ = ‖p_{λ_2}‖, the bisector equation then writes simply as ⟨p_{λ_1}, x⟩ = 0, a line passing through the origin. The geodesic γ(p_{λ_1}, p_{λ_2}) is the Euclidean straight diameter through the origin supported by p_{λ_1}, which is Euclidean orthogonal to that line at the origin. Since the Klein disk model is conformal at the origin (Equation (130)), we deduce that the geodesic and the bisector are also hyperbolically orthogonal there.

Remark 4.
The hyperbolic Cauchy Voronoi diagram can be used for classification tasks in statistics, as originally motivated by C.R. Rao in his celebrated paper [17]: Let p_{λ_1}, ..., p_{λ_n} be n Cauchy distributions, and let x_1, ..., x_s be s independent and identically distributed samples drawn from a Cauchy distribution p_λ. We can estimate the location-scale parameter λ̂ from the s samples [75], and then decide the multiple hypothesis test H_i : p_λ = p_{λ_i} by choosing the hypothesis H_i such that ρ_FR(p_{λ_i}, p_{λ̂}) ≤ ρ_FR(p_{λ_j}, p_{λ̂}) for all j ∈ {1, ..., n}. This classification task amounts to performing a nearest neighbor query in the Fisher-Rao hyperbolic Cauchy Voronoi diagram. Hypothesis testing for comparing location parameters based on Rao's distance is investigated in [76]. Notice that it is possible to construct a set of points such that all hyperbolic Voronoi cells for that point set are unbounded; see Figure 11 for such an example.
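This nearest-neighbor decision rule can be sketched as follows. Since the Cauchy Fisher-Rao distance is a constant multiple of the hyperbolic distance on the location-scale parameters, the unscaled Poincaré upper-plane distance yields the same nearest-neighbor decisions; the function names and sample parameter values below are ours:

```python
import numpy as np

def upper_plane_dist(a, b):
    """Hyperbolic distance between (l1, s1) and (l2, s2) in the Poincare
    upper plane; the Cauchy Fisher-Rao distance is a constant multiple of
    this, so nearest-neighbor decisions coincide."""
    (l1, s1), (l2, s2) = a, b
    return np.arccosh(1.0 + ((l1 - l2) ** 2 + (s1 - s2) ** 2) / (2.0 * s1 * s2))

def classify(estimate, generators):
    """Multiple-hypothesis test: return the index of the closest generator."""
    return min(range(len(generators)),
               key=lambda i: upper_plane_dist(generators[i], estimate))

# Three hypothetical Cauchy generators (location, scale) and one estimate:
generators = [(0.0, 1.0), (3.0, 0.5), (-2.0, 2.0)]
print(classify((2.8, 0.6), generators))  # -> 1 (the generator (3.0, 0.5))
```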
The ordinary Euclidean Delaunay triangulation satisfies the empty sphere property [4,77]: that is, the circumscribing spheres passing through the vertices of the triangles of the Delaunay complex are empty of any other Voronoi site. This property still holds for the hyperbolic Delaunay complex, which is obtained by a filtration of the ordinary Euclidean Delaunay triangulation in [5]. A hyperbolic ball in the Poincaré conformal disk model or in the upper plane model has the shape of a Euclidean ball with a displaced center [71]. Figure 12 displays the Delaunay complex with the empty sphere property in the Poincaré and Klein disk models. The centers of these circumscribing spheres are located at the T-junctions of the Voronoi diagrams.
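The Euclidean empty circumcircle property can be checked numerically, e.g., with SciPy's Delaunay triangulation; the following brute-force sketch (helper names are ours) verifies that no site falls strictly inside the circumcircle of any Delaunay triangle:

```python
import numpy as np
from scipy.spatial import Delaunay

def circumcircle(a, b, c):
    """Circumcenter and squared circumradius of the 2D triangle abc."""
    d = 2.0 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    ux = ((a @ a) * (b[1] - c[1]) + (b @ b) * (c[1] - a[1]) + (c @ c) * (a[1] - b[1])) / d
    uy = ((a @ a) * (c[0] - b[0]) + (b @ b) * (a[0] - c[0]) + (c @ c) * (b[0] - a[0])) / d
    center = np.array([ux, uy])
    return center, (center - a) @ (center - a)

rng = np.random.default_rng(0)
pts = rng.random((20, 2))          # random sites in general position
tri = Delaunay(pts)

eps = 1e-9
for simplex in tri.simplices:
    center, r2 = circumcircle(*pts[simplex])
    inside = np.sum((pts - center) ** 2, axis=1) < r2 - eps
    assert not inside.any()        # the circumcircle is empty of other sites
print("empty-circumcircle property verified")
```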

The Dual Voronoi Diagrams on the Cauchy Dually Flat Manifold
The dual Cauchy Voronoi diagrams with respect to the flat divergence D_flat (and the reverse flat divergence D*_flat, which corresponds to a dual Bregman-Tsallis divergence) of Section 2.4 amount to calculating 2D dual Bregman Voronoi diagrams [15,16]. We get the following dual bisectors. The primal bisector with respect to the flat divergence is

$$\mathrm{Bi}_{D_{\mathrm{flat}}}(p_{\lambda_1} : p_{\lambda_2}) = \{\lambda : \delta(l_1, s_1; l, s) = \delta(l_2, s_2; l, s)\}.$$
Thus, this primal bisector with respect to the flat divergence corresponds to the hyperbolic bisector of the Fisher-Rao distance/chi square/KL divergences. The dual bisector with respect to the dual flat divergence (the reverse Bregman-Tsallis divergence) corresponds to an ordinary Euclidean bisector. Notice that Bi*_{D_flat}(p_{λ_1} : p_{λ_2}) = Bi_{D*_flat}(p_{λ_1} : p_{λ_2}). To summarize, the primal bisector coincides with the Fisher-Rao bisector while the dual bisector amounts to the ordinary Euclidean bisector.

Theorem 7. The dual Cauchy Voronoi diagrams with respect to the flat divergence can be calculated efficiently in Θ(n log n) time.
The construction of 2D Bregman Voronoi diagrams is described in [15].

The Cauchy Voronoi Diagrams with Respect to α-Divergences
The dual bisectors with respect to the α-divergences between any two parametric probability densities p_{λ_1}(x) and p_{λ_2}(x) are defined implicitly by the equations D_α(p_λ : p_{λ_1}) = D_α(p_λ : p_{λ_2}) and D_α(p_{λ_1} : p_λ) = D_α(p_{λ_2} : p_λ), respectively. It is an open problem to prove when the dual α-bisectors coincide for the Cauchy family. We have shown that it is the case for the χ²-divergence and the KL divergence. In theory, the Risch semi-algorithm [78] allows one to decide whether a definite integral admits a closed-form formula or not. However, the Risch semi-algorithm is only a semi-algorithm, as it requires an oracle to check whether some mathematical expressions are equivalent to zero or not.
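The coincidence of the dual bisectors for the KL divergence hinges on the symmetry of the KL divergence between Cauchy densities, which can be checked numerically. The sketch below (function names are ours) integrates the KL divergence with SciPy quadrature and compares the two orientations, as well as the closed form log(((s₁+s₂)² + (l₁−l₂)²)/(4 s₁ s₂)) reported in the literature:

```python
import numpy as np
from scipy.integrate import quad

def cauchy_pdf(x, l, s):
    """Density of the Cauchy distribution with location l and scale s."""
    return s / (np.pi * ((x - l) ** 2 + s ** 2))

def kl_numeric(lam1, lam2):
    """KL divergence between two Cauchy densities by numerical integration."""
    (l1, s1), (l2, s2) = lam1, lam2
    f = lambda x: cauchy_pdf(x, l1, s1) * np.log(cauchy_pdf(x, l1, s1)
                                                 / cauchy_pdf(x, l2, s2))
    return quad(f, -np.inf, np.inf, limit=200)[0]

lam1, lam2 = (0.0, 1.0), (2.0, 0.5)
# Symmetry: D_KL(p1 : p2) = D_KL(p2 : p1) for Cauchy distributions.
print(abs(kl_numeric(lam1, lam2) - kl_numeric(lam2, lam1)) < 1e-4)  # True
# Agreement with the closed form log(((s1+s2)^2 + (l1-l2)^2)/(4 s1 s2)).
closed = np.log(((1.0 + 0.5) ** 2 + (0.0 - 2.0) ** 2) / (4.0 * 1.0 * 0.5))
print(abs(kl_numeric(lam1, lam2) - closed) < 1e-4)  # True
```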

Conclusions
In this paper, we have considered the construction of Voronoi diagrams of finite sets of Cauchy distributions with respect to some common statistical distances. Since statistical distances can potentially be asymmetric, we defined the dual Voronoi diagrams with respect to the forward and reverse/dual statistical distances. From the viewpoint of information geometry [7], we have reported the construction of two types of geometry on the Cauchy manifold: (1) the invariant α-geometry equipped with the Fisher metric tensor g_FR and the skewness tensor T, from which we can build a family of pairs of torsion-free affine connections coupled with the metric, and (2) a dually flat geometry induced by a Bregman generator defined by the free energy F_q of the q-Gaussians (here instantiated to q = 2 when dealing with the Cauchy family). The metric tensor of the latter geometry is called the q-Fisher information metric, and is a Riemannian conformal metric of the Fisher information metric.

We have shown that the Fisher-Rao distance amounts to a scaled hyperbolic distance in the Poincaré upper plane model (Proposition 1), and that all of Amari's α-geometries [7] coincide with the Fisher-Rao geometry since the cubic tensor vanishes, thus yielding a hyperbolic manifold of negative constant scalar curvature κ = −2 for the Cauchy α-geometric manifolds. We noticed that the Fisher-Rao distance and the KL divergence can be expressed as strictly increasing functions of the chi square divergence. Then we explained how to conformally flatten the curved Fisher-Rao geometry to obtain a dually flat space where the flat divergence amounts to a canonical Bregman divergence built from Tsallis' quadratic entropy (Theorem 1). We reported the Hessian metrics of the dual potential functions of the dually flat space, and showed that there are alternative choices for building Hessian structures [52].
Table 1 summarizes the various closed-form formulas of statistical dissimilarities obtained for the Cauchy family. We proved that the square root of the KL divergence between any two Cauchy distributions is a metric distance (Theorem 3) in general, and more precisely a Hilbertian metric for the scale Cauchy families (Theorem 4). It follows that the Cauchy Voronoi diagram for the Fisher-Rao distance coincides with the Voronoi diagram with respect to the KL divergence or the chi square divergence (Figure 13). We showed how to build this hyperbolic Cauchy diagram from an equivalent hyperbolic Voronoi diagram on the corresponding location-scale parameters (see also Section 6). Then we proved that the dual hyperbolic Cauchy Delaunay complex is Fisher orthogonal to the Fisher-Rao hyperbolic Cauchy Voronoi diagram (Theorem 6). The dual Voronoi diagrams with respect to the dual flat divergences can be built from the corresponding dual Bregman-Tsallis divergences, with the primal Voronoi diagram coinciding with the hyperbolic Voronoi diagram and the dual diagram coinciding with the ordinary Euclidean Voronoi diagram (Figure 13). These results are particular to the special case of the Cauchy location-scale family, and do not hold in general for arbitrary location-scale families, since the cubic tensor may not vanish [28] and the KL divergence is usually asymmetric (e.g., for the Gaussian location-scale family). However, the Fisher-Rao geometry of any location-scale family amounts, after a potential rescaling, to hyperbolic geometry [27,79].

Figure 13. Voronoi diagrams of a set of Cauchy distributions with respect to the Fisher-Rao (FR) distance ρ_FR, the Kullback-Leibler (KL) divergence D_KL, the χ²-divergence D_χ², and the asymmetric Bregman-Tsallis flat divergence D_flat.

Klein Hyperbolic Voronoi Diagram from a Clipped Power Diagram
We concisely recall the efficient construction of the hyperbolic Voronoi diagram in the Klein disk model [25]. Let P = {p_1, ..., p_n} be a set of n points in the d-dimensional open unit ball domain D = {x ∈ R^d : ‖x‖_2 < 1}, where ‖·‖_2 denotes the Euclidean ℓ2-norm. The hyperbolic distance between two points p and q is expressed in the Klein model as

$$\rho_{\mathrm{Klein}}(p,q) = \operatorname{arccosh}\left(\frac{1-\langle p,q\rangle}{\sqrt{(1-\|p\|_2^2)(1-\|q\|_2^2)}}\right).$$

It follows that the Klein bisector between any two points in the Klein disk is a hyperplane (affine equation) clipped to D, i.e., a hyperplane (a line in 2D) restricted to the disk domain D. A Voronoi diagram is said to be affine [8] when all its bisectors are hyperplanes. It is known that affine Voronoi diagrams can be constructed from equivalent power diagrams [8]. Thus, the Klein hyperbolic Voronoi diagram is equivalent to a clipped power diagram, where

$$D_{\mathrm{PD}}(\sigma, x) := \|x - c\|_2^2 - w$$

denotes the power "distance" between a point x (and more generally a weighted point [80] when the weight can be negative) and a sphere σ = (c, w), and S = {σ_1 = (c_1, w_1), ..., σ_n = (c_n, w_n)} is the equivalent set of weighted points. The power distance is a signed distance since we have the following property: D_PD(σ, x) < 0 iff x ∈ int(σ), i.e., iff the point x falls inside the sphere σ = {x : ‖x − c‖_2² = w}. The power bisector is a hyperplane of equation

$$2\langle c_j - c_i, x\rangle + \|c_i\|_2^2 - \|c_j\|_2^2 - w_i + w_j = 0.$$

Notice that by shifting all weights by a predefined constant a, we obtain the same power bisectors, since (w_i + a) − (w_j + a) = w_i − w_j is kept invariant. Thus, we may consider without loss of generality that all weights are non-negative, and that the weighted points correspond to spheres with non-negative squared radii r_i² = w_i.
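As a quick numerical sketch (with hypothetical helper names), one can verify that the power bisector is affine in x and invariant under a common weight shift:

```python
import numpy as np

def power_dist(c, w, x):
    """Power 'distance' from point x to the sphere (c, w): ||x - c||^2 - w."""
    return (x - c) @ (x - c) - w

# Two weighted points (spheres) chosen arbitrarily for illustration.
c1, w1 = np.array([0.0, 0.0]), 1.0
c2, w2 = np.array([3.0, 1.0]), 0.25

rng = np.random.default_rng(1)
for x in rng.normal(size=(5, 2)):
    # Difference of power distances equals the affine bisector expression
    # 2<c2 - c1, x> + ||c1||^2 - ||c2||^2 - w1 + w2, so the bisector
    # {x : D_PD(s1, x) = D_PD(s2, x)} is a hyperplane.
    diff = power_dist(c1, w1, x) - power_dist(c2, w2, x)
    affine = 2.0 * (c2 - c1) @ x + c1 @ c1 - c2 @ c2 - w1 + w2
    assert np.isclose(diff, affine)

# Shifting both weights by the same constant leaves the bisector unchanged.
x = np.array([0.7, -0.2])
assert np.isclose(power_dist(c1, w1 + 5.0, x) - power_dist(c2, w2 + 5.0, x),
                  power_dist(c1, w1, x) - power_dist(c2, w2, x))
print("power bisector checks passed")
```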
By identifying Equation (142) with Equation (145), we get the equivalent spheres σ_i = (c_i, w_i) [25] for the points in the Klein disk. We can then shift all weights by the constant a = min_{i∈{1,...,n}} w_i so that w'_i = w_i + a ≥ 0. Thus, the Klein hyperbolic Voronoi diagram is a power diagram clipped to the unit ball D [80-82]. In computational geometry [4], the power diagram can be calculated from the intersection of n halfspaces by lifting the spheres σ_i to corresponding halfspaces H_i^+ of R^{d+1} as follows. Let F = {(x, z) ∈ R^{d+1} : z ≥ ∑_{i=1}^d x_i²} be the epigraph of the paraboloid function, and let ∂F denote its boundary. We lift a point x ∈ R^d onto ∂F using the upper-arrow operator x↑ = (x, z = ∑_{i=1}^d x_i²), and we project a point (x, z) ∈ R^{d+1} orthogonally by dropping its last z-coordinate, so that ↓(x↑) = x. Now, when we lift a sphere σ = (c, w) onto ∂F, the set of lifted points σ↑ all belong to a hyperplane H_σ, called the polar hyperplane, of equation

$$H_\sigma : z = 2\langle c, x\rangle - \langle c, c\rangle + w.$$

Let H_σ^+ denote the upper halfspace with bounding hyperplane H_σ: H_σ^+ : z ≥ 2⟨c, x⟩ − ⟨c, c⟩ + w. Then one can show [4] that Vor_{D_PD}(S) is obtained as the vertical projection ↓ of the intersection of all these polar halfspaces H_i^+ with ∂F. Transforming back and forth between non-vertical (d+1)-dimensional hyperplanes and their corresponding d-dimensional spheres allows one to design various efficient algorithms, e.g., for computing the intersection or the union of spheres [4], useful primitives in molecular chemistry [1].
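The lifting construction can be illustrated numerically: points on a sphere (c, w), once lifted onto the paraboloid z = ‖x‖², lie exactly on the polar hyperplane z = 2⟨c, x⟩ − ⟨c, c⟩ + w. A sketch with our own variable names:

```python
import numpy as np

# A sphere sigma = (c, w) of squared radius w = 0.81 in the plane (d = 2).
c, w = np.array([1.0, -0.5]), 0.81

rng = np.random.default_rng(2)
for _ in range(5):
    # Sample a point x on the sphere ||x - c||^2 = w.
    theta = rng.uniform(0.0, 2.0 * np.pi)
    x = c + np.sqrt(w) * np.array([np.cos(theta), np.sin(theta)])
    z_lift = x @ x                       # lift onto the paraboloid boundary
    z_polar = 2.0 * c @ x - c @ c + w    # polar hyperplane of the sphere
    assert np.isclose(z_lift, z_polar)   # the lifted point lies on H_sigma
print("lifted sphere lies on its polar hyperplane")
```

This identity follows by expanding ‖x − c‖² = w into ‖x‖² = 2⟨c, x⟩ − ‖c‖² + w.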
Let H_D^− denote the lower halfspace (containing the origin (x = 0, z = 0)) supported by the polar hyperplane associated with the boundary sphere of the disk domain D. Computing the clipped power diagram Vor_{D_PD}(S) ∩ D can then be done equivalently using the commutativity of set intersection. The advantage of the method of Equation (151) is that we begin clipping the power diagram with H_D^− before explicitly calculating it. Indeed, we first compute the intersection polytope of the n + 1 halfspaces, P_K := H_D^− ∩ ⋂_{i=1}^n H_i^+. Then we project down orthogonally the intersection of P_K with ∂F to get the clipped power diagram equivalent to the hyperbolic Klein Voronoi diagram. By doing so, we potentially reduce the algorithmic complexity by avoiding the computation of some of the vertices of P_PD := ⋂_{i=1}^n H_i^+ whose orthogonal projections fall outside the domain D. More generally, a Bregman Voronoi diagram [15] can be calculated equivalently as a power diagram (and as an intersection of (d+1)-dimensional halfspaces) by using an arbitrary smooth and strictly convex potential function F instead of the paraboloid potential function of Euclidean geometry [25]. The non-empty intersection of halfspaces can in turn be calculated as an equivalent convex hull [4]. Thus,