Linear optimal transport subspaces for point set classification

Learning from point sets is an essential component in many computer vision and machine learning applications. The native, unordered, and permutation-invariant structure of set data is challenging to model, particularly for point set classification under spatial deformations. Here we propose a framework for classifying point sets experiencing certain types of spatial deformations, with a particular emphasis on datasets featuring affine deformations. Our approach employs the Linear Optimal Transport (LOT) transform to obtain a linear embedding of set-structured data. Utilizing the mathematical properties of the LOT transform, we demonstrate its capacity to accommodate variations in point sets by constructing a convex data space, effectively simplifying point set classification problems. Our method, which employs a nearest-subspace algorithm in the LOT space, is label efficient, non-iterative, and requires no hyperparameter tuning. It achieves competitive accuracies compared to state-of-the-art methods across various point set classification tasks. Furthermore, our approach exhibits robustness in out-of-distribution scenarios where training and test distributions vary in terms of deformation magnitudes.

I. INTRODUCTION

Point set classification presents significant challenges due to the sparsity and noise in data, the accumulation of spatial deformations (such as affine deformations) in real-world point set data, and the high dimensionality of point sets, among other factors [1], [9]–[11]. Furthermore, defining a metric or distance function for point set classification is challenging due to the permutation-invariant nature of point sets, which results from the arbitrary order of points in a set [1], [12]. Despite these challenges, there has been growing interest in developing new algorithms and techniques for point set classification.
In recent years, several research efforts have focused on point set classification, resulting in the development of various methods to address challenges in this area. Over the last few decades, point set classification methods have evolved from relying on feature engineering [13]–[16] to utilizing deep neural networks to learn representations for classification tasks [1], [3], [17]. Neural networks have emerged as a leading classification framework for point sets, providing end-to-end learning capabilities and eliminating the need for hand-crafted feature engineering. They have been demonstrated to achieve high accuracy in several classification tasks and are also suitable for parallel implementation using graphics processing units (GPUs) [1], [3], [17]. However, the effectiveness of neural network-based methods is often limited by their high data requirements [18], high computational costs [1], and vulnerability to out-of-distribution samples, e.g., adversarial attacks [19]–[21].
While the conventional approach to modeling point sets involves direct processing of their coordinates, an alternative and less commonly used method is to represent a point set as a deformation of another point set [22]. To this end, point set deformation models have been developed utilizing the mathematics of optimal mass transport [22], [23]. These models treat a point set as a smooth, nonlinear, and invertible transformation of a reference point set structure. The estimation of such models can be facilitated through the linear optimal transport (LOT) transform, which has found applications in various fields [23]. The LOT transform of a point set provides a linear embedding that can be used to compare it with other point set data [23]. The LOT transform has been combined with various machine learning techniques and has been used in many applications [20], [23].
This paper introduces a new method for classifying point sets by expanding upon LOT-based modeling frameworks. We start by introducing a transport generative model to define point set classes, where class elements can be conceived as instances of an unknown template point set pattern under the effect of unknown spatial deformations. Using the mathematical properties of the LOT transform, we establish that these point set classes, under our generative model (with certain conditions on the spatial deformations), can be constructed as convex subspaces in the LOT space, which are capable of accommodating the variations in point set data. Subsequently, we propose a nearest-subspace classifier in the LOT space for classifying point sets under the given generative model. Our model is also capable of mathematically encoding invariances by integrating mathematical knowledge of deformations known to be present in the data. In our experiments, we particularly focus on datasets experiencing affine deformations and demonstrate the effectiveness of our method compared to several state-of-the-art methods. Our approach exhibits particular strength in situations characterized by limited training data and in the challenging out-of-distribution setting, where the training and test distributions differ in terms of deformation magnitudes.

II. PRELIMINARIES

A. Linear optimal transport embeddings
The fundamental principle of optimal transport theory relies on quantifying the amount of effort (measured as the product of mass and distance) required to rearrange one distribution into another, which gives rise to the Wasserstein metric between distributions. In the present study, we utilize a linearized version of this metric, as outlined in [23], which is constructed formally through a tangent space approximation of the underlying manifold.
Following the construction in [23], we define the linear optimal transport transform for probability measures in P₂(R^L), the set of absolutely continuous measures with bounded densities and finite second moments¹. For simplicity, let us fix a reference measure σ as the Lebesgue measure on a convex compact subset of R^L. Thanks to Brenier's theorem [24], there is a unique minimizer T^µ_σ to the optimal transportation problem

min_{T : T#σ = µ} ∫ ∥T(x) − x∥² dσ(x),

where the push-forward (transport) relation T#σ = µ means µ(B) = σ(T⁻¹(B)) for every Borel set B ⊆ R^L. The square root of this minimum is the Wasserstein-2 distance between σ and µ [25]. The linear optimal transport (LOT) transform is given by the correspondence µ ↦ T^µ_σ, in which each probability measure µ is identified with the optimal transport map T^µ_σ : R^L → R^L from the fixed reference σ to µ; this map lies in a linear space. The LOT metric between two probability distributions µ, ν ∈ P₂(R^L) is then d_LOT(µ, ν) = ∥T^µ_σ − T^ν_σ∥_{L²(σ)}.

¹Any µ ∈ P₂(R^L) has the following two properties: (i) bounded second moment, i.e., ∫∥x∥² dµ(x) < ∞; (ii) absolute continuity with respect to the Lebesgue measure on R^L with bounded density, i.e., µ has a density function f_µ defined on R^L with ∥f_µ∥_∞ < ∞.
For simplicity, we denote by µ̃ the LOT transform of µ, i.e., µ̃ = T^µ_σ with σ fixed. It turns out that the linearization ability of LOT is closely related to the scope of the following composition property [26], [27]:

T^{g#µ}_σ = g ∘ T^µ_σ,  (4)

where g ∈ T_L, and T_L is the set of all diffeomorphisms from R^L to R^L. In particular, given a convex G ⊆ T_L, the LOT embeddings of measures deformed via maps in G become convex if all g ∈ G satisfy the composition property (4), which is shown more formally below.

Proposition II.1. Let G ⊆ T_L be convex such that every g ∈ G satisfies the composition property (4). Then the set { g ∘ µ̃ : g ∈ G } is convex.
When the dimension L ≥ 2, it is shown in [26] that the composition property (4) can hold for arbitrary µ only when g is a "basic" transformation (more specifically, a translation, an isotropic scaling, or a composition thereof). Luckily, [27] proposes an approximate composition property for perturbations of the aforementioned basic transformations, the set of which we denote as A.

Property 1 (Approximate composition, p. 388 in [27]). Let ϵ ≥ 0 and µ ∈ P₂(R^L). Let g ∈ T_L be such that ∥g − h∥ ≤ ϵ for some h ∈ A. Then there exists some δ (depending on ϵ) such that

∥T^{g#µ}_σ − g ∘ T^µ_σ∥_{L²(σ)} ≤ δ.

Remark: Writing µ̃_g for the LOT transform of g#µ, we have

µ̃_g ≈ g ∘ µ̃,  (6)

with approximation error at most δ.

With the above approximate composition property, one can show the following approximate convexity analog of Proposition II.1 using Lemmas A.3 and A.4 of [27]:

Proposition II.2. Let ϵ ≥ 0 and G ⊆ T_L be convex such that for every g ∈ G there exists some h ∈ A with ∥g − h∥ ≤ ϵ. Then the set { µ̃_g : g ∈ G } is 2δ-convex, i.e., any convex combination of its elements lies within distance 2δ of the set, where δ is given in the above approximate composition property.
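As an illustrative numerical check (jumping ahead to the discrete implementation of the next subsection), the sketch below verifies that, for the basic transformations, translations and isotropic scalings, the composition property holds exactly even in the discrete setting. It assumes numpy and scipy are available; `lot_embedding` is our own illustrative helper, not code from any released implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def lot_embedding(ref, pts):
    """Discrete LOT embedding: reorder `pts` by the optimal assignment
    (squared-Euclidean cost) against the reference point set `ref`."""
    cost = ((ref[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    _, col = linear_sum_assignment(cost)
    return pts[col]

rng = np.random.default_rng(0)
ref = rng.normal(size=(16, 2))
pts = rng.normal(size=(16, 2))

# Translation g(x) = x + x0: LOT(g # P) equals g applied to LOT(P),
# because translating the target adds a permutation-independent term
# to the assignment cost.
x0 = np.array([3.0, -1.5])
lhs_t = lot_embedding(ref, pts + x0)
rhs_t = lot_embedding(ref, pts) + x0

# Isotropic scaling g(x) = a x with a > 0: same exact behavior.
a = 2.5
lhs_s = lot_embedding(ref, a * pts)
rhs_s = a * lot_embedding(ref, pts)

print(np.allclose(lhs_t, rhs_t), np.allclose(lhs_s, rhs_s))  # → True True
```

For general diffeomorphisms (e.g., shears), only the approximate version (6) holds, which is what Property 1 quantifies.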

B. Discrete implementation for point sets
For the analysis of discrete point set data, a discrete version of the Linear Optimal Transport (LOT) embedding is required. In this case, both the reference σ and the target µ are chosen as discrete probability measures, represented by point sets in R^L. A point set in an L-dimensional space is a finite set of points in R^L. A point set Ω_s with N points can be thought of as the image of an injective map s : {1, ..., N} → R^L; we denote the set of all such maps by F_{N,L}. Given a point set Ω_s with N points, we define the discrete probability distribution associated with it as

P_s = (1/N) Σ_{i=1}^N δ_{s(i)},

where δ_x denotes the Dirac measure at x. Given a diffeomorphism g ∈ T_L, the push-forward distribution of P_s under g is g#P_s = (1/N) Σ_{i=1}^N δ_{g(s(i))}. For r, s ∈ F_{N,L}, the optimal transportation (Wasserstein-2) distance between the associated distributions P_s and P_r can be obtained by solving the linear programming problem

min_π Σ_{i=1}^N Σ_{j=1}^N π_{ij} ∥r(i) − s(j)∥²,  (10)

where π_{ij} ≥ 0, Σ_j π_{ij} = 1/N, and Σ_i π_{ij} = 1/N. Let us fix some r ∈ F_{N,L} and use P_r as a reference. It turns out that any minimizer matrix π* of the optimal transport problem in (10) is a (scaled) permutation matrix [25]. In other words, there is a permutation σ*_s of {1, ..., N} such that π*_{ij} = (1/N) if j = σ*_s(i) and 0 otherwise. Hence, with r fixed, an optimal transport map between P_r and P_s is determined by σ*_s and s. The LOT transform of P_s is defined as [23]

P̃_s = [ s(σ*_s(1)), s(σ*_s(2)), ..., s(σ*_s(N)) ],  (11)

and the LOT distance between two point set measures is

d_LOT(P_s, P_q) = ∥P̃_s − P̃_q∥,  (12)

where s, q ∈ F_{N,L}.
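The discrete LOT transform and LOT distance above can be sketched as follows, using scipy's Hungarian solver for the assignment problem in (10); function names are ours, chosen for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def lot_transform(r, s):
    """Discrete LOT transform of the point set `s` (N x L array) with
    respect to the reference `r` (N x L): solve the N x N assignment
    problem with squared-Euclidean costs and return `s` reordered by
    the optimal permutation sigma*, i.e., row i is s[sigma*(i)]."""
    cost = ((r[:, None, :] - s[None, :, :]) ** 2).sum(-1)
    _, sigma = linear_sum_assignment(cost)  # minimizer is a permutation
    return s[sigma]

def lot_distance(r, s, q):
    """LOT distance between the point set measures P_s and P_q, both
    embedded against the same reference r (cf. equation (12))."""
    return np.linalg.norm(lot_transform(r, s) - lot_transform(r, q))

rng = np.random.default_rng(1)
r = rng.normal(size=(32, 3))
s = rng.normal(size=(32, 3))

# Reordering the points of s does not change its LOT embedding, so the
# LOT distance between a set and a permuted copy of itself is zero:
print(lot_distance(r, s, s[::-1].copy()) < 1e-9)  # → True
```

This permutation invariance is what lets the LOT embedding sidestep the arbitrary ordering of points within a set.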

III. TRANSPORT BASED CLASSIFICATION PROBLEM STATEMENT
In this section, we present a generative model-based problem statement for the type of classification problems we discuss in this paper, building upon the preliminaries established earlier. Our focus is on point set classification, where every class can be viewed as a collection of instances of a prototype point set pattern (a template) observed under unknown spatial deformations. To formalize this concept, we introduce a generative model that provides a formal approach to characterizing point set data of this type.
Generative model: Let G_L ⊂ T_L be a set of smooth one-to-one transformations in an L-dimensional space. The mass-preserving generative model for the k-th class is defined to be the set

S^(k) = { P_{s_j^(k)} : P_{s_j^(k)} = g_j # P_{φ^(k)}, g_j ∈ G_L },  (13)

where P_{φ^(k)} corresponds to the point set distribution of the prototype template pattern for the k-th class, and P_{s_j^(k)} represents the point set distribution of the j-th sample from the k-th class in S^(k). With these definitions, we can now construct a formal mathematical description of the generative model-based problem statement for point set classification.
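As a concrete illustration, the snippet below samples a class under the generative model (13) by applying random affine maps (anisotropic scaling, shear, and translation, the deformation family emphasized in our experiments) to a template point set. The parameter ranges and helper name `random_affine` are illustrative, not the paper's exact experimental settings.

```python
import numpy as np

def random_affine(rng, max_shear=0.3, scale_range=(0.7, 1.3), max_shift=1.0):
    """Draw a random 2-D affine map g(x) = A x + b combining anisotropic
    scaling, shear, and translation. Ranges are illustrative only."""
    A = np.diag(rng.uniform(*scale_range, size=2))  # anisotropic scaling
    A[0, 1] = rng.uniform(-max_shear, max_shear)    # shear
    b = rng.uniform(-max_shift, max_shift, size=2)  # translation
    return lambda x: x @ A.T + b

rng = np.random.default_rng(0)
template = rng.normal(size=(64, 2))  # prototype pattern phi^(k)

# Class k under (13): push-forwards of the template by maps in G_L.
class_k = [random_affine(rng)(template) for _ in range(10)]
print(len(class_k), class_k[0].shape)  # → 10 (64, 2)
```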
Classification problem: Let the sets of point set distributions S^(k) be given as in equation (13). Given training samples {P_{s_j^(1)}} from class 1, {P_{s_j^(2)}} from class 2, and so on, the goal is to determine the class label of an unknown test sample P_s. Note that the generative model in equation (13) describes set-structured point set data, which makes it challenging to compare point sets due to their permutation-invariant nature. The generative model above is also not guaranteed to be convex, presenting challenges for effective classification using machine learning techniques. In the subsequent sections, we present a solution to the above classification problem, first by restructuring the point clouds through linear optimal transport (LOT) embeddings, and then by approximating the resulting convex spaces with subspaces, as done in many image [21], [28], signal [29], [30], and gradient distribution [31] classification problems.

IV. PROPOSED SOLUTION
The LOT transform, which was previously described in Section II, can significantly simplify the classification problem described earlier by providing a convex linear embedding for the set-structured point set data. Let us first investigate the generative model in equation (13) in the LOT transform space. Applying the approximate composition property (equation (6)) to the generative model in equation (13), we have the LOT-space generative model

S̃^(k) = { P̃_{s_j^(k)} : P̃_{s_j^(k)} ≈ g_j ∘ P̃_{φ^(k)}, g_j ∈ G_L },  (14)

where P̃_{s_j^(k)} and P̃_{φ^(k)} refer to the LOT embeddings of P_{s_j^(k)} and P_{φ^(k)}, respectively, with respect to a reference structure P_r (see equation (11)). Based on the preliminary results presented in Section II (Property 1, Proposition II.2, and related results), it is possible to establish the convexity of the set S̃^(k) up to a certain bound, subject to certain constraints. Furthermore, we can show that when S^(k) ∩ S^(p) = ∅, the intersection of S̃^(k) with S̃^(p) is also empty [21].

A. Training phase
Based on the aforementioned theoretical discussions, we put forward a straightforward, non-iterative training approach for the classification method. This involves computing a projection matrix that maps each sample in the LOT space onto the subspace V^(k) (as outlined in [21]), generated by the 2δ-convex set S̃^(k). Specifically, we estimate the projection matrix by applying the following procedure. Given a set of training samples for the k-th class, denoted as {P_{s_1^(k)}, P_{s_2^(k)}, ...}, the first step in our proposed method is to apply the LOT transform to them using a reference distribution P_{r^(k)}. This results in the transformed samples {P̃_{s_1^(k)}, P̃_{s_2^(k)}, ...}. The reference P_{r^(k)} is obtained by selecting a point set at random from the training set, followed by the introduction of random perturbations. Subsequently, we estimate V^(k) as

V^(k) = span{ P̃_{s_1^(k)}, P̃_{s_2^(k)}, ... }.

The proposed method also provides a structure to mathematically encode invariances with respect to deformations that are known to be present in the data [21], [28]. In this paper, we prescribe methods to encode invariances with respect to a set of affine transformations: translation, isotropic and anisotropic scaling, and shear. The deformation types used for encoding invariances and the corresponding spanning sets are as follows:

1) Translation: Let g(x) = x + x_0 be the translation by x_0 ∈ R^L, and let P_{s_g} = g#P_s. Using equation (6), we have P̃_{s_g} ≈ g ∘ P̃_s = P̃_s + x_0, where P̃_s = ((P̃_s)_1, (P̃_s)_2, ..., (P̃_s)_L) denotes the columns of the embedding. Therefore, as in [21], [28], we define the spanning set for translation as U_T = {u_1, ..., u_L}, where u_i is the N × L matrix whose i-th column is all ones and whose remaining columns are zero.

2) Isotropic scaling: Let g(x) = ax be the normalized isotropic scaling of P_s by a, where a ∈ R_+ and P_{s_g} = g#P_s. Using equation (6), we have P̃_{s_g} ≈ g ∘ P̃_s = a P̃_s. As in [21], [28], an additional spanning set for isotropic scaling is not required, since the subspace containing P̃_s naturally contains its scalar multiples a P̃_s. Therefore, the spanning set for isotropic scaling is defined as U_{D_0} = ∅.

3) Anisotropic scaling: Let g(x) = Dx, with D = diag(a_1, ..., a_L), be the normalized anisotropic scaling of P_s, where a_i ≠ a_j for i ≠ j, a_i ∈ R_+, and P_{s_g} = g#P_s. Using equation (6), we have P̃_{s_g} ≈ g ∘ P̃_s = D P̃_s = (a_1 (P̃_s)_1, a_2 (P̃_s)_2, ..., a_L (P̃_s)_L). Therefore, the spanning set for anisotropic scaling is defined as U_D = {d_1, ..., d_L}, where d_i is the N × L matrix whose i-th column equals (P̃_s)_i and whose remaining columns are zero.

4) Shear: Let g(x) = Hx be the normalized shear of P_s, where H is the identity matrix with a single nonzero off-diagonal entry h_{ij} (i ≠ j), and P_{s_g} = g#P_s. Using equation (6), we have P̃_{s_g} ≈ H P̃_s = P̃_s + h_{ij} w_{ij}, where w_{ij} is the N × L matrix whose i-th column equals (P̃_s)_j and whose remaining columns are zero. Therefore, the spanning set for shear is defined as U_H = { w_{ij} : i ≠ j }.

Finally, in light of the preceding discussion, we can approximate the enriched subspace as

V_E^(k) = span( { P̃_{s_j^(k)} }_j ∪ U_T ∪ U_{D_0} ∪ U_D ∪ U_H ),

where the spanning sets are computed from the LOT-embedded training samples.
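The training phase can be sketched as follows, assuming numpy/scipy; this is a simplified illustration in which the spanning vectors mirror the translation and anisotropic-scaling items (shear handled analogously and omitted for brevity), samples and spanning matrices are flattened into column vectors, and the basis is truncated by an explained-variance criterion. All function names are ours.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def lot_transform(r, s):
    """Discrete LOT transform: reorder `s` by the optimal assignment
    against the reference point set `r` (squared-Euclidean cost)."""
    cost = ((r[:, None, :] - s[None, :, :]) ** 2).sum(-1)
    _, sigma = linear_sum_assignment(cost)
    return s[sigma]

def spanning_vectors(tilde_s):
    """Flattened spanning sets for one LOT-embedded sample (N x L):
    translation (all-ones columns) and anisotropic scaling (per-axis
    copies of the embedding's columns)."""
    N, L = tilde_s.shape
    vecs = []
    for i in range(L):
        u = np.zeros((N, L)); u[:, i] = 1.0            # translate axis i
        vecs.append(u.ravel())
        d = np.zeros((N, L)); d[:, i] = tilde_s[:, i]  # scale axis i
        vecs.append(d.ravel())
    return vecs

def train_class_subspace(ref, train_sets, var_kept=0.99):
    """Orthonormal basis B^(k) of the enriched subspace V_E^(k):
    LOT-embed each training set, append its spanning vectors, and keep
    the leading left singular vectors capturing `var_kept` variance."""
    cols = []
    for s in train_sets:
        t = lot_transform(ref, s)
        cols.append(t.ravel())
        cols.extend(spanning_vectors(t))
    X = np.stack(cols, axis=1)                  # one vector per column
    U, svals, _ = np.linalg.svd(X, full_matrices=False)
    energy = np.cumsum(svals**2) / np.sum(svals**2)
    k = int(np.searchsorted(energy, var_kept)) + 1
    return U[:, :k]                             # columns span V_E^(k)

rng = np.random.default_rng(2)
ref = rng.normal(size=(8, 2))
train_sets = [rng.normal(size=(8, 2)) for _ in range(3)]
B = train_class_subspace(ref, train_sets)
print(B.shape[0])  # → 16, i.e., N * L
```

Note that the procedure is non-iterative: a single assignment solve per sample plus one SVD per class.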

B. Testing phase
To classify a given test sample P_s, we first apply the LOT transform to P_s to obtain its corresponding LOT space representation P̃_{s,r^(k)} with respect to the reference P_{r^(k)} (which was pre-selected during the training phase). Assuming that the test samples originate from the generative model presented in equation (13) (or equation (14)), we can determine the class of an unknown test sample P_s using the following expression:

arg min_k d( P̃_{s,r^(k)}, V_E^(k) ),

where d(·, ·) is the distance between the test sample and a trained subspace in the LOT transform space. We estimate the distance between P̃_{s,r^(k)} and the trained subspaces as d( P̃_{s,r^(k)}, V_E^(k) ) = ∥ P̃_{s,r^(k)} − B^(k) B^(k)ᵀ P̃_{s,r^(k)} ∥_{L2}, where the matrix B^(k) contains the orthonormal basis vectors of the subspace V_E^(k) arranged in its columns.
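A minimal sketch of this testing rule, assuming numpy and orthonormal bases B^(k); the toy subspaces below are ours, purely for illustration.

```python
import numpy as np

def subspace_distance(x, B):
    """Distance from the embedded test sample x to the subspace with
    orthonormal basis B (columns): || x - B B^T x ||_2."""
    return np.linalg.norm(x - B @ (B.T @ x))

def classify(x_per_class, bases):
    """Nearest-subspace rule: x_per_class[k] is the test sample
    LOT-embedded against class k's reference; return the class whose
    trained subspace is closest."""
    d = [subspace_distance(x, B) for x, B in zip(x_per_class, bases)]
    return int(np.argmin(d))

# Toy check with two one-dimensional subspaces of R^3.
B0 = np.array([[1.0], [0.0], [0.0]])
B1 = np.array([[0.0], [1.0], [0.0]])
x = np.array([2.0, 0.1, 0.0])
print(classify([x, x], [B0, B1]))  # → 0 (x lies closest to span{e1})
```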

V. EXPERIMENTS

A. Experimental setup
Our objective is to analyze how the proposed method performs compared to state-of-the-art approaches in terms of classification accuracy, required training data, and robustness in out-of-distribution scenarios under a limited training data setting. To achieve this, we created training sets of varying sizes from the original training set of each dataset under examination. We then trained the models using these training sets and assessed their performance on the original test set. Each train split was generated by randomly selecting (without replacement) samples from the original training set, and we repeated the experiments for each split size ten times. The same train-test data samples were used for all algorithms in each split.
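The split-generation protocol above can be sketched as follows; the split sizes, repeat count, and seed here are illustrative placeholders, not the paper's exact settings.

```python
import numpy as np

def make_splits(n_train_total, sizes, repeats=10, seed=0):
    """Index sets for the evaluation protocol: for each split size,
    draw `repeats` random subsets (without replacement) of the
    original training indices, so every algorithm sees the same
    train-test samples per split."""
    rng = np.random.default_rng(seed)
    return {m: [rng.choice(n_train_total, size=m, replace=False)
                for _ in range(repeats)]
            for m in sizes}

splits = make_splits(1000, sizes=(10, 50, 100))
print(sorted(splits), len(splits[10]))  # → [10, 50, 100] 10
```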
In order to assess the effectiveness of the proposed approach, we utilized several comparison methods. These included PointNet [1], DGCNN [17], and a multilayer perceptron (MLP) [32] in the FSpool feature embedding space [16]. We also conducted a comparative analysis with various conventional machine learning techniques across different set feature embedding spaces: logistic regression (LR), kernel support vector machine (k-SVM), multilayer perceptron (MLP), and nearest subspace (NS) classifier models [32] in the GeM1, GeM2, GeM4 [13], COVpool [14], [15], and FSpool [16] embedding spaces. The performance of the proposed method was evaluated in relation to these baselines, in addition to the out-of-distribution experiments. In the proposed method, we selected the number of basis vectors for the subspaces V_E^(k) such that the chosen basis vectors captured up to 99% of the total variance explained by the k-th class.
To assess the relative performance of the methods, we evaluated them on several datasets, including the Point cloud MNIST [33], [34], ModelNet [35], and ShapeNet [36] datasets. We additionally applied random translations, anisotropic scalings, and shear transformations to both the training and test sets of these datasets. For the ShapeNet dataset, we tested the methods under two experimental setups: the regular setup, where both the training and test sets contained point sets at the same deformation magnitude level, and the out-of-distribution setup, where the training and test sets contained point sets at different deformation magnitude levels.

B. Accuracy in synthetic case
We first evaluated the effectiveness of the proposed method by comparing it with other state-of-the-art techniques on two synthetic datasets. The synthetic datasets were generated by selecting one sample per class from the Point cloud MNIST and ShapeNet datasets, followed by introducing random translations, anisotropic scalings, and shear transformations to each selected sample to generate training and test sets. Specifically, the training set consisted of two samples per class, while the test set comprised 25 samples per class. The comparative results are displayed in Fig. 1. As observed, the proposed method substantially outperformed the other methods in this synthetic scenario.

C. Accuracy and efficiency in real datasets
We conducted a performance evaluation of the proposed method by comparing it with several state-of-the-art techniques, including PointNet, DGCNN, and an MLP in the FSpool feature embedding space, on the MNIST, ShapeNet, and ModelNet datasets. Fig. 2 presents the average test accuracy values obtained for different numbers of training samples per class. The results demonstrate that our proposed method outperformed the other methods across the range of training sample sizes used to train the models. Notably, the proposed method's accuracy vs. training size curves exhibited a smoother trend in most cases compared to the other methods.

D. Out-of-distribution robustness
To assess the effectiveness of the proposed method under the out-of-distribution setting, we introduced a gap between the magnitudes of deformations in the training and test sets. Specifically, we used G_out as the deformation set for the 'out-distribution' test set, while G_in was the deformation set for the 'in-distribution' training set. We trained the models using the 'in-distribution' data and tested them using the 'out-distribution' data. For our out-of-distribution experiment, we used the ShapeNet dataset with small deformations as the 'in-distribution' training set and the ShapeNet dataset with larger deformations as the 'out-distribution' test set (see Fig. 3). The results show that the proposed method outperformed the other methods by an even more significant margin under this challenging setup, as shown in Fig. 3. Under this setup, the proposed method obtained accuracy figures close to those in the standard experimental setup (i.e., ShapeNet in Fig. 2). On the other hand, the accuracy of the other methods declined significantly under the out-of-distribution setup compared to the standard experimental setup (see the ShapeNet results in Figs. 2 and 3).

E. Comparison with set-embedding-based methods
We further evaluated the proposed method against various set-embedding-based approaches in combination with classical machine learning methods. The study involved comparing the proposed method with different classifier techniques, including LR, k-SVM, MLP, and NS [32], employed with various set-to-vector embedding methods, such as GeM (1, 2, 4) [13], COVpool [14], [15], and FSpool [16]. Fig. 4 illustrates the percentage test accuracy results obtained from these experiments, along with the results of the proposed method for comparison. As shown in Fig. 4, the proposed method outperformed all these models in terms of test accuracy.

VI. DISCUSSION
This paper presents a new method for classifying point sets using linear optimal transport (LOT) subspaces. Our method is appropriate for problems where the data at hand can be represented as instances of prototype template point set patterns observed under smooth, nonlinear, and one-to-one transformations. The results achieved in different experimental scenarios indicate that our proposed approach can deliver accuracy results comparable to state-of-the-art methods, provided that the data adhere to the generative model specified in equation (13). Additionally, the nearest LOT subspace technique was shown to be more data-efficient in these cases, meaning that it can attain higher accuracy levels using fewer training samples.
Our proposed method maintains high classification accuracy even under challenging out-of-distribution experimental conditions, as depicted in Fig. 3, whereas the accuracy figures of the other methods decline sharply. These results indicate that our method provides a better overall representation of the underlying data distribution, resulting in robust classification performance. The key to achieving better accuracy under out-of-distribution conditions is that our method learns not only the deformations present in the data but also the underlying data model, including the types of deformations, such as translation, scaling, and shear, and their respective magnitudes. These deformation types can be learned from just a few training samples containing those deformations, as well as potentially from the mathematically prescribed invariances proposed in [28].
Our proposed method, which utilizes the nearest subspace classifier in the LOT domain, is more suitable for classification problems in the above category than general set embedding methods combined with classical machine learning classifiers, as demonstrated by its classification performance. Typically, point set data classes in their original domain do not form effective linear embeddings, and commonly used set-to-vector representation techniques are inadequate for generating such embeddings, as indicated by the results. This presents a significant challenge for any machine learning approach to perform effectively. However, the subspace model is appropriate in the LOT domain, since the LOT transform provides a linear embedding and a convex data geometry. Moreover, considering the subspace model in the LOT space improves the generative nature of our proposed classification method by implicitly including the data points from the convex combinations of the provided training data points.

VII. CONCLUSIONS
In this paper, we propose an end-to-end classification system designed for a specific category of point set classification problems, where a data class can be considered a collection of instances of a template pattern observed under a set of spatial deformations. If these deformations are appropriately modeled as a collection of smooth, one-to-one, and nonlinear transformations, then the data classes become easily separable in the transform space, specifically the LOT space, due to the properties outlined in this paper. These properties also enable the approximation of data classes as convex subspaces in the LOT space, resulting in a data model well suited to the nearest subspace method. As we observed in our experiments, this approach yields high accuracy and robustness against out-of-distribution conditions. Many point set classification problems can be formulated in this way, and therefore our proposed solution has wide applicability.
Finally, we note that there can be many potential adaptations of the proposed method. For instance, the linear subspace method in the LOT space could be adjusted to incorporate alternative assumptions regarding the set that best represents each class. While some problems might benefit from a linear subspace method similar to the one described earlier, where all linear combinations are allowed, other problems may require constraining the model using convex hulls. Additionally, investigating the sliced-Wasserstein distance using the discrete CDT transform (as proposed in [31]) in conjunction with subspace models is another promising avenue for future research.
Our proposed approach provides promising results in point set classification and serves as a basis for further exploration in this domain. As the amount of 3D (or N-D) data continues to increase and accurate object recognition and scene understanding become more crucial, we believe that the combination of linear optimal transport embeddings and subspace modeling in the transform space will become increasingly significant in this context. We anticipate that our proposed method will inspire further research in this direction and lead to novel developments in recognizing 3D (or N-D) objects and distributions.

Fig. 1. Percentage test accuracy comparison of different methods on synthetic datasets.


Fig. 2. The accuracy of different methods as a function of the number of training samples per class, evaluated on MNIST, ModelNet, and ShapeNet datasets.

Fig. 3. Performance assessment under an out-of-distribution experimental setup where training and test distributions vary in terms of deformation magnitudes. The performance of the methods was assessed in terms of percentage test accuracy and plotted against the number of training images per class.

Fig. 4. Comparative analysis of the percentage test accuracy results attained by the proposed method and the conventional machine learning techniques implemented across different feature embedding spaces.