Imaging-based representation and stratification of intra-tumor heterogeneity via tree-edit distance

Personalized medicine is the future of medical practice. In oncology, tumor heterogeneity assessment represents a pivotal step for effective treatment planning and prognosis prediction. Despite new procedures for DNA sequencing and analysis, non-invasive methods for tumor characterization are needed to impact on daily routine. On purpose, imaging texture analysis is rapidly scaling, holding the promise to surrogate histopathological assessment of tumor lesions. In this work, we propose a tree-based representation strategy for describing intra-tumor heterogeneity of patients affected by metastatic cancer. We leverage radiomics information extracted from PET/CT imaging and we provide an exhaustive and easily readable summary of the disease spreading. We exploit this novel patient representation to perform cancer subtyping according to hierarchical clustering technique. To this purpose, a new heterogeneity-based distance between trees is defined and applied to a case study of prostate cancer. Clusters interpretation is explored in terms of concordance with severity status, tumor burden and biological characteristics. Results are promising, as the proposed method outperforms current literature approaches. Ultimately, the proposed method draws a general analysis framework that would allow to extract knowledge from daily acquired imaging data of patients and provide insights for effective treatment planning.


Patients' personal information summary
Supplementary Tables S1 and S2 summarize the patients' population.

Distance metrics for trees: brief literature review
In this section we present a high level overview of the literature concerning the comparison between different trees.
Dendrograms are unlabelled object which, in our context, may have a different number of leaves and do not hold any a-priori correspondence between the leaves in different objects. The literature dealing with the comparison of dendrograms divide in two macro-areas, including (1) metrics defined for clustering dendrograms and (2) metrics designed for merge trees. The first family of metrics mainly deals with labeled trees [1,2] as byproducts of a hierarchical clustering algorithm. We refer to Flesia et al. [3] for an exhaustive review of distance definitions. This kind of metrics are known to be heavily dependent on the graph structure of the dendrograms, like many others [4,5,6,7], leading to limitations when comparing dendrograms with a different number of leaves and lacking theoretical continuity results with respect to dendrogram-associated point clouds [8,9]. On the other hand, within topological data analysis [10], dendrograms are often referred as a particular case of merge trees, obtained when all the leaves of a merge tree lie at height 0. This allows to transfer merge trees literature to dendrogram analysis. Most of the metrics belonging to this second family, however, share one main drawback, namely the out of reach computational cost, which makes them unsuitable for our application [11,12,13]. Besides, the metrics with more performing algorithms [14,15] still lack the theoretical investigation to assess some practical properties, making them less worthy than others.

Building the vertices heights curve
In Figure 1, we explain in details how to build the curve displaying the filtered heights of one tree' vertices. In the plot there are three detailed examples showing how to obtain the curves in Figure 2a. The colors of the curves match the colors of the trees. For each tree and for every h ∈ R, the value f (h) is equal to the number of vertices (highlighted with dots) which lie above h. For instance consider the blue dendrogram: we obtain a value of 3 until we reach the height where two leaves merge, and then 1 for a small interval of values, and lastly 0 from the height of the root of the blue tree onwards.

Dendrograms construction
In this section we present few technical definition that we need in order to describe the dendrogram representation we employ. We describe the procedure in the general case of having a finite metric space (X, d) i.e. a finite set {x 1 , . . . , x n } with a metric d : X ×X → R ≥0 which is reflexive, symmetric and satisfies the triangular inequality. In our case we work with {x 1 , . . . , x n } ⊂ R n and the Euclidean norm.  Now, to obtain a dendrogram we need to add some kind of length measure to a tree structure.

Definition 2.
A merge tree (T, f ) is a finite tree structure T coupled with a monotone increasing function (with respect to partial ordering on V T ) f : V T → R. If f (l) = 0 for all l ∈ L T , then we say that the merge tree is a dendrogram. The function f also defines a weight value for every edge e = (v, f ather(v)): To build a hierarchical clustering dendrogram T C from a finite metric space (C, d C ) we proceed as follows. With K we indicate the set of clusters we are considering: (S0) at the beginning K = {{c} | c ∈ C}, and every c ∈ C is associated to a leaf v c ∈ V T C with f (v c ) = 0; (S1) consider all the couples of clusters k 1 , k 2 ∈ K and we measure the distance d(k 1 , k 2 ) according to some linkage; Then remove k and k ′ from K and add k ∪ k ′ to K; (S3) start again from (S1) unless K = C.
The linkage determines the distance d(k 1 , k 2 ) between k 1 , k 2 ⊂ C and the most common examples are: where #k i is the cardinality of the finite set k i .
• ward linkage: see [16] It is well known that single linkage is very sensitive to outliers, while complete linkage is the most conservative choice in term of clustering points together. Average linkage displays a kind of in-between behaviour. For this reason we resorted to average linkage.

3/10
Having a continuity result of the distance between dendrograms with respect to some metric between point clouds would surely benefit the consistency and the interpretability of the framework: whenever a representation of a datum is employed, looking at how changes in the representation reflect on changes in the initial datum can help both assessing the consistency of the pipeline, and familiarizing with the representation. In our case we are particularly interested in looking at what happens at the distance between dendrograms as two sequences of point clouds get closer and closer. To do so, we introduce the definitions of the Hausdorff and Gromov-Hausdorff metrics between point clouds. We then prove the continuity proposition between Hausdorff distance and the Edit distance, from which it follows the continuity proposition between Gromov-Hausdorff distance and the Edit distance. Given C = {x 1 , . . . , x n } and C ′ = {y 1 , . . . , y m } two point clouds in a metric space (X, d), we can build at least a function γ : C → C ′ such that γ(x i ) is (one of) the closest point(s) to x i , belonging to the cloud C ′ . Similarly, we can build ϕ : C ′ → C so that ϕ(y j ) is (one of) the closest point(s) to y j , belonging to the cloud C. The Hausdorff distance between C and C ′ is given by: The distance d H has been proven to be a metric for the space of all compact subsets of X [17].
Leveraging on the Hausdorff distance, given two compact metric spaces X and Y we define the Gromov-Hausdorff metric where γ and ϕ vary over all possible isometries of (respectively) X and Y into another (common) metric space Z [18]. Consider two point clouds in the metric space (X, d), C = {x 1 , . . . , x n } and C ′ = {y 1 , . . . , y m } and we consider T C and T C ′ the single linkage hierarchical clustering dendrograms obtained from C and C ′ respectively. In the following, we prove the following result: Proposition 1. Given C = {x 1 , . . . , x n } and C ′ = {y 1 , . . . , y m } point clouds in a metric space (X, d) and given T C and T C ′ single linkage hierarchical clustering dendrograms obtained from C and C ′ respectively, there is a simplicial complex S and two functions f : S → R and g : S → R such that the merge tree associated to f (via sublevel set filtration) is isomorphic to T C , the merge tree associated to g is isomorphic to T C ′ , and Proof. Let γ : C → C ′ and ϕ : C ′ → C be the two operators which map a point of a point cloud C ′ to (one of) the closest point(s) of the other cloud C and viceversa. Consider the following simplicial complex S. Its 0 simplices are x 1 , . . . , x n , y 1 , . . . , y m and its 1 simplices are all possible edges between 0 simplices, forming a complete graph. Now we define two functions f : S → R and g : S → R such that the merge trees T f and T g obtained with the lower star filtration from f and g (see [19], Section 2) are isomorphic to T C and T C ′ . Define: f (s) = 0 for every 0 simplex s. Then for a 1 simplex of the form e i j = (x i , x j ), we have f (e i j ) = d(x i , x j ). For 1 simplices of the form e ′ i j = (y i , x j ) we have f (e ′ i j ) = d(ϕ(y i ), x j ). Lastly, for 1 simplices of the form e ′′ i j = (y i , y j ) we have f (e ′′ i j ) = d(ϕ(y i ), ϕ(y j )). Note that t ∈ Im( f ) iff t = d(x i , x j ) for some i and j. Clearly, f is a finite set and we can order it: t 0 = 0 < t 1 < . . ..

Similarly we define g(e ′′
i j ) = d(y i , y j ), g(e ′ i j ) = (y i , γ(x j )) and f (e ′′ i j ) = (y i , y j ). Consider now the connected components of the graph S f t := {s ∈ S| f (s) ≤ t} for t ∈ R. If t < 0, S f t is empty. If t = t 0 = 0, then all 0 simplices are in S f 0 , plus the 1 simplices of the form (x i , y j ) such that ϕ(y j ) = x i and (y i , y j ) such that ϕ(y i ) = ϕ(y j ). This means that every vertex y i is connected with exactly one point x j and with all other y k such that ϕ(y k ) = x j . That is, there are n path connected components, one for each x i . Call such components [x i ].
Consider the value t = t 1 = d(x i , x j ). For every y ∈ ϕ −1 (x i ) and y ′ ∈ ϕ −1 (x j ), we have f ((y, y ′ )) = f ((x i , y ′ )) = f ((y, x j )) = f ((x i , x j )) = d(x i , x j ) and so all these 1 simplices get added, when passing from S f 0 to S f t 1 . Moreover, these are the only ones which get added. Which means that we get all possible edges between [x i ] and [x j ] but all others components are left unchanged. And this happens whenever we hit a level t k = d(x i , x j ): we add to the simplicial complexes

4/10
Thus, we can define a map η : which is an isomorphism of merge trees. An analogous proof yields the isomorphism between T C ′ and T g .
To conclude the proof it is enough to notice that: || f − g|| ∞ ≤ 2ε with ε = d H (C,C ′ ). In fact, for vertices s: f (s) = g(s) = 0. For an edge e, we have the following possibilities: • e = (y i , y j ): | f (e) − g(e)| = |d(ϕ(y i ), ϕ(y j )) − d(y i , y j )|; reasoning as above we obtain | f (e) − g(e)| ≤ 2ε Corollary 1. Given C = {x 1 , . . . , x n } and C ′ = {y 1 , . . . , y m } point clouds in (X, d) metric space, and given T C and T C ′ the single linkage hierarchical clustering dendrograms obtained from C and C ′ respectively, we have d E (T C , Proof. We apply Proposition 1 and then we are in the position to use Theorem 1 in [19] to obtain that d E (T f , T g ) ≤ 2(2d H (C,C ′ )) · (n + m).
With the above results, we can prove a last corollary involving the Gromov-Hausdorff distance between compact metric spaces.
Corollary 2. Given two finite metric spaces C = {x 1 , . . . , x n } and C ′ = {y 1 , . . . , y m } and given T C and T C ′ the single linkage hierarchical clustering dendrograms obtained from C and C ′ respectively, we have d E (T C , Proof. We apply Proposition 1 and Corollary 1 on the images γ(X) and ϕ(Y ) for every γ : X → Z, ϕ : Y → Z isometries, and for every Z metric space.
6 Proof about d µ P being a metric We prove the following proposition. Symmetry is obvious.
The triangle inequality holds for d E and so The linearity of the integral then entails d In this section, we test the metric d P µ and the whole pipeline employed in the case study of the main manuscript in a supervisedin a broad sense -and easier setting. In particular, the aim of this simulation is to showcase the differences between d E and d P µ and to which extent d P µ captures heterogeneity in a point cloud. We generate point clouds in R 2 according to two generating processes. The size n i 1 of the i-th point cloud of the first group is sampled uniformly from [2, 20] Z and then a sample of size (n i 1 , 2) is taken from a normal distribution N (0, σ 1 ), with σ 1 = 1. Similarly, the j-th point cloud of the second group has cardinality n j 2 sampled uniformly from [2, 10] Z, and the cloud itself is taken as a sample of size (n j 2 , 2) distributed according to N (0, σ 2 ), with σ 2 = 2. The data set of point clouds contains 50 clouds of the first group and 50 of the second group.
From the data-generating processes it is clear that the sources of variability between the two groups arise potentially from the different cardinalities of the point clouds and variance within each cloud. We want to show that, while the metric d E is susceptible to both kind of variability, d P µ , with an appropriately chosen measure µ, can mitigate the variability coming from higher cardinalities in the clouds sampled according to the first process. In particular, group 1 is expected to display a lower level of heterogeneity within each point cloud and thus those trees, for our purposes, should be regarded as more similar between each other compared to the other trees. The second group instead may not display a clear clustering structure, in fact, despite exhibiting a common level of heterogeneity, the different number of leaves and the different merging structure at the level of very heterogeneous leaves could prevent all such dendrograms to form a recognizable cluster -or, equivalently, could give birth to a cluster with higher dispersion.
Following the pipeline presented in the main manuscript, we extract average linkage hierarchical clustering dendrograms from the set of point clouds and take pairwise distances both with d E and d P µ . Examples of dendrograms belonging to the first and second groups can be found, respectively, in Fig. S3b and S3c. We select µ as in the main manuscript, Section 4.3.2, with the final choice being a Beta distribution with parameters a = 2, b = 8, as shown in Fig. S2a. The two matrices are reported in Fig.  S2, with data being ordered according to the two groups: the first 50 point clouds belong to the first group, and the following 50 to the second. By visual inspection of Fig. S2b and S2c we can clearly see that d E sees very little structure in the data, because of the two sources of variability (cardinality and variance) mixing up and preventing d E to discriminate between group 1 and 2. Instead d P µ recognizes a clear and pronounced cluster made by point clouds from group 1 plus, potentially, some other point clouds belonging to group 2. The rest of the point clouds of group 2 still show some agglomerative structure, but less evident. The matrix in Fig. S2d shows the pointwise differences between the values obtained with d E and d P µ , highlighting how the different behaviour of the two metrics concentrates on the data belonging to the first group.
To get more insights into the clustering structures expressed by d E and d P µ we extract the hierarchical clustering dendrograms with average linkage from the two matrices. These dendrograms are reported in Fig. S3a. The leftmost tree is obtained from d E and the central from d P µ . To better compare the clustering structures we remove from this last dendrogram the outlier (v53), obtaining the rightmost tree.
Visual inspection of the dendrograms in Fig. S3a reveals a two-clusters structure in both metric spaces, with this structure being much more recognizable in the metric space induced by d P µ . In particular, the rightmost dendrogram shows a very cohesive and compact cluster, with very low internal variability, which is absent in the leftmost tree. The other cluster of the same tree, instead, displays a much higher level of variability. Now we show that this clustering structure reflects the group structure that generated our data. We cut the rightmost tree to obtain two clusters. Then, for each cluster, we aggregate the points contained in the data of such cluster and we estimate the marginal densities from the obtained samples. The results of this estimation pipeline is showcased in Fig. S4. We see that we retrieve the two distributions which we used to generate the components of the point clouds of the two groups. This is precisely the behaviour we aimed to achieve: being insensitive to the cardinality of small homogeneous features, while still being sensitive to cardinalities and merging structures characterized by high heterogeneity. Supplementary Fig. S5 integrates the results, in terms of cluster characterization. (c) Matrix of pairwise distances obtained with d P µ .

Additional plots for clustering interpretation
(d) Absolute differences between the matrix obtained with d E and d P µ .

Fig. S 2.
The plot in the left upper corner is used to fix µ in the case study of Section 7, according to the procedure detailed in Figure 7 of the manuscript; the other figures show the pairwise distance matrices obtained in the case study of Section 7.