Wednesday, April 16, 11 AM
Bldg. 38A, B2 Conference room

Branching principal components for data analysis and dimension reduction with biological applications

Prof. Alexander Gorban
Chair in Applied Mathematics, University of Leicester, UK

In 1901, Karl Pearson explained to the scientific community that the problem of data approximation is (i) important, (ii) nice, and (iii) different from the regression problem. He demonstrated how to approximate data sets with straight lines and planes. That is, he invented Principal Component Analysis (PCA).

What has been invented in data approximation in the century since? First of all, approximation by linear manifolds (lines, planes, ...) was supplemented by a rich choice of approximating objects. An important discovery is the approximation of a data set by a smaller finite set of "centroids". Between the "most rigid" linear manifolds and the "most soft" unstructured finite sets there is a whole universe of approximants.

We started from the problem of principal manifold construction, but multidimensional data distributions can have complex topologies and variable local dimensions. To approximate complex data, we propose a new type of low-dimensional "principal object": a principal cubic complex. This complex is a generalization of linear and non-linear principal manifolds and includes them as particular cases. To construct such an object, we combine a method of topological grammars with the minimization of an elastic energy defined for its embedding into the multidimensional data space. The whole complex is presented as a system of nodes and springs and as a product of one-dimensional continua (represented by graphs), and the grammars describe how these continua transform during the process of optimal complex construction.

The simplest case of a topological grammar is equivalent to the construction of "principal trees", an object useful in many practical applications.
We demonstrate how it can be applied to the analysis of bacterial genomes and to the visualization of cDNA microarray data.
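
As a small illustration of Pearson's point that data approximation differs from regression, the following sketch fits a straight line to the same 2-D point cloud in two ways: by the first principal component (minimizing orthogonal distances to the line) and by ordinary least-squares regression of y on x (minimizing vertical distances). It is not taken from the talk. The function names, the synthetic data, and the use of NumPy's SVD are assumptions made for the example only.

```python
import numpy as np


def first_principal_line(points):
    """Return (mean, unit direction) of the line that minimizes the
    summed squared orthogonal distances to the points (Pearson's line)."""
    mean = points.mean(axis=0)
    centered = points - mean
    # Rows of vt are the principal directions of the centered data,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[0]


def regression_slope(points):
    """Slope of the least-squares regression of y on x,
    which minimizes vertical (not orthogonal) distances."""
    x, y = points[:, 0], points[:, 1]
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


# Hypothetical synthetic data: y depends on x with additive noise.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x + rng.normal(scale=1.0, size=500)
points = np.column_stack([x, y])

mean, direction = first_principal_line(points)
pca_slope = direction[1] / direction[0]  # sign of the direction cancels
ols_slope = regression_slope(points)

print(f"principal-component slope: {pca_slope:.3f}")
print(f"regression slope:          {ols_slope:.3f}")
```

For positively correlated data the principal-component line is steeper than the regression line, because it treats both coordinates symmetrically instead of attributing all of the error to y.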