Randomized Algorithms for Matrix Computations and Applications to Data Mining
Abstract:
The introduction of randomization in the design and analysis of algorithms for
matrix computations (such as matrix multiplication, least-squares regression,
the Singular Value Decomposition (SVD), etc.) over the last decade provided a
new paradigm and a complementary perspective to traditional numerical linear
algebra approaches. These novel approaches were motivated by technological
developments in many areas of scientific research that permit the automatic
generation of large data sets, which are often modeled as matrices.
In this talk we will focus on the Singular Value Decomposition (SVD) of
matrices and the related Principal Components Analysis, which have found
numerous applications in extracting structure from datasets in diverse
domains, ranging from the social sciences and the internet to biology and
medicine. The extracted structure is expressed in terms of singular vectors,
which are linear combinations of all the input data and lack an intuitive
physical interpretation. We shall discuss matrix decompositions which express
the structure in a matrix as linear combination of actual columns (or rows) of
the matrix. Such decompositions are easier to interpret in applications, since
the selected columns and rows are subsets of the data. Our (randomized)
algorithms run in cubic time and come with strong accuracy guarantees.
Finally, we will discuss applications of these algorithms to human genetics
and text mining.