Parallelization of a finite element surface fitting algorithm for data mining
Peter Christen, Irfan Altas, Markus Hegland, Stephen Roberts, Kevin Burrage and Roger Sidje
(Received 7 August 2000)
Abstract
A major task in data mining is to develop automatic techniques for
processing very large data sets and detecting patterns in them. An important
data mining technique is multivariate regression, and an essential
subtask is the estimation of interaction surfaces, i.e. the estimation of
functions of two variables. Thin plate splines provide a very good
method to determine an approximating surface. Obtaining standard thin
plate splines requires the solution of a dense linear system of
equations of order n, where n is the number of observations.
Standard thin plate splines can therefore be impractical, because the number
of observations in data mining applications is often in the millions.
We have developed a finite element approximation of a spline that can
handle data sizes with millions of records. The resolution of the
finite element method can be chosen independently from the number of
observations. The observation data is read from secondary storage once,
and does not need to be stored in memory. In this paper, we present a
first parallel implementation of this method in an MPI environment.
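
The single-pass, resolution-independent assembly described above can be illustrated with a minimal sketch. It is not the implementation presented in the paper: the bilinear basis functions on a uniform grid over the unit square, the omission of the smoothing term, the dense matrix storage, and the names hat_weights and accumulate are all assumptions made for illustration. The "processes" are simulated by slicing the record list; in an MPI environment the final summation would presumably be a reduction over the processes.

```python
import numpy as np

def hat_weights(x, y, m):
    """Bilinear basis weights of (x, y) in [0, 1]^2 on an m-by-m node grid (assumed basis)."""
    h = 1.0 / (m - 1)
    i = min(int(x / h), m - 2)   # index of the grid cell containing the point
    j = min(int(y / h), m - 2)
    s = x / h - i                # local coordinates inside the cell, in [0, 1]
    t = y / h - j
    nodes = [i * m + j, (i + 1) * m + j, i * m + j + 1, (i + 1) * m + j + 1]
    weights = [(1 - s) * (1 - t), s * (1 - t), (1 - s) * t, s * t]
    return nodes, weights

def accumulate(records, m):
    """One pass over (x, y, z) records; storage depends on m only, not on the record count."""
    A = np.zeros((m * m, m * m))  # dense for brevity; a real code would exploit sparsity
    b = np.zeros(m * m)
    for x, y, z in records:       # each record is used once and then discarded
        nodes, w = hat_weights(x, y, m)
        for p, wp in zip(nodes, w):
            b[p] += wp * z
            for q, wq in zip(nodes, w):
                A[p, q] += wp * wq
    return A, b

# Synthetic observations (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
n, m, workers = 1000, 5, 4
xy = rng.random((n, 2))
z = np.sin(2 * np.pi * xy[:, 0]) * xy[:, 1]
records = [(float(px), float(py), float(pz)) for (px, py), pz in zip(xy, z)]

# Serial assembly over all records.
A_serial, b_serial = accumulate(records, m)

# Simulated parallel assembly: each worker handles a disjoint slice of the records,
# and the partial matrices and vectors are summed afterwards.
parts = [accumulate(records[r::workers], m) for r in range(workers)]
A_sum = sum(A for A, _ in parts)
b_sum = sum(b for _, b in parts)
```

Because the assembled quantities are plain sums over the observations, the partial results from disjoint subsets of the data add up to the serial result, which is what makes this step straightforward to distribute. The size of A and b is fixed by the grid resolution m, so increasing n only lengthens the pass over the data.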