The monograph provides an overview of the basic algorithms for cluster analysis, together with their motivations and discussion of their various modifications. For example, the classic k-means algorithm, intended to group observations from a mixture of multivariate normal distributions with different mean vectors and similar covariance matrices is presented and a number of embodiments thereof are discussed, like the harmonic k-means, spherical k-means (used to document the analysis), and kernel variants for grouping the data that are not linearly separable. Not only the crisp partitions, but also fuzzy ones are considered. The latter allow for an alternative formulation of the k-means algorithm and its various variants (including kernel fuzzy algorithms, or spherical fuzzy algorithm) and extensions obtained through the introduction of the Dempster-Shafer mass function. This leads to so-called creedal partitions, for which - depending on the quality of the data - you can talk about a crisp, fuzzy, probabilistic, possibilistic or rough partitions reflecting various shades of data (un-)certainty.
In this context the problems of appropriate choice of parameters for these algorithms as well as of their efficient implementations are discussed. An attempt is made to systematize the methods of assessing the quality of the clusters. Quality measures of covering a given set of observations are also presented, as from a practical point of view, it is more natural to formulate an algorithm that generates coverage of a given set of objects, and not its partition.
A further contribution of the monograph is a unified exposition of spectral clustering and diffusive data analysis methods. Both classical spectral clustering algorithms and their modifications, resulting from various understandings of the term “cluster”, are presented. The relationship between graph cuts and selected spectral properties of graph Laplacians are investigated.
Formal analogies between various clustering methods are pointed out in order to show the possibilities of transferring modifications between methodologies for purposes of further algorithm development.
The monograph covers also specialisation of selected algorithms for tasks of processing very large data sets. A number of solutions to the problem of semi-supervised clustering are also presented.
An abbreviated version of the monograph in English is available from the Institute of Computer Science of Polish Academy of Sciences.