Clustering Data

Load this file with "load clustering".

Functions for clustering data.

function kmeanscluster (x : numerical, k : index)

  Cluster the rows of x into k clusters.
  
  The function uses the algorithm proposed by Lloyd, which was also
  used by Steinhaus and MacQueen. The algorithm starts with a random
  partition of the rows into clusters. It then computes the mean of
  each cluster and assigns each point to the cluster with the closest
  mean. This step is repeated until the assignment no longer changes.
  
  The function also works for multi-dimensional x. The means are then
  vector means, and the distance to a mean is the Euclidean distance.
  
  x : rows containing the data points
  k : number of clusters that should be used
  
  Returns j : indices of the clusters the rows should belong to.
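
  The iteration described above can be sketched in Python as follows.
  This is only an illustration under stated assumptions, not the code
  behind kmeanscluster(): the names kmeans_cluster, sq_dist, seed and
  max_iter are hypothetical, and reseeding an empty cluster with a
  random data point is an assumption the description does not cover.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two points of equal dimension.
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def kmeans_cluster(x, k, seed=0, max_iter=100):
    # Hypothetical sketch: returns one cluster index (1..k) per row of x.
    rng = random.Random(seed)
    # Start with a random partition of the rows.
    labels = [rng.randrange(k) for _ in x]
    for _ in range(max_iter):
        # Compute the (vector) mean of each cluster.
        means = []
        for c in range(k):
            members = [row for row, lab in zip(x, labels) if lab == c]
            if members:
                means.append([sum(col) / len(members) for col in zip(*members)])
            else:
                # Assumption: an empty cluster is reseeded with a random row.
                means.append(list(rng.choice(x)))
        # Assign each row to the cluster with the closest mean.
        new_labels = [min(range(k), key=lambda c: sq_dist(row, means[c]))
                      for row in x]
        if new_labels == labels:
            break  # no changes, so the partition is stable
        labels = new_labels
    return [lab + 1 for lab in labels]

points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
print(kmeans_cluster(points, 2))
```

  Which of the two groups receives index 1 depends on the random
  starting partition, so only the grouping itself is meaningful.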

function similaritycluster (S : numerical, k : index)

  Cluster data based on the similarity matrix S.
  
  This clustering uses the first k eigenvalues of S, and clusters
  the entries of their eigenvectors.
  
  S : similarity matrix (symmetric)
  k : number of clusters
  
  Returns j : indices of the clusters the rows should belong to.

function eigencluster (x : numerical, k : index)

  Cluster the rows of x into k clusters.
  
  This algorithm builds the similarity matrix S, which contains the
  Euclidean distance between each pair of rows in x. It then uses the
  function similaritycluster() to get the clustering from the
  similarity matrix.
  
  x : rows containing the data points
  k : number of clusters that should be used
  
  Returns j : indices of the clusters the rows should belong to.
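
  The matrix of pairwise Euclidean distances described above might be
  built as in the following Python sketch. The name distance_matrix is
  hypothetical, and the subsequent call to similaritycluster() is not
  reproduced here.

```python
import math

def distance_matrix(x):
    # Hypothetical helper: S[i][j] is the Euclidean distance between
    # rows i and j of x. The result is symmetric with a zero diagonal.
    n = len(x)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(x[i], x[j])
            S[i][j] = d
            S[j][i] = d
    return S

S = distance_matrix([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
```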
