Hierarchical Clustering
   -- Decision Tree

Partitioning Clustering
   -- k-means

Density-based Clustering
   -- EM algorithm

Topology-based Clustering
   -- Self-organizing Map
   (SOM)
   -- Growing Neural Gas
   (GNG)

Unsupervised Clustering


  • Clustering in the field of artificial neural networks have performed its usefulness in numerous research fields such as statistics, data mining and multivariate analysis.
  • A little or no a priori knowledge is given.
  • The clustering organizes or separates the unlabeled data into natural groups according to the internal homogeneity of the data.

Topology-based Clustering


Features:

  • Data visualization
  • Information processing such as an association by topology
  • Processing a nonlinear data
    • Self-Organizing Map (SOM)
      • a fixed network
    • Growing Neural Gas (GNG)
      • a dynamic network
      • summary of large scale data
    • Several studies have introduced to improve the ability of topology-based clustering.
      • The original topology-based clustering methods utilize the Euclidean distance as the similarity measure.
      • Due to the global nature of the MSE cost function, the updating of the weight vectors is greatly influenced by the outliers.
    • One of the successful approaches is to apply the kernel method for the network learning process.
    • Chalasani and Principe [1] introduced Correntropy Induced Metric (CIM) to SOM, which is called SOM-CIM, based on the information theoretic learning perspective.
    • [1]R.Chalasani and J.C.Principe. "Self-organizing maps with information theorelic learning, "Neurocomputing, vol. 147, pp. 3-14,2015.

Growing Neural Gas with CIM


  • We utilized CIM in GNG architecture as;
    • Find the 1st and 2nd similar reference vectors to input vector.
    • Local error definition.
    • Node insert criterion.
  • CIM is the kernel based generalized similarity measure.
    • To determine the appropriate kernel bandwidth is essential for controlling the performance of model.
  • We introduced two types of kernel bandwidth adaptation methods inspired by SOM-CIM.
    • Number of nodes based adaptation
    • Node distribution based adaptation

SOM-CIM


  • CIM is utilized to find the Best Matching Unit (BMU).
  • Adapting the SOM in the CIM sense…
    • It is equivalent to reducing the localized cross information potential.
    • Information theoretic function quantifies the similarity between two probability distributions.

Definition of Correntropy Induced Metric


  • Correntropy defines generalized similarity between two arbitrary random data X and Y, which is defined as follows;
  • It induces a reproducing kernel Hilbert space, thus it can be defined as the dot product of the two random variables in the feature space as follows;
  • In practical, correntropy can be described by the following due to the finite number of data L available;
  • Correntropy is able to induce a metric, which is called Correntropy Induced Metric (CIM). Let sample vectors and are given, CIM can be define as follows;

Fundamentals of GNG-CIM (1)


  • Let us suppose the input vector V=(v1,v2,...,vL) is given to the network at an instant l, the index of winner node is obtained by CIM as follows;
  • where, k denotes the number of nodes in network.Ws denotes the reference vector, and σ denotes a kernel bandwidth.

  • Here, it is assumed that s denotes the index of 1st similar node, t denotes the index of 2nd similar node, respectively, i.e., Ws and Wt denote 1st and 2nd similar reference vectors. In addition, topological neighbors reference vector is represented as We.
  • As mentioned earlier, the CIM is regarded as a local error between input and reference vectors for updating the weight, i.e.;
  • Err ← Err+[CIM(v(l),w)]2

Fundamentals of GNG-CIM


Fundamentals of GNG-CIM (2)


  • The update rule for reference vector ω is defined as follows;

  • :learning rale

  • The gradient Δω is defined as follows;
  • where, Gσ and hσes are as follows;

    here, σ in Gσ and hσes are both same kernel bandwidth.

  • The update network topology is same as original GNG manner.

Kernel Bandwidth Adaptation

─ Number of Nodes based Adaptation ─


  • The kernel adaptation algorithms are performed at step 6 in Algorithm 1. (After updated the reference vectors)
  • This is one of the simplest methods to determine the kernel bandwidth.
  • The kernel bandwidth is controlled by the proportion of current number of nodes in topological network in maximum number of nodes, i.e.,

Kernel Bandwidth Adaptation

─Node Distribution based Adaptation(1)─


  • The kernel adaptation algorithms are performed at step 6 in Algorithm 1. (After updated the reference vectors)
  • The node distribution based adaptation is based on Kullback-Leibler divergence based density estimation algorithm [2].
  • The Kullback-Leibler divergence DKL between the input vector vl(l∈1,2,...,L) and the reference vector wj(j∈1,2,...,Nmax) is defined as follows;
  • Here, it can be regarded that the minimizing DKL is equivalent to optimizing the kernel bandwidth σw, i.e.,

Simulation Experiment


  • Affect of kernel bandwidth in CIM
  • Self-organizing Capability
  • Peak Signal-to-Noise Ratio (PSNR)

Kernel Bandwidth Adaptation

─Node Distribution based Adaptation(2)─


  • The kernel bandwidth σw is calculated by gradient descent over the cost function J(σ) as follows;
  • The update rule is defined as follows;
  • where, Δσ(l) is calculated as follow;

    here, ρ denotes a scaling factor for the stability of topological network [3].

  • The update network topology is same as original GNG manner.

Simulation Experiment

─ Affect of kernel bandwidth in CIM ─


GNG-CIM with several kernel bandwidth condition

Simulation Experiment

─ Self-organizing Capability (1) ─


Simulation Experiment

─ Self-organizing Capability (3) ─


Original GNG

GNG-CIM (Num. of Nodes)

GNG-CIM (Node Distribution)

Simulation Experiment

─ Self-organizing Capability (2) ─


Simulation Experiment

─ Peak Signal-to-Noise Ratio (PSNR) ─


  • In order to provide a quantitative comparison for the unfolding capability, the PSNR [5] of topological network is presented.
  • The PSNR is defined as follows;
  • where

Summary & Future Work


Summary

  • We introduced CIM to GNG for similarity measure and node insert criterion.
  • Based on SOM-CIM, two types of kernel bandwidth adaptation algorithms are introduced.
  • Experimental results show that GNG-CIM is able to represent the data structure with the less number of nodes, with maintaining the superior outliers rejection ability.

Future Work

    We will consider that the optimization algorithm for kernel bandwidth adaptation and the number of nodes for maximizing the performance of GNG-CIM.