I need help understanding the LAMDA clustering algorithm, taken from an article.
Here is the algorithm as described in the article:
LAMDA is a conceptual clustering and classification methodology that computes the degree of adequation of an object to a class with all the partial or marginal information available [17]. The difference between this algorithm and the classical clustering and classification approaches is that LAMDA models the total indistinguishability or homogeneity inside the context or universe from which the information is extracted. This is done by means of a special class, the so-called non-informative class (NIC), which accepts all objects under the same status. Therefore, the adequation degree of the objects of NIC acts as a minimum threshold to assign an element to a significant class. Hence, the minimum threshold is not fixed arbitrarily but is automatically determined by the proper context. Algorithm 1 summarizes the main steps of LAMDA clustering. Given a set of objects X = { $\vec{x_1}$, $\vec{x_2}$,..., $\vec{x_n}$}, where an object is represented by an m-dimensional vector $\vec{x_k}$ = {$\vec{x_1^k}$, $\vec{x_2^k}$, ..., $\vec{x_m^k}$}, the algorithm starts by creating an initial class consisting of one of the objects selected at random.
For each remaining object $\vec{x_k}$ and for each existing class $C_j$, LAMDA computes for every descriptor the so-called marginal adequacy degree $MAD_{ij}$($x^k_i$) between the values that the ith descriptor takes over $\vec{x_k}$ and the class $C_j$. Thus, a vector $MAD_j$($\vec{x_k}$) can be associated with object $\vec{x_k}$ for each class $C_j$. $MAD_j$($\vec{x_k}$) is a membership function derived from a fuzzy generalization of a binomial probability law, as expressed in the algorithm. In the expression, ν($x^k_i$, $c_{ij}$) is a distance function between the descriptor $x^k_i$ and the attribute $c_{ij}$ of the center of the class $C_j$; $ρ_{ij}$ is the possibility of the descriptor $x^k_i$ to belong to class $C_j$.
Algorithm: Data clustering with LAMDA
Input: A set of data objects X = { $\vec{x_1}$, $\vec{x_2}$,..., $\vec{x_n}$}, where $\vec{x_k}$ = {$\vec{x_1^k}$, $\vec{x_2^k}$, ..., $\vec{x_m^k}$}
Output: Γ = {$C_j$} set of classes where each class $C_j$ is represented by the parameters $c_{ij}$ and $ρ_{ij}$
Initialization: ν($x_i^k$) = 1 - $||$$x_i^k$ - $c_{ij}$$||$ and $ρ_{init}$ = 0.5, α ∈ ]0,1[, $T_{norm}$, $T_{conorm}$ and $C$ = 1
begin
for $k$ ← 1 to $n$ do
for $j$ ← 1 to $C$ do
for $i$ ← 1 to $m$ do
$MAD_{ij}$($x_i^k$) = $ρ_{ij}^{ν(x_i^k,c_{ij})}*(1 - ρ_{ij})^{1 - ν(x_i^k, c_{ij})}$
end
$MAD_j$($\vec{x_k}$) = {$MAD_{ij}$($\vec{x_i^k}$) | 1 ≤ i ≤ m}
$GAD_j$($\vec{x_k}$) = $L_α$($MAD_j$($\vec{x_k}$)) = α × $T_{norm}$($MAD_j$($\vec{x_k}$))+(1 − α) × $T_{conorm}$($MAD_j$($\vec{x_k}$))
end
j ← arg $max_{1≤l≤C}$($GAD_l$ ($\vec{x_k}$))
if ($GAD_j$($\vec{x_k}$) > 0.5) then
//1 − Affect object $\vec{x_k}$ to class j
$\vec{x_k}$ → $C_j$
//2 − Update parameters ρ and c for class $C_j$
for $i$ ← 1 to $m$ do
$\sum_{i=0}^m$(δ/δ$c_{ij}$ )ν($\vec{x_k}$,$\hat{c}_{ij}$ )=0
$\hat{ρ}_{ij}$ =1/n $\sum_{i=0}^n$ν($x^k_i$,$\hat{c}_{ij}$)
end
else
//Create a new class
Γ ← Γ ∪ {$C_j$}
C ← C + 1
//Initialize the new class parameters
$ρ_{ij}$ = $ρ_{init}$
$c_{ij}$=$x^k_i$
end
end
return Γ
end
When computing $GAD_j$($\vec{x_k}$), if the value is smaller or equal to 0.5, the object is considered as part of NIC and automatically assigned as the first element of the new class as a result. Otherwise, after computing the GAD values corresponding to all the classes, the object will be assigned to the class with the greatest GAD value. Using sample data, the $ρ_{ij}$ and $c_{ij}$ values for each class are estimated by minimizing a maximum likelihood criterion as expressed in the algorithm.
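To check my understanding of the GAD and assignment steps, here is a minimal Python sketch of how I currently read them. Several things are my own assumptions and not from the article: descriptors are scalars scaled to [0, 1], the norm is an absolute value, $T_{norm}$ is `min` and $T_{conorm}$ is `max` (the article leaves both open), and the names `mad`, `gad`, `assign` and the default `alpha=0.7` are just placeholders.

```python
def mad(x, c, rho):
    """Marginal adequacy degree rho**nu * (1 - rho)**(1 - nu), with nu = 1 - |x - c|."""
    v = 1.0 - abs(x - c)
    return rho ** v * (1.0 - rho) ** (1.0 - v)


def gad(x_vec, c_vec, rho_vec, alpha=0.7):
    """Global adequacy degree: alpha * T_norm(MADs) + (1 - alpha) * T_conorm(MADs).

    Assumption: min/max are used as the T-norm/T-conorm pair.
    """
    mads = [mad(x, c, r) for x, c, r in zip(x_vec, c_vec, rho_vec)]
    return alpha * min(mads) + (1.0 - alpha) * max(mads)


def assign(x_vec, classes, alpha=0.7, threshold=0.5):
    """Return the index of the class with the largest GAD, or None if no
    class scores above the threshold (the object then falls to the NIC
    and would start a new class).

    classes: list of (c_vec, rho_vec) pairs, one per existing class.
    """
    if not classes:
        return None
    scores = [gad(x_vec, c, rho, alpha) for c, rho in classes]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best if scores[best] > threshold else None


classes = [([0.2, 0.2], [0.9, 0.9]), ([0.8, 0.8], [0.9, 0.9])]
near_first = assign([0.21, 0.19], classes)   # close to the first center
far_from_all = assign([0.5, 0.5], classes)   # equally far from both centers
```

One thing the sketch made clear to me: with ρ = 0.5 the MAD is $0.5^{ν} \cdot 0.5^{1-ν} = 0.5$ for any ν, so a class with all ρ at $ρ_{init}$ always has GAD = 0.5 under these choices. That seems to be where the 0.5 threshold for the NIC comes from.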
Question 1:
What is ${c}_{ij}$ in this article? Is it the center of the cluster?
Question 2:
What is $x_i^k$? Is it the i-th component of the k-th vector (a scalar value)? If so, what does the norm operation mean in the following statement:
ν($x_i^k$) = 1 - $||$$x_i^k$ - $c_{ij}$$||$
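If $x_i^k$ really is a scalar, my guess is that the norm simply reduces to an absolute value. Under that assumption, and assuming the descriptors are scaled to [0, 1] so that ν stays in [0, 1], this is how I would compute ν and the MAD for a single descriptor. The function names are my own, and this is only a sketch of my reading, not something taken from the article.

```python
def nu(x, c):
    """nu = 1 - |x - c| for a scalar descriptor x and center attribute c.

    Assumes both values lie in [0, 1]; nu is 1 when x coincides with c.
    """
    return 1.0 - abs(x - c)


def mad(x, c, rho):
    """Marginal adequacy degree: rho**nu * (1 - rho)**(1 - nu)."""
    v = nu(x, c)
    return rho ** v * (1.0 - rho) ** (1.0 - v)


at_center = mad(0.3, 0.3, 0.8)   # nu = 1, so the MAD equals rho
```

Read this way, ν behaves like a similarity (largest at the center) rather than a distance, even though the article calls it a distance function.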
Question 3:
There is a statement in the article:
Using sample data, the $ρ_{ij}$ and $c_{ij}$ values for each class are estimated by minimizing a maximum likelihood criterion as expressed in the algorithm.
It refers to the following statements:
for $i$ ← 1 to $m$ do
$\sum_{i=0}^m$(δ/δ$c_{ij}$ )ν($\vec{x_k}$,$\hat{c}_{ij}$ )=0
$\hat{ρ}_{ij}$ =1/n $\sum_{i=0}^n$ν($x^k_i$,$\hat{c}_{ij}$)
3.1 What does the hat symbol in these statements mean?
3.2 There is a loop over i and also a sum over i (in the article). How can that be? Is it a typo?
3.3 How would I calculate ${c}_{ij}$ and ${ρ}_{ij}$? I need some help turning these statements into pseudocode.
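For reference, here is my tentative attempt at 3.3 in Python, so that it is clear what I have tried. It rests on several guesses that I would like confirmed or corrected: the hat denotes an estimated value, the sums are meant to run over the objects k already assigned to class $C_j$ (not over i), n is the number of those objects, and ν = 1 − |x − c| for scalar descriptors. With that ν, the condition $\sum_k ∂ν/∂c_{ij} = 0$ becomes $\sum_k \mathrm{sign}(x_i^k - c_{ij}) = 0$, which the median of the class members satisfies, so I use the median for $\hat{c}_{ij}$. The function name `update_class` is my own.

```python
from statistics import median


def update_class(members):
    """Re-estimate (c_hat, rho_hat) for one class from its member objects.

    members: non-empty list of m-dimensional objects (lists of floats in [0, 1]).
    Assumption: batch re-estimation over all members, one descriptor at a time.
    """
    m = len(members[0])
    n = len(members)
    c_hat, rho_hat = [], []
    for i in range(m):
        column = [x[i] for x in members]
        # Guess: the median solves sum_k d(nu)/d(c) = 0 when nu = 1 - |x - c|.
        c_i = median(column)
        # rho_hat_ij = (1/n) * sum_k nu(x_i^k, c_hat_ij)
        rho_i = sum(1.0 - abs(v - c_i) for v in column) / n
        c_hat.append(c_i)
        rho_hat.append(rho_i)
    return c_hat, rho_hat


c_hat, rho_hat = update_class([[0.1, 0.5], [0.2, 0.5], [0.3, 0.5]])
```

I am not sure whether the article intends a batch update like this or an incremental one applied each time an object joins the class.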