MutClust 슈도코드 작성하기 #

#2025-07-28

1 #

Input:
    - D: 데이터 포인트 집합
    - Efactor: 이웃 거리 조정값
    - DiminFactor: 클러스터 경계 조정값
    - minPts: 최소 이웃 수 (밀도 기준)

Output:
    - cluster_labels: 각 데이터 포인트에 대한 클러스터 라벨 (노이즈는 -1)

Initialize:
    - cluster_id ← 0
    - Label[x] ← UNVISITED for all x in D

데이터 집합 D, 파라미터 eps와 minPts가 들어간다.

2. H-중요도 계산 #

For each point x in D:
    x.H ← calculateHscore(x)

각 데이터포인트에 대해 H-score를 계산한다.

3. 임시 Eps 계산 및 후보 Core 돌연변이 탐색 #

For each point x in D:
    x.eps ← calculateEps(x)               // x의 임시 eps 계산 (H 점수 기반)           
//  Hsum, Hmean, MtCount, Neighbors ← regionQuery(x) 

//  If x.Hsum ≥ 0.05 and x.Hmean ≥ 0.01 and x.H ≥ 0.03 and x.MtCount ≥ 5:
//      Label[x] ← CCM  // 조건 만족 시 CCM(Core Cluster Member)로 라벨링

Function calculateEps(x, Efactor):
  //EPSILON = 5
    x.Escaler = ceil(x.H * Efactor)
    eps = x.H * EPSILON
    return eps

현재위치 x의 H score 기반으로 임시 eps를 계산한다.

For each point x in D:
//  x.eps ← calculateEps(x)                
    x ← regionQuery(x)   // eps 내 이웃 점수 분석

    If x.Hsum ≥ 0.05 and x.Hmean ≥ 0.01 and x.H ≥ 0.03 and x.MtCount ≥ 5:
        Label[x] ← CCM  // 조건 만족 시 CCM(Core Cluster Member)로 라벨링

그리고 regionQuery로 eps 내 돌연변이들의 중요도를 확인한다. 조건을 만족하면, CCM(Candidate Core Mutation)으로 처리한다.

Function regionQuery(x):
    neighbors ← ∅

    For each point y in H:
        If distance(x, y) ≤ x.eps:
            neighbors ← neighbors ∪ {y}

    x.Hsum ← sum(score of y in neighbors)
    x.Hmean ← sum_scores / |neighbors|
    x.MtCount ← len(neighbors)
    x.Neighbors ← neighbors

    return x

4. 클러스터 확장 (CCM일 경우) #

For each point x in D:
//  x.eps ← calculateEps(x)     
//  x ← regionQuery(x) 

    If x.Hsum ≥ 0.05 and x.Hmean ≥ 0.01 and x.H ≥ 0.03 and x.MtCount ≥ 5:
        Label[x] ← CCM  // 조건 만족 시 CCM(Core Cluster Member)로 라벨링

        ClusterID ← ClusterID + 1
        expandCluster(x, ClusterID, Label, DiminFactor)

CCM일 경우 expandCluster를 사용해서 클러스터 확장을 수행한다.

Function expandCluster(x, ClusterID, Label, DiminFactor):
    Label[x] ← ClusterID
    finalneighbors ← ∅   // 최종적으로 클러스터에 포함할 점들
    es ← x.Escaler

    Sort x.Neighbors by ascending order of |y.Idx - x.Idx|

    For y in x.Neighbors:
        delta ← es - y.Escaler
        es ← (es - delta) / DiminFactor
        new_eps ← es * EPSILON
        If new_eps ≤ 0:
            break
        max_dist ← new_eps
        Label[y] ← ClusterID

    return Label

노이즈였지만 현재는 x의 이웃포인트가 된 y는 border point로 재분류해준다.