Chapter 2

1. MED Classifier

  • The class mean is the single representation of a class that minimizes the total error over all of its training samples

  • The MED classifier uses Euclidean distance as its metric, so it ignores both the differing variability of each feature and the correlations between features

    1. Unequal diagonal elements (of the covariance matrix): each feature dimension varies by a different amount

    2. Nonzero off-diagonal elements: the features are correlated

  • Code

    import copy

    import numpy as np
    from sklearn import preprocessing as pre
    from sklearn.decomposition import PCA


    def MED(dataset, is_2d=None):
        # Minimum Euclidean Distance (MED) classifier
        train = dataset['train']
        test = dataset['test']
        target = dataset['target']

        train_pos_mean_2d = None
        train_neg_mean_2d = None
        # For a 2-D plot: standardize, then project the features down to 2-D with PCA
        if is_2d:
            scaler = pre.StandardScaler()
            pca = PCA(n_components=2)

            train_2d = copy.deepcopy(train)
            label = train_2d[:, -1]

            train_2d = scaler.fit_transform(train_2d[:, :4])
            train_2d = pca.fit_transform(train_2d)
            train_2d = np.c_[train_2d, label]

            train_pos = np.array(train_2d[train[:, -1] == target[0]])
            train_neg = np.array(train_2d[train[:, -1] == target[1]])

            train_pos_mean_2d = np.mean(train_pos[:, :2], axis=0)
            train_neg_mean_2d = np.mean(train_neg[:, :2], axis=0)
        # target[0] is the positive class, target[1] the negative class
        train_pos = np.array(train[train[:, -1] == target[0]])
        train_neg = np.array(train[train[:, -1] == target[1]])

        print('positive num:', train_pos.shape)
        print('negative num:', train_neg.shape)
        # Class means serve as the class prototypes
        train_pos_mean = np.mean(train_pos[:, :4], axis=0)
        train_neg_mean = np.mean(train_neg[:, :4], axis=0)

        x_test = test[:, :4]
        y_test = test[:, -1].flatten()

        y_pre = np.array([])

        for i in range(x_test.shape[0]):
            # Squared Euclidean distance to each class mean
            dis_pos = (x_test[i] - train_pos_mean).T @ (x_test[i] - train_pos_mean)
            dis_neg = (x_test[i] - train_neg_mean).T @ (x_test[i] - train_neg_mean)
            # Predict the class whose mean is closer
            if dis_pos < dis_neg:
                y_pre = np.append(y_pre, target[0])
            else:
                y_pre = np.append(y_pre, target[1])
        # Accuracy
        acc = (y_pre == y_test.astype(int)).sum() / y_test.shape[0]
        print(acc)

        if is_2d:
            return train_pos_mean_2d, train_neg_mean_2d
        else:
            return y_test, y_pre
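The decision rule implemented above can be seen in isolation on toy data. A minimal sketch (the points and labels below are hypothetical): each class mean serves as its prototype, and a test point is assigned to the class whose mean is closest in Euclidean distance.

```python
import numpy as np

# Toy two-class data (hypothetical): class 0 clustered near (0, 0),
# class 1 clustered near (4, 4)
train_pos = np.array([[0.0, 0.2], [0.1, -0.1], [-0.2, 0.0]])
train_neg = np.array([[4.0, 4.1], [3.9, 4.0], [4.2, 3.8]])

# Class means (prototypes): the mean minimizes squared representation error
mu_pos = train_pos.mean(axis=0)
mu_neg = train_neg.mean(axis=0)

def med_predict(x):
    # Assign x to the class whose mean is closest in squared Euclidean distance
    d_pos = (x - mu_pos) @ (x - mu_pos)
    d_neg = (x - mu_neg) @ (x - mu_neg)
    return 0 if d_pos < d_neg else 1

print(med_predict(np.array([0.5, 0.5])))  # → 0
print(med_predict(np.array([3.5, 3.5])))  # → 1
```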

2. Feature Whitening

  • Purpose

    Transform the features so that they become uncorrelated and all share the same variance (original figure omitted)

  • Steps

    1. Decorrelation: remove the correlations between features, i.e. make the off-diagonal elements zero by diagonalizing the covariance matrix

    2. Whitening: rescale the features so that every feature has the same variance

  • The original features are projected onto the eigenvectors of the covariance matrix; each eigenvector forms one coordinate axis

  • Euclidean distance is the same before and after the transform W1, which acts only as a rotation

  • Code

    import numpy as np


    def whitening(dataset):
        # Feature whitening: decorrelate, then rescale to unit variance
        data = dataset.data
        print('data shape:', data.shape)
        # Covariance matrix of the features
        cov = np.cov(data, rowvar=False)
        print('cov shape:', cov.shape)
        # Eigenvalues and eigenvectors; the eigenvectors are the COLUMNS of v,
        # and np.linalg.eig already returns them with unit norm, so no extra
        # normalization is needed
        w, v = np.linalg.eig(cov)
        print('w shape:', w.shape)
        print('v shape:', v.shape)
        # W1 rotates the data onto the eigenvector axes (decorrelation)
        W1 = v.T
        print('W1 shape:', W1.shape)
        # W2 rescales each axis to unit variance
        W2 = np.diag(w ** (-0.5))
        print('W2 shape:', W2.shape)
        W = W2 @ W1
        print('W shape:', W.shape)
        # Each sample x maps to W @ x, which for row vectors is data @ W.T
        data = data @ W.T
        print('data shape', data.shape)

        return data
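Both claims above can be checked numerically: W1 alone preserves Euclidean distances because it is an orthogonal rotation, while the full transform W leaves the data with identity covariance. A small sketch on synthetic correlated data (all data and names below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic correlated 2-D data (hypothetical)
data = rng.standard_normal((500, 2)) @ np.array([[2.0, 0.5], [0.5, 1.0]])

cov = np.cov(data, rowvar=False)
# eigh is the symmetric-matrix variant of eig: real eigenvalues, orthonormal columns
w, v = np.linalg.eigh(cov)

W1 = v.T                 # rotation onto the eigenvector axes (decorrelation)
W2 = np.diag(w ** -0.5)  # per-axis rescaling to unit variance
W = W2 @ W1

rotated = data @ W1.T
whitened = data @ W.T

# The rotation alone leaves Euclidean distances unchanged
d_before = np.linalg.norm(data[0] - data[1])
d_after = np.linalg.norm(rotated[0] - rotated[1])
print(np.isclose(d_before, d_after))  # → True

# The fully whitened features have (numerically) identity covariance
print(np.allclose(np.cov(whitened, rowvar=False), np.eye(2)))  # → True
```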

3. MICD Classifier

  • The MICD (minimum intra-class distance) classifier measures distance to each class mean with the Mahalanobis distance, which accounts for feature scales and correlations (original figure omitted)

  • Code to be added later
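The notes leave the MICD code for later. As a placeholder, here is a minimal sketch under the assumption that MICD assigns a sample to the class whose mean is nearest in Mahalanobis distance, each class using its own covariance; the data and helper names below are hypothetical:

```python
import numpy as np

# Toy two-class data (hypothetical): both classes stretched along the x-axis
train_pos = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, -0.1], [3.0, 0.0]])
train_neg = np.array([[0.0, 3.0], [1.0, 3.1], [2.0, 2.9], [3.0, 3.0]])

def class_stats(X):
    # Per-class mean and inverse covariance; a small ridge guards against
    # a singular sample covariance
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    return mu, np.linalg.inv(cov)

mu_p, icov_p = class_stats(train_pos)
mu_n, icov_n = class_stats(train_neg)

def micd_predict(x):
    # Squared Mahalanobis distance to each class mean; pick the smaller one
    d_p = (x - mu_p) @ icov_p @ (x - mu_p)
    d_n = (x - mu_n) @ icov_n @ (x - mu_n)
    return 0 if d_p < d_n else 1

print(micd_predict(np.array([1.5, 0.5])))  # → 0
print(micd_predict(np.array([1.5, 2.5])))  # → 1
```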