# Standardization and Normalization: Indispensable, but Not to Be Misused

VarianceThreshold:[90370.21684180899, 55277.04960170764, 51395.858083599174]
PCA:[176251.93379431,74196.48270488,55716.27982124]

$$\text{health} = 3 \times \text{height} + 2 \times \text{weight}$$

| height | weight | health |
|--------|--------|--------|
| 1.7    | 120    | 245.1  |
| 1.6    | 200    | 404.8  |
| 2.0    | 140    | 286    |

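| height (scaled) | weight (scaled) | health |
|-----------------|-----------------|--------|
| 0.25            | 0               | 0.75   |
| 0               | 1               | 2      |
| 1               | 0.25            | 3.5    |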
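
The arithmetic behind these numbers can be sketched as follows, assuming the three columns are height, weight, and the resulting health score (the column order is inferred from the formula, and the helper names `health` and `min_max` are illustrative only). The sketch also measures how much of each raw score comes from the weight term.

```python
import numpy as np

# Columns: height, weight (values taken from the example rows above)
raw = np.array([[1.7, 120.0],
                [1.6, 200.0],
                [2.0, 140.0]])
weights = np.array([3.0, 2.0])  # health = 3 * height + 2 * weight


def health(features):
    """Apply the health formula row by row."""
    return features @ weights


def min_max(features):
    """Scale every column to the range [0, 1]."""
    lo = features.min(axis=0)
    hi = features.max(axis=0)
    return (features - lo) / (hi - lo)


scaled = min_max(raw)

# Share of each raw score that is contributed by the weight term
weight_share_raw = (raw[:, 1] * weights[1]) / health(raw)

print(health(raw))        # scores on the raw features
print(scaled)             # both columns now lie in [0, 1]
print(health(scaled))     # scores on the scaled features
print(weight_share_raw)   # weight dominates the raw scores
```

On the raw features the weight term accounts for more than 97% of every score, so height has almost no influence. After scaling, both features vary over the same range and the coefficients 3 and 2 decide their relative importance.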

### Normalization

$$X_{norm}=\frac{X-X_{min}}{X_{max}-X_{min}}$$

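```python
import numpy as np

def normalization(data):
    # Range (max - min) of each column, i.e. of each feature
    M_m = np.max(data, axis=0) - np.min(data, axis=0)
    return (data - np.min(data, axis=0)) / M_m
```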

### Standardization

$$X_{std} = \frac{X-\mu}{\sigma}$$

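```python
import numpy as np

def standardization(data):
    # Mean and standard deviation of each column, i.e. of each feature
    mu = np.mean(data, axis=0)
    sigma = np.std(data, axis=0)
    return (data - mu) / sigma
```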

```python
from sklearn.preprocessing import MinMaxScaler
import numpy as np

X = [[83, 2, 10],
     [60, 3, 15],
     [75, 4, 13]]
X = np.array(X)
Mm = MinMaxScaler()
data = Mm.fit_transform(X)
print(data)
```

Output:

```
[[1.         0.         0.        ]
 [0.         0.5        1.        ]
 [0.65217391 1.         0.6       ]]
```
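
As a quick sanity check, the min-max formula can be applied by hand to the same matrix. This sketch uses only NumPy; `expected` is simply the output printed above, copied in for comparison.

```python
import numpy as np

X = np.array([[83, 2, 10],
              [60, 3, 15],
              [75, 4, 13]], dtype=float)

# Column-wise minimum and maximum, one value per feature
x_min = X.min(axis=0)
x_max = X.max(axis=0)
manual = (X - x_min) / (x_max - x_min)

# Output of MinMaxScaler shown above, copied for comparison
expected = np.array([[1.0, 0.0, 0.0],
                     [0.0, 0.5, 1.0],
                     [0.65217391, 1.0, 0.6]])

print(np.allclose(manual, expected))  # True
```

Each column is scaled independently, which is why every column contains exactly one 0 and one 1.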

```python
from sklearn.preprocessing import StandardScaler
import numpy as np

X = [[83, 2, 10],
     [60, 3, 15],
     [75, 4, 13]]
X = np.array(X)
scaler = StandardScaler()
data = scaler.fit_transform(X)
print(data)
```

Output:

```
[[ 1.08388958 -1.22474487 -1.29777137]
 [-1.32863884  0.          1.13554995]
 [ 0.24474926  1.22474487  0.16222142]]
```
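
The same check works for standardization. This sketch again uses only NumPy and relies on `np.std` defaulting to the population standard deviation (`ddof=0`), which is what reproduces the values printed above; `expected` is that output copied in.

```python
import numpy as np

X = np.array([[83, 2, 10],
              [60, 3, 15],
              [75, 4, 13]], dtype=float)

mu = X.mean(axis=0)
sigma = X.std(axis=0)          # population std (ddof=0)
manual = (X - mu) / sigma

# Output of StandardScaler shown above, copied for comparison
expected = np.array([[ 1.08388958, -1.22474487, -1.29777137],
                     [-1.32863884,  0.0,         1.13554995],
                     [ 0.24474926,  1.22474487,  0.16222142]])

print(np.allclose(manual, expected))          # True
print(np.allclose(manual.mean(axis=0), 0.0))  # True: zero mean per column
print(np.allclose(manual.std(axis=0), 1.0))   # True: unit std per column
```

Unlike min-max scaling, the result is not confined to a fixed interval: each column is centred on 0 and measured in units of its own standard deviation.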

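• If the dataset is small and stable, normalization is a reasonable choice.
• If the dataset contains noise and outliers, prefer standardization; it is better suited to large, noisy datasets.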
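
A small illustration of the second point, using synthetic data (the array `values`, the outlier value 20.0, and the helper `value_range` are made up for this sketch and do not come from the article). It compares how much a single outlier inflates the denominator of each method: the range for min-max scaling, the standard deviation for standardization.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the run is repeatable
values = rng.normal(size=10_000)        # synthetic "clean" feature
with_outlier = np.append(values, 20.0)  # same data plus one extreme value


def value_range(a):
    """Denominator used by min-max scaling."""
    return a.max() - a.min()


# Min-max scaling divides by the range; standardization divides by the std.
range_ratio = value_range(with_outlier) / value_range(values)
std_ratio = with_outlier.std() / values.std()

print(f"range grows by a factor of {range_ratio:.2f}")
print(f"std grows by a factor of   {std_ratio:.2f}")
```

Because the min-max denominator is set by the two most extreme points, a single outlier compresses every other value toward one end of [0, 1]. The standard deviation averages over all points, so in a large sample one outlier moves it only slightly.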