如何实现一个神经网络

2017 年 1 月 6 日

定义目标函数

在接下来的例子中，目标t是通过f函数和一个期望值为0方差为0.2的高斯噪声采样的叠加生成。f函数为f(x)=x∗2， x是输入样本，斜率为2，截距为0。 t ＝ f(x)+N(0,0.2).

我们取20个在0和1之间均匀分布的数值作为样本x，然后通过上面的处理得到目标输出值t。下图显示了输入x和目标值t的对应关系以及去除高斯噪声的原始函数f(x)。其中，x是单独的输入样本xi的向量，ti是对应的目标值的向量。

Python

# Define the vector of input samples as x, with 20 values sampled from a uniform distribution

# between 0 and 1

x = numpy.random.uniform(0, 1, 20)

# Generate the target values t from x with small gaussian noise so the estimation won’t

# be perfect.

# Define a function f that represents the line that generates t without noise

def f(x): return x * 2

# Create the targets t with some gaussian noise

noise_variance = 0.2 # Variance of the gaussian noise

# Gaussian noise error for each sample in x

noise = numpy.random.randn(x.shape[0]) * noise_variance

# Create targets t

t = f(x) + noise

Python

# Plot the target t versus the input x

plt.plot(x, t, ‘o’, label=‘t’)

# Plot the initial line

plt.plot([0, 1], [f(0), f(1)], ‘b-‘, label=‘f(x)’)

plt.xlabel(‘$x$’, fontsize=15)

plt.ylabel(‘$t$’, fontsize=15)

plt.ylim([0,2])

plt.title(‘inputs (x) vs targets (t)’)

plt.grid()

plt.legend(loc=2)

plt.show()

定义成本函数

我们通过微调ww来优化 y=x∗w的模型，以达到针对所有采样的平方误差最小。平方误差被定义为ξ=∑Ni=1∥ti−yi∥2

其中N表示采样的数目，我们的优化的目标是：

argminw∑Ni=1∥ti−yi∥2

你已经看到我们获取了所有采样的误差的总和，这种方法的名称是批量学习。我们也可以一次计算一个采样，在每次的计算中更新参数w，这种方法便是在线学习。

下图绘制了含有参数w的成本函数。当w＝2时，成本函数达到最小值（抛物线的底部），w＝2也同样是我们选择的函数f(x)的斜率值。我们可以看到这个函数是凸函数，只有一个最小值：全局最小值。虽然线性回归的平方差函数是凸函数，但并不表示其他模型和成本函数也是如此。

这个神经网络模型通过nn(x,w)函数来表示，它的成本函数通过cost(y,t)来表示。

Python

# Define the neural network function y = x * w

def nn(x, w): return x * w

# Define the cost function

def cost(y, t): return ((t – y)**2).sum()

Python

# Plot the cost vs the given weight w

# Define a vector of weights for which we want to plot the cost

ws = numpy.linspace(0, 4, num=100) # weight values

cost_ws = numpy.vectorize(lambda w: cost(nn(x, w) , t))(ws) # cost for each weight in ws

# Plot

plt.plot(ws, cost_ws, ‘r-‘)

plt.xlabel(‘$w$’, fontsize=15)

plt.ylabel(‘$xi$’, fontsize=15)

plt.title(‘cost vs. weight’)

plt.grid()

plt.show()

优化成本函数

对于类似这个例子的简单成本函数，我们可以目测出最佳权重的数值。但是有些误差平面可能非常复杂也可能有一个很高的维度（每一个参数都会增加一个维度）。这就是为什么我们需要使用优化技术来找到最小的误差函数。

梯度下降

w(k+1)=w(k)−Δw(k+1)

Δw为Δw=μ∂ξ∂w , u表示学习速率，代表着执行梯度下降操作的步长， ∂ξ/∂w 表示成本函数相对权重w的梯度。根据链式法则，对于样本i，它的梯度可以拆分成∂ξi∂w=∂yi/∂w * ∂ξi/∂yi, 其中ξi表示平方误差，所以表示为∂ξi∂yi=∂(ti−yi)2∂yi=−2(ti−yi)=2(yi−ti) 又由于yi=xi∗w 所以∂yi/∂w 变成 ∂yi∂w=∂(xi∗w)∂w=xi。所以针对样本i的更新函数Δw的计算公式就变成 Δw=μ∗∂ξi∂w=μ∗2xi(yi−ti)。在批处理中，我们可以为所有样本的梯度进行求和：Δw=μ∗2∗∑i=1Nxi(yi−ti)

要开始我们的梯度下降算法，我们通常会随机选取我们的初始参数，然后开始通过Δw更新参数直到收敛。针对每一个神经网络，学习速率要作为一个超参数需要被单独调整。

我们用gradient(w, x, t)来表示梯度表达式∂ξ/∂w，用delta_w(w_k, x, t, learning_rate) 来计算Δw。下面的循环使用四次梯度下降的迭代，我们打印出来参数的值和当前的cost（成本）。

Python

# define the gradient function. Remember that y = nn(x, w) = x * w

def gradient(w, x, t):

return 2 * x * (nn(x, w) – t)

# define the update function delta w

def delta_w(w_k, x, t, learning_rate):

return learning_rate * gradient(w_k, x, t).sum()

# Set the initial weight parameter

w = 0.1

# Set the learning rate

learning_rate = 0.1

# Start performing the gradient descent updates, and print the weights and cost:

nb_of_iterations = 4 # number of gradient descent updates

w_cost = [(w, cost(nn(x, w), t))] # List to store the weight,costs values

for i in range(nb_of_iterations):

dw = delta_w(w, x, t, learning_rate) # Get the delta w update

w = w – dw # Update the current weight parameter

w_cost.append((w, cost(nn(x, w), t))) # Add weight,cost to list

# Print the final w, and cost

for i in range(0, len(w_cost)):

print(‘w({}): {:.4f} t cost: {:.4f}’.format(i, w_cost[i][0], w_cost[i][1]))

Python

w(0): 0.1000 cost: 13.6197

w(1): 1.5277 cost: 1.1239

w(2): 1.8505 cost: 0.4853

w(3): 1.9234 cost: 0.4527

w(4): 1.9399 cost: 0.4510

在之前的结果中，我们看到梯度下降算法迅速收敛到逼近2.0的目标值，接下来我们将这些梯度下降的迭代通过图示展现出来。

Python

# Plot the first 2 gradient descent updates

plt.plot(ws, cost_ws, ‘r-‘) # Plot the error curve

# Plot the updates

for i in range(1, len(w_cost)–2):

w1, c1 = w_cost[i–1]

w2, c2 = w_cost[i]

plt.plot(w1, c1, ‘bo’)

plt.plot([w1, w2],[c1, c2], ‘b-‘)

plt.text(w1, c1+0.5, ‘$w({})$’.format(i))

# Plot the last weight, axis, and show figure

w1, c1 = w_cost[len(w_cost)–3]

plt.plot(w1, c1, ‘bo’)

plt.text(w1, c1+0.5, ‘$w({})$’.format(nb_of_iterations))

plt.xlabel(‘$w$’, fontsize=15)

plt.ylabel(‘$xi$’, fontsize=15)

plt.title(‘Gradient descent updates plotted on cost function’)

plt.grid()

plt.show()

梯度下降更新

上面的最后一张图展示了梯度下降两次更新权重参数，蓝线代表着在k次迭代时权重参数w(k)的值。我们可以看到在权重位置的更新以及梯度的更新。第一个更新比第二次更新跨度大是因为w(0)处的梯度比w(1)处的梯度大。

通过10次迭代后得到的梯度回归的回归线展示如下图。红色的拟合线很接近蓝色的原始线，这是我们努力在噪声抽样中估算出的值。我们注意到这两条线都经过点(0,0)，这是因为我们并没有代表截距的偏置项，在x轴上的截距为0，所以t＝0。

Python

w = 0

# Start performing the gradient descent updates

nb_of_iterations = 10 # number of gradient descent updates

for i in range(nb_of_iterations):

dw = delta_w(w, x, t, learning_rate) # get the delta w update

w = w – dw # update the current weight parameter

Python

# Plot the fitted line agains the target line

# Plot the target t versus the input x

plt.plot(x, t, ‘o’, label=‘t’)

# Plot the initial line

plt.plot([0, 1], [f(0), f(1)], ‘b-‘, label=‘f(x)’)

# plot the fitted line

plt.plot([0, 1], [0*w, 1*w], ‘r-‘, label=‘fitted line’)

plt.xlabel(‘input x’)

plt.ylabel(‘target t’)

plt.ylim([0,2])

plt.title(‘input vs. target’)

plt.grid()

plt.legend(loc=2)

plt.show()

转载自演道,想查看更及时的互联网产品技术热点文章请点击http://go2live.cn

About The Author

bjmayor

程序员，码农，php,python,ios,android,go，产品经理，创业。

M	T	W	T	F	S	S
« Jan
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31

定义目标函数

定义成本函数

优化成本函数

梯度下降

Related Posts

About The Author

bjmayor