This post documents an application of logistic regression in a neural-network setting: using the sigmoid function and the binary cross-entropy loss function, we build a simple neural network that can recognize whether a picture shows a cat.
The binary cross-entropy loss function is introduced in my earlier post on the cross-entropy loss.
The dataset used in this project comes from the dataset open-sourced by blogger 何宽; you can download it here (extraction code: 2u3w).


Libraries used

  • torch
  • numpy
  • h5py
  • matplotlib

Neural network implementation

Data files

The training data consists of 209 images of 64×64 pixels, and the test data consists of 50 images of 64×64 pixels. The images fall into two classes:

  • One class is cats, for example:
    [cat example image]
  • The other class is non-cats, for example:
    [non-cat example image]

There is also label data: one entry per image (reshaped to a (1, number of images) matrix in the code below), where 1 means the image is a cat and 0 means it is not.
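
If you want to inspect the structure of the HDF5 files yourself, here is a quick sketch using h5py (the file paths match those in the loading code below):

import h5py

# open the training file read-only and inspect its datasets
with h5py.File('./datasets/train_catvnoncat.h5', 'r') as f:
    print(list(f.keys()))          # should include 'train_set_x' and 'train_set_y'
    print(f['train_set_x'].shape)  # (209, 64, 64, 3)
    print(f['train_set_y'].shape)  # (209,)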

Load the dataset with the following code:

import torch
import numpy as np
import h5py


def load_dataset():
    '''Load the cat/non-cat dataset, returning flattened, normalized tensors.'''
    train_dataset = h5py.File('./datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = torch.tensor(train_dataset["train_set_x"][:])
    train_set_y_orig = torch.tensor(train_dataset["train_set_y"][:])

    test_dataset = h5py.File('./datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = torch.tensor(test_dataset["test_set_x"][:])
    test_set_y_orig = torch.tensor(test_dataset["test_set_y"][:])

    classes = np.array(test_dataset["list_classes"][:])

    # reshape the labels from (m,) to (1, m)
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    train_set_y_orig = train_set_y_orig.float()
    test_set_y_orig = test_set_y_orig.float()

    # flatten each (64, 64, 3) image into a column: (m, 64, 64, 3) -> (12288, m)
    train_set_x_orig = train_set_x_orig.reshape((train_set_x_orig.shape[0], -1)).T
    test_set_x_orig = test_set_x_orig.reshape((test_set_x_orig.shape[0], -1)).T

    # convert to float and normalize pixel values to [0, 1]
    train_set_x_orig = train_set_x_orig.float()
    test_set_x_orig = test_set_x_orig.float()
    train_set_x_orig = train_set_x_orig / 255
    test_set_x_orig = test_set_x_orig / 255

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

Some key points about this code:

  • train_set_x_orig: the training images. In the file the data has shape (209, 64, 64, 3), i.e. 209 images of 64×64 RGB pixels; load_dataset flattens it to one column per image.
  • train_set_y_orig: the class labels of the training images, 0 or 1; 0 means non-cat, 1 means cat.
  • test_set_x_orig: the test images, 50 images of 64×64 pixels.
  • test_set_y_orig: the class labels of the test images, also 0 or 1.
  • classes: string data [b'non-cat' b'cat'], i.e. non-cat / cat.
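
As a sanity check, the tensors returned by load_dataset should come out flattened and transposed, with one image per column. A minimal sketch, assuming the loading code above is saved as lr_utils.py (the module imported by the main script below):

from lr_utils import load_dataset

x_train, y_train, x_test, y_test, classes = load_dataset()
print(x_train.shape)  # torch.Size([12288, 209]), since 64 * 64 * 3 = 12288
print(y_train.shape)  # torch.Size([1, 209])
print(x_test.shape)   # torch.Size([12288, 50])
print(y_test.shape)   # torch.Size([1, 50])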

Main implementation

The activation function is the sigmoid function:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

where $z = w^T X + b$ is a linear transformation of the image data matrix.

The sigmoid function is a very simple, easy-to-implement mapping that squashes this value into $[0, 1]$, interpreted as the probability that the picture is a cat.
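
As a tiny numeric illustration of this squashing behavior:

import torch

z = torch.tensor([-5.0, 0.0, 5.0])
print(torch.sigmoid(z))  # tensor([0.0067, 0.5000, 0.9933])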

The following diagram, drawn with drawio, gives an intuitive picture of what the activation function does:

[sigmoid diagram]

The loss function is the binary cross-entropy:

$$\mathcal{L}(a, y) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^i \log(a^i) + (1 - y^i) \log(1 - a^i) \right]$$

My post on the cross-entropy loss function gives a more detailed introduction to the binary cross-entropy loss.
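
For reference, the analytic gradients of this loss are $\frac{\partial \mathcal{L}}{\partial w} = \frac{1}{m} X (A - Y)^T$ and $\frac{\partial \mathcal{L}}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (a^i - y^i)$; in the code below they are computed automatically by torch's autograd rather than by hand. The hand-written cost should also agree with PyTorch's built-in binary cross-entropy; a quick standalone sketch (not part of the project code):

import torch
import torch.nn.functional as F

torch.manual_seed(0)
A = torch.sigmoid(torch.randn(1, 4))      # fake predictions in (0, 1)
Y = torch.tensor([[1.0, 0.0, 1.0, 0.0]])  # fake labels

manual = -(Y * torch.log(A) + (1 - Y) * torch.log(1 - A)).mean()
builtin = F.binary_cross_entropy(A, Y)    # defaults to the same mean reduction
print(manual.item(), builtin.item())      # the two values agree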

Here is the implementation:

import torch
import matplotlib.pyplot as plt
from lr_utils import load_dataset


def init_params(dim):
    '''
    Initialize the parameters.
    dim - dimension of the feature vector
    w - weight tensor of shape (dim, 1), initialized to zeros
    b - bias, initialized to a zero tensor
    '''
    w = torch.zeros((dim, 1), requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    assert w.shape == (dim, 1), "The dimension of w is incorrect, it should be (dim, 1)"
    assert b.shape == (1,), "Parameter b is incorrect, it should be (1,)"

    return (w, b)

def propagate_torch(w, b, X, Y):
    '''
    Forward propagation: compute the cost. The gradients with respect to
    w and b are obtained later via autograd (cost.backward() in optimize).
    parameters:
    w - weights, shape (num_px * num_px * 3, 1)
    b - bias, a scalar
    X - data matrix of shape (num_px * num_px * 3, number of examples)
    Y - true label vector (1 or 0) of shape (1, number of examples)
    return:
    cost - negative log-likelihood cost of logistic regression
    '''
    # forward propagation
    Z = w.T @ X + b
    A = torch.sigmoid(Z)
    m = X.shape[1]
    cost = (-1 / m) * torch.sum(Y * torch.log(A) + (1 - Y) * torch.log(1 - A))

    return cost

def optimize(w, b, X, Y, iter, lr, print_cost=False):
    costs = []
    for i in range(iter):
        # clear the gradients accumulated in the previous iteration
        w.grad = None
        b.grad = None
        cost = propagate_torch(w=w, b=b, X=X, Y=Y)
        cost.backward()
        # gradient descent step; the in-place update must not be tracked by autograd
        with torch.no_grad():
            w -= lr * w.grad
            b -= lr * b.grad

        if i % 100 == 0:
            costs.append(cost)
            if print_cost:
                print(f"number of iterations: {i}, error value: {cost}")

    params = {
        "w": w,
        "b": b
    }

    return (params, costs)

def predict(w, b, X):
    '''
    Use the learned logistic regression parameters (w, b) to predict whether each label is 0 or 1.

    parameters:
    w - weights, shape (num_px * num_px * 3, 1)
    b - bias, a scalar
    X - data of shape (num_px * num_px * 3, number of examples)

    return:
    Y_prediction - a torch tensor containing all predictions (0 or 1) for the pictures in X
    '''

    m = X.shape[1]  # number of pictures
    Y_prediction = torch.zeros((1, m))
    w = w.reshape(X.shape[0], 1)

    A = torch.sigmoid(w.T @ X + b)
    for i in range(A.shape[1]):
        Y_prediction[0, i] = 1 if A[0, i] > 0.5 else 0
    assert Y_prediction.shape == (1, m)
    return Y_prediction

def cat_model(X_train, Y_train, X_test, Y_test, iter=2000, lr=0.5, print_cost=False):
    '''
    Model for identifying cats.

    parameters:
    X_train - (num_px * num_px * 3, m_train)
    Y_train - (1, m_train), labels
    X_test - (num_px * num_px * 3, m_test)
    Y_test - (1, m_test), labels
    iter - number of iterations used by the optimizer (a hyperparameter)
    lr - learning rate
    print_cost - if True, print the cost every 100 iterations

    return:
    d - dictionary containing information about the model
    '''

    w, b = init_params(X_train.shape[0])

    params, costs = optimize(w=w, b=b, X=X_train, Y=Y_train, iter=iter, lr=lr, print_cost=print_cost)

    w, b = params["w"], params["b"]

    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    print(f"Training set accuracy: {100 - torch.mean(torch.abs(Y_prediction_train - Y_train)) * 100}%")
    print(f"Test set accuracy: {100 - torch.mean(torch.abs(Y_prediction_test - Y_test)) * 100}%")

    d = {
        "costs": costs,
        "Y_prediction_test": Y_prediction_test,
        "Y_prediction_train": Y_prediction_train,
        "w": w,
        "b": b,
        "lr": lr,
        "iter": iter
    }
    return d

if __name__ == "__main__":
    train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
    '''
    train_set_x_orig - (num_px * num_px * 3, m_train), already flattened by load_dataset
    m_train - number of pictures in the training set
    m_test - number of pictures in the testing set
    num_px - width and height of the pictures (both 64)
    '''

    d = cat_model(
        X_train=train_set_x_orig,
        Y_train=train_set_y,
        X_test=test_set_x_orig,
        Y_test=test_set_y,
        print_cost=True,
        lr=0.01
    )

    costs = [c.detach().item() for c in d["costs"]]
    plt.plot(costs)
    plt.xlabel("iterations (per hundreds)")
    plt.ylabel("cost")
    plt.title(f"Learning rate = {d['lr']}")
    plt.show()
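
After training, it is easy to eyeball a single prediction. A small follow-up sketch, assuming it is appended inside the same __main__ block above (the index 5 is arbitrary):

    # pick one test image, predict it, and compare with the true label
    index = 5
    pred = predict(d["w"], d["b"], test_set_x_orig[:, index:index + 1])
    print(f"prediction: {classes[int(pred[0, 0])].decode('utf-8')}, "
          f"label: {classes[int(test_set_y[0, index])].decode('utf-8')}")
    plt.imshow(test_set_x_orig[:, index].reshape(64, 64, 3))  # un-flatten for display
    plt.show()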

Results

The output is:

number of iterations: 0,   error value: 0.6931471824645996
number of iterations: 100, error value: 0.7640451192855835
number of iterations: 200, error value: 1.003746747970581
number of iterations: 300, error value: 0.627930223941803
number of iterations: 400, error value: 0.5223615765571594
number of iterations: 500, error value: 0.3879987597465515
number of iterations: 600, error value: 0.23637865483760834
number of iterations: 700, error value: 0.15424039959907532
number of iterations: 800, error value: 0.13533082604408264
number of iterations: 900, error value: 0.12497343868017197
number of iterations: 1000, error value: 0.11647997051477432
number of iterations: 1100, error value: 0.10919397324323654
number of iterations: 1200, error value: 0.10280577838420868
number of iterations: 1300, error value: 0.09713104367256165
number of iterations: 1400, error value: 0.0920444130897522
number of iterations: 1500, error value: 0.08745360374450684
number of iterations: 1600, error value: 0.08328702300786972
number of iterations: 1700, error value: 0.07948748022317886
number of iterations: 1800, error value: 0.07600819319486618
number of iterations: 1900, error value: 0.07281030714511871
Training set accuracy: 99.52153015136719%
Test set accuracy: 70.0%

The training set accuracy is 99.5%, which is quite high; the test set accuracy is 70.0%, so there is some overfitting. Still, for a model built from such a simple activation function and loss function, this is a decent result.

Here is a plot of the cost against the number of iterations:
[cost curve]

Although the cost fluctuates around iteration 250, it eventually converges to a very low value.

Reflections

I ran into quite a few problems while writing the code by hand:

  • When extracting the dataset, I did not get the dimensions of the training and test tensors right. What is needed are tensors of shape $[12288, 209]$ and $[12288, 50]$, but I initially passed tensors of shape $[209, 64, 64, 3]$ and $[50, 64, 64, 3]$, which caused a dimension mismatch between $w^T$ and $X$ in the sigmoid computation. Printing the shapes of the matrices revealed the problem.
  • I misused torch's autograd at first: while writing the optimize function, I had placed the differentiation of $w$ and $b$ inside the gradient-descent helper propagate_torch, so the updates $w = w - lr \times dw$ and $b = b - lr \times db$ in optimize lost the gradient information of $w$ and $b$ (to be fair, the differentiation did originally seem to belong in propagate_torch). Moving the backward call into optimize solved the problem; see the sketch after this list.
  • And low-level slip-ups such as syntax errors.
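
To make the autograd pitfall in the second point concrete, here is a minimal standalone sketch of the pattern that works: compute the loss, call backward() once per iteration, and perform the in-place update under torch.no_grad() so it is not recorded in the graph:

import torch

w = torch.zeros(3, 1, requires_grad=True)  # leaf tensor we want to learn
x = torch.randn(3, 1)                      # toy data

for _ in range(10):
    w.grad = None                  # clear the stale gradient
    loss = ((w * x).sum()) ** 2    # toy differentiable loss
    loss.backward()                # populates w.grad
    with torch.no_grad():          # the update itself must not be tracked
        w -= 0.1 * w.grad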

These mistakes will happen less often as proficiency grows: learn more, practice more, think more.