This post documents an application of logistic regression in a neural-network setting: using the sigmoid function and the binary cross-entropy loss function, we build a simple neural network that can recognize whether a picture shows a cat.
The binary cross-entropy loss function is introduced in my earlier post on the cross-entropy loss.
The dataset used in this project comes from the dataset open-sourced by blogger 何宽; you can download it here (extraction code: 2u3w).


Libraries used

  • torch
  • numpy
  • h5py
  • matplotlib

Neural network implementation

Data files

The training data consists of 209 images of 64×64 pixels, and the test data consists of 50 images of 64×64 pixels. The images fall into two classes:

  • One class is cats, for example:
    [cat example image]
  • The other class is non-cats, for example:
    [non-cat example image]

There is also label data: one entry per image (reshaped to a (1, number of images) matrix in the code below), where 1 means the image is a cat and 0 means it is not.
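
If you want to inspect the structure of the HDF5 files yourself, here is a quick sketch using h5py (the file paths match those in the loading code below):

import h5py

# open the training file read-only and inspect its datasets
with h5py.File('./datasets/train_catvnoncat.h5', 'r') as f:
    print(list(f.keys()))          # should include 'train_set_x' and 'train_set_y'
    print(f['train_set_x'].shape)  # (209, 64, 64, 3)
    print(f['train_set_y'].shape)  # (209,)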

Load the dataset with the following code:

import torch
import numpy as np
import h5py


def load_dataset():
    '''Load the cat/non-cat dataset, returning flattened, normalized tensors.'''
    train_dataset = h5py.File('./datasets/train_catvnoncat.h5', "r")
    train_set_x_orig = torch.tensor(train_dataset["train_set_x"][:])
    train_set_y_orig = torch.tensor(train_dataset["train_set_y"][:])

    test_dataset = h5py.File('./datasets/test_catvnoncat.h5', "r")
    test_set_x_orig = torch.tensor(test_dataset["test_set_x"][:])
    test_set_y_orig = torch.tensor(test_dataset["test_set_y"][:])

    classes = np.array(test_dataset["list_classes"][:])

    # reshape the labels from (m,) to (1, m)
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    train_set_y_orig = train_set_y_orig.float()
    test_set_y_orig = test_set_y_orig.float()

    # flatten each (64, 64, 3) image into a column: (m, 64, 64, 3) -> (12288, m)
    train_set_x_orig = train_set_x_orig.reshape((train_set_x_orig.shape[0], -1)).T
    test_set_x_orig = test_set_x_orig.reshape((test_set_x_orig.shape[0], -1)).T

    # convert to float and normalize pixel values to [0, 1]
    train_set_x_orig = train_set_x_orig.float()
    test_set_x_orig = test_set_x_orig.float()
    train_set_x_orig = train_set_x_orig / 255
    test_set_x_orig = test_set_x_orig / 255

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

Some key points about this code:

  • train_set_x_orig: the training images. In the file the data has shape (209, 64, 64, 3), i.e. 209 images of 64×64 RGB pixels; load_dataset flattens it to one column per image.
  • train_set_y_orig: the class labels of the training images, 0 or 1; 0 means non-cat, 1 means cat.
  • test_set_x_orig: the test images, 50 images of 64×64 pixels.
  • test_set_y_orig: the class labels of the test images, also 0 or 1.
  • classes: string data [b'non-cat' b'cat'], i.e. non-cat / cat.
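
As a sanity check, the tensors returned by load_dataset should come out flattened and transposed, with one image per column. A minimal sketch, assuming the loading code above is saved as lr_utils.py (the module imported by the main script below):

from lr_utils import load_dataset

x_train, y_train, x_test, y_test, classes = load_dataset()
print(x_train.shape)  # torch.Size([12288, 209]), since 64 * 64 * 3 = 12288
print(y_train.shape)  # torch.Size([1, 209])
print(x_test.shape)   # torch.Size([12288, 50])
print(y_test.shape)   # torch.Size([1, 50])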

Main implementation

The activation function is the sigmoid function:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

where $z = w^T X + b$ is a linear transformation of the image data matrix.

The sigmoid function is a very simple, easy-to-implement mapping that squashes this value into $[0, 1]$, interpreted as the probability that the picture is a cat.
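
As a tiny numeric illustration of this squashing behavior:

import torch

z = torch.tensor([-5.0, 0.0, 5.0])
print(torch.sigmoid(z))  # tensor([0.0067, 0.5000, 0.9933])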

The following diagram, drawn with drawio, gives an intuitive picture of what the activation function does:

[sigmoid diagram]

The loss function is the binary cross-entropy:

$$\mathcal{L}(a, y) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^i \log(a^i) + (1 - y^i) \log(1 - a^i) \right]$$

My post on the cross-entropy loss function gives a more detailed introduction to the binary cross-entropy loss.
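
For reference, the analytic gradients of this loss are $\frac{\partial \mathcal{L}}{\partial w} = \frac{1}{m} X (A - Y)^T$ and $\frac{\partial \mathcal{L}}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (a^i - y^i)$; in the code below they are computed automatically by torch's autograd rather than by hand. The hand-written cost should also agree with PyTorch's built-in binary cross-entropy; a quick standalone sketch (not part of the project code):

import torch
import torch.nn.functional as F

torch.manual_seed(0)
A = torch.sigmoid(torch.randn(1, 4))      # fake predictions in (0, 1)
Y = torch.tensor([[1.0, 0.0, 1.0, 0.0]])  # fake labels

manual = -(Y * torch.log(A) + (1 - Y) * torch.log(1 - A)).mean()
builtin = F.binary_cross_entropy(A, Y)    # defaults to the same mean reduction
print(manual.item(), builtin.item())      # the two values agree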

Here is the implementation:

import torch
import matplotlib.pyplot as plt
from lr_utils import load_dataset


def init_params(dim):
    '''
    Initialize the parameters.
    dim - dimension of the feature vector
    w - weight tensor of shape (dim, 1), initialized to zeros
    b - bias, initialized to a zero tensor
    '''
    w = torch.zeros((dim, 1), requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    assert w.shape == (dim, 1), "The dimension of w is incorrect, it should be (dim, 1)"
    assert b.shape == (1,), "Parameter b is incorrect, it should be (1,)"

    return (w, b)

def propagate_torch(w, b, X, Y):
    '''
    Forward propagation: compute the cost. The gradients with respect to
    w and b are obtained later via autograd (cost.backward() in optimize).
    parameters:
    w - weights, shape (num_px * num_px * 3, 1)
    b - bias, a scalar
    X - data matrix of shape (num_px * num_px * 3, number of examples)
    Y - true label vector (1 or 0) of shape (1, number of examples)
    return:
    cost - negative log-likelihood cost of logistic regression
    '''
    # forward propagation
    Z = w.T @ X + b
    A = torch.sigmoid(Z)
    m = X.shape[1]
    cost = (-1 / m) * torch.sum(Y * torch.log(A) + (1 - Y) * torch.log(1 - A))

    return cost

def optimize(w, b, X, Y, iter, lr, print_cost=False):
    costs = []
    for i in range(iter):
        # clear the gradients accumulated in the previous iteration
        w.grad = None
        b.grad = None
        cost = propagate_torch(w=w, b=b, X=X, Y=Y)
        cost.backward()
        # gradient descent step; the in-place update must not be tracked by autograd
        with torch.no_grad():
            w -= lr * w.grad
            b -= lr * b.grad

        if i % 100 == 0:
            costs.append(cost)
            if print_cost:
                print(f"number of iterations: {i}, error value: {cost}")

    params = {
        "w": w,
        "b": b
    }

    return (params, costs)

def predict(w, b, X):
    '''
    Use the learned logistic regression parameters (w, b) to predict whether each label is 0 or 1.

    parameters:
    w - weights, shape (num_px * num_px * 3, 1)
    b - bias, a scalar
    X - data of shape (num_px * num_px * 3, number of examples)

    return:
    Y_prediction - a torch tensor containing all predictions (0 or 1) for the pictures in X
    '''

    m = X.shape[1]  # number of pictures
    Y_prediction = torch.zeros((1, m))
    w = w.reshape(X.shape[0], 1)

    A = torch.sigmoid(w.T @ X + b)
    for i in range(A.shape[1]):
        Y_prediction[0, i] = 1 if A[0, i] > 0.5 else 0
    assert Y_prediction.shape == (1, m)
    return Y_prediction

def cat_model(X_train, Y_train, X_test, Y_test, iter=2000, lr=0.5, print_cost=False):
    '''
    Model for identifying cats.

    parameters:
    X_train - (num_px * num_px * 3, m_train)
    Y_train - (1, m_train), labels
    X_test - (num_px * num_px * 3, m_test)
    Y_test - (1, m_test), labels
    iter - number of iterations used by the optimizer (a hyperparameter)
    lr - learning rate
    print_cost - if True, print the cost every 100 iterations

    return:
    d - dictionary containing information about the model
    '''

    w, b = init_params(X_train.shape[0])

    params, costs = optimize(w=w, b=b, X=X_train, Y=Y_train, iter=iter, lr=lr, print_cost=print_cost)

    w, b = params["w"], params["b"]

    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    print(f"Training set accuracy: {100 - torch.mean(torch.abs(Y_prediction_train - Y_train)) * 100}%")
    print(f"Test set accuracy: {100 - torch.mean(torch.abs(Y_prediction_test - Y_test)) * 100}%")

    d = {
        "costs": costs,
        "Y_prediction_test": Y_prediction_test,
        "Y_prediction_train": Y_prediction_train,
        "w": w,
        "b": b,
        "lr": lr,
        "iter": iter
    }
    return d

if __name__ == "__main__":
    train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
    '''
    train_set_x_orig - (num_px * num_px * 3, m_train), already flattened by load_dataset
    m_train - number of pictures in the training set
    m_test - number of pictures in the testing set
    num_px - width and height of the pictures (both 64)
    '''

    d = cat_model(
        X_train=train_set_x_orig,
        Y_train=train_set_y,
        X_test=test_set_x_orig,
        Y_test=test_set_y,
        print_cost=True,
        lr=0.01
    )

    costs = [c.detach().item() for c in d["costs"]]
    plt.plot(costs)
    plt.xlabel("iterations (per hundreds)")
    plt.ylabel("cost")
    plt.title(f"Learning rate = {d['lr']}")
    plt.show()
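
After training, it is easy to eyeball a single prediction. A small follow-up sketch, assuming it is appended inside the same __main__ block above (the index 5 is arbitrary):

    # pick one test image, predict it, and compare with the true label
    index = 5
    pred = predict(d["w"], d["b"], test_set_x_orig[:, index:index + 1])
    print(f"prediction: {classes[int(pred[0, 0])].decode('utf-8')}, "
          f"label: {classes[int(test_set_y[0, index])].decode('utf-8')}")
    plt.imshow(test_set_x_orig[:, index].reshape(64, 64, 3))  # un-flatten for display
    plt.show()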

Results

The output is:

number of iterations: 0,   error value: 0.6931471824645996
number of iterations: 100, error value: 0.7640451192855835
number of iterations: 200, error value: 1.003746747970581
number of iterations: 300, error value: 0.627930223941803
number of iterations: 400, error value: 0.5223615765571594
number of iterations: 500, error value: 0.3879987597465515
number of iterations: 600, error value: 0.23637865483760834
number of iterations: 700, error value: 0.15424039959907532
number of iterations: 800, error value: 0.13533082604408264
number of iterations: 900, error value: 0.12497343868017197
number of iterations: 1000, error value: 0.11647997051477432
number of iterations: 1100, error value: 0.10919397324323654
number of iterations: 1200, error value: 0.10280577838420868
number of iterations: 1300, error value: 0.09713104367256165
number of iterations: 1400, error value: 0.0920444130897522
number of iterations: 1500, error value: 0.08745360374450684
number of iterations: 1600, error value: 0.08328702300786972
number of iterations: 1700, error value: 0.07948748022317886
number of iterations: 1800, error value: 0.07600819319486618
number of iterations: 1900, error value: 0.07281030714511871
Training set accuracy: 99.52153015136719%
Test set accuracy: 70.0%

The training set accuracy is 99.5%, which is quite high; the test set accuracy is 70.0%, so there is some overfitting. Still, for a model built from such a simple activation function and loss function, this is a decent result.

Here is a plot of the cost against the number of iterations:
[cost curve]

Although the cost fluctuates around iteration 250, it eventually converges to a very low value.

Reflections

I ran into quite a few problems while writing the code by hand:

  • When extracting the dataset, I did not get the dimensions of the training and test tensors right. What is needed are tensors of shape $[12288, 209]$ and $[12288, 50]$, but I initially passed tensors of shape $[209, 64, 64, 3]$ and $[50, 64, 64, 3]$, which caused a dimension mismatch between $w^T$ and $X$ in the sigmoid computation. Printing the shapes of the matrices revealed the problem.
  • I misused torch's autograd at first: while writing the optimize function, I had placed the differentiation of $w$ and $b$ inside the gradient-descent helper propagate_torch, so the updates $w = w - lr \times dw$ and $b = b - lr \times db$ in optimize lost the gradient information of $w$ and $b$ (to be fair, the differentiation did originally seem to belong in propagate_torch). Moving the backward call into optimize solved the problem; see the sketch after this list.
  • And low-level slip-ups such as syntax errors.
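
To make the autograd pitfall in the second point concrete, here is a minimal standalone sketch of the pattern that works: compute the loss, call backward() once per iteration, and perform the in-place update under torch.no_grad() so it is not recorded in the graph:

import torch

w = torch.zeros(3, 1, requires_grad=True)  # leaf tensor we want to learn
x = torch.randn(3, 1)                      # toy data

for _ in range(10):
    w.grad = None                  # clear the stale gradient
    loss = ((w * x).sum()) ** 2    # toy differentiable loss
    loss.backward()                # populates w.grad
    with torch.no_grad():          # the update itself must not be tracked
        w -= 0.1 * w.grad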

These mistakes will happen less often as proficiency grows: learn more, practice more, think more.