Implementing Handwritten Digit Recognition with a Convolutional Neural Network in Pure NumPy

Below is a complete walkthrough of implementing handwritten digit recognition with a convolutional neural network written in pure NumPy, including two examples.

Introduction

A convolutional neural network (CNN) is a deep learning model widely used in image recognition, speech recognition, and other fields. This article shows how to implement a simple convolutional neural network in pure NumPy for handwritten digit recognition.

Dataset

We will use the MNIST dataset, which contains 60,000 training images and 10,000 test images; each image is a 28×28-pixel grayscale image. We will use the loader in tensorflow.keras.datasets only to download the data, and NumPy and Matplotlib to process and visualize it.

import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist

# Load the dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Visualize the first 10 images with their labels
fig, axs = plt.subplots(2, 5, figsize=(10, 5))
axs = axs.flatten()
for i in range(10):
    axs[i].imshow(x_train[i], cmap='gray')
    axs[i].set_title(str(y_train[i]))
plt.show()

The code above loads the MNIST dataset and visualizes the first 10 images together with their labels.

Data Preprocessing

Before training the model, we need to preprocess the data. First, we normalize the images by scaling the pixel values into the range 0 to 1. Second, we reshape the images to (N, 1, 28, 28) so that they carry an explicit channel dimension, which the convolution layers below expect. Finally, we one-hot encode the labels, turning each label into a length-10 vector that is 1 at the label's position and 0 elsewhere.

# Normalize the images to the range [0, 1]
x_train = x_train / 255.0
x_test = x_test / 255.0

# Add a channel dimension: (N, 28, 28) -> (N, 1, 28, 28)
x_train = x_train.reshape(-1, 1, 28, 28)
x_test = x_test.reshape(-1, 1, 28, 28)

# One-hot encode the labels
y_train = np.eye(10)[y_train]
y_test = np.eye(10)[y_test]
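
As a quick sanity check (an illustrative addition, not part of the original walkthrough), the preprocessed arrays should have the following shapes:

# Illustrative sanity check of the preprocessed data
print(x_train.shape, y_train.shape)  # (60000, 1, 28, 28) (60000, 10)
print(x_test.shape, y_test.shape)    # (10000, 1, 28, 28) (10000, 10)
print(y_train[0])                    # a one-hot vector with a single 1 at the label's index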

Building the Model

We will implement a simple convolutional neural network in NumPy, consisting of two convolutional layers and two fully connected layers. The model architecture is:

Input -> Conv2D -> ReLU -> MaxPool2D -> Conv2D -> ReLU -> MaxPool2D -> Flatten -> Dense -> ReLU -> Dense -> Softmax

We will use the following hyperparameters (a quick shape check follows the list):

  • Kernel size: 3×3
  • Number of kernels: 32 and 64
  • Pooling size: 2×2
  • Fully connected layer size: 128
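
With padding 1 and stride 1, each 3×3 convolution preserves the spatial size, and each 2×2 max pool halves it, so the final feature map is 7×7×64 = 3136 values once flattened. The following quick check (an illustrative addition) applies the same output-size formula that the layer code below uses:

# Illustrative shape check: out = (in + 2*padding - kernel) // stride + 1
def conv_out(size, kernel=3, stride=1, padding=1):
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    return (size - kernel) // stride + 1

h = pool_out(conv_out(28))  # after Conv2D(1, 32) + MaxPool2D -> 14
h = pool_out(conv_out(h))   # after Conv2D(32, 64) + MaxPool2D -> 7
print(h, h * h * 64)        # 7 3136, which matches Dense(7*7*64, 128)

With these shapes in mind, the layer implementations are as follows:
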
class Conv2D:
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0):
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        # Scale the random weights (He-style) so activations stay at a reasonable magnitude
        self.weights = np.random.randn(out_channels, in_channels, kernel_size, kernel_size) * np.sqrt(2.0 / (in_channels * kernel_size * kernel_size))
        self.bias = np.zeros((out_channels, 1))

    def forward(self, x):
        batch_size, in_channels, in_height, in_width = x.shape
        out_height = int((in_height + 2 * self.padding - self.kernel_size) / self.stride + 1)
        out_width = int((in_width + 2 * self.padding - self.kernel_size) / self.stride + 1)
        out = np.zeros((batch_size, self.out_channels, out_height, out_width))
        padded_x = np.pad(x, ((0, 0), (0, 0), (self.padding, self.padding), (self.padding, self.padding)), mode='constant')
        for b in range(batch_size):
            for c in range(self.out_channels):
                for i in range(out_height):
                    for j in range(out_width):
                        # Multiply the receptive field element-wise with the kernel and sum
                        region = padded_x[b, :, i*self.stride:i*self.stride+self.kernel_size, j*self.stride:j*self.stride+self.kernel_size]
                        out[b, c, i, j] = np.sum(region * self.weights[c]) + self.bias[c, 0]
        return out

class ReLU:
    def forward(self, x):
        return np.maximum(0, x)

class MaxPool2D:
    def __init__(self, kernel_size, stride=None):
        self.kernel_size = kernel_size
        self.stride = stride or kernel_size

    def forward(self, x):
        batch_size, channels, in_height, in_width = x.shape
        out_height = int((in_height - self.kernel_size) / self.stride + 1)
        out_width = int((in_width - self.kernel_size) / self.stride + 1)
        out = np.zeros((batch_size, channels, out_height, out_width))
        for b in range(batch_size):
            for c in range(channels):
                for i in range(out_height):
                    for j in range(out_width):
                        out[b, c, i, j] = np.max(x[b, c, i*self.stride:i*self.stride+self.kernel_size, j*self.stride:j*self.stride+self.kernel_size])
        return out

class Flatten:
    def forward(self, x):
        return x.reshape(x.shape[0], -1)

class Dense:
    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features
        # Scale the random weights so activations stay at a reasonable magnitude
        self.weights = np.random.randn(out_features, in_features) * np.sqrt(2.0 / in_features)
        self.bias = np.zeros((out_features, 1))

    def forward(self, x):
        return np.dot(self.weights, x.T).T + self.bias.T

class Softmax:
    def forward(self, x):
        exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
        return exp_x / np.sum(exp_x, axis=1, keepdims=True)

class CNN:
    def __init__(self):
        self.layers = [
            Conv2D(1, 32, kernel_size=3, padding=1),
            ReLU(),
            MaxPool2D(kernel_size=2),
            Conv2D(32, 64, kernel_size=3, padding=1),
            ReLU(),
            MaxPool2D(kernel_size=2),
            Flatten(),
            Dense(7*7*64, 128),
            ReLU(),
            Dense(128, 10),
            Softmax()
        ]

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x
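
As a quick check of the forward pass (an illustrative addition, not part of the original walkthrough), we can push a single preprocessed image through the untrained network and confirm that the output is a length-10 probability vector:

# Illustrative forward-pass check on one preprocessed image
model = CNN()
sample = x_train[:1]          # shape (1, 1, 28, 28)
probs = model.forward(sample)
print(probs.shape)            # (1, 10)
print(probs.sum())            # ~1.0, because Softmax normalizes each row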

Training the Model

We will train the model with the cross-entropy loss and stochastic gradient descent (SGD). Note that the training loop calls a backward method on every layer to propagate gradients and update parameters; these methods are not defined in the layer classes above, so a minimal sketch of that interface comes first, followed by the training code.
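
A full backward pass for every layer (including Conv2D and MaxPool2D) is beyond the scope of this walkthrough. As a rough sketch of the interface the training loop assumes (an illustrative assumption, not the original implementation), here is what backward could look like for the ReLU and Dense layers; each forward method also caches its input. Conv2D, MaxPool2D, Flatten, and Softmax would need analogous methods, and because the loop already uses the combined softmax-plus-cross-entropy gradient y_pred - y_batch, Softmax.backward can simply return the incoming gradient unchanged.

# Illustrative sketch: ReLU and Dense extended with the backward interface
class ReLU:
    def forward(self, x):
        self.x = x                                   # cache the input for backward
        return np.maximum(0, x)

    def backward(self, grad, learning_rate):
        # Gradients flow only where the input was positive
        return grad * (self.x > 0)

class Dense:
    def __init__(self, in_features, out_features):
        self.weights = np.random.randn(out_features, in_features) * np.sqrt(2.0 / in_features)
        self.bias = np.zeros((out_features, 1))

    def forward(self, x):
        self.x = x                                   # cache the input for backward
        return np.dot(self.weights, x.T).T + self.bias.T

    def backward(self, grad, learning_rate):
        # grad has shape (batch, out_features)
        grad_w = np.dot(grad.T, self.x)                  # (out_features, in_features)
        grad_b = np.sum(grad, axis=0, keepdims=True).T   # (out_features, 1)
        grad_x = np.dot(grad, self.weights)              # (batch, in_features)
        # SGD update of the layer parameters
        self.weights -= learning_rate * grad_w
        self.bias -= learning_rate * grad_b
        return grad_x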

# Define hyperparameters
learning_rate = 0.01
batch_size = 128
epochs = 10

# Create the model
model = CNN()

# Train the model
for epoch in range(epochs):
    for i in range(0, len(x_train), batch_size):
        x_batch = x_train[i:i+batch_size]
        y_batch = y_train[i:i+batch_size]
        # Forward pass
        y_pred = model.forward(x_batch)
        # Cross-entropy loss (the small epsilon avoids log(0))
        loss = -np.sum(y_batch * np.log(y_pred + 1e-8)) / len(x_batch)
        # Backward pass: each layer must implement backward (see the sketch above)
        grad = y_pred - y_batch
        for layer in reversed(model.layers):
            grad = layer.backward(grad, learning_rate)
        # Print the loss every 100 batches
        if (i // batch_size) % 100 == 0:
            print(f'Epoch {epoch+1}/{epochs}, Step {i+1}/{len(x_train)}, Loss {loss:.4f}')

# Evaluate the model on the test set
y_pred = model.forward(x_test)
accuracy = np.mean(np.argmax(y_pred, axis=1) == np.argmax(y_test, axis=1))
print(f'Test Accuracy: {accuracy:.4f}')

The code above trains the model and prints the loss periodically during training; after training, it evaluates the model on the test set. Keep in mind that this pure NumPy implementation relies on nested Python loops, so both training and evaluation are much slower than with an optimized deep learning framework.
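Because the loop-based forward pass is slow, it can be more practical to estimate accuracy on a subset of the test set, processed in mini-batches. The snippet below is an illustrative alternative to the full evaluation above; the subset size of 1,000 is an arbitrary choice.

# Illustrative: estimate accuracy on the first 1,000 test images, in mini-batches
subset = 1000
correct = 0
for i in range(0, subset, batch_size):
    preds = model.forward(x_test[i:i+batch_size])
    correct += np.sum(np.argmax(preds, axis=1) == np.argmax(y_test[i:i+batch_size], axis=1))
print(f'Subset accuracy: {correct / subset:.4f}')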

Examples

Below are two examples that show how to use the trained model to predict the labels of new images.

# Example 1
img = x_test[0]
plt.imshow(img.reshape(28, 28), cmap='gray')
plt.show()
pred = model.forward(img.reshape(1, 1, 28, 28))
print(f'Prediction: {np.argmax(pred)}')

# Example 2
img = x_test[1]
plt.imshow(img.reshape(28, 28), cmap='gray')
plt.show()
pred = model.forward(img.reshape(1, 1, 28, 28))
print(f'Prediction: {np.argmax(pred)}')

The code above displays two images from the test set and uses the trained model to predict their labels.

Summary

This article showed how to implement a simple convolutional neural network in pure NumPy for handwritten digit recognition. We trained and tested it on the MNIST dataset and demonstrated how to use the trained model to predict the labels of new images.