PyTorch in Action: A Detailed Step-by-Step Guide to Building CV Models from Scratch

Setting Up the PyTorch Environment

Ensure that Python 3.7 or later is installed. Install PyTorch with one of the following commands, choosing the one that matches your CUDA version:

# CPU-only build (no CUDA)
pip install torch torchvision

# Build with CUDA 11.7
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu117
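After installing, it can help to confirm which build Python actually picks up. The following is a minimal sketch, not an official check: describe_environment is a hypothetical helper name, and it only reports the Python version, the installed torch version (if any), and whether CUDA is usable.

```python
import importlib.util
import platform


def describe_environment():
    """Collect basic facts about the local Python/PyTorch setup."""
    info = {"python": platform.python_version(), "torch": None, "cuda": False}
    # find_spec returns None when the package is not installed,
    # so this check does not raise ImportError.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["cuda"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If "cuda" is reported as False after installing a CUDA build, the usual suspects are a driver mismatch or a CPU-only wheel having been installed instead.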

Data Preparation and Loading

Use torchvision.datasets to load standard datasets (such as CIFAR-10), or implement your own Dataset subclass for custom data:

import torch
from torchvision import datasets, transforms

# Preprocessing: convert images to tensors and normalize each channel to [-1, 1]
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Load the CIFAR-10 training set
train_data = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
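The text mentions implementing a Dataset class without showing one. Below is a minimal sketch of that route, assuming synthetic data: RandomImageDataset is a hypothetical class that generates random 3x32x32 images so the example runs without any download. A real dataset would read files in __getitem__ instead.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class RandomImageDataset(Dataset):
    """Hypothetical dataset of random 3x32x32 images with integer labels."""

    def __init__(self, num_samples=256, num_classes=10, seed=0):
        generator = torch.Generator().manual_seed(seed)
        # Pixel values in [0, 1), the range ToTensor() would produce.
        self.images = torch.rand(num_samples, 3, 32, 32, generator=generator)
        self.labels = torch.randint(0, num_classes, (num_samples,), generator=generator)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        # Same arithmetic as Normalize(mean=0.5, std=0.5): [0, 1] -> [-1, 1].
        image = (self.images[index] - 0.5) / 0.5
        return image, self.labels[index]


demo_loader = DataLoader(RandomImageDataset(), batch_size=64, shuffle=True)
demo_images, demo_labels = next(iter(demo_loader))
print(demo_images.shape, demo_labels.shape)
```

A Dataset only needs __len__ and __getitem__; DataLoader handles batching and shuffling on top of it, exactly as it does for the built-in CIFAR-10 dataset.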

Model Architecture Design

Subclass nn.Module to build a convolutional neural network (using LeNet as an example):

import torch
import torch.nn as nn

class LeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)  # 3 input channels, 6 output channels, 5x5 kernel
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(nn.functional.relu(self.conv1(x)))
        x = self.pool(nn.functional.relu(self.conv2(x)))
        x = torch.flatten(x, 1)
        x = nn.functional.relu(self.fc1(x))
        x = nn.functional.relu(self.fc2(x))
        x = self.fc3(x)
        return x
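The input size of fc1 (16*5*5) follows from how each layer shrinks a 32x32 CIFAR-10 image. The sketch below traces that arithmetic with the standard output-size formula for convolution and pooling without dilation; conv_output_size is a hypothetical helper name introduced only for illustration.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (square inputs)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                                   # CIFAR-10 images are 32x32
size = conv_output_size(size, kernel=5)     # conv1: 32 -> 28
size = conv_output_size(size, 2, stride=2)  # pool:  28 -> 14
size = conv_output_size(size, kernel=5)     # conv2: 14 -> 10
size = conv_output_size(size, 2, stride=2)  # pool:  10 -> 5
flattened = 16 * size * size                # 16 channels * 5 * 5
print(flattened)  # 400
```

If the input resolution or kernel sizes change, the first argument of fc1 has to be recomputed the same way.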

Implementing the Training Process

Define the loss function and optimizer, and write the training loop:

import torch.optim as optim

model = LeNet()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

for epoch in range(10):
    running_loss = 0.0
    for i, (inputs, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f'Epoch {epoch+1}, Loss: {running_loss/len(train_loader):.3f}')
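The loop above runs on the CPU. One possible way to make it device-aware is sketched below, with the epoch body factored into a function. This is an illustration rather than the article's method: train_one_epoch and the demo_ names are hypothetical, and a small linear model with random tensors stands in for LeNet and CIFAR-10 so the block runs on its own.

```python
import torch
import torch.nn as nn
import torch.optim as optim


def train_one_epoch(model, loader, criterion, optimizer, device):
    """Run one pass over loader and return the mean batch loss."""
    model.train()
    running_loss = 0.0
    for inputs, labels in loader:
        # Batches must live on the same device as the model parameters.
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    return running_loss / len(loader)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model and data so the sketch runs without downloading CIFAR-10.
torch.manual_seed(0)
demo_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).to(device)
demo_data = torch.utils.data.TensorDataset(
    torch.randn(128, 3, 32, 32), torch.randint(0, 10, (128,))
)
demo_loader = torch.utils.data.DataLoader(demo_data, batch_size=64, shuffle=True)
demo_optimizer = optim.SGD(demo_model.parameters(), lr=0.001, momentum=0.9)

demo_loss = train_one_epoch(
    demo_model, demo_loader, nn.CrossEntropyLoss(), demo_optimizer, device
)
print(f"Mean loss: {demo_loss:.3f}")
```

To use it with the tutorial's objects, the same function would be called once per epoch with the LeNet model (moved to the device) and train_loader.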

Model Validation and Testing

Load the test set to evaluate model accuracy:

test_data = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=64, shuffle=False)

correct = 0
total = 0
model.eval()  # switch to evaluation mode before testing
with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Test Accuracy: {100 * correct / total:.2f}%')
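The same accuracy computation can be wrapped in a reusable function. The sketch below is one way to do that, assuming top-1 accuracy is the metric of interest; evaluate and AlwaysZero are hypothetical names, and the tiny hand-built dataset exists only so the result can be checked by eye.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def evaluate(model, loader, device="cpu"):
    """Return top-1 accuracy (0.0 to 1.0) of model over loader."""
    model.eval()  # disables dropout and uses running batch-norm statistics
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        predicted = model(images).argmax(dim=1)
        correct += (predicted == labels).sum().item()
        total += labels.size(0)
    return correct / total


class AlwaysZero(nn.Module):
    """Hypothetical stand-in model that predicts class 0 for every input."""

    def forward(self, x):
        logits = torch.zeros(x.size(0), 10)
        logits[:, 0] = 1.0
        return logits


# Two of the four labels are class 0, so accuracy should be 50%.
demo_labels = torch.tensor([0, 0, 1, 2])
demo_data = torch.utils.data.TensorDataset(torch.randn(4, 3, 32, 32), demo_labels)
demo_loader = torch.utils.data.DataLoader(demo_data, batch_size=2, shuffle=False)

demo_accuracy = evaluate(AlwaysZero(), demo_loader)
print(f"Test Accuracy: {100 * demo_accuracy:.2f}%")  # 50.00%
```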

Saving and Loading the Model

Use torch.save to save the model weights or the entire model:

# Save the weights
torch.save(model.state_dict(), 'lenet.pth')

# Load the weights
loaded_model = LeNet()
loaded_model.load_state_dict(torch.load('lenet.pth'))
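When training needs to be resumed later, weights alone are not enough, because the optimizer also carries state (for example, momentum buffers). A common pattern is to save a checkpoint dictionary. The sketch below assumes that pattern: the dictionary keys and the demo_ and resumed_ names are illustrative choices, and a small nn.Linear stands in for LeNet so the block is self-contained.

```python
import os
import tempfile

import torch
import torch.nn as nn
import torch.optim as optim

demo_model = nn.Linear(4, 2)  # stand-in for LeNet
demo_optimizer = optim.SGD(demo_model.parameters(), lr=0.001, momentum=0.9)

# Bundle everything needed to resume training into one dictionary.
checkpoint = {
    "epoch": 10,
    "model_state": demo_model.state_dict(),
    "optimizer_state": demo_optimizer.state_dict(),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "checkpoint.pth")
    torch.save(checkpoint, path)
    # map_location lets a checkpoint saved on a GPU load on a CPU-only machine.
    restored = torch.load(path, map_location="cpu")

# Rebuild the objects first, then load their saved state.
resumed_model = nn.Linear(4, 2)
resumed_model.load_state_dict(restored["model_state"])
resumed_optimizer = optim.SGD(resumed_model.parameters(), lr=0.001, momentum=0.9)
resumed_optimizer.load_state_dict(restored["optimizer_state"])
print(f"Resuming after epoch {restored['epoch']}")
```

Saving state dictionaries rather than the pickled model object keeps the file independent of where the model class is defined, at the cost of having to construct the model before loading.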
