PyTorch in Action: A Detailed Step-by-Step Guide to Building CV Models from Scratch

Published November 18, 2025

Setting Up the PyTorch EnvironmentEnsure that Python 3.7 or later is installed. Install PyTorch using the following command (select the appropriate installation command based on your CUDA version):# 无CUDA版本 pip instal...

Setting Up the PyTorch Environment

Ensure that Python 3.7 or later is installed. Install PyTorch using the following command (select the appropriate installation command based on your CUDA version):

# 无CUDA版本
pip install torch torchvision

# 有CUDA 11.7版本
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu117


Data Preparation and Loading

Usetorchvision.datasetsto load standard datasets (such as CIFAR-10), or useDatasetclass implementation:

from torchvision import datasets, transforms

# 数据增强与归一化
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# 加载CIFAR-10
train_data = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)


Model Architecture Design

Inheritnn.Moduleclass to build a convolutional neural network (using LeNet as an example):

import torch.nn as nn

class LeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)  # 输入通道3,输出通道6,卷积核5x5
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(nn.functional.relu(self.conv1(x)))
        x = self.pool(nn.functional.relu(self.conv2(x)))
        x = torch.flatten(x, 1)
        x = nn.functional.relu(self.fc1(x))
        x = nn.functional.relu(self.fc2(x))
        x = self.fc3(x)
        return x


Implementing the training process

Define the loss function and optimizer, and write the training loop:

import torch.optim as optim

model = LeNet()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

for epoch in range(10):
    running_loss = 0.0
    for i, (inputs, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f'Epoch {epoch+1}, Loss: {running_loss/len(train_loader):.3f}')


Model Validation and Testing

Load the test set to evaluate model accuracy:

test_data = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=64, shuffle=False)

correct = 0
total = 0
with torch.no_grad():
    for (images, labels) in test_loader:
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Test Accuracy: {100 * correct / total:.2f}%')


Saving and Loading the Model

Usagetorch.saveto save model weights or the entire model:

# 保存权重
torch.save(model.state_dict(), 'lenet.pth')

# 加载权重
loaded_model = LeNet()
loaded_model.load_state_dict(torch.load('lenet.pth'))


More in AI Academy

How to choose A100, A800, H100, H800 Arithmetic GPU cards for large model training [Ape World Arithmetic AI Academy

Choosing the right GPU depends on your specific needs and use cases. Below is a description of the features and recommended use cases for the A100, A800, H100, and H800 GPUs. You can select the appropriate GPU based on y...

NVIDIA B300 Technology In-Depth Analysis: Architectural Innovation and Enterprise AI Arithmetic Enabling Value

As generative AI evolves toward multimodal capabilities and models with trillions of parameters, and as enterprises’ computing needs shift from “general-purpose computing” to “scenario-specific, precision computing,” NVI...

RTX 5090 Technology Analysis and Enterprise Application Enablement: The Value of Arithmetic Innovation in Four Core Areas

Against the backdrop of enterprise AI R&D delving into models with hundreds of billions of parameters, professional content creation pursuing ultra-high-definition real-time processing, and industrial manufacturing r...

Arithmetic Leasing Selection Alert: A Guide to Avoiding the Three Core Pitfalls | 猿界算力

As digital transformation accelerates, computing power—a core factor of productivity—has become a critical pillar supporting corporate R&D innovation and business expansion. With the rapid expansion of the computing...

Low Latency-High Throughput: How Bare Metal GPUs Reconfigure the HPC and AI Convergence Arithmetic Base

When weather forecasting requires AI models to optimize the accuracy of numerical simulations, when biomedical R&D relies on HPC computing power to analyze molecular structures and uses AI to accelerate drug screenin...