diff --git a/Ch14_Computer_Vision/Image_Augmentation.ipynb b/Ch14_Computer_Vision/Image_Augmentation.ipynb new file mode 100644 index 00000000..2b77e5ae --- /dev/null +++ b/Ch14_Computer_Vision/Image_Augmentation.ipynb @@ -0,0 +1,445 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "name": "Image_Augmentation.ipynb", + "provenance": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "1QLSUl2cPmqk", + "colab_type": "text" + }, + "source": [ + "# Image Augmentation\n", + "Image augmentation technology expands the scale of training datasets\n", + "by making a series of random changes to the training images to produce similar,\n", + "but different, training examples. Another way to explain image augmentation is\n", + "that randomly changing training examples can reduce a model's dependence on\n", + "certain properties, thereby improving its capability for generalization. For\n", + "example, we can crop the images in different ways, so that the objects of\n", + "interest appear in different positions, reducing the model's dependence on the\n", + "position where objects appear. We can also adjust the brightness, color, and\n", + "other factors to reduce the model's sensitivity to color. It can be said that image\n", + "augmentation technology contributed greatly to the success of AlexNet. In this\n", + "section, we will discuss this technology, which is widely used in computer\n", + "vision.\n", + "\n", + "First, import the packages or modules required for the experiment in this section." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "V833oFFA93jA", + "colab_type": "code", + "colab": {} + }, + "source": [ + "import torch\n", + "import torch.nn as nn\n", + "import torchvision\n", + "import torchvision.transforms as transforms\n", + "import PIL\n", + "from PIL import Image" + ], + "execution_count": 1, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UIDqgJsTQTAc", + "colab_type": "text" + }, + "source": [ + "# Common Image Augmentation Methods\n", + "We will apply the following transforms to an image using `torchvision.transforms`:\n", + "\n", + "\n", + "\n", + "* Resize the image to (224, 224)\n", + "* Randomly change the hue and saturation of the image with `ColorJitter`\n", + "* A `ColorJitter` instance can also randomly change the brightness, contrast, saturation, and hue of the image at the same time\n", + "* Random horizontal flip\n", + "* Random rotation (up to 20 degrees)\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "mj98T_1sQOfX", + "colab_type": "code", + "colab": {} + }, + "source": [ + "augment = torchvision.transforms.Compose([\n", + " torchvision.transforms.Resize((224,224)),\n", + " torchvision.transforms.ColorJitter(hue=.10, saturation=.55),\n", + " torchvision.transforms.RandomHorizontalFlip(),\n", + " torchvision.transforms.RandomRotation(20, resample=PIL.Image.BILINEAR)\n", + "])" + ], + "execution_count": 2, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "9Kb5c_j-P1BE", + "colab_type": "code", + "colab": {} + }, + "source": [ + "img = Image.open(\"/content/apple.jpeg\")" + ], + "execution_count": 3, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "RZmWLaY7SOc4", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 241 + }, + "outputId": "1f38c8dc-9272-45b8-d9e4-3f07ca17e20f" + }, + "source": [ + "augment(img)" + ], + "execution_count": 4, 
+ "outputs": [ + { + "output_type": "execute_result", + "data": { + "image/png": "\n", + "text/plain": [ + "" + ] + }, + "metadata": { + "tags": [] + }, + "execution_count": 4 + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Bawyz8FHUHSR", + "colab_type": "text" + }, + "source": [ + "# Training a Model with Image Augmentation\n", + "Next, we will look at how to apply image augmentation in actual training. Here, we use the MNIST dataset to observe how the model performs when trained on augmented images." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "WdEJ57Sy-AmF", + "colab_type": "code", + "colab": {} + }, + "source": [ + "transform_train = torchvision.transforms.Compose([\n", + " torchvision.transforms.ColorJitter(hue=.10, saturation=.55),\n", + " torchvision.transforms.RandomHorizontalFlip(),\n", + " torchvision.transforms.RandomRotation(40, resample=PIL.Image.BILINEAR),\n", + " torchvision.transforms.ToTensor()\n", + "])\n", + "\n", + "transform_test = torchvision.transforms.Compose([\n", + " torchvision.transforms.ToTensor()\n", + "])" + ], + "execution_count": 5, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-jVCSvGSUwc0", + "colab_type": "text" + }, + "source": [ + "In order to obtain deterministic results during prediction, we usually apply image augmentation only to the training examples and do not use random augmentation operations during prediction. Here, we use color jitter, random horizontal flipping, and random rotation for training. In addition, we use a `ToTensor` instance to convert each image into the tensor format required by PyTorch." 
+ ] + }, + { + "cell_type": "code", + "metadata": { + "id": "vxCj7lWwMXHG", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# Device configuration\n", + "device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')" + ], + "execution_count": 6, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rZWPgTZ9V8T0", + "colab_type": "text" + }, + "source": [ + "Next, we set the hyperparameters for training." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "c7NURcRJMXy6", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# Hyperparameters\n", + "num_epochs = 5\n", + "num_classes = 10\n", + "batch_size = 100\n", + "learning_rate = 0.001" + ], + "execution_count": 7, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "otT3Ub7gWDwH", + "colab_type": "text" + }, + "source": [ + "We download the dataset and create `DataLoader` instances for the training and test sets." + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "o9Uif2Z3JHrU", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# MNIST dataset\n", + "train_dataset = torchvision.datasets.MNIST(root='../../data/',\n", + " train=True, \n", + " transform=transform_train,\n", + " download=True)\n", + "\n", + "test_dataset = torchvision.datasets.MNIST(root='../../data/',\n", + " train=False, \n", + " transform=transform_test)\n", + "\n", + "# Data loader\n", + "train_loader = torch.utils.data.DataLoader(dataset=train_dataset,\n", + " batch_size=batch_size, \n", + " shuffle=True)\n", + "\n", + "test_loader = torch.utils.data.DataLoader(dataset=test_dataset,\n", + " batch_size=batch_size, \n", + " shuffle=False)" + ], + "execution_count": 8, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "Dyvf5Qh2M1mB", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# Convolutional neural network (two convolutional layers)\n", + "class ConvNet(nn.Module):\n", + " def __init__(self, num_classes=10):\n", + " 
super(ConvNet, self).__init__()\n", + " self.layer1 = nn.Sequential(\n", + " nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2),\n", + " nn.BatchNorm2d(16),\n", + " nn.ReLU(),\n", + " nn.MaxPool2d(kernel_size=2, stride=2))\n", + " self.layer2 = nn.Sequential(\n", + " nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),\n", + " nn.BatchNorm2d(32),\n", + " nn.ReLU(),\n", + " nn.MaxPool2d(kernel_size=2, stride=2))\n", + " self.fc = nn.Linear(7*7*32, num_classes)\n", + " \n", + " def forward(self, x):\n", + " out = self.layer1(x)\n", + " out = self.layer2(out)\n", + " out = out.reshape(out.size(0), -1)\n", + " out = self.fc(out)\n", + " return out\n", + "\n", + "model = ConvNet(num_classes).to(device)" + ], + "execution_count": 9, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "4Kz0ODLsEtiT", + "colab_type": "code", + "colab": {} + }, + "source": [ + "# Loss and optimizer\n", + "criterion = nn.CrossEntropyLoss()\n", + "optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)" + ], + "execution_count": 10, + "outputs": [] + }, + { + "cell_type": "code", + "metadata": { + "id": "1IzDkL13M82i", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 521 + }, + "outputId": "791f04fd-7ccc-4ac0-8dce-7ed0b207a88d" + }, + "source": [ + "# Train the model\n", + "total_step = len(train_loader)\n", + "for epoch in range(num_epochs):\n", + " for i, (images, labels) in enumerate(train_loader):\n", + " images = images.to(device)\n", + " labels = labels.to(device)\n", + " \n", + " # Forward pass\n", + " outputs = model(images)\n", + " loss = criterion(outputs, labels)\n", + " \n", + " # Backward and optimize\n", + " optimizer.zero_grad()\n", + " loss.backward()\n", + " optimizer.step()\n", + " \n", + " if (i+1) % 100 == 0:\n", + " print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' \n", + " .format(epoch+1, num_epochs, i+1, total_step, loss.item()))" + ], + "execution_count": 11, + "outputs": [ + { + 
"output_type": "stream", + "text": [ + "Epoch [1/5], Step [100/600], Loss: 0.5863\n", + "Epoch [1/5], Step [200/600], Loss: 0.2191\n", + "Epoch [1/5], Step [300/600], Loss: 0.3434\n", + "Epoch [1/5], Step [400/600], Loss: 0.2428\n", + "Epoch [1/5], Step [500/600], Loss: 0.2102\n", + "Epoch [1/5], Step [600/600], Loss: 0.3019\n", + "Epoch [2/5], Step [100/600], Loss: 0.3036\n", + "Epoch [2/5], Step [200/600], Loss: 0.2419\n", + "Epoch [2/5], Step [300/600], Loss: 0.2189\n", + "Epoch [2/5], Step [400/600], Loss: 0.2131\n", + "Epoch [2/5], Step [500/600], Loss: 0.1447\n", + "Epoch [2/5], Step [600/600], Loss: 0.3064\n", + "Epoch [3/5], Step [100/600], Loss: 0.1166\n", + "Epoch [3/5], Step [200/600], Loss: 0.1930\n", + "Epoch [3/5], Step [300/600], Loss: 0.1506\n", + "Epoch [3/5], Step [400/600], Loss: 0.1399\n", + "Epoch [3/5], Step [500/600], Loss: 0.2541\n", + "Epoch [3/5], Step [600/600], Loss: 0.1431\n", + "Epoch [4/5], Step [100/600], Loss: 0.2039\n", + "Epoch [4/5], Step [200/600], Loss: 0.1451\n", + "Epoch [4/5], Step [300/600], Loss: 0.2032\n", + "Epoch [4/5], Step [400/600], Loss: 0.2130\n", + "Epoch [4/5], Step [500/600], Loss: 0.1343\n", + "Epoch [4/5], Step [600/600], Loss: 0.2140\n", + "Epoch [5/5], Step [100/600], Loss: 0.2033\n", + "Epoch [5/5], Step [200/600], Loss: 0.1511\n", + "Epoch [5/5], Step [300/600], Loss: 0.1760\n", + "Epoch [5/5], Step [400/600], Loss: 0.1979\n", + "Epoch [5/5], Step [500/600], Loss: 0.1957\n", + "Epoch [5/5], Step [600/600], Loss: 0.1357\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "code", + "metadata": { + "id": "sAuHgHQBNAr3", + "colab_type": "code", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 34 + }, + "outputId": "a9ce4848-9dca-42b2-9582-72f2451e7e3c" + }, + "source": [ + "# Test the model\n", + "model.eval() # eval mode (batchnorm uses moving mean/variance instead of mini-batch mean/variance)\n", + "with torch.no_grad():\n", + " correct = 0\n", + " total = 0\n", + " for images, labels 
in test_loader:\n", + " images = images.to(device)\n", + " labels = labels.to(device)\n", + " outputs = model(images)\n", + " _, predicted = torch.max(outputs.data, 1)\n", + " total += labels.size(0)\n", + " correct += (predicted == labels).sum().item()\n", + "\n", + " print('Test Accuracy of the model on the 10000 test images: {} %'.format(100 * correct / total))" + ], + "execution_count": 12, + "outputs": [ + { + "output_type": "stream", + "text": [ + "Test Accuracy of the model on the 10000 test images: 96.03 %\n" + ], + "name": "stdout" + } + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "TBNmZoIGXXcU", + "colab_type": "text" + }, + "source": [ + "## Summary\n", + "\n", + "* Image augmentation generates randomly modified images from existing training data to reduce overfitting.\n", + "* In order to obtain deterministic results during prediction, we usually apply image augmentation only to the training examples and do not use random augmentation operations during prediction.\n", + "* We can obtain classes related to image augmentation from torchvision's `transforms` module." + ] + } + ] +} \ No newline at end of file diff --git a/README.md b/README.md index 3ce82bc0..55a3cb40 100644 --- a/README.md +++ b/README.md @@ -105,7 +105,7 @@ Note: Some ipynb notebooks may not be rendered perfectly in Github. We suggest ` * 12.9 Adadelta * 12.10 Adam * **Ch14 Computer Vision** - * 14.1 Image Augmentation + * 14.1 [Image Augmentation](https://github.com/ShambhaviCodes/d2l-pytorch/blob/master/Ch14_Computer_Vision/Image_Augmentation.ipynb) * 14.2 Fine Tuning * 14.3 [Object Detection and Bounding Boxes](https://github.com/dsgiitr/d2l-pytorch/blob/master/Ch14_Computer_Vision/Object_Detection_and_Bounding_Boxes.ipynb) * 14.4 [Anchor Boxes](https://github.com/dsgiitr/d2l-pytorch/blob/master/Ch14_Computer_Vision/Anchor_Boxes.ipynb)