Image Classification of Rock-paper-scissors Hand-shaped Pictures
Here are the requirements for this task:
- No Plagiarism!
- The dataset must be divided into a train set and a validation set.
- Image augmentation must be implemented.
- Utilize an image data generator.
- The model must be implemented using the sequential model.
- Model training should not exceed 30 minutes.
- The program is to be executed on Google Colaboratory.
- The model's accuracy should be a minimum of 85%.
- It should be capable of predicting images uploaded to Colab.
Criteria for Achieving a 5/5 Rating:
- All requirements must be fulfilled.
- Accuracy above 96%.
- Utilize three or more techniques not covered in the module.
As mentioned in the requirements, this work was performed in Google Colab. In this post, I will give a brief summary of each step of the project. You can view the complete code here.
Steps of this project are as follows:
- Data preparation
- Data preprocessing
- Visualization of training data samples
- Sequential modeling
- Customizing callbacks
- Model training
- Prediction on new data
Now, let's get into the project! 🚀
Data preparation
The first stage is data preparation. The initial step is downloading and importing the data into Google Colab. The dataset used must be the one provided by Dicoding; to view the dataset, please click here. The provided dataset is in .zip format. After downloading, the next step is to extract the zip file. This stage uses the os, zipfile, and pathlib libraries. The final result is a dataset containing 2188 images.
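Since the notebook itself is not embedded here, the extraction step can be sketched roughly as follows (the archive name, paths, and helper name are my own assumptions, not the post's exact code):

```python
import zipfile
from pathlib import Path

def extract_dataset(zip_path: str, extract_dir: str) -> int:
    """Extract the dataset archive and return the number of images found."""
    target = Path(extract_dir)
    with zipfile.ZipFile(zip_path, "r") as zf:
        zf.extractall(target)
    # Count common image extensions among the extracted files.
    return sum(1 for p in target.rglob("*")
               if p.suffix.lower() in {".png", ".jpg", ".jpeg"})

# In Colab, after downloading the archive, something like:
# count = extract_dataset("/content/rockpaperscissors.zip",
#                         "/content/rockpaperscissors")
# For the Dicoding dataset, count should be 2188.
```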
Data preprocessing
The next stage is data preprocessing. This stage includes image augmentation and dividing the dataset into train and validation sets. These processes use the TensorFlow library, and the NumPy library is then used to count the images per class in the train set. The counts per class in the training set are: 'paper': 428, 'rock': 436, 'scissors': 450.
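A sketch of this step using Keras's `ImageDataGenerator`, with a 0.4 validation split (which roughly reproduces the 1314/874 train/validation division of the 2188 images). The specific augmentation values here are illustrative assumptions; the original notebook's settings may differ:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def make_generators(data_dir, image_size=(150, 150), val_fraction=0.4):
    """Build augmented train/validation generators from one image directory."""
    datagen = ImageDataGenerator(
        rescale=1.0 / 255,        # scale pixel values to [0, 1]
        rotation_range=20,        # augmentation: random rotations
        shear_range=0.2,          # augmentation: random shears
        zoom_range=0.2,           # augmentation: random zooms
        horizontal_flip=True,     # augmentation: random flips
        fill_mode="nearest",
        validation_split=val_fraction,
    )
    train_gen = datagen.flow_from_directory(
        data_dir, target_size=image_size,
        class_mode="categorical", subset="training")
    val_gen = datagen.flow_from_directory(
        data_dir, target_size=image_size,
        class_mode="categorical", subset="validation")
    return train_gen, val_gen
```

Note that augmentation is applied only on the fly while batches are drawn, so the validation subset is still split from the same directory.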
Training data sample visualization

This stage is optional. Here, I visualize 15 sample images from the training dataset using the Matplotlib library. The results are as follows:
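A minimal helper for this kind of grid plot, assuming a batch drawn from the training generator (the helper name and 3×5 layout are my own):

```python
import numpy as np
import matplotlib.pyplot as plt

def show_samples(images, labels, class_names, n=15, cols=5):
    """Plot the first n images from a batch with their class labels."""
    rows = int(np.ceil(n / cols))
    fig, axes = plt.subplots(rows, cols, figsize=(2.5 * cols, 2.5 * rows))
    for ax, img, label in zip(axes.ravel(), images[:n], labels[:n]):
        ax.imshow(img)
        # Labels are one-hot encoded, so argmax recovers the class index.
        ax.set_title(class_names[int(np.argmax(label))])
        ax.axis("off")
    plt.tight_layout()
    return fig

# Usage with a generator batch, e.g.:
# images, labels = next(train_gen)
# show_samples(images, labels, list(train_gen.class_indices))
```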
Sequential modeling
In this segment, I construct a Convolutional Neural Network (CNN) with TensorFlow and Keras for image classification. The model is sequential, meaning the layers are stacked linearly, with each layer's output feeding the next.
Input Layer:
- The first layer is a Conv2D layer with 32 filters of size (3, 3), utilizing the ReLU activation function. It takes input images of shape (150, 150, 3) representing height, width, and channels.
- A MaxPooling2D layer with a pool size of (2, 2) follows, reducing spatial dimensions.
Convolutional Layers with Pooling:
- The pattern repeats with additional Conv2D and MaxPooling2D layers, with the number of filters increasing to 64, 128, and 256.
- Each convolutional layer uses the ReLU activation function to introduce non-linearity, and MaxPooling2D reduces spatial dimensions.
Flatten Layer:
- After the convolutional layers, a Flatten layer is applied to convert the multi-dimensional feature maps into a one-dimensional array.
Dense Layers:
- Two Dense (fully connected) layers follow: the first with 512 neurons and the output layer with 3 neurons, one per class.
- The ReLU activation is applied in the first Dense layer, while the output layer uses the softmax activation for multi-class classification.
The model is then compiled using the Adam optimizer, categorical cross-entropy loss function (suitable for multi-class classification), and accuracy as the evaluation metric.
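The architecture described above can be written out as a Sequential model like this (the layer order follows the description; the original notebook's exact code may differ slightly):

```python
from tensorflow.keras import layers, models

def build_model(input_shape=(150, 150, 3), num_classes=3):
    """Sequential CNN: four Conv2D/MaxPooling2D blocks, then dense layers."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(256, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(512, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```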
Creating Customized Callbacks
Next, after building the model, I customized callbacks. The callbacks I used are a model checkpoint, which saves the model's weights during training, and an early-stopping callback, which halts training once both accuracy and validation accuracy exceed 96 percent. Here is the code for creating the callbacks.
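A sketch of such callbacks: a standard `ModelCheckpoint` plus a custom callback for the 96 percent rule, since Keras's built-in `EarlyStopping` monitors stagnation rather than an absolute threshold. The file name and class name here are assumptions:

```python
import tensorflow as tf

# Saves the best weights seen so far (file name is an assumption).
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_weights.h5",
    monitor="val_accuracy",
    save_best_only=True,
    save_weights_only=True,
)

class StopAtAccuracy(tf.keras.callbacks.Callback):
    """Stop training once both accuracy and val_accuracy exceed a threshold."""

    def __init__(self, threshold=0.96):
        super().__init__()
        self.threshold = threshold

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        acc = logs.get("accuracy", 0.0)
        val_acc = logs.get("val_accuracy", 0.0)
        if acc > self.threshold and val_acc > self.threshold:
            print(f"\nAccuracy and val_accuracy above "
                  f"{self.threshold:.0%}; stopping.")
            self.model.stop_training = True
```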
Train the model

After all the preparations are complete, it's time to train the model. The model is trained with the parameters shown in the following code.
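A minimal training call might look like this, assuming the model, generators, and callbacks built in the previous steps; besides `epochs=20`, which the post reports, the names and wrapper are my own:

```python
def train(model, train_data, val_data, callbacks, epochs=20):
    """Fit the compiled model; an early-stopping callback may end training sooner."""
    history = model.fit(
        train_data,
        validation_data=val_data,
        epochs=epochs,
        callbacks=callbacks,
    )
    return history

# e.g. history = train(model, train_gen, val_gen,
#                      [checkpoint, StopAtAccuracy(0.96)])
```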
I set the number of epochs to 20, meaning the model would iterate over the training data up to 20 times. However, after the 10th epoch, both accuracy and validation accuracy surpassed 96 percent, so the early-stopping callback we prepared earlier halted training at the 10th epoch. Here's a visualization of the training process:
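Curves like these can be reproduced from the `History` object that `fit` returns; a sketch (the plotting layout is my own):

```python
import matplotlib.pyplot as plt

def plot_history(history):
    """Plot training vs. validation accuracy and loss per epoch."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(history.history["accuracy"], label="accuracy")
    ax1.plot(history.history["val_accuracy"], label="val_accuracy")
    ax1.set_xlabel("epoch")
    ax1.set_title("Accuracy")
    ax1.legend()
    ax2.plot(history.history["loss"], label="loss")
    ax2.plot(history.history["val_loss"], label="val_loss")
    ax2.set_xlabel("epoch")
    ax2.set_title("Loss")
    ax2.legend()
    plt.tight_layout()
    return fig
```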
Predict new data
Finally, we predict new data: a photo of my own hand, uploaded to Colab and then classified. The code and its results are as follows.
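A sketch of this prediction step, assuming the 150×150 input size and the alphabetical class order that `flow_from_directory` produces; the Colab upload lines are commented out since they only run inside Colab:

```python
import numpy as np
from tensorflow.keras.utils import load_img, img_to_array

# Alphabetical order, matching flow_from_directory's class indices.
CLASS_NAMES = ["paper", "rock", "scissors"]

def predict_image(model, path, target_size=(150, 150)):
    """Load one image, preprocess it like the training data, and predict."""
    img = load_img(path, target_size=target_size)
    x = img_to_array(img) / 255.0      # same rescaling as training
    x = np.expand_dims(x, axis=0)      # add a batch dimension
    probs = model.predict(x)[0]
    return CLASS_NAMES[int(np.argmax(probs))], float(np.max(probs))

# In Colab, the file would come from the upload widget:
# from google.colab import files
# uploaded = files.upload()
# for name in uploaded:
#     print(predict_image(model, name))
```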
