Visualization and Analysis of an Online Retail Invoices Dataset

In this project I conducted visualization and analysis on an online retail transaction dataset.

The data I used is sourced from UC Irvine Machine Learning Repository and is referred to as the "Online-retail" dataset. You can access this dataset here

The analysis was performed using Jupyter Notebook. To view the complete code, please click here

Now, let's begin.

Dataset Information

First, let's take a look to the top 5 rows of the dataset:

These are description of the dataset's columns obtained from data source:
ColumnDescription
InvoiceNoInvoice number, a 6-digit integral number uniquely assigned to each transaction.
StockCodeProduct (item), a 5-digit integral number uniquely assigned to each distinct product.
DescriptionProduct (item) name.
QuantityThe quantities of each product (item) per transaction.
InvoiceDateInvoice Date and time, the day and time when each transaction was generated.
UnitPriceUnit price, Product price per unit in sterling.
CustomerIDCustomer number, a 5-digit integral number uniquely assigned to each customer.
CountryCountry name, the name of the country where each customer resides.

This includes basic information and numerical variables' statistics for the dataset:


Dataset information summary:
1. Dataset contains 541909 rows data with 8 columns
2. CustomerID column has missing values
3. There are non-positive values in Quantity and UnitPrice variable

Data Preprocessing

Before performing visualization and analysis, this dataset needs to be preprocessed first. The preprocessing processes include:
1. Remove all the data that have non-positive value in Quantity or Unitprice value
2. Standardize Description based on StockCode
3. Convert InvoiceDate and CustomerID data type
4. Remove data with missing value in CustomerID column (only when doing customer analysis)

Visualization & Analysis Result

Every analysis always starts with question. This analysis attempt to answer following basic questions:
1. How is average gross sales by the day of the week?
2. How is sales performance each month?
3. What are the top-selling products?
3. Who are the most loyal customers?
4. How is the distribution of purchases from each country?
From these questions, we may encounter new ones during the analysis.
Let's jump to the analysis.

Sales Analysis

To conduct sales analysis, I first create two new columns: 'TotalSales' and 'DayofWeek' by command below. TotalSales is the product of Quantity and UnitPrice, while DayofWeek represents the day name when the transaction occurred.


How is average gross sales by the day of the week?

Insights:


How is sales performance each month?
To answer this question, I created two graphs: one based on Quantity and another based on TotalSales.

Insight:
The increase in sales is significantly noticeable starting from October and peaks in November.

Product Analysis

We have observed an increase in sales from September to November, leading to a new question:
Which products are highly popular during these months?

Insight:
Popcorn Holder is the top-selling product during Sept 2011 - Nov 2011. From this information we may know the behavior of the customer. We can deduce that towards the end of the year is when customers enjoy holidays by watching movies, if we associate popcorn with movies.
Besides that, we can make further deductions through other products that occupy the second and subsequent positions.

How about top-selling products all the time?
1. Based on Quantity

2. Based on TotalSales


Insight:
From both graphs above, we observed that LITTLE BIRDIE PAPER CRAFT is the most purchased item. However, when we look at the top 2 products that contribute the highest gross sales, these two items are not among the top ten most purchased items. From this, we can see that both products are premium items with fewer sales but contribute significantly to gross sales.

Customer Analysis

Who are the most loyal customers?

Insight:
The graph above displays the top 10 customer IDs with the highest purchases based on TotalSales. In this data, we can only identify the IDs of loyal customers. However, if we can analyze who these loyal customers are, we might gather more information to perform customer segmentation, know our customers better, and consequently enhance our marketing strategies.

How is the distribution of purchases from each country?

Insight:
The highest sales are domestic sales (United Kingdom), and the highest exports are to the Netherlands, EIRE, Germany, and so on.

Conclusion

Based on the insights above, here are some conclusions and further considerations:

1. Store Closure on Saturdays:
The absence of sales on Saturdays suggests that the store is likely closed on this day. This information is crucial for scheduling staff and managing resources effectively.

2. Weekly Sales Pattern:
Sales from Monday to Thursday are consistent, while Sunday sales are relatively smaller. This information can be useful for staffing and inventory management, emphasizing the importance of weekends in terms of sales.

3. Seasonal Sales Increase:
The significant increase in sales starting from October, peaking in November, indicates a seasonal trend. Customers may be more active during the holiday season, suggesting an opportunity for targeted marketing campaigns or promotions during these months.

4. Popcorn Holder as a Seasonal Indicator:
The popularity of the Popcorn Holder during September to November may indicate a correlation between movie-related items and holiday seasons. This insight can guide product placement and marketing efforts during these months.

5. Premium Items and Gross Sales:
Premium items like "LITTLE BIRDIE PAPER CRAFT" may not be among the top-purchased items individually, but they significantly contribute to gross sales. This suggests a market for higher-end products and the importance of promoting and highlighting such items.

6. Customer Segmentation and Loyalty:
Identifying the top 10 customer IDs with the highest purchases is a starting point for recognizing loyal customers. Further analysis of these customers can provide insights into their preferences and behaviors, allowing for more targeted marketing and personalized promotions.

7. International Sales Insights:
Recognizing the highest sales in the domestic market (United Kingdom) and significant exports to the Netherlands, EIRE, and Germany provides valuable information for international expansion strategies. It may also guide inventory planning based on geographical demand.

In conclusion, leveraging these insights can help the business optimize its operations, tailor marketing strategies, and enhance customer relationships. Further analysis of customer demographics, preferences, and behaviors can contribute to effective segmentation and more personalized approaches.