Statistics is a powerful tool for tackling complex problems and answering the questions that arise when we look at data or patterns for the first time. One example is analyzing the personality of a supermarket's customers. Questions such as "Is this group really different from the other? To what extent? Should I focus more on this group to improve their experience and my sales?" are key to making good decisions.
While visualizations help us understand data quickly, they can also mislead. Two groups may look clearly different on a chart even though the difference is not statistically significant.
This is where statistics comes into play: not only does it help us analyze the data more deeply, but it gives us the confidence to validate our assumptions. As data scientists or decision-making professionals, we must be aware that incorrect analysis can lead to wrong decisions, resulting in loss of time and money. Therefore, it is crucial that our conclusions are well-founded, supported by statistical evidence.
True satisfaction comes when we see the results of our analysis reflected in effective changes within the company, improvements in the customer experience, and, ultimately, a positive impact on sales and operations. It's an incredible feeling to have been part of that process!
To help you develop this skill, this article walks through a personality analysis of supermarket customers using the Kaggle dataset Customer Personality Analysis: https://www.kaggle.com/datasets/imakash3011/customer-personality-analysis
In this analysis, we will explore the behavior of a supermarket's customers with the aim of extracting valuable information from the data. We will seek to answer the following questions:
Although this analysis could be extended much further, we will focus on answering these three questions, as they offer great explanatory power. Throughout the article, we will show you how we can address these questions and how, through the same approach, we could answer many more questions.
In this article we will explore statistical analyses such as the Kolmogorov-Smirnov test and the Levene test, and how to know when to apply ANOVA or Kruskal-Wallis. These names may sound unfamiliar, but don't worry: I will explain them in a simple way so that you can follow along without complications.
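As a preview of how these tests fit together, here is a minimal sketch of a common decision flow, assuming SciPy is installed (it is not among the libraries imported in this article). The helper name `compare_groups`, the 0.05 threshold, and the synthetic data are illustrative assumptions, not necessarily the exact procedure followed later.

```python
import numpy as np
from scipy import stats


def compare_groups(groups, alpha=0.05):
    """Pick ANOVA or Kruskal-Wallis from normality and equal-variance checks.

    Hypothetical helper for illustration; `alpha` is an assumed significance level.
    """
    # Kolmogorov-Smirnov: compare each group with a normal distribution that
    # uses the group's own mean and standard deviation.
    is_normal = all(
        stats.kstest(g, "norm", args=(np.mean(g), np.std(g, ddof=1))).pvalue > alpha
        for g in groups
    )
    # Levene: the null hypothesis is that all groups share the same variance.
    equal_variance = stats.levene(*groups).pvalue > alpha

    if is_normal and equal_variance:
        test_name, result = "ANOVA", stats.f_oneway(*groups)
    else:
        test_name, result = "Kruskal-Wallis", stats.kruskal(*groups)
    return test_name, result.pvalue


# Synthetic, strongly skewed "spending" data (not taken from the Kaggle dataset).
rng = np.random.default_rng(42)
groups = [rng.exponential(scale=s, size=500) for s in (100.0, 120.0, 300.0)]

test_name, p_value = compare_groups(groups)
print(f"{test_name}: p-value = {p_value:.4g}")
```

Because the synthetic data is heavily skewed, the normality check fails and the sketch falls back to Kruskal-Wallis. Note that estimating the mean and standard deviation from the same sample makes the Kolmogorov-Smirnov p-value only approximate.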
We import the necessary Python libraries.
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    import numpy as np
    import os
There are two ways to load the .csv file: download the file manually, or fetch it from code with the kagglehub snippet available from the download button on the Kaggle page.
    # pip install kagglehub
    import kagglehub

    # Download latest version
    path = kagglehub.dataset_download("imakash3011/customer-personality-analysis")

    print("Path to dataset files:", path)
    # Get the name of the first file in the download folder
    nombre_archivo = os.listdir(path)[0]
    nombre_archivo  # 'marketing_campaign.csv'
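The step that reads the file into a DataFrame is not shown here, so the following is a minimal sketch of how it could look. The helper `load_first_csv` is hypothetical, and the tab separator is an assumption about this Kaggle file that you should verify against your download. The demo writes a tiny stand-in file so the snippet runs without network access.

```python
import os
import tempfile

import pandas as pd


def load_first_csv(folder, sep="\t"):
    """Read the first .csv file found in `folder` into a DataFrame.

    Hypothetical helper; `sep` defaults to a tab, which is an assumption.
    """
    csv_files = sorted(f for f in os.listdir(folder) if f.endswith(".csv"))
    if not csv_files:
        raise FileNotFoundError(f"No .csv file found in {folder!r}")
    return pd.read_csv(os.path.join(folder, csv_files[0]), sep=sep)


# Demo on a tiny stand-in file instead of the real download.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_file = os.path.join(demo_dir, "marketing_campaign.csv")
    with open(demo_file, "w", encoding="utf-8") as fh:
        fh.write("ID\tYear_Birth\tEducation\n")
        fh.write("5524\t1957\tGraduation\n")
        fh.write("2174\t1954\tGraduation\n")
    demo_df = load_first_csv(demo_dir)

print(demo_df.head())
```

With the real download, the equivalent call would be `df = load_first_csv(path)`, followed by `df.head()` to preview the first rows.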
| | ID | Year_Birth | Education | Marital_Status | Income | Kidhome | Teenhome | Dt_Customer | Recency | MntWines | MntFruits | MntMeatProducts | MntFishProducts | MntSweetProducts | MntGoldProds | NumDealsPurchases | NumWebPurchases | NumCatalogPurchases | NumStorePurchases | NumWebVisitsMonth | AcceptedCmp3 | AcceptedCmp4 | AcceptedCmp5 | AcceptedCmp1 | AcceptedCmp2 | Complain | Z_CostContact | Z_Revenue | Response |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 5524 | 1957 | Graduation | Single | 58138.0 | 0 | 0 | 04-09-2012 | 58 | 635 | 88 | 546 | 172 | 88 | 88 | 3 | 8 | 10 | 4 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 1 |
1 | 2174 | 1954 | Graduation | Single | 46344.0 | 1 | 1 | 08-03-2014 | 38 | 11 | 1 | 6 | 2 | 1 | 6 | 2 | 1 | 1 | 2 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
2 | 4141 | 1965 | Graduation | Together | 71613.0 | 0 | 0 | 21-08-2013 | 26 | 426 | 49 | 127 | 111 | 21 | 42 | 1 | 8 | 2 | 10 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 11 | 0 |
To have a better idea of the data set that we will analyze, I will indicate the meaning of each column.
Columns:
People:
Products:
Promotion:
Place:
Yes, there are many columns. To keep the article short we will use only a few of them, but you can apply the same steps to the others.
Now we will check whether the dataset contains null values.
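A minimal sketch of such a check with pandas is shown below. It uses a small stand-in frame (`df_demo` is a hypothetical name, and one Income value is blanked out on purpose); with the real data you would call the same method on the DataFrame loaded from the CSV.

```python
import numpy as np
import pandas as pd

# Small stand-in frame; one Income value is deliberately missing for the demo.
df_demo = pd.DataFrame(
    {
        "ID": [5524, 2174, 4141],
        "Education": ["Graduation", "Graduation", "Graduation"],
        "Income": [58138.0, np.nan, 71613.0],
    }
)

# Count missing values per column and show only the columns that have any.
null_counts = df_demo.isna().sum()
print(null_counts[null_counts > 0])
```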
The Income column has 24 null values. Since this column is not used in this analysis, we will leave it as it is. If you do want to use it, you should apply one of these two options:
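The two options are not listed here. A common pair, assumed for this sketch, is either dropping the rows with a missing value or filling them with a summary statistic such as the median. The frame `df_demo` and its values are illustrative only.

```python
import numpy as np
import pandas as pd

# Stand-in frame with one missing Income value.
df_demo = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4],
        "Income": [58138.0, np.nan, 71613.0, 46344.0],
    }
)

# Option A (assumed): drop the rows where Income is missing.
df_dropped = df_demo.dropna(subset=["Income"])

# Option B (assumed): fill missing Income values with the column median.
median_income = df_demo["Income"].median()
df_imputed = df_demo.assign(Income=df_demo["Income"].fillna(median_income))

print(len(df_dropped), "rows kept after dropping")
print(df_imputed)
```

Dropping is simpler when few rows are affected, while imputing keeps every customer in the analysis at the cost of introducing estimated values.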
We will keep the columns that are of interest to us, such as education, children, marital status, amount of spending per product category, among others.
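As a sketch of this step, the snippet below keeps a subset of columns from a stand-in frame built from the three sample rows shown earlier. The exact list in `columns_of_interest` is an assumption for illustration; adjust it to the columns you need.

```python
import pandas as pd

# Stand-in frame with a few of the dataset's columns (values from the sample rows).
df_demo = pd.DataFrame(
    {
        "ID": [5524, 2174, 4141],
        "Education": ["Graduation", "Graduation", "Graduation"],
        "Marital_Status": ["Single", "Single", "Together"],
        "Kidhome": [0, 1, 0],
        "Teenhome": [0, 1, 0],
        "Recency": [58, 38, 26],
        "MntWines": [635, 11, 426],
        "MntFruits": [88, 1, 49],
    }
)

# Assumed selection: education, marital status, children, and spending columns.
columns_of_interest = [
    "Education",
    "Marital_Status",
    "Kidhome",
    "Teenhome",
    "MntWines",
    "MntFruits",
]

# .copy() avoids modifying the original frame when new columns are added later.
df_selected = df_demo[columns_of_interest].copy()
print(df_selected)
```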
We calculate the total expense by adding the expenses of all product categories.
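A minimal sketch of this calculation, using the six `Mnt*` columns from the dataset and the three sample rows shown earlier. The new column name `Total_Spent` is an assumption chosen for illustration.

```python
import pandas as pd

# Spending columns for the three sample rows shown in the preview table.
df_demo = pd.DataFrame(
    {
        "MntWines": [635, 11, 426],
        "MntFruits": [88, 1, 49],
        "MntMeatProducts": [546, 6, 127],
        "MntFishProducts": [172, 2, 111],
        "MntSweetProducts": [88, 1, 21],
        "MntGoldProds": [88, 6, 42],
    }
)

spend_columns = [
    "MntWines",
    "MntFruits",
    "MntMeatProducts",
    "MntFishProducts",
    "MntSweetProducts",
    "MntGoldProds",
]

# Row-wise sum across all product categories.
df_demo["Total_Spent"] = df_demo[spend_columns].sum(axis=1)
print(df_demo["Total_Spent"].tolist())  # [1617, 27, 776]
```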