From Data to Strategies: How Statistics Can Drive Trustworthy Marketing Decisions

Statistics is a powerful tool that lets us tackle complex problems and answer the questions that arise when we observe data or patterns for the first time. One example is analyzing the personality of a supermarket's customers. Questions such as "Is this group really different from the other?", "To what extent?", and "Should I focus more on this group to improve their experience and my sales?" are key to making good decisions.

While visualizations can help us understand data quickly, they are not always 100% reliable. We could observe clear differences between groups, but those differences may not be statistically significant.

This is where statistics comes into play: not only does it help us analyze the data more deeply, but it gives us the confidence to validate our assumptions. As data scientists or decision-making professionals, we must be aware that incorrect analysis can lead to wrong decisions, resulting in loss of time and money. Therefore, it is crucial that our conclusions are well-founded, supported by statistical evidence.

True satisfaction comes when we see the results of our analysis reflected in effective changes within the company, improvements in the customer experience, and, ultimately, a positive impact on sales and operations. It's an incredible feeling to have been part of that process!

To help you develop this skill, this article walks through a personality analysis of supermarket customers using the Kaggle dataset Customer Personality Analysis: https://www.kaggle.com/datasets/imakash3011/customer-personality-analysis

In this analysis, we will explore the behavior of a supermarket's customers with the aim of extracting valuable information from the data. We will seek to answer the following questions:

Is there any significant difference in total spending by Education?
Is there a significant difference in total spending by Number of children?
Is there any significant difference in total spending by Marital Status?

Although this analysis could be extended much further, we will focus on answering these three questions, as they offer great explanatory power. Throughout the article, we will show you how we can address these questions and how, through the same approach, we could answer many more questions.

In this article we will explore statistical analyses such as the Kolmogorov-Smirnov test and the Levene test, and how to know when to apply ANOVA or Kruskal-Wallis. These names may sound unfamiliar, but don't worry: I will explain them in a simple way so that you can follow without complications.

Next, I will show you the Python code and the steps to follow to perform these statistical analyses effectively.
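As a preview of how these tests fit together, here is a minimal sketch on synthetic data (not the Kaggle dataset). It assumes SciPy is installed and follows one common convention: check normality within each group with Kolmogorov-Smirnov, check equal variances with Levene, then use ANOVA only if both checks pass and Kruskal-Wallis otherwise. The function name `compare_groups` and the 0.05 threshold are illustrative choices, not something taken from the dataset or from a library.

```python
import numpy as np
from scipy import stats


def compare_groups(groups, alpha=0.05):
    """Pick ANOVA or Kruskal-Wallis based on normality and equal-variance checks.

    `groups` is a list of 1-D numeric arrays, one per category.
    """
    # Kolmogorov-Smirnov test of each group against a normal distribution
    # with that group's own mean and standard deviation.
    # Fitting the parameters from the same sample makes this check approximate.
    normal = all(
        stats.kstest(g, "norm", args=(np.mean(g), np.std(g, ddof=1))).pvalue > alpha
        for g in groups
    )
    # Levene test: the null hypothesis is that all groups share the same variance.
    equal_var = stats.levene(*groups).pvalue > alpha

    if normal and equal_var:
        name, result = "ANOVA", stats.f_oneway(*groups)
    else:
        name, result = "Kruskal-Wallis", stats.kruskal(*groups)
    return name, result.pvalue


rng = np.random.default_rng(0)
# Synthetic, right-skewed "spending" values for three made-up groups.
skewed = [rng.exponential(scale=s, size=200) for s in (300, 320, 600)]
test_name, p_value = compare_groups(skewed)
print(test_name, p_value)
```

Because the synthetic values are strongly skewed, the normality check fails and the sketch falls back to Kruskal-Wallis, which compares ranks instead of means.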

1. Getting started

We import the necessary Python libraries.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import os

There are two ways to get the .csv file: download it directly from the dataset page, or fetch it from code with kagglehub, using the snippet that Kaggle shows next to the download button.

#pip install kagglehub
import kagglehub

# Download latest version
path = kagglehub.dataset_download("imakash3011/customer-personality-analysis")

print("Path to dataset files:", path)

# Get the file name (the first file in the download folder)
nombre_archivo = os.listdir(path)[0]
nombre_archivo  # 'marketing_campaign.csv'

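With the folder path and the file name in hand, the next step is to load the file into a DataFrame; the preview that follows looks like the first rows of such a frame. The sketch below is self-contained: instead of the real download it writes a tiny stand-in file to a temporary folder and reads it back. It assumes that the Kaggle file is tab-separated (hence `sep="\t"`); the variable name `data` is also an assumption. Adjust both if your copy differs.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the folder returned by kagglehub.dataset_download(...).
path = tempfile.mkdtemp()
sample = (
    "ID\tYear_Birth\tEducation\tMarital_Status\tIncome\n"
    "5524\t1957\tGraduation\tSingle\t58138\n"
    "2174\t1954\tGraduation\tSingle\t46344\n"
    "4141\t1965\tGraduation\tTogether\t71613\n"
)
with open(os.path.join(path, "marketing_campaign.csv"), "w") as f:
    f.write(sample)

# Take the first (here: only) file in the folder...
nombre_archivo = os.listdir(path)[0]
# ...and read it, assuming tab-separated columns.
data = pd.read_csv(os.path.join(path, nombre_archivo), sep="\t")
print(data.head(3))
```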

	ID	Year_Birth	Education	Marital_Status	Income	Kidhome	Teenhome	Dt_Customer	Recency	MntWines	MntFruits	MntMeatProducts	MntFishProducts	MntSweetProducts	MntGoldProds	NumDealsPurchases	NumWebPurchases	NumCatalogPurchases	NumStorePurchases	NumWebVisitsMonth	Z_CostContact	Z_Revenue	Response
0	5524	1957	Graduation	Single	58138.0	0	0	04-09-2012	58	635	88	546	172	88	88	3	8	10	4	7	3	11	1
1	2174	1954	Graduation	Single	46344.0	1	1	08-03-2014	38	11	1	6	2	1	6	2	1	1	2	5	3	11	0
2	4141	1965	Graduation	Together	71613.0	0	0	21-08-2013	26	426	49	127	111	21	42	1	8	2	10	4	3	11	0

To have a better idea of the data set that we will analyze, I will indicate the meaning of each column.

Columns:

People:
- ID: unique identifier of the client
- Year_Birth: year of birth of the client.
- Education: level of education of the client.
- Marital_Status: client's marital status
- Income: annual income of the client's household
- Kidhome: Number of children in the client's home
- Teenhome: Number of teenagers in the client's home
- Dt_Customer: Customer registration date in the company
- Recency: number of days since the customer's last purchase.
- Complain: 1 if the customer complained in the last 2 years, 0 otherwise
Products:
- MntWines: Amount spent on wine in the last 2 years.
- MntFruits: Amount spent on fruits in the last 2 years.
- MntMeatProducts: Amount spent on meat in the last 2 years.
- MntFishProducts: amount spent on fish in the last 2 years.
- MntSweetProducts: amount spent on sweets in the last 2 years.
- MntGoldProds: amount spent on gold in the last 2 years.
Promotion:
- NumDealsPurchases: Number of purchases made with a discount.
- AcceptedCmp1: 1 if the customer accepted the offer in the first campaign, 0 otherwise.
- AcceptedCmp2: 1 if the customer accepted the offer in the second campaign, 0 otherwise.
- AcceptedCmp3: 1 if the customer accepted the offer in the third campaign, 0 otherwise.
- AcceptedCmp4: 1 if the customer accepted the offer in the fourth campaign, 0 otherwise.
- AcceptedCmp5: 1 if the customer accepted the offer in the fifth campaign, 0 otherwise.
- Response: 1 if the customer accepted the offer in the last campaign, 0 otherwise
Place:
- NumWebPurchases: Number of purchases made through the company website.
- NumCatalogPurchases: Number of purchases made through a catalog.
- NumStorePurchases: Number of purchases made directly in stores.
- NumWebVisitsMonth: Number of visits to the company's website in the last month.

Yes, there are many columns. To keep things short we will only use a few of them here; you can apply the same steps to the other columns.

Now we will check whether the dataset contains null values.
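A common way to run this check in pandas is `isna().sum()`, which counts the missing values in each column. The sketch uses a small made-up DataFrame; the name `data` and its values are illustrative, not the real dataset.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame with one missing Income value.
data = pd.DataFrame(
    {
        "Education": ["Graduation", "PhD", "Master"],
        "Income": [58138.0, np.nan, 71613.0],
    }
)

# Count missing values in each column.
null_counts = data.isna().sum()
print(null_counts)
```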

We can see that the Income column has 24 null values. Since this column is not used in this analysis, we will leave it as it is. If you do want to use it, choose one of these two options:

Impute the missing data if it does not represent more than 5% of the total data (recommendation).
Delete null data.
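Both options are one-liners in pandas. The sketch below, on made-up values, applies the 5% rule of thumb from the list above: impute when the share of missing rows is at most 5%, otherwise drop them. Using the median as the fill value is an assumption here (a common choice for skewed variables such as income); the text does not say which statistic to use.

```python
import numpy as np
import pandas as pd

# Made-up incomes: 1 missing value out of 40 rows (2.5%).
income = [50000.0 + 1000 * i for i in range(39)] + [np.nan]
data = pd.DataFrame({"Income": income})

# Fraction of rows where Income is missing.
missing_share = data["Income"].isna().mean()

if missing_share <= 0.05:
    # Option 1: impute with the median of the observed values.
    cleaned = data.assign(Income=data["Income"].fillna(data["Income"].median()))
else:
    # Option 2: drop the rows where Income is missing.
    cleaned = data.dropna(subset=["Income"])

print(f"missing share: {missing_share:.1%}, rows kept: {len(cleaned)}")
```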

2. Configure the Dataset for analysis

We will keep the columns that are of interest to us, such as education, children, marital status, amount of spending per product category, among others.
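Selecting a subset of columns can be sketched as follows. The list `columns_of_interest` is one reading of "education, children, marital status, amount of spending per product category"; the original selection is not shown, so treat the exact list and the names `data` and `df` as assumptions. The single row is the first row of the preview table, with two columns we do not need added for contrast.

```python
import pandas as pd

data = pd.DataFrame(
    {
        "ID": [5524],
        "Education": ["Graduation"],
        "Marital_Status": ["Single"],
        "Kidhome": [0],
        "Teenhome": [0],
        "MntWines": [635],
        "MntFruits": [88],
        "MntMeatProducts": [546],
        "MntFishProducts": [172],
        "MntSweetProducts": [88],
        "MntGoldProds": [88],
        "Z_CostContact": [3],
    }
)

columns_of_interest = [
    "Education", "Marital_Status", "Kidhome", "Teenhome",
    "MntWines", "MntFruits", "MntMeatProducts",
    "MntFishProducts", "MntSweetProducts", "MntGoldProds",
]

# .copy() gives an independent frame, so adding columns later
# does not trigger pandas' SettingWithCopyWarning.
df = data[columns_of_interest].copy()
print(df.columns.tolist())
```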

We calculate the total expense by adding the expenses of all product categories.
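In pandas this is a row-wise sum over the six `Mnt*` columns. The column name `TotalSpending` is hypothetical (the original name is not shown), and the values are the three rows from the preview table earlier in the article.

```python
import pandas as pd

# Spending columns for the three rows shown in the preview table.
df = pd.DataFrame(
    {
        "MntWines": [635, 11, 426],
        "MntFruits": [88, 1, 49],
        "MntMeatProducts": [546, 6, 127],
        "MntFishProducts": [172, 2, 111],
        "MntSweetProducts": [88, 1, 21],
        "MntGoldProds": [88, 6, 42],
    }
)

spend_columns = [
    "MntWines", "MntFruits", "MntMeatProducts",
    "MntFishProducts", "MntSweetProducts", "MntGoldProds",
]

# Sum across columns (axis=1) to get one total per customer.
df["TotalSpending"] = df[spend_columns].sum(axis=1)
print(df["TotalSpending"].tolist())  # → [1617, 27, 776]
```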
