Python is widely used in data mining and machine learning. In data mining, association rule mining is a common technique for discovering which items in a data set tend to occur together and how strongly the presence of one item implies the presence of another. This article briefly introduces association rule mining techniques in Python.
The Apriori algorithm is a classic algorithm in association rule mining that discovers frequent itemsets and association rules in a data set. A frequent itemset is a set of items whose support, the fraction of transactions that contain all of them, meets a minimum threshold. An association rule X => Y describes a relationship between itemsets: the items appear together, or the occurrence of X means that Y is also likely to appear.
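To make these two notions concrete, here is a minimal pure-Python sketch of the usual definitions of support and confidence. The helper names `support` and `confidence` are illustrative assumptions and not part of any library; the baskets mirror the five shopping baskets used in the examples of this article.

```python
# Five example shopping baskets (same items as the article's data set)
baskets = [
    {"milk", "bread", "beer"},
    {"cheese", "bread", "butter"},
    {"milk", "bread", "butter", "eggs"},
    {"cheese", "butter", "eggs"},
    {"bread", "beer"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print("support(bread):", support({"bread"}, baskets))
print("confidence(beer => bread):", confidence({"beer"}, {"bread"}, baskets))
print("confidence(bread => beer):", confidence({"bread"}, {"beer"}, baskets))
```

Note that confidence is directional: every basket with beer also contains bread, but only half of the baskets with bread contain beer.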
You can use the apriori function in the mlxtend library to implement the Apriori algorithm in Python. The following is a simple sample code:
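import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

# Build the data set: five shopping baskets
data = [['milk', 'bread', 'beer'],
        ['cheese', 'bread', 'butter'],
        ['milk', 'bread', 'butter', 'eggs'],
        ['cheese', 'butter', 'eggs'],
        ['bread', 'beer']]

# One-hot encode the baskets; apriori expects a boolean DataFrame, not a list of lists
te = TransactionEncoder()
df = pd.DataFrame(te.fit(data).transform(data), columns=te.columns_)

# Mine frequent itemsets with the Apriori algorithm (use_colnames shows item names)
frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)

# Print the frequent itemsets
print(frequent_itemsets)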
In the above code, we first define a data set containing the contents of five shopping baskets, then use the apriori function in the mlxtend library to mine frequent itemsets. The first parameter is the transaction data, which apriori expects as a one-hot encoded DataFrame (one boolean column per item, one row per basket) rather than a raw list of lists. The second parameter is the minimum support threshold, which is set to 0.6 here.
In the output, the algorithm finds two frequent itemsets: ['bread'] with a support of 0.8 (4 of 5 baskets) and ['butter'] with a support of 0.6 (3 of 5 baskets). No pair of items reaches the threshold; ['milk', 'bread'], for example, appears in only 2 of 5 baskets, a support of 0.4. Adjusting the support threshold changes which itemsets are found: lowering it to 0.4 also admits pairs such as ['milk', 'bread'] and ['bread', 'beer'].
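The effect of the threshold can be checked without any library. The following is a minimal sketch of Apriori's level-wise search under simplifying assumptions (no optimizations, everything held in memory); the function name `frequent_itemsets` is hypothetical and this is not how mlxtend implements it internally.

```python
from itertools import combinations

# Five example shopping baskets (same items as the article's data set)
baskets = [
    {"milk", "bread", "beer"},
    {"cheese", "bread", "butter"},
    {"milk", "bread", "butter", "eggs"},
    {"cheese", "butter", "eggs"},
    {"bread", "beer"},
]


def frequent_itemsets(transactions, min_support):
    """Level-wise search: size-k candidates are built only from frequent (k-1)-itemsets."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    current = keep_frequent(frozenset([item]) for item in items)

    result = {}
    k = 2
    while current:
        result.update(current)
        # Join step: combine frequent itemsets into candidates of size k
        joined = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        pruned = {c for c in joined
                  if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        current = keep_frequent(pruned)
        k += 1
    return result


for threshold in (0.6, 0.4):
    found = frequent_itemsets(baskets, threshold)
    print(threshold, sorted(sorted(s) for s in found))
```

With this data, a threshold of 0.6 leaves only {bread} and {butter}, while 0.4 admits all six single items and five pairs.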
After discovering frequent itemsets, we can continue to extract association rules. Association rules can help us understand the probability that certain items will appear together, or the probability that one item will appear when another item appears.
You can use the association_rules function in the mlxtend library to extract association rules in Python. The following is a simple sample code:
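import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

data = [['milk', 'bread', 'beer'],
        ['cheese', 'bread', 'butter'],
        ['milk', 'bread', 'butter', 'eggs'],
        ['cheese', 'butter', 'eggs'],
        ['bread', 'beer']]

# One-hot encode the baskets; apriori expects a boolean DataFrame
te = TransactionEncoder()
df = pd.DataFrame(te.fit(data).transform(data), columns=te.columns_)

# Mine frequent itemsets with the Apriori algorithm.
# min_support is lowered to 0.4 here: no pair of items reaches 0.6 in this
# data set, and a rule needs an itemset with at least two items.
frequent_itemsets = apriori(df, min_support=0.4, use_colnames=True)

# Extract association rules with the association_rules function
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.8)

# Print the association rules
print(rules)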
In the above code, we first use the Apriori algorithm to find frequent itemsets in the data set, then use the association_rules function to extract association rules. The first parameter is the frequent itemsets, the second is the metric used to evaluate the rules (confidence here), and the third is the minimum threshold for that metric, set to 0.8 here.
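For this data set, rules only appear once the support threshold admits itemsets of two items: no pair reaches a support of 0.6, while at 0.4 five pairs qualify. With a minimum support of 0.4 and a minimum confidence of 0.8, the output contains four rules, each with a confidence of 1.0: 'milk' => 'bread', 'beer' => 'bread', 'cheese' => 'butter' and 'eggs' => 'butter'. The rule 'beer' => 'bread' means that 100% of the baskets containing beer also contain bread. The reverse rule, 'bread' => 'beer', has a confidence of only 0.5 (2 of the 4 baskets with bread), so it is filtered out. Rules like these can be used to recommend products to users in recommendation systems.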
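The FP-Growth algorithm is another classic algorithm in association rule mining. It compresses the transactions into a prefix tree (the FP-tree) and mines frequent itemsets from that tree without generating candidate itemsets, so it is typically faster than the Apriori algorithm and copes better with large data sets.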
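As a rough illustration of where the speed-up comes from, the following sketch builds only the FP-tree, the compressed structure FP-Growth mines from; the mining step itself and the header table that real implementations keep are omitted. The names `FPNode`, `build_fp_tree` and `count_nodes` are assumptions made for this example and are not taken from any library.

```python
from collections import Counter

# Five example shopping baskets (same items as the article's data set)
baskets = [
    ["milk", "bread", "beer"],
    ["cheese", "bread", "butter"],
    ["milk", "bread", "butter", "eggs"],
    ["cheese", "butter", "eggs"],
    ["bread", "beer"],
]


class FPNode:
    """One node of the prefix tree: an item plus how many baskets pass through it."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    # Pass 1: count each item once per basket and drop the infrequent ones
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_count}

    root = FPNode(None)
    # Pass 2: insert each basket with its most frequent items first,
    # so that baskets sharing common items share a path in the tree
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


root, frequent = build_fp_tree(baskets, min_count=2)
total_items = sum(len(t) for t in baskets)
print("items in baskets:", total_items, "nodes in tree:", count_nodes(root))
```

Because four of the five baskets start with bread after reordering, they share one branch; the data set is scanned only twice, and the tree holds fewer nodes than there are items in the baskets.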
The pyfpgrowth library can be used in Python to implement the FP-Growth algorithm. The following is a simple sample code:
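import pyfpgrowth

# Build the data set: five shopping baskets
data = [['milk', 'bread', 'beer'],
        ['cheese', 'bread', 'butter'],
        ['milk', 'bread', 'butter', 'eggs'],
        ['cheese', 'butter', 'eggs'],
        ['bread', 'beer']]

# Mine frequent itemsets with the FP-Growth algorithm (minimum support count of 2)
patterns = pyfpgrowth.find_frequent_patterns(data, 2)

# Extract association rules with the FP-Growth results (minimum confidence of 0.8)
rules = pyfpgrowth.generate_association_rules(patterns, 0.8)

# Print the frequent itemsets and the association rules
print(patterns)
print(rules)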
In the above code, we first define a data set, and then use the find_frequent_patterns function in the pyfpgrowth library to mine frequent itemsets. The first parameter of the function is the data set, and the second parameter is the support threshold. Here, we set the support threshold to 2, which means that each item set must appear in at least two shopping baskets. The function will return a dictionary containing all frequent itemsets and their support counts.
We then use the generate_association_rules function in the pyfpgrowth library to extract association rules. The first parameter is the dictionary of frequent itemsets, and the second parameter is the confidence threshold, which is set to 0.8 here.
In the output, the frequent itemsets are those that appear in at least two baskets. For this data set that is more than two itemsets: every single item qualifies (('bread',) has the highest count, 4), as do pairs such as ('bread', 'milk') and ('beer', 'bread'). Among the rules with a confidence of 1.0 is ('beer',) => ('bread',), which means that every basket containing beer also contains bread. The reverse rule, ('bread',) => ('beer',), has a confidence of only 0.5 and falls below the 0.8 threshold. Other association rules with a confidence of at least 0.8 appear in the output as well.
Summary
Association rule mining is a useful data mining technique for discovering which items in a data set occur together and how strongly one item implies another. In Python, libraries such as mlxtend and pyfpgrowth provide the Apriori and FP-Growth algorithms. In practice, we also need to pay attention to the support and confidence thresholds, since they determine which frequent itemsets and association rules are reported, and to how the results are applied to actual problems.