Increase your knowledge! Machine learning with logical rules

By 青灯夜游 | Released: 2023-04-01 22:07:42


Skope-rules uses tree models to generate rule candidates. It first builds a number of decision trees and treats each path from the root to an internal or leaf node as a candidate rule. These candidates are then filtered against predefined criteria such as precision and recall: only rules whose precision and recall exceed their thresholds are retained. Finally, similarity filtering is applied to select a set of sufficiently diverse rules. In a typical application, Skope-rules is used to learn the underlying rules for each root cause.
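To make this concrete, here is a minimal sketch of the candidate-generation idea, not the library's actual implementation: fit one decision tree, enumerate its root-to-leaf paths as pandas-query-style rule strings, and keep only the rules that clear hand-picked precision and recall thresholds. The tree depth and the thresholds below are illustrative assumptions, and the real library also considers paths ending at internal nodes and aggregates many bagged trees.

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X = pd.DataFrame(data.data, columns=['sepal_length', 'sepal_width',
                                     'petal_length', 'petal_width'])
y = (data.target == 0).astype(int)  # binary target: "is setosa"

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
t = tree.tree_

def paths_to_rules(node=0, conds=()):
    # Walk the fitted tree, turning every root-to-leaf path into a rule string
    if t.children_left[node] == -1:  # leaf node
        return [' and '.join(conds)] if conds else []
    name, thr = X.columns[t.feature[node]], t.threshold[node]
    return (paths_to_rules(t.children_left[node], conds + (f'{name} <= {thr:.3f}',)) +
            paths_to_rules(t.children_right[node], conds + (f'{name} > {thr:.3f}',)))

for rule in paths_to_rules():
    covered = X.query(rule).index
    precision = y[covered].mean()            # fraction of covered samples that are positive
    recall = y[covered].sum() / y.sum()      # fraction of positives that are covered
    if precision >= 0.5 and recall >= 0.1:   # predefined filtering thresholds
        print(f'{rule}  (precision={precision:.2f}, recall={recall:.2f})')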


Project address: https://github.com/scikit-learn-contrib/skope-rules

  • Skope-rules is a Python machine learning module built on top of scikit-learn and released under the 3-clause BSD license.
  • Skope-rules aims to learn logical, interpretable rules for "defining" a target class, that is, detecting instances of that class with high precision.
  • Skope-rules is a trade-off between the interpretability of decision trees and the modeling power of random forests.

(Diagram: skope-rules schema.)

Installation

You can install the latest release with pip:

pip install skope-rules
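If you need the development version instead of the PyPI release, pip can also install straight from the repository (standard pip syntax, untested here):

pip install git+https://github.com/scikit-learn-contrib/skope-rules.git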

Quick Start

SkopeRules can be used to describe classes with logical rules:

from sklearn.datasets import load_iris
from skrules import SkopeRules

dataset = load_iris()
feature_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
clf = SkopeRules(max_depth_duplication=2,
                 n_estimators=30,
                 precision_min=0.3,
                 recall_min=0.1,
                 feature_names=feature_names)

for idx, species in enumerate(dataset.target_names):
    X, y = dataset.data, dataset.target
    clf.fit(X, y == idx)
    rules = clf.rules_[0:3]
    print("Rules for iris", species)
    for rule in rules:
        print(rule)
    print()
    print(20 * '=')
    print()

(Output: the first rules learned for each iris species.)
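Each element of clf.rules_ pairs the rule string with its measured performance; from my reading of the library (worth verifying against your installed version), the second element is a (precision, recall, support-count) tuple. A quick way to inspect it:

# Inspect the learned rules together with their performance metrics
# (assumes the (precision, recall, count) layout described above)
for rule, perf in clf.rules_[:3]:
    print(rule, '->', perf)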

Note:

If the following error appears when importing skrules:

ImportError: cannot import name 'six' from 'sklearn.externals'

Solution:

Yun Duojun found a similar question about this import error (cannot import name 'six' from 'sklearn.externals') on Stack Overflow: https://stackoverflow.com/questions/61867945/

The solution is as follows:

import six
import sys
sys.modules['sklearn.externals.six'] = six  # register six under the old module path
import mlrose  # the failing import (mlrose in the Stack Overflow question) now resolves

Personally tested and it works!

SkopeRules can also be used as a predictor through the "score_top_rules" method:

from sklearn.datasets import load_boston
from sklearn.metrics import precision_recall_curve
from matplotlib import pyplot as plt
from skrules import SkopeRules

dataset = load_boston()
clf = SkopeRules(max_depth_duplication=None,
                 n_estimators=30,
                 precision_min=0.2,
                 recall_min=0.01,
                 feature_names=dataset.feature_names)

X, y = dataset.data, dataset.target > 25
X_train, y_train = X[:len(y)//2], y[:len(y)//2]
X_test, y_test = X[len(y)//2:], y[len(y)//2:]
clf.fit(X_train, y_train)
y_score = clf.score_top_rules(X_test)  # Get a risk score for each test example
precision, recall, _ = precision_recall_curve(y_test, y_score)
plt.plot(recall, precision)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision Recall curve')
plt.show()

(Figure: the resulting precision-recall curve.)

Practical Case

This case demonstrates the use of skope-rules on the famous Titanic data set.

skope-rules applicability:

  • Solving binary classification problems
  • Extracting interpretable decision rules

The case is divided into five parts:

  • Importing the required libraries
  • Data preparation
  • Model training (using the SkopeRules().score_top_rules() method)
  • Interpretation of the "survival rules" (using the SkopeRules().rules_ attribute)
  • Performance analysis (using the SkopeRules().predict_top_rules() method)

Import related libraries

# Import skope-rules
from skrules import SkopeRules

# Import other libraries
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, precision_recall_curve
from matplotlib import cm
import numpy as np
from sklearn.metrics import confusion_matrix
from IPython.display import display

# Import the Titanic data
data = pd.read_csv('../data/titanic-train.csv')

Data preparation

# Drop rows with missing ages (NaN != NaN, so the query keeps only valid ages)
data = data.query('Age == Age')
# Create an encoded value for the Sex variable
data['isFemale'] = (data['Sex'] == 'female') * 1
# Create encoded values for the Embarked variable
data = pd.concat(
    [data,
     pd.get_dummies(data.loc[:, 'Embarked'],
                    dummy_na=False,
                    prefix='Embarked',
                    prefix_sep='_')],
    axis=1
)
# Drop unused variables
data = data.drop(['Name', 'Ticket', 'Cabin',
                  'PassengerId', 'Sex', 'Embarked'],
                 axis=1)
# Create training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    data.drop(['Survived'], axis=1),
    data['Survived'],
    test_size=0.25, random_state=42)
feature_names = X_train.columns

print('Column names are: ' + ' '.join(feature_names.tolist()) + '.')
print('Shape of training set is: ' + str(X_train.shape) + '.')

Column names are: Pclass Age SibSp Parch Fare isFemale Embarked_C Embarked_Q Embarked_S.
Shape of training set is: (535, 9).

Model training

# Train a gradient boosting classifier for benchmarking
gradient_boost_clf = GradientBoostingClassifier(random_state=42, n_estimators=30, max_depth=5)
gradient_boost_clf.fit(X_train, y_train)

# Train a random forest classifier for benchmarking
random_forest_clf = RandomForestClassifier(random_state=42, n_estimators=30, max_depth=5)
random_forest_clf.fit(X_train, y_train)

# Train a decision tree classifier for benchmarking
decision_tree_clf = DecisionTreeClassifier(random_state=42, max_depth=5)
decision_tree_clf.fit(X_train, y_train)

# Train a skope-rules classifier
skope_rules_clf = SkopeRules(feature_names=feature_names, random_state=42, n_estimators=30,
                             recall_min=0.05, precision_min=0.9,
                             max_samples=0.7,
                             max_depth_duplication=4, max_depth=5)
skope_rules_clf.fit(X_train, y_train)


# Compute prediction scores
gradient_boost_scoring = gradient_boost_clf.predict_proba(X_test)[:, 1]
random_forest_scoring = random_forest_clf.predict_proba(X_test)[:, 1]
decision_tree_scoring = decision_tree_clf.predict_proba(X_test)[:, 1]

skope_rules_scoring = skope_rules_clf.score_top_rules(X_test)

Extraction of "Survival Rules"

# Get the number of survival rules created
print("Created with SkopeRules " + str(len(skope_rules_clf.rules_)) + " rules\n")

# Print these rules
rules_explanations = [
    "Women between 3 and 37 years old in first or second class.",
    "Women over 3 years old traveling in first or second class who paid more than 26 euros.",
    "Women traveling in first or second class who paid more than 29 euros.",
    "Women over 39 years old traveling in first or second class.",
]
print('The four best-performing "Titanic survival rules" are the following:\n')
for i_rule, rule in enumerate(skope_rules_clf.rules_[:4]):
    print(rule[0])
    print('-> ' + rules_explanations[i_rule] + '\n')

9 rules were created using SkopeRules.

The four best-performing "Titanic survival rules" are the following:

Age <= 37.5 and Age > 2.5
and Pclass <= 2.5 and isFemale > 0.5
-> Women between 3 and 37 years old in first or second class.

Age > 2.5 and Fare > 26.125
and Pclass <= 2.5 and isFemale > 0.5
-> Women over 3 years old traveling in first or second class who paid more than 26 euros.

Fare > 29.356250762939453
and Pclass <= 2.5 and isFemale > 0.5
-> Women traveling in first or second class who paid more than 29 euros.

Age > 38.5 and Pclass <= 2.5 and isFemale > 0.5
-> Women over 39 years old traveling in first or second class.

def compute_y_pred_from_query(X, rule):
    score = np.zeros(X.shape[0])
    X = X.reset_index(drop=True)
    score[list(X.query(rule).index)] = 1
    return score

def compute_performances_from_y_pred(y_true, y_pred, index_name='default_index'):
    df = pd.DataFrame(data=
        {
            'precision': [sum(y_true * y_pred) / sum(y_pred)],
            'recall': [sum(y_true * y_pred) / sum(y_true)]
        },
        index=[index_name],
        columns=['precision', 'recall']
    )
    return df

def compute_train_test_query_performances(X_train, y_train, X_test, y_test, rule):

    y_train_pred = compute_y_pred_from_query(X_train, rule)
    y_test_pred = compute_y_pred_from_query(X_test, rule)

    performances = None
    performances = pd.concat([
        performances,
        compute_performances_from_y_pred(y_train, y_train_pred, 'train_set')],
        axis=0)
    performances = pd.concat([
        performances,
        compute_performances_from_y_pred(y_test, y_test_pred, 'test_set')],
        axis=0)

    return performances


print('Precision = 0.96 means that 96% of the people identified by the rules are survivors.')
print('Recall = 0.12 means that the survivors identified by the rule account for 12% of the total number of survivors.\n')

for i in range(4):
    print('Rule ' + str(i + 1) + ':')
    display(compute_train_test_query_performances(X_train, y_train,
                                                  X_test, y_test,
                                                  skope_rules_clf.rules_[i][0])
            )

Precision = 0.96 means that 96% of the people identified by the rules are survivors.
Recall = 0.12 means that the survivors identified by the rule account for 12% of the total number of survivors.

(Output: train/test precision and recall tables for each of the four rules.)

Performance analysis

def plot_titanic_scores(y_true, scores_with_line=[], scores_with_points=[],
                        labels_with_line=['Gradient Boosting', 'Random Forest', 'Decision Tree'],
                        labels_with_points=['skope-rules']):
    gradient = np.linspace(0, 1, 10)
    color_list = [cm.tab10(x) for x in gradient]

    fig, axes = plt.subplots(1, 2, figsize=(12, 5),
                             sharex=True, sharey=True)
    ax = axes[0]
    n_line = 0
    for i_score, score in enumerate(scores_with_line):
        n_line = n_line + 1
        fpr, tpr, _ = roc_curve(y_true, score)
        ax.plot(fpr, tpr, linestyle='-.', c=color_list[i_score], lw=1, label=labels_with_line[i_score])
    for i_score, score in enumerate(scores_with_points):
        fpr, tpr, _ = roc_curve(y_true, score)
        ax.scatter(fpr[:-1], tpr[:-1], c=color_list[n_line + i_score], s=10, label=labels_with_points[i_score])
    ax.set_title("ROC", fontsize=20)
    ax.set_xlabel('False Positive Rate', fontsize=18)
    ax.set_ylabel('True Positive Rate (Recall)', fontsize=18)
    ax.legend(loc='lower center', fontsize=8)

    ax = axes[1]
    n_line = 0
    for i_score, score in enumerate(scores_with_line):
        n_line = n_line + 1
        precision, recall, _ = precision_recall_curve(y_true, score)
        ax.step(recall, precision, linestyle='-.', c=color_list[i_score], lw=1, where='post', label=labels_with_line[i_score])
    for i_score, score in enumerate(scores_with_points):
        precision, recall, _ = precision_recall_curve(y_true, score)
        ax.scatter(recall, precision, c=color_list[n_line + i_score], s=10, label=labels_with_points[i_score])
    ax.set_title("Precision-Recall", fontsize=20)
    ax.set_xlabel('Recall (True Positive Rate)', fontsize=18)
    ax.set_ylabel('Precision', fontsize=18)
    ax.legend(loc='lower center', fontsize=8)
    plt.show()

plot_titanic_scores(y_test,
                    scores_with_line=[gradient_boost_scoring, random_forest_scoring, decision_tree_scoring],
                    scores_with_points=[skope_rules_scoring]
                    )

(Figure: ROC and precision-recall curves comparing the four models.)

On the ROC curve, each red point corresponds to a number of activated rules (from skope-rules). For example, the lowest point is the result of using 1 rule (the best one); the second lowest point is the result of 2 rules, and so on.

On the precision-recall curve, the same points are plotted against different axes. Note: the first red point on the left (0% recall, 100% precision) corresponds to 0 rules; the second point from the left is the first rule, and so on.

Some conclusions can be drawn from this example.

  • skope-rules performs better than the decision tree.
  • skope-rules performs comparably to the random forest and gradient boosting models (in this example).
  • Four rules are enough to reach very good performance (61% recall, 94% precision) (in this example).

n_rule_chosen = 4
y_pred = skope_rules_clf.predict_top_rules(X_test, n_rule_chosen)

print('The performances reached with ' + str(n_rule_chosen) + ' discovered rules are the following:')
compute_performances_from_y_pred(y_test, y_pred, 'test_set')

(Output: precision and recall on the test set with the four chosen rules.)

The predict_top_rules(new_data, n_r) method computes predictions for new_data using the first n_r skope-rules rules.
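My reading of the semantics (a hedged sketch, not guaranteed to match the library internals): a sample is predicted positive when it satisfies at least one of the n_r top-ranked rules. You can cross-check this manually with the helper defined earlier:

# Cross-check predict_top_rules against a manual OR over the first n_r rules
manual_pred = np.zeros(X_test.shape[0])
for rule, _ in skope_rules_clf.rules_[:n_rule_chosen]:
    manual_pred = np.maximum(manual_pred,
                             compute_y_pred_from_query(X_test, rule))
# manual_pred should agree with skope_rules_clf.predict_top_rules(X_test, n_rule_chosen)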

