Expand your knowledge! Machine learning with logical rules

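On the precision-recall curve, the same points (one per number of activated skope-rules rules) are plotted on different axes. Note: the first red point on the left (0% recall, 100% precision) corresponds to 0 rules. The second point from the left corresponds to the first rule, and so on.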

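Skope-rules uses tree models to generate rule candidates. It first builds a number of decision trees and treats every path from the root node to an internal node or leaf node as a candidate rule. These candidates are then filtered by predefined criteria such as precision and recall: only rules whose precision and recall both exceed their thresholds are kept. Finally, similarity filtering is applied to select rules that are sufficiently diverse. In general, skope-rules is applied to learn the underlying rule for each root cause.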
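The filtering steps described above can be sketched in plain Python. This is only an illustration under simplifying assumptions: the rules are hand-written lambdas rather than tree paths, the names precision_recall and select_rules are hypothetical, and the similarity filter is reduced to dropping rules that fire on exactly the same samples. It is not the skope-rules implementation.

```python
# Hypothetical sketch: filter candidate rules by precision and recall,
# then drop exact duplicates. Not the skope-rules implementation.

def precision_recall(rule, samples, labels):
    """Return (precision, recall) of a boolean rule on labelled samples."""
    fired = [rule(s) for s in samples]
    true_pos = sum(1 for f, y in zip(fired, labels) if f and y)
    n_fired = sum(fired)
    n_pos = sum(labels)
    precision = true_pos / n_fired if n_fired else 0.0
    recall = true_pos / n_pos if n_pos else 0.0
    return precision, recall


def select_rules(candidates, samples, labels, precision_min, recall_min):
    """Keep rules above both thresholds, most precise first, and skip any
    rule that fires on exactly the same samples as a rule already kept."""
    scored = []
    for name, rule in candidates.items():
        p, r = precision_recall(rule, samples, labels)
        if p >= precision_min and r >= recall_min:
            scored.append((p, r, name))
    # Stable sort: ties keep the order in which the candidates were given.
    scored.sort(key=lambda t: (-t[0], -t[1]))

    kept, seen_coverage = [], set()
    for p, r, name in scored:
        coverage = tuple(candidates[name](s) for s in samples)
        if coverage not in seen_coverage:  # crude stand-in for similarity filtering
            seen_coverage.add(coverage)
            kept.append((name, p, r))
    return kept


# Toy data: (age, is_female) pairs with a made-up survival label.
samples = [(30, 1), (40, 1), (25, 0), (50, 0), (10, 1), (35, 0)]
labels = [1, 1, 0, 0, 1, 0]

candidates = {
    "is_female > 0.5": lambda s: s[1] > 0.5,
    "is_female >= 1": lambda s: s[1] >= 1,   # same coverage as the rule above
    "age > 20": lambda s: s[0] > 20,
    "age > 45": lambda s: s[0] > 45,
}

selected = select_rules(candidates, samples, labels,
                        precision_min=0.5, recall_min=0.3)
for name, p, r in selected:
    print(f"{name}: precision={p:.2f}, recall={r:.2f}")
```

On this toy data the two age rules fall below the precision threshold, and the second is_female rule is dropped as a duplicate, so a single rule survives.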


Project address: https://github.com/scikit-learn-contrib/skope-rules

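Skope-rules is a Python machine learning module built on top of scikit-learn and released under the 3-Clause BSD license.
Skope-rules aims to learn logical, interpretable rules that "define" a target class, that is, rules that detect instances of that class with high precision.
Skope-rules is a trade-off between the interpretability of a decision tree and the modelling power of a random forest.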

[Figure: skope-rules schema]

Installation

You can get the latest release with pip:

pip install skope-rules

Quick start

SkopeRules can be used to describe classes with logical rules.

from sklearn.datasets import load_iris
from skrules import SkopeRules

dataset = load_iris()
feature_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
clf = SkopeRules(max_depth_duplication=2,
                 n_estimators=30,
                 precision_min=0.3,
                 recall_min=0.1,
                 feature_names=feature_names)

# Fit one one-vs-rest model per species and print its three best rules
for idx, species in enumerate(dataset.target_names):
    X, y = dataset.data, dataset.target
    clf.fit(X, y == idx)
    rules = clf.rules_[0:3]
    print("Rules for iris", species)
    for rule in rules:
        print(rule)
    print()
    print(20*'=')
    print()


Note:

Importing skrules may fail with the following Python import error: cannot import name 'six' from 'sklearn.externals'.

Solution:

Yun Duojun found a similar question about this error on Stack Overflow: https://stackoverflow.com/questions/61867945/

The solution is as follows:

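import six
import sys
# Register the standalone six package under the name the old code expects
sys.modules['sklearn.externals.six'] = six
# Import the affected library only after the alias is registered
# (the Stack Overflow answer imports mlrose at this point)
from skrules import SkopeRules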

I have tested this myself and it works.
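Why this workaround helps can be shown with a small stand-alone sketch. It assumes nothing about scikit-learn: demo_pkg and compat are made-up names, and the standard json module stands in for six. The point is only that Python consults sys.modules before searching for files, so a module registered under a dotted name can be imported under that name.

```python
import json
import sys
import types

# Build an empty stand-in package. "demo_pkg" is a made-up name for this sketch.
demo_pkg = types.ModuleType("demo_pkg")
demo_pkg.__path__ = []  # an (empty) __path__ marks the module as a package
sys.modules["demo_pkg"] = demo_pkg

# Register an existing module under the dotted name that old code expects.
sys.modules["demo_pkg.compat"] = json

# The import system checks sys.modules first, so this import succeeds
# even though demo_pkg contains no file called compat.py.
from demo_pkg import compat

print(compat is json)
```

The alias has to be registered before the first import of the library that needs it, which is why the workaround above places the sys.modules line ahead of the library import.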

SkopeRules can also be used as a predictor by calling its "score_top_rules" method.

# Note: load_boston was removed in scikit-learn 1.2,
# so this example needs an older scikit-learn release.
from sklearn.datasets import load_boston
from sklearn.metrics import precision_recall_curve
from matplotlib import pyplot as plt
from skrules import SkopeRules

dataset = load_boston()
clf = SkopeRules(max_depth_duplication=None,
                 n_estimators=30,
                 precision_min=0.2,
                 recall_min=0.01,
                 feature_names=dataset.feature_names)

# Binary target: is the value above 25?
X, y = dataset.data, dataset.target > 25
# First half for training, second half for testing
X_train, y_train = X[:len(y)//2], y[:len(y)//2]
X_test, y_test = X[len(y)//2:], y[len(y)//2:]
clf.fit(X_train, y_train)
y_score = clf.score_top_rules(X_test)  # Get a risk score for each test example
precision, recall, _ = precision_recall_curve(y_test, y_score)
plt.plot(recall, precision)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision Recall curve')
plt.show()
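To make the idea of a rule-based score more concrete, here is a simplified sketch in plain Python. It assumes the rules are already ordered from most to least precise and gives each sample a score based on the best rule that fires on it. The function names and the toy rules are hypothetical, and the exact scoring used inside skope-rules may differ, so treat this as an illustration of the principle rather than the library's code.

```python
# Simplified illustration of rule-based scoring; a sketch under stated
# assumptions, not the skope-rules source code.

def score_with_ranked_rules(samples, ranked_rules):
    """Score each sample by the best (earliest) rule that fires on it.

    ranked_rules is ordered from most to least precise. With n rules, a
    sample matched first by the rule at position k (0-based) gets n - k;
    samples matched by no rule get 0.
    """
    n = len(ranked_rules)
    scores = []
    for sample in samples:
        score = 0
        for k, rule in enumerate(ranked_rules):
            if rule(sample):
                score = n - k
                break
        scores.append(score)
    return scores


def predict_with_top_rules(samples, ranked_rules, n_rules):
    """Predict 1 when any of the first n_rules rules fires, else 0."""
    top = ranked_rules[:n_rules]
    return [int(any(rule(s) for rule in top)) for s in samples]


# Hypothetical rules on (age, fare) pairs, best rule first.
ranked_rules = [
    lambda s: s[1] > 100,  # very expensive ticket
    lambda s: s[0] < 10,   # young child
    lambda s: s[1] > 20,   # moderately expensive ticket
]
samples = [(35, 150), (5, 15), (40, 30), (50, 8)]

scores = score_with_ranked_rules(samples, ranked_rules)
print(scores)
print(predict_with_top_rules(samples, ranked_rules, n_rules=2))
```

Because the score takes only a handful of distinct values, thresholding it yields one operating point per number of rules, which is why a rule-based model appears as isolated points rather than a smooth curve.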


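A practical case

This case demonstrates the use of skope-rules on the well-known Titanic dataset.

Where skope-rules applies:

Solving binary classification problems
Extracting interpretable decision rules

This case is divided into five parts:

Importing the relevant libraries
Data preparation
Model training (using the SkopeRules().score_top_rules() method)
Interpretation of the "survival rules" (using the SkopeRules().rules_ attribute)
Performance analysis (using the SkopeRules.predict_top_rules() method)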

Importing the relevant libraries

# Import skope-rules
from skrules import SkopeRules

# Import libraries
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, precision_recall_curve
from matplotlib import cm
import numpy as np
from sklearn.metrics import confusion_matrix
from IPython.display import display

# Import the Titanic data
data = pd.read_csv('../data/titanic-train.csv')

Data preparation

# Delete rows where Age is missing
data = data.query('Age == Age')
# Create an encoded value for the variable Sex
data['isFemale'] = (data['Sex'] == 'female') * 1
# Create encoded values for the variable Embarked
data = pd.concat(
    [data,
     pd.get_dummies(data.loc[:, 'Embarked'],
                    dummy_na=False,
                    prefix='Embarked',
                    prefix_sep='_')],
    axis=1
)
# Drop unused variables
data = data.drop(['Name', 'Ticket', 'Cabin',
                  'PassengerId', 'Sex', 'Embarked'],
                 axis=1)
# Create the training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    data.drop(['Survived'], axis=1),
    data['Survived'],
    test_size=0.25, random_state=42)
feature_names = X_train.columns

print('Column names are: ' + ' '.join(feature_names.tolist()) + '.')
print('Shape of training set is: ' + str(X_train.shape) + '.')
Column names are: Pclass Age SibSp Parch Fare isFemale Embarked_C Embarked_Q Embarked_S.
Shape of training set is: (535, 9).
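Two idioms in the preparation step are easy to miss: the comparison Age == Age, which removes rows with a missing age, and multiplying a boolean by 1 to obtain a 0/1 column. The sketch below reproduces both with plain Python on a few made-up passenger records, without pandas, purely to show why they work.

```python
import math

# Toy passenger records; the field names mirror the Titanic columns.
passengers = [
    {"Age": 22.0, "Sex": "male"},
    {"Age": math.nan, "Sex": "female"},  # missing age
    {"Age": 38.0, "Sex": "female"},
]

# NaN is the only float value that is not equal to itself,
# so "Age == Age" is False exactly for the missing entries.
with_age = [p for p in passengers if p["Age"] == p["Age"]]

# Multiplying a boolean by 1 turns True/False into 1/0.
for p in with_age:
    p["isFemale"] = (p["Sex"] == "female") * 1

print(with_age)
```

In pandas, data.dropna(subset=['Age']) expresses the same filter more explicitly; the query form is kept above because it is what the example uses.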

Model training

# Train a gradient boosting classifier as a benchmark
gradient_boost_clf = GradientBoostingClassifier(random_state=42, n_estimators=30, max_depth=5)
gradient_boost_clf.fit(X_train, y_train)

# Train a random forest classifier as a benchmark
random_forest_clf = RandomForestClassifier(random_state=42, n_estimators=30, max_depth=5)
random_forest_clf.fit(X_train, y_train)

# Train a decision tree classifier as a benchmark
decion_tree_clf = DecisionTreeClassifier(random_state=42, max_depth=5)
decion_tree_clf.fit(X_train, y_train)

# Train a skope-rules classifier
skope_rules_clf = SkopeRules(feature_names=feature_names, random_state=42, n_estimators=30,
                             recall_min=0.05, precision_min=0.9,
                             max_samples=0.7,
                             max_depth_duplication=4, max_depth=5)
skope_rules_clf.fit(X_train, y_train)

# Compute prediction scores
gradient_boost_scoring = gradient_boost_clf.predict_proba(X_test)[:, 1]
random_forest_scoring = random_forest_clf.predict_proba(X_test)[:, 1]
decion_tree_scoring = decion_tree_clf.predict_proba(X_test)[:, 1]

skope_rules_scoring = skope_rules_clf.score_top_rules(X_test)
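
Extracting the "survival rules"

# Get the number of survival rules created
print(str(len(skope_rules_clf.rules_)) + ' rules have been built with SkopeRules.\n')

# Print these rules
rules_explanations = [
    "Women aged 3 or over and 37 or under, in first or second class.",
    "Women aged 3 or over, travelling in first or second class and paying more than 26 euros.",
    "Women travelling in first or second class and paying more than 29 euros.",
    "Women aged 39 or over, travelling in first or second class."
]
print('The 4 best-performing "Titanic survival rules" are:\n')
for i_rule, rule in enumerate(skope_rules_clf.rules_[:4]):
    print(rule[0])
    print('-> ' + rules_explanations[i_rule] + '\n')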
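9 rules have been built with SkopeRules.

The 4 best-performing "Titanic survival rules" are:

Age <= … and Age > 2.5 and Pclass <= 2.5 and isFemale > 0.5
-> Women aged 3 or over and 37 or under, in first or second class.

Age > 2.5 and Fare > 26.125 and Pclass <= 2.5 and isFemale > 0.5
-> Women aged 3 or over, travelling in first or second class and paying more than 26 euros.

Fare > 29.356250762939453 and Pclass <= 2.5 and isFemale > 0.5
-> Women travelling in first or second class and paying more than 29 euros.

Age > 38.5 and Pclass <= 2.5 and isFemale > 0.5
-> Women aged 39 or over, travelling in first or second class.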

def compute_y_pred_from_query(X, rule):
    # Mark with 1 every row of X that satisfies the rule (a pandas query string)
    score = np.zeros(X.shape[0])
    X = X.reset_index(drop=True)
    score[list(X.query(rule).index)] = 1
    return(score)

def compute_performances_from_y_pred(y_true, y_pred, index_name='default_index'):
    # One-row DataFrame holding the precision and recall of y_pred
    df = pd.DataFrame(data=
        {
            'precision': [sum(y_true * y_pred)/sum(y_pred)],
            'recall': [sum(y_true * y_pred)/sum(y_true)]
        },
        index=[index_name],
        columns=['precision', 'recall']
    )
    return(df)

def compute_train_test_query_performances(X_train, y_train, X_test, y_test, rule):

    y_train_pred = compute_y_pred_from_query(X_train, rule)
    y_test_pred = compute_y_pred_from_query(X_test, rule)

    # Stack the training-set and test-set results into one table
    performances = None
    performances = pd.concat([
        performances,
        compute_performances_from_y_pred(y_train, y_train_pred, 'train_set')],
        axis=0)
    performances = pd.concat([
        performances,
        compute_performances_from_y_pred(y_test, y_test_pred, 'test_set')],
        axis=0)

    return(performances)

print('Precision = 0.96 means that 96% of the people identified by the rule are survivors.')
print('Recall = 0.12 means that the survivors identified by the rule are 12% of all survivors.\n')

for i in range(4):
    print('Rule ' + str(i + 1) + ':')
    display(compute_train_test_query_performances(X_train, y_train,
                                                  X_test, y_test,
                                                  skope_rules_clf.rules_[i][0])
            )

Precision = 0.96 means that 96% of the people identified by the rule are survivors. Recall = 0.12 means that the survivors identified by the rule are 12% of all survivors.
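The two ratios can be checked by hand on a tiny example. The sketch below uses made-up labels and a hypothetical helper, precision_and_recall, that follows the same formulas as compute_performances_from_y_pred: true positives divided by predicted positives, and true positives divided by actual positives. It assumes at least one predicted positive and one actual positive, so no division by zero is handled.

```python
def precision_and_recall(y_true, y_pred):
    """Precision and recall for 0/1 labels and 0/1 predictions."""
    true_pos = sum(t * p for t, p in zip(y_true, y_pred))
    precision = true_pos / sum(y_pred)  # share of flagged samples that are positive
    recall = true_pos / sum(y_true)     # share of positives that were flagged
    return precision, recall


# Made-up example: 5 survivors out of 10 passengers.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
# The rule flags 4 passengers, 3 of whom are survivors.
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

precision, recall = precision_and_recall(y_true, y_pred)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Here 3 of the 4 flagged passengers are survivors (precision 0.75) and 3 of the 5 survivors are flagged (recall 0.60).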

Model performance analysis

def plot_titanic_scores(y_true, scores_with_line=[], scores_with_points=[],
                        labels_with_line=['Gradient Boosting', 'Random Forest', 'Decision Tree'],
                        labels_with_points=['skope-rules']):
    gradient = np.linspace(0, 1, 10)
    color_list = [cm.tab10(x) for x in gradient]

    fig, axes = plt.subplots(1, 2, figsize=(12, 5),
                             sharex=True, sharey=True)
    # Left panel: ROC curves (lines for the benchmarks, points for skope-rules)
    ax = axes[0]
    n_line = 0
    for i_score, score in enumerate(scores_with_line):
        n_line = n_line + 1
        fpr, tpr, _ = roc_curve(y_true, score)
        ax.plot(fpr, tpr, linestyle='-.', c=color_list[i_score], lw=1, label=labels_with_line[i_score])
    for i_score, score in enumerate(scores_with_points):
        fpr, tpr, _ = roc_curve(y_true, score)
        ax.scatter(fpr[:-1], tpr[:-1], c=color_list[n_line + i_score], s=10, label=labels_with_points[i_score])
    ax.set_title("ROC", fontsize=20)
    ax.set_xlabel('False Positive Rate', fontsize=18)
    ax.set_ylabel('True Positive Rate (Recall)', fontsize=18)
    ax.legend(loc='lower center', fontsize=8)

    # Right panel: precision-recall curves
    ax = axes[1]
    n_line = 0
    for i_score, score in enumerate(scores_with_line):
        n_line = n_line + 1
        precision, recall, _ = precision_recall_curve(y_true, score)
        ax.step(recall, precision, linestyle='-.', c=color_list[i_score], lw=1, where='post', label=labels_with_line[i_score])
    for i_score, score in enumerate(scores_with_points):
        precision, recall, _ = precision_recall_curve(y_true, score)
        ax.scatter(recall, precision, c=color_list[n_line + i_score], s=10, label=labels_with_points[i_score])
    ax.set_title("Precision-Recall", fontsize=20)
    ax.set_xlabel('Recall (True Positive Rate)', fontsize=18)
    ax.set_ylabel('Precision', fontsize=18)
    ax.legend(loc='lower center', fontsize=8)
    plt.show()

plot_titanic_scores(y_test,
scores_with_line=[gradient_boost_scoring,random_forest_scoring,decion_tree_scoring],
scores_with_points=[skope_rules_scoring]
)

On the ROC curve, each red point corresponds to a number of activated rules (from skope-rules). For example, the lowest point is the result of using 1 rule (the best one). The second-lowest point is the result of using 2 rules, and so on.
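How one point arises per number of activated rules can be sketched with a few lines of plain Python. The data and the two rules below are made up, and curve_points is a hypothetical helper: for k = 0, 1, 2, ... it predicts "positive" whenever any of the first k rules fires and records the resulting recall, precision and false positive rate. Precision for zero rules is set to 1.0 by convention, which gives the leftmost point (0% recall, 100% precision) of the precision-recall plot.

```python
def curve_points(y_true, fired_by_rule):
    """Return one (k, recall, precision, false_positive_rate) tuple per k.

    fired_by_rule[j][i] is True when rule j fires on sample i; the rules
    are assumed to be ordered from best to worst.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = []
    for k in range(len(fired_by_rule) + 1):
        # A sample is predicted positive if any of the first k rules fires.
        pred = [any(rule[i] for rule in fired_by_rule[:k])
                for i in range(len(y_true))]
        tp = sum(1 for p, t in zip(pred, y_true) if p and t)
        fp = sum(1 for p, t in zip(pred, y_true) if p and not t)
        recall = tp / n_pos
        precision = tp / (tp + fp) if tp + fp else 1.0  # convention for no predictions
        points.append((k, recall, precision, fp / n_neg))
    return points


# Made-up labels and rule activations for 8 samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
fired_by_rule = [
    [True, True, False, False, False, False, False, False],  # best rule
    [False, True, True, False, True, False, False, False],   # second rule
]

for k, recall, precision, fpr in curve_points(y_true, fired_by_rule):
    print(f"{k} rules: recall={recall:.2f}, precision={precision:.2f}, fpr={fpr:.2f}")
```

Each additional rule can only add predicted positives, so recall and the false positive rate never decrease as k grows, while precision typically drops.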

Some conclusions can be drawn from this example.

skope-rules performs better than the decision tree.
skope-rules performs similarly to the random forest and gradient boosting (in this example).
Using 4 rules gives very good performance (61% recall, 94% precision) (in this example).

n_rule_chosen = 4
y_pred = skope_rules_clf.predict_top_rules(X_test, n_rule_chosen)

print('The performances reached with ' + str(n_rule_chosen) + ' discovered rules are the following:')
compute_performances_from_y_pred(y_test, y_pred, 'test_set')
