SPAM, Bayesian, and Chinese 4 - Integrating the Bayesian Algorithm into CakePHP

WBOY

Published: 2016-06-13 13:25:42

The previous article covered several open-source implementations of the Bayesian algorithm. This article explains how to integrate one of them, called b8, into CakePHP.

Download and install b8

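Go to the b8 website, download the latest version, and extract it into the vendors directory, so that the main file ends up at a location such as vendors/b8/b8.php.

Note that you need to open vendors/b8/etc/config_storage in a text editor, change tableName to the name of the data table that will store the keywords, and set createDB to TRUE. The first time b8 runs it creates that table; after that, set createDB back to FALSE.

Write a wrapper component for b8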

For Cake to call b8, we need to write a component. Create a new file named spam_shield.php in controllers/components/ and add the following code:

    class SpamShieldComponent extends Object {

        /**
         * b8 instance
         */
        var $b8;

        /**
         * standard rating
         *
         * comments with ratings which are higher than this one will be considered as SPAM
         */
        var $standardRating = 0.7;

        /**
         * text to be classified
         */
        var $text;

        /**
         * rating of the text
         */
        var $rating;

        /**
         * Component startup callback: creates the b8 instance
         *
         * @date 2009-1-20
         */
        function startup(&$controller) {
            // register a Comment model to get the DBO resource link
            $comment = ClassRegistry::init('Comment');

            // import b8 and create an instance
            App::import('Vendor', 'b8/b8');
            $this->b8 = new b8($comment->getDBOResourceLink());

            // set standard rating
            $this->standardRating = Configure::read('LT.bayesRating') ? Configure::read('LT.bayesRating') : $this->standardRating;
        }

        /**
         * Set the text to be classified
         *
         * @param $text String the text to be classified
         * @date 2009-1-20
         */
        function set($text) {
            $this->text = $text;
        }

        /**
         * Get Bayesian rating
         *
         * @date 2009-1-20
         */
        function rate() {
            // get Bayes rating and return it
            $this->rating = $this->b8->classify($this->text);
            return $this->rating;
        }

        /**
         * Validate a message based on the rating, return true if it's NOT a SPAM
         *
         * @date 2009-1-20
         */
        function validate() {
            return $this->rate() < $this->standardRating;
        }

        /**
         * Learn a SPAM or a HAM
         *
         * @date 2009-1-20
         */
        function learn($mode) {
            $this->b8->learn($this->text, $mode);
        }

        /**
         * Unlearn a SPAM or a HAM
         *
         * @date 2009-1-20
         */
        function unlearn($mode) {
            $this->b8->unlearn($this->text, $mode);
        }

    }

Some notes:

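$standardRating is the key setting. If the Bayesian probability is higher than this value, the message is treated as spam; otherwise it is treated as ham. I set it to 0.7, but you can change it to suit your situation.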
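The threshold rule can be sketched in isolation as follows. This is a minimal illustration rather than part of b8 or CakePHP: the function name isHamRating() is hypothetical, and the comparison is copied from the component's validate() method, so a rating exactly equal to the threshold is treated as spam.

```php
<?php
// Hypothetical helper, not part of b8 or CakePHP: applies the same
// comparison as SpamShieldComponent::validate().
function isHamRating($rating, $standardRating = 0.7) {
    // Ham only when the rating is strictly below the threshold.
    return $rating < $standardRating;
}

var_dump(isHamRating(0.12)); // bool(true)
var_dump(isHamRating(0.93)); // bool(false)
var_dump(isHamRating(0.70)); // bool(false)
```

Raising the threshold lets more borderline messages through as ham; lowering it sends more of them to moderation.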

b8 needs a database handle so that it can work with its data table, which is why startup() contains $this->b8 = new b8($comment->getDBOResourceLink()). The getDBOResourceLink() method it calls on the Comment model is shown below:

    /**
     * get the resource link of MySQL connection
     */
    public function getDBOResourceLink() {
        return $this->getDataSource()->connection;
    }

At this point all the preparation is done, and we can finally use the Bayesian algorithm to classify messages.

Classify comments with b8

In controllers/comments_controller.php, first load SpamShieldComponent:

var $components = array('SpamShield');

Then do the following in the add() method:

    // set data for Bayesian validation
    $this->SpamShield->set($this->data['Comment']['body']);

    // validate the comment with Bayesian
    if (!$this->SpamShield->validate()) {
        // set the status
        $this->data['Comment']['status'] = 'spam';
        // save
        $this->Comment->save($this->data);
        // learn it
        $this->SpamShield->learn("spam");
        // render
        $this->renderView('unmoderated');
        return;
    }

    // it's a normal post
    $this->data['Comment']['status'] = 'published';
    // save for publish
    $this->Comment->save($this->data);
    // learn it
    $this->SpamShield->learn("ham");
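To follow the classify-then-learn flow of add() outside CakePHP, here is a self-contained sketch. Everything in it is an assumption made for illustration: FakeShield is a hypothetical stand-in for SpamShieldComponent that returns a preset rating instead of calling b8, and moderateComment() is a hypothetical function that mirrors the branching in add() without saving or rendering anything.

```php
<?php
// Hypothetical stand-in for SpamShieldComponent: it uses a preset
// rating instead of asking b8, and records what it was told to learn.
class FakeShield {
    public $learned = array();
    private $text;
    private $rating;
    private $standardRating;

    public function __construct($rating, $standardRating = 0.7) {
        $this->rating = $rating;
        $this->standardRating = $standardRating;
    }

    public function set($text) {
        $this->text = $text;
    }

    public function validate() {
        // true means the text is NOT spam
        return $this->rating < $this->standardRating;
    }

    public function learn($mode) {
        $this->learned[] = array($mode, $this->text);
    }
}

// Hypothetical function mirroring the branching in add(): pick a status
// from the rating, then feed the text back to the filter with that label.
function moderateComment($shield, $body) {
    $shield->set($body);
    $isHam = $shield->validate();
    $shield->learn($isHam ? 'ham' : 'spam');
    return $isHam ? 'published' : 'spam';
}

echo moderateComment(new FakeShield(0.95), 'cheap pills'), "\n";  // spam
echo moderateComment(new FakeShield(0.10), 'nice article'), "\n"; // published
```

Because add() teaches the filter its own verdict, a wrong classification is learned as well. The component's unlearn() method is presumably there so that such a verdict can be taken back when a moderator corrects it.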