How do I choose between different models?
Evaluating a machine learning model helps determine how reliable and effective it is for its intended application. This involves assessing factors such as its performance metrics and the accuracy of its predictions or decisions.
No matter what model you end up using, you need a way to choose between models: different model types, tuning parameters, and features. You also need a model evaluation procedure to estimate how well a model will generalize to unseen data. Finally, you need an evaluation metric to pair with that procedure in order to quantify model performance.
Before we proceed, let's review some of the different model evaluation procedures and how they operate.
From above, we can deduce that:
Training and testing on the same data is a classic cause of overfitting, in which you build an overly complex model that won't generalize to new data and is therefore not actually useful.
train_test_split provides a much better estimate of out-of-sample performance.
K-fold cross-validation does better still, by systematically creating K train/test splits and averaging the results together.
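The two procedures above can be sketched side by side. This is a minimal illustration on a synthetic dataset (swap in your own X and y):

```python
# Compare a single train/test split with 5-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, random_state=0)
model = LogisticRegression(max_iter=1000)

# Procedure 1: one train/test split (fast, but a higher-variance estimate)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model.fit(X_train, y_train)
split_acc = accuracy_score(y_test, model.predict(X_test))

# Procedure 2: 5-fold cross-validation (slower, but a more stable estimate)
cv_acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

print(f"train/test split accuracy: {split_acc:.3f}")
print(f"5-fold CV accuracy:        {cv_acc:.3f}")
```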
In summary, train_test_split is still preferable to cross-validation because of its speed and simplicity, and that's what we will use in this tutorial guide.
You will always need an evaluation metric to go along with your chosen procedure, and your choice of metric depends on the problem you are addressing. For classification problems, you can use classification accuracy. But we will focus on other important classification evaluation metrics in this guide.
Before we learn any new evaluation metrics, let's review classification accuracy and talk about its strengths and weaknesses.
We've chosen the Pima Indians Diabetes dataset for this tutorial, which includes the health data and diabetes status of 768 patients.
Let's read the data and print the first 5 rows. The label column indicates 1 if the patient has diabetes and 0 if the patient doesn't, and we intend to answer the question:
Question: Can we predict the diabetes status of a patient given their health measurements?
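A minimal sketch of the loading step follows. The column names and the inline sample rows are assumptions for illustration only; point `pd.read_csv` at your actual copy of the Pima dataset instead.

```python
import io
import pandas as pd

# Stand-in for the real CSV file; the column names here are assumed.
sample_csv = io.StringIO(
    "pregnant,glucose,bp,skin,insulin,bmi,pedigree,age,label\n"
    "6,148,72,35,0,33.6,0.627,50,1\n"
    "1,85,66,29,0,26.6,0.351,31,0\n"
)

# In practice: pima = pd.read_csv("path/to/pima-indians-diabetes.csv")
pima = pd.read_csv(sample_csv)
print(pima.head())  # first 5 rows
```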
We define our feature matrix X and response vector y, then use train_test_split to split X and y into training and testing sets.
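Sketched below with a small stand-in DataFrame; the feature and label column names are assumptions, not the dataset's documented schema:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in data; in the tutorial this would be the loaded Pima DataFrame.
pima = pd.DataFrame({
    "glucose": [148, 85, 183, 89, 137, 116],
    "bmi":     [33.6, 26.6, 23.3, 28.1, 43.1, 25.6],
    "age":     [50, 31, 32, 21, 33, 30],
    "label":   [1, 0, 1, 0, 1, 0],
})

feature_cols = ["glucose", "bmi", "age"]  # assumed feature columns
X = pima[feature_cols]   # feature matrix
y = pima["label"]        # response vector

# 75/25 split by default; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
print(X_train.shape, X_test.shape)
```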
Next, we train a logistic regression model on the training set. During the fit step, the logreg model object learns the relationship between X_train and y_train. Finally, we make class predictions for the testing set.
Now that we've made predictions for the testing set, we can calculate the classification accuracy, which is simply the percentage of correct predictions.
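The fit, predict, and score steps can be sketched as follows (shown on a synthetic dataset rather than the Pima data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=400, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)     # learn the relationship between X_train and y_train
y_pred = logreg.predict(X_test)  # class predictions for the testing set

# classification accuracy: percentage of correct predictions
acc = accuracy_score(y_test, y_pred)
print("classification accuracy:", acc)
```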
However, anytime you use classification accuracy as your evaluation metric, it is important to compare it with the null accuracy, which is the accuracy that could be achieved by always predicting the most frequent class.
Null accuracy answers the question: if my model were to predict the predominant class 100 percent of the time, how often would it be correct? In the scenario above, 32% of the values in y_test are 1 (ones). In other words, a dumb model that always predicts that the patient does not have diabetes would be right 68% of the time (the zeros). This provides a baseline against which we can measure our logistic regression model.
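A minimal sketch of the null-accuracy computation, using stand-in labels with roughly the same class balance as above:

```python
import pandas as pd

# Stand-in for y_test: 30% ones, 70% zeros
y_test = pd.Series([0, 0, 0, 1, 0, 1, 0, 0, 1, 0])

# Null accuracy = fraction of the most frequent class
null_accuracy = y_test.value_counts(normalize=True).max()
print("null accuracy:", null_accuracy)  # 0.7 for this stand-in series
```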
When we compare the null accuracy of 68% with the model accuracy of 69%, our model doesn't look very good. This demonstrates one weakness of classification accuracy as a model evaluation metric: classification accuracy doesn't tell us anything about the underlying distribution of the testing set.
In Summary:
Let's now look at the confusion matrix.
The confusion matrix is a table that describes the performance of a classification model.
It is useful for understanding the performance of your classifier, but it is not itself a model evaluation metric, so you can't tell scikit-learn to choose the model with the best confusion matrix. However, many metrics can be calculated from the confusion matrix, and those can be used directly to choose between models.
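A short sketch of computing the confusion matrix and unpacking its four cells (stand-in labels and predictions):

```python
from sklearn.metrics import confusion_matrix

y_test = [0, 0, 1, 1, 0, 1, 0, 1, 0, 0]  # stand-in true labels
y_pred = [0, 1, 1, 0, 0, 1, 0, 1, 0, 0]  # stand-in predictions

cm = confusion_matrix(y_test, y_pred)
# For binary labels, scikit-learn orders the cells as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = cm.ravel()
print(cm)
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```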
Let's explain some of its basic terminologies.
Let's see how we can calculate these metrics.
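The common metrics derived from the four confusion-matrix cells can be sketched like this, cross-checked against scikit-learn's own implementations:

```python
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score)

y_test = [0, 0, 1, 1, 0, 1, 0, 1, 0, 0]  # stand-in true labels
y_pred = [0, 1, 1, 0, 0, 1, 0, 1, 0, 0]  # stand-in predictions

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

accuracy    = (tp + tn) / (tp + tn + fp + fn)  # overall correct rate
sensitivity = tp / (tp + fn)                   # recall / true positive rate
specificity = tn / (tn + fp)                   # true negative rate
precision   = tp / (tp + fp)                   # how often positive predictions are right

# Sanity-check the hand computations against scikit-learn
assert abs(accuracy - accuracy_score(y_test, y_pred)) < 1e-12
assert abs(sensitivity - recall_score(y_test, y_pred)) < 1e-12
assert abs(precision - precision_score(y_test, y_pred)) < 1e-12

print(f"accuracy={accuracy:.2f} sensitivity={sensitivity:.2f} "
      f"specificity={specificity:.2f} precision={precision:.2f}")
```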
In Conclusion: