python - How to split a complete dataset into a training set and a test set in recommender systems and machine learning

Question

As the title says: is there a quick way to do this? And if I want to do k-fold cross-validation, how should I split the dataset?

黄舟 · Answer

Split the data evenly into 10 parts and loop 10 times. In each round, use 1 part as the test set and the remaining 9 parts as the training set.
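The loop described above can be sketched in plain Python. This is a minimal illustration, not a library API: the function name `k_fold_indices`, the fixed `seed`, and the strided way of cutting the shuffled indices into folds are all choices made for this example. It shuffles the row indices once, cuts them into k parts whose sizes differ by at most one, and uses each part exactly once as the test set.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Hypothetical helper: yield (train_idx, test_idx) for each of k rounds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once so folds are random
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # The other k-1 folds together form the training set.
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx

# Usage on a toy dataset of 25 samples (made-up data for illustration).
data = [(x, x % 2) for x in range(25)]
for train_idx, test_idx in k_fold_indices(len(data), k=10):
    train_set = [data[i] for i in train_idx]
    test_set = [data[i] for i in test_idx]
    # Fit a model on train_set and evaluate it on test_set here.
```

Because every sample lands in exactly one fold, averaging the 10 test scores uses every sample for testing exactly once.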

PHP中文网 · Answer

Generally speaking, people set k to 5 or 10 for cross-validation. In other words, the data is (randomly) divided into k parts, of which k-1 parts are used for training and 1 part for testing. That said, cross-validation means training and evaluating the model k times, so it is inherently not going to be fast.
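If scikit-learn is available, its `KFold` class produces the random k-part split described above. A minimal sketch, assuming k = 5 and a small made-up array `X` with labels `y`; `shuffle=True` gives the random division, and `random_state=0` is an arbitrary seed chosen here only to make the folds reproducible.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data for illustration: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10) % 2

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_sizes = []
for train_idx, test_idx in kf.split(X):
    # k-1 = 4 parts for training, 1 part for testing in each round.
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    fold_sizes.append((len(train_idx), len(test_idx)))
```

For classification data, `StratifiedKFold` (same module) can be used instead so that each fold keeps roughly the same class proportions; its `split` method also takes the labels, as in `split(X, y)`.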

PHP中文网 · Answer

You can use scikit-learn; see section 3.1 of its user guide, "Cross-validation: evaluating estimator performance":

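>>> from sklearn import datasets, svm
>>> from sklearn.model_selection import cross_val_score
>>> iris = datasets.load_iris()  # bundled example dataset, 150 samples
>>> clf = svm.SVC(kernel='linear', C=1)
>>> scores = cross_val_score(clf, iris.data, iris.target, cv=5)  # 5-fold CV
>>> scores
array([ 0.96...,  1.  ...,  0.96...,  0.96...,  1.        ])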
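For the simpler case in the title, a single split into one training set and one test set, scikit-learn's `train_test_split` does the shuffling and splitting in one call. A minimal sketch, assuming the same iris data and SVM as above; the 20% test fraction, `random_state=0`, and `stratify` option are illustrative choices, not values from the answer.

```python
from sklearn import datasets, svm
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
# Hold out 20% of the rows as the test set; stratify keeps class ratios similar.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0, stratify=iris.target
)

clf = svm.SVC(kernel='linear', C=1).fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)  # mean accuracy on the held-out rows
```

With 150 samples this gives 120 training rows and 30 test rows. A single hold-out split is faster than k-fold cross-validation, but its score depends more on which rows happened to land in the test set.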