How sklearn trains large-scale data sets - Stack Overflow
typecho 2017-06-28 09:22:17

Question 1:

I have more than 400,000 records and need to build a model on this data with some machine-learning classification algorithm. The problem I ran into is that the dataset is too large to read into memory at once. How should I process the data?
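
One way to handle data that does not fit in memory is scikit-learn's incremental (out-of-core) learning: estimators such as SGDClassifier expose partial_fit(), which updates the model one batch at a time. A minimal sketch, assuming the data sits in a CSV file named data.csv with a "label" column (both names are hypothetical):

```python
import pandas as pd
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier()        # a linear model that supports partial_fit()
classes = [0, 1]             # every class label must be declared up front

# read_csv with chunksize yields DataFrames of 50,000 rows at a time,
# so the full file is never loaded into memory at once
for chunk in pd.read_csv("data.csv", chunksize=50000):
    X = chunk.drop(columns=["label"]).values
    y = chunk["label"].values
    # partial_fit updates the existing model with one batch
    # instead of retraining from scratch
    clf.partial_fit(X, y, classes=classes)
```

Any estimator that implements partial_fit() (e.g. MultinomialNB or SGDRegressor) can be dropped into the same loop.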

Question 2:

I have a question about sklearn cross-validation. Suppose I have 10,000 training samples. Following the cross-validation principle, KFold can split them into n groups (with the training portion accounting for 0.7 of the data). What I don't understand is this: I call fit() on the first group's training set, then run prediction on its test set and get an accuracy score. What is that accuracy score actually used for? Does it affect the next round of training? Also, is the model trained in one round reused by the next fit() call?
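
To illustrate the workflow the question describes, here is a minimal sketch with placeholder data. Each pass through the loop creates and fits a fresh estimator, because calling fit() on a standard scikit-learn estimator retrains it from scratch; the per-fold accuracies are only collected to estimate generalization performance and do not feed back into the next round:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression

X = np.random.rand(10000, 5)           # placeholder features
y = np.random.randint(0, 2, 10000)     # placeholder labels

scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True).split(X):
    clf = LogisticRegression()
    # fit() re-initializes the parameters: nothing carries over
    # from the previous fold
    clf.fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))

# the fold accuracies are only used to estimate how well the model
# generalizes, typically by averaging them
print(np.mean(scores), np.std(scores))
```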


Replies (3)
三叔

I've recently been studying the data mining and analysis side of big data. For Question 1, here is an idea for your reference: since the data cannot be read all at once, you can build a distributed data model and read the data in batches. Give each piece of data an address on a datanode (which can be identified by some variable name), and build a namenode (a table mapping names to those addresses). Then, when you want data, first look up the address in the namenode (i.e., which variable's data you need), and only then access that address to fetch the data for processing. I'm a beginner, so this is just my personal idea; it's not the only answer, for reference only. Experts, please go easy on me.
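
A rough sketch of this idea (the file layout and names below are made up for illustration): a small "namenode" table maps each name to the location of one chunk of the data, and chunks are fetched through that lookup and processed one at a time:

```python
import pandas as pd

# "namenode": a table mapping a name to the address of one data chunk
namenode = {
    "part_1": "data/part_1.csv",   # hypothetical chunk files
    "part_2": "data/part_2.csv",
}

def fetch(name):
    path = namenode[name]          # confirm the address in the namenode first
    return pd.read_csv(path)       # then access that address to get the data

for name in namenode:
    chunk = fetch(name)
    # ... process one chunk at a time here ...
```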
