What is bootstrap data

藏色散人 | Original | 2019-07-26 10:55:17

What is bootstrap data?

Bootstrap data refers to a set of n samples drawn with replacement from a total of N samples.

In statistics, the bootstrap method (also called bootstrapping or bootstrap sampling) is uniform sampling with replacement from a given training set. That is, after a sample has been selected it stays in the pool, so it may be selected again and added to the new training set more than once.
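
As an illustration of sampling with replacement, here is a minimal Python sketch. The function name bootstrap_sample, its parameters, and the example values are assumptions made for this illustration and are not part of the original article.

```python
import random


def bootstrap_sample(data, n=None, rng=None):
    """Draw n items from data uniformly at random, with replacement.

    Hypothetical helper for illustration. If n is omitted, the sample
    has the same size as the original data.
    """
    rng = rng or random.Random()
    n = len(data) if n is None else n
    # rng.choice never removes the item it picks, so any item can be
    # picked again on a later draw.
    return [rng.choice(data) for _ in range(n)]


data = [2, 4, 6, 8, 10]
sample = bootstrap_sample(data, rng=random.Random(0))
print(sample)  # same length as data; some values may repeat
```

Passing a seeded random.Random instance keeps the draw reproducible, which is convenient when comparing runs.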

The bootstrap method was published by Bradley Efron in the Annals of Statistics in 1979. When a sample comes from a population that can be described by a normal distribution, its sampling distribution is also a normal distribution. When the population cannot be described by a normal distribution, the sampling distribution is analyzed by asymptotic analysis, bootstrapping, or similar methods. Bootstrapping uses random sampling with replacement, and it works well for small data sets.
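
One common use of this idea is to estimate how much a statistic varies without assuming a normal population. The sketch below estimates the standard error of the median for a small skewed data set. The function name bootstrap_std_error, the number of resamples, and the data values are all assumptions chosen for illustration.

```python
import random
import statistics


def bootstrap_std_error(data, statistic, n_resamples=1000, seed=0):
    """Estimate the standard error of a statistic by bootstrapping.

    Hypothetical helper: resamples the data with replacement
    n_resamples times and returns the standard deviation of the
    statistic across those resamples.
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        resample = [rng.choice(data) for _ in range(n)]
        estimates.append(statistic(resample))
    return statistics.stdev(estimates)


# A small, clearly non-normal (right-skewed) sample, made up for the demo.
skewed = [1, 1, 2, 2, 3, 4, 7, 12, 20, 45]
se_median = bootstrap_std_error(skewed, statistics.median)
print(se_median)
```

The same function works for any statistic that accepts a list, such as statistics.mean.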

.632 Bootstrap method

The most commonly used variant is the .632 bootstrap method. Assume that the given data set contains d samples. The data set is sampled d times with replacement, producing a training set of d samples. As a result, some of the original samples are likely to appear several times in the training set, while others do not appear at all. The samples that never enter the training set form the validation set (test set).
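
The split described above can be sketched in Python as follows. The function name bootstrap_split and the toy data set are assumptions for illustration. Only the sampling step is shown, not any model training or error estimation.

```python
import random


def bootstrap_split(dataset, seed=0):
    """Split a data set into a bootstrap training set and a test set.

    Hypothetical helper: draws d indices with replacement for the
    training set. Samples whose index was never drawn form the test set.
    """
    rng = random.Random(seed)
    d = len(dataset)
    train_idx = [rng.randrange(d) for _ in range(d)]
    chosen = set(train_idx)
    test_idx = [i for i in range(d) if i not in chosen]
    train = [dataset[i] for i in train_idx]
    test = [dataset[i] for i in test_idx]
    return train, test


dataset = list(range(1000))  # toy data: 1000 distinct samples
train, test = bootstrap_split(dataset)
# Training set size, number of distinct training samples, test set size.
print(len(train), len(set(train)), len(test))
```

Sampling indices rather than values keeps the bookkeeping correct even if the data set itself contains duplicate values.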

In each draw, the probability that a given sample is selected is 1/d, so the probability that it is not selected is (1 - 1/d). A sample is absent from the training set only if it is missed in all d draws, which happens with probability (1 - 1/d)^d. As d approaches infinity, this probability approaches e^(-1) ≈ 0.368. About 36.8% of the original samples therefore end up in the test set, and the distinct samples that appear in the training set account for approximately 63.2% of the original data set.
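
The limit can be checked numerically. The short sketch below evaluates (1 - 1/d)^d for a few values of d and compares the result with e^(-1). The function name prob_never_selected and the chosen values of d are assumptions for illustration.

```python
import math


def prob_never_selected(d):
    """Probability that one given sample is missed in all d draws."""
    return (1 - 1 / d) ** d


for d in (10, 100, 1000, 100000):
    p = prob_never_selected(d)
    # Columns: d, probability of being left out, probability of being included.
    print(d, round(p, 4), round(1 - p, 4))

# The limiting value of the left-out probability.
print(round(math.exp(-1), 4))
```

The left-out probability increases toward the limit as d grows, so for small data sets slightly fewer than 36.8% of samples are expected in the test set.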
