This article walks through the calculation of simple statistics in Python using pandas and SciPy. It is intended as a quick reference for anyone who needs these operations.
1. These steps assume that the Anaconda distribution is installed on the computer. If errors occur after installation, you can uninstall the existing Python and reinstall Anaconda. During installation it is recommended to check the option that adds Anaconda to the environment variables; otherwise you will have to add them yourself later. In PyCharm, set the interpreter to the python executable in the Anaconda installation folder. Then create a new data folder in the PyCharm project to store data files.
2. Open the Python Console.
3. First use Python to read the data. Enter import pandas as pd to import the pandas package, then enter df = pd.read_csv("./data/CityData.csv") to read the data, and finally enter df to display it.
4. Enter type(df) and type(df["cid"]) in turn. The two types differ: df is a DataFrame, while a single column such as df["cid"] is a Series.
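Steps 3 and 4 can be sketched as follows. The contents of CityData.csv are not shown in the article, so this example reads a small made-up CSV from memory instead; the column names cid, xid, and yid follow the article, while the values are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical stand-in for ./data/CityData.csv (values are made up).
csv_text = """cid,xid,yid
1,10,2.5
2,20,3.5
3,30,4.5
"""

# With the real file this would be: pd.read_csv("./data/CityData.csv")
df = pd.read_csv(io.StringIO(csv_text))

print(df)
print(type(df))         # the whole table is a DataFrame
print(type(df["cid"]))  # a single column is a Series
```

Most of the statistics methods in the following steps exist on both types, which is why each step shows a df form and a df["column"] form.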
5. Calculate the mean: enter df.mean() or df["xid"].mean()
6. Calculate the median: enter df.median() or df["yid"].median()
7. Find the quartiles: enter df.quantile(q=0.25) for the first quartile; use q=0.5 and q=0.75 for the second and third quartiles.
8. Find the mode: enter df.mode() or df["xid"].mode()
9. Find the standard deviation: enter df.std() or df["yid"].std()
10. Calculate variance: df.var() or df["xid"].var()
11. Sum: df.sum() or df["xid"].sum()
12. Calculate the skewness coefficient: df.skew() or df["yid"].skew()
13. Calculate the kurtosis coefficient: df.kurt() or df["yid"].kurt()
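The statistics from steps 5 to 13 can be tried together on a small table. This is a minimal sketch on made-up data: the column names xid and yid follow the article, but the values are assumptions. Note that std() and var() compute the sample versions by default (ddof=1), and that DataFrame-wide calls may fail in recent pandas versions when the table contains non-numeric columns, in which case passing numeric_only=True is one option.

```python
import pandas as pd

# Hypothetical data; only the column names come from the article.
df = pd.DataFrame({
    "xid": [1, 2, 2, 3, 4, 5, 5, 5, 8, 10],
    "yid": [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0, 10.0, 12.0],
})

x = df["xid"]

stats = {
    "mean": x.mean(),
    "median": x.median(),
    "q1": x.quantile(q=0.25),
    "mode": x.mode().iloc[0],  # mode() returns a Series because ties are possible
    "std": x.std(),            # sample standard deviation (ddof=1 by default)
    "var": x.var(),            # sample variance (ddof=1 by default)
    "sum": x.sum(),
    "skew": x.skew(),
    "kurt": x.kurt(),
}

for name, value in stats.items():
    print(f"{name}: {value}")

# The same methods on the whole DataFrame return one value per column.
column_means = df.mean(numeric_only=True)
print(column_means)
```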
14. Generate a normal distribution. pandas cannot generate it directly, so first import SciPy with import scipy.stats as ss, and then enter ss.norm, which gives a normal distribution object. Enter ss.norm.stats(moments="mvsk") to check it; m, v, s, and k stand for the mean, variance, skewness coefficient, and kurtosis coefficient respectively.
The call returns four values, the mvsk of the standard normal distribution: 0, 1, 0, and 0. The kurtosis reported here is excess kurtosis, which is 0 for a normal distribution.
15. ss.norm.pdf(0.0) returns the value of the density (the ordinate) at abscissa 0. ss.norm.ppf(0.9) returns the point x for which the cumulative probability from negative infinity to x equals 0.9; the argument of ppf must be between 0 and 1. ss.norm.cdf(2) returns the cumulative probability obtained by integrating from negative infinity to 2, and ss.norm.rvs(size=10) draws 10 random numbers that follow the normal distribution.
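The calls from steps 14 and 15 can be run together as a short sketch. The random_state argument is an addition not mentioned in the article; it only makes the random draw repeatable.

```python
import scipy.stats as ss

# Mean, variance, skewness, and (excess) kurtosis of the standard normal.
mean, var, skew, kurt = ss.norm.stats(moments="mvsk")
print(mean, var, skew, kurt)

pdf0 = ss.norm.pdf(0.0)  # density at x = 0, i.e. 1 / sqrt(2 * pi)
q90 = ss.norm.ppf(0.9)   # the x with cumulative probability 0.9
cdf2 = ss.norm.cdf(2)    # cumulative probability up to x = 2

# random_state is optional; it is set here so repeated runs give the same numbers.
samples = ss.norm.rvs(size=10, random_state=0)

print(pdf0, q90, cdf2)
print(samples)
```

ppf and cdf are inverses of each other, so ss.norm.cdf(ss.norm.ppf(0.9)) gives back 0.9 up to floating-point error.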
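16. Similarly, we can enter ss.chi2 and ss.t to get the chi-square distribution and the t distribution respectively. Unlike ss.norm, their methods (stats, pdf, ppf, and so on) require a degrees-of-freedom argument.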
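As a rough illustration of step 16, the sketch below queries the chi-square and t distributions. The degrees of freedom (3 and 10) are arbitrary values chosen for the example; the variable is named dof to avoid clashing with the DataFrame called df in the earlier steps.

```python
import scipy.stats as ss

# Chi-square distribution with 3 degrees of freedom (arbitrary choice).
chi2_dof = 3
chi2_mean, chi2_var = ss.chi2.stats(chi2_dof, moments="mv")
chi2_q95 = ss.chi2.ppf(0.95, chi2_dof)  # 95th percentile

# t distribution with 10 degrees of freedom (arbitrary choice).
t_dof = 10
t_mean, t_var = ss.t.stats(t_dof, moments="mv")
t_q975 = ss.t.ppf(0.975, t_dof)  # 97.5th percentile

print(chi2_mean, chi2_var, chi2_q95)
print(t_mean, t_var, t_q975)
```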
17. In addition, we can draw samples: enter df.sample(n=10) to draw 10 rows from the data, or enter df.sample(frac=0.1) to draw 10% of the rows.
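The two sampling calls from step 17 can be tried on a made-up table of 100 rows. The data and the random_state argument are assumptions added for the example; random_state only makes the draw repeatable.

```python
import pandas as pd

# Hypothetical table with 100 rows; only the column names follow the article.
df = pd.DataFrame({"cid": range(1, 101), "xid": range(100, 200)})

ten_rows = df.sample(n=10, random_state=0)        # exactly 10 rows
ten_percent = df.sample(frac=0.1, random_state=0)  # 10% of 100 rows

print(ten_rows)
print(len(ten_rows), len(ten_percent))
```

By default sample draws without replacement, so no row appears twice in the result.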