Python Data Analysis Practical Overview Data Analysis-Python Tutorial-php.cn

Python TutorialThe column introduces the overview data.

Python Data Analysis Practical Overview Data Analysis

Recommended (free): Python Tutorial

Article Directory

1. Introduction to data analysis
- 1. Fundamentals of the big data era
- 2. Career prospects of data analysts
- 3. The road to becoming a data analyst
2. Python installation and environment configuration
- 1.Python version
- 2. Install Python on different systems
- 3. Environment variable configuration
- 4. Install pip
- 5. Integrated development environment selection
3. Introduction and installation of Anaconda
- 1.What is Anaconda
- 2. Download and install Anaconda
- 3.conda Introduction to tools and package management
4. Jupyter Notebook
- 2.Jupyter Notebook Usage
- 3. Using Python in Jupyter
- 4. Data interaction case
- - Use Jupyter to process store data

##1. Introductory data analysis

1. Fundamentals of the big data era

The development status of the big data industry:

Now data has shown

explosive
growth, and there may be 100% of data every minute :

13,000 iPhone apps downloaded

98,000 new Weibo posts posted on Twitter
168 million emails sent
Taobao Double Eleven 10,680 New orders
12306 issued 1840 tickets
In the era of big data, three major changes have occurred:

From random samples to full data

From accuracy to confounding
From causation to correlation
Give a typical example:

Men will buy some diapers when they go to the supermarket. Beer, the results of big data analysis have prompted supermarkets to put some beer near the diaper shelves, thereby increasing sales. There is no causal relationship between buying diapers and buying beer, but there is a certain correlation.

The domestic big data application status is as follows (from CSDN):

Python Data Analysis Practical Overview Data Analysis It can be seen that the application of big data has reached a certain scale, but there is still a lot of room for development. .

Talent needs mainly include:

Data Analyst

Statistical Analysis
- Predictive Analysis
- Process Optimization
Big Data Engineer
Platform Development
- Application Development
- Technical Support
Data Architect
Business Understanding
- Application Deployment
- Architecture Design

2. Data analyst career prospects

Problems that data analysts need to solve:

Estimated demand, allocation Production Capacity

In the era of big data, the ability to interpret data is even more needed.

A: List the most popular breads and give priority to the production of
star products
.
The key is to find the star product, which requires counting the total turnover of bread, and then calculating the relative proportion of each type of bread to the total turnover, and giving priority to the production of product combinations that can account for 70% of the turnover. This will use the statistical distribution table and histogram. This analysis method is also called the ABC analysis method, as follows:

Evaluate the effectiveness of the marketing plan
Statistics is not just about analyzing data. The key is to infer how to influence customer behavior from the analysis results, and formulate a specific
, and act accordingly.
Q: If you want to sell bread online, which kind of advertising is more effective?
A: Write two types of copywriting and advertise them for a period of time to see how effective they are. To compare advertising effectiveness, the best way is to use statistical randomized controlled experiments
, where two types of advertising appear randomly. After a period of time, observe which advertising effect is better, and then use it on a large scale. Advertising that is more effective.

Product Quality Control
It is very important to discover the relationship between the results and the causes of the results.
A: Randomly check a few loaves and use a scale to see if the weight difference is too large.
You need to know the average weight of the bread first, and then sample the bread to see if the weight of the bread shows a bell-shaped curve with a normal distribution? If it deviates from the curve, it may indicate a problem with the quality of the bread. as follows:

A good data analyst is a good product planner and a leader in the industry;
In IT companies, excellent data analysts are very promising Become a senior member of the company.

The workflow of a data analyst is as follows:
Python Data Analysis Practical Overview Data Analysis

The three major tasks of a data analyst:

Analyzing history
Predict the future
Optimization selection

8 skills required by data analysts:

Statistics
- Statistical testing, P Values, Distributions, Estimation
Basic Tools
- Python
- SQL
Multivariable Calculus Sum Linear algebra
Data sorting
Data visualization
Software engineering
Machine learning
The thinking of a data scientist
- Data-driven
- Problem solving

Three major abilities required by data analysts:

Statistical foundation and application of analytical tools
Computer Coding Ability
Knowledge of specific application areas or industries

Typical Data Analyst Growth History:
Python Data Analysis Practical Overview Data Analysis

3. The road to becoming a data analyst

Self-cultivation to become a data analyst:

Sensitive
Exploration
Detailed
Pragmatic

The skills that data analysts need to possess are as follows:

Familiar with Excel data processing
Data sensitive Strong degree
Familiar with company business and industry knowledge
Master data analysis methods
- Basic analysis methods
  - Contrastive analysis method
  - Group analysis method
  - Cross analysis method
  - Structural analysis method
  - Funnel plot analysis method
  - Comprehensive evaluation analysis method
  - Factor analysis method
  - Matrix correlation analysis
- ##Advanced analysis method
  - Regression analysis method
  - Cluster analysis method
  - Discriminant analysis method
  - Principal component analysis method
  - Correspondence analysis method
  - Time series

Practice in data analysis in different industries Job content and responsibilities of personnel:

- Daily sales and inventory tables
- Products Sales forecast
- Inventory calculation and early warning
- Traffic analysis related tables
- Review
- Verify product improvement effects
- Provide emails and reports for senior management
- Various periodic reports
- Analysis reports for a certain business issue
- Offline modeling and analysis for the business

The very important subject foundation for data analysis is mathematics, but it doesn’t matter if you are not good at mathematics. You can use

Python to help you learn: Python is not only a programming language , and is the basis of data mining machine learning and other technologies, which facilitates the establishment of automated workflows;
It is not difficult to get started with Python, and its mathematical requirements are not too high. The important thing is to know how to express an algorithmic logic in language;
Python has many encapsulated tool libraries and commands. What needs to be done is to use mathematical methods to solve a problem and build it.

If you want to quickly get started with Python data analysis, you must make good use of Python related toolkits:

(1) The biggest feature of Python is that it has a huge and active
Scientific Computing community , the trend of using python for scientific computing is becoming more and more obvious. (2) Because Python has continuously improved libraries, it has become a major alternative for data processing tasks. Combined with its strong strength in general programming, you can just use Python as a language to build data-based applications. Central applications, including:

- Scipy
- Pandas
- matplotlib
- igraph
- scikit-learn

(3) As a scientific computing platform, Python can easily integrate C, C and Fortran codes.

Preparation for data analysis:

Data cleaning and preliminary analysis
Drawing and visualization
Data Aggregation and grouping processing
Data mining

Commonly used algorithms for data analysis and data mining:

Time series analysis
Classification algorithm
Clustering algorithm
Dimensionality reduction algorithm

The method of learning and engaging in data analysis is:

Do more
Summary

## 2. Python installation and environment configuration

1.Python version

Python is divided into two major versions: 3.X and 2.X.
Version 3.0 of Python is often called Python 3000, or Py3k for short. This is a major upgrade compared to earlier versions of Python.
In order not to bring too much burden, Python 3.X was not designed with downward compatibility in mind. Many programs designed for earlier Python versions cannot run normally on Python 3.X.
Most third-party libraries are working hard to be compatible with Python 3.X versions.

2. Install Python on different systems

(1)Unix & Linux system

Visit http://www.python.org /download/
Select the source code compressed package suitable for Unix/Linux
Download and decompress the compressed package
If you need to customize some options, modify Modules/Setup
Execute./configurescript
make
##make install

Command meaning	conda command
conda –h	View help
Create an environment named python36 based on python3.6 version	conda create - -name python36 python=3.6
Activate this environment	activate python36 (Windows), source activate python36 (linux/mac)
View python version	python -V
Exit the current environment	deactivate python36
Delete the environment	conda remove -n py27 --all
View all installed environments	conda info -e

Package management command meaning	Package management command
Install matplotlib	conda install matplotlib
View installed packages	conda list
Package update	conda update matplotlib
Remove package	conda remove matplotlib

Operation	Command
Update conda itself	conda update conda
Update anaconda application	conda update anaconda
Update python, assuming the current python environment is 3.8.1 and the latest version is 3.8.2, then it will be upgraded to 3.8.2	conda update python

(2) Window system
Visit http://www.python.org/download/
Select the Window platform installation package in the download list
Since the download from the official website is very slow Slow, so I have downloaded and organized the installation packages for each version of Python. You can directly click to join the QQ group
963624318 Just download it from the group folder Python related installation package.
After downloading, double-click the download package to enter the Python installation wizard. The installation is very simple. Just use the default settings and click
Next until the installation is completed.
(3) Mac system
comes with python 2.7, you can execute
brew install python to install the new version.
3. Environment variable configuration
Windows systems need to configure environment variables.
If you did not choose to add environment variables when installing Python, you need to add them manually. You need to add the path to install Python
XXX\PythonXXX and XXX\PythonXXX\Scripts To the environment variables, there are two ways:
Command line addition
Execute
path=%path%;XXX\PythonXXX and path=% respectively in CMD path%;XXX\PythonXXX\Scripts is enough.
Add
in the system settings. Right-click the computer → Properties → Advanced system settings → System properties → Environment variables → Double-click path → Add
XXX\PythonXXX and XXX\PythonXXX\ ScriptsInstallation path is as follows:
Finally click Confirm to exit.
4. Install pip
pip is a package installation and management tool in Python. You can choose to install pip when installing Python. In Python 2 >=2.7. 9 or Python 3 >=3.4.
If pip is not installed, you can install it through the command:
Linux or Mac

pip install -U pip
Windows（ cmd input)

python -m pip install -U pip
5. Integrated development environment selection
There are many Python Editors, including PyCharm, etc., choose PyCharm here:
PyCharm is a Python IDE created by JetBrains, supporting Mac OS, Windows, and Linux systems.
Includes
debugging, syntax highlighting, Project management, code jump, smart prompts, auto-complete, unit testing, version control and other functions.
You can choose the appropriate version to download and install at https://www.jetbrains.com/pycharm/download/.
3. Introduction and installation of Anaconda
1.What is Anaconda
Anaconda is a tool that can be used
The Python distribution of Scientific Computing supports Linux, Mac, and Windows systems, and has built-in commonly used scientific computing libraries. It solves two major pain points of official Python:
(1) Provides package management function, solving the scenario where third-party package installation on Windows platform often fails;
(2) Provides environment management function, The function is similar to virtualenv, which solves the problem of coexistence and switching of multiple versions of Python.
2. Download and install Anaconda
Download the installation package directly from the official website https://www.anaconda.com/products/inpidual and select download
Python3 The installation package of .8Personal Edition is enough, but the download speed from the official website is slow, so I have downloaded and sorted out the Anaconda installation package corresponding to Python 3.8. You can directly click to add the QQ group 963624318 Just download it from the group folder Python related installation package.
Install directly after downloading. Please note that during the click process, a prompt to add environment variables will appear. You need to check it, as follows:

Finally click Next. After the installation is complete, click the Win key (under Windows system) to see the recently added or application list A as shown below:

At this time, you can click Anaconda Navigator, as shown below:
You can see that the environment is Python 3.8.3, and the basic environment created by Anaconda is named base. It is also the default environment, and you can also see the libraries installed by default.
Open the Anaconda command line tool Anaconda Powershell Prompt, enter python -V, and Python 3.8.3 will also be printed.
You can also create a new conda environment through commands, such as conda create --name py27 python=2.7After execution, a conda environment with a Python version of 2.7 named py27 will be created.
Activate the environment and execute the command conda activate py27, and deactivate the command conda deactivate.
You can execute conda list on the command line to view the installed libraries, as follows:
# packages in environment at E:\Anaconda3: # # Name                    Version                   Build  Channel _ipyw_jlab_nb_ext_conf    0.1.0                    py38_0 alabaster                 0.7.12                     py_0 anaconda                  2020.07                  py38_0 anaconda-client           1.7.2                    py38_0 anaconda-navigator        1.9.12                   py38_0 ... zlib                      1.2.11               h62dcd97_4 zope                      1.0                      py38_1 zope.event                4.4                      py38_0 zope.interface            4.7.1            py38he774522_0 zstd                      1.4.5                ha9fde0e_0
Copy after login
3. Introduction to the conda tool and package management
conda is a tool for package management and environment management under Anaconda. Its function is similar to the combination of pip and virtualenv. Conda’s environment management is basically the same as virtualenv. It's a similar operation.
After successful installation, conda will be added to the environment variables by default, so you can run the conda command directly in the command line window.
Common conda commands and their meanings are as follows:
Command meaning conda command
conda –h View help
Create an environment named python36 based on python3.6 version conda create - -name python36 python=3.6
Activate this environment activate python36 (Windows), source activate python36 (linux/mac)
View python version python -V
Exit the current environment deactivate python36
Delete the environment conda remove -n py27 --all
View all installed environments conda info -e
Common conda package management commands are as follows:
Package management command meaning Package management command
Install matplotlib conda install matplotlib
View installed packages conda list
Package update conda update matplotlib
Remove package conda remove matplotlib
In conda, anything is a package, everything is a package, conda itself can be regarded as a package, the python environment can be regarded as a package, anaconda can also be regarded as It is a package, so in addition to ordinary third-party packages supporting updates, these 3 packages also support the following commands:
Operation Command
Update conda itself conda update conda
Update anaconda application conda update anaconda
Update python, assuming the current python environment is 3.8.1 and the latest version is 3.8.2, then it will be upgraded to 3.8.2 conda update python
四、Jupyter Notebook
1.Jupyter Notebook基本介绍
Jupyter Notebook（此前被称为IPython notebook）是一个交互式笔记本，支持运行40多种编程语言。
在开始使用notebook之前，需要先安装该库：
（1）在命令行中执行pip install jupyter来安装；
（2）安装Anaconda后自带Jupyter Notebook。
在命令行中执行jupyter notebook，就会在当前目录下启动Jupyter服务并使用默认浏览器打开页面，还可以复制链接到其他浏览器中打开，如下：
可以看到，notebook界面由以下部分组成：
（1）notebook名称；
（2）主工具栏，提供了保存、导出、重载notebook，以及重启内核等选项；
（3）notebook主要区域，包含了notebook的内容编辑区。
2.Jupyter Notebook的使用
在Jupyter页面下方的主要区域，由被称为单元格的部分组成。每个notebook由多个单元格构成，而每个单元格又可以有不同的用途。
上图中看到的是一个代码单元格（code cell），以[ ]开头，在这种类型的单元格中，可以输入任意代码并执行。
例如，输入1 + 2并按下Shift + Enter，单元格中的代码就会被计算，光标也会被移动到一个新的单元格中。
如果想新建一个notebook，只需要点击New，选择希望启动的notebook类型即可。
简单使用示意如下：
可以看到，notebook可以修改之前的单元格，对其重新计算，这样就可以更新整个文档了。如果你不想重新运行整个脚本，只想用不同的参数测试某个程式的话，这个特性显得尤其强大。
不过，也可以重新计算整个notebook，只要点击Cell -> Run all即可。
再测试标题和其他代码如下：
可以看到，在顶部添加了一个notebook的标题，还可以执行for循环等语句。
3.Jupyter中使用Python
Jupyter测试Python变量和数据类型如下：
测试Python函数如下：
测试Python模块如下：
可以看到，在执行出错时，也会抛出异常。
测试数据读写如下：
数据读写很重要，因为进行数据分析时必须先读取数据，进行数据处理后也要进行保存。
4.数据交互案例
加载csv数据，处理数据，保存到MongoDB数据库
有csv文件Python Data Analysis Practical Overview Data Analysis.csv和Python Data Analysis Practical Overview Data Analysis.csv，分别是商品数据和用户评分数据，如下：

如需获取数据、代码等相关文件进行测试学习，可以直接点击加QQ群 963624318 在群文件夹Python数据分析实战中下载即可。
现在需要通过Python将其读取出来，并将指定的字段保存到MongoDB中，需要在Anaconda中执行命令conda install pymongo安装pymongo。
Python代码如下：
import pymongoclass Product:     def __init__(self,productId:int ,name, imageUrl, categories, tags):         self.productId = productId         self.name = name         self.imageUrl = imageUrl         self.categories = categories         self.tags = tags    def __str__(self) -> str:         return self.productId +'^' + self.name +'^' + self.imageUrl +'^' + self.categories +'^' + self.tagsclass Rating:     def __init__(self, userId:int, productId:int, score:float, timestamp:int):         self.userId = userId         self.productId = productId         self.score = score         self.timestamp = timestamp    def __str__(self) -> str:         return self.userId +'^' + self.productId +'^' + self.score +'^' + self.timestampif __name__ == '__main__':     myclient = pymongo.MongoClient("mongodb://127.0.0.1:27017/")     mydb = myclient["goods-users"]     # val attr = item.split("\\^")     # // 转换成Product     # Product(attr(0).toInt, attr(1).trim, attr(4).trim, attr(5).trim, attr(6).trim)     Python Data Analysis Practical Overview Data Analysis = mydb['Python Data Analysis Practical Overview Data Analysis']     with open('Python Data Analysis Practical Overview Data Analysis.csv', 'r',encoding='UTF-8') as f:         item = f.readline()         while item:             attr = item.split('^')             product = Product(int(attr[0]), attr[1].strip(), attr[4].strip(), attr[5].strip(), attr[6].strip())             Python Data Analysis Practical Overview Data Analysis.insert_one(product.__dict__)             # print(product)             # print(json.dumps(obj=product.__dict__,ensure_ascii=False))             item = f.readline()     # val attr = item.split(",")     # Rating(attr(0).toInt, attr(1).toInt, attr(2).toDouble, attr(3).toInt)     Python Data Analysis Practical Overview Data Analysis = mydb['Python Data Analysis Practical Overview Data Analysis']     with open('Python Data Analysis Practical Overview Data Analysis.csv', 'r',encoding='UTF-8') as f:         item = f.readline()         while item:             attr = item.split(',')             rating = Rating(int(attr[0]), int(attr[1].strip()), float(attr[2].strip()), int(attr[3].strip()))             Python Data Analysis Practical Overview Data Analysis.insert_one(rating.__dict__)             # print(rating)             item = f.readline()
Copy after login
在启动MongoDB服务后，运行Python代码，运行完成后，再通过Robo 3T查看数据库如下：
显然，保存数据成功。
使用Jupyter处理商铺数据
待处理的数据是商铺数据，如下：
包括名称、评论数、价格、地址、评分列表等，其中评论数、价格和评分均不规则、需要进行数据清洗。
如需获取数据、代码等相关文件进行测试学习，可以直接点击加QQ群 963624318 在群文件夹Python数据分析实战中下载即可。
Jupyter中处理如下：
可以看到，最后得到了经过清洗后的规则数据。
完整Python代码如下：
# 数据读取f = open('商铺数据.csv', 'r', encoding='utf8')for i in f.readlines()[1:15]:     print(i.split(','))# 创建comment、price、commentlist清洗函数def fcomment(s):     '''comment清洗函数：用空格分段，选取结果list的第一个为点评数，并且转化为整型'''     if '条' in s:         return int(s.split(' ')[0])     else:         return '缺失数据'def fprice(s):     '''price清洗函数：用￥分段，选取结果list的最后一个为人均价格，并且转化为浮点型'''     if '￥' in s:         return float(s.split('￥')[-1])     else:         return '缺失数据'def fcommentl(s):     '''commentlist清洗函数：用空格分段，分别清洗出质量、环境及服务数据，并转化为浮点型'''     if ' ' in s:         quality = float(s.split('                                ')[0][2:])         environment = float(s.split('                                ')[1][2:])         service = float(s.split('                                ')[2][2:-1])         return [quality, environment, service]     else:         return '缺失数据'# 数据处理清洗datalist = []  # 创建空列表f.seek(0)n = 0  # 创建计数变量for i in f.readlines():     data = i.split(',')     # print(data)     classify = data[0]  # 提取分类     name = data[1]  # 提取店铺名称     comment_count = fcomment(data[2])  # 提取评论数量     star = data[3]  # 提取星级     price = fprice(data[4])  # 提取人均     address = data[5]  # 提取地址     quality = fcommentl(data[6])[0]  # 提取质量评分     env = fcommentl(data[6])[1]  # 提取环境评分     service = fcommentl(data[6])[2]  # 提取服务评分     if '缺失数据' not in [comment_count, price, quality]:  # 用于判断是否有数据缺失         n += 1         data_re = [['classify', classify],                    ['name', name],                    ['comment_count', comment_count],                    ['star', star],                    ['price', price],                    ['address', address],                    ['quality', quality],                    ['environment', env],                    ['service', service]]         datalist.append(dict(data_re))  # 字典生成，并存入列表datalist         print('成功加载%i条数据' % n)     else:         continueprint(datalist)print('总共加载%i条数据' % n)f.close()
Copy after login
更多编程相关知识，请访问：编程教学！！

The above is the detailed content of Python Data Analysis Practical Overview Data Analysis. For more information, please follow other related articles on the PHP Chinese website!