Hello everyone! I am Brother Tiger.
As data analysts, we need to often make statistical analysis charts. But when there are too many reports, it often takes us most of our time to create them. This delayed us from using a lot of time to conduct data analysis. But as data analysts, we should try our best to dig out the relevant information hidden behind the data in tables and charts, instead of simply making statistical tables and charts and then sending reports.
Automation can always save time and improve our work efficiency. Let our programming reduce the coupling of each function implementation code as much as possible and better maintain the code. This will save us a lot of time and free us up to do more valuable and meaningful work.
If the coding effect is correct, it can be used forever. If it is done manually, some mistakes may be made. It is more reassuring to leave it to a fixed program. When the requirements change, only part of the code can be modified to solve the problem.
First of all, we need to formulate the reports we need according to business needs. Not every report needs to be automated. Some complex secondary development indicator data It is relatively complicated to realize automated programming, and various bugs may be hidden. Therefore, we need to summarize the characteristics of the reports we use in our work. The following are several aspects that we need to comprehensively consider:
It is often used in some businesses Tables that we may want to include in the scope of automated procedures. For example, customer information list, sales flow report, business loss report, month-on-month and year-on-year reports, etc.
It is necessary to automate these frequently used reports. For those reports that need to be used occasionally, or for secondary development indicators, or reports that need to copy statistics, there is no need to automate these reports.
This is equivalent to cost and interest rate. If it is difficult to automate some reports and exceeds the time required for our ordinary statistical analysis, there is no need to automate it. . Therefore, when starting automation work, you need to measure whether the time spent developing scripts or the time spent manually making tables is shorter. Of course, I will provide a set of implementation solutions, but only for some commonly used and simple reports.
For each process and step of our report, each company is different. We need to code according to the business scenario to implement the functions of each step. Therefore, the process we create should be consistent with business logic, and the program we create should also be logical.
First we need to know what indicators we need:
Indicators
Reflects the overall size of a certain data indicator
Adjacent Direct difference between indicators within a time period
Comparison of indicators at a common time point within adjacent time periods
We take a simple report to simulate:
First we need to understand where our data comes from, that is, the data source. Our final data processing is converted into DataFrame for analysis, so the data source needs to be converted into DataFrame form:
import pandas as pd import json import pymysql from sqlalchemy import create_engine # 打开数据库连接 conn = pymysql.connect(host='localhost', port=3306, user='root', passwd='xxxx', charset = 'utf8' ) engine=create_engine('mysql+pymysql://root:xxxx@localhost/mysql?charset=utf8') def read_excel(file): df_excel=pd.read_excel(file) return df_excel def read_json(file): with open(file,'r')as json_f: df_json=pd.read_json(json_f) return df_json def read_sql(table): sql_cmd ='SELECT * FROM %s'%table df_sql=pd.read_sql(sql_cmd,engine) return df_sql def read_csv(file): df_csv=pd.read_csv(file) return df_csv
The above codes can be used normally after passing the test, but the read function of pandas targets different For file reading in the form, the read function parameters also have different meanings and need to be adjusted directly according to the form of the table.
Other read functions will be added after the article is written. Except that read_sql needs to connect to the database, the others are relatively simple.
We take user information as an example:
我们需要统计的指标为:
#将城市空值的一行删除 df=df[df['city_num'].notna()] #删除error df=df.drop(df[df['city_num']=='error'].index) #统计df = df.city_num.value_counts()
我们仅获取前10名的城市就好了,封装为饼图:
def pie_chart(df): #将城市空值的一行删除 df=df[df['city_num'].notna()] #删除error df=df.drop(df[df['city_num']=='error'].index) #统计 df = df.city_num.value_counts() df.head(10).plot.pie(subplots=True,figsize=(5, 6),autopct='%.2f%%',radius = 1.2,startangle = 250,legend=False) pie_chart(read_csv('user_info.csv'))
将图表保存起来:
plt.savefig('fig_cat.png')
要是你觉得matplotlib的图片不太美观的话,你也可以换成echarts的图片,会更加好看一些:
pie = Pie() pie.add("",words) pie.set_global_opts(title_opts=opts.TitleOpts(title="前十地区")) #pie.set_series_opts(label_opts=opts.LabelOpts(user_df)) pie.render_notebook()
封装后就可以直接使用了:
def echart_pie(user_df): user_df=user_df[user_df['city_num'].notna()] user_df=user_df.drop(user_df[user_df['city_num']=='error'].index) user_df = user_df.city_num.value_counts() name=user_df.head(10).index.tolist() value=user_df.head(10).values.tolist() words=list(zip(list(name),list(value))) pie = Pie() pie.add("",words) pie.set_global_opts(title_opts=opts.TitleOpts(title="前十地区")) #pie.set_series_opts(label_opts=opts.LabelOpts(user_df)) return pie.render_notebook() user_df=read_csv('user_info.csv') echart_pie(user_df)
可以进行保存,可惜不是动图:
from snapshot_selenium import snapshot make_snapshot(snapshot,echart_pie(user_df).render(),"test.png")
保存为网页的形式就可以自动加载JS进行渲染了:
echart_pie(user_df).render('problem.html') os.system('problem.html')
做出来的一系列报表一般都要发给别人看的,对于一些每天需要发送到指定邮箱或者需要发送多封报表的可以使用Python来自动发送邮箱。
在Python发送邮件主要借助到smtplib和email这个两个模块。
不同种类的邮箱服务器连接地址不一样,大家根据自己平常使用的邮箱设置相应的服务器进行连接。这里博主用网易邮箱展示:
首先需要开启POP3/SMTP/IMAP服务:
之后便可以根据授权码使用python登入了。
import smtplib from email import encoders from email.header import Header from email.utils import parseaddr,formataddr from email.mime.application import MIMEApplication from email.mime.multipart import MIMEMultipart from email.mime.text import MIMEText #发件人邮箱 asender="fanstuck@163.com" #收件人邮箱 areceiver="1079944650@qq.com" #抄送人邮箱 acc="fanstuck@163.com" #邮箱主题 asubject="谢谢关注" #发件人地址 from_addr="fanstuck@163.com" #邮箱授权码 password="####" #邮件设置 msg=MIMEMultipart() msg['Subject']=asubject msg['to']=areceiver msg['Cc']=acc msg['from']="fanstuck" #邮件正文 body="你好,欢迎关注fanstuck,您的关注就是我继续创作的动力!" msg.attach(MIMEText(body,'plain','utf-8')) #添加附件 htmlFile = 'C:/Users/10799/problem.html' html = MIMEApplication(open(htmlFile , 'rb').read()) html.add_header('Content-Disposition', 'attachment', filename='html') msg.attach(html) #设置邮箱服务器地址和接口 smtp_server="smtp.163.com" server = smtplib.SMTP(smtp_server,25) server.set_debuglevel(1) #登录邮箱 server.login(from_addr,password) #发生邮箱 server.sendmail(from_addr,areceiver.split(',')+acc.split(','),msg.as_string()) #断开服务器连接 server.quit()
运行测试:
下载文件:
完全没问题!!!
The above is the detailed content of Python automated office applet: realize report automation and automatically send it to the destination mailbox. For more information, please follow other related articles on the PHP Chinese website!