


Use Python to display the distribution of colleges and universities across the country
Apr 11, 2023, 08:04 PM
Data acquisition
To show the distribution of colleges and universities, we first need location data for colleges and universities across the country. The data for this article comes from the Palm College Entrance Examination Network (https://www.gaokao.cn/school/search).
When this article was written in June 2022, information on a total of 2,822 colleges and universities was obtained. A check showed that, apart from a few null values, the data is very complete and can be used as is. The data has 44 fields in total, of which this article uses only a few. These fields need no preprocessing and can be read on demand.
Introduction to the data acquisition method (basic crawler knowledge):
1. Register and log in to the Palm College Entrance Examination Network, and select all schools on the page.
2. Press the F12 key, go to Network > Fetch/XHR, and then click the pagination buttons on the page several times. The API being accessed and related information will be displayed in the XHR panel.
The numFound value in the Response is the total number of schools. Divide it by the number of schools displayed on each page and round up to get the total number of pages. You can also read the total number of pages directly from the pagination control on the page. Either way, this determines the number of requests needed.
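The page-count arithmetic above can be sketched as follows. This is only an illustration: `fetch_page` is a hypothetical callable that you would write yourself to request one page from the API seen in the XHR panel, and the page size of 20 used in the example is an assumption, not a value taken from the site.

```python
import math
import time


def total_pages(num_found, page_size):
    """Number of requests needed to page through num_found records."""
    return math.ceil(num_found / page_size)


def fetch_all(fetch_page, num_found, page_size, delay_seconds=1.0):
    """Call fetch_page(page_number) for every page, pausing between calls.

    fetch_page is a hypothetical callable supplied by the caller; it should
    return the list of school records for one page.
    """
    records = []
    pages = total_pages(num_found, page_size)
    for page in range(1, pages + 1):
        records.extend(fetch_page(page))
        if page < pages:
            time.sleep(delay_seconds)  # space out the requests
    return records
```

With the 2,822 schools mentioned in this article and an assumed 20 schools per page, `total_pages(2822, 20)` gives 142 requests.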
Warm reminder: When obtaining data, comply with the relevant statements of the website. Set a time interval between requests in the crawler code, and do not run the crawler code during peak access times.
Additional explanation:
According to the latest announcement from People's Daily Online, the number of general colleges and universities in the country is 2,759. The 2,822 schools obtained from the Palm College Entrance Examination Network for this article differ from that figure by 63, mainly because the two sources count the branch campuses of some schools differently. This article shows the distribution, so the difference has little impact.
The Palm College Entrance Examination Network is a website that provides application guidance services for the college entrance examination. Although the data obtained has 44 fields, it does not contain the longitude and latitude of each school. To display the colleges and universities on a map, the longitude and latitude must be obtained from each school's address.
This article uses the Baidu Maps Open Platform (https://lbsyun.baidu.com/apiconsole/center#/home), whose open interface can be used to obtain the longitude and latitude of a geographical location.
The steps are:
1. Register and log in to a Baidu account. The same account works across the entire Baidu ecosystem (for example, the network disk and the library).
2. Log in to the Baidu Map Open Platform, enter the console, open the application section, and click Create an application. Choose an application name, fill in the other information as prompted, and complete real-name authentication to become an individual developer.
3. After creating the application, you will get an application AK. Use this AK value to call Baidu's API. The reference code is as follows.
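import requests


def baidu_api(addr):
    """Return {'lng': ..., 'lat': ...} for an address via the Baidu geocoding API."""
    url = "http://api.map.baidu.com/geocoding/v3/?"
    params = {
        "address": addr,
        "output": "json",
        "ak": "paste the AK of the application you created here"
    }
    req = requests.get(url, params=params)
    res = req.json()
    # A failed lookup may carry no "result" key, so read it defensively.
    result = res.get("result") or {}
    if "location" in result:
        return result["location"]
    print("Failed to get the longitude and latitude of {}".format(addr))
    return {'lng': '', 'lat': ''}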
4. After the Baidu Map API call succeeds, read the addresses of all colleges and universities, call the function above for each one to obtain its longitude and latitude, and write the results to a new Excel file.
import pandas as pd


def get_lng_lat():
    """Geocode every school with baidu_api (defined above) and save the result."""
    df = pd.read_excel('school.xlsx')
    lng_lat = []
    for row_index, row_data in df.iterrows():
        addr = row_data['address']
        # Fall back to city + county when the address is missing.
        if pd.isna(addr):
            addr = row_data['city_name'] + row_data['county_name']
        # print(addr)
        loc = baidu_api(addr.split(',')[0])
        lng_lat.append(loc)
    df['经纬度'] = lng_lat  # longitude and latitude
    df['经度'] = df['经纬度'].apply(lambda x: x['lng'])  # longitude
    df['纬度'] = df['经纬度'].apply(lambda x: x['lat'])  # latitude
    df.to_excel('school_lng_lat.xlsx')
The final data results are as follows:
Note that individual developers on the Baidu Map Open Platform have a daily quota. When debugging the code, do not run it on all the data at first; test with a small sample, otherwise you will have to wait a day or purchase additional quota.
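One way to follow this advice is to cap the number of lookups during a test run. The sketch below is an illustration rather than part of the original code: `geocode_sample` and its `limit` parameter are hypothetical names, and `geocode` stands for any function that maps an address to a dict with 'lng' and 'lat' keys.

```python
def geocode_sample(addresses, geocode, limit=5):
    """Geocode at most `limit` distinct addresses and return {address: location}.

    `geocode` is any callable that maps an address string to a dict with
    'lng' and 'lat' keys.
    """
    results = {}
    for addr in addresses:
        if addr in results:
            continue  # already looked up: no extra API call
        if len(results) >= limit:
            break  # stop before spending more of the daily quota
        results[addr] = geocode(addr)
    return results
```

For example, passing the first few non-empty values of the address column together with the geocoding function keeps a debugging run to a handful of API calls.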
College location display
The data is ready; now display it on the map. The charts are drawn with pyecharts, which can be installed with:
pip install pyecharts
1. Mark the locations of the universities
from pyecharts.charts import Geo
from pyecharts import options as opts
from pyecharts.globals import GeoType
import pandas as pd


def multi_location_mark():
    """Mark points in batches."""
    geo = Geo(init_opts=opts.InitOpts(bg_color='black', width='1600px', height='900px'))
    df = pd.read_excel('school_lng_lat.xlsx')
    # Register each school's coordinates ('经度' = longitude, '纬度' = latitude).
    for row_index, row_data in df.iterrows():
        geo.add_coordinate(row_data['name'], row_data['经度'], row_data['纬度'])
    data_pair = [(name, 2) for name in df['name']]
    geo.add_schema(
        maptype='china', is_roam=True,
        itemstyle_opts=opts.ItemStyleOpts(color='#323c48', border_color='#408080')
    ).add(
        '', data_pair=data_pair, type_=GeoType.SCATTER, symbol='pin', symbol_size=16, color='#CC3300'
    ).set_series_opts(
        label_opts=opts.LabelOpts(is_show=False)
    ).set_global_opts(
        title_opts=opts.TitleOpts(title='Location map of colleges and universities nationwide',
                                  pos_left='650', pos_top='20',
                                  title_textstyle_opts=opts.TextStyleOpts(color='white', font_size=16))
    ).render('high_school_mark.html')
2. Draw a heat map of the distribution of colleges and universities
from pyecharts.charts import Geo
from pyecharts import options as opts
from pyecharts.globals import ChartType
import pandas as pd


def draw_location_heatmap():
    """Draw the heat map."""
    geo = Geo(init_opts=opts.InitOpts(bg_color='black', width='1600px', height='900px'))
    df = pd.read_excel('school_lng_lat.xlsx')
    # Register each school's coordinates ('经度' = longitude, '纬度' = latitude).
    for row_index, row_data in df.iterrows():
        geo.add_coordinate(row_data['name'], row_data['经度'], row_data['纬度'])
    data_pair = [(name, 2) for name in df['name']]
    geo.add_schema(
        maptype='china', is_roam=True,
        itemstyle_opts=opts.ItemStyleOpts(color='#323c48', border_color='#408080')
    ).add(
        '', data_pair=data_pair, type_=ChartType.HEATMAP
    ).set_series_opts(
        label_opts=opts.LabelOpts(is_show=False)
    ).set_global_opts(
        title_opts=opts.TitleOpts(title='Heat map of the distribution of colleges and universities nationwide',
                                  pos_left='650', pos_top='20',
                                  title_textstyle_opts=opts.TextStyleOpts(color='white', font_size=16)),
        visualmap_opts=opts.VisualMapOpts()
    ).render('high_school_heatmap.html')
3. Draw a distribution density map by province
from pyecharts.charts import Map
from pyecharts import options as opts
import pandas as pd


def draw_location_density_map():
    """Draw the density map of colleges and universities by province."""
    # Named province_map to avoid shadowing the built-in map().
    province_map = Map(init_opts=opts.InitOpts(bg_color='black', width='1200px', height='700px'))
    df = pd.read_excel('school_lng_lat.xlsx')
    # Count the schools in each province.
    s = df['province_name'].value_counts()
    data_pair = [[province, int(s[province])] for province in s.index]
    province_map.add(
        '', data_pair=data_pair, maptype="china"
    ).set_global_opts(
        title_opts=opts.TitleOpts(title='Density map of colleges and universities by province',
                                  pos_left='500', pos_top='70',
                                  title_textstyle_opts=opts.TextStyleOpts(color='white', font_size=16)),
        visualmap_opts=opts.VisualMapOpts(max_=200, is_piecewise=True, pos_left='100', pos_bottom='100',
                                          textstyle_opts=opts.TextStyleOpts(color='white', font_size=16))
    ).render("high_school_density.html")
4. Distribution of 211 and 985 colleges and universities
Filter the data down to the 211 and 985 colleges and universities and draw the charts again. (The code is not repeated here; it only needs one added line of filtering.)
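The article does not show the filtering line, so the following is only a sketch of what it might look like. The column names `f211` and `f985` and the value 1 meaning "yes" are assumptions; check them against the actual fields in your data before use.

```python
import pandas as pd


def filter_key_universities(df, flag_columns=("f211", "f985"), yes_value=1):
    """Keep rows where any of the flag columns equals yes_value.

    The column names and the value that marks membership are assumptions,
    not fields confirmed by the article.
    """
    mask = df[list(flag_columns)].eq(yes_value).any(axis=1)
    return df[mask]
```

Under these assumptions, applying the function to the DataFrame right after `pd.read_excel('school_lng_lat.xlsx')` in any of the drawing functions restricts the chart to 211 and 985 schools.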