Home  >  Article  >  Backend Development  >  Python’s eight data import methods, have you mastered them?

Python’s eight data import methods, have you mastered them?

WBOY
WBOYforward
2023-04-19 12:52:031528browse

In most cases, NumPy or Pandas will be used to import data, so before starting, execute:

import numpy as np
import pandas as pd

Two ways to get help

Many times you don’t know much about some function methods. At this time, Python provides some help information to quickly use Python objects.

Use the info method in Numpy.

np.info(np.ndarray.dtype)

Python’s eight data import methods, have you mastered them?

Python built-in function

help(pd.read_csv)

Python’s eight data import methods, have you mastered them?

1. Text File

1. Plain text file

filename = 'demo.txt'
file = open(filename, mode='r') # 打开文件进行读取
text = file.read() # 读取文件的内容
print(file.closed) # 检查文件是否关闭
file.close() # 关闭文件
print(text)

Use context manager -- with

with open('demo.txt', 'r') as file:
print(file.readline()) # 一行一行读取
print(file.readline())
print(file.readline())

2. Table data: Flat file

Use Numpy to read Flat file

Numpy’s built-in functions process data at the C language level.

Flat file is a file containing records without relative relationship structure. (Excel, CSV and Tab delimiter files are supported)

Files with one data type

The string used to separate values ​​skips the first two lines. Read the type of the resulting array in the first and third columns.

filename = 'mnist.txt'
data = np.loadtxt(filename,
delimiter=',',
skiprows=2,
usecols=[0,2],
dtype=str)

  • Files with mixed data types

Two hard requirements:

  • Skip header Information
  • Distinguish between horizontal and vertical coordinates

filename = 'titanic.csv'
data = np.genfromtxt(filename,
 delimiter=',',
 names=True,
 dtype=None)

Python’s eight data import methods, have you mastered them?

##Use Pandas to read Flat files

filename = 'demo.csv' 
data = pd.read_csv(filename, 
 nrows=5,# 要读取的文件的行数
 header=None,# 作为列名的行号
 sep='t', # 分隔符使用
 comment='#',# 分隔注释的字符
 na_values=[""]) # 可以识别为NA/NaN的字符串

2. Excel Spreadsheet

ExcelFile() in Pandas is a very convenient and fast class in pandas for reading excel table files, especially for excel containing multiple sheets. It is very convenient to manipulate files.

file = 'demo.xlsx'
data = pd.ExcelFile(file)
df_sheet2 = data.parse(sheet_name='1960-1966',
 skiprows=[0],
 names=['Country',
'AAM: War(2002)'])
df_sheet1 = pd.read_excel(data,
sheet_name=0,
parse_cols=[0],
skiprows=[0],
names=['Country'])

Use the sheet_names property to get the name of the worksheet to be read.

data.sheet_names

3. SAS file

SAS (Statistical Analysis System) is a modular and integrated large-scale application software system. The file it saves, sas, is a statistical analysis file.

from sas7bdat import SAS7BDAT
with SAS7BDAT('demo.sas7bdat') as file:
df_sas = file.to_data_frame()

4. Stata file

Stata is a complete and integrated statistical software that provides its users with data analysis, data management and professional chart drawing. The saved file is a Stata file with the .dta extension.

data = pd.read_stata('demo.dta')

5. Pickled files

Almost all data types in python (lists, dictionaries, sets, classes, etc.) can be serialized using pickle. Python's pickle module implements basic data sequencing and deserialization. Through the serialization operation of the pickle module, we can save the object information running in the program to a file and store it permanently; through the deserialization operation of the pickle module, we can create the object saved by the last program from the file.

import pickle
with open('pickled_demo.pkl', 'rb') as file:
 pickled_data = pickle.load(file) # 下载被打开被读取到的数据

The corresponding operation is to write the method pickle.dump().

6. HDF5 file

HDF5 file is a common cross-platform data storage file that can store different types of images and digital data and can be transferred on different types of machines. There are also function libraries that uniformly handle this file format.

HDF5 files generally have .h5​ or .hdf5 as the suffix, and special software is required to open the content of the preview file.

import h5py
filename = 'H-H1_LOSC_4_v1-815411200-4096.hdf5'
data = h5py.File(filename, 'r')

7. Matlab file

It is a file with the suffix .mat in which matlab stores the data in its workspace.

import scipy.io
filename = 'workspace.mat'
mat = scipy.io.loadmat(filename)

8. Relational database

from sqlalchemy import create_engine
engine = create_engine('sqlite://Northwind.sqlite')

Use the table_names() method to obtain a list of table names

table_names = engine.table_names()

1. Directly query the relational database

con = engine.connect()
rs = con.execute("SELECT * FROM Orders")
df = pd.DataFrame(rs.fetchall())
df.columns = rs.keys()
con.close()

Use the context manager -- with

with engine.connect() as con:
rs = con.execute("SELECT OrderID FROM Orders")
df = pd.DataFrame(rs.fetchmany(size=5))
df.columns = rs.keys()

2. Use Pandas to query the relationship Type database

df = pd.read_sql_query("SELECT * FROM Orders", engine)

Data exploration

After the data is imported, the data will be initially explored, such as checking the data type, data size, length and other basic information. Here is a brief summary.

1, NumPy Arrays

data_array.dtype# 数组元素的数据类型
data_array.shape# 阵列尺寸
len(data_array) # 数组的长度

2, Pandas DataFrames

df.head()# 返回DataFrames前几行(默认5行)
df.tail()# 返回DataFrames最后几行(默认5行)
df.index # 返回DataFrames索引
df.columns # 返回DataFrames列名
df.info()# 返回DataFrames基本信息
data_array = data.values # 将DataFrames转换为NumPy数组

The above is the detailed content of Python’s eight data import methods, have you mastered them?. For more information, please follow other related articles on the PHP Chinese website!

Statement:
This article is reproduced at:51cto.com. If there is any infringement, please contact admin@php.cn delete