Use pandas to easily process txt file data
In data analysis, data read from txt files often needs processing before it can be used. For example, the format may be inconsistent and need cleaning, some columns may be invalid and need to be deleted, and some columns may need type conversion. Done by hand, these tasks take a lot of time and effort, but the Python library pandas makes them straightforward.
This article uses code examples to show how to process txt file data with pandas.
Before using pandas, we need to import it. By convention, Python scripts import pandas under the alias pd to keep later calls short.
import pandas as pd
First, we need to read the data in the txt file. In pandas, we use the pd.read_csv() function to read in data. Although the function name contains csv, it can read any delimited text file, including txt files.
data = pd.read_csv('data.txt', sep=' ', header=None)
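The function parameters are explained as follows: the first argument, 'data.txt', is the path of the file to read; sep=' ' tells pandas that the fields on each line are separated by a single space; header=None tells pandas that the file has no header row, so the columns are labeled with the integers 0, 1, 2 and so on.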
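To try the call without creating a file, the sketch below reads the same kind of content from an in-memory string. The sample text is an assumption made up for illustration (it mirrors the output shown in this article), and io.StringIO simply stands in for data.txt.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the contents of data.txt.
sample_text = (
    "A 123 1.0\n"
    "B 321 2.0\n"
    "C 231 NaN\n"
    "D 213 4.0\n"
    "E 132 3.0\n"
)

# sep=' ' splits each line on a single space; header=None means the
# first line is data, so pandas labels the columns 0, 1, 2.
data = pd.read_csv(io.StringIO(sample_text), sep=' ', header=None)

print(data)
print(list(data.columns))
```

If the fields in a real file are separated by a varying number of spaces or tabs, passing sep=r'\s+' instead of sep=' ' is a common alternative.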
After reading the data, we can view the content and form of the data by printing the data.
print(data)
Output result:
   0    1    2
0  A  123  1.0
1  B  321  2.0
2  C  231  NaN
3  D  213  4.0
4  E  132  3.0
As shown, the data has been read into the variable data as a DataFrame. Because header=None was used, the columns are labeled with the integers 0, 1 and 2.
The data we read may contain formatting irregularities or errors, so it needs to be cleaned. For example, some rows or columns may have missing values that need to be filled or deleted, and some columns may have a data type that does not suit our needs and must be converted to a numeric or string type.
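Before cleaning, it can help to check what pandas inferred for each column. The following is a minimal sketch; the DataFrame is built by hand here as an assumed stand-in for the data read from data.txt.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the DataFrame read from data.txt (assumption).
data = pd.DataFrame({
    0: ["A", "B", "C", "D", "E"],
    1: [123, 321, 231, 213, 132],
    2: [1.0, 2.0, np.nan, 4.0, 3.0],
})

# dtypes shows the inferred type of each column.
print(data.dtypes)

# isna().sum() counts the missing values in each column.
missing_per_column = data.isna().sum()
print(missing_per_column)
```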
a. Delete rows containing missing values
We can use the dropna() function to delete rows containing missing values.
data_clean = data.dropna()
This call returns a new DataFrame that contains only the rows without missing values. The original data is left unchanged.
b. Fill missing values
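As a small illustration of dropna(), the sketch below uses a hand-built DataFrame (an assumption standing in for the data read earlier) and also shows the subset parameter, which limits the check to chosen columns.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the DataFrame read from data.txt (assumption).
data = pd.DataFrame({
    0: ["A", "B", "C", "D", "E"],
    1: [123, 321, 231, 213, 132],
    2: [1.0, 2.0, np.nan, 4.0, 3.0],
})

# Drop every row that has at least one missing value.
data_clean = data.dropna()

# Only look at column 2 when deciding which rows to drop.
data_clean_col2 = data.dropna(subset=[2])

# The surviving rows keep their old index labels; reset_index renumbers them.
data_clean_renumbered = data_clean.reset_index(drop=True)

print(data_clean)
print(data_clean_renumbered)
```

Note that row 2 disappears from the index of data_clean, which is why renumbering is sometimes useful before further processing.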
If rows containing missing values cannot be deleted, we can fill the missing values instead, using the fillna() function.
data_fill = data.fillna(0)
This call fills every missing value with 0 and returns a new DataFrame. To fill with a different value, pass that value to fillna() instead.
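A constant is not always the best fill value. As one possible variation, the sketch below fills the missing entry in column 2 with that column's mean by passing a dictionary that maps column labels to fill values. The DataFrame is again a hand-built assumption.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the DataFrame read from data.txt (assumption).
data = pd.DataFrame({
    0: ["A", "B", "C", "D", "E"],
    1: [123, 321, 231, 213, 132],
    2: [1.0, 2.0, np.nan, 4.0, 3.0],
})

# mean() skips missing values, so this is (1 + 2 + 4 + 3) / 4.
column_mean = data[2].mean()

# A dict fills each listed column with its own value.
data_fill_mean = data.fillna({2: column_mean})

print(data_fill_mean)
```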
c. Convert data types
In data analysis, some columns need to be converted to numeric or string types for later calculation or processing. In pandas, you can use the astype() function for type conversion.
data_conversion = data_clean.astype({1: 'int', 2: 'str'})
This call converts column 1 of data_clean to the integer type (int) and column 2 to the string type (str). Because the file was read with header=None, the column labels are the integers 1 and 2, not the strings '1' and '2'; using string keys here raises a KeyError.
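The sketch below illustrates astype() on a DataFrame whose columns have integer labels, as they do after reading with header=None. The data is a hand-built assumption, and the try/except is only there to show what happens when the keys do not match the column labels.

```python
import pandas as pd

# Hand-built stand-in for data_clean; columns are labeled 0, 1, 2 (assumption).
data_clean = pd.DataFrame({
    0: ["A", "B", "D", "E"],
    1: [123, 321, 213, 132],
    2: [1.0, 2.0, 4.0, 3.0],
})

# Keys must match the column labels exactly, so integers are used here.
data_conversion = data_clean.astype({1: 'int', 2: 'str'})
print(data_conversion.dtypes)
print(data_conversion[2].tolist())

# String keys such as '1' do not match the integer labels.
try:
    data_clean.astype({'1': 'int'})
    string_keys_failed = False
except KeyError:
    string_keys_failed = True
print(string_keys_failed)
```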
Finally, we need to save the cleaned and processed data to a new txt file. In pandas, we can use the to_csv() function to achieve this.
data_clean.to_csv('data_clean.txt', index=False, header=False, sep=' ')
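The function parameters are explained as follows: the first argument, 'data_clean.txt', is the path of the output file; index=False leaves the row index out of the file; header=False leaves the column labels out of the file; sep=' ' separates the fields on each line with a single space.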
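One way to check the effect of these parameters is to write the file and read it back. The sketch below does this in a temporary directory so that nothing is left on disk; the DataFrame and the temporary path are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Hand-built stand-in for data_clean (assumption).
data_clean = pd.DataFrame({
    0: ["A", "B", "D", "E"],
    1: [123, 321, 213, 132],
    2: [1.0, 2.0, 4.0, 3.0],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "data_clean.txt")

    # No index column, no header row, fields separated by one space.
    data_clean.to_csv(out_path, index=False, header=False, sep=' ')

    # Look at the raw text that was written.
    with open(out_path, encoding="utf-8") as f:
        written_lines = f.read().splitlines()

    # Read the file back with the same settings used at the start.
    round_trip = pd.read_csv(out_path, sep=' ', header=None)

print(written_lines)
print(round_trip)
```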
Code Example
Below is the complete code example that you can copy into a Python script and run.
import pandas as pd

# Read the data
data = pd.read_csv('data.txt', sep=' ', header=None)
print('Original data:\n', data)

# Delete rows containing missing values
data_clean = data.dropna()
print('Processed data (missing values deleted):\n', data_clean)

# Fill missing values
data_fill = data.fillna(0)
print('Processed data (missing values filled):\n', data_fill)

# Convert data types (column labels are integers because header=None)
data_conversion = data_clean.astype({1: 'int', 2: 'str'})
print('Processed data (types converted):\n', data_conversion)

# Save the new data
data_clean.to_csv('data_clean.txt', index=False, header=False, sep=' ')
This article introduces how to use pandas to easily process txt file data, including reading, cleaning, converting and saving data. As one of the important data processing tools in Python, pandas can help us complete data mining and analysis tasks more efficiently.