How to use the pandas library for data manipulation in Python?

Table of Contents

1. Importing and Loading Data

2. Exploring the Data

3. Selecting and Filtering Data

4. Handling Missing Data

5. Adding and Removing Columns

6. Data Transformation

7. Sorting and Grouping

8. Merging and Concatenating Data

9. Saving Data


Charles William Harris

Aug 18, 2025 am 03:12 AM


Pandas is a powerful Python library for data manipulation and analysis using Series and DataFrame structures.

1. Import and load data from CSV, Excel, JSON, or create it manually with pd.DataFrame().
2. Explore data using head(), tail(), info(), describe(), shape, and columns.
3. Select and filter data by column, row index (loc/iloc), or conditions with boolean indexing.
4. Handle missing values using isnull(), dropna(), or fillna() with the mean or custom values.
5. Add columns with conditional logic, or remove and rename them using drop() and rename().
6. Transform data using apply(), map(), replace(), and astype() for cleaning and type conversion.
7. Sort data with sort_values() and group by categories using groupby() with aggregation functions like mean() or count().
8. Merge DataFrames with merge() or concatenate them with concat() along rows or columns.
9. Save processed data to CSV, Excel, or JSON using to_csv(), to_excel(), or to_json(), with index=False if needed.

Mastering loc, groupby, and apply enables efficient handling of most real-world data tasks, making pandas essential for data workflows.


Pandas is one of the most powerful and widely used libraries in Python for data manipulation and analysis. It provides easy-to-use data structures like Series (1D) and DataFrame (2D), along with a wide range of functions to clean, filter, transform, and analyze data efficiently.

Here’s a practical guide on how to use pandas for common data manipulation tasks:

1. Importing and Loading Data

Start by importing pandas and loading data from common formats like CSV, Excel, or JSON.

import pandas as pd

# Load data from a CSV file
df = pd.read_csv('data.csv')

# Load from Excel (requires openpyxl)
df = pd.read_excel('data.xlsx')

# Load from JSON
df = pd.read_json('data.json')

You can also create a DataFrame manually:

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
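To try the loading step without a file on disk, a minimal sketch can feed read_csv from an in-memory buffer. The CSV text below is hypothetical sample data standing in for 'data.csv'; it is not part of the original article.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
"""

# read_csv accepts a file-like object as well as a path
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (3, 3)
print(list(df.columns))  # ['Name', 'Age', 'City']
```

The same pattern is handy in tests, where creating temporary files would only add noise.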

2. Exploring the Data

Before manipulating, inspect your data to understand its structure.

# Display first 5 rows
df.head()

# Last 3 rows
df.tail(3)

# General info: data types, missing values, memory usage
df.info()

# Summary statistics for numeric columns
df.describe()

# Shape of the DataFrame (rows, columns)
df.shape

# Column names
df.columns
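As a rough illustration of what these inspection calls return, the sketch below rebuilds the small sample DataFrame from section 1 and stores a few results in variables. The variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# shape is a (rows, columns) tuple, so it can be unpacked directly
n_rows, n_cols = df.shape

# Names of the numeric columns only
numeric_cols = df.select_dtypes(include='number').columns.tolist()

# describe() returns a DataFrame indexed by statistic name
summary = df.describe()
mean_age = summary.loc['mean', 'Age']

print(n_rows, n_cols, numeric_cols, mean_age)
```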

3. Selecting and Filtering Data

Access specific parts of the DataFrame using labels, positions, or conditions.

# Select a single column
df['Name']

# Select multiple columns
df[['Name', 'Age']]

# Select rows by index
df.loc[0]           # by label
df.iloc[0]          # by position

# Filter rows based on condition
df[df['Age'] > 28]

# Multiple conditions (use & for AND, | for OR, parentheses required)
df[(df['Age'] > 25) & (df['City'] == 'Chicago')]
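The difference between loc and iloc is easiest to see when the index is not the default 0, 1, 2. The sketch below assumes a hypothetical index of 10, 20, 30 to make that visible, and also shows loc combining a row condition with a column selection.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'Los Angeles', 'Chicago'],
    },
    index=[10, 20, 30],  # hypothetical non-default row labels
)

by_label = df.loc[10]      # the row whose index label is 10
by_position = df.iloc[0]   # the first row, whatever its label

# Filter rows and pick a column in a single loc call
older_names = df.loc[df['Age'] > 28, 'Name']

# isin() is a compact alternative to chaining several == conditions with |
in_cities = df[df['City'].isin(['Chicago', 'New York'])]

print(older_names.tolist())  # ['Bob', 'Charlie']
```

Here df.loc[0] would raise a KeyError because no row carries the label 0, while df.iloc[0] still works.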

4. Handling Missing Data

Real-world data often has missing values (NaN). Pandas provides tools to manage them.

# Check for missing values
df.isnull()

# Count missing values per column
df.isnull().sum()

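# Drop rows with any missing values (returns a new DataFrame; df is unchanged)
df.dropna()

# Drop columns with more than 50% missing
# (thresh is the minimum number of non-missing values a column needs to be kept)
df.dropna(thresh=len(df)*0.5, axis=1)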

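# Fill missing values (assign the result back; calling fillna with
# inplace=True on a selected column is deprecated in recent pandas)
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['City'] = df['City'].fillna('Unknown')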
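A small worked example may help here. The sketch below uses hypothetical sample data with gaps in two columns, counts the missing values, and fills both columns in one call by passing a dict to fillna.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with gaps
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, np.nan, 35, np.nan],
    'City': ['New York', None, 'Chicago', 'Boston'],
})

# Number of missing values in each column
missing_per_column = df.isnull().sum()

# A dict maps each column to its own fill value; a new DataFrame is returned
filled = df.fillna({'Age': df['Age'].mean(), 'City': 'Unknown'})

print(missing_per_column.to_dict())  # {'Name': 0, 'Age': 2, 'City': 1}
print(filled['Age'].tolist())        # [25.0, 30.0, 35.0, 30.0]
```

Because fillna returns a new object, the original df still contains its missing values afterwards.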

5. Adding and Removing Columns

Modify the structure of your DataFrame as needed.

# Add a new column
df['Senior'] = df['Age'] > 30

# Remove a column
df.drop('Senior', axis=1, inplace=True)

# Rename columns
df.rename(columns={'Name': 'Full Name'}, inplace=True)
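The same changes can be written without inplace=True by chaining methods that each return a new DataFrame. This is one possible style, sketched on the sample data from section 1, rather than the only way to do it.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

result = (
    df.assign(Senior=df['Age'] > 30)           # add a boolean column
      .rename(columns={'Name': 'Full Name'})   # rename a column
      .drop(columns=['City'])                  # remove a column
)

print(list(result.columns))  # ['Full Name', 'Age', 'Senior']
print(list(df.columns))      # ['Name', 'Age', 'City'] (original untouched)
```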

6. Data Transformation

Apply functions to transform or clean data.

# Apply a function to a column
df['Age'] = df['Age'].apply(lambda x: x + 1)  # Increment age

# Map values (e.g., replace city names); values not in the dict become NaN
df['City'] = df['City'].map({'New York': 'NYC', 'Los Angeles': 'LA'})

# Replace specific values
df.replace({'NYC': 'New York City'}, inplace=True)

# Convert data types (astype(int) raises an error if the column still contains NaN)
df['Age'] = df['Age'].astype(int)
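One detail worth seeing in isolation is how map and replace treat values that are missing from the lookup dict. The sketch below uses a standalone Series of city names; the variable names are illustrative.

```python
import pandas as pd

cities = pd.Series(['New York', 'Los Angeles', 'Chicago'])
short_names = {'New York': 'NYC', 'Los Angeles': 'LA'}

# map(): values without a dict entry turn into NaN
mapped = cities.map(short_names)

# replace(): values without a dict entry are left as they are
replaced = cities.replace(short_names)

# map() with a fallback to the original value
mapped_with_fallback = cities.map(short_names).fillna(cities)

print(mapped.isna().tolist())  # [False, False, True]
print(replaced.tolist())       # ['NYC', 'LA', 'Chicago']
```

When only some values should change, replace or the map-plus-fillna pattern avoids silently introducing missing data.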

7. Sorting and Grouping

Organize and summarize data effectively.

# Sort by one or more columns
df.sort_values('Age', ascending=False)

# Sort by multiple columns
df.sort_values(['City', 'Age'], ascending=[True, False])

# Group data and aggregate
avg_age = df.groupby('City')['Age'].mean()  # Average age by city
stats = df.groupby('City').agg({'Age': 'mean', 'Name': 'count'})  # Mean age and row count per city
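Grouping is clearer with data that has repeated categories. The sketch below assumes a hypothetical four-row dataset with two cities and uses named aggregation, which lets each output column get its own name.

```python
import pandas as pd

# Hypothetical data with two people per city
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, 30, 35, 45],
    'City': ['Chicago', 'New York', 'Chicago', 'New York'],
})

# Named aggregation: output_column=(input_column, function)
summary = df.groupby('City').agg(
    mean_age=('Age', 'mean'),
    people=('Name', 'count'),
)

print(summary.loc['Chicago', 'mean_age'])   # 30.0
print(summary.loc['New York', 'mean_age'])  # 37.5
```

The result is indexed by City; calling reset_index() on it turns the group key back into a regular column.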

8. Merging and Concatenating Data

Combine multiple DataFrames, similar to SQL joins.

# Concatenate vertically (stack rows)
df_combined = pd.concat([df1, df2], axis=0)

# Concatenate horizontally (add columns)
df_combined = pd.concat([df1, df2], axis=1)

# Merge on a key (like SQL join)
merged = pd.merge(df1, df2, on='Name', how='inner')  # inner, left, right, outer
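The snippets above refer to df1 and df2 without defining them. A minimal sketch with two hypothetical frames sharing a Name column shows how the join type changes the result.

```python
import pandas as pd

# Hypothetical inputs: df1 and df2 overlap only on Bob and Charlie
df1 = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})
df2 = pd.DataFrame({'Name': ['Bob', 'Charlie', 'Dana'],
                    'City': ['Los Angeles', 'Chicago', 'Boston']})

# Inner join keeps only names present in both frames
inner = pd.merge(df1, df2, on='Name', how='inner')

# Outer join keeps every name; indicator=True adds a _merge column
# recording which side each row came from
outer = pd.merge(df1, df2, on='Name', how='outer', indicator=True)

# Stacking rows; ignore_index=True renumbers the index from 0
stacked = pd.concat([df1, df1], axis=0, ignore_index=True)

print(inner['Name'].tolist())  # ['Bob', 'Charlie']
print(len(outer))              # 4
```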

9. Saving Data

After manipulation, export the result.

# Save to CSV
df.to_csv('cleaned_data.csv', index=False)

# Save to Excel
df.to_excel('cleaned_data.xlsx', index=False)

# Save to JSON
df.to_json('cleaned_data.json', orient='records')
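To check what the export calls produce without writing files, to_csv and to_json can be called without a path, in which case they return the text as a string. The round trip below is a sketch on the sample data from section 1.

```python
import io
import json

import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# With no path, to_csv returns the CSV text; index=False omits the row index
csv_text = df.to_csv(index=False)
header = csv_text.splitlines()[0]

# Reading the text back should give the same rows and columns
restored = pd.read_csv(io.StringIO(csv_text))

# orient='records' produces a JSON list with one object per row
records = json.loads(df.to_json(orient='records'))

print(header)      # Name,Age,City
print(records[0])  # {'Name': 'Alice', 'Age': 25, 'City': 'New York'}
```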

Pandas makes data manipulation intuitive and efficient. Start with small datasets to practice these operations, and gradually apply them to real-world problems. The key is to become familiar with indexing, filtering, and aggregation patterns — they form the backbone of most data workflows.

Basically, once you get comfortable with loc, groupby, and apply, you can handle most day-to-day data tasks.
