


How to use the pandas library for data manipulation in Python?
Pandas is a powerful Python library for data manipulation and analysis built around the Series and DataFrame structures. In brief:

1. Import pandas and load data from CSV, Excel, or JSON, or create a DataFrame manually with pd.DataFrame().
2. Explore data using head(), tail(), info(), describe(), shape, and columns.
3. Select and filter data by column, by row index (loc/iloc), or by conditions with boolean indexing.
4. Handle missing values using isnull(), dropna(), or fillna() with a mean or custom value.
5. Add columns with conditional logic, or remove/rename them using drop() and rename().
6. Transform data using apply(), map(), replace(), and astype() for cleaning and type conversion.
7. Sort data with sort_values() and group by categories using groupby() with aggregation functions such as mean() or count().
8. Merge DataFrames with merge() or concatenate them with concat() along rows or columns.
9. Save processed data to CSV, Excel, or JSON using to_csv(), to_excel(), or to_json(), passing index=False if needed.

Mastering loc, groupby, and apply enables efficient handling of most real-world data tasks, making pandas essential for data workflows.
Pandas is one of the most powerful and widely used libraries in Python for data manipulation and analysis. It provides easy-to-use data structures like Series (1D) and DataFrame (2D), along with a wide range of functions to clean, filter, transform, and analyze data efficiently.

Here’s a practical guide on how to use pandas for common data manipulation tasks:
1. Importing and Loading Data
Start by importing pandas and loading data from common formats like CSV, Excel, or JSON.

import pandas as pd

# Load data from a CSV file
df = pd.read_csv('data.csv')

# Load from Excel (requires openpyxl)
df = pd.read_excel('data.xlsx')

# Load from JSON
df = pd.read_json('data.json')
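If a file needs more control on load, read_csv accepts options for delimiters, column selection, and date parsing. A minimal sketch, assuming a hypothetical sales.csv with a semicolon delimiter and an order_date column:

import pandas as pd

# Hypothetical file and column names, shown only to illustrate common read_csv options
df = pd.read_csv(
    'sales.csv',
    sep=';',                            # non-default delimiter
    usecols=['order_date', 'amount'],   # read only the columns you need
    parse_dates=['order_date'],         # parse this column as datetime
    na_values=['N/A', ''],              # treat these strings as missing values
)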
You can also create a DataFrame manually:
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
2. Exploring the Data
Before manipulating, inspect your data to understand its structure.

# Display first 5 rows
df.head()

# Last 3 rows
df.tail(3)

# General info: data types, missing values, memory usage
df.info()

# Summary statistics for numeric columns
df.describe()

# Shape of the DataFrame (rows, columns)
df.shape

# Column names
df.columns
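A few more checks that often help at this stage; a small sketch using the example DataFrame above:

# Distribution of values in a categorical column
df['City'].value_counts()

# Data type of each column
df.dtypes

# Number of unique values per column
df.nunique()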
3. Selecting and Filtering Data
Access specific parts of the DataFrame using labels, positions, or conditions.
# Select a single column
df['Name']

# Select multiple columns
df[['Name', 'Age']]

# Select rows by index
df.loc[0]   # by label
df.iloc[0]  # by position

# Filter rows based on a condition
df[df['Age'] > 28]

# Multiple conditions (use & for AND, | for OR; parentheses are required)
df[(df['Age'] > 25) & (df['City'] == 'Chicago')]
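When you need rows and columns at the same time, loc accepts a boolean mask together with a column list; query() is an alternative some find more readable. A small sketch on the example DataFrame:

# Rows matching a condition, restricted to specific columns
df.loc[df['Age'] > 28, ['Name', 'City']]

# The same filter written with query()
df.query('Age > 28')[['Name', 'City']]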
4. Handling Missing Data
Real-world data often has missing values (NaN). Pandas provides tools to manage them.
# Check for missing values
df.isnull()

# Count missing values per column
df.isnull().sum()

# Drop rows with any missing values
df.dropna()

# Drop columns with more than 50% missing values
df.dropna(thresh=len(df) * 0.5, axis=1)

# Fill missing values (assigning back avoids chained-assignment warnings in recent pandas)
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['City'] = df['City'].fillna('Unknown')
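If it makes sense for your data, you can also fill a numeric column with a per-group statistic instead of the global mean. A short sketch, assuming City is a reasonable grouping key for the example data:

# Fill missing ages with the mean age of the same city
df['Age'] = df['Age'].fillna(df.groupby('City')['Age'].transform('mean'))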
5. Adding and Removing Columns
Modify the structure of your DataFrame as needed.
# Add a new column
df['Senior'] = df['Age'] > 30

# Remove a column
df.drop('Senior', axis=1, inplace=True)

# Rename columns
df.rename(columns={'Name': 'Full Name'}, inplace=True)
6. Data Transformation
Apply functions to transform or clean data.
# Apply a function to a column
df['Age'] = df['Age'].apply(lambda x: x + 1)  # Increment age by 1

# Map values (e.g., shorten city names); values not in the mapping become NaN
df['City'] = df['City'].map({'New York': 'NYC', 'Los Angeles': 'LA'})

# Replace specific values
df.replace({'NYC': 'New York City'}, inplace=True)

# Convert data types
df['Age'] = df['Age'].astype(int)
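apply also works row by row with axis=1, which is handy when a new value depends on several columns. A small sketch using the example columns (the Label column name is just for illustration):

# Build a description string from two columns (row-wise apply)
df['Label'] = df.apply(lambda row: f"{row['Name']} ({row['City']})", axis=1)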
7. Sorting and Grouping
Organize and summarize data effectively.
# Sort by one column
df.sort_values('Age', ascending=False)

# Sort by multiple columns
df.sort_values(['City', 'Age'], ascending=[True, False])

# Group data and aggregate
grouped = df.groupby('City')['Age'].mean()                           # Average age by city
grouped = df.groupby('City').agg({'Age': 'mean', 'Name': 'count'})   # Multiple statistics
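For clearer output column names, groupby also supports named aggregation (available since pandas 0.25). A short sketch; avg_age and people are names chosen just for this example:

# Named aggregation gives the result columns explicit names
summary = df.groupby('City').agg(
    avg_age=('Age', 'mean'),
    people=('Name', 'count'),
).reset_index()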
8. Merging and Concatenating Data
Combine multiple DataFrames, similar to SQL joins.
# Concatenate vertically (stack rows)
df_combined = pd.concat([df1, df2], axis=0)

# Concatenate horizontally (add columns)
df_combined = pd.concat([df1, df2], axis=1)

# Merge on a key (like a SQL join); how can be 'inner', 'left', 'right', or 'outer'
merged = pd.merge(df1, df2, on='Name', how='inner')
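When the join key has a different name in each table, or both tables share column names, left_on/right_on and suffixes help. A sketch with the same df1 and df2; the FullName key column is hypothetical:

# Join on differently named key columns and disambiguate overlapping column names
merged = pd.merge(
    df1, df2,
    left_on='Name', right_on='FullName',   # hypothetical key column in df2
    how='left',
    suffixes=('_left', '_right'),
)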
9. Saving Data
After manipulation, export the result.
# Save to CSV
df.to_csv('cleaned_data.csv', index=False)

# Save to Excel
df.to_excel('cleaned_data.xlsx', index=False)

# Save to JSON
df.to_json('cleaned_data.json', orient='records')
Pandas makes data manipulation intuitive and efficient. Start with small datasets to practice these operations, and gradually apply them to real-world problems. The key is to become familiar with indexing, filtering, and aggregation patterns — they form the backbone of most data workflows.
Basically, once you get comfortable with loc, groupby, and apply, you can handle most day-to-day data tasks.
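To tie those three together, here is a minimal end-to-end sketch. It assumes a small in-memory DataFrame rather than a real file, but the pattern (filter with loc, derive a column with apply, summarize with groupby) is the same for larger data:

import pandas as pd

# Small sample data standing in for a real dataset
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, 30, 35, 28],
    'City': ['New York', 'Chicago', 'Chicago', 'New York'],
})

# loc: keep only the rows and columns of interest (.copy() avoids SettingWithCopyWarning)
subset = df.loc[df['Age'] > 27, ['Name', 'Age', 'City']].copy()

# apply: derive a new column from an existing one
subset['Age next year'] = subset['Age'].apply(lambda x: x + 1)

# groupby: summarize by category
by_city = subset.groupby('City')['Age'].mean()
print(by_city)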