
Series of data processing using pandas

Sep 15, 2020 04:10 PM


Today we start introducing a new and commonly used Python computing library: the famous Pandas.

The full name of Pandas is Python Data Analysis Library, a scientific computing tool built on Numpy. Its biggest feature is that it lets you operate on structured data much like tables in a database, so it supports many complex and advanced operations and can be considered an enhanced version of Numpy. It can easily build a complete dataset from a csv or excel file, and it offers many table-level interfaces for batch computation.
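To make the "build a table from a csv file" claim concrete, here is a minimal sketch. It assumes a tiny in-memory CSV (the column names `name` and `score` are made up for illustration) instead of a real file on disk, and uses `pd.read_csv`, which accepts any file-like object.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a real file on disk
csv_text = "name,score\nalice,90\nbob,85\n"
table = pd.read_csv(io.StringIO(csv_text))

print(table)
# A table-level batch calculation over a whole column
print(table["score"].mean())  # 87.5
```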

Installation and usage

Like almost all Python packages, pandas can be installed through pip. If you have installed the Anaconda distribution, libraries such as numpy and pandas are already installed. If not, it doesn't matter: a single command completes the installation.
pip install pandas

As with Numpy, pandas is usually imported under an alias, and the conventional alias is pd:

import pandas as pd
If this line runs without an error, pandas is installed. Two other packages are commonly used together with pandas: Scipy, another scientific computing package, and Matplotlib, a package for visualizing data. Both can be installed with pip as well. Later articles will briefly introduce their usage when they come up.

pip install scipy matplotlib

Series index

pandas has two most commonly used data structures: Series and DataFrame. A Series is a one-dimensional data structure, which can be simply understood as a one-dimensional array or vector. A DataFrame is, naturally, two-dimensional and can be understood as a table or a two-dimensional array.
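A quick way to see the one-dimensional vs two-dimensional difference is to compare `ndim` and `shape`. This is a minimal sketch; the values and the column names `x` and `y` are arbitrary.

```python
import pandas as pd

# A Series holds a single dimension of values
s = pd.Series([10, 20, 30])
# A DataFrame holds a two-dimensional table of equal-length columns
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

print(s.ndim, s.shape)    # 1 (3,)
print(df.ndim, df.shape)  # 2 (3, 2)
```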

Let's look at Series first. A Series stores two main kinds of data: an array holding a set of values, and the index (or labels) of those values. Let's create a simple Series and print it to see this.
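The original example is not reproduced here, so the following is a minimal sketch with four arbitrary values. No index is passed, so pandas assigns the row numbers 0 to 3.

```python
import pandas as pd

# Four arbitrary values; without an index, pandas labels them 0..3
s = pd.Series([5, 2, -3, 8])

# The printout shows the index in the first column and the values in the second
print(s)
```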

Here we created a Series with four arbitrary elements and printed it. The printout has two columns: the second column is the data we passed in, and the first column is its index. Since we did not specify an index, pandas automatically created a row-number index for us. The values and index attributes of a Series let us view the stored data and index separately:
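The two attributes can be inspected as in this minimal sketch, which reuses arbitrary values. Newer pandas documentation also recommends `to_numpy()` as an explicit way to get the underlying array.

```python
import numpy as np
import pandas as pd

s = pd.Series([5, 2, -3, 8])

vals = s.values  # the stored data, as a NumPy array
idx = s.index    # the default row-number index

print(type(vals))  # <class 'numpy.ndarray'>
print(idx)         # RangeIndex(start=0, stop=4, step=1)
```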

The values printed here are a Numpy array. This is not surprising: as mentioned earlier, pandas is a scientific computing library built on Numpy, which is its underlying layer. The printed index shows that it is a RangeIndex and gives its start, stop, and step.

index is an optional parameter of the Series constructor. If we omit it, pandas generates a RangeIndex by default, which is effectively the row number of each element. We can also specify the index ourselves by adding the index parameter to the code above.

When we specify string labels as the index, the index attribute no longer returns a RangeIndex but an Index. Note that pandas uses different index types depending on what the labels are.
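Here is a minimal sketch of passing string labels through the `index` parameter; the labels `"a"` to `"d"` are arbitrary.

```python
import pandas as pd

# Explicit string labels replace the default row numbers
s = pd.Series([5, 2, -3, 8], index=["a", "b", "c", "d"])

print(s.index)                 # the labels, shown as an Index
print(type(s.index).__name__)  # Index
```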

An index is naturally used to look up elements: we can use an index label directly as a subscript, just as we would use a position in an array. Lists of labels are accepted too, so we can query the values of several labels at once.
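A minimal sketch of label lookups follows. Recent pandas versions deprecate treating plain integers inside `[]` as positions on a label index, so this sketch uses `.loc` for labels and `.iloc` for positions to keep the two explicit.

```python
import pandas as pd

s = pd.Series([5, 2, -3, 8], index=["a", "b", "c", "d"])

single = s["a"]          # one label -> a scalar
several = s[["a", "c"]]  # a list of labels -> a smaller Series

# .loc is explicitly label-based, .iloc explicitly position-based
print(s.loc["a"], s.iloc[0])  # both refer to the first element
print(several)
```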

In addition, duplicate index labels are allowed when creating a Series. Correspondingly, looking up a duplicated label returns multiple results.
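The following minimal sketch, with arbitrary values, shows that a repeated label is accepted and that looking it up returns every matching row.

```python
import pandas as pd

# The label "a" appears twice, which pandas accepts
s = pd.Series([1, 2, 3], index=["a", "a", "b"])

dup = s["a"]  # every row labelled "a", returned as a Series
print(dup)
print(s.index.is_unique)  # False
```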

Boolean indexing, as in Numpy, is also supported:
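A minimal sketch of boolean indexing, with an arbitrary threshold of 2: the comparison builds a mask, and indexing with the mask keeps only the rows where it is True.

```python
import pandas as pd

s = pd.Series([5, 2, -3, 8], index=["a", "b", "c", "d"])

mask = s > 2         # element-wise comparison -> Series of bools
positives = s[mask]  # keep only the rows where the mask is True
print(positives)
```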

Series calculation


Series supports many kinds of calculations. We can apply addition, subtraction, multiplication, and division directly to an entire Series:
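In this minimal sketch with arbitrary values, each arithmetic operation with a scalar applies to every element and keeps the index unchanged.

```python
import pandas as pd

s = pd.Series([5, 2, -3, 8], index=["a", "b", "c", "d"])

doubled = s * 2   # each element multiplied by 2
shifted = s + 10  # each element increased by 10
halved = s / 2    # true division, so the result is float
print(doubled)
print(halved)
```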

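You can also apply Numpy's mathematical functions to a Series for more complex operations. Element-wise functions such as np.sqrt or np.exp return a new Series that keeps the original index; to get a plain Numpy array, apply the function to the underlying array instead.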
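This minimal sketch, with arbitrary perfect squares as values, shows a NumPy ufunc applied to a Series returning a Series, while the same function applied to the raw array returns an ndarray.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0], index=["a", "b", "c"])

roots = np.sqrt(s)             # ufunc on a Series -> Series, index kept
raw = np.sqrt(s.to_numpy())    # ufunc on the raw array -> ndarray

print(type(roots).__name__)  # Series
print(roots)
```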

Because a Series has an index, we can use the in operator, just as with a dict, to check whether a label is in the Series:
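A minimal sketch of the dict-like membership test. As with a dict, `in` checks the labels (the keys), not the stored values.

```python
import pandas as pd

s = pd.Series([5, 2, -3, 8], index=["a", "b", "c", "d"])

print("a" in s)       # True: "a" is an index label
print(5 in s)         # False: `in` looks at labels, not values
print(5 in s.values)  # True: check the values explicitly
```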

A Series pairs index labels with values, much as a dict pairs keys with values, so a Series can also be initialized from a dict:
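In this minimal sketch the dict contents are arbitrary; the keys become the index and the dict values become the data.

```python
import pandas as pd

# Hypothetical sample data
counts = {"apple": 3, "banana": 5, "cherry": 7}
s = pd.Series(counts)  # keys -> index, values -> data

print(s)
```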

A Series created this way keeps the keys in the order they are stored in the dict. We can also pass index when creating it to control that order.
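A minimal sketch of controlling the order, reusing the same hypothetical dict: the `index` argument chooses which keys are used and in what order.

```python
import pandas as pd

counts = {"apple": 3, "banana": 5, "cherry": 7}

# The index argument sets the order of the resulting Series
s = pd.Series(counts, index=["cherry", "apple", "banana"])
print(s)
```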

If index includes a key that does not appear in the dict, there is no corresponding value to use, so the Series records it as NaN (Not a Number), which can be understood as an invalid or missing value. When processing features or training data, we often encounter entries where some feature is blank. The pandas functions isnull and notnull check for these gaps.

Series also has its own isnull method, which we can call directly.
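This minimal sketch adds a hypothetical key, `"durian"`, that is missing from the dict, so its value becomes NaN. The top-level functions and the Series method give the same result.

```python
import pandas as pd

counts = {"apple": 3, "banana": 5, "cherry": 7}

# "durian" is not in the dict, so its value becomes NaN
s = pd.Series(counts, index=["apple", "banana", "durian"])

print(pd.isnull(s))   # top-level function
print(pd.notnull(s))  # the complement
print(s.isnull())     # equivalent Series method
```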

Finally, the index of a Series can also be modified by assigning a new value to it directly:
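A minimal sketch of replacing the index in place; the new labels are arbitrary but must match the length of the Series, otherwise pandas raises a ValueError.

```python
import pandas as pd

s = pd.Series([5, 2, -3, 8])

# Replace the whole index; the number of labels must equal len(s)
s.index = ["w", "x", "y", "z"]
print(s)
```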

Summary

At its core, a pandas Series is a layer of wrapping around a Numpy one-dimensional array that adds related features such as indexing. By analogy, a DataFrame can be thought of as a collection of Series sharing one index, with more data-processing features added. Once we grasp this core structure, understanding pandas as a whole becomes much easier than memorizing its APIs one by one.
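The "DataFrame built from Series" view can be checked with a minimal sketch: selecting a single column returns a Series that shares the frame's index. The column names and labels here are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]}, index=["a", "b", "c"])

col = df["x"]                       # one column -> a Series
print(type(col).__name__)           # Series
print(col.index.equals(df.index))   # True: the column shares the index
```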

pandas is a great tool for data processing in Python. For a qualified algorithm engineer it is almost a must-know, and it is also the basis for doing machine learning and deep learning in Python. According to survey data, 70% of an algorithm engineer's daily work goes into data processing, and less than 30% into actually implementing and training models. This shows how important data processing is: to develop in the industry, learning models alone is not enough.


Statement: This article is reproduced from juejin.