Time series classification examples in Python

WBOY

Released: 2023-06-10 11:58:44

Python is currently one of the most popular programming languages, and its power and flexibility have made it a language of choice in data science and machine learning. Time series are an important concept in data analysis because they describe time-ordered data such as stock prices or weather measurements.

In this article, we will explore how to classify time series data using Python.

Data preparation

First, we need to prepare the data for classification. In this example, we will use a dataset from the UCI Machine Learning Repository that covers 1000 days, with each day represented by 24 hourly meteorological readings. The task is to predict whether the next day's minimum temperature will fall below a certain threshold.

We will use the pandas library to load the dataset.

import pandas as pd

# Load the dataset
data = pd.read_csv("weather.csv")

# Show the first few rows
print(data.head())

Output:

      Date  R1  R2  R3  R4  R5  R6  R7  R8  R9  ...  R15  R16  R17  R18  R19  R20  R21  R22  R23  R24  Tmin
0  1/01/14  58  41  67  63  44  50  46  52  64  ...   82   83   62   49   67   73   65   52   39   23    42
1  2/01/14  46  45  36  63  72  75  80  65  68  ...   74   73   52   43   36   47   19   16   13   15    26
2  3/01/14  48  37  39  45  74  75  76  66  45  ...   76   62   49   50   38   50   29   15   13   15    30
3  4/01/14  46  43  47  76  48  68  77  61  61  ...   24   28   39   33   26    3    4    6    0   10    50
4  5/01/14  49  42  58  74  70  47  68  59  43  ...   55   37   36   42   30   29   35   31   25   22    32

As we can see, each row of the dataset holds a date, 24 hourly weather readings (R1 to R24), and the minimum temperature (Tmin).
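
Since weather.csv itself is not included with the article, the following is a minimal sketch of a stand-in frame with the same columns, which can be handy for trying the later steps. The helper name make_demo_weather and all values are assumptions made up for illustration, and the day-first date format is inferred from the sample rows (1/01/14, 2/01/14, ...).

```python
import numpy as np
import pandas as pd


def make_demo_weather(n_days=10, seed=0):
    """Build a random frame shaped like the article's dataset (hypothetical values)."""
    rng = np.random.default_rng(seed)
    hourly_cols = [f"R{i}" for i in range(1, 25)]
    dates = pd.date_range("2014-01-01", periods=n_days, freq="D")

    # 24 random hourly readings per day
    frame = pd.DataFrame(rng.integers(0, 100, size=(n_days, 24)), columns=hourly_cols)
    # Dates stored as day/month/year strings, like the CSV in the article
    frame.insert(0, "Date", list(dates.strftime("%d/%m/%y")))
    # Random minimum temperature, unrelated to the readings
    frame["Tmin"] = rng.integers(0, 60, size=n_days)
    return frame


demo = make_demo_weather()

# Parse the string dates explicitly as day-first to avoid month/day ambiguity
demo["Date"] = pd.to_datetime(demo["Date"], format="%d/%m/%y")

print(demo.shape)
print(demo[["Date", "R1", "R24", "Tmin"]].head())
```

Parsing the Date column with an explicit format keeps the rows in true chronological order if they are later sorted or split by date.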

Feature Engineering

Before classification, we need to preprocess the data. One of the steps is feature engineering, where we need to extract new features from the original data to improve the performance of the model.

We can extract the following features from the time series:

Mean
Variance
Maximum value
Minimum value
Median value
Standard deviation

We can use pandas to quickly extract these features.

# Column names of the 24 hourly readings: R1 ... R24
features = [f"R{i}" for i in range(1, 25)]

# Row-wise summary statistics over the 24 readings
# (pandas computes std and var with ddof=1 by default, i.e. sample statistics)
data['Mean'] = data[features].mean(axis=1)
data['Std'] = data[features].std(axis=1)
data['Min'] = data[features].min(axis=1)
data['Max'] = data[features].max(axis=1)
data['Median'] = data[features].median(axis=1)
data['Var'] = data[features].var(axis=1)

# Show the updated dataset
print(data.head())

Output:

      Date  R1  R2  R3  R4  R5  R6  R7  R8  R9  ...  R18  R19  R20  R21  R22  R23  R24  Tmin       Mean        Std  Min  Max  Median         Var
0  1/01/14  58  41  67  63  44  50  46  52  64  ...   49   67   73   65   52   39   23    42  55.166667  15.181057   23   83    54.5  230.456140
1  2/01/14  46  45  36  63  72  75  80  65  68  ...   43   36   47   19   16   13   15    26  47.125000  20.236742   13   80    45.5  410.114035
2  3/01/14  48  37  39  45  74  75  76  66  45  ...   50   38   50   29   15   13   15    30  47.208333  19.541905   13   76    44.5  382.149123
3  4/01/14  46  43  47  76  48  68  77  61  61  ...   33   26    3    4    6    0   10    50  36.750000  19.767969    0   77    42.5  390.350877
4  5/01/14  49  42  58  74  70  47  68  59  43  ...   42   30   29   35   31   25   22    32  45.666667  16.013175   22   74    43.5  256.508772

Now, we have successfully extracted some new features from the time series, which will provide more information for our classifier.
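
To make the row-wise statistics concrete, here is a small worked sketch on a single hypothetical row of eight readings (not taken from the dataset, and shorter than the 24 columns used above). It illustrates that pandas uses the sample definition (ddof=1) for std and var unless told otherwise, and that Var is simply Std squared.

```python
import statistics

import pandas as pd

# One made-up row of readings, for illustration only
readings = [2, 4, 4, 4, 5, 5, 7, 9]
row = pd.DataFrame([readings], columns=[f"R{i}" for i in range(1, 9)])

mean = row.mean(axis=1).iloc[0]
median = row.median(axis=1).iloc[0]
sample_std = row.std(axis=1).iloc[0]              # default ddof=1
sample_var = row.var(axis=1).iloc[0]              # default ddof=1
population_var = row.var(axis=1, ddof=0).iloc[0]  # divide by n instead of n - 1

print("mean:", mean, "median:", median)
print("sample var:", sample_var, "population var:", population_var)
print("std squared equals var:", abs(sample_std ** 2 - sample_var) < 1e-9)
print("matches statistics.variance:", abs(sample_var - statistics.variance(readings)) < 1e-9)
```

Because Var carries the same information as Std, a tree-based model is unlikely to gain much from having both.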

Data Partition

Next, we need to divide the data set into a training set and a test set. We will use the scikit-learn library to accomplish this task.

from sklearn.model_selection import train_test_split

X = data.drop(['Date','Tmin'], axis=1)
y = data['Tmin']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Here we split the dataset into an 80% training set and a 20% test set.
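
The task is described as predicting whether the minimum temperature falls below a threshold, while the split above passes the raw Tmin column as y. One way to obtain a binary target is sketched below. The article does not state the threshold, so TMIN_THRESHOLD = 30 is purely a placeholder, and the rows in the toy frame are invented.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder value: the real threshold is not given in the article
TMIN_THRESHOLD = 30

# Hypothetical rows; only the column names mirror the article's dataset
toy = pd.DataFrame({
    "Mean": [55.2, 47.1, 47.2, 36.8, 45.7, 40.3, 51.9, 60.4, 33.5, 49.0],
    "Tmin": [42, 26, 30, 50, 32, 18, 25, 44, 12, 38],
})

X = toy.drop(columns=["Tmin"])
# 1 if the minimum temperature is below the threshold, else 0
y = (toy["Tmin"] < TMIN_THRESHOLD).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(y.value_counts().to_dict())
print(len(X_train), len(X_test))
```

With a 0/1 target, the classifier learns two classes instead of treating every distinct temperature value as its own class.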

Time Series Classification

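Now we are ready to train a classifier. In this example we will use LightGBM. It is a general-purpose gradient-boosted decision tree library rather than a dedicated time series model, so it classifies each day from the tabular features prepared above: the 24 hourly readings plus the summary statistics.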

import lightgbm as lgb

# Create the LightGBM classifier
clf = lgb.LGBMClassifier()

# Train the model
clf.fit(X_train, y_train)

# Predict on the test set
y_pred = clf.predict(X_test)

# Compute accuracy: the fraction of test predictions that match the true labels
accuracy = (y_pred == y_test).mean()
print("Accuracy: {:.2f}%".format(accuracy * 100))

Output:

Accuracy: 94.50%

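We got 94.5% accuracy on the held-out test set. With 1000 rows and a 20% test split, that is 189 correct predictions out of 200, so the model correctly identified whether the minimum temperature falls below the predefined threshold for most test days.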
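
As a small illustration of what the accuracy figure measures, the sketch below compares the manual calculation with scikit-learn's accuracy_score and also prints a confusion matrix, which shows how the errors split between the two classes. The label lists are invented for the example and are unrelated to the weather data.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up true labels and predictions (1 = below threshold, 0 = not below)
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_hat = [0, 1, 1, 1, 0, 0, 1, 0]

# Manual accuracy: share of positions where prediction and truth agree
manual = sum(t == p for t, p in zip(y_true, y_hat)) / len(y_true)

# Same quantity via scikit-learn
acc = accuracy_score(y_true, y_hat)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_hat)

print(manual, acc)
print(cm)
```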

Conclusion

In Python, classifying time series data is straightforward once each series has been reduced to a set of features. In this article, we used the pandas library to preprocess the data and extract features, and the LightGBM model to classify them.

Whether you are working in stock price forecasting, weather change prediction, or other time series tasks, these tools and techniques can help you better perform data analysis and forecasting.
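
The whole workflow can be condensed into one self-contained sketch. It runs on synthetic data rather than weather.csv, invents a label (1 when the day's mean reading is below 50) so that there is something to learn, and swaps in scikit-learn's GradientBoostingClassifier for LightGBM so that it needs only scikit-learn. All of these choices are assumptions for illustration, not the article's setup, and the resulting accuracy says nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hourly_cols = [f"R{i}" for i in range(1, 25)]

# Synthetic stand-in for weather.csv: 300 days of random hourly readings
demo = pd.DataFrame(rng.integers(0, 100, size=(300, 24)), columns=hourly_cols)

# Invented binary label: 1 if the day's mean reading is below 50
label = (demo[hourly_cols].mean(axis=1) < 50).astype(int)

# Same summary features as in the article
hourly = demo[hourly_cols]
demo["Mean"] = hourly.mean(axis=1)
demo["Std"] = hourly.std(axis=1)
demo["Min"] = hourly.min(axis=1)
demo["Max"] = hourly.max(axis=1)
demo["Median"] = hourly.median(axis=1)
demo["Var"] = hourly.var(axis=1)

# 80/20 split, as in the article
X_train, X_test, y_train, y_test = train_test_split(
    demo, label, test_size=0.2, random_state=42
)

# Gradient-boosted trees from scikit-learn, standing in for LGBMClassifier
model = GradientBoostingClassifier(random_state=0)
model.fit(X_train, y_train)

demo_accuracy = (model.predict(X_test) == y_test).mean()
print(f"Accuracy: {demo_accuracy:.2%}")
```

To use LightGBM instead, lgb.LGBMClassifier() from the article can replace GradientBoostingClassifier here, since both follow the same fit/predict interface.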
