Python is a very popular programming language. Its powerful scientific computing and data processing capabilities make it widely used in the fields of data analysis and machine learning. This article will introduce how to use univariate linear regression in Python for data modeling and prediction, and demonstrate its practical application through an example.
First of all, what is linear regression? In statistics and machine learning, linear regression models the relationship between a response variable and one or more explanatory variables by fitting a straight line (or, with several explanatory variables, a plane or hyperplane). In univariate (simple) linear regression, we have only one explanatory variable (the independent variable) and one response variable (the dependent variable).
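In symbols, the univariate model and the ordinary least-squares estimates of its two coefficients can be written as follows. This is the standard textbook form, where x is the explanatory variable, y the response and ε an error term. scikit-learn's LinearRegression fits the same ordinary least-squares objective.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n

(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```

Here \hat{\beta}_1 is the slope and \hat{\beta}_0 the intercept. In scikit-learn they are stored in the fitted model's coef_ and intercept_ attributes.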
Next, we will show how to implement univariate linear regression in Python with the scikit-learn library. scikit-learn is a popular machine learning library that provides many tools for data preprocessing, modeling and model evaluation.
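Before turning to real data, here is a minimal sketch of the construct / fit / predict workflow that scikit-learn estimators follow. The toy arrays X_toy and y_toy are made up for illustration and lie exactly on the line y = 2x + 1, so the fitted slope and intercept should come out very close to 2 and 1.

```python
# Minimal sketch of scikit-learn's estimator workflow: construct, fit, predict.
# The toy data is made up for illustration and lies exactly on the line y = 2x + 1.
import numpy as np
from sklearn.linear_model import LinearRegression

X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])  # 2D features: one row per sample, one column per feature
y_toy = np.array([1.0, 3.0, 5.0, 7.0])          # 1D target: one value per sample

toy_model = LinearRegression()   # construct the estimator (an intercept is fitted by default)
toy_model.fit(X_toy, y_toy)      # learn coef_ (slopes) and intercept_ from the data

print(toy_model.coef_)                       # one slope per feature, close to 2
print(toy_model.intercept_)                  # close to 1
print(toy_model.predict(np.array([[4.0]])))  # prediction for x = 4, close to 9
```

The same three steps appear below with the house data. The only differences are that the features come from a pandas DataFrame and the target comes from a Series.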
Step 1: Import libraries and data
First, we need to import some libraries. In this article, we will use NumPy, pandas, Matplotlib and scikit-learn.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
Next, we need to prepare the data we want to analyze. In this example, we will use a small, simple dataset of house areas and their sale prices.
df = pd.DataFrame({'area': [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700],
                   'price': [245000, 312000, 279000, 308000, 199000, 219000, 405000, 324000, 319000, 255000]})
print(df)
The output is as follows:
area price
0 1400 245000
1 1600 312000
2 1700 279000
3 1875 308000
4 1100 199000
5 1550 219000
6 2350 405000
7 2450 324000
8 1425 319000
9 1700 255000
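Before modeling, it can help to run a few quick sanity checks on the data. The sketch below is an illustrative choice of checks, not a step the article requires. It rebuilds the same DataFrame (assuming the columns are named 'area' and 'price', as in the rest of the code) and confirms its size, that both columns are numeric, and that nothing is missing.

```python
# Sanity checks on the article's house data before modeling (illustrative choices).
import pandas as pd

df = pd.DataFrame({'area': [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700],
                   'price': [245000, 312000, 279000, 308000, 199000, 219000, 405000, 324000, 319000, 255000]})

print(df.shape)          # (number of rows, number of columns)
print(df.dtypes)         # both columns should be numeric
print(df.isna().sum())   # number of missing values in each column
print(df.describe())     # count, mean, std, min, quartiles and max per column
```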
Step 2: Data analysis and visualization
Once the data is loaded, we can start analyzing and visualizing it. Let's draw a scatter plot with the house area on the x-axis and the sale price on the y-axis.
plt.scatter(df['area'], df['price'])
plt.xlabel('area')
plt.ylabel('price')
plt.show()
Output: [Figure: scatter plot of price against area]
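The scatter plot shows that the sale price generally tends to rise as the house area increases, although not strictly. For example, the house with an area of 2450 sold for less than the one with an area of 2350. This suggests there may be a linear relationship between the two variables.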
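The visual impression can be put into a number with the Pearson correlation coefficient r. Values near +1 indicate a strong positive linear association and values near 0 indicate little linear association. The sketch below uses pandas' Series.corr, whose default method is Pearson. For this data r comes out at about 0.76, which is consistent with a clear but far from perfect linear trend.

```python
# Pearson correlation coefficient between area and price, computed with pandas.
import pandas as pd

df = pd.DataFrame({'area': [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700],
                   'price': [245000, 312000, 279000, 308000, 199000, 219000, 405000, 324000, 319000, 255000]})

r = df['area'].corr(df['price'])  # Series.corr uses the Pearson method by default
print(f'Pearson r between area and price: {r:.3f}')
```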
Step 3: Fit a linear regression model
Now we can fit the linear regression model. In scikit-learn, a linear model is created with the LinearRegression class and trained with its fit() method.
X = df[['area']]
Y = df['price']
model = LinearRegression().fit(X, Y)
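Here, we use the area as the independent variable X and the price as the dependent variable Y, and pass both to the fit() method of a new LinearRegression object. Note the double brackets in df[['area']]: scikit-learn expects the features as a two-dimensional table (one column per feature), whereas the target can be one-dimensional. After fitting the model, we can inspect the slope and intercept.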
print(f'Slope: {model.coef_[0]:.4f}')  # coef_ holds one slope per feature
print(f'Intercept: {model.intercept_:.2f}')
Output:
Slope: 109.7677
Intercept: 98248.33
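To see where these two numbers come from, the sketch below recomputes them in two ways and compares both with scikit-learn. The first uses the closed-form least-squares formulas for a single explanatory variable. The second uses NumPy's np.polyfit with degree 1, which returns the least-squares line's coefficients as [slope, intercept]. The sketch rebuilds the article's DataFrame so it runs on its own. All three results should agree up to floating-point rounding.

```python
# Cross-check scikit-learn's slope and intercept against two other least-squares computations.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({'area': [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700],
                   'price': [245000, 312000, 279000, 308000, 199000, 219000, 405000, 324000, 319000, 255000]})
model = LinearRegression().fit(df[['area']], df['price'])

# Closed-form ordinary least squares for one explanatory variable
x = df['area'].to_numpy(dtype=float)
y = df['price'].to_numpy(dtype=float)
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

# np.polyfit with deg=1 returns the coefficients highest degree first: [slope, intercept]
poly_slope, poly_intercept = np.polyfit(x, y, 1)

print(f'closed form: slope={slope:.4f}, intercept={intercept:.2f}')
print(f'polyfit:     slope={poly_slope:.4f}, intercept={poly_intercept:.2f}')
print(f'sklearn:     slope={model.coef_[0]:.4f}, intercept={model.intercept_:.2f}')
```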
Step 4: Visualize the results and make predictions
With the model trained, we can use it to predict house prices and use Matplotlib to draw the regression line. The following code predicts the sale price of a new house with an area of 2000, then plots the regression line over the data points.
y_pred = model.predict(pd.DataFrame({'area': [2000]}))  # same column name as used in fit
print('Predicted selling price:', y_pred)
plt.scatter(df['area'], df['price'])
plt.plot(df['area'], model.predict(df[['area']]), color='r')  # fitted regression line
plt.xlabel('area')
plt.ylabel('price')
plt.show()
Output: [Figure: the data points with the fitted regression line drawn in red]
The regression line follows the upward trend of the data points, so we can use the fitted model to predict the sale price of a new house from its area (square footage).
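How well the line fits can be quantified with the coefficient of determination R², which LinearRegression.score returns. For a simple linear regression with an intercept, R² equals the square of the Pearson correlation, roughly 0.58 here. In other words, area accounts for a bit over half of the variation in price across these ten houses. The sketch below also predicts several new houses at once. The areas in new_houses are hypothetical values chosen for illustration. The last one (2600) lies outside the observed range of 1100 to 2450, so its prediction is an extrapolation and deserves extra caution.

```python
# Quantify the fit with R^2 and predict several new houses at once.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({'area': [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700],
                   'price': [245000, 312000, 279000, 308000, 199000, 219000, 405000, 324000, 319000, 255000]})
X = df[['area']]
Y = df['price']
model = LinearRegression().fit(X, Y)

# R^2 on the training data: 1.0 is a perfect fit,
# 0.0 is no better than always predicting the mean price.
r2 = model.score(X, Y)
print(f'R^2 on the training data: {r2:.3f}')

# Hypothetical new house areas, chosen only for illustration.
new_houses = pd.DataFrame({'area': [1200, 2000, 2600]})
predicted = model.predict(new_houses)  # same column name as in fit, so no feature-name warning

# The same predictions written out with the fitted line: price = intercept + slope * area
by_hand = model.intercept_ + model.coef_[0] * new_houses['area'].to_numpy()

for area, price in zip(new_houses['area'], predicted):
    print(f'area {area}: predicted price {price:,.0f}')
```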
This article introduces how to use the scikit-learn library in Python to implement univariate linear regression, including data preparation, data analysis and visualization, fitting linear regression models and predicting results. Linear regression is a simple yet powerful tool that can be used to study the relationship between two variables and make predictions. It has wide applications in data analysis and machine learning.
The above is the detailed content of Univariate linear regression example in Python. For more information, please follow other related articles on the PHP Chinese website!