House_Price_Prediction-Python Tutorial-php.cn

House_Price_Prediction

Patricia Arquette

Release: 2024-11-03 12:28:29


In the world of real estate, determining property prices involves numerous factors, from location and size to amenities and market trends. Simple linear regression, a foundational technique in machine learning, provides a practical way to predict housing prices based on key features like the number of rooms or square footage.

In this article, I delve into the process of applying simple linear regression to a housing dataset, from data preprocessing and feature selection to building a model that can offer valuable price insights. Whether you’re new to data science or seeking to deepen your understanding, this project serves as a hands-on exploration of how data-driven predictions can shape smarter real estate decisions.
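Before touching the real dataset, here is a rough sketch of what simple linear regression does: it fits a straight line through one feature and the target. The numbers below are made-up toy data (square_feet, price, and the coefficients that generate them are all assumptions for illustration), not values from the housing dataset.

```python
import numpy as np

# Hypothetical toy data: square footage (x) and price (y), invented for illustration.
rng = np.random.default_rng(0)
square_feet = rng.uniform(500, 3500, size=200)
price = 50_000 + 120 * square_feet + rng.normal(0, 10_000, size=200)

# Least-squares fit of the line: price = intercept + slope * square_feet.
slope, intercept = np.polyfit(square_feet, price, deg=1)

# Use the fitted line to predict the price of a 2,000 sq ft home.
predicted = intercept + slope * 2000
print(f"slope={slope:.1f}, intercept={intercept:.0f}, predicted={predicted:.0f}")
```

The fitted slope should land close to the 120 used to generate the data, which is the whole idea: recover the relationship between a feature and the price from noisy examples.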

First things first, you start by importing your libraries:

import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt


#Read from the directory where you stored the data

data = pd.read_csv('/kaggle/input/california-housing-prices/housing.csv')


data


# Check whether any columns contain null values
data.info()


# Drop rows with null values so every column has the same number of non-null entries
data.dropna(inplace=True)


data.info()
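If you would rather see the missing values as plain counts instead of reading them off data.info(), something like the sketch below may help. It uses a tiny made-up frame (toy and its values are hypothetical) so it runs without the Kaggle file.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame with one missing total_bedrooms value.
toy = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0],
    "total_bedrooms": [129.0, np.nan, 190.0],
})

# Count missing values per column before dropping anything.
missing_before = toy.isna().sum()
print(missing_before)

# dropna() removes every row that has at least one missing value.
cleaned = toy.dropna()
print(len(toy), "->", len(cleaned))
```

Dropping rows is the simplest option; it costs you the rows that had a gap, which is acceptable when only a small share of the data is affected.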


# Separate the features (X) from the target (y) before splitting into training and test sets

from sklearn.model_selection import train_test_split

X = data.drop(['median_house_value'], axis = 1)
y = data['median_house_value']


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
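One thing worth knowing about the split above: without a fixed seed, every run produces a different split, so your scores will shift slightly each time. A minimal sketch of a reproducible split, assuming toy data (toy_X, toy_y) and an arbitrary seed of 42:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy features and target, just to show the mechanics of the split.
toy_X = pd.DataFrame({"feature": np.arange(100)})
toy_y = pd.Series(np.arange(100) * 2.0, name="target")

# random_state fixes the shuffle; 42 is an arbitrary choice, not a recommended value.
split_a = train_test_split(toy_X, toy_y, test_size=0.2, random_state=42)
split_b = train_test_split(toy_X, toy_y, test_size=0.2, random_state=42)

X_tr, X_te, y_tr, y_te = split_a
print(len(X_tr), len(X_te))                   # sizes of the 80/20 split
print(X_te.index.equals(split_b[1].index))    # same seed gives the same test rows
```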


# Join the training features and target so we can examine how they correlate
train_data = X_train.join(y_train)


train_data


# Plot a histogram of each numeric column in the training data
train_data.hist(figsize=(15, 8))


# One-hot encode the non-numeric column (ocean_proximity) so it can be included in the correlation analysis

train_data_encoded = pd.get_dummies(train_data, drop_first=True)
correlation_matrix = train_data_encoded.corr()
print(correlation_matrix)


train_data_encoded.corr()


plt.figure(figsize=(15,8))
sns.heatmap(train_data_encoded.corr(), annot=True, cmap = "inferno")
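A full heatmap can be a lot to take in. If you mainly care about which features move with the target, you can pull out just that column of the correlation matrix and sort it. The sketch below uses synthetic data (the toy frame and the way its values are generated are assumptions), with column names borrowed from the housing dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: house value is driven by income, age is unrelated noise.
rng = np.random.default_rng(0)
n = 500
income = rng.normal(4.0, 1.5, n)
toy = pd.DataFrame({
    "median_income": income,
    "housing_median_age": rng.uniform(1, 52, n),
    "median_house_value": 40_000 * income + rng.normal(0, 30_000, n),
})

# Keep only the target's column of the correlation matrix, drop its
# self-correlation, and sort so the strongest positive correlation comes first.
target_corr = (
    toy.corr()["median_house_value"]
    .drop("median_house_value")
    .sort_values(ascending=False)
)
print(target_corr)
```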

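Category counts of ocean_proximity in the training data, as returned by value_counts() (the "<1H OCEAN" row appears to be missing from the captured output):

ocean_proximity
INLAND        5183
NEAR OCEAN    2108
NEAR BAY      1783
ISLAND           5
Name: count, dtype: int64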


# Log-transform the count-like features to reduce skew (adding 1 avoids log(0))
train_data['total_rooms'] = np.log(train_data['total_rooms'] + 1)
train_data['total_bedrooms'] = np.log(train_data['total_bedrooms'] + 1)
train_data['population'] = np.log(train_data['population'] + 1)
train_data['households'] = np.log(train_data['households'] + 1)
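As a side note, NumPy has a built-in for this exact transform. The sketch below, on a small made-up series (toy is hypothetical), shows that np.log1p(x) matches np.log(x + 1) and that np.expm1 undoes it if you ever need the original scale back.

```python
import numpy as np
import pandas as pd

# Hypothetical values spanning several orders of magnitude, like room counts do.
toy = pd.Series([0.0, 9.0, 99.0, 9999.0], name="total_rooms")

manual = np.log(toy + 1)    # the form used in this article
builtin = np.log1p(toy)     # NumPy's equivalent, more accurate for values near zero

print(builtin.round(3).tolist())
print(np.allclose(manual, builtin))

# np.expm1 reverses the transform: exp(x) - 1.
restored = np.expm1(builtin)
print(np.allclose(restored, toy))
```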


train_data.hist(figsize=(15, 8))


0.5092972905670141

# Count the ocean_proximity categories before one-hot encoding them
train_data.ocean_proximity.value_counts()


# Create a binary (0 or 1) column for each ocean_proximity category
pd.get_dummies(train_data.ocean_proximity)


0.4447616558596853

# Join the one-hot columns to the training data, then drop the original ocean_proximity column
train_data = train_data.join(pd.get_dummies(train_data.ocean_proximity)).drop(['ocean_proximity'], axis=1)
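To see what that join-and-drop step produces, here is the same pattern on a four-row made-up frame (toy and its values are hypothetical). Passing dtype=int is my own addition: recent pandas versions return True/False columns by default, and integers can be easier to read in a table.

```python
import pandas as pd

# Hypothetical miniature version of the training data.
toy = pd.DataFrame({
    "ocean_proximity": ["INLAND", "NEAR BAY", "INLAND", "ISLAND"],
    "households": [126, 1138, 177, 219],
})

# One column per category; dtype=int gives 0/1 instead of False/True.
dummies = pd.get_dummies(toy["ocean_proximity"], dtype=int)

# Attach the new columns and drop the original text column.
encoded = toy.join(dummies).drop(columns=["ocean_proximity"])
print(encoded)
```

Each row ends up with exactly one 1 across the new columns, marking which category it belonged to.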


train_data


# Recheck the correlations now that every column is numeric
plt.figure(figsize=(18, 8))
sns.heatmap(train_data.corr(), annot=True, cmap ='twilight')


0.5384474921332503

I would say that training a model is not the easiest of processes, but to keep improving the results above you can add more parameters to the param_grid, such as min_feature, and in that way your best estimator's score can keep improving.
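To give a feel for what tuning with a param_grid can look like, here is a minimal sketch using scikit-learn's GridSearchCV. Everything in it is an assumption on my side: the data is synthetic, the model is a RandomForestRegressor, and the grid uses max_features and min_samples_split, which are real scikit-learn parameters picked for illustration (it is not clear which one min_feature refers to).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic regression data standing in for the housing features and prices.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, 300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Assumed grid: every combination of these values is tried with 3-fold cross-validation.
param_grid = {
    "n_estimators": [20, 50],
    "max_features": [2, 4],
    "min_samples_split": [2, 4],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="r2",
)
search.fit(X_tr, y_tr)

# Score the best combination on the held-out test set (R^2 for regressors).
best_score = search.best_estimator_.score(X_te, y_te)
print(search.best_params_)
print(round(best_score, 3))
```

Each extra parameter multiplies the number of combinations to fit, so a bigger grid can find a better estimator but also takes longer to run.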

If you made it this far, please like and share your comments below; your opinion really matters. Thank you! ❤️
