Simple Linear Regression - Predicting housing prices in Kansas City
In this problem, we split our data into training and test sets. We feed the training data to the model so that it can learn, then we check how accurately it predicts the prices in the test data. The data set for this problem can be downloaded from here.
Python Code and Explanation:
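import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split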
Seaborn is built on top of the Matplotlib library and gives the plots a more polished look.
Reading data and extracting space and prices:
# Reading Data
housing_data = pd.read_excel(r'PATH')
space = np.array(housing_data['sqft_living']).reshape(-1, 1)
prices = np.array(housing_data['price'])
As a rule of thumb in scikit-learn interfaces, X (the features) should be a 2D array and y (the target) should be a 1D array. Calling .reshape(len(a), 1) or .reshape(-1, 1) on a 1D array a turns it into a 2D array with a single column, by turning each of its elements into an array of its own.
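To see what the reshape does, here is a minimal sketch on a tiny made-up DataFrame. It only assumes the two column names used above (sqft_living and price); the values are invented for illustration and do not come from the data set.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Excel file: same column names, made-up values
housing_data = pd.DataFrame({
    "sqft_living": [1000, 1500, 2000, 2500, 3000],
    "price": [200_000.0, 290_000.0, 410_000.0, 500_000.0, 610_000.0],
})

# Feature matrix X: one column, one row per house -> shape (n_samples, 1)
space = np.array(housing_data["sqft_living"]).reshape(-1, 1)
# Target vector y: one value per house -> shape (n_samples,)
prices = np.array(housing_data["price"])

# Selecting with a list of columns keeps a 2D DataFrame, so no reshape is needed
space_alt = housing_data[["sqft_living"]].to_numpy()

print(space.shape)      # (5, 1)
print(prices.shape)     # (5,)
print(space_alt.shape)  # (5, 1)
```

Either form works as X; the reshape version mirrors the code above, while the double-bracket selection gets the same 2D array directly from pandas.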
Splitting data into train and test data sets:
space_train, space_test, prices_train, prices_test = train_test_split(space, prices, test_size=0.25)
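A small sketch of what train_test_split returns, on hypothetical data of 8 houses. With test_size=0.25 a quarter of the rows (here 2) are held out, and the features and prices are shuffled together so each price stays paired with its living space. The random_state argument is an addition not used in the original code; it fixes the shuffle so the split is reproducible between runs.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 8 houses, one feature each (not the housing data set)
space = np.arange(1000, 1800, 100).reshape(-1, 1)  # shape (8, 1)
prices = space.ravel() * 200.0 + 50_000.0            # shape (8,)

# Hold out 25% of the rows for testing; random_state makes the shuffle repeatable
space_train, space_test, prices_train, prices_test = train_test_split(
    space, prices, test_size=0.25, random_state=42
)

print(space_train.shape, space_test.shape)    # (6, 1) (2, 1)
print(prices_train.shape, prices_test.shape)  # (6,) (2,)
```

Because the rows are shuffled as pairs, every test price still equals 200 times its living space plus 50,000 in this toy data.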
# Training the model on the training data
regressor = LinearRegression()
regressor.fit(space_train, prices_train)
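LinearRegression.fit finds the slope and intercept that minimize the sum of squared errors (ordinary least squares). For a single feature this has a well-known closed form: slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)**2) and intercept = y_mean - slope * x_mean. The sketch below, on made-up numbers rather than the housing data, checks that the fitted coef_ and intercept_ match that formula.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data (not the housing data set)
space_train = np.array([[1000], [1500], [2000], [2500], [3000]], dtype=float)
prices_train = np.array([200_000, 290_000, 410_000, 500_000, 610_000], dtype=float)

regressor = LinearRegression()
regressor.fit(space_train, prices_train)

# Closed-form ordinary least squares for one feature
x = space_train.ravel()
y = prices_train
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

print("sklearn:    ", regressor.coef_[0], regressor.intercept_)
print("closed form:", slope, intercept)
```

In this toy example the model predicts price = slope * sqft_living + intercept; coef_ is an array with one entry per feature, which is why the code reads coef_[0].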
Now that the regressor has learned from the training data and formed a model, we will see how accurately it can predict the prices in the test data.
prices_pred = regressor.predict(space_test)
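The code above computes prices_pred but does not yet measure its accuracy. One option (an addition, not part of the original walkthrough) is to compare it with prices_test using scikit-learn's regression metrics. This sketch uses synthetic data generated with a fixed seed in place of the housing set, so the numbers it prints say nothing about the real data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical noisy data standing in for the housing set
rng = np.random.default_rng(0)
space = rng.uniform(500, 4000, size=200).reshape(-1, 1)
prices = 250.0 * space.ravel() + 20_000.0 + rng.normal(0, 50_000, size=200)

space_train, space_test, prices_train, prices_test = train_test_split(
    space, prices, test_size=0.25, random_state=0
)
regressor = LinearRegression().fit(space_train, prices_train)
prices_pred = regressor.predict(space_test)

# Compare the predictions with the held-out true prices
mae = mean_absolute_error(prices_test, prices_pred)           # average absolute error
rmse = np.sqrt(mean_squared_error(prices_test, prices_pred))  # root mean squared error
r2 = r2_score(prices_test, prices_pred)                       # share of variance explained

print(f"MAE:  {mae:,.0f}")
print(f"RMSE: {rmse:,.0f}")
print(f"R^2:  {r2:.3f}")
```

MAE and RMSE are in the same units as the price, while R^2 is unitless (1.0 is a perfect fit); regressor.score(space_test, prices_test) returns the same R^2 value.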
sns.set()
This overrides Matplotlib's default settings with Seaborn's theme, so the plots look more polished. (In recent Seaborn versions, sns.set_theme() is the preferred name for the same call.)
Regression line for the training data set:
plt.scatter(space_train, prices_train, color='blue')
plt.plot(space_train, regressor.predict(space_train), color='red')
plt.title('Regression Analysis - Training Data')
plt.xlabel('Living Space')
plt.ylabel('Price')
plt.show()
Visualizing the test data set:
plt.scatter(space_test, prices_test, color='blue')
plt.plot(space_train, regressor.predict(space_train), color='orange')
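The line is still computed from the training inputs (space_train), because it represents the model that was fitted to the training data; only the scattered points come from the test set.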
plt.title('Regression Analysis - Test Data')
plt.xlabel('Living Space')
plt.ylabel('Price')
plt.show()
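Since the line in both plots comes from the same fitted model, it can be drawn over any x values, not only space_train. A sketch of one alternative (an assumption, not the original code): evaluate the model on an evenly spaced grid that covers both the training and the test inputs, so the line also reaches test points outside the training range. It uses made-up data and Matplotlib's object-oriented Axes interface; the Agg backend line is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove it and call plt.show() to display
import numpy as np
from matplotlib import pyplot as plt
from sklearn.linear_model import LinearRegression

# Hypothetical train/test data (not the housing data set)
space_train = np.array([[1000], [1500], [2000], [2500], [3000]], dtype=float)
prices_train = np.array([200_000, 290_000, 410_000, 500_000, 610_000], dtype=float)
space_test = np.array([[1200], [2200], [3400]], dtype=float)
prices_test = np.array([240_000, 450_000, 690_000], dtype=float)

regressor = LinearRegression().fit(space_train, prices_train)

# The fitted line depends only on the model, so evaluate it on a grid
# spanning the smallest and largest living space in either set
x_min = min(space_train.min(), space_test.min())
x_max = max(space_train.max(), space_test.max())
grid = np.linspace(x_min, x_max, 100).reshape(-1, 1)

fig, ax = plt.subplots()
ax.scatter(space_test.ravel(), prices_test, color="blue", label="Test data")
ax.plot(grid.ravel(), regressor.predict(grid), color="orange", label="Fitted line")
ax.set_title("Regression Analysis - Test Data")
ax.set_xlabel("Living Space")
ax.set_ylabel("Price")
ax.legend()
```

Sorting the x values through the grid also keeps plt.plot from connecting points in shuffled order, which matters if the same approach is later used for a curved model.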
[Figures: resulting plots for the test data and the training data]