Simple Linear Regression - Predicting housing prices in Kansas City

In this problem, we split our data into training and test sets. We fit the model on the training data, then check how accurately it predicts the prices in the test data. The data set for this problem can be downloaded from here.

Python Code and Explanation:

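import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression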

Seaborn is built on top of the matplotlib library and gives plots a more polished default look.

Reading data and extracting space and prices:

# Reading Data
housing_data = pd.read_excel(r'PATH')

space = np.array(housing_data['sqft_living']).reshape(-1, 1)
prices = np.array(housing_data['price'])

As a rule of thumb in sklearn interfaces, X should be a 2D array and y should be a 1D array. Calling .reshape(len(a), 1) or .reshape(-1, 1) turns a 1D array into a 2D array by making each of its elements a row of its own.
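
To make the reshape step concrete, here is a minimal sketch on a tiny made-up array. The three values are illustrative only and are not taken from the housing data set.

```python
import numpy as np

# Hypothetical living-space values, used only to illustrate the shapes
a = np.array([1180, 2570, 770])   # 1D array with 3 elements
X = a.reshape(-1, 1)              # 2D array: 3 rows, 1 column

print(a.shape)      # (3,)
print(X.shape)      # (3, 1)
print(X.tolist())   # [[1180], [2570], [770]]
```

The -1 asks NumPy to infer the number of rows from the array length, so the same call works for any 1D input.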

Splitting data into train and test data sets:

space_train, space_test, prices_train, prices_test = train_test_split(space, prices, test_size=0.25)
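
Because train_test_split shuffles the rows at random, each run produces a different split. The sketch below, on a small synthetic array that stands in for the real data, shows how test_size=0.25 divides the samples and how the optional random_state parameter makes the split repeatable. The seed value 0 is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 8 samples, price is an exact multiple of space
space = np.arange(100, 900, 100).reshape(-1, 1)
prices = space.ravel() * 200.0

# random_state fixes the shuffle so the same rows land in each set every run
space_train, space_test, prices_train, prices_test = train_test_split(
    space, prices, test_size=0.25, random_state=0
)

print(space_train.shape, space_test.shape)   # (6, 1) (2, 1)
```

With 8 samples, a quarter of them (2) go to the test set and the remaining 6 are used for training.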

# Training the model
regressor = LinearRegression()
regressor.fit(space_train, prices_train)
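
After fitting, the learnt line can be read back from the model. The sketch below fits a LinearRegression on synthetic points that lie exactly on the line price = 50000 + 200 * space (numbers chosen only for illustration) and recovers the slope and intercept through the coef_ and intercept_ attributes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic points on an exact line; not the real housing data
space = np.array([500, 1000, 1500, 2000]).reshape(-1, 1)
prices = 50_000 + 200 * space.ravel()

regressor = LinearRegression()
regressor.fit(space, prices)

slope = regressor.coef_[0]          # price change per unit of living space
intercept = regressor.intercept_    # predicted price when space is 0

print(round(slope, 2), round(intercept, 2))
```

For simple linear regression every prediction is intercept + slope * space, so these two numbers describe the whole model.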

Now that the regressor has learnt from the training data and formed a model, we will see how accurately it can predict the test data:

prices_pred = regressor.predict(space_test)
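
One way to put a number on "how accurately" is to compare the predictions with the true test prices using standard metrics. The sketch below is one possible approach, not part of the original walkthrough. It uses r2_score and mean_absolute_error from sklearn.metrics on synthetic data, where the noise level and seeds are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: a linear trend plus random noise
rng = np.random.default_rng(0)
space = rng.uniform(500, 4000, size=200).reshape(-1, 1)
prices = 50_000 + 200 * space.ravel() + rng.normal(0, 20_000, size=200)

space_train, space_test, prices_train, prices_test = train_test_split(
    space, prices, test_size=0.25, random_state=0
)

regressor = LinearRegression().fit(space_train, prices_train)
prices_pred = regressor.predict(space_test)

r2 = r2_score(prices_test, prices_pred)               # 1.0 is a perfect fit
mae = mean_absolute_error(prices_test, prices_pred)   # average error in price units

print(f"R^2: {r2:.3f}, MAE: {mae:.0f}")
```

On real housing data, living space alone usually explains only part of the price, so a noticeably lower R^2 than on this clean synthetic example would not be surprising.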

Overriding the default Matplotlib settings with Seaborn to make the plots look more elegant:

sns.set()

Regression line of training data set:

plt.scatter(space_train, prices_train, color='blue')
plt.plot(space_train, regressor.predict(space_train), color='red')
plt.title('Regression Analysis - Training Data')
plt.xlabel('Living Space')
plt.ylabel('Price')
plt.show()

Visualizing Test Dataset:

plt.scatter(space_test, prices_test, color='blue')
plt.plot(space_train, regressor.predict(space_train), color='orange')

The line is still drawn from the training inputs because the model was fitted on that data; it is the same fitted line, now shown against the test points.

plt.title('Regression Analysis - Test Data')
plt.xlabel('Living Space')
plt.ylabel('Price')
plt.show()

Test Data:

[Figure: Scatter Plot of Test Data]

Training Data:

[Figure: Scatter Plot of Training Data]