# Simple Linear Regression - Predicting housing prices in Kansas City

In this problem, we split our data into training and test sets. We fit the model on the training set, then check how accurately it predicts the prices in the test set. The data set for this problem can be downloaded from __here__.

Python Code and Explanation:

```
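import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split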
```

Seaborn is built on top of the Matplotlib library and gives plots a more polished default style.

Reading data and extracting space and prices:

```
# Reading Data
housing_data = pd.read_excel(r'PATH')
space = np.array(housing_data['sqft_living']).reshape(-1, 1)
prices = np.array(housing_data['price'])
```

As a rule of thumb in sklearn interfaces, X should be a 2D array of shape (n_samples, n_features) and y should be a 1D array. Calling .reshape(len(a), 1) or .reshape(-1, 1) on a 1D array a turns it into a 2D array with a single column, so each element becomes a row of its own.
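A minimal sketch of what that reshape does, using three made-up living-space values (these numbers are illustrative and not taken from the data set):

```python
import numpy as np

# Hypothetical living-space values, for illustration only.
space_1d = np.array([1180, 2570, 770])

# -1 tells NumPy to infer the number of rows; 1 fixes a single column.
space_2d = space_1d.reshape(-1, 1)

print(space_1d.shape)     # (3,)
print(space_2d.shape)     # (3, 1)
print(space_2d.tolist())  # [[1180], [2570], [770]]
```

The 2D version has one row per house and one column per feature, which is the layout sklearn estimators expect for X.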

Splitting data into train and test data sets:

```
space_train, space_test, prices_train, prices_test = train_test_split(space, prices, test_size=0.25)
# Training the model on the training data
regressor = LinearRegression()
regressor.fit(space_train, prices_train)
```
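The split and fit can be tried end to end on synthetic data. The sketch below is an illustration under assumptions, not the housing data itself: the prices are generated from a made-up line (50000 + 200 per square foot), and `random_state=0` is added so the shuffle is reproducible, which the original call does not do.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (not the housing file): price = 50000 + 200 * sqft.
space = np.arange(500, 2500, 100).reshape(-1, 1)  # 20 samples, 1 feature
prices = 50_000 + 200 * space.ravel()

# random_state fixes the shuffle so the split is the same on every run.
space_train, space_test, prices_train, prices_test = train_test_split(
    space, prices, test_size=0.25, random_state=0
)
print(space_train.shape, space_test.shape)  # (15, 1) (5, 1)

regressor = LinearRegression()
regressor.fit(space_train, prices_train)

# The fitted line is: price = intercept_ + coef_[0] * space
slope = regressor.coef_[0]
intercept = regressor.intercept_
print(slope, intercept)
```

Because the synthetic prices lie exactly on a line, the fitted slope and intercept should come out very close to 200 and 50000. On real housing data the points scatter around the line, so the fitted values depend on which rows land in the training set.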

Now that the regressor has learned from the training data and formed a model, we will see how accurately it can predict the test data.

```
prices_pred = regressor.predict(space_test)
sns.set()
```

sns.set() overrides the default Matplotlib settings with Seaborn's, so the plots look more elegant.

Regression line of training data set:

```
plt.scatter(space_train, prices_train, color='blue')
plt.plot(space_train, regressor.predict(space_train), color='red')
plt.title('Regression Analysis - Training Data')
plt.xlabel('Living Space')
plt.ylabel('Price')
plt.show()
```

Visualizing Test Dataset:

```
plt.scatter(space_test, prices_test, color='blue')
plt.plot(space_train, regressor.predict(space_train), color='orange')
```

We draw the regression line from the training data here as well, because that is the line the model learned; the test points are then compared against it.

```
plt.title('Regression Analysis - Test Data')
plt.xlabel('Living Space')
plt.ylabel('Price')
plt.show()
```
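The plots give a visual check. To put a number on how accurately the test prices are predicted, two common measures are the root mean squared error (RMSE) and the coefficient of determination (R²). The sketch below computes both with plain NumPy; the four price pairs are hypothetical values chosen for easy arithmetic, not results from the housing data.

```python
import numpy as np

# Hypothetical true and predicted prices, for illustration only.
prices_test = np.array([200_000.0, 300_000.0, 400_000.0, 500_000.0])
prices_pred = np.array([210_000.0, 290_000.0, 420_000.0, 480_000.0])

residuals = prices_test - prices_pred

# RMSE: typical size of a prediction error, in the same units as price.
rmse = np.sqrt(np.mean(residuals ** 2))

# R^2: share of the variance in the true prices explained by the predictions.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((prices_test - prices_test.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(round(rmse, 1))  # 15811.4
print(round(r2, 4))    # 0.98
```

With the fitted model from this tutorial, the same R² value should also be available from regressor.score(space_test, prices_test), and sklearn.metrics offers r2_score and mean_squared_error for the same purpose.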

[Figure: Regression Analysis - Test Data]

[Figure: Regression Analysis - Training Data]