223 lines
6.2 KiB
Plaintext
223 lines
6.2 KiB
Plaintext
{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Optional Lab: Linear Regression using Scikit-Learn"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"There is an open-source, commercially usable machine learning toolkit called [scikit-learn](https://scikit-learn.org/stable/index.html). This toolkit contains implementations of many of the algorithms that you will work with in this course.\n",
|
|
"\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Goals\n",
|
|
"In this lab you will:\n",
|
|
"- Utilize scikit-learn to implement linear regression using Gradient Descent"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Tools\n",
|
|
"You will utilize functions from scikit-learn as well as matplotlib and NumPy. "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import numpy as np\n",
|
|
"import matplotlib.pyplot as plt\n",
|
|
"from sklearn.linear_model import SGDRegressor\n",
|
|
"from sklearn.preprocessing import StandardScaler\n",
|
|
"from lab_utils_multi import load_house_data\n",
|
|
"from lab_utils_common import dlc\n",
|
|
"np.set_printoptions(precision=2)\n",
|
|
"plt.style.use('./deeplearning.mplstyle')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Gradient Descent\n",
|
|
"Scikit-learn has a gradient descent regression model [sklearn.linear_model.SGDRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html#examples-using-sklearn-linear-model-sgdregressor). Like your previous implementation of gradient descent, this model performs best with normalized inputs. [sklearn.preprocessing.StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler) will perform z-score normalization as in a previous lab. Here it is referred to as 'standard score'."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Load the data set"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"X_train, y_train = load_house_data()\n",
|
|
"X_features = ['size(sqft)','bedrooms','floors','age']"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Scale/normalize the training data"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"scaler = StandardScaler()\n",
|
|
"X_norm = scaler.fit_transform(X_train)\n",
|
|
"print(f\"Peak to Peak range by column in Raw X:{np.ptp(X_train,axis=0)}\") \n",
|
|
"print(f\"Peak to Peak range by column in Normalized X:{np.ptp(X_norm,axis=0)}\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Create and fit the regression model"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"sgdr = SGDRegressor(max_iter=1000)\n",
|
|
"sgdr.fit(X_norm, y_train)\n",
|
|
"print(sgdr)\n",
|
|
"print(f\"number of iterations completed: {sgdr.n_iter_}, number of weight updates: {sgdr.t_}\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### View parameters\n",
|
|
"Note, the parameters are associated with the *normalized* input data. The fit parameters are very close to those found in the previous lab with this data."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"b_norm = sgdr.intercept_\n",
|
|
"w_norm = sgdr.coef_\n",
|
|
"print(f\"model parameters: w: {w_norm}, b:{b_norm}\")\n",
|
|
"print( \"model parameters from previous lab: w: [110.56 -21.27 -32.71 -37.97], b: 363.16\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Make predictions\n",
|
|
"Predict the targets of the training data. Use both the `predict` routine and compute using $w$ and $b$."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# make a prediction using sgdr.predict()\n",
|
|
"y_pred_sgd = sgdr.predict(X_norm)\n",
|
|
"# make a prediction using w,b. \n",
|
|
"y_pred = np.dot(X_norm, w_norm) + b_norm \n",
|
|
"print(f\"prediction using np.dot() and sgdr.predict match: {(y_pred == y_pred_sgd).all()}\")\n",
|
|
"\n",
|
|
"print(f\"Prediction on training set:\\n{y_pred[:4]}\" )\n",
|
|
"print(f\"Target values \\n{y_train[:4]}\")"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Plot Results\n",
|
|
"Let's plot the predictions versus the target values."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# plot predictions and targets vs original features \n",
|
|
"fig,ax=plt.subplots(1,4,figsize=(12,3),sharey=True)\n",
|
|
"for i in range(len(ax)):\n",
|
|
" ax[i].scatter(X_train[:,i],y_train, label = 'target')\n",
|
|
" ax[i].set_xlabel(X_features[i])\n",
|
|
" ax[i].scatter(X_train[:,i],y_pred,color=dlc[\"dlorange\"], label = 'predict')\n",
|
|
"ax[0].set_ylabel(\"Price\"); ax[0].legend();\n",
|
|
"fig.suptitle(\"target versus prediction using z-score normalized model\")\n",
|
|
"plt.show()"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Congratulations!\n",
|
|
"In this lab you:\n",
|
|
"- utilized an open-source machine learning toolkit, scikit-learn\n",
|
|
"- implemented linear regression using gradient descent and feature normalization from that toolkit"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.7.6"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|