{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Optional Lab: Linear Regression using Scikit-Learn"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There is an open-source, commercially usable machine learning toolkit called [scikit-learn](https://scikit-learn.org/stable/index.html). This toolkit contains implementations of many of the algorithms that you will work with in this course.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Goals\n",
"In this lab you will:\n",
"- Utilize scikit-learn to implement linear regression using gradient descent"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tools\n",
"You will utilize functions from scikit-learn as well as matplotlib and NumPy. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.linear_model import SGDRegressor\n",
"from sklearn.preprocessing import StandardScaler\n",
"from lab_utils_multi import load_house_data\n",
"from lab_utils_common import dlc\n",
"np.set_printoptions(precision=2)\n",
"plt.style.use('./deeplearning.mplstyle')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Gradient Descent\n",
"Scikit-learn has a gradient descent regression model [sklearn.linear_model.SGDRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html#examples-using-sklearn-linear-model-sgdregressor). Like your previous implementation of gradient descent, this model performs best with normalized inputs. [sklearn.preprocessing.StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler) will perform z-score normalization as in a previous lab. Here it is referred to as 'standard score'."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load the data set"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_train, y_train = load_house_data()\n",
"X_features = ['size(sqft)','bedrooms','floors','age']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scale/normalize the training data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"scaler = StandardScaler()\n",
"X_norm = scaler.fit_transform(X_train)\n",
"print(f\"Peak to Peak range by column in Raw X:{np.ptp(X_train,axis=0)}\") \n",
"print(f\"Peak to Peak range by column in Normalized X:{np.ptp(X_norm,axis=0)}\")"
]
},
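{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (a sketch added for illustration, not part of the original lab), z-score normalization can also be computed by hand with NumPy and compared to the `StandardScaler` result. `StandardScaler` uses the population standard deviation (ddof=0), which matches `np.std`'s default:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# manual z-score normalization: (x - mean) / std, computed column-wise\n",
"X_manual = (X_train - np.mean(X_train, axis=0)) / np.std(X_train, axis=0)\n",
"print(f\"manual and StandardScaler results match: {np.allclose(X_manual, X_norm)}\")"
]
},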
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create and fit the regression model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sgdr = SGDRegressor(max_iter=1000)\n",
"sgdr.fit(X_norm, y_train)\n",
"print(sgdr)\n",
"print(f\"number of iterations completed: {sgdr.n_iter_}, number of weight updates: {sgdr.t_}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### View parameters\n",
"Note that the parameters are associated with the *normalized* input data. The fitted parameters are very close to those found in the previous lab with this data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"b_norm = sgdr.intercept_\n",
"w_norm = sgdr.coef_\n",
"print(f\"model parameters: w: {w_norm}, b:{b_norm}\")\n",
"print( \"model parameters from previous lab: w: [110.56 -21.27 -32.71 -37.97], b: 363.16\")"
]
},
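{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch added for illustration (not part of the original lab): `StandardScaler` stores the per-column mean and standard deviation it learned during `fit` in `scaler.mean_` and `scaler.scale_`. Since $x_{norm} = (x - \\mu)/\\sigma$, the fitted parameters can be mapped back to the raw feature scale with $w_{raw} = w_{norm}/\\sigma$ and $b_{raw} = b_{norm} - \\sum_j w_{norm,j}\\,\\mu_j/\\sigma_j$:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# recover parameters on the original (unnormalized) feature scale\n",
"w_raw = w_norm / scaler.scale_\n",
"b_raw = b_norm - np.sum(w_norm * scaler.mean_ / scaler.scale_)\n",
"print(f\"raw-scale parameters: w: {w_raw}, b: {b_raw}\")"
]
},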
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Make predictions\n",
"Predict the targets of the training data, both with the `predict` routine and by computing directly with $w$ and $b$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# make a prediction using sgdr.predict()\n",
"y_pred_sgd = sgdr.predict(X_norm)\n",
"# make a prediction using w,b. \n",
"y_pred = np.dot(X_norm, w_norm) + b_norm \n",
"# exact float equality (==) can be brittle between separately computed results; np.allclose is the idiomatic check\n",
"print(f\"prediction using np.dot() and sgdr.predict match: {np.allclose(y_pred, y_pred_sgd)}\")\n",
"\n",
"print(f\"Prediction on training set:\\n{y_pred[:4]}\" )\n",
"print(f\"Target values \\n{y_train[:4]}\")"
]
},
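{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a further sketch (the house below is a made-up example, not from the data set), a prediction for a new house must first pass through the *same* scaler that was fit on the training data, via `scaler.transform`; fitting a new scaler on the single example would use the wrong mean and scale:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# hypothetical house: 1200 sqft, 3 bedrooms, 1 floor, 40 years old\n",
"x_house = np.array([[1200, 3, 1, 40]])\n",
"x_house_norm = scaler.transform(x_house)   # reuse the training-set mean and scale\n",
"print(f\"predicted price (same units as y_train): {sgdr.predict(x_house_norm)[0]:0.2f}\")"
]
},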
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Plot results\n",
"Let's plot the predictions versus the target values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# plot predictions and targets vs original features \n",
"fig,ax=plt.subplots(1,4,figsize=(12,3),sharey=True)\n",
"for i in range(len(ax)):\n",
" ax[i].scatter(X_train[:,i],y_train, label = 'target')\n",
" ax[i].set_xlabel(X_features[i])\n",
" ax[i].scatter(X_train[:,i],y_pred,color=dlc[\"dlorange\"], label = 'predict')\n",
"ax[0].set_ylabel(\"Price\"); ax[0].legend();\n",
"fig.suptitle(\"target versus prediction using z-score normalized model\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Congratulations!\n",
"In this lab you:\n",
"- utilized an open-source machine learning toolkit, scikit-learn\n",
"- implemented linear regression using gradient descent and feature normalization from that toolkit"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}