{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Optional Lab: Linear Regression using Scikit-Learn"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There is an open-source, commercially usable machine learning toolkit called [scikit-learn](https://scikit-learn.org/stable/index.html). This toolkit contains implementations of many of the algorithms that you will work with in this course.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Goals\n",
"In this lab you will:\n",
"- Utilize scikit-learn to implement linear regression using gradient descent"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Tools\n",
"You will utilize functions from scikit-learn as well as matplotlib and NumPy. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from sklearn.linear_model import SGDRegressor\n",
"from sklearn.preprocessing import StandardScaler\n",
"from lab_utils_multi import load_house_data\n",
"from lab_utils_common import dlc\n",
"np.set_printoptions(precision=2)\n",
"plt.style.use('./deeplearning.mplstyle')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Gradient Descent\n",
"Scikit-learn has a gradient descent regression model [sklearn.linear_model.SGDRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html#examples-using-sklearn-linear-model-sgdregressor). Like your previous implementation of gradient descent, this model performs best with normalized inputs. [sklearn.preprocessing.StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler) will perform z-score normalization as in a previous lab. Here it is referred to as 'standard score'."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load the data set"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"X_train, y_train = load_house_data()\n",
"X_features = ['size(sqft)','bedrooms','floors','age']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scale/normalize the training data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"scaler = StandardScaler()\n",
"X_norm = scaler.fit_transform(X_train)\n",
"print(f\"Peak to Peak range by column in Raw X:{np.ptp(X_train,axis=0)}\") \n",
"print(f\"Peak to Peak range by column in Normalized X:{np.ptp(X_norm,axis=0)}\")"
]
},
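{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check (a sketch added for illustration, not part of the original lab), z-score normalization can also be computed by hand with NumPy and compared to the `StandardScaler` result. `StandardScaler` uses the population standard deviation (ddof=0), which matches `np.std`'s default:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# manual z-score normalization: (x - mean) / std, computed column-wise\n",
"X_manual = (X_train - np.mean(X_train, axis=0)) / np.std(X_train, axis=0)\n",
"print(f\"manual and StandardScaler results match: {np.allclose(X_manual, X_norm)}\")"
]
},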
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create and fit the regression model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sgdr = SGDRegressor(max_iter=1000)\n",
"sgdr.fit(X_norm, y_train)\n",
"print(sgdr)\n",
"print(f\"number of iterations completed: {sgdr.n_iter_}, number of weight updates: {sgdr.t_}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### View parameters\n",
"Note that the parameters are associated with the *normalized* input data. The fitted parameters are very close to those found in the previous lab with this data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"b_norm = sgdr.intercept_\n",
"w_norm = sgdr.coef_\n",
"print(f\"model parameters: w: {w_norm}, b:{b_norm}\")\n",
"print( \"model parameters from previous lab: w: [110.56 -21.27 -32.71 -37.97], b: 363.16\")"
]
},
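{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sketch added for illustration (not part of the original lab): `StandardScaler` stores the per-column mean and standard deviation it learned during `fit` in `scaler.mean_` and `scaler.scale_`. Since $x_{norm} = (x - \\mu)/\\sigma$, the fitted parameters can be mapped back to the raw feature scale with $w_{raw} = w_{norm}/\\sigma$ and $b_{raw} = b_{norm} - \\sum_j w_{norm,j}\\,\\mu_j/\\sigma_j$:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# recover parameters on the original (unnormalized) feature scale\n",
"w_raw = w_norm / scaler.scale_\n",
"b_raw = b_norm - np.sum(w_norm * scaler.mean_ / scaler.scale_)\n",
"print(f\"raw-scale parameters: w: {w_raw}, b: {b_raw}\")"
]
},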
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Make predictions\n",
"Predict the targets of the training data, both with the `predict` routine and by computing directly with $w$ and $b$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# make a prediction using sgdr.predict()\n",
"y_pred_sgd = sgdr.predict(X_norm)\n",
"# make a prediction using w,b. \n",
"y_pred = np.dot(X_norm, w_norm) + b_norm \n",
"# exact float equality (==) can be brittle between separately computed results; np.allclose is the idiomatic check\n",
"print(f\"prediction using np.dot() and sgdr.predict match: {np.allclose(y_pred, y_pred_sgd)}\")\n",
"\n",
"print(f\"Prediction on training set:\\n{y_pred[:4]}\" )\n",
"print(f\"Target values \\n{y_train[:4]}\")"
]
},
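{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a further sketch (the house below is a made-up example, not from the data set), a prediction for a new house must first pass through the *same* scaler that was fit on the training data, via `scaler.transform`; fitting a new scaler on the single example would use the wrong mean and scale:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# hypothetical house: 1200 sqft, 3 bedrooms, 1 floor, 40 years old\n",
"x_house = np.array([[1200, 3, 1, 40]])\n",
"x_house_norm = scaler.transform(x_house)   # reuse the training-set mean and scale\n",
"print(f\"predicted price (same units as y_train): {sgdr.predict(x_house_norm)[0]:0.2f}\")"
]
},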
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Plot results\n",
"Let's plot the predictions versus the target values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# plot predictions and targets vs original features \n",
"fig,ax=plt.subplots(1,4,figsize=(12,3),sharey=True)\n",
"for i in range(len(ax)):\n",
" ax[i].scatter(X_train[:,i],y_train, label = 'target')\n",
" ax[i].set_xlabel(X_features[i])\n",
" ax[i].scatter(X_train[:,i],y_pred,color=dlc[\"dlorange\"], label = 'predict')\n",
"ax[0].set_ylabel(\"Price\"); ax[0].legend();\n",
"fig.suptitle(\"target versus prediction using z-score normalized model\")\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Congratulations!\n",
"In this lab you:\n",
"- utilized an open-source machine learning toolkit, scikit-learn\n",
"- implemented linear regression using gradient descent and feature normalization from that toolkit"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}