{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Optional Lab: Linear Regression using Scikit-Learn"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There is an open-source, commercially usable machine learning toolkit called [scikit-learn](https://scikit-learn.org/stable/index.html). This toolkit contains implementations of many of the algorithms that you will work with in this course.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Goals\n",
    "In this lab you will:\n",
    "- Utilize  scikit-learn to implement linear regression using Gradient Descent"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Tools\n",
    "You will utilize functions from scikit-learn as well as matplotlib and NumPy. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "from sklearn.linear_model import SGDRegressor\n",
    "from sklearn.preprocessing import StandardScaler\n",
    "from lab_utils_multi import  load_house_data\n",
    "from lab_utils_common import dlc\n",
    "np.set_printoptions(precision=2)\n",
    "plt.style.use('./deeplearning.mplstyle')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Gradient Descent\n",
    "Scikit-learn has a gradient descent regression model [sklearn.linear_model.SGDRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html#examples-using-sklearn-linear-model-sgdregressor).  Like your previous implementation of gradient descent, this model performs best with normalized inputs. [sklearn.preprocessing.StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler) will perform z-score normalization as in a previous lab. Here it is referred to as 'standard score'."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load the data set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train, y_train = load_house_data()\n",
    "X_features = ['size(sqft)','bedrooms','floors','age']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Scale/normalize the training data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "scaler = StandardScaler()\n",
    "X_norm = scaler.fit_transform(X_train)\n",
    "print(f\"Peak to Peak range by column in Raw        X:{np.ptp(X_train,axis=0)}\")   \n",
    "print(f\"Peak to Peak range by column in Normalized X:{np.ptp(X_norm,axis=0)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create and fit the regression model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sgdr = SGDRegressor(max_iter=1000)\n",
    "sgdr.fit(X_norm, y_train)\n",
    "print(sgdr)\n",
    "print(f\"number of iterations completed: {sgdr.n_iter_}, number of weight updates: {sgdr.t_}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### View parameters\n",
    "Note, the parameters are associated with the *normalized* input data. The fit parameters are very close to those found in the previous lab with this data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "b_norm = sgdr.intercept_\n",
    "w_norm = sgdr.coef_\n",
    "print(f\"model parameters:                   w: {w_norm}, b:{b_norm}\")\n",
    "print( \"model parameters from previous lab: w: [110.56 -21.27 -32.71 -37.97], b: 363.16\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Make predictions\n",
    "Predict the targets of the training data. Use both the `predict` routine and compute using $w$ and $b$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# make a prediction using sgdr.predict()\n",
    "y_pred_sgd = sgdr.predict(X_norm)\n",
    "# make a prediction using w,b. \n",
    "y_pred = np.dot(X_norm, w_norm) + b_norm  \n",
    "print(f\"prediction using np.dot() and sgdr.predict match: {(y_pred == y_pred_sgd).all()}\")\n",
    "\n",
    "print(f\"Prediction on training set:\\n{y_pred[:4]}\" )\n",
    "print(f\"Target values \\n{y_train[:4]}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Plot Results\n",
    "Let's plot the predictions versus the target values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# plot predictions and targets vs original features    \n",
    "fig,ax=plt.subplots(1,4,figsize=(12,3),sharey=True)\n",
    "for i in range(len(ax)):\n",
    "    ax[i].scatter(X_train[:,i],y_train, label = 'target')\n",
    "    ax[i].set_xlabel(X_features[i])\n",
    "    ax[i].scatter(X_train[:,i],y_pred,color=dlc[\"dlorange\"], label = 'predict')\n",
    "ax[0].set_ylabel(\"Price\"); ax[0].legend();\n",
    "fig.suptitle(\"target versus prediction using z-score normalized model\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Congratulations!\n",
    "In this lab you:\n",
    "- utilized an open-source machine learning toolkit, scikit-learn\n",
    "- implemented linear regression using gradient descent and feature normalization from that toolkit"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}