{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Optional Lab: Linear Regression using Scikit-Learn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is an open-source, commercially usable machine learning toolkit called [scikit-learn](https://scikit-learn.org/stable/index.html). This toolkit contains implementations of many of the algorithms that you will work with in this course.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Goals\n", "In this lab you will:\n", "- Use scikit-learn to implement linear regression using gradient descent" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tools\n", "You will use functions from scikit-learn as well as matplotlib and NumPy." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from sklearn.linear_model import SGDRegressor\n", "from sklearn.preprocessing import StandardScaler\n", "from lab_utils_multi import load_house_data\n", "from lab_utils_common import dlc\n", "np.set_printoptions(precision=2)\n", "plt.style.use('./deeplearning.mplstyle')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Gradient Descent\n", "Scikit-learn provides a linear regression model trained with stochastic gradient descent, [sklearn.linear_model.SGDRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html#examples-using-sklearn-linear-model-sgdregressor). Like your previous implementation of gradient descent, this model performs best with normalized inputs. [sklearn.preprocessing.StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler) performs z-score normalization, as in a previous lab. Here it is referred to as the 'standard score'."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load the data set" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X_train, y_train = load_house_data()\n", "X_features = ['size(sqft)','bedrooms','floors','age']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Scale/normalize the training data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "scaler = StandardScaler()\n", "X_norm = scaler.fit_transform(X_train)\n", "print(f\"Peak to Peak range by column in Raw X: {np.ptp(X_train,axis=0)}\")\n", "print(f\"Peak to Peak range by column in Normalized X: {np.ptp(X_norm,axis=0)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create and fit the regression model" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sgdr = SGDRegressor(max_iter=1000)\n", "sgdr.fit(X_norm, y_train)\n", "print(sgdr)\n", "print(f\"number of iterations completed: {sgdr.n_iter_}, number of weight updates: {sgdr.t_}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View parameters\n", "Note that the parameters are associated with the *normalized* input data. The fitted parameters should be very close to those found in the previous lab with this data." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "b_norm = sgdr.intercept_\n", "w_norm = sgdr.coef_\n", "print(f\"model parameters: w: {w_norm}, b: {b_norm}\")\n", "print(\"model parameters from previous lab: w: [110.56 -21.27 -32.71 -37.97], b: 363.16\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make predictions\n", "Predict the targets of the training data in two ways: with the `predict` routine and by computing them directly from $w$ and $b$."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# make a prediction using sgdr.predict()\n", "y_pred_sgd = sgdr.predict(X_norm)\n", "# make a prediction directly from w and b\n", "y_pred = np.dot(X_norm, w_norm) + b_norm\n", "# compare with a floating-point tolerance rather than exact equality\n", "print(f\"prediction using np.dot() and sgdr.predict match: {np.allclose(y_pred, y_pred_sgd)}\")\n", "\n", "print(f\"Prediction on training set:\\n{y_pred[:4]}\")\n", "print(f\"Target values:\\n{y_train[:4]}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plot Results\n", "Let's plot the predictions versus the target values." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# plot predictions and targets vs original features\n", "fig,ax=plt.subplots(1,4,figsize=(12,3),sharey=True)\n", "for i in range(len(ax)):\n", "    ax[i].scatter(X_train[:,i],y_train, label = 'target')\n", "    ax[i].set_xlabel(X_features[i])\n", "    ax[i].scatter(X_train[:,i],y_pred,color=dlc[\"dlorange\"], label = 'predict')\n", "ax[0].set_ylabel(\"Price\"); ax[0].legend();\n", "fig.suptitle(\"target versus prediction using z-score normalized model\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Congratulations!\n", "In this lab you:\n", "- utilized an open-source machine learning toolkit, scikit-learn\n", "- implemented linear regression using gradient descent and feature normalization from that toolkit" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 5 }