{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Optional Lab: Linear Regression using Scikit-Learn" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is an open-source, commercially usable machine learning toolkit called [scikit-learn](https://scikit-learn.org/stable/index.html). This toolkit contains implementations of many of the algorithms that you will work with in this course.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Goals\n", "In this lab you will:\n", "- Utilize scikit-learn to implement linear regression using a close form solution based on the normal equation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tools\n", "You will utilize functions from scikit-learn as well as matplotlib and NumPy. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from sklearn.linear_model import LinearRegression\n", "from lab_utils_multi import load_house_data\n", "plt.style.use('./deeplearning.mplstyle')\n", "np.set_printoptions(precision=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "# Linear Regression, closed-form solution\n", "Scikit-learn has the [linear regression model](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression) which implements a closed-form linear regression.\n", "\n", "Let's use the data from the early labs - a house with 1000 square feet sold for \\\\$300,000 and a house with 2000 square feet sold for \\\\$500,000.\n", "\n", "| Size (1000 sqft) | Price (1000s of dollars) |\n", "| ----------------| ------------------------ |\n", "| 1 | 300 |\n", "| 2 | 500 |\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Load the data set" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "X_train = np.array([1.0, 2.0]) #features\n", "y_train = np.array([300, 500]) #target value" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create and fit the model\n", "The code below performs regression using scikit-learn. \n", "The first step creates a regression object. \n", "The second step utilizes one of the methods associated with the object, `fit`. This performs regression, fitting the parameters to the input data. The toolkit expects a two-dimensional X matrix." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "linear_model = LinearRegression()\n", "#X must be a 2-D Matrix\n", "linear_model.fit(X_train.reshape(-1, 1), y_train) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### View Parameters \n", "The $\\mathbf{w}$ and $\\mathbf{b}$ parameters are referred to as 'coefficients' and 'intercept' in scikit-learn." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "b = linear_model.intercept_\n", "w = linear_model.coef_\n", "print(f\"w = {w:}, b = {b:0.2f}\")\n", "print(f\"'manual' prediction: f_wb = wx+b : {1200*w + b}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Make Predictions\n", "\n", "Calling the `predict` function generates predictions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_pred = linear_model.predict(X_train.reshape(-1, 1))\n", "\n", "print(\"Prediction on training set:\", y_pred)\n", "\n", "X_test = np.array([[1200]])\n", "print(f\"Prediction for 1200 sqft house: ${linear_model.predict(X_test)[0]:0.2f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Second Example\n", "The second example is from an earlier lab with multiple features. The final parameter values and predictions are very close to the results from the un-normalized 'long-run' from that lab. That un-normalized run took hours to produce results, while this is nearly instantaneous. The closed-form solution work well on smaller data sets such as these but can be computationally demanding on larger data sets. \n", ">The closed-form solution does not require normalization." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# load the dataset\n", "X_train, y_train = load_house_data()\n", "X_features = ['size(sqft)','bedrooms','floors','age']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "linear_model = LinearRegression()\n", "linear_model.fit(X_train, y_train) " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "b = linear_model.intercept_\n", "w = linear_model.coef_\n", "print(f\"w = {w:}, b = {b:0.2f}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(f\"Prediction on training set:\\n {linear_model.predict(X_train)[:4]}\" )\n", "print(f\"prediction using w,b:\\n {(X_train @ w + b)[:4]}\")\n", "print(f\"Target values \\n {y_train[:4]}\")\n", "\n", "x_house = np.array([1200, 3,1, 40]).reshape(-1,4)\n", "x_house_predict = linear_model.predict(x_house)[0]\n", "print(f\" predicted price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old = ${x_house_predict*1000:0.2f}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Congratulations!\n", "In this lab you:\n", "- utilized an open-source machine learning toolkit, scikit-learn\n", "- implemented linear regression using a close-form solution from that toolkit" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 5 }