first commit
159
C1_W1_Lab01_Python_Jupyter_Soln.ipynb
Normal file
@ -0,0 +1,159 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Brief Introduction to Python and Jupyter Notebooks\n",
|
||||
"Welcome to the first optional lab! \n",
|
||||
"Optional labs are available to:\n",
|
||||
"- provide information - like this notebook\n",
|
||||
"- reinforce lecture material with hands-on examples\n",
|
||||
"- provide working examples of routines used in the graded labs"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Goals\n",
|
||||
"In this lab, you will:\n",
|
||||
"- Get a brief introduction to Jupyter notebooks\n",
|
||||
"- Take a tour of Jupyter notebooks\n",
|
||||
"- Learn the difference between markdown cells and code cells\n",
|
||||
"- Practice some basic python\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The easiest way to become familiar with Jupyter notebooks is to take the tour available above in the Help menu:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<figure>\n",
|
||||
" <center> <img src=\"./images/C1W1L1_Tour.PNG\" alt='missing' width=\"400\" ><center/>\n",
|
||||
"<figure/>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Jupyter notebooks have two types of cells that are used in this course. Cells such as this which contain documentation called `Markdown Cells`. The name is derived from the simple formatting language used in the cells. You will not be required to produce markdown cells. Its useful to understand the `cell pulldown` shown in graphic below. Occasionally, a cell will end up in the wrong mode and you may need to restore it to the right state:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<figure>\n",
|
||||
" <img src=\"./images/C1W1L1_Markdown.PNG\" alt='missing' width=\"400\" >\n",
|
||||
"<figure/>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The other type of cell is the `code cell` where you will write your code:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"This is code cell\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"#This is a 'Code' Cell\n",
|
||||
"print(\"This is code cell\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Python\n",
|
||||
"You can write your code in the code cells. \n",
|
||||
"To run the code, select the cell and either\n",
|
||||
"- hold the shift-key down and hit 'enter' or 'return'\n",
|
||||
"- click the 'run' arrow above\n",
|
||||
"<figure>\n",
|
||||
" <img src=\"./images/C1W1L1_Run.PNG\" width=\"400\" >\n",
|
||||
"<figure/>\n",
|
||||
"\n",
|
||||
" "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Print statement\n",
|
||||
"Print statements will generally use the python f-string style. \n",
|
||||
"Try creating your own print in the following cell. \n",
|
||||
"Try both methods of running the cell."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"f strings allow you to embed variables right in the strings!\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# print statements\n",
|
||||
"variable = \"right in the strings!\"\n",
|
||||
"print(f\"f strings allow you to embed variables {variable}\")"
|
||||
]
|
||||
},
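{
"cell_type": "markdown",
"metadata": {},
"source": [
"F-strings can also control how numbers are displayed. For example, the `:.2f` format specifier in the cell below limits the output to two decimal places; you will see similar specifiers used in later labs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# format specifiers limit the displayed precision\n",
"pi_approx = 3.14159265\n",
"print(f'pi to two decimal places: {pi_approx:.2f}')"
]
},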
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Congratulations!\n",
|
||||
"You now know how to find your way around a Jupyter Notebook."
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
478
C1_W1_Lab02_Model_Representation_Soln.ipynb
Normal file
284
C1_W1_Lab03_Cost_function_Soln.ipynb
Normal file
@ -0,0 +1,284 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Cost Function \n",
|
||||
"<figure>\n",
|
||||
" <center> <img src=\"./images/C1_W1_L3_S2_Lecture_b.png\" style=\"width:1000px;height:200px;\" ></center>\n",
|
||||
"</figure>\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Goals\n",
|
||||
"In this lab you will:\n",
|
||||
"- you will implement and explore the `cost` function for linear regression with one variable. \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"In this lab we will make use of: \n",
|
||||
"- NumPy, a popular library for scientific computing\n",
|
||||
"- Matplotlib, a popular library for plotting data\n",
|
||||
"- local plotting routines in the lab_utils_uni.py file in the local directory"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"%matplotlib widget\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"from lab_utils_uni import plt_intuition, plt_stationary, plt_update_onclick, soup_bowl\n",
|
||||
"plt.style.use('./deeplearning.mplstyle')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Problem Statement\n",
|
||||
"\n",
|
||||
"You would like a model which can predict housing prices given the size of the house. \n",
|
||||
"Let's use the same two data points as before the previous lab- a house with 1000 square feet sold for \\\\$300,000 and a house with 2000 square feet sold for \\\\$500,000.\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"| Size (1000 sqft) | Price (1000s of dollars) |\n",
|
||||
"| -------------------| ------------------------ |\n",
|
||||
"| 1 | 300 |\n",
|
||||
"| 2 | 500 |\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"x_train = np.array([1.0, 2.0]) #(size in 1000 square feet)\n",
|
||||
"y_train = np.array([300.0, 500.0]) #(price in 1000s of dollars)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Computing Cost\n",
|
||||
"The term 'cost' in this assignment might be a little confusing since the data is housing cost. Here, cost is a measure how well our model is predicting the target price of the house. The term 'price' is used for housing data.\n",
|
||||
"\n",
|
||||
"The equation for cost with one variable is:\n",
|
||||
" $$J(w,b) = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})^2 \\tag{1}$$ \n",
|
||||
" \n",
|
||||
"where \n",
|
||||
" $$f_{w,b}(x^{(i)}) = wx^{(i)} + b \\tag{2}$$\n",
|
||||
" \n",
|
||||
"- $f_{w,b}(x^{(i)})$ is our prediction for example $i$ using parameters $w,b$. \n",
|
||||
"- $(f_{w,b}(x^{(i)}) -y^{(i)})^2$ is the squared difference between the target value and the prediction. \n",
|
||||
"- These differences are summed over all the $m$ examples and divided by `2m` to produce the cost, $J(w,b)$. \n",
|
||||
">Note, in lecture summation ranges are typically from 1 to m, while code will be from 0 to m-1.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The code below calculates cost by looping over each example. In each loop:\n",
|
||||
"- `f_wb`, a prediction is calculated\n",
|
||||
"- the difference between the target and the prediction is calculated and squared.\n",
|
||||
"- this is added to the total cost."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def compute_cost(x, y, w, b): \n",
|
||||
" \"\"\"\n",
|
||||
" Computes the cost function for linear regression.\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" x (ndarray (m,)): Data, m examples \n",
|
||||
" y (ndarray (m,)): target values\n",
|
||||
" w,b (scalar) : model parameters \n",
|
||||
" \n",
|
||||
" Returns\n",
|
||||
" total_cost (float): The cost of using w,b as the parameters for linear regression\n",
|
||||
" to fit the data points in x and y\n",
|
||||
" \"\"\"\n",
|
||||
" # number of training examples\n",
|
||||
" m = x.shape[0] \n",
|
||||
" \n",
|
||||
" cost_sum = 0 \n",
|
||||
" for i in range(m): \n",
|
||||
" f_wb = w * x[i] + b \n",
|
||||
" cost = (f_wb - y[i]) ** 2 \n",
|
||||
" cost_sum = cost_sum + cost \n",
|
||||
" total_cost = (1 / (2 * m)) * cost_sum \n",
|
||||
"\n",
|
||||
" return total_cost"
|
||||
]
|
||||
},
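{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check, the cell below evaluates `compute_cost` on the two-point data set. With $w=200$ and $b=100$ the line passes through both points exactly, so the cost should be zero, while other parameter values give a positive cost."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# cost is zero when the line fits both points exactly\n",
"print(f'cost at w=200, b=100: {compute_cost(x_train, y_train, 200, 100)}')\n",
"# any other choice of parameters produces a positive cost\n",
"print(f'cost at w=150, b=100: {compute_cost(x_train, y_train, 150, 100)}')"
]
},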
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Cost Function Intuition"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<img align=\"left\" src=\"./images/C1_W1_Lab02_GoalOfRegression.PNG\" style=\" width:380px; padding: 10px; \" /> Your goal is to find a model $f_{w,b}(x) = wx + b$, with parameters $w,b$, which will accurately predict house values given an input $x$. The cost is a measure of how accurate the model is on the training data.\n",
|
||||
"\n",
|
||||
"The cost equation (1) above shows that if $w$ and $b$ can be selected such that the predictions $f_{w,b}(x)$ match the target data $y$, the $(f_{w,b}(x^{(i)}) - y^{(i)})^2 $ term will be zero and the cost minimized. In this simple two point example, you can achieve this!\n",
|
||||
"\n",
|
||||
"In the previous lab, you determined that $b=100$ provided an optimal solution so let's set $b$ to 100 and focus on $w$.\n",
|
||||
"\n",
|
||||
"<br/>\n",
|
||||
"Below, use the slider control to select the value of $w$ that minimizes cost. It can take a few seconds for the plot to update."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"plt_intuition(x_train,y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The plot contains a few points that are worth mentioning.\n",
|
||||
"- cost is minimized when $w = 200$, which matches results from the previous lab\n",
|
||||
"- Because the difference between the target and pediction is squared in the cost equation, the cost increases rapidly when $w$ is either too large or too small.\n",
|
||||
"- Using the `w` and `b` selected by minimizing cost results in a line which is a perfect fit to the data."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Cost Function Visualization- 3D\n",
|
||||
"\n",
|
||||
"You can see how cost varies with respect to *both* `w` and `b` by plotting in 3D or using a contour plot. \n",
|
||||
"It is worth noting that some of the plotting in this course can become quite involved. The plotting routines are provided and while it can be instructive to read through the code to become familiar with the methods, it is not needed to complete the course successfully. The routines are in lab_utils_uni.py in the local directory."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Larger Data Set\n",
|
||||
"It's use instructive to view a scenario with a few more data points. This data set includes data points that do not fall on the same line. What does that mean for the cost equation? Can we find $w$, and $b$ that will give us a cost of 0? "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"x_train = np.array([1.0, 1.7, 2.0, 2.5, 3.0, 3.2])\n",
|
||||
"y_train = np.array([250, 300, 480, 430, 630, 730,])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In the contour plot, click on a point to select `w` and `b` to achieve the lowest cost. Use the contours to guide your selections. Note, it can take a few seconds to update the graph. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"plt.close('all') \n",
|
||||
"fig, ax, dyn_items = plt_stationary(x_train, y_train)\n",
|
||||
"updater = plt_update_onclick(fig, ax, x_train, y_train, dyn_items)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Above, note the dashed lines in the left plot. These represent the portion of the cost contributed by each example in your training set. In this case, values of approximately $w=209$ and $b=2.4$ provide low cost. Note that, because our training examples are not on a line, the minimum cost is not zero."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Convex Cost surface\n",
|
||||
"The fact that the cost function squares the loss ensures that the 'error surface' is convex like a soup bowl. It will always have a minimum that can be reached by following the gradient in all dimensions. In the previous plot, because the $w$ and $b$ dimensions scale differently, this is not easy to recognize. The following plot, where $w$ and $b$ are symmetric, was shown in lecture:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"soup_bowl()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Congratulations!\n",
|
||||
"You have learned the following:\n",
|
||||
" - The cost equation provides a measure of how well your predictions match your training data.\n",
|
||||
" - Minimizing the cost can provide optimal values of $w$, $b$."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.6"
|
||||
},
|
||||
"toc-autonumbering": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
627
C1_W1_Lab04_Gradient_Descent_Soln.ipynb
Normal file
730
C1_W2_Lab01_Python_Numpy_Vectorization_Soln.ipynb
Normal file
@ -0,0 +1,730 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Python, NumPy and Vectorization\n",
|
||||
"A brief introduction to some of the scientific computing used in this course. In particular the NumPy scientific computing package and its use with python.\n",
|
||||
"\n",
|
||||
"# Outline\n",
|
||||
"- [ 1.1 Goals](#toc_40015_1.1)\n",
|
||||
"- [ 1.2 Useful References](#toc_40015_1.2)\n",
|
||||
"- [2 Python and NumPy <a name='Python and NumPy'></a>](#toc_40015_2)\n",
|
||||
"- [3 Vectors](#toc_40015_3)\n",
|
||||
"- [ 3.1 Abstract](#toc_40015_3.1)\n",
|
||||
"- [ 3.2 NumPy Arrays](#toc_40015_3.2)\n",
|
||||
"- [ 3.3 Vector Creation](#toc_40015_3.3)\n",
|
||||
"- [ 3.4 Operations on Vectors](#toc_40015_3.4)\n",
|
||||
"- [4 Matrices](#toc_40015_4)\n",
|
||||
"- [ 4.1 Abstract](#toc_40015_4.1)\n",
|
||||
"- [ 4.2 NumPy Arrays](#toc_40015_4.2)\n",
|
||||
"- [ 4.3 Matrix Creation](#toc_40015_4.3)\n",
|
||||
"- [ 4.4 Operations on Matrices](#toc_40015_4.4)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np # it is an unofficial standard to use np for numpy\n",
|
||||
"import time"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_1.1\"></a>\n",
|
||||
"## 1.1 Goals\n",
|
||||
"In this lab, you will:\n",
|
||||
"- Review the features of NumPy and Python that are used in Course 1"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_1.2\"></a>\n",
|
||||
"## 1.2 Useful References\n",
|
||||
"- NumPy Documentation including a basic introduction: [NumPy.org](https://NumPy.org/doc/stable/)\n",
|
||||
"- A challenging feature topic: [NumPy Broadcasting](https://NumPy.org/doc/stable/user/basics.broadcasting.html)\n"
|
||||
]
|
||||
},
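{
"cell_type": "markdown",
"metadata": {},
"source": [
"Broadcasting, the challenging topic referenced above, is not covered in this lab, but a one-line illustration may make the reference less intimidating: when two operands have compatible but different shapes, NumPy stretches the smaller one across the larger one."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# a (3,1) column broadcast against a (2,) row produces a (3,2) result\n",
"col = np.array([[1], [2], [3]])\n",
"row = np.array([10, 20])\n",
"print(col + row)"
]
},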
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_2\"></a>\n",
|
||||
"# 2 Python and NumPy <a name='Python and NumPy'></a>\n",
|
||||
"Python is the programming language we will be using in this course. It has a set of numeric data types and arithmetic operations. NumPy is a library that extends the base capabilities of python to add a richer data set including more numeric types, vectors, matrices, and many matrix functions. NumPy and python work together fairly seamlessly. Python arithmetic operators work on NumPy data types and many NumPy functions will accept python data types.\n"
|
||||
]
|
||||
},
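{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell below is a brief illustration of that interoperability: a Python arithmetic operator applied to a NumPy array, and a NumPy function applied to a plain Python list."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Python's * operator works directly on a NumPy array\n",
"print(3 * np.array([1, 2, 3]))\n",
"# NumPy functions accept plain Python lists\n",
"print(np.sum([1, 2, 3]))"
]
},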
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3\"></a>\n",
|
||||
"# 3 Vectors\n",
|
||||
"<a name=\"toc_40015_3.1\"></a>\n",
|
||||
"## 3.1 Abstract\n",
|
||||
"<img align=\"right\" src=\"./images/C1_W2_Lab04_Vectors.PNG\" style=\"width:340px;\" >Vectors, as you will use them in this course, are ordered arrays of numbers. In notation, vectors are denoted with lower case bold letters such as $\\mathbf{x}$. The elements of a vector are all the same type. A vector does not, for example, contain both characters and numbers. The number of elements in the array is often referred to as the *dimension* though mathematicians may prefer *rank*. The vector shown has a dimension of $n$. The elements of a vector can be referenced with an index. In math settings, indexes typically run from 1 to n. In computer science and these labs, indexing will typically run from 0 to n-1. In notation, elements of a vector, when referenced individually will indicate the index in a subscript, for example, the $0^{th}$ element, of the vector $\\mathbf{x}$ is $x_0$. Note, the x is not bold in this case. \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.2\"></a>\n",
|
||||
"## 3.2 NumPy Arrays\n",
|
||||
"\n",
|
||||
"NumPy's basic data structure is an indexable, n-dimensional *array* containing elements of the same type (`dtype`). Right away, you may notice we have overloaded the term 'dimension'. Above, it was the number of elements in the vector, here, dimension refers to the number of indexes of an array. A one-dimensional or 1-D array has one index. In Course 1, we will represent vectors as NumPy 1-D arrays. \n",
|
||||
"\n",
|
||||
" - 1-D array, shape (n,): n elements indexed [0] through [n-1]\n",
|
||||
" "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.3\"></a>\n",
|
||||
"## 3.3 Vector Creation\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Data creation routines in NumPy will generally have a first parameter which is the shape of the object. This can either be a single value for a 1-D result or a tuple (n,m,...) specifying the shape of the result. Below are examples of creating vectors using these routines."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill arrays with value\n",
|
||||
"a = np.zeros(4); print(f\"np.zeros(4) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.zeros((4,)); print(f\"np.zeros(4,) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.random.random_sample(4); print(f\"np.random.random_sample(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Some data creation routines do not take a shape tuple:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill arrays with value but do not accept shape as input argument\n",
|
||||
"a = np.arange(4.); print(f\"np.arange(4.): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.random.rand(4); print(f\"np.random.rand(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"values can be specified manually as well. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill with user specified values\n",
|
||||
"a = np.array([5,4,3,2]); print(f\"np.array([5,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.array([5.,4,3,2]); print(f\"np.array([5.,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"These have all created a one-dimensional vector `a` with four elements. `a.shape` returns the dimensions. Here we see a.shape = `(4,)` indicating a 1-d array with 4 elements. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4\"></a>\n",
|
||||
"## 3.4 Operations on Vectors\n",
|
||||
"Let's explore some operations using vectors.\n",
|
||||
"<a name=\"toc_40015_3.4.1\"></a>\n",
|
||||
"### 3.4.1 Indexing\n",
|
||||
"Elements of vectors can be accessed via indexing and slicing. NumPy provides a very complete set of indexing and slicing capabilities. We will explore only the basics needed for the course here. Reference [Slicing and Indexing](https://NumPy.org/doc/stable/reference/arrays.indexing.html) for more details. \n",
|
||||
"**Indexing** means referring to *an element* of an array by its position within the array. \n",
|
||||
"**Slicing** means getting a *subset* of elements from an array based on their indices. \n",
|
||||
"NumPy starts indexing at zero so the 3rd element of an vector $\\mathbf{a}$ is `a[2]`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector indexing operations on 1-D vectors\n",
|
||||
"a = np.arange(10)\n",
|
||||
"print(a)\n",
|
||||
"\n",
|
||||
"#access an element\n",
|
||||
"print(f\"a[2].shape: {a[2].shape} a[2] = {a[2]}, Accessing an element returns a scalar\")\n",
|
||||
"\n",
|
||||
"# access the last element, negative indexes count from the end\n",
|
||||
"print(f\"a[-1] = {a[-1]}\")\n",
|
||||
"\n",
|
||||
"#indexs must be within the range of the vector or they will produce and error\n",
|
||||
"try:\n",
|
||||
" c = a[10]\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"The error message you'll see is:\")\n",
|
||||
" print(e)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.2\"></a>\n",
|
||||
"### 3.4.2 Slicing\n",
|
||||
"Slicing creates an array of indices using a set of three values (`start:stop:step`). A subset of values is also valid. Its use is best explained by example:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector slicing operations\n",
|
||||
"a = np.arange(10)\n",
|
||||
"print(f\"a = {a}\")\n",
|
||||
"\n",
|
||||
"#access 5 consecutive elements (start:stop:step)\n",
|
||||
"c = a[2:7:1]; print(\"a[2:7:1] = \", c)\n",
|
||||
"\n",
|
||||
"# access 3 elements separated by two \n",
|
||||
"c = a[2:7:2]; print(\"a[2:7:2] = \", c)\n",
|
||||
"\n",
|
||||
"# access all elements index 3 and above\n",
|
||||
"c = a[3:]; print(\"a[3:] = \", c)\n",
|
||||
"\n",
|
||||
"# access all elements below index 3\n",
|
||||
"c = a[:3]; print(\"a[:3] = \", c)\n",
|
||||
"\n",
|
||||
"# access all elements\n",
|
||||
"c = a[:]; print(\"a[:] = \", c)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.3\"></a>\n",
|
||||
"### 3.4.3 Single vector operations\n",
|
||||
"There are a number of useful operations that involve operations on a single vector."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.array([1,2,3,4])\n",
|
||||
"print(f\"a : {a}\")\n",
|
||||
"# negate elements of a\n",
|
||||
"b = -a \n",
|
||||
"print(f\"b = -a : {b}\")\n",
|
||||
"\n",
|
||||
"# sum all elements of a, returns a scalar\n",
|
||||
"b = np.sum(a) \n",
|
||||
"print(f\"b = np.sum(a) : {b}\")\n",
|
||||
"\n",
|
||||
"b = np.mean(a)\n",
|
||||
"print(f\"b = np.mean(a): {b}\")\n",
|
||||
"\n",
|
||||
"b = a**2\n",
|
||||
"print(f\"b = a**2 : {b}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.4\"></a>\n",
|
||||
"### 3.4.4 Vector Vector element-wise operations\n",
|
||||
"Most of the NumPy arithmetic, logical and comparison operations apply to vectors as well. These operators work on an element-by-element basis. For example \n",
|
||||
"$$ c_i = a_i + b_i $$"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.array([ 1, 2, 3, 4])\n",
|
||||
"b = np.array([-1,-2, 3, 4])\n",
|
||||
"print(f\"Binary operators work element wise: {a + b}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Of course, for this to work correctly, the vectors must be of the same size:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#try a mismatched vector operation\n",
|
||||
"c = np.array([1, 2])\n",
|
||||
"try:\n",
|
||||
" d = a + c\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"The error message you'll see is:\")\n",
|
||||
" print(e)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.5\"></a>\n",
|
||||
"### 3.4.5 Scalar Vector operations\n",
|
||||
"Vectors can be 'scaled' by scalar values. A scalar value is just a number. The scalar multiplies all the elements of the vector."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.array([1, 2, 3, 4])\n",
|
||||
"\n",
|
||||
"# multiply a by a scalar\n",
|
||||
"b = 5 * a \n",
|
||||
"print(f\"b = 5 * a : {b}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.6\"></a>\n",
|
||||
"### 3.4.6 Vector Vector dot product\n",
|
||||
"The dot product is a mainstay of Linear Algebra and NumPy. This is an operation used extensively in this course and should be well understood. The dot product is shown below."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<img src=\"./images/C1_W2_Lab04_dot_notrans.gif\" width=800> "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The dot product multiplies the values in two vectors element-wise and then sums the result.\n",
|
||||
"Vector dot product requires the dimensions of the two vectors to be the same. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's implement our own version of the dot product below:\n",
|
||||
"\n",
|
||||
"**Using a for loop**, implement a function which returns the dot product of two vectors. The function to return given inputs $a$ and $b$:\n",
|
||||
"$$ x = \\sum_{i=0}^{n-1} a_i b_i $$\n",
|
||||
"Assume both `a` and `b` are the same shape."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def my_dot(a, b): \n",
|
||||
" \"\"\"\n",
|
||||
" Compute the dot product of two vectors\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" a (ndarray (n,)): input vector \n",
|
||||
" b (ndarray (n,)): input vector with same dimension as a\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" x (scalar): \n",
|
||||
" \"\"\"\n",
|
||||
" x=0\n",
|
||||
" for i in range(a.shape[0]):\n",
|
||||
" x = x + a[i] * b[i]\n",
|
||||
" return x"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# test 1-D\n",
|
||||
"a = np.array([1, 2, 3, 4])\n",
|
||||
"b = np.array([-1, 4, 3, 2])\n",
|
||||
"print(f\"my_dot(a, b) = {my_dot(a, b)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note, the dot product is expected to return a scalar value. \n",
|
||||
"\n",
|
||||
"Let's try the same operations using `np.dot`. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# test 1-D\n",
|
||||
"a = np.array([1, 2, 3, 4])\n",
|
||||
"b = np.array([-1, 4, 3, 2])\n",
|
||||
"c = np.dot(a, b)\n",
|
||||
"print(f\"NumPy 1-D np.dot(a, b) = {c}, np.dot(a, b).shape = {c.shape} \") \n",
|
||||
"c = np.dot(b, a)\n",
|
||||
"print(f\"NumPy 1-D np.dot(b, a) = {c}, np.dot(a, b).shape = {c.shape} \")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Above, you will note that the results for 1-D matched our implementation."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.7\"></a>\n",
|
||||
"### 3.4.7 The Need for Speed: vector vs for loop\n",
|
||||
"We utilized the NumPy library because it improves speed memory efficiency. Let's demonstrate:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"np.random.seed(1)\n",
|
||||
"a = np.random.rand(10000000) # very large arrays\n",
|
||||
"b = np.random.rand(10000000)\n",
|
||||
"\n",
|
||||
"tic = time.time() # capture start time\n",
|
||||
"c = np.dot(a, b)\n",
|
||||
"toc = time.time() # capture end time\n",
|
||||
"\n",
|
||||
"print(f\"np.dot(a, b) = {c:.4f}\")\n",
|
||||
"print(f\"Vectorized version duration: {1000*(toc-tic):.4f} ms \")\n",
|
||||
"\n",
|
||||
"tic = time.time() # capture start time\n",
|
||||
"c = my_dot(a,b)\n",
|
||||
"toc = time.time() # capture end time\n",
|
||||
"\n",
|
||||
"print(f\"my_dot(a, b) = {c:.4f}\")\n",
|
||||
"print(f\"loop version duration: {1000*(toc-tic):.4f} ms \")\n",
|
||||
"\n",
|
||||
"del(a);del(b) #remove these big arrays from memory"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"So, vectorization provides a large speed up in this example. This is because NumPy makes better use of available data parallelism in the underlying hardware. GPU's and modern CPU's implement Single Instruction, Multiple Data (SIMD) pipelines allowing multiple operations to be issued in parallel. This is critical in Machine Learning where the data sets are often very large."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_12345_3.4.8\"></a>\n",
|
||||
"### 3.4.8 Vector Vector operations in Course 1\n",
|
||||
"Vector Vector operations will appear frequently in course 1. Here is why:\n",
|
||||
"- Going forward, our examples will be stored in an array, `X_train` of dimension (m,n). This will be explained more in context, but here it is important to note it is a 2 Dimensional array or matrix (see next section on matrices).\n",
|
||||
"- `w` will be a 1-dimensional vector of shape (n,).\n",
|
||||
"- we will perform operations by looping through the examples, extracting each example to work on individually by indexing X. For example:`X[i]`\n",
|
||||
"- `X[i]` returns a value of shape (n,), a 1-dimensional vector. Consequently, operations involving `X[i]` are often vector-vector. \n",
|
||||
"\n",
|
||||
"That is a somewhat lengthy explanation, but aligning and understanding the shapes of your operands is important when performing vector operations."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# show common Course 1 example\n",
|
||||
"X = np.array([[1],[2],[3],[4]])\n",
|
||||
"w = np.array([2])\n",
|
||||
"c = np.dot(X[1], w)\n",
|
||||
"\n",
|
||||
"print(f\"X[1] has shape {X[1].shape}\")\n",
|
||||
"print(f\"w has shape {w.shape}\")\n",
|
||||
"print(f\"c has shape {c.shape}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4\"></a>\n",
|
||||
"# 4 Matrices\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.1\"></a>\n",
|
||||
"## 4.1 Abstract\n",
|
||||
"Matrices, are two dimensional arrays. The elements of a matrix are all of the same type. In notation, matrices are denoted with capitol, bold letter such as $\\mathbf{X}$. In this and other labs, `m` is often the number of rows and `n` the number of columns. The elements of a matrix can be referenced with a two dimensional index. In math settings, numbers in the index typically run from 1 to n. In computer science and these labs, indexing will run from 0 to n-1. \n",
|
||||
"<figure>\n",
|
||||
" <center> <img src=\"./images/C1_W2_Lab04_Matrices.PNG\" alt='missing' width=900><center/>\n",
|
||||
" <figcaption> Generic Matrix Notation, 1st index is row, 2nd is column </figcaption>\n",
|
||||
"<figure/>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.2\"></a>\n",
|
||||
"## 4.2 NumPy Arrays\n",
|
||||
"\n",
|
||||
"NumPy's basic data structure is an indexable, n-dimensional *array* containing elements of the same type (`dtype`). These were described earlier. Matrices have a two-dimensional (2-D) index [m,n].\n",
|
||||
"\n",
|
||||
"In Course 1, 2-D matrices are used to hold training data. Training data is $m$ examples by $n$ features creating an (m,n) array. Course 1 does not do operations directly on matrices but typically extracts an example as a vector and operates on that. Below you will review: \n",
|
||||
"- data creation\n",
|
||||
"- slicing and indexing"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.3\"></a>\n",
|
||||
"## 4.3 Matrix Creation\n",
|
||||
"The same functions that created 1-D vectors will create 2-D or n-D arrays. Here are some examples\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Below, the shape tuple is provided to achieve a 2-D result. Notice how NumPy uses brackets to denote each dimension. Notice further than NumPy, when printing, will print one row per line.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.zeros((1, 5)) \n",
|
||||
"print(f\"a shape = {a.shape}, a = {a}\") \n",
|
||||
"\n",
|
||||
"a = np.zeros((2, 1)) \n",
|
||||
"print(f\"a shape = {a.shape}, a = {a}\") \n",
|
||||
"\n",
|
||||
"a = np.random.random_sample((1, 1)) \n",
|
||||
"print(f\"a shape = {a.shape}, a = {a}\") "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"One can also manually specify data. Dimensions are specified with additional brackets matching the format in the printing above."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill with user specified values\n",
|
||||
"a = np.array([[5], [4], [3]]); print(f\" a shape = {a.shape}, np.array: a = {a}\")\n",
|
||||
"a = np.array([[5], # One can also\n",
|
||||
" [4], # separate values\n",
|
||||
" [3]]); #into separate rows\n",
|
||||
"print(f\" a shape = {a.shape}, np.array: a = {a}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.4\"></a>\n",
|
||||
"## 4.4 Operations on Matrices\n",
|
||||
"Let's explore some operations using matrices."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.4.1\"></a>\n",
|
||||
"### 4.4.1 Indexing\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Matrices include a second index. The two indexes describe [row, column]. Access can either return an element or a row/column. See below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector indexing operations on matrices\n",
|
||||
"a = np.arange(6).reshape(-1, 2) #reshape is a convenient way to create matrices\n",
|
||||
"print(f\"a.shape: {a.shape}, \\na= {a}\")\n",
|
||||
"\n",
|
||||
"#access an element\n",
|
||||
"print(f\"\\na[2,0].shape: {a[2, 0].shape}, a[2,0] = {a[2, 0]}, type(a[2,0]) = {type(a[2, 0])} Accessing an element returns a scalar\\n\")\n",
|
||||
"\n",
|
||||
"#access a row\n",
|
||||
"print(f\"a[2].shape: {a[2].shape}, a[2] = {a[2]}, type(a[2]) = {type(a[2])}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"It is worth drawing attention to the last example. Accessing a matrix by just specifying the row will return a *1-D vector*."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Reshape** \n",
|
||||
"The previous example used [reshape](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html) to shape the array. \n",
|
||||
"`a = np.arange(6).reshape(-1, 2) ` \n",
|
||||
"This line of code first created a *1-D Vector* of six elements. It then reshaped that vector into a *2-D* array using the reshape command. This could have been written: \n",
|
||||
"`a = np.arange(6).reshape(3, 2) ` \n",
|
||||
"To arrive at the same 3 row, 2 column array.\n",
|
||||
"The -1 argument tells the routine to compute the number of rows given the size of the array and the number of columns.\n"
|
||||
]
|
||||
},
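{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell below is a quick check that the two forms of `reshape` described above produce the same 3 row, 2 column array."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# -1 lets NumPy infer the number of rows, so both calls give the same result\n",
"print(np.array_equal(np.arange(6).reshape(-1, 2), np.arange(6).reshape(3, 2)))"
]
},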
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.4.2\"></a>\n",
|
||||
"### 4.4.2 Slicing\n",
|
||||
"Slicing creates an array of indices using a set of three values (`start:stop:step`). A subset of values is also valid. Its use is best explained by example:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector 2-D slicing operations\n",
|
||||
"a = np.arange(20).reshape(-1, 10)\n",
|
||||
"print(f\"a = \\n{a}\")\n",
|
||||
"\n",
|
||||
"#access 5 consecutive elements (start:stop:step)\n",
|
||||
"print(\"a[0, 2:7:1] = \", a[0, 2:7:1], \", a[0, 2:7:1].shape =\", a[0, 2:7:1].shape, \"a 1-D array\")\n",
|
||||
"\n",
|
||||
"#access 5 consecutive elements (start:stop:step) in two rows\n",
|
||||
"print(\"a[:, 2:7:1] = \\n\", a[:, 2:7:1], \", a[:, 2:7:1].shape =\", a[:, 2:7:1].shape, \"a 2-D array\")\n",
|
||||
"\n",
|
||||
"# access all elements\n",
|
||||
"print(\"a[:,:] = \\n\", a[:,:], \", a[:,:].shape =\", a[:,:].shape)\n",
|
||||
"\n",
|
||||
"# access all elements in one row (very common usage)\n",
|
||||
"print(\"a[1,:] = \", a[1,:], \", a[1,:].shape =\", a[1,:].shape, \"a 1-D array\")\n",
|
||||
"# same as\n",
|
||||
"print(\"a[1] = \", a[1], \", a[1].shape =\", a[1].shape, \"a 1-D array\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_5.0\"></a>\n",
|
||||
"## Congratulations!\n",
|
||||
"In this lab you mastered the features of Python and NumPy that are needed for Course 1."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"dl_toc_settings": {
|
||||
"rndtag": "40015"
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.10"
|
||||
},
|
||||
"toc-autonumbering": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
648
C1_W2_Lab02_Multiple_Variable_Soln.ipynb
Normal file
@ -0,0 +1,648 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Multiple Variable Linear Regression\n",
|
||||
"\n",
|
||||
"In this lab, you will extend the data structures and previously developed routines to support multiple features. Several routines are updated making the lab appear lengthy, but it makes minor adjustments to previous routines making it quick to review.\n",
|
||||
"# Outline\n",
|
||||
"- [ 1.1 Goals](#toc_15456_1.1)\n",
|
||||
"- [ 1.2 Tools](#toc_15456_1.2)\n",
|
||||
"- [ 1.3 Notation](#toc_15456_1.3)\n",
|
||||
"- [2 Problem Statement](#toc_15456_2)\n",
|
||||
"- [ 2.1 Matrix X containing our examples](#toc_15456_2.1)\n",
|
||||
"- [ 2.2 Parameter vector w, b](#toc_15456_2.2)\n",
|
||||
"- [3 Model Prediction With Multiple Variables](#toc_15456_3)\n",
|
||||
"- [ 3.1 Single Prediction element by element](#toc_15456_3.1)\n",
|
||||
"- [ 3.2 Single Prediction, vector](#toc_15456_3.2)\n",
|
||||
"- [4 Compute Cost With Multiple Variables](#toc_15456_4)\n",
|
||||
"- [5 Gradient Descent With Multiple Variables](#toc_15456_5)\n",
|
||||
"- [ 5.1 Compute Gradient with Multiple Variables](#toc_15456_5.1)\n",
|
||||
"- [ 5.2 Gradient Descent With Multiple Variables](#toc_15456_5.2)\n",
|
||||
"- [6 Congratulations](#toc_15456_6)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_1.1\"></a>\n",
|
||||
"## 1.1 Goals\n",
|
||||
"- Extend our regression model routines to support multiple features\n",
|
||||
" - Extend data structures to support multiple features\n",
|
||||
" - Rewrite prediction, cost and gradient routines to support multiple features\n",
|
||||
" - Utilize NumPy `np.dot` to vectorize their implementations for speed and simplicity"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_1.2\"></a>\n",
|
||||
"## 1.2 Tools\n",
|
||||
"In this lab, we will make use of: \n",
|
||||
"- NumPy, a popular library for scientific computing\n",
|
||||
"- Matplotlib, a popular library for plotting data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import copy, math\n",
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"plt.style.use('./deeplearning.mplstyle')\n",
|
||||
"np.set_printoptions(precision=2) # reduced display precision on numpy arrays"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_1.3\"></a>\n",
|
||||
"## 1.3 Notation\n",
|
||||
"Here is a summary of some of the notation you will encounter, updated for multiple features. \n",
|
||||
"\n",
|
||||
"|General <img width=70/> <br /> Notation <img width=70/> | Description<img width=350/>| Python (if applicable) |\n",
|
||||
"|: ------------|: ------------------------------------------------------------||\n",
|
||||
"| $a$ | scalar, non bold ||\n",
|
||||
"| $\\mathbf{a}$ | vector, bold ||\n",
|
||||
"| $\\mathbf{A}$ | matrix, bold capital ||\n",
|
||||
"| **Regression** | | | |\n",
|
||||
"| $\\mathbf{X}$ | training example matrix | `X_train` | \n",
|
||||
"| $\\mathbf{y}$ | training example targets | `y_train` \n",
|
||||
"| $\\mathbf{x}^{(i)}$, $y^{(i)}$ | $i_{th}$Training Example | `X[i]`, `y[i]`|\n",
|
||||
"| m | number of training examples | `m`|\n",
|
||||
"| n | number of features in each example | `n`|\n",
|
||||
"| $\\mathbf{w}$ | parameter: weight, | `w` |\n",
|
||||
"| $b$ | parameter: bias | `b` | \n",
|
||||
"| $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ | The result of the model evaluation at $\\mathbf{x^{(i)}}$ parameterized by $\\mathbf{w},b$: $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = \\mathbf{w} \\cdot \\mathbf{x}^{(i)}+b$ | `f_wb` | \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_2\"></a>\n",
|
||||
"# 2 Problem Statement\n",
|
||||
"\n",
|
||||
"You will use the motivating example of housing price prediction. The training dataset contains three examples with four features (size, bedrooms, floors and, age) shown in the table below. Note that, unlike the earlier labs, size is in sqft rather than 1000 sqft. This causes an issue, which you will solve in the next lab!\n",
|
||||
"\n",
|
||||
"| Size (sqft) | Number of Bedrooms | Number of floors | Age of Home | Price (1000s dollars) | \n",
|
||||
"| ----------------| ------------------- |----------------- |--------------|-------------- | \n",
|
||||
"| 2104 | 5 | 1 | 45 | 460 | \n",
|
||||
"| 1416 | 3 | 2 | 40 | 232 | \n",
|
||||
"| 852 | 2 | 1 | 35 | 178 | \n",
|
||||
"\n",
|
||||
"You will build a linear regression model using these values so you can then predict the price for other houses. For example, a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old. \n",
|
||||
"\n",
|
||||
"Please run the following code cell to create your `X_train` and `y_train` variables."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])\n",
|
||||
"y_train = np.array([460, 232, 178])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_2.1\"></a>\n",
|
||||
"## 2.1 Matrix X containing our examples\n",
|
||||
"Similar to the table above, examples are stored in a NumPy matrix `X_train`. Each row of the matrix represents one example. When you have $m$ training examples ( $m$ is three in our example), and there are $n$ features (four in our example), $\\mathbf{X}$ is a matrix with dimensions ($m$, $n$) (m rows, n columns).\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"$$\\mathbf{X} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
" x^{(0)}_0 & x^{(0)}_1 & \\cdots & x^{(0)}_{n-1} \\\\ \n",
|
||||
" x^{(1)}_0 & x^{(1)}_1 & \\cdots & x^{(1)}_{n-1} \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" x^{(m-1)}_0 & x^{(m-1)}_1 & \\cdots & x^{(m-1)}_{n-1} \n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"notation:\n",
|
||||
"- $\\mathbf{x}^{(i)}$ is vector containing example i. $\\mathbf{x}^{(i)}$ $ = (x^{(i)}_0, x^{(i)}_1, \\cdots,x^{(i)}_{n-1})$\n",
|
||||
"- $x^{(i)}_j$ is element j in example i. The superscript in parenthesis indicates the example number while the subscript represents an element. \n",
|
||||
"\n",
|
||||
"Display the input data."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# data is stored in numpy array/matrix\n",
|
||||
"print(f\"X Shape: {X_train.shape}, X Type:{type(X_train)})\")\n",
|
||||
"print(X_train)\n",
|
||||
"print(f\"y Shape: {y_train.shape}, y Type:{type(y_train)})\")\n",
|
||||
"print(y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_2.2\"></a>\n",
|
||||
"## 2.2 Parameter vector w, b\n",
|
||||
"\n",
|
||||
"* $\\mathbf{w}$ is a vector with $n$ elements.\n",
|
||||
" - Each element contains the parameter associated with one feature.\n",
|
||||
" - in our dataset, n is 4.\n",
|
||||
" - notionally, we draw this as a column vector\n",
|
||||
"\n",
|
||||
"$$\\mathbf{w} = \\begin{pmatrix}\n",
|
||||
"w_0 \\\\ \n",
|
||||
"w_1 \\\\\n",
|
||||
"\\cdots\\\\\n",
|
||||
"w_{n-1}\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"* $b$ is a scalar parameter. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For demonstration, $\\mathbf{w}$ and $b$ will be loaded with some initial selected values that are near the optimal. $\\mathbf{w}$ is a 1-D NumPy vector."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"b_init = 785.1811367994083\n",
|
||||
"w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])\n",
|
||||
"print(f\"w_init shape: {w_init.shape}, b_init type: {type(b_init)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_3\"></a>\n",
|
||||
"# 3 Model Prediction With Multiple Variables\n",
|
||||
"The model's prediction with multiple variables is given by the linear model:\n",
|
||||
"\n",
|
||||
"$$ f_{\\mathbf{w},b}(\\mathbf{x}) = w_0x_0 + w_1x_1 +... + w_{n-1}x_{n-1} + b \\tag{1}$$\n",
|
||||
"or in vector notation:\n",
|
||||
"$$ f_{\\mathbf{w},b}(\\mathbf{x}) = \\mathbf{w} \\cdot \\mathbf{x} + b \\tag{2} $$ \n",
|
||||
"where $\\cdot$ is a vector `dot product`\n",
|
||||
"\n",
|
||||
"To demonstrate the dot product, we will implement prediction using (1) and (2)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_3.1\"></a>\n",
|
||||
"## 3.1 Single Prediction element by element\n",
|
||||
"Our previous prediction multiplied one feature value by one parameter and added a bias parameter. A direct extension of our previous implementation of prediction to multiple features would be to implement (1) above using loop over each element, performing the multiply with its parameter and then adding the bias parameter at the end.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def predict_single_loop(x, w, b): \n",
|
||||
" \"\"\"\n",
|
||||
" single predict using linear regression\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" x (ndarray): Shape (n,) example with multiple features\n",
|
||||
" w (ndarray): Shape (n,) model parameters \n",
|
||||
" b (scalar): model parameter \n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" p (scalar): prediction\n",
|
||||
" \"\"\"\n",
|
||||
" n = x.shape[0]\n",
|
||||
" p = 0\n",
|
||||
" for i in range(n):\n",
|
||||
" p_i = x[i] * w[i] \n",
|
||||
" p = p + p_i \n",
|
||||
" p = p + b \n",
|
||||
" return p"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# get a row from our training data\n",
|
||||
"x_vec = X_train[0,:]\n",
|
||||
"print(f\"x_vec shape {x_vec.shape}, x_vec value: {x_vec}\")\n",
|
||||
"\n",
|
||||
"# make a prediction\n",
|
||||
"f_wb = predict_single_loop(x_vec, w_init, b_init)\n",
|
||||
"print(f\"f_wb shape {f_wb.shape}, prediction: {f_wb}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note the shape of `x_vec`. It is a 1-D NumPy vector with 4 elements, (4,). The result, `f_wb` is a scalar."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_3.2\"></a>\n",
|
||||
"## 3.2 Single Prediction, vector\n",
|
||||
"\n",
|
||||
"Noting that equation (1) above can be implemented using the dot product as in (2) above. We can make use of vector operations to speed up predictions.\n",
|
||||
"\n",
|
||||
"Recall from the Python/Numpy lab that NumPy `np.dot()`[[link](https://numpy.org/doc/stable/reference/generated/numpy.dot.html)] can be used to perform a vector dot product. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def predict(x, w, b): \n",
|
||||
" \"\"\"\n",
|
||||
" single predict using linear regression\n",
|
||||
" Args:\n",
|
||||
" x (ndarray): Shape (n,) example with multiple features\n",
|
||||
" w (ndarray): Shape (n,) model parameters \n",
|
||||
" b (scalar): model parameter \n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" p (scalar): prediction\n",
|
||||
" \"\"\"\n",
|
||||
" p = np.dot(x, w) + b \n",
|
||||
" return p "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# get a row from our training data\n",
|
||||
"x_vec = X_train[0,:]\n",
|
||||
"print(f\"x_vec shape {x_vec.shape}, x_vec value: {x_vec}\")\n",
|
||||
"\n",
|
||||
"# make a prediction\n",
|
||||
"f_wb = predict(x_vec,w_init, b_init)\n",
|
||||
"print(f\"f_wb shape {f_wb.shape}, prediction: {f_wb}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The results and shapes are the same as the previous version which used looping. Going forward, `np.dot` will be used for these operations. The prediction is now a single statement. Most routines will implement it directly rather than calling a separate predict routine."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_4\"></a>\n",
|
||||
"# 4 Compute Cost With Multiple Variables\n",
|
||||
"The equation for the cost function with multiple variables $J(\\mathbf{w},b)$ is:\n",
|
||||
"$$J(\\mathbf{w},b) = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})^2 \\tag{3}$$ \n",
|
||||
"where:\n",
|
||||
"$$ f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = \\mathbf{w} \\cdot \\mathbf{x}^{(i)} + b \\tag{4} $$ \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"In contrast to previous labs, $\\mathbf{w}$ and $\\mathbf{x}^{(i)}$ are vectors rather than scalars supporting multiple features."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Below is an implementation of equations (3) and (4). Note that this uses a *standard pattern for this course* where a for loop over all `m` examples is used."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def compute_cost(X, y, w, b): \n",
|
||||
" \"\"\"\n",
|
||||
" compute cost\n",
|
||||
" Args:\n",
|
||||
" X (ndarray (m,n)): Data, m examples with n features\n",
|
||||
" y (ndarray (m,)) : target values\n",
|
||||
" w (ndarray (n,)) : model parameters \n",
|
||||
" b (scalar) : model parameter\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" cost (scalar): cost\n",
|
||||
" \"\"\"\n",
|
||||
" m = X.shape[0]\n",
|
||||
" cost = 0.0\n",
|
||||
" for i in range(m): \n",
|
||||
" f_wb_i = np.dot(X[i], w) + b #(n,)(n,) = scalar (see np.dot)\n",
|
||||
" cost = cost + (f_wb_i - y[i])**2 #scalar\n",
|
||||
" cost = cost / (2 * m) #scalar \n",
|
||||
" return cost"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Compute and display cost using our pre-chosen optimal parameters. \n",
|
||||
"cost = compute_cost(X_train, y_train, w_init, b_init)\n",
|
||||
"print(f'Cost at optimal w : {cost}')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Result**: Cost at optimal w : 1.5578904045996674e-12"
|
||||
]
|
||||
},
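{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference only, a fully vectorized sketch of equation (3) is shown below. This is one possible alternative to the loop-based `compute_cost` above, not the implementation used in later labs; it assumes `X_train`, `y_train`, `w_init` and `b_init` from the cells above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional sketch: vectorized form of equation (3); assumes the variables defined above\n",
"def compute_cost_vectorized(X, y, w, b):\n",
"    f_wb = X @ w + b                                   # (m,n)@(n,) -> (m,) predictions\n",
"    return np.sum((f_wb - y)**2) / (2 * X.shape[0])    # scalar cost\n",
"\n",
"print(f'vectorized cost at optimal w : {compute_cost_vectorized(X_train, y_train, w_init, b_init)}')"
]
},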
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_5\"></a>\n",
|
||||
"# 5 Gradient Descent With Multiple Variables\n",
|
||||
"Gradient descent for multiple variables:\n",
|
||||
"\n",
|
||||
"$$\\begin{align*} \\text{repeat}&\\text{ until convergence:} \\; \\lbrace \\newline\\;\n",
|
||||
"& w_j = w_j - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} \\tag{5} \\; & \\text{for j = 0..n-1}\\newline\n",
|
||||
"&b\\ \\ = b - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial b} \\newline \\rbrace\n",
|
||||
"\\end{align*}$$\n",
|
||||
"\n",
|
||||
"where, n is the number of features, parameters $w_j$, $b$, are updated simultaneously and where \n",
|
||||
"\n",
|
||||
"$$\n",
|
||||
"\\begin{align}\n",
|
||||
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \\tag{6} \\\\\n",
|
||||
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial b} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)}) \\tag{7}\n",
|
||||
"\\end{align}\n",
|
||||
"$$\n",
|
||||
"* m is the number of training examples in the data set\n",
|
||||
"\n",
|
||||
" \n",
|
||||
"* $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_5.1\"></a>\n",
|
||||
"## 5.1 Compute Gradient with Multiple Variables\n",
|
||||
"An implementation for calculating the equations (6) and (7) is below. There are many ways to implement this. In this version, there is an\n",
|
||||
"- outer loop over all m examples. \n",
|
||||
" - $\\frac{\\partial J(\\mathbf{w},b)}{\\partial b}$ for the example can be computed directly and accumulated\n",
|
||||
" - in a second loop over all n features:\n",
|
||||
" - $\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j}$ is computed for each $w_j$.\n",
|
||||
" "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def compute_gradient(X, y, w, b): \n",
|
||||
" \"\"\"\n",
|
||||
" Computes the gradient for linear regression \n",
|
||||
" Args:\n",
|
||||
" X (ndarray (m,n)): Data, m examples with n features\n",
|
||||
" y (ndarray (m,)) : target values\n",
|
||||
" w (ndarray (n,)) : model parameters \n",
|
||||
" b (scalar) : model parameter\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w. \n",
|
||||
" dj_db (scalar): The gradient of the cost w.r.t. the parameter b. \n",
|
||||
" \"\"\"\n",
|
||||
" m,n = X.shape #(number of examples, number of features)\n",
|
||||
" dj_dw = np.zeros((n,))\n",
|
||||
" dj_db = 0.\n",
|
||||
"\n",
|
||||
" for i in range(m): \n",
|
||||
" err = (np.dot(X[i], w) + b) - y[i] \n",
|
||||
" for j in range(n): \n",
|
||||
" dj_dw[j] = dj_dw[j] + err * X[i, j] \n",
|
||||
" dj_db = dj_db + err \n",
|
||||
" dj_dw = dj_dw / m \n",
|
||||
" dj_db = dj_db / m \n",
|
||||
" \n",
|
||||
" return dj_db, dj_dw"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Compute and display gradient \n",
|
||||
"tmp_dj_db, tmp_dj_dw = compute_gradient(X_train, y_train, w_init, b_init)\n",
|
||||
"print(f'dj_db at initial w,b: {tmp_dj_db}')\n",
|
||||
"print(f'dj_dw at initial w,b: \\n {tmp_dj_dw}')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Result**: \n",
|
||||
"dj_db at initial w,b: -1.6739251122999121e-06 \n",
|
||||
"dj_dw at initial w,b: \n",
|
||||
" [-2.73e-03 -6.27e-06 -2.22e-06 -6.92e-05] "
|
||||
]
|
||||
},
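{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are also fully vectorized ways to compute the gradient. The sketch below is one such alternative, shown for reference only (it is not the routine used in the rest of this lab); it assumes `X_train`, `y_train`, `w_init` and `b_init` from the cells above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional sketch: vectorized form of equations (6) and (7); assumes the variables defined above\n",
"def compute_gradient_matrix(X, y, w, b):\n",
"    m = X.shape[0]\n",
"    err = (X @ w + b) - y          # (m,) prediction error for every example\n",
"    dj_dw = (X.T @ err) / m        # (n,) gradient with respect to w\n",
"    dj_db = np.sum(err) / m        # scalar gradient with respect to b\n",
"    return dj_db, dj_dw\n",
"\n",
"print(compute_gradient_matrix(X_train, y_train, w_init, b_init))"
]
},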
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_5.2\"></a>\n",
|
||||
"## 5.2 Gradient Descent With Multiple Variables\n",
|
||||
"The routine below implements equation (5) above."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters): \n",
|
||||
" \"\"\"\n",
|
||||
" Performs batch gradient descent to learn w and b. Updates w and b by taking \n",
|
||||
" num_iters gradient steps with learning rate alpha\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X (ndarray (m,n)) : Data, m examples with n features\n",
|
||||
" y (ndarray (m,)) : target values\n",
|
||||
" w_in (ndarray (n,)) : initial model parameters \n",
|
||||
" b_in (scalar) : initial model parameter\n",
|
||||
" cost_function : function to compute cost\n",
|
||||
" gradient_function : function to compute the gradient\n",
|
||||
" alpha (float) : Learning rate\n",
|
||||
" num_iters (int) : number of iterations to run gradient descent\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" w (ndarray (n,)) : Updated values of parameters \n",
|
||||
" b (scalar) : Updated value of parameter \n",
|
||||
" \"\"\"\n",
|
||||
" \n",
|
||||
" # An array to store cost J and w's at each iteration primarily for graphing later\n",
|
||||
" J_history = []\n",
|
||||
" w = copy.deepcopy(w_in) #avoid modifying global w within function\n",
|
||||
" b = b_in\n",
|
||||
" \n",
|
||||
" for i in range(num_iters):\n",
|
||||
"\n",
|
||||
" # Calculate the gradient and update the parameters\n",
|
||||
" dj_db,dj_dw = gradient_function(X, y, w, b) ##None\n",
|
||||
"\n",
|
||||
" # Update Parameters using w, b, alpha and gradient\n",
|
||||
" w = w - alpha * dj_dw ##None\n",
|
||||
" b = b - alpha * dj_db ##None\n",
|
||||
" \n",
|
||||
" # Save cost J at each iteration\n",
|
||||
" if i<100000: # prevent resource exhaustion \n",
|
||||
" J_history.append( cost_function(X, y, w, b))\n",
|
||||
"\n",
|
||||
" # Print cost every at intervals 10 times or as many iterations if < 10\n",
|
||||
" if i% math.ceil(num_iters / 10) == 0:\n",
|
||||
" print(f\"Iteration {i:4d}: Cost {J_history[-1]:8.2f} \")\n",
|
||||
" \n",
|
||||
" return w, b, J_history #return final w,b and J history for graphing"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In the next cell you will test the implementation. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# initialize parameters\n",
|
||||
"initial_w = np.zeros_like(w_init)\n",
|
||||
"initial_b = 0.\n",
|
||||
"# some gradient descent settings\n",
|
||||
"iterations = 1000\n",
|
||||
"alpha = 5.0e-7\n",
|
||||
"# run gradient descent \n",
|
||||
"w_final, b_final, J_hist = gradient_descent(X_train, y_train, initial_w, initial_b,\n",
|
||||
" compute_cost, compute_gradient, \n",
|
||||
" alpha, iterations)\n",
|
||||
"print(f\"b,w found by gradient descent: {b_final:0.2f},{w_final} \")\n",
|
||||
"m,_ = X_train.shape\n",
|
||||
"for i in range(m):\n",
|
||||
" print(f\"prediction: {np.dot(X_train[i], w_final) + b_final:0.2f}, target value: {y_train[i]}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Result**: \n",
|
||||
"b,w found by gradient descent: -0.00,[ 0.2 0. -0.01 -0.07] \n",
|
||||
"prediction: 426.19, target value: 460 \n",
|
||||
"prediction: 286.17, target value: 232 \n",
|
||||
"prediction: 171.47, target value: 178 "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# plot cost versus iteration \n",
|
||||
"fig, (ax1, ax2) = plt.subplots(1, 2, constrained_layout=True, figsize=(12, 4))\n",
|
||||
"ax1.plot(J_hist)\n",
|
||||
"ax2.plot(100 + np.arange(len(J_hist[100:])), J_hist[100:])\n",
|
||||
"ax1.set_title(\"Cost vs. iteration\"); ax2.set_title(\"Cost vs. iteration (tail)\")\n",
|
||||
"ax1.set_ylabel('Cost') ; ax2.set_ylabel('Cost') \n",
|
||||
"ax1.set_xlabel('iteration step') ; ax2.set_xlabel('iteration step') \n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"*These results are not inspiring*! Cost is still declining and our predictions are not very accurate. The next lab will explore how to improve on this."
|
||||
]
|
||||
},
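{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to watch the cost continue to decline in the meantime, one optional experiment (a sketch, assuming the functions and variables defined above) is to keep training from the parameters just found:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional: continue gradient descent from the learned parameters; cost keeps declining slowly\n",
"w_more, b_more, J_more = gradient_descent(X_train, y_train, w_final, b_final,\n",
"                                          compute_cost, compute_gradient, alpha, 1000)\n",
"print(f'cost after 1000 more iterations: {J_more[-1]:8.2f}')"
]
},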
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"<a name=\"toc_15456_6\"></a>\n",
|
||||
"# 6 Congratulations!\n",
|
||||
"In this lab you:\n",
|
||||
"- Redeveloped the routines for linear regression, now with multiple variables.\n",
|
||||
"- Utilized NumPy `np.dot` to vectorize the implementations"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"dl_toc_settings": {
|
||||
"rndtag": "15456"
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.6"
|
||||
},
|
||||
"toc-autonumbering": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
666
C1_W2_Lab03_Feature_Scaling_and_Learning_Rate_Soln.ipynb
Normal file
@ -0,0 +1,666 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Feature scaling and Learning Rate (Multi-variable)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Goals\n",
|
||||
"In this lab you will:\n",
|
||||
"- Utilize the multiple variables routines developed in the previous lab\n",
|
||||
"- run Gradient Descent on a data set with multiple features\n",
|
||||
"- explore the impact of the *learning rate alpha* on gradient descent\n",
|
||||
"- improve performance of gradient descent by *feature scaling* using z-score normalization"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"You will utilize the functions developed in the last lab as well as matplotlib and NumPy. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"from lab_utils_multi import load_house_data, run_gradient_descent \n",
|
||||
"from lab_utils_multi import norm_plot, plt_equal_scale, plot_cost_i_w\n",
|
||||
"from lab_utils_common import dlc\n",
|
||||
"np.set_printoptions(precision=2)\n",
|
||||
"plt.style.use('./deeplearning.mplstyle')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Notation\n",
|
||||
"\n",
|
||||
"|General <br /> Notation | Description| Python (if applicable) |\n",
|
||||
"|: ------------|: ------------------------------------------------------------||\n",
|
||||
"| $a$ | scalar, non bold ||\n",
|
||||
"| $\\mathbf{a}$ | vector, bold ||\n",
|
||||
"| $\\mathbf{A}$ | matrix, bold capital ||\n",
|
||||
"| **Regression** | | | |\n",
|
||||
"| $\\mathbf{X}$ | training example maxtrix | `X_train` | \n",
|
||||
"| $\\mathbf{y}$ | training example targets | `y_train` \n",
|
||||
"| $\\mathbf{x}^{(i)}$, $y^{(i)}$ | $i_{th}$Training Example | `X[i]`, `y[i]`|\n",
|
||||
"| m | number of training examples | `m`|\n",
|
||||
"| n | number of features in each example | `n`|\n",
|
||||
"| $\\mathbf{w}$ | parameter: weight, | `w` |\n",
|
||||
"| $b$ | parameter: bias | `b` | \n",
|
||||
"| $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ | The result of the model evaluation at $\\mathbf{x}^{(i)}$ parameterized by $\\mathbf{w},b$: $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = \\mathbf{w} \\cdot \\mathbf{x}^{(i)}+b$ | `f_wb` | \n",
|
||||
"|$\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j}$| the gradient or partial derivative of cost with respect to a parameter $w_j$ |`dj_dw[j]`| \n",
|
||||
"|$\\frac{\\partial J(\\mathbf{w},b)}{\\partial b}$| the gradient or partial derivative of cost with respect to a parameter $b$| `dj_db`|"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Problem Statement\n",
|
||||
"\n",
|
||||
"As in the previous labs, you will use the motivating example of housing price prediction. The training data set contains many examples with 4 features (size, bedrooms, floors and age) shown in the table below. Note, in this lab, the Size feature is in sqft while earlier labs utilized 1000 sqft. This data set is larger than the previous lab.\n",
|
||||
"\n",
|
||||
"We would like to build a linear regression model using these values so we can then predict the price for other houses - say, a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old. \n",
|
||||
"\n",
|
||||
"## Dataset: \n",
|
||||
"| Size (sqft) | Number of Bedrooms | Number of floors | Age of Home | Price (1000s dollars) | \n",
|
||||
"| ----------------| ------------------- |----------------- |--------------|----------------------- | \n",
|
||||
"| 952 | 2 | 1 | 65 | 271.5 | \n",
|
||||
"| 1244 | 3 | 2 | 64 | 232 | \n",
|
||||
"| 1947 | 3 | 2 | 17 | 509.8 | \n",
|
||||
"| ... | ... | ... | ... | ... |\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# load the dataset\n",
|
||||
"X_train, y_train = load_house_data()\n",
|
||||
"X_features = ['size(sqft)','bedrooms','floors','age']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's view the dataset and its features by plotting each feature versus price."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"fig,ax=plt.subplots(1, 4, figsize=(12, 3), sharey=True)\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" ax[i].scatter(X_train[:,i],y_train)\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
"ax[0].set_ylabel(\"Price (1000's)\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Plotting each feature vs. the target, price, provides some indication of which features have the strongest influence on price. Above, increasing size also increases price. Bedrooms and floors don't seem to have a strong impact on price. Newer houses have higher prices than older houses."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_5\"></a>\n",
|
||||
"## Gradient Descent With Multiple Variables\n",
|
||||
"Here are the equations you developed in the last lab on gradient descent for multiple variables.:\n",
|
||||
"\n",
|
||||
"$$\\begin{align*} \\text{repeat}&\\text{ until convergence:} \\; \\lbrace \\newline\\;\n",
|
||||
"& w_j := w_j - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} \\tag{1} \\; & \\text{for j = 0..n-1}\\newline\n",
|
||||
"&b\\ \\ := b - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial b} \\newline \\rbrace\n",
|
||||
"\\end{align*}$$\n",
|
||||
"\n",
|
||||
"where, n is the number of features, parameters $w_j$, $b$, are updated simultaneously and where \n",
|
||||
"\n",
|
||||
"$$\n",
|
||||
"\\begin{align}\n",
|
||||
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \\tag{2} \\\\\n",
|
||||
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial b} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)}) \\tag{3}\n",
|
||||
"\\end{align}\n",
|
||||
"$$\n",
|
||||
"* m is the number of training examples in the data set\n",
|
||||
"\n",
|
||||
" \n",
|
||||
"* $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Learning Rate\n",
|
||||
"<figure>\n",
|
||||
" <img src=\"./images/C1_W2_Lab06_learningrate.PNG\" style=\"width:1200px;\" >\n",
|
||||
"</figure>\n",
|
||||
"The lectures discussed some of the issues related to setting the learning rate $\\alpha$. The learning rate controls the size of the update to the parameters. See equation (1) above. It is shared by all the parameters. \n",
|
||||
"\n",
|
||||
"Let's run gradient descent and try a few settings of $\\alpha$ on our data set"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### $\\alpha$ = 9.9e-7"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#set alpha to 9.9e-7\n",
|
||||
"_, _, hist = run_gradient_descent(X_train, y_train, 10, alpha = 9.9e-7)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"It appears the learning rate is too high. The solution does not converge. Cost is *increasing* rather than decreasing. Let's plot the result:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"plot_cost_i_w(X_train, y_train, hist)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The plot on the right shows the value of one of the parameters, $w_0$. At each iteration, it is overshooting the optimal value and as a result, cost ends up *increasing* rather than approaching the minimum. Note that this is not a completely accurate picture as there are 4 parameters being modified each pass rather than just one. This plot is only showing $w_0$ with the other parameters fixed at benign values. In this and later plots you may notice the blue and orange lines being slightly off."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"### $\\alpha$ = 9e-7\n",
|
||||
"Let's try a bit smaller value and see what happens."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#set alpha to 9e-7\n",
|
||||
"_,_,hist = run_gradient_descent(X_train, y_train, 10, alpha = 9e-7)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Cost is decreasing throughout the run showing that alpha is not too large. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"plot_cost_i_w(X_train, y_train, hist)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"On the left, you see that cost is decreasing as it should. On the right, you can see that $w_0$ is still oscillating around the minimum, but it is decreasing each iteration rather than increasing. Note above that `dj_dw[0]` changes sign with each iteration as `w[0]` jumps over the optimal value.\n",
|
||||
"This alpha value will converge. You can vary the number of iterations to see how it behaves."
|
||||
]
|
||||
},
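{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, one way to do that (a sketch, assuming `run_gradient_descent`, `plot_cost_i_w` and the data loaded above) is to rerun with the same $\\alpha$ but more iterations:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional: same alpha (9e-7), more iterations -- cost should keep decreasing\n",
"_, _, hist = run_gradient_descent(X_train, y_train, 100, alpha = 9e-7)\n",
"plot_cost_i_w(X_train, y_train, hist)"
]
},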
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### $\\alpha$ = 1e-7\n",
|
||||
"Let's try a bit smaller value for $\\alpha$ and see what happens."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#set alpha to 1e-7\n",
|
||||
"_,_,hist = run_gradient_descent(X_train, y_train, 10, alpha = 1e-7)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Cost is decreasing throughout the run showing that $\\alpha$ is not too large. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"plot_cost_i_w(X_train,y_train,hist)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"On the left, you see that cost is decreasing as it should. On the right you can see that $w_0$ is decreasing without crossing the minimum. Note above that `dj_w0` is negative throughout the run. This solution will also converge, though not quite as quickly as the previous example."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Feature Scaling \n",
|
||||
"<figure>\n",
|
||||
" <img src=\"./images/C1_W2_Lab06_featurescalingheader.PNG\" style=\"width:1200px;\" >\n",
|
||||
"</figure>\n",
|
||||
"The lectures described the importance of rescaling the dataset so the features have a similar range.\n",
|
||||
"If you are interested in the details of why this is the case, click on the 'details' header below. If not, the section below will walk through an implementation of how to do feature scaling."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Details</b></font>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"Let's look again at the situation with $\\alpha$ = 9e-7. This is pretty close to the maximum value we can set $\\alpha$ to without diverging. This is a short run showing the first few iterations:\n",
|
||||
"\n",
|
||||
"<figure>\n",
|
||||
" <img src=\"./images/C1_W2_Lab06_ShortRun.PNG\" style=\"width:1200px;\" >\n",
|
||||
"</figure>\n",
|
||||
"\n",
|
||||
"Above, while cost is being decreased, its clear that $w_0$ is making more rapid progress than the other parameters due to its much larger gradient.\n",
|
||||
"\n",
|
||||
"The graphic below shows the result of a very long run with $\\alpha$ = 9e-7. This takes several hours.\n",
|
||||
"\n",
|
||||
"<figure>\n",
|
||||
" <img src=\"./images/C1_W2_Lab06_LongRun.PNG\" style=\"width:1200px;\" >\n",
|
||||
"</figure>\n",
|
||||
" \n",
|
||||
"Above, you can see cost decreased slowly after its initial reduction. Notice the difference between `w0` and `w1`,`w2`,`w3` as well as `dj_dw0` and `dj_dw1-3`. `w0` reaches its near final value very quickly and `dj_dw0` has quickly decreased to a small value showing that `w0` is near the final value. The other parameters were reduced much more slowly.\n",
|
||||
"\n",
|
||||
"Why is this? Is there something we can improve? See below:\n",
|
||||
"<figure>\n",
|
||||
" <center> <img src=\"./images/C1_W2_Lab06_scale.PNG\" ></center>\n",
|
||||
"</figure> \n",
|
||||
"\n",
|
||||
"The figure above shows why $w$'s are updated unevenly. \n",
|
||||
"- $\\alpha$ is shared by all parameter updates ($w$'s and $b$).\n",
|
||||
"- the common error term is multiplied by the features for the $w$'s. (not $b$).\n",
|
||||
"- the features vary significantly in magnitude making some features update much faster than others. In this case, $w_0$ is multiplied by 'size(sqft)', which is generally > 1000, while $w_1$ is multiplied by 'number of bedrooms', which is generally 2-4. \n",
|
||||
" \n",
|
||||
"The solution is Feature Scaling."
|
||||
]
|
||||
},
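{
"cell_type": "markdown",
"metadata": {},
"source": [
"A rough numeric sketch of that point (toy numbers chosen for illustration, not values from the dataset): for a single example, the same error term produces very different gradient contributions per feature."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# toy illustration: one common error term, very different per-feature gradient magnitudes\n",
"err = 10.0                          # hypothetical error for one example\n",
"x_size, x_bedrooms = 1000.0, 3.0    # hypothetical feature values\n",
"print(f'gradient contribution for w_size    : {err * x_size}')\n",
"print(f'gradient contribution for w_bedrooms: {err * x_bedrooms}')\n",
"print(f'gradient contribution for b         : {err}')"
]
},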
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The lectures discussed three different techniques: \n",
|
||||
"- Feature scaling, essentially dividing each positive feature by its maximum value, or more generally, rescale each feature by both its minimum and maximum values using (x-min)/(max-min). Both ways normalizes features to the range of -1 and 1, where the former method works for positive features which is simple and serves well for the lecture's example, and the latter method works for any features.\n",
|
||||
"- Mean normalization: $x_i := \\dfrac{x_i - \\mu_i}{max - min} $ \n",
|
||||
"- Z-score normalization which we will explore below. "
|
||||
]
|
||||
},
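{
"cell_type": "markdown",
"metadata": {},
"source": [
"Only z-score normalization is implemented in this lab. For comparison, the cell below is a brief sketch of the first two techniques, written one straightforward way (an assumption, not the course's reference implementation); it assumes `X_train` loaded above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sketch: simple max scaling and mean normalization, for comparison with z-score below\n",
"x_max = np.max(X_train, axis=0)\n",
"x_min = np.min(X_train, axis=0)\n",
"mu    = np.mean(X_train, axis=0)\n",
"X_max_scaled = X_train / x_max                     # divide each (positive) feature by its maximum\n",
"X_mean_norm  = (X_train - mu) / (x_max - x_min)    # mean normalization\n",
"print(f'max-scaled peak to peak     : {np.ptp(X_max_scaled, axis=0)}')\n",
"print(f'mean-normalized peak to peak: {np.ptp(X_mean_norm, axis=0)}')"
]
},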
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"### z-score normalization \n",
|
||||
"After z-score normalization, all features will have a mean of 0 and a standard deviation of 1.\n",
|
||||
"\n",
|
||||
"To implement z-score normalization, adjust your input values as shown in this formula:\n",
|
||||
"$$x^{(i)}_j = \\dfrac{x^{(i)}_j - \\mu_j}{\\sigma_j} \\tag{4}$$ \n",
|
||||
"where $j$ selects a feature or a column in the $\\mathbf{X}$ matrix. $µ_j$ is the mean of all the values for feature (j) and $\\sigma_j$ is the standard deviation of feature (j).\n",
|
||||
"$$\n",
|
||||
"\\begin{align}\n",
|
||||
"\\mu_j &= \\frac{1}{m} \\sum_{i=0}^{m-1} x^{(i)}_j \\tag{5}\\\\\n",
|
||||
"\\sigma^2_j &= \\frac{1}{m} \\sum_{i=0}^{m-1} (x^{(i)}_j - \\mu_j)^2 \\tag{6}\n",
|
||||
"\\end{align}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
">**Implementation Note:** When normalizing the features, it is important\n",
|
||||
"to store the values used for normalization - the mean value and the standard deviation used for the computations. After learning the parameters\n",
|
||||
"from the model, we often want to predict the prices of houses we have not\n",
|
||||
"seen before. Given a new x value (living room area and number of bed-\n",
|
||||
"rooms), we must first normalize x using the mean and standard deviation\n",
|
||||
"that we had previously computed from the training set.\n",
|
||||
"\n",
|
||||
"**Implementation**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def zscore_normalize_features(X):\n",
|
||||
" \"\"\"\n",
|
||||
" computes X, zcore normalized by column\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X (ndarray (m,n)) : input data, m examples, n features\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" X_norm (ndarray (m,n)): input normalized by column\n",
|
||||
" mu (ndarray (n,)) : mean of each feature\n",
|
||||
" sigma (ndarray (n,)) : standard deviation of each feature\n",
|
||||
" \"\"\"\n",
|
||||
" # find the mean of each column/feature\n",
|
||||
" mu = np.mean(X, axis=0) # mu will have shape (n,)\n",
|
||||
" # find the standard deviation of each column/feature\n",
|
||||
" sigma = np.std(X, axis=0) # sigma will have shape (n,)\n",
|
||||
" # element-wise, subtract mu for that column from each example, divide by std for that column\n",
|
||||
" X_norm = (X - mu) / sigma \n",
|
||||
"\n",
|
||||
" return (X_norm, mu, sigma)\n",
|
||||
" \n",
|
||||
"#check our work\n",
|
||||
"#from sklearn.preprocessing import scale\n",
|
||||
"#scale(X_orig, axis=0, with_mean=True, with_std=True, copy=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's look at the steps involved in Z-score normalization. The plot below shows the transformation step by step."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"mu = np.mean(X_train,axis=0) \n",
|
||||
"sigma = np.std(X_train,axis=0) \n",
|
||||
"X_mean = (X_train - mu)\n",
|
||||
"X_norm = (X_train - mu)/sigma \n",
|
||||
"\n",
|
||||
"fig,ax=plt.subplots(1, 3, figsize=(12, 3))\n",
|
||||
"ax[0].scatter(X_train[:,0], X_train[:,3])\n",
|
||||
"ax[0].set_xlabel(X_features[0]); ax[0].set_ylabel(X_features[3]);\n",
|
||||
"ax[0].set_title(\"unnormalized\")\n",
|
||||
"ax[0].axis('equal')\n",
|
||||
"\n",
|
||||
"ax[1].scatter(X_mean[:,0], X_mean[:,3])\n",
|
||||
"ax[1].set_xlabel(X_features[0]); ax[0].set_ylabel(X_features[3]);\n",
|
||||
"ax[1].set_title(r\"X - $\\mu$\")\n",
|
||||
"ax[1].axis('equal')\n",
|
||||
"\n",
|
||||
"ax[2].scatter(X_norm[:,0], X_norm[:,3])\n",
|
||||
"ax[2].set_xlabel(X_features[0]); ax[0].set_ylabel(X_features[3]);\n",
|
||||
"ax[2].set_title(r\"Z-score normalized\")\n",
|
||||
"ax[2].axis('equal')\n",
|
||||
"plt.tight_layout(rect=[0, 0.03, 1, 0.95])\n",
|
||||
"fig.suptitle(\"distribution of features before, during, after normalization\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The plot above shows the relationship between two of the training set parameters, \"age\" and \"size(sqft)\". *These are plotted with equal scale*. \n",
|
||||
"- Left: Unnormalized: The range of values or the variance of the 'size(sqft)' feature is much larger than that of age\n",
|
||||
"- Middle: The first step removes the mean or average value from each feature. This leaves features that are centered around zero. It's difficult to see the difference for the 'age' feature, but 'size(sqft)' is clearly around zero.\n",
|
||||
"- Right: The second step divides by the standard deviation. This leaves both features centered at zero with a similar scale."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's normalize the data and compare it to the original data."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# normalize the original features\n",
|
||||
"X_norm, X_mu, X_sigma = zscore_normalize_features(X_train)\n",
|
||||
"print(f\"X_mu = {X_mu}, \\nX_sigma = {X_sigma}\")\n",
|
||||
"print(f\"Peak to Peak range by column in Raw X:{np.ptp(X_train,axis=0)}\") \n",
|
||||
"print(f\"Peak to Peak range by column in Normalized X:{np.ptp(X_norm,axis=0)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The peak to peak range of each column is reduced from a factor of thousands to a factor of 2-3 by normalization."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"fig,ax=plt.subplots(1, 4, figsize=(12, 3))\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" norm_plot(ax[i],X_train[:,i],)\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
"ax[0].set_ylabel(\"count\");\n",
|
||||
"fig.suptitle(\"distribution of features before normalization\")\n",
|
||||
"plt.show()\n",
|
||||
"fig,ax=plt.subplots(1,4,figsize=(12,3))\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" norm_plot(ax[i],X_norm[:,i],)\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
"ax[0].set_ylabel(\"count\"); \n",
|
||||
"fig.suptitle(\"distribution of features after normalization\")\n",
|
||||
"\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Notice, above, the range of the normalized data (x-axis) is centered around zero and roughly +/- 2. Most importantly, the range is similar for each feature."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's re-run our gradient descent algorithm with normalized data.\n",
|
||||
"Note the **vastly larger value of alpha**. This will speed up gradient descent."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"w_norm, b_norm, hist = run_gradient_descent(X_norm, y_train, 1000, 1.0e-1, )"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The scaled features get very accurate results **much, much faster!**. Notice the gradient of each parameter is tiny by the end of this fairly short run. A learning rate of 0.1 is a good start for regression with normalized features.\n",
|
||||
"Let's plot our predictions versus the target values. Note, the prediction is made using the normalized feature while the plot is shown using the original feature values."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#predict target using normalized features\n",
|
||||
"m = X_norm.shape[0]\n",
|
||||
"yp = np.zeros(m)\n",
|
||||
"for i in range(m):\n",
|
||||
" yp[i] = np.dot(X_norm[i], w_norm) + b_norm\n",
|
||||
"\n",
|
||||
" # plot predictions and targets versus original features \n",
|
||||
"fig,ax=plt.subplots(1,4,figsize=(12, 3),sharey=True)\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" ax[i].scatter(X_train[:,i],y_train, label = 'target')\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
" ax[i].scatter(X_train[:,i],yp,color=dlc[\"dlorange\"], label = 'predict')\n",
|
||||
"ax[0].set_ylabel(\"Price\"); ax[0].legend();\n",
|
||||
"fig.suptitle(\"target versus prediction using z-score normalized model\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The results look good. A few points to note:\n",
|
||||
"- with multiple features, we can no longer have a single plot showing results versus features.\n",
|
||||
"- when generating the plot, the normalized features were used. Any predictions using the parameters learned from a normalized training set must also be normalized."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Prediction**\n",
|
||||
"The point of generating our model is to use it to predict housing prices that are not in the data set. Let's predict the price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old. Recall, that you must normalize the data with the mean and standard deviation derived when the training data was normalized. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# First, normalize out example.\n",
|
||||
"x_house = np.array([1200, 3, 1, 40])\n",
|
||||
"x_house_norm = (x_house - X_mu) / X_sigma\n",
|
||||
"print(x_house_norm)\n",
|
||||
"x_house_predict = np.dot(x_house_norm, w_norm) + b_norm\n",
|
||||
"print(f\" predicted price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old = ${x_house_predict*1000:0.0f}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Cost Contours** \n",
|
||||
"<img align=\"left\" src=\"./images/C1_W2_Lab06_contours.PNG\" style=\"width:240px;\" >Another way to view feature scaling is in terms of the cost contours. When feature scales do not match, the plot of cost versus parameters in a contour plot is asymmetric. \n",
|
||||
"\n",
|
||||
"In the plot below, the scale of the parameters is matched. The left plot is the cost contour plot of w[0], the square feet versus w[1], the number of bedrooms before normalizing the features. The plot is so asymmetric, the curves completing the contours are not visible. In contrast, when the features are normalized, the cost contour is much more symmetric. The result is that updates to parameters during gradient descent can make equal progress for each parameter. \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"plt_equal_scale(X_train, X_norm, y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"## Congratulations!\n",
|
||||
"In this lab you:\n",
|
||||
"- utilized the routines for linear regression with multiple features you developed in previous labs\n",
|
||||
"- explored the impact of the learning rate $\\alpha$ on convergence \n",
|
||||
"- discovered the value of feature scaling using z-score normalization in speeding convergence"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Acknowledgments\n",
|
||||
"The housing data was derived from the [Ames Housing dataset](http://jse.amstat.org/v19n3/decock.pdf) compiled by Dean De Cock for use in data science education."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.6"
|
||||
},
|
||||
"toc-autonumbering": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
344
C1_W2_Lab04_FeatEng_PolyReg_Soln.ipynb
Normal file
@ -0,0 +1,344 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Feature Engineering and Polynomial Regression\n",
|
||||
"\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Goals\n",
|
||||
"In this lab you will:\n",
|
||||
"- explore feature engineering and polynomial regression which allows you to use the machinery of linear regression to fit very complicated, even very non-linear functions.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"You will utilize the function developed in previous labs as well as matplotlib and NumPy. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"from lab_utils_multi import zscore_normalize_features, run_gradient_descent_feng\n",
|
||||
"np.set_printoptions(precision=2) # reduced display precision on numpy arrays"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name='FeatureEng'></a>\n",
|
||||
"# Feature Engineering and Polynomial Regression Overview\n",
|
||||
"\n",
|
||||
"Out of the box, linear regression provides a means of building models of the form:\n",
|
||||
"$$f_{\\mathbf{w},b} = w_0x_0 + w_1x_1+ ... + w_{n-1}x_{n-1} + b \\tag{1}$$ \n",
|
||||
"What if your features/data are non-linear or are combinations of features? For example, Housing prices do not tend to be linear with living area but penalize very small or very large houses resulting in the curves shown in the graphic above. How can we use the machinery of linear regression to fit this curve? Recall, the 'machinery' we have is the ability to modify the parameters $\\mathbf{w}$, $\\mathbf{b}$ in (1) to 'fit' the equation to the training data. However, no amount of adjusting of $\\mathbf{w}$,$\\mathbf{b}$ in (1) will achieve a fit to a non-linear curve.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name='PolynomialFeatures'></a>\n",
|
||||
"## Polynomial Features\n",
|
||||
"\n",
|
||||
"Above we were considering a scenario where the data was non-linear. Let's try using what we know so far to fit a non-linear curve. We'll start with a simple quadratic: $y = 1+x^2$\n",
|
||||
"\n",
|
||||
"You're familiar with all the routines we're using. They are available in the lab_utils.py file for review. We'll use [`np.c_[..]`](https://numpy.org/doc/stable/reference/generated/numpy.c_.html) which is a NumPy routine to concatenate along the column boundary."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0, 20, 1)\n",
|
||||
"y = 1 + x**2\n",
|
||||
"X = x.reshape(-1, 1)\n",
|
||||
"\n",
|
||||
"model_w,model_b = run_gradient_descent_feng(X,y,iterations=1000, alpha = 1e-2)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"no feature engineering\")\n",
|
||||
"plt.plot(x,X@model_w + model_b, label=\"Predicted Value\"); plt.xlabel(\"X\"); plt.ylabel(\"y\"); plt.legend(); plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Well, as expected, not a great fit. What is needed is something like $y= w_0x_0^2 + b$, or a **polynomial feature**.\n",
|
||||
"To accomplish this, you can modify the *input data* to *engineer* the needed features. If you swap the original data with a version that squares the $x$ value, then you can achieve $y= w_0x_0^2 + b$. Let's try it. Swap `X` for `X**2` below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0, 20, 1)\n",
|
||||
"y = 1 + x**2\n",
|
||||
"\n",
|
||||
"# Engineer features \n",
|
||||
"X = x**2 #<-- added engineered feature"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X = X.reshape(-1, 1) #X should be a 2-D Matrix\n",
|
||||
"model_w,model_b = run_gradient_descent_feng(X, y, iterations=10000, alpha = 1e-5)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"Added x**2 feature\")\n",
|
||||
"plt.plot(x, np.dot(X,model_w) + model_b, label=\"Predicted Value\"); plt.xlabel(\"x\"); plt.ylabel(\"y\"); plt.legend(); plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Great! near perfect fit. Notice the values of $\\mathbf{w}$ and b printed right above the graph: `w,b found by gradient descent: w: [1.], b: 0.0490`. Gradient descent modified our initial values of $\\mathbf{w},b $ to be (1.0,0.049) or a model of $y=1*x_0^2+0.049$, very close to our target of $y=1*x_0^2+1$. If you ran it longer, it could be a better match. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Selecting Features\n",
|
||||
"<a name='GDF'></a>\n",
|
||||
"Above, we knew that an $x^2$ term was required. It may not always be obvious which features are required. One could add a variety of potential features to try and find the most useful. For example, what if we had instead tried : $y=w_0x_0 + w_1x_1^2 + w_2x_2^3+b$ ? \n",
|
||||
"\n",
|
||||
"Run the next cells. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0, 20, 1)\n",
|
||||
"y = x**2\n",
|
||||
"\n",
|
||||
"# engineer features .\n",
|
||||
"X = np.c_[x, x**2, x**3] #<-- added engineered feature"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model_w,model_b = run_gradient_descent_feng(X, y, iterations=10000, alpha=1e-7)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"x, x**2, x**3 features\")\n",
|
||||
"plt.plot(x, X@model_w + model_b, label=\"Predicted Value\"); plt.xlabel(\"x\"); plt.ylabel(\"y\"); plt.legend(); plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note the value of $\\mathbf{w}$, `[0.08 0.54 0.03]` and b is `0.0106`.This implies the model after fitting/training is:\n",
|
||||
"$$ 0.08x + 0.54x^2 + 0.03x^3 + 0.0106 $$\n",
|
||||
"Gradient descent has emphasized the data that is the best fit to the $x^2$ data by increasing the $w_1$ term relative to the others. If you were to run for a very long time, it would continue to reduce the impact of the other terms. \n",
|
||||
">Gradient descent is picking the 'correct' features for us by emphasizing its associated parameter\n",
|
||||
"\n",
|
||||
"Let's review this idea:\n",
|
||||
"- Intially, the features were re-scaled so they are comparable to each other\n",
|
||||
"- less weight value implies less important/correct feature, and in extreme, when the weight becomes zero or very close to zero, the associated feature is not useful in fitting the model to the data.\n",
|
||||
"- above, after fitting, the weight associated with the $x^2$ feature is much larger than the weights for $x$ or $x^3$ as it is the most useful in fitting the data. "
|
||||
]
|
||||
},
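{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional sketch of the 'run for a very long time' point above (assuming `X`, `y` and `run_gradient_descent_feng` from the cells above), you can simply increase the iteration count and watch $w_1$ grow relative to the other weights:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional: a longer run should emphasize the x**2 weight even more\n",
"model_w_long, model_b_long = run_gradient_descent_feng(X, y, iterations=100000, alpha=1e-7)\n",
"print(f'w after longer run: {model_w_long}, b: {model_b_long}')"
]
},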
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### An Alternate View\n",
|
||||
"Above, polynomial features were chosen based on how well they matched the target data. Another way to think about this is to note that we are still using linear regression once we have created new features. Given that, the best features will be linear relative to the target. This is best understood with an example. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0, 20, 1)\n",
|
||||
"y = x**2\n",
|
||||
"\n",
|
||||
"# engineer features .\n",
|
||||
"X = np.c_[x, x**2, x**3] #<-- added engineered feature\n",
|
||||
"X_features = ['x','x^2','x^3']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"fig,ax=plt.subplots(1, 3, figsize=(12, 3), sharey=True)\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" ax[i].scatter(X[:,i],y)\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
"ax[0].set_ylabel(\"y\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Above, it is clear that the $x^2$ feature mapped against the target value $y$ is linear. Linear regression can then easily generate a model using that feature."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Scaling features\n",
|
||||
"As described in the last lab, if the data set has features with significantly different scales, one should apply feature scaling to speed gradient descent. In the example above, there is $x$, $x^2$ and $x^3$ which will naturally have very different scales. Let's apply Z-score normalization to our example."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0,20,1)\n",
|
||||
"X = np.c_[x, x**2, x**3]\n",
|
||||
"print(f\"Peak to Peak range by column in Raw X:{np.ptp(X,axis=0)}\")\n",
|
||||
"\n",
|
||||
"# add mean_normalization \n",
|
||||
"X = zscore_normalize_features(X) \n",
|
||||
"print(f\"Peak to Peak range by column in Normalized X:{np.ptp(X,axis=0)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now we can try again with a more aggressive value of alpha:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"x = np.arange(0,20,1)\n",
|
||||
"y = x**2\n",
|
||||
"\n",
|
||||
"X = np.c_[x, x**2, x**3]\n",
|
||||
"X = zscore_normalize_features(X) \n",
|
||||
"\n",
|
||||
"model_w, model_b = run_gradient_descent_feng(X, y, iterations=100000, alpha=1e-1)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"Normalized x x**2, x**3 feature\")\n",
|
||||
"plt.plot(x,X@model_w + model_b, label=\"Predicted Value\"); plt.xlabel(\"x\"); plt.ylabel(\"y\"); plt.legend(); plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Feature scaling allows this to converge much faster. \n",
|
||||
"Note again the values of $\\mathbf{w}$. The $w_1$ term, which is the $x^2$ term is the most emphasized. Gradient descent has all but eliminated the $x^3$ term."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Complex Functions\n",
|
||||
"With feature engineering, even quite complex functions can be modeled:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"x = np.arange(0,20,1)\n",
|
||||
"y = np.cos(x/2)\n",
|
||||
"\n",
|
||||
"X = np.c_[x, x**2, x**3,x**4, x**5, x**6, x**7, x**8, x**9, x**10, x**11, x**12, x**13]\n",
|
||||
"X = zscore_normalize_features(X) \n",
|
||||
"\n",
|
||||
"model_w,model_b = run_gradient_descent_feng(X, y, iterations=1000000, alpha = 1e-1)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"Normalized x x**2, x**3 feature\")\n",
|
||||
"plt.plot(x,X@model_w + model_b, label=\"Predicted Value\"); plt.xlabel(\"x\"); plt.ylabel(\"y\"); plt.legend(); plt.show()\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"## Congratulations!\n",
|
||||
"In this lab you:\n",
|
||||
"- learned how linear regression can model complex, even highly non-linear functions using feature engineering\n",
|
||||
"- recognized that it is important to apply feature scaling when doing feature engineering"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.6"
|
||||
},
|
||||
"toc-autonumbering": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
222
C1_W2_Lab05_Sklearn_GD_Soln.ipynb
Normal file
@ -0,0 +1,222 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Linear Regression using Scikit-Learn"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"There is an open-source, commercially usable machine learning toolkit called [scikit-learn](https://scikit-learn.org/stable/index.html). This toolkit contains implementations of many of the algorithms that you will work with in this course.\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Goals\n",
|
||||
"In this lab you will:\n",
|
||||
"- Utilize scikit-learn to implement linear regression using Gradient Descent"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"You will utilize functions from scikit-learn as well as matplotlib and NumPy. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"from sklearn.linear_model import SGDRegressor\n",
|
||||
"from sklearn.preprocessing import StandardScaler\n",
|
||||
"from lab_utils_multi import load_house_data\n",
|
||||
"from lab_utils_common import dlc\n",
|
||||
"np.set_printoptions(precision=2)\n",
|
||||
"plt.style.use('./deeplearning.mplstyle')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Gradient Descent\n",
|
||||
"Scikit-learn has a gradient descent regression model [sklearn.linear_model.SGDRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html#examples-using-sklearn-linear-model-sgdregressor). Like your previous implementation of gradient descent, this model performs best with normalized inputs. [sklearn.preprocessing.StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler) will perform z-score normalization as in a previous lab. Here it is referred to as 'standard score'."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load the data set"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X_train, y_train = load_house_data()\n",
|
||||
"X_features = ['size(sqft)','bedrooms','floors','age']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Scale/normalize the training data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"scaler = StandardScaler()\n",
|
||||
"X_norm = scaler.fit_transform(X_train)\n",
|
||||
"print(f\"Peak to Peak range by column in Raw X:{np.ptp(X_train,axis=0)}\") \n",
|
||||
"print(f\"Peak to Peak range by column in Normalized X:{np.ptp(X_norm,axis=0)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create and fit the regression model"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"sgdr = SGDRegressor(max_iter=1000)\n",
|
||||
"sgdr.fit(X_norm, y_train)\n",
|
||||
"print(sgdr)\n",
|
||||
"print(f\"number of iterations completed: {sgdr.n_iter_}, number of weight updates: {sgdr.t_}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### View parameters\n",
|
||||
"Note, the parameters are associated with the *normalized* input data. The fit parameters are very close to those found in the previous lab with this data."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"b_norm = sgdr.intercept_\n",
|
||||
"w_norm = sgdr.coef_\n",
|
||||
"print(f\"model parameters: w: {w_norm}, b:{b_norm}\")\n",
|
||||
"print( \"model parameters from previous lab: w: [110.56 -21.27 -32.71 -37.97], b: 363.16\")"
|
||||
]
|
||||
},
|
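||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As an aside (not part of the original lab flow), the fitted parameters can be mapped back to the unnormalized feature scale using the statistics stored by the scaler. A minimal sketch, assuming the `scaler`, `w_norm` and `b_norm` defined above:\n",
|
||||
"```\n",
|
||||
"# since x_norm = (x - mean)/scale, the model w_norm . x_norm + b_norm\n",
|
||||
"# equals (w_norm/scale) . x + (b_norm - w_norm . (mean/scale))\n",
|
||||
"w_orig = w_norm / scaler.scale_\n",
|
||||
"b_orig = b_norm - np.sum(w_norm * scaler.mean_ / scaler.scale_)\n",
|
||||
"print(f\"model parameters rescaled to the original features: w: {w_orig}, b:{b_orig}\")\n",
|
||||
"```"
|
||||
]
|
||||
},
|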
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Make predictions\n",
|
||||
"Predict the targets of the training data. Use both the `predict` routine and compute using $w$ and $b$."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# make a prediction using sgdr.predict()\n",
|
||||
"y_pred_sgd = sgdr.predict(X_norm)\n",
|
||||
"# make a prediction using w,b. \n",
|
||||
"y_pred = np.dot(X_norm, w_norm) + b_norm \n",
|
||||
"print(f\"prediction using np.dot() and sgdr.predict match: {(y_pred == y_pred_sgd).all()}\")\n",
|
||||
"\n",
|
||||
"print(f\"Prediction on training set:\\n{y_pred[:4]}\" )\n",
|
||||
"print(f\"Target values \\n{y_train[:4]}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Plot Results\n",
|
||||
"Let's plot the predictions versus the target values."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# plot predictions and targets vs original features \n",
|
||||
"fig,ax=plt.subplots(1,4,figsize=(12,3),sharey=True)\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" ax[i].scatter(X_train[:,i],y_train, label = 'target')\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
" ax[i].scatter(X_train[:,i],y_pred,color=dlc[\"dlorange\"], label = 'predict')\n",
|
||||
"ax[0].set_ylabel(\"Price\"); ax[0].legend();\n",
|
||||
"fig.suptitle(\"target versus prediction using z-score normalized model\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Congratulations!\n",
|
||||
"In this lab you:\n",
|
||||
"- utilized an open-source machine learning toolkit, scikit-learn\n",
|
||||
"- implemented linear regression using gradient descent and feature normalization from that toolkit"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,329 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Ungraded Lab: Model Representation\n",
|
||||
"\n",
|
||||
"In this ungraded lab, you will implement the model $f_w$ for linear regression with one variable.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Problem Statement\n",
|
||||
"\n",
|
||||
"You will use the motivating example of housing price prediction. There are two data points - a house with 1000 square feet sold for \\\\$200,000 and a house with 2000 square feet sold for \\\\$400,000.\n",
|
||||
"\n",
|
||||
"Therefore, your dataset contains the following two points - \n",
|
||||
"\n",
|
||||
"| Size (feet$^2$) | Price (1000s of dollars) |\n",
|
||||
"| -------------------| ------------------------ |\n",
|
||||
"| 1000 | 200 |\n",
|
||||
"| 2000 | 400 |\n",
|
||||
"\n",
|
||||
"You would like to fit a linear regression model (represented with a straight line) through these two points, so you can then predict price for other houses - say, a house with 1200 feet$^2$.\n",
|
||||
"\n",
|
||||
"### Notation: `X` and `y`\n",
|
||||
"\n",
|
||||
"For the next few labs, you will use lists in python to represent your dataset. As shown in the video:\n",
|
||||
"- `X` represents input variables, also called input features (in this case - Size (feet$^2$)) and \n",
|
||||
"- `y` represents output variables, also known as target variables (in this case - Price (1000s of dollars)). \n",
|
||||
"\n",
|
||||
"Please run the following code cell to create your `X` and `y` variables."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# X is the input variable (size in square feet)\n",
|
||||
"# y in the output variable (price in 1000s of dollars)\n",
|
||||
"X = [1000, 2000] \n",
|
||||
"y = [200, 400]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Number of training examples `m`\n",
|
||||
"You will use `m` to denote the number of training examples. In Python, use the `len()` function to get the number of examples in a list. You can get `m` by running the next code cell."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# m is the number of training examples\n",
|
||||
"m = len(X)\n",
|
||||
"print(f\"Number of training examples is: {m}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Training example `x_i, y_i`\n",
|
||||
"\n",
|
||||
"You will use (x$^i$, y$^i$) to denote the $i^{th}$ training example. Since Python is zero indexed, (x$^0$, y$^0$) is (1000, 200) and (x$^1$, y$^1$) is (2000, 400). \n",
|
||||
"\n",
|
||||
"Run the next code block below to get the $i^{th}$ training example in a Python list."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"i = 0 # Change this to 1 to see (x^1, y^1)\n",
|
||||
"\n",
|
||||
"x_i = X[i]\n",
|
||||
"y_i = y[i]\n",
|
||||
"print(f\"(x^({i}), y^({i})) = ({x_i}, {y_i})\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Plotting the data\n",
|
||||
"First, let's run the cell below to import [matplotlib](http://matplotlib.org), which is a famous library to plot graphs in Python. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import matplotlib.pyplot as plt"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can plot these two points using the `scatter()` function in the `matplotlib` library, as shown in the cell below. \n",
|
||||
"- The function arguments `marker` and `c` show the points as red crosses (the default is blue dots).\n",
|
||||
"\n",
|
||||
"You can also use other functions in the `matplotlib` library to display the title and labels for the axes."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Plot the data points\n",
|
||||
"plt.scatter(X, y, marker='x', c='r')\n",
|
||||
"\n",
|
||||
"# Set the title\n",
|
||||
"plt.title(\"Housing Prices\")\n",
|
||||
"# Set the y-axis label\n",
|
||||
"plt.ylabel('Price (in 1000s of dollars)')\n",
|
||||
"# Set the x-axis label\n",
|
||||
"plt.xlabel('Size (feet^2)')\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Model function\n",
|
||||
"\n",
|
||||
"The model function for linear regression (which is a function that maps from `X` to `y`) is represented as \n",
|
||||
"\n",
|
||||
"$f(x) = w_0 + w_1x$\n",
|
||||
"\n",
|
||||
"The formula above is how you can represent straight lines - different values of $w_0$ and $w_1$ give you different straight lines on the plot. Let's try to get a better intuition for this through the code blocks below.\n",
|
||||
"\n",
|
||||
"Let's represent $w$ as a list in python, with $w_0$ as the first item in the list and $w_1$ as the second. \n",
|
||||
"\n",
|
||||
"Let's start with $w_0 = 3$ and $w_1 = 1$ \n",
|
||||
"\n",
|
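||||
"For instance (a quick sanity check of the formula), with these values the 1000 feet$^2$ house would be predicted at $f(1000) = 3 + 1 \\times 1000 = 1003$ (thousand dollars), far above the actual 200, so these parameters will clearly need adjusting.\n",
|
||||
"\n",
|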
||||
"### Note: You can come back to this cell to adjust the model's w0 and w1 parameters"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# You can come back here later to adjust w0 and w1\n",
|
||||
"w = [3, 1] \n",
|
||||
"print(\"w_0:\", w[0])\n",
|
||||
"print(\"w_1:\", w[1])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now, let's calculate the value of $f(x)$ for your two data points. You can explicitly write this out for each data point as - \n",
|
||||
"\n",
|
||||
"for $x^0$, `f = w[0]+w[1]*X[0]`\n",
|
||||
"\n",
|
||||
"for $x^1$, `f = w[0]+w[1]*X[1]`\n",
|
||||
"\n",
|
||||
"For a large number of data points, this can get unwieldy and repetitive. So instead, you can calculate the function output in a `for` loop as follows - \n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"f = []\n",
|
||||
"for i in range(len(X)):\n",
|
||||
" f_x = w[0] + w[1]*X[i]\n",
|
||||
" f.append(f_x)\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Paste the code shown above in the `calculate_model_output` function below.\n",
|
||||
"Please recall that in Python, indentation is significant. Incorrect indentation may result in a Python error message."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def calculate_model_output(w, X):\n",
|
||||
" ### START CODE HERE ### \n",
|
||||
"\n",
|
||||
" ### END CODE HERE ###\n",
|
||||
" return f"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now let's call the `calculate_model_output` function and plot the output using the `plot` method from `matplotlib` library."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"f = calculate_model_output(w, X)\n",
|
||||
"\n",
|
||||
"# Plot our model prediction\n",
|
||||
"plt.plot(X, f, c='b',label='Our Prediction')\n",
|
||||
"\n",
|
||||
"# Plot the data points\n",
|
||||
"plt.scatter(X, y, marker='x', c='r',label='Actual Values')\n",
|
||||
"\n",
|
||||
"# Set the title\n",
|
||||
"plt.title(\"Housing Prices\")\n",
|
||||
"# Set the y-axis label\n",
|
||||
"plt.ylabel('Price (in 1000s of dollars)')\n",
|
||||
"# Set the x-axis label\n",
|
||||
"plt.xlabel('Size (feet^2)')\n",
|
||||
"plt.legend()\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As you can see, setting $w_0 = 3$ and $w_1 = 1$ does not result in a line that fits our data. \n",
|
||||
"\n",
|
||||
"### Challenge\n",
|
||||
"Try experimenting with different values of $w_0$ and $w_1$. What should the values be for getting a line that fits our data?\n",
|
||||
"\n",
|
||||
"#### Tip:\n",
|
||||
"You can use your mouse to click on the triangle to the left of the green \"Hints\" below to reveal some hints for choosing w0 and w1."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Hints</b></font>\n",
|
||||
"</summary>\n",
|
||||
" <p>\n",
|
||||
" <ul>\n",
|
||||
" <li>Try w0 = 1 and w1 = 0.5, w = [1, 0.5] </li>\n",
|
||||
" <li>Try w0 = 0 and w1 = 0.2, w = [0, 0.2] </li>\n",
|
||||
" </ul>\n",
|
||||
" </p>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Prediction\n",
|
||||
"Now that we have a model, we can use it to make our original prediction. Write the expression to predict the price of a house with 1200 feet^2. You can check your answer below.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"print(f\"{cost_1200sqft:.0f} thousand dollars\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Answer</b></font> \n",
|
||||
"</summary> \n",
|
||||
"\n",
|
||||
"```\n",
|
||||
" w = [0, 0.2] \n",
|
||||
" cost_1200sqft = w[0] + w[1]*1200\n",
|
||||
" ```\n",
|
||||
"\n",
|
||||
"240 thousand dollars"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,689 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Python, NumPy and Vectorization\n",
|
||||
"A brief introduction to some of the scientific computing used in this course. In particular the NumPy scientific computing package and its use with python.\n",
|
||||
"\n",
|
||||
"# Outline\n",
|
||||
"- [ 1.1 Goals](#toc_40015_1.1)\n",
|
||||
"- [ 1.2 Useful References](#toc_40015_1.2)\n",
|
||||
"- [2 Python and NumPy <a name='Python and NumPy'></a>](#toc_40015_2)\n",
|
||||
"- [3 Vectors](#toc_40015_3)\n",
|
||||
"- [ 3.1 Abstract](#toc_40015_3.1)\n",
|
||||
"- [ 3.2 NumPy Arrays](#toc_40015_3.2)\n",
|
||||
"- [ 3.3 Vector Creation](#toc_40015_3.3)\n",
|
||||
"- [ 3.4 Operations on Vectors](#toc_40015_3.4)\n",
|
||||
"- [4 Matrices](#toc_40015_4)\n",
|
||||
"- [ 4.1 Abstract](#toc_40015_4.1)\n",
|
||||
"- [ 4.2 NumPy Arrays](#toc_40015_4.2)\n",
|
||||
"- [ 4.3 Matrix Creation](#toc_40015_4.3)\n",
|
||||
"- [ 4.4 Operations on Matrices](#toc_40015_4.4)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np # it is an unofficial standard to use np for numpy\n",
|
||||
"import sys\n",
|
||||
"import numpy.random as rand"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_1.1\"></a>\n",
|
||||
"## 1.1 Goals\n",
|
||||
"In this lab you will:\n",
|
||||
"- Review the features of NumPy and Python that are used in Course 1"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_1.2\"></a>\n",
|
||||
"## 1.2 Useful References\n",
|
||||
"- NumPy Documentation including a basic introduction: [NumPy.org](https://NumPy.org/doc/stable/)\n",
|
||||
"- A challenging feature topic: [NumPy Broadcasting](https://NumPy.org/doc/stable/user/basics.broadcasting.html)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_2\"></a>\n",
|
||||
"# 2 Python and NumPy <a name='Python and NumPy'></a>\n",
|
||||
"Python is the programming language we will be using in this course. It has built-in, a set of numeric data types and arithmetic operations. NumPy is a library that extends the base capabilities of python to add a richer data set including more numeric types, vectors, matrices, and many matrix functions. NumPy and python work together fairly seamlessly. Python arithmetic operators work on NumPy data types and many NumPy functions will accept python data types.\n"
|
||||
]
|
||||
},
|
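||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As a brief illustration of that interoperability (a small sketch, not part of the original lab):\n",
|
||||
"```\n",
|
||||
"import numpy as np\n",
|
||||
"print(3 * np.array([1, 2, 3]))   # a Python int scales a NumPy array -> [3 6 9]\n",
|
||||
"print(np.sum([1, 2, 3]))         # a NumPy function accepts a plain Python list -> 6\n",
|
||||
"```"
|
||||
]
|
||||
},
|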
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3\"></a>\n",
|
||||
"# 3 Vectors\n",
|
||||
"<a name=\"toc_40015_3.1\"></a>\n",
|
||||
"## 3.1 Abstract\n",
|
||||
"<img align=\"right\" src=\"./images/C1_W2_Lab04_Vectors.PNG\" style=\"width:340px;\" >Vectors, as you will use them in this course, are ordered arrays of numbers. In notation, vectors are denoted with lower case bold letters such as $\\mathbf{x}$. The elements of a vector are all the same type. A vector does not, for example, contain both characters and numbers. The number of elements in the array is often referred to as the *dimension* though mathematicians may prefer *rank*. The vector shown has a dimension of $n$. The elements of a vector can be referenced with an index. In math settings, indexes typically run from 1 to n. In computer science and these labs, indexing will typically run from 0 to n-1. In notation, elements of a vector, when referenced individually will indicate the index in a subscript, for example, the $0^{th}$ element, of the vector $\\mathbf{x}$ is $x_0$. Note, the x is not bold in this case. \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.2\"></a>\n",
|
||||
"## 3.2 NumPy Arrays\n",
|
||||
"\n",
|
||||
"NumPy's basic data structure is an indexable, n-dimensional *array* containing elements of the same type (`dtype`). Right away, you may notice we have overloaded the term 'dimension'. Above, it was the number of elements in the vector, here, dimension refers to the number of indexes of an array. A one-dimensional or 1-D array has one index. In Course 1, we will represent vectors as NumPy 1-D arrays. \n",
|
||||
"\n",
|
||||
" - 1-D array, shape (n,): n elements indexed [0] through [n-1]\n",
|
||||
" "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.3\"></a>\n",
|
||||
"## 3.3 Vector Creation\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Data creation routines in NumPy will generally have a first parameter which is the shape of the object. This can either be a single value for a 1-D result or a tuple (n,m,...) specifying the shape of the result. Below are examples of creating vectors using these routines."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill arrays with value\n",
|
||||
"a = np.zeros(4); print(f\"np.zeros(4) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.zeros((4,)); print(f\"np.zeros(4,) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.random.random_sample(4); print(f\"np.random.random_sample(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Some data creation routines do not take a shape tuple:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill arrays with value but do not accept shape as input argument\n",
|
||||
"a = np.arange(4.); print(f\"np.arange(4.): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.random.rand(4); print(f\"np.random.rand(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"values can be specified manually as well. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill with user specified values\n",
|
||||
"a = np.array([5,4,3,2]); print(f\"np.array([5,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.array([5.,4,3,2]); print(f\"np.array([5.,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"These have all created a one-dimensional vector `a` with four elements. `a.shape` returns the dimensions. Here we see a.shape = `(4,)` indicating a 1-d array with 4 elements. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4\"></a>\n",
|
||||
"## 3.4 Operations on Vectors\n",
|
||||
"Let's explore some operations using vectors.\n",
|
||||
"<a name=\"toc_40015_3.4.1\"></a>\n",
|
||||
"### 3.4.1 Indexing\n",
|
||||
"Elements of vectors can be accessed via indexing and slicing. NumPy provides a very complete set of indexing and slicing capabilities. We will explore only the basics needed for the course here. Reference [Slicing and Indexing](https://NumPy.org/doc/stable/reference/arrays.indexing.html) for more details. \n",
|
||||
"**Indexing** means referring to *an element* of an array by its position within the array. \n",
|
||||
"**Slicing** means getting a *subset* of elements from an array based on their indices. \n",
|
||||
"NumPy starts indexing at zero so the 3rd element of an vector $\\mathbf{a}$ is `a[2]`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector indexing operations on 1-D vectors\n",
|
||||
"a = np.arange(10)\n",
|
||||
"print(a)\n",
|
||||
"\n",
|
||||
"#access an element\n",
|
||||
"print(f\"a[2].shape: {a[2].shape} a[2] = {a[2]}, Accessing an element returns a scalar\")\n",
|
||||
"\n",
|
||||
"# access the last element, negative indexes count from the end\n",
|
||||
"print(f\"a[-1] = {a[-1]}\")\n",
|
||||
"\n",
|
||||
"#indexs must be within the range of the vector or they will produce and error\n",
|
||||
"try:\n",
|
||||
" c = a[10]\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"The error message you'll see is:\")\n",
|
||||
" print(e)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.2\"></a>\n",
|
||||
"### 3.4.2 Slicing\n",
|
||||
"Slicing creates an array of indices using a set of three values (`start:stop:step`). A subset of values is also valid. Its use is best explained by example:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector slicing operations\n",
|
||||
"a = np.arange(10)\n",
|
||||
"print(f\"a = {a}\")\n",
|
||||
"\n",
|
||||
"#access 5 consecutive elements (start:stop:step)\n",
|
||||
"c = a[2:7:1]; print(\"a[2:7:1] = \", c)\n",
|
||||
"\n",
|
||||
"# access 3 elements separated by two \n",
|
||||
"c = a[2:7:2]; print(\"a[2:7:2] = \", c)\n",
|
||||
"\n",
|
||||
"# access all elements index 3 and above\n",
|
||||
"c = a[3:]; print(\"a[3:] = \", c)\n",
|
||||
"\n",
|
||||
"# access all elements below index 3\n",
|
||||
"c = a[:3]; print(\"a[:3] = \", c)\n",
|
||||
"\n",
|
||||
"# access all elements\n",
|
||||
"c = a[:]; print(\"a[:] = \", c)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.3\"></a>\n",
|
||||
"### 3.4.3 Single vector operations\n",
|
||||
"There are a number of useful operations that involve operations on a single vector."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.array([1,2,3,4])\n",
|
||||
"print(f\"a : {a}\")\n",
|
||||
"# negate elements of a\n",
|
||||
"b = -a \n",
|
||||
"print(f\"b = -a : {b}\")\n",
|
||||
"\n",
|
||||
"# sum all elements of a, returns a scalar\n",
|
||||
"b = np.sum(a) \n",
|
||||
"print(f\"b = np.sum(a) : {b}\")\n",
|
||||
"\n",
|
||||
"b = np.mean(a)\n",
|
||||
"print(f\"b = np.mean(a): {b}\")\n",
|
||||
"\n",
|
||||
"b = a**2\n",
|
||||
"print(f\"b = a**2 : {b}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.4\"></a>\n",
|
||||
"### 3.4.4 Vector Vector element-wise operations\n",
|
||||
"Most of the NumPy arithmetic, logical and comparison operations apply to vectors as well. These operators work on an element-by-element basis. For example \n",
|
||||
"$$ \\mathbf{a} + \\mathbf{b} = \\sum_{i=0}^{n-1} a_i + b_i $$"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.array([ 1, 2, 3, 4])\n",
|
||||
"b = np.array([-1,-2, 3, 4])\n",
|
||||
"print(f\"Binary operators work element wise: {a + b}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Of course, for this to work correctly, the vectors must be of the same size:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#try a mismatched vector operation\n",
|
||||
"c = np.array([1, 2])\n",
|
||||
"try:\n",
|
||||
" d = a + c\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"The error message you'll see is:\")\n",
|
||||
" print(e)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.5\"></a>\n",
|
||||
"### 3.4.5 Scalar Vector operations\n",
|
||||
"Vectors can be 'scaled' by scalar values. A scalar value is just a number. The scalar multiplies all the elements of the vector."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.array([1, 2, 3, 4])\n",
|
||||
"\n",
|
||||
"# multiply a by a scalar\n",
|
||||
"b = 5 * a \n",
|
||||
"print(f\"b = 5 * a : {b}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.6\"></a>\n",
|
||||
"### 3.4.6 Vector Vector dot product\n",
|
||||
"The dot product is a mainstay of Linear Algebra and NumPy. This is an operation used extensively in this course and should be well understood. The dot product is shown below."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<img src=\"./images/C1_W2_Lab04_dot_notrans.gif\" width=800> "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The dot product multiplies the values in two vectors element-wise and then sums the result.\n",
|
||||
"Vector dot product requires the dimensions of the two vectors to be the same. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's implement our own version of the dot product below:\n",
|
||||
"\n",
|
||||
"**Using a for loop**, implement a function which returns the dot product of two vectors. The function to return given inputs $a$ and $b$:\n",
|
||||
"$$ x = \\sum_{i=0}^{n-1} a_i b_i $$\n",
|
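||||
"For example, with $a = [1, 2, 3, 4]$ and $b = [-1, 4, 3, 2]$, $x = (1)(-1) + (2)(4) + (3)(3) + (4)(2) = 24$, which is the value the test cells below should print.\n",
|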
||||
"Assume both `a` and `b` are the same shape."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def my_dot(a, b): \n",
|
||||
" \"\"\"\n",
|
||||
" Compute the dot product of two vectors\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" a (ndarray): Shape (n,) input vector \n",
|
||||
" b (ndarray): Shape (n,) input vector with same dimension as a\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" x (scalar): \n",
|
||||
" \"\"\"\n",
|
||||
" x=0\n",
|
||||
" a_shape = a.shape\n",
|
||||
" for i in range(a.shape[0]):\n",
|
||||
" x = x + a[i] * b[i]\n",
|
||||
" return (x)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# test 1-D\n",
|
||||
"a = np.array([1, 2, 3, 4])\n",
|
||||
"b = np.array([-1, 4, 3, 2])\n",
|
||||
"print(f\"my_dot(a, b) = {my_dot(a, b)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note, the dot product is expected to return a scalar value. \n",
|
||||
"\n",
|
||||
"Let's try the same operations using `np.dot`. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# test 1-D\n",
|
||||
"a = np.array([1, 2, 3, 4])\n",
|
||||
"b = np.array([-1, 4, 3, 2])\n",
|
||||
"c = np.dot(a, b)\n",
|
||||
"print(f\"NumPy 1-D np.dot(a, b) = {c}, np.dot(a, b).shape = {c.shape} \") \n",
|
||||
"c = np.dot(b, a)\n",
|
||||
"print(f\"NumPy 1-D np.dot(b, a) = {c}, np.dot(a, b).shape = {c.shape} \")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Above, you will note that the results for 1-D matched our implementation."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.7\"></a>\n",
|
||||
"### 3.4.7 The Need for Speed: vector vs for loop\n",
|
||||
"We utilized the NumPy library because it improves speed memory efficiency. Let's demonstrate:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import time\n",
|
||||
"np.random.seed(1)\n",
|
||||
"a = np.random.rand(1000000) # very large arrays\n",
|
||||
"b = np.random.rand(1000000)\n",
|
||||
"\n",
|
||||
"tic = time.time() # capture start time\n",
|
||||
"c = np.dot(a, b)\n",
|
||||
"toc = time.time() # capture end time\n",
|
||||
"\n",
|
||||
"print(f\"np.dot(a, b) = {c:.4f}\")\n",
|
||||
"print(f\"Vectorized version duration: {1000*(toc-tic):.4f} ms \")\n",
|
||||
"\n",
|
||||
"tic = time.time() # capture start time\n",
|
||||
"c = my_dot(a,b)\n",
|
||||
"toc = time.time() # capture end time\n",
|
||||
"\n",
|
||||
"print(f\"my_dot(a, b) = {c:.4f}\")\n",
|
||||
"print(f\"loop version duration: {1000*(toc-tic):.4f} ms \")\n",
|
||||
"\n",
|
||||
"del(a);del(b) #remove these big arrays from memory"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"So, vectorization provides more than 100x speed up in this example! This is because NumPy makes better use of available data parallelism in the underlying hardware. GPU's and modern CPU's implement Single Instruction, Multiple Data (SIMD) pipelines allowing multiple operations to be issued in parallel. This is critical in Machine Learning where the data sets are often very large."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4\"></a>\n",
|
||||
"# 4 Matrices\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.1\"></a>\n",
|
||||
"## 4.1 Abstract\n",
|
||||
"Matrices, are two dimensional arrays. The elements of a matrix are all of the same type. In notation, matrices are denoted with capitol, bold letter such as $\\mathbf{X}$. In this and other labs, `m` is often the number of rows and `n` the number of columns. The elements of a matrix can be referenced with a two dimensional index. In math settings, numbers in the index typically run from 1 to n. In computer science and these labs, indexing will run from 0 to n-1. \n",
|
||||
"<figure>\n",
|
||||
" <center> <img src=\"./images/C1_W2_Lab04_Matrices.PNG\" alt='missing' width=900><center/>\n",
|
||||
" <figcaption> Generic Matrix Notation, 1st index is row, 2nd is column </figcaption>\n",
|
||||
"<figure/>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.2\"></a>\n",
|
||||
"## 4.2 NumPy Arrays\n",
|
||||
"\n",
|
||||
"NumPy's basic data structure is an indexable, n-dimensional *array* containing elements of the same type (`dtype`). These were described earlier. Matrices have a two-dimensional (2-D) index [m,n].\n",
|
||||
"\n",
|
||||
"In Course 1, 2-D matrices are used to hold training data. Training data is $m$ examples by $n$ features creating an (m,n) array. Course 1 does not do operations directly on matrices but typically extracts an example as a vector and operates on that. Below you will review: \n",
|
||||
"- data creation\n",
|
||||
"- slicing and indexing"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.3\"></a>\n",
|
||||
"## 4.3 Matrix Creation\n",
|
||||
"The same functions that created 1-D vectors will create 2-D or n-D arrays. Here are some examples\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Below, the shape tuple is provided to achieve a 2-D result. Notice how NumPy uses brackets to denote each dimension. Notice further than NumPy, when printing, will print one row per line.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.zeros((1, 5)) \n",
|
||||
"print(f\"a shape = {a.shape}, a = {a}\") \n",
|
||||
"\n",
|
||||
"a = np.zeros((2, 1)) \n",
|
||||
"print(f\"a shape = {a.shape}, a = {a}\") \n",
|
||||
"\n",
|
||||
"a = np.random.random_sample((1, 1)) \n",
|
||||
"print(f\"a shape = {a.shape}, a = {a}\") "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"One can also manually specify data. Dimensions are specified with additional brackets matching the format in the printing above."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill with user specified values\n",
|
||||
"a = np.array([[5], [4], [3]]); print(f\" a shape = {a.shape}, np.array: a = {a}\")\n",
|
||||
"a = np.array([[5], # One can also\n",
|
||||
" [4], # separate values\n",
|
||||
" [3]]); #into separate rows\n",
|
||||
"print(f\" a shape = {a.shape}, np.array: a = {a}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.4\"></a>\n",
|
||||
"## 4.4 Operations on Matrices\n",
|
||||
"Let's explore some operations using matrices."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.4.1\"></a>\n",
|
||||
"### 4.4.1 Indexing\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Matrices include a second index. The two indexes describe [row, column]. Access can either return an element or a row/column. See below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector indexing operations on matrices\n",
|
||||
"a = np.arange(6).reshape(-1, 2) #reshape is a convenient way to create matrices\n",
|
||||
"print(f\"a.shape: {a.shape}, \\na= {a}\")\n",
|
||||
"\n",
|
||||
"#access an element\n",
|
||||
"print(f\"\\na[2,0].shape: {a[2, 0].shape}, a[2,0] = {a[2, 0]}, type(a[2,0]) = {type(a[2, 0])} Accessing an element returns a scalar\\n\")\n",
|
||||
"\n",
|
||||
"#access a row\n",
|
||||
"print(f\"a[2].shape: {a[2].shape}, a[2] = {a[2]}, type(a[2]) = {type(a[2])}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"It is worth drawing attention to the last example. Accessing a matrix by just specifying the row will return a *1-D vector*."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.4.2\"></a>\n",
|
||||
"### 4.4.2 Slicing\n",
|
||||
"Slicing creates an array of indices using a set of three values (`start:stop:step`). A subset of values is also valid. Its use is best explained by example:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector 2-D slicing operations\n",
|
||||
"a = np.arange(20).reshape(-1, 10)\n",
|
||||
"print(f\"a = \\n{a}\")\n",
|
||||
"\n",
|
||||
"#access 5 consecutive elements (start:stop:step)\n",
|
||||
"print(\"a[0, 2:7:1] = \", a[0, 2:7:1], \", a[0, 2:7:1].shape =\", a[0, 2:7:1].shape, \"a 1-D array\")\n",
|
||||
"\n",
|
||||
"#access 5 consecutive elements (start:stop:step) in two rows\n",
|
||||
"print(\"a[:, 2:7:1] = \\n\", a[:, 2:7:1], \", a[:, 2:7:1].shape =\", a[:, 2:7:1].shape, \"a 2-D array\")\n",
|
||||
"\n",
|
||||
"# access all elements\n",
|
||||
"print(\"a[:,:] = \\n\", a[:,:], \", a[:,:].shape =\", a[:,:].shape)\n",
|
||||
"\n",
|
||||
"# access all elements in one row (very common usage)\n",
|
||||
"print(\"a[1,:] = \", a[1,:], \", a[1,:].shape =\", a[1,:].shape, \"a 1-D array\")\n",
|
||||
"# same as\n",
|
||||
"print(\"a[1] = \", a[1], \", a[1].shape =\", a[1].shape, \"a 1-D array\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_5.0\"></a>\n",
|
||||
"## Congratulations!\n",
|
||||
"In this lab you mastered the features of Python and NumPy that are needed for Course 1."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"dl_toc_settings": {
|
||||
"rndtag": "40015"
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.6"
|
||||
},
|
||||
"toc-autonumbering": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
@ -0,0 +1,310 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Ungraded Lab: Cost Function \n",
|
||||
"\n",
|
||||
"In this ungraded lab, you will implement the `cost` function for linear regression with one variable. The term 'cost' in this assignment might be a little confusing since the data is housing cost. Here, cost is a measure how well our model is predicting the actual value of the house. We will use the term 'price' for the data.\n",
|
||||
"\n",
|
||||
"First, let's run the cell below to import [matplotlib](http://matplotlib.org), which is a famous library to plot graphs in Python. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import matplotlib.pyplot as plt"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Problem Statement\n",
|
||||
"\n",
|
||||
"Let's use the same two data points as before - a house with 1000 square feet sold for \\\\$200,000 and a house with 2000 square feet sold for \\\\$400,000.\n",
|
||||
"\n",
|
||||
"That is our dataset contains has the following two points - \n",
|
||||
"\n",
|
||||
"| Size (feet$^2$) | Price (1000s of dollars) |\n",
|
||||
"| -------------------| ------------------------ |\n",
|
||||
"| 1000 | 200 |\n",
|
||||
"| 2000 | 400 |\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# X_train is the input features, in this case (size in square feet)\n",
|
||||
"# y_train is the actual value (price in 1000s of dollars)\n",
|
||||
"X_train = [1000, 2000] \n",
|
||||
"y_train = [200, 400]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# routine to plot the data points\n",
|
||||
"def plt_house(X, y,f_w=None):\n",
|
||||
" plt.scatter(X, y, marker='x', c='r', label=\"Actual Value\")\n",
|
||||
"\n",
|
||||
" # Set the title\n",
|
||||
" plt.title(\"Housing Prices\")\n",
|
||||
" # Set the y-axis label\n",
|
||||
" plt.ylabel('Price (in 1000s of dollars)')\n",
|
||||
" # Set the x-axis label\n",
|
||||
" plt.xlabel('Size (feet^2)')\n",
|
||||
" # print predictions\n",
|
||||
" if f_w != None:\n",
|
||||
" plt.plot(X, f_w, c='b', label=\"Our Prediction\")\n",
|
||||
" plt.legend()\n",
|
||||
" plt.show()\n",
|
||||
" \n",
|
||||
"plt_house(X_train,y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Computing Cost\n",
|
||||
"\n",
|
||||
"The cost is:\n",
|
||||
" $$J(\\mathbf{w}) = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w}}(\\mathbf{x}^{(i)}) - y^{(i)})^2$$ \n",
|
||||
" \n",
|
||||
"where \n",
|
||||
" $$f_{\\mathbf{w}}(\\mathbf{x}^{(i)}) = w_0x_0^{(i)} + w_1x_1^{(i)} \\tag{1}$$\n",
|
||||
" \n",
|
||||
"- $f_{\\mathbf{w}}(\\mathbf{x}^{(i)})$ is our prediction for example $i$ using our parameters $\\mathbf{w}$. \n",
|
||||
"- $(f_{\\mathbf{w}}(\\mathbf{x}^{(i)}) -y^{(i)})^2$ is the squared difference between the actual value and our prediction. \n",
|
||||
"- These differences are summed over all the $m$ examples and averaged to produce the cost, $J(\\mathbf{w})$. \n",
|
||||
"Note, in lecture summation ranges are typically from 1 to m while in code, we will run 0 to m-1."
|
||||
]
|
||||
},
|
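||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As a quick check of the formula (here $x_0^{(i)} = 1$ by convention, so $f_{\\mathbf{w}}(\\mathbf{x}^{(i)}) = w_0 + w_1x_1^{(i)}$): with $w = [0, 0.2]$ the predictions are $f(1000) = 200$ and $f(2000) = 400$, which match the targets exactly, so $J(\\mathbf{w}) = 0$. With $w = [1, 2]$ the predictions are 2001 and 4001, giving\n",
|
||||
"$$J = \\frac{1}{4}\\left((2001-200)^2 + (4001-400)^2\\right) = 4052700.5,$$\n",
|
||||
"which is the value you should see when you run the cells below."
|
||||
]
|
||||
},
|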
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Hints</b></font>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"#Function to calculate the cost\n",
|
||||
"def compute_cost(X, y, w):\n",
|
||||
" \n",
|
||||
" m = len(X)\n",
|
||||
" cost = 0\n",
|
||||
" \n",
|
||||
" for i in range(m):\n",
|
||||
" \n",
|
||||
" # Calculate the model prediction\n",
|
||||
" f_w = w[0] + w[1]*X[i]\n",
|
||||
" \n",
|
||||
" # Calculate the cost\n",
|
||||
" cost = cost + (f_w - y[i])**2\n",
|
||||
" \n",
|
||||
" total_cost = 1/(2*m) * cost\n",
|
||||
"\n",
|
||||
" return total_cost\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Function to calculate the cost\n",
|
||||
"def compute_cost(X, y, w):\n",
|
||||
" \n",
|
||||
" m = len(X)\n",
|
||||
" cost = 0\n",
|
||||
" \n",
|
||||
" for i in range(m):\n",
|
||||
" ### START CODE HERE ### \n",
|
||||
"\n",
|
||||
" ### END CODE HERE ### \n",
|
||||
" total_cost = 1/(2*m) * cost\n",
|
||||
"\n",
|
||||
" return total_cost"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"w_p = [1, 2] # w0 = w[0], w1 = w[1] \n",
|
||||
"\n",
|
||||
"total_cost = compute_cost(X_train, y_train, w_p)\n",
|
||||
"print(\"Total cost :\", total_cost)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Output**:\n",
|
||||
"```Total cost : 4052700.5```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In the next lab, we will minimise the cost by optimizing our parameters $\\mathbf{w}$ using gradient descent. For now, we can try various values manually. To to keep it simple, we know from the previous lab that $w_0 = 0$ produces a minimum. So, we'll set $w_0$ to zero and vary $w_1$."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Print w1 vs cost to see minimum\n",
|
||||
"\n",
|
||||
"w1_list = [-0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6]\n",
|
||||
"cost_list = []\n",
|
||||
"\n",
|
||||
"for w1 in w1_list:\n",
|
||||
" w_p = [0, w1]\n",
|
||||
" total_cost = compute_cost(X_train, y_train, w_p)\n",
|
||||
" cost_list.append(total_cost)\n",
|
||||
" \n",
|
||||
"plt.plot(w1_list, cost_list)\n",
|
||||
"plt.title(\"Cost vs w1\")\n",
|
||||
"plt.ylabel('Cost')\n",
|
||||
"plt.xlabel('w1')\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# We can see a global minimum at w1 = 0.2 Therefore, let's try w = [0,0.2] \n",
|
||||
"# to see if that fits the data\n",
|
||||
"w_p = [0, 0.2] # w0 = 0, w1 = 0.2\n",
|
||||
"\n",
|
||||
"total_cost = compute_cost(X_train, y_train,w_p)\n",
|
||||
"print(\"Total cost :\", total_cost)\n",
|
||||
"f_w = [w_p[0] + w_p[1]*X_train[0], w_p[0] + w_p[1]*X_train[1]]\n",
|
||||
"plt_house(X_train, y_train, f_w)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"We can see how our cost varies as we modify both $w_0$ and $w_1$ by plotting in 3D or in contour plots."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from mpl_toolkits.mplot3d import axes3d\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"w0 = np.arange(-500, 500, 5)\n",
|
||||
"w1 = np.arange(-0.2, 0.8, 0.005)\n",
|
||||
"w0,w1 = np.meshgrid(w0,w1)\n",
|
||||
"z=np.zeros_like(w0)\n",
|
||||
"n,_ = w0.shape\n",
|
||||
"for i in range(n):\n",
|
||||
" for j in range(n):\n",
|
||||
" z[i][j] = compute_cost(X_train, y_train, [w0[i][j],w1[i][j]] )\n",
|
||||
"\n",
|
||||
"fig = plt.figure(figsize=(12,6))\n",
|
||||
"plt.subplots_adjust( wspace=0.5 )\n",
|
||||
"#===============\n",
|
||||
"# First subplot\n",
|
||||
"#===============\n",
|
||||
"# set up the axes for the first plot\n",
|
||||
"ax = fig.add_subplot(1, 2, 1, projection='3d')\n",
|
||||
"ax.plot_surface(w1, w0, z, rstride=8, cstride=8, alpha=0.3)\n",
|
||||
"\n",
|
||||
"ax.set_xlabel('w_1')\n",
|
||||
"ax.set_ylabel('w_0')\n",
|
||||
"ax.set_zlabel('cost')\n",
|
||||
"plt.title('3D plot of cost vs w0, w1')\n",
|
||||
"# Customize the view angle \n",
|
||||
"ax.view_init(elev=20., azim=-65)\n",
|
||||
"\n",
|
||||
"#===============\n",
|
||||
"# Second subplot\n",
|
||||
"#===============\n",
|
||||
"# set up the axes for the second plot\n",
|
||||
"ax = fig.add_subplot(1, 2, 2)\n",
|
||||
"CS = ax.contour(w1, w0, z,[0,50,1000,5000,10000,25000,50000])\n",
|
||||
"plt.clabel(CS, inline=1, fmt='%1.0f', fontsize=10)\n",
|
||||
"plt.title('Contour plot of cost vs (w0,w1)')\n",
|
||||
"\n",
|
||||
"ax.set_xlabel('w_1')\n",
|
||||
"ax.set_ylabel('w_0')\n",
|
||||
"\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3'><b>Expected graph</b></font>\n",
|
||||
"</summary>\n",
|
||||
" <img src=\"./figures/ThreeD_And_ContourLab3.PNG\" alt=\"Contour Plot\">\n",
|
||||
"<\\details>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,648 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Multiple Variable Linear Regression\n",
|
||||
"\n",
|
||||
"In this lab, you will extend the data structures and previously developed routines to support multiple features. Several routines are updated making the lab appear lengthy, but it makes minor adjustments to previous routines making it quick to review.\n",
|
||||
"# Outline\n",
|
||||
"- [ 1.1 Goals](#toc_15456_1.1)\n",
|
||||
"- [ 1.2 Tools](#toc_15456_1.2)\n",
|
||||
"- [ 1.3 Notation](#toc_15456_1.3)\n",
|
||||
"- [2 Problem Statement](#toc_15456_2)\n",
|
||||
"- [ 2.1 Matrix X containing our examples](#toc_15456_2.1)\n",
|
||||
"- [ 2.2 Parameter vector w, b](#toc_15456_2.2)\n",
|
||||
"- [3 Model Prediction With Multiple Variables](#toc_15456_3)\n",
|
||||
"- [ 3.1 Single Prediction element by element](#toc_15456_3.1)\n",
|
||||
"- [ 3.2 Single Prediction, vector](#toc_15456_3.2)\n",
|
||||
"- [4 Compute Cost With Multiple Variables](#toc_15456_4)\n",
|
||||
"- [5 Gradient Descent With Multiple Variables](#toc_15456_5)\n",
|
||||
"- [ 5.1 Compute Gradient with Multiple Variables](#toc_15456_5.1)\n",
|
||||
"- [ 5.2 Gradient Descent With Multiple Variables](#toc_15456_5.2)\n",
|
||||
"- [6 Congratulations](#toc_15456_6)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_1.1\"></a>\n",
|
||||
"## 1.1 Goals\n",
|
||||
"- Extend our regression model routines to support multiple features\n",
|
||||
" - Extend data structures to support multiple features\n",
|
||||
" - Rewrite prediction, cost and gradient routines to support multiple features\n",
|
||||
" - Utilize NumPy `np.dot` to vectorize their implementations for speed and simplicity"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_1.2\"></a>\n",
|
||||
"## 1.2 Tools\n",
|
||||
"In this lab, we will make use of: \n",
|
||||
"- NumPy, a popular library for scientific computing\n",
|
||||
"- Matplotlib, a popular library for plotting data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import copy, math\n",
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"plt.style.use('./deeplearning.mplstyle')\n",
|
||||
"np.set_printoptions(precision=2) # reduced display precision on numpy arrays"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_1.3\"></a>\n",
|
||||
"## 1.3 Notation\n",
|
||||
"Here is a summary of some of the notation you will encounter, updated for multiple features. \n",
|
||||
"\n",
|
||||
"|General <img width=70/> <br /> Notation <img width=70/> | Description<img width=350/>| Python (if applicable) |\n",
|
||||
"|: ------------|: ------------------------------------------------------------||\n",
|
||||
"| $a$ | scalar, non bold ||\n",
|
||||
"| $\\mathbf{a}$ | vector, bold ||\n",
|
||||
"| $\\mathbf{A}$ | matrix, bold capital ||\n",
|
||||
"| **Regression** | | | |\n",
|
||||
"| $\\mathbf{X}$ | training example matrix | `X_train` | \n",
|
||||
"| $\\mathbf{y}$ | training example targets | `y_train` \n",
|
||||
"| $\\mathbf{x}^{(i)}$, $y^{(i)}$ | $i_{th}$Training Example | `X[i]`, `y[i]`|\n",
|
||||
"| m | number of training examples | `m`|\n",
|
||||
"| n | number of features in each example | `n`|\n",
|
||||
"| $\\mathbf{w}$ | parameter: weight, | `w` |\n",
|
||||
"| $b$ | parameter: bias | `b` | \n",
|
||||
"| $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ | The result of the model evaluation at $\\mathbf{x^{(i)}}$ parameterized by $\\mathbf{w},b$: $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = \\mathbf{w} \\cdot \\mathbf{x}^{(i)}+b$ | `f_wb` | \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_2\"></a>\n",
|
||||
"# 2 Problem Statement\n",
|
||||
"\n",
|
||||
"You will use the motivating example of housing price prediction. The training dataset contains three examples with four features (size, bedrooms, floors and, age) shown in the table below. Note that, unlike the earlier labs, size is in sqft rather than 1000 sqft. This causes an issue, which you will solve in the next lab!\n",
|
||||
"\n",
|
||||
"| Size (sqft) | Number of Bedrooms | Number of floors | Age of Home | Price (1000s dollars) | \n",
|
||||
"| ----------------| ------------------- |----------------- |--------------|-------------- | \n",
|
||||
"| 2104 | 5 | 1 | 45 | 460 | \n",
|
||||
"| 1416 | 3 | 2 | 40 | 232 | \n",
|
||||
"| 852 | 2 | 1 | 35 | 178 | \n",
|
||||
"\n",
|
||||
"You will build a linear regression model using these values so you can then predict the price for other houses. For example, a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old. \n",
|
||||
"\n",
|
||||
"Please run the following code cell to create your `X_train` and `y_train` variables."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])\n",
|
||||
"y_train = np.array([460, 232, 178])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_2.1\"></a>\n",
|
||||
"## 2.1 Matrix X containing our examples\n",
|
||||
"Similar to the table above, examples are stored in a NumPy matrix `X_train`. Each row of the matrix represents one example. When you have $m$ training examples ( $m$ is three in our example), and there are $n$ features (four in our example), $\\mathbf{X}$ is a matrix with dimensions ($m$, $n$) (m rows, n columns).\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"$$\\mathbf{X} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
" x^{(0)}_0 & x^{(0)}_1 & \\cdots & x^{(0)}_{n-1} \\\\ \n",
|
||||
" x^{(1)}_0 & x^{(1)}_1 & \\cdots & x^{(1)}_{n-1} \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" x^{(m-1)}_0 & x^{(m-1)}_1 & \\cdots & x^{(m-1)}_{n-1} \n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"notation:\n",
|
||||
"- $\\mathbf{x}^{(i)}$ is vector containing example i. $\\mathbf{x}^{(i)}$ $ = (x^{(i)}_0, x^{(i)}_1, \\cdots,x^{(i)}_{n-1})$\n",
|
||||
"- $x^{(i)}_j$ is element j in example i. The superscript in parenthesis indicates the example number while the subscript represents an element. \n",
|
||||
"\n",
|
||||
"Display the input data."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# data is stored in numpy array/matrix\n",
|
||||
"print(f\"X Shape: {X_train.shape}, X Type:{type(X_train)})\")\n",
|
||||
"print(X_train)\n",
|
||||
"print(f\"y Shape: {y_train.shape}, y Type:{type(y_train)})\")\n",
|
||||
"print(y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_2.2\"></a>\n",
|
||||
"## 2.2 Parameter vector w, b\n",
|
||||
"\n",
|
||||
"* $\\mathbf{w}$ is a vector with $n$ elements.\n",
|
||||
" - Each element contains the parameter associated with one feature.\n",
|
||||
" - in our dataset, n is 4.\n",
|
||||
" - notionally, we draw this as a column vector\n",
|
||||
"\n",
|
||||
"$$\\mathbf{w} = \\begin{pmatrix}\n",
|
||||
"w_0 \\\\ \n",
|
||||
"w_1 \\\\\n",
|
||||
"\\cdots\\\\\n",
|
||||
"w_{n-1}\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"* $b$ is a scalar parameter. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For demonstration, $\\mathbf{w}$ and $b$ will be loaded with some initial selected values that are near the optimal. $\\mathbf{w}$ is a 1-D NumPy vector."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"b_init = 785.1811367994083\n",
|
||||
"w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])\n",
|
||||
"print(f\"w_init shape: {w_init.shape}, b_init type: {type(b_init)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_3\"></a>\n",
|
||||
"# 3 Model Prediction With Multiple Variables\n",
|
||||
"The model's prediction with multiple variables is given by the linear model:\n",
|
||||
"\n",
|
||||
"$$ f_{\\mathbf{w},b}(\\mathbf{x}) = w_0x_0 + w_1x_1 +... + w_{n-1}x_{n-1} + b \\tag{1}$$\n",
|
||||
"or in vector notation:\n",
|
||||
"$$ f_{\\mathbf{w},b}(\\mathbf{x}) = \\mathbf{w} \\cdot \\mathbf{x} + b \\tag{2} $$ \n",
|
||||
"where $\\cdot$ is a vector `dot product`\n",
|
||||
"\n",
|
||||
"To demonstrate the dot product, we will implement prediction using (1) and (2)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_3.1\"></a>\n",
|
||||
"## 3.1 Single Prediction element by element\n",
|
||||
"Our previous prediction multiplied one feature value by one parameter and added a bias parameter. A direct extension of our previous implementation of prediction to multiple features would be to implement (1) above using loop over each element, performing the multiply with its parameter and then adding the bias parameter at the end.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def predict_single_loop(x, w, b): \n",
|
||||
" \"\"\"\n",
|
||||
" single predict using linear regression\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" x (ndarray): Shape (n,) example with multiple features\n",
|
||||
" w (ndarray): Shape (n,) model parameters \n",
|
||||
" b (scalar): model parameter \n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" p (scalar): prediction\n",
|
||||
" \"\"\"\n",
|
||||
" n = x.shape[0]\n",
|
||||
" p = 0\n",
|
||||
" for i in range(n):\n",
|
||||
" p_i = x[i] * w[i] \n",
|
||||
" p = p + p_i \n",
|
||||
" p = p + b \n",
|
||||
" return p"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# get a row from our training data\n",
|
||||
"x_vec = X_train[0,:]\n",
|
||||
"print(f\"x_vec shape {x_vec.shape}, x_vec value: {x_vec}\")\n",
|
||||
"\n",
|
||||
"# make a prediction\n",
|
||||
"f_wb = predict_single_loop(x_vec, w_init, b_init)\n",
|
||||
"print(f\"f_wb shape {f_wb.shape}, prediction: {f_wb}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note the shape of `x_vec`. It is a 1-D NumPy vector with 4 elements, (4,). The result, `f_wb` is a scalar."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_3.2\"></a>\n",
|
||||
"## 3.2 Single Prediction, vector\n",
|
||||
"\n",
|
||||
"Noting that equation (1) above can be implemented using the dot product as in (2) above. We can make use of vector operations to speed up predictions.\n",
|
||||
"\n",
|
||||
"Recall from the Python/Numpy lab that NumPy `np.dot()`[[link](https://numpy.org/doc/stable/reference/generated/numpy.dot.html)] can be used to perform a vector dot product. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def predict(x, w, b): \n",
|
||||
" \"\"\"\n",
|
||||
" single predict using linear regression\n",
|
||||
" Args:\n",
|
||||
" x (ndarray): Shape (n,) example with multiple features\n",
|
||||
" w (ndarray): Shape (n,) model parameters \n",
|
||||
" b (scalar): model parameter \n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" p (scalar): prediction\n",
|
||||
" \"\"\"\n",
|
||||
" p = np.dot(x, w) + b \n",
|
||||
" return p "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# get a row from our training data\n",
|
||||
"x_vec = X_train[0,:]\n",
|
||||
"print(f\"x_vec shape {x_vec.shape}, x_vec value: {x_vec}\")\n",
|
||||
"\n",
|
||||
"# make a prediction\n",
|
||||
"f_wb = predict(x_vec,w_init, b_init)\n",
|
||||
"print(f\"f_wb shape {f_wb.shape}, prediction: {f_wb}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The results and shapes are the same as the previous version which used looping. Going forward, `np.dot` will be used for these operations. The prediction is now a single statement. Most routines will implement it directly rather than calling a separate predict routine."
|
||||
]
|
||||
},
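{
"cell_type": "markdown",
"metadata": {},
"source": [
"The same idea extends to predicting all $m$ examples at once with a matrix-vector product. The snippet below is a minimal sketch for illustration only (it is not one of the lab's routines) and assumes `X_train`, `w_init` and `b_init` as defined above:\n",
"\n",
"```\n",
"# predictions for all m examples at once: (m,n) @ (n,) -> (m,)\n",
"f_wb_all = X_train @ w_init + b_init   # equivalent to np.dot(X_train, w_init) + b_init\n",
"print(f\"predictions for all examples: {f_wb_all}\")\n",
"```"
]
},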
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_4\"></a>\n",
|
||||
"# 4 Compute Cost With Multiple Variables\n",
|
||||
"The equation for the cost function with multiple variables $J(\\mathbf{w},b)$ is:\n",
|
||||
"$$J(\\mathbf{w},b) = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})^2 \\tag{3}$$ \n",
|
||||
"where:\n",
|
||||
"$$ f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = \\mathbf{w} \\cdot \\mathbf{x}^{(i)} + b \\tag{4} $$ \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"In contrast to previous labs, $\\mathbf{w}$ and $\\mathbf{x}^{(i)}$ are vectors rather than scalars supporting multiple features."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Below is an implementation of equations (3) and (4). Note that this uses a *standard pattern for this course* where a for loop over all `m` examples is used."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def compute_cost(X, y, w, b): \n",
|
||||
" \"\"\"\n",
|
||||
" compute cost\n",
|
||||
" Args:\n",
|
||||
" X (ndarray (m,n)): Data, m examples with n features\n",
|
||||
" y (ndarray (m,)) : target values\n",
|
||||
" w (ndarray (n,)) : model parameters \n",
|
||||
" b (scalar) : model parameter\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" cost (scalar): cost\n",
|
||||
" \"\"\"\n",
|
||||
" m = X.shape[0]\n",
|
||||
" cost = 0.0\n",
|
||||
" for i in range(m): \n",
|
||||
" f_wb_i = np.dot(X[i], w) + b #(n,)(n,) = scalar (see np.dot)\n",
|
||||
" cost = cost + (f_wb_i - y[i])**2 #scalar\n",
|
||||
" cost = cost / (2 * m) #scalar \n",
|
||||
" return cost"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Compute and display cost using our pre-chosen optimal parameters. \n",
|
||||
"cost = compute_cost(X_train, y_train, w_init, b_init)\n",
|
||||
"print(f'Cost at optimal w : {cost}')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Result**: Cost at optimal w : 1.5578904045996674e-12"
|
||||
]
|
||||
},
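{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop in `compute_cost` follows the course's standard pattern. As an aside (not one of the lab's required routines), the same cost can be computed with vector operations. This sketch assumes `X_train`, `y_train`, `w_init` and `b_init` from above and should reproduce the value above up to floating point error:\n",
"\n",
"```\n",
"# vectorized cost sketch\n",
"err = X_train @ w_init + b_init - y_train             # (m,) vector of prediction errors\n",
"cost_vec = np.dot(err, err) / (2 * X_train.shape[0])  # (1/2m) * sum of squared errors\n",
"print(f\"vectorized cost: {cost_vec}\")\n",
"```"
]
},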
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_5\"></a>\n",
|
||||
"# 5 Gradient Descent With Multiple Variables\n",
|
||||
"Gradient descent for multiple variables:\n",
|
||||
"\n",
|
||||
"$$\\begin{align*} \\text{repeat}&\\text{ until convergence:} \\; \\lbrace \\newline\\;\n",
|
||||
"& w_j = w_j - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} \\tag{5} \\; & \\text{for j = 0..n-1}\\newline\n",
|
||||
"&b\\ \\ = b - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial b} \\newline \\rbrace\n",
|
||||
"\\end{align*}$$\n",
|
||||
"\n",
|
||||
"where, n is the number of features, parameters $w_j$, $b$, are updated simultaneously and where \n",
|
||||
"\n",
|
||||
"$$\n",
|
||||
"\\begin{align}\n",
|
||||
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \\tag{6} \\\\\n",
|
||||
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial b} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)}) \\tag{7}\n",
|
||||
"\\end{align}\n",
|
||||
"$$\n",
|
||||
"* m is the number of training examples in the data set\n",
|
||||
"\n",
|
||||
" \n",
|
||||
"* $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_5.1\"></a>\n",
|
||||
"## 5.1 Compute Gradient with Multiple Variables\n",
|
||||
"An implementation for calculating the equations (6) and (7) is below. There are many ways to implement this. In this version, there is an\n",
|
||||
"- outer loop over all m examples. \n",
|
||||
" - $\\frac{\\partial J(\\mathbf{w},b)}{\\partial b}$ for the example can be computed directly and accumulated\n",
|
||||
" - in a second loop over all n features:\n",
|
||||
" - $\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j}$ is computed for each $w_j$.\n",
|
||||
" "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def compute_gradient(X, y, w, b): \n",
|
||||
" \"\"\"\n",
|
||||
" Computes the gradient for linear regression \n",
|
||||
" Args:\n",
|
||||
" X (ndarray (m,n)): Data, m examples with n features\n",
|
||||
" y (ndarray (m,)) : target values\n",
|
||||
" w (ndarray (n,)) : model parameters \n",
|
||||
" b (scalar) : model parameter\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w. \n",
|
||||
" dj_db (scalar): The gradient of the cost w.r.t. the parameter b. \n",
|
||||
" \"\"\"\n",
|
||||
" m,n = X.shape #(number of examples, number of features)\n",
|
||||
" dj_dw = np.zeros((n,))\n",
|
||||
" dj_db = 0.\n",
|
||||
"\n",
|
||||
" for i in range(m): \n",
|
||||
" err = (np.dot(X[i], w) + b) - y[i] \n",
|
||||
" for j in range(n): \n",
|
||||
" dj_dw[j] = dj_dw[j] + err * X[i, j] \n",
|
||||
" dj_db = dj_db + err \n",
|
||||
" dj_dw = dj_dw / m \n",
|
||||
" dj_db = dj_db / m \n",
|
||||
" \n",
|
||||
" return dj_db, dj_dw"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Compute and display gradient \n",
|
||||
"tmp_dj_db, tmp_dj_dw = compute_gradient(X_train, y_train, w_init, b_init)\n",
|
||||
"print(f'dj_db at initial w,b: {tmp_dj_db}')\n",
|
||||
"print(f'dj_dw at initial w,b: \\n {tmp_dj_dw}')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Result**: \n",
|
||||
"dj_db at initial w,b: -1.6739251122999121e-06 \n",
|
||||
"dj_dw at initial w,b: \n",
|
||||
" [-2.73e-03 -6.27e-06 -2.22e-06 -6.92e-05] "
|
||||
]
|
||||
},
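{
"cell_type": "markdown",
"metadata": {},
"source": [
"The double loop in `compute_gradient` mirrors equations (6) and (7) directly. For comparison only (this is a sketch, not one of the lab's required routines), the same gradients can be written with matrix operations, assuming the variables defined above:\n",
"\n",
"```\n",
"# vectorized gradient sketch - same values as compute_gradient, up to floating point\n",
"err = X_train @ w_init + b_init - y_train        # (m,) prediction errors\n",
"dj_dw_vec = X_train.T @ err / X_train.shape[0]   # (n,) - equation (6)\n",
"dj_db_vec = np.sum(err) / X_train.shape[0]       # scalar - equation (7)\n",
"print(f\"dj_db: {dj_db_vec}\")\n",
"print(f\"dj_dw: {dj_dw_vec}\")\n",
"```"
]
},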
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_5.2\"></a>\n",
|
||||
"## 5.2 Gradient Descent With Multiple Variables\n",
|
||||
"The routine below implements equation (5) above."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters): \n",
|
||||
" \"\"\"\n",
|
||||
" Performs batch gradient descent to learn w and b. Updates w and b by taking \n",
|
||||
" num_iters gradient steps with learning rate alpha\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X (ndarray (m,n)) : Data, m examples with n features\n",
|
||||
" y (ndarray (m,)) : target values\n",
|
||||
" w_in (ndarray (n,)) : initial model parameters \n",
|
||||
" b_in (scalar) : initial model parameter\n",
|
||||
" cost_function : function to compute cost\n",
|
||||
" gradient_function : function to compute the gradient\n",
|
||||
" alpha (float) : Learning rate\n",
|
||||
" num_iters (int) : number of iterations to run gradient descent\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" w (ndarray (n,)) : Updated values of parameters \n",
|
||||
" b (scalar) : Updated value of parameter \n",
|
||||
" \"\"\"\n",
|
||||
" \n",
|
||||
" # An array to store cost J and w's at each iteration primarily for graphing later\n",
|
||||
" J_history = []\n",
|
||||
" w = copy.deepcopy(w_in) #avoid modifying global w within function\n",
|
||||
" b = b_in\n",
|
||||
" \n",
|
||||
" for i in range(num_iters):\n",
|
||||
"\n",
|
||||
" # Calculate the gradient and update the parameters\n",
|
||||
" dj_db,dj_dw = gradient_function(X, y, w, b) ##None\n",
|
||||
"\n",
|
||||
" # Update Parameters using w, b, alpha and gradient\n",
|
||||
" w = w - alpha * dj_dw ##None\n",
|
||||
" b = b - alpha * dj_db ##None\n",
|
||||
" \n",
|
||||
" # Save cost J at each iteration\n",
|
||||
" if i<100000: # prevent resource exhaustion \n",
|
||||
" J_history.append( cost_function(X, y, w, b))\n",
|
||||
"\n",
|
||||
" # Print cost every at intervals 10 times or as many iterations if < 10\n",
|
||||
" if i% math.ceil(num_iters / 10) == 0:\n",
|
||||
" print(f\"Iteration {i:4d}: Cost {J_history[-1]:8.2f} \")\n",
|
||||
" \n",
|
||||
" return w, b, J_history #return final w,b and J history for graphing"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In the next cell you will test the implementation. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# initialize parameters\n",
|
||||
"initial_w = np.zeros_like(w_init)\n",
|
||||
"initial_b = 0.\n",
|
||||
"# some gradient descent settings\n",
|
||||
"iterations = 1000\n",
|
||||
"alpha = 5.0e-7\n",
|
||||
"# run gradient descent \n",
|
||||
"w_final, b_final, J_hist = gradient_descent(X_train, y_train, initial_w, initial_b,\n",
|
||||
" compute_cost, compute_gradient, \n",
|
||||
" alpha, iterations)\n",
|
||||
"print(f\"b,w found by gradient descent: {b_final:0.2f},{w_final} \")\n",
|
||||
"m,_ = X_train.shape\n",
|
||||
"for i in range(m):\n",
|
||||
" print(f\"prediction: {np.dot(X_train[i], w_final) + b_final:0.2f}, target value: {y_train[i]}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Result**: \n",
|
||||
"b,w found by gradient descent: -0.00,[ 0.2 0. -0.01 -0.07] \n",
|
||||
"prediction: 426.19, target value: 460 \n",
|
||||
"prediction: 286.17, target value: 232 \n",
|
||||
"prediction: 171.47, target value: 178 "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# plot cost versus iteration \n",
|
||||
"fig, (ax1, ax2) = plt.subplots(1, 2, constrained_layout=True, figsize=(12, 4))\n",
|
||||
"ax1.plot(J_hist)\n",
|
||||
"ax2.plot(100 + np.arange(len(J_hist[100:])), J_hist[100:])\n",
|
||||
"ax1.set_title(\"Cost vs. iteration\"); ax2.set_title(\"Cost vs. iteration (tail)\")\n",
|
||||
"ax1.set_ylabel('Cost') ; ax2.set_ylabel('Cost') \n",
|
||||
"ax1.set_xlabel('iteration step') ; ax2.set_xlabel('iteration step') \n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"*These results are not inspiring*! Cost is still declining and our predictions are not very accurate. The next lab will explore how to improve on this."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"<a name=\"toc_15456_6\"></a>\n",
|
||||
"# 6 Congratulations!\n",
|
||||
"In this lab you:\n",
|
||||
"- Redeveloped the routines for linear regression, now with multiple variables.\n",
|
||||
"- Utilized NumPy `np.dot` to vectorize the implementations"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"dl_toc_settings": {
|
||||
"rndtag": "15456"
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.6"
|
||||
},
|
||||
"toc-autonumbering": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,666 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Feature scaling and Learning Rate (Multi-variable)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Goals\n",
|
||||
"In this lab you will:\n",
|
||||
"- Utilize the multiple variables routines developed in the previous lab\n",
|
||||
"- run Gradient Descent on a data set with multiple features\n",
|
||||
"- explore the impact of the *learning rate alpha* on gradient descent\n",
|
||||
"- improve performance of gradient descent by *feature scaling* using z-score normalization"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"You will utilize the functions developed in the last lab as well as matplotlib and NumPy. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"from lab_utils_multi import load_house_data, run_gradient_descent \n",
|
||||
"from lab_utils_multi import norm_plot, plt_equal_scale, plot_cost_i_w\n",
|
||||
"from lab_utils_common import dlc\n",
|
||||
"np.set_printoptions(precision=2)\n",
|
||||
"plt.style.use('./deeplearning.mplstyle')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Notation\n",
|
||||
"\n",
|
||||
"|General <br /> Notation | Description| Python (if applicable) |\n",
|
||||
"|: ------------|: ------------------------------------------------------------||\n",
|
||||
"| $a$ | scalar, non bold ||\n",
|
||||
"| $\\mathbf{a}$ | vector, bold ||\n",
|
||||
"| $\\mathbf{A}$ | matrix, bold capital ||\n",
|
||||
"| **Regression** | | | |\n",
|
||||
"| $\\mathbf{X}$ | training example maxtrix | `X_train` | \n",
|
||||
"| $\\mathbf{y}$ | training example targets | `y_train` \n",
|
||||
"| $\\mathbf{x}^{(i)}$, $y^{(i)}$ | $i_{th}$Training Example | `X[i]`, `y[i]`|\n",
|
||||
"| m | number of training examples | `m`|\n",
|
||||
"| n | number of features in each example | `n`|\n",
|
||||
"| $\\mathbf{w}$ | parameter: weight, | `w` |\n",
|
||||
"| $b$ | parameter: bias | `b` | \n",
|
||||
"| $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ | The result of the model evaluation at $\\mathbf{x}^{(i)}$ parameterized by $\\mathbf{w},b$: $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = \\mathbf{w} \\cdot \\mathbf{x}^{(i)}+b$ | `f_wb` | \n",
|
||||
"|$\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j}$| the gradient or partial derivative of cost with respect to a parameter $w_j$ |`dj_dw[j]`| \n",
|
||||
"|$\\frac{\\partial J(\\mathbf{w},b)}{\\partial b}$| the gradient or partial derivative of cost with respect to a parameter $b$| `dj_db`|"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Problem Statement\n",
|
||||
"\n",
|
||||
"As in the previous labs, you will use the motivating example of housing price prediction. The training data set contains many examples with 4 features (size, bedrooms, floors and age) shown in the table below. Note, in this lab, the Size feature is in sqft while earlier labs utilized 1000 sqft. This data set is larger than the previous lab.\n",
|
||||
"\n",
|
||||
"We would like to build a linear regression model using these values so we can then predict the price for other houses - say, a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old. \n",
|
||||
"\n",
|
||||
"## Dataset: \n",
|
||||
"| Size (sqft) | Number of Bedrooms | Number of floors | Age of Home | Price (1000s dollars) | \n",
|
||||
"| ----------------| ------------------- |----------------- |--------------|----------------------- | \n",
|
||||
"| 952 | 2 | 1 | 65 | 271.5 | \n",
|
||||
"| 1244 | 3 | 2 | 64 | 232 | \n",
|
||||
"| 1947 | 3 | 2 | 17 | 509.8 | \n",
|
||||
"| ... | ... | ... | ... | ... |\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# load the dataset\n",
|
||||
"X_train, y_train = load_house_data()\n",
|
||||
"X_features = ['size(sqft)','bedrooms','floors','age']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's view the dataset and its features by plotting each feature versus price."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"fig,ax=plt.subplots(1, 4, figsize=(12, 3), sharey=True)\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" ax[i].scatter(X_train[:,i],y_train)\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
"ax[0].set_ylabel(\"Price (1000's)\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Plotting each feature vs. the target, price, provides some indication of which features have the strongest influence on price. Above, increasing size also increases price. Bedrooms and floors don't seem to have a strong impact on price. Newer houses have higher prices than older houses."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_5\"></a>\n",
|
||||
"## Gradient Descent With Multiple Variables\n",
|
||||
"Here are the equations you developed in the last lab on gradient descent for multiple variables.:\n",
|
||||
"\n",
|
||||
"$$\\begin{align*} \\text{repeat}&\\text{ until convergence:} \\; \\lbrace \\newline\\;\n",
|
||||
"& w_j := w_j - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} \\tag{1} \\; & \\text{for j = 0..n-1}\\newline\n",
|
||||
"&b\\ \\ := b - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial b} \\newline \\rbrace\n",
|
||||
"\\end{align*}$$\n",
|
||||
"\n",
|
||||
"where, n is the number of features, parameters $w_j$, $b$, are updated simultaneously and where \n",
|
||||
"\n",
|
||||
"$$\n",
|
||||
"\\begin{align}\n",
|
||||
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \\tag{2} \\\\\n",
|
||||
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial b} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)}) \\tag{3}\n",
|
||||
"\\end{align}\n",
|
||||
"$$\n",
|
||||
"* m is the number of training examples in the data set\n",
|
||||
"\n",
|
||||
" \n",
|
||||
"* $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Learning Rate\n",
|
||||
"<figure>\n",
|
||||
" <img src=\"./images/C1_W2_Lab06_learningrate.PNG\" style=\"width:1200px;\" >\n",
|
||||
"</figure>\n",
|
||||
"The lectures discussed some of the issues related to setting the learning rate $\\alpha$. The learning rate controls the size of the update to the parameters. See equation (1) above. It is shared by all the parameters. \n",
|
||||
"\n",
|
||||
"Let's run gradient descent and try a few settings of $\\alpha$ on our data set"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### $\\alpha$ = 9.9e-7"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#set alpha to 9.9e-7\n",
|
||||
"_, _, hist = run_gradient_descent(X_train, y_train, 10, alpha = 9.9e-7)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"It appears the learning rate is too high. The solution does not converge. Cost is *increasing* rather than decreasing. Let's plot the result:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"plot_cost_i_w(X_train, y_train, hist)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The plot on the right shows the value of one of the parameters, $w_0$. At each iteration, it is overshooting the optimal value and as a result, cost ends up *increasing* rather than approaching the minimum. Note that this is not a completely accurate picture as there are 4 parameters being modified each pass rather than just one. This plot is only showing $w_0$ with the other parameters fixed at benign values. In this and later plots you may notice the blue and orange lines being slightly off."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"### $\\alpha$ = 9e-7\n",
|
||||
"Let's try a bit smaller value and see what happens."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#set alpha to 9e-7\n",
|
||||
"_,_,hist = run_gradient_descent(X_train, y_train, 10, alpha = 9e-7)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Cost is decreasing throughout the run showing that alpha is not too large. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"plot_cost_i_w(X_train, y_train, hist)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"On the left, you see that cost is decreasing as it should. On the right, you can see that $w_0$ is still oscillating around the minimum, but it is decreasing each iteration rather than increasing. Note above that `dj_dw[0]` changes sign with each iteration as `w[0]` jumps over the optimal value.\n",
|
||||
"This alpha value will converge. You can vary the number of iterations to see how it behaves."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### $\\alpha$ = 1e-7\n",
|
||||
"Let's try a bit smaller value for $\\alpha$ and see what happens."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#set alpha to 1e-7\n",
|
||||
"_,_,hist = run_gradient_descent(X_train, y_train, 10, alpha = 1e-7)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Cost is decreasing throughout the run showing that $\\alpha$ is not too large. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"plot_cost_i_w(X_train,y_train,hist)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"On the left, you see that cost is decreasing as it should. On the right you can see that $w_0$ is decreasing without crossing the minimum. Note above that `dj_w0` is negative throughout the run. This solution will also converge, though not quite as quickly as the previous example."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Feature Scaling \n",
|
||||
"<figure>\n",
|
||||
" <img src=\"./images/C1_W2_Lab06_featurescalingheader.PNG\" style=\"width:1200px;\" >\n",
|
||||
"</figure>\n",
|
||||
"The lectures described the importance of rescaling the dataset so the features have a similar range.\n",
|
||||
"If you are interested in the details of why this is the case, click on the 'details' header below. If not, the section below will walk through an implementation of how to do feature scaling."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Details</b></font>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"Let's look again at the situation with $\\alpha$ = 9e-7. This is pretty close to the maximum value we can set $\\alpha$ to without diverging. This is a short run showing the first few iterations:\n",
|
||||
"\n",
|
||||
"<figure>\n",
|
||||
" <img src=\"./images/C1_W2_Lab06_ShortRun.PNG\" style=\"width:1200px;\" >\n",
|
||||
"</figure>\n",
|
||||
"\n",
|
||||
"Above, while cost is being decreased, its clear that $w_0$ is making more rapid progress than the other parameters due to its much larger gradient.\n",
|
||||
"\n",
|
||||
"The graphic below shows the result of a very long run with $\\alpha$ = 9e-7. This takes several hours.\n",
|
||||
"\n",
|
||||
"<figure>\n",
|
||||
" <img src=\"./images/C1_W2_Lab06_LongRun.PNG\" style=\"width:1200px;\" >\n",
|
||||
"</figure>\n",
|
||||
" \n",
|
||||
"Above, you can see cost decreased slowly after its initial reduction. Notice the difference between `w0` and `w1`,`w2`,`w3` as well as `dj_dw0` and `dj_dw1-3`. `w0` reaches its near final value very quickly and `dj_dw0` has quickly decreased to a small value showing that `w0` is near the final value. The other parameters were reduced much more slowly.\n",
|
||||
"\n",
|
||||
"Why is this? Is there something we can improve? See below:\n",
|
||||
"<figure>\n",
|
||||
" <center> <img src=\"./images/C1_W2_Lab06_scale.PNG\" ></center>\n",
|
||||
"</figure> \n",
|
||||
"\n",
|
||||
"The figure above shows why $w$'s are updated unevenly. \n",
|
||||
"- $\\alpha$ is shared by all parameter updates ($w$'s and $b$).\n",
|
||||
"- the common error term is multiplied by the features for the $w$'s. (not $b$).\n",
|
||||
"- the features vary significantly in magnitude making some features update much faster than others. In this case, $w_0$ is multiplied by 'size(sqft)', which is generally > 1000, while $w_1$ is multiplied by 'number of bedrooms', which is generally 2-4. \n",
|
||||
" \n",
|
||||
"The solution is Feature Scaling."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The lectures discussed three different techniques: \n",
|
||||
"- Feature scaling, essentially dividing each positive feature by its maximum value, or more generally, rescale each feature by both its minimum and maximum values using (x-min)/(max-min). Both ways normalizes features to the range of -1 and 1, where the former method works for positive features which is simple and serves well for the lecture's example, and the latter method works for any features.\n",
|
||||
"- Mean normalization: $x_i := \\dfrac{x_i - \\mu_i}{max - min} $ \n",
|
||||
"- Z-score normalization which we will explore below. "
|
||||
]
|
||||
},
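{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the first two techniques can each be written in a line or two of NumPy. The sketch below is illustrative only - the lab itself uses z-score normalization - and assumes `X_train` has been loaded with `load_house_data` as above:\n",
"\n",
"```\n",
"# scaling by the maximum of each column (the features here are all positive)\n",
"X_max_scaled = X_train / np.max(X_train, axis=0)\n",
"\n",
"# mean normalization: (x - mu) / (max - min), per column\n",
"mu = np.mean(X_train, axis=0)\n",
"X_mean_norm = (X_train - mu) / (np.max(X_train, axis=0) - np.min(X_train, axis=0))\n",
"```"
]
},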
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"### z-score normalization \n",
|
||||
"After z-score normalization, all features will have a mean of 0 and a standard deviation of 1.\n",
|
||||
"\n",
|
||||
"To implement z-score normalization, adjust your input values as shown in this formula:\n",
|
||||
"$$x^{(i)}_j = \\dfrac{x^{(i)}_j - \\mu_j}{\\sigma_j} \\tag{4}$$ \n",
|
||||
"where $j$ selects a feature or a column in the $\\mathbf{X}$ matrix. $µ_j$ is the mean of all the values for feature (j) and $\\sigma_j$ is the standard deviation of feature (j).\n",
|
||||
"$$\n",
|
||||
"\\begin{align}\n",
|
||||
"\\mu_j &= \\frac{1}{m} \\sum_{i=0}^{m-1} x^{(i)}_j \\tag{5}\\\\\n",
|
||||
"\\sigma^2_j &= \\frac{1}{m} \\sum_{i=0}^{m-1} (x^{(i)}_j - \\mu_j)^2 \\tag{6}\n",
|
||||
"\\end{align}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
">**Implementation Note:** When normalizing the features, it is important\n",
|
||||
"to store the values used for normalization - the mean value and the standard deviation used for the computations. After learning the parameters\n",
|
||||
"from the model, we often want to predict the prices of houses we have not\n",
|
||||
"seen before. Given a new x value (living room area and number of bed-\n",
|
||||
"rooms), we must first normalize x using the mean and standard deviation\n",
|
||||
"that we had previously computed from the training set.\n",
|
||||
"\n",
|
||||
"**Implementation**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def zscore_normalize_features(X):\n",
|
||||
" \"\"\"\n",
|
||||
" computes X, zcore normalized by column\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X (ndarray (m,n)) : input data, m examples, n features\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" X_norm (ndarray (m,n)): input normalized by column\n",
|
||||
" mu (ndarray (n,)) : mean of each feature\n",
|
||||
" sigma (ndarray (n,)) : standard deviation of each feature\n",
|
||||
" \"\"\"\n",
|
||||
" # find the mean of each column/feature\n",
|
||||
" mu = np.mean(X, axis=0) # mu will have shape (n,)\n",
|
||||
" # find the standard deviation of each column/feature\n",
|
||||
" sigma = np.std(X, axis=0) # sigma will have shape (n,)\n",
|
||||
" # element-wise, subtract mu for that column from each example, divide by std for that column\n",
|
||||
" X_norm = (X - mu) / sigma \n",
|
||||
"\n",
|
||||
" return (X_norm, mu, sigma)\n",
|
||||
" \n",
|
||||
"#check our work\n",
|
||||
"#from sklearn.preprocessing import scale\n",
|
||||
"#scale(X_orig, axis=0, with_mean=True, with_std=True, copy=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's look at the steps involved in Z-score normalization. The plot below shows the transformation step by step."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"mu = np.mean(X_train,axis=0) \n",
|
||||
"sigma = np.std(X_train,axis=0) \n",
|
||||
"X_mean = (X_train - mu)\n",
|
||||
"X_norm = (X_train - mu)/sigma \n",
|
||||
"\n",
|
||||
"fig,ax=plt.subplots(1, 3, figsize=(12, 3))\n",
|
||||
"ax[0].scatter(X_train[:,0], X_train[:,3])\n",
|
||||
"ax[0].set_xlabel(X_features[0]); ax[0].set_ylabel(X_features[3]);\n",
|
||||
"ax[0].set_title(\"unnormalized\")\n",
|
||||
"ax[0].axis('equal')\n",
|
||||
"\n",
|
||||
"ax[1].scatter(X_mean[:,0], X_mean[:,3])\n",
|
||||
"ax[1].set_xlabel(X_features[0]); ax[0].set_ylabel(X_features[3]);\n",
|
||||
"ax[1].set_title(r\"X - $\\mu$\")\n",
|
||||
"ax[1].axis('equal')\n",
|
||||
"\n",
|
||||
"ax[2].scatter(X_norm[:,0], X_norm[:,3])\n",
|
||||
"ax[2].set_xlabel(X_features[0]); ax[0].set_ylabel(X_features[3]);\n",
|
||||
"ax[2].set_title(r\"Z-score normalized\")\n",
|
||||
"ax[2].axis('equal')\n",
|
||||
"plt.tight_layout(rect=[0, 0.03, 1, 0.95])\n",
|
||||
"fig.suptitle(\"distribution of features before, during, after normalization\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The plot above shows the relationship between two of the training set parameters, \"age\" and \"size(sqft)\". *These are plotted with equal scale*. \n",
|
||||
"- Left: Unnormalized: The range of values or the variance of the 'size(sqft)' feature is much larger than that of age\n",
|
||||
"- Middle: The first step removes the mean or average value from each feature. This leaves features that are centered around zero. It's difficult to see the difference for the 'age' feature, but 'size(sqft)' is clearly around zero.\n",
|
||||
"- Right: The second step divides by the standard deviation. This leaves both features centered at zero with a similar scale."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's normalize the data and compare it to the original data."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# normalize the original features\n",
|
||||
"X_norm, X_mu, X_sigma = zscore_normalize_features(X_train)\n",
|
||||
"print(f\"X_mu = {X_mu}, \\nX_sigma = {X_sigma}\")\n",
|
||||
"print(f\"Peak to Peak range by column in Raw X:{np.ptp(X_train,axis=0)}\") \n",
|
||||
"print(f\"Peak to Peak range by column in Normalized X:{np.ptp(X_norm,axis=0)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The peak to peak range of each column is reduced from a factor of thousands to a factor of 2-3 by normalization."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"fig,ax=plt.subplots(1, 4, figsize=(12, 3))\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" norm_plot(ax[i],X_train[:,i],)\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
"ax[0].set_ylabel(\"count\");\n",
|
||||
"fig.suptitle(\"distribution of features before normalization\")\n",
|
||||
"plt.show()\n",
|
||||
"fig,ax=plt.subplots(1,4,figsize=(12,3))\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" norm_plot(ax[i],X_norm[:,i],)\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
"ax[0].set_ylabel(\"count\"); \n",
|
||||
"fig.suptitle(\"distribution of features after normalization\")\n",
|
||||
"\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Notice, above, the range of the normalized data (x-axis) is centered around zero and roughly +/- 2. Most importantly, the range is similar for each feature."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's re-run our gradient descent algorithm with normalized data.\n",
|
||||
"Note the **vastly larger value of alpha**. This will speed up gradient descent."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"w_norm, b_norm, hist = run_gradient_descent(X_norm, y_train, 1000, 1.0e-1, )"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The scaled features get very accurate results **much, much faster!**. Notice the gradient of each parameter is tiny by the end of this fairly short run. A learning rate of 0.1 is a good start for regression with normalized features.\n",
|
||||
"Let's plot our predictions versus the target values. Note, the prediction is made using the normalized feature while the plot is shown using the original feature values."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#predict target using normalized features\n",
|
||||
"m = X_norm.shape[0]\n",
|
||||
"yp = np.zeros(m)\n",
|
||||
"for i in range(m):\n",
|
||||
" yp[i] = np.dot(X_norm[i], w_norm) + b_norm\n",
|
||||
"\n",
|
||||
" # plot predictions and targets versus original features \n",
|
||||
"fig,ax=plt.subplots(1,4,figsize=(12, 3),sharey=True)\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" ax[i].scatter(X_train[:,i],y_train, label = 'target')\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
" ax[i].scatter(X_train[:,i],yp,color=dlc[\"dlorange\"], label = 'predict')\n",
|
||||
"ax[0].set_ylabel(\"Price\"); ax[0].legend();\n",
|
||||
"fig.suptitle(\"target versus prediction using z-score normalized model\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The results look good. A few points to note:\n",
|
||||
"- with multiple features, we can no longer have a single plot showing results versus features.\n",
|
||||
"- when generating the plot, the normalized features were used. Any predictions using the parameters learned from a normalized training set must also be normalized."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Prediction**\n",
|
||||
"The point of generating our model is to use it to predict housing prices that are not in the data set. Let's predict the price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old. Recall, that you must normalize the data with the mean and standard deviation derived when the training data was normalized. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# First, normalize out example.\n",
|
||||
"x_house = np.array([1200, 3, 1, 40])\n",
|
||||
"x_house_norm = (x_house - X_mu) / X_sigma\n",
|
||||
"print(x_house_norm)\n",
|
||||
"x_house_predict = np.dot(x_house_norm, w_norm) + b_norm\n",
|
||||
"print(f\" predicted price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old = ${x_house_predict*1000:0.0f}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Cost Contours** \n",
|
||||
"<img align=\"left\" src=\"./images/C1_W2_Lab06_contours.PNG\" style=\"width:240px;\" >Another way to view feature scaling is in terms of the cost contours. When feature scales do not match, the plot of cost versus parameters in a contour plot is asymmetric. \n",
|
||||
"\n",
|
||||
"In the plot below, the scale of the parameters is matched. The left plot is the cost contour plot of w[0], the square feet versus w[1], the number of bedrooms before normalizing the features. The plot is so asymmetric, the curves completing the contours are not visible. In contrast, when the features are normalized, the cost contour is much more symmetric. The result is that updates to parameters during gradient descent can make equal progress for each parameter. \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"plt_equal_scale(X_train, X_norm, y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"## Congratulations!\n",
|
||||
"In this lab you:\n",
|
||||
"- utilized the routines for linear regression with multiple features you developed in previous labs\n",
|
||||
"- explored the impact of the learning rate $\\alpha$ on convergence \n",
|
||||
"- discovered the value of feature scaling using z-score normalization in speeding convergence"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Acknowledgments\n",
|
||||
"The housing data was derived from the [Ames Housing dataset](http://jse.amstat.org/v19n3/decock.pdf) compiled by Dean De Cock for use in data science education."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.6"
|
||||
},
|
||||
"toc-autonumbering": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,772 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Ungraded Lab - Gradient Descent for Linear Regression\n",
|
||||
"\n",
|
||||
"In the previous labs, we determined the optimal values of $w_0$ and $w_1$ manually. In this lab we will automate this process with gradient descent with one variable as described in lecture."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Outline\n",
|
||||
"\n",
|
||||
"- [Exercise 01- Compute Gradient](#ex01)\n",
|
||||
"- [Exercise 02- Checking the Gradient](#ex02)\n",
|
||||
"- [Exercise 03- Learning Parameters with Batch Gradient Descent](#ex-03)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"import math \n",
|
||||
"import copy"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Problem Statement\n",
|
||||
"\n",
|
||||
"Let's use the same two data points as before - a house with 1000 square feet sold for \\\\$200,000 and a house with 2000 square feet sold for \\\\$400,000.\n",
|
||||
"\n",
|
||||
"That is our dataset contains has the following two points - \n",
|
||||
"\n",
|
||||
"| Size (feet$^2$) | Price (1000s of dollars) |\n",
|
||||
"| -------------------| ------------------------ |\n",
|
||||
"| 1000 | 200 |\n",
|
||||
"| 2000 | 400 |\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Load our data set\n",
|
||||
"X_train = [1000, 2000] #feature \n",
|
||||
"y_train = [200, 400] #actual value"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"## routine to plot the data points\n",
|
||||
"def plt_house(X, y,f_w=None):\n",
|
||||
" plt.scatter(X, y, marker='x', c='r', label=\"Actual Value\")\n",
|
||||
"\n",
|
||||
" # Set the title\n",
|
||||
" plt.title(\"Housing Prices\")\n",
|
||||
" # Set the y-axis label\n",
|
||||
" plt.ylabel('Price (in 1000s of dollars)')\n",
|
||||
" # Set the x-axis label\n",
|
||||
" plt.xlabel('Size (feet^2)')\n",
|
||||
" # print predictions\n",
|
||||
" if f_w != None:\n",
|
||||
" plt.plot(X, f_w, c='b', label=\"Our Prediction\")\n",
|
||||
" plt.legend()\n",
|
||||
" plt.show()\n",
|
||||
" \n",
|
||||
"plt_house(X_train,y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Compute_Cost\n",
|
||||
"You produced this in the last lab, so this is supplied here for later use"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Function to calculate the cost\n",
|
||||
"def compute_cost(X, y, w):\n",
|
||||
" \n",
|
||||
" m = len(X)\n",
|
||||
" cost = 0\n",
|
||||
" \n",
|
||||
" for i in range(m):\n",
|
||||
" \n",
|
||||
" # Calculate the model prediction\n",
|
||||
" f_w = w[0] + w[1]*X[i]\n",
|
||||
" \n",
|
||||
" # Calculate the cost\n",
|
||||
" cost = cost + (f_w - y[i])**2\n",
|
||||
"\n",
|
||||
" total_cost = 1/(2*m) * cost\n",
|
||||
"\n",
|
||||
" return total_cost"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Gradient descent summary\n",
|
||||
"So far in this course we have developed a linear model that predicts $f_{\\mathbf{w}}(x)$ based a single input $x$ using trained parameters $w_0$,$w_1$.\n",
|
||||
"$$f_\\mathbf{w}(x)= w_0 + w_1x \\tag{1}$$\n",
|
||||
"In machine learning, we utilize input data to train the parameters $w_0$,$w_1$ by minimizing a measure of the error between our predictions $f_{\\mathbf{w}}(x)$ and the actual data $y$. The measure is called the $cost$, $J(\\mathbf{w})$. In training we measure the cost over all of our training samples $x^{(i)},y^{(i)}$\n",
|
||||
"$$J(\\mathbf{w}) = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w}}(x^{(i)}) - y^{(i)})^2\\tag{2}$$ \n",
|
||||
"From calculus we know the partial derivitive of the cost relative to one of the parameters tells us how a small change in that parameter $w_j$, or $\\Delta{w_j}$, causes a small change in $J(\\mathbf{w})$, or $\\Delta(J(w)$.\n",
|
||||
"\n",
|
||||
"$$ \\frac{\\partial J(w)}{\\partial w_j} \\approx \\frac{\\Delta{J(w)}}{\\Delta{w_j}}$$\n",
|
||||
"Using that information, we can iteratively make small adjustments to $w_j$ that reduce the value of $J(\\mathbf{w})$. This iterative process is called gradient descent. \n"
|
||||
]
|
||||
},
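{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick, concrete illustration of equation (2) - not one of the exercises - the `compute_cost` routine above can be evaluated at two parameter settings on the two-point training set:\n",
"\n",
"```\n",
"# f_w(x) = 0 + 0.2*x fits both training points exactly, so the cost should be zero\n",
"print(compute_cost(X_train, y_train, [0, 0.2]))    # expect 0.0\n",
"\n",
"# away from the optimum the cost is positive\n",
"print(compute_cost(X_train, y_train, [0, 0.15]))   # expect 3125.0\n",
"```"
]
},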
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"In lecture, *gradient descent* was described as:\n",
|
||||
"\n",
|
||||
"$$\\begin{align*}& \\text{repeat until convergence:} \\; \\lbrace \\newline \\; & w_j := w_j - \\alpha \\frac{\\partial J(\\mathbf{w})}{\\partial w_j} \\tag{3} \\; & \\text{for j := 0,1}\\newline & \\rbrace\\end{align*}$$\n",
|
||||
"where, parameters $w_0$, $w_1$ are updated simultaneously. \n",
|
||||
"As in lecture:\n",
|
||||
"$$\n",
|
||||
"\\begin{align}\n",
|
||||
" \\frac{\\partial J(\\mathbf{w})}{\\partial w_0} &:= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{w}(x^{(i)}) - y^{(i)} \\tag{4}\\\\\n",
|
||||
" \\frac{\\partial J(\\mathbf{w})}{\\partial w_1} &:= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{w}(x^{(i)}) - y^{(i)})x^{(i)} \\tag{5}\\\\\n",
|
||||
"\\end{align}\n",
|
||||
"$$"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name='ex01'></a>\n",
|
||||
"## Exercise 01- Compute Gradient\n",
|
||||
"We will implement a batch gradient descent algorithm for one variable. We'll need three functions. \n",
|
||||
"- compute_gradient implementing equation (4) and (5) above\n",
|
||||
"- compute_cost implementing equation (2) above (code from previous lab)\n",
|
||||
"- gradient_descent, utilizing compute_gradient and compute_cost, runs the iterative algorithm to find the parameters with the lowest cost."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## compute_gradient\n",
|
||||
"<a name='ex-01'></a>\n",
|
||||
"Implement `compute_gradient` which will return $\\frac{\\partial J(\\mathbf{w})}{\\partial w}$. A naming convention we will use in code when referring to gradients is to infer the dJ(w) and name variables for the parameter. For example, $\\frac{\\partial J(\\mathbf{w})}{\\partial w_0}$ will be `dw0`.\n",
|
||||
"\n",
|
||||
"Please complete the `compute_gradient` function to:\n",
|
||||
"\n",
|
||||
"- Create a list to store the gradient `dw`. \n",
|
||||
"- Loop over all examples in the training set `m`. \n",
|
||||
" - Inside the loop, calculate the gradient update from each training example:\n",
|
||||
" - Calculate the model prediction `f`\n",
|
||||
" $$\n",
|
||||
" f_\\mathbf{w}(x^{(i)}) = w_0+ w_1x^{(i)} \n",
|
||||
" $$\n",
|
||||
" - Calculate the gradient for $w_0$ and $w_1$\n",
|
||||
" $$\n",
|
||||
"\\begin{align}\n",
|
||||
"\\frac{\\partial{J(w)}}{\\partial{w_0}} &= f_\\mathbf{w}(x^{(i)}) - y^{(i)} \\\\ \n",
|
||||
"\\frac{\\partial{J(w)}}{\\partial{w_1}} &= (f_\\mathbf{w}(x^{(i)}) - y^{(i)})x^{(i)} \\\\\n",
|
||||
"\\end{align} \n",
|
||||
"$$\n",
|
||||
" - Add these gradients to the total gradients `dw`\n",
|
||||
" \n",
|
||||
" - Compute total gradient by dividing by the number of examples `m`.\n",
|
||||
"**Note** that this assignment continues to use python lists rather than the NumPy data structures that will be described in upcoming lectures. This will require writing some expressions 'per element' where later, these could be a single operation. Also note that these routines are specifically for one variable. Later labs and the weekly assignment will use more general cases."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Hints</b></font>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"def compute_gradient(X, y, w): \n",
|
||||
" \"\"\"\n",
|
||||
" Computes the gradient for linear regression \n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X : (array_like Shape (m,)) variable such as house size \n",
|
||||
" y : (array_like Shape (m,)) actual value \n",
|
||||
" w : (array_like Shape (2,)) Initial values of parameters of the model \n",
|
||||
" Returns\n",
|
||||
" dw: (array_like Shape (2,)) The gradient of the cost w.r.t. the parameters w. \n",
|
||||
" Note that dw has the same dimensions as w.\n",
|
||||
" \"\"\"\n",
|
||||
" m = len(X)\n",
|
||||
" \n",
|
||||
" dw = [0,0]\n",
|
||||
" for i in range(m): \n",
|
||||
" f = w[0] + w[1]*X[i]\n",
|
||||
" dw0 = f-y[i]\n",
|
||||
" dw1 = (f-y[i])*X[i] \n",
|
||||
" dw[0] = dw[0] + dw0\n",
|
||||
" dw[1] = dw[1] + dw1\n",
|
||||
" dw[0] = (1/m) * dw[0]\n",
|
||||
" dw[1] = (1/m) * dw[1] \n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def compute_gradient(X, y, w): \n",
|
||||
" \"\"\"\n",
|
||||
" Computes the gradient for linear regression \n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X : (array_like Shape (m,)) variable such as house size \n",
|
||||
" y : (array_like Shape (m,)) actual value \n",
|
||||
" w : (array_like Shape (2,)) Initial values of parameters of the model \n",
|
||||
" Returns\n",
|
||||
" dw: (array_like Shape (2,)) The gradient of the cost w.r.t. the parameters w. \n",
|
||||
" Note that dw has the same dimensions as w.\n",
|
||||
" \"\"\"\n",
|
||||
" m = len(X)\n",
|
||||
" dw = [0,0] \n",
|
||||
" ### START CODE HERE ### \n",
|
||||
"\n",
|
||||
" ### END CODE HERE ### \n",
|
||||
" \n",
|
||||
" return dw"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Compute and display gradient with w initialized to zeroes\n",
|
||||
"initial_w = [0,0]\n",
|
||||
"grad = compute_gradient(X_train, y_train, initial_w)\n",
|
||||
"print('Gradient at initial w (zeros):', grad)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Output**:\n",
|
||||
"```Gradient at initial w (zeros): [-300.0, -500000.0]```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Now, lets try setting w to a value we know, from previous labs, is the optimal value\n",
|
||||
"initial_w = [0,0.2]\n",
|
||||
"grad = compute_gradient(X_train, y_train, initial_w)\n",
|
||||
"print('Gradient when w is set to optimal values:', grad)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Output**:\n",
|
||||
"```Gradient when w is set to optimal values: [0.0, 0.0]``` \n",
|
||||
"As we expected, the gradient is zero at the \"bottom of the bowl\"."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# one more test case to ensure we are using all the w values.\n",
|
||||
"initial_w = [0.1,0.1]\n",
|
||||
"grad = compute_gradient(X_train, y_train, initial_w)\n",
|
||||
"print('Gradient:', grad)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Output**:\n",
|
||||
"```Gradient: [-149.9, -249850.0]``` "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Checking the gradient\n",
|
||||
"What do these gradient values mean? \n",
|
||||
"If you have taken calculus, you may recall an early lecture describing a derivative as:\n",
|
||||
"$$\\frac{df(x)}{dx} = \\lim_{\\Delta{x} \\to 0} \\frac{f(x+\\Delta{x}) - f(x)}{\\Delta{x}}$$\n",
|
||||
"The derivative then is just a measure of how a small change in x, the $\\Delta{x}$ above, changes $f(x)$.\n",
|
||||
"\n",
|
||||
"Above, we calculated `dw1` or $\\frac{\\partial J(\\mathbf{w})}{\\partial w_1}$ to be -249850.0. That says that when $\\mathbf{w} = [0.1,0.1]$, a small change in $w_1$ will result in a change in the **cost**, $J(\\mathbf{w})$, that is -249850.0 times that change. Note the change in notation from $d$ to $\\partial{}$ just indicates the J has multiple dependencies and that this is a derivative with respect to one of them - a partial derivative.\n",
|
||||
"\n",
|
||||
"We can use this knowledge to perform a simple check of our implementation of the gradient."
|
||||
]
|
||||
},
|
||||
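{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick worked example of that interpretation (using only the values computed above): with $\\frac{\\partial J(\\mathbf{w})}{\\partial w_1} \\approx -249850$ at $\\mathbf{w} = [0.1,0.1]$, nudging $w_1$ upward by $0.0001$ should change the cost by roughly\n",
"$$ \\Delta J \\approx -249850 \\times 0.0001 \\approx -25 $$\n",
"that is, the cost should drop by about 25. Exercise 2 below verifies this idea numerically."
]
},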
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name='ex02'></a>\n",
|
||||
"## Exercise 2 \n",
|
||||
"Let's check our gradient descent algorithm by \n",
|
||||
"calculating an approximation to the partial derivative with respect to $w_1$. We can't make $\\Delta{x}$ go to zero as in the equation above, but we can just use a small value: \n",
|
||||
"$$ \\frac{\\partial J(\\mathbf{w})}{\\partial w_1} \\approx\\frac{Cost(w_0,w_1+\\Delta)-Cost(w_0,w_1)}{\\Delta{w_1}}$$\n",
|
||||
"Of course, the same method can be applied to any of the parameters."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Hints</b></font>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"# calculate a derivative and compare with our implementaion.\n",
|
||||
"delta = 0.00001\n",
|
||||
"w_check = [0.1,0.1]\n",
|
||||
"\n",
|
||||
"# compute the gradient using our derivation and implementation\n",
|
||||
"grad = compute_gradient(X_train, y_train, initial_w)\n",
|
||||
"\n",
|
||||
"# compute point 1\n",
|
||||
"c1 = compute_cost(X_train,y_train,w_check)\n",
|
||||
"\n",
|
||||
"#increment parameter w_check[1] by delta, leave w_check[0] the same\n",
|
||||
"w_check[0] = w_check[0] # leave the same\n",
|
||||
"w_check[1] = w_check[1] + delta\n",
|
||||
"\n",
|
||||
"#compute point 2\n",
|
||||
"c2 = compute_cost(X_train,y_train,w_check)\n",
|
||||
"calculated_dw1 = (c2 - c1)/delta\n",
|
||||
"print(f\"calculated_dw1 {calculated_dw1:0.1f}, expected dw1 {grad[1]}\" )#increment parameter w_check[1] by delta, leave w_check[0] the same \n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# calculate a derivative and compare with our implementaion.\n",
|
||||
"delta = 0.00001\n",
|
||||
"w_check = [0.1,0.1]\n",
|
||||
"\n",
|
||||
"# compute the gradient using our derivation and implementation\n",
|
||||
"### START CODE HERE ### \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# compute point 1\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"#increment parameter w_check[1] by delta, leave w_check[0] the same\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"#compute point 2\n",
|
||||
"\n",
|
||||
"### END CODE HERE ### \n",
|
||||
"\n",
|
||||
"print(f\"calculated_dw1 {calculated_dw1:0.1f}, expected dw1 {grad[1]}\" )"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Output**: \n",
|
||||
"```calculated_dw1 -249837.5, expected dw1 -249850.0``` \n",
|
||||
"Not *exactly* the same, but close. The real derivative would take delta to zero. Try changing the value of delta."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name='ex-03'></a>\n",
|
||||
"## Exercise 3 Learning parameters using batch gradient descent \n",
|
||||
"\n",
|
||||
"You will now find the optimal parameters of a linear regression model by using batch gradient descent. Recall batch refers to running all the examples in one batch. \n",
|
||||
"- You don't need to implement anything for this part. Simply run the cells below. \n",
|
||||
"- A good way to verify that gradient descent is working correctly is to look\n",
|
||||
"at the value of $J(\\mathbf{w})$ and check that it is decreasing with each step. \n",
|
||||
"- Assuming you have implemented the gradient and computed the cost correctly, your value of $J(\\mathbf{w})$ should never increase and should converge to a steady value by the end of the algorithm."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def gradient_descent(X, y, w_in, cost_function, gradient_function, alpha, num_iters): \n",
|
||||
" \"\"\"\n",
|
||||
" Performs batch gradient descent to learn theta. Updates theta by taking \n",
|
||||
" num_iters gradient steps with learning rate alpha\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X : (array_like Shape (m,)\n",
|
||||
" y : (array_like Shape (m,) )\n",
|
||||
" w_in : (array_like Shape (2,)) Initial values of parameters of the model\n",
|
||||
" alpha : (float) Learning rate\n",
|
||||
" num_iters : (int) number of iterations to run gradient descent\n",
|
||||
" Returns\n",
|
||||
" w : (array_like Shape (2,)) Updated values of parameters of the model after\n",
|
||||
" running gradient descent\n",
|
||||
" \"\"\"\n",
|
||||
" \n",
|
||||
" # number of training examples\n",
|
||||
" m = len(X)\n",
|
||||
" w = copy.deepcopy(w_in) # avoid modifying global w\n",
|
||||
" # An array to store cost J and w's at each iteration primarily for graphing later\n",
|
||||
" J_history = []\n",
|
||||
" w_history = []\n",
|
||||
" \n",
|
||||
" for i in range(num_iters):\n",
|
||||
" \n",
|
||||
" # Calculate the gradient and update the parameters\n",
|
||||
" gradient = gradient_function(X, y, w)\n",
|
||||
"\n",
|
||||
" # Update Parameters \n",
|
||||
" w[0] = w[0] - alpha * gradient[0]\n",
|
||||
" w[1] = w[1] - alpha * gradient[1]\n",
|
||||
"\n",
|
||||
" # Save cost J at each iteration\n",
|
||||
" if i<100000: # prevent resource exhaustion \n",
|
||||
" J_history.append( compute_cost(X, y, w))\n",
|
||||
" \n",
|
||||
" # Print cost every at intervals 10 times or as many iterations if < 10\n",
|
||||
" if i% math.ceil(num_iters/10) == 0:\n",
|
||||
" w_history.append([w[0],w[1]])\n",
|
||||
" print(f\"Iteration {i:4}: Cost {J_history[-1]:8.2f} \",\n",
|
||||
" f\"gradient: {gradient[0]:9.4f},{gradient[1]:14.4f}\")\n",
|
||||
" \n",
|
||||
" return w, J_history, w_history #return w and J,w history for graphing"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# initialize parameters\n",
|
||||
"w_init = [0,0]\n",
|
||||
"# some gradient descent settings\n",
|
||||
"iterations = 1000\n",
|
||||
"alpha = 1.0e-8\n",
|
||||
"# run gradient descent\n",
|
||||
"w_final, J_hist, w_hist = gradient_descent(X_train ,y_train, w_init, compute_cost, compute_gradient, alpha, iterations)\n",
|
||||
"print(f\"w found by gradient descent: ({w_final[0]:8.4f},{w_final[1]:8.4f})\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Output**: \n",
|
||||
"```w found by gradient descent: (0.0001,0.2000)``` \n",
|
||||
"As we expected, the calculated parameter values are very close to (0,0.2) from previous labs."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(f\"1000 sqft house estimate {w_final[0] + w_final[1]*1000:0.2f} Thousand dollars\")\n",
|
||||
"print(f\"1000 sqft house estimate {w_final[0] + w_final[1]*1200:0.2f} Thousand dollars\")\n",
|
||||
"print(f\"2000 sqft house estimate {w_final[0] + w_final[1]*2000:0.2f} Thousand dollars\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# plot cost vs iteration \n",
|
||||
"plt.plot(J_hist)\n",
|
||||
"plt.title(\"Cost vs iteration\")\n",
|
||||
"plt.ylabel('Cost')\n",
|
||||
"plt.xlabel('iteration step')\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The plot shows that we rapidly reduced cost early. Recall from lecture that the gradient tends to be larger when further from the optimum creating larger step sizes. As you approach the final value, the gradient is smaller resulting in smaller step sizes."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Plotting\n",
|
||||
"Let's produce some of the fancy graphs that are popular for showing gradient descent. First we'll create some extra test cases."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# generate some more paths\n",
|
||||
"w_init = [400,0.6]\n",
|
||||
"# some gradient descent settings\n",
|
||||
"iterations = 1000\n",
|
||||
"alpha = 1.0e-7\n",
|
||||
"# run gradient descent\n",
|
||||
"w2_final, J2_hist, w2_hist = gradient_descent(X_train ,y_train, w_init, compute_cost, compute_gradient, alpha, iterations)\n",
|
||||
"print(f\"w found by gradient descent: ({w2_final[0]:0.4f},{w2_final[1]:0.4f})\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note, cost seems to have **plateaued**."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#generate some more paths\n",
|
||||
"w_init = [100,0.1]\n",
|
||||
"# some gradient descent settings\n",
|
||||
"iterations = 5\n",
|
||||
"alpha = 1.0e-6 # larger alpha\n",
|
||||
"# run gradient descent\n",
|
||||
"w3_final, J3_hist, w3_hist = gradient_descent(X_train ,y_train, w_init, compute_cost, compute_gradient, alpha, iterations)\n",
|
||||
"print(f\"w found by gradient descent: ({w3_final[0]:0.4f},{w3_final[1]:0.4f})\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note, cost is **increasing**!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from mpl_toolkits.mplot3d import axes3d\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"from matplotlib import cm\n",
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"w0 = np.arange(-500, 500, 5)\n",
|
||||
"w1 = np.arange(-0.2, 0.8, 0.005)\n",
|
||||
"w0,w1 = np.meshgrid(w0,w1)\n",
|
||||
"z=np.zeros_like(w0)\n",
|
||||
"n,_ = w0.shape\n",
|
||||
"for i in range(n):\n",
|
||||
" for j in range(n):\n",
|
||||
" z[i][j] = compute_cost(X_train, y_train, [w0[i][j],w1[i][j]] )\n",
|
||||
"\n",
|
||||
" \n",
|
||||
"fig = plt.figure(figsize=(24,6))\n",
|
||||
"\n",
|
||||
"ax = fig.add_subplot(1, 2, 2)\n",
|
||||
"CS = ax.contour(w1, w0, z,[0,50,1000,5000,10000,25000,50000])\n",
|
||||
"plt.clabel(CS, inline=1, fmt='%1.0f', fontsize=10)\n",
|
||||
"plt.title('Contour plot of cost J(w), vs w0,w1 with path of gradient descent')\n",
|
||||
"\n",
|
||||
"w_sub = [ (i[1],i[0]) for i in w_hist]\n",
|
||||
"for i in range(len(w_sub)-1):\n",
|
||||
" plt.annotate('', xy=w_sub[i + 1], xytext=w_sub[i],\n",
|
||||
" arrowprops={'arrowstyle': '->', 'color': 'r', 'lw': 1},\n",
|
||||
" va='center', ha='center')\n",
|
||||
"\n",
|
||||
"w_sub = [ (i[1],i[0]) for i in w2_hist]\n",
|
||||
"for i in range(len(w_sub)-1):\n",
|
||||
" plt.annotate('', xy=w_sub[i + 1], xytext=w_sub[i],\n",
|
||||
" arrowprops={'arrowstyle': '->', 'color': 'b', 'lw': 1},\n",
|
||||
" va='center', ha='center')\n",
|
||||
"w_sub = [ (i[1],i[0]) for i in w3_hist]\n",
|
||||
"for i in range(len(w_sub)-1):\n",
|
||||
" plt.annotate('', xy=w_sub[i + 1], xytext=w_sub[i],\n",
|
||||
" arrowprops={'arrowstyle': '->', 'color': 'g', 'lw': 1},\n",
|
||||
" va='center', ha='center')\n",
|
||||
" \n",
|
||||
"ax.set_xlabel('w_1')\n",
|
||||
"ax.set_ylabel('w_0')\n",
|
||||
"\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3'><b>Expected graph</b></font>\n",
|
||||
"</summary>\n",
|
||||
" <img src=\"./figures/ContourPlotLab3.PNG\" alt=\"Contour Plot\">\n",
|
||||
"<\\details>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"What is this graph showing? The ellipses are describing the surface of the cost $J(\\mathbf{w})$. The lines are the paths take from initial values of $(w_0,w_1)$ to their final values. \n",
|
||||
"The **red line** is our first run with w_init = (0,0). Gradient Descent successfully moves the parameters to (0,0.2) where cost is a minimum. But what about the Blue and Green lines? \n",
|
||||
"The **Blue** lines has w_init = (400,0.6) and alpha = 1.0e-7. Notice that while `w1` moves, `w0` doesn't seem to move. Why? \n",
|
||||
"The **Green** line has w_init = (100,0.1) and alpha = 1.0e-6. It never fully converges. Why?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Hints</b></font>\n",
|
||||
"</summary>\n",
|
||||
" \n",
|
||||
"In next week's lectures we will cover some fine tuning of gradient descent that is required to get it to run well. The **blue line** is one of these cases. It it does not seem that `w0` is being updated, but it is, just slowly. `w1` is multiplied by $x_1$ which is the square footage of houses in the dataset, a value in the thousands. This makes `w1` update much more quickly than `w0`. Review the update equations (4) and (5) above. With alpha = 1.0e-7, it will take many iterations to update `w0` to the right value. \n",
|
||||
" \n",
|
||||
"Why not just increase the value of alpha? The **green** line demonstrates the problem with doing this. We use a larger value for alpha in that run and the solution _diverges_. The update for `w1` is so large that the cost is larger on each iteration rather than smaller. If you run it long enough, you will generate a numerical overflow (try it). The lecture described this scenario. \n",
|
||||
" \n",
|
||||
"So, we have a situation where alpha is too big for `w1` but too small for `w0`. A means of dealing with this will be described next week. It involves _scaling_ or _normalizing_ the features in the data set so they fall within the same range. Once the data is normalized, alpha will impact all features evenly.\n",
|
||||
" \n",
|
||||
"Another way to handle this is to select the largest value of alpha you can that doesn't cause the solution to diverge, and then run it a long time. Try this in the next section _if you have the time!_"
|
||||
]
|
||||
},
|
||||
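{
"cell_type": "markdown",
"metadata": {},
"source": [
"A rough, back-of-the-envelope illustration of the scale problem described in the hints (using the gradient computed earlier at $\\mathbf{w} = [0.1,0.1]$, so only an order-of-magnitude sketch): with alpha = 1.0e-7,\n",
"$$ \\Delta w_0 \\approx \\alpha \\times 150 \\approx 1.5\\times10^{-5} \\quad\\quad \\Delta w_1 \\approx \\alpha \\times 250{,}000 \\approx 0.025 $$\n",
"per iteration. Moving $w_0$ by hundreds of units at roughly $10^{-5}$ per step takes millions of iterations, which is why the long-running cell below uses a somewhat larger alpha and 40,000,000 iterations."
]
},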
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#TAKES A LONG TIME, 10 minutes or so.\n",
|
||||
"w_init = [400,0.1]\n",
|
||||
"# some gradient descent settings\n",
|
||||
"iterations = 40000000\n",
|
||||
"alpha = 7.0e-7\n",
|
||||
"# run gradient descent\n",
|
||||
"w4_final, J4_hist, w4_hist = gradient_descent(X_train ,y_train, w_init, compute_cost, compute_gradient, alpha, iterations)\n",
|
||||
"print(f\"w found by gradient descent: ({w4_final[0]:0.4f},{w4_final[1]:0.4f})\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"w0 = np.arange(-500, 500, 5)\n",
|
||||
"w1 = np.arange(-0.2, 0.8, 0.005)\n",
|
||||
"w0,w1 = np.meshgrid(w0,w1)\n",
|
||||
"z=np.zeros_like(w0)\n",
|
||||
"n,_ = w0.shape\n",
|
||||
"for i in range(n):\n",
|
||||
" for j in range(n):\n",
|
||||
" z[i][j] = compute_cost(X_train, y_train, [w0[i][j],w1[i][j]] )\n",
|
||||
"\n",
|
||||
" \n",
|
||||
"fig = plt.figure(figsize=(24,6))\n",
|
||||
"\n",
|
||||
"ax = fig.add_subplot(1, 2, 2)\n",
|
||||
"CS = ax.contour(w1, w0, z,[0,50,1000,5000,10000,25000,50000])\n",
|
||||
"plt.clabel(CS, inline=1, fmt='%1.0f', fontsize=10)\n",
|
||||
"plt.title('Contour plot of cost, w0 vs w1')\n",
|
||||
"\n",
|
||||
"w_sub = [ (i[1],i[0]) for i in w_hist]\n",
|
||||
"for i in range(len(w_sub)-1):\n",
|
||||
" plt.annotate('', xy=w_sub[i + 1], xytext=w_sub[i],\n",
|
||||
" arrowprops={'arrowstyle': '->', 'color': 'r', 'lw': 1},\n",
|
||||
" va='center', ha='center')\n",
|
||||
"\n",
|
||||
"w_sub = [ (i[1],i[0]) for i in w4_hist]\n",
|
||||
"for i in range(len(w_sub)-1):\n",
|
||||
" plt.annotate('', xy=w_sub[i + 1], xytext=w_sub[i],\n",
|
||||
" arrowprops={'arrowstyle': '->', 'color': 'c', 'lw': 1},\n",
|
||||
" va='center', ha='center')\n",
|
||||
" \n",
|
||||
"ax.set_xlabel('w_1')\n",
|
||||
"ax.set_ylabel('w_0')\n",
|
||||
"\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The cyan line is our long-running solution. Scaling or Normalizing features will get us to the right solution faster. We will cover this next week."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,344 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Feature Engineering and Polynomial Regression\n",
|
||||
"\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Goals\n",
|
||||
"In this lab you will:\n",
|
||||
"- explore feature engineering and polynomial regression which allows you to use the machinery of linear regression to fit very complicated, even very non-linear functions.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"You will utilize the function developed in previous labs as well as matplotlib and NumPy. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"from lab_utils_multi import zscore_normalize_features, run_gradient_descent_feng\n",
|
||||
"np.set_printoptions(precision=2) # reduced display precision on numpy arrays"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name='FeatureEng'></a>\n",
|
||||
"# Feature Engineering and Polynomial Regression Overview\n",
|
||||
"\n",
|
||||
"Out of the box, linear regression provides a means of building models of the form:\n",
|
||||
"$$f_{\\mathbf{w},b} = w_0x_0 + w_1x_1+ ... + w_{n-1}x_{n-1} + b \\tag{1}$$ \n",
|
||||
"What if your features/data are non-linear or are combinations of features? For example, Housing prices do not tend to be linear with living area but penalize very small or very large houses resulting in the curves shown in the graphic above. How can we use the machinery of linear regression to fit this curve? Recall, the 'machinery' we have is the ability to modify the parameters $\\mathbf{w}$, $\\mathbf{b}$ in (1) to 'fit' the equation to the training data. However, no amount of adjusting of $\\mathbf{w}$,$\\mathbf{b}$ in (1) will achieve a fit to a non-linear curve.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name='PolynomialFeatures'></a>\n",
|
||||
"## Polynomial Features\n",
|
||||
"\n",
|
||||
"Above we were considering a scenario where the data was non-linear. Let's try using what we know so far to fit a non-linear curve. We'll start with a simple quadratic: $y = 1+x^2$\n",
|
||||
"\n",
|
||||
"You're familiar with all the routines we're using. They are available in the lab_utils.py file for review. We'll use [`np.c_[..]`](https://numpy.org/doc/stable/reference/generated/numpy.c_.html) which is a NumPy routine to concatenate along the column boundary."
|
||||
]
|
||||
},
|
||||
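{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small illustration of `np.c_` (a sketch only, not part of the lab's required code):\n",
"```\n",
"a = np.arange(3)          # array([0, 1, 2])\n",
"np.c_[a, a**2, a**3]      # array([[0, 0, 0],\n",
"                          #        [1, 1, 1],\n",
"                          #        [2, 4, 8]])\n",
"```"
]
},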
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0, 20, 1)\n",
|
||||
"y = 1 + x**2\n",
|
||||
"X = x.reshape(-1, 1)\n",
|
||||
"\n",
|
||||
"model_w,model_b = run_gradient_descent_feng(X,y,iterations=1000, alpha = 1e-2)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"no feature engineering\")\n",
|
||||
"plt.plot(x,X@model_w + model_b, label=\"Predicted Value\"); plt.xlabel(\"X\"); plt.ylabel(\"y\"); plt.legend(); plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Well, as expected, not a great fit. What is needed is something like $y= w_0x_0^2 + b$, or a **polynomial feature**.\n",
|
||||
"To accomplish this, you can modify the *input data* to *engineer* the needed features. If you swap the original data with a version that squares the $x$ value, then you can achieve $y= w_0x_0^2 + b$. Let's try it. Swap `X` for `X**2` below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0, 20, 1)\n",
|
||||
"y = 1 + x**2\n",
|
||||
"\n",
|
||||
"# Engineer features \n",
|
||||
"X = x**2 #<-- added engineered feature"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X = X.reshape(-1, 1) #X should be a 2-D Matrix\n",
|
||||
"model_w,model_b = run_gradient_descent_feng(X, y, iterations=10000, alpha = 1e-5)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"Added x**2 feature\")\n",
|
||||
"plt.plot(x, np.dot(X,model_w) + model_b, label=\"Predicted Value\"); plt.xlabel(\"x\"); plt.ylabel(\"y\"); plt.legend(); plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Great! near perfect fit. Notice the values of $\\mathbf{w}$ and b printed right above the graph: `w,b found by gradient descent: w: [1.], b: 0.0490`. Gradient descent modified our initial values of $\\mathbf{w},b $ to be (1.0,0.049) or a model of $y=1*x_0^2+0.049$, very close to our target of $y=1*x_0^2+1$. If you ran it longer, it could be a better match. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Selecting Features\n",
|
||||
"<a name='GDF'></a>\n",
|
||||
"Above, we knew that an $x^2$ term was required. It may not always be obvious which features are required. One could add a variety of potential features to try and find the most useful. For example, what if we had instead tried : $y=w_0x_0 + w_1x_1^2 + w_2x_2^3+b$ ? \n",
|
||||
"\n",
|
||||
"Run the next cells. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0, 20, 1)\n",
|
||||
"y = x**2\n",
|
||||
"\n",
|
||||
"# engineer features .\n",
|
||||
"X = np.c_[x, x**2, x**3] #<-- added engineered feature"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model_w,model_b = run_gradient_descent_feng(X, y, iterations=10000, alpha=1e-7)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"x, x**2, x**3 features\")\n",
|
||||
"plt.plot(x, X@model_w + model_b, label=\"Predicted Value\"); plt.xlabel(\"x\"); plt.ylabel(\"y\"); plt.legend(); plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note the value of $\\mathbf{w}$, `[0.08 0.54 0.03]` and b is `0.0106`.This implies the model after fitting/training is:\n",
|
||||
"$$ 0.08x + 0.54x^2 + 0.03x^3 + 0.0106 $$\n",
|
||||
"Gradient descent has emphasized the data that is the best fit to the $x^2$ data by increasing the $w_1$ term relative to the others. If you were to run for a very long time, it would continue to reduce the impact of the other terms. \n",
|
||||
">Gradient descent is picking the 'correct' features for us by emphasizing its associated parameter\n",
|
||||
"\n",
|
||||
"Let's review this idea:\n",
|
||||
"- Intially, the features were re-scaled so they are comparable to each other\n",
|
||||
"- less weight value implies less important/correct feature, and in extreme, when the weight becomes zero or very close to zero, the associated feature is not useful in fitting the model to the data.\n",
|
||||
"- above, after fitting, the weight associated with the $x^2$ feature is much larger than the weights for $x$ or $x^3$ as it is the most useful in fitting the data. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### An Alternate View\n",
|
||||
"Above, polynomial features were chosen based on how well they matched the target data. Another way to think about this is to note that we are still using linear regression once we have created new features. Given that, the best features will be linear relative to the target. This is best understood with an example. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0, 20, 1)\n",
|
||||
"y = x**2\n",
|
||||
"\n",
|
||||
"# engineer features .\n",
|
||||
"X = np.c_[x, x**2, x**3] #<-- added engineered feature\n",
|
||||
"X_features = ['x','x^2','x^3']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"fig,ax=plt.subplots(1, 3, figsize=(12, 3), sharey=True)\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" ax[i].scatter(X[:,i],y)\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
"ax[0].set_ylabel(\"y\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Above, it is clear that the $x^2$ feature mapped against the target value $y$ is linear. Linear regression can then easily generate a model using that feature."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Scaling features\n",
|
||||
"As described in the last lab, if the data set has features with significantly different scales, one should apply feature scaling to speed gradient descent. In the example above, there is $x$, $x^2$ and $x^3$ which will naturally have very different scales. Let's apply Z-score normalization to our example."
|
||||
]
|
||||
},
|
||||
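{
"cell_type": "markdown",
"metadata": {},
"source": [
"`zscore_normalize_features` is imported from `lab_utils_multi` and is not shown in this notebook. A minimal sketch of what z-score normalization computes, assuming the standard definition (the actual helper may differ in details, e.g. it might also return the mean and standard deviation):\n",
"```\n",
"def zscore_normalize_sketch(X):\n",
"    # column-wise mean and standard deviation\n",
"    mu    = np.mean(X, axis=0)\n",
"    sigma = np.std(X, axis=0)\n",
"    # each column of the result has mean 0 and standard deviation 1\n",
"    return (X - mu) / sigma\n",
"```"
]
},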
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0,20,1)\n",
|
||||
"X = np.c_[x, x**2, x**3]\n",
|
||||
"print(f\"Peak to Peak range by column in Raw X:{np.ptp(X,axis=0)}\")\n",
|
||||
"\n",
|
||||
"# add mean_normalization \n",
|
||||
"X = zscore_normalize_features(X) \n",
|
||||
"print(f\"Peak to Peak range by column in Normalized X:{np.ptp(X,axis=0)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now we can try again with a more aggressive value of alpha:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"x = np.arange(0,20,1)\n",
|
||||
"y = x**2\n",
|
||||
"\n",
|
||||
"X = np.c_[x, x**2, x**3]\n",
|
||||
"X = zscore_normalize_features(X) \n",
|
||||
"\n",
|
||||
"model_w, model_b = run_gradient_descent_feng(X, y, iterations=100000, alpha=1e-1)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"Normalized x x**2, x**3 feature\")\n",
|
||||
"plt.plot(x,X@model_w + model_b, label=\"Predicted Value\"); plt.xlabel(\"x\"); plt.ylabel(\"y\"); plt.legend(); plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Feature scaling allows this to converge much faster. \n",
|
||||
"Note again the values of $\\mathbf{w}$. The $w_1$ term, which is the $x^2$ term is the most emphasized. Gradient descent has all but eliminated the $x^3$ term."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Complex Functions\n",
|
||||
"With feature engineering, even quite complex functions can be modeled:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"x = np.arange(0,20,1)\n",
|
||||
"y = np.cos(x/2)\n",
|
||||
"\n",
|
||||
"X = np.c_[x, x**2, x**3,x**4, x**5, x**6, x**7, x**8, x**9, x**10, x**11, x**12, x**13]\n",
|
||||
"X = zscore_normalize_features(X) \n",
|
||||
"\n",
|
||||
"model_w,model_b = run_gradient_descent_feng(X, y, iterations=1000000, alpha = 1e-1)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"Normalized x x**2, x**3 feature\")\n",
|
||||
"plt.plot(x,X@model_w + model_b, label=\"Predicted Value\"); plt.xlabel(\"x\"); plt.ylabel(\"y\"); plt.legend(); plt.show()\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"## Congratulations!\n",
|
||||
"In this lab you:\n",
|
||||
"- learned how linear regression can model complex, even highly non-linear functions using feature engineering\n",
|
||||
"- recognized that it is important to apply feature scaling when doing feature engineering"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.6"
|
||||
},
|
||||
"toc-autonumbering": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,501 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Ungraded Lab - Multiple Variable Model Representation\n",
|
||||
"\n",
|
||||
"In this ungraded lab, you will extend our model to support multiple features. You will also utilized a popular python numeric library, NumPy to efficiently store and manipulate data. For detailed descriptions and examples of routines used, see [Numpy Documentation](https://numpy.org/doc/stable/reference/)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Problem Statement\n",
|
||||
"\n",
|
||||
"You will use the motivating example of housing price prediction. The training dataset contains three examples with 4 features (size,bedrooms,floors and age) shown in the table below.\n",
|
||||
"\n",
|
||||
"| Size (feet$^2$) | Number of Bedrooms | Number of floors | Age of Home | Price (1000s dollars) | \n",
|
||||
"| ----------------| ------------------- |----------------- |--------------|-------------- | \n",
|
||||
"| 2104 | 5 | 1 | 45 | 460 | \n",
|
||||
"| 1416 | 3 | 2 | 40 | 232 | \n",
|
||||
"| 852 | 2 | 1 | 35 | 178 | \n",
|
||||
"\n",
|
||||
"We would like to build a linear regression model using these values so we can then predict the price for other houses - say, a house with 1200 feet$^2$, 3 bedrooms, 1 floor, 40 years old. In this lab you will create the model. In the following labs, we will fit the data.\n",
|
||||
"\n",
|
||||
"### Notation: X, y and parameters w\n",
|
||||
"\n",
|
||||
"The lectures and equations describe $\\mathbf{X}$, $\\mathbf{y}$, $\\mathbf{w}$. In our code these are represented by variables:\n",
|
||||
"- `X_orig` represents input variables, also called input features. In previous labs, there was just one feature, now there are four. \n",
|
||||
"- `y_orig` represents output variables, also known as target variables (in this case - Price (1000s of dollars)). \n",
|
||||
"- `w_init` represents our parameters. \n",
|
||||
"Please run the following code cell to create your `X_orig` and `y_orig` variables."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X_orig = np.array([[2104,5,1,45], [1416,3,2,40], [852,2,1,35]])\n",
|
||||
"y_orig = np.array([460,232,178]).reshape(-1,1) #reshape creates (m,1) matrix"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Matrix X containing our examples\n",
|
||||
"Similar to the table above, examples are stored in a NumPy matrix `X_init`. Each row of the matrix represents one example. As described in lecture, examples are extended by a column of ones creating `X_init_e`, described below. In general, when you have $m$ training examples ( $m$ is three in our example), and there are $n$ features (four in our example), $\\mathbf{X}$ is a matrix with dimensions ($m$, $n+1$) (m rows, n+1 columns).\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"$$\\mathbf{X} = \\begin{pmatrix}\n",
|
||||
" \\mathbf{x}^{(0)} \\\\ \n",
|
||||
" \\mathbf{x}^{(1)} \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" \\mathbf{x}^{(m-1)}\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"= \\begin{pmatrix}\n",
|
||||
" x^{(0)}_0 & x^{(0)}_1 & \\cdots & x^{(0)}_{n} \\\\ \n",
|
||||
" x^{(1)}_0 & x^{(1)}_1 & \\cdots & x^{(1)}_{n} \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" x^{(m-1)}_0 & x^{(m-1)}_1 & \\cdots & x^{(m-1)}_{n} \n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"notation:\n",
|
||||
"- $\\mathbf{x}^{(0)}$ is example 0. The superscript in parenthesis indicates the example number. The bold indicates a vector (described more below)\n",
|
||||
"- $x^{(0)}_2$ is element 2 in example 0. The superscript in parenthesis indicates the example number while the subscript represents an element. \n",
|
||||
"\n",
|
||||
"For our dataset, $\\mathbf{X}$ is (3,5):\n",
|
||||
"$$\\mathbf{X} = \\begin{pmatrix}\n",
|
||||
" \\mathbf{x}^{(0)} \\\\ \n",
|
||||
" \\mathbf{x}^{(1)} \\\\\n",
|
||||
" \\mathbf{x}^{(2)}\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"= \\begin{pmatrix}\n",
|
||||
" 1 & 2104 & 5 & 1 & 45 & 460 \\\\ \n",
|
||||
" 1 & 1416 & 3 & 2 & 40 & 232 \\\\\n",
|
||||
" 1 & 852 & 2 & 1 & 35 & 178\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"Lets try implementing this. Start by examining our input data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# data is stored in numpy array/matrix\n",
|
||||
"print(f\"X Shape: {X_orig.shape}, X Type:{type(X_orig)})\")\n",
|
||||
"print(X_orig)\n",
|
||||
"print(f\"y Shape: {y_orig.shape}, y Type:{type(y_orig)})\")\n",
|
||||
"print(y_orig)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To simplify matrix/vector operations, you will want to first add another column to your data (as $x_0$) to accomodate the $w_0$ intercept term. This allows you to treat $w_0$ the same as the other parameters.\n",
|
||||
"\n",
|
||||
"So if your original `X_orig` looks like this:\n",
|
||||
"\n",
|
||||
"$$ \n",
|
||||
"\\mathbf{X_{orig}} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
" x^{(0)}_1 \\\\ \n",
|
||||
" x^{(1)}_1 \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" x^{(m-1)}_1 \n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"You will want to combine it with a vector of ones:\n",
|
||||
"$$\n",
|
||||
"\\mathbf{1} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
" 1 \\\\ \n",
|
||||
" 1 \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" 1\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"So it will look like this:\n",
|
||||
"$$\n",
|
||||
"\\mathbf{X_{train}} = \\begin{pmatrix} \\mathbf{1} & \\mathbf{X_{orig}}\\end{pmatrix}\n",
|
||||
"=\n",
|
||||
"\\begin{pmatrix}\n",
|
||||
" 1 & x^{(0)}_1 \\\\ \n",
|
||||
" 1 & x^{(1)}_1 \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" 1 & x^{(m-1)}_1 \n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tmp_ones = np.ones((3,1), dtype=np.int64) #dtype just added to keep examples neat.. not required\n",
|
||||
"print (\"(m,1) column of ones\")\n",
|
||||
"print(tmp_ones)\n",
|
||||
"#extend X_orig with column of ones\n",
|
||||
"X_train = np.concatenate([tmp_ones, X_orig], axis=1)\n",
|
||||
"y_train = y_orig # just for symmetry\n",
|
||||
"\n",
|
||||
"print(f\"Vector of ones stacked to the left of X_orig \")\n",
|
||||
"print(X_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Parameter vector w\n",
|
||||
"\n",
|
||||
"-$\\mathbf{w}$ is a vector with dimensions ($n+1$, $1$) (n+1 rows, 1 column)\n",
|
||||
" - Each column contains the parameters associated with one feature.\n",
|
||||
" - in our dataset, n+1 is 5.\n",
|
||||
"\n",
|
||||
"$$\\mathbf{w} = \\begin{pmatrix}\n",
|
||||
"w_0 \\\\ \n",
|
||||
"w_1 \\\\\n",
|
||||
"\\cdots\\\\\n",
|
||||
"w_{n}\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"For this lab, lets initialize `w` with some handy predetermined values. Normally, `w` would be initalized with random values or zero. Note the use of \".reshape\" to create a (n,1) column vector. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"w_init = np.array([ 785.1811367994083, 0.39133535, 18.75376741, \n",
|
||||
" -53.36032453, -26.42131618]).reshape(-1,1)\n",
|
||||
"print(f\"w_init shape: {w_init.shape}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Model prediction\n",
|
||||
"The model's prediction with multiple variables is given by the linear model:\n",
|
||||
"\n",
|
||||
"$$ f_{\\mathbf{w}}(\\mathbf{x}) = w_0 + w_1x_1 + ... + w_nx_n \\tag{1}$$\n",
|
||||
"\n",
|
||||
"This is where representing our data in matrices and vectors pays off. Recall from the Linear Algebra review the Matrix Vector multiplication. This is shown below\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Note that Row/Column that is highlighted. Knowing that we have set the $x_0$ values to 1, its clear the first row/column operation implements the prediction (1) above for $\\mathbf{x}^{(0)}$ , resulting in $f_{\\mathbf{w}}(\\mathbf{x}^{(0)})$. The second row of the result is $f_{\\mathbf{w}}(\\mathbf{x}^{(1)})$ and so on. By utilizing Matrix Vector multiplication, we can compute the prediction of all of the examples in $X$ in one statement!.\n",
|
||||
"\n",
|
||||
"$$f_{\\mathbf{w}}(\\mathbf{X})=\\mathbf{X}\\mathbf{w} \\tag{2}$$\n",
|
||||
"\n",
|
||||
"Let's try this. We have previously initized `X_train` and `w_init`. Before you run the cell below, what shape will `f_w` be?"
|
||||
]
|
||||
},
|
||||
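{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small generic illustration of the row-by-column operation (symbols only, no new data):\n",
"$$\n",
"\\mathbf{X}\\mathbf{w} = \\begin{pmatrix}\n",
"1 & x^{(0)}_1 & \\cdots & x^{(0)}_{n} \\\\\n",
"1 & x^{(1)}_1 & \\cdots & x^{(1)}_{n}\n",
"\\end{pmatrix}\n",
"\\begin{pmatrix} w_0 \\\\ w_1 \\\\ \\cdots \\\\ w_{n} \\end{pmatrix}\n",
"= \\begin{pmatrix}\n",
"w_0 + w_1x^{(0)}_1 + \\cdots + w_{n}x^{(0)}_{n} \\\\\n",
"w_0 + w_1x^{(1)}_1 + \\cdots + w_{n}x^{(1)}_{n}\n",
"\\end{pmatrix}\n",
"= \\begin{pmatrix} f_{\\mathbf{w}}(\\mathbf{x}^{(0)}) \\\\ f_{\\mathbf{w}}(\\mathbf{x}^{(1)}) \\end{pmatrix}\n",
"$$\n",
"Each row of $\\mathbf{X}$, dotted with $\\mathbf{w}$, produces the prediction for one example."
]
},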
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# calculate f_w for all examples.\n",
|
||||
"f_w = X_train @ w_init # the same as np.matmul(x_orig_e, w_init)\n",
|
||||
"print(\"f_w calculated using a matrix multiply\")\n",
|
||||
"print(f_w)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Using our carefully selected `w` values, the results nearly match our `y_train` values."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(\"y_train values\")\n",
|
||||
"print(y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Single Prediction\n",
|
||||
"\n",
|
||||
"We now can make prediction on a full set of examples, what about a single example? There are multiple ways to form this calculation, but here we will immitate the calculation that was highlighted in blue in the figure above.\n",
|
||||
"For convenience of notation, you'll define $\\mathbf{x}$ as a vector:\n",
|
||||
"\n",
|
||||
"$$ \\mathbf{x} = \\begin{pmatrix}\n",
|
||||
" x_0 & x_1 & ... & x_n\n",
|
||||
" \\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"- With $x_0 = 1$ and ($x_1$,..,$x_n$) being your input data. \n",
|
||||
"\n",
|
||||
"The prediction $f_{\\mathbf{w}}(\\mathbf{x})$ is now\n",
|
||||
"$$ f_{\\mathbf{w}}(\\mathbf{x}) = \\mathbf{x}\\mathbf{w} \\tag{3} $$ \n",
|
||||
"Which performs the following operation:\n",
|
||||
"$$\n",
|
||||
"f_{\\mathbf{w}}(\\mathbf{x}) = x_0w_0 + x_1w_1 + ... + x_nw_n\n",
|
||||
"$$\n",
|
||||
"Let's try it. Recall we wanted to predict the value of a house with 1200 feet$^2$, 3 bedrooms, 1 floor, 40 years old."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Define our x vector, extended with a 1.\n",
|
||||
"x_vec = np.array([1,1200,3,1,40]).reshape(1,-1) # row vector\n",
|
||||
"print(\"x_vec shape\", x_vec.shape)\n",
|
||||
"print(\"x_vec\")\n",
|
||||
"print(x_vec)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# make a prediction\n",
|
||||
"f_wv = x_vec @ w_init\n",
|
||||
"print(\"f_wv shape\", f_wv.shape)\n",
|
||||
"print(\"prediction f_wv\", f_wv)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Great! Now that we have realized our model in Matrix and Vector form lets \n",
|
||||
"- review some of the operations in more detail\n",
|
||||
"- try an example on your own."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### np.concatenate and axis\n",
|
||||
"We will use np.concatenate often. The use of `axis` is often confusing. Lets look at this in more detail with an example.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tmp_X_orig = np.array([[9],\n",
|
||||
" [2]\n",
|
||||
" ])\n",
|
||||
"\n",
|
||||
"print(\"Matrix tmp_X_orig\")\n",
|
||||
"print(tmp_X_orig, \"\\n\")\n",
|
||||
"\n",
|
||||
"# Use np.ones to create a column vector of ones\n",
|
||||
"tmp_ones = np.ones((2,1))\n",
|
||||
"print(f\"Column vector of ones (2 rows and 1 column)\")\n",
|
||||
"print(tmp_ones, \"\\n\")\n",
|
||||
"\n",
|
||||
"tmp_X = np.concatenate([tmp_ones, tmp_X_orig], axis=1)\n",
|
||||
"print(\"Vector of ones stacked to the left of tmp_X_orig\")\n",
|
||||
"print(tmp_X, \"\\n\")\n",
|
||||
"\n",
|
||||
"print(f\"tmp_x has shape: {tmp_X.shape}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"In this small example, the $\\mathbf{X}$ is now:\n",
|
||||
"$$\\mathbf{X} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
"1 & 9 \\\\\n",
|
||||
"1 & 2\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"Notice that when calling `np.concatenate`, you're setting `axis=1`. \n",
|
||||
"- This puts the vector of ones on the left and the tmp_X_orig to the right.\n",
|
||||
"- If you set axis = 0, then `np.concatenate` would place the vector of ones ON TOP of tmp_X_orig"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(\"Calling numpy.concatenate, setting axis=0\")\n",
|
||||
"tmp_X_version_2 = np.concatenate([tmp_ones, tmp_X_orig], axis=0)\n",
|
||||
"print(\"Vector of ones stacked to the ON TOP of tmp_X_orig\")\n",
|
||||
"print(tmp_X_version_2)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"So if you set axis=0, $\\mathbf{X}$ looks like this:\n",
|
||||
"$$\\mathbf{X} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
"1 \\\\ 1 \\\\\n",
|
||||
"9 \\\\ 2\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"This is **NOT** what you want.\n",
|
||||
"\n",
|
||||
"You'll want to set axis=1 so that you get a column vector of ones on the left and a column vector on the right:\n",
|
||||
"\n",
|
||||
"$$\\mathbf{X} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
"1 & x^{(0)}_1 \\\\\n",
|
||||
"1 & x^{(1)}_1\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Second Example on your own\n",
|
||||
"Let's try a similar example with slightly different features.\n",
|
||||
"\n",
|
||||
"| Size (feet$^2$) | Number of Bedrooms | Age of Home | Price (1000s dollars) | \n",
|
||||
"| ----------------| ------------------- |--------------|-------------- | \n",
|
||||
"| 2104 | 5 | 45 | 460 | \n",
|
||||
"| 1416 | 3 | 40 | 232 | \n",
|
||||
"| 1534 | 4 | 30 | 315 | \n",
|
||||
"| 852 | 2 | 35 | 178 | \n",
|
||||
"\n",
|
||||
"**Using the previous example as a guide** as needed, \n",
|
||||
"- create the data structures for `X_orig`, `y_orig` \n",
|
||||
"- extend X_orig with a column of 1's.\n",
|
||||
"- calculate `f_w`\n",
|
||||
"- make a prediction for a single example, 1500sqft, 3 bedrooms, 40 years old"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# use these precalculated values as inital parameters\n",
|
||||
"w_init2 = np.array([-267.70709382, -0.37871854, 220.9610984, 9.32723112]).reshape(-1,1)\n",
|
||||
"\n",
|
||||
"X_orig2 =\n",
|
||||
"y_train2 = \n",
|
||||
"tmp_ones2 = \n",
|
||||
"X_train2 = \n",
|
||||
"f_w2 = \n",
|
||||
"print(f_w2)\n",
|
||||
"print(y_train2)\n",
|
||||
"\n",
|
||||
"x_vec2 = np.array([1,1500,3,40]).reshape(1,-1)\n",
|
||||
"f_wv2 = x_vec2 @ w_init2\n",
|
||||
"print(f_wv2)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Hints</b></font>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"w_init2 = np.array([-267.70709382, -0.37871854, 220.9610984, 9.32723112]).reshape(-1,1)\n",
|
||||
"X_orig2 = np.array([[2104,5,45], [1416,3,40], [1534,4,30], [852,2,35]])\n",
|
||||
"y_train2 = np.array([460,232,315,178]).reshape(-1,1) #reshape creates (m,1) matrix\n",
|
||||
"tmp_ones2 = np.ones((4,1), dtype=np.int64)\n",
|
||||
"X_train2 = np.concatenate([tmp_ones2, X_orig2], axis=1)\n",
|
||||
"f_w2 = X_train2 @ w_init2\n",
|
||||
"print(f_w2)\n",
|
||||
"print(y_train2)\n",
|
||||
"\n",
|
||||
"x_vec2 = np.array([1,1500,3,40]).reshape(1,-1)\n",
|
||||
"f_wv2 = x_vec2 @ w_init2\n",
|
||||
"print(f_wv2)\n",
|
||||
"-----------------------------------------------------------------\n",
|
||||
" Output of cell\n",
|
||||
"-----------------------------------------------------------------\n",
|
||||
"[[459.99999042]\n",
|
||||
" [231.99999354]\n",
|
||||
" [314.99999302]\n",
|
||||
" [177.9999961 ]]\n",
|
||||
"[[460]\n",
|
||||
" [232]\n",
|
||||
" [315]\n",
|
||||
" [178]]\n",
|
||||
"[[200.18763618]]\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,396 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Ungraded Lab - Multiple Variable Cost\n",
|
||||
"\n",
|
||||
"In this lab we will adjust our previous single variable cost calculation to use multiple variables and utilize the NumPy vectors and matrices."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We will utilize the same data set and intialization as the last lab.\n",
|
||||
"### Problem Statement\n",
|
||||
"\n",
|
||||
"You will use the motivating example of housing price prediction. The training dataset contains three examples with 4 features (size,bedrooms,floors and age) shown in the table below.\n",
|
||||
"\n",
|
||||
"| Size (feet$^2$) | Number of Bedrooms | Number of floors | Age of Home | Price (1000s dollars) | \n",
|
||||
"| ----------------| ------------------- |----------------- |--------------|-------------- | \n",
|
||||
"| 2104 | 5 | 1 | 45 | 460 | \n",
|
||||
"| 1416 | 3 | 2 | 40 | 232 | \n",
|
||||
"| 852 | 2 | 1 | 35 | 178 | \n",
|
||||
"\n",
|
||||
"We would like to build a linear regression model using these values so we can then predict the price for other houses - say, a house with 1200 feet$^2$, 3 bedrooms, 1 floor, 40 years old. In this lab you will create the model. In the following labs, we will fit the data. \n",
|
||||
"\n",
|
||||
"We will set this up without much explaination. Refer to the previous lab for details."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Load data set\n",
|
||||
"X_orig = np.array([[2104,5,1,45], [1416,3,2,40], [852,2,1,35]])\n",
|
||||
"y_train = np.array([460,232,178]).reshape(-1,1) #reshape creates (m,1) matrix\n",
|
||||
"\n",
|
||||
"#extend X_orig with column of ones\n",
|
||||
"tmp_ones = np.ones((3,1), dtype=np.int64) #dtype just added to keep examples neat.. not required\n",
|
||||
"X_train = np.concatenate([tmp_ones, X_orig], axis=1)\n",
|
||||
"# load parameters. set to near optimal values\n",
|
||||
"w_init = np.array([ 785.1811367994083, 0.39133535, 18.75376741, \n",
|
||||
" -53.36032453, -26.42131618]).reshape(-1,1)\n",
|
||||
"print(f\"X shape: {X_train.shape}, w_shape: {w_init.shape}, y_shape: {y_train.shape}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Calculate the cost\n",
|
||||
"Next, calculate the cost $J(\\vec{w})$\n",
|
||||
"- Recall that the equation for the cost function $J(w)$ looks like this:\n",
|
||||
"$$J(\\mathbf{w}) = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w}}(\\mathbf{x}^{(i)}) - y^{(i)})^2 \\tag{1}$$ \n",
|
||||
"\n",
|
||||
"- The model prediction is a vector of size m:\n",
|
||||
"$$\\mathbf{f_{\\mathbf{w}}(\\mathbf{X})} = \\begin{pmatrix}\n",
|
||||
"f_{\\mathbf{w}}(x^{(0)}) \\\\\n",
|
||||
"f_{\\mathbf{w}}(x^{(1)}) \\\\\n",
|
||||
"\\cdots \\\\\n",
|
||||
"f_{\\mathbf{w}}(x^{(m-1)}) \\\\\n",
|
||||
"\\end{pmatrix} \n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"- Similarly, `y_train` contains the actual values as a column vector of m examples\n",
|
||||
"$$\\mathbf{y} = \\begin{pmatrix}\n",
|
||||
"y^{(0)} \\\\\n",
|
||||
"y^{(1)} \\\\\n",
|
||||
"\\cdots \\\\\n",
|
||||
"y^{(m-1)}\\\\\n",
|
||||
"\\end{pmatrix} \n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"Performing these calculations will involve some matrix and vector operations. These should be familiar from the Linear Algebra review. If not, a short review is at the end of this notebook.\n",
|
||||
"\n",
|
||||
"Notation:\n",
|
||||
"- Adjacent matrix, vector symbols such $\\mathbf{X}\\mathbf{w}$ or $\\mathbf{x}\\mathbf{w}$ implies a matrix multiplication. \n",
|
||||
"- An explicit $*$ implies element-wise multiplication.\n",
|
||||
"- $()^2$ is element-wise squaring\n",
|
||||
"- **bold** lowercase is a vector, **bold** uppercase is a matrix\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Instructions for Vectorized implementation of equation (1) above, computing cost :\n",
|
||||
"- calculate prediction for **all** training examples\n",
|
||||
"$$f_{\\mathbf{w}}(\\mathbf{X})=\\mathbf{X}\\mathbf{w} \\tag{2}$$\n",
|
||||
"- calculate the cost **all** examples\n",
|
||||
"$$cost = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1}((f_{\\mathbf{w}}(\\mathbf{X})-\\mathbf{y})^2) \\tag{3}$$\n",
|
||||
" \n",
|
||||
" - where $m$ is the number of training examples. The result is a scalar."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Hints</b></font>\n",
|
||||
"</summary>\n",
|
||||
" \n",
|
||||
"```\n",
|
||||
"#Function to calculate the cost\n",
|
||||
"def compute_cost(X, y, w, verbose=False):\n",
|
||||
" \"\"\"\n",
|
||||
" Computes the gradient for linear regression \n",
|
||||
" Args:\n",
|
||||
" X : (array_like Shape (m,n)) variable such as house size \n",
|
||||
" y : (array_like Shape (m,)) actual value \n",
|
||||
" w : (array_like Shape (2,)) parameters of the model \n",
|
||||
" verbose : (Boolean) If true, print out intermediate value f_w\n",
|
||||
" Returns\n",
|
||||
" cost: (scalar) The gradient of the cost w.r.t. the parameters w. \n",
|
||||
" Note that dw has the same dimensions as w.\n",
|
||||
" \"\"\"\n",
|
||||
" \n",
|
||||
" m,n = X.shape\n",
|
||||
"\n",
|
||||
" # calculate f_w for all examples.\n",
|
||||
" f_w = X @ w # @ is np.matmul, this the same as np.matmul(X, w)\n",
|
||||
" if verbose: print(\"f_w:\")\n",
|
||||
" if verbose: print(f_w)\n",
|
||||
" \n",
|
||||
" # calculate cost\n",
|
||||
" total_cost = (1/(2*m)) * np.sum((f_w-y)**2)\n",
|
||||
" \n",
|
||||
" return total_cost\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Function to calculate the cost\n",
|
||||
"def compute_cost(X, y, w, verbose=False):\n",
|
||||
" \"\"\"\n",
|
||||
" Computes the gradient for linear regression \n",
|
||||
" Args:\n",
|
||||
" X : (array_like Shape (m,n)) variable such as house size \n",
|
||||
" y : (array_like Shape (m,)) actual value \n",
|
||||
" w : (array_like Shape (2,)) parameters of the model \n",
|
||||
" verbose : (Boolean) If true, print out intermediate value f_w\n",
|
||||
" Returns\n",
|
||||
" cost: (scalar) The gradient of the cost w.r.t. the parameters w. \n",
|
||||
" Note that dw has the same dimensions as w.\n",
|
||||
" \"\"\"\n",
|
||||
" \n",
|
||||
" m,n = X.shape\n",
|
||||
" ### START CODE HERE ### \n",
|
||||
"\n",
|
||||
" ### END CODE HERE ### \n",
|
||||
" \n",
|
||||
" return total_cost"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Compute and display cost using our pre-chosen optimal parameters. \n",
|
||||
"# cost should be nearly zero\n",
|
||||
"\n",
|
||||
"cost = compute_cost(X_train, y_train, w_init, verbose = True)\n",
|
||||
"print(f'Cost at optimal w : {cost:.3f}')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <b>**Expected Output**:</b>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"f_w:\n",
|
||||
"[[459.99999762]\n",
|
||||
" [231.99999837]\n",
|
||||
" [177.99999899]]\n",
|
||||
"Cost at optimal w : 0.000\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Matrix/Vector Operation Review\n",
|
||||
"Here is a small example to show you how to apply element-wise operations on numpy arrays."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# make a matrix A with 2 rows and 2 columns\n",
|
||||
"tmp_A = np.array([[1,1],[1,1]])\n",
|
||||
"print(f\"matrix A has {tmp_A.shape[0]} rows and {tmp_A.shape[1]} columns\")\n",
|
||||
"print(tmp_A)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# make a column vector B with 2 rows and 1 column\n",
|
||||
"tmp_b = np.array([[2],[1]])\n",
|
||||
"print(f\"Vector b has {tmp_b.shape[0]} rows and {tmp_b.shape[1]} column\")\n",
|
||||
"print(tmp_b)\n",
|
||||
"print()\n",
|
||||
"# perform matrix multiplication A x b (2,2)(2,1)\n",
|
||||
"tmp_A_times_b = np.dot(tmp_A,tmp_b)\n",
|
||||
"print(\"Multiply A times b\")\n",
|
||||
"print(tmp_A_times_b)\n",
|
||||
"print(f\"The product has {tmp_A_times_b.shape[0]} rows and {tmp_A_times_b.shape[1]} columns\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# make a matrix A with 2 rows and 2 columns\n",
|
||||
"tmp_A = np.array([[1,1],[1,1]])\n",
|
||||
"print(f\"matrix A has {tmp_A.shape[0]} rows and {tmp_A.shape[1]} columns\")\n",
|
||||
"print(tmp_A)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# make a column vector B with 2 rows and 1 column\n",
|
||||
"tmp_b = np.array([[2],[1]])\n",
|
||||
"print(f\"Vector b has {tmp_b.shape[0]} rows and {tmp_b.shape[1]} column\")\n",
|
||||
"print(tmp_b)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# Try to perform matrix multiplication b x A, (2,1)(2,2)\n",
|
||||
"try:\n",
|
||||
" tmp_b_times_A = np.dot(tmp_b,tmp_A)\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"The error message you'll see is:\")\n",
|
||||
" print(e)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The message says that it's checking:\n",
|
||||
" - The number of columns of the left matrix `b`, or `dim 1` is 1.\n",
|
||||
" - The number of rows on the right matrix `dim 0`, is 2.\n",
|
||||
" - 1 does not equal 2\n",
|
||||
" - So the two matrices cannot be multiplied together."
|
||||
]
|
||||
},
|
||||
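{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As a side note (a sketch, not part of the lab's exercises): the product can be formed in that order if `b` is transposed first, so that the inner dimensions agree, `(1,2)(2,2)` giving a `(1,2)` result.\n",
|
||||
"```\n",
|
||||
"tmp_bT_times_A = np.dot(tmp_b.T, tmp_A)   # (1,2)(2,2) -> (1,2), here [[3 3]]\n",
|
||||
"print(tmp_bT_times_A)\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||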
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Create two sample column vectors\n",
|
||||
"tmp_c = np.array([[1],[2],[3]])\n",
|
||||
"print(\"Create a column vector c with 3 rows and 1 column\")\n",
|
||||
"print(tmp_c)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"tmp_d = np.array([[2],[2],[2]])\n",
|
||||
"print(\"Create a column vector c with 3 rows and 1 column\")\n",
|
||||
"print(tmp_d)\n",
|
||||
"print()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can apply `+, -, *, /` operators on two vectors of the same length."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Take the element-wise multiplication of two vectors\n",
|
||||
"tmp_mult = tmp_c * tmp_d\n",
|
||||
"print(\"Take the element-wise multiplication between vectors c and d\")\n",
|
||||
"print(tmp_mult)\n",
|
||||
"print()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can use `numpy.square` to apply the element-wise square of a vector\n",
|
||||
"- Note, `**2` will also work."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Take the element-wise square of vector c\n",
|
||||
"tmp_square = np.square(tmp_c)\n",
|
||||
"tmp_square_option_2 = tmp_c**2\n",
|
||||
"print(\"Take the element-wise square of vector c\")\n",
|
||||
"print(tmp_square)\n",
|
||||
"print()\n",
|
||||
"print(\"Another way to get the element-wise square of vector c\")\n",
|
||||
"print(tmp_square_option_2)\n",
|
||||
"print()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can use `numpy.sum` to add up all the elements of a vector (or matrix)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Take the sum of all elements in vector d\n",
|
||||
"tmp_sum = np.sum(tmp_d)\n",
|
||||
"print(\"Vector d\")\n",
|
||||
"print(tmp_d)\n",
|
||||
"print()\n",
|
||||
"print(\"Take the sum of all the elements in vector d\")\n",
|
||||
"print(tmp_sum)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,222 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Linear Regression using Scikit-Learn"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"There is an open-source, commercially usable machine learning toolkit called [scikit-learn](https://scikit-learn.org/stable/index.html). This toolkit contains implementations of many of the algorithms that you will work with in this course.\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Goals\n",
|
||||
"In this lab you will:\n",
|
||||
"- Utilize scikit-learn to implement linear regression using Gradient Descent"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"You will utilize functions from scikit-learn as well as matplotlib and NumPy. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"np.set_printoptions(precision=2)\n",
|
||||
"from sklearn.linear_model import LinearRegression, SGDRegressor\n",
|
||||
"from sklearn.preprocessing import StandardScaler\n",
|
||||
"from lab_utils_multi import load_house_data\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"dlblue = '#0096ff'; dlorange = '#FF9300'; dldarkred='#C00000'; dlmagenta='#FF40FF'; dlpurple='#7030A0'; \n",
|
||||
"plt.style.use('./deeplearning.mplstyle')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Gradient Descent\n",
|
||||
"Scikit-learn has a gradient descent regression model [sklearn.linear_model.SGDRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html#examples-using-sklearn-linear-model-sgdregressor). Like your previous implementation of gradient descent, this model performs best with normalized inputs. [sklearn.preprocessing.StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler) will perform z-score normalization as in a previous lab. Here it is referred to as 'standard score'."
|
||||
]
|
||||
},
|
||||
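{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As a quick reminder, z-score normalization replaces each value in a feature column with `(x - mean)/std`, computed per column. Below is a minimal sketch (the small array is made up purely for illustration) comparing a manual calculation with `StandardScaler`:\n",
|
||||
"```\n",
|
||||
"tmp = np.array([[1.0], [2.0], [3.0]])\n",
|
||||
"tmp_manual = (tmp - tmp.mean(axis=0)) / tmp.std(axis=0)\n",
|
||||
"tmp_scaled = StandardScaler().fit_transform(tmp)\n",
|
||||
"print(tmp_manual.ravel())   # roughly [-1.22  0.    1.22]\n",
|
||||
"print(tmp_scaled.ravel())   # should match the manual result\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||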
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load the data set"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X_train, y_train = load_house_data()\n",
|
||||
"X_features = ['size(sqft)','bedrooms','floors','age']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Scale/normalize the training data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"scaler = StandardScaler()\n",
|
||||
"X_norm = scaler.fit_transform(X_train)\n",
|
||||
"print(f\"Peak to Peak range by column in Raw X:{np.ptp(X_train,axis=0)}\") \n",
|
||||
"print(f\"Peak to Peak range by column in Normalized X:{np.ptp(X_norm,axis=0)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create and fit the regression model"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"sgdr = SGDRegressor(max_iter=1000)\n",
|
||||
"sgdr.fit(X_norm, y_train)\n",
|
||||
"print(sgdr)\n",
|
||||
"print(f\"number of iterations completed: {sgdr.n_iter_}, number of weight updates: {sgdr.t_}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### View parameters\n",
|
||||
"Note, the parameters are associated with the *normalized* input data. The fit parameters are very close to those found in the previous lab with this data."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"b_norm = sgdr.intercept_\n",
|
||||
"w_norm = sgdr.coef_\n",
|
||||
"print(f\"model parameters: w: {w_norm}, b:{b_norm}\")\n",
|
||||
"print(f\"model parameters from previous lab: w: [110.56 -21.27 -32.71 -37.97], b: 363.16\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Make predictions\n",
|
||||
"Predict the targets of the training data. Use both the `predict` routine and compute using $w$ and $b$."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# make a prediction using sgdr.predict()\n",
|
||||
"y_pred_sgd = sgdr.predict(X_norm)\n",
|
||||
"# make a prediction using w,b. \n",
|
||||
"y_pred = np.dot(X_norm, w_norm) + b_norm \n",
|
||||
"print(f\"prediction using np.dot() and sgdr.predict match: {(y_pred == y_pred_sgd).all()}\")\n",
|
||||
"\n",
|
||||
"print(f\"Prediction on training set:\\n{y_pred[:4]}\" )\n",
|
||||
"print(f\"Target values \\n{y_train[:4]}\")"
|
||||
]
|
||||
},
|
||||
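{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Since the model was fit on normalized features, any new example must be transformed with the *same* scaler before calling `predict`. A minimal sketch with a made-up house (the feature values are illustrative only):\n",
|
||||
"```\n",
|
||||
"x_house = np.array([[1200, 3, 1, 40]])     # size(sqft), bedrooms, floors, age -- made-up example\n",
|
||||
"x_house_norm = scaler.transform(x_house)   # reuse the scaler that was fit on the training data\n",
|
||||
"print(f\"predicted price (1000s of dollars): {sgdr.predict(x_house_norm)[0]:0.2f}\")\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||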
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Plot Results\n",
|
||||
"Let's plot the predictions versus the target values."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# plot predictions and targets vs original features \n",
|
||||
"fig,ax=plt.subplots(1,4,figsize=(12,3),sharey=True)\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" ax[i].scatter(X_train[:,i],y_train, label = 'target')\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
" ax[i].scatter(X_train[:,i],y_pred,color=dlorange, label = 'predict')\n",
|
||||
"ax[0].set_ylabel(\"Price\"); ax[0].legend();\n",
|
||||
"fig.suptitle(\"target versus prediction using z-score normalized model\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Congratulations!\n",
|
||||
"In this lab you:\n",
|
||||
"- utilized an open-source machine learning toolkit, scikit-learn\n",
|
||||
"- implemented linear regression using gradient descent and feature normalization from that toolkit"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.10"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,945 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Ungraded Lab - Multiple Variable Gradient Descent\n",
|
||||
"\n",
|
||||
"In this ungraded lab, you will extend gradient descent to support multiple features. You will utilize mean normalization and alpha tuning to improve performance. You will also utilize a popular python numeric library, NumPy to efficiently store and manipulate data. For detailed descriptions and examples of routines used, see [Numpy Documentation](https://numpy.org/doc/stable/reference/)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Outline\n",
|
||||
"\n",
|
||||
"- [Exercise 01- Compute Gradient](#first)\n",
|
||||
"- [Exercise 02- Gradient Descent](#second)\n",
|
||||
"- [Exercise 03- Mean Normalization](#third)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"import copy\n",
|
||||
"import math"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 2.0 Problem Statement\n",
|
||||
"\n",
|
||||
"As in the previous two labs, you will use the motivating example of housing price prediction. The training dataset contains three examples with 4 features (size,bedrooms,floors and age) shown in the table below.\n",
|
||||
"\n",
|
||||
"We would like to build a linear regression model using these values so we can then predict the price for other houses - say, a house with 1200 feet$^2$, 3 bedrooms, 1 floor, 40 years old. \n",
|
||||
"\n",
|
||||
"### 2.1 Dataset: \n",
|
||||
"| Size (feet$^2$) | Number of Bedrooms | Number of floors | Age of Home | Price (1000s dollars) | \n",
|
||||
"| ----------------| ------------------- |----------------- |--------------|-------------- | \n",
|
||||
"| 2104 | 5 | 1 | 45 | 460 | \n",
|
||||
"| 1416 | 3 | 2 | 40 | 232 | \n",
|
||||
"| 852 | 2 | 1 | 35 | 178 | \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"The lectures and equations describe $\\mathbf{X}$, $\\mathbf{y}$, $\\mathbf{w}$. In our code these are represented by variables:\n",
|
||||
"- `X_orig` represents input variables, also called input features. In previous labs, there was just one feature, now there are four. `X_train` is the data set extended with a column of ones.\n",
|
||||
"- `y_train` represents output variables, also known as target variables (in this case - Price (1000s of dollars)). \n",
|
||||
"- `w_init` represents our parameters. \n",
|
||||
"- `dw` represents our gradient. A naming convention we will use in code when referring to gradients is to infer the dJ(w) and name variables for the parameter. For example, $\\frac{\\partial J(\\mathbf{w})}{\\partial w_0}$ might be `dw0`. `dw` is the gradient vector.\n",
|
||||
"- `tmp_` is prepended to some global variable names to prevent naming conflicts.\n",
|
||||
"\n",
|
||||
"We will pick up where we left off in the last notebook. Run the following to initialize our variables."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Load data set\n",
|
||||
"X_orig = np.array([[2104,5,1,45], [1416,3,2,40], [852,2,1,35]])\n",
|
||||
"y_train = np.array([460,232,178]).reshape(-1,1) #reshape creates (m,1) matrix\n",
|
||||
"\n",
|
||||
"#extend X_orig with column of ones\n",
|
||||
"tmp_ones = np.ones((3,1), dtype=np.int64) #dtype just added to keep examples neat.. not required\n",
|
||||
"X_train = np.concatenate([tmp_ones, X_orig], axis=1)\n",
|
||||
"\n",
|
||||
"# initialize parameters to near optimal value for development\n",
|
||||
"w_init = np.array([ 785.1811367994083, 0.39133535, 18.75376741, \n",
|
||||
" -53.36032453, -26.42131618]).reshape(-1,1)\n",
|
||||
"print(f\"X shape: {X_train.shape}, w_shape: {w_init.shape}, y_shape: {y_train.shape}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Gradient Descent Review\n",
|
||||
"In lecture, gradient descent was described as:\n",
|
||||
"\n",
|
||||
"$$\\begin{align*}& \\text{repeat until convergence:} \\; \\lbrace \\newline \\; & w_j := w_j - \\alpha \\frac{\\partial J(\\mathbf{w})}{\\partial w_j} \\tag{1} \\; & \\text{for j := 0..n}\\newline & \\rbrace\\end{align*}$$\n",
|
||||
"where, parameters $w_j$ are all updated simultaniously and where \n",
|
||||
"$$\n",
|
||||
"\\frac{\\partial J(\\mathbf{w})}{\\partial w_j} := \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w}}(\\mathbf{x}^{(i)}) - \\mathbf{y}^{(i)})x_{j}^{(i)} \\tag{2}\n",
|
||||
"$$\n",
|
||||
"where \n",
|
||||
"$$ f_{\\mathbf{w}}(\\mathbf{x}) = w_0 + w_1x_1 + ... + w_nx_n \\tag{3}$$"
|
||||
]
|
||||
},
|
||||
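{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As a concrete illustration of a single pass through equation (1) (the numbers below are made up purely for illustration), every $w_j$ is updated at the same time using the current gradient:\n",
|
||||
"```\n",
|
||||
"tmp_alpha = 0.1\n",
|
||||
"tmp_w  = np.array([1.0, 2.0]).reshape(-1,1)\n",
|
||||
"tmp_dw = np.array([0.5, -0.5]).reshape(-1,1)   # pretend gradient, for illustration only\n",
|
||||
"tmp_w  = tmp_w - tmp_alpha * tmp_dw            # simultaneous update -> [[0.95], [2.05]]\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||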
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name='first'></a>\n",
|
||||
"## Exercise 1\n",
|
||||
"We will implement a batch gradient descent algorithm for multiple variables. We'll need three functions. \n",
|
||||
"- compute_gradient implementing equation (2) above\n",
|
||||
" - **we will do two versions** of this, one using loops, the other using linear algebra\n",
|
||||
"- compute_cost.\n",
|
||||
"- gradient_descent, utilizing compute_gradient and compute_cost, runs the iterative algorithm to find the parameters with the lowest cost."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### compute_gradient using looping\n",
|
||||
"Please extend the algorithm developed in Lab3 to support multiple variables and use NumPy. Implement equation (2) above for all $w_j$."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Hints</b></font>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"def compute_gradient(X, y, w): \n",
|
||||
" \"\"\"\n",
|
||||
" Computes the gradient for linear regression \n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X : (array_like Shape (m,)) variable such as house size \n",
|
||||
" y : (array_like Shape (m,)) actual value \n",
|
||||
" w : (array_like Shape (2,)) Initial values of parameters of the model \n",
|
||||
" Returns\n",
|
||||
" dw: (array_like Shape (2,)) The gradient of the cost w.r.t. the parameters w. \n",
|
||||
" Note that dw has the same dimensions as w.\n",
|
||||
" \"\"\"\n",
|
||||
" m,n = X.shape\n",
|
||||
" dw = np.zeros((n,1))\n",
|
||||
" ### START CODE HERE ### \n",
|
||||
" for j in range(n):\n",
|
||||
" for i in range(m):\n",
|
||||
" f_w = 0\n",
|
||||
" for k in range(n):\n",
|
||||
" f_w = f_w + w[k]*X[i][k]\n",
|
||||
" dw[j] = dw[j] + (f_w-y[i])*X[i][j] \n",
|
||||
" dw[j] = dw[j]/m\n",
|
||||
" ### END CODE HERE ### \n",
|
||||
" \n",
|
||||
" return dw\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def compute_gradient(X, y, w): \n",
|
||||
" \"\"\"\n",
|
||||
" Computes the gradient for linear regression \n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X : (array_like Shape (m,)) variable such as house size \n",
|
||||
" y : (array_like Shape (m,)) actual value \n",
|
||||
" w : (array_like Shape (2,)) Initial values of parameters of the model \n",
|
||||
" Returns\n",
|
||||
" dw: (array_like Shape (2,)) The gradient of the cost w.r.t. the parameters w. \n",
|
||||
" Note that dw has the same dimensions as w.\n",
|
||||
" \"\"\"\n",
|
||||
" m,n = X.shape\n",
|
||||
" dw = np.zeros((n,1))\n",
|
||||
" ### START CODE HERE ### \n",
|
||||
"\n",
|
||||
" ### END CODE HERE ### \n",
|
||||
" \n",
|
||||
" return dw"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Compute and display gradient \n",
|
||||
"initial_w = w_init\n",
|
||||
"grad = compute_gradient(X_train, y_train, initial_w)\n",
|
||||
"print('Gradient at initial w :\\n', grad)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <b>**Expected Output**:</b>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"Gradient at initial w :\n",
|
||||
" [[-1.67392519e-06]\n",
|
||||
" [-2.72623590e-03]\n",
|
||||
" [-6.27197293e-06]\n",
|
||||
" [-2.21745582e-06]\n",
|
||||
" [-6.92403412e-05]]\n",
|
||||
"\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Compute Gradient using Matrices\n",
|
||||
"In this section, we will implement the gradient calculation using matrices and vectors. _If you are familiar with linear algebra, you may want to skip the explanation and try it yourself first_.\n",
|
||||
"When dealing with multi-step matrix calculations, its helpful to do 'dimensional analysis'. The diagram below details the operations involved in calculating the gradient and the dimensions of the matrices involved."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
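{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If the diagram above does not render, the dimensional analysis can be summarized as: $\\mathbf{X}$ is $(m,n)$ and $\\mathbf{w}$ is $(n,1)$, so the prediction and the error $\\mathbf{e}$ are $(m,1)$, and $\\mathbf{X}^T\\mathbf{e}$ is $(n,1)$, the same shape as $\\mathbf{w}$. A short sketch to verify the shapes with the variables defined above:\n",
|
||||
"```\n",
|
||||
"print(X_train.shape, w_init.shape)                        # (3, 5) (5, 1)\n",
|
||||
"print((X_train @ w_init).shape)                           # prediction: (3, 1)\n",
|
||||
"print((X_train.T @ (X_train @ w_init - y_train)).shape)   # X^T e: (5, 1), same shape as w\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||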
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Prediction: $\\mathbf{f}_{\\mathbf{w}}(\\mathbf{X})$\n",
|
||||
"- This is the model's prediction for _all examples_. As in previous labs, this calculated : $\\mathbf{f}_{\\mathbf{w}}(\\mathbf{X}) = \\mathbf{X}\\mathbf{w}$"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tmp_f_w = X_train @ w_init\n",
|
||||
"print(f\"The model prediction for our training set is:\")\n",
|
||||
"print(tmp_f_w)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Error, e: $\\mathbf{f}_{\\mathbf{w}}(\\mathbf{X}) - \\mathbf{y}$\n",
|
||||
" - This is the difference between the model prediction and the actual value of y for all training examples.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tmp_e = tmp_f_w - y_train\n",
|
||||
"print(\"Error\")\n",
|
||||
"print(tmp_e)\n",
|
||||
"print(f\"Error shape: {tmp_e.shape}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Gradient: $\\nabla_{\\mathbf{w}}\\mathbf{J}$\n",
|
||||
"- $\\nabla_{\\mathbf{w}}\\mathbf{J}$ is the gradient of $\\mathbf{J}$ with respect to $w$ in matrix form. The upside down triagle $\\nabla$ is the symbol for graident. More simply, the result of equation 4 above for all parameters $\\mathbf{w}$\n",
|
||||
"- $\\nabla_{\\mathbf{w}}\\mathbf{J} := \\frac{1}{m}(\\mathbf{X}^T \\mathbf{e} )$\n",
|
||||
"- Each element of this vector describes how the cost $\\mathbf{J}(\\mathbf{w})$ changes with respect to one parameter, $w_j$. For example, first element describes how the cost change relative to $w_0$. We will use this to determine if we should increase or decrease the parameter to decrease the cost."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tmp_m,_ = X_train.shape\n",
|
||||
"tmp_dw = (1/tmp_m) * (X_train.T @ tmp_e) \n",
|
||||
"print(\"gradient\")\n",
|
||||
"print(tmp_dw)\n",
|
||||
"print(f\"gradient shape: {tmp_dw.shape}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Utilize the equations above to implement `compute_gradient_m`, the matrix version."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Hints</b></font>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"def compute_gradient_m(X, y, w): \n",
|
||||
" \"\"\"\n",
|
||||
" Computes the gradient for linear regression \n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X : (array_like Shape (m,)) variable such as house size \n",
|
||||
" y : (array_like Shape (m,)) actual value \n",
|
||||
" w : (array_like Shape (2,)) Initial values of parameters of the model \n",
|
||||
" Returns\n",
|
||||
" dw: (array_like Shape (2,)) The gradient of the cost w.r.t. the parameters w. \n",
|
||||
" Note that dw has the same dimensions as w.\n",
|
||||
" \"\"\"\n",
|
||||
" m,n = X.shape\n",
|
||||
" ### START CODE HERE ### \n",
|
||||
" f_w = X @ w\n",
|
||||
" e = f_w - y\n",
|
||||
" dw = (1/m) * (X.T @ e)\n",
|
||||
" ### END CODE HERE ### \n",
|
||||
" \n",
|
||||
" return dw\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def compute_gradient_m(X, y, w): \n",
|
||||
" \"\"\"\n",
|
||||
" Computes the gradient for linear regression \n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X : (array_like Shape (m,)) variable such as house size \n",
|
||||
" y : (array_like Shape (m,)) actual value \n",
|
||||
" w : (array_like Shape (2,)) Initial values of parameters of the model \n",
|
||||
" Returns\n",
|
||||
" dw: (array_like Shape (2,)) The gradient of the cost w.r.t. the parameters w. \n",
|
||||
" Note that dw has the same dimensions as w.\n",
|
||||
" \"\"\"\n",
|
||||
" m,n = X.shape\n",
|
||||
" ### START CODE HERE ### \n",
|
||||
"\n",
|
||||
" ### END CODE HERE ### \n",
|
||||
" \n",
|
||||
" return dw"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Compute and display gradient USING compute_gradeint_m version\n",
|
||||
"initial_w = w_init\n",
|
||||
"grad = compute_gradient_m(X_train, y_train, initial_w)\n",
|
||||
"print('Gradient at initial w :\\n', grad)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <b>**Expected Output**:</b>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"Gradient at initial w :\n",
|
||||
" [[-1.67392519e-06]\n",
|
||||
" [-2.72623590e-03]\n",
|
||||
" [-6.27197293e-06]\n",
|
||||
" [-2.21745582e-06]\n",
|
||||
" [-6.92403412e-05]]\n",
|
||||
"\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Learning parameters using batch gradient descent \n",
|
||||
"\n",
|
||||
"You will now find the optimal parameters of a linear regression model by implementing batch gradient descent. You can use Lab3 as a guide. \n",
|
||||
"\n",
|
||||
"- A good way to verify that gradient descent is working correctly is to look\n",
|
||||
"at the value of $J(\\mathbf{w})$ and check that it is decreasing with each step. \n",
|
||||
"\n",
|
||||
"- Assuming you have implemented the gradient and computed the cost correctly, your value of $J(\\mathbf{w})$ should never increase, and should converge to a steady value by the end of the algorithm."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# provide routine to compute cost from Lab5\n",
|
||||
"def compute_cost(X, y, w, verbose=False):\n",
|
||||
" m,n = X.shape\n",
|
||||
" f_w = X @ w \n",
|
||||
" total_cost = (1/(2*m)) * np.sum((f_w-y)**2)\n",
|
||||
" return total_cost "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name='second'></a>\n",
|
||||
"## Exercise 2 Implement gradient_descent:\n",
|
||||
"- Looping `num_iters` number of times\n",
|
||||
" - calculate the gradient\n",
|
||||
" - update the parameters using equation (1) above\n",
|
||||
"return the updated parameters"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Hints</b></font>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"def gradient_descent(X, y, w_in, cost_function, gradient_function, alpha, num_iters): \n",
|
||||
" \"\"\"\n",
|
||||
" Performs batch gradient descent to learn theta. Updates theta by taking \n",
|
||||
" num_iters gradient steps with learning rate alpha\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X : (array_like Shape (m,)\n",
|
||||
" y : (array_like Shape (m,) )\n",
|
||||
" w_in : (array_like Shape (2,)) Initial values of parameters of the model\n",
|
||||
" cost_function: function to compute cost\n",
|
||||
" gradient_function: function to compute the gradient\n",
|
||||
" alpha : (float) Learning rate\n",
|
||||
" num_iters : (int) number of iterations to run gradient descent\n",
|
||||
" Returns\n",
|
||||
" w : (array_like Shape (2,)) Updated values of parameters of the model after\n",
|
||||
" running gradient descent\n",
|
||||
" \"\"\"\n",
|
||||
" \n",
|
||||
" # number of training examples\n",
|
||||
" m = len(X)\n",
|
||||
" \n",
|
||||
" # An array to store cost J and w's at each iteration primarily for graphing later\n",
|
||||
" J_history = []\n",
|
||||
" w_history = []\n",
|
||||
" w = copy.deepcopy(w_in) #avoid modifying global w within function\n",
|
||||
" \n",
|
||||
" for i in range(num_iters):\n",
|
||||
" \n",
|
||||
" # Calculate the gradient and update the parameters\n",
|
||||
" gradient = gradient_function(X, y, w)\n",
|
||||
"\n",
|
||||
" # Update Parameters \n",
|
||||
" w = w - alpha * gradient\n",
|
||||
" \n",
|
||||
" # Save cost J at each iteration\n",
|
||||
" if i<100000: # prevent resource exhaustion \n",
|
||||
" J_history.append( compute_cost(X, y, w))\n",
|
||||
"\n",
|
||||
" # Print cost every at intervals 10 times or as many iterations if < 10\n",
|
||||
" if i% math.ceil(num_iters/10) == 0:\n",
|
||||
" w_history.append(w)\n",
|
||||
" print(f\"Iteration {i:4}: Cost {J_history[-1]:8.2f} \")\n",
|
||||
" \n",
|
||||
" return w, J_history, w_history #return w and J,w history for graphing\n",
|
||||
" ```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def gradient_descent(X, y, w_in, cost_function, gradient_function, alpha, num_iters): \n",
|
||||
" \"\"\"\n",
|
||||
" Performs batch gradient descent to learn theta. Updates theta by taking \n",
|
||||
" num_iters gradient steps with learning rate alpha\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X : (array_like Shape (m,)\n",
|
||||
" y : (array_like Shape (m,) )\n",
|
||||
" w_in : (array_like Shape (2,)) Initial values of parameters of the model\n",
|
||||
" cost_function: function to compute cost\n",
|
||||
" gradient_function: function to compute the gradient\n",
|
||||
" alpha : (float) Learning rate\n",
|
||||
" num_iters : (int) number of iterations to run gradient descent\n",
|
||||
" Returns\n",
|
||||
" w : (array_like Shape (2,)) Updated values of parameters of the model after\n",
|
||||
" running gradient descent\n",
|
||||
" \"\"\"\n",
|
||||
" \n",
|
||||
" # number of training examples\n",
|
||||
" m = len(X)\n",
|
||||
" \n",
|
||||
" # An array to store cost J and w's at each iteration primarily for graphing later\n",
|
||||
" J_history = []\n",
|
||||
" w_history = []\n",
|
||||
" w = copy.deepcopy(w_in) #avoid modifying global w within function\n",
|
||||
" \n",
|
||||
" for i in range(num_iters):\n",
|
||||
" \n",
|
||||
" ### START CODE HERE ### \n",
|
||||
"\n",
|
||||
" # Calculate the gradient and update the parameters\n",
|
||||
"\n",
|
||||
"\n",
|
||||
" # Update Parameters \n",
|
||||
"\n",
|
||||
"\n",
|
||||
" ### END CODE HERE ### \n",
|
||||
" \n",
|
||||
" # Save cost J at each iteration\n",
|
||||
" if i<100000: # prevent resource exhaustion \n",
|
||||
" J_history.append( compute_cost(X, y, w))\n",
|
||||
"\n",
|
||||
" # Print cost every at intervals 10 times or as many iterations if < 10\n",
|
||||
" if i% math.ceil(num_iters/10) == 0:\n",
|
||||
" w_history.append(w)\n",
|
||||
" print(f\"Iteration {i:4}: Cost {J_history[-1]:8.2f} \")\n",
|
||||
" \n",
|
||||
" return w, J_history, w_history #return w and J,w history for graphing"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In the next cell we will test your implementation. Be sure to select your preferred compute_gradient function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# initialize parameters\n",
|
||||
"initial_w = np.zeros_like(w_init) \n",
|
||||
"# some gradient descent settings\n",
|
||||
"iterations = 1000\n",
|
||||
"alpha = 5.0e-7\n",
|
||||
"# run gradient descent - CHOOSE WHICH COMPUTE_GRADIENT TO RUN\n",
|
||||
"w_final, J_hist, w_hist = gradient_descent(X_train ,y_train, initial_w, compute_cost, \n",
|
||||
" compute_gradient, alpha, iterations)\n",
|
||||
"#w_final, J_hist, w_hist = gradient_descent(X_train ,y_train, initial_w, compute_cost, \n",
|
||||
"# compute_gradient_m, alpha, iterations)\n",
|
||||
"print(f\"w found by gradient descent: \")\n",
|
||||
"print(w_final)\n",
|
||||
"print(f\"predictions on training set\")\n",
|
||||
"print(X_train @ w_final)\n",
|
||||
"print(f\"actual values y_train \")\n",
|
||||
"print(y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <b>**Expected Output**:</b>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
" ```\n",
|
||||
"Iteration 0: Cost 2529.46 \n",
|
||||
"Iteration 100: Cost 695.99 \n",
|
||||
"Iteration 200: Cost 694.92 \n",
|
||||
"Iteration 300: Cost 693.86 \n",
|
||||
"Iteration 400: Cost 692.81 \n",
|
||||
"Iteration 500: Cost 691.77 \n",
|
||||
"Iteration 600: Cost 690.73 \n",
|
||||
"Iteration 700: Cost 689.71 \n",
|
||||
"Iteration 800: Cost 688.70 \n",
|
||||
"Iteration 900: Cost 687.69 \n",
|
||||
"w found by gradient descent: \n",
|
||||
"[[-0.00223541]\n",
|
||||
" [ 0.20396569]\n",
|
||||
" [ 0.00374919]\n",
|
||||
" [-0.0112487 ]\n",
|
||||
" [-0.0658614 ]]\n",
|
||||
"predictions on training set\n",
|
||||
"[[426.18530497]\n",
|
||||
" [286.16747201]\n",
|
||||
" [171.46763087]]\n",
|
||||
"actual values y_train \n",
|
||||
"[[460]\n",
|
||||
" [232]\n",
|
||||
" [178]]\n",
|
||||
"\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# plot cost vs iteration \n",
|
||||
"plt.plot(J_hist)\n",
|
||||
"plt.title(\"Cost vs iteration\")\n",
|
||||
"plt.ylabel('Cost')\n",
|
||||
"plt.xlabel('iteration step')\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"*These results are not inspiring*! As in Lab 3, we have run into a situation where the mismatch in scaling between our features makes it difficult to converge. The next section will help."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Feature Scaling or Mean Normalization\n",
|
||||
"\n",
|
||||
"We can speed up gradient descent by having each of our input values in roughly the same range. This is because the speed $\\mathbf{w}$ changes depends of the range of the input features. In our example, we have the sqft feature which is 3 orders of magnitude larger than the number of bedroom features. This doesn't allow a single alpha value to be set appropriately for all features. The way to prevent this is to modify the ranges of our input variables so that they are all roughly the same. Ideally around: \n",
|
||||
"$$ -1 <= x_{(i)} <= 1 \\;\\; or \\;\\; -0.5 <= x_{(i)} <= 0.5 $$\n",
|
||||
"\n",
|
||||
"Two techniques to help with this are feature scaling and mean normalization. \n",
|
||||
"**Feature scaling** involves dividing the input values by the range (i.e. the maximum value minus the minimum value) of the input variable, resulting in a new range of just 1. \n",
|
||||
"**Mean normalization** involves subtracting the average value for an input variable from the values for that input variable resulting in a new average value for the input variable of just zero. \n",
|
||||
"In this lab we will implement _mean normalization_.\n",
|
||||
"\n",
|
||||
"To implement mean normalization, adjust your input values as shown in this formula:\n",
|
||||
"$$x_i := \\dfrac{x_i - \\mu_i}{\\sigma_i} \\tag{4}$$ \n",
|
||||
"where $i$ selects a feature or a column in our X matrix. $µ_i$ is the average of all the values for feature (i) and $\\sigma_i$ is the standard deviation over feature (i).\n",
|
||||
"\n",
|
||||
"_Usage details_: Once a model is trained with scaled features, all inputs to predictions using that model will also need to be scaled. The model targets, `y_train`, are not scaled. The resulting parameters `w` will naturally be different than those in the unscaled model. \n",
|
||||
"Clearly you don't want to scale the $x_0$ values which we have set to one. We will scale the original data and then add a column of ones.\n",
|
||||
"\n",
|
||||
"<a name='third'></a>\n",
|
||||
"### Exercise 3 Mean Normalization\n",
|
||||
"Write a function that will accept our training data and return a mean normalized version by implementing equation (4) above. You may want to use `np.mean()`, `np.std()`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Hints</b></font>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
" def mean_normalize_features(X):\n",
|
||||
" \"\"\"\n",
|
||||
" returns mean normalized X by column\n",
|
||||
" Args:\n",
|
||||
" X : (numpy array (m,n)) \n",
|
||||
" Returns\n",
|
||||
" X_norm: (numpy array (m,n)) input normalized by column\n",
|
||||
" \"\"\"\n",
|
||||
" mu = np.mean(X,axis=0) \n",
|
||||
" sigma = np.std(X,axis=0)\n",
|
||||
" X_norm = (X - mu)/sigma # fancy numpy broadcasting makes these look easy\n",
|
||||
" return(X_norm)\n",
|
||||
"\n",
|
||||
"#check our work\n",
|
||||
"#from sklearn.preprocessing import scale\n",
|
||||
"#scale(X_orig, axis=0, with_mean=True, with_std=True, copy=True)\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def mean_normalize_features(X):\n",
|
||||
" \"\"\"\n",
|
||||
" returns mean normalized X by column\n",
|
||||
" Args:\n",
|
||||
" X : (numpy array (m,n)) \n",
|
||||
" Returns\n",
|
||||
" X_norm: (numpy array (m,n)) input normalized by column\n",
|
||||
" \"\"\"\n",
|
||||
" #~ 3 lines if implemented using matrices\n",
|
||||
" ### START CODE HERE ### \n",
|
||||
"\n",
|
||||
" ### END CODE HERE ### \n",
|
||||
"\n",
|
||||
" return(X_norm)\n",
|
||||
" \n",
|
||||
"#check our work\n",
|
||||
"#from sklearn.preprocessing import scale\n",
|
||||
"#scale(X_orig, axis=0, with_mean=True, with_std=True, copy=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(\"Original data:\")\n",
|
||||
"print(X_orig)\n",
|
||||
"print(\"normalized data\")\n",
|
||||
"print(mean_normalize_features(X_orig))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <b>**Expected Output**:</b>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"Original data:\n",
|
||||
"[[2104 5 1 45]\n",
|
||||
" [1416 3 2 40]\n",
|
||||
" [ 852 2 1 35]]\n",
|
||||
"normalized data\n",
|
||||
"[[ 1.26311506 1.33630621 -0.70710678 1.22474487]\n",
|
||||
" [-0.08073519 -0.26726124 1.41421356 0. ]\n",
|
||||
" [-1.18237987 -1.06904497 -0.70710678 -1.22474487]]\n",
|
||||
"```\n",
|
||||
"Note the values in each normalized column."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's now normalize our original data and re-run our gradient descent algorithm."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# normalize the original features\n",
|
||||
"X_norm = mean_normalize_features(X_orig)\n",
|
||||
"\n",
|
||||
"# add the column of ones and create scaled training set\n",
|
||||
"tmp_ones = np.ones((3,1), dtype=np.int64) #dtype just added to keep examples neat.. not required\n",
|
||||
"X_train_s = np.concatenate([tmp_ones, X_norm], axis=1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note the **vastly larger value of alpha**. This will speed descent."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# initialize parameters\n",
|
||||
"initial_w = np.zeros_like(w_init) \n",
|
||||
"# some gradient descent settings\n",
|
||||
"iterations = 1000\n",
|
||||
"alpha = 1.0e-2\n",
|
||||
"# run gradient descent\n",
|
||||
"w_final, J_hist, w_hist = gradient_descent(X_train_s ,y_train, initial_w, \n",
|
||||
" compute_cost, compute_gradient_m, alpha, iterations)\n",
|
||||
"print(f\"w found by gradient descent: \")\n",
|
||||
"print(w_final)\n",
|
||||
"print(f\"predictions on training set\")\n",
|
||||
"print(X_train_s @ w_final)\n",
|
||||
"print(f\"actual values y_train \")\n",
|
||||
"print(y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <b>**Expected Output**:</b>\n",
|
||||
"</summary>\n",
|
||||
" \n",
|
||||
"```\n",
|
||||
"Iteration 0: Cost 48254.77 \n",
|
||||
"Iteration 100: Cost 5582.45 \n",
|
||||
"Iteration 200: Cost 745.80 \n",
|
||||
"Iteration 300: Cost 99.90 \n",
|
||||
"Iteration 400: Cost 13.38 \n",
|
||||
"Iteration 500: Cost 1.79 \n",
|
||||
"Iteration 600: Cost 0.24 \n",
|
||||
"Iteration 700: Cost 0.03 \n",
|
||||
"Iteration 800: Cost 0.00 \n",
|
||||
"Iteration 900: Cost 0.00 \n",
|
||||
"w found by gradient descent: \n",
|
||||
"[[289.98748034]\n",
|
||||
" [ 38.05168398]\n",
|
||||
" [ 41.54320558]\n",
|
||||
" [-30.98791712]\n",
|
||||
" [ 36.34190238]]\n",
|
||||
"predictions on training set\n",
|
||||
"[[459.98690403]\n",
|
||||
" [231.98894904]\n",
|
||||
" [177.98658794]]\n",
|
||||
"actual values y_train \n",
|
||||
"[[460]\n",
|
||||
" [232]\n",
|
||||
" [178]]\n",
|
||||
"\n",
|
||||
"```\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The scaled features get very accurate results much faster!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# plot cost vs iteration \n",
|
||||
"plt.plot(J_hist)\n",
|
||||
"plt.title(\"Cost vs iteration\")\n",
|
||||
"plt.ylabel('Cost')\n",
|
||||
"plt.xlabel('iteration step')\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Scale by the learning rate: $\\alpha$\n",
|
||||
"- $\\alpha$ is a positive number smaller than 1 that reduces the magnitude of the update to be smaller than the actual gradient.\n",
|
||||
"- Try varying the learning rate in the example above. Is there a value where it diverges rather than converging?\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tmp_alpha = 0.01\n",
|
||||
"print(f\"Learning rate alpha: {tmp_alpha}\")\n",
|
||||
"\n",
|
||||
"tmp_gradient = np.array([1,2]).reshape(-1,1)\n",
|
||||
"print(\"Gradient before scaling by the learning rate:\")\n",
|
||||
"print(tmp_gradient)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"gradient_scaled_by_learning_rate = tmp_alpha * tmp_gradient\n",
|
||||
"print(\"Gradient after scaling by the learning rate\")\n",
|
||||
"print(gradient_scaled_by_learning_rate)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"- Subtract the gradient: $-$\n",
|
||||
" - Recall that the gradient points in the direction that would INCREASE the cost. \n",
|
||||
" - Negative one multiplied by the gradient will point in the direction that REDUCES the cost.\n",
|
||||
" - So, to update the weight in the direction that reduces the cost, subtract the gradient."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"direction_of_update = -1 * gradient_scaled_by_learning_rate\n",
|
||||
"print(\"The direction to update the parameter vector\")\n",
|
||||
"print(direction_of_update)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,305 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Linear Regression using Scikit-Learn"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"There is an open-source, commercially usable machine learning toolkit called [scikit-learn](https://scikit-learn.org/stable/index.html). This toolkit contains implementations of many of the algorithms that you will work with in this course.\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Goals\n",
|
||||
"In this lab you will:\n",
|
||||
"- Utilize scikit-learn to implement linear regression using a close form solution based on the normal equation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"You will utilize functions from scikit-learn as well as matplotlib and NumPy. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"np.set_printoptions(precision=2)\n",
|
||||
"from sklearn.linear_model import LinearRegression, SGDRegressor\n",
|
||||
"from sklearn.preprocessing import StandardScaler\n",
|
||||
"from lab_utils_multi import load_house_data\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"dlblue = '#0096ff'; dlorange = '#FF9300'; dldarkred='#C00000'; dlmagenta='#FF40FF'; dlpurple='#7030A0'; \n",
|
||||
"plt.style.use('./deeplearning.mplstyle')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40291_2\"></a>\n",
|
||||
"# Linear Regression, closed-form solution\n",
|
||||
"Scikit-learn has the [linear regression model](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression) which implements a closed-form linear regression.\n",
|
||||
"\n",
|
||||
"Let's use the data from the early labs - a house with 1000 square feet sold for \\\\$300,000 and a house with 2000 square feet sold for \\\\$500,000.\n",
|
||||
"\n",
|
||||
"| Size (1000 sqft) | Price (1000s of dollars) |\n",
|
||||
"| ----------------| ------------------------ |\n",
|
||||
"| 1 | 300 |\n",
|
||||
"| 2 | 500 |\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load the data set"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X_train = np.array([1.0, 2.0]) #features\n",
|
||||
"y_train = np.array([300, 500]) #target value"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create and fit the model\n",
|
||||
"The code below performs regression using scikit-learn. \n",
|
||||
"The first step creates a regression object. \n",
|
||||
"The second step utilizes one of the methods associated with the object, `fit`. This performs regression, fitting the parameters to the input data. The toolkit expects a two-dimensional X matrix."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"LinearRegression()"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"linear_model = LinearRegression()\n",
|
||||
"#X must be a 2-D Matrix\n",
|
||||
"linear_model.fit(X_train.reshape(-1, 1), y_train) "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### View Parameters \n",
|
||||
"The $\\mathbf{w}$ and $\\mathbf{b}$ parameters are referred to as 'coefficients' and 'intercept' in scikit-learn."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"w = [200.], b = 100.00\n",
|
||||
"'manual' prediction: f_wb = wx+b : [240100.]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"b = linear_model.intercept_\n",
|
||||
"w = linear_model.coef_\n",
|
||||
"print(f\"w = {w:}, b = {b:0.2f}\")\n",
|
||||
"print(f\"'manual' prediction: f_wb = wx+b : {1200*w + b}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Make Predictions\n",
|
||||
"\n",
|
||||
"Calling the `predict` function generates predictions."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Prediction on training set: [300. 500.]\n",
|
||||
"Prediction for 1200 sqft house: $240100.00\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"y_pred = linear_model.predict(X_train.reshape(-1, 1))\n",
|
||||
"\n",
|
||||
"print(\"Prediction on training set:\", y_pred)\n",
|
||||
"\n",
|
||||
"X_test = np.array([[1200]])\n",
|
||||
"print(f\"Prediction for 1200 sqft house: ${linear_model.predict(X_test)[0]:0.2f}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Second Example\n",
|
||||
"The second example is from an earlier lab with multiple features. The final parameter values and predictions are very close to the results from the un-normalized 'long-run' from that lab. That un-normalized run took hours to produce results, while this is nearly instantaneous. The closed-form solution work well on smaller data sets such as these but can be computationally demanding on larger data sets. \n",
|
||||
">The closed-form solution does not require normalization."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# load the dataset\n",
|
||||
"X_train, y_train = load_house_data()\n",
|
||||
"X_features = ['size(sqft)','bedrooms','floors','age']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"LinearRegression()"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"linear_model = LinearRegression()\n",
|
||||
"linear_model.fit(X_train, y_train) "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"w = [ 0.27 -32.62 -67.25 -1.47], b = 220.42\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"b = linear_model.intercept_\n",
|
||||
"w = linear_model.coef_\n",
|
||||
"print(f\"w = {w:}, b = {b:0.2f}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Prediction on training set:\n",
|
||||
" [295.18 485.98 389.52 492.15]\n",
|
||||
"prediction using w,b:\n",
|
||||
" [295.18 485.98 389.52 492.15]\n",
|
||||
"Target values \n",
|
||||
" [300. 509.8 394. 540. ]\n",
|
||||
" predicted price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old = $318709.09\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(f\"Prediction on training set:\\n {linear_model.predict(X_train)[:4]}\" )\n",
|
||||
"print(f\"prediction using w,b:\\n {(X_train @ w + b)[:4]}\")\n",
|
||||
"print(f\"Target values \\n {y_train[:4]}\")\n",
|
||||
"\n",
|
||||
"x_house = np.array([1200, 3,1, 40]).reshape(-1,4)\n",
|
||||
"x_house_predict = linear_model.predict(x_house)[0]\n",
|
||||
"print(f\" predicted price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old = ${x_house_predict*1000:0.2f}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Congratulations!\n",
|
||||
"In this lab you:\n",
|
||||
"- utilized an open-source machine learning toolkit, scikit-learn\n",
|
||||
"- implemented linear regression using a close-form solution from that toolkit"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,282 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Ungraded Lab - Normal Equations \n",
|
||||
"\n",
|
||||
"In the lecture videos, you learned that the closed-form solution to linear regression is\n",
|
||||
"\n",
|
||||
"\\begin{equation*}\n",
|
||||
"w = (X^TX)^{-1}X^Ty \\tag{1}\n",
|
||||
"\\end{equation*}\n",
|
||||
"\n",
|
||||
"Using this formula does not require any feature scaling, and you will get an exact solution in one calculation: there is no “loop until convergence” like in gradient descent.\n",
|
||||
"\n",
|
||||
"This lab makes extensive use of linear algebra. It is not required for the course, but the solutions are provided and completing it may improve your familiarity with the subject. "
|
||||
]
|
||||
},
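{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustration of the formula (separate from the exercise below), here is a minimal sketch that applies it to a made-up two-point dataset where $y = 1 + 2x$ exactly. The toy values are chosen for illustration only and are not part of this lab's data.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# toy data: x values 1 and 2 with a leading column of ones, and y = 1 + 2x\n",
"X_tmp = np.array([[1, 1],\n",
"                  [1, 2]])\n",
"y_tmp = np.array([[3],\n",
"                  [5]])\n",
"\n",
"# closed-form solution w = (X^T X)^{-1} X^T y, using the pseudo-inverse\n",
"w_tmp = np.linalg.pinv(X_tmp.T @ X_tmp) @ X_tmp.T @ y_tmp\n",
"print(w_tmp)   # expected: [[1.], [2.]]\n",
"```"
]
},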
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Dataset\n",
|
||||
"\n",
|
||||
"You will again use the motivating example of housing price prediction as in the last few labs. The training dataset contains three examples with 4 features (size, bedrooms, floors and age) shown in the table below.\n",
|
||||
"\n",
|
||||
"| Size (feet$^2$) | Number of Bedrooms | Number of floors | Age of Home | Price (1000s dollars) | \n",
|
||||
"| ----------------| ------------------- |----------------- |--------------|-------------- | \n",
|
||||
"| 2104 | 5 | 1 | 45 | 460 | \n",
|
||||
"| 1416 | 3 | 2 | 40 | 232 | \n",
|
||||
"| 852 | 2 | 1 | 35 | 178 | \n",
|
||||
"\n",
|
||||
"We would like to build a linear regression model using these values so we can then predict the price for other houses - say, a house with 1200 feet$^2$, 3 bedrooms, 1 floor, 40 years old.\n",
|
||||
"\n",
|
||||
"Please run the following to load the data and extend X with a column of 1's."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Load data set\n",
|
||||
"X_orig = np.array([[2104,5,1,45], [1416,3,2,40], [852,2,1,35]])\n",
|
||||
"y_train = np.array([460,232,178]).reshape(-1,1) #reshape creates (m,1) matrix\n",
|
||||
"\n",
|
||||
"#extend X_orig with column of ones\n",
|
||||
"tmp_ones = np.ones((3,1), dtype=np.int64) #dtype just added to keep examples neat.. not required\n",
|
||||
"X_train = np.concatenate([tmp_ones, X_orig], axis=1)\n",
|
||||
"\n",
|
||||
"print(f\"X shape: {X_train.shape}, y_shape: {y_train.shape}\")\n",
|
||||
"print(X_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Exercise**\n",
|
||||
"\n",
|
||||
"Complete the code in the `normal_equation()` function below. Use the formula above to calculate $w$. Remember that while you don’t need to scale your features, we still need to add a column of 1’s to the original X matrix to have an intercept term $w_0$. \n",
|
||||
"\n",
|
||||
"**Hint**\n",
|
||||
"Look into `np.linalg.pinv()`, `np.transpose()` (also .T) and `np.dot()`. Be sure to use pinv or the pseudo inverse rather than inv."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
" <summary><font size=\"2\" color=\"darkgreen\"><b>Hints</b></font></summary>\n",
|
||||
" \n",
|
||||
" \n",
|
||||
" def normal_equation(X, y): \n",
|
||||
"\n",
|
||||
" Computes the closed-form solution to linear \n",
|
||||
" regression using the normal equations.\n",
|
||||
" \n",
|
||||
" Parameters\n",
|
||||
" ----------\n",
|
||||
" X : array_like\n",
|
||||
" Shape (m,n)\n",
|
||||
" \n",
|
||||
" y: array_like\n",
|
||||
" Shape (m,)\n",
|
||||
" \n",
|
||||
" Returns\n",
|
||||
" -------\n",
|
||||
" w : array_like\n",
|
||||
" Shape (n,)\n",
|
||||
" Parameters computed by normal equation\n",
|
||||
" \n",
|
||||
" \n",
|
||||
" #(≈ 1 line of code)\n",
|
||||
" # w = \n",
|
||||
" w = np.linalg.pinv(X.T @ X) @ X.T @ y\n",
|
||||
" \n",
|
||||
" return w \n",
|
||||
"\n",
|
||||
"</details>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def normal_equation(X, y): \n",
|
||||
" \"\"\"\n",
|
||||
" Computes the closed-form solution to linear \n",
|
||||
" regression using the normal equations.\n",
|
||||
" \n",
|
||||
" Parameters\n",
|
||||
" ----------\n",
|
||||
" X : array_like\n",
|
||||
" Shape (m,n)\n",
|
||||
" \n",
|
||||
" y: array_like\n",
|
||||
" Shape (m,)\n",
|
||||
" \n",
|
||||
" Returns\n",
|
||||
" -------\n",
|
||||
" w : array_like\n",
|
||||
" Shape (n,)\n",
|
||||
" Parameters computed by normal equation\n",
|
||||
" \"\"\"\n",
|
||||
" \n",
|
||||
" #(≈ 1 line of code)\n",
|
||||
" # w = \n",
|
||||
"\n",
|
||||
" \n",
|
||||
" return w"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"w_normal = normal_equation(X_train, y_train)\n",
|
||||
"print(\"w found by normal equation:\")\n",
|
||||
"print(w_normal)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <b>**Expected Output**:</b>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"w found by normal equation:\n",
|
||||
"[[ 1.240339 ]\n",
|
||||
" [ 0.15440335]\n",
|
||||
" [ 23.47118976]\n",
|
||||
" [-65.69139736]\n",
|
||||
" [ 1.82734354]]\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now let's see what the prediction is on our training data."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"y_pred = X_train @ w_normal\n",
|
||||
"print(\"Prediction using computed w:\")\n",
|
||||
"print(y_pred)\n",
|
||||
"print(\"Our Target values for y:\")\n",
|
||||
"print(y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <b>**Expected Output**:</b>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"Prediction using computed w:\n",
|
||||
"[[460.]\n",
|
||||
" [232.]\n",
|
||||
" [178.]]\n",
|
||||
"Our Target values for y:\n",
|
||||
"[[460]\n",
|
||||
" [232]\n",
|
||||
" [178]]\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Great! Now we have our parameters for our model. Let's try predicting the price of a house with 1200 feet^2, 3 bedrooms, 1 floor, 40 years old. We will manually add the 1's column."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X_test = np.array([1,1200,3,1,40])\n",
|
||||
"\n",
|
||||
"y_pred = X_test @ w_normal\n",
|
||||
"print(\"our predicted price is: %.2f thousand dollars\" % y_pred)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <b>**Expected Output**:</b>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"our predicted price is: 264.34 thousand dollars\n",
|
||||
"```\n",
|
||||
"_seems a bit pricy.._"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
137
work/.ipynb_checkpoints/C1_W2_Lab08_Sklearn-checkpoint.ipynb
Normal file
@ -0,0 +1,137 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Ungraded Lab: Linear Regression using Scikit-Learn"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now that you've implemented linear regression from scratch, let's see you can train a linear regression model using scikit-learn.\n",
|
||||
"\n",
|
||||
"## Dataset \n",
|
||||
"Let's start with the same dataset as the first labs."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"# X is the input variable (size in square feet)\n",
|
||||
"# y in the output variable (price in 1000s of dollars)\n",
|
||||
"X = np.array([1000, 2000])\n",
|
||||
"y = np.array([200, 400])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Fit the model\n",
|
||||
"\n",
|
||||
"The code below imports the [linear regression model](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression) from scikit-learn. You can fit this model on the training data by calling `fit` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.linear_model import LinearRegression\n",
|
||||
"\n",
|
||||
"linear_model = LinearRegression()\n",
|
||||
"# We must reshape X using .reshape(-1, 1) because our data has a single feature\n",
|
||||
"# If X has multiple features, you don't need to reshape\n",
|
||||
"linear_model.fit(X.reshape(-1, 1), y) "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Make Predictions\n",
|
||||
"\n",
|
||||
"You can see the predictions made by this model by calling the `predict` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"y_pred = linear_model.predict(X.reshape(-1,1))\n",
|
||||
"\n",
|
||||
"print(\"Prediction on training set:\", y_pred)\n",
|
||||
"\n",
|
||||
"X_test = np.array([[1200]])\n",
|
||||
"print(f\"Prediction for 1200 sqft house: {linear_model.predict(X_test)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Calculate score\n",
|
||||
"\n",
|
||||
"You can calculate how well this model is doing by calling the `score` function. Specifically, it, returns the coefficient of determination $R^2$ of the prediction. 1 is the best score."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(\"Accuracy on training set:\", linear_model.score(X.reshape(-1,1), y))"
|
||||
]
|
||||
},
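{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to see what `score` is computing, here is a small sketch of the usual $R^2$ formula applied to the variables above. This is meant as an illustration of the definition, not scikit-learn's internal code.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"ss_res = np.sum((y - y_pred) ** 2)        # residual sum of squares\n",
"ss_tot = np.sum((y - np.mean(y)) ** 2)    # total sum of squares\n",
"print('R^2 computed from the definition:', 1 - ss_res / ss_tot)\n",
"```\n",
"\n",
"For this two-point dataset the model fits the data exactly, so both this calculation and `score` return 1.0."
]
},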
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## View Parameters \n",
|
||||
"Our $\\mathbf{w}$ parameters from our earlier labs are referred to as 'intercept' and 'coefficients' in sklearn."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(f\"w = {linear_model.intercept_},{linear_model.coef_}\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,532 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "existing-laundry",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# UGL - Multiple Variable Model Representation\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "registered-finnish",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"\n",
|
||||
"%matplotlib inline"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "premium-reputation",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Take two data points - TODO: come up with problem statement/explanantion\n",
|
||||
"X_orig = np.array([[10,5], [20, 2]])\n",
|
||||
"y_orig = np.array([1,2])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "mature-salmon",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"2\n",
|
||||
"2\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# print the length of X_orig\n",
|
||||
"print(len(X_orig))\n",
|
||||
"\n",
|
||||
"# print the length of y_orig\n",
|
||||
"print(len(y_orig))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "future-merchant",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"(2, 2)\n",
|
||||
"(2,)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# print the shape of X_orig\n",
|
||||
"print(X_orig.shape)\n",
|
||||
"\n",
|
||||
"# print the shape of y_orig\n",
|
||||
"print(y_orig.shape)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "enormous-spotlight",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Hypothesis"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "wicked-bread",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Model prediction\n",
|
||||
"The model's prediction is also called the \"hypothesis\", $h_{w}(x)$. \n",
|
||||
"- The prediction is given by the linear model:\n",
|
||||
"\n",
|
||||
"$$ h_{w}(x) = w_0 + w_1x_1 \\tag{2}$$\n",
|
||||
"\n",
|
||||
"This the equation for a line, with an intercept $w_0$ and a slope $w_1$"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "stylish-report",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Vector notation\n",
|
||||
"\n",
|
||||
"For convenience of notation, you'll define $\\overrightarrow{x}$ as a vector containing two values:\n",
|
||||
"\n",
|
||||
"$$ \\vec{x} = \\begin{pmatrix}\n",
|
||||
" x_0 & x_1 \n",
|
||||
" \\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"- You'll set $x_0 = 1$. \n",
|
||||
"- $x_1$ will be the city population from your dataset `X_orig`. \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Similarly, you are defining $\\vec{w}$ as a vector containing two values:\n",
|
||||
"\n",
|
||||
"$$ \\vec{w} = \\begin{pmatrix}\n",
|
||||
" w_0 \\\\ \n",
|
||||
" w_1 \n",
|
||||
" \\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Now the hypothesis $h_{\\vec{w}}(\\vec{x})$ can now be written as\n",
|
||||
"\n",
|
||||
"$$ h_{\\vec{w}}(\\vec{x}) = \\vec{x} \\times \\vec{w} \\tag{3}\n",
|
||||
"$$ \n",
|
||||
"\n",
|
||||
"$$\n",
|
||||
"h_{\\vec{w}}(\\vec{x}) = \n",
|
||||
"\\begin{pmatrix} x_0 & x_1 \\end{pmatrix} \\times \n",
|
||||
"\\begin{pmatrix} w_0 \\\\ w_1 \\end{pmatrix} \n",
|
||||
"$$\n",
|
||||
"$$\n",
|
||||
"h_{\\vec{w}}(\\vec{x}) = x_0 \\times w_0 + x_1 \\times w_1 \n",
|
||||
"$$\n",
|
||||
"Here is a small example: \n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "embedded-planning",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"The input x is:\n",
|
||||
"[1 2]\n",
|
||||
"\n",
|
||||
"The parameter w is\n",
|
||||
"[[3]\n",
|
||||
" [4]]\n",
|
||||
"\n",
|
||||
"The model's prediction is [11]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Here is a small concrete example of x and w as vectors\n",
|
||||
"\n",
|
||||
"tmp_x = np.array([1,2])\n",
|
||||
"print(f\"The input x is:\")\n",
|
||||
"print(tmp_x)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"tmp_w = np.array([[3],[4]])\n",
|
||||
"print(f\"The parameter w is\")\n",
|
||||
"print(tmp_w)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"tmp_h = np.dot(tmp_x,tmp_w)\n",
|
||||
"print(f\"The model's prediction is {tmp_h}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "continuing-domain",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Matrix X\n",
|
||||
"\n",
|
||||
"To allow you to process multiple examples (multiple cities) at a time, you can stack multiple examples (cities) as rows of a matrix $\\mathbf{X}$.\n",
|
||||
"\n",
|
||||
"For example, let's say New York City is $\\vec{x^{(0)}}$ and San Francisco is $\\vec{x^{(1)}}$. Then stack New York City in row 1 and San Francisco in row 2 of matrix $\\mathbf{X}$:\n",
|
||||
"\n",
|
||||
"$$\\mathbf{X} = \\begin{pmatrix}\n",
|
||||
" \\vec{x^{(0)}} \\\\ \n",
|
||||
" \\vec{x^{(1)}}\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"Recall that each vector consists of $w_0$ and $w_1$, and $\\mathbf{X}$ looks like this:\n",
|
||||
"$$\n",
|
||||
"\\mathbf{X} = \\begin{pmatrix}\n",
|
||||
" x^{(0)}_0 & x^{(0)}_1 \\\\ \n",
|
||||
" x^{(1)}_0 & x^{(1)}_1\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"Recall that you're fixing $x_0^{(i)}$ for all cities to be `1`, so you can also write $\\mathbf{X}$ as:\n",
|
||||
"$$\\mathbf{X} =\n",
|
||||
"\\begin{pmatrix}\n",
|
||||
" 1 & x^{(0)}_1 \\\\ \n",
|
||||
" 1 & x^{(1)}_1\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "suspended-promise",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"New York City has population 9\n",
|
||||
"San Francisco has population 2\n",
|
||||
"An example of matrix X with city populations for two cities is:\n",
|
||||
"\n",
|
||||
"[[1 9]\n",
|
||||
" [1 2]]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Here is a concrete example\n",
|
||||
"\n",
|
||||
"tmp_NYC_population = 9\n",
|
||||
"tmp_SF_population = 2\n",
|
||||
"tmp_x0 = 1 # x0 for all cities\n",
|
||||
"\n",
|
||||
"tmp_X = np.array([[tmp_x0, tmp_NYC_population],\n",
|
||||
" [tmp_x0, tmp_SF_population]\n",
|
||||
" ])\n",
|
||||
"\n",
|
||||
"print(f\"New York City has population {tmp_NYC_population}\")\n",
|
||||
"print(f\"San Francisco has population {tmp_SF_population}\")\n",
|
||||
"print(f\"An example of matrix X with city populations for two cities is:\\n\")\n",
|
||||
"print(f\"{tmp_X}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "acute-blame",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Matrix X in general\n",
|
||||
"In general, when you have $m$ training examples (in this dataset $m$ is the number of cities), and there are $n$ features (here, just 1 feature, which is city population):\n",
|
||||
"- $\\mathbf{X}$ is a matrix with dimensions ($m$, $n+1$) (m rows, n+1 columns)\n",
|
||||
" - Each row is a city and its input features.\n",
|
||||
"\n",
|
||||
"$$\\mathbf{X} = \\begin{pmatrix}\n",
|
||||
" \\vec{x^{(0)}} \\\\ \n",
|
||||
" \\vec{x^{(1)}} \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" \\vec{x^{(m-1)}}\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"= \\begin{pmatrix}\n",
|
||||
" x^{(0)}_0 & x^{(0)}_1 & \\cdots & x^{(0)}_{n} \\\\ \n",
|
||||
" x^{(1)}_0 & x^{(1)}_1 & \\cdots & x^{(1)}_{n} \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" x^{(m-1)}_0 & x^{(m-1)}_1 & \\cdots & x^{(m-1)}_{n} \n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"- In this dataset, $n=1$ (city population) and $m=97$ (97 cities in the dataset)\n",
|
||||
"\n",
|
||||
"$$\\mathbf{X} = \\begin{pmatrix}\n",
|
||||
" \\vec{x^{(0)}} \\\\ \n",
|
||||
" \\vec{x^{(1)}} \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" \\vec{x^{(m-1)}}\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"= \\begin{pmatrix}\n",
|
||||
" x^{(0)}_0 & x^{(0)}_1 \\\\ \n",
|
||||
" x^{(1)}_0 & x^{(1)}_1 \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" x^{(97-1)}_0 & x^{(97-1)}_1 \n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"- $\\vec{w}$ is a vector with dimensions ($n+1$, $1$) (n+1 rows, 1 column)\n",
|
||||
" - Each column represents one feature.\n",
|
||||
"\n",
|
||||
"$$\\vec{w} = \\begin{pmatrix}\n",
|
||||
"w_0 \\\\ \n",
|
||||
"w_1 \\\\\n",
|
||||
"\\cdots\\\\\n",
|
||||
"w_{n}\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"- In this dataset, there is just the intercept and the city population feature:\n",
|
||||
"$$\\vec{w} = \\begin{pmatrix}\n",
|
||||
"w_0 \\\\ \n",
|
||||
"w_1 \\\\\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$"
|
||||
]
|
||||
},
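{
"cell_type": "markdown",
"id": "shape-check-sketch",
"metadata": {},
"source": [
"Here is a small sketch (with made-up populations, not the assignment data) that checks the shapes described above: $\\mathbf{X}$ is $(m, n+1)$, $\\vec{w}$ is $(n+1, 1)$, so the prediction $\\mathbf{X}\\vec{w}$ is $(m, 1)$.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# m = 3 cities, n = 1 feature (population), plus the column of ones\n",
"tmp_X = np.array([[1, 9],\n",
"                  [1, 2],\n",
"                  [1, 5]])\n",
"tmp_w = np.array([[1],\n",
"                  [2]])\n",
"\n",
"print(tmp_X.shape)                    # (3, 2) -> (m, n+1)\n",
"print(tmp_w.shape)                    # (2, 1) -> (n+1, 1)\n",
"print(np.dot(tmp_X, tmp_w).shape)     # (3, 1) -> (m, 1)\n",
"```"
]
},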
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "criminal-financing",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Processing data: Add the column for the intercept\n",
|
||||
"\n",
|
||||
"To calculate the cost and implement gradient descent, you will want to first add another column to your data (as $x_0$) to accomodate the $w_0$ intercept term. \n",
|
||||
"- This allows you to treat $w_0$ as simply another 'feature': feature 0.\n",
|
||||
"- The city population is then $w_1$, or feature 1.\n",
|
||||
"\n",
|
||||
"So if your original $\\mathbf{X_{orig}}$ looks like this:\n",
|
||||
"\n",
|
||||
"$$ \n",
|
||||
"\\mathbf{X_{orig}} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
" x^{(0)}_1 \\\\ \n",
|
||||
" x^{(1)}_1 \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" x^{(97-1)}_1 \n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"You will want to combine it with a vector of ones:\n",
|
||||
"$$\n",
|
||||
"\\vec{1} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
" x^{(0)}_0 \\\\ \n",
|
||||
" x^{(1)}_0 \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" x^{(m-1)}_0\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"= \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
" 1 \\\\ \n",
|
||||
" 1 \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" 1\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"So it will look like this:\n",
|
||||
"$$\n",
|
||||
"\\mathbf{X} = \\begin{pmatrix} \\vec{1} & \\mathbf{X_{orig}}\\end{pmatrix}\n",
|
||||
"=\n",
|
||||
"\\begin{pmatrix}\n",
|
||||
" 1 & x^{(0)}_1 \\\\ \n",
|
||||
" 1 & x^{(1)}_1 \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" 1 & x^{(97-1)}_1 \n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"Here is a small example of what you'll want to do."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "concerned-violence",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Matrix of city populations\n",
|
||||
"[[9]\n",
|
||||
" [2]]\n",
|
||||
"\n",
|
||||
"Column vector of ones ({tmp_num_of_cities} rows and 1 column)\n",
|
||||
"[[1.]\n",
|
||||
" [1.]]\n",
|
||||
"\n",
|
||||
"Vector of ones stacked to the left of tmp_X_orig\n",
|
||||
"[[1. 9.]\n",
|
||||
" [1. 2.]]\n",
|
||||
"tmp_x has shape: (2, 2)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"tmp_NYC_population = 9\n",
|
||||
"tmp_SF_population = 2\n",
|
||||
"tmp_x0 = 1 # x0 for all cities\n",
|
||||
"tmp_num_of_cities = 2\n",
|
||||
"\n",
|
||||
"tmp_X_orig = np.array([[tmp_NYC_population],\n",
|
||||
" [tmp_SF_population]\n",
|
||||
" ])\n",
|
||||
"\n",
|
||||
"print(\"Matrix of city populations\")\n",
|
||||
"print(tmp_X_orig)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# Use np.ones to create a column vector of ones\n",
|
||||
"tmp_ones = np.ones((tmp_num_of_cities,1))\n",
|
||||
"print(\"Column vector of ones ({tmp_num_of_cities} rows and 1 column)\")\n",
|
||||
"print(tmp_ones)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"tmp_X = np.concatenate([tmp_ones, tmp_X_orig], axis=1)\n",
|
||||
"print(\"Vector of ones stacked to the left of tmp_X_orig\")\n",
|
||||
"print(tmp_X)\n",
|
||||
"\n",
|
||||
"print(f\"tmp_x has shape: {tmp_X.shape}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "young-living",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In this small example, the $\\mathbf{X}$ is now:\n",
|
||||
"$$\\mathbf{X} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
"1 & 9 \\\\\n",
|
||||
"1 & 2\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"Notice that when calling `np.concatenate`, you're setting `axis=1`. \n",
|
||||
"- This puts the vector of ones on the left and the tmp_X_orig to the right.\n",
|
||||
"- If you set axis = 0, then `np.concatenate` would place the vector of ones ON TOP of tmp_X_orig"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "united-roots",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Calling numpy.concatenate, setting axis=0\n",
|
||||
"Vector of ones stacked to the ON TOP of tmp_X_orig\n",
|
||||
"[[1.]\n",
|
||||
" [1.]\n",
|
||||
" [9.]\n",
|
||||
" [2.]]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(\"Calling numpy.concatenate, setting axis=0\")\n",
|
||||
"tmp_X_version_2 = np.concatenate([tmp_ones, tmp_X_orig], axis=0)\n",
|
||||
"print(\"Vector of ones stacked to the ON TOP of tmp_X_orig\")\n",
|
||||
"print(tmp_X_version_2)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "hydraulic-inspector",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"So if you set axis=1, $\\mathbf{X}$ looks like this:\n",
|
||||
"$$\\mathbf{X} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
"1 \\\\ 1 \\\\\n",
|
||||
"9 \\\\ 2\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"This is **NOT** what you want.\n",
|
||||
"\n",
|
||||
"You'll want to set axis=1 so that you get a column vector of ones on the left and a colun vector of the city populations on the right:\n",
|
||||
"\n",
|
||||
"$$\\mathbf{X} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
"1 & x^{(0)}_1 \\\\\n",
|
||||
"1 & x^{(1)}_1\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "gorgeous-bermuda",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Add a column to X_orig to account for the w_0 term\n",
|
||||
"# X_train = np.stack([np.ones(X_orig.shape), X_orig], axis=1)\n",
|
||||
"m = len(X_col)\n",
|
||||
"col_vec_ones = np.ones((m, 1))\n",
|
||||
"X_train = np.concatenate([col_vec_ones, X_col], axis=1)\n",
|
||||
"# Keep y_orig the same\n",
|
||||
"y_train = y_col\n",
|
||||
"\n",
|
||||
"print ('The shape of X_train is: ' + str(X_train.shape))\n",
|
||||
"print ('The shape of y_train is: ' + str(y_train.shape))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,334 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "distributed-detective",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# UGL - Multiple Variable Cost"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "after-cargo",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "entire-ecology",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"matrix A has 2 rows and 2 columns\n",
|
||||
"[[1 1]\n",
|
||||
" [1 1]]\n",
|
||||
"\n",
|
||||
"Vector b has 2 rows and 1 column\n",
|
||||
"[[2]\n",
|
||||
" [2]]\n",
|
||||
"\n",
|
||||
"Multiply A times b\n",
|
||||
"[[4]\n",
|
||||
" [4]]\n",
|
||||
"The product has 2 rows and 1 column\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# make a matrix A with 2 rows and 2 columns\n",
|
||||
"tmp_A = np.array([[1,1],[1,1]])\n",
|
||||
"print(f\"matrix A has 2 rows and 2 columns\")\n",
|
||||
"print(tmp_A)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# make a colun vector B with 2 rows and 1 column\n",
|
||||
"tmp_b = np.array([[2],[2]])\n",
|
||||
"print(f\"Vector b has 2 rows and 1 column\")\n",
|
||||
"print(tmp_b)\n",
|
||||
"print()\n",
|
||||
"# perform matrix multiplication A x b\n",
|
||||
"tmp_A_times_b = np.dot(tmp_A,tmp_b)\n",
|
||||
"print(\"Multiply A times b\")\n",
|
||||
"print(tmp_A_times_b)\n",
|
||||
"print(\"The product has 2 rows and 1 column\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "drawn-product",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"matrix A has 2 rows and 2 columns\n",
|
||||
"[[1 1]\n",
|
||||
" [1 1]]\n",
|
||||
"\n",
|
||||
"Vector b has 2 rows and 1 column\n",
|
||||
"[[2]\n",
|
||||
" [2]]\n",
|
||||
"\n",
|
||||
"The error message you'll see is:\n",
|
||||
"shapes (2,1) and (2,2) not aligned: 1 (dim 1) != 2 (dim 0)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# make a matrix A with 2 rows and 2 columns\n",
|
||||
"tmp_A = np.array([[1,1],[1,1]])\n",
|
||||
"print(f\"matrix A has 2 rows and 2 columns\")\n",
|
||||
"print(tmp_A)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# make a colun vector B with 2 rows and 1 column\n",
|
||||
"tmp_b = np.array([[2],[2]])\n",
|
||||
"print(f\"Vector b has 2 rows and 1 column\")\n",
|
||||
"print(tmp_b)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# Try to perform matrix multiplication A x b\n",
|
||||
"try:\n",
|
||||
" tmp_b_times_A = np.dot(tmp_b,tmp_A)\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"The error message you'll see is:\")\n",
|
||||
" print(e)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "entertaining-playback",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The message says that it's checking:\n",
|
||||
" - The number of columns of the left matrix `b`, or `dim 1` is 1.\n",
|
||||
" - The number of rows on the right matrix `dim 0`, is 2.\n",
|
||||
" - 1 does not equal 2\n",
|
||||
" - So the two matrices cannot be multiplied together."
|
||||
]
|
||||
},
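{
"cell_type": "markdown",
"id": "transpose-follow-up",
"metadata": {},
"source": [
"As a small follow-up sketch (same toy matrices as above): transposing `b` makes the inner dimensions line up, because a $(1,2)$ matrix times a $(2,2)$ matrix is valid and produces a $(1,2)$ result.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"tmp_A = np.array([[1, 1], [1, 1]])\n",
"tmp_b = np.array([[2], [2]])\n",
"\n",
"# (1,2) x (2,2) -> (1,2), so this multiplication succeeds\n",
"print(np.dot(tmp_b.T, tmp_A))   # [[4 4]]\n",
"```"
]
},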
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "useful-desire",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Calculate the cost\n",
|
||||
"Next, calculate the cost $J(\\vec{w})$\n",
|
||||
"- Recall that the equation for the cost function $J(w)$ looks like this:\n",
|
||||
"$$J(\\vec{w}) = \\frac{1}{2m} \\sum\\limits_{i = 1}^{m} (h_{w}(x^{(i)}) - y^{(i)})^2 \\tag{1}$$ \n",
|
||||
"\n",
|
||||
"- The model prediction is a column vector of 97 examples:\n",
|
||||
"$$\\vec{h_{\\vec{w}}(\\mathbf{X})} = \\begin{pmatrix}\n",
|
||||
"h^{(0)}_{w}(x) \\\\\n",
|
||||
"h^{(1)}_{w}(x) \\\\\n",
|
||||
"\\cdots \\\\\n",
|
||||
"h^{(97-1)}_{w}(x) \\\\\n",
|
||||
"\\end{pmatrix} \n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"- Similarly, `y_train` contains the true profit per city as a column vector of 97 examples\n",
|
||||
"$$\\vec{y} = \\begin{pmatrix}\n",
|
||||
"y^{(0)} \\\\\n",
|
||||
"y^{(1)} \\\\\n",
|
||||
"\\cdots \\\\\n",
|
||||
"y^{(97-1)}\\\\\n",
|
||||
"\\end{pmatrix} \n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"Here is a small example to show you how to apply element-wise operations on numpy arrays."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "attempted-potato",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Create a column vector c with 3 rows and 1 column\n",
|
||||
"[[1]\n",
|
||||
" [2]\n",
|
||||
" [3]]\n",
|
||||
"\n",
|
||||
"Create a column vector c with 3 rows and 1 column\n",
|
||||
"[[2]\n",
|
||||
" [2]\n",
|
||||
" [2]]\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Create two sample column vectors\n",
|
||||
"tmp_c = np.array([[1],[2],[3]])\n",
|
||||
"print(\"Create a column vector c with 3 rows and 1 column\")\n",
|
||||
"print(tmp_c)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"tmp_d = np.array([[2],[2],[2]])\n",
|
||||
"print(\"Create a column vector c with 3 rows and 1 column\")\n",
|
||||
"print(tmp_d)\n",
|
||||
"print()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "sought-postage",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can apply `+, -, *, /` operators on two vectors of the same length."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "spoken-testament",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Take the element-wise multiplication between vectors c and d\n",
|
||||
"[[2]\n",
|
||||
" [4]\n",
|
||||
" [6]]\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Take the element-wise multiplication of two vectors\n",
|
||||
"tmp_mult = tmp_c * tmp_d\n",
|
||||
"print(\"Take the element-wise multiplication between vectors c and d\")\n",
|
||||
"print(tmp_mult)\n",
|
||||
"print()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "hearing-nudist",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can use `numpy.square` to apply the element-wise square of a vector\n",
|
||||
"- Note, `**2` will also work."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "median-extraction",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Take the element-wise square of vector c\n",
|
||||
"[[1]\n",
|
||||
" [4]\n",
|
||||
" [9]]\n",
|
||||
"\n",
|
||||
"Another way to get the element-wise square of vector c\n",
|
||||
"[[1]\n",
|
||||
" [4]\n",
|
||||
" [9]]\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Take the element-wise square of vector c\n",
|
||||
"tmp_square = np.square(tmp_c)\n",
|
||||
"tmp_square_option_2 = tmp_c**2\n",
|
||||
"print(\"Take the element-wise square of vector c\")\n",
|
||||
"print(tmp_square)\n",
|
||||
"print()\n",
|
||||
"print(\"Another way to get the element-wise square of vector c\")\n",
|
||||
"print(tmp_square_option_2)\n",
|
||||
"print()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "interim-prefix",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can use `numpy.sum` to add up all the elements of a vector (or matrix)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "fossil-objective",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Vector d\n",
|
||||
"[[2]\n",
|
||||
" [2]\n",
|
||||
" [2]]\n",
|
||||
"\n",
|
||||
"Take the sum of all the elements in vector d\n",
|
||||
"6\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Take the sum of all elements in vector d\n",
|
||||
"tmp_sum = np.sum(tmp_d)\n",
|
||||
"print(\"Vector d\")\n",
|
||||
"print(tmp_d)\n",
|
||||
"print()\n",
|
||||
"print(\"Take the sum of all the elements in vector d\")\n",
|
||||
"print(tmp_sum)"
|
||||
]
|
||||
},
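{
"cell_type": "markdown",
"id": "cost-assembly-sketch",
"metadata": {},
"source": [
"Putting the element-wise pieces above together, here is a minimal sketch of the full cost calculation $J(\\vec{w}) = \\frac{1}{2m} \\sum (h_{w}(x^{(i)}) - y^{(i)})^2$ on a tiny made-up dataset (two cities, with the column of ones already added; the numbers are for illustration only).\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"tmp_X = np.array([[1, 9], [1, 2]])   # m = 2 examples: column of ones plus population\n",
"tmp_w = np.array([[1], [2]])         # parameter column vector\n",
"tmp_y = np.array([[10], [6]])        # true values\n",
"\n",
"tmp_m = tmp_X.shape[0]\n",
"tmp_h = np.dot(tmp_X, tmp_w)                                    # predictions, shape (m,1)\n",
"tmp_cost = (1 / (2 * tmp_m)) * np.sum(np.square(tmp_h - tmp_y))\n",
"print(tmp_cost)   # (81 + 1) / 4 = 20.5\n",
"```"
]
},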
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "convenient-taylor",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,301 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "representative-rhythm",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "buried-blackberry",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Prediction: $\\vec{h}_{\\vec{w}}(\\mathbf{X})$\n",
|
||||
"- This is the model's prediction, calculated by $\\mathbf{X}\\vec{w}$"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "obvious-keeping",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Provide two cities and their populations\n",
|
||||
"[[1 9]\n",
|
||||
" [1 2]]\n",
|
||||
"View the current parameter vector\n",
|
||||
"[[1]\n",
|
||||
" [2]]\n",
|
||||
"\n",
|
||||
"Calculate the model prediction h\n",
|
||||
"[[19]\n",
|
||||
" [ 5]]\n",
|
||||
"\n",
|
||||
"The model predicts [19] for city 0, and [5] for city 1\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Provide two cities and their populations\n",
|
||||
"tmp_X = np.array([[1, 9],[1, 2]])\n",
|
||||
"print(\"Provide two cities and their populations\")\n",
|
||||
"print(tmp_X)\n",
|
||||
"\n",
|
||||
"# View the current parameter vector\n",
|
||||
"tmp_w = np.array([[1],[2]])\n",
|
||||
"print(\"View the current parameter vector\")\n",
|
||||
"print(tmp_w)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# Calculate the model prediction h\n",
|
||||
"tmp_h = np.dot(tmp_X, tmp_w)\n",
|
||||
"print(\"Calculate the model prediction h\")\n",
|
||||
"print(tmp_h)\n",
|
||||
"print()\n",
|
||||
"print(f\"The model predicts {tmp_h[0]} for city 0, and {tmp_h[1]} for city 1\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "developmental-sustainability",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Error: $\\vec{h}_{\\vec{w}}(\\mathbf{X}) - \\vec{y}$\n",
|
||||
" - This is the difference between the model prediction and the actual value of y.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "informed-recorder",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Model prediction tmp_h\n",
|
||||
"[[19]\n",
|
||||
" [ 5]]\n",
|
||||
"\n",
|
||||
"True labels for the profits per city\n",
|
||||
"[[10]\n",
|
||||
" [ 6]]\n",
|
||||
"\n",
|
||||
"Error\n",
|
||||
"[[ 9]\n",
|
||||
" [-1]]\n",
|
||||
"The error for city 0 prediction is [9] and is positive; the error for city 1 prediction is [-1] and is negative\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# View the model's predictions\n",
|
||||
"print(\"Model prediction tmp_h\")\n",
|
||||
"print(tmp_h)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# Get the true labels for these two cities\n",
|
||||
"tmp_y = np.array([[10],[6]])\n",
|
||||
"print(\"True labels for the profits per city\")\n",
|
||||
"print(tmp_y)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# Calculate the error\n",
|
||||
"tmp_error = tmp_h - tmp_y\n",
|
||||
"print(\"Error\")\n",
|
||||
"print(tmp_error)\n",
|
||||
"print(f\"The error for city 0 prediction is {tmp_error[0]} and is positive; the error for city 1 prediction is {tmp_error[1]} and is negative\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "suitable-chain",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Gradient: $\\frac{1}{m} \\mathbf{X}^T \\times Error$\n",
|
||||
"- This is a vector containing the gradient for each element of the parameter vector $\\vec{w}$\n",
|
||||
" - Since $\\vec{w}$ is a column vector with 2 rows, this gradient is also a column vector with 2 rows.\n",
|
||||
" - The $\\frac{1}{m}$ takes the average gradient across all 97 training examples (97 cities).\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "automatic-fiction",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"X: two cities and their populations\n",
|
||||
"[[1 9]\n",
|
||||
" [1 2]]\n",
|
||||
"\n",
|
||||
"Transpose of X\n",
|
||||
"[[1 1]\n",
|
||||
" [9 2]]\n",
|
||||
"\n",
|
||||
"The number of examples (number of cities) is 2\n",
|
||||
"\n",
|
||||
"Error\n",
|
||||
"[[ 9]\n",
|
||||
" [-1]]\n",
|
||||
"Gradient\n",
|
||||
"[[ 4. ]\n",
|
||||
" [39.5]]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Provide two cities and their populations\n",
|
||||
"tmp_X = np.array([[1, 9],[1, 2]])\n",
|
||||
"print(\"X: two cities and their populations\")\n",
|
||||
"print(tmp_X)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# transpose of X\n",
|
||||
"tmp_X_T = tmp_X.T\n",
|
||||
"print(\"Transpose of X\")\n",
|
||||
"print(tmp_X_T)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# The number of examples (cities)\n",
|
||||
"tmp_m = tmp_X.shape[0]\n",
|
||||
"print(f\"The number of examples (number of cities) is {tmp_m}\\n\")\n",
|
||||
"\n",
|
||||
"# error\n",
|
||||
"print(\"Error\")\n",
|
||||
"print(tmp_error)\n",
|
||||
"\n",
|
||||
"# Calculate the gradient\n",
|
||||
"tmp_gradient = (1/tmp_m) * np.dot(tmp_X_T, tmp_error)\n",
|
||||
"print(\"Gradient\")\n",
|
||||
"print(tmp_gradient)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "virgin-kitchen",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Scale by the learning rate: $\\alpha$\n",
|
||||
"- $\\alpha$ is a positive number smaller than 1 that reduces the magnitude of the update to be smaller than the actual gradient.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "authentic-output",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Learning rate alpha: 0.01\n",
|
||||
"Gradient before scaling by the learning rate:\n",
|
||||
"[[ 4. ]\n",
|
||||
" [39.5]]\n",
|
||||
"\n",
|
||||
"Gradient after scaling by the learning rate\n",
|
||||
"[[0.04 ]\n",
|
||||
" [0.395]]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"tmp_alpha = 0.01\n",
|
||||
"print(f\"Learning rate alpha: {tmp_alpha}\")\n",
|
||||
"\n",
|
||||
"print(\"Gradient before scaling by the learning rate:\")\n",
|
||||
"print(tmp_gradient)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"gradient_scaled_by_learning_rate = tmp_alpha * tmp_gradient\n",
|
||||
"print(\"Gradient after scaling by the learning rate\")\n",
|
||||
"print(gradient_scaled_by_learning_rate)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "incorporate-queen",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"- Subtract the gradient: $-$\n",
|
||||
" - Recall that the gradient points in the direction that would INCREASE the cost, negative one multiplied by the gradient will point in the direction that REDUCES the cost.\n",
|
||||
" - So, to update the weight in the direction that reduces the cost, subtract the gradient."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "hybrid-patent",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Gradient after scaling by the learning rate\n",
|
||||
"[[0.04 ]\n",
|
||||
" [0.395]]\n",
|
||||
"\n",
|
||||
"The direction to update the parameter vector\n",
|
||||
"[[-0.04 ]\n",
|
||||
" [-0.395]]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"gradient_scaled_by_learning_rate = tmp_alpha * tmp_gradient\n",
|
||||
"print(\"Gradient after scaling by the learning rate\")\n",
|
||||
"print(gradient_scaled_by_learning_rate)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"direction_of_update = -1 * gradient_scaled_by_learning_rate\n",
|
||||
"print(\"The direction to update the parameter vector\")\n",
|
||||
"print(direction_of_update)"
|
||||
]
|
||||
},
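{
"cell_type": "markdown",
"id": "gradient-step-sketch",
"metadata": {},
"source": [
"Here is a minimal sketch that strings the pieces above into the complete update step, $\\vec{w} := \\vec{w} - \\alpha \\cdot \\text{gradient}$, repeated for a few iterations. It reuses the same toy two-city numbers from the earlier cells; this is an illustration, not the assignment implementation.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"tmp_X = np.array([[1, 9], [1, 2]])     # two cities, with the column of ones\n",
"tmp_y = np.array([[10], [6]])          # true profits\n",
"tmp_w = np.array([[1.0], [2.0]])       # initial parameter vector\n",
"tmp_alpha = 0.01\n",
"tmp_m = tmp_X.shape[0]\n",
"\n",
"for i in range(3):                     # a few iterations, just for illustration\n",
"    tmp_h = np.dot(tmp_X, tmp_w)                           # prediction\n",
"    tmp_error = tmp_h - tmp_y                              # error\n",
"    tmp_gradient = (1 / tmp_m) * np.dot(tmp_X.T, tmp_error)\n",
"    tmp_w = tmp_w - tmp_alpha * tmp_gradient               # subtract the scaled gradient\n",
"    print(i, tmp_w.flatten())\n",
"```"
]
},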
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "western-theory",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,140 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "balanced-gather",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## UGL - Normal Equations \n",
|
||||
"\n",
|
||||
"In the lecture videos, you learned that the closed-form solution to linear regression is\n",
|
||||
"\n",
|
||||
"\\begin{equation*}\n",
|
||||
"w = (X^TX)^{-1}X^Ty\n",
|
||||
"\\end{equation*}\n",
|
||||
"\n",
|
||||
"Using this formula does not require any feature scaling, and you will get an exact solution in one calculation: there is no “loop until convergence” like in gradient descent.\n",
|
||||
"\n",
|
||||
"**Exercise**\n",
|
||||
"\n",
|
||||
"Complete the code in the `normal_equation()` function below to use the formula above to calculate $w$. Remember that while you don’t need to scale your features, we still need to add a column of 1’s to the original X matrix to have an intercept term $w_0$. You can assume that this has already been done in the previous parts and the variable that you should use is `X_train`.\n",
|
||||
"\n",
|
||||
"**Hint**\n",
|
||||
"Look into np.linalg.inv(), .T and np.dot()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "radio-latest",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# TODO: Originally was the assignment dataset. Either reuse or add new one\n",
|
||||
"X_train = np.zeros((5,2)) \n",
|
||||
"y_train = np.zeros(2)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "mexican-marsh",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def normal_equation(X, y): \n",
|
||||
" \"\"\"\n",
|
||||
" Computes the closed-form solution to linear \n",
|
||||
" regression using the normal equations.\n",
|
||||
" \n",
|
||||
" Parameters\n",
|
||||
" ----------\n",
|
||||
" X : array_like\n",
|
||||
" Shape (m,n)\n",
|
||||
" \n",
|
||||
" y: array_like\n",
|
||||
" Shape (m,)\n",
|
||||
" \n",
|
||||
" Returns\n",
|
||||
" -------\n",
|
||||
" w : array_like\n",
|
||||
" Shape (n,)\n",
|
||||
" Parameters computed by normal equation\n",
|
||||
" \"\"\"\n",
|
||||
" \n",
|
||||
" #(≈ 1 line of code)\n",
|
||||
" # w = \n",
|
||||
" ### BEGIN SOLUTION ###\n",
|
||||
" w = np.dot(np.dot(np.linalg.inv(np.dot(X.T, X)),X.T), y)\n",
|
||||
" ### END SOLUTION ### \n",
|
||||
" \n",
|
||||
" return w"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "smoking-optimum",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"w_normal = normal_equation(X_train, y_train)\n",
|
||||
"print(\"w found by normal equation:\", w_normal)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bibliographic-services",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now let's see what the prediction is on unseen input"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "wrapped-tradition",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X_test_orig = np.array([1650, 3])\n",
|
||||
"\n",
|
||||
"X_test_norm = (X_test_orig - mu)/sigma\n",
|
||||
"X_test = np.hstack((1, X_test_norm))\n",
|
||||
"y_pred_normal = np.dot(X_test, w_normal)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "relative-array",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(\"Predicted price of a 1650 sq-ft, 3 br house \\\n",
|
||||
" using normal equations is is: $%.2f\" % (y_pred_normal))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
153
work/.ipynb_checkpoints/oldC1_W2_Lab08_Sklearn-checkpoint.ipynb
Normal file
@ -0,0 +1,153 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "expected-characterization",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Ungraded Lab: Linear Regression using Scikit-Learn"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "gorgeous-lincoln",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now that you've implemented linear regression from scratch, let's see you can train a linear regression model using scikit-learn.\n",
|
||||
"\n",
|
||||
"## Dataset \n",
|
||||
"Let's start with the same dataset as before."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "mobile-firmware",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"# X is the input variable (size in square feet)\n",
|
||||
"# y in the output variable (price in 1000s of dollars)\n",
|
||||
"X = np.array([1000, 2000])\n",
|
||||
"y = np.array([200, 400])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "offshore-lease",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Fit the model\n",
|
||||
"\n",
|
||||
"The code below imports the [linear regression model](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression) from scikit-learn. You can fit this model on the training data by calling `fit` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "monetary-tactics",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"LinearRegression()"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from sklearn.linear_model import LinearRegression\n",
|
||||
"\n",
|
||||
"linear_model = LinearRegression()\n",
|
||||
"# We have to reshape X using .reshape(-1, 1) because our data has a single feature\n",
|
||||
"# If X has multiple features, you don't need to reshape\n",
|
||||
"linear_model.fit(X.reshape(-1, 1), y) "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "thick-seven",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Make Predictions\n",
|
||||
"\n",
|
||||
"You can see the predictions made by this model by calling the `predict` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "norwegian-variety",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Prediction on training set: [200. 400.]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"y_pred = linear_model.predict(X.reshape(-1,1))\n",
|
||||
"\n",
|
||||
"print(\"Prediction on training set:\", y_pred)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "geographic-archive",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Calculate accuracy\n",
|
||||
"\n",
|
||||
"You can calculate this accuracy of this model by calling the `score` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "immune-password",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Accuracy on training set: 1.0\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(\"Accuracy on training set:\", linear_model.score(X.reshape(-1,1), y))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
126
work/.ipynb_checkpoints/oldW2_UGL8_Scikit_Learn-checkpoint.ipynb
Normal file
@ -0,0 +1,126 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "expected-characterization",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Ungraded Lab: Linear Regression using Scikit-Learn"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "gorgeous-lincoln",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now that you've implemented linear regression from scratch, let's see you can train a linear regression model using scikit-learn.\n",
|
||||
"\n",
|
||||
"## Dataset \n",
|
||||
"Let's start with the same dataset as before."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "mobile-firmware",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"# X is the input variable (size in square feet)\n",
|
||||
"# y in the output variable (price in 1000s of dollars)\n",
|
||||
"X = np.array([1000, 2000])\n",
|
||||
"y = np.array([200, 400])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "offshore-lease",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Fit the model\n",
|
||||
"\n",
|
||||
"The code below imports the [linear regression model](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression) from scikit-learn. You can fit this model on the training data by calling `fit` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "monetary-tactics",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.linear_model import LinearRegression\n",
|
||||
"\n",
|
||||
"linear_model = LinearRegression()\n",
|
||||
"# We have to reshape X using .reshape(-1, 1) because our data has a single feature\n",
|
||||
"# If X has multiple features, you don't need to reshape\n",
|
||||
"linear_model.fit(X.reshape(-1, 1), y) "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "thick-seven",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Make Predictions\n",
|
||||
"\n",
|
||||
"You can see the predictions made by this model by calling the `predict` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "norwegian-variety",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"y_pred = linear_model.predict(X.reshape(-1,1))\n",
|
||||
"\n",
|
||||
"print(\"Prediction on training set:\", y_pred)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "geographic-archive",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Calculate accuracy\n",
|
||||
"\n",
|
||||
"You can calculate this accuracy of this model by calling the `score` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "immune-password",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
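"# Note: for a regression model, score() returns the R^2 coefficient of determination, not classification accuracy\n",
|
||||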
"print(\"Accuracy on training set:\", linear_model.score(X.reshape(-1,1), y))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
730
work/C1_W2_Lab01_Python_Numpy_Vectorization_Soln.ipynb
Normal file
@ -0,0 +1,730 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Python, NumPy and Vectorization\n",
|
||||
"A brief introduction to some of the scientific computing used in this course. In particular the NumPy scientific computing package and its use with python.\n",
|
||||
"\n",
|
||||
"# Outline\n",
|
||||
"- [ 1.1 Goals](#toc_40015_1.1)\n",
|
||||
"- [ 1.2 Useful References](#toc_40015_1.2)\n",
|
||||
"- [2 Python and NumPy <a name='Python and NumPy'></a>](#toc_40015_2)\n",
|
||||
"- [3 Vectors](#toc_40015_3)\n",
|
||||
"- [ 3.1 Abstract](#toc_40015_3.1)\n",
|
||||
"- [ 3.2 NumPy Arrays](#toc_40015_3.2)\n",
|
||||
"- [ 3.3 Vector Creation](#toc_40015_3.3)\n",
|
||||
"- [ 3.4 Operations on Vectors](#toc_40015_3.4)\n",
|
||||
"- [4 Matrices](#toc_40015_4)\n",
|
||||
"- [ 4.1 Abstract](#toc_40015_4.1)\n",
|
||||
"- [ 4.2 NumPy Arrays](#toc_40015_4.2)\n",
|
||||
"- [ 4.3 Matrix Creation](#toc_40015_4.3)\n",
|
||||
"- [ 4.4 Operations on Matrices](#toc_40015_4.4)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np # it is an unofficial standard to use np for numpy\n",
|
||||
"import time"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_1.1\"></a>\n",
|
||||
"## 1.1 Goals\n",
|
||||
"In this lab, you will:\n",
|
||||
"- Review the features of NumPy and Python that are used in Course 1"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_1.2\"></a>\n",
|
||||
"## 1.2 Useful References\n",
|
||||
"- NumPy Documentation including a basic introduction: [NumPy.org](https://NumPy.org/doc/stable/)\n",
|
||||
"- A challenging feature topic: [NumPy Broadcasting](https://NumPy.org/doc/stable/user/basics.broadcasting.html)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_2\"></a>\n",
|
||||
"# 2 Python and NumPy <a name='Python and NumPy'></a>\n",
|
||||
"Python is the programming language we will be using in this course. It has a set of numeric data types and arithmetic operations. NumPy is a library that extends the base capabilities of python to add a richer data set including more numeric types, vectors, matrices, and many matrix functions. NumPy and python work together fairly seamlessly. Python arithmetic operators work on NumPy data types and many NumPy functions will accept python data types.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3\"></a>\n",
|
||||
"# 3 Vectors\n",
|
||||
"<a name=\"toc_40015_3.1\"></a>\n",
|
||||
"## 3.1 Abstract\n",
|
||||
"<img align=\"right\" src=\"./images/C1_W2_Lab04_Vectors.PNG\" style=\"width:340px;\" >Vectors, as you will use them in this course, are ordered arrays of numbers. In notation, vectors are denoted with lower case bold letters such as $\\mathbf{x}$. The elements of a vector are all the same type. A vector does not, for example, contain both characters and numbers. The number of elements in the array is often referred to as the *dimension* though mathematicians may prefer *rank*. The vector shown has a dimension of $n$. The elements of a vector can be referenced with an index. In math settings, indexes typically run from 1 to n. In computer science and these labs, indexing will typically run from 0 to n-1. In notation, elements of a vector, when referenced individually will indicate the index in a subscript, for example, the $0^{th}$ element, of the vector $\\mathbf{x}$ is $x_0$. Note, the x is not bold in this case. \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.2\"></a>\n",
|
||||
"## 3.2 NumPy Arrays\n",
|
||||
"\n",
|
||||
"NumPy's basic data structure is an indexable, n-dimensional *array* containing elements of the same type (`dtype`). Right away, you may notice we have overloaded the term 'dimension'. Above, it was the number of elements in the vector, here, dimension refers to the number of indexes of an array. A one-dimensional or 1-D array has one index. In Course 1, we will represent vectors as NumPy 1-D arrays. \n",
|
||||
"\n",
|
||||
" - 1-D array, shape (n,): n elements indexed [0] through [n-1]\n",
|
||||
" "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.3\"></a>\n",
|
||||
"## 3.3 Vector Creation\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Data creation routines in NumPy will generally have a first parameter which is the shape of the object. This can either be a single value for a 1-D result or a tuple (n,m,...) specifying the shape of the result. Below are examples of creating vectors using these routines."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill arrays with value\n",
|
||||
"a = np.zeros(4); print(f\"np.zeros(4) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.zeros((4,)); print(f\"np.zeros(4,) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.random.random_sample(4); print(f\"np.random.random_sample(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Some data creation routines do not take a shape tuple:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill arrays with value but do not accept shape as input argument\n",
|
||||
"a = np.arange(4.); print(f\"np.arange(4.): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.random.rand(4); print(f\"np.random.rand(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"values can be specified manually as well. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill with user specified values\n",
|
||||
"a = np.array([5,4,3,2]); print(f\"np.array([5,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.array([5.,4,3,2]); print(f\"np.array([5.,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"These have all created a one-dimensional vector `a` with four elements. `a.shape` returns the dimensions. Here we see a.shape = `(4,)` indicating a 1-d array with 4 elements. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4\"></a>\n",
|
||||
"## 3.4 Operations on Vectors\n",
|
||||
"Let's explore some operations using vectors.\n",
|
||||
"<a name=\"toc_40015_3.4.1\"></a>\n",
|
||||
"### 3.4.1 Indexing\n",
|
||||
"Elements of vectors can be accessed via indexing and slicing. NumPy provides a very complete set of indexing and slicing capabilities. We will explore only the basics needed for the course here. Reference [Slicing and Indexing](https://NumPy.org/doc/stable/reference/arrays.indexing.html) for more details. \n",
|
||||
"**Indexing** means referring to *an element* of an array by its position within the array. \n",
|
||||
"**Slicing** means getting a *subset* of elements from an array based on their indices. \n",
|
||||
"NumPy starts indexing at zero so the 3rd element of an vector $\\mathbf{a}$ is `a[2]`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector indexing operations on 1-D vectors\n",
|
||||
"a = np.arange(10)\n",
|
||||
"print(a)\n",
|
||||
"\n",
|
||||
"#access an element\n",
|
||||
"print(f\"a[2].shape: {a[2].shape} a[2] = {a[2]}, Accessing an element returns a scalar\")\n",
|
||||
"\n",
|
||||
"# access the last element, negative indexes count from the end\n",
|
||||
"print(f\"a[-1] = {a[-1]}\")\n",
|
||||
"\n",
|
||||
"#indexs must be within the range of the vector or they will produce and error\n",
|
||||
"try:\n",
|
||||
" c = a[10]\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"The error message you'll see is:\")\n",
|
||||
" print(e)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.2\"></a>\n",
|
||||
"### 3.4.2 Slicing\n",
|
||||
"Slicing creates an array of indices using a set of three values (`start:stop:step`). A subset of values is also valid. Its use is best explained by example:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector slicing operations\n",
|
||||
"a = np.arange(10)\n",
|
||||
"print(f\"a = {a}\")\n",
|
||||
"\n",
|
||||
"#access 5 consecutive elements (start:stop:step)\n",
|
||||
"c = a[2:7:1]; print(\"a[2:7:1] = \", c)\n",
|
||||
"\n",
|
||||
"# access 3 elements separated by two \n",
|
||||
"c = a[2:7:2]; print(\"a[2:7:2] = \", c)\n",
|
||||
"\n",
|
||||
"# access all elements index 3 and above\n",
|
||||
"c = a[3:]; print(\"a[3:] = \", c)\n",
|
||||
"\n",
|
||||
"# access all elements below index 3\n",
|
||||
"c = a[:3]; print(\"a[:3] = \", c)\n",
|
||||
"\n",
|
||||
"# access all elements\n",
|
||||
"c = a[:]; print(\"a[:] = \", c)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.3\"></a>\n",
|
||||
"### 3.4.3 Single vector operations\n",
|
||||
"There are a number of useful operations that involve operations on a single vector."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.array([1,2,3,4])\n",
|
||||
"print(f\"a : {a}\")\n",
|
||||
"# negate elements of a\n",
|
||||
"b = -a \n",
|
||||
"print(f\"b = -a : {b}\")\n",
|
||||
"\n",
|
||||
"# sum all elements of a, returns a scalar\n",
|
||||
"b = np.sum(a) \n",
|
||||
"print(f\"b = np.sum(a) : {b}\")\n",
|
||||
"\n",
|
||||
"b = np.mean(a)\n",
|
||||
"print(f\"b = np.mean(a): {b}\")\n",
|
||||
"\n",
|
||||
"b = a**2\n",
|
||||
"print(f\"b = a**2 : {b}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.4\"></a>\n",
|
||||
"### 3.4.4 Vector Vector element-wise operations\n",
|
||||
"Most of the NumPy arithmetic, logical and comparison operations apply to vectors as well. These operators work on an element-by-element basis. For example \n",
|
||||
"$$ c_i = a_i + b_i $$"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.array([ 1, 2, 3, 4])\n",
|
||||
"b = np.array([-1,-2, 3, 4])\n",
|
||||
"print(f\"Binary operators work element wise: {a + b}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Of course, for this to work correctly, the vectors must be of the same size:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#try a mismatched vector operation\n",
|
||||
"c = np.array([1, 2])\n",
|
||||
"try:\n",
|
||||
" d = a + c\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"The error message you'll see is:\")\n",
|
||||
" print(e)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.5\"></a>\n",
|
||||
"### 3.4.5 Scalar Vector operations\n",
|
||||
"Vectors can be 'scaled' by scalar values. A scalar value is just a number. The scalar multiplies all the elements of the vector."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.array([1, 2, 3, 4])\n",
|
||||
"\n",
|
||||
"# multiply a by a scalar\n",
|
||||
"b = 5 * a \n",
|
||||
"print(f\"b = 5 * a : {b}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.6\"></a>\n",
|
||||
"### 3.4.6 Vector Vector dot product\n",
|
||||
"The dot product is a mainstay of Linear Algebra and NumPy. This is an operation used extensively in this course and should be well understood. The dot product is shown below."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<img src=\"./images/C1_W2_Lab04_dot_notrans.gif\" width=800> "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The dot product multiplies the values in two vectors element-wise and then sums the result.\n",
|
||||
"Vector dot product requires the dimensions of the two vectors to be the same. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's implement our own version of the dot product below:\n",
|
||||
"\n",
|
||||
"**Using a for loop**, implement a function which returns the dot product of two vectors. The function to return given inputs $a$ and $b$:\n",
|
||||
"$$ x = \\sum_{i=0}^{n-1} a_i b_i $$\n",
|
||||
"Assume both `a` and `b` are the same shape."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def my_dot(a, b): \n",
|
||||
" \"\"\"\n",
|
||||
" Compute the dot product of two vectors\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" a (ndarray (n,)): input vector \n",
|
||||
" b (ndarray (n,)): input vector with same dimension as a\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" x (scalar): \n",
|
||||
" \"\"\"\n",
|
||||
" x=0\n",
|
||||
" for i in range(a.shape[0]):\n",
|
||||
" x = x + a[i] * b[i]\n",
|
||||
" return x"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# test 1-D\n",
|
||||
"a = np.array([1, 2, 3, 4])\n",
|
||||
"b = np.array([-1, 4, 3, 2])\n",
|
||||
"print(f\"my_dot(a, b) = {my_dot(a, b)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note, the dot product is expected to return a scalar value. \n",
|
||||
"\n",
|
||||
"Let's try the same operations using `np.dot`. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# test 1-D\n",
|
||||
"a = np.array([1, 2, 3, 4])\n",
|
||||
"b = np.array([-1, 4, 3, 2])\n",
|
||||
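"# note: np.dot of two 1-D vectors returns a scalar (0-D result), so c.shape prints as ()\n",
|
||||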
"c = np.dot(a, b)\n",
|
||||
"print(f\"NumPy 1-D np.dot(a, b) = {c}, np.dot(a, b).shape = {c.shape} \") \n",
|
||||
"c = np.dot(b, a)\n",
|
||||
"print(f\"NumPy 1-D np.dot(b, a) = {c}, np.dot(a, b).shape = {c.shape} \")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Above, you will note that the results for 1-D matched our implementation."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.7\"></a>\n",
|
||||
"### 3.4.7 The Need for Speed: vector vs for loop\n",
|
||||
"We utilized the NumPy library because it improves speed memory efficiency. Let's demonstrate:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"np.random.seed(1)\n",
|
||||
"a = np.random.rand(10000000) # very large arrays\n",
|
||||
"b = np.random.rand(10000000)\n",
|
||||
"\n",
|
||||
"tic = time.time() # capture start time\n",
|
||||
"c = np.dot(a, b)\n",
|
||||
"toc = time.time() # capture end time\n",
|
||||
"\n",
|
||||
"print(f\"np.dot(a, b) = {c:.4f}\")\n",
|
||||
"print(f\"Vectorized version duration: {1000*(toc-tic):.4f} ms \")\n",
|
||||
"\n",
|
||||
"tic = time.time() # capture start time\n",
|
||||
"c = my_dot(a,b)\n",
|
||||
"toc = time.time() # capture end time\n",
|
||||
"\n",
|
||||
"print(f\"my_dot(a, b) = {c:.4f}\")\n",
|
||||
"print(f\"loop version duration: {1000*(toc-tic):.4f} ms \")\n",
|
||||
"\n",
|
||||
"del(a);del(b) #remove these big arrays from memory"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"So, vectorization provides a large speed up in this example. This is because NumPy makes better use of available data parallelism in the underlying hardware. GPU's and modern CPU's implement Single Instruction, Multiple Data (SIMD) pipelines allowing multiple operations to be issued in parallel. This is critical in Machine Learning where the data sets are often very large."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_12345_3.4.8\"></a>\n",
|
||||
"### 3.4.8 Vector Vector operations in Course 1\n",
|
||||
"Vector Vector operations will appear frequently in course 1. Here is why:\n",
|
||||
"- Going forward, our examples will be stored in an array, `X_train` of dimension (m,n). This will be explained more in context, but here it is important to note it is a 2 Dimensional array or matrix (see next section on matrices).\n",
|
||||
"- `w` will be a 1-dimensional vector of shape (n,).\n",
|
||||
"- we will perform operations by looping through the examples, extracting each example to work on individually by indexing X. For example:`X[i]`\n",
|
||||
"- `X[i]` returns a value of shape (n,), a 1-dimensional vector. Consequently, operations involving `X[i]` are often vector-vector. \n",
|
||||
"\n",
|
||||
"That is a somewhat lengthy explanation, but aligning and understanding the shapes of your operands is important when performing vector operations."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# show common Course 1 example\n",
|
||||
"X = np.array([[1],[2],[3],[4]])\n",
|
||||
"w = np.array([2])\n",
|
||||
"c = np.dot(X[1], w)\n",
|
||||
"\n",
|
||||
"print(f\"X[1] has shape {X[1].shape}\")\n",
|
||||
"print(f\"w has shape {w.shape}\")\n",
|
||||
"print(f\"c has shape {c.shape}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4\"></a>\n",
|
||||
"# 4 Matrices\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.1\"></a>\n",
|
||||
"## 4.1 Abstract\n",
|
||||
"Matrices, are two dimensional arrays. The elements of a matrix are all of the same type. In notation, matrices are denoted with capitol, bold letter such as $\\mathbf{X}$. In this and other labs, `m` is often the number of rows and `n` the number of columns. The elements of a matrix can be referenced with a two dimensional index. In math settings, numbers in the index typically run from 1 to n. In computer science and these labs, indexing will run from 0 to n-1. \n",
|
||||
"<figure>\n",
|
||||
" <center> <img src=\"./images/C1_W2_Lab04_Matrices.PNG\" alt='missing' width=900><center/>\n",
|
||||
" <figcaption> Generic Matrix Notation, 1st index is row, 2nd is column </figcaption>\n",
|
||||
"<figure/>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.2\"></a>\n",
|
||||
"## 4.2 NumPy Arrays\n",
|
||||
"\n",
|
||||
"NumPy's basic data structure is an indexable, n-dimensional *array* containing elements of the same type (`dtype`). These were described earlier. Matrices have a two-dimensional (2-D) index [m,n].\n",
|
||||
"\n",
|
||||
"In Course 1, 2-D matrices are used to hold training data. Training data is $m$ examples by $n$ features creating an (m,n) array. Course 1 does not do operations directly on matrices but typically extracts an example as a vector and operates on that. Below you will review: \n",
|
||||
"- data creation\n",
|
||||
"- slicing and indexing"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.3\"></a>\n",
|
||||
"## 4.3 Matrix Creation\n",
|
||||
"The same functions that created 1-D vectors will create 2-D or n-D arrays. Here are some examples\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Below, the shape tuple is provided to achieve a 2-D result. Notice how NumPy uses brackets to denote each dimension. Notice further than NumPy, when printing, will print one row per line.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.zeros((1, 5)) \n",
|
||||
"print(f\"a shape = {a.shape}, a = {a}\") \n",
|
||||
"\n",
|
||||
"a = np.zeros((2, 1)) \n",
|
||||
"print(f\"a shape = {a.shape}, a = {a}\") \n",
|
||||
"\n",
|
||||
"a = np.random.random_sample((1, 1)) \n",
|
||||
"print(f\"a shape = {a.shape}, a = {a}\") "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"One can also manually specify data. Dimensions are specified with additional brackets matching the format in the printing above."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill with user specified values\n",
|
||||
"a = np.array([[5], [4], [3]]); print(f\" a shape = {a.shape}, np.array: a = {a}\")\n",
|
||||
"a = np.array([[5], # One can also\n",
|
||||
" [4], # separate values\n",
|
||||
" [3]]); #into separate rows\n",
|
||||
"print(f\" a shape = {a.shape}, np.array: a = {a}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.4\"></a>\n",
|
||||
"## 4.4 Operations on Matrices\n",
|
||||
"Let's explore some operations using matrices."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.4.1\"></a>\n",
|
||||
"### 4.4.1 Indexing\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Matrices include a second index. The two indexes describe [row, column]. Access can either return an element or a row/column. See below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector indexing operations on matrices\n",
|
||||
"a = np.arange(6).reshape(-1, 2) #reshape is a convenient way to create matrices\n",
|
||||
"print(f\"a.shape: {a.shape}, \\na= {a}\")\n",
|
||||
"\n",
|
||||
"#access an element\n",
|
||||
"print(f\"\\na[2,0].shape: {a[2, 0].shape}, a[2,0] = {a[2, 0]}, type(a[2,0]) = {type(a[2, 0])} Accessing an element returns a scalar\\n\")\n",
|
||||
"\n",
|
||||
"#access a row\n",
|
||||
"print(f\"a[2].shape: {a[2].shape}, a[2] = {a[2]}, type(a[2]) = {type(a[2])}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"It is worth drawing attention to the last example. Accessing a matrix by just specifying the row will return a *1-D vector*."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Reshape** \n",
|
||||
"The previous example used [reshape](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html) to shape the array. \n",
|
||||
"`a = np.arange(6).reshape(-1, 2) ` \n",
|
||||
"This line of code first created a *1-D Vector* of six elements. It then reshaped that vector into a *2-D* array using the reshape command. This could have been written: \n",
|
||||
"`a = np.arange(6).reshape(3, 2) ` \n",
|
||||
"To arrive at the same 3 row, 2 column array.\n",
|
||||
"The -1 argument tells the routine to compute the number of rows given the size of the array and the number of columns.\n"
|
||||
]
|
||||
},
|
||||
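{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Illustrative check of the equivalence described above:\n",
|
||||
"# reshape(-1, 2) lets NumPy infer the number of rows, reshape(3, 2) states it explicitly\n",
|
||||
"print(np.arange(6).reshape(-1, 2))\n",
|
||||
"print(np.arange(6).reshape(3, 2))"
|
||||
]
|
||||
},
|
||||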
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.4.2\"></a>\n",
|
||||
"### 4.4.2 Slicing\n",
|
||||
"Slicing creates an array of indices using a set of three values (`start:stop:step`). A subset of values is also valid. Its use is best explained by example:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector 2-D slicing operations\n",
|
||||
"a = np.arange(20).reshape(-1, 10)\n",
|
||||
"print(f\"a = \\n{a}\")\n",
|
||||
"\n",
|
||||
"#access 5 consecutive elements (start:stop:step)\n",
|
||||
"print(\"a[0, 2:7:1] = \", a[0, 2:7:1], \", a[0, 2:7:1].shape =\", a[0, 2:7:1].shape, \"a 1-D array\")\n",
|
||||
"\n",
|
||||
"#access 5 consecutive elements (start:stop:step) in two rows\n",
|
||||
"print(\"a[:, 2:7:1] = \\n\", a[:, 2:7:1], \", a[:, 2:7:1].shape =\", a[:, 2:7:1].shape, \"a 2-D array\")\n",
|
||||
"\n",
|
||||
"# access all elements\n",
|
||||
"print(\"a[:,:] = \\n\", a[:,:], \", a[:,:].shape =\", a[:,:].shape)\n",
|
||||
"\n",
|
||||
"# access all elements in one row (very common usage)\n",
|
||||
"print(\"a[1,:] = \", a[1,:], \", a[1,:].shape =\", a[1,:].shape, \"a 1-D array\")\n",
|
||||
"# same as\n",
|
||||
"print(\"a[1] = \", a[1], \", a[1].shape =\", a[1].shape, \"a 1-D array\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_5.0\"></a>\n",
|
||||
"## Congratulations!\n",
|
||||
"In this lab you mastered the features of Python and NumPy that are needed for Course 1."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"dl_toc_settings": {
|
||||
"rndtag": "40015"
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.10"
|
||||
},
|
||||
"toc-autonumbering": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
648
work/C1_W2_Lab02_Multiple_Variable_Soln.ipynb
Normal file
@ -0,0 +1,648 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Multiple Variable Linear Regression\n",
|
||||
"\n",
|
||||
"In this lab, you will extend the data structures and previously developed routines to support multiple features. Several routines are updated making the lab appear lengthy, but it makes minor adjustments to previous routines making it quick to review.\n",
|
||||
"# Outline\n",
|
||||
"- [ 1.1 Goals](#toc_15456_1.1)\n",
|
||||
"- [ 1.2 Tools](#toc_15456_1.2)\n",
|
||||
"- [ 1.3 Notation](#toc_15456_1.3)\n",
|
||||
"- [2 Problem Statement](#toc_15456_2)\n",
|
||||
"- [ 2.1 Matrix X containing our examples](#toc_15456_2.1)\n",
|
||||
"- [ 2.2 Parameter vector w, b](#toc_15456_2.2)\n",
|
||||
"- [3 Model Prediction With Multiple Variables](#toc_15456_3)\n",
|
||||
"- [ 3.1 Single Prediction element by element](#toc_15456_3.1)\n",
|
||||
"- [ 3.2 Single Prediction, vector](#toc_15456_3.2)\n",
|
||||
"- [4 Compute Cost With Multiple Variables](#toc_15456_4)\n",
|
||||
"- [5 Gradient Descent With Multiple Variables](#toc_15456_5)\n",
|
||||
"- [ 5.1 Compute Gradient with Multiple Variables](#toc_15456_5.1)\n",
|
||||
"- [ 5.2 Gradient Descent With Multiple Variables](#toc_15456_5.2)\n",
|
||||
"- [6 Congratulations](#toc_15456_6)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_1.1\"></a>\n",
|
||||
"## 1.1 Goals\n",
|
||||
"- Extend our regression model routines to support multiple features\n",
|
||||
" - Extend data structures to support multiple features\n",
|
||||
" - Rewrite prediction, cost and gradient routines to support multiple features\n",
|
||||
" - Utilize NumPy `np.dot` to vectorize their implementations for speed and simplicity"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_1.2\"></a>\n",
|
||||
"## 1.2 Tools\n",
|
||||
"In this lab, we will make use of: \n",
|
||||
"- NumPy, a popular library for scientific computing\n",
|
||||
"- Matplotlib, a popular library for plotting data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import copy, math\n",
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"plt.style.use('./deeplearning.mplstyle')\n",
|
||||
"np.set_printoptions(precision=2) # reduced display precision on numpy arrays"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_1.3\"></a>\n",
|
||||
"## 1.3 Notation\n",
|
||||
"Here is a summary of some of the notation you will encounter, updated for multiple features. \n",
|
||||
"\n",
|
||||
"|General <img width=70/> <br /> Notation <img width=70/> | Description<img width=350/>| Python (if applicable) |\n",
|
||||
"|: ------------|: ------------------------------------------------------------||\n",
|
||||
"| $a$ | scalar, non bold ||\n",
|
||||
"| $\\mathbf{a}$ | vector, bold ||\n",
|
||||
"| $\\mathbf{A}$ | matrix, bold capital ||\n",
|
||||
"| **Regression** | | | |\n",
|
||||
"| $\\mathbf{X}$ | training example matrix | `X_train` | \n",
|
||||
"| $\\mathbf{y}$ | training example targets | `y_train` \n",
|
||||
"| $\\mathbf{x}^{(i)}$, $y^{(i)}$ | $i_{th}$Training Example | `X[i]`, `y[i]`|\n",
|
||||
"| m | number of training examples | `m`|\n",
|
||||
"| n | number of features in each example | `n`|\n",
|
||||
"| $\\mathbf{w}$ | parameter: weight, | `w` |\n",
|
||||
"| $b$ | parameter: bias | `b` | \n",
|
||||
"| $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ | The result of the model evaluation at $\\mathbf{x^{(i)}}$ parameterized by $\\mathbf{w},b$: $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = \\mathbf{w} \\cdot \\mathbf{x}^{(i)}+b$ | `f_wb` | \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_2\"></a>\n",
|
||||
"# 2 Problem Statement\n",
|
||||
"\n",
|
||||
"You will use the motivating example of housing price prediction. The training dataset contains three examples with four features (size, bedrooms, floors and, age) shown in the table below. Note that, unlike the earlier labs, size is in sqft rather than 1000 sqft. This causes an issue, which you will solve in the next lab!\n",
|
||||
"\n",
|
||||
"| Size (sqft) | Number of Bedrooms | Number of floors | Age of Home | Price (1000s dollars) | \n",
|
||||
"| ----------------| ------------------- |----------------- |--------------|-------------- | \n",
|
||||
"| 2104 | 5 | 1 | 45 | 460 | \n",
|
||||
"| 1416 | 3 | 2 | 40 | 232 | \n",
|
||||
"| 852 | 2 | 1 | 35 | 178 | \n",
|
||||
"\n",
|
||||
"You will build a linear regression model using these values so you can then predict the price for other houses. For example, a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old. \n",
|
||||
"\n",
|
||||
"Please run the following code cell to create your `X_train` and `y_train` variables."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])\n",
|
||||
"y_train = np.array([460, 232, 178])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_2.1\"></a>\n",
|
||||
"## 2.1 Matrix X containing our examples\n",
|
||||
"Similar to the table above, examples are stored in a NumPy matrix `X_train`. Each row of the matrix represents one example. When you have $m$ training examples ( $m$ is three in our example), and there are $n$ features (four in our example), $\\mathbf{X}$ is a matrix with dimensions ($m$, $n$) (m rows, n columns).\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"$$\\mathbf{X} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
" x^{(0)}_0 & x^{(0)}_1 & \\cdots & x^{(0)}_{n-1} \\\\ \n",
|
||||
" x^{(1)}_0 & x^{(1)}_1 & \\cdots & x^{(1)}_{n-1} \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" x^{(m-1)}_0 & x^{(m-1)}_1 & \\cdots & x^{(m-1)}_{n-1} \n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"notation:\n",
|
||||
"- $\\mathbf{x}^{(i)}$ is vector containing example i. $\\mathbf{x}^{(i)}$ $ = (x^{(i)}_0, x^{(i)}_1, \\cdots,x^{(i)}_{n-1})$\n",
|
||||
"- $x^{(i)}_j$ is element j in example i. The superscript in parenthesis indicates the example number while the subscript represents an element. \n",
|
||||
"\n",
|
||||
"Display the input data."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# data is stored in numpy array/matrix\n",
|
||||
"print(f\"X Shape: {X_train.shape}, X Type:{type(X_train)})\")\n",
|
||||
"print(X_train)\n",
|
||||
"print(f\"y Shape: {y_train.shape}, y Type:{type(y_train)})\")\n",
|
||||
"print(y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_2.2\"></a>\n",
|
||||
"## 2.2 Parameter vector w, b\n",
|
||||
"\n",
|
||||
"* $\\mathbf{w}$ is a vector with $n$ elements.\n",
|
||||
" - Each element contains the parameter associated with one feature.\n",
|
||||
" - in our dataset, n is 4.\n",
|
||||
" - notionally, we draw this as a column vector\n",
|
||||
"\n",
|
||||
"$$\\mathbf{w} = \\begin{pmatrix}\n",
|
||||
"w_0 \\\\ \n",
|
||||
"w_1 \\\\\n",
|
||||
"\\cdots\\\\\n",
|
||||
"w_{n-1}\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"* $b$ is a scalar parameter. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For demonstration, $\\mathbf{w}$ and $b$ will be loaded with some initial selected values that are near the optimal. $\\mathbf{w}$ is a 1-D NumPy vector."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"b_init = 785.1811367994083\n",
|
||||
"w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])\n",
|
||||
"print(f\"w_init shape: {w_init.shape}, b_init type: {type(b_init)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_3\"></a>\n",
|
||||
"# 3 Model Prediction With Multiple Variables\n",
|
||||
"The model's prediction with multiple variables is given by the linear model:\n",
|
||||
"\n",
|
||||
"$$ f_{\\mathbf{w},b}(\\mathbf{x}) = w_0x_0 + w_1x_1 +... + w_{n-1}x_{n-1} + b \\tag{1}$$\n",
|
||||
"or in vector notation:\n",
|
||||
"$$ f_{\\mathbf{w},b}(\\mathbf{x}) = \\mathbf{w} \\cdot \\mathbf{x} + b \\tag{2} $$ \n",
|
||||
"where $\\cdot$ is a vector `dot product`\n",
|
||||
"\n",
|
||||
"To demonstrate the dot product, we will implement prediction using (1) and (2)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_3.1\"></a>\n",
|
||||
"## 3.1 Single Prediction element by element\n",
|
||||
"Our previous prediction multiplied one feature value by one parameter and added a bias parameter. A direct extension of our previous implementation of prediction to multiple features would be to implement (1) above using loop over each element, performing the multiply with its parameter and then adding the bias parameter at the end.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def predict_single_loop(x, w, b): \n",
|
||||
" \"\"\"\n",
|
||||
" single predict using linear regression\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" x (ndarray): Shape (n,) example with multiple features\n",
|
||||
" w (ndarray): Shape (n,) model parameters \n",
|
||||
" b (scalar): model parameter \n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" p (scalar): prediction\n",
|
||||
" \"\"\"\n",
|
||||
" n = x.shape[0]\n",
|
||||
" p = 0\n",
|
||||
" for i in range(n):\n",
|
||||
" p_i = x[i] * w[i] \n",
|
||||
" p = p + p_i \n",
|
||||
" p = p + b \n",
|
||||
" return p"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# get a row from our training data\n",
|
||||
"x_vec = X_train[0,:]\n",
|
||||
"print(f\"x_vec shape {x_vec.shape}, x_vec value: {x_vec}\")\n",
|
||||
"\n",
|
||||
"# make a prediction\n",
|
||||
"f_wb = predict_single_loop(x_vec, w_init, b_init)\n",
|
||||
"print(f\"f_wb shape {f_wb.shape}, prediction: {f_wb}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note the shape of `x_vec`. It is a 1-D NumPy vector with 4 elements, (4,). The result, `f_wb` is a scalar."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_3.2\"></a>\n",
|
||||
"## 3.2 Single Prediction, vector\n",
|
||||
"\n",
|
||||
"Noting that equation (1) above can be implemented using the dot product as in (2) above. We can make use of vector operations to speed up predictions.\n",
|
||||
"\n",
|
||||
"Recall from the Python/Numpy lab that NumPy `np.dot()`[[link](https://numpy.org/doc/stable/reference/generated/numpy.dot.html)] can be used to perform a vector dot product. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def predict(x, w, b): \n",
|
||||
" \"\"\"\n",
|
||||
" single predict using linear regression\n",
|
||||
" Args:\n",
|
||||
" x (ndarray): Shape (n,) example with multiple features\n",
|
||||
" w (ndarray): Shape (n,) model parameters \n",
|
||||
" b (scalar): model parameter \n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" p (scalar): prediction\n",
|
||||
" \"\"\"\n",
|
||||
" p = np.dot(x, w) + b \n",
|
||||
" return p "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# get a row from our training data\n",
|
||||
"x_vec = X_train[0,:]\n",
|
||||
"print(f\"x_vec shape {x_vec.shape}, x_vec value: {x_vec}\")\n",
|
||||
"\n",
|
||||
"# make a prediction\n",
|
||||
"f_wb = predict(x_vec,w_init, b_init)\n",
|
||||
"print(f\"f_wb shape {f_wb.shape}, prediction: {f_wb}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The results and shapes are the same as the previous version which used looping. Going forward, `np.dot` will be used for these operations. The prediction is now a single statement. Most routines will implement it directly rather than calling a separate predict routine."
|
||||
]
|
||||
},
|
||||
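{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Illustrative sketch: the same dot product can predict all m examples at once.\n",
|
||||
"# X_train (m,n) matrix-multiplied with w_init (n,) yields an (m,) vector of predictions.\n",
|
||||
"f_wb_all = X_train @ w_init + b_init\n",
|
||||
"print(f\"predictions for all training examples: {f_wb_all}\")"
|
||||
]
|
||||
},
|
||||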
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_4\"></a>\n",
|
||||
"# 4 Compute Cost With Multiple Variables\n",
|
||||
"The equation for the cost function with multiple variables $J(\\mathbf{w},b)$ is:\n",
|
||||
"$$J(\\mathbf{w},b) = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})^2 \\tag{3}$$ \n",
|
||||
"where:\n",
|
||||
"$$ f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = \\mathbf{w} \\cdot \\mathbf{x}^{(i)} + b \\tag{4} $$ \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"In contrast to previous labs, $\\mathbf{w}$ and $\\mathbf{x}^{(i)}$ are vectors rather than scalars supporting multiple features."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Below is an implementation of equations (3) and (4). Note that this uses a *standard pattern for this course* where a for loop over all `m` examples is used."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def compute_cost(X, y, w, b): \n",
|
||||
" \"\"\"\n",
|
||||
" compute cost\n",
|
||||
" Args:\n",
|
||||
" X (ndarray (m,n)): Data, m examples with n features\n",
|
||||
" y (ndarray (m,)) : target values\n",
|
||||
" w (ndarray (n,)) : model parameters \n",
|
||||
" b (scalar) : model parameter\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" cost (scalar): cost\n",
|
||||
" \"\"\"\n",
|
||||
" m = X.shape[0]\n",
|
||||
" cost = 0.0\n",
|
||||
" for i in range(m): \n",
|
||||
" f_wb_i = np.dot(X[i], w) + b #(n,)(n,) = scalar (see np.dot)\n",
|
||||
" cost = cost + (f_wb_i - y[i])**2 #scalar\n",
|
||||
" cost = cost / (2 * m) #scalar \n",
|
||||
" return cost"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Compute and display cost using our pre-chosen optimal parameters. \n",
|
||||
"cost = compute_cost(X_train, y_train, w_init, b_init)\n",
|
||||
"print(f'Cost at optimal w : {cost}')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Result**: Cost at optimal w : 1.5578904045996674e-12"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_5\"></a>\n",
|
||||
"# 5 Gradient Descent With Multiple Variables\n",
|
||||
"Gradient descent for multiple variables:\n",
|
||||
"\n",
|
||||
"$$\\begin{align*} \\text{repeat}&\\text{ until convergence:} \\; \\lbrace \\newline\\;\n",
|
||||
"& w_j = w_j - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} \\tag{5} \\; & \\text{for j = 0..n-1}\\newline\n",
|
||||
"&b\\ \\ = b - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial b} \\newline \\rbrace\n",
|
||||
"\\end{align*}$$\n",
|
||||
"\n",
|
||||
"where, n is the number of features, parameters $w_j$, $b$, are updated simultaneously and where \n",
|
||||
"\n",
|
||||
"$$\n",
|
||||
"\\begin{align}\n",
|
||||
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \\tag{6} \\\\\n",
|
||||
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial b} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)}) \\tag{7}\n",
|
||||
"\\end{align}\n",
|
||||
"$$\n",
|
||||
"* m is the number of training examples in the data set\n",
|
||||
"\n",
|
||||
" \n",
|
||||
"* $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_5.1\"></a>\n",
|
||||
"## 5.1 Compute Gradient with Multiple Variables\n",
|
||||
"An implementation for calculating the equations (6) and (7) is below. There are many ways to implement this. In this version, there is an\n",
|
||||
"- outer loop over all m examples. \n",
|
||||
" - $\\frac{\\partial J(\\mathbf{w},b)}{\\partial b}$ for the example can be computed directly and accumulated\n",
|
||||
" - in a second loop over all n features:\n",
|
||||
" - $\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j}$ is computed for each $w_j$.\n",
|
||||
" "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def compute_gradient(X, y, w, b): \n",
|
||||
" \"\"\"\n",
|
||||
" Computes the gradient for linear regression \n",
|
||||
" Args:\n",
|
||||
" X (ndarray (m,n)): Data, m examples with n features\n",
|
||||
" y (ndarray (m,)) : target values\n",
|
||||
" w (ndarray (n,)) : model parameters \n",
|
||||
" b (scalar) : model parameter\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w. \n",
|
||||
" dj_db (scalar): The gradient of the cost w.r.t. the parameter b. \n",
|
||||
" \"\"\"\n",
|
||||
" m,n = X.shape #(number of examples, number of features)\n",
|
||||
" dj_dw = np.zeros((n,))\n",
|
||||
" dj_db = 0.\n",
|
||||
"\n",
|
||||
" for i in range(m): \n",
|
||||
" err = (np.dot(X[i], w) + b) - y[i] \n",
|
||||
" for j in range(n): \n",
|
||||
" dj_dw[j] = dj_dw[j] + err * X[i, j] \n",
|
||||
" dj_db = dj_db + err \n",
|
||||
" dj_dw = dj_dw / m \n",
|
||||
" dj_db = dj_db / m \n",
|
||||
" \n",
|
||||
" return dj_db, dj_dw"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Compute and display gradient \n",
|
||||
"tmp_dj_db, tmp_dj_dw = compute_gradient(X_train, y_train, w_init, b_init)\n",
|
||||
"print(f'dj_db at initial w,b: {tmp_dj_db}')\n",
|
||||
"print(f'dj_dw at initial w,b: \\n {tmp_dj_dw}')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Result**: \n",
|
||||
"dj_db at initial w,b: -1.6739251122999121e-06 \n",
|
||||
"dj_dw at initial w,b: \n",
|
||||
" [-2.73e-03 -6.27e-06 -2.22e-06 -6.92e-05] "
|
||||
]
|
||||
},
|
||||
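{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Optional illustrative sketch: equations (6) and (7) computed without the explicit loops.\n",
|
||||
"# err is the (m,) vector of prediction errors; X.T @ err sums err[i] * X[i, j] over i for each j.\n",
|
||||
"err = X_train @ w_init + b_init - y_train\n",
|
||||
"print(f\"dj_db (vectorized): {np.sum(err) / X_train.shape[0]}\")\n",
|
||||
"print(f\"dj_dw (vectorized): {X_train.T @ err / X_train.shape[0]}\")"
|
||||
]
|
||||
},
|
||||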
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_5.2\"></a>\n",
|
||||
"## 5.2 Gradient Descent With Multiple Variables\n",
|
||||
"The routine below implements equation (5) above."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters): \n",
|
||||
" \"\"\"\n",
|
||||
" Performs batch gradient descent to learn w and b. Updates w and b by taking \n",
|
||||
" num_iters gradient steps with learning rate alpha\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X (ndarray (m,n)) : Data, m examples with n features\n",
|
||||
" y (ndarray (m,)) : target values\n",
|
||||
" w_in (ndarray (n,)) : initial model parameters \n",
|
||||
" b_in (scalar) : initial model parameter\n",
|
||||
" cost_function : function to compute cost\n",
|
||||
" gradient_function : function to compute the gradient\n",
|
||||
" alpha (float) : Learning rate\n",
|
||||
" num_iters (int) : number of iterations to run gradient descent\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" w (ndarray (n,)) : Updated values of parameters \n",
|
||||
" b (scalar) : Updated value of parameter \n",
|
||||
" \"\"\"\n",
|
||||
" \n",
|
||||
" # An array to store cost J and w's at each iteration primarily for graphing later\n",
|
||||
" J_history = []\n",
|
||||
" w = copy.deepcopy(w_in) #avoid modifying global w within function\n",
|
||||
" b = b_in\n",
|
||||
" \n",
|
||||
" for i in range(num_iters):\n",
|
||||
"\n",
|
||||
" # Calculate the gradient and update the parameters\n",
|
||||
" dj_db,dj_dw = gradient_function(X, y, w, b) ##None\n",
|
||||
"\n",
|
||||
" # Update Parameters using w, b, alpha and gradient\n",
|
||||
" w = w - alpha * dj_dw ##None\n",
|
||||
" b = b - alpha * dj_db ##None\n",
|
||||
" \n",
|
||||
" # Save cost J at each iteration\n",
|
||||
" if i<100000: # prevent resource exhaustion \n",
|
||||
" J_history.append( cost_function(X, y, w, b))\n",
|
||||
"\n",
|
||||
" # Print cost every at intervals 10 times or as many iterations if < 10\n",
|
||||
" if i% math.ceil(num_iters / 10) == 0:\n",
|
||||
" print(f\"Iteration {i:4d}: Cost {J_history[-1]:8.2f} \")\n",
|
||||
" \n",
|
||||
" return w, b, J_history #return final w,b and J history for graphing"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In the next cell you will test the implementation. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# initialize parameters\n",
|
||||
"initial_w = np.zeros_like(w_init)\n",
|
||||
"initial_b = 0.\n",
|
||||
"# some gradient descent settings\n",
|
||||
"iterations = 1000\n",
|
||||
"alpha = 5.0e-7\n",
|
||||
"# run gradient descent \n",
|
||||
"w_final, b_final, J_hist = gradient_descent(X_train, y_train, initial_w, initial_b,\n",
|
||||
" compute_cost, compute_gradient, \n",
|
||||
" alpha, iterations)\n",
|
||||
"print(f\"b,w found by gradient descent: {b_final:0.2f},{w_final} \")\n",
|
||||
"m,_ = X_train.shape\n",
|
||||
"for i in range(m):\n",
|
||||
" print(f\"prediction: {np.dot(X_train[i], w_final) + b_final:0.2f}, target value: {y_train[i]}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Result**: \n",
|
||||
"b,w found by gradient descent: -0.00,[ 0.2 0. -0.01 -0.07] \n",
|
||||
"prediction: 426.19, target value: 460 \n",
|
||||
"prediction: 286.17, target value: 232 \n",
|
||||
"prediction: 171.47, target value: 178 "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# plot cost versus iteration \n",
|
||||
"fig, (ax1, ax2) = plt.subplots(1, 2, constrained_layout=True, figsize=(12, 4))\n",
|
||||
"ax1.plot(J_hist)\n",
|
||||
"ax2.plot(100 + np.arange(len(J_hist[100:])), J_hist[100:])\n",
|
||||
"ax1.set_title(\"Cost vs. iteration\"); ax2.set_title(\"Cost vs. iteration (tail)\")\n",
|
||||
"ax1.set_ylabel('Cost') ; ax2.set_ylabel('Cost') \n",
|
||||
"ax1.set_xlabel('iteration step') ; ax2.set_xlabel('iteration step') \n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"*These results are not inspiring*! Cost is still declining and our predictions are not very accurate. The next lab will explore how to improve on this."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"<a name=\"toc_15456_6\"></a>\n",
|
||||
"# 6 Congratulations!\n",
|
||||
"In this lab you:\n",
|
||||
"- Redeveloped the routines for linear regression, now with multiple variables.\n",
|
||||
"- Utilized NumPy `np.dot` to vectorize the implementations"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"dl_toc_settings": {
|
||||
"rndtag": "15456"
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.6"
|
||||
},
|
||||
"toc-autonumbering": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
666
work/C1_W2_Lab03_Feature_Scaling_and_Learning_Rate_Soln.ipynb
Normal file
@ -0,0 +1,666 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Feature scaling and Learning Rate (Multi-variable)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Goals\n",
|
||||
"In this lab you will:\n",
|
||||
"- Utilize the multiple variables routines developed in the previous lab\n",
|
||||
"- run Gradient Descent on a data set with multiple features\n",
|
||||
"- explore the impact of the *learning rate alpha* on gradient descent\n",
|
||||
"- improve performance of gradient descent by *feature scaling* using z-score normalization"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"You will utilize the functions developed in the last lab as well as matplotlib and NumPy. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"from lab_utils_multi import load_house_data, run_gradient_descent \n",
|
||||
"from lab_utils_multi import norm_plot, plt_equal_scale, plot_cost_i_w\n",
|
||||
"from lab_utils_common import dlc\n",
|
||||
"np.set_printoptions(precision=2)\n",
|
||||
"plt.style.use('./deeplearning.mplstyle')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Notation\n",
|
||||
"\n",
|
||||
"|General <br /> Notation | Description| Python (if applicable) |\n",
|
||||
"|: ------------|: ------------------------------------------------------------||\n",
|
||||
"| $a$ | scalar, non bold ||\n",
|
||||
"| $\\mathbf{a}$ | vector, bold ||\n",
|
||||
"| $\\mathbf{A}$ | matrix, bold capital ||\n",
|
||||
"| **Regression** | | | |\n",
|
||||
"| $\\mathbf{X}$ | training example maxtrix | `X_train` | \n",
|
||||
"| $\\mathbf{y}$ | training example targets | `y_train` \n",
|
||||
"| $\\mathbf{x}^{(i)}$, $y^{(i)}$ | $i_{th}$Training Example | `X[i]`, `y[i]`|\n",
|
||||
"| m | number of training examples | `m`|\n",
|
||||
"| n | number of features in each example | `n`|\n",
|
||||
"| $\\mathbf{w}$ | parameter: weight, | `w` |\n",
|
||||
"| $b$ | parameter: bias | `b` | \n",
|
||||
"| $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ | The result of the model evaluation at $\\mathbf{x}^{(i)}$ parameterized by $\\mathbf{w},b$: $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = \\mathbf{w} \\cdot \\mathbf{x}^{(i)}+b$ | `f_wb` | \n",
|
||||
"|$\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j}$| the gradient or partial derivative of cost with respect to a parameter $w_j$ |`dj_dw[j]`| \n",
|
||||
"|$\\frac{\\partial J(\\mathbf{w},b)}{\\partial b}$| the gradient or partial derivative of cost with respect to a parameter $b$| `dj_db`|"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Problem Statement\n",
|
||||
"\n",
|
||||
"As in the previous labs, you will use the motivating example of housing price prediction. The training data set contains many examples with 4 features (size, bedrooms, floors and age) shown in the table below. Note, in this lab, the Size feature is in sqft while earlier labs utilized 1000 sqft. This data set is larger than the previous lab.\n",
|
||||
"\n",
|
||||
"We would like to build a linear regression model using these values so we can then predict the price for other houses - say, a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old. \n",
|
||||
"\n",
|
||||
"## Dataset: \n",
|
||||
"| Size (sqft) | Number of Bedrooms | Number of floors | Age of Home | Price (1000s dollars) | \n",
|
||||
"| ----------------| ------------------- |----------------- |--------------|----------------------- | \n",
|
||||
"| 952 | 2 | 1 | 65 | 271.5 | \n",
|
||||
"| 1244 | 3 | 2 | 64 | 232 | \n",
|
||||
"| 1947 | 3 | 2 | 17 | 509.8 | \n",
|
||||
"| ... | ... | ... | ... | ... |\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# load the dataset\n",
|
||||
"X_train, y_train = load_house_data()\n",
|
||||
"X_features = ['size(sqft)','bedrooms','floors','age']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's view the dataset and its features by plotting each feature versus price."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"fig,ax=plt.subplots(1, 4, figsize=(12, 3), sharey=True)\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" ax[i].scatter(X_train[:,i],y_train)\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
"ax[0].set_ylabel(\"Price (1000's)\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Plotting each feature vs. the target, price, provides some indication of which features have the strongest influence on price. Above, increasing size also increases price. Bedrooms and floors don't seem to have a strong impact on price. Newer houses have higher prices than older houses."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_5\"></a>\n",
|
||||
"## Gradient Descent With Multiple Variables\n",
|
||||
"Here are the equations you developed in the last lab on gradient descent for multiple variables.:\n",
|
||||
"\n",
|
||||
"$$\\begin{align*} \\text{repeat}&\\text{ until convergence:} \\; \\lbrace \\newline\\;\n",
|
||||
"& w_j := w_j - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} \\tag{1} \\; & \\text{for j = 0..n-1}\\newline\n",
|
||||
"&b\\ \\ := b - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial b} \\newline \\rbrace\n",
|
||||
"\\end{align*}$$\n",
|
||||
"\n",
|
||||
"where, n is the number of features, parameters $w_j$, $b$, are updated simultaneously and where \n",
|
||||
"\n",
|
||||
"$$\n",
|
||||
"\\begin{align}\n",
|
||||
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \\tag{2} \\\\\n",
|
||||
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial b} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)}) \\tag{3}\n",
|
||||
"\\end{align}\n",
|
||||
"$$\n",
|
||||
"* m is the number of training examples in the data set\n",
|
||||
"\n",
|
||||
" \n",
|
||||
"* $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value\n"
|
||||
]
|
||||
},
|
||||
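{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As a quick refresher, the cell below is a minimal NumPy sketch of equations (2) and (3). It is illustrative only: the function name is ours, it assumes `X`, `y`, `w`, `b` with the shapes used in this course, and the lab itself uses the `run_gradient_descent` routine imported from `lab_utils_multi`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def compute_gradient_sketch(X, y, w, b):\n",
|
||||
"    # minimal sketch of equations (2) and (3); not used by the lab's routines\n",
|
||||
"    m = X.shape[0]\n",
|
||||
"    err = X @ w + b - y          # (m,) prediction error, f_wb - y\n",
|
||||
"    dj_dw = (X.T @ err) / m      # (n,) equation (2)\n",
|
||||
"    dj_db = np.sum(err) / m      # scalar, equation (3)\n",
|
||||
"    return dj_dw, dj_db"
|
||||
]
|
||||
},
|
||||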
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Learning Rate\n",
|
||||
"<figure>\n",
|
||||
" <img src=\"./images/C1_W2_Lab06_learningrate.PNG\" style=\"width:1200px;\" >\n",
|
||||
"</figure>\n",
|
||||
"The lectures discussed some of the issues related to setting the learning rate $\\alpha$. The learning rate controls the size of the update to the parameters. See equation (1) above. It is shared by all the parameters. \n",
|
||||
"\n",
|
||||
"Let's run gradient descent and try a few settings of $\\alpha$ on our data set"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### $\\alpha$ = 9.9e-7"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#set alpha to 9.9e-7\n",
|
||||
"_, _, hist = run_gradient_descent(X_train, y_train, 10, alpha = 9.9e-7)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"It appears the learning rate is too high. The solution does not converge. Cost is *increasing* rather than decreasing. Let's plot the result:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"plot_cost_i_w(X_train, y_train, hist)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The plot on the right shows the value of one of the parameters, $w_0$. At each iteration, it is overshooting the optimal value and as a result, cost ends up *increasing* rather than approaching the minimum. Note that this is not a completely accurate picture as there are 4 parameters being modified each pass rather than just one. This plot is only showing $w_0$ with the other parameters fixed at benign values. In this and later plots you may notice the blue and orange lines being slightly off."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"### $\\alpha$ = 9e-7\n",
|
||||
"Let's try a bit smaller value and see what happens."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#set alpha to 9e-7\n",
|
||||
"_,_,hist = run_gradient_descent(X_train, y_train, 10, alpha = 9e-7)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Cost is decreasing throughout the run showing that alpha is not too large. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"plot_cost_i_w(X_train, y_train, hist)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"On the left, you see that cost is decreasing as it should. On the right, you can see that $w_0$ is still oscillating around the minimum, but it is decreasing each iteration rather than increasing. Note above that `dj_dw[0]` changes sign with each iteration as `w[0]` jumps over the optimal value.\n",
|
||||
"This alpha value will converge. You can vary the number of iterations to see how it behaves."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### $\\alpha$ = 1e-7\n",
|
||||
"Let's try a bit smaller value for $\\alpha$ and see what happens."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#set alpha to 1e-7\n",
|
||||
"_,_,hist = run_gradient_descent(X_train, y_train, 10, alpha = 1e-7)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Cost is decreasing throughout the run showing that $\\alpha$ is not too large. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"plot_cost_i_w(X_train,y_train,hist)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"On the left, you see that cost is decreasing as it should. On the right you can see that $w_0$ is decreasing without crossing the minimum. Note above that `dj_w0` is negative throughout the run. This solution will also converge, though not quite as quickly as the previous example."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Feature Scaling \n",
|
||||
"<figure>\n",
|
||||
" <img src=\"./images/C1_W2_Lab06_featurescalingheader.PNG\" style=\"width:1200px;\" >\n",
|
||||
"</figure>\n",
|
||||
"The lectures described the importance of rescaling the dataset so the features have a similar range.\n",
|
||||
"If you are interested in the details of why this is the case, click on the 'details' header below. If not, the section below will walk through an implementation of how to do feature scaling."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Details</b></font>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"Let's look again at the situation with $\\alpha$ = 9e-7. This is pretty close to the maximum value we can set $\\alpha$ to without diverging. This is a short run showing the first few iterations:\n",
|
||||
"\n",
|
||||
"<figure>\n",
|
||||
" <img src=\"./images/C1_W2_Lab06_ShortRun.PNG\" style=\"width:1200px;\" >\n",
|
||||
"</figure>\n",
|
||||
"\n",
|
||||
"Above, while cost is being decreased, its clear that $w_0$ is making more rapid progress than the other parameters due to its much larger gradient.\n",
|
||||
"\n",
|
||||
"The graphic below shows the result of a very long run with $\\alpha$ = 9e-7. This takes several hours.\n",
|
||||
"\n",
|
||||
"<figure>\n",
|
||||
" <img src=\"./images/C1_W2_Lab06_LongRun.PNG\" style=\"width:1200px;\" >\n",
|
||||
"</figure>\n",
|
||||
" \n",
|
||||
"Above, you can see cost decreased slowly after its initial reduction. Notice the difference between `w0` and `w1`,`w2`,`w3` as well as `dj_dw0` and `dj_dw1-3`. `w0` reaches its near final value very quickly and `dj_dw0` has quickly decreased to a small value showing that `w0` is near the final value. The other parameters were reduced much more slowly.\n",
|
||||
"\n",
|
||||
"Why is this? Is there something we can improve? See below:\n",
|
||||
"<figure>\n",
|
||||
" <center> <img src=\"./images/C1_W2_Lab06_scale.PNG\" ></center>\n",
|
||||
"</figure> \n",
|
||||
"\n",
|
||||
"The figure above shows why $w$'s are updated unevenly. \n",
|
||||
"- $\\alpha$ is shared by all parameter updates ($w$'s and $b$).\n",
|
||||
"- the common error term is multiplied by the features for the $w$'s. (not $b$).\n",
|
||||
"- the features vary significantly in magnitude making some features update much faster than others. In this case, $w_0$ is multiplied by 'size(sqft)', which is generally > 1000, while $w_1$ is multiplied by 'number of bedrooms', which is generally 2-4. \n",
|
||||
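"- as a concrete illustration, with a prediction error of 10, the gradient contribution to $w_0$ is roughly $10 \\times 1000 = 10{,}000$, while the contribution to $w_1$ is roughly $10 \\times 3 = 30$, so $w_0$ takes steps about 300 times larger per update.\n",
|
||||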
" \n",
|
||||
"The solution is Feature Scaling."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The lectures discussed three different techniques: \n",
|
||||
"- Feature scaling, essentially dividing each positive feature by its maximum value, or more generally, rescale each feature by both its minimum and maximum values using (x-min)/(max-min). Both ways normalizes features to the range of -1 and 1, where the former method works for positive features which is simple and serves well for the lecture's example, and the latter method works for any features.\n",
|
||||
"- Mean normalization: $x_i := \\dfrac{x_i - \\mu_i}{max - min} $ \n",
|
||||
"- Z-score normalization which we will explore below. "
|
||||
]
|
||||
},
|
||||
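{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The cell below is a minimal sketch of the first two techniques (z-score normalization is implemented in the next section). The helper names are illustrative only and are not used elsewhere in the lab."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def max_scale(X):\n",
|
||||
"    # divide each (positive) feature by its maximum value -> roughly 0..1\n",
|
||||
"    return X / np.max(X, axis=0)\n",
|
||||
"\n",
|
||||
"def mean_normalize(X):\n",
|
||||
"    # (x - mean) / (max - min) -> roughly -1..1, centered on zero\n",
|
||||
"    return (X - np.mean(X, axis=0)) / (np.max(X, axis=0) - np.min(X, axis=0))\n",
|
||||
"\n",
|
||||
"print(f\"max-scaled range     : {np.ptp(max_scale(X_train), axis=0)}\")\n",
|
||||
"print(f\"mean-normalized range: {np.ptp(mean_normalize(X_train), axis=0)}\")"
|
||||
]
|
||||
},
|
||||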
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"### z-score normalization \n",
|
||||
"After z-score normalization, all features will have a mean of 0 and a standard deviation of 1.\n",
|
||||
"\n",
|
||||
"To implement z-score normalization, adjust your input values as shown in this formula:\n",
|
||||
"$$x^{(i)}_j = \\dfrac{x^{(i)}_j - \\mu_j}{\\sigma_j} \\tag{4}$$ \n",
|
||||
"where $j$ selects a feature or a column in the $\\mathbf{X}$ matrix. $µ_j$ is the mean of all the values for feature (j) and $\\sigma_j$ is the standard deviation of feature (j).\n",
|
||||
"$$\n",
|
||||
"\\begin{align}\n",
|
||||
"\\mu_j &= \\frac{1}{m} \\sum_{i=0}^{m-1} x^{(i)}_j \\tag{5}\\\\\n",
|
||||
"\\sigma^2_j &= \\frac{1}{m} \\sum_{i=0}^{m-1} (x^{(i)}_j - \\mu_j)^2 \\tag{6}\n",
|
||||
"\\end{align}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
">**Implementation Note:** When normalizing the features, it is important\n",
|
||||
"to store the values used for normalization - the mean value and the standard deviation used for the computations. After learning the parameters\n",
|
||||
"from the model, we often want to predict the prices of houses we have not\n",
|
||||
"seen before. Given a new x value (living room area and number of bed-\n",
|
||||
"rooms), we must first normalize x using the mean and standard deviation\n",
|
||||
"that we had previously computed from the training set.\n",
|
||||
"\n",
|
||||
"**Implementation**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def zscore_normalize_features(X):\n",
|
||||
" \"\"\"\n",
|
||||
" computes X, zcore normalized by column\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X (ndarray (m,n)) : input data, m examples, n features\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" X_norm (ndarray (m,n)): input normalized by column\n",
|
||||
" mu (ndarray (n,)) : mean of each feature\n",
|
||||
" sigma (ndarray (n,)) : standard deviation of each feature\n",
|
||||
" \"\"\"\n",
|
||||
" # find the mean of each column/feature\n",
|
||||
" mu = np.mean(X, axis=0) # mu will have shape (n,)\n",
|
||||
" # find the standard deviation of each column/feature\n",
|
||||
" sigma = np.std(X, axis=0) # sigma will have shape (n,)\n",
|
||||
" # element-wise, subtract mu for that column from each example, divide by std for that column\n",
|
||||
" X_norm = (X - mu) / sigma \n",
|
||||
"\n",
|
||||
" return (X_norm, mu, sigma)\n",
|
||||
" \n",
|
||||
"#check our work\n",
|
||||
"#from sklearn.preprocessing import scale\n",
|
||||
"#scale(X_orig, axis=0, with_mean=True, with_std=True, copy=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's look at the steps involved in Z-score normalization. The plot below shows the transformation step by step."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"mu = np.mean(X_train,axis=0) \n",
|
||||
"sigma = np.std(X_train,axis=0) \n",
|
||||
"X_mean = (X_train - mu)\n",
|
||||
"X_norm = (X_train - mu)/sigma \n",
|
||||
"\n",
|
||||
"fig,ax=plt.subplots(1, 3, figsize=(12, 3))\n",
|
||||
"ax[0].scatter(X_train[:,0], X_train[:,3])\n",
|
||||
"ax[0].set_xlabel(X_features[0]); ax[0].set_ylabel(X_features[3]);\n",
|
||||
"ax[0].set_title(\"unnormalized\")\n",
|
||||
"ax[0].axis('equal')\n",
|
||||
"\n",
|
||||
"ax[1].scatter(X_mean[:,0], X_mean[:,3])\n",
|
||||
"ax[1].set_xlabel(X_features[0]); ax[0].set_ylabel(X_features[3]);\n",
|
||||
"ax[1].set_title(r\"X - $\\mu$\")\n",
|
||||
"ax[1].axis('equal')\n",
|
||||
"\n",
|
||||
"ax[2].scatter(X_norm[:,0], X_norm[:,3])\n",
|
||||
"ax[2].set_xlabel(X_features[0]); ax[0].set_ylabel(X_features[3]);\n",
|
||||
"ax[2].set_title(r\"Z-score normalized\")\n",
|
||||
"ax[2].axis('equal')\n",
|
||||
"plt.tight_layout(rect=[0, 0.03, 1, 0.95])\n",
|
||||
"fig.suptitle(\"distribution of features before, during, after normalization\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The plot above shows the relationship between two of the training set parameters, \"age\" and \"size(sqft)\". *These are plotted with equal scale*. \n",
|
||||
"- Left: Unnormalized: The range of values or the variance of the 'size(sqft)' feature is much larger than that of age\n",
|
||||
"- Middle: The first step removes the mean or average value from each feature. This leaves features that are centered around zero. It's difficult to see the difference for the 'age' feature, but 'size(sqft)' is clearly around zero.\n",
|
||||
"- Right: The second step divides by the standard deviation. This leaves both features centered at zero with a similar scale."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's normalize the data and compare it to the original data."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# normalize the original features\n",
|
||||
"X_norm, X_mu, X_sigma = zscore_normalize_features(X_train)\n",
|
||||
"print(f\"X_mu = {X_mu}, \\nX_sigma = {X_sigma}\")\n",
|
||||
"print(f\"Peak to Peak range by column in Raw X:{np.ptp(X_train,axis=0)}\") \n",
|
||||
"print(f\"Peak to Peak range by column in Normalized X:{np.ptp(X_norm,axis=0)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The peak to peak range of each column is reduced from a factor of thousands to a factor of 2-3 by normalization."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"fig,ax=plt.subplots(1, 4, figsize=(12, 3))\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" norm_plot(ax[i],X_train[:,i],)\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
"ax[0].set_ylabel(\"count\");\n",
|
||||
"fig.suptitle(\"distribution of features before normalization\")\n",
|
||||
"plt.show()\n",
|
||||
"fig,ax=plt.subplots(1,4,figsize=(12,3))\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" norm_plot(ax[i],X_norm[:,i],)\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
"ax[0].set_ylabel(\"count\"); \n",
|
||||
"fig.suptitle(\"distribution of features after normalization\")\n",
|
||||
"\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Notice, above, the range of the normalized data (x-axis) is centered around zero and roughly +/- 2. Most importantly, the range is similar for each feature."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's re-run our gradient descent algorithm with normalized data.\n",
|
||||
"Note the **vastly larger value of alpha**. This will speed up gradient descent."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"w_norm, b_norm, hist = run_gradient_descent(X_norm, y_train, 1000, 1.0e-1, )"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The scaled features get very accurate results **much, much faster!**. Notice the gradient of each parameter is tiny by the end of this fairly short run. A learning rate of 0.1 is a good start for regression with normalized features.\n",
|
||||
"Let's plot our predictions versus the target values. Note, the prediction is made using the normalized feature while the plot is shown using the original feature values."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#predict target using normalized features\n",
|
||||
"m = X_norm.shape[0]\n",
|
||||
"yp = np.zeros(m)\n",
|
||||
"for i in range(m):\n",
|
||||
" yp[i] = np.dot(X_norm[i], w_norm) + b_norm\n",
|
||||
"\n",
|
||||
" # plot predictions and targets versus original features \n",
|
||||
"fig,ax=plt.subplots(1,4,figsize=(12, 3),sharey=True)\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" ax[i].scatter(X_train[:,i],y_train, label = 'target')\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
" ax[i].scatter(X_train[:,i],yp,color=dlc[\"dlorange\"], label = 'predict')\n",
|
||||
"ax[0].set_ylabel(\"Price\"); ax[0].legend();\n",
|
||||
"fig.suptitle(\"target versus prediction using z-score normalized model\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The results look good. A few points to note:\n",
|
||||
"- with multiple features, we can no longer have a single plot showing results versus features.\n",
|
||||
"- when generating the plot, the normalized features were used. Any predictions using the parameters learned from a normalized training set must also be normalized."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Prediction**\n",
|
||||
"The point of generating our model is to use it to predict housing prices that are not in the data set. Let's predict the price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old. Recall, that you must normalize the data with the mean and standard deviation derived when the training data was normalized. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# First, normalize out example.\n",
|
||||
"x_house = np.array([1200, 3, 1, 40])\n",
|
||||
"x_house_norm = (x_house - X_mu) / X_sigma\n",
|
||||
"print(x_house_norm)\n",
|
||||
"x_house_predict = np.dot(x_house_norm, w_norm) + b_norm\n",
|
||||
"print(f\" predicted price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old = ${x_house_predict*1000:0.0f}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Cost Contours** \n",
|
||||
"<img align=\"left\" src=\"./images/C1_W2_Lab06_contours.PNG\" style=\"width:240px;\" >Another way to view feature scaling is in terms of the cost contours. When feature scales do not match, the plot of cost versus parameters in a contour plot is asymmetric. \n",
|
||||
"\n",
|
||||
"In the plot below, the scale of the parameters is matched. The left plot is the cost contour plot of w[0], the square feet versus w[1], the number of bedrooms before normalizing the features. The plot is so asymmetric, the curves completing the contours are not visible. In contrast, when the features are normalized, the cost contour is much more symmetric. The result is that updates to parameters during gradient descent can make equal progress for each parameter. \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"plt_equal_scale(X_train, X_norm, y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"## Congratulations!\n",
|
||||
"In this lab you:\n",
|
||||
"- utilized the routines for linear regression with multiple features you developed in previous labs\n",
|
||||
"- explored the impact of the learning rate $\\alpha$ on convergence \n",
|
||||
"- discovered the value of feature scaling using z-score normalization in speeding convergence"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Acknowledgments\n",
|
||||
"The housing data was derived from the [Ames Housing dataset](http://jse.amstat.org/v19n3/decock.pdf) compiled by Dean De Cock for use in data science education."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.6"
|
||||
},
|
||||
"toc-autonumbering": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
344
work/C1_W2_Lab04_FeatEng_PolyReg_Soln.ipynb
Normal file
@ -0,0 +1,344 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Feature Engineering and Polynomial Regression\n",
|
||||
"\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Goals\n",
|
||||
"In this lab you will:\n",
|
||||
"- explore feature engineering and polynomial regression which allows you to use the machinery of linear regression to fit very complicated, even very non-linear functions.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"You will utilize the function developed in previous labs as well as matplotlib and NumPy. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"from lab_utils_multi import zscore_normalize_features, run_gradient_descent_feng\n",
|
||||
"np.set_printoptions(precision=2) # reduced display precision on numpy arrays"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name='FeatureEng'></a>\n",
|
||||
"# Feature Engineering and Polynomial Regression Overview\n",
|
||||
"\n",
|
||||
"Out of the box, linear regression provides a means of building models of the form:\n",
|
||||
"$$f_{\\mathbf{w},b} = w_0x_0 + w_1x_1+ ... + w_{n-1}x_{n-1} + b \\tag{1}$$ \n",
|
||||
"What if your features/data are non-linear or are combinations of features? For example, Housing prices do not tend to be linear with living area but penalize very small or very large houses resulting in the curves shown in the graphic above. How can we use the machinery of linear regression to fit this curve? Recall, the 'machinery' we have is the ability to modify the parameters $\\mathbf{w}$, $\\mathbf{b}$ in (1) to 'fit' the equation to the training data. However, no amount of adjusting of $\\mathbf{w}$,$\\mathbf{b}$ in (1) will achieve a fit to a non-linear curve.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name='PolynomialFeatures'></a>\n",
|
||||
"## Polynomial Features\n",
|
||||
"\n",
|
||||
"Above we were considering a scenario where the data was non-linear. Let's try using what we know so far to fit a non-linear curve. We'll start with a simple quadratic: $y = 1+x^2$\n",
|
||||
"\n",
|
||||
"You're familiar with all the routines we're using. They are available in the lab_utils.py file for review. We'll use [`np.c_[..]`](https://numpy.org/doc/stable/reference/generated/numpy.c_.html) which is a NumPy routine to concatenate along the column boundary."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0, 20, 1)\n",
|
||||
"y = 1 + x**2\n",
|
||||
"X = x.reshape(-1, 1)\n",
|
||||
"\n",
|
||||
"model_w,model_b = run_gradient_descent_feng(X,y,iterations=1000, alpha = 1e-2)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"no feature engineering\")\n",
|
||||
"plt.plot(x,X@model_w + model_b, label=\"Predicted Value\"); plt.xlabel(\"X\"); plt.ylabel(\"y\"); plt.legend(); plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Well, as expected, not a great fit. What is needed is something like $y= w_0x_0^2 + b$, or a **polynomial feature**.\n",
|
||||
"To accomplish this, you can modify the *input data* to *engineer* the needed features. If you swap the original data with a version that squares the $x$ value, then you can achieve $y= w_0x_0^2 + b$. Let's try it. Swap `X` for `X**2` below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0, 20, 1)\n",
|
||||
"y = 1 + x**2\n",
|
||||
"\n",
|
||||
"# Engineer features \n",
|
||||
"X = x**2 #<-- added engineered feature"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X = X.reshape(-1, 1) #X should be a 2-D Matrix\n",
|
||||
"model_w,model_b = run_gradient_descent_feng(X, y, iterations=10000, alpha = 1e-5)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"Added x**2 feature\")\n",
|
||||
"plt.plot(x, np.dot(X,model_w) + model_b, label=\"Predicted Value\"); plt.xlabel(\"x\"); plt.ylabel(\"y\"); plt.legend(); plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Great! near perfect fit. Notice the values of $\\mathbf{w}$ and b printed right above the graph: `w,b found by gradient descent: w: [1.], b: 0.0490`. Gradient descent modified our initial values of $\\mathbf{w},b $ to be (1.0,0.049) or a model of $y=1*x_0^2+0.049$, very close to our target of $y=1*x_0^2+1$. If you ran it longer, it could be a better match. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Selecting Features\n",
|
||||
"<a name='GDF'></a>\n",
|
||||
"Above, we knew that an $x^2$ term was required. It may not always be obvious which features are required. One could add a variety of potential features to try and find the most useful. For example, what if we had instead tried : $y=w_0x_0 + w_1x_1^2 + w_2x_2^3+b$ ? \n",
|
||||
"\n",
|
||||
"Run the next cells. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0, 20, 1)\n",
|
||||
"y = x**2\n",
|
||||
"\n",
|
||||
"# engineer features .\n",
|
||||
"X = np.c_[x, x**2, x**3] #<-- added engineered feature"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model_w,model_b = run_gradient_descent_feng(X, y, iterations=10000, alpha=1e-7)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"x, x**2, x**3 features\")\n",
|
||||
"plt.plot(x, X@model_w + model_b, label=\"Predicted Value\"); plt.xlabel(\"x\"); plt.ylabel(\"y\"); plt.legend(); plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note the value of $\\mathbf{w}$, `[0.08 0.54 0.03]` and b is `0.0106`.This implies the model after fitting/training is:\n",
|
||||
"$$ 0.08x + 0.54x^2 + 0.03x^3 + 0.0106 $$\n",
|
||||
"Gradient descent has emphasized the data that is the best fit to the $x^2$ data by increasing the $w_1$ term relative to the others. If you were to run for a very long time, it would continue to reduce the impact of the other terms. \n",
|
||||
">Gradient descent is picking the 'correct' features for us by emphasizing its associated parameter\n",
|
||||
"\n",
|
||||
"Let's review this idea:\n",
|
||||
"- Intially, the features were re-scaled so they are comparable to each other\n",
|
||||
"- less weight value implies less important/correct feature, and in extreme, when the weight becomes zero or very close to zero, the associated feature is not useful in fitting the model to the data.\n",
|
||||
"- above, after fitting, the weight associated with the $x^2$ feature is much larger than the weights for $x$ or $x^3$ as it is the most useful in fitting the data. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### An Alternate View\n",
|
||||
"Above, polynomial features were chosen based on how well they matched the target data. Another way to think about this is to note that we are still using linear regression once we have created new features. Given that, the best features will be linear relative to the target. This is best understood with an example. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0, 20, 1)\n",
|
||||
"y = x**2\n",
|
||||
"\n",
|
||||
"# engineer features .\n",
|
||||
"X = np.c_[x, x**2, x**3] #<-- added engineered feature\n",
|
||||
"X_features = ['x','x^2','x^3']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"fig,ax=plt.subplots(1, 3, figsize=(12, 3), sharey=True)\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" ax[i].scatter(X[:,i],y)\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
"ax[0].set_ylabel(\"y\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Above, it is clear that the $x^2$ feature mapped against the target value $y$ is linear. Linear regression can then easily generate a model using that feature."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Scaling features\n",
|
||||
"As described in the last lab, if the data set has features with significantly different scales, one should apply feature scaling to speed gradient descent. In the example above, there is $x$, $x^2$ and $x^3$ which will naturally have very different scales. Let's apply Z-score normalization to our example."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0,20,1)\n",
|
||||
"X = np.c_[x, x**2, x**3]\n",
|
||||
"print(f\"Peak to Peak range by column in Raw X:{np.ptp(X,axis=0)}\")\n",
|
||||
"\n",
|
||||
"# add mean_normalization \n",
|
||||
"X = zscore_normalize_features(X) \n",
|
||||
"print(f\"Peak to Peak range by column in Normalized X:{np.ptp(X,axis=0)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now we can try again with a more aggressive value of alpha:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"x = np.arange(0,20,1)\n",
|
||||
"y = x**2\n",
|
||||
"\n",
|
||||
"X = np.c_[x, x**2, x**3]\n",
|
||||
"X = zscore_normalize_features(X) \n",
|
||||
"\n",
|
||||
"model_w, model_b = run_gradient_descent_feng(X, y, iterations=100000, alpha=1e-1)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"Normalized x x**2, x**3 feature\")\n",
|
||||
"plt.plot(x,X@model_w + model_b, label=\"Predicted Value\"); plt.xlabel(\"x\"); plt.ylabel(\"y\"); plt.legend(); plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Feature scaling allows this to converge much faster. \n",
|
||||
"Note again the values of $\\mathbf{w}$. The $w_1$ term, which is the $x^2$ term is the most emphasized. Gradient descent has all but eliminated the $x^3$ term."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Complex Functions\n",
|
||||
"With feature engineering, even quite complex functions can be modeled:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"x = np.arange(0,20,1)\n",
|
||||
"y = np.cos(x/2)\n",
|
||||
"\n",
|
||||
"X = np.c_[x, x**2, x**3,x**4, x**5, x**6, x**7, x**8, x**9, x**10, x**11, x**12, x**13]\n",
|
||||
"X = zscore_normalize_features(X) \n",
|
||||
"\n",
|
||||
"model_w,model_b = run_gradient_descent_feng(X, y, iterations=1000000, alpha = 1e-1)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"Normalized x x**2, x**3 feature\")\n",
|
||||
"plt.plot(x,X@model_w + model_b, label=\"Predicted Value\"); plt.xlabel(\"x\"); plt.ylabel(\"y\"); plt.legend(); plt.show()\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"## Congratulations!\n",
|
||||
"In this lab you:\n",
|
||||
"- learned how linear regression can model complex, even highly non-linear functions using feature engineering\n",
|
||||
"- recognized that it is important to apply feature scaling when doing feature engineering"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.6"
|
||||
},
|
||||
"toc-autonumbering": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
222
work/C1_W2_Lab05_Sklearn_GD_Soln.ipynb
Normal file
@ -0,0 +1,222 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Linear Regression using Scikit-Learn"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"There is an open-source, commercially usable machine learning toolkit called [scikit-learn](https://scikit-learn.org/stable/index.html). This toolkit contains implementations of many of the algorithms that you will work with in this course.\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Goals\n",
|
||||
"In this lab you will:\n",
|
||||
"- Utilize scikit-learn to implement linear regression using Gradient Descent"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"You will utilize functions from scikit-learn as well as matplotlib and NumPy. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"from sklearn.linear_model import SGDRegressor\n",
|
||||
"from sklearn.preprocessing import StandardScaler\n",
|
||||
"from lab_utils_multi import load_house_data\n",
|
||||
"from lab_utils_common import dlc\n",
|
||||
"np.set_printoptions(precision=2)\n",
|
||||
"plt.style.use('./deeplearning.mplstyle')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Gradient Descent\n",
|
||||
"Scikit-learn has a gradient descent regression model [sklearn.linear_model.SGDRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html#examples-using-sklearn-linear-model-sgdregressor). Like your previous implementation of gradient descent, this model performs best with normalized inputs. [sklearn.preprocessing.StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler) will perform z-score normalization as in a previous lab. Here it is referred to as 'standard score'."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load the data set"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X_train, y_train = load_house_data()\n",
|
||||
"X_features = ['size(sqft)','bedrooms','floors','age']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Scale/normalize the training data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"scaler = StandardScaler()\n",
|
||||
"X_norm = scaler.fit_transform(X_train)\n",
|
||||
"print(f\"Peak to Peak range by column in Raw X:{np.ptp(X_train,axis=0)}\") \n",
|
||||
"print(f\"Peak to Peak range by column in Normalized X:{np.ptp(X_norm,axis=0)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create and fit the regression model"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"sgdr = SGDRegressor(max_iter=1000)\n",
|
||||
"sgdr.fit(X_norm, y_train)\n",
|
||||
"print(sgdr)\n",
|
||||
"print(f\"number of iterations completed: {sgdr.n_iter_}, number of weight updates: {sgdr.t_}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### View parameters\n",
|
||||
"Note, the parameters are associated with the *normalized* input data. The fit parameters are very close to those found in the previous lab with this data."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"b_norm = sgdr.intercept_\n",
|
||||
"w_norm = sgdr.coef_\n",
|
||||
"print(f\"model parameters: w: {w_norm}, b:{b_norm}\")\n",
|
||||
"print( \"model parameters from previous lab: w: [110.56 -21.27 -32.71 -37.97], b: 363.16\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Make predictions\n",
|
||||
"Predict the targets of the training data. Use both the `predict` routine and compute using $w$ and $b$."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# make a prediction using sgdr.predict()\n",
|
||||
"y_pred_sgd = sgdr.predict(X_norm)\n",
|
||||
"# make a prediction using w,b. \n",
|
||||
"y_pred = np.dot(X_norm, w_norm) + b_norm \n",
|
||||
"print(f\"prediction using np.dot() and sgdr.predict match: {(y_pred == y_pred_sgd).all()}\")\n",
|
||||
"\n",
|
||||
"print(f\"Prediction on training set:\\n{y_pred[:4]}\" )\n",
|
||||
"print(f\"Target values \\n{y_train[:4]}\")"
|
||||
]
|
||||
},
|
||||
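{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"A previous lab stressed that any new example must be normalized with statistics derived from the training set. The fitted `StandardScaler` stores those statistics, so a prediction for a new house (the 1200 sqft, 3 bedroom, 1 floor, 40 year old example from the feature scaling lab) might look like the sketch below; it is not part of the original lab."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# sketch: predict a new example by reusing the fitted scaler\n",
|
||||
"x_house = np.array([[1200, 3, 1, 40]])     # size(sqft), bedrooms, floors, age\n",
|
||||
"x_house_norm = scaler.transform(x_house)   # normalize with training-set statistics\n",
|
||||
"x_house_pred = sgdr.predict(x_house_norm)[0]\n",
|
||||
"print(f\"predicted price of a 1200 sqft, 3 bedroom, 1 floor, 40 year old house: ${x_house_pred*1000:0.0f}\")"
|
||||
]
|
||||
},
|
||||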
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Plot Results\n",
|
||||
"Let's plot the predictions versus the target values."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# plot predictions and targets vs original features \n",
|
||||
"fig,ax=plt.subplots(1,4,figsize=(12,3),sharey=True)\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" ax[i].scatter(X_train[:,i],y_train, label = 'target')\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
" ax[i].scatter(X_train[:,i],y_pred,color=dlc[\"dlorange\"], label = 'predict')\n",
|
||||
"ax[0].set_ylabel(\"Price\"); ax[0].legend();\n",
|
||||
"fig.suptitle(\"target versus prediction using z-score normalized model\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Congratulations!\n",
|
||||
"In this lab you:\n",
|
||||
"- utilized an open-source machine learning toolkit, scikit-learn\n",
|
||||
"- implemented linear regression using gradient descent and feature normalization from that toolkit"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.10"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
241
work/C1_W2_Lab06_Sklearn_Normal_Soln.ipynb
Normal file
@ -0,0 +1,241 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Linear Regression using Scikit-Learn"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"There is an open-source, commercially usable machine learning toolkit called [scikit-learn](https://scikit-learn.org/stable/index.html). This toolkit contains implementations of many of the algorithms that you will work with in this course.\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Goals\n",
|
||||
"In this lab you will:\n",
|
||||
"- Utilize scikit-learn to implement linear regression using a close form solution based on the normal equation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"You will utilize functions from scikit-learn as well as matplotlib and NumPy. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"from sklearn.linear_model import LinearRegression\n",
|
||||
"from lab_utils_multi import load_house_data\n",
|
||||
"plt.style.use('./deeplearning.mplstyle')\n",
|
||||
"np.set_printoptions(precision=2)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40291_2\"></a>\n",
|
||||
"# Linear Regression, closed-form solution\n",
|
||||
"Scikit-learn has the [linear regression model](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression) which implements a closed-form linear regression.\n",
|
||||
"\n",
|
||||
"Let's use the data from the early labs - a house with 1000 square feet sold for \\\\$300,000 and a house with 2000 square feet sold for \\\\$500,000.\n",
|
||||
"\n",
|
||||
"| Size (1000 sqft) | Price (1000s of dollars) |\n",
|
||||
"| ----------------| ------------------------ |\n",
|
||||
"| 1 | 300 |\n",
|
||||
"| 2 | 500 |\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load the data set"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X_train = np.array([1.0, 2.0]) #features\n",
|
||||
"y_train = np.array([300, 500]) #target value"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create and fit the model\n",
|
||||
"The code below performs regression using scikit-learn. \n",
|
||||
"The first step creates a regression object. \n",
|
||||
"The second step utilizes one of the methods associated with the object, `fit`. This performs regression, fitting the parameters to the input data. The toolkit expects a two-dimensional X matrix."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"linear_model = LinearRegression()\n",
|
||||
"#X must be a 2-D Matrix\n",
|
||||
"linear_model.fit(X_train.reshape(-1, 1), y_train) "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### View Parameters \n",
|
||||
"The $\\mathbf{w}$ and $\\mathbf{b}$ parameters are referred to as 'coefficients' and 'intercept' in scikit-learn."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"b = linear_model.intercept_\n",
|
||||
"w = linear_model.coef_\n",
|
||||
"print(f\"w = {w:}, b = {b:0.2f}\")\n",
|
||||
"print(f\"'manual' prediction: f_wb = wx+b : {1200*w + b}\")"
|
||||
]
|
||||
},
|
||||
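{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For reference, the 'closed form' mentioned in the goals is the normal equation. The cell below is a minimal sketch (not scikit-learn's internal implementation) that reproduces the parameters above by appending a column of ones to the input for the intercept and solving $\\theta = (A^TA)^{-1}A^T\\mathbf{y}$."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# minimal sketch of the normal equation; A is X_train with a column of ones for the intercept\n",
|
||||
"A = np.c_[X_train.reshape(-1, 1), np.ones(len(X_train))]\n",
|
||||
"theta = np.linalg.pinv(A.T @ A) @ A.T @ y_train\n",
|
||||
"print(f\"normal equation result: w = {theta[0]:0.2f}, b = {theta[1]:0.2f}\")"
|
||||
]
|
||||
},
|
||||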
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Make Predictions\n",
|
||||
"\n",
|
||||
"Calling the `predict` function generates predictions."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"y_pred = linear_model.predict(X_train.reshape(-1, 1))\n",
|
||||
"\n",
|
||||
"print(\"Prediction on training set:\", y_pred)\n",
|
||||
"\n",
|
||||
"X_test = np.array([[1200]])\n",
|
||||
"print(f\"Prediction for 1200 sqft house: ${linear_model.predict(X_test)[0]:0.2f}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Second Example\n",
|
||||
"The second example is from an earlier lab with multiple features. The final parameter values and predictions are very close to the results from the un-normalized 'long-run' from that lab. That un-normalized run took hours to produce results, while this is nearly instantaneous. The closed-form solution work well on smaller data sets such as these but can be computationally demanding on larger data sets. \n",
|
||||
">The closed-form solution does not require normalization."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# load the dataset\n",
|
||||
"X_train, y_train = load_house_data()\n",
|
||||
"X_features = ['size(sqft)','bedrooms','floors','age']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"linear_model = LinearRegression()\n",
|
||||
"linear_model.fit(X_train, y_train) "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"b = linear_model.intercept_\n",
|
||||
"w = linear_model.coef_\n",
|
||||
"print(f\"w = {w:}, b = {b:0.2f}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(f\"Prediction on training set:\\n {linear_model.predict(X_train)[:4]}\" )\n",
|
||||
"print(f\"prediction using w,b:\\n {(X_train @ w + b)[:4]}\")\n",
|
||||
"print(f\"Target values \\n {y_train[:4]}\")\n",
|
||||
"\n",
|
||||
"x_house = np.array([1200, 3,1, 40]).reshape(-1,4)\n",
|
||||
"x_house_predict = linear_model.predict(x_house)[0]\n",
|
||||
"print(f\" predicted price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old = ${x_house_predict*1000:0.2f}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Congratulations!\n",
|
||||
"In this lab you:\n",
|
||||
"- utilized an open-source machine learning toolkit, scikit-learn\n",
|
||||
"- implemented linear regression using a close-form solution from that toolkit"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.10"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
BIN
work/__pycache__/lab_utils_common.cpython-37.pyc
Normal file
BIN
work/__pycache__/lab_utils_multi.cpython-37.pyc
Normal file
BIN
work/__pycache__/lab_utils_uni.cpython-37.pyc
Normal file
@ -0,0 +1,730 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Python, NumPy and Vectorization\n",
|
||||
"A brief introduction to some of the scientific computing used in this course. In particular the NumPy scientific computing package and its use with python.\n",
|
||||
"\n",
|
||||
"# Outline\n",
|
||||
"- [ 1.1 Goals](#toc_40015_1.1)\n",
|
||||
"- [ 1.2 Useful References](#toc_40015_1.2)\n",
|
||||
"- [2 Python and NumPy <a name='Python and NumPy'></a>](#toc_40015_2)\n",
|
||||
"- [3 Vectors](#toc_40015_3)\n",
|
||||
"- [ 3.1 Abstract](#toc_40015_3.1)\n",
|
||||
"- [ 3.2 NumPy Arrays](#toc_40015_3.2)\n",
|
||||
"- [ 3.3 Vector Creation](#toc_40015_3.3)\n",
|
||||
"- [ 3.4 Operations on Vectors](#toc_40015_3.4)\n",
|
||||
"- [4 Matrices](#toc_40015_4)\n",
|
||||
"- [ 4.1 Abstract](#toc_40015_4.1)\n",
|
||||
"- [ 4.2 NumPy Arrays](#toc_40015_4.2)\n",
|
||||
"- [ 4.3 Matrix Creation](#toc_40015_4.3)\n",
|
||||
"- [ 4.4 Operations on Matrices](#toc_40015_4.4)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np # it is an unofficial standard to use np for numpy\n",
|
||||
"import time"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_1.1\"></a>\n",
|
||||
"## 1.1 Goals\n",
|
||||
"In this lab, you will:\n",
|
||||
"- Review the features of NumPy and Python that are used in Course 1"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_1.2\"></a>\n",
|
||||
"## 1.2 Useful References\n",
|
||||
"- NumPy Documentation including a basic introduction: [NumPy.org](https://NumPy.org/doc/stable/)\n",
|
||||
"- A challenging feature topic: [NumPy Broadcasting](https://NumPy.org/doc/stable/user/basics.broadcasting.html)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_2\"></a>\n",
|
||||
"# 2 Python and NumPy <a name='Python and NumPy'></a>\n",
|
||||
"Python is the programming language we will be using in this course. It has a set of numeric data types and arithmetic operations. NumPy is a library that extends the base capabilities of python to add a richer data set including more numeric types, vectors, matrices, and many matrix functions. NumPy and python work together fairly seamlessly. Python arithmetic operators work on NumPy data types and many NumPy functions will accept python data types.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3\"></a>\n",
|
||||
"# 3 Vectors\n",
|
||||
"<a name=\"toc_40015_3.1\"></a>\n",
|
||||
"## 3.1 Abstract\n",
|
||||
"<img align=\"right\" src=\"./images/C1_W2_Lab04_Vectors.PNG\" style=\"width:340px;\" >Vectors, as you will use them in this course, are ordered arrays of numbers. In notation, vectors are denoted with lower case bold letters such as $\\mathbf{x}$. The elements of a vector are all the same type. A vector does not, for example, contain both characters and numbers. The number of elements in the array is often referred to as the *dimension* though mathematicians may prefer *rank*. The vector shown has a dimension of $n$. The elements of a vector can be referenced with an index. In math settings, indexes typically run from 1 to n. In computer science and these labs, indexing will typically run from 0 to n-1. In notation, elements of a vector, when referenced individually will indicate the index in a subscript, for example, the $0^{th}$ element, of the vector $\\mathbf{x}$ is $x_0$. Note, the x is not bold in this case. \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.2\"></a>\n",
|
||||
"## 3.2 NumPy Arrays\n",
|
||||
"\n",
|
||||
"NumPy's basic data structure is an indexable, n-dimensional *array* containing elements of the same type (`dtype`). Right away, you may notice we have overloaded the term 'dimension'. Above, it was the number of elements in the vector, here, dimension refers to the number of indexes of an array. A one-dimensional or 1-D array has one index. In Course 1, we will represent vectors as NumPy 1-D arrays. \n",
|
||||
"\n",
|
||||
" - 1-D array, shape (n,): n elements indexed [0] through [n-1]\n",
|
||||
" "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.3\"></a>\n",
|
||||
"## 3.3 Vector Creation\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Data creation routines in NumPy will generally have a first parameter which is the shape of the object. This can either be a single value for a 1-D result or a tuple (n,m,...) specifying the shape of the result. Below are examples of creating vectors using these routines."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill arrays with value\n",
|
||||
"a = np.zeros(4); print(f\"np.zeros(4) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.zeros((4,)); print(f\"np.zeros(4,) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.random.random_sample(4); print(f\"np.random.random_sample(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Some data creation routines do not take a shape tuple:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill arrays with value but do not accept shape as input argument\n",
|
||||
"a = np.arange(4.); print(f\"np.arange(4.): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.random.rand(4); print(f\"np.random.rand(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"values can be specified manually as well. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill with user specified values\n",
|
||||
"a = np.array([5,4,3,2]); print(f\"np.array([5,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.array([5.,4,3,2]); print(f\"np.array([5.,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"These have all created a one-dimensional vector `a` with four elements. `a.shape` returns the dimensions. Here we see a.shape = `(4,)` indicating a 1-d array with 4 elements. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4\"></a>\n",
|
||||
"## 3.4 Operations on Vectors\n",
|
||||
"Let's explore some operations using vectors.\n",
|
||||
"<a name=\"toc_40015_3.4.1\"></a>\n",
|
||||
"### 3.4.1 Indexing\n",
|
||||
"Elements of vectors can be accessed via indexing and slicing. NumPy provides a very complete set of indexing and slicing capabilities. We will explore only the basics needed for the course here. Reference [Slicing and Indexing](https://NumPy.org/doc/stable/reference/arrays.indexing.html) for more details. \n",
|
||||
"**Indexing** means referring to *an element* of an array by its position within the array. \n",
|
||||
"**Slicing** means getting a *subset* of elements from an array based on their indices. \n",
|
||||
"NumPy starts indexing at zero so the 3rd element of an vector $\\mathbf{a}$ is `a[2]`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector indexing operations on 1-D vectors\n",
|
||||
"a = np.arange(10)\n",
|
||||
"print(a)\n",
|
||||
"\n",
|
||||
"#access an element\n",
|
||||
"print(f\"a[2].shape: {a[2].shape} a[2] = {a[2]}, Accessing an element returns a scalar\")\n",
|
||||
"\n",
|
||||
"# access the last element, negative indexes count from the end\n",
|
||||
"print(f\"a[-1] = {a[-1]}\")\n",
|
||||
"\n",
|
||||
"#indexs must be within the range of the vector or they will produce and error\n",
|
||||
"try:\n",
|
||||
" c = a[10]\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"The error message you'll see is:\")\n",
|
||||
" print(e)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.2\"></a>\n",
|
||||
"### 3.4.2 Slicing\n",
|
||||
"Slicing creates an array of indices using a set of three values (`start:stop:step`). A subset of values is also valid. Its use is best explained by example:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector slicing operations\n",
|
||||
"a = np.arange(10)\n",
|
||||
"print(f\"a = {a}\")\n",
|
||||
"\n",
|
||||
"#access 5 consecutive elements (start:stop:step)\n",
|
||||
"c = a[2:7:1]; print(\"a[2:7:1] = \", c)\n",
|
||||
"\n",
|
||||
"# access 3 elements separated by two \n",
|
||||
"c = a[2:7:2]; print(\"a[2:7:2] = \", c)\n",
|
||||
"\n",
|
||||
"# access all elements index 3 and above\n",
|
||||
"c = a[3:]; print(\"a[3:] = \", c)\n",
|
||||
"\n",
|
||||
"# access all elements below index 3\n",
|
||||
"c = a[:3]; print(\"a[:3] = \", c)\n",
|
||||
"\n",
|
||||
"# access all elements\n",
|
||||
"c = a[:]; print(\"a[:] = \", c)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.3\"></a>\n",
|
||||
"### 3.4.3 Single vector operations\n",
|
||||
"There are a number of useful operations that involve operations on a single vector."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.array([1,2,3,4])\n",
|
||||
"print(f\"a : {a}\")\n",
|
||||
"# negate elements of a\n",
|
||||
"b = -a \n",
|
||||
"print(f\"b = -a : {b}\")\n",
|
||||
"\n",
|
||||
"# sum all elements of a, returns a scalar\n",
|
||||
"b = np.sum(a) \n",
|
||||
"print(f\"b = np.sum(a) : {b}\")\n",
|
||||
"\n",
|
||||
"b = np.mean(a)\n",
|
||||
"print(f\"b = np.mean(a): {b}\")\n",
|
||||
"\n",
|
||||
"b = a**2\n",
|
||||
"print(f\"b = a**2 : {b}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.4\"></a>\n",
|
||||
"### 3.4.4 Vector Vector element-wise operations\n",
|
||||
"Most of the NumPy arithmetic, logical and comparison operations apply to vectors as well. These operators work on an element-by-element basis. For example \n",
|
||||
"$$ \\mathbf{a} + \\mathbf{b} = \\sum_{i=0}^{n-1} a_i + b_i $$"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.array([ 1, 2, 3, 4])\n",
|
||||
"b = np.array([-1,-2, 3, 4])\n",
|
||||
"print(f\"Binary operators work element wise: {a + b}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Of course, for this to work correctly, the vectors must be of the same size:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#try a mismatched vector operation\n",
|
||||
"c = np.array([1, 2])\n",
|
||||
"try:\n",
|
||||
" d = a + c\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"The error message you'll see is:\")\n",
|
||||
" print(e)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.5\"></a>\n",
|
||||
"### 3.4.5 Scalar Vector operations\n",
|
||||
"Vectors can be 'scaled' by scalar values. A scalar value is just a number. The scalar multiplies all the elements of the vector."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.array([1, 2, 3, 4])\n",
|
||||
"\n",
|
||||
"# multiply a by a scalar\n",
|
||||
"b = 5 * a \n",
|
||||
"print(f\"b = 5 * a : {b}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.6\"></a>\n",
|
||||
"### 3.4.6 Vector Vector dot product\n",
|
||||
"The dot product is a mainstay of Linear Algebra and NumPy. This is an operation used extensively in this course and should be well understood. The dot product is shown below."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<img src=\"./images/C1_W2_Lab04_dot_notrans.gif\" width=800> "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The dot product multiplies the values in two vectors element-wise and then sums the result.\n",
|
||||
"Vector dot product requires the dimensions of the two vectors to be the same. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's implement our own version of the dot product below:\n",
|
||||
"\n",
|
||||
"**Using a for loop**, implement a function which returns the dot product of two vectors. The function to return given inputs $a$ and $b$:\n",
|
||||
"$$ x = \\sum_{i=0}^{n-1} a_i b_i $$\n",
|
||||
"Assume both `a` and `b` are the same shape."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def my_dot(a, b): \n",
|
||||
" \"\"\"\n",
|
||||
" Compute the dot product of two vectors\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" a (ndarray (n,)): input vector \n",
|
||||
" b (ndarray (n,)): input vector with same dimension as a\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" x (scalar): \n",
|
||||
" \"\"\"\n",
|
||||
" x=0\n",
|
||||
" for i in range(a.shape[0]):\n",
|
||||
" x = x + a[i] * b[i]\n",
|
||||
" return x"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# test 1-D\n",
|
||||
"a = np.array([1, 2, 3, 4])\n",
|
||||
"b = np.array([-1, 4, 3, 2])\n",
|
||||
"print(f\"my_dot(a, b) = {my_dot(a, b)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note, the dot product is expected to return a scalar value. \n",
|
||||
"\n",
|
||||
"Let's try the same operations using `np.dot`. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# test 1-D\n",
|
||||
"a = np.array([1, 2, 3, 4])\n",
|
||||
"b = np.array([-1, 4, 3, 2])\n",
|
||||
"c = np.dot(a, b)\n",
|
||||
"print(f\"NumPy 1-D np.dot(a, b) = {c}, np.dot(a, b).shape = {c.shape} \") \n",
|
||||
"c = np.dot(b, a)\n",
|
||||
"print(f\"NumPy 1-D np.dot(b, a) = {c}, np.dot(a, b).shape = {c.shape} \")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Above, you will note that the results for 1-D matched our implementation."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.7\"></a>\n",
|
||||
"### 3.4.7 The Need for Speed: vector vs for loop\n",
|
||||
"We utilized the NumPy library because it improves speed memory efficiency. Let's demonstrate:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"np.random.seed(1)\n",
|
||||
"a = np.random.rand(10000000) # very large arrays\n",
|
||||
"b = np.random.rand(10000000)\n",
|
||||
"\n",
|
||||
"tic = time.time() # capture start time\n",
|
||||
"c = np.dot(a, b)\n",
|
||||
"toc = time.time() # capture end time\n",
|
||||
"\n",
|
||||
"print(f\"np.dot(a, b) = {c:.4f}\")\n",
|
||||
"print(f\"Vectorized version duration: {1000*(toc-tic):.4f} ms \")\n",
|
||||
"\n",
|
||||
"tic = time.time() # capture start time\n",
|
||||
"c = my_dot(a,b)\n",
|
||||
"toc = time.time() # capture end time\n",
|
||||
"\n",
|
||||
"print(f\"my_dot(a, b) = {c:.4f}\")\n",
|
||||
"print(f\"loop version duration: {1000*(toc-tic):.4f} ms \")\n",
|
||||
"\n",
|
||||
"del(a);del(b) #remove these big arrays from memory"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"So, vectorization provides a large speed up in this example. This is because NumPy makes better use of available data parallelism in the underlying hardware. GPU's and modern CPU's implement Single Instruction, Multiple Data (SIMD) pipelines allowing multiple operations to be issued in parallel. This is critical in Machine Learning where the data sets are often very large."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_12345_3.4.8\"></a>\n",
|
||||
"### 3.4.8 Vector Vector operations in Course 1\n",
|
||||
"Vector Vector operations will appear frequently in course 1. Here is why:\n",
|
||||
"- Going forward, our examples will be stored in an array, `X_train` of dimension (m,n). This will be explained more in context, but here it is important to note it is a 2 Dimensional array or matrix (see next section on matrices).\n",
|
||||
"- `w` will be a 1-dimensional vector of shape (n,).\n",
|
||||
"- we will perform operations by looping through the examples, extracting each example to work on individually by indexing X. For example:`X[i]`\n",
|
||||
"- `X[i]` returns a value of shape (n,), a 1-dimensional vector. Consequently, operations involving `X[i]` are often vector-vector. \n",
|
||||
"\n",
|
||||
"That is a somewhat lengthy explanation, but aligning and understanding the shapes of your operands is important when performing vector operations."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# show common Course 1 example\n",
|
||||
"X = np.array([[1],[2],[3],[4]])\n",
|
||||
"w = np.array([2])\n",
|
||||
"c = np.dot(X[1], w)\n",
|
||||
"\n",
|
||||
"print(f\"X[1] has shape {X[1].shape}\")\n",
|
||||
"print(f\"w has shape {w.shape}\")\n",
|
||||
"print(f\"c has shape {c.shape}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4\"></a>\n",
|
||||
"# 4 Matrices\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.1\"></a>\n",
|
||||
"## 4.1 Abstract\n",
|
||||
"Matrices, are two dimensional arrays. The elements of a matrix are all of the same type. In notation, matrices are denoted with capitol, bold letter such as $\\mathbf{X}$. In this and other labs, `m` is often the number of rows and `n` the number of columns. The elements of a matrix can be referenced with a two dimensional index. In math settings, numbers in the index typically run from 1 to n. In computer science and these labs, indexing will run from 0 to n-1. \n",
|
||||
"<figure>\n",
|
||||
" <center> <img src=\"./images/C1_W2_Lab04_Matrices.PNG\" alt='missing' width=900><center/>\n",
|
||||
" <figcaption> Generic Matrix Notation, 1st index is row, 2nd is column </figcaption>\n",
|
||||
"<figure/>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.2\"></a>\n",
|
||||
"## 4.2 NumPy Arrays\n",
|
||||
"\n",
|
||||
"NumPy's basic data structure is an indexable, n-dimensional *array* containing elements of the same type (`dtype`). These were described earlier. Matrices have a two-dimensional (2-D) index [m,n].\n",
|
||||
"\n",
|
||||
"In Course 1, 2-D matrices are used to hold training data. Training data is $m$ examples by $n$ features creating an (m,n) array. Course 1 does not do operations directly on matrices but typically extracts an example as a vector and operates on that. Below you will review: \n",
|
||||
"- data creation\n",
|
||||
"- slicing and indexing"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.3\"></a>\n",
|
||||
"## 4.3 Matrix Creation\n",
|
||||
"The same functions that created 1-D vectors will create 2-D or n-D arrays. Here are some examples\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Below, the shape tuple is provided to achieve a 2-D result. Notice how NumPy uses brackets to denote each dimension. Notice further than NumPy, when printing, will print one row per line.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.zeros((1, 5)) \n",
|
||||
"print(f\"a shape = {a.shape}, a = {a}\") \n",
|
||||
"\n",
|
||||
"a = np.zeros((2, 1)) \n",
|
||||
"print(f\"a shape = {a.shape}, a = {a}\") \n",
|
||||
"\n",
|
||||
"a = np.random.random_sample((1, 1)) \n",
|
||||
"print(f\"a shape = {a.shape}, a = {a}\") "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"One can also manually specify data. Dimensions are specified with additional brackets matching the format in the printing above."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill with user specified values\n",
|
||||
"a = np.array([[5], [4], [3]]); print(f\" a shape = {a.shape}, np.array: a = {a}\")\n",
|
||||
"a = np.array([[5], # One can also\n",
|
||||
" [4], # separate values\n",
|
||||
" [3]]); #into separate rows\n",
|
||||
"print(f\" a shape = {a.shape}, np.array: a = {a}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.4\"></a>\n",
|
||||
"## 4.4 Operations on Matrices\n",
|
||||
"Let's explore some operations using matrices."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.4.1\"></a>\n",
|
||||
"### 4.4.1 Indexing\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Matrices include a second index. The two indexes describe [row, column]. Access can either return an element or a row/column. See below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector indexing operations on matrices\n",
|
||||
"a = np.arange(6).reshape(-1, 2) #reshape is a convenient way to create matrices\n",
|
||||
"print(f\"a.shape: {a.shape}, \\na= {a}\")\n",
|
||||
"\n",
|
||||
"#access an element\n",
|
||||
"print(f\"\\na[2,0].shape: {a[2, 0].shape}, a[2,0] = {a[2, 0]}, type(a[2,0]) = {type(a[2, 0])} Accessing an element returns a scalar\\n\")\n",
|
||||
"\n",
|
||||
"#access a row\n",
|
||||
"print(f\"a[2].shape: {a[2].shape}, a[2] = {a[2]}, type(a[2]) = {type(a[2])}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"It is worth drawing attention to the last example. Accessing a matrix by just specifying the row will return a *1-D vector*."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Reshape** \n",
|
||||
"The previous example used [reshape](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html) to shape the array. \n",
|
||||
"`a = np.arange(6).reshape(-1, 2) ` \n",
|
||||
"This line of code first created a *1-D Vector* of six elements. It then reshaped that vector into a *2-D* array using the reshape command. This could have been written: \n",
|
||||
"`a = np.arange(6).reshape(3, 2) ` \n",
|
||||
"To arrive at the same 3 row, 2 column array.\n",
|
||||
"The -1 argument tells the routine to compute the number of rows given the size of the array and the number of columns.\n"
|
||||
]
|
||||
},
|
||||
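{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As a quick, minimal check of the reshape behavior described above (a sketch using only standard NumPy, no new assumptions):"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# minimal sketch: reshape(-1, 2) and reshape(3, 2) produce the same 3 row, 2 column array\n",
|
||||
"a = np.arange(6).reshape(-1, 2)   # -1: let NumPy compute the number of rows\n",
|
||||
"b = np.arange(6).reshape(3, 2)    # rows given explicitly\n",
|
||||
"print(f\"a = \\n{a}\")\n",
|
||||
"print(f\"arrays equal: {np.array_equal(a, b)}\")"
|
||||
]
|
||||
},
|
||||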
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.4.2\"></a>\n",
|
||||
"### 4.4.2 Slicing\n",
|
||||
"Slicing creates an array of indices using a set of three values (`start:stop:step`). A subset of values is also valid. Its use is best explained by example:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector 2-D slicing operations\n",
|
||||
"a = np.arange(20).reshape(-1, 10)\n",
|
||||
"print(f\"a = \\n{a}\")\n",
|
||||
"\n",
|
||||
"#access 5 consecutive elements (start:stop:step)\n",
|
||||
"print(\"a[0, 2:7:1] = \", a[0, 2:7:1], \", a[0, 2:7:1].shape =\", a[0, 2:7:1].shape, \"a 1-D array\")\n",
|
||||
"\n",
|
||||
"#access 5 consecutive elements (start:stop:step) in two rows\n",
|
||||
"print(\"a[:, 2:7:1] = \\n\", a[:, 2:7:1], \", a[:, 2:7:1].shape =\", a[:, 2:7:1].shape, \"a 2-D array\")\n",
|
||||
"\n",
|
||||
"# access all elements\n",
|
||||
"print(\"a[:,:] = \\n\", a[:,:], \", a[:,:].shape =\", a[:,:].shape)\n",
|
||||
"\n",
|
||||
"# access all elements in one row (very common usage)\n",
|
||||
"print(\"a[1,:] = \", a[1,:], \", a[1,:].shape =\", a[1,:].shape, \"a 1-D array\")\n",
|
||||
"# same as\n",
|
||||
"print(\"a[1] = \", a[1], \", a[1].shape =\", a[1].shape, \"a 1-D array\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_5.0\"></a>\n",
|
||||
"## Congratulations!\n",
|
||||
"In this lab you mastered the features of Python and NumPy that are needed for Course 1."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"dl_toc_settings": {
|
||||
"rndtag": "40015"
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.10"
|
||||
},
|
||||
"toc-autonumbering": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
@ -0,0 +1,730 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Python, NumPy and Vectorization\n",
|
||||
"A brief introduction to some of the scientific computing used in this course. In particular the NumPy scientific computing package and its use with python.\n",
|
||||
"\n",
|
||||
"# Outline\n",
|
||||
"- [ 1.1 Goals](#toc_40015_1.1)\n",
|
||||
"- [ 1.2 Useful References](#toc_40015_1.2)\n",
|
||||
"- [2 Python and NumPy <a name='Python and NumPy'></a>](#toc_40015_2)\n",
|
||||
"- [3 Vectors](#toc_40015_3)\n",
|
||||
"- [ 3.1 Abstract](#toc_40015_3.1)\n",
|
||||
"- [ 3.2 NumPy Arrays](#toc_40015_3.2)\n",
|
||||
"- [ 3.3 Vector Creation](#toc_40015_3.3)\n",
|
||||
"- [ 3.4 Operations on Vectors](#toc_40015_3.4)\n",
|
||||
"- [4 Matrices](#toc_40015_4)\n",
|
||||
"- [ 4.1 Abstract](#toc_40015_4.1)\n",
|
||||
"- [ 4.2 NumPy Arrays](#toc_40015_4.2)\n",
|
||||
"- [ 4.3 Matrix Creation](#toc_40015_4.3)\n",
|
||||
"- [ 4.4 Operations on Matrices](#toc_40015_4.4)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np # it is an unofficial standard to use np for numpy\n",
|
||||
"import time"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_1.1\"></a>\n",
|
||||
"## 1.1 Goals\n",
|
||||
"In this lab, you will:\n",
|
||||
"- Review the features of NumPy and Python that are used in Course 1"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_1.2\"></a>\n",
|
||||
"## 1.2 Useful References\n",
|
||||
"- NumPy Documentation including a basic introduction: [NumPy.org](https://NumPy.org/doc/stable/)\n",
|
||||
"- A challenging feature topic: [NumPy Broadcasting](https://NumPy.org/doc/stable/user/basics.broadcasting.html)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_2\"></a>\n",
|
||||
"# 2 Python and NumPy <a name='Python and NumPy'></a>\n",
|
||||
"Python is the programming language we will be using in this course. It has a set of numeric data types and arithmetic operations. NumPy is a library that extends the base capabilities of python to add a richer data set including more numeric types, vectors, matrices, and many matrix functions. NumPy and python work together fairly seamlessly. Python arithmetic operators work on NumPy data types and many NumPy functions will accept python data types.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3\"></a>\n",
|
||||
"# 3 Vectors\n",
|
||||
"<a name=\"toc_40015_3.1\"></a>\n",
|
||||
"## 3.1 Abstract\n",
|
||||
"<img align=\"right\" src=\"./images/C1_W2_Lab04_Vectors.PNG\" style=\"width:340px;\" >Vectors, as you will use them in this course, are ordered arrays of numbers. In notation, vectors are denoted with lower case bold letters such as $\\mathbf{x}$. The elements of a vector are all the same type. A vector does not, for example, contain both characters and numbers. The number of elements in the array is often referred to as the *dimension* though mathematicians may prefer *rank*. The vector shown has a dimension of $n$. The elements of a vector can be referenced with an index. In math settings, indexes typically run from 1 to n. In computer science and these labs, indexing will typically run from 0 to n-1. In notation, elements of a vector, when referenced individually will indicate the index in a subscript, for example, the $0^{th}$ element, of the vector $\\mathbf{x}$ is $x_0$. Note, the x is not bold in this case. \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.2\"></a>\n",
|
||||
"## 3.2 NumPy Arrays\n",
|
||||
"\n",
|
||||
"NumPy's basic data structure is an indexable, n-dimensional *array* containing elements of the same type (`dtype`). Right away, you may notice we have overloaded the term 'dimension'. Above, it was the number of elements in the vector, here, dimension refers to the number of indexes of an array. A one-dimensional or 1-D array has one index. In Course 1, we will represent vectors as NumPy 1-D arrays. \n",
|
||||
"\n",
|
||||
" - 1-D array, shape (n,): n elements indexed [0] through [n-1]\n",
|
||||
" "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.3\"></a>\n",
|
||||
"## 3.3 Vector Creation\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Data creation routines in NumPy will generally have a first parameter which is the shape of the object. This can either be a single value for a 1-D result or a tuple (n,m,...) specifying the shape of the result. Below are examples of creating vectors using these routines."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill arrays with value\n",
|
||||
"a = np.zeros(4); print(f\"np.zeros(4) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.zeros((4,)); print(f\"np.zeros(4,) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.random.random_sample(4); print(f\"np.random.random_sample(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Some data creation routines do not take a shape tuple:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill arrays with value but do not accept shape as input argument\n",
|
||||
"a = np.arange(4.); print(f\"np.arange(4.): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.random.rand(4); print(f\"np.random.rand(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"values can be specified manually as well. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill with user specified values\n",
|
||||
"a = np.array([5,4,3,2]); print(f\"np.array([5,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.array([5.,4,3,2]); print(f\"np.array([5.,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"These have all created a one-dimensional vector `a` with four elements. `a.shape` returns the dimensions. Here we see a.shape = `(4,)` indicating a 1-d array with 4 elements. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4\"></a>\n",
|
||||
"## 3.4 Operations on Vectors\n",
|
||||
"Let's explore some operations using vectors.\n",
|
||||
"<a name=\"toc_40015_3.4.1\"></a>\n",
|
||||
"### 3.4.1 Indexing\n",
|
||||
"Elements of vectors can be accessed via indexing and slicing. NumPy provides a very complete set of indexing and slicing capabilities. We will explore only the basics needed for the course here. Reference [Slicing and Indexing](https://NumPy.org/doc/stable/reference/arrays.indexing.html) for more details. \n",
|
||||
"**Indexing** means referring to *an element* of an array by its position within the array. \n",
|
||||
"**Slicing** means getting a *subset* of elements from an array based on their indices. \n",
|
||||
"NumPy starts indexing at zero so the 3rd element of an vector $\\mathbf{a}$ is `a[2]`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector indexing operations on 1-D vectors\n",
|
||||
"a = np.arange(10)\n",
|
||||
"print(a)\n",
|
||||
"\n",
|
||||
"#access an element\n",
|
||||
"print(f\"a[2].shape: {a[2].shape} a[2] = {a[2]}, Accessing an element returns a scalar\")\n",
|
||||
"\n",
|
||||
"# access the last element, negative indexes count from the end\n",
|
||||
"print(f\"a[-1] = {a[-1]}\")\n",
|
||||
"\n",
|
||||
"#indexs must be within the range of the vector or they will produce and error\n",
|
||||
"try:\n",
|
||||
" c = a[10]\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"The error message you'll see is:\")\n",
|
||||
" print(e)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.2\"></a>\n",
|
||||
"### 3.4.2 Slicing\n",
|
||||
"Slicing creates an array of indices using a set of three values (`start:stop:step`). A subset of values is also valid. Its use is best explained by example:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector slicing operations\n",
|
||||
"a = np.arange(10)\n",
|
||||
"print(f\"a = {a}\")\n",
|
||||
"\n",
|
||||
"#access 5 consecutive elements (start:stop:step)\n",
|
||||
"c = a[2:7:1]; print(\"a[2:7:1] = \", c)\n",
|
||||
"\n",
|
||||
"# access 3 elements separated by two \n",
|
||||
"c = a[2:7:2]; print(\"a[2:7:2] = \", c)\n",
|
||||
"\n",
|
||||
"# access all elements index 3 and above\n",
|
||||
"c = a[3:]; print(\"a[3:] = \", c)\n",
|
||||
"\n",
|
||||
"# access all elements below index 3\n",
|
||||
"c = a[:3]; print(\"a[:3] = \", c)\n",
|
||||
"\n",
|
||||
"# access all elements\n",
|
||||
"c = a[:]; print(\"a[:] = \", c)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.3\"></a>\n",
|
||||
"### 3.4.3 Single vector operations\n",
|
||||
"There are a number of useful operations that involve operations on a single vector."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.array([1,2,3,4])\n",
|
||||
"print(f\"a : {a}\")\n",
|
||||
"# negate elements of a\n",
|
||||
"b = -a \n",
|
||||
"print(f\"b = -a : {b}\")\n",
|
||||
"\n",
|
||||
"# sum all elements of a, returns a scalar\n",
|
||||
"b = np.sum(a) \n",
|
||||
"print(f\"b = np.sum(a) : {b}\")\n",
|
||||
"\n",
|
||||
"b = np.mean(a)\n",
|
||||
"print(f\"b = np.mean(a): {b}\")\n",
|
||||
"\n",
|
||||
"b = a**2\n",
|
||||
"print(f\"b = a**2 : {b}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.4\"></a>\n",
|
||||
"### 3.4.4 Vector Vector element-wise operations\n",
|
||||
"Most of the NumPy arithmetic, logical and comparison operations apply to vectors as well. These operators work on an element-by-element basis. For example \n",
|
||||
"$$ \\mathbf{a} + \\mathbf{b} = \\sum_{i=0}^{n-1} a_i + b_i $$"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.array([ 1, 2, 3, 4])\n",
|
||||
"b = np.array([-1,-2, 3, 4])\n",
|
||||
"print(f\"Binary operators work element wise: {a + b}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Of course, for this to work correctly, the vectors must be of the same size:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#try a mismatched vector operation\n",
|
||||
"c = np.array([1, 2])\n",
|
||||
"try:\n",
|
||||
" d = a + c\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"The error message you'll see is:\")\n",
|
||||
" print(e)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.5\"></a>\n",
|
||||
"### 3.4.5 Scalar Vector operations\n",
|
||||
"Vectors can be 'scaled' by scalar values. A scalar value is just a number. The scalar multiplies all the elements of the vector."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.array([1, 2, 3, 4])\n",
|
||||
"\n",
|
||||
"# multiply a by a scalar\n",
|
||||
"b = 5 * a \n",
|
||||
"print(f\"b = 5 * a : {b}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.6\"></a>\n",
|
||||
"### 3.4.6 Vector Vector dot product\n",
|
||||
"The dot product is a mainstay of Linear Algebra and NumPy. This is an operation used extensively in this course and should be well understood. The dot product is shown below."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<img src=\"./images/C1_W2_Lab04_dot_notrans.gif\" width=800> "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The dot product multiplies the values in two vectors element-wise and then sums the result.\n",
|
||||
"Vector dot product requires the dimensions of the two vectors to be the same. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's implement our own version of the dot product below:\n",
|
||||
"\n",
|
||||
"**Using a for loop**, implement a function which returns the dot product of two vectors. The function to return given inputs $a$ and $b$:\n",
|
||||
"$$ x = \\sum_{i=0}^{n-1} a_i b_i $$\n",
|
||||
"Assume both `a` and `b` are the same shape."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def my_dot(a, b): \n",
|
||||
" \"\"\"\n",
|
||||
" Compute the dot product of two vectors\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" a (ndarray (n,)): input vector \n",
|
||||
" b (ndarray (n,)): input vector with same dimension as a\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" x (scalar): \n",
|
||||
" \"\"\"\n",
|
||||
" x=0\n",
|
||||
" for i in range(a.shape[0]):\n",
|
||||
" x = x + a[i] * b[i]\n",
|
||||
" return x"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# test 1-D\n",
|
||||
"a = np.array([1, 2, 3, 4])\n",
|
||||
"b = np.array([-1, 4, 3, 2])\n",
|
||||
"print(f\"my_dot(a, b) = {my_dot(a, b)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note, the dot product is expected to return a scalar value. \n",
|
||||
"\n",
|
||||
"Let's try the same operations using `np.dot`. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# test 1-D\n",
|
||||
"a = np.array([1, 2, 3, 4])\n",
|
||||
"b = np.array([-1, 4, 3, 2])\n",
|
||||
"c = np.dot(a, b)\n",
|
||||
"print(f\"NumPy 1-D np.dot(a, b) = {c}, np.dot(a, b).shape = {c.shape} \") \n",
|
||||
"c = np.dot(b, a)\n",
|
||||
"print(f\"NumPy 1-D np.dot(b, a) = {c}, np.dot(a, b).shape = {c.shape} \")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Above, you will note that the results for 1-D matched our implementation."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.7\"></a>\n",
|
||||
"### 3.4.7 The Need for Speed: vector vs for loop\n",
|
||||
"We utilized the NumPy library because it improves speed memory efficiency. Let's demonstrate:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"np.random.seed(1)\n",
|
||||
"a = np.random.rand(10000000) # very large arrays\n",
|
||||
"b = np.random.rand(10000000)\n",
|
||||
"\n",
|
||||
"tic = time.time() # capture start time\n",
|
||||
"c = np.dot(a, b)\n",
|
||||
"toc = time.time() # capture end time\n",
|
||||
"\n",
|
||||
"print(f\"np.dot(a, b) = {c:.4f}\")\n",
|
||||
"print(f\"Vectorized version duration: {1000*(toc-tic):.4f} ms \")\n",
|
||||
"\n",
|
||||
"tic = time.time() # capture start time\n",
|
||||
"c = my_dot(a,b)\n",
|
||||
"toc = time.time() # capture end time\n",
|
||||
"\n",
|
||||
"print(f\"my_dot(a, b) = {c:.4f}\")\n",
|
||||
"print(f\"loop version duration: {1000*(toc-tic):.4f} ms \")\n",
|
||||
"\n",
|
||||
"del(a);del(b) #remove these big arrays from memory"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"So, vectorization provides a large speed up in this example. This is because NumPy makes better use of available data parallelism in the underlying hardware. GPU's and modern CPU's implement Single Instruction, Multiple Data (SIMD) pipelines allowing multiple operations to be issued in parallel. This is critical in Machine Learning where the data sets are often very large."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_12345_3.4.8\"></a>\n",
|
||||
"### 3.4.8 Vector Vector operations in Course 1\n",
|
||||
"Vector Vector operations will appear frequently in course 1. Here is why:\n",
|
||||
"- Going forward, our examples will be stored in an array, `X_train` of dimension (m,n). This will be explained more in context, but here it is important to note it is a 2 Dimensional array or matrix (see next section on matrices).\n",
|
||||
"- `w` will be a 1-dimensional vector of shape (n,).\n",
|
||||
"- we will perform operations by looping through the examples, extracting each example to work on individually by indexing X. For example:`X[i]`\n",
|
||||
"- `X[i]` returns a value of shape (n,), a 1-dimensional vector. Consequently, operations involving `X[i]` are often vector-vector. \n",
|
||||
"\n",
|
||||
"That is a somewhat lengthy explanation, but aligning and understanding the shapes of your operands is important when performing vector operations."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# show common Course 1 example\n",
|
||||
"X = np.array([[1],[2],[3],[4]])\n",
|
||||
"w = np.array([2])\n",
|
||||
"c = np.dot(X[1], w)\n",
|
||||
"\n",
|
||||
"print(f\"X[1] has shape {X[1].shape}\")\n",
|
||||
"print(f\"w has shape {w.shape}\")\n",
|
||||
"print(f\"c has shape {c.shape}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4\"></a>\n",
|
||||
"# 4 Matrices\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.1\"></a>\n",
|
||||
"## 4.1 Abstract\n",
|
||||
"Matrices, are two dimensional arrays. The elements of a matrix are all of the same type. In notation, matrices are denoted with capitol, bold letter such as $\\mathbf{X}$. In this and other labs, `m` is often the number of rows and `n` the number of columns. The elements of a matrix can be referenced with a two dimensional index. In math settings, numbers in the index typically run from 1 to n. In computer science and these labs, indexing will run from 0 to n-1. \n",
|
||||
"<figure>\n",
|
||||
" <center> <img src=\"./images/C1_W2_Lab04_Matrices.PNG\" alt='missing' width=900><center/>\n",
|
||||
" <figcaption> Generic Matrix Notation, 1st index is row, 2nd is column </figcaption>\n",
|
||||
"<figure/>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.2\"></a>\n",
|
||||
"## 4.2 NumPy Arrays\n",
|
||||
"\n",
|
||||
"NumPy's basic data structure is an indexable, n-dimensional *array* containing elements of the same type (`dtype`). These were described earlier. Matrices have a two-dimensional (2-D) index [m,n].\n",
|
||||
"\n",
|
||||
"In Course 1, 2-D matrices are used to hold training data. Training data is $m$ examples by $n$ features creating an (m,n) array. Course 1 does not do operations directly on matrices but typically extracts an example as a vector and operates on that. Below you will review: \n",
|
||||
"- data creation\n",
|
||||
"- slicing and indexing"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.3\"></a>\n",
|
||||
"## 4.3 Matrix Creation\n",
|
||||
"The same functions that created 1-D vectors will create 2-D or n-D arrays. Here are some examples\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Below, the shape tuple is provided to achieve a 2-D result. Notice how NumPy uses brackets to denote each dimension. Notice further than NumPy, when printing, will print one row per line.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.zeros((1, 5)) \n",
|
||||
"print(f\"a shape = {a.shape}, a = {a}\") \n",
|
||||
"\n",
|
||||
"a = np.zeros((2, 1)) \n",
|
||||
"print(f\"a shape = {a.shape}, a = {a}\") \n",
|
||||
"\n",
|
||||
"a = np.random.random_sample((1, 1)) \n",
|
||||
"print(f\"a shape = {a.shape}, a = {a}\") "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"One can also manually specify data. Dimensions are specified with additional brackets matching the format in the printing above."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill with user specified values\n",
|
||||
"a = np.array([[5], [4], [3]]); print(f\" a shape = {a.shape}, np.array: a = {a}\")\n",
|
||||
"a = np.array([[5], # One can also\n",
|
||||
" [4], # separate values\n",
|
||||
" [3]]); #into separate rows\n",
|
||||
"print(f\" a shape = {a.shape}, np.array: a = {a}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.4\"></a>\n",
|
||||
"## 4.4 Operations on Matrices\n",
|
||||
"Let's explore some operations using matrices."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.4.1\"></a>\n",
|
||||
"### 4.4.1 Indexing\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Matrices include a second index. The two indexes describe [row, column]. Access can either return an element or a row/column. See below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector indexing operations on matrices\n",
|
||||
"a = np.arange(6).reshape(-1, 2) #reshape is a convenient way to create matrices\n",
|
||||
"print(f\"a.shape: {a.shape}, \\na= {a}\")\n",
|
||||
"\n",
|
||||
"#access an element\n",
|
||||
"print(f\"\\na[2,0].shape: {a[2, 0].shape}, a[2,0] = {a[2, 0]}, type(a[2,0]) = {type(a[2, 0])} Accessing an element returns a scalar\\n\")\n",
|
||||
"\n",
|
||||
"#access a row\n",
|
||||
"print(f\"a[2].shape: {a[2].shape}, a[2] = {a[2]}, type(a[2]) = {type(a[2])}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"It is worth drawing attention to the last example. Accessing a matrix by just specifying the row will return a *1-D vector*."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Reshape** \n",
|
||||
"The previous example used [reshape](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html) to shape the array. \n",
|
||||
"`a = np.arange(6).reshape(-1, 2) ` \n",
|
||||
"This line of code first created a *1-D Vector* of six elements. It then reshaped that vector into a *2-D* array using the reshape command. This could have been written: \n",
|
||||
"`a = np.arange(6).reshape(3, 2) ` \n",
|
||||
"To arrive at the same 3 row, 2 column array.\n",
|
||||
"The -1 argument tells the routine to compute the number of rows given the size of the array and the number of columns.\n"
|
||||
]
|
||||
},
|
||||
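{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As a quick, minimal check of the reshape behavior described above (a sketch using only standard NumPy, no new assumptions):"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# minimal sketch: reshape(-1, 2) and reshape(3, 2) produce the same 3 row, 2 column array\n",
|
||||
"a = np.arange(6).reshape(-1, 2)   # -1: let NumPy compute the number of rows\n",
|
||||
"b = np.arange(6).reshape(3, 2)    # rows given explicitly\n",
|
||||
"print(f\"a = \\n{a}\")\n",
|
||||
"print(f\"arrays equal: {np.array_equal(a, b)}\")"
|
||||
]
|
||||
},
|
||||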
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.4.2\"></a>\n",
|
||||
"### 4.4.2 Slicing\n",
|
||||
"Slicing creates an array of indices using a set of three values (`start:stop:step`). A subset of values is also valid. Its use is best explained by example:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector 2-D slicing operations\n",
|
||||
"a = np.arange(20).reshape(-1, 10)\n",
|
||||
"print(f\"a = \\n{a}\")\n",
|
||||
"\n",
|
||||
"#access 5 consecutive elements (start:stop:step)\n",
|
||||
"print(\"a[0, 2:7:1] = \", a[0, 2:7:1], \", a[0, 2:7:1].shape =\", a[0, 2:7:1].shape, \"a 1-D array\")\n",
|
||||
"\n",
|
||||
"#access 5 consecutive elements (start:stop:step) in two rows\n",
|
||||
"print(\"a[:, 2:7:1] = \\n\", a[:, 2:7:1], \", a[:, 2:7:1].shape =\", a[:, 2:7:1].shape, \"a 2-D array\")\n",
|
||||
"\n",
|
||||
"# access all elements\n",
|
||||
"print(\"a[:,:] = \\n\", a[:,:], \", a[:,:].shape =\", a[:,:].shape)\n",
|
||||
"\n",
|
||||
"# access all elements in one row (very common usage)\n",
|
||||
"print(\"a[1,:] = \", a[1,:], \", a[1,:].shape =\", a[1,:].shape, \"a 1-D array\")\n",
|
||||
"# same as\n",
|
||||
"print(\"a[1] = \", a[1], \", a[1].shape =\", a[1].shape, \"a 1-D array\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_5.0\"></a>\n",
|
||||
"## Congratulations!\n",
|
||||
"In this lab you mastered the features of Python and NumPy that are needed for Course 1."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"dl_toc_settings": {
|
||||
"rndtag": "40015"
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.10"
|
||||
},
|
||||
"toc-autonumbering": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
100
work/data/houses.txt
Normal file
@ -0,0 +1,100 @@
|
||||
9.520000000000000000e+02,2.000000000000000000e+00,1.000000000000000000e+00,6.500000000000000000e+01,2.715000000000000000e+02
|
||||
1.244000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,6.400000000000000000e+01,3.000000000000000000e+02
|
||||
1.947000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,1.700000000000000000e+01,5.098000000000000114e+02
|
||||
1.725000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,4.200000000000000000e+01,3.940000000000000000e+02
|
||||
1.959000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,1.500000000000000000e+01,5.400000000000000000e+02
|
||||
1.314000000000000000e+03,2.000000000000000000e+00,1.000000000000000000e+00,1.400000000000000000e+01,4.150000000000000000e+02
|
||||
8.640000000000000000e+02,2.000000000000000000e+00,1.000000000000000000e+00,6.600000000000000000e+01,2.300000000000000000e+02
|
||||
1.836000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,1.700000000000000000e+01,5.600000000000000000e+02
|
||||
1.026000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,4.300000000000000000e+01,2.940000000000000000e+02
|
||||
3.194000000000000000e+03,4.000000000000000000e+00,2.000000000000000000e+00,8.700000000000000000e+01,7.182000000000000455e+02
|
||||
7.880000000000000000e+02,2.000000000000000000e+00,1.000000000000000000e+00,8.000000000000000000e+01,2.000000000000000000e+02
|
||||
1.200000000000000000e+03,2.000000000000000000e+00,2.000000000000000000e+00,1.700000000000000000e+01,3.020000000000000000e+02
|
||||
1.557000000000000000e+03,2.000000000000000000e+00,1.000000000000000000e+00,1.800000000000000000e+01,4.680000000000000000e+02
|
||||
1.430000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,2.000000000000000000e+01,3.741999999999999886e+02
|
||||
1.220000000000000000e+03,2.000000000000000000e+00,1.000000000000000000e+00,1.500000000000000000e+01,3.880000000000000000e+02
|
||||
1.092000000000000000e+03,2.000000000000000000e+00,1.000000000000000000e+00,6.400000000000000000e+01,2.820000000000000000e+02
|
||||
8.480000000000000000e+02,1.000000000000000000e+00,1.000000000000000000e+00,1.700000000000000000e+01,3.118000000000000114e+02
|
||||
1.682000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,2.300000000000000000e+01,4.010000000000000000e+02
|
||||
1.768000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,1.800000000000000000e+01,4.498000000000000114e+02
|
||||
1.040000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,4.400000000000000000e+01,3.010000000000000000e+02
|
||||
1.652000000000000000e+03,2.000000000000000000e+00,1.000000000000000000e+00,2.100000000000000000e+01,5.020000000000000000e+02
|
||||
1.088000000000000000e+03,2.000000000000000000e+00,1.000000000000000000e+00,3.500000000000000000e+01,3.400000000000000000e+02
|
||||
1.316000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,1.400000000000000000e+01,4.002819999999999823e+02
|
||||
1.593000000000000000e+03,0.000000000000000000e+00,1.000000000000000000e+00,2.000000000000000000e+01,5.720000000000000000e+02
|
||||
9.720000000000000000e+02,2.000000000000000000e+00,1.000000000000000000e+00,7.300000000000000000e+01,2.640000000000000000e+02
|
||||
1.097000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,3.700000000000000000e+01,3.040000000000000000e+02
|
||||
1.004000000000000000e+03,2.000000000000000000e+00,1.000000000000000000e+00,5.100000000000000000e+01,2.980000000000000000e+02
|
||||
9.040000000000000000e+02,3.000000000000000000e+00,1.000000000000000000e+00,5.500000000000000000e+01,2.198000000000000114e+02
|
||||
1.694000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,1.300000000000000000e+01,4.906999999999999886e+02
|
||||
1.073000000000000000e+03,2.000000000000000000e+00,1.000000000000000000e+00,1.000000000000000000e+02,2.169600000000000080e+02
|
||||
1.419000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,1.900000000000000000e+01,3.681999999999999886e+02
|
||||
1.164000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,5.200000000000000000e+01,2.800000000000000000e+02
|
||||
1.935000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,1.200000000000000000e+01,5.268700000000000045e+02
|
||||
1.216000000000000000e+03,2.000000000000000000e+00,2.000000000000000000e+00,7.400000000000000000e+01,2.370000000000000000e+02
|
||||
2.482000000000000000e+03,4.000000000000000000e+00,2.000000000000000000e+00,1.600000000000000000e+01,5.624260000000000446e+02
|
||||
1.200000000000000000e+03,2.000000000000000000e+00,1.000000000000000000e+00,1.800000000000000000e+01,3.698000000000000114e+02
|
||||
1.840000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,2.000000000000000000e+01,4.600000000000000000e+02
|
||||
1.851000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,5.700000000000000000e+01,3.740000000000000000e+02
|
||||
1.660000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,1.900000000000000000e+01,3.900000000000000000e+02
|
||||
1.096000000000000000e+03,2.000000000000000000e+00,2.000000000000000000e+00,9.700000000000000000e+01,1.580000000000000000e+02
|
||||
1.775000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,2.800000000000000000e+01,4.260000000000000000e+02
|
||||
2.030000000000000000e+03,4.000000000000000000e+00,2.000000000000000000e+00,4.500000000000000000e+01,3.900000000000000000e+02
|
||||
1.784000000000000000e+03,4.000000000000000000e+00,2.000000000000000000e+00,1.070000000000000000e+02,2.777740000000000009e+02
|
||||
1.073000000000000000e+03,2.000000000000000000e+00,1.000000000000000000e+00,1.000000000000000000e+02,2.169600000000000080e+02
|
||||
1.552000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,1.600000000000000000e+01,4.258000000000000114e+02
|
||||
1.953000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,1.600000000000000000e+01,5.040000000000000000e+02
|
||||
1.224000000000000000e+03,2.000000000000000000e+00,2.000000000000000000e+00,1.200000000000000000e+01,3.290000000000000000e+02
|
||||
1.616000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,1.600000000000000000e+01,4.640000000000000000e+02
|
||||
8.160000000000000000e+02,2.000000000000000000e+00,1.000000000000000000e+00,5.800000000000000000e+01,2.200000000000000000e+02
|
||||
1.349000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,2.100000000000000000e+01,3.580000000000000000e+02
|
||||
1.571000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,1.400000000000000000e+01,4.780000000000000000e+02
|
||||
1.486000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,5.700000000000000000e+01,3.340000000000000000e+02
|
||||
1.506000000000000000e+03,2.000000000000000000e+00,1.000000000000000000e+00,1.600000000000000000e+01,4.269800000000000182e+02
|
||||
1.097000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,2.700000000000000000e+01,2.900000000000000000e+02
|
||||
1.764000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,2.400000000000000000e+01,4.630000000000000000e+02
|
||||
1.208000000000000000e+03,2.000000000000000000e+00,1.000000000000000000e+00,1.400000000000000000e+01,3.908000000000000114e+02
|
||||
1.470000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,2.400000000000000000e+01,3.540000000000000000e+02
|
||||
1.768000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,8.400000000000000000e+01,3.500000000000000000e+02
|
||||
1.654000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,1.900000000000000000e+01,4.600000000000000000e+02
|
||||
1.029000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,6.000000000000000000e+01,2.370000000000000000e+02
|
||||
1.120000000000000000e+03,2.000000000000000000e+00,2.000000000000000000e+00,1.600000000000000000e+01,2.883039999999999736e+02
|
||||
1.150000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,6.200000000000000000e+01,2.820000000000000000e+02
|
||||
8.160000000000000000e+02,2.000000000000000000e+00,1.000000000000000000e+00,3.900000000000000000e+01,2.490000000000000000e+02
|
||||
1.040000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,2.500000000000000000e+01,3.040000000000000000e+02
|
||||
1.392000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,6.400000000000000000e+01,3.320000000000000000e+02
|
||||
1.603000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,2.900000000000000000e+01,3.518000000000000114e+02
|
||||
1.215000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,6.300000000000000000e+01,3.100000000000000000e+02
|
||||
1.073000000000000000e+03,2.000000000000000000e+00,1.000000000000000000e+00,1.000000000000000000e+02,2.169600000000000080e+02
|
||||
2.599000000000000000e+03,4.000000000000000000e+00,2.000000000000000000e+00,2.200000000000000000e+01,6.663360000000000127e+02
|
||||
1.431000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,5.900000000000000000e+01,3.300000000000000000e+02
|
||||
2.090000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,2.600000000000000000e+01,4.800000000000000000e+02
|
||||
1.790000000000000000e+03,4.000000000000000000e+00,2.000000000000000000e+00,4.900000000000000000e+01,3.303000000000000114e+02
|
||||
1.484000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,1.600000000000000000e+01,3.480000000000000000e+02
|
||||
1.040000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,2.500000000000000000e+01,3.040000000000000000e+02
|
||||
1.431000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,2.200000000000000000e+01,3.840000000000000000e+02
|
||||
1.159000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,5.300000000000000000e+01,3.160000000000000000e+02
|
||||
1.547000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,1.200000000000000000e+01,4.303999999999999773e+02
|
||||
1.983000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,2.200000000000000000e+01,4.500000000000000000e+02
|
||||
1.056000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,5.300000000000000000e+01,2.840000000000000000e+02
|
||||
1.180000000000000000e+03,2.000000000000000000e+00,1.000000000000000000e+00,9.900000000000000000e+01,2.750000000000000000e+02
|
||||
1.358000000000000000e+03,2.000000000000000000e+00,1.000000000000000000e+00,1.700000000000000000e+01,4.140000000000000000e+02
|
||||
9.600000000000000000e+02,3.000000000000000000e+00,1.000000000000000000e+00,5.100000000000000000e+01,2.580000000000000000e+02
|
||||
1.456000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,1.600000000000000000e+01,3.780000000000000000e+02
|
||||
1.446000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,2.500000000000000000e+01,3.500000000000000000e+02
|
||||
1.208000000000000000e+03,2.000000000000000000e+00,1.000000000000000000e+00,1.500000000000000000e+01,4.120000000000000000e+02
|
||||
1.553000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,1.600000000000000000e+01,3.730000000000000000e+02
|
||||
8.820000000000000000e+02,3.000000000000000000e+00,1.000000000000000000e+00,4.900000000000000000e+01,2.250000000000000000e+02
|
||||
2.030000000000000000e+03,4.000000000000000000e+00,2.000000000000000000e+00,4.500000000000000000e+01,3.900000000000000000e+02
|
||||
1.040000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,6.200000000000000000e+01,2.673999999999999773e+02
|
||||
1.616000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,1.600000000000000000e+01,4.640000000000000000e+02
|
||||
8.030000000000000000e+02,2.000000000000000000e+00,1.000000000000000000e+00,8.000000000000000000e+01,1.740000000000000000e+02
|
||||
1.430000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,2.100000000000000000e+01,3.400000000000000000e+02
|
||||
1.656000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,6.100000000000000000e+01,4.300000000000000000e+02
|
||||
1.541000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,1.600000000000000000e+01,4.400000000000000000e+02
|
||||
9.480000000000000000e+02,3.000000000000000000e+00,1.000000000000000000e+00,5.300000000000000000e+01,2.160000000000000000e+02
|
||||
1.224000000000000000e+03,2.000000000000000000e+00,2.000000000000000000e+00,1.200000000000000000e+01,3.290000000000000000e+02
|
||||
1.432000000000000000e+03,2.000000000000000000e+00,1.000000000000000000e+00,4.300000000000000000e+01,3.880000000000000000e+02
|
||||
1.660000000000000000e+03,3.000000000000000000e+00,2.000000000000000000e+00,1.900000000000000000e+01,3.900000000000000000e+02
|
||||
1.212000000000000000e+03,3.000000000000000000e+00,1.000000000000000000e+00,2.000000000000000000e+01,3.560000000000000000e+02
|
||||
1.050000000000000000e+03,2.000000000000000000e+00,1.000000000000000000e+00,6.500000000000000000e+01,2.578000000000000114e+02
|
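The block above is `work/data/houses.txt`, the raw training data consumed by `load_house_data()` in `lab_utils_multi.py` further down. As a hedged sketch of how it can be read directly (the column meanings, size/bedrooms/floors/age/price, are an assumption inferred from the axis labels used in the plotting utilities below):

```python
import numpy as np

# Assumed column order: size (sqft), bedrooms, floors, age, price (1000s of $);
# inferred from the labels used later in lab_utils_multi.py, not a stated spec.
data = np.loadtxt("work/data/houses.txt", delimiter=',')

X = data[:, :4]   # features
y = data[:, 4]    # target price
print(X.shape, y.shape)   # (100, 4) (100,)
print(X[0], y[0])         # first example: [952. 2. 1. 65.] 271.5
```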
124
work/deeplearning.mplstyle
Normal file
@ -0,0 +1,124 @@
|
||||
# see https://matplotlib.org/stable/tutorials/introductory/customizing.html
|
||||
lines.linewidth: 4
|
||||
lines.solid_capstyle: butt
|
||||
|
||||
legend.fancybox: true
|
||||
|
||||
# Verdana" for non-math text,
|
||||
# Cambria Math
|
||||
|
||||
#Blue (Crayon-Aqua) 0096FF
|
||||
#Dark Red C00000
|
||||
#Orange (Apple Orange) FF9300
|
||||
#Black 000000
|
||||
#Magenta FF40FF
|
||||
#Purple 7030A0
|
||||
|
||||
axes.prop_cycle: cycler('color', ['0096FF', 'FF9300', 'FF40FF', '7030A0', 'C00000'])
|
||||
#axes.facecolor: f0f0f0 # grey
|
||||
axes.facecolor: ffffff # white
|
||||
axes.labelsize: large
|
||||
axes.axisbelow: true
|
||||
axes.grid: False
|
||||
axes.edgecolor: f0f0f0
|
||||
axes.linewidth: 3.0
|
||||
axes.titlesize: x-large
|
||||
|
||||
patch.edgecolor: f0f0f0
|
||||
patch.linewidth: 0.5
|
||||
|
||||
svg.fonttype: path
|
||||
|
||||
grid.linestyle: -
|
||||
grid.linewidth: 1.0
|
||||
grid.color: cbcbcb
|
||||
|
||||
xtick.major.size: 0
|
||||
xtick.minor.size: 0
|
||||
ytick.major.size: 0
|
||||
ytick.minor.size: 0
|
||||
|
||||
savefig.edgecolor: f0f0f0
|
||||
savefig.facecolor: f0f0f0
|
||||
|
||||
#figure.subplot.left: 0.08
|
||||
#figure.subplot.right: 0.95
|
||||
#figure.subplot.bottom: 0.07
|
||||
|
||||
#figure.facecolor: f0f0f0 # grey
|
||||
figure.facecolor: ffffff # white
|
||||
|
||||
## ***************************************************************************
|
||||
## * FONT *
|
||||
## ***************************************************************************
|
||||
## The font properties used by `text.Text`.
|
||||
## See https://matplotlib.org/api/font_manager_api.html for more information
|
||||
## on font properties. The 6 font properties used for font matching are
|
||||
## given below with their default values.
|
||||
##
|
||||
## The font.family property can take either a concrete font name (not supported
|
||||
## when rendering text with usetex), or one of the following five generic
|
||||
## values:
|
||||
## - 'serif' (e.g., Times),
|
||||
## - 'sans-serif' (e.g., Helvetica),
|
||||
## - 'cursive' (e.g., Zapf-Chancery),
|
||||
## - 'fantasy' (e.g., Western), and
|
||||
## - 'monospace' (e.g., Courier).
|
||||
## Each of these values has a corresponding default list of font names
|
||||
## (font.serif, etc.); the first available font in the list is used. Note that
|
||||
## for font.serif, font.sans-serif, and font.monospace, the first element of
|
||||
## the list (a DejaVu font) will always be used because DejaVu is shipped with
|
||||
## Matplotlib and is thus guaranteed to be available; the other entries are
|
||||
## left as examples of other possible values.
|
||||
##
|
||||
## The font.style property has three values: normal (or roman), italic
|
||||
## or oblique. The oblique style will be used for italic, if it is not
|
||||
## present.
|
||||
##
|
||||
## The font.variant property has two values: normal or small-caps. For
|
||||
## TrueType fonts, which are scalable fonts, small-caps is equivalent
|
||||
## to using a font size of 'smaller', or about 83%% of the current font
|
||||
## size.
|
||||
##
|
||||
## The font.weight property has effectively 13 values: normal, bold,
|
||||
## bolder, lighter, 100, 200, 300, ..., 900. Normal is the same as
|
||||
## 400, and bold is 700. bolder and lighter are relative values with
|
||||
## respect to the current weight.
|
||||
##
|
||||
## The font.stretch property has 11 values: ultra-condensed,
|
||||
## extra-condensed, condensed, semi-condensed, normal, semi-expanded,
|
||||
## expanded, extra-expanded, ultra-expanded, wider, and narrower. This
|
||||
## property is not currently implemented.
|
||||
##
|
||||
## The font.size property is the default font size for text, given in points.
|
||||
## 10 pt is the standard value.
|
||||
##
|
||||
## Note that font.size controls default text sizes. To configure
|
||||
## special text sizes tick labels, axes, labels, title, etc., see the rc
|
||||
## settings for axes and ticks. Special text sizes can be defined
|
||||
## relative to font.size, using the following values: xx-small, x-small,
|
||||
## small, medium, large, x-large, xx-large, larger, or smaller
|
||||
|
||||
|
||||
font.family: sans-serif
|
||||
font.style: normal
|
||||
font.variant: normal
|
||||
font.weight: normal
|
||||
font.stretch: normal
|
||||
font.size: 12.0
|
||||
|
||||
font.serif: DejaVu Serif, Bitstream Vera Serif, Computer Modern Roman, New Century Schoolbook, Century Schoolbook L, Utopia, ITC Bookman, Bookman, Nimbus Roman No9 L, Times New Roman, Times, Palatino, Charter, serif
|
||||
font.sans-serif: Verdana, DejaVu Sans, Bitstream Vera Sans, Computer Modern Sans Serif, Lucida Grande, Geneva, Lucid, Arial, Helvetica, Avant Garde, sans-serif
|
||||
font.cursive: Apple Chancery, Textile, Zapf Chancery, Sand, Script MT, Felipa, Comic Neue, Comic Sans MS, cursive
|
||||
font.fantasy: Chicago, Charcoal, Impact, Western, Humor Sans, xkcd, fantasy
|
||||
font.monospace: DejaVu Sans Mono, Bitstream Vera Sans Mono, Computer Modern Typewriter, Andale Mono, Nimbus Mono L, Courier New, Courier, Fixed, Terminal, monospace
|
||||
|
||||
|
||||
## ***************************************************************************
|
||||
## * TEXT *
|
||||
## ***************************************************************************
|
||||
## The text properties used by `text.Text`.
|
||||
## See https://matplotlib.org/api/artist_api.html#module-matplotlib.text
|
||||
## for more information on text properties
|
||||
#text.color: black
|
||||
|
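The style sheet above (`work/deeplearning.mplstyle`) is activated by the utility modules below. A minimal sketch of how such a sheet is applied, assuming the file sits in the current working directory as it does for `lab_utils_common.py`:

```python
import matplotlib.pyplot as plt

# Subsequent figures pick up the line widths, color cycle and fonts
# defined in deeplearning.mplstyle.
plt.style.use('./deeplearning.mplstyle')

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])   # drawn with the first cycle color (0096FF)
ax.set_title("style check")
plt.show()
```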
BIN
work/images/C1_W2_L1_S1_Lecture_b.png
Normal file
After Width: | Height: | Size: 83 KiB |
BIN
work/images/C1_W2_L1_S1_model.png
Normal file
After Width: | Height: | Size: 76 KiB |
BIN
work/images/C1_W2_L1_S1_trainingdata.png
Normal file
After Width: | Height: | Size: 86 KiB |
BIN
work/images/C1_W2_L1_S2_Lectureb.png
Normal file
After Width: | Height: | Size: 133 KiB |
BIN
work/images/C1_W2_L2_S1_Lecture_GD.png
Normal file
After Width: | Height: | Size: 91 KiB |
BIN
work/images/C1_W2_Lab02_GoalOfRegression.PNG
Normal file
After Width: | Height: | Size: 105 KiB |
BIN
work/images/C1_W2_Lab03_alpha_to_big.PNG
Normal file
After Width: | Height: | Size: 60 KiB |
BIN
work/images/C1_W2_Lab03_lecture_learningrate.PNG
Normal file
After Width: | Height: | Size: 84 KiB |
BIN
work/images/C1_W2_Lab03_lecture_slopes.PNG
Normal file
After Width: | Height: | Size: 67 KiB |
BIN
work/images/C1_W2_Lab04_Figures And animations.pptx
Normal file
BIN
work/images/C1_W2_Lab04_Matrices.PNG
Normal file
After Width: | Height: | Size: 14 KiB |
BIN
work/images/C1_W2_Lab04_Vectors.PNG
Normal file
After Width: | Height: | Size: 5.8 KiB |
BIN
work/images/C1_W2_Lab04_dot_notrans.gif
Normal file
After Width: | Height: | Size: 1.6 MiB |
BIN
work/images/C1_W2_Lab06_LongRun.PNG
Normal file
After Width: | Height: | Size: 302 KiB |
BIN
work/images/C1_W2_Lab06_ShortRun.PNG
Normal file
After Width: | Height: | Size: 363 KiB |
BIN
work/images/C1_W2_Lab06_contours.PNG
Normal file
After Width: | Height: | Size: 37 KiB |
BIN
work/images/C1_W2_Lab06_featurescalingheader.PNG
Normal file
After Width: | Height: | Size: 68 KiB |
BIN
work/images/C1_W2_Lab06_learningrate.PNG
Normal file
After Width: | Height: | Size: 76 KiB |
BIN
work/images/C1_W2_Lab06_scale.PNG
Normal file
After Width: | Height: | Size: 65 KiB |
BIN
work/images/C1_W2_Lab07_FeatureEngLecture.PNG
Normal file
After Width: | Height: | Size: 93 KiB |
112
work/lab_utils_common.py
Normal file
@ -0,0 +1,112 @@
|
||||
"""
|
||||
lab_utils_common.py
|
||||
functions common to all optional labs, Course 1, Week 2
|
||||
"""
|
||||
|
||||
import numpy as np
|
||||
import matplotlib.pyplot as plt
|
||||
|
||||
plt.style.use('./deeplearning.mplstyle')
|
||||
dlblue = '#0096ff'; dlorange = '#FF9300'; dldarkred='#C00000'; dlmagenta='#FF40FF'; dlpurple='#7030A0';
|
||||
dlcolors = [dlblue, dlorange, dldarkred, dlmagenta, dlpurple]
|
||||
dlc = dict(dlblue = '#0096ff', dlorange = '#FF9300', dldarkred='#C00000', dlmagenta='#FF40FF', dlpurple='#7030A0')
|
||||
|
||||
|
||||
##########################################################
|
||||
# Regression Routines
|
||||
##########################################################
|
||||
|
||||
#Function to calculate the cost
|
||||
def compute_cost_matrix(X, y, w, b, verbose=False):
|
||||
"""
|
||||
Computes the cost for linear regression
|
||||
Args:
|
||||
X (ndarray (m,n)): Data, m examples with n features
|
||||
y (ndarray (m,)) : target values
|
||||
w (ndarray (n,)) : model parameters
|
||||
b (scalar) : model parameter
|
||||
verbose : (Boolean) If true, print out intermediate value f_wb
|
||||
Returns
|
||||
cost: (scalar)
|
||||
"""
|
||||
m = X.shape[0]
|
||||
|
||||
# calculate f_wb for all examples.
|
||||
f_wb = X @ w + b
|
||||
# calculate cost
|
||||
total_cost = (1/(2*m)) * np.sum((f_wb-y)**2)
|
||||
|
||||
if verbose: print("f_wb:")
|
||||
if verbose: print(f_wb)
|
||||
|
||||
return total_cost
|
||||
|
||||
def compute_gradient_matrix(X, y, w, b):
|
||||
"""
|
||||
Computes the gradient for linear regression
|
||||
|
||||
Args:
|
||||
X (ndarray (m,n)): Data, m examples with n features
|
||||
y (ndarray (m,)) : target values
|
||||
w (ndarray (n,)) : model parameters
|
||||
b (scalar) : model parameter
|
||||
Returns
|
||||
dj_dw (ndarray (n,1)): The gradient of the cost w.r.t. the parameters w.
|
||||
dj_db (scalar): The gradient of the cost w.r.t. the parameter b.
|
||||
|
||||
"""
|
||||
m,n = X.shape
|
||||
f_wb = X @ w + b
|
||||
e = f_wb - y
|
||||
dj_dw = (1/m) * (X.T @ e)
|
||||
dj_db = (1/m) * np.sum(e)
|
||||
|
||||
return dj_db,dj_dw
|
||||
|
||||
|
||||
# Loop version of multi-variable compute_cost
|
||||
def compute_cost(X, y, w, b):
|
||||
"""
|
||||
compute cost
|
||||
Args:
|
||||
X (ndarray (m,n)): Data, m examples with n features
|
||||
y (ndarray (m,)) : target values
|
||||
w (ndarray (n,)) : model parameters
|
||||
b (scalar) : model parameter
|
||||
Returns
|
||||
cost (scalar) : cost
|
||||
"""
|
||||
m = X.shape[0]
|
||||
cost = 0.0
|
||||
for i in range(m):
|
||||
f_wb_i = np.dot(X[i],w) + b #(n,)(n,)=scalar
|
||||
cost = cost + (f_wb_i - y[i])**2
|
||||
cost = cost/(2*m)
|
||||
return cost
|
||||
|
||||
def compute_gradient(X, y, w, b):
|
||||
"""
|
||||
Computes the gradient for linear regression
|
||||
Args:
|
||||
X (ndarray (m,n)): Data, m examples with n features
|
||||
y (ndarray (m,)) : target values
|
||||
w (ndarray (n,)) : model parameters
|
||||
b (scalar) : model parameter
|
||||
Returns
|
||||
dj_dw (ndarray Shape (n,)): The gradient of the cost w.r.t. the parameters w.
|
||||
dj_db (scalar): The gradient of the cost w.r.t. the parameter b.
|
||||
"""
|
||||
m,n = X.shape #(number of examples, number of features)
|
||||
dj_dw = np.zeros((n,))
|
||||
dj_db = 0.
|
||||
|
||||
for i in range(m):
|
||||
err = (np.dot(X[i], w) + b) - y[i]
|
||||
for j in range(n):
|
||||
dj_dw[j] = dj_dw[j] + err * X[i,j]
|
||||
dj_db = dj_db + err
|
||||
dj_dw = dj_dw/m
|
||||
dj_db = dj_db/m
|
||||
|
||||
return dj_db,dj_dw
|
||||
|
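`lab_utils_common.py` above provides both vectorized (`*_matrix`) and loop implementations of the cost and gradient. A quick, illustrative sanity check (not part of the repository; it assumes the module and the style sheet are importable from the working directory) is to confirm the two agree on a tiny dataset:

```python
import numpy as np
from lab_utils_common import (compute_cost, compute_cost_matrix,
                              compute_gradient, compute_gradient_matrix)

# Small, made-up dataset purely for checking agreement of the implementations.
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5]])
y = np.array([3.0, 2.0, 5.0])
w = np.array([0.5, -0.2])
b = 1.0

# Cost: loop version and matrix version should match.
print(compute_cost(X, y, w, b), compute_cost_matrix(X, y, w, b))

# Gradients: both return (dj_db, dj_dw) in the same order.
dj_db_loop, dj_dw_loop = compute_gradient(X, y, w, b)
dj_db_mat,  dj_dw_mat  = compute_gradient_matrix(X, y, w, b)
print(np.isclose(dj_db_loop, dj_db_mat), np.allclose(dj_dw_loop, dj_dw_mat))
```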
569
work/lab_utils_multi.py
Normal file
@ -0,0 +1,569 @@
|
||||
import numpy as np
|
||||
import copy
|
||||
import math
|
||||
from scipy.stats import norm
|
||||
import matplotlib.pyplot as plt
|
||||
from mpl_toolkits.mplot3d import axes3d
|
||||
from matplotlib.ticker import MaxNLocator
|
||||
dlblue = '#0096ff'; dlorange = '#FF9300'; dldarkred='#C00000'; dlmagenta='#FF40FF'; dlpurple='#7030A0';
|
||||
plt.style.use('./deeplearning.mplstyle')
|
||||
|
||||
def load_data_multi():
|
||||
data = np.loadtxt("data/ex1data2.txt", delimiter=',')
|
||||
X = data[:,:2]
|
||||
y = data[:,2]
|
||||
return X, y
|
||||
|
||||
##########################################################
|
||||
# Plotting Routines
|
||||
##########################################################
|
||||
|
||||
def plt_house_x(X, y,f_wb=None, ax=None):
|
||||
''' plot house with aXis '''
|
||||
if not ax:
|
||||
fig, ax = plt.subplots(1,1)
|
||||
ax.scatter(X, y, marker='x', c='r', label="Actual Value")
|
||||
|
||||
ax.set_title("Housing Prices")
|
||||
ax.set_ylabel('Price (in 1000s of dollars)')
|
||||
ax.set_xlabel(f'Size (1000 sqft)')
|
||||
if f_wb is not None:
|
||||
ax.plot(X, f_wb, c=dlblue, label="Our Prediction")
|
||||
ax.legend()
|
||||
|
||||
|
||||
def mk_cost_lines(x,y,w,b, ax):
|
||||
''' makes vertical cost lines'''
|
||||
cstr = "cost = (1/2m)*1000*("
|
||||
ctot = 0
|
||||
label = 'cost for point'
|
||||
for p in zip(x,y):
|
||||
f_wb_p = w*p[0]+b
|
||||
c_p = ((f_wb_p - p[1])**2)/2
|
||||
c_p_txt = c_p/1000
|
||||
ax.vlines(p[0], p[1],f_wb_p, lw=3, color=dlpurple, ls='dotted', label=label)
|
||||
label='' #just one
|
||||
cxy = [p[0], p[1] + (f_wb_p-p[1])/2]
|
||||
ax.annotate(f'{c_p_txt:0.0f}', xy=cxy, xycoords='data',color=dlpurple,
|
||||
xytext=(5, 0), textcoords='offset points')
|
||||
cstr += f"{c_p_txt:0.0f} +"
|
||||
ctot += c_p
|
||||
ctot = ctot/(len(x))
|
||||
cstr = cstr[:-1] + f") = {ctot:0.0f}"
|
||||
ax.text(0.15,0.02,cstr, transform=ax.transAxes, color=dlpurple)
|
||||
|
||||
|
||||
def inbounds(a,b,xlim,ylim):
|
||||
xlow,xhigh = xlim
|
||||
ylow,yhigh = ylim
|
||||
ax, ay = a
|
||||
bx, by = b
|
||||
if (ax > xlow and ax < xhigh) and (bx > xlow and bx < xhigh) \
|
||||
and (ay > ylow and ay < yhigh) and (by > ylow and by < yhigh):
|
||||
return(True)
|
||||
else:
|
||||
return(False)
|
||||
|
||||
from mpl_toolkits.mplot3d import axes3d
|
||||
def plt_contour_wgrad(x, y, hist, ax, w_range=[-100, 500, 5], b_range=[-500, 500, 5],
|
||||
contours = [0.1,50,1000,5000,10000,25000,50000],
|
||||
resolution=5, w_final=200, b_final=100,step=10 ):
|
||||
b0,w0 = np.meshgrid(np.arange(*b_range),np.arange(*w_range))
|
||||
z=np.zeros_like(b0)
|
||||
n,_ = w0.shape
|
||||
for i in range(w0.shape[0]):
|
||||
for j in range(w0.shape[1]):
|
||||
z[i][j] = compute_cost(x, y, w0[i][j], b0[i][j] )
|
||||
|
||||
CS = ax.contour(w0, b0, z, contours, linewidths=2,
|
||||
colors=[dlblue, dlorange, dldarkred, dlmagenta, dlpurple])
|
||||
ax.clabel(CS, inline=1, fmt='%1.0f', fontsize=10)
|
||||
ax.set_xlabel("w"); ax.set_ylabel("b")
|
||||
ax.set_title('Contour plot of cost J(w,b), vs b,w with path of gradient descent')
|
||||
w = w_final; b=b_final
|
||||
ax.hlines(b, ax.get_xlim()[0],w, lw=2, color=dlpurple, ls='dotted')
|
||||
ax.vlines(w, ax.get_ylim()[0],b, lw=2, color=dlpurple, ls='dotted')
|
||||
|
||||
base = hist[0]
|
||||
for point in hist[0::step]:
|
||||
edist = np.sqrt((base[0] - point[0])**2 + (base[1] - point[1])**2)
|
||||
if(edist > resolution or point==hist[-1]):
|
||||
if inbounds(point,base, ax.get_xlim(),ax.get_ylim()):
|
||||
plt.annotate('', xy=point, xytext=base,xycoords='data',
|
||||
arrowprops={'arrowstyle': '->', 'color': 'r', 'lw': 3},
|
||||
va='center', ha='center')
|
||||
base=point
|
||||
return
|
||||
|
||||
|
||||
# plots p1 vs p2. Prange is an array of entries [min, max, steps]. In feature scaling lab.
|
||||
def plt_contour_multi(x, y, w, b, ax, prange, p1, p2, title="", xlabel="", ylabel=""):
|
||||
contours = [1e2, 2e2,3e2,4e2, 5e2, 6e2, 7e2,8e2,1e3, 1.25e3,1.5e3, 1e4, 1e5, 1e6, 1e7]
|
||||
px,py = np.meshgrid(np.linspace(*(prange[p1])),np.linspace(*(prange[p2])))
|
||||
z=np.zeros_like(px)
|
||||
n,_ = px.shape
|
||||
for i in range(px.shape[0]):
|
||||
for j in range(px.shape[1]):
|
||||
w_ij = w
|
||||
b_ij = b
|
||||
if p1 <= 3: w_ij[p1] = px[i,j]
|
||||
if p1 == 4: b_ij = px[i,j]
|
||||
if p2 <= 3: w_ij[p2] = py[i,j]
|
||||
if p2 == 4: b_ij = py[i,j]
|
||||
|
||||
z[i][j] = compute_cost(x, y, w_ij, b_ij )
|
||||
CS = ax.contour(px, py, z, contours, linewidths=2,
|
||||
colors=[dlblue, dlorange, dldarkred, dlmagenta, dlpurple])
|
||||
ax.clabel(CS, inline=1, fmt='%1.2e', fontsize=10)
|
||||
ax.set_xlabel(xlabel); ax.set_ylabel(ylabel)
|
||||
ax.set_title(title, fontsize=14)
|
||||
|
||||
|
||||
def plt_equal_scale(X_train, X_norm, y_train):
|
||||
fig,ax = plt.subplots(1,2,figsize=(12,5))
|
||||
prange = [
|
||||
[ 0.238-0.045, 0.238+0.045, 50],
|
||||
[-25.77326319-0.045, -25.77326319+0.045, 50],
|
||||
[-50000, 0, 50],
|
||||
[-1500, 0, 50],
|
||||
[0, 200000, 50]]
|
||||
w_best = np.array([0.23844318, -25.77326319, -58.11084634, -1.57727192])
|
||||
b_best = 235
|
||||
plt_contour_multi(X_train, y_train, w_best, b_best, ax[0], prange, 0, 1,
|
||||
title='Unnormalized, J(w,b), vs w[0],w[1]',
|
||||
xlabel= "w[0] (size(sqft))", ylabel="w[1] (# bedrooms)")
|
||||
#
|
||||
w_best = np.array([111.1972, -16.75480051, -28.51530411, -37.17305735])
|
||||
b_best = 376.949151515151
|
||||
prange = [[ 111-50, 111+50, 75],
|
||||
[-16.75-50,-16.75+50, 75],
|
||||
[-28.5-8, -28.5+8, 50],
|
||||
[-37.1-16,-37.1+16, 50],
|
||||
[376-150, 376+150, 50]]
|
||||
plt_contour_multi(X_norm, y_train, w_best, b_best, ax[1], prange, 0, 1,
|
||||
title='Normalized, J(w,b), vs w[0],w[1]',
|
||||
xlabel= "w[0] (normalized size(sqft))", ylabel="w[1] (normalized # bedrooms)")
|
||||
fig.suptitle("Cost contour with equal scale", fontsize=18)
|
||||
#plt.tight_layout(rect=(0,0,1.05,1.05))
|
||||
fig.tight_layout(rect=(0,0,1,0.95))
|
||||
plt.show()
|
||||
|
||||
def plt_divergence(p_hist, J_hist, x_train,y_train):
|
||||
|
||||
x=np.zeros(len(p_hist))
|
||||
y=np.zeros(len(p_hist))
|
||||
v=np.zeros(len(p_hist))
|
||||
for i in range(len(p_hist)):
|
||||
x[i] = p_hist[i][0]
|
||||
y[i] = p_hist[i][1]
|
||||
v[i] = J_hist[i]
|
||||
|
||||
fig = plt.figure(figsize=(12,5))
|
||||
plt.subplots_adjust( wspace=0 )
|
||||
gs = fig.add_gridspec(1, 5)
|
||||
fig.suptitle(f"Cost escalates when learning rate is too large")
|
||||
#===============
|
||||
# First subplot
|
||||
#===============
|
||||
ax = fig.add_subplot(gs[:2], )
|
||||
|
||||
# Print w vs cost to see minimum
|
||||
fix_b = 100
|
||||
w_array = np.arange(-70000, 70000, 1000)
|
||||
cost = np.zeros_like(w_array)
|
||||
|
||||
for i in range(len(w_array)):
|
||||
tmp_w = w_array[i]
|
||||
cost[i] = compute_cost(x_train, y_train, tmp_w, fix_b)
|
||||
|
||||
ax.plot(w_array, cost)
|
||||
ax.plot(x,v, c=dlmagenta)
|
||||
ax.set_title("Cost vs w, b set to 100")
|
||||
ax.set_ylabel('Cost')
|
||||
ax.set_xlabel('w')
|
||||
ax.xaxis.set_major_locator(MaxNLocator(2))
|
||||
|
||||
#===============
|
||||
# Second Subplot
|
||||
#===============
|
||||
|
||||
tmp_b,tmp_w = np.meshgrid(np.arange(-35000, 35000, 500),np.arange(-70000, 70000, 500))
|
||||
z=np.zeros_like(tmp_b)
|
||||
for i in range(tmp_w.shape[0]):
|
||||
for j in range(tmp_w.shape[1]):
|
||||
z[i][j] = compute_cost(x_train, y_train, tmp_w[i][j], tmp_b[i][j] )
|
||||
|
||||
ax = fig.add_subplot(gs[2:], projection='3d')
|
||||
ax.plot_surface(tmp_w, tmp_b, z, alpha=0.3, color=dlblue)
|
||||
ax.xaxis.set_major_locator(MaxNLocator(2))
|
||||
ax.yaxis.set_major_locator(MaxNLocator(2))
|
||||
|
||||
ax.set_xlabel('w', fontsize=16)
|
||||
ax.set_ylabel('b', fontsize=16)
|
||||
ax.set_zlabel('\ncost', fontsize=16)
|
||||
plt.title('Cost vs (b, w)')
|
||||
# Customize the view angle
|
||||
ax.view_init(elev=20., azim=-65)
|
||||
ax.plot(x, y, v,c=dlmagenta)
|
||||
|
||||
return
|
||||
|
||||
# draw derivative line
|
||||
# y = m*(x - x1) + y1
|
||||
def add_line(dj_dx, x1, y1, d, ax):
|
||||
x = np.linspace(x1-d, x1+d,50)
|
||||
y = dj_dx*(x - x1) + y1
|
||||
ax.scatter(x1, y1, color=dlblue, s=50)
|
||||
ax.plot(x, y, '--', c=dldarkred,zorder=10, linewidth = 1)
|
||||
xoff = 30 if x1 == 200 else 10
|
||||
ax.annotate(r"$\frac{\partial J}{\partial w}$ =%d" % dj_dx, fontsize=14,
|
||||
xy=(x1, y1), xycoords='data',
|
||||
xytext=(xoff, 10), textcoords='offset points',
|
||||
arrowprops=dict(arrowstyle="->"),
|
||||
horizontalalignment='left', verticalalignment='top')
|
||||
|
||||
def plt_gradients(x_train,y_train, f_compute_cost, f_compute_gradient):
|
||||
#===============
|
||||
# First subplot
|
||||
#===============
|
||||
fig,ax = plt.subplots(1,2,figsize=(12,4))
|
||||
|
||||
# Print w vs cost to see minimum
|
||||
fix_b = 100
|
||||
w_array = np.linspace(-100, 500, 50)
|
||||
w_array = np.linspace(0, 400, 50)
|
||||
cost = np.zeros_like(w_array)
|
||||
|
||||
for i in range(len(w_array)):
|
||||
tmp_w = w_array[i]
|
||||
cost[i] = f_compute_cost(x_train, y_train, tmp_w, fix_b)
|
||||
ax[0].plot(w_array, cost,linewidth=1)
|
||||
ax[0].set_title("Cost vs w, with gradient; b set to 100")
|
||||
ax[0].set_ylabel('Cost')
|
||||
ax[0].set_xlabel('w')
|
||||
|
||||
# plot lines for fixed b=100
|
||||
for tmp_w in [100,200,300]:
|
||||
fix_b = 100
|
||||
dj_dw,dj_db = f_compute_gradient(x_train, y_train, tmp_w, fix_b )
|
||||
j = f_compute_cost(x_train, y_train, tmp_w, fix_b)
|
||||
add_line(dj_dw, tmp_w, j, 30, ax[0])
|
||||
|
||||
#===============
|
||||
# Second Subplot
|
||||
#===============
|
||||
|
||||
tmp_b,tmp_w = np.meshgrid(np.linspace(-200, 200, 10), np.linspace(-100, 600, 10))
|
||||
U = np.zeros_like(tmp_w)
|
||||
V = np.zeros_like(tmp_b)
|
||||
for i in range(tmp_w.shape[0]):
|
||||
for j in range(tmp_w.shape[1]):
|
||||
U[i][j], V[i][j] = f_compute_gradient(x_train, y_train, tmp_w[i][j], tmp_b[i][j] )
|
||||
X = tmp_w
|
||||
Y = tmp_b
|
||||
n=-2
|
||||
color_array = np.sqrt(((V-n)/2)**2 + ((U-n)/2)**2)
|
||||
|
||||
ax[1].set_title('Gradient shown in quiver plot')
|
||||
Q = ax[1].quiver(X, Y, U, V, color_array, units='width', )
|
||||
qk = ax[1].quiverkey(Q, 0.9, 0.9, 2, r'$2 \frac{m}{s}$', labelpos='E',coordinates='figure')
|
||||
ax[1].set_xlabel("w"); ax[1].set_ylabel("b")
|
||||
|
||||
def norm_plot(ax, data):
|
||||
scale = (np.max(data) - np.min(data))*0.2
|
||||
x = np.linspace(np.min(data)-scale,np.max(data)+scale,50)
|
||||
_,bins, _ = ax.hist(data, x, color="xkcd:azure")
|
||||
#ax.set_ylabel("Count")
|
||||
|
||||
mu = np.mean(data);
|
||||
std = np.std(data);
|
||||
dist = norm.pdf(bins, loc=mu, scale = std)
|
||||
|
||||
axr = ax.twinx()
|
||||
axr.plot(bins,dist, color = "orangered", lw=2)
|
||||
axr.set_ylim(bottom=0)
|
||||
axr.axis('off')
|
||||
|
||||
def plot_cost_i_w(X,y,hist):
|
||||
ws = np.array([ p[0] for p in hist["params"]])
|
||||
rng = max(abs(ws[:,0].min()),abs(ws[:,0].max()))
|
||||
wr = np.linspace(-rng+0.27,rng+0.27,20)
|
||||
cst = [compute_cost(X,y,np.array([wr[i],-32, -67, -1.46]), 221) for i in range(len(wr))]
|
||||
|
||||
fig,ax = plt.subplots(1,2,figsize=(12,3))
|
||||
ax[0].plot(hist["iter"], (hist["cost"])); ax[0].set_title("Cost vs Iteration")
|
||||
ax[0].set_xlabel("iteration"); ax[0].set_ylabel("Cost")
|
||||
ax[1].plot(wr, cst); ax[1].set_title("Cost vs w[0]")
|
||||
ax[1].set_xlabel("w[0]"); ax[1].set_ylabel("Cost")
|
||||
ax[1].plot(ws[:,0],hist["cost"])
|
||||
plt.show()
|
||||
|
||||
|
||||
##########################################################
|
||||
# Regression Routines
|
||||
##########################################################
|
||||
|
||||
def compute_gradient_matrix(X, y, w, b):
|
||||
"""
|
||||
Computes the gradient for linear regression
|
||||
|
||||
Args:
|
||||
X : (array_like Shape (m,n)) variable such as house size
|
||||
y : (array_like Shape (m,1)) actual value
|
||||
w : (array_like Shape (n,1)) Values of parameters of the model
|
||||
b : (scalar ) Values of parameter of the model
|
||||
Returns
|
||||
dj_dw: (array_like Shape (n,1)) The gradient of the cost w.r.t. the parameters w.
|
||||
dj_db: (scalar) The gradient of the cost w.r.t. the parameter b.
|
||||
|
||||
"""
|
||||
m,n = X.shape
|
||||
f_wb = X @ w + b
|
||||
e = f_wb - y
|
||||
dj_dw = (1/m) * (X.T @ e)
|
||||
dj_db = (1/m) * np.sum(e)
|
||||
|
||||
return dj_db,dj_dw
|
||||
|
||||
#Function to calculate the cost
|
||||
def compute_cost_matrix(X, y, w, b, verbose=False):
|
||||
"""
|
||||
Computes the cost for linear regression
|
||||
Args:
|
||||
X : (array_like Shape (m,n)) variable such as house size
|
||||
y : (array_like Shape (m,)) actual value
|
||||
w : (array_like Shape (n,)) parameters of the model
|
||||
b : (scalar ) parameter of the model
|
||||
verbose : (Boolean) If true, print out intermediate value f_wb
|
||||
Returns
|
||||
cost: (scalar)
|
||||
"""
|
||||
m,n = X.shape
|
||||
|
||||
# calculate f_wb for all examples.
|
||||
f_wb = X @ w + b
|
||||
# calculate cost
|
||||
total_cost = (1/(2*m)) * np.sum((f_wb-y)**2)
|
||||
|
||||
if verbose: print("f_wb:")
|
||||
if verbose: print(f_wb)
|
||||
|
||||
return total_cost
|
||||
|
||||
# Loop version of multi-variable compute_cost
|
||||
def compute_cost(X, y, w, b):
|
||||
"""
|
||||
compute cost
|
||||
Args:
|
||||
X : (ndarray): Shape (m,n) matrix of examples with multiple features
y : (ndarray): Shape (m,)  target value of each example
|
||||
w : (ndarray): Shape (n) parameters for prediction
|
||||
b : (scalar): parameter for prediction
|
||||
Returns
|
||||
cost: (scalar) cost
|
||||
"""
|
||||
m = X.shape[0]
|
||||
cost = 0.0
|
||||
for i in range(m):
|
||||
f_wb_i = np.dot(X[i],w) + b
|
||||
cost = cost + (f_wb_i - y[i])**2
|
||||
cost = cost/(2*m)
|
||||
return(np.squeeze(cost))
|
||||
|
||||
def compute_gradient(X, y, w, b):
|
||||
"""
|
||||
Computes the gradient for linear regression
|
||||
Args:
|
||||
X : (ndarray Shape (m,n)) matrix of examples
|
||||
y : (ndarray Shape (m,)) target value of each example
|
||||
w : (ndarray Shape (n,)) parameters of the model
|
||||
b : (scalar) parameter of the model
|
||||
Returns
|
||||
dj_dw : (ndarray Shape (n,)) The gradient of the cost w.r.t. the parameters w.
|
||||
dj_db : (scalar) The gradient of the cost w.r.t. the parameter b.
|
||||
"""
|
||||
m,n = X.shape #(number of examples, number of features)
|
||||
dj_dw = np.zeros((n,))
|
||||
dj_db = 0.
|
||||
|
||||
for i in range(m):
|
||||
err = (np.dot(X[i], w) + b) - y[i]
|
||||
for j in range(n):
|
||||
dj_dw[j] = dj_dw[j] + err * X[i,j]
|
||||
dj_db = dj_db + err
|
||||
dj_dw = dj_dw/m
|
||||
dj_db = dj_db/m
|
||||
|
||||
return dj_db,dj_dw
|
||||
|
||||
#This version saves more values and is more verbose than the assignment versions
|
||||
def gradient_descent_houses(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters):
|
||||
"""
|
||||
Performs batch gradient descent to learn w and b. Updates w and b by taking
|
||||
num_iters gradient steps with learning rate alpha
|
||||
|
||||
Args:
|
||||
X : (array_like Shape (m,n) matrix of examples
|
||||
y : (array_like Shape (m,)) target value of each example
|
||||
w_in : (array_like Shape (n,)) Initial values of parameters of the model
|
||||
b_in : (scalar) Initial value of parameter of the model
|
||||
cost_function: function to compute cost
|
||||
gradient_function: function to compute the gradient
|
||||
alpha : (float) Learning rate
|
||||
num_iters : (int) number of iterations to run gradient descent
|
||||
Returns
|
||||
w : (array_like Shape (n,)) Updated values of parameters of the model after
|
||||
running gradient descent
|
||||
b : (scalar) Updated value of parameter of the model after
|
||||
running gradient descent
|
||||
"""
|
||||
|
||||
# number of training examples
|
||||
m = len(X)
|
||||
|
||||
# A dictionary to store values at each iteration, primarily for graphing later
|
||||
hist={}
|
||||
hist["cost"] = []; hist["params"] = []; hist["grads"]=[]; hist["iter"]=[];
|
||||
|
||||
w = copy.deepcopy(w_in) #avoid modifying global w within function
|
||||
b = b_in
|
||||
save_interval = np.ceil(num_iters/10000) # prevent resource exhaustion for long runs
|
||||
|
||||
print(f"Iteration Cost w0 w1 w2 w3 b djdw0 djdw1 djdw2 djdw3 djdb ")
|
||||
print(f"---------------------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|")
|
||||
|
||||
for i in range(num_iters):
|
||||
|
||||
# Calculate the gradient and update the parameters
|
||||
dj_db,dj_dw = gradient_function(X, y, w, b)
|
||||
|
||||
# Update Parameters using w, b, alpha and gradient
|
||||
w = w - alpha * dj_dw
|
||||
b = b - alpha * dj_db
|
||||
|
||||
# Save cost J,w,b at each save interval for graphing
|
||||
if i == 0 or i % save_interval == 0:
|
||||
hist["cost"].append(cost_function(X, y, w, b))
|
||||
hist["params"].append([w,b])
|
||||
hist["grads"].append([dj_dw,dj_db])
|
||||
hist["iter"].append(i)
|
||||
|
||||
# Print cost at 10 evenly spaced intervals, or every iteration if num_iters < 10
|
||||
if i% math.ceil(num_iters/10) == 0:
|
||||
#print(f"Iteration {i:4d}: Cost {cost_function(X, y, w, b):8.2f} ")
|
||||
cst = cost_function(X, y, w, b)
|
||||
print(f"{i:9d} {cst:0.5e} {w[0]: 0.1e} {w[1]: 0.1e} {w[2]: 0.1e} {w[3]: 0.1e} {b: 0.1e} {dj_dw[0]: 0.1e} {dj_dw[1]: 0.1e} {dj_dw[2]: 0.1e} {dj_dw[3]: 0.1e} {dj_db: 0.1e}")
|
||||
|
||||
return w, b, hist #return w,b and history for graphing
|
||||
|
||||
def run_gradient_descent(X,y,iterations=1000, alpha = 1e-6):
|
||||
|
||||
m,n = X.shape
|
||||
# initialize parameters
|
||||
initial_w = np.zeros(n)
|
||||
initial_b = 0
|
||||
# run gradient descent
|
||||
w_out, b_out, hist_out = gradient_descent_houses(X ,y, initial_w, initial_b,
|
||||
compute_cost, compute_gradient_matrix, alpha, iterations)
|
||||
print(f"w,b found by gradient descent: w: {w_out}, b: {b_out:0.2f}")
|
||||
|
||||
return(w_out, b_out, hist_out)
|
||||
|
||||
# compact extraction of hist data
|
||||
#x = hist["iter"]
|
||||
#J = np.array([ p for p in hist["cost"]])
|
||||
#ws = np.array([ p[0] for p in hist["params"]])
|
||||
#dj_ws = np.array([ p[0] for p in hist["grads"]])
|
||||
|
||||
#bs = np.array([ p[1] for p in hist["params"]])
|
||||
|
||||
def run_gradient_descent_feng(X,y,iterations=1000, alpha = 1e-6):
|
||||
m,n = X.shape
|
||||
# initialize parameters
|
||||
initial_w = np.zeros(n)
|
||||
initial_b = 0
|
||||
# run gradient descent
|
||||
w_out, b_out, hist_out = gradient_descent(X ,y, initial_w, initial_b,
|
||||
compute_cost, compute_gradient_matrix, alpha, iterations)
|
||||
print(f"w,b found by gradient descent: w: {w_out}, b: {b_out:0.4f}")
|
||||
|
||||
return(w_out, b_out)
|
||||
|
||||
def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters):
|
||||
"""
|
||||
Performs batch gradient descent to learn w and b. Updates w and b by taking
|
||||
num_iters gradient steps with learning rate alpha
|
||||
|
||||
Args:
|
||||
X : (array_like Shape (m,n) matrix of examples
|
||||
y : (array_like Shape (m,)) target value of each example
|
||||
w_in : (array_like Shape (n,)) Initial values of parameters of the model
|
||||
b_in : (scalar) Initial value of parameter of the model
|
||||
cost_function: function to compute cost
|
||||
gradient_function: function to compute the gradient
|
||||
alpha : (float) Learning rate
|
||||
num_iters : (int) number of iterations to run gradient descent
|
||||
Returns
|
||||
w : (array_like Shape (n,)) Updated values of parameters of the model after
|
||||
running gradient descent
|
||||
b : (scalar) Updated value of parameter of the model after
|
||||
running gradient descent
|
||||
"""
|
||||
|
||||
# number of training examples
|
||||
m = len(X)
|
||||
|
||||
# A dictionary to store values at each iteration, primarily for graphing later
|
||||
hist={}
|
||||
hist["cost"] = []; hist["params"] = []; hist["grads"]=[]; hist["iter"]=[];
|
||||
|
||||
w = copy.deepcopy(w_in) #avoid modifying global w within function
|
||||
b = b_in
|
||||
save_interval = np.ceil(num_iters/10000) # prevent resource exhaustion for long runs
|
||||
|
||||
for i in range(num_iters):
|
||||
|
||||
# Calculate the gradient and update the parameters
|
||||
dj_db,dj_dw = gradient_function(X, y, w, b)
|
||||
|
||||
# Update Parameters using w, b, alpha and gradient
|
||||
w = w - alpha * dj_dw
|
||||
b = b - alpha * dj_db
|
||||
|
||||
# Save cost J,w,b at each save interval for graphing
|
||||
if i == 0 or i % save_interval == 0:
|
||||
hist["cost"].append(cost_function(X, y, w, b))
|
||||
hist["params"].append([w,b])
|
||||
hist["grads"].append([dj_dw,dj_db])
|
||||
hist["iter"].append(i)
|
||||
|
||||
# Print cost at 10 evenly spaced intervals, or every iteration if num_iters < 10
|
||||
if i% math.ceil(num_iters/10) == 0:
|
||||
#print(f"Iteration {i:4d}: Cost {cost_function(X, y, w, b):8.2f} ")
|
||||
cst = cost_function(X, y, w, b)
|
||||
print(f"Iteration {i:9d}, Cost: {cst:0.5e}")
|
||||
return w, b, hist #return w,b and history for graphing
|
||||
|
||||
def load_house_data():
|
||||
data = np.loadtxt("./data/houses.txt", delimiter=',', skiprows=1)
|
||||
X = data[:,:4]
|
||||
y = data[:,4]
|
||||
return X, y
|
||||
|
||||
def zscore_normalize_features(X,rtn_ms=False):
|
||||
"""
|
||||
returns z-score normalized X by column
|
||||
Args:
|
||||
X : (numpy array (m,n))
|
||||
Returns
|
||||
X_norm: (numpy array (m,n)) input normalized by column
|
||||
"""
|
||||
mu = np.mean(X,axis=0)
|
||||
sigma = np.std(X,axis=0)
|
||||
X_norm = (X - mu)/sigma
|
||||
|
||||
if rtn_ms:
|
||||
return(X_norm, mu, sigma)
|
||||
else:
|
||||
return(X_norm)
|
||||
|
||||
|
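Putting the pieces of `lab_utils_multi.py` together, a hedged usage sketch (it assumes the script runs from the `work/` directory so `./data/houses.txt` and `./deeplearning.mplstyle` resolve; the example house and learning rate are illustrative choices, not values from this repository):

```python
import numpy as np
from lab_utils_multi import (load_house_data, zscore_normalize_features,
                             run_gradient_descent)

X_train, y_train = load_house_data()

# Normalize each feature column to zero mean and unit variance.
X_norm, mu, sigma = zscore_normalize_features(X_train, rtn_ms=True)
print("means:", mu, "stds:", sigma)

# With normalized features a much larger learning rate is usable.
w_norm, b_norm, hist = run_gradient_descent(X_norm, y_train,
                                            iterations=1000, alpha=1.0e-1)

# Predict the price of a hypothetical 1200 sqft, 3 bed, 1 floor, 40 year old house.
x_house = (np.array([1200, 3, 1, 40]) - mu) / sigma
print(f"predicted price: {np.dot(x_house, w_norm) + b_norm:0.0f} thousand dollars")
```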
@ -0,0 +1,329 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Ungraded Lab: Model Representation\n",
|
||||
"\n",
|
||||
"In this ungraded lab, you will implement the model $f_w$ for linear regression with one variable.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Problem Statement\n",
|
||||
"\n",
|
||||
"You will use the motivating example of housing price prediction. There are two data points - a house with 1000 square feet sold for \\\\$200,000 and a house with 2000 square feet sold for \\\\$400,000.\n",
|
||||
"\n",
|
||||
"Therefore, your dataset contains the following two points - \n",
|
||||
"\n",
|
||||
"| Size (feet$^2$) | Price (1000s of dollars) |\n",
|
||||
"| -------------------| ------------------------ |\n",
|
||||
"| 1000 | 200 |\n",
|
||||
"| 2000 | 400 |\n",
|
||||
"\n",
|
||||
"You would like to fit a linear regression model (represented with a straight line) through these two points, so you can then predict price for other houses - say, a house with 1200 feet$^2$.\n",
|
||||
"\n",
|
||||
"### Notation: `X` and `y`\n",
|
||||
"\n",
|
||||
"For the next few labs, you will use lists in python to represent your dataset. As shown in the video:\n",
|
||||
"- `X` represents input variables, also called input features (in this case - Size (feet$^2$)) and \n",
|
||||
"- `y` represents output variables, also known as target variables (in this case - Price (1000s of dollars)). \n",
|
||||
"\n",
|
||||
"Please run the following code cell to create your `X` and `y` variables."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# X is the input variable (size in square feet)\n",
|
||||
"# y in the output variable (price in 1000s of dollars)\n",
|
||||
"X = [1000, 2000] \n",
|
||||
"y = [200, 400]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Number of training examples `m`\n",
|
||||
"You will use `m` to denote the number of training examples. In Python, use the `len()` function to get the number of examples in a list. You can get `m` by running the next code cell."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# m is the number of training examples\n",
|
||||
"m = len(X)\n",
|
||||
"print(f\"Number of training examples is: {m}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Training example `x_i, y_i`\n",
|
||||
"\n",
|
||||
"You will use (x$^i$, y$^i$) to denote the $i^{th}$ training example. Since Python is zero indexed, (x$^0$, y$^0$) is (1000, 200) and (x$^1$, y$^1$) is (2000, 400). \n",
|
||||
"\n",
|
||||
"Run the next code block below to get the $i^{th}$ training example in a Python list."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"i = 0 # Change this to 1 to see (x^1, y^1)\n",
|
||||
"\n",
|
||||
"x_i = X[i]\n",
|
||||
"y_i = y[i]\n",
|
||||
"print(f\"(x^({i}), y^({i})) = ({x_i}, {y_i})\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Plotting the data\n",
|
||||
"First, let's run the cell below to import [matplotlib](http://matplotlib.org), which is a famous library to plot graphs in Python. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import matplotlib.pyplot as plt"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can plot these two points using the `scatter()` function in the `matplotlib` library, as shown in the cell below. \n",
|
||||
"- The function arguments `marker` and `c` show the points as red crosses (the default is blue dots).\n",
|
||||
"\n",
|
||||
"You can also use other functions in the `matplotlib` library to display the title and labels for the axes."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Plot the data points\n",
|
||||
"plt.scatter(X, y, marker='x', c='r')\n",
|
||||
"\n",
|
||||
"# Set the title\n",
|
||||
"plt.title(\"Housing Prices\")\n",
|
||||
"# Set the y-axis label\n",
|
||||
"plt.ylabel('Price (in 1000s of dollars)')\n",
|
||||
"# Set the x-axis label\n",
|
||||
"plt.xlabel('Size (feet^2)')\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Model function\n",
|
||||
"\n",
|
||||
"The model function for linear regression (which is a function that maps from `X` to `y`) is represented as \n",
|
||||
"\n",
|
||||
"$f(x) = w_0 + w_1x$\n",
|
||||
"\n",
|
||||
"The formula above is how you can represent straight lines - different values of $w_0$ and $w_1$ give you different straight lines on the plot. Let's try to get a better intuition for this through the code blocks below.\n",
|
||||
"\n",
|
||||
"Let's represent $w$ as a list in python, with $w_0$ as the first item in the list and $w_1$ as the second. \n",
|
||||
"\n",
|
||||
"Let's start with $w_0 = 3$ and $w_1 = 1$ \n",
|
||||
"\n",
|
||||
"### Note: You can come back to this cell to adjust the model's w0 and w1 parameters"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# You can come back here later to adjust w0 and w1\n",
|
||||
"w = [3, 1] \n",
|
||||
"print(\"w_0:\", w[0])\n",
|
||||
"print(\"w_1:\", w[1])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now, let's calculate the value of $f(x)$ for your two data points. You can explicitly write this out for each data point as - \n",
|
||||
"\n",
|
||||
"for $x^0$, `f = w[0]+w[1]*X[0]`\n",
|
||||
"\n",
|
||||
"for $x^1$, `f = w[0]+w[1]*X[1]`\n",
|
||||
"\n",
|
||||
"For a large number of data points, this can get unwieldy and repetitive. So instead, you can calculate the function output in a `for` loop as follows - \n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"f = []\n",
|
||||
"for i in range(len(X)):\n",
|
||||
" f_x = w[0] + w[1]*X[i]\n",
|
||||
" f.append(f_x)\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Paste the code shown above in the `calculate_model_output` function below.\n",
|
||||
"Please recall that in Python, indentation is significant. Incorrect indentation may result in a Python error message."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def calculate_model_output(w, X):\n",
|
||||
" ### START CODE HERE ### \n",
|
||||
"\n",
|
||||
" ### END CODE HERE ###\n",
|
||||
" return f"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now let's call the `calculate_model_output` function and plot the output using the `plot` method from `matplotlib` library."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"f = calculate_model_output(w, X)\n",
|
||||
"\n",
|
||||
"# Plot our model prediction\n",
|
||||
"plt.plot(X, f, c='b',label='Our Prediction')\n",
|
||||
"\n",
|
||||
"# Plot the data points\n",
|
||||
"plt.scatter(X, y, marker='x', c='r',label='Actual Values')\n",
|
||||
"\n",
|
||||
"# Set the title\n",
|
||||
"plt.title(\"Housing Prices\")\n",
|
||||
"# Set the y-axis label\n",
|
||||
"plt.ylabel('Price (in 1000s of dollars)')\n",
|
||||
"# Set the x-axis label\n",
|
||||
"plt.xlabel('Size (feet^2)')\n",
|
||||
"plt.legend()\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"As you can see, setting $w_0 = 3$ and $w_1 = 1$ does not result in a line that fits our data. \n",
|
||||
"\n",
|
||||
"### Challenge\n",
|
||||
"Try experimenting with different values of $w_0$ and $w_1$. What should the values be for getting a line that fits our data?\n",
|
||||
"\n",
|
||||
"#### Tip:\n",
|
||||
"You can use your mouse to click on the triangle to the left of the green \"Hints\" below to reveal some hints for choosing w0 and w1."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Hints</b></font>\n",
|
||||
"</summary>\n",
|
||||
" <p>\n",
|
||||
" <ul>\n",
|
||||
" <li>Try w0 = 1 and w1 = 0.5, w = [1, 0.5] </li>\n",
|
||||
" <li>Try w0 = 0 and w1 = 0.2, w = [0, 0.2] </li>\n",
|
||||
" </ul>\n",
|
||||
" </p>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Prediction\n",
|
||||
"Now that we have a model, we can use it to make our original prediction. Write the expression to predict the price of a house with 1200 feet^2. You can check your answer below.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"\n",
|
||||
"print(f\"{cost_1200sqft:.0f} thousand dollars\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Answer</b></font> \n",
|
||||
"</summary> \n",
|
||||
"\n",
|
||||
"```\n",
|
||||
" w = [0, 0.2] \n",
|
||||
" cost_1200sqft = w[0] + w[1]*1200\n",
|
||||
" ```\n",
|
||||
"\n",
|
||||
"240 thousand dollars"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
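For reference, the exercise cells in the notebook above can be completed as follows; this is a sketch that simply follows the notebook's own hint and answer cells (`w = [0, 0.2]` is the suggested fit), not additional course material:

```python
# Dataset and parameters from the notebook; w = [0, 0.2] comes from its hints.
X = [1000, 2000]
y = [200, 400]
w = [0, 0.2]

def calculate_model_output(w, X):
    # Loop form of f(x) = w0 + w1*x, as described in the markdown cell.
    f = []
    for i in range(len(X)):
        f_x = w[0] + w[1] * X[i]
        f.append(f_x)
    return f

print(calculate_model_output(w, X))             # [200.0, 400.0] -> matches y

# Prediction for a 1200 sqft house, as in the notebook's answer cell.
cost_1200sqft = w[0] + w[1] * 1200
print(f"{cost_1200sqft:.0f} thousand dollars")  # 240 thousand dollars
```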
@ -0,0 +1,689 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Python, NumPy and Vectorization\n",
|
||||
"A brief introduction to some of the scientific computing used in this course. In particular the NumPy scientific computing package and its use with python.\n",
|
||||
"\n",
|
||||
"# Outline\n",
|
||||
"- [ 1.1 Goals](#toc_40015_1.1)\n",
|
||||
"- [ 1.2 Useful References](#toc_40015_1.2)\n",
|
||||
"- [2 Python and NumPy <a name='Python and NumPy'></a>](#toc_40015_2)\n",
|
||||
"- [3 Vectors](#toc_40015_3)\n",
|
||||
"- [ 3.1 Abstract](#toc_40015_3.1)\n",
|
||||
"- [ 3.2 NumPy Arrays](#toc_40015_3.2)\n",
|
||||
"- [ 3.3 Vector Creation](#toc_40015_3.3)\n",
|
||||
"- [ 3.4 Operations on Vectors](#toc_40015_3.4)\n",
|
||||
"- [4 Matrices](#toc_40015_4)\n",
|
||||
"- [ 4.1 Abstract](#toc_40015_4.1)\n",
|
||||
"- [ 4.2 NumPy Arrays](#toc_40015_4.2)\n",
|
||||
"- [ 4.3 Matrix Creation](#toc_40015_4.3)\n",
|
||||
"- [ 4.4 Operations on Matrices](#toc_40015_4.4)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np # it is an unofficial standard to use np for numpy\n",
|
||||
"import sys\n",
|
||||
"import numpy.random as rand"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_1.1\"></a>\n",
|
||||
"## 1.1 Goals\n",
|
||||
"In this lab you will:\n",
|
||||
"- Review the features of NumPy and Python that are used in Course 1"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_1.2\"></a>\n",
|
||||
"## 1.2 Useful References\n",
|
||||
"- NumPy Documentation including a basic introduction: [NumPy.org](https://NumPy.org/doc/stable/)\n",
|
||||
"- A challenging feature topic: [NumPy Broadcasting](https://NumPy.org/doc/stable/user/basics.broadcasting.html)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_2\"></a>\n",
|
||||
"# 2 Python and NumPy <a name='Python and NumPy'></a>\n",
|
||||
"Python is the programming language we will be using in this course. It has built-in, a set of numeric data types and arithmetic operations. NumPy is a library that extends the base capabilities of python to add a richer data set including more numeric types, vectors, matrices, and many matrix functions. NumPy and python work together fairly seamlessly. Python arithmetic operators work on NumPy data types and many NumPy functions will accept python data types.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3\"></a>\n",
|
||||
"# 3 Vectors\n",
|
||||
"<a name=\"toc_40015_3.1\"></a>\n",
|
||||
"## 3.1 Abstract\n",
|
||||
"<img align=\"right\" src=\"./images/C1_W2_Lab04_Vectors.PNG\" style=\"width:340px;\" >Vectors, as you will use them in this course, are ordered arrays of numbers. In notation, vectors are denoted with lower case bold letters such as $\\mathbf{x}$. The elements of a vector are all the same type. A vector does not, for example, contain both characters and numbers. The number of elements in the array is often referred to as the *dimension* though mathematicians may prefer *rank*. The vector shown has a dimension of $n$. The elements of a vector can be referenced with an index. In math settings, indexes typically run from 1 to n. In computer science and these labs, indexing will typically run from 0 to n-1. In notation, elements of a vector, when referenced individually will indicate the index in a subscript, for example, the $0^{th}$ element, of the vector $\\mathbf{x}$ is $x_0$. Note, the x is not bold in this case. \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.2\"></a>\n",
|
||||
"## 3.2 NumPy Arrays\n",
|
||||
"\n",
|
||||
"NumPy's basic data structure is an indexable, n-dimensional *array* containing elements of the same type (`dtype`). Right away, you may notice we have overloaded the term 'dimension'. Above, it was the number of elements in the vector, here, dimension refers to the number of indexes of an array. A one-dimensional or 1-D array has one index. In Course 1, we will represent vectors as NumPy 1-D arrays. \n",
|
||||
"\n",
|
||||
" - 1-D array, shape (n,): n elements indexed [0] through [n-1]\n",
|
||||
" "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.3\"></a>\n",
|
||||
"## 3.3 Vector Creation\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Data creation routines in NumPy will generally have a first parameter which is the shape of the object. This can either be a single value for a 1-D result or a tuple (n,m,...) specifying the shape of the result. Below are examples of creating vectors using these routines."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill arrays with value\n",
|
||||
"a = np.zeros(4); print(f\"np.zeros(4) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.zeros((4,)); print(f\"np.zeros(4,) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.random.random_sample(4); print(f\"np.random.random_sample(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Some data creation routines do not take a shape tuple:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill arrays with value but do not accept shape as input argument\n",
|
||||
"a = np.arange(4.); print(f\"np.arange(4.): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.random.rand(4); print(f\"np.random.rand(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"values can be specified manually as well. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill with user specified values\n",
|
||||
"a = np.array([5,4,3,2]); print(f\"np.array([5,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.array([5.,4,3,2]); print(f\"np.array([5.,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"These have all created a one-dimensional vector `a` with four elements. `a.shape` returns the dimensions. Here we see a.shape = `(4,)` indicating a 1-d array with 4 elements. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4\"></a>\n",
|
||||
"## 3.4 Operations on Vectors\n",
|
||||
"Let's explore some operations using vectors.\n",
|
||||
"<a name=\"toc_40015_3.4.1\"></a>\n",
|
||||
"### 3.4.1 Indexing\n",
|
||||
"Elements of vectors can be accessed via indexing and slicing. NumPy provides a very complete set of indexing and slicing capabilities. We will explore only the basics needed for the course here. Reference [Slicing and Indexing](https://NumPy.org/doc/stable/reference/arrays.indexing.html) for more details. \n",
|
||||
"**Indexing** means referring to *an element* of an array by its position within the array. \n",
|
||||
"**Slicing** means getting a *subset* of elements from an array based on their indices. \n",
|
||||
"NumPy starts indexing at zero so the 3rd element of an vector $\\mathbf{a}$ is `a[2]`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector indexing operations on 1-D vectors\n",
|
||||
"a = np.arange(10)\n",
|
||||
"print(a)\n",
|
||||
"\n",
|
||||
"#access an element\n",
|
||||
"print(f\"a[2].shape: {a[2].shape} a[2] = {a[2]}, Accessing an element returns a scalar\")\n",
|
||||
"\n",
|
||||
"# access the last element, negative indexes count from the end\n",
|
||||
"print(f\"a[-1] = {a[-1]}\")\n",
|
||||
"\n",
|
||||
"#indexs must be within the range of the vector or they will produce and error\n",
|
||||
"try:\n",
|
||||
" c = a[10]\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"The error message you'll see is:\")\n",
|
||||
" print(e)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.2\"></a>\n",
|
||||
"### 3.4.2 Slicing\n",
|
||||
"Slicing creates an array of indices using a set of three values (`start:stop:step`). A subset of values is also valid. Its use is best explained by example:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector slicing operations\n",
|
||||
"a = np.arange(10)\n",
|
||||
"print(f\"a = {a}\")\n",
|
||||
"\n",
|
||||
"#access 5 consecutive elements (start:stop:step)\n",
|
||||
"c = a[2:7:1]; print(\"a[2:7:1] = \", c)\n",
|
||||
"\n",
|
||||
"# access 3 elements separated by two \n",
|
||||
"c = a[2:7:2]; print(\"a[2:7:2] = \", c)\n",
|
||||
"\n",
|
||||
"# access all elements index 3 and above\n",
|
||||
"c = a[3:]; print(\"a[3:] = \", c)\n",
|
||||
"\n",
|
||||
"# access all elements below index 3\n",
|
||||
"c = a[:3]; print(\"a[:3] = \", c)\n",
|
||||
"\n",
|
||||
"# access all elements\n",
|
||||
"c = a[:]; print(\"a[:] = \", c)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.3\"></a>\n",
|
||||
"### 3.4.3 Single vector operations\n",
|
||||
"There are a number of useful operations that involve operations on a single vector."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.array([1,2,3,4])\n",
|
||||
"print(f\"a : {a}\")\n",
|
||||
"# negate elements of a\n",
|
||||
"b = -a \n",
|
||||
"print(f\"b = -a : {b}\")\n",
|
||||
"\n",
|
||||
"# sum all elements of a, returns a scalar\n",
|
||||
"b = np.sum(a) \n",
|
||||
"print(f\"b = np.sum(a) : {b}\")\n",
|
||||
"\n",
|
||||
"b = np.mean(a)\n",
|
||||
"print(f\"b = np.mean(a): {b}\")\n",
|
||||
"\n",
|
||||
"b = a**2\n",
|
||||
"print(f\"b = a**2 : {b}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.4\"></a>\n",
|
||||
"### 3.4.4 Vector Vector element-wise operations\n",
|
||||
"Most of the NumPy arithmetic, logical and comparison operations apply to vectors as well. These operators work on an element-by-element basis. For example \n",
|
||||
"$$ \\mathbf{a} + \\mathbf{b} = \\sum_{i=0}^{n-1} a_i + b_i $$"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.array([ 1, 2, 3, 4])\n",
|
||||
"b = np.array([-1,-2, 3, 4])\n",
|
||||
"print(f\"Binary operators work element wise: {a + b}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Of course, for this to work correctly, the vectors must be of the same size:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#try a mismatched vector operation\n",
|
||||
"c = np.array([1, 2])\n",
|
||||
"try:\n",
|
||||
" d = a + c\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"The error message you'll see is:\")\n",
|
||||
" print(e)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.5\"></a>\n",
|
||||
"### 3.4.5 Scalar Vector operations\n",
|
||||
"Vectors can be 'scaled' by scalar values. A scalar value is just a number. The scalar multiplies all the elements of the vector."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.array([1, 2, 3, 4])\n",
|
||||
"\n",
|
||||
"# multiply a by a scalar\n",
|
||||
"b = 5 * a \n",
|
||||
"print(f\"b = 5 * a : {b}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.6\"></a>\n",
|
||||
"### 3.4.6 Vector Vector dot product\n",
|
||||
"The dot product is a mainstay of Linear Algebra and NumPy. This is an operation used extensively in this course and should be well understood. The dot product is shown below."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<img src=\"./images/C1_W2_Lab04_dot_notrans.gif\" width=800> "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The dot product multiplies the values in two vectors element-wise and then sums the result.\n",
|
||||
"Vector dot product requires the dimensions of the two vectors to be the same. "
|
||||
]
|
||||
},
|
||||
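A quick worked example, using the same vectors as the test cells below, may help before you implement it yourself (this sketch is not part of the original lab):

```
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([-1, 4, 3, 2])

# element-wise products are [-1, 8, 9, 8]; summing them gives the dot product
manual = (1 * -1) + (2 * 4) + (3 * 3) + (4 * 2)
print(manual)          # 24
print(np.sum(a * b))   # 24, the same value via element-wise multiply then sum
```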
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's implement our own version of the dot product below:\n",
|
||||
"\n",
|
||||
"**Using a for loop**, implement a function which returns the dot product of two vectors. The function to return given inputs $a$ and $b$:\n",
|
||||
"$$ x = \\sum_{i=0}^{n-1} a_i b_i $$\n",
|
||||
"Assume both `a` and `b` are the same shape."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def my_dot(a, b): \n",
|
||||
" \"\"\"\n",
|
||||
" Compute the dot product of two vectors\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" a (ndarray): Shape (n,) input vector \n",
|
||||
" b (ndarray): Shape (n,) input vector with same dimension as a\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" x (scalar): \n",
|
||||
" \"\"\"\n",
|
||||
" x=0\n",
|
||||
" a_shape = a.shape\n",
|
||||
" for i in range(a.shape[0]):\n",
|
||||
" x = x + a[i] * b[i]\n",
|
||||
" return (x)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# test 1-D\n",
|
||||
"a = np.array([1, 2, 3, 4])\n",
|
||||
"b = np.array([-1, 4, 3, 2])\n",
|
||||
"print(f\"my_dot(a, b) = {my_dot(a, b)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note, the dot product is expected to return a scalar value. \n",
|
||||
"\n",
|
||||
"Let's try the same operations using `np.dot`. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# test 1-D\n",
|
||||
"a = np.array([1, 2, 3, 4])\n",
|
||||
"b = np.array([-1, 4, 3, 2])\n",
|
||||
"c = np.dot(a, b)\n",
|
||||
"print(f\"NumPy 1-D np.dot(a, b) = {c}, np.dot(a, b).shape = {c.shape} \") \n",
|
||||
"c = np.dot(b, a)\n",
|
||||
"print(f\"NumPy 1-D np.dot(b, a) = {c}, np.dot(a, b).shape = {c.shape} \")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Above, you will note that the results for 1-D matched our implementation."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.7\"></a>\n",
|
||||
"### 3.4.7 The Need for Speed: vector vs for loop\n",
|
||||
"We utilized the NumPy library because it improves speed memory efficiency. Let's demonstrate:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import time\n",
|
||||
"np.random.seed(1)\n",
|
||||
"a = np.random.rand(1000000) # very large arrays\n",
|
||||
"b = np.random.rand(1000000)\n",
|
||||
"\n",
|
||||
"tic = time.time() # capture start time\n",
|
||||
"c = np.dot(a, b)\n",
|
||||
"toc = time.time() # capture end time\n",
|
||||
"\n",
|
||||
"print(f\"np.dot(a, b) = {c:.4f}\")\n",
|
||||
"print(f\"Vectorized version duration: {1000*(toc-tic):.4f} ms \")\n",
|
||||
"\n",
|
||||
"tic = time.time() # capture start time\n",
|
||||
"c = my_dot(a,b)\n",
|
||||
"toc = time.time() # capture end time\n",
|
||||
"\n",
|
||||
"print(f\"my_dot(a, b) = {c:.4f}\")\n",
|
||||
"print(f\"loop version duration: {1000*(toc-tic):.4f} ms \")\n",
|
||||
"\n",
|
||||
"del(a);del(b) #remove these big arrays from memory"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"So, vectorization provides more than 100x speed up in this example! This is because NumPy makes better use of available data parallelism in the underlying hardware. GPU's and modern CPU's implement Single Instruction, Multiple Data (SIMD) pipelines allowing multiple operations to be issued in parallel. This is critical in Machine Learning where the data sets are often very large."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4\"></a>\n",
|
||||
"# 4 Matrices\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.1\"></a>\n",
|
||||
"## 4.1 Abstract\n",
|
||||
"Matrices, are two dimensional arrays. The elements of a matrix are all of the same type. In notation, matrices are denoted with capitol, bold letter such as $\\mathbf{X}$. In this and other labs, `m` is often the number of rows and `n` the number of columns. The elements of a matrix can be referenced with a two dimensional index. In math settings, numbers in the index typically run from 1 to n. In computer science and these labs, indexing will run from 0 to n-1. \n",
|
||||
"<figure>\n",
|
||||
" <center> <img src=\"./images/C1_W2_Lab04_Matrices.PNG\" alt='missing' width=900><center/>\n",
|
||||
" <figcaption> Generic Matrix Notation, 1st index is row, 2nd is column </figcaption>\n",
|
||||
"<figure/>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.2\"></a>\n",
|
||||
"## 4.2 NumPy Arrays\n",
|
||||
"\n",
|
||||
"NumPy's basic data structure is an indexable, n-dimensional *array* containing elements of the same type (`dtype`). These were described earlier. Matrices have a two-dimensional (2-D) index [m,n].\n",
|
||||
"\n",
|
||||
"In Course 1, 2-D matrices are used to hold training data. Training data is $m$ examples by $n$ features creating an (m,n) array. Course 1 does not do operations directly on matrices but typically extracts an example as a vector and operates on that. Below you will review: \n",
|
||||
"- data creation\n",
|
||||
"- slicing and indexing"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.3\"></a>\n",
|
||||
"## 4.3 Matrix Creation\n",
|
||||
"The same functions that created 1-D vectors will create 2-D or n-D arrays. Here are some examples\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Below, the shape tuple is provided to achieve a 2-D result. Notice how NumPy uses brackets to denote each dimension. Notice further than NumPy, when printing, will print one row per line.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.zeros((1, 5)) \n",
|
||||
"print(f\"a shape = {a.shape}, a = {a}\") \n",
|
||||
"\n",
|
||||
"a = np.zeros((2, 1)) \n",
|
||||
"print(f\"a shape = {a.shape}, a = {a}\") \n",
|
||||
"\n",
|
||||
"a = np.random.random_sample((1, 1)) \n",
|
||||
"print(f\"a shape = {a.shape}, a = {a}\") "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"One can also manually specify data. Dimensions are specified with additional brackets matching the format in the printing above."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill with user specified values\n",
|
||||
"a = np.array([[5], [4], [3]]); print(f\" a shape = {a.shape}, np.array: a = {a}\")\n",
|
||||
"a = np.array([[5], # One can also\n",
|
||||
" [4], # separate values\n",
|
||||
" [3]]); #into separate rows\n",
|
||||
"print(f\" a shape = {a.shape}, np.array: a = {a}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.4\"></a>\n",
|
||||
"## 4.4 Operations on Matrices\n",
|
||||
"Let's explore some operations using matrices."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.4.1\"></a>\n",
|
||||
"### 4.4.1 Indexing\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Matrices include a second index. The two indexes describe [row, column]. Access can either return an element or a row/column. See below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector indexing operations on matrices\n",
|
||||
"a = np.arange(6).reshape(-1, 2) #reshape is a convenient way to create matrices\n",
|
||||
"print(f\"a.shape: {a.shape}, \\na= {a}\")\n",
|
||||
"\n",
|
||||
"#access an element\n",
|
||||
"print(f\"\\na[2,0].shape: {a[2, 0].shape}, a[2,0] = {a[2, 0]}, type(a[2,0]) = {type(a[2, 0])} Accessing an element returns a scalar\\n\")\n",
|
||||
"\n",
|
||||
"#access a row\n",
|
||||
"print(f\"a[2].shape: {a[2].shape}, a[2] = {a[2]}, type(a[2]) = {type(a[2])}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"It is worth drawing attention to the last example. Accessing a matrix by just specifying the row will return a *1-D vector*."
|
||||
]
|
||||
},
|
||||
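If you want to keep the 2-D shape when pulling out a single row, slice with a range instead of indexing. A minimal sketch, assuming the reshaped array `a` from the cell above:

```
# a = np.arange(6).reshape(-1, 2) from the previous cell, shape (3, 2)
print(a[2].shape)     # (2,)   indexing a row returns a 1-D vector
print(a[2:3].shape)   # (1, 2) slicing a single row keeps the 2-D shape
```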
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.4.2\"></a>\n",
|
||||
"### 4.4.2 Slicing\n",
|
||||
"Slicing creates an array of indices using a set of three values (`start:stop:step`). A subset of values is also valid. Its use is best explained by example:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector 2-D slicing operations\n",
|
||||
"a = np.arange(20).reshape(-1, 10)\n",
|
||||
"print(f\"a = \\n{a}\")\n",
|
||||
"\n",
|
||||
"#access 5 consecutive elements (start:stop:step)\n",
|
||||
"print(\"a[0, 2:7:1] = \", a[0, 2:7:1], \", a[0, 2:7:1].shape =\", a[0, 2:7:1].shape, \"a 1-D array\")\n",
|
||||
"\n",
|
||||
"#access 5 consecutive elements (start:stop:step) in two rows\n",
|
||||
"print(\"a[:, 2:7:1] = \\n\", a[:, 2:7:1], \", a[:, 2:7:1].shape =\", a[:, 2:7:1].shape, \"a 2-D array\")\n",
|
||||
"\n",
|
||||
"# access all elements\n",
|
||||
"print(\"a[:,:] = \\n\", a[:,:], \", a[:,:].shape =\", a[:,:].shape)\n",
|
||||
"\n",
|
||||
"# access all elements in one row (very common usage)\n",
|
||||
"print(\"a[1,:] = \", a[1,:], \", a[1,:].shape =\", a[1,:].shape, \"a 1-D array\")\n",
|
||||
"# same as\n",
|
||||
"print(\"a[1] = \", a[1], \", a[1].shape =\", a[1].shape, \"a 1-D array\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_5.0\"></a>\n",
|
||||
"## Congratulations!\n",
|
||||
"In this lab you mastered the features of Python and NumPy that are needed for Course 1."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"dl_toc_settings": {
|
||||
"rndtag": "40015"
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.6"
|
||||
},
|
||||
"toc-autonumbering": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
@ -0,0 +1,310 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Ungraded Lab: Cost Function \n",
|
||||
"\n",
|
||||
"In this ungraded lab, you will implement the `cost` function for linear regression with one variable. The term 'cost' in this assignment might be a little confusing since the data is housing cost. Here, cost is a measure how well our model is predicting the actual value of the house. We will use the term 'price' for the data.\n",
|
||||
"\n",
|
||||
"First, let's run the cell below to import [matplotlib](http://matplotlib.org), which is a famous library to plot graphs in Python. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import matplotlib.pyplot as plt"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Problem Statement\n",
|
||||
"\n",
|
||||
"Let's use the same two data points as before - a house with 1000 square feet sold for \\\\$200,000 and a house with 2000 square feet sold for \\\\$400,000.\n",
|
||||
"\n",
|
||||
"That is our dataset contains has the following two points - \n",
|
||||
"\n",
|
||||
"| Size (feet$^2$) | Price (1000s of dollars) |\n",
|
||||
"| -------------------| ------------------------ |\n",
|
||||
"| 1000 | 200 |\n",
|
||||
"| 2000 | 400 |\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# X_train is the input features, in this case (size in square feet)\n",
|
||||
"# y_train is the actual value (price in 1000s of dollars)\n",
|
||||
"X_train = [1000, 2000] \n",
|
||||
"y_train = [200, 400]"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# routine to plot the data points\n",
|
||||
"def plt_house(X, y,f_w=None):\n",
|
||||
" plt.scatter(X, y, marker='x', c='r', label=\"Actual Value\")\n",
|
||||
"\n",
|
||||
" # Set the title\n",
|
||||
" plt.title(\"Housing Prices\")\n",
|
||||
" # Set the y-axis label\n",
|
||||
" plt.ylabel('Price (in 1000s of dollars)')\n",
|
||||
" # Set the x-axis label\n",
|
||||
" plt.xlabel('Size (feet^2)')\n",
|
||||
" # print predictions\n",
|
||||
" if f_w != None:\n",
|
||||
" plt.plot(X, f_w, c='b', label=\"Our Prediction\")\n",
|
||||
" plt.legend()\n",
|
||||
" plt.show()\n",
|
||||
" \n",
|
||||
"plt_house(X_train,y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Computing Cost\n",
|
||||
"\n",
|
||||
"The cost is:\n",
|
||||
" $$J(\\mathbf{w}) = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w}}(\\mathbf{x}^{(i)}) - y^{(i)})^2$$ \n",
|
||||
" \n",
|
||||
"where \n",
|
||||
" $$f_{\\mathbf{w}}(\\mathbf{x}^{(i)}) = w_0x_0^{(i)} + w_1x_1^{(i)} \\tag{1}$$\n",
|
||||
" \n",
|
||||
"- $f_{\\mathbf{w}}(\\mathbf{x}^{(i)})$ is our prediction for example $i$ using our parameters $\\mathbf{w}$. \n",
|
||||
"- $(f_{\\mathbf{w}}(\\mathbf{x}^{(i)}) -y^{(i)})^2$ is the squared difference between the actual value and our prediction. \n",
|
||||
"- These differences are summed over all the $m$ examples and averaged to produce the cost, $J(\\mathbf{w})$. \n",
|
||||
"Note, in lecture summation ranges are typically from 1 to m while in code, we will run 0 to m-1."
|
||||
]
|
||||
},
|
||||
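As a worked check of the formula before you code it, take $\mathbf{w} = [1, 2]$ (the values used in the test cell further below):

$$
\begin{aligned}
f_{\mathbf{w}}(x^{(0)}) &= 1 + 2 \cdot 1000 = 2001, & (2001 - 200)^2 &= 3243601 \\
f_{\mathbf{w}}(x^{(1)}) &= 1 + 2 \cdot 2000 = 4001, & (4001 - 400)^2 &= 12967201 \\
J(\mathbf{w}) &= \frac{3243601 + 12967201}{2 \cdot 2} = 4052700.5
\end{aligned}
$$

This matches the expected output of the test cell below.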
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Hints</b></font>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"#Function to calculate the cost\n",
|
||||
"def compute_cost(X, y, w):\n",
|
||||
" \n",
|
||||
" m = len(X)\n",
|
||||
" cost = 0\n",
|
||||
" \n",
|
||||
" for i in range(m):\n",
|
||||
" \n",
|
||||
" # Calculate the model prediction\n",
|
||||
" f_w = w[0] + w[1]*X[i]\n",
|
||||
" \n",
|
||||
" # Calculate the cost\n",
|
||||
" cost = cost + (f_w - y[i])**2\n",
|
||||
" \n",
|
||||
" total_cost = 1/(2*m) * cost\n",
|
||||
"\n",
|
||||
" return total_cost\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Function to calculate the cost\n",
|
||||
"def compute_cost(X, y, w):\n",
|
||||
" \n",
|
||||
" m = len(X)\n",
|
||||
" cost = 0\n",
|
||||
" \n",
|
||||
" for i in range(m):\n",
|
||||
" ### START CODE HERE ### \n",
|
||||
"\n",
|
||||
" ### END CODE HERE ### \n",
|
||||
" total_cost = 1/(2*m) * cost\n",
|
||||
"\n",
|
||||
" return total_cost"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"w_p = [1, 2] # w0 = w[0], w1 = w[1] \n",
|
||||
"\n",
|
||||
"total_cost = compute_cost(X_train, y_train, w_p)\n",
|
||||
"print(\"Total cost :\", total_cost)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Output**:\n",
|
||||
"```Total cost : 4052700.5```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In the next lab, we will minimise the cost by optimizing our parameters $\\mathbf{w}$ using gradient descent. For now, we can try various values manually. To to keep it simple, we know from the previous lab that $w_0 = 0$ produces a minimum. So, we'll set $w_0$ to zero and vary $w_1$."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Print w1 vs cost to see minimum\n",
|
||||
"\n",
|
||||
"w1_list = [-0.6, -0.4, -0.2, 0, 0.2, 0.4, 0.6]\n",
|
||||
"cost_list = []\n",
|
||||
"\n",
|
||||
"for w1 in w1_list:\n",
|
||||
" w_p = [0, w1]\n",
|
||||
" total_cost = compute_cost(X_train, y_train, w_p)\n",
|
||||
" cost_list.append(total_cost)\n",
|
||||
" \n",
|
||||
"plt.plot(w1_list, cost_list)\n",
|
||||
"plt.title(\"Cost vs w1\")\n",
|
||||
"plt.ylabel('Cost')\n",
|
||||
"plt.xlabel('w1')\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# We can see a global minimum at w1 = 0.2 Therefore, let's try w = [0,0.2] \n",
|
||||
"# to see if that fits the data\n",
|
||||
"w_p = [0, 0.2] # w0 = 0, w1 = 0.2\n",
|
||||
"\n",
|
||||
"total_cost = compute_cost(X_train, y_train,w_p)\n",
|
||||
"print(\"Total cost :\", total_cost)\n",
|
||||
"f_w = [w_p[0] + w_p[1]*X_train[0], w_p[0] + w_p[1]*X_train[1]]\n",
|
||||
"plt_house(X_train, y_train, f_w)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"We can see how our cost varies as we modify both $w_0$ and $w_1$ by plotting in 3D or in contour plots."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from mpl_toolkits.mplot3d import axes3d\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"w0 = np.arange(-500, 500, 5)\n",
|
||||
"w1 = np.arange(-0.2, 0.8, 0.005)\n",
|
||||
"w0,w1 = np.meshgrid(w0,w1)\n",
|
||||
"z=np.zeros_like(w0)\n",
|
||||
"n,_ = w0.shape\n",
|
||||
"for i in range(n):\n",
|
||||
" for j in range(n):\n",
|
||||
" z[i][j] = compute_cost(X_train, y_train, [w0[i][j],w1[i][j]] )\n",
|
||||
"\n",
|
||||
"fig = plt.figure(figsize=(12,6))\n",
|
||||
"plt.subplots_adjust( wspace=0.5 )\n",
|
||||
"#===============\n",
|
||||
"# First subplot\n",
|
||||
"#===============\n",
|
||||
"# set up the axes for the first plot\n",
|
||||
"ax = fig.add_subplot(1, 2, 1, projection='3d')\n",
|
||||
"ax.plot_surface(w1, w0, z, rstride=8, cstride=8, alpha=0.3)\n",
|
||||
"\n",
|
||||
"ax.set_xlabel('w_1')\n",
|
||||
"ax.set_ylabel('w_0')\n",
|
||||
"ax.set_zlabel('cost')\n",
|
||||
"plt.title('3D plot of cost vs w0, w1')\n",
|
||||
"# Customize the view angle \n",
|
||||
"ax.view_init(elev=20., azim=-65)\n",
|
||||
"\n",
|
||||
"#===============\n",
|
||||
"# Second subplot\n",
|
||||
"#===============\n",
|
||||
"# set up the axes for the second plot\n",
|
||||
"ax = fig.add_subplot(1, 2, 2)\n",
|
||||
"CS = ax.contour(w1, w0, z,[0,50,1000,5000,10000,25000,50000])\n",
|
||||
"plt.clabel(CS, inline=1, fmt='%1.0f', fontsize=10)\n",
|
||||
"plt.title('Contour plot of cost vs (w0,w1)')\n",
|
||||
"\n",
|
||||
"ax.set_xlabel('w_1')\n",
|
||||
"ax.set_ylabel('w_0')\n",
|
||||
"\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3'><b>Expected graph</b></font>\n",
|
||||
"</summary>\n",
|
||||
" <img src=\"./figures/ThreeD_And_ContourLab3.PNG\" alt=\"Contour Plot\">\n",
|
||||
"<\\details>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,648 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Multiple Variable Linear Regression\n",
|
||||
"\n",
|
||||
"In this lab, you will extend the data structures and previously developed routines to support multiple features. Several routines are updated making the lab appear lengthy, but it makes minor adjustments to previous routines making it quick to review.\n",
|
||||
"# Outline\n",
|
||||
"- [ 1.1 Goals](#toc_15456_1.1)\n",
|
||||
"- [ 1.2 Tools](#toc_15456_1.2)\n",
|
||||
"- [ 1.3 Notation](#toc_15456_1.3)\n",
|
||||
"- [2 Problem Statement](#toc_15456_2)\n",
|
||||
"- [ 2.1 Matrix X containing our examples](#toc_15456_2.1)\n",
|
||||
"- [ 2.2 Parameter vector w, b](#toc_15456_2.2)\n",
|
||||
"- [3 Model Prediction With Multiple Variables](#toc_15456_3)\n",
|
||||
"- [ 3.1 Single Prediction element by element](#toc_15456_3.1)\n",
|
||||
"- [ 3.2 Single Prediction, vector](#toc_15456_3.2)\n",
|
||||
"- [4 Compute Cost With Multiple Variables](#toc_15456_4)\n",
|
||||
"- [5 Gradient Descent With Multiple Variables](#toc_15456_5)\n",
|
||||
"- [ 5.1 Compute Gradient with Multiple Variables](#toc_15456_5.1)\n",
|
||||
"- [ 5.2 Gradient Descent With Multiple Variables](#toc_15456_5.2)\n",
|
||||
"- [6 Congratulations](#toc_15456_6)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_1.1\"></a>\n",
|
||||
"## 1.1 Goals\n",
|
||||
"- Extend our regression model routines to support multiple features\n",
|
||||
" - Extend data structures to support multiple features\n",
|
||||
" - Rewrite prediction, cost and gradient routines to support multiple features\n",
|
||||
" - Utilize NumPy `np.dot` to vectorize their implementations for speed and simplicity"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_1.2\"></a>\n",
|
||||
"## 1.2 Tools\n",
|
||||
"In this lab, we will make use of: \n",
|
||||
"- NumPy, a popular library for scientific computing\n",
|
||||
"- Matplotlib, a popular library for plotting data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import copy, math\n",
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"plt.style.use('./deeplearning.mplstyle')\n",
|
||||
"np.set_printoptions(precision=2) # reduced display precision on numpy arrays"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_1.3\"></a>\n",
|
||||
"## 1.3 Notation\n",
|
||||
"Here is a summary of some of the notation you will encounter, updated for multiple features. \n",
|
||||
"\n",
|
||||
"|General <img width=70/> <br /> Notation <img width=70/> | Description<img width=350/>| Python (if applicable) |\n",
|
||||
"|: ------------|: ------------------------------------------------------------||\n",
|
||||
"| $a$ | scalar, non bold ||\n",
|
||||
"| $\\mathbf{a}$ | vector, bold ||\n",
|
||||
"| $\\mathbf{A}$ | matrix, bold capital ||\n",
|
||||
"| **Regression** | | | |\n",
|
||||
"| $\\mathbf{X}$ | training example matrix | `X_train` | \n",
|
||||
"| $\\mathbf{y}$ | training example targets | `y_train` \n",
|
||||
"| $\\mathbf{x}^{(i)}$, $y^{(i)}$ | $i_{th}$Training Example | `X[i]`, `y[i]`|\n",
|
||||
"| m | number of training examples | `m`|\n",
|
||||
"| n | number of features in each example | `n`|\n",
|
||||
"| $\\mathbf{w}$ | parameter: weight, | `w` |\n",
|
||||
"| $b$ | parameter: bias | `b` | \n",
|
||||
"| $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ | The result of the model evaluation at $\\mathbf{x^{(i)}}$ parameterized by $\\mathbf{w},b$: $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = \\mathbf{w} \\cdot \\mathbf{x}^{(i)}+b$ | `f_wb` | \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_2\"></a>\n",
|
||||
"# 2 Problem Statement\n",
|
||||
"\n",
|
||||
"You will use the motivating example of housing price prediction. The training dataset contains three examples with four features (size, bedrooms, floors and, age) shown in the table below. Note that, unlike the earlier labs, size is in sqft rather than 1000 sqft. This causes an issue, which you will solve in the next lab!\n",
|
||||
"\n",
|
||||
"| Size (sqft) | Number of Bedrooms | Number of floors | Age of Home | Price (1000s dollars) | \n",
|
||||
"| ----------------| ------------------- |----------------- |--------------|-------------- | \n",
|
||||
"| 2104 | 5 | 1 | 45 | 460 | \n",
|
||||
"| 1416 | 3 | 2 | 40 | 232 | \n",
|
||||
"| 852 | 2 | 1 | 35 | 178 | \n",
|
||||
"\n",
|
||||
"You will build a linear regression model using these values so you can then predict the price for other houses. For example, a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old. \n",
|
||||
"\n",
|
||||
"Please run the following code cell to create your `X_train` and `y_train` variables."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])\n",
|
||||
"y_train = np.array([460, 232, 178])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_2.1\"></a>\n",
|
||||
"## 2.1 Matrix X containing our examples\n",
|
||||
"Similar to the table above, examples are stored in a NumPy matrix `X_train`. Each row of the matrix represents one example. When you have $m$ training examples ( $m$ is three in our example), and there are $n$ features (four in our example), $\\mathbf{X}$ is a matrix with dimensions ($m$, $n$) (m rows, n columns).\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"$$\\mathbf{X} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
" x^{(0)}_0 & x^{(0)}_1 & \\cdots & x^{(0)}_{n-1} \\\\ \n",
|
||||
" x^{(1)}_0 & x^{(1)}_1 & \\cdots & x^{(1)}_{n-1} \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" x^{(m-1)}_0 & x^{(m-1)}_1 & \\cdots & x^{(m-1)}_{n-1} \n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"notation:\n",
|
||||
"- $\\mathbf{x}^{(i)}$ is vector containing example i. $\\mathbf{x}^{(i)}$ $ = (x^{(i)}_0, x^{(i)}_1, \\cdots,x^{(i)}_{n-1})$\n",
|
||||
"- $x^{(i)}_j$ is element j in example i. The superscript in parenthesis indicates the example number while the subscript represents an element. \n",
|
||||
"\n",
|
||||
"Display the input data."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# data is stored in numpy array/matrix\n",
|
||||
"print(f\"X Shape: {X_train.shape}, X Type:{type(X_train)})\")\n",
|
||||
"print(X_train)\n",
|
||||
"print(f\"y Shape: {y_train.shape}, y Type:{type(y_train)})\")\n",
|
||||
"print(y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_2.2\"></a>\n",
|
||||
"## 2.2 Parameter vector w, b\n",
|
||||
"\n",
|
||||
"* $\\mathbf{w}$ is a vector with $n$ elements.\n",
|
||||
" - Each element contains the parameter associated with one feature.\n",
|
||||
" - in our dataset, n is 4.\n",
|
||||
" - notionally, we draw this as a column vector\n",
|
||||
"\n",
|
||||
"$$\\mathbf{w} = \\begin{pmatrix}\n",
|
||||
"w_0 \\\\ \n",
|
||||
"w_1 \\\\\n",
|
||||
"\\cdots\\\\\n",
|
||||
"w_{n-1}\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"* $b$ is a scalar parameter. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For demonstration, $\\mathbf{w}$ and $b$ will be loaded with some initial selected values that are near the optimal. $\\mathbf{w}$ is a 1-D NumPy vector."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"b_init = 785.1811367994083\n",
|
||||
"w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])\n",
|
||||
"print(f\"w_init shape: {w_init.shape}, b_init type: {type(b_init)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_3\"></a>\n",
|
||||
"# 3 Model Prediction With Multiple Variables\n",
|
||||
"The model's prediction with multiple variables is given by the linear model:\n",
|
||||
"\n",
|
||||
"$$ f_{\\mathbf{w},b}(\\mathbf{x}) = w_0x_0 + w_1x_1 +... + w_{n-1}x_{n-1} + b \\tag{1}$$\n",
|
||||
"or in vector notation:\n",
|
||||
"$$ f_{\\mathbf{w},b}(\\mathbf{x}) = \\mathbf{w} \\cdot \\mathbf{x} + b \\tag{2} $$ \n",
|
||||
"where $\\cdot$ is a vector `dot product`\n",
|
||||
"\n",
|
||||
"To demonstrate the dot product, we will implement prediction using (1) and (2)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_3.1\"></a>\n",
|
||||
"## 3.1 Single Prediction element by element\n",
|
||||
"Our previous prediction multiplied one feature value by one parameter and added a bias parameter. A direct extension of our previous implementation of prediction to multiple features would be to implement (1) above using loop over each element, performing the multiply with its parameter and then adding the bias parameter at the end.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def predict_single_loop(x, w, b): \n",
|
||||
" \"\"\"\n",
|
||||
" single predict using linear regression\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" x (ndarray): Shape (n,) example with multiple features\n",
|
||||
" w (ndarray): Shape (n,) model parameters \n",
|
||||
" b (scalar): model parameter \n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" p (scalar): prediction\n",
|
||||
" \"\"\"\n",
|
||||
" n = x.shape[0]\n",
|
||||
" p = 0\n",
|
||||
" for i in range(n):\n",
|
||||
" p_i = x[i] * w[i] \n",
|
||||
" p = p + p_i \n",
|
||||
" p = p + b \n",
|
||||
" return p"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# get a row from our training data\n",
|
||||
"x_vec = X_train[0,:]\n",
|
||||
"print(f\"x_vec shape {x_vec.shape}, x_vec value: {x_vec}\")\n",
|
||||
"\n",
|
||||
"# make a prediction\n",
|
||||
"f_wb = predict_single_loop(x_vec, w_init, b_init)\n",
|
||||
"print(f\"f_wb shape {f_wb.shape}, prediction: {f_wb}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note the shape of `x_vec`. It is a 1-D NumPy vector with 4 elements, (4,). The result, `f_wb` is a scalar."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_3.2\"></a>\n",
|
||||
"## 3.2 Single Prediction, vector\n",
|
||||
"\n",
|
||||
"Noting that equation (1) above can be implemented using the dot product as in (2) above. We can make use of vector operations to speed up predictions.\n",
|
||||
"\n",
|
||||
"Recall from the Python/Numpy lab that NumPy `np.dot()`[[link](https://numpy.org/doc/stable/reference/generated/numpy.dot.html)] can be used to perform a vector dot product. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def predict(x, w, b): \n",
|
||||
" \"\"\"\n",
|
||||
" single predict using linear regression\n",
|
||||
" Args:\n",
|
||||
" x (ndarray): Shape (n,) example with multiple features\n",
|
||||
" w (ndarray): Shape (n,) model parameters \n",
|
||||
" b (scalar): model parameter \n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" p (scalar): prediction\n",
|
||||
" \"\"\"\n",
|
||||
" p = np.dot(x, w) + b \n",
|
||||
" return p "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# get a row from our training data\n",
|
||||
"x_vec = X_train[0,:]\n",
|
||||
"print(f\"x_vec shape {x_vec.shape}, x_vec value: {x_vec}\")\n",
|
||||
"\n",
|
||||
"# make a prediction\n",
|
||||
"f_wb = predict(x_vec,w_init, b_init)\n",
|
||||
"print(f\"f_wb shape {f_wb.shape}, prediction: {f_wb}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The results and shapes are the same as the previous version which used looping. Going forward, `np.dot` will be used for these operations. The prediction is now a single statement. Most routines will implement it directly rather than calling a separate predict routine."
|
||||
]
|
||||
},
|
||||
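As an aside (a sketch, not part of the original lab routines), the same dot product extends to predicting all $m$ examples at once: a matrix-vector product over `X_train` returns an (m,) vector of predictions.

```
# assumes X_train, w_init, b_init from the cells above
f_wb_all = np.dot(X_train, w_init) + b_init   # equivalently X_train @ w_init + b_init
print(f"all predictions, shape {f_wb_all.shape}: {f_wb_all}")
```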
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_4\"></a>\n",
|
||||
"# 4 Compute Cost With Multiple Variables\n",
|
||||
"The equation for the cost function with multiple variables $J(\\mathbf{w},b)$ is:\n",
|
||||
"$$J(\\mathbf{w},b) = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})^2 \\tag{3}$$ \n",
|
||||
"where:\n",
|
||||
"$$ f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = \\mathbf{w} \\cdot \\mathbf{x}^{(i)} + b \\tag{4} $$ \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"In contrast to previous labs, $\\mathbf{w}$ and $\\mathbf{x}^{(i)}$ are vectors rather than scalars supporting multiple features."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Below is an implementation of equations (3) and (4). Note that this uses a *standard pattern for this course* where a for loop over all `m` examples is used."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def compute_cost(X, y, w, b): \n",
|
||||
" \"\"\"\n",
|
||||
" compute cost\n",
|
||||
" Args:\n",
|
||||
" X (ndarray (m,n)): Data, m examples with n features\n",
|
||||
" y (ndarray (m,)) : target values\n",
|
||||
" w (ndarray (n,)) : model parameters \n",
|
||||
" b (scalar) : model parameter\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" cost (scalar): cost\n",
|
||||
" \"\"\"\n",
|
||||
" m = X.shape[0]\n",
|
||||
" cost = 0.0\n",
|
||||
" for i in range(m): \n",
|
||||
" f_wb_i = np.dot(X[i], w) + b #(n,)(n,) = scalar (see np.dot)\n",
|
||||
" cost = cost + (f_wb_i - y[i])**2 #scalar\n",
|
||||
" cost = cost / (2 * m) #scalar \n",
|
||||
" return cost"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Compute and display cost using our pre-chosen optimal parameters. \n",
|
||||
"cost = compute_cost(X_train, y_train, w_init, b_init)\n",
|
||||
"print(f'Cost at optimal w : {cost}')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Result**: Cost at optimal w : 1.5578904045996674e-12"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_5\"></a>\n",
|
||||
"# 5 Gradient Descent With Multiple Variables\n",
|
||||
"Gradient descent for multiple variables:\n",
|
||||
"\n",
|
||||
"$$\\begin{align*} \\text{repeat}&\\text{ until convergence:} \\; \\lbrace \\newline\\;\n",
|
||||
"& w_j = w_j - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} \\tag{5} \\; & \\text{for j = 0..n-1}\\newline\n",
|
||||
"&b\\ \\ = b - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial b} \\newline \\rbrace\n",
|
||||
"\\end{align*}$$\n",
|
||||
"\n",
|
||||
"where, n is the number of features, parameters $w_j$, $b$, are updated simultaneously and where \n",
|
||||
"\n",
|
||||
"$$\n",
|
||||
"\\begin{align}\n",
|
||||
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \\tag{6} \\\\\n",
|
||||
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial b} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)}) \\tag{7}\n",
|
||||
"\\end{align}\n",
|
||||
"$$\n",
|
||||
"* m is the number of training examples in the data set\n",
|
||||
"\n",
|
||||
" \n",
|
||||
"* $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_5.1\"></a>\n",
|
||||
"## 5.1 Compute Gradient with Multiple Variables\n",
|
||||
"An implementation for calculating the equations (6) and (7) is below. There are many ways to implement this. In this version, there is an\n",
|
||||
"- outer loop over all m examples. \n",
|
||||
" - $\\frac{\\partial J(\\mathbf{w},b)}{\\partial b}$ for the example can be computed directly and accumulated\n",
|
||||
" - in a second loop over all n features:\n",
|
||||
" - $\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j}$ is computed for each $w_j$.\n",
|
||||
" "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def compute_gradient(X, y, w, b): \n",
|
||||
" \"\"\"\n",
|
||||
" Computes the gradient for linear regression \n",
|
||||
" Args:\n",
|
||||
" X (ndarray (m,n)): Data, m examples with n features\n",
|
||||
" y (ndarray (m,)) : target values\n",
|
||||
" w (ndarray (n,)) : model parameters \n",
|
||||
" b (scalar) : model parameter\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w. \n",
|
||||
" dj_db (scalar): The gradient of the cost w.r.t. the parameter b. \n",
|
||||
" \"\"\"\n",
|
||||
" m,n = X.shape #(number of examples, number of features)\n",
|
||||
" dj_dw = np.zeros((n,))\n",
|
||||
" dj_db = 0.\n",
|
||||
"\n",
|
||||
" for i in range(m): \n",
|
||||
" err = (np.dot(X[i], w) + b) - y[i] \n",
|
||||
" for j in range(n): \n",
|
||||
" dj_dw[j] = dj_dw[j] + err * X[i, j] \n",
|
||||
" dj_db = dj_db + err \n",
|
||||
" dj_dw = dj_dw / m \n",
|
||||
" dj_db = dj_db / m \n",
|
||||
" \n",
|
||||
" return dj_db, dj_dw"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Compute and display gradient \n",
|
||||
"tmp_dj_db, tmp_dj_dw = compute_gradient(X_train, y_train, w_init, b_init)\n",
|
||||
"print(f'dj_db at initial w,b: {tmp_dj_db}')\n",
|
||||
"print(f'dj_dw at initial w,b: \\n {tmp_dj_dw}')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Result**: \n",
|
||||
"dj_db at initial w,b: -1.6739251122999121e-06 \n",
|
||||
"dj_dw at initial w,b: \n",
|
||||
" [-2.73e-03 -6.27e-06 -2.22e-06 -6.92e-05] "
|
||||
]
|
||||
},
|
||||
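The loop version above follows the course's standard pattern. For reference, a fully vectorized sketch (not part of the original lab) computes the same gradients with matrix operations; it assumes the same argument shapes as `compute_gradient`:

```
def compute_gradient_vectorized(X, y, w, b):
    # same gradients as compute_gradient, computed without explicit loops
    m = X.shape[0]
    err = np.dot(X, w) + b - y     # (m,) prediction error for every example
    dj_dw = np.dot(X.T, err) / m   # (n,) gradient with respect to each w_j
    dj_db = np.sum(err) / m        # scalar gradient with respect to b
    return dj_db, dj_dw
```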
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_5.2\"></a>\n",
|
||||
"## 5.2 Gradient Descent With Multiple Variables\n",
|
||||
"The routine below implements equation (5) above."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters): \n",
|
||||
" \"\"\"\n",
|
||||
" Performs batch gradient descent to learn w and b. Updates w and b by taking \n",
|
||||
" num_iters gradient steps with learning rate alpha\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X (ndarray (m,n)) : Data, m examples with n features\n",
|
||||
" y (ndarray (m,)) : target values\n",
|
||||
" w_in (ndarray (n,)) : initial model parameters \n",
|
||||
" b_in (scalar) : initial model parameter\n",
|
||||
" cost_function : function to compute cost\n",
|
||||
" gradient_function : function to compute the gradient\n",
|
||||
" alpha (float) : Learning rate\n",
|
||||
" num_iters (int) : number of iterations to run gradient descent\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" w (ndarray (n,)) : Updated values of parameters \n",
|
||||
" b (scalar) : Updated value of parameter \n",
|
||||
" \"\"\"\n",
|
||||
" \n",
|
||||
" # An array to store cost J and w's at each iteration primarily for graphing later\n",
|
||||
" J_history = []\n",
|
||||
" w = copy.deepcopy(w_in) #avoid modifying global w within function\n",
|
||||
" b = b_in\n",
|
||||
" \n",
|
||||
" for i in range(num_iters):\n",
|
||||
"\n",
|
||||
" # Calculate the gradient and update the parameters\n",
|
||||
" dj_db,dj_dw = gradient_function(X, y, w, b) ##None\n",
|
||||
"\n",
|
||||
" # Update Parameters using w, b, alpha and gradient\n",
|
||||
" w = w - alpha * dj_dw ##None\n",
|
||||
" b = b - alpha * dj_db ##None\n",
|
||||
" \n",
|
||||
" # Save cost J at each iteration\n",
|
||||
" if i<100000: # prevent resource exhaustion \n",
|
||||
" J_history.append( cost_function(X, y, w, b))\n",
|
||||
"\n",
|
||||
" # Print cost every at intervals 10 times or as many iterations if < 10\n",
|
||||
" if i% math.ceil(num_iters / 10) == 0:\n",
|
||||
" print(f\"Iteration {i:4d}: Cost {J_history[-1]:8.2f} \")\n",
|
||||
" \n",
|
||||
" return w, b, J_history #return final w,b and J history for graphing"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In the next cell you will test the implementation. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# initialize parameters\n",
|
||||
"initial_w = np.zeros_like(w_init)\n",
|
||||
"initial_b = 0.\n",
|
||||
"# some gradient descent settings\n",
|
||||
"iterations = 1000\n",
|
||||
"alpha = 5.0e-7\n",
|
||||
"# run gradient descent \n",
|
||||
"w_final, b_final, J_hist = gradient_descent(X_train, y_train, initial_w, initial_b,\n",
|
||||
" compute_cost, compute_gradient, \n",
|
||||
" alpha, iterations)\n",
|
||||
"print(f\"b,w found by gradient descent: {b_final:0.2f},{w_final} \")\n",
|
||||
"m,_ = X_train.shape\n",
|
||||
"for i in range(m):\n",
|
||||
" print(f\"prediction: {np.dot(X_train[i], w_final) + b_final:0.2f}, target value: {y_train[i]}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Result**: \n",
|
||||
"b,w found by gradient descent: -0.00,[ 0.2 0. -0.01 -0.07] \n",
|
||||
"prediction: 426.19, target value: 460 \n",
|
||||
"prediction: 286.17, target value: 232 \n",
|
||||
"prediction: 171.47, target value: 178 "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# plot cost versus iteration \n",
|
||||
"fig, (ax1, ax2) = plt.subplots(1, 2, constrained_layout=True, figsize=(12, 4))\n",
|
||||
"ax1.plot(J_hist)\n",
|
||||
"ax2.plot(100 + np.arange(len(J_hist[100:])), J_hist[100:])\n",
|
||||
"ax1.set_title(\"Cost vs. iteration\"); ax2.set_title(\"Cost vs. iteration (tail)\")\n",
|
||||
"ax1.set_ylabel('Cost') ; ax2.set_ylabel('Cost') \n",
|
||||
"ax1.set_xlabel('iteration step') ; ax2.set_xlabel('iteration step') \n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"*These results are not inspiring*! Cost is still declining and our predictions are not very accurate. The next lab will explore how to improve on this."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"<a name=\"toc_15456_6\"></a>\n",
|
||||
"# 6 Congratulations!\n",
|
||||
"In this lab you:\n",
|
||||
"- Redeveloped the routines for linear regression, now with multiple variables.\n",
|
||||
"- Utilized NumPy `np.dot` to vectorize the implementations"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"dl_toc_settings": {
|
||||
"rndtag": "15456"
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.6"
|
||||
},
|
||||
"toc-autonumbering": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,666 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Feature scaling and Learning Rate (Multi-variable)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Goals\n",
|
||||
"In this lab you will:\n",
|
||||
"- Utilize the multiple variables routines developed in the previous lab\n",
|
||||
"- run Gradient Descent on a data set with multiple features\n",
|
||||
"- explore the impact of the *learning rate alpha* on gradient descent\n",
|
||||
"- improve performance of gradient descent by *feature scaling* using z-score normalization"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"You will utilize the functions developed in the last lab as well as matplotlib and NumPy. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"from lab_utils_multi import load_house_data, run_gradient_descent \n",
|
||||
"from lab_utils_multi import norm_plot, plt_equal_scale, plot_cost_i_w\n",
|
||||
"from lab_utils_common import dlc\n",
|
||||
"np.set_printoptions(precision=2)\n",
|
||||
"plt.style.use('./deeplearning.mplstyle')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Notation\n",
|
||||
"\n",
|
||||
"|General <br /> Notation | Description| Python (if applicable) |\n",
|
||||
"|: ------------|: ------------------------------------------------------------||\n",
|
||||
"| $a$ | scalar, non bold ||\n",
|
||||
"| $\\mathbf{a}$ | vector, bold ||\n",
|
||||
"| $\\mathbf{A}$ | matrix, bold capital ||\n",
|
||||
"| **Regression** | | | |\n",
|
||||
"| $\\mathbf{X}$ | training example maxtrix | `X_train` | \n",
|
||||
"| $\\mathbf{y}$ | training example targets | `y_train` \n",
|
||||
"| $\\mathbf{x}^{(i)}$, $y^{(i)}$ | $i_{th}$Training Example | `X[i]`, `y[i]`|\n",
|
||||
"| m | number of training examples | `m`|\n",
|
||||
"| n | number of features in each example | `n`|\n",
|
||||
"| $\\mathbf{w}$ | parameter: weight, | `w` |\n",
|
||||
"| $b$ | parameter: bias | `b` | \n",
|
||||
"| $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ | The result of the model evaluation at $\\mathbf{x}^{(i)}$ parameterized by $\\mathbf{w},b$: $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = \\mathbf{w} \\cdot \\mathbf{x}^{(i)}+b$ | `f_wb` | \n",
|
||||
"|$\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j}$| the gradient or partial derivative of cost with respect to a parameter $w_j$ |`dj_dw[j]`| \n",
|
||||
"|$\\frac{\\partial J(\\mathbf{w},b)}{\\partial b}$| the gradient or partial derivative of cost with respect to a parameter $b$| `dj_db`|"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Problem Statement\n",
|
||||
"\n",
|
||||
"As in the previous labs, you will use the motivating example of housing price prediction. The training data set contains many examples with 4 features (size, bedrooms, floors and age) shown in the table below. Note, in this lab, the Size feature is in sqft while earlier labs utilized 1000 sqft. This data set is larger than the previous lab.\n",
|
||||
"\n",
|
||||
"We would like to build a linear regression model using these values so we can then predict the price for other houses - say, a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old. \n",
|
||||
"\n",
|
||||
"## Dataset: \n",
|
||||
"| Size (sqft) | Number of Bedrooms | Number of floors | Age of Home | Price (1000s dollars) | \n",
|
||||
"| ----------------| ------------------- |----------------- |--------------|----------------------- | \n",
|
||||
"| 952 | 2 | 1 | 65 | 271.5 | \n",
|
||||
"| 1244 | 3 | 2 | 64 | 232 | \n",
|
||||
"| 1947 | 3 | 2 | 17 | 509.8 | \n",
|
||||
"| ... | ... | ... | ... | ... |\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# load the dataset\n",
|
||||
"X_train, y_train = load_house_data()\n",
|
||||
"X_features = ['size(sqft)','bedrooms','floors','age']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's view the dataset and its features by plotting each feature versus price."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"fig,ax=plt.subplots(1, 4, figsize=(12, 3), sharey=True)\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" ax[i].scatter(X_train[:,i],y_train)\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
"ax[0].set_ylabel(\"Price (1000's)\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Plotting each feature vs. the target, price, provides some indication of which features have the strongest influence on price. Above, increasing size also increases price. Bedrooms and floors don't seem to have a strong impact on price. Newer houses have higher prices than older houses."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_5\"></a>\n",
|
||||
"## Gradient Descent With Multiple Variables\n",
|
||||
"Here are the equations you developed in the last lab on gradient descent for multiple variables.:\n",
|
||||
"\n",
|
||||
"$$\\begin{align*} \\text{repeat}&\\text{ until convergence:} \\; \\lbrace \\newline\\;\n",
|
||||
"& w_j := w_j - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} \\tag{1} \\; & \\text{for j = 0..n-1}\\newline\n",
|
||||
"&b\\ \\ := b - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial b} \\newline \\rbrace\n",
|
||||
"\\end{align*}$$\n",
|
||||
"\n",
|
||||
"where, n is the number of features, parameters $w_j$, $b$, are updated simultaneously and where \n",
|
||||
"\n",
|
||||
"$$\n",
|
||||
"\\begin{align}\n",
|
||||
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \\tag{2} \\\\\n",
|
||||
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial b} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)}) \\tag{3}\n",
|
||||
"\\end{align}\n",
|
||||
"$$\n",
|
||||
"* m is the number of training examples in the data set\n",
|
||||
"\n",
|
||||
" \n",
|
||||
"* $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value\n"
|
||||
]
|
||||
},
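{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of equations (2) and (3), assuming NumPy arrays shaped like `X_train` and `y_train` above. In this lab the gradient routines actually come from `lab_utils_multi`; `compute_gradient_sketch` is a hypothetical name used only for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# minimal sketch of equations (2) and (3); the lab itself uses the routines\n",
"# imported from lab_utils_multi, so this cell is illustrative only\n",
"def compute_gradient_sketch(X, y, w, b):\n",
"    m = X.shape[0]\n",
"    err = X @ w + b - y           # prediction error for every example, shape (m,)\n",
"    dj_dw = (X.T @ err) / m       # equation (2), one entry per feature\n",
"    dj_db = np.sum(err) / m       # equation (3)\n",
"    return dj_db, dj_dw"
]
},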
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Learning Rate\n",
|
||||
"<figure>\n",
|
||||
" <img src=\"./images/C1_W2_Lab06_learningrate.PNG\" style=\"width:1200px;\" >\n",
|
||||
"</figure>\n",
|
||||
"The lectures discussed some of the issues related to setting the learning rate $\\alpha$. The learning rate controls the size of the update to the parameters. See equation (1) above. It is shared by all the parameters. \n",
|
||||
"\n",
|
||||
"Let's run gradient descent and try a few settings of $\\alpha$ on our data set"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### $\\alpha$ = 9.9e-7"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#set alpha to 9.9e-7\n",
|
||||
"_, _, hist = run_gradient_descent(X_train, y_train, 10, alpha = 9.9e-7)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"It appears the learning rate is too high. The solution does not converge. Cost is *increasing* rather than decreasing. Let's plot the result:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"plot_cost_i_w(X_train, y_train, hist)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The plot on the right shows the value of one of the parameters, $w_0$. At each iteration, it is overshooting the optimal value and as a result, cost ends up *increasing* rather than approaching the minimum. Note that this is not a completely accurate picture as there are 4 parameters being modified each pass rather than just one. This plot is only showing $w_0$ with the other parameters fixed at benign values. In this and later plots you may notice the blue and orange lines being slightly off."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"### $\\alpha$ = 9e-7\n",
|
||||
"Let's try a bit smaller value and see what happens."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#set alpha to 9e-7\n",
|
||||
"_,_,hist = run_gradient_descent(X_train, y_train, 10, alpha = 9e-7)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Cost is decreasing throughout the run showing that alpha is not too large. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"plot_cost_i_w(X_train, y_train, hist)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"On the left, you see that cost is decreasing as it should. On the right, you can see that $w_0$ is still oscillating around the minimum, but it is decreasing each iteration rather than increasing. Note above that `dj_dw[0]` changes sign with each iteration as `w[0]` jumps over the optimal value.\n",
|
||||
"This alpha value will converge. You can vary the number of iterations to see how it behaves."
|
||||
]
|
||||
},
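{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, a longer run at the same learning rate (a quick sketch reusing the same helper and call pattern as above) lets you watch the cost continue to fall:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sketch: same call as above, just more iterations, to watch the slow convergence\n",
"_, _, hist_long = run_gradient_descent(X_train, y_train, 100, alpha = 9e-7)"
]
},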
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### $\\alpha$ = 1e-7\n",
|
||||
"Let's try a bit smaller value for $\\alpha$ and see what happens."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#set alpha to 1e-7\n",
|
||||
"_,_,hist = run_gradient_descent(X_train, y_train, 10, alpha = 1e-7)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Cost is decreasing throughout the run showing that $\\alpha$ is not too large. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"plot_cost_i_w(X_train,y_train,hist)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"On the left, you see that cost is decreasing as it should. On the right you can see that $w_0$ is decreasing without crossing the minimum. Note above that `dj_w0` is negative throughout the run. This solution will also converge, though not quite as quickly as the previous example."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Feature Scaling \n",
|
||||
"<figure>\n",
|
||||
" <img src=\"./images/C1_W2_Lab06_featurescalingheader.PNG\" style=\"width:1200px;\" >\n",
|
||||
"</figure>\n",
|
||||
"The lectures described the importance of rescaling the dataset so the features have a similar range.\n",
|
||||
"If you are interested in the details of why this is the case, click on the 'details' header below. If not, the section below will walk through an implementation of how to do feature scaling."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Details</b></font>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"Let's look again at the situation with $\\alpha$ = 9e-7. This is pretty close to the maximum value we can set $\\alpha$ to without diverging. This is a short run showing the first few iterations:\n",
|
||||
"\n",
|
||||
"<figure>\n",
|
||||
" <img src=\"./images/C1_W2_Lab06_ShortRun.PNG\" style=\"width:1200px;\" >\n",
|
||||
"</figure>\n",
|
||||
"\n",
|
||||
"Above, while cost is being decreased, its clear that $w_0$ is making more rapid progress than the other parameters due to its much larger gradient.\n",
|
||||
"\n",
|
||||
"The graphic below shows the result of a very long run with $\\alpha$ = 9e-7. This takes several hours.\n",
|
||||
"\n",
|
||||
"<figure>\n",
|
||||
" <img src=\"./images/C1_W2_Lab06_LongRun.PNG\" style=\"width:1200px;\" >\n",
|
||||
"</figure>\n",
|
||||
" \n",
|
||||
"Above, you can see cost decreased slowly after its initial reduction. Notice the difference between `w0` and `w1`,`w2`,`w3` as well as `dj_dw0` and `dj_dw1-3`. `w0` reaches its near final value very quickly and `dj_dw0` has quickly decreased to a small value showing that `w0` is near the final value. The other parameters were reduced much more slowly.\n",
|
||||
"\n",
|
||||
"Why is this? Is there something we can improve? See below:\n",
|
||||
"<figure>\n",
|
||||
" <center> <img src=\"./images/C1_W2_Lab06_scale.PNG\" ></center>\n",
|
||||
"</figure> \n",
|
||||
"\n",
|
||||
"The figure above shows why $w$'s are updated unevenly. \n",
|
||||
"- $\\alpha$ is shared by all parameter updates ($w$'s and $b$).\n",
|
||||
"- the common error term is multiplied by the features for the $w$'s. (not $b$).\n",
|
||||
"- the features vary significantly in magnitude making some features update much faster than others. In this case, $w_0$ is multiplied by 'size(sqft)', which is generally > 1000, while $w_1$ is multiplied by 'number of bedrooms', which is generally 2-4. \n",
|
||||
" \n",
|
||||
"The solution is Feature Scaling."
|
||||
]
|
||||
},
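{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the size of this effect concrete, here is a small sketch (not part of the lab utilities) that evaluates equation (2) on the raw, unscaled `X_train` at $\\mathbf{w}=0$, $b=0$. The entry for 'size(sqft)' dwarfs the others, which is exactly why $w_0$ races ahead of the other parameters:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sketch: per-feature gradient magnitudes on the unscaled data, at w = 0 and b = 0\n",
"w_zero = np.zeros(X_train.shape[1])\n",
"err = X_train @ w_zero + 0.0 - y_train            # prediction error; simply -y_train here\n",
"dj_dw_raw = X_train.T @ err / X_train.shape[0]    # equation (2), one entry per feature\n",
"print(f\"dj_dw on unscaled features: {dj_dw_raw}\")  # the 'size(sqft)' entry dominates"
]
},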
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The lectures discussed three different techniques: \n",
|
||||
"- Feature scaling, essentially dividing each positive feature by its maximum value, or more generally, rescale each feature by both its minimum and maximum values using (x-min)/(max-min). Both ways normalizes features to the range of -1 and 1, where the former method works for positive features which is simple and serves well for the lecture's example, and the latter method works for any features.\n",
|
||||
"- Mean normalization: $x_i := \\dfrac{x_i - \\mu_i}{max - min} $ \n",
|
||||
"- Z-score normalization which we will explore below. "
|
||||
]
|
||||
},
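{
"cell_type": "markdown",
"metadata": {},
"source": [
"Only z-score normalization is implemented below. For comparison, a minimal sketch of mean normalization (a hypothetical helper, not used elsewhere in this lab) might look like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def mean_normalize_features_sketch(X):\n",
"    \"\"\"sketch of mean normalization: (x - mu) / (max - min), column by column\"\"\"\n",
"    mu  = np.mean(X, axis=0)                        # per-feature mean\n",
"    rng = np.max(X, axis=0) - np.min(X, axis=0)     # per-feature max - min\n",
"    return (X - mu) / rng"
]
},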
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"### z-score normalization \n",
|
||||
"After z-score normalization, all features will have a mean of 0 and a standard deviation of 1.\n",
|
||||
"\n",
|
||||
"To implement z-score normalization, adjust your input values as shown in this formula:\n",
|
||||
"$$x^{(i)}_j = \\dfrac{x^{(i)}_j - \\mu_j}{\\sigma_j} \\tag{4}$$ \n",
|
||||
"where $j$ selects a feature or a column in the $\\mathbf{X}$ matrix. $µ_j$ is the mean of all the values for feature (j) and $\\sigma_j$ is the standard deviation of feature (j).\n",
|
||||
"$$\n",
|
||||
"\\begin{align}\n",
|
||||
"\\mu_j &= \\frac{1}{m} \\sum_{i=0}^{m-1} x^{(i)}_j \\tag{5}\\\\\n",
|
||||
"\\sigma^2_j &= \\frac{1}{m} \\sum_{i=0}^{m-1} (x^{(i)}_j - \\mu_j)^2 \\tag{6}\n",
|
||||
"\\end{align}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
">**Implementation Note:** When normalizing the features, it is important\n",
|
||||
"to store the values used for normalization - the mean value and the standard deviation used for the computations. After learning the parameters\n",
|
||||
"from the model, we often want to predict the prices of houses we have not\n",
|
||||
"seen before. Given a new x value (living room area and number of bed-\n",
|
||||
"rooms), we must first normalize x using the mean and standard deviation\n",
|
||||
"that we had previously computed from the training set.\n",
|
||||
"\n",
|
||||
"**Implementation**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def zscore_normalize_features(X):\n",
|
||||
" \"\"\"\n",
|
||||
" computes X, zcore normalized by column\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X (ndarray (m,n)) : input data, m examples, n features\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" X_norm (ndarray (m,n)): input normalized by column\n",
|
||||
" mu (ndarray (n,)) : mean of each feature\n",
|
||||
" sigma (ndarray (n,)) : standard deviation of each feature\n",
|
||||
" \"\"\"\n",
|
||||
" # find the mean of each column/feature\n",
|
||||
" mu = np.mean(X, axis=0) # mu will have shape (n,)\n",
|
||||
" # find the standard deviation of each column/feature\n",
|
||||
" sigma = np.std(X, axis=0) # sigma will have shape (n,)\n",
|
||||
" # element-wise, subtract mu for that column from each example, divide by std for that column\n",
|
||||
" X_norm = (X - mu) / sigma \n",
|
||||
"\n",
|
||||
" return (X_norm, mu, sigma)\n",
|
||||
" \n",
|
||||
"#check our work\n",
|
||||
"#from sklearn.preprocessing import scale\n",
|
||||
"#scale(X_orig, axis=0, with_mean=True, with_std=True, copy=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's look at the steps involved in Z-score normalization. The plot below shows the transformation step by step."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"mu = np.mean(X_train,axis=0) \n",
|
||||
"sigma = np.std(X_train,axis=0) \n",
|
||||
"X_mean = (X_train - mu)\n",
|
||||
"X_norm = (X_train - mu)/sigma \n",
|
||||
"\n",
|
||||
"fig,ax=plt.subplots(1, 3, figsize=(12, 3))\n",
|
||||
"ax[0].scatter(X_train[:,0], X_train[:,3])\n",
|
||||
"ax[0].set_xlabel(X_features[0]); ax[0].set_ylabel(X_features[3]);\n",
|
||||
"ax[0].set_title(\"unnormalized\")\n",
|
||||
"ax[0].axis('equal')\n",
|
||||
"\n",
|
||||
"ax[1].scatter(X_mean[:,0], X_mean[:,3])\n",
|
||||
"ax[1].set_xlabel(X_features[0]); ax[0].set_ylabel(X_features[3]);\n",
|
||||
"ax[1].set_title(r\"X - $\\mu$\")\n",
|
||||
"ax[1].axis('equal')\n",
|
||||
"\n",
|
||||
"ax[2].scatter(X_norm[:,0], X_norm[:,3])\n",
|
||||
"ax[2].set_xlabel(X_features[0]); ax[0].set_ylabel(X_features[3]);\n",
|
||||
"ax[2].set_title(r\"Z-score normalized\")\n",
|
||||
"ax[2].axis('equal')\n",
|
||||
"plt.tight_layout(rect=[0, 0.03, 1, 0.95])\n",
|
||||
"fig.suptitle(\"distribution of features before, during, after normalization\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The plot above shows the relationship between two of the training set parameters, \"age\" and \"size(sqft)\". *These are plotted with equal scale*. \n",
|
||||
"- Left: Unnormalized: The range of values or the variance of the 'size(sqft)' feature is much larger than that of age\n",
|
||||
"- Middle: The first step removes the mean or average value from each feature. This leaves features that are centered around zero. It's difficult to see the difference for the 'age' feature, but 'size(sqft)' is clearly around zero.\n",
|
||||
"- Right: The second step divides by the standard deviation. This leaves both features centered at zero with a similar scale."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's normalize the data and compare it to the original data."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# normalize the original features\n",
|
||||
"X_norm, X_mu, X_sigma = zscore_normalize_features(X_train)\n",
|
||||
"print(f\"X_mu = {X_mu}, \\nX_sigma = {X_sigma}\")\n",
|
||||
"print(f\"Peak to Peak range by column in Raw X:{np.ptp(X_train,axis=0)}\") \n",
|
||||
"print(f\"Peak to Peak range by column in Normalized X:{np.ptp(X_norm,axis=0)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The peak to peak range of each column is reduced from a factor of thousands to a factor of 2-3 by normalization."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"fig,ax=plt.subplots(1, 4, figsize=(12, 3))\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" norm_plot(ax[i],X_train[:,i],)\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
"ax[0].set_ylabel(\"count\");\n",
|
||||
"fig.suptitle(\"distribution of features before normalization\")\n",
|
||||
"plt.show()\n",
|
||||
"fig,ax=plt.subplots(1,4,figsize=(12,3))\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" norm_plot(ax[i],X_norm[:,i],)\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
"ax[0].set_ylabel(\"count\"); \n",
|
||||
"fig.suptitle(\"distribution of features after normalization\")\n",
|
||||
"\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Notice, above, the range of the normalized data (x-axis) is centered around zero and roughly +/- 2. Most importantly, the range is similar for each feature."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's re-run our gradient descent algorithm with normalized data.\n",
|
||||
"Note the **vastly larger value of alpha**. This will speed up gradient descent."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"w_norm, b_norm, hist = run_gradient_descent(X_norm, y_train, 1000, 1.0e-1, )"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The scaled features get very accurate results **much, much faster!**. Notice the gradient of each parameter is tiny by the end of this fairly short run. A learning rate of 0.1 is a good start for regression with normalized features.\n",
|
||||
"Let's plot our predictions versus the target values. Note, the prediction is made using the normalized feature while the plot is shown using the original feature values."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#predict target using normalized features\n",
|
||||
"m = X_norm.shape[0]\n",
|
||||
"yp = np.zeros(m)\n",
|
||||
"for i in range(m):\n",
|
||||
" yp[i] = np.dot(X_norm[i], w_norm) + b_norm\n",
|
||||
"\n",
|
||||
" # plot predictions and targets versus original features \n",
|
||||
"fig,ax=plt.subplots(1,4,figsize=(12, 3),sharey=True)\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" ax[i].scatter(X_train[:,i],y_train, label = 'target')\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
" ax[i].scatter(X_train[:,i],yp,color=dlc[\"dlorange\"], label = 'predict')\n",
|
||||
"ax[0].set_ylabel(\"Price\"); ax[0].legend();\n",
|
||||
"fig.suptitle(\"target versus prediction using z-score normalized model\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The results look good. A few points to note:\n",
|
||||
"- with multiple features, we can no longer have a single plot showing results versus features.\n",
|
||||
"- when generating the plot, the normalized features were used. Any predictions using the parameters learned from a normalized training set must also be normalized."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Prediction**\n",
|
||||
"The point of generating our model is to use it to predict housing prices that are not in the data set. Let's predict the price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old. Recall, that you must normalize the data with the mean and standard deviation derived when the training data was normalized. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# First, normalize out example.\n",
|
||||
"x_house = np.array([1200, 3, 1, 40])\n",
|
||||
"x_house_norm = (x_house - X_mu) / X_sigma\n",
|
||||
"print(x_house_norm)\n",
|
||||
"x_house_predict = np.dot(x_house_norm, w_norm) + b_norm\n",
|
||||
"print(f\" predicted price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old = ${x_house_predict*1000:0.0f}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Cost Contours** \n",
|
||||
"<img align=\"left\" src=\"./images/C1_W2_Lab06_contours.PNG\" style=\"width:240px;\" >Another way to view feature scaling is in terms of the cost contours. When feature scales do not match, the plot of cost versus parameters in a contour plot is asymmetric. \n",
|
||||
"\n",
|
||||
"In the plot below, the scale of the parameters is matched. The left plot is the cost contour plot of w[0], the square feet versus w[1], the number of bedrooms before normalizing the features. The plot is so asymmetric, the curves completing the contours are not visible. In contrast, when the features are normalized, the cost contour is much more symmetric. The result is that updates to parameters during gradient descent can make equal progress for each parameter. \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"plt_equal_scale(X_train, X_norm, y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"## Congratulations!\n",
|
||||
"In this lab you:\n",
|
||||
"- utilized the routines for linear regression with multiple features you developed in previous labs\n",
|
||||
"- explored the impact of the learning rate $\\alpha$ on convergence \n",
|
||||
"- discovered the value of feature scaling using z-score normalization in speeding convergence"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Acknowledgments\n",
|
||||
"The housing data was derived from the [Ames Housing dataset](http://jse.amstat.org/v19n3/decock.pdf) compiled by Dean De Cock for use in data science education."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.6"
|
||||
},
|
||||
"toc-autonumbering": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,772 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Ungraded Lab - Gradient Descent for Linear Regression\n",
|
||||
"\n",
|
||||
"In the previous labs, we determined the optimal values of $w_0$ and $w_1$ manually. In this lab we will automate this process with gradient descent with one variable as described in lecture."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Outline\n",
|
||||
"\n",
|
||||
"- [Exercise 01- Compute Gradient](#ex01)\n",
|
||||
"- [Exercise 02- Checking the Gradient](#ex02)\n",
|
||||
"- [Exercise 03- Learning Parameters with Batch Gradient Descent](#ex-03)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"import math \n",
|
||||
"import copy"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Problem Statement\n",
|
||||
"\n",
|
||||
"Let's use the same two data points as before - a house with 1000 square feet sold for \\\\$200,000 and a house with 2000 square feet sold for \\\\$400,000.\n",
|
||||
"\n",
|
||||
"That is our dataset contains has the following two points - \n",
|
||||
"\n",
|
||||
"| Size (feet$^2$) | Price (1000s of dollars) |\n",
|
||||
"| -------------------| ------------------------ |\n",
|
||||
"| 1000 | 200 |\n",
|
||||
"| 2000 | 400 |\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Load our data set\n",
|
||||
"X_train = [1000, 2000] #feature \n",
|
||||
"y_train = [200, 400] #actual value"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"## routine to plot the data points\n",
|
||||
"def plt_house(X, y,f_w=None):\n",
|
||||
" plt.scatter(X, y, marker='x', c='r', label=\"Actual Value\")\n",
|
||||
"\n",
|
||||
" # Set the title\n",
|
||||
" plt.title(\"Housing Prices\")\n",
|
||||
" # Set the y-axis label\n",
|
||||
" plt.ylabel('Price (in 1000s of dollars)')\n",
|
||||
" # Set the x-axis label\n",
|
||||
" plt.xlabel('Size (feet^2)')\n",
|
||||
" # print predictions\n",
|
||||
" if f_w != None:\n",
|
||||
" plt.plot(X, f_w, c='b', label=\"Our Prediction\")\n",
|
||||
" plt.legend()\n",
|
||||
" plt.show()\n",
|
||||
" \n",
|
||||
"plt_house(X_train,y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Compute_Cost\n",
|
||||
"You produced this in the last lab, so this is supplied here for later use"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Function to calculate the cost\n",
|
||||
"def compute_cost(X, y, w):\n",
|
||||
" \n",
|
||||
" m = len(X)\n",
|
||||
" cost = 0\n",
|
||||
" \n",
|
||||
" for i in range(m):\n",
|
||||
" \n",
|
||||
" # Calculate the model prediction\n",
|
||||
" f_w = w[0] + w[1]*X[i]\n",
|
||||
" \n",
|
||||
" # Calculate the cost\n",
|
||||
" cost = cost + (f_w - y[i])**2\n",
|
||||
"\n",
|
||||
" total_cost = 1/(2*m) * cost\n",
|
||||
"\n",
|
||||
" return total_cost"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Gradient descent summary\n",
|
||||
"So far in this course we have developed a linear model that predicts $f_{\\mathbf{w}}(x)$ based a single input $x$ using trained parameters $w_0$,$w_1$.\n",
|
||||
"$$f_\\mathbf{w}(x)= w_0 + w_1x \\tag{1}$$\n",
|
||||
"In machine learning, we utilize input data to train the parameters $w_0$,$w_1$ by minimizing a measure of the error between our predictions $f_{\\mathbf{w}}(x)$ and the actual data $y$. The measure is called the $cost$, $J(\\mathbf{w})$. In training we measure the cost over all of our training samples $x^{(i)},y^{(i)}$\n",
|
||||
"$$J(\\mathbf{w}) = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w}}(x^{(i)}) - y^{(i)})^2\\tag{2}$$ \n",
|
||||
"From calculus we know the partial derivitive of the cost relative to one of the parameters tells us how a small change in that parameter $w_j$, or $\\Delta{w_j}$, causes a small change in $J(\\mathbf{w})$, or $\\Delta(J(w)$.\n",
|
||||
"\n",
|
||||
"$$ \\frac{\\partial J(w)}{\\partial w_j} \\approx \\frac{\\Delta{J(w)}}{\\Delta{w_j}}$$\n",
|
||||
"Using that information, we can iteratively make small adjustments to $w_j$ that reduce the value of $J(\\mathbf{w})$. This iterative process is called gradient descent. \n"
|
||||
]
|
||||
},
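{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make this ratio concrete, here is a quick numerical sketch using the `compute_cost` function defined above and our two-point dataset; the particular values of `w` and the step size are arbitrary:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sketch: approximate dJ/dw1 as Delta J / Delta w1 for a small nudge of w1\n",
"delta = 1e-5\n",
"w_a = [0, 0.15]               # arbitrary starting parameters\n",
"w_b = [0, 0.15 + delta]       # same w0, slightly larger w1\n",
"dJ = compute_cost(X_train, y_train, w_b) - compute_cost(X_train, y_train, w_a)\n",
"print(f\"Delta J / Delta w1 near w1 = 0.15: {dJ/delta:0.1f}\")"
]
},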
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"In lecture, *gradient descent* was described as:\n",
|
||||
"\n",
|
||||
"$$\\begin{align*}& \\text{repeat until convergence:} \\; \\lbrace \\newline \\; & w_j := w_j - \\alpha \\frac{\\partial J(\\mathbf{w})}{\\partial w_j} \\tag{3} \\; & \\text{for j := 0,1}\\newline & \\rbrace\\end{align*}$$\n",
|
||||
"where, parameters $w_0$, $w_1$ are updated simultaneously. \n",
|
||||
"As in lecture:\n",
|
||||
"$$\n",
|
||||
"\\begin{align}\n",
|
||||
" \\frac{\\partial J(\\mathbf{w})}{\\partial w_0} &:= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{w}(x^{(i)}) - y^{(i)} \\tag{4}\\\\\n",
|
||||
" \\frac{\\partial J(\\mathbf{w})}{\\partial w_1} &:= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{w}(x^{(i)}) - y^{(i)})x^{(i)} \\tag{5}\\\\\n",
|
||||
"\\end{align}\n",
|
||||
"$$"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name='ex01'></a>\n",
|
||||
"## Exercise 01- Compute Gradient\n",
|
||||
"We will implement a batch gradient descent algorithm for one variable. We'll need three functions. \n",
|
||||
"- compute_gradient implementing equation (4) and (5) above\n",
|
||||
"- compute_cost implementing equation (2) above (code from previous lab)\n",
|
||||
"- gradient_descent, utilizing compute_gradient and compute_cost, runs the iterative algorithm to find the parameters with the lowest cost."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## compute_gradient\n",
|
||||
"<a name='ex-01'></a>\n",
|
||||
"Implement `compute_gradient` which will return $\\frac{\\partial J(\\mathbf{w})}{\\partial w}$. A naming convention we will use in code when referring to gradients is to infer the dJ(w) and name variables for the parameter. For example, $\\frac{\\partial J(\\mathbf{w})}{\\partial w_0}$ will be `dw0`.\n",
|
||||
"\n",
|
||||
"Please complete the `compute_gradient` function to:\n",
|
||||
"\n",
|
||||
"- Create a list to store the gradient `dw`. \n",
|
||||
"- Loop over all examples in the training set `m`. \n",
|
||||
" - Inside the loop, calculate the gradient update from each training example:\n",
|
||||
" - Calculate the model prediction `f`\n",
|
||||
" $$\n",
|
||||
" f_\\mathbf{w}(x^{(i)}) = w_0+ w_1x^{(i)} \n",
|
||||
" $$\n",
|
||||
" - Calculate the gradient for $w_0$ and $w_1$\n",
|
||||
" $$\n",
|
||||
"\\begin{align}\n",
|
||||
"\\frac{\\partial{J(w)}}{\\partial{w_0}} &= f_\\mathbf{w}(x^{(i)}) - y^{(i)} \\\\ \n",
|
||||
"\\frac{\\partial{J(w)}}{\\partial{w_1}} &= (f_\\mathbf{w}(x^{(i)}) - y^{(i)})x^{(i)} \\\\\n",
|
||||
"\\end{align} \n",
|
||||
"$$\n",
|
||||
" - Add these gradients to the total gradients `dw`\n",
|
||||
" \n",
|
||||
" - Compute total gradient by dividing by the number of examples `m`.\n",
|
||||
"**Note** that this assignment continues to use python lists rather than the NumPy data structures that will be described in upcoming lectures. This will require writing some expressions 'per element' where later, these could be a single operation. Also note that these routines are specifically for one variable. Later labs and the weekly assignment will use more general cases."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Hints</b></font>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"def compute_gradient(X, y, w): \n",
|
||||
" \"\"\"\n",
|
||||
" Computes the gradient for linear regression \n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X : (array_like Shape (m,)) variable such as house size \n",
|
||||
" y : (array_like Shape (m,)) actual value \n",
|
||||
" w : (array_like Shape (2,)) Initial values of parameters of the model \n",
|
||||
" Returns\n",
|
||||
" dw: (array_like Shape (2,)) The gradient of the cost w.r.t. the parameters w. \n",
|
||||
" Note that dw has the same dimensions as w.\n",
|
||||
" \"\"\"\n",
|
||||
" m = len(X)\n",
|
||||
" \n",
|
||||
" dw = [0,0]\n",
|
||||
" for i in range(m): \n",
|
||||
" f = w[0] + w[1]*X[i]\n",
|
||||
" dw0 = f-y[i]\n",
|
||||
" dw1 = (f-y[i])*X[i] \n",
|
||||
" dw[0] = dw[0] + dw0\n",
|
||||
" dw[1] = dw[1] + dw1\n",
|
||||
" dw[0] = (1/m) * dw[0]\n",
|
||||
" dw[1] = (1/m) * dw[1] \n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def compute_gradient(X, y, w): \n",
|
||||
" \"\"\"\n",
|
||||
" Computes the gradient for linear regression \n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X : (array_like Shape (m,)) variable such as house size \n",
|
||||
" y : (array_like Shape (m,)) actual value \n",
|
||||
" w : (array_like Shape (2,)) Initial values of parameters of the model \n",
|
||||
" Returns\n",
|
||||
" dw: (array_like Shape (2,)) The gradient of the cost w.r.t. the parameters w. \n",
|
||||
" Note that dw has the same dimensions as w.\n",
|
||||
" \"\"\"\n",
|
||||
" m = len(X)\n",
|
||||
" dw = [0,0] \n",
|
||||
" ### START CODE HERE ### \n",
|
||||
"\n",
|
||||
" ### END CODE HERE ### \n",
|
||||
" \n",
|
||||
" return dw"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Compute and display gradient with w initialized to zeroes\n",
|
||||
"initial_w = [0,0]\n",
|
||||
"grad = compute_gradient(X_train, y_train, initial_w)\n",
|
||||
"print('Gradient at initial w (zeros):', grad)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Output**:\n",
|
||||
"```Gradient at initial w (zeros): [-300.0, -500000.0]```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Now, lets try setting w to a value we know, from previous labs, is the optimal value\n",
|
||||
"initial_w = [0,0.2]\n",
|
||||
"grad = compute_gradient(X_train, y_train, initial_w)\n",
|
||||
"print('Gradient when w is set to optimal values:', grad)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Output**:\n",
|
||||
"```Gradient when w is set to optimal values: [0.0, 0.0]``` \n",
|
||||
"As we expected, the gradient is zero at the \"bottom of the bowl\"."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# one more test case to ensure we are using all the w values.\n",
|
||||
"initial_w = [0.1,0.1]\n",
|
||||
"grad = compute_gradient(X_train, y_train, initial_w)\n",
|
||||
"print('Gradient:', grad)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Output**:\n",
|
||||
"```Gradient: [-149.9, -249850.0]``` "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Checking the gradient\n",
|
||||
"What do these gradient values mean? \n",
|
||||
"If you have taken calculus, you may recall an early lecture describing a derivative as:\n",
|
||||
"$$\\frac{df(x)}{dx} = \\lim_{\\Delta{x} \\to 0} \\frac{f(x+\\Delta{x}) - f(x)}{\\Delta{x}}$$\n",
|
||||
"The derivative then is just a measure of how a small change in x, the $\\Delta{x}$ above, changes $f(x)$.\n",
|
||||
"\n",
|
||||
"Above, we calculated `dw1` or $\\frac{\\partial J(\\mathbf{w})}{\\partial w_1}$ to be -249850.0. That says that when $\\mathbf{w} = [0.1,0.1]$, a small change in $w_1$ will result in a change in the **cost**, $J(\\mathbf{w})$, that is -249850.0 times that change. Note the change in notation from $d$ to $\\partial{}$ just indicates the J has multiple dependencies and that this is a derivative with respect to one of them - a partial derivative.\n",
|
||||
"\n",
|
||||
"We can use this knowledge to perform a simple check of our implementation of the gradient."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name='ex02'></a>\n",
|
||||
"## Exercise 2 \n",
|
||||
"Let's check our gradient descent algorithm by \n",
|
||||
"calculating an approximation to the partial derivative with respect to $w_1$. We can't make $\\Delta{x}$ go to zero as in the equation above, but we can just use a small value: \n",
|
||||
"$$ \\frac{\\partial J(\\mathbf{w})}{\\partial w_1} \\approx\\frac{Cost(w_0,w_1+\\Delta)-Cost(w_0,w_1)}{\\Delta{w_1}}$$\n",
|
||||
"Of course, the same method can be applied to any of the parameters."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Hints</b></font>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"# calculate a derivative and compare with our implementaion.\n",
|
||||
"delta = 0.00001\n",
|
||||
"w_check = [0.1,0.1]\n",
|
||||
"\n",
|
||||
"# compute the gradient using our derivation and implementation\n",
|
||||
"grad = compute_gradient(X_train, y_train, initial_w)\n",
|
||||
"\n",
|
||||
"# compute point 1\n",
|
||||
"c1 = compute_cost(X_train,y_train,w_check)\n",
|
||||
"\n",
|
||||
"#increment parameter w_check[1] by delta, leave w_check[0] the same\n",
|
||||
"w_check[0] = w_check[0] # leave the same\n",
|
||||
"w_check[1] = w_check[1] + delta\n",
|
||||
"\n",
|
||||
"#compute point 2\n",
|
||||
"c2 = compute_cost(X_train,y_train,w_check)\n",
|
||||
"calculated_dw1 = (c2 - c1)/delta\n",
|
||||
"print(f\"calculated_dw1 {calculated_dw1:0.1f}, expected dw1 {grad[1]}\" )#increment parameter w_check[1] by delta, leave w_check[0] the same \n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# calculate a derivative and compare with our implementaion.\n",
|
||||
"delta = 0.00001\n",
|
||||
"w_check = [0.1,0.1]\n",
|
||||
"\n",
|
||||
"# compute the gradient using our derivation and implementation\n",
|
||||
"### START CODE HERE ### \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"# compute point 1\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"#increment parameter w_check[1] by delta, leave w_check[0] the same\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"#compute point 2\n",
|
||||
"\n",
|
||||
"### END CODE HERE ### \n",
|
||||
"\n",
|
||||
"print(f\"calculated_dw1 {calculated_dw1:0.1f}, expected dw1 {grad[1]}\" )"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Output**: \n",
|
||||
"```calculated_dw1 -249837.5, expected dw1 -249850.0``` \n",
|
||||
"Not *exactly* the same, but close. The real derivative would take delta to zero. Try changing the value of delta."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name='ex-03'></a>\n",
|
||||
"## Exercise 3 Learning parameters using batch gradient descent \n",
|
||||
"\n",
|
||||
"You will now find the optimal parameters of a linear regression model by using batch gradient descent. Recall batch refers to running all the examples in one batch. \n",
|
||||
"- You don't need to implement anything for this part. Simply run the cells below. \n",
|
||||
"- A good way to verify that gradient descent is working correctly is to look\n",
|
||||
"at the value of $J(\\mathbf{w})$ and check that it is decreasing with each step. \n",
|
||||
"- Assuming you have implemented the gradient and computed the cost correctly, your value of $J(\\mathbf{w})$ should never increase and should converge to a steady value by the end of the algorithm."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def gradient_descent(X, y, w_in, cost_function, gradient_function, alpha, num_iters): \n",
|
||||
" \"\"\"\n",
|
||||
" Performs batch gradient descent to learn theta. Updates theta by taking \n",
|
||||
" num_iters gradient steps with learning rate alpha\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X : (array_like Shape (m,)\n",
|
||||
" y : (array_like Shape (m,) )\n",
|
||||
" w_in : (array_like Shape (2,)) Initial values of parameters of the model\n",
|
||||
" alpha : (float) Learning rate\n",
|
||||
" num_iters : (int) number of iterations to run gradient descent\n",
|
||||
" Returns\n",
|
||||
" w : (array_like Shape (2,)) Updated values of parameters of the model after\n",
|
||||
" running gradient descent\n",
|
||||
" \"\"\"\n",
|
||||
" \n",
|
||||
" # number of training examples\n",
|
||||
" m = len(X)\n",
|
||||
" w = copy.deepcopy(w_in) # avoid modifying global w\n",
|
||||
" # An array to store cost J and w's at each iteration primarily for graphing later\n",
|
||||
" J_history = []\n",
|
||||
" w_history = []\n",
|
||||
" \n",
|
||||
" for i in range(num_iters):\n",
|
||||
" \n",
|
||||
" # Calculate the gradient and update the parameters\n",
|
||||
" gradient = gradient_function(X, y, w)\n",
|
||||
"\n",
|
||||
" # Update Parameters \n",
|
||||
" w[0] = w[0] - alpha * gradient[0]\n",
|
||||
" w[1] = w[1] - alpha * gradient[1]\n",
|
||||
"\n",
|
||||
" # Save cost J at each iteration\n",
|
||||
" if i<100000: # prevent resource exhaustion \n",
|
||||
" J_history.append( compute_cost(X, y, w))\n",
|
||||
" \n",
|
||||
" # Print cost every at intervals 10 times or as many iterations if < 10\n",
|
||||
" if i% math.ceil(num_iters/10) == 0:\n",
|
||||
" w_history.append([w[0],w[1]])\n",
|
||||
" print(f\"Iteration {i:4}: Cost {J_history[-1]:8.2f} \",\n",
|
||||
" f\"gradient: {gradient[0]:9.4f},{gradient[1]:14.4f}\")\n",
|
||||
" \n",
|
||||
" return w, J_history, w_history #return w and J,w history for graphing"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# initialize parameters\n",
|
||||
"w_init = [0,0]\n",
|
||||
"# some gradient descent settings\n",
|
||||
"iterations = 1000\n",
|
||||
"alpha = 1.0e-8\n",
|
||||
"# run gradient descent\n",
|
||||
"w_final, J_hist, w_hist = gradient_descent(X_train ,y_train, w_init, compute_cost, compute_gradient, alpha, iterations)\n",
|
||||
"print(f\"w found by gradient descent: ({w_final[0]:8.4f},{w_final[1]:8.4f})\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Output**: \n",
|
||||
"```w found by gradient descent: (0.0001,0.2000)``` \n",
|
||||
"As we expected, the calculated parameter values are very close to (0,0.2) from previous labs."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(f\"1000 sqft house estimate {w_final[0] + w_final[1]*1000:0.2f} Thousand dollars\")\n",
|
||||
"print(f\"1000 sqft house estimate {w_final[0] + w_final[1]*1200:0.2f} Thousand dollars\")\n",
|
||||
"print(f\"2000 sqft house estimate {w_final[0] + w_final[1]*2000:0.2f} Thousand dollars\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# plot cost vs iteration \n",
|
||||
"plt.plot(J_hist)\n",
|
||||
"plt.title(\"Cost vs iteration\")\n",
|
||||
"plt.ylabel('Cost')\n",
|
||||
"plt.xlabel('iteration step')\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The plot shows that we rapidly reduced cost early. Recall from lecture that the gradient tends to be larger when further from the optimum creating larger step sizes. As you approach the final value, the gradient is smaller resulting in smaller step sizes."
|
||||
]
|
||||
},
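{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see this numerically (a quick sketch using the functions and variables from the run above), compare the gradient at the starting parameters with the gradient at the learned parameters:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# sketch: gradient at the start of the run vs. at the learned parameters\n",
"print(\"gradient at w = [0, 0] :\", compute_gradient(X_train, y_train, [0, 0]))\n",
"print(\"gradient at w = w_final:\", compute_gradient(X_train, y_train, w_final))"
]
},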
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Plotting\n",
|
||||
"Let's produce some of the fancy graphs that are popular for showing gradient descent. First we'll create some extra test cases."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# generate some more paths\n",
|
||||
"w_init = [400,0.6]\n",
|
||||
"# some gradient descent settings\n",
|
||||
"iterations = 1000\n",
|
||||
"alpha = 1.0e-7\n",
|
||||
"# run gradient descent\n",
|
||||
"w2_final, J2_hist, w2_hist = gradient_descent(X_train ,y_train, w_init, compute_cost, compute_gradient, alpha, iterations)\n",
|
||||
"print(f\"w found by gradient descent: ({w2_final[0]:0.4f},{w2_final[1]:0.4f})\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note, cost seems to have **plateaued**."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#generate some more paths\n",
|
||||
"w_init = [100,0.1]\n",
|
||||
"# some gradient descent settings\n",
|
||||
"iterations = 5\n",
|
||||
"alpha = 1.0e-6 # larger alpha\n",
|
||||
"# run gradient descent\n",
|
||||
"w3_final, J3_hist, w3_hist = gradient_descent(X_train ,y_train, w_init, compute_cost, compute_gradient, alpha, iterations)\n",
|
||||
"print(f\"w found by gradient descent: ({w3_final[0]:0.4f},{w3_final[1]:0.4f})\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note, cost is **increasing**!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from mpl_toolkits.mplot3d import axes3d\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"from matplotlib import cm\n",
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"w0 = np.arange(-500, 500, 5)\n",
|
||||
"w1 = np.arange(-0.2, 0.8, 0.005)\n",
|
||||
"w0,w1 = np.meshgrid(w0,w1)\n",
|
||||
"z=np.zeros_like(w0)\n",
|
||||
"n,_ = w0.shape\n",
|
||||
"for i in range(n):\n",
|
||||
" for j in range(n):\n",
|
||||
" z[i][j] = compute_cost(X_train, y_train, [w0[i][j],w1[i][j]] )\n",
|
||||
"\n",
|
||||
" \n",
|
||||
"fig = plt.figure(figsize=(24,6))\n",
|
||||
"\n",
|
||||
"ax = fig.add_subplot(1, 2, 2)\n",
|
||||
"CS = ax.contour(w1, w0, z,[0,50,1000,5000,10000,25000,50000])\n",
|
||||
"plt.clabel(CS, inline=1, fmt='%1.0f', fontsize=10)\n",
|
||||
"plt.title('Contour plot of cost J(w), vs w0,w1 with path of gradient descent')\n",
|
||||
"\n",
|
||||
"w_sub = [ (i[1],i[0]) for i in w_hist]\n",
|
||||
"for i in range(len(w_sub)-1):\n",
|
||||
" plt.annotate('', xy=w_sub[i + 1], xytext=w_sub[i],\n",
|
||||
" arrowprops={'arrowstyle': '->', 'color': 'r', 'lw': 1},\n",
|
||||
" va='center', ha='center')\n",
|
||||
"\n",
|
||||
"w_sub = [ (i[1],i[0]) for i in w2_hist]\n",
|
||||
"for i in range(len(w_sub)-1):\n",
|
||||
" plt.annotate('', xy=w_sub[i + 1], xytext=w_sub[i],\n",
|
||||
" arrowprops={'arrowstyle': '->', 'color': 'b', 'lw': 1},\n",
|
||||
" va='center', ha='center')\n",
|
||||
"w_sub = [ (i[1],i[0]) for i in w3_hist]\n",
|
||||
"for i in range(len(w_sub)-1):\n",
|
||||
" plt.annotate('', xy=w_sub[i + 1], xytext=w_sub[i],\n",
|
||||
" arrowprops={'arrowstyle': '->', 'color': 'g', 'lw': 1},\n",
|
||||
" va='center', ha='center')\n",
|
||||
" \n",
|
||||
"ax.set_xlabel('w_1')\n",
|
||||
"ax.set_ylabel('w_0')\n",
|
||||
"\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3'><b>Expected graph</b></font>\n",
|
||||
"</summary>\n",
|
||||
" <img src=\"./figures/ContourPlotLab3.PNG\" alt=\"Contour Plot\">\n",
|
||||
"<\\details>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"What is this graph showing? The ellipses are describing the surface of the cost $J(\\mathbf{w})$. The lines are the paths take from initial values of $(w_0,w_1)$ to their final values. \n",
|
||||
"The **red line** is our first run with w_init = (0,0). Gradient Descent successfully moves the parameters to (0,0.2) where cost is a minimum. But what about the Blue and Green lines? \n",
|
||||
"The **Blue** lines has w_init = (400,0.6) and alpha = 1.0e-7. Notice that while `w1` moves, `w0` doesn't seem to move. Why? \n",
|
||||
"The **Green** line has w_init = (100,0.1) and alpha = 1.0e-6. It never fully converges. Why?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Hints</b></font>\n",
|
||||
"</summary>\n",
|
||||
" \n",
|
||||
"In next week's lectures we will cover some fine tuning of gradient descent that is required to get it to run well. The **blue line** is one of these cases. It it does not seem that `w0` is being updated, but it is, just slowly. `w1` is multiplied by $x_1$ which is the square footage of houses in the dataset, a value in the thousands. This makes `w1` update much more quickly than `w0`. Review the update equations (4) and (5) above. With alpha = 1.0e-7, it will take many iterations to update `w0` to the right value. \n",
|
||||
" \n",
|
||||
"Why not just increase the value of alpha? The **green** line demonstrates the problem with doing this. We use a larger value for alpha in that run and the solution _diverges_. The update for `w1` is so large that the cost is larger on each iteration rather than smaller. If you run it long enough, you will generate a numerical overflow (try it). The lecture described this scenario. \n",
|
||||
" \n",
|
||||
"So, we have a situation where alpha is too big for `w1` but too small for `w0`. A means of dealing with this will be described next week. It involves _scaling_ or _normalizing_ the features in the data set so they fall within the same range. Once the data is normalized, alpha will impact all features evenly.\n",
|
||||
" \n",
|
||||
"Another way to handle this is to select the largest value of alpha you can that doesn't cause the solution to diverge, and then run it a long time. Try this in the next section _if you have the time!_"
|
||||
]
|
||||
},
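The hint above mentions scaling or normalizing the features. As a preview, here is a minimal sketch of z-score normalization, assuming `X_train` has a leading column of ones followed by the raw feature(s) as in this lab; the course covers this properly next week.

```
# Minimal sketch of z-score normalization (a preview of next week's material).
# Assumption: X_train has a leading column of ones followed by the raw features,
# as in this lab; the ones column is left untouched.
import numpy as np

def zscore_normalize(X):
    X = X.astype(float).copy()
    mu    = X[:, 1:].mean(axis=0)          # per-feature mean (skip the ones column)
    sigma = X[:, 1:].std(axis=0)           # per-feature standard deviation
    X[:, 1:] = (X[:, 1:] - mu) / sigma     # put every feature on a comparable scale
    return X, mu, sigma

# With the features on comparable scales, a single alpha affects w0 and w1 evenly,
# so gradient descent no longer needs millions of iterations to move w0.
```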
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#TAKES A LONG TIME, 10 minutes or so.\n",
|
||||
"w_init = [400,0.1]\n",
|
||||
"# some gradient descent settings\n",
|
||||
"iterations = 40000000\n",
|
||||
"alpha = 7.0e-7\n",
|
||||
"# run gradient descent\n",
|
||||
"w4_final, J4_hist, w4_hist = gradient_descent(X_train ,y_train, w_init, compute_cost, compute_gradient, alpha, iterations)\n",
|
||||
"print(f\"w found by gradient descent: ({w4_final[0]:0.4f},{w4_final[1]:0.4f})\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"w0 = np.arange(-500, 500, 5)\n",
|
||||
"w1 = np.arange(-0.2, 0.8, 0.005)\n",
|
||||
"w0,w1 = np.meshgrid(w0,w1)\n",
|
||||
"z=np.zeros_like(w0)\n",
|
||||
"n,_ = w0.shape\n",
|
||||
"for i in range(n):\n",
|
||||
" for j in range(n):\n",
|
||||
" z[i][j] = compute_cost(X_train, y_train, [w0[i][j],w1[i][j]] )\n",
|
||||
"\n",
|
||||
" \n",
|
||||
"fig = plt.figure(figsize=(24,6))\n",
|
||||
"\n",
|
||||
"ax = fig.add_subplot(1, 2, 2)\n",
|
||||
"CS = ax.contour(w1, w0, z,[0,50,1000,5000,10000,25000,50000])\n",
|
||||
"plt.clabel(CS, inline=1, fmt='%1.0f', fontsize=10)\n",
|
||||
"plt.title('Contour plot of cost, w0 vs w1')\n",
|
||||
"\n",
|
||||
"w_sub = [ (i[1],i[0]) for i in w_hist]\n",
|
||||
"for i in range(len(w_sub)-1):\n",
|
||||
" plt.annotate('', xy=w_sub[i + 1], xytext=w_sub[i],\n",
|
||||
" arrowprops={'arrowstyle': '->', 'color': 'r', 'lw': 1},\n",
|
||||
" va='center', ha='center')\n",
|
||||
"\n",
|
||||
"w_sub = [ (i[1],i[0]) for i in w4_hist]\n",
|
||||
"for i in range(len(w_sub)-1):\n",
|
||||
" plt.annotate('', xy=w_sub[i + 1], xytext=w_sub[i],\n",
|
||||
" arrowprops={'arrowstyle': '->', 'color': 'c', 'lw': 1},\n",
|
||||
" va='center', ha='center')\n",
|
||||
" \n",
|
||||
"ax.set_xlabel('w_1')\n",
|
||||
"ax.set_ylabel('w_0')\n",
|
||||
"\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The cyan line is our long-running solution. Scaling or Normalizing features will get us to the right solution faster. We will cover this next week."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,344 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Feature Engineering and Polynomial Regression\n",
|
||||
"\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Goals\n",
|
||||
"In this lab you will:\n",
|
||||
"- explore feature engineering and polynomial regression which allows you to use the machinery of linear regression to fit very complicated, even very non-linear functions.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"You will utilize the function developed in previous labs as well as matplotlib and NumPy. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"from lab_utils_multi import zscore_normalize_features, run_gradient_descent_feng\n",
|
||||
"np.set_printoptions(precision=2) # reduced display precision on numpy arrays"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name='FeatureEng'></a>\n",
|
||||
"# Feature Engineering and Polynomial Regression Overview\n",
|
||||
"\n",
|
||||
"Out of the box, linear regression provides a means of building models of the form:\n",
|
||||
"$$f_{\\mathbf{w},b} = w_0x_0 + w_1x_1+ ... + w_{n-1}x_{n-1} + b \\tag{1}$$ \n",
|
||||
"What if your features/data are non-linear or are combinations of features? For example, Housing prices do not tend to be linear with living area but penalize very small or very large houses resulting in the curves shown in the graphic above. How can we use the machinery of linear regression to fit this curve? Recall, the 'machinery' we have is the ability to modify the parameters $\\mathbf{w}$, $\\mathbf{b}$ in (1) to 'fit' the equation to the training data. However, no amount of adjusting of $\\mathbf{w}$,$\\mathbf{b}$ in (1) will achieve a fit to a non-linear curve.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name='PolynomialFeatures'></a>\n",
|
||||
"## Polynomial Features\n",
|
||||
"\n",
|
||||
"Above we were considering a scenario where the data was non-linear. Let's try using what we know so far to fit a non-linear curve. We'll start with a simple quadratic: $y = 1+x^2$\n",
|
||||
"\n",
|
||||
"You're familiar with all the routines we're using. They are available in the lab_utils.py file for review. We'll use [`np.c_[..]`](https://numpy.org/doc/stable/reference/generated/numpy.c_.html) which is a NumPy routine to concatenate along the column boundary."
|
||||
]
|
||||
},
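As a quick illustration (not part of the original lab), `np.c_` stacks 1-D arrays as columns:

```
import numpy as np

a = np.array([1, 2, 3])
b = np.array([10, 20, 30])

# np.c_ concatenates along the column boundary, producing a (3,2) matrix
print(np.c_[a, b])
# [[ 1 10]
#  [ 2 20]
#  [ 3 30]]
```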
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0, 20, 1)\n",
|
||||
"y = 1 + x**2\n",
|
||||
"X = x.reshape(-1, 1)\n",
|
||||
"\n",
|
||||
"model_w,model_b = run_gradient_descent_feng(X,y,iterations=1000, alpha = 1e-2)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"no feature engineering\")\n",
|
||||
"plt.plot(x,X@model_w + model_b, label=\"Predicted Value\"); plt.xlabel(\"X\"); plt.ylabel(\"y\"); plt.legend(); plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Well, as expected, not a great fit. What is needed is something like $y= w_0x_0^2 + b$, or a **polynomial feature**.\n",
|
||||
"To accomplish this, you can modify the *input data* to *engineer* the needed features. If you swap the original data with a version that squares the $x$ value, then you can achieve $y= w_0x_0^2 + b$. Let's try it. Swap `X` for `X**2` below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0, 20, 1)\n",
|
||||
"y = 1 + x**2\n",
|
||||
"\n",
|
||||
"# Engineer features \n",
|
||||
"X = x**2 #<-- added engineered feature"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X = X.reshape(-1, 1) #X should be a 2-D Matrix\n",
|
||||
"model_w,model_b = run_gradient_descent_feng(X, y, iterations=10000, alpha = 1e-5)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"Added x**2 feature\")\n",
|
||||
"plt.plot(x, np.dot(X,model_w) + model_b, label=\"Predicted Value\"); plt.xlabel(\"x\"); plt.ylabel(\"y\"); plt.legend(); plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Great! near perfect fit. Notice the values of $\\mathbf{w}$ and b printed right above the graph: `w,b found by gradient descent: w: [1.], b: 0.0490`. Gradient descent modified our initial values of $\\mathbf{w},b $ to be (1.0,0.049) or a model of $y=1*x_0^2+0.049$, very close to our target of $y=1*x_0^2+1$. If you ran it longer, it could be a better match. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Selecting Features\n",
|
||||
"<a name='GDF'></a>\n",
|
||||
"Above, we knew that an $x^2$ term was required. It may not always be obvious which features are required. One could add a variety of potential features to try and find the most useful. For example, what if we had instead tried : $y=w_0x_0 + w_1x_1^2 + w_2x_2^3+b$ ? \n",
|
||||
"\n",
|
||||
"Run the next cells. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0, 20, 1)\n",
|
||||
"y = x**2\n",
|
||||
"\n",
|
||||
"# engineer features .\n",
|
||||
"X = np.c_[x, x**2, x**3] #<-- added engineered feature"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model_w,model_b = run_gradient_descent_feng(X, y, iterations=10000, alpha=1e-7)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"x, x**2, x**3 features\")\n",
|
||||
"plt.plot(x, X@model_w + model_b, label=\"Predicted Value\"); plt.xlabel(\"x\"); plt.ylabel(\"y\"); plt.legend(); plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note the value of $\\mathbf{w}$, `[0.08 0.54 0.03]` and b is `0.0106`.This implies the model after fitting/training is:\n",
|
||||
"$$ 0.08x + 0.54x^2 + 0.03x^3 + 0.0106 $$\n",
|
||||
"Gradient descent has emphasized the data that is the best fit to the $x^2$ data by increasing the $w_1$ term relative to the others. If you were to run for a very long time, it would continue to reduce the impact of the other terms. \n",
|
||||
Gradient descent">
">Gradient descent is picking the 'correct' features for us by emphasizing their associated parameters\n",
|
||||
"\n",
|
||||
"Let's review this idea:\n",
|
||||
"- Intially, the features were re-scaled so they are comparable to each other\n",
|
||||
"- less weight value implies less important/correct feature, and in extreme, when the weight becomes zero or very close to zero, the associated feature is not useful in fitting the model to the data.\n",
|
||||
"- above, after fitting, the weight associated with the $x^2$ feature is much larger than the weights for $x$ or $x^3$ as it is the most useful in fitting the data. "
|
||||
]
|
||||
},
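A minimal sketch of the ranking idea from the list above. It assumes the features have been scaled to comparable ranges (see the scaling section below) and that `model_w` is the weight vector returned by `run_gradient_descent_feng`; the feature names here are just illustrative labels.

```
# Minimal sketch: rank features by the magnitude of their learned weights.
# Assumes comparable feature scales and that model_w is the weight vector from above.
import numpy as np

feature_names = ['x', 'x^2', 'x^3']            # illustrative labels
w = np.ravel(model_w)                          # handles (n,) or (n,1) shapes
for i in np.argsort(np.abs(w))[::-1]:          # largest |w| first
    print(f"{feature_names[i]:>4}: w = {w[i]: .3f}")
```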
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### An Alternate View\n",
|
||||
"Above, polynomial features were chosen based on how well they matched the target data. Another way to think about this is to note that we are still using linear regression once we have created new features. Given that, the best features will be linear relative to the target. This is best understood with an example. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0, 20, 1)\n",
|
||||
"y = x**2\n",
|
||||
"\n",
|
||||
"# engineer features .\n",
|
||||
"X = np.c_[x, x**2, x**3] #<-- added engineered feature\n",
|
||||
"X_features = ['x','x^2','x^3']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"fig,ax=plt.subplots(1, 3, figsize=(12, 3), sharey=True)\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" ax[i].scatter(X[:,i],y)\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
"ax[0].set_ylabel(\"y\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Above, it is clear that the $x^2$ feature mapped against the target value $y$ is linear. Linear regression can then easily generate a model using that feature."
|
||||
]
|
||||
},
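One way to quantify "linear relative to the target" is the Pearson correlation between each engineered feature and $y$. This small sketch is not part of the original lab; it just reuses the data created above.

```
import numpy as np

x = np.arange(0, 20, 1)
y = x**2
X = np.c_[x, x**2, x**3]
X_features = ['x', 'x^2', 'x^3']

# Pearson correlation measures the strength of a *linear* relationship with y
for i in range(X.shape[1]):
    r = np.corrcoef(X[:, i], y)[0, 1]
    print(f"correlation of {X_features[i]:>3} with y: {r:.3f}")
# x^2 is perfectly correlated with y (r = 1.0); x and x^3 are strongly,
# but not perfectly, linear in y.
```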
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Scaling features\n",
|
||||
"As described in the last lab, if the data set has features with significantly different scales, one should apply feature scaling to speed gradient descent. In the example above, there is $x$, $x^2$ and $x^3$ which will naturally have very different scales. Let's apply Z-score normalization to our example."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0,20,1)\n",
|
||||
"X = np.c_[x, x**2, x**3]\n",
|
||||
"print(f\"Peak to Peak range by column in Raw X:{np.ptp(X,axis=0)}\")\n",
|
||||
"\n",
|
||||
"# add mean_normalization \n",
|
||||
"X = zscore_normalize_features(X) \n",
|
||||
"print(f\"Peak to Peak range by column in Normalized X:{np.ptp(X,axis=0)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now we can try again with a more aggressive value of alpha:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"x = np.arange(0,20,1)\n",
|
||||
"y = x**2\n",
|
||||
"\n",
|
||||
"X = np.c_[x, x**2, x**3]\n",
|
||||
"X = zscore_normalize_features(X) \n",
|
||||
"\n",
|
||||
"model_w, model_b = run_gradient_descent_feng(X, y, iterations=100000, alpha=1e-1)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"Normalized x x**2, x**3 feature\")\n",
|
||||
"plt.plot(x,X@model_w + model_b, label=\"Predicted Value\"); plt.xlabel(\"x\"); plt.ylabel(\"y\"); plt.legend(); plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Feature scaling allows this to converge much faster. \n",
|
||||
"Note again the values of $\\mathbf{w}$. The $w_1$ term, which is the $x^2$ term is the most emphasized. Gradient descent has all but eliminated the $x^3$ term."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Complex Functions\n",
|
||||
"With feature engineering, even quite complex functions can be modeled:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"x = np.arange(0,20,1)\n",
|
||||
"y = np.cos(x/2)\n",
|
||||
"\n",
|
||||
"X = np.c_[x, x**2, x**3,x**4, x**5, x**6, x**7, x**8, x**9, x**10, x**11, x**12, x**13]\n",
|
||||
"X = zscore_normalize_features(X) \n",
|
||||
"\n",
|
||||
"model_w,model_b = run_gradient_descent_feng(X, y, iterations=1000000, alpha = 1e-1)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"Normalized x x**2, x**3 feature\")\n",
|
||||
"plt.plot(x,X@model_w + model_b, label=\"Predicted Value\"); plt.xlabel(\"x\"); plt.ylabel(\"y\"); plt.legend(); plt.show()\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"## Congratulations!\n",
|
||||
"In this lab you:\n",
|
||||
"- learned how linear regression can model complex, even highly non-linear functions using feature engineering\n",
|
||||
"- recognized that it is important to apply feature scaling when doing feature engineering"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.6"
|
||||
},
|
||||
"toc-autonumbering": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,501 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Ungraded Lab - Multiple Variable Model Representation\n",
|
||||
"\n",
|
||||
"In this ungraded lab, you will extend our model to support multiple features. You will also utilized a popular python numeric library, NumPy to efficiently store and manipulate data. For detailed descriptions and examples of routines used, see [Numpy Documentation](https://numpy.org/doc/stable/reference/)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Problem Statement\n",
|
||||
"\n",
|
||||
"You will use the motivating example of housing price prediction. The training dataset contains three examples with 4 features (size,bedrooms,floors and age) shown in the table below.\n",
|
||||
"\n",
|
||||
"| Size (feet$^2$) | Number of Bedrooms | Number of floors | Age of Home | Price (1000s dollars) | \n",
|
||||
"| ----------------| ------------------- |----------------- |--------------|-------------- | \n",
|
||||
"| 2104 | 5 | 1 | 45 | 460 | \n",
|
||||
"| 1416 | 3 | 2 | 40 | 232 | \n",
|
||||
"| 852 | 2 | 1 | 35 | 178 | \n",
|
||||
"\n",
|
||||
"We would like to build a linear regression model using these values so we can then predict the price for other houses - say, a house with 1200 feet$^2$, 3 bedrooms, 1 floor, 40 years old. In this lab you will create the model. In the following labs, we will fit the data.\n",
|
||||
"\n",
|
||||
"### Notation: X, y and parameters w\n",
|
||||
"\n",
|
||||
"The lectures and equations describe $\\mathbf{X}$, $\\mathbf{y}$, $\\mathbf{w}$. In our code these are represented by variables:\n",
|
||||
"- `X_orig` represents input variables, also called input features. In previous labs, there was just one feature, now there are four. \n",
|
||||
"- `y_orig` represents output variables, also known as target variables (in this case - Price (1000s of dollars)). \n",
|
||||
"- `w_init` represents our parameters. \n",
|
||||
"Please run the following code cell to create your `X_orig` and `y_orig` variables."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X_orig = np.array([[2104,5,1,45], [1416,3,2,40], [852,2,1,35]])\n",
|
||||
"y_orig = np.array([460,232,178]).reshape(-1,1) #reshape creates (m,1) matrix"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Matrix X containing our examples\n",
|
||||
"Similar to the table above, examples are stored in a NumPy matrix `X_init`. Each row of the matrix represents one example. As described in lecture, examples are extended by a column of ones creating `X_init_e`, described below. In general, when you have $m$ training examples ( $m$ is three in our example), and there are $n$ features (four in our example), $\\mathbf{X}$ is a matrix with dimensions ($m$, $n+1$) (m rows, n+1 columns).\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"$$\\mathbf{X} = \\begin{pmatrix}\n",
|
||||
" \\mathbf{x}^{(0)} \\\\ \n",
|
||||
" \\mathbf{x}^{(1)} \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" \\mathbf{x}^{(m-1)}\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"= \\begin{pmatrix}\n",
|
||||
" x^{(0)}_0 & x^{(0)}_1 & \\cdots & x^{(0)}_{n} \\\\ \n",
|
||||
" x^{(1)}_0 & x^{(1)}_1 & \\cdots & x^{(1)}_{n} \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" x^{(m-1)}_0 & x^{(m-1)}_1 & \\cdots & x^{(m-1)}_{n} \n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"notation:\n",
|
||||
"- $\\mathbf{x}^{(0)}$ is example 0. The superscript in parenthesis indicates the example number. The bold indicates a vector (described more below)\n",
|
||||
"- $x^{(0)}_2$ is element 2 in example 0. The superscript in parenthesis indicates the example number while the subscript represents an element. \n",
|
||||
"\n",
|
||||
"For our dataset, $\\mathbf{X}$ is (3,5):\n",
|
||||
"$$\\mathbf{X} = \\begin{pmatrix}\n",
|
||||
" \\mathbf{x}^{(0)} \\\\ \n",
|
||||
" \\mathbf{x}^{(1)} \\\\\n",
|
||||
" \\mathbf{x}^{(2)}\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"= \\begin{pmatrix}\n",
|
||||
" 1 & 2104 & 5 & 1 & 45 & 460 \\\\ \n",
|
||||
" 1 & 1416 & 3 & 2 & 40 & 232 \\\\\n",
|
||||
" 1 & 852 & 2 & 1 & 35 & 178\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"Lets try implementing this. Start by examining our input data"
|
||||
]
|
||||
},
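A small, self-contained illustration (not in the original lab) of how this notation maps to NumPy indexing: row $i$ of the matrix is example $\mathbf{x}^{(i)}$, and element $j$ of that row is $x^{(i)}_j$ (column 0 holds the ones described above).

```
import numpy as np

# toy (m=2, n+1=3) matrix: column 0 is the ones column, columns 1..n are features
X_demo = np.array([[1, 2104, 5],
                   [1, 1416, 3]])
print(X_demo[0])       # x^(0)   -> [   1 2104    5]
print(X_demo[0, 2])    # x^(0)_2 -> 5
print(X_demo.shape)    # (m, n+1) -> (2, 3)
```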
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# data is stored in numpy array/matrix\n",
|
||||
"print(f\"X Shape: {X_orig.shape}, X Type:{type(X_orig)})\")\n",
|
||||
"print(X_orig)\n",
|
||||
"print(f\"y Shape: {y_orig.shape}, y Type:{type(y_orig)})\")\n",
|
||||
"print(y_orig)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"To simplify matrix/vector operations, you will want to first add another column to your data (as $x_0$) to accomodate the $w_0$ intercept term. This allows you to treat $w_0$ the same as the other parameters.\n",
|
||||
"\n",
|
||||
"So if your original `X_orig` looks like this:\n",
|
||||
"\n",
|
||||
"$$ \n",
|
||||
"\\mathbf{X_{orig}} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
" x^{(0)}_1 \\\\ \n",
|
||||
" x^{(1)}_1 \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" x^{(m-1)}_1 \n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"You will want to combine it with a vector of ones:\n",
|
||||
"$$\n",
|
||||
"\\mathbf{1} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
" 1 \\\\ \n",
|
||||
" 1 \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" 1\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"So it will look like this:\n",
|
||||
"$$\n",
|
||||
"\\mathbf{X_{train}} = \\begin{pmatrix} \\mathbf{1} & \\mathbf{X_{orig}}\\end{pmatrix}\n",
|
||||
"=\n",
|
||||
"\\begin{pmatrix}\n",
|
||||
" 1 & x^{(0)}_1 \\\\ \n",
|
||||
" 1 & x^{(1)}_1 \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" 1 & x^{(m-1)}_1 \n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tmp_ones = np.ones((3,1), dtype=np.int64) #dtype just added to keep examples neat.. not required\n",
|
||||
"print (\"(m,1) column of ones\")\n",
|
||||
"print(tmp_ones)\n",
|
||||
"#extend X_orig with column of ones\n",
|
||||
"X_train = np.concatenate([tmp_ones, X_orig], axis=1)\n",
|
||||
"y_train = y_orig # just for symmetry\n",
|
||||
"\n",
|
||||
"print(f\"Vector of ones stacked to the left of X_orig \")\n",
|
||||
"print(X_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Parameter vector w\n",
|
||||
"\n",
|
||||
"-$\\mathbf{w}$ is a vector with dimensions ($n+1$, $1$) (n+1 rows, 1 column)\n",
|
||||
" - Each column contains the parameters associated with one feature.\n",
|
||||
" - in our dataset, n+1 is 5.\n",
|
||||
"\n",
|
||||
"$$\\mathbf{w} = \\begin{pmatrix}\n",
|
||||
"w_0 \\\\ \n",
|
||||
"w_1 \\\\\n",
|
||||
"\\cdots\\\\\n",
|
||||
"w_{n}\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"For this lab, lets initialize `w` with some handy predetermined values. Normally, `w` would be initalized with random values or zero. Note the use of \".reshape\" to create a (n,1) column vector. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"w_init = np.array([ 785.1811367994083, 0.39133535, 18.75376741, \n",
|
||||
" -53.36032453, -26.42131618]).reshape(-1,1)\n",
|
||||
"print(f\"w_init shape: {w_init.shape}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Model prediction\n",
|
||||
"The model's prediction with multiple variables is given by the linear model:\n",
|
||||
"\n",
|
||||
"$$ f_{\\mathbf{w}}(\\mathbf{x}) = w_0 + w_1x_1 + ... + w_nx_n \\tag{1}$$\n",
|
||||
"\n",
|
||||
"This is where representing our data in matrices and vectors pays off. Recall from the Linear Algebra review the Matrix Vector multiplication. This is shown below\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Note that Row/Column that is highlighted. Knowing that we have set the $x_0$ values to 1, its clear the first row/column operation implements the prediction (1) above for $\\mathbf{x}^{(0)}$ , resulting in $f_{\\mathbf{w}}(\\mathbf{x}^{(0)})$. The second row of the result is $f_{\\mathbf{w}}(\\mathbf{x}^{(1)})$ and so on. By utilizing Matrix Vector multiplication, we can compute the prediction of all of the examples in $X$ in one statement!.\n",
|
||||
"\n",
|
||||
"$$f_{\\mathbf{w}}(\\mathbf{X})=\\mathbf{X}\\mathbf{w} \\tag{2}$$\n",
|
||||
"\n",
|
||||
"Let's try this. We have previously initized `X_train` and `w_init`. Before you run the cell below, what shape will `f_w` be?"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# calculate f_w for all examples.\n",
|
||||
"f_w = X_train @ w_init # the same as np.matmul(x_orig_e, w_init)\n",
|
||||
"print(\"f_w calculated using a matrix multiply\")\n",
|
||||
"print(f_w)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Using our carefully selected `w` values, the results nearly match our `y_train` values."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(\"y_train values\")\n",
|
||||
"print(y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Single Prediction\n",
|
||||
"\n",
|
||||
"We now can make prediction on a full set of examples, what about a single example? There are multiple ways to form this calculation, but here we will immitate the calculation that was highlighted in blue in the figure above.\n",
|
||||
"For convenience of notation, you'll define $\\mathbf{x}$ as a vector:\n",
|
||||
"\n",
|
||||
"$$ \\mathbf{x} = \\begin{pmatrix}\n",
|
||||
" x_0 & x_1 & ... & x_n\n",
|
||||
" \\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"- With $x_0 = 1$ and ($x_1$,..,$x_n$) being your input data. \n",
|
||||
"\n",
|
||||
"The prediction $f_{\\mathbf{w}}(\\mathbf{x})$ is now\n",
|
||||
"$$ f_{\\mathbf{w}}(\\mathbf{x}) = \\mathbf{x}\\mathbf{w} \\tag{3} $$ \n",
|
||||
"Which performs the following operation:\n",
|
||||
"$$\n",
|
||||
"f_{\\mathbf{w}}(\\mathbf{x}) = x_0w_0 + x_1w_1 + ... + x_nw_n\n",
|
||||
"$$\n",
|
||||
"Let's try it. Recall we wanted to predict the value of a house with 1200 feet$^2$, 3 bedrooms, 1 floor, 40 years old."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Define our x vector, extended with a 1.\n",
|
||||
"x_vec = np.array([1,1200,3,1,40]).reshape(1,-1) # row vector\n",
|
||||
"print(\"x_vec shape\", x_vec.shape)\n",
|
||||
"print(\"x_vec\")\n",
|
||||
"print(x_vec)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# make a prediction\n",
|
||||
"f_wv = x_vec @ w_init\n",
|
||||
"print(\"f_wv shape\", f_wv.shape)\n",
|
||||
"print(\"prediction f_wv\", f_wv)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Great! Now that we have realized our model in Matrix and Vector form lets \n",
|
||||
"- review some of the operations in more detail\n",
|
||||
"- try an example on your own."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### np.concatenate and axis\n",
|
||||
"We will use np.concatenate often. The use of `axis` is often confusing. Lets look at this in more detail with an example.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tmp_X_orig = np.array([[9],\n",
|
||||
" [2]\n",
|
||||
" ])\n",
|
||||
"\n",
|
||||
"print(\"Matrix tmp_X_orig\")\n",
|
||||
"print(tmp_X_orig, \"\\n\")\n",
|
||||
"\n",
|
||||
"# Use np.ones to create a column vector of ones\n",
|
||||
"tmp_ones = np.ones((2,1))\n",
|
||||
"print(f\"Column vector of ones (2 rows and 1 column)\")\n",
|
||||
"print(tmp_ones, \"\\n\")\n",
|
||||
"\n",
|
||||
"tmp_X = np.concatenate([tmp_ones, tmp_X_orig], axis=1)\n",
|
||||
"print(\"Vector of ones stacked to the left of tmp_X_orig\")\n",
|
||||
"print(tmp_X, \"\\n\")\n",
|
||||
"\n",
|
||||
"print(f\"tmp_x has shape: {tmp_X.shape}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"In this small example, the $\\mathbf{X}$ is now:\n",
|
||||
"$$\\mathbf{X} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
"1 & 9 \\\\\n",
|
||||
"1 & 2\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"Notice that when calling `np.concatenate`, you're setting `axis=1`. \n",
|
||||
"- This puts the vector of ones on the left and the tmp_X_orig to the right.\n",
|
||||
"- If you set axis = 0, then `np.concatenate` would place the vector of ones ON TOP of tmp_X_orig"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(\"Calling numpy.concatenate, setting axis=0\")\n",
|
||||
"tmp_X_version_2 = np.concatenate([tmp_ones, tmp_X_orig], axis=0)\n",
|
||||
"print(\"Vector of ones stacked to the ON TOP of tmp_X_orig\")\n",
|
||||
"print(tmp_X_version_2)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"So if you set axis=0, $\\mathbf{X}$ looks like this:\n",
|
||||
"$$\\mathbf{X} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
"1 \\\\ 1 \\\\\n",
|
||||
"9 \\\\ 2\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"This is **NOT** what you want.\n",
|
||||
"\n",
|
||||
"You'll want to set axis=1 so that you get a column vector of ones on the left and a column vector on the right:\n",
|
||||
"\n",
|
||||
"$$\\mathbf{X} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
"1 & x^{(0)}_1 \\\\\n",
|
||||
"1 & x^{(1)}_1\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Second Example on your own\n",
|
||||
"Let's try a similar example with slightly different features.\n",
|
||||
"\n",
|
||||
"| Size (feet$^2$) | Number of Bedrooms | Age of Home | Price (1000s dollars) | \n",
|
||||
"| ----------------| ------------------- |--------------|-------------- | \n",
|
||||
"| 2104 | 5 | 45 | 460 | \n",
|
||||
"| 1416 | 3 | 40 | 232 | \n",
|
||||
"| 1534 | 4 | 30 | 315 | \n",
|
||||
"| 852 | 2 | 35 | 178 | \n",
|
||||
"\n",
|
||||
"**Using the previous example as a guide** as needed, \n",
|
||||
"- create the data structures for `X_orig`, `y_orig` \n",
|
||||
"- extend X_orig with a column of 1's.\n",
|
||||
"- calculate `f_w`\n",
|
||||
"- make a prediction for a single example, 1500sqft, 3 bedrooms, 40 years old"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# use these precalculated values as inital parameters\n",
|
||||
"w_init2 = np.array([-267.70709382, -0.37871854, 220.9610984, 9.32723112]).reshape(-1,1)\n",
|
||||
"\n",
|
||||
"X_orig2 =\n",
|
||||
"y_train2 = \n",
|
||||
"tmp_ones2 = \n",
|
||||
"X_train2 = \n",
|
||||
"f_w2 = \n",
|
||||
"print(f_w2)\n",
|
||||
"print(y_train2)\n",
|
||||
"\n",
|
||||
"x_vec2 = np.array([1,1500,3,40]).reshape(1,-1)\n",
|
||||
"f_wv2 = x_vec2 @ w_init2\n",
|
||||
"print(f_wv2)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Hints</b></font>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"w_init2 = np.array([-267.70709382, -0.37871854, 220.9610984, 9.32723112]).reshape(-1,1)\n",
|
||||
"X_orig2 = np.array([[2104,5,45], [1416,3,40], [1534,4,30], [852,2,35]])\n",
|
||||
"y_train2 = np.array([460,232,315,178]).reshape(-1,1) #reshape creates (m,1) matrix\n",
|
||||
"tmp_ones2 = np.ones((4,1), dtype=np.int64)\n",
|
||||
"X_train2 = np.concatenate([tmp_ones2, X_orig2], axis=1)\n",
|
||||
"f_w2 = X_train2 @ w_init2\n",
|
||||
"print(f_w2)\n",
|
||||
"print(y_train2)\n",
|
||||
"\n",
|
||||
"x_vec2 = np.array([1,1500,3,40]).reshape(1,-1)\n",
|
||||
"f_wv2 = x_vec2 @ w_init2\n",
|
||||
"print(f_wv2)\n",
|
||||
"-----------------------------------------------------------------\n",
|
||||
" Output of cell\n",
|
||||
"-----------------------------------------------------------------\n",
|
||||
"[[459.99999042]\n",
|
||||
" [231.99999354]\n",
|
||||
" [314.99999302]\n",
|
||||
" [177.9999961 ]]\n",
|
||||
"[[460]\n",
|
||||
" [232]\n",
|
||||
" [315]\n",
|
||||
" [178]]\n",
|
||||
"[[200.18763618]]\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,396 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Ungraded Lab - Multiple Variable Cost\n",
|
||||
"\n",
|
||||
"In this lab we will adjust our previous single variable cost calculation to use multiple variables and utilize the NumPy vectors and matrices."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We will utilize the same data set and intialization as the last lab.\n",
|
||||
"### Problem Statement\n",
|
||||
"\n",
|
||||
"You will use the motivating example of housing price prediction. The training dataset contains three examples with 4 features (size,bedrooms,floors and age) shown in the table below.\n",
|
||||
"\n",
|
||||
"| Size (feet$^2$) | Number of Bedrooms | Number of floors | Age of Home | Price (1000s dollars) | \n",
|
||||
"| ----------------| ------------------- |----------------- |--------------|-------------- | \n",
|
||||
"| 2104 | 5 | 1 | 45 | 460 | \n",
|
||||
"| 1416 | 3 | 2 | 40 | 232 | \n",
|
||||
"| 852 | 2 | 1 | 35 | 178 | \n",
|
||||
"\n",
|
||||
"We would like to build a linear regression model using these values so we can then predict the price for other houses - say, a house with 1200 feet$^2$, 3 bedrooms, 1 floor, 40 years old. In this lab you will create the model. In the following labs, we will fit the data. \n",
|
||||
"\n",
|
||||
"We will set this up without much explaination. Refer to the previous lab for details."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Load data set\n",
|
||||
"X_orig = np.array([[2104,5,1,45], [1416,3,2,40], [852,2,1,35]])\n",
|
||||
"y_train = np.array([460,232,178]).reshape(-1,1) #reshape creates (m,1) matrix\n",
|
||||
"\n",
|
||||
"#extend X_orig with column of ones\n",
|
||||
"tmp_ones = np.ones((3,1), dtype=np.int64) #dtype just added to keep examples neat.. not required\n",
|
||||
"X_train = np.concatenate([tmp_ones, X_orig], axis=1)\n",
|
||||
"# load parameters. set to near optimal values\n",
|
||||
"w_init = np.array([ 785.1811367994083, 0.39133535, 18.75376741, \n",
|
||||
" -53.36032453, -26.42131618]).reshape(-1,1)\n",
|
||||
"print(f\"X shape: {X_train.shape}, w_shape: {w_init.shape}, y_shape: {y_train.shape}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Calculate the cost\n",
|
||||
"Next, calculate the cost $J(\\vec{w})$\n",
|
||||
"- Recall that the equation for the cost function $J(w)$ looks like this:\n",
|
||||
"$$J(\\mathbf{w}) = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w}}(\\mathbf{x}^{(i)}) - y^{(i)})^2 \\tag{1}$$ \n",
|
||||
"\n",
|
||||
"- The model prediction is a vector of size m:\n",
|
||||
"$$\\mathbf{f_{\\mathbf{w}}(\\mathbf{X})} = \\begin{pmatrix}\n",
|
||||
"f_{\\mathbf{w}}(x^{(0)}) \\\\\n",
|
||||
"f_{\\mathbf{w}}(x^{(1)}) \\\\\n",
|
||||
"\\cdots \\\\\n",
|
||||
"f_{\\mathbf{w}}(x^{(m-1)}) \\\\\n",
|
||||
"\\end{pmatrix} \n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"- Similarly, `y_train` contains the actual values as a column vector of m examples\n",
|
||||
"$$\\mathbf{y} = \\begin{pmatrix}\n",
|
||||
"y^{(0)} \\\\\n",
|
||||
"y^{(1)} \\\\\n",
|
||||
"\\cdots \\\\\n",
|
||||
"y^{(m-1)}\\\\\n",
|
||||
"\\end{pmatrix} \n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"Performing these calculations will involve some matrix and vector operations. These should be familiar from the Linear Algebra review. If not, a short review is at the end of this notebook.\n",
|
||||
"\n",
|
||||
"Notation:\n",
|
||||
"- Adjacent matrix, vector symbols such $\\mathbf{X}\\mathbf{w}$ or $\\mathbf{x}\\mathbf{w}$ implies a matrix multiplication. \n",
|
||||
"- An explicit $*$ implies element-wise multiplication.\n",
|
||||
"- $()^2$ is element-wise squaring\n",
|
||||
"- **bold** lowercase is a vector, **bold** uppercase is a matrix\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Instructions for Vectorized implementation of equation (1) above, computing cost :\n",
|
||||
"- calculate prediction for **all** training examples\n",
|
||||
"$$f_{\\mathbf{w}}(\\mathbf{X})=\\mathbf{X}\\mathbf{w} \\tag{2}$$\n",
|
||||
"- calculate the cost **all** examples\n",
|
||||
"$$cost = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1}((f_{\\mathbf{w}}(\\mathbf{X})-\\mathbf{y})^2) \\tag{3}$$\n",
|
||||
" \n",
|
||||
" - where $m$ is the number of training examples. The result is a scalar."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Hints</b></font>\n",
|
||||
"</summary>\n",
|
||||
" \n",
|
||||
"```\n",
|
||||
"#Function to calculate the cost\n",
|
||||
"def compute_cost(X, y, w, verbose=False):\n",
|
||||
" \"\"\"\n",
|
||||
" Computes the gradient for linear regression \n",
|
||||
" Args:\n",
|
||||
" X : (array_like Shape (m,n)) variable such as house size \n",
|
||||
" y : (array_like Shape (m,)) actual value \n",
|
||||
" w : (array_like Shape (2,)) parameters of the model \n",
|
||||
" verbose : (Boolean) If true, print out intermediate value f_w\n",
|
||||
" Returns\n",
|
||||
" cost: (scalar) The gradient of the cost w.r.t. the parameters w. \n",
|
||||
" Note that dw has the same dimensions as w.\n",
|
||||
" \"\"\"\n",
|
||||
" \n",
|
||||
" m,n = X.shape\n",
|
||||
"\n",
|
||||
" # calculate f_w for all examples.\n",
|
||||
" f_w = X @ w # @ is np.matmul, this the same as np.matmul(X, w)\n",
|
||||
" if verbose: print(\"f_w:\")\n",
|
||||
" if verbose: print(f_w)\n",
|
||||
" \n",
|
||||
" # calculate cost\n",
|
||||
" total_cost = (1/(2*m)) * np.sum((f_w-y)**2)\n",
|
||||
" \n",
|
||||
" return total_cost\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Function to calculate the cost\n",
|
||||
"def compute_cost(X, y, w, verbose=False):\n",
|
||||
" \"\"\"\n",
|
||||
" Computes the gradient for linear regression \n",
|
||||
" Args:\n",
|
||||
" X : (array_like Shape (m,n)) variable such as house size \n",
|
||||
" y : (array_like Shape (m,)) actual value \n",
|
||||
" w : (array_like Shape (2,)) parameters of the model \n",
|
||||
" verbose : (Boolean) If true, print out intermediate value f_w\n",
|
||||
" Returns\n",
|
||||
" cost: (scalar) The gradient of the cost w.r.t. the parameters w. \n",
|
||||
" Note that dw has the same dimensions as w.\n",
|
||||
" \"\"\"\n",
|
||||
" \n",
|
||||
" m,n = X.shape\n",
|
||||
" ### START CODE HERE ### \n",
|
||||
"\n",
|
||||
" ### END CODE HERE ### \n",
|
||||
" \n",
|
||||
" return total_cost"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Compute and display cost using our pre-chosen optimal parameters. \n",
|
||||
"# cost should be nearly zero\n",
|
||||
"\n",
|
||||
"cost = compute_cost(X_train, y_train, w_init, verbose = True)\n",
|
||||
"print(f'Cost at optimal w : {cost:.3f}')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <b>**Expected Output**:</b>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"f_w:\n",
|
||||
"[[459.99999762]\n",
|
||||
" [231.99999837]\n",
|
||||
" [177.99999899]]\n",
|
||||
"Cost at optimal w : 0.000\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Matrix/Vector Operation Review\n",
|
||||
"Here is a small example to show you how to apply element-wise operations on numpy arrays."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# make a matrix A with 2 rows and 2 columns\n",
|
||||
"tmp_A = np.array([[1,1],[1,1]])\n",
|
||||
"print(f\"matrix A has {tmp_A.shape[0]} rows and {tmp_A.shape[1]} columns\")\n",
|
||||
"print(tmp_A)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# make a column vector B with 2 rows and 1 column\n",
|
||||
"tmp_b = np.array([[2],[1]])\n",
|
||||
"print(f\"Vector b has {tmp_b.shape[0]} rows and {tmp_b.shape[1]} column\")\n",
|
||||
"print(tmp_b)\n",
|
||||
"print()\n",
|
||||
"# perform matrix multiplication A x b (2,2)(2,1)\n",
|
||||
"tmp_A_times_b = np.dot(tmp_A,tmp_b)\n",
|
||||
"print(\"Multiply A times b\")\n",
|
||||
"print(tmp_A_times_b)\n",
|
||||
"print(f\"The product has {tmp_A_times_b.shape[0]} rows and {tmp_A_times_b.shape[1]} columns\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# make a matrix A with 2 rows and 2 columns\n",
|
||||
"tmp_A = np.array([[1,1],[1,1]])\n",
|
||||
"print(f\"matrix A has {tmp_A.shape[0]} rows and {tmp_A.shape[1]} columns\")\n",
|
||||
"print(tmp_A)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# make a column vector B with 2 rows and 1 column\n",
|
||||
"tmp_b = np.array([[2],[1]])\n",
|
||||
"print(f\"Vector b has {tmp_b.shape[0]} rows and {tmp_b.shape[1]} column\")\n",
|
||||
"print(tmp_b)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# Try to perform matrix multiplication b x A, (2,1)(2,2)\n",
|
||||
"try:\n",
|
||||
" tmp_b_times_A = np.dot(tmp_b,tmp_A)\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"The error message you'll see is:\")\n",
|
||||
" print(e)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The message says that it's checking:\n",
|
||||
" - The number of columns of the left matrix `b`, or `dim 1` is 1.\n",
|
||||
" - The number of rows on the right matrix `dim 0`, is 2.\n",
|
||||
" - 1 does not equal 2\n",
|
||||
" - So the two matrices cannot be multiplied together."
|
||||
]
|
||||
},
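As sketched below (not part of the original lab), one way to make the shapes compatible is to transpose `tmp_b`, turning the failing (2,1)(2,2) product into a valid (1,2)(2,2) product.

```
# Transposing b makes the inner dimensions match: (1,2) x (2,2) -> (1,2)
tmp_b_T_times_A = np.dot(tmp_b.T, tmp_A)
print(f"tmp_b.T has shape {tmp_b.T.shape}")
print(tmp_b_T_times_A)   # [[3 3]] : the row [2 1] dotted with each column of ones
```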
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Create two sample column vectors\n",
|
||||
"tmp_c = np.array([[1],[2],[3]])\n",
|
||||
"print(\"Create a column vector c with 3 rows and 1 column\")\n",
|
||||
"print(tmp_c)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"tmp_d = np.array([[2],[2],[2]])\n",
|
||||
"print(\"Create a column vector c with 3 rows and 1 column\")\n",
|
||||
"print(tmp_d)\n",
|
||||
"print()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can apply `+, -, *, /` operators on two vectors of the same length."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Take the element-wise multiplication of two vectors\n",
|
||||
"tmp_mult = tmp_c * tmp_d\n",
|
||||
"print(\"Take the element-wise multiplication between vectors c and d\")\n",
|
||||
"print(tmp_mult)\n",
|
||||
"print()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can use `numpy.square` to apply the element-wise square of a vector\n",
|
||||
"- Note, `**2` will also work."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Take the element-wise square of vector c\n",
|
||||
"tmp_square = np.square(tmp_c)\n",
|
||||
"tmp_square_option_2 = tmp_c**2\n",
|
||||
"print(\"Take the element-wise square of vector c\")\n",
|
||||
"print(tmp_square)\n",
|
||||
"print()\n",
|
||||
"print(\"Another way to get the element-wise square of vector c\")\n",
|
||||
"print(tmp_square_option_2)\n",
|
||||
"print()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can use `numpy.sum` to add up all the elements of a vector (or matrix)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Take the sum of all elements in vector d\n",
|
||||
"tmp_sum = np.sum(tmp_d)\n",
|
||||
"print(\"Vector d\")\n",
|
||||
"print(tmp_d)\n",
|
||||
"print()\n",
|
||||
"print(\"Take the sum of all the elements in vector d\")\n",
|
||||
"print(tmp_sum)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,222 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Linear Regression using Scikit-Learn"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"There is an open-source, commercially usable machine learning toolkit called [scikit-learn](https://scikit-learn.org/stable/index.html). This toolkit contains implementations of many of the algorithms that you will work with in this course.\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Goals\n",
|
||||
"In this lab you will:\n",
|
||||
"- Utilize scikit-learn to implement linear regression using Gradient Descent"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"You will utilize functions from scikit-learn as well as matplotlib and NumPy. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"np.set_printoptions(precision=2)\n",
|
||||
"from sklearn.linear_model import LinearRegression, SGDRegressor\n",
|
||||
"from sklearn.preprocessing import StandardScaler\n",
|
||||
"from lab_utils_multi import load_house_data\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"dlblue = '#0096ff'; dlorange = '#FF9300'; dldarkred='#C00000'; dlmagenta='#FF40FF'; dlpurple='#7030A0'; \n",
|
||||
"plt.style.use('./deeplearning.mplstyle')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Gradient Descent\n",
|
||||
"Scikit-learn has a gradient descent regression model [sklearn.linear_model.SGDRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDRegressor.html#examples-using-sklearn-linear-model-sgdregressor). Like your previous implementation of gradient descent, this model performs best with normalized inputs. [sklearn.preprocessing.StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler) will perform z-score normalization as in a previous lab. Here it is referred to as 'standard score'."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load the data set"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X_train, y_train = load_house_data()\n",
|
||||
"X_features = ['size(sqft)','bedrooms','floors','age']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Scale/normalize the training data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"scaler = StandardScaler()\n",
|
||||
"X_norm = scaler.fit_transform(X_train)\n",
|
||||
"print(f\"Peak to Peak range by column in Raw X:{np.ptp(X_train,axis=0)}\") \n",
|
||||
"print(f\"Peak to Peak range by column in Normalized X:{np.ptp(X_norm,axis=0)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create and fit the regression model"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"sgdr = SGDRegressor(max_iter=1000)\n",
|
||||
"sgdr.fit(X_norm, y_train)\n",
|
||||
"print(sgdr)\n",
|
||||
"print(f\"number of iterations completed: {sgdr.n_iter_}, number of weight updates: {sgdr.t_}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### View parameters\n",
|
||||
"Note, the parameters are associated with the *normalized* input data. The fit parameters are very close to those found in the previous lab with this data."
|
||||
]
|
||||
},
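If you want parameters on the scale of the original (unnormalized) features, they can be recovered from the fitted `scaler`. This is a minimal sketch, not part of the original lab, assuming `scaler`, `w_norm` and `b_norm` are those computed above.

```
# Recover parameters for the original feature scale from the z-score parameters.
# Since x_norm = (x - mu) / sigma, the model X_norm @ w_norm + b_norm is identical to
# X_train @ (w_norm / sigma) + (b_norm - sum(w_norm * mu / sigma)).
w_orig = w_norm / scaler.scale_
b_orig = b_norm - np.sum(w_norm * scaler.mean_ / scaler.scale_)
print(f"model parameters on the original feature scale: w: {w_orig}, b: {b_orig}")
```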
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"b_norm = sgdr.intercept_\n",
|
||||
"w_norm = sgdr.coef_\n",
|
||||
"print(f\"model parameters: w: {w_norm}, b:{b_norm}\")\n",
|
||||
"print(f\"model parameters from previous lab: w: [110.56 -21.27 -32.71 -37.97], b: 363.16\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Make predictions\n",
|
||||
"Predict the targets of the training data. Use both the `predict` routine and compute using $w$ and $b$."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# make a prediction using sgdr.predict()\n",
|
||||
"y_pred_sgd = sgdr.predict(X_norm)\n",
|
||||
"# make a prediction using w,b. \n",
|
||||
"y_pred = np.dot(X_norm, w_norm) + b_norm \n",
|
||||
"print(f\"prediction using np.dot() and sgdr.predict match: {(y_pred == y_pred_sgd).all()}\")\n",
|
||||
"\n",
|
||||
"print(f\"Prediction on training set:\\n{y_pred[:4]}\" )\n",
|
||||
"print(f\"Target values \\n{y_train[:4]}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Plot Results\n",
|
||||
"Let's plot the predictions versus the target values."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# plot predictions and targets vs original features \n",
|
||||
"fig,ax=plt.subplots(1,4,figsize=(12,3),sharey=True)\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" ax[i].scatter(X_train[:,i],y_train, label = 'target')\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
" ax[i].scatter(X_train[:,i],y_pred,color=dlorange, label = 'predict')\n",
|
||||
"ax[0].set_ylabel(\"Price\"); ax[0].legend();\n",
|
||||
"fig.suptitle(\"target versus prediction using z-score normalized model\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Congratulations!\n",
|
||||
"In this lab you:\n",
|
||||
"- utilized an open-source machine learning toolkit, scikit-learn\n",
|
||||
"- implemented linear regression using gradient descent and feature normalization from that toolkit"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.10"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,945 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Ungraded Lab - Multiple Variable Gradient Descent\n",
|
||||
"\n",
|
||||
"In this ungraded lab, you will extend gradient descent to support multiple features. You will utilize mean normalization and alpha tuning to improve performance. You will also utilize a popular python numeric library, NumPy to efficiently store and manipulate data. For detailed descriptions and examples of routines used, see [Numpy Documentation](https://numpy.org/doc/stable/reference/)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Outline\n",
|
||||
"\n",
|
||||
"- [Exercise 01- Compute Gradient](#first)\n",
|
||||
"- [Exercise 02- Gradient Descent](#second)\n",
|
||||
"- [Exercise 03- Mean Normalization](#third)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"import copy\n",
|
||||
"import math"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### 2.0 Problem Statement\n",
|
||||
"\n",
|
||||
"As in the previous two labs, you will use the motivating example of housing price prediction. The training dataset contains three examples with 4 features (size,bedrooms,floors and age) shown in the table below.\n",
|
||||
"\n",
|
||||
"We would like to build a linear regression model using these values so we can then predict the price for other houses - say, a house with 1200 feet$^2$, 3 bedrooms, 1 floor, 40 years old. \n",
|
||||
"\n",
|
||||
"### 2.1 Dataset: \n",
|
||||
"| Size (feet$^2$) | Number of Bedrooms | Number of floors | Age of Home | Price (1000s dollars) | \n",
|
||||
"| ----------------| ------------------- |----------------- |--------------|-------------- | \n",
|
||||
"| 2104 | 5 | 1 | 45 | 460 | \n",
|
||||
"| 1416 | 3 | 2 | 40 | 232 | \n",
|
||||
"| 852 | 2 | 1 | 35 | 178 | \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"The lectures and equations describe $\\mathbf{X}$, $\\mathbf{y}$, $\\mathbf{w}$. In our code these are represented by variables:\n",
|
||||
"- `X_orig` represents input variables, also called input features. In previous labs, there was just one feature, now there are four. `X_train` is the data set extended with a column of ones.\n",
|
||||
"- `y_train` represents output variables, also known as target variables (in this case - Price (1000s of dollars)). \n",
|
||||
"- `w_init` represents our parameters. \n",
|
||||
"- `dw` represents our gradient. A naming convention we will use in code when referring to gradients is to infer the dJ(w) and name variables for the parameter. For example, $\\frac{\\partial J(\\mathbf{w})}{\\partial w_0}$ might be `dw0`. `dw` is the gradient vector.\n",
|
||||
"- `tmp_` is prepended to some global variable names to prevent naming conflicts.\n",
|
||||
"\n",
|
||||
"We will pick up where we left off in the last notebook. Run the following to initialize our variables."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Load data set\n",
|
||||
"X_orig = np.array([[2104,5,1,45], [1416,3,2,40], [852,2,1,35]])\n",
|
||||
"y_train = np.array([460,232,178]).reshape(-1,1) #reshape creates (m,1) matrix\n",
|
||||
"\n",
|
||||
"#extend X_orig with column of ones\n",
|
||||
"tmp_ones = np.ones((3,1), dtype=np.int64) #dtype just added to keep examples neat.. not required\n",
|
||||
"X_train = np.concatenate([tmp_ones, X_orig], axis=1)\n",
|
||||
"\n",
|
||||
"# initialize parameters to near optimal value for development\n",
|
||||
"w_init = np.array([ 785.1811367994083, 0.39133535, 18.75376741, \n",
|
||||
" -53.36032453, -26.42131618]).reshape(-1,1)\n",
|
||||
"print(f\"X shape: {X_train.shape}, w_shape: {w_init.shape}, y_shape: {y_train.shape}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Gradient Descent Review\n",
|
||||
"In lecture, gradient descent was described as:\n",
|
||||
"\n",
|
||||
"$$\\begin{align*}& \\text{repeat until convergence:} \\; \\lbrace \\newline \\; & w_j := w_j - \\alpha \\frac{\\partial J(\\mathbf{w})}{\\partial w_j} \\tag{1} \\; & \\text{for j := 0..n}\\newline & \\rbrace\\end{align*}$$\n",
|
||||
"where, parameters $w_j$ are all updated simultaniously and where \n",
|
||||
"$$\n",
|
||||
"\\frac{\\partial J(\\mathbf{w})}{\\partial w_j} := \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w}}(\\mathbf{x}^{(i)}) - \\mathbf{y}^{(i)})x_{j}^{(i)} \\tag{2}\n",
|
||||
"$$\n",
|
||||
"where \n",
|
||||
"$$ f_{\\mathbf{w}}(\\mathbf{x}) = w_0 + w_1x_1 + ... + w_nx_n \\tag{3}$$"
|
||||
]
|
||||
},
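{
"cell_type": "markdown",
"metadata": {},
"source": [
"Below is an optional, minimal sketch (not part of the exercises) of a single gradient descent step written directly from equations (1)-(3), using the `X_train`, `y_train` and `w_init` defined above. The learning rate `tmp_alpha` is an assumed value chosen only for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional illustration: one gradient descent step built from equations (1)-(3)\n",
"tmp_alpha = 5.0e-7                                       # assumed learning rate, illustration only\n",
"tmp_f_w   = X_train @ w_init                             # predictions, equation (3) in matrix form\n",
"tmp_e     = tmp_f_w - y_train                            # prediction errors\n",
"tmp_dw    = (1/X_train.shape[0]) * (X_train.T @ tmp_e)   # gradient, equation (2)\n",
"tmp_w_new = w_init - tmp_alpha * tmp_dw                  # simultaneous update, equation (1)\n",
"print(tmp_w_new)"
]
},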
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name='first'></a>\n",
|
||||
"## Exercise 1\n",
|
||||
"We will implement a batch gradient descent algorithm for multiple variables. We'll need three functions. \n",
|
||||
"- compute_gradient implementing equation (2) above\n",
|
||||
" - **we will do two versions** of this, one using loops, the other using linear algebra\n",
|
||||
"- compute_cost.\n",
|
||||
"- gradient_descent, utilizing compute_gradient and compute_cost, runs the iterative algorithm to find the parameters with the lowest cost."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### compute_gradient using looping\n",
|
||||
"Please extend the algorithm developed in Lab3 to support multiple variables and use NumPy. Implement equation (2) above for all $w_j$."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Hints</b></font>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"def compute_gradient(X, y, w): \n",
|
||||
" \"\"\"\n",
|
||||
" Computes the gradient for linear regression \n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X : (array_like Shape (m,)) variable such as house size \n",
|
||||
" y : (array_like Shape (m,)) actual value \n",
|
||||
" w : (array_like Shape (2,)) Initial values of parameters of the model \n",
|
||||
" Returns\n",
|
||||
" dw: (array_like Shape (2,)) The gradient of the cost w.r.t. the parameters w. \n",
|
||||
" Note that dw has the same dimensions as w.\n",
|
||||
" \"\"\"\n",
|
||||
" m,n = X.shape\n",
|
||||
" dw = np.zeros((n,1))\n",
|
||||
" ### START CODE HERE ### \n",
|
||||
" for j in range(n):\n",
|
||||
" for i in range(m):\n",
|
||||
" f_w = 0\n",
|
||||
" for k in range(n):\n",
|
||||
" f_w = f_w + w[k]*X[i][k]\n",
|
||||
" dw[j] = dw[j] + (f_w-y[i])*X[i][j] \n",
|
||||
" dw[j] = dw[j]/m\n",
|
||||
" ### END CODE HERE ### \n",
|
||||
" \n",
|
||||
" return dw\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def compute_gradient(X, y, w): \n",
|
||||
" \"\"\"\n",
|
||||
" Computes the gradient for linear regression \n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X : (array_like Shape (m,)) variable such as house size \n",
|
||||
" y : (array_like Shape (m,)) actual value \n",
|
||||
" w : (array_like Shape (2,)) Initial values of parameters of the model \n",
|
||||
" Returns\n",
|
||||
" dw: (array_like Shape (2,)) The gradient of the cost w.r.t. the parameters w. \n",
|
||||
" Note that dw has the same dimensions as w.\n",
|
||||
" \"\"\"\n",
|
||||
" m,n = X.shape\n",
|
||||
" dw = np.zeros((n,1))\n",
|
||||
" ### START CODE HERE ### \n",
|
||||
"\n",
|
||||
" ### END CODE HERE ### \n",
|
||||
" \n",
|
||||
" return dw"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Compute and display gradient \n",
|
||||
"initial_w = w_init\n",
|
||||
"grad = compute_gradient(X_train, y_train, initial_w)\n",
|
||||
"print('Gradient at initial w :\\n', grad)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <b>**Expected Output**:</b>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"Gradient at initial w :\n",
|
||||
" [[-1.67392519e-06]\n",
|
||||
" [-2.72623590e-03]\n",
|
||||
" [-6.27197293e-06]\n",
|
||||
" [-2.21745582e-06]\n",
|
||||
" [-6.92403412e-05]]\n",
|
||||
"\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Compute Gradient using Matrices\n",
|
||||
"In this section, we will implement the gradient calculation using matrices and vectors. _If you are familiar with linear algebra, you may want to skip the explanation and try it yourself first_.\n",
|
||||
"When dealing with multi-step matrix calculations, its helpful to do 'dimensional analysis'. The diagram below details the operations involved in calculating the gradient and the dimensions of the matrices involved."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
""
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Prediction: $\\mathbf{f}_{\\mathbf{w}}(\\mathbf{X})$\n",
|
||||
"- This is the model's prediction for _all examples_. As in previous labs, this calculated : $\\mathbf{f}_{\\mathbf{w}}(\\mathbf{X}) = \\mathbf{X}\\mathbf{w}$"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tmp_f_w = X_train @ w_init\n",
|
||||
"print(f\"The model prediction for our training set is:\")\n",
|
||||
"print(tmp_f_w)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Error, e: $\\mathbf{f}_{\\mathbf{w}}(\\mathbf{X}) - \\mathbf{y}$\n",
|
||||
" - This is the difference between the model prediction and the actual value of y for all training examples.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tmp_e = tmp_f_w - y_train\n",
|
||||
"print(\"Error\")\n",
|
||||
"print(tmp_e)\n",
|
||||
"print(f\"Error shape: {tmp_e.shape}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Gradient: $\\nabla_{\\mathbf{w}}\\mathbf{J}$\n",
|
||||
"- $\\nabla_{\\mathbf{w}}\\mathbf{J}$ is the gradient of $\\mathbf{J}$ with respect to $w$ in matrix form. The upside down triagle $\\nabla$ is the symbol for graident. More simply, the result of equation 4 above for all parameters $\\mathbf{w}$\n",
|
||||
"- $\\nabla_{\\mathbf{w}}\\mathbf{J} := \\frac{1}{m}(\\mathbf{X}^T \\mathbf{e} )$\n",
|
||||
"- Each element of this vector describes how the cost $\\mathbf{J}(\\mathbf{w})$ changes with respect to one parameter, $w_j$. For example, first element describes how the cost change relative to $w_0$. We will use this to determine if we should increase or decrease the parameter to decrease the cost."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tmp_m,_ = X_train.shape\n",
|
||||
"tmp_dw = (1/tmp_m) * (X_train.T @ tmp_e) \n",
|
||||
"print(\"gradient\")\n",
|
||||
"print(tmp_dw)\n",
|
||||
"print(f\"gradient shape: {tmp_dw.shape}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Utilize the equations above to implement `compute_gradient_m`, the matrix version."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Hints</b></font>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"def compute_gradient_m(X, y, w): \n",
|
||||
" \"\"\"\n",
|
||||
" Computes the gradient for linear regression \n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X : (array_like Shape (m,)) variable such as house size \n",
|
||||
" y : (array_like Shape (m,)) actual value \n",
|
||||
" w : (array_like Shape (2,)) Initial values of parameters of the model \n",
|
||||
" Returns\n",
|
||||
" dw: (array_like Shape (2,)) The gradient of the cost w.r.t. the parameters w. \n",
|
||||
" Note that dw has the same dimensions as w.\n",
|
||||
" \"\"\"\n",
|
||||
" m,n = X.shape\n",
|
||||
" ### START CODE HERE ### \n",
|
||||
" f_w = X @ w\n",
|
||||
" e = f_w - y\n",
|
||||
" dw = (1/m) * (X.T @ e)\n",
|
||||
" ### END CODE HERE ### \n",
|
||||
" \n",
|
||||
" return dw\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def compute_gradient_m(X, y, w): \n",
|
||||
" \"\"\"\n",
|
||||
" Computes the gradient for linear regression \n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X : (array_like Shape (m,)) variable such as house size \n",
|
||||
" y : (array_like Shape (m,)) actual value \n",
|
||||
" w : (array_like Shape (2,)) Initial values of parameters of the model \n",
|
||||
" Returns\n",
|
||||
" dw: (array_like Shape (2,)) The gradient of the cost w.r.t. the parameters w. \n",
|
||||
" Note that dw has the same dimensions as w.\n",
|
||||
" \"\"\"\n",
|
||||
" m,n = X.shape\n",
|
||||
" ### START CODE HERE ### \n",
|
||||
"\n",
|
||||
" ### END CODE HERE ### \n",
|
||||
" \n",
|
||||
" return dw"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Compute and display gradient USING compute_gradeint_m version\n",
|
||||
"initial_w = w_init\n",
|
||||
"grad = compute_gradient_m(X_train, y_train, initial_w)\n",
|
||||
"print('Gradient at initial w :\\n', grad)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <b>**Expected Output**:</b>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"Gradient at initial w :\n",
|
||||
" [[-1.67392519e-06]\n",
|
||||
" [-2.72623590e-03]\n",
|
||||
" [-6.27197293e-06]\n",
|
||||
" [-2.21745582e-06]\n",
|
||||
" [-6.92403412e-05]]\n",
|
||||
"\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Learning parameters using batch gradient descent \n",
|
||||
"\n",
|
||||
"You will now find the optimal parameters of a linear regression model by implementing batch gradient descent. You can use Lab3 as a guide. \n",
|
||||
"\n",
|
||||
"- A good way to verify that gradient descent is working correctly is to look\n",
|
||||
"at the value of $J(\\mathbf{w})$ and check that it is decreasing with each step. \n",
|
||||
"\n",
|
||||
"- Assuming you have implemented the gradient and computed the cost correctly, your value of $J(\\mathbf{w})$ should never increase, and should converge to a steady value by the end of the algorithm."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# provide routine to compute cost from Lab5\n",
|
||||
"def compute_cost(X, y, w, verbose=False):\n",
|
||||
" m,n = X.shape\n",
|
||||
" f_w = X @ w \n",
|
||||
" total_cost = (1/(2*m)) * np.sum((f_w-y)**2)\n",
|
||||
" return total_cost "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name='second'></a>\n",
|
||||
"## Exercise 2 Implement gradient_descent:\n",
|
||||
"- Looping `num_iters` number of times\n",
|
||||
" - calculate the gradient\n",
|
||||
" - update the parameters using equation (1) above\n",
|
||||
"return the updated parameters"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Hints</b></font>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"def gradient_descent(X, y, w_in, cost_function, gradient_function, alpha, num_iters): \n",
|
||||
" \"\"\"\n",
|
||||
" Performs batch gradient descent to learn theta. Updates theta by taking \n",
|
||||
" num_iters gradient steps with learning rate alpha\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X : (array_like Shape (m,)\n",
|
||||
" y : (array_like Shape (m,) )\n",
|
||||
" w_in : (array_like Shape (2,)) Initial values of parameters of the model\n",
|
||||
" cost_function: function to compute cost\n",
|
||||
" gradient_function: function to compute the gradient\n",
|
||||
" alpha : (float) Learning rate\n",
|
||||
" num_iters : (int) number of iterations to run gradient descent\n",
|
||||
" Returns\n",
|
||||
" w : (array_like Shape (2,)) Updated values of parameters of the model after\n",
|
||||
" running gradient descent\n",
|
||||
" \"\"\"\n",
|
||||
" \n",
|
||||
" # number of training examples\n",
|
||||
" m = len(X)\n",
|
||||
" \n",
|
||||
" # An array to store cost J and w's at each iteration primarily for graphing later\n",
|
||||
" J_history = []\n",
|
||||
" w_history = []\n",
|
||||
" w = copy.deepcopy(w_in) #avoid modifying global w within function\n",
|
||||
" \n",
|
||||
" for i in range(num_iters):\n",
|
||||
" \n",
|
||||
" # Calculate the gradient and update the parameters\n",
|
||||
" gradient = gradient_function(X, y, w)\n",
|
||||
"\n",
|
||||
" # Update Parameters \n",
|
||||
" w = w - alpha * gradient\n",
|
||||
" \n",
|
||||
" # Save cost J at each iteration\n",
|
||||
" if i<100000: # prevent resource exhaustion \n",
|
||||
" J_history.append( compute_cost(X, y, w))\n",
|
||||
"\n",
|
||||
" # Print cost every at intervals 10 times or as many iterations if < 10\n",
|
||||
" if i% math.ceil(num_iters/10) == 0:\n",
|
||||
" w_history.append(w)\n",
|
||||
" print(f\"Iteration {i:4}: Cost {J_history[-1]:8.2f} \")\n",
|
||||
" \n",
|
||||
" return w, J_history, w_history #return w and J,w history for graphing\n",
|
||||
" ```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def gradient_descent(X, y, w_in, cost_function, gradient_function, alpha, num_iters): \n",
|
||||
" \"\"\"\n",
|
||||
" Performs batch gradient descent to learn theta. Updates theta by taking \n",
|
||||
" num_iters gradient steps with learning rate alpha\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X : (array_like Shape (m,)\n",
|
||||
" y : (array_like Shape (m,) )\n",
|
||||
" w_in : (array_like Shape (2,)) Initial values of parameters of the model\n",
|
||||
" cost_function: function to compute cost\n",
|
||||
" gradient_function: function to compute the gradient\n",
|
||||
" alpha : (float) Learning rate\n",
|
||||
" num_iters : (int) number of iterations to run gradient descent\n",
|
||||
" Returns\n",
|
||||
" w : (array_like Shape (2,)) Updated values of parameters of the model after\n",
|
||||
" running gradient descent\n",
|
||||
" \"\"\"\n",
|
||||
" \n",
|
||||
" # number of training examples\n",
|
||||
" m = len(X)\n",
|
||||
" \n",
|
||||
" # An array to store cost J and w's at each iteration primarily for graphing later\n",
|
||||
" J_history = []\n",
|
||||
" w_history = []\n",
|
||||
" w = copy.deepcopy(w_in) #avoid modifying global w within function\n",
|
||||
" \n",
|
||||
" for i in range(num_iters):\n",
|
||||
" \n",
|
||||
" ### START CODE HERE ### \n",
|
||||
"\n",
|
||||
" # Calculate the gradient and update the parameters\n",
|
||||
"\n",
|
||||
"\n",
|
||||
" # Update Parameters \n",
|
||||
"\n",
|
||||
"\n",
|
||||
" ### END CODE HERE ### \n",
|
||||
" \n",
|
||||
" # Save cost J at each iteration\n",
|
||||
" if i<100000: # prevent resource exhaustion \n",
|
||||
" J_history.append( compute_cost(X, y, w))\n",
|
||||
"\n",
|
||||
" # Print cost every at intervals 10 times or as many iterations if < 10\n",
|
||||
" if i% math.ceil(num_iters/10) == 0:\n",
|
||||
" w_history.append(w)\n",
|
||||
" print(f\"Iteration {i:4}: Cost {J_history[-1]:8.2f} \")\n",
|
||||
" \n",
|
||||
" return w, J_history, w_history #return w and J,w history for graphing"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In the next cell we will test your implementation. Be sure to select your preferred compute_gradient function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# initialize parameters\n",
|
||||
"initial_w = np.zeros_like(w_init) \n",
|
||||
"# some gradient descent settings\n",
|
||||
"iterations = 1000\n",
|
||||
"alpha = 5.0e-7\n",
|
||||
"# run gradient descent - CHOOSE WHICH COMPUTE_GRADIENT TO RUN\n",
|
||||
"w_final, J_hist, w_hist = gradient_descent(X_train ,y_train, initial_w, compute_cost, \n",
|
||||
" compute_gradient, alpha, iterations)\n",
|
||||
"#w_final, J_hist, w_hist = gradient_descent(X_train ,y_train, initial_w, compute_cost, \n",
|
||||
"# compute_gradient_m, alpha, iterations)\n",
|
||||
"print(f\"w found by gradient descent: \")\n",
|
||||
"print(w_final)\n",
|
||||
"print(f\"predictions on training set\")\n",
|
||||
"print(X_train @ w_final)\n",
|
||||
"print(f\"actual values y_train \")\n",
|
||||
"print(y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <b>**Expected Output**:</b>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
" ```\n",
|
||||
"Iteration 0: Cost 2529.46 \n",
|
||||
"Iteration 100: Cost 695.99 \n",
|
||||
"Iteration 200: Cost 694.92 \n",
|
||||
"Iteration 300: Cost 693.86 \n",
|
||||
"Iteration 400: Cost 692.81 \n",
|
||||
"Iteration 500: Cost 691.77 \n",
|
||||
"Iteration 600: Cost 690.73 \n",
|
||||
"Iteration 700: Cost 689.71 \n",
|
||||
"Iteration 800: Cost 688.70 \n",
|
||||
"Iteration 900: Cost 687.69 \n",
|
||||
"w found by gradient descent: \n",
|
||||
"[[-0.00223541]\n",
|
||||
" [ 0.20396569]\n",
|
||||
" [ 0.00374919]\n",
|
||||
" [-0.0112487 ]\n",
|
||||
" [-0.0658614 ]]\n",
|
||||
"predictions on training set\n",
|
||||
"[[426.18530497]\n",
|
||||
" [286.16747201]\n",
|
||||
" [171.46763087]]\n",
|
||||
"actual values y_train \n",
|
||||
"[[460]\n",
|
||||
" [232]\n",
|
||||
" [178]]\n",
|
||||
"\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# plot cost vs iteration \n",
|
||||
"plt.plot(J_hist)\n",
|
||||
"plt.title(\"Cost vs iteration\")\n",
|
||||
"plt.ylabel('Cost')\n",
|
||||
"plt.xlabel('iteration step')\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
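{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick optional check of the convergence guidance above, the cost history should be non-increasing. This sketch assumes `J_hist` from the gradient descent run above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional check: with a correct gradient and a small enough alpha,\n",
"# the recorded cost should never increase from one iteration to the next\n",
"print(\"cost never increased:\", np.all(np.diff(J_hist) <= 0))"
]
},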
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"*These results are not inspiring*! As in Lab 3, we have run into a situation where the mismatch in scaling between our features makes it difficult to converge. The next section will help."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Feature Scaling or Mean Normalization\n",
|
||||
"\n",
|
||||
"We can speed up gradient descent by having each of our input values in roughly the same range. This is because the speed $\\mathbf{w}$ changes depends of the range of the input features. In our example, we have the sqft feature which is 3 orders of magnitude larger than the number of bedroom features. This doesn't allow a single alpha value to be set appropriately for all features. The way to prevent this is to modify the ranges of our input variables so that they are all roughly the same. Ideally around: \n",
|
||||
"$$ -1 <= x_{(i)} <= 1 \\;\\; or \\;\\; -0.5 <= x_{(i)} <= 0.5 $$\n",
|
||||
"\n",
|
||||
"Two techniques to help with this are feature scaling and mean normalization. \n",
|
||||
"**Feature scaling** involves dividing the input values by the range (i.e. the maximum value minus the minimum value) of the input variable, resulting in a new range of just 1. \n",
|
||||
"**Mean normalization** involves subtracting the average value for an input variable from the values for that input variable resulting in a new average value for the input variable of just zero. \n",
|
||||
"In this lab we will implement _mean normalization_.\n",
|
||||
"\n",
|
||||
"To implement mean normalization, adjust your input values as shown in this formula:\n",
|
||||
"$$x_i := \\dfrac{x_i - \\mu_i}{\\sigma_i} \\tag{4}$$ \n",
|
||||
"where $i$ selects a feature or a column in our X matrix. $µ_i$ is the average of all the values for feature (i) and $\\sigma_i$ is the standard deviation over feature (i).\n",
|
||||
"\n",
|
||||
"_Usage details_: Once a model is trained with scaled features, all inputs to predictions using that model will also need to be scaled. The model targets, `y_train`, are not scaled. The resulting parameters `w` will naturally be different than those in the unscaled model. \n",
|
||||
"Clearly you don't want to scale the $x_0$ values which we have set to one. We will scale the original data and then add a column of ones.\n",
|
||||
"\n",
|
||||
"<a name='third'></a>\n",
|
||||
"### Exercise 3 Mean Normalization\n",
|
||||
"Write a function that will accept our training data and return a mean normalized version by implementing equation (4) above. You may want to use `np.mean()`, `np.std()`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Hints</b></font>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
" def mean_normalize_features(X):\n",
|
||||
" \"\"\"\n",
|
||||
" returns mean normalized X by column\n",
|
||||
" Args:\n",
|
||||
" X : (numpy array (m,n)) \n",
|
||||
" Returns\n",
|
||||
" X_norm: (numpy array (m,n)) input normalized by column\n",
|
||||
" \"\"\"\n",
|
||||
" mu = np.mean(X,axis=0) \n",
|
||||
" sigma = np.std(X,axis=0)\n",
|
||||
" X_norm = (X - mu)/sigma # fancy numpy broadcasting makes these look easy\n",
|
||||
" return(X_norm)\n",
|
||||
"\n",
|
||||
"#check our work\n",
|
||||
"#from sklearn.preprocessing import scale\n",
|
||||
"#scale(X_orig, axis=0, with_mean=True, with_std=True, copy=True)\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def mean_normalize_features(X):\n",
|
||||
" \"\"\"\n",
|
||||
" returns mean normalized X by column\n",
|
||||
" Args:\n",
|
||||
" X : (numpy array (m,n)) \n",
|
||||
" Returns\n",
|
||||
" X_norm: (numpy array (m,n)) input normalized by column\n",
|
||||
" \"\"\"\n",
|
||||
" #~ 3 lines if implemented using matrices\n",
|
||||
" ### START CODE HERE ### \n",
|
||||
"\n",
|
||||
" ### END CODE HERE ### \n",
|
||||
"\n",
|
||||
" return(X_norm)\n",
|
||||
" \n",
|
||||
"#check our work\n",
|
||||
"#from sklearn.preprocessing import scale\n",
|
||||
"#scale(X_orig, axis=0, with_mean=True, with_std=True, copy=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(\"Original data:\")\n",
|
||||
"print(X_orig)\n",
|
||||
"print(\"normalized data\")\n",
|
||||
"print(mean_normalize_features(X_orig))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <b>**Expected Output**:</b>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"Original data:\n",
|
||||
"[[2104 5 1 45]\n",
|
||||
" [1416 3 2 40]\n",
|
||||
" [ 852 2 1 35]]\n",
|
||||
"normalized data\n",
|
||||
"[[ 1.26311506 1.33630621 -0.70710678 1.22474487]\n",
|
||||
" [-0.08073519 -0.26726124 1.41421356 0. ]\n",
|
||||
" [-1.18237987 -1.06904497 -0.70710678 -1.22474487]]\n",
|
||||
"```\n",
|
||||
"Note the values in each normalized column."
|
||||
]
|
||||
},
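{
"cell_type": "markdown",
"metadata": {},
"source": [
"An optional sanity check, assuming `mean_normalize_features` is implemented as above: each normalized column should have a mean of approximately zero and a standard deviation of approximately one."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional sanity check on the normalized columns\n",
"tmp_X_norm = mean_normalize_features(X_orig)\n",
"print(np.mean(tmp_X_norm, axis=0))   # expect values near 0\n",
"print(np.std(tmp_X_norm, axis=0))    # expect values near 1"
]
},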
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's now normalize our original data and re-run our gradient descent algorithm."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# normalize the original features\n",
|
||||
"X_norm = mean_normalize_features(X_orig)\n",
|
||||
"\n",
|
||||
"# add the column of ones and create scaled training set\n",
|
||||
"tmp_ones = np.ones((3,1), dtype=np.int64) #dtype just added to keep examples neat.. not required\n",
|
||||
"X_train_s = np.concatenate([tmp_ones, X_norm], axis=1)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note the **vastly larger value of alpha**. This will speed descent."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# initialize parameters\n",
|
||||
"initial_w = np.zeros_like(w_init) \n",
|
||||
"# some gradient descent settings\n",
|
||||
"iterations = 1000\n",
|
||||
"alpha = 1.0e-2\n",
|
||||
"# run gradient descent\n",
|
||||
"w_final, J_hist, w_hist = gradient_descent(X_train_s ,y_train, initial_w, \n",
|
||||
" compute_cost, compute_gradient_m, alpha, iterations)\n",
|
||||
"print(f\"w found by gradient descent: \")\n",
|
||||
"print(w_final)\n",
|
||||
"print(f\"predictions on training set\")\n",
|
||||
"print(X_train_s @ w_final)\n",
|
||||
"print(f\"actual values y_train \")\n",
|
||||
"print(y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <b>**Expected Output**:</b>\n",
|
||||
"</summary>\n",
|
||||
" \n",
|
||||
"```\n",
|
||||
"Iteration 0: Cost 48254.77 \n",
|
||||
"Iteration 100: Cost 5582.45 \n",
|
||||
"Iteration 200: Cost 745.80 \n",
|
||||
"Iteration 300: Cost 99.90 \n",
|
||||
"Iteration 400: Cost 13.38 \n",
|
||||
"Iteration 500: Cost 1.79 \n",
|
||||
"Iteration 600: Cost 0.24 \n",
|
||||
"Iteration 700: Cost 0.03 \n",
|
||||
"Iteration 800: Cost 0.00 \n",
|
||||
"Iteration 900: Cost 0.00 \n",
|
||||
"w found by gradient descent: \n",
|
||||
"[[289.98748034]\n",
|
||||
" [ 38.05168398]\n",
|
||||
" [ 41.54320558]\n",
|
||||
" [-30.98791712]\n",
|
||||
" [ 36.34190238]]\n",
|
||||
"predictions on training set\n",
|
||||
"[[459.98690403]\n",
|
||||
" [231.98894904]\n",
|
||||
" [177.98658794]]\n",
|
||||
"actual values y_train \n",
|
||||
"[[460]\n",
|
||||
" [232]\n",
|
||||
" [178]]\n",
|
||||
"\n",
|
||||
"```\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The scaled features get very accurate results much faster!"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# plot cost vs iteration \n",
|
||||
"plt.plot(J_hist)\n",
|
||||
"plt.title(\"Cost vs iteration\")\n",
|
||||
"plt.ylabel('Cost')\n",
|
||||
"plt.xlabel('iteration step')\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Scale by the learning rate: $\\alpha$\n",
|
||||
"- $\\alpha$ is a positive number smaller than 1 that reduces the magnitude of the update to be smaller than the actual gradient.\n",
|
||||
"- Try varying the learning rate in the example above. Is there a value where it diverges rather than converging?\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tmp_alpha = 0.01\n",
|
||||
"print(f\"Learning rate alpha: {tmp_alpha}\")\n",
|
||||
"\n",
|
||||
"tmp_gradient = np.array([1,2]).reshape(-1,1)\n",
|
||||
"print(\"Gradient before scaling by the learning rate:\")\n",
|
||||
"print(tmp_gradient)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"gradient_scaled_by_learning_rate = tmp_alpha * tmp_gradient\n",
|
||||
"print(\"Gradient after scaling by the learning rate\")\n",
|
||||
"print(gradient_scaled_by_learning_rate)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"- Subtract the gradient: $-$\n",
|
||||
" - Recall that the gradient points in the direction that would INCREASE the cost. \n",
|
||||
" - Negative one multiplied by the gradient will point in the direction that REDUCES the cost.\n",
|
||||
" - So, to update the weight in the direction that reduces the cost, subtract the gradient."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"direction_of_update = -1 * gradient_scaled_by_learning_rate\n",
|
||||
"print(\"The direction to update the parameter vector\")\n",
|
||||
"print(direction_of_update)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,305 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Linear Regression using Scikit-Learn"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"There is an open-source, commercially usable machine learning toolkit called [scikit-learn](https://scikit-learn.org/stable/index.html). This toolkit contains implementations of many of the algorithms that you will work with in this course.\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Goals\n",
|
||||
"In this lab you will:\n",
|
||||
"- Utilize scikit-learn to implement linear regression using a close form solution based on the normal equation"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"You will utilize functions from scikit-learn as well as matplotlib and NumPy. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"np.set_printoptions(precision=2)\n",
|
||||
"from sklearn.linear_model import LinearRegression, SGDRegressor\n",
|
||||
"from sklearn.preprocessing import StandardScaler\n",
|
||||
"from lab_utils_multi import load_house_data\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"dlblue = '#0096ff'; dlorange = '#FF9300'; dldarkred='#C00000'; dlmagenta='#FF40FF'; dlpurple='#7030A0'; \n",
|
||||
"plt.style.use('./deeplearning.mplstyle')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40291_2\"></a>\n",
|
||||
"# Linear Regression, closed-form solution\n",
|
||||
"Scikit-learn has the [linear regression model](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression) which implements a closed-form linear regression.\n",
|
||||
"\n",
|
||||
"Let's use the data from the early labs - a house with 1000 square feet sold for \\\\$300,000 and a house with 2000 square feet sold for \\\\$500,000.\n",
|
||||
"\n",
|
||||
"| Size (1000 sqft) | Price (1000s of dollars) |\n",
|
||||
"| ----------------| ------------------------ |\n",
|
||||
"| 1 | 300 |\n",
|
||||
"| 2 | 500 |\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Load the data set"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X_train = np.array([1.0, 2.0]) #features\n",
|
||||
"y_train = np.array([300, 500]) #target value"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Create and fit the model\n",
|
||||
"The code below performs regression using scikit-learn. \n",
|
||||
"The first step creates a regression object. \n",
|
||||
"The second step utilizes one of the methods associated with the object, `fit`. This performs regression, fitting the parameters to the input data. The toolkit expects a two-dimensional X matrix."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"LinearRegression()"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"linear_model = LinearRegression()\n",
|
||||
"#X must be a 2-D Matrix\n",
|
||||
"linear_model.fit(X_train.reshape(-1, 1), y_train) "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### View Parameters \n",
|
||||
"The $\\mathbf{w}$ and $\\mathbf{b}$ parameters are referred to as 'coefficients' and 'intercept' in scikit-learn."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"w = [200.], b = 100.00\n",
|
||||
"'manual' prediction: f_wb = wx+b : [240100.]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"b = linear_model.intercept_\n",
|
||||
"w = linear_model.coef_\n",
|
||||
"print(f\"w = {w:}, b = {b:0.2f}\")\n",
|
||||
"print(f\"'manual' prediction: f_wb = wx+b : {1200*w + b}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Make Predictions\n",
|
||||
"\n",
|
||||
"Calling the `predict` function generates predictions."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Prediction on training set: [300. 500.]\n",
|
||||
"Prediction for 1200 sqft house: $240100.00\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"y_pred = linear_model.predict(X_train.reshape(-1, 1))\n",
|
||||
"\n",
|
||||
"print(\"Prediction on training set:\", y_pred)\n",
|
||||
"\n",
|
||||
"X_test = np.array([[1200]])\n",
|
||||
"print(f\"Prediction for 1200 sqft house: ${linear_model.predict(X_test)[0]:0.2f}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Second Example\n",
|
||||
"The second example is from an earlier lab with multiple features. The final parameter values and predictions are very close to the results from the un-normalized 'long-run' from that lab. That un-normalized run took hours to produce results, while this is nearly instantaneous. The closed-form solution work well on smaller data sets such as these but can be computationally demanding on larger data sets. \n",
|
||||
">The closed-form solution does not require normalization."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# load the dataset\n",
|
||||
"X_train, y_train = load_house_data()\n",
|
||||
"X_features = ['size(sqft)','bedrooms','floors','age']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"LinearRegression()"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"linear_model = LinearRegression()\n",
|
||||
"linear_model.fit(X_train, y_train) "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"w = [ 0.27 -32.62 -67.25 -1.47], b = 220.42\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"b = linear_model.intercept_\n",
|
||||
"w = linear_model.coef_\n",
|
||||
"print(f\"w = {w:}, b = {b:0.2f}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Prediction on training set:\n",
|
||||
" [295.18 485.98 389.52 492.15]\n",
|
||||
"prediction using w,b:\n",
|
||||
" [295.18 485.98 389.52 492.15]\n",
|
||||
"Target values \n",
|
||||
" [300. 509.8 394. 540. ]\n",
|
||||
" predicted price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old = $318709.09\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(f\"Prediction on training set:\\n {linear_model.predict(X_train)[:4]}\" )\n",
|
||||
"print(f\"prediction using w,b:\\n {(X_train @ w + b)[:4]}\")\n",
|
||||
"print(f\"Target values \\n {y_train[:4]}\")\n",
|
||||
"\n",
|
||||
"x_house = np.array([1200, 3,1, 40]).reshape(-1,4)\n",
|
||||
"x_house_predict = linear_model.predict(x_house)[0]\n",
|
||||
"print(f\" predicted price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old = ${x_house_predict*1000:0.2f}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Congratulations!\n",
|
||||
"In this lab you:\n",
|
||||
"- utilized an open-source machine learning toolkit, scikit-learn\n",
|
||||
"- implemented linear regression using a close-form solution from that toolkit"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,282 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Ungraded Lab - Normal Equations \n",
|
||||
"\n",
|
||||
"In the lecture videos, you learned that the closed-form solution to linear regression is\n",
|
||||
"\n",
|
||||
"\\begin{equation*}\n",
|
||||
"w = (X^TX)^{-1}X^Ty \\tag{1}\n",
|
||||
"\\end{equation*}\n",
|
||||
"\n",
|
||||
"Using this formula does not require any feature scaling, and you will get an exact solution in one calculation: there is no “loop until convergence” like in gradient descent.\n",
|
||||
"\n",
|
||||
"This lab makes extensive use of linear algebra. It is not required for the course, but the solutions are provided and completing it may improve your familiarity with the subject. "
|
||||
]
|
||||
},
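{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before starting, here is a minimal sketch of equation (1) on a made-up two-point dataset (not this lab's data): the points lie exactly on the line $y = 100 + 200x$, so the normal equation should recover $w_0 = 100$ and $w_1 = 200$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# minimal sketch of the normal equation on assumed toy data (not the lab dataset)\n",
"import numpy as np\n",
"tmp_X = np.array([[1., 1.], [1., 2.]])   # column of ones plus a single feature\n",
"tmp_y = np.array([[300.], [500.]])\n",
"tmp_w = np.linalg.pinv(tmp_X.T @ tmp_X) @ tmp_X.T @ tmp_y\n",
"print(tmp_w)   # approximately [[100.], [200.]]"
]
},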
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Dataset\n",
|
||||
"\n",
|
||||
"You will again use the motivating example of housing price prediction as in the last few labs. The training dataset contains three examples with 4 features (size, bedrooms, floors and age) shown in the table below.\n",
|
||||
"\n",
|
||||
"| Size (feet$^2$) | Number of Bedrooms | Number of floors | Age of Home | Price (1000s dollars) | \n",
|
||||
"| ----------------| ------------------- |----------------- |--------------|-------------- | \n",
|
||||
"| 2104 | 5 | 1 | 45 | 460 | \n",
|
||||
"| 1416 | 3 | 2 | 40 | 232 | \n",
|
||||
"| 852 | 2 | 1 | 35 | 178 | \n",
|
||||
"\n",
|
||||
"We would like to build a linear regression model using these values so we can then predict the price for other houses - say, a house with 1200 feet$^2$, 3 bedrooms, 1 floor, 40 years old.\n",
|
||||
"\n",
|
||||
"Please run the following to load the data and extend X with a column of 1's."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Load data set\n",
|
||||
"X_orig = np.array([[2104,5,1,45], [1416,3,2,40], [852,2,1,35]])\n",
|
||||
"y_train = np.array([460,232,178]).reshape(-1,1) #reshape creates (m,1) matrix\n",
|
||||
"\n",
|
||||
"#extend X_orig with column of ones\n",
|
||||
"tmp_ones = np.ones((3,1), dtype=np.int64) #dtype just added to keep examples neat.. not required\n",
|
||||
"X_train = np.concatenate([tmp_ones, X_orig], axis=1)\n",
|
||||
"\n",
|
||||
"print(f\"X shape: {X_train.shape}, y_shape: {y_train.shape}\")\n",
|
||||
"print(X_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Exercise**\n",
|
||||
"\n",
|
||||
"Complete the code in the `normal_equation()` function below. Use the formula above to calculate $w$. Remember that while you don’t need to scale your features, we still need to add a column of 1’s to the original X matrix to have an intercept term $w_0$. \n",
|
||||
"\n",
|
||||
"**Hint**\n",
|
||||
"Look into `np.linalg.pinv()`, `np.transpose()` (also .T) and `np.dot()`. Be sure to use pinv or the pseudo inverse rather than inv."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
" <summary><font size=\"2\" color=\"darkgreen\"><b>Hints</b></font></summary>\n",
|
||||
" \n",
|
||||
" \n",
|
||||
" def normal_equation(X, y): \n",
|
||||
"\n",
|
||||
" Computes the closed-form solution to linear \n",
|
||||
" regression using the normal equations.\n",
|
||||
" \n",
|
||||
" Parameters\n",
|
||||
" ----------\n",
|
||||
" X : array_like\n",
|
||||
" Shape (m,n)\n",
|
||||
" \n",
|
||||
" y: array_like\n",
|
||||
" Shape (m,)\n",
|
||||
" \n",
|
||||
" Returns\n",
|
||||
" -------\n",
|
||||
" w : array_like\n",
|
||||
" Shape (n,)\n",
|
||||
" Parameters computed by normal equation\n",
|
||||
" \n",
|
||||
" \n",
|
||||
" #(≈ 1 line of code)\n",
|
||||
" # w = \n",
|
||||
" w = np.linalg.pinv(X.T @ X) @ X.T @ y\n",
|
||||
" \n",
|
||||
" return w \n",
|
||||
"\n",
|
||||
"</details>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def normal_equation(X, y): \n",
|
||||
" \"\"\"\n",
|
||||
" Computes the closed-form solution to linear \n",
|
||||
" regression using the normal equations.\n",
|
||||
" \n",
|
||||
" Parameters\n",
|
||||
" ----------\n",
|
||||
" X : array_like\n",
|
||||
" Shape (m,n)\n",
|
||||
" \n",
|
||||
" y: array_like\n",
|
||||
" Shape (m,)\n",
|
||||
" \n",
|
||||
" Returns\n",
|
||||
" -------\n",
|
||||
" w : array_like\n",
|
||||
" Shape (n,)\n",
|
||||
" Parameters computed by normal equation\n",
|
||||
" \"\"\"\n",
|
||||
" \n",
|
||||
" #(≈ 1 line of code)\n",
|
||||
" # w = \n",
|
||||
"\n",
|
||||
" \n",
|
||||
" return w"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"w_normal = normal_equation(X_train, y_train)\n",
|
||||
"print(\"w found by normal equation:\")\n",
|
||||
"print(w_normal)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <b>**Expected Output**:</b>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"w found by normal equation:\n",
|
||||
"[[ 1.240339 ]\n",
|
||||
" [ 0.15440335]\n",
|
||||
" [ 23.47118976]\n",
|
||||
" [-65.69139736]\n",
|
||||
" [ 1.82734354]]\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now let's see what the prediction is on our training data."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"y_pred = X_train @ w_normal\n",
|
||||
"print(\"Prediction using computed w:\")\n",
|
||||
"print(y_pred)\n",
|
||||
"print(\"Our Target values for y:\")\n",
|
||||
"print(y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <b>**Expected Output**:</b>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"Prediction using computed w:\n",
|
||||
"[[460.]\n",
|
||||
" [232.]\n",
|
||||
" [178.]]\n",
|
||||
"Our Target values for y:\n",
|
||||
"[[460]\n",
|
||||
" [232]\n",
|
||||
" [178]]\n",
|
||||
"```"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Great! Now we have our parameters for our model. Let's try predicting the price of a house with 1200 feet^2, 3 bedrooms, 1 floor, 40 years old. We will manually add the 1's column."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X_test = np.array([1,1200,3,1,40])\n",
|
||||
"\n",
|
||||
"y_pred = X_test @ w_normal\n",
|
||||
"print(\"our predicted price is: %.2f thousand dollars\" % y_pred)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <b>**Expected Output**:</b>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"```\n",
|
||||
"our predicted price is: 264.34 thousand dollars\n",
|
||||
"```\n",
|
||||
"_seems a bit pricy.._"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
137
work2/.ipynb_checkpoints/C1_W2_Lab08_Sklearn-checkpoint.ipynb
Normal file
@ -0,0 +1,137 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Ungraded Lab: Linear Regression using Scikit-Learn"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now that you've implemented linear regression from scratch, let's see you can train a linear regression model using scikit-learn.\n",
|
||||
"\n",
|
||||
"## Dataset \n",
|
||||
"Let's start with the same dataset as the first labs."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"# X is the input variable (size in square feet)\n",
|
||||
"# y in the output variable (price in 1000s of dollars)\n",
|
||||
"X = np.array([1000, 2000])\n",
|
||||
"y = np.array([200, 400])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Fit the model\n",
|
||||
"\n",
|
||||
"The code below imports the [linear regression model](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression) from scikit-learn. You can fit this model on the training data by calling `fit` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.linear_model import LinearRegression\n",
|
||||
"\n",
|
||||
"linear_model = LinearRegression()\n",
|
||||
"# We must reshape X using .reshape(-1, 1) because our data has a single feature\n",
|
||||
"# If X has multiple features, you don't need to reshape\n",
|
||||
"linear_model.fit(X.reshape(-1, 1), y) "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Make Predictions\n",
|
||||
"\n",
|
||||
"You can see the predictions made by this model by calling the `predict` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"y_pred = linear_model.predict(X.reshape(-1,1))\n",
|
||||
"\n",
|
||||
"print(\"Prediction on training set:\", y_pred)\n",
|
||||
"\n",
|
||||
"X_test = np.array([[1200]])\n",
|
||||
"print(f\"Prediction for 1200 sqft house: {linear_model.predict(X_test)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Calculate score\n",
|
||||
"\n",
|
||||
"You can calculate how well this model is doing by calling the `score` function. Specifically, it, returns the coefficient of determination $R^2$ of the prediction. 1 is the best score."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(\"Accuracy on training set:\", linear_model.score(X.reshape(-1,1), y))"
|
||||
]
|
||||
},
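{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, the same value can be cross-checked directly from the definition of the coefficient of determination, $R^2 = 1 - \\frac{SS_{res}}{SS_{tot}}$. This sketch assumes `linear_model` has been fit and `X`, `y` are defined above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional cross-check of score(): R^2 computed from its definition\n",
"y_hat = linear_model.predict(X.reshape(-1, 1))\n",
"ss_res = np.sum((y - y_hat) ** 2)\n",
"ss_tot = np.sum((y - np.mean(y)) ** 2)\n",
"print(\"R^2 from definition:\", 1 - ss_res / ss_tot)"
]
},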
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## View Parameters \n",
|
||||
"Our $\\mathbf{w}$ parameters from our earlier labs are referred to as 'intercept' and 'coefficients' in sklearn."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(f\"w = {linear_model.intercept_},{linear_model.coef_}\")"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.6"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,532 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "existing-laundry",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# UGL - Multiple Variable Model Representation\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "registered-finnish",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"\n",
|
||||
"%matplotlib inline"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "premium-reputation",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Take two data points - TODO: come up with problem statement/explanantion\n",
|
||||
"X_orig = np.array([[10,5], [20, 2]])\n",
|
||||
"y_orig = np.array([1,2])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "mature-salmon",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"2\n",
|
||||
"2\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# print the length of X_orig\n",
|
||||
"print(len(X_orig))\n",
|
||||
"\n",
|
||||
"# print the length of y_orig\n",
|
||||
"print(len(y_orig))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "future-merchant",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"(2, 2)\n",
|
||||
"(2,)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# print the shape of X_orig\n",
|
||||
"print(X_orig.shape)\n",
|
||||
"\n",
|
||||
"# print the shape of y_orig\n",
|
||||
"print(y_orig.shape)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "enormous-spotlight",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Hypothesis"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "wicked-bread",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Model prediction\n",
|
||||
"The model's prediction is also called the \"hypothesis\", $h_{w}(x)$. \n",
|
||||
"- The prediction is given by the linear model:\n",
|
||||
"\n",
|
||||
"$$ h_{w}(x) = w_0 + w_1x_1 \\tag{2}$$\n",
|
||||
"\n",
|
||||
"This the equation for a line, with an intercept $w_0$ and a slope $w_1$"
|
||||
]
|
||||
},
|
||||
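{
"cell_type": "markdown",
"id": "added-hypothesis-note",
"metadata": {},
"source": [
"Here is a tiny optional sketch of this equation, using made-up values for $w_0$, $w_1$ and $x_1$ (these numbers are illustrative only, not part of the dataset):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-hypothesis-example",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative values only: w0 = 1, w1 = 2, and a city with x1 = 3\n",
"tmp_w0 = 1\n",
"tmp_w1 = 2\n",
"tmp_x1 = 3\n",
"\n",
"# h = w0 + w1 * x1\n",
"tmp_h = tmp_w0 + tmp_w1 * tmp_x1\n",
"print(f\"h = w0 + w1 * x1 = {tmp_h}\")"
]
},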
{
|
||||
"cell_type": "markdown",
|
||||
"id": "stylish-report",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Vector notation\n",
|
||||
"\n",
|
||||
"For convenience of notation, you'll define $\\overrightarrow{x}$ as a vector containing two values:\n",
|
||||
"\n",
|
||||
"$$ \\vec{x} = \\begin{pmatrix}\n",
|
||||
" x_0 & x_1 \n",
|
||||
" \\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"- You'll set $x_0 = 1$. \n",
|
||||
"- $x_1$ will be the city population from your dataset `X_orig`. \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Similarly, you are defining $\\vec{w}$ as a vector containing two values:\n",
|
||||
"\n",
|
||||
"$$ \\vec{w} = \\begin{pmatrix}\n",
|
||||
" w_0 \\\\ \n",
|
||||
" w_1 \n",
|
||||
" \\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"Now the hypothesis $h_{\\vec{w}}(\\vec{x})$ can now be written as\n",
|
||||
"\n",
|
||||
"$$ h_{\\vec{w}}(\\vec{x}) = \\vec{x} \\times \\vec{w} \\tag{3}\n",
|
||||
"$$ \n",
|
||||
"\n",
|
||||
"$$\n",
|
||||
"h_{\\vec{w}}(\\vec{x}) = \n",
|
||||
"\\begin{pmatrix} x_0 & x_1 \\end{pmatrix} \\times \n",
|
||||
"\\begin{pmatrix} w_0 \\\\ w_1 \\end{pmatrix} \n",
|
||||
"$$\n",
|
||||
"$$\n",
|
||||
"h_{\\vec{w}}(\\vec{x}) = x_0 \\times w_0 + x_1 \\times w_1 \n",
|
||||
"$$\n",
|
||||
"Here is a small example: \n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "embedded-planning",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"The input x is:\n",
|
||||
"[1 2]\n",
|
||||
"\n",
|
||||
"The parameter w is\n",
|
||||
"[[3]\n",
|
||||
" [4]]\n",
|
||||
"\n",
|
||||
"The model's prediction is [11]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Here is a small concrete example of x and w as vectors\n",
|
||||
"\n",
|
||||
"tmp_x = np.array([1,2])\n",
|
||||
"print(f\"The input x is:\")\n",
|
||||
"print(tmp_x)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"tmp_w = np.array([[3],[4]])\n",
|
||||
"print(f\"The parameter w is\")\n",
|
||||
"print(tmp_w)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"tmp_h = np.dot(tmp_x,tmp_w)\n",
|
||||
"print(f\"The model's prediction is {tmp_h}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "continuing-domain",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Matrix X\n",
|
||||
"\n",
|
||||
"To allow you to process multiple examples (multiple cities) at a time, you can stack multiple examples (cities) as rows of a matrix $\\mathbf{X}$.\n",
|
||||
"\n",
|
||||
"For example, let's say New York City is $\\vec{x^{(0)}}$ and San Francisco is $\\vec{x^{(1)}}$. Then stack New York City in row 1 and San Francisco in row 2 of matrix $\\mathbf{X}$:\n",
|
||||
"\n",
|
||||
"$$\\mathbf{X} = \\begin{pmatrix}\n",
|
||||
" \\vec{x^{(0)}} \\\\ \n",
|
||||
" \\vec{x^{(1)}}\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"Recall that each vector consists of $w_0$ and $w_1$, and $\\mathbf{X}$ looks like this:\n",
|
||||
"$$\n",
|
||||
"\\mathbf{X} = \\begin{pmatrix}\n",
|
||||
" x^{(0)}_0 & x^{(0)}_1 \\\\ \n",
|
||||
" x^{(1)}_0 & x^{(1)}_1\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"Recall that you're fixing $x_0^{(i)}$ for all cities to be `1`, so you can also write $\\mathbf{X}$ as:\n",
|
||||
"$$\\mathbf{X} =\n",
|
||||
"\\begin{pmatrix}\n",
|
||||
" 1 & x^{(0)}_1 \\\\ \n",
|
||||
" 1 & x^{(1)}_1\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "suspended-promise",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"New York City has population 9\n",
|
||||
"San Francisco has population 2\n",
|
||||
"An example of matrix X with city populations for two cities is:\n",
|
||||
"\n",
|
||||
"[[1 9]\n",
|
||||
" [1 2]]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Here is a concrete example\n",
|
||||
"\n",
|
||||
"tmp_NYC_population = 9\n",
|
||||
"tmp_SF_population = 2\n",
|
||||
"tmp_x0 = 1 # x0 for all cities\n",
|
||||
"\n",
|
||||
"tmp_X = np.array([[tmp_x0, tmp_NYC_population],\n",
|
||||
" [tmp_x0, tmp_SF_population]\n",
|
||||
" ])\n",
|
||||
"\n",
|
||||
"print(f\"New York City has population {tmp_NYC_population}\")\n",
|
||||
"print(f\"San Francisco has population {tmp_SF_population}\")\n",
|
||||
"print(f\"An example of matrix X with city populations for two cities is:\\n\")\n",
|
||||
"print(f\"{tmp_X}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "acute-blame",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Matrix X in general\n",
|
||||
"In general, when you have $m$ training examples (in this dataset $m$ is the number of cities), and there are $n$ features (here, just 1 feature, which is city population):\n",
|
||||
"- $\\mathbf{X}$ is a matrix with dimensions ($m$, $n+1$) (m rows, n+1 columns)\n",
|
||||
" - Each row is a city and its input features.\n",
|
||||
"\n",
|
||||
"$$\\mathbf{X} = \\begin{pmatrix}\n",
|
||||
" \\vec{x^{(0)}} \\\\ \n",
|
||||
" \\vec{x^{(1)}} \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" \\vec{x^{(m-1)}}\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"= \\begin{pmatrix}\n",
|
||||
" x^{(0)}_0 & x^{(0)}_1 & \\cdots & x^{(0)}_{n} \\\\ \n",
|
||||
" x^{(1)}_0 & x^{(1)}_1 & \\cdots & x^{(1)}_{n} \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" x^{(m-1)}_0 & x^{(m-1)}_1 & \\cdots & x^{(m-1)}_{n} \n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"- In this dataset, $n=1$ (city population) and $m=97$ (97 cities in the dataset)\n",
|
||||
"\n",
|
||||
"$$\\mathbf{X} = \\begin{pmatrix}\n",
|
||||
" \\vec{x^{(0)}} \\\\ \n",
|
||||
" \\vec{x^{(1)}} \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" \\vec{x^{(m-1)}}\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"= \\begin{pmatrix}\n",
|
||||
" x^{(0)}_0 & x^{(0)}_1 \\\\ \n",
|
||||
" x^{(1)}_0 & x^{(1)}_1 \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" x^{(97-1)}_0 & x^{(97-1)}_1 \n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"- $\\vec{w}$ is a vector with dimensions ($n+1$, $1$) (n+1 rows, 1 column)\n",
|
||||
" - Each column represents one feature.\n",
|
||||
"\n",
|
||||
"$$\\vec{w} = \\begin{pmatrix}\n",
|
||||
"w_0 \\\\ \n",
|
||||
"w_1 \\\\\n",
|
||||
"\\cdots\\\\\n",
|
||||
"w_{n}\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"- In this dataset, there is just the intercept and the city population feature:\n",
|
||||
"$$\\vec{w} = \\begin{pmatrix}\n",
|
||||
"w_0 \\\\ \n",
|
||||
"w_1 \\\\\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$"
|
||||
]
|
||||
},
|
||||
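{
"cell_type": "markdown",
"id": "added-shape-check-note",
"metadata": {},
"source": [
"As an optional sketch, here is a quick check of these shapes in NumPy, using made-up populations for three example cities (the values are illustrative only):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-shape-check",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative shape check: m = 3 example cities, n = 1 feature (population)\n",
"tmp_X = np.array([[1, 9],\n",
"                  [1, 2],\n",
"                  [1, 5]])   # shape (m, n+1) = (3, 2); column 0 is x_0 = 1\n",
"tmp_w = np.zeros((2, 1))     # shape (n+1, 1) = (2, 1)\n",
"\n",
"print(f\"X has shape {tmp_X.shape} (m rows, n+1 columns)\")\n",
"print(f\"w has shape {tmp_w.shape} (n+1 rows, 1 column)\")"
]
},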
{
|
||||
"cell_type": "markdown",
|
||||
"id": "criminal-financing",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Processing data: Add the column for the intercept\n",
|
||||
"\n",
|
||||
"To calculate the cost and implement gradient descent, you will want to first add another column to your data (as $x_0$) to accomodate the $w_0$ intercept term. \n",
|
||||
"- This allows you to treat $w_0$ as simply another 'feature': feature 0.\n",
|
||||
"- The city population is then $w_1$, or feature 1.\n",
|
||||
"\n",
|
||||
"So if your original $\\mathbf{X_{orig}}$ looks like this:\n",
|
||||
"\n",
|
||||
"$$ \n",
|
||||
"\\mathbf{X_{orig}} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
" x^{(0)}_1 \\\\ \n",
|
||||
" x^{(1)}_1 \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" x^{(97-1)}_1 \n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"You will want to combine it with a vector of ones:\n",
|
||||
"$$\n",
|
||||
"\\vec{1} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
" x^{(0)}_0 \\\\ \n",
|
||||
" x^{(1)}_0 \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" x^{(m-1)}_0\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"= \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
" 1 \\\\ \n",
|
||||
" 1 \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" 1\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"So it will look like this:\n",
|
||||
"$$\n",
|
||||
"\\mathbf{X} = \\begin{pmatrix} \\vec{1} & \\mathbf{X_{orig}}\\end{pmatrix}\n",
|
||||
"=\n",
|
||||
"\\begin{pmatrix}\n",
|
||||
" 1 & x^{(0)}_1 \\\\ \n",
|
||||
" 1 & x^{(1)}_1 \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" 1 & x^{(97-1)}_1 \n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"Here is a small example of what you'll want to do."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "concerned-violence",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Matrix of city populations\n",
|
||||
"[[9]\n",
|
||||
" [2]]\n",
|
||||
"\n",
|
||||
"Column vector of ones ({tmp_num_of_cities} rows and 1 column)\n",
|
||||
"[[1.]\n",
|
||||
" [1.]]\n",
|
||||
"\n",
|
||||
"Vector of ones stacked to the left of tmp_X_orig\n",
|
||||
"[[1. 9.]\n",
|
||||
" [1. 2.]]\n",
|
||||
"tmp_x has shape: (2, 2)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"tmp_NYC_population = 9\n",
|
||||
"tmp_SF_population = 2\n",
|
||||
"tmp_x0 = 1 # x0 for all cities\n",
|
||||
"tmp_num_of_cities = 2\n",
|
||||
"\n",
|
||||
"tmp_X_orig = np.array([[tmp_NYC_population],\n",
|
||||
" [tmp_SF_population]\n",
|
||||
" ])\n",
|
||||
"\n",
|
||||
"print(\"Matrix of city populations\")\n",
|
||||
"print(tmp_X_orig)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# Use np.ones to create a column vector of ones\n",
|
||||
"tmp_ones = np.ones((tmp_num_of_cities,1))\n",
|
||||
"print(\"Column vector of ones ({tmp_num_of_cities} rows and 1 column)\")\n",
|
||||
"print(tmp_ones)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"tmp_X = np.concatenate([tmp_ones, tmp_X_orig], axis=1)\n",
|
||||
"print(\"Vector of ones stacked to the left of tmp_X_orig\")\n",
|
||||
"print(tmp_X)\n",
|
||||
"\n",
|
||||
"print(f\"tmp_x has shape: {tmp_X.shape}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "young-living",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In this small example, the $\\mathbf{X}$ is now:\n",
|
||||
"$$\\mathbf{X} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
"1 & 9 \\\\\n",
|
||||
"1 & 2\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"Notice that when calling `np.concatenate`, you're setting `axis=1`. \n",
|
||||
"- This puts the vector of ones on the left and the tmp_X_orig to the right.\n",
|
||||
"- If you set axis = 0, then `np.concatenate` would place the vector of ones ON TOP of tmp_X_orig"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 13,
|
||||
"id": "united-roots",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Calling numpy.concatenate, setting axis=0\n",
|
||||
"Vector of ones stacked to the ON TOP of tmp_X_orig\n",
|
||||
"[[1.]\n",
|
||||
" [1.]\n",
|
||||
" [9.]\n",
|
||||
" [2.]]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(\"Calling numpy.concatenate, setting axis=0\")\n",
|
||||
"tmp_X_version_2 = np.concatenate([tmp_ones, tmp_X_orig], axis=0)\n",
|
||||
"print(\"Vector of ones stacked to the ON TOP of tmp_X_orig\")\n",
|
||||
"print(tmp_X_version_2)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "hydraulic-inspector",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"So if you set axis=1, $\\mathbf{X}$ looks like this:\n",
|
||||
"$$\\mathbf{X} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
"1 \\\\ 1 \\\\\n",
|
||||
"9 \\\\ 2\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"This is **NOT** what you want.\n",
|
||||
"\n",
|
||||
"You'll want to set axis=1 so that you get a column vector of ones on the left and a colun vector of the city populations on the right:\n",
|
||||
"\n",
|
||||
"$$\\mathbf{X} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
"1 & x^{(0)}_1 \\\\\n",
|
||||
"1 & x^{(1)}_1\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "gorgeous-bermuda",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Add a column to X_orig to account for the w_0 term\n",
|
||||
"# X_train = np.stack([np.ones(X_orig.shape), X_orig], axis=1)\n",
|
||||
"m = len(X_col)\n",
|
||||
"col_vec_ones = np.ones((m, 1))\n",
|
||||
"X_train = np.concatenate([col_vec_ones, X_col], axis=1)\n",
|
||||
"# Keep y_orig the same\n",
|
||||
"y_train = y_col\n",
|
||||
"\n",
|
||||
"print ('The shape of X_train is: ' + str(X_train.shape))\n",
|
||||
"print ('The shape of y_train is: ' + str(y_train.shape))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,334 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "distributed-detective",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# UGL - Multiple Variable Cost"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "after-cargo",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "entire-ecology",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"matrix A has 2 rows and 2 columns\n",
|
||||
"[[1 1]\n",
|
||||
" [1 1]]\n",
|
||||
"\n",
|
||||
"Vector b has 2 rows and 1 column\n",
|
||||
"[[2]\n",
|
||||
" [2]]\n",
|
||||
"\n",
|
||||
"Multiply A times b\n",
|
||||
"[[4]\n",
|
||||
" [4]]\n",
|
||||
"The product has 2 rows and 1 column\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# make a matrix A with 2 rows and 2 columns\n",
|
||||
"tmp_A = np.array([[1,1],[1,1]])\n",
|
||||
"print(f\"matrix A has 2 rows and 2 columns\")\n",
|
||||
"print(tmp_A)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# make a colun vector B with 2 rows and 1 column\n",
|
||||
"tmp_b = np.array([[2],[2]])\n",
|
||||
"print(f\"Vector b has 2 rows and 1 column\")\n",
|
||||
"print(tmp_b)\n",
|
||||
"print()\n",
|
||||
"# perform matrix multiplication A x b\n",
|
||||
"tmp_A_times_b = np.dot(tmp_A,tmp_b)\n",
|
||||
"print(\"Multiply A times b\")\n",
|
||||
"print(tmp_A_times_b)\n",
|
||||
"print(\"The product has 2 rows and 1 column\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "drawn-product",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"matrix A has 2 rows and 2 columns\n",
|
||||
"[[1 1]\n",
|
||||
" [1 1]]\n",
|
||||
"\n",
|
||||
"Vector b has 2 rows and 1 column\n",
|
||||
"[[2]\n",
|
||||
" [2]]\n",
|
||||
"\n",
|
||||
"The error message you'll see is:\n",
|
||||
"shapes (2,1) and (2,2) not aligned: 1 (dim 1) != 2 (dim 0)\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# make a matrix A with 2 rows and 2 columns\n",
|
||||
"tmp_A = np.array([[1,1],[1,1]])\n",
|
||||
"print(f\"matrix A has 2 rows and 2 columns\")\n",
|
||||
"print(tmp_A)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# make a colun vector B with 2 rows and 1 column\n",
|
||||
"tmp_b = np.array([[2],[2]])\n",
|
||||
"print(f\"Vector b has 2 rows and 1 column\")\n",
|
||||
"print(tmp_b)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# Try to perform matrix multiplication A x b\n",
|
||||
"try:\n",
|
||||
" tmp_b_times_A = np.dot(tmp_b,tmp_A)\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"The error message you'll see is:\")\n",
|
||||
" print(e)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "entertaining-playback",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The message says that it's checking:\n",
|
||||
" - The number of columns of the left matrix `b`, or `dim 1` is 1.\n",
|
||||
" - The number of rows on the right matrix `dim 0`, is 2.\n",
|
||||
" - 1 does not equal 2\n",
|
||||
" - So the two matrices cannot be multiplied together."
|
||||
]
|
||||
},
|
||||
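{
"cell_type": "markdown",
"id": "added-transpose-note",
"metadata": {},
"source": [
"As an extra illustration (not required for this lab), transposing `b` makes the inner dimensions line up, so the product can be taken in that order:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-transpose-example",
"metadata": {},
"outputs": [],
"source": [
"# b.T has shape (1, 2), so b.T times A (2, 2) is a valid product with shape (1, 2)\n",
"tmp_bT_times_A = np.dot(tmp_b.T, tmp_A)\n",
"print(\"Multiply b.T times A\")\n",
"print(tmp_bT_times_A)"
]
},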
{
|
||||
"cell_type": "markdown",
|
||||
"id": "useful-desire",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Calculate the cost\n",
|
||||
"Next, calculate the cost $J(\\vec{w})$\n",
|
||||
"- Recall that the equation for the cost function $J(w)$ looks like this:\n",
|
||||
"$$J(\\vec{w}) = \\frac{1}{2m} \\sum\\limits_{i = 1}^{m} (h_{w}(x^{(i)}) - y^{(i)})^2 \\tag{1}$$ \n",
|
||||
"\n",
|
||||
"- The model prediction is a column vector of 97 examples:\n",
|
||||
"$$\\vec{h_{\\vec{w}}(\\mathbf{X})} = \\begin{pmatrix}\n",
|
||||
"h^{(0)}_{w}(x) \\\\\n",
|
||||
"h^{(1)}_{w}(x) \\\\\n",
|
||||
"\\cdots \\\\\n",
|
||||
"h^{(97-1)}_{w}(x) \\\\\n",
|
||||
"\\end{pmatrix} \n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"- Similarly, `y_train` contains the true profit per city as a column vector of 97 examples\n",
|
||||
"$$\\vec{y} = \\begin{pmatrix}\n",
|
||||
"y^{(0)} \\\\\n",
|
||||
"y^{(1)} \\\\\n",
|
||||
"\\cdots \\\\\n",
|
||||
"y^{(97-1)}\\\\\n",
|
||||
"\\end{pmatrix} \n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
"Here is a small example to show you how to apply element-wise operations on numpy arrays."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "attempted-potato",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Create a column vector c with 3 rows and 1 column\n",
|
||||
"[[1]\n",
|
||||
" [2]\n",
|
||||
" [3]]\n",
|
||||
"\n",
|
||||
"Create a column vector c with 3 rows and 1 column\n",
|
||||
"[[2]\n",
|
||||
" [2]\n",
|
||||
" [2]]\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Create two sample column vectors\n",
|
||||
"tmp_c = np.array([[1],[2],[3]])\n",
|
||||
"print(\"Create a column vector c with 3 rows and 1 column\")\n",
|
||||
"print(tmp_c)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"tmp_d = np.array([[2],[2],[2]])\n",
|
||||
"print(\"Create a column vector c with 3 rows and 1 column\")\n",
|
||||
"print(tmp_d)\n",
|
||||
"print()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "sought-postage",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can apply `+, -, *, /` operators on two vectors of the same length."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "spoken-testament",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Take the element-wise multiplication between vectors c and d\n",
|
||||
"[[2]\n",
|
||||
" [4]\n",
|
||||
" [6]]\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Take the element-wise multiplication of two vectors\n",
|
||||
"tmp_mult = tmp_c * tmp_d\n",
|
||||
"print(\"Take the element-wise multiplication between vectors c and d\")\n",
|
||||
"print(tmp_mult)\n",
|
||||
"print()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "hearing-nudist",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can use `numpy.square` to apply the element-wise square of a vector\n",
|
||||
"- Note, `**2` will also work."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "median-extraction",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Take the element-wise square of vector c\n",
|
||||
"[[1]\n",
|
||||
" [4]\n",
|
||||
" [9]]\n",
|
||||
"\n",
|
||||
"Another way to get the element-wise square of vector c\n",
|
||||
"[[1]\n",
|
||||
" [4]\n",
|
||||
" [9]]\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Take the element-wise square of vector c\n",
|
||||
"tmp_square = np.square(tmp_c)\n",
|
||||
"tmp_square_option_2 = tmp_c**2\n",
|
||||
"print(\"Take the element-wise square of vector c\")\n",
|
||||
"print(tmp_square)\n",
|
||||
"print()\n",
|
||||
"print(\"Another way to get the element-wise square of vector c\")\n",
|
||||
"print(tmp_square_option_2)\n",
|
||||
"print()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "interim-prefix",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"You can use `numpy.sum` to add up all the elements of a vector (or matrix)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "fossil-objective",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Vector d\n",
|
||||
"[[2]\n",
|
||||
" [2]\n",
|
||||
" [2]]\n",
|
||||
"\n",
|
||||
"Take the sum of all the elements in vector d\n",
|
||||
"6\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Take the sum of all elements in vector d\n",
|
||||
"tmp_sum = np.sum(tmp_d)\n",
|
||||
"print(\"Vector d\")\n",
|
||||
"print(tmp_d)\n",
|
||||
"print()\n",
|
||||
"print(\"Take the sum of all the elements in vector d\")\n",
|
||||
"print(tmp_sum)"
|
||||
]
|
||||
},
|
||||
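{
"cell_type": "markdown",
"id": "added-cost-note",
"metadata": {},
"source": [
"As an optional sketch, you can combine the element-wise subtraction, `np.square`, and `np.sum` steps above to compute the cost $J(\\vec{w})$ for a small made-up example (the prediction and label values below are illustrative only):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-cost-example",
"metadata": {},
"outputs": [],
"source": [
"# Illustrative predictions and labels for m = 2 examples\n",
"tmp_h = np.array([[19],[5]])   # model predictions\n",
"tmp_y = np.array([[10],[6]])   # true labels\n",
"tmp_m = len(tmp_y)\n",
"\n",
"tmp_error = tmp_h - tmp_y                  # element-wise difference\n",
"tmp_squared_error = np.square(tmp_error)   # element-wise square\n",
"tmp_cost = (1 / (2 * tmp_m)) * np.sum(tmp_squared_error)\n",
"print(f\"The cost J for this small example is {tmp_cost}\")"
]
},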
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "convenient-taylor",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,301 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "representative-rhythm",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "buried-blackberry",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Prediction: $\\vec{h}_{\\vec{w}}(\\mathbf{X})$\n",
|
||||
"- This is the model's prediction, calculated by $\\mathbf{X}\\vec{w}$"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "obvious-keeping",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Provide two cities and their populations\n",
|
||||
"[[1 9]\n",
|
||||
" [1 2]]\n",
|
||||
"View the current parameter vector\n",
|
||||
"[[1]\n",
|
||||
" [2]]\n",
|
||||
"\n",
|
||||
"Calculate the model prediction h\n",
|
||||
"[[19]\n",
|
||||
" [ 5]]\n",
|
||||
"\n",
|
||||
"The model predicts [19] for city 0, and [5] for city 1\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Provide two cities and their populations\n",
|
||||
"tmp_X = np.array([[1, 9],[1, 2]])\n",
|
||||
"print(\"Provide two cities and their populations\")\n",
|
||||
"print(tmp_X)\n",
|
||||
"\n",
|
||||
"# View the current parameter vector\n",
|
||||
"tmp_w = np.array([[1],[2]])\n",
|
||||
"print(\"View the current parameter vector\")\n",
|
||||
"print(tmp_w)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# Calculate the model prediction h\n",
|
||||
"tmp_h = np.dot(tmp_X, tmp_w)\n",
|
||||
"print(\"Calculate the model prediction h\")\n",
|
||||
"print(tmp_h)\n",
|
||||
"print()\n",
|
||||
"print(f\"The model predicts {tmp_h[0]} for city 0, and {tmp_h[1]} for city 1\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "developmental-sustainability",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Error: $\\vec{h}_{\\vec{w}}(\\mathbf{X}) - \\vec{y}$\n",
|
||||
" - This is the difference between the model prediction and the actual value of y.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "informed-recorder",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Model prediction tmp_h\n",
|
||||
"[[19]\n",
|
||||
" [ 5]]\n",
|
||||
"\n",
|
||||
"True labels for the profits per city\n",
|
||||
"[[10]\n",
|
||||
" [ 6]]\n",
|
||||
"\n",
|
||||
"Error\n",
|
||||
"[[ 9]\n",
|
||||
" [-1]]\n",
|
||||
"The error for city 0 prediction is [9] and is positive; the error for city 1 prediction is [-1] and is negative\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# View the model's predictions\n",
|
||||
"print(\"Model prediction tmp_h\")\n",
|
||||
"print(tmp_h)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# Get the true labels for these two cities\n",
|
||||
"tmp_y = np.array([[10],[6]])\n",
|
||||
"print(\"True labels for the profits per city\")\n",
|
||||
"print(tmp_y)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# Calculate the error\n",
|
||||
"tmp_error = tmp_h - tmp_y\n",
|
||||
"print(\"Error\")\n",
|
||||
"print(tmp_error)\n",
|
||||
"print(f\"The error for city 0 prediction is {tmp_error[0]} and is positive; the error for city 1 prediction is {tmp_error[1]} and is negative\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "suitable-chain",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Gradient: $\\frac{1}{m} \\mathbf{X}^T \\times Error$\n",
|
||||
"- This is a vector containing the gradient for each element of the parameter vector $\\vec{w}$\n",
|
||||
" - Since $\\vec{w}$ is a column vector with 2 rows, this gradient is also a column vector with 2 rows.\n",
|
||||
" - The $\\frac{1}{m}$ takes the average gradient across all 97 training examples (97 cities).\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "automatic-fiction",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"X: two cities and their populations\n",
|
||||
"[[1 9]\n",
|
||||
" [1 2]]\n",
|
||||
"\n",
|
||||
"Transpose of X\n",
|
||||
"[[1 1]\n",
|
||||
" [9 2]]\n",
|
||||
"\n",
|
||||
"The number of examples (number of cities) is 2\n",
|
||||
"\n",
|
||||
"Error\n",
|
||||
"[[ 9]\n",
|
||||
" [-1]]\n",
|
||||
"Gradient\n",
|
||||
"[[ 4. ]\n",
|
||||
" [39.5]]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Provide two cities and their populations\n",
|
||||
"tmp_X = np.array([[1, 9],[1, 2]])\n",
|
||||
"print(\"X: two cities and their populations\")\n",
|
||||
"print(tmp_X)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# transpose of X\n",
|
||||
"tmp_X_T = tmp_X.T\n",
|
||||
"print(\"Transpose of X\")\n",
|
||||
"print(tmp_X_T)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"# The number of examples (cities)\n",
|
||||
"tmp_m = tmp_X.shape[0]\n",
|
||||
"print(f\"The number of examples (number of cities) is {tmp_m}\\n\")\n",
|
||||
"\n",
|
||||
"# error\n",
|
||||
"print(\"Error\")\n",
|
||||
"print(tmp_error)\n",
|
||||
"\n",
|
||||
"# Calculate the gradient\n",
|
||||
"tmp_gradient = (1/tmp_m) * np.dot(tmp_X_T, tmp_error)\n",
|
||||
"print(\"Gradient\")\n",
|
||||
"print(tmp_gradient)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "virgin-kitchen",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"#### Scale by the learning rate: $\\alpha$\n",
|
||||
"- $\\alpha$ is a positive number smaller than 1 that reduces the magnitude of the update to be smaller than the actual gradient.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "authentic-output",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Learning rate alpha: 0.01\n",
|
||||
"Gradient before scaling by the learning rate:\n",
|
||||
"[[ 4. ]\n",
|
||||
" [39.5]]\n",
|
||||
"\n",
|
||||
"Gradient after scaling by the learning rate\n",
|
||||
"[[0.04 ]\n",
|
||||
" [0.395]]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"tmp_alpha = 0.01\n",
|
||||
"print(f\"Learning rate alpha: {tmp_alpha}\")\n",
|
||||
"\n",
|
||||
"print(\"Gradient before scaling by the learning rate:\")\n",
|
||||
"print(tmp_gradient)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"gradient_scaled_by_learning_rate = tmp_alpha * tmp_gradient\n",
|
||||
"print(\"Gradient after scaling by the learning rate\")\n",
|
||||
"print(gradient_scaled_by_learning_rate)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "incorporate-queen",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"- Subtract the gradient: $-$\n",
|
||||
" - Recall that the gradient points in the direction that would INCREASE the cost, negative one multiplied by the gradient will point in the direction that REDUCES the cost.\n",
|
||||
" - So, to update the weight in the direction that reduces the cost, subtract the gradient."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "hybrid-patent",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Gradient after scaling by the learning rate\n",
|
||||
"[[0.04 ]\n",
|
||||
" [0.395]]\n",
|
||||
"\n",
|
||||
"The direction to update the parameter vector\n",
|
||||
"[[-0.04 ]\n",
|
||||
" [-0.395]]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"gradient_scaled_by_learning_rate = tmp_alpha * tmp_gradient\n",
|
||||
"print(\"Gradient after scaling by the learning rate\")\n",
|
||||
"print(gradient_scaled_by_learning_rate)\n",
|
||||
"print()\n",
|
||||
"\n",
|
||||
"direction_of_update = -1 * gradient_scaled_by_learning_rate\n",
|
||||
"print(\"The direction to update the parameter vector\")\n",
|
||||
"print(direction_of_update)"
|
||||
]
|
||||
},
|
||||
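{
"cell_type": "markdown",
"id": "added-update-note",
"metadata": {},
"source": [
"As a final optional sketch, apply this update direction to the parameter vector to take a single gradient descent step (the resulting numbers are just for illustration):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "added-update-example",
"metadata": {},
"outputs": [],
"source": [
"# One gradient descent step: w := w - alpha * gradient\n",
"tmp_w_updated = tmp_w + direction_of_update\n",
"print(\"Parameter vector before the update\")\n",
"print(tmp_w)\n",
"print()\n",
"print(\"Parameter vector after one gradient descent step\")\n",
"print(tmp_w_updated)"
]
},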
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "western-theory",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,140 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "balanced-gather",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## UGL - Normal Equations \n",
|
||||
"\n",
|
||||
"In the lecture videos, you learned that the closed-form solution to linear regression is\n",
|
||||
"\n",
|
||||
"\\begin{equation*}\n",
|
||||
"w = (X^TX)^{-1}X^Ty\n",
|
||||
"\\end{equation*}\n",
|
||||
"\n",
|
||||
"Using this formula does not require any feature scaling, and you will get an exact solution in one calculation: there is no “loop until convergence” like in gradient descent.\n",
|
||||
"\n",
|
||||
"**Exercise**\n",
|
||||
"\n",
|
||||
"Complete the code in the `normal_equation()` function below to use the formula above to calculate $w$. Remember that while you don’t need to scale your features, we still need to add a column of 1’s to the original X matrix to have an intercept term $w_0$. You can assume that this has already been done in the previous parts and the variable that you should use is `X_train`.\n",
|
||||
"\n",
|
||||
"**Hint**\n",
|
||||
"Look into np.linalg.inv(), .T and np.dot()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "radio-latest",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# TODO: Originally was the assignment dataset. Either reuse or add new one\n",
|
||||
"X_train = np.zeros((5,2)) \n",
|
||||
"y_train = np.zeros(2)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "mexican-marsh",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def normal_equation(X, y): \n",
|
||||
" \"\"\"\n",
|
||||
" Computes the closed-form solution to linear \n",
|
||||
" regression using the normal equations.\n",
|
||||
" \n",
|
||||
" Parameters\n",
|
||||
" ----------\n",
|
||||
" X : array_like\n",
|
||||
" Shape (m,n)\n",
|
||||
" \n",
|
||||
" y: array_like\n",
|
||||
" Shape (m,)\n",
|
||||
" \n",
|
||||
" Returns\n",
|
||||
" -------\n",
|
||||
" w : array_like\n",
|
||||
" Shape (n,)\n",
|
||||
" Parameters computed by normal equation\n",
|
||||
" \"\"\"\n",
|
||||
" \n",
|
||||
" #(≈ 1 line of code)\n",
|
||||
" # w = \n",
|
||||
" ### BEGIN SOLUTION ###\n",
|
||||
" w = np.dot(np.dot(np.linalg.inv(np.dot(X.T, X)),X.T), y)\n",
|
||||
" ### END SOLUTION ### \n",
|
||||
" \n",
|
||||
" return w"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "smoking-optimum",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"w_normal = normal_equation(X_train, y_train)\n",
|
||||
"print(\"w found by normal equation:\", w_normal)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "bibliographic-services",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now let's see what the prediction is on unseen input"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "wrapped-tradition",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X_test_orig = np.array([1650, 3])\n",
|
||||
"\n",
|
||||
"X_test_norm = (X_test_orig - mu)/sigma\n",
|
||||
"X_test = np.hstack((1, X_test_norm))\n",
|
||||
"y_pred_normal = np.dot(X_test, w_normal)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "relative-array",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(\"Predicted price of a 1650 sq-ft, 3 br house \\\n",
|
||||
" using normal equations is is: $%.2f\" % (y_pred_normal))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
153
work2/.ipynb_checkpoints/oldC1_W2_Lab08_Sklearn-checkpoint.ipynb
Normal file
@ -0,0 +1,153 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "expected-characterization",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Ungraded Lab: Linear Regression using Scikit-Learn"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "gorgeous-lincoln",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now that you've implemented linear regression from scratch, let's see you can train a linear regression model using scikit-learn.\n",
|
||||
"\n",
|
||||
"## Dataset \n",
|
||||
"Let's start with the same dataset as before."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "mobile-firmware",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"# X is the input variable (size in square feet)\n",
|
||||
"# y in the output variable (price in 1000s of dollars)\n",
|
||||
"X = np.array([1000, 2000])\n",
|
||||
"y = np.array([200, 400])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "offshore-lease",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Fit the model\n",
|
||||
"\n",
|
||||
"The code below imports the [linear regression model](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression) from scikit-learn. You can fit this model on the training data by calling `fit` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "monetary-tactics",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"LinearRegression()"
|
||||
]
|
||||
},
|
||||
"execution_count": 4,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"from sklearn.linear_model import LinearRegression\n",
|
||||
"\n",
|
||||
"linear_model = LinearRegression()\n",
|
||||
"# We have to reshape X using .reshape(-1, 1) because our data has a single feature\n",
|
||||
"# If X has multiple features, you don't need to reshape\n",
|
||||
"linear_model.fit(X.reshape(-1, 1), y) "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "thick-seven",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Make Predictions\n",
|
||||
"\n",
|
||||
"You can see the predictions made by this model by calling the `predict` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "norwegian-variety",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Prediction on training set: [200. 400.]\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"y_pred = linear_model.predict(X.reshape(-1,1))\n",
|
||||
"\n",
|
||||
"print(\"Prediction on training set:\", y_pred)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "geographic-archive",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Calculate accuracy\n",
|
||||
"\n",
|
||||
"You can calculate this accuracy of this model by calling the `score` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "immune-password",
|
||||
"metadata": {},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Accuracy on training set: 1.0\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(\"Accuracy on training set:\", linear_model.score(X.reshape(-1,1), y))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
@ -0,0 +1,126 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "expected-characterization",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Ungraded Lab: Linear Regression using Scikit-Learn"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "gorgeous-lincoln",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now that you've implemented linear regression from scratch, let's see you can train a linear regression model using scikit-learn.\n",
|
||||
"\n",
|
||||
"## Dataset \n",
|
||||
"Let's start with the same dataset as before."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "mobile-firmware",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"# X is the input variable (size in square feet)\n",
|
||||
"# y in the output variable (price in 1000s of dollars)\n",
|
||||
"X = np.array([1000, 2000])\n",
|
||||
"y = np.array([200, 400])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "offshore-lease",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Fit the model\n",
|
||||
"\n",
|
||||
"The code below imports the [linear regression model](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression) from scikit-learn. You can fit this model on the training data by calling `fit` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "monetary-tactics",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"from sklearn.linear_model import LinearRegression\n",
|
||||
"\n",
|
||||
"linear_model = LinearRegression()\n",
|
||||
"# We have to reshape X using .reshape(-1, 1) because our data has a single feature\n",
|
||||
"# If X has multiple features, you don't need to reshape\n",
|
||||
"linear_model.fit(X.reshape(-1, 1), y) "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "thick-seven",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Make Predictions\n",
|
||||
"\n",
|
||||
"You can see the predictions made by this model by calling the `predict` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "norwegian-variety",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"y_pred = linear_model.predict(X.reshape(-1,1))\n",
|
||||
"\n",
|
||||
"print(\"Prediction on training set:\", y_pred)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "geographic-archive",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Calculate accuracy\n",
|
||||
"\n",
|
||||
"You can calculate this accuracy of this model by calling the `score` function."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "immune-password",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"print(\"Accuracy on training set:\", linear_model.score(X.reshape(-1,1), y))"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.1"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
730
work2/C1_W2_Lab01_Python_Numpy_Vectorization_Soln.ipynb
Normal file
@ -0,0 +1,730 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Python, NumPy and Vectorization\n",
|
||||
"A brief introduction to some of the scientific computing used in this course. In particular the NumPy scientific computing package and its use with python.\n",
|
||||
"\n",
|
||||
"# Outline\n",
|
||||
"- [ 1.1 Goals](#toc_40015_1.1)\n",
|
||||
"- [ 1.2 Useful References](#toc_40015_1.2)\n",
|
||||
"- [2 Python and NumPy <a name='Python and NumPy'></a>](#toc_40015_2)\n",
|
||||
"- [3 Vectors](#toc_40015_3)\n",
|
||||
"- [ 3.1 Abstract](#toc_40015_3.1)\n",
|
||||
"- [ 3.2 NumPy Arrays](#toc_40015_3.2)\n",
|
||||
"- [ 3.3 Vector Creation](#toc_40015_3.3)\n",
|
||||
"- [ 3.4 Operations on Vectors](#toc_40015_3.4)\n",
|
||||
"- [4 Matrices](#toc_40015_4)\n",
|
||||
"- [ 4.1 Abstract](#toc_40015_4.1)\n",
|
||||
"- [ 4.2 NumPy Arrays](#toc_40015_4.2)\n",
|
||||
"- [ 4.3 Matrix Creation](#toc_40015_4.3)\n",
|
||||
"- [ 4.4 Operations on Matrices](#toc_40015_4.4)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np # it is an unofficial standard to use np for numpy\n",
|
||||
"import time"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_1.1\"></a>\n",
|
||||
"## 1.1 Goals\n",
|
||||
"In this lab, you will:\n",
|
||||
"- Review the features of NumPy and Python that are used in Course 1"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_1.2\"></a>\n",
|
||||
"## 1.2 Useful References\n",
|
||||
"- NumPy Documentation including a basic introduction: [NumPy.org](https://NumPy.org/doc/stable/)\n",
|
||||
"- A challenging feature topic: [NumPy Broadcasting](https://NumPy.org/doc/stable/user/basics.broadcasting.html)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_2\"></a>\n",
|
||||
"# 2 Python and NumPy <a name='Python and NumPy'></a>\n",
|
||||
"Python is the programming language we will be using in this course. It has a set of numeric data types and arithmetic operations. NumPy is a library that extends the base capabilities of python to add a richer data set including more numeric types, vectors, matrices, and many matrix functions. NumPy and python work together fairly seamlessly. Python arithmetic operators work on NumPy data types and many NumPy functions will accept python data types.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3\"></a>\n",
|
||||
"# 3 Vectors\n",
|
||||
"<a name=\"toc_40015_3.1\"></a>\n",
|
||||
"## 3.1 Abstract\n",
|
||||
"<img align=\"right\" src=\"./images/C1_W2_Lab04_Vectors.PNG\" style=\"width:340px;\" >Vectors, as you will use them in this course, are ordered arrays of numbers. In notation, vectors are denoted with lower case bold letters such as $\\mathbf{x}$. The elements of a vector are all the same type. A vector does not, for example, contain both characters and numbers. The number of elements in the array is often referred to as the *dimension* though mathematicians may prefer *rank*. The vector shown has a dimension of $n$. The elements of a vector can be referenced with an index. In math settings, indexes typically run from 1 to n. In computer science and these labs, indexing will typically run from 0 to n-1. In notation, elements of a vector, when referenced individually will indicate the index in a subscript, for example, the $0^{th}$ element, of the vector $\\mathbf{x}$ is $x_0$. Note, the x is not bold in this case. \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.2\"></a>\n",
|
||||
"## 3.2 NumPy Arrays\n",
|
||||
"\n",
|
||||
"NumPy's basic data structure is an indexable, n-dimensional *array* containing elements of the same type (`dtype`). Right away, you may notice we have overloaded the term 'dimension'. Above, it was the number of elements in the vector, here, dimension refers to the number of indexes of an array. A one-dimensional or 1-D array has one index. In Course 1, we will represent vectors as NumPy 1-D arrays. \n",
|
||||
"\n",
|
||||
" - 1-D array, shape (n,): n elements indexed [0] through [n-1]\n",
|
||||
" "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.3\"></a>\n",
|
||||
"## 3.3 Vector Creation\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Data creation routines in NumPy will generally have a first parameter which is the shape of the object. This can either be a single value for a 1-D result or a tuple (n,m,...) specifying the shape of the result. Below are examples of creating vectors using these routines."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill arrays with value\n",
|
||||
"a = np.zeros(4); print(f\"np.zeros(4) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.zeros((4,)); print(f\"np.zeros(4,) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.random.random_sample(4); print(f\"np.random.random_sample(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Some data creation routines do not take a shape tuple:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill arrays with value but do not accept shape as input argument\n",
|
||||
"a = np.arange(4.); print(f\"np.arange(4.): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.random.rand(4); print(f\"np.random.rand(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"values can be specified manually as well. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill with user specified values\n",
|
||||
"a = np.array([5,4,3,2]); print(f\"np.array([5,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")\n",
|
||||
"a = np.array([5.,4,3,2]); print(f\"np.array([5.,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"These have all created a one-dimensional vector `a` with four elements. `a.shape` returns the dimensions. Here we see a.shape = `(4,)` indicating a 1-d array with 4 elements. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4\"></a>\n",
|
||||
"## 3.4 Operations on Vectors\n",
|
||||
"Let's explore some operations using vectors.\n",
|
||||
"<a name=\"toc_40015_3.4.1\"></a>\n",
|
||||
"### 3.4.1 Indexing\n",
|
||||
"Elements of vectors can be accessed via indexing and slicing. NumPy provides a very complete set of indexing and slicing capabilities. We will explore only the basics needed for the course here. Reference [Slicing and Indexing](https://NumPy.org/doc/stable/reference/arrays.indexing.html) for more details. \n",
|
||||
"**Indexing** means referring to *an element* of an array by its position within the array. \n",
|
||||
"**Slicing** means getting a *subset* of elements from an array based on their indices. \n",
|
||||
"NumPy starts indexing at zero so the 3rd element of an vector $\\mathbf{a}$ is `a[2]`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector indexing operations on 1-D vectors\n",
|
||||
"a = np.arange(10)\n",
|
||||
"print(a)\n",
|
||||
"\n",
|
||||
"#access an element\n",
|
||||
"print(f\"a[2].shape: {a[2].shape} a[2] = {a[2]}, Accessing an element returns a scalar\")\n",
|
||||
"\n",
|
||||
"# access the last element, negative indexes count from the end\n",
|
||||
"print(f\"a[-1] = {a[-1]}\")\n",
|
||||
"\n",
|
||||
"#indexs must be within the range of the vector or they will produce and error\n",
|
||||
"try:\n",
|
||||
" c = a[10]\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"The error message you'll see is:\")\n",
|
||||
" print(e)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.2\"></a>\n",
|
||||
"### 3.4.2 Slicing\n",
|
||||
"Slicing creates an array of indices using a set of three values (`start:stop:step`). A subset of values is also valid. Its use is best explained by example:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector slicing operations\n",
|
||||
"a = np.arange(10)\n",
|
||||
"print(f\"a = {a}\")\n",
|
||||
"\n",
|
||||
"#access 5 consecutive elements (start:stop:step)\n",
|
||||
"c = a[2:7:1]; print(\"a[2:7:1] = \", c)\n",
|
||||
"\n",
|
||||
"# access 3 elements separated by two \n",
|
||||
"c = a[2:7:2]; print(\"a[2:7:2] = \", c)\n",
|
||||
"\n",
|
||||
"# access all elements index 3 and above\n",
|
||||
"c = a[3:]; print(\"a[3:] = \", c)\n",
|
||||
"\n",
|
||||
"# access all elements below index 3\n",
|
||||
"c = a[:3]; print(\"a[:3] = \", c)\n",
|
||||
"\n",
|
||||
"# access all elements\n",
|
||||
"c = a[:]; print(\"a[:] = \", c)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.3\"></a>\n",
|
||||
"### 3.4.3 Single vector operations\n",
|
||||
"There are a number of useful operations that involve operations on a single vector."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.array([1,2,3,4])\n",
|
||||
"print(f\"a : {a}\")\n",
|
||||
"# negate elements of a\n",
|
||||
"b = -a \n",
|
||||
"print(f\"b = -a : {b}\")\n",
|
||||
"\n",
|
||||
"# sum all elements of a, returns a scalar\n",
|
||||
"b = np.sum(a) \n",
|
||||
"print(f\"b = np.sum(a) : {b}\")\n",
|
||||
"\n",
|
||||
"b = np.mean(a)\n",
|
||||
"print(f\"b = np.mean(a): {b}\")\n",
|
||||
"\n",
|
||||
"b = a**2\n",
|
||||
"print(f\"b = a**2 : {b}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.4\"></a>\n",
|
||||
"### 3.4.4 Vector Vector element-wise operations\n",
|
||||
"Most of the NumPy arithmetic, logical and comparison operations apply to vectors as well. These operators work on an element-by-element basis. For example \n",
|
||||
"$$ c_i = a_i + b_i $$"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.array([ 1, 2, 3, 4])\n",
|
||||
"b = np.array([-1,-2, 3, 4])\n",
|
||||
"print(f\"Binary operators work element wise: {a + b}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Of course, for this to work correctly, the vectors must be of the same size:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#try a mismatched vector operation\n",
|
||||
"c = np.array([1, 2])\n",
|
||||
"try:\n",
|
||||
" d = a + c\n",
|
||||
"except Exception as e:\n",
|
||||
" print(\"The error message you'll see is:\")\n",
|
||||
" print(e)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.5\"></a>\n",
|
||||
"### 3.4.5 Scalar Vector operations\n",
|
||||
"Vectors can be 'scaled' by scalar values. A scalar value is just a number. The scalar multiplies all the elements of the vector."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.array([1, 2, 3, 4])\n",
|
||||
"\n",
|
||||
"# multiply a by a scalar\n",
|
||||
"b = 5 * a \n",
|
||||
"print(f\"b = 5 * a : {b}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.6\"></a>\n",
|
||||
"### 3.4.6 Vector Vector dot product\n",
|
||||
"The dot product is a mainstay of Linear Algebra and NumPy. This is an operation used extensively in this course and should be well understood. The dot product is shown below."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<img src=\"./images/C1_W2_Lab04_dot_notrans.gif\" width=800> "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The dot product multiplies the values in two vectors element-wise and then sums the result.\n",
|
||||
"Vector dot product requires the dimensions of the two vectors to be the same. "
|
||||
]
|
||||
},
|
||||
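{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small worked example (not part of the original text), using the vectors $\\mathbf{a} = (1, 2, 3, 4)$ and $\\mathbf{b} = (-1, 4, 3, 2)$ that appear in the test cell below:\n",
"$$ \\mathbf{a} \\cdot \\mathbf{b} = (1)(-1) + (2)(4) + (3)(3) + (4)(2) = 24 $$"
]
},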
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's implement our own version of the dot product below:\n",
|
||||
"\n",
|
||||
"**Using a for loop**, implement a function which returns the dot product of two vectors. The function to return given inputs $a$ and $b$:\n",
|
||||
"$$ x = \\sum_{i=0}^{n-1} a_i b_i $$\n",
|
||||
"Assume both `a` and `b` are the same shape."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def my_dot(a, b): \n",
|
||||
" \"\"\"\n",
|
||||
" Compute the dot product of two vectors\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" a (ndarray (n,)): input vector \n",
|
||||
" b (ndarray (n,)): input vector with same dimension as a\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" x (scalar): \n",
|
||||
" \"\"\"\n",
|
||||
" x=0\n",
|
||||
" for i in range(a.shape[0]):\n",
|
||||
" x = x + a[i] * b[i]\n",
|
||||
" return x"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# test 1-D\n",
|
||||
"a = np.array([1, 2, 3, 4])\n",
|
||||
"b = np.array([-1, 4, 3, 2])\n",
|
||||
"print(f\"my_dot(a, b) = {my_dot(a, b)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note, the dot product is expected to return a scalar value. \n",
|
||||
"\n",
|
||||
"Let's try the same operations using `np.dot`. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# test 1-D\n",
|
||||
"a = np.array([1, 2, 3, 4])\n",
|
||||
"b = np.array([-1, 4, 3, 2])\n",
|
||||
"c = np.dot(a, b)\n",
|
||||
"print(f\"NumPy 1-D np.dot(a, b) = {c}, np.dot(a, b).shape = {c.shape} \") \n",
|
||||
"c = np.dot(b, a)\n",
|
||||
"print(f\"NumPy 1-D np.dot(b, a) = {c}, np.dot(a, b).shape = {c.shape} \")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Above, you will note that the results for 1-D matched our implementation."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_3.4.7\"></a>\n",
|
||||
"### 3.4.7 The Need for Speed: vector vs for loop\n",
|
||||
"We utilized the NumPy library because it improves speed memory efficiency. Let's demonstrate:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"np.random.seed(1)\n",
|
||||
"a = np.random.rand(10000000) # very large arrays\n",
|
||||
"b = np.random.rand(10000000)\n",
|
||||
"\n",
|
||||
"tic = time.time() # capture start time\n",
|
||||
"c = np.dot(a, b)\n",
|
||||
"toc = time.time() # capture end time\n",
|
||||
"\n",
|
||||
"print(f\"np.dot(a, b) = {c:.4f}\")\n",
|
||||
"print(f\"Vectorized version duration: {1000*(toc-tic):.4f} ms \")\n",
|
||||
"\n",
|
||||
"tic = time.time() # capture start time\n",
|
||||
"c = my_dot(a,b)\n",
|
||||
"toc = time.time() # capture end time\n",
|
||||
"\n",
|
||||
"print(f\"my_dot(a, b) = {c:.4f}\")\n",
|
||||
"print(f\"loop version duration: {1000*(toc-tic):.4f} ms \")\n",
|
||||
"\n",
|
||||
"del(a);del(b) #remove these big arrays from memory"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"So, vectorization provides a large speed up in this example. This is because NumPy makes better use of available data parallelism in the underlying hardware. GPU's and modern CPU's implement Single Instruction, Multiple Data (SIMD) pipelines allowing multiple operations to be issued in parallel. This is critical in Machine Learning where the data sets are often very large."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_12345_3.4.8\"></a>\n",
|
||||
"### 3.4.8 Vector Vector operations in Course 1\n",
|
||||
"Vector Vector operations will appear frequently in course 1. Here is why:\n",
|
||||
"- Going forward, our examples will be stored in an array, `X_train` of dimension (m,n). This will be explained more in context, but here it is important to note it is a 2 Dimensional array or matrix (see next section on matrices).\n",
|
||||
"- `w` will be a 1-dimensional vector of shape (n,).\n",
|
||||
"- we will perform operations by looping through the examples, extracting each example to work on individually by indexing X. For example:`X[i]`\n",
|
||||
"- `X[i]` returns a value of shape (n,), a 1-dimensional vector. Consequently, operations involving `X[i]` are often vector-vector. \n",
|
||||
"\n",
|
||||
"That is a somewhat lengthy explanation, but aligning and understanding the shapes of your operands is important when performing vector operations."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# show common Course 1 example\n",
|
||||
"X = np.array([[1],[2],[3],[4]])\n",
|
||||
"w = np.array([2])\n",
|
||||
"c = np.dot(X[1], w)\n",
|
||||
"\n",
|
||||
"print(f\"X[1] has shape {X[1].shape}\")\n",
|
||||
"print(f\"w has shape {w.shape}\")\n",
|
||||
"print(f\"c has shape {c.shape}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4\"></a>\n",
|
||||
"# 4 Matrices\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.1\"></a>\n",
|
||||
"## 4.1 Abstract\n",
|
||||
"Matrices, are two dimensional arrays. The elements of a matrix are all of the same type. In notation, matrices are denoted with capitol, bold letter such as $\\mathbf{X}$. In this and other labs, `m` is often the number of rows and `n` the number of columns. The elements of a matrix can be referenced with a two dimensional index. In math settings, numbers in the index typically run from 1 to n. In computer science and these labs, indexing will run from 0 to n-1. \n",
|
||||
"<figure>\n",
|
||||
" <center> <img src=\"./images/C1_W2_Lab04_Matrices.PNG\" alt='missing' width=900><center/>\n",
|
||||
" <figcaption> Generic Matrix Notation, 1st index is row, 2nd is column </figcaption>\n",
|
||||
"<figure/>"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.2\"></a>\n",
|
||||
"## 4.2 NumPy Arrays\n",
|
||||
"\n",
|
||||
"NumPy's basic data structure is an indexable, n-dimensional *array* containing elements of the same type (`dtype`). These were described earlier. Matrices have a two-dimensional (2-D) index [m,n].\n",
|
||||
"\n",
|
||||
"In Course 1, 2-D matrices are used to hold training data. Training data is $m$ examples by $n$ features creating an (m,n) array. Course 1 does not do operations directly on matrices but typically extracts an example as a vector and operates on that. Below you will review: \n",
|
||||
"- data creation\n",
|
||||
"- slicing and indexing"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.3\"></a>\n",
|
||||
"## 4.3 Matrix Creation\n",
|
||||
"The same functions that created 1-D vectors will create 2-D or n-D arrays. Here are some examples\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Below, the shape tuple is provided to achieve a 2-D result. Notice how NumPy uses brackets to denote each dimension. Notice further than NumPy, when printing, will print one row per line.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"a = np.zeros((1, 5)) \n",
|
||||
"print(f\"a shape = {a.shape}, a = {a}\") \n",
|
||||
"\n",
|
||||
"a = np.zeros((2, 1)) \n",
|
||||
"print(f\"a shape = {a.shape}, a = {a}\") \n",
|
||||
"\n",
|
||||
"a = np.random.random_sample((1, 1)) \n",
|
||||
"print(f\"a shape = {a.shape}, a = {a}\") "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"One can also manually specify data. Dimensions are specified with additional brackets matching the format in the printing above."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# NumPy routines which allocate memory and fill with user specified values\n",
|
||||
"a = np.array([[5], [4], [3]]); print(f\" a shape = {a.shape}, np.array: a = {a}\")\n",
|
||||
"a = np.array([[5], # One can also\n",
|
||||
" [4], # separate values\n",
|
||||
" [3]]); #into separate rows\n",
|
||||
"print(f\" a shape = {a.shape}, np.array: a = {a}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.4\"></a>\n",
|
||||
"## 4.4 Operations on Matrices\n",
|
||||
"Let's explore some operations using matrices."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.4.1\"></a>\n",
|
||||
"### 4.4.1 Indexing\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Matrices include a second index. The two indexes describe [row, column]. Access can either return an element or a row/column. See below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector indexing operations on matrices\n",
|
||||
"a = np.arange(6).reshape(-1, 2) #reshape is a convenient way to create matrices\n",
|
||||
"print(f\"a.shape: {a.shape}, \\na= {a}\")\n",
|
||||
"\n",
|
||||
"#access an element\n",
|
||||
"print(f\"\\na[2,0].shape: {a[2, 0].shape}, a[2,0] = {a[2, 0]}, type(a[2,0]) = {type(a[2, 0])} Accessing an element returns a scalar\\n\")\n",
|
||||
"\n",
|
||||
"#access a row\n",
|
||||
"print(f\"a[2].shape: {a[2].shape}, a[2] = {a[2]}, type(a[2]) = {type(a[2])}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"It is worth drawing attention to the last example. Accessing a matrix by just specifying the row will return a *1-D vector*."
|
||||
]
|
||||
},
|
||||
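{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small optional contrast (not part of the original lab), compare a row index, which returns a *1-D vector*, with a one-row slice, which keeps the *2-D* shape:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional contrast: row index (1-D result) vs. one-row slice (2-D result)\n",
"print(f\"a[2] = {a[2]}, a[2].shape = {a[2].shape}\")\n",
"print(f\"a[2:3, :] = {a[2:3, :]}, a[2:3, :].shape = {a[2:3, :].shape}\")"
]
},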
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Reshape** \n",
|
||||
"The previous example used [reshape](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html) to shape the array. \n",
|
||||
"`a = np.arange(6).reshape(-1, 2) ` \n",
|
||||
"This line of code first created a *1-D Vector* of six elements. It then reshaped that vector into a *2-D* array using the reshape command. This could have been written: \n",
|
||||
"`a = np.arange(6).reshape(3, 2) ` \n",
|
||||
"To arrive at the same 3 row, 2 column array.\n",
|
||||
"The -1 argument tells the routine to compute the number of rows given the size of the array and the number of columns.\n"
|
||||
]
|
||||
},
|
||||
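{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick optional check (not part of the original lab), both calls produce the same 3 row, 2 column array:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional check: reshape(-1, 2) and reshape(3, 2) give the same result here\n",
"print(np.arange(6).reshape(-1, 2))\n",
"print(np.arange(6).reshape(3, 2))"
]
},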
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_4.4.2\"></a>\n",
|
||||
"### 4.4.2 Slicing\n",
|
||||
"Slicing creates an array of indices using a set of three values (`start:stop:step`). A subset of values is also valid. Its use is best explained by example:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#vector 2-D slicing operations\n",
|
||||
"a = np.arange(20).reshape(-1, 10)\n",
|
||||
"print(f\"a = \\n{a}\")\n",
|
||||
"\n",
|
||||
"#access 5 consecutive elements (start:stop:step)\n",
|
||||
"print(\"a[0, 2:7:1] = \", a[0, 2:7:1], \", a[0, 2:7:1].shape =\", a[0, 2:7:1].shape, \"a 1-D array\")\n",
|
||||
"\n",
|
||||
"#access 5 consecutive elements (start:stop:step) in two rows\n",
|
||||
"print(\"a[:, 2:7:1] = \\n\", a[:, 2:7:1], \", a[:, 2:7:1].shape =\", a[:, 2:7:1].shape, \"a 2-D array\")\n",
|
||||
"\n",
|
||||
"# access all elements\n",
|
||||
"print(\"a[:,:] = \\n\", a[:,:], \", a[:,:].shape =\", a[:,:].shape)\n",
|
||||
"\n",
|
||||
"# access all elements in one row (very common usage)\n",
|
||||
"print(\"a[1,:] = \", a[1,:], \", a[1,:].shape =\", a[1,:].shape, \"a 1-D array\")\n",
|
||||
"# same as\n",
|
||||
"print(\"a[1] = \", a[1], \", a[1].shape =\", a[1].shape, \"a 1-D array\")\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_40015_5.0\"></a>\n",
|
||||
"## Congratulations!\n",
|
||||
"In this lab you mastered the features of Python and NumPy that are needed for Course 1."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"dl_toc_settings": {
|
||||
"rndtag": "40015"
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.9.10"
|
||||
},
|
||||
"toc-autonumbering": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 4
|
||||
}
|
648
work2/C1_W2_Lab02_Multiple_Variable_Soln.ipynb
Normal file
@ -0,0 +1,648 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Multiple Variable Linear Regression\n",
|
||||
"\n",
|
||||
"In this lab, you will extend the data structures and previously developed routines to support multiple features. Several routines are updated making the lab appear lengthy, but it makes minor adjustments to previous routines making it quick to review.\n",
|
||||
"# Outline\n",
|
||||
"- [ 1.1 Goals](#toc_15456_1.1)\n",
|
||||
"- [ 1.2 Tools](#toc_15456_1.2)\n",
|
||||
"- [ 1.3 Notation](#toc_15456_1.3)\n",
|
||||
"- [2 Problem Statement](#toc_15456_2)\n",
|
||||
"- [ 2.1 Matrix X containing our examples](#toc_15456_2.1)\n",
|
||||
"- [ 2.2 Parameter vector w, b](#toc_15456_2.2)\n",
|
||||
"- [3 Model Prediction With Multiple Variables](#toc_15456_3)\n",
|
||||
"- [ 3.1 Single Prediction element by element](#toc_15456_3.1)\n",
|
||||
"- [ 3.2 Single Prediction, vector](#toc_15456_3.2)\n",
|
||||
"- [4 Compute Cost With Multiple Variables](#toc_15456_4)\n",
|
||||
"- [5 Gradient Descent With Multiple Variables](#toc_15456_5)\n",
|
||||
"- [ 5.1 Compute Gradient with Multiple Variables](#toc_15456_5.1)\n",
|
||||
"- [ 5.2 Gradient Descent With Multiple Variables](#toc_15456_5.2)\n",
|
||||
"- [6 Congratulations](#toc_15456_6)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_1.1\"></a>\n",
|
||||
"## 1.1 Goals\n",
|
||||
"- Extend our regression model routines to support multiple features\n",
|
||||
" - Extend data structures to support multiple features\n",
|
||||
" - Rewrite prediction, cost and gradient routines to support multiple features\n",
|
||||
" - Utilize NumPy `np.dot` to vectorize their implementations for speed and simplicity"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_1.2\"></a>\n",
|
||||
"## 1.2 Tools\n",
|
||||
"In this lab, we will make use of: \n",
|
||||
"- NumPy, a popular library for scientific computing\n",
|
||||
"- Matplotlib, a popular library for plotting data"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import copy, math\n",
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"plt.style.use('./deeplearning.mplstyle')\n",
|
||||
"np.set_printoptions(precision=2) # reduced display precision on numpy arrays"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_1.3\"></a>\n",
|
||||
"## 1.3 Notation\n",
|
||||
"Here is a summary of some of the notation you will encounter, updated for multiple features. \n",
|
||||
"\n",
|
||||
"|General <img width=70/> <br /> Notation <img width=70/> | Description<img width=350/>| Python (if applicable) |\n",
|
||||
"|: ------------|: ------------------------------------------------------------||\n",
|
||||
"| $a$ | scalar, non bold ||\n",
|
||||
"| $\\mathbf{a}$ | vector, bold ||\n",
|
||||
"| $\\mathbf{A}$ | matrix, bold capital ||\n",
|
||||
"| **Regression** | | | |\n",
|
||||
"| $\\mathbf{X}$ | training example matrix | `X_train` | \n",
|
||||
"| $\\mathbf{y}$ | training example targets | `y_train` \n",
|
||||
"| $\\mathbf{x}^{(i)}$, $y^{(i)}$ | $i_{th}$Training Example | `X[i]`, `y[i]`|\n",
|
||||
"| m | number of training examples | `m`|\n",
|
||||
"| n | number of features in each example | `n`|\n",
|
||||
"| $\\mathbf{w}$ | parameter: weight, | `w` |\n",
|
||||
"| $b$ | parameter: bias | `b` | \n",
|
||||
"| $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ | The result of the model evaluation at $\\mathbf{x^{(i)}}$ parameterized by $\\mathbf{w},b$: $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = \\mathbf{w} \\cdot \\mathbf{x}^{(i)}+b$ | `f_wb` | \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_2\"></a>\n",
|
||||
"# 2 Problem Statement\n",
|
||||
"\n",
|
||||
"You will use the motivating example of housing price prediction. The training dataset contains three examples with four features (size, bedrooms, floors and, age) shown in the table below. Note that, unlike the earlier labs, size is in sqft rather than 1000 sqft. This causes an issue, which you will solve in the next lab!\n",
|
||||
"\n",
|
||||
"| Size (sqft) | Number of Bedrooms | Number of floors | Age of Home | Price (1000s dollars) | \n",
|
||||
"| ----------------| ------------------- |----------------- |--------------|-------------- | \n",
|
||||
"| 2104 | 5 | 1 | 45 | 460 | \n",
|
||||
"| 1416 | 3 | 2 | 40 | 232 | \n",
|
||||
"| 852 | 2 | 1 | 35 | 178 | \n",
|
||||
"\n",
|
||||
"You will build a linear regression model using these values so you can then predict the price for other houses. For example, a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old. \n",
|
||||
"\n",
|
||||
"Please run the following code cell to create your `X_train` and `y_train` variables."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])\n",
|
||||
"y_train = np.array([460, 232, 178])"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_2.1\"></a>\n",
|
||||
"## 2.1 Matrix X containing our examples\n",
|
||||
"Similar to the table above, examples are stored in a NumPy matrix `X_train`. Each row of the matrix represents one example. When you have $m$ training examples ( $m$ is three in our example), and there are $n$ features (four in our example), $\\mathbf{X}$ is a matrix with dimensions ($m$, $n$) (m rows, n columns).\n",
|
||||
"\n",
|
||||
"\n",
|
||||
"$$\\mathbf{X} = \n",
|
||||
"\\begin{pmatrix}\n",
|
||||
" x^{(0)}_0 & x^{(0)}_1 & \\cdots & x^{(0)}_{n-1} \\\\ \n",
|
||||
" x^{(1)}_0 & x^{(1)}_1 & \\cdots & x^{(1)}_{n-1} \\\\\n",
|
||||
" \\cdots \\\\\n",
|
||||
" x^{(m-1)}_0 & x^{(m-1)}_1 & \\cdots & x^{(m-1)}_{n-1} \n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"notation:\n",
|
||||
"- $\\mathbf{x}^{(i)}$ is vector containing example i. $\\mathbf{x}^{(i)}$ $ = (x^{(i)}_0, x^{(i)}_1, \\cdots,x^{(i)}_{n-1})$\n",
|
||||
"- $x^{(i)}_j$ is element j in example i. The superscript in parenthesis indicates the example number while the subscript represents an element. \n",
|
||||
"\n",
|
||||
"Display the input data."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# data is stored in numpy array/matrix\n",
|
||||
"print(f\"X Shape: {X_train.shape}, X Type:{type(X_train)})\")\n",
|
||||
"print(X_train)\n",
|
||||
"print(f\"y Shape: {y_train.shape}, y Type:{type(y_train)})\")\n",
|
||||
"print(y_train)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_2.2\"></a>\n",
|
||||
"## 2.2 Parameter vector w, b\n",
|
||||
"\n",
|
||||
"* $\\mathbf{w}$ is a vector with $n$ elements.\n",
|
||||
" - Each element contains the parameter associated with one feature.\n",
|
||||
" - in our dataset, n is 4.\n",
|
||||
" - notionally, we draw this as a column vector\n",
|
||||
"\n",
|
||||
"$$\\mathbf{w} = \\begin{pmatrix}\n",
|
||||
"w_0 \\\\ \n",
|
||||
"w_1 \\\\\n",
|
||||
"\\cdots\\\\\n",
|
||||
"w_{n-1}\n",
|
||||
"\\end{pmatrix}\n",
|
||||
"$$\n",
|
||||
"* $b$ is a scalar parameter. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"For demonstration, $\\mathbf{w}$ and $b$ will be loaded with some initial selected values that are near the optimal. $\\mathbf{w}$ is a 1-D NumPy vector."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"b_init = 785.1811367994083\n",
|
||||
"w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])\n",
|
||||
"print(f\"w_init shape: {w_init.shape}, b_init type: {type(b_init)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_3\"></a>\n",
|
||||
"# 3 Model Prediction With Multiple Variables\n",
|
||||
"The model's prediction with multiple variables is given by the linear model:\n",
|
||||
"\n",
|
||||
"$$ f_{\\mathbf{w},b}(\\mathbf{x}) = w_0x_0 + w_1x_1 +... + w_{n-1}x_{n-1} + b \\tag{1}$$\n",
|
||||
"or in vector notation:\n",
|
||||
"$$ f_{\\mathbf{w},b}(\\mathbf{x}) = \\mathbf{w} \\cdot \\mathbf{x} + b \\tag{2} $$ \n",
|
||||
"where $\\cdot$ is a vector `dot product`\n",
|
||||
"\n",
|
||||
"To demonstrate the dot product, we will implement prediction using (1) and (2)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_3.1\"></a>\n",
|
||||
"## 3.1 Single Prediction element by element\n",
|
||||
"Our previous prediction multiplied one feature value by one parameter and added a bias parameter. A direct extension of our previous implementation of prediction to multiple features would be to implement (1) above using loop over each element, performing the multiply with its parameter and then adding the bias parameter at the end.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def predict_single_loop(x, w, b): \n",
|
||||
" \"\"\"\n",
|
||||
" single predict using linear regression\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" x (ndarray): Shape (n,) example with multiple features\n",
|
||||
" w (ndarray): Shape (n,) model parameters \n",
|
||||
" b (scalar): model parameter \n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" p (scalar): prediction\n",
|
||||
" \"\"\"\n",
|
||||
" n = x.shape[0]\n",
|
||||
" p = 0\n",
|
||||
" for i in range(n):\n",
|
||||
" p_i = x[i] * w[i] \n",
|
||||
" p = p + p_i \n",
|
||||
" p = p + b \n",
|
||||
" return p"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# get a row from our training data\n",
|
||||
"x_vec = X_train[0,:]\n",
|
||||
"print(f\"x_vec shape {x_vec.shape}, x_vec value: {x_vec}\")\n",
|
||||
"\n",
|
||||
"# make a prediction\n",
|
||||
"f_wb = predict_single_loop(x_vec, w_init, b_init)\n",
|
||||
"print(f\"f_wb shape {f_wb.shape}, prediction: {f_wb}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note the shape of `x_vec`. It is a 1-D NumPy vector with 4 elements, (4,). The result, `f_wb` is a scalar."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_3.2\"></a>\n",
|
||||
"## 3.2 Single Prediction, vector\n",
|
||||
"\n",
|
||||
"Noting that equation (1) above can be implemented using the dot product as in (2) above. We can make use of vector operations to speed up predictions.\n",
|
||||
"\n",
|
||||
"Recall from the Python/Numpy lab that NumPy `np.dot()`[[link](https://numpy.org/doc/stable/reference/generated/numpy.dot.html)] can be used to perform a vector dot product. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def predict(x, w, b): \n",
|
||||
" \"\"\"\n",
|
||||
" single predict using linear regression\n",
|
||||
" Args:\n",
|
||||
" x (ndarray): Shape (n,) example with multiple features\n",
|
||||
" w (ndarray): Shape (n,) model parameters \n",
|
||||
" b (scalar): model parameter \n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" p (scalar): prediction\n",
|
||||
" \"\"\"\n",
|
||||
" p = np.dot(x, w) + b \n",
|
||||
" return p "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# get a row from our training data\n",
|
||||
"x_vec = X_train[0,:]\n",
|
||||
"print(f\"x_vec shape {x_vec.shape}, x_vec value: {x_vec}\")\n",
|
||||
"\n",
|
||||
"# make a prediction\n",
|
||||
"f_wb = predict(x_vec,w_init, b_init)\n",
|
||||
"print(f\"f_wb shape {f_wb.shape}, prediction: {f_wb}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The results and shapes are the same as the previous version which used looping. Going forward, `np.dot` will be used for these operations. The prediction is now a single statement. Most routines will implement it directly rather than calling a separate predict routine."
|
||||
]
|
||||
},
|
||||
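{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small optional illustration (not part of the original lab), here is the prediction written as a single inline statement, the pattern later routines will use:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional illustration: the prediction as a single inline statement\n",
"f_wb_inline = np.dot(X_train[0], w_init) + b_init\n",
"print(f\"inline prediction for X_train[0]: {f_wb_inline:0.2f}\")"
]
},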
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_4\"></a>\n",
|
||||
"# 4 Compute Cost With Multiple Variables\n",
|
||||
"The equation for the cost function with multiple variables $J(\\mathbf{w},b)$ is:\n",
|
||||
"$$J(\\mathbf{w},b) = \\frac{1}{2m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})^2 \\tag{3}$$ \n",
|
||||
"where:\n",
|
||||
"$$ f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = \\mathbf{w} \\cdot \\mathbf{x}^{(i)} + b \\tag{4} $$ \n",
|
||||
"\n",
|
||||
"\n",
|
||||
"In contrast to previous labs, $\\mathbf{w}$ and $\\mathbf{x}^{(i)}$ are vectors rather than scalars supporting multiple features."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Below is an implementation of equations (3) and (4). Note that this uses a *standard pattern for this course* where a for loop over all `m` examples is used."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def compute_cost(X, y, w, b): \n",
|
||||
" \"\"\"\n",
|
||||
" compute cost\n",
|
||||
" Args:\n",
|
||||
" X (ndarray (m,n)): Data, m examples with n features\n",
|
||||
" y (ndarray (m,)) : target values\n",
|
||||
" w (ndarray (n,)) : model parameters \n",
|
||||
" b (scalar) : model parameter\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" cost (scalar): cost\n",
|
||||
" \"\"\"\n",
|
||||
" m = X.shape[0]\n",
|
||||
" cost = 0.0\n",
|
||||
" for i in range(m): \n",
|
||||
" f_wb_i = np.dot(X[i], w) + b #(n,)(n,) = scalar (see np.dot)\n",
|
||||
" cost = cost + (f_wb_i - y[i])**2 #scalar\n",
|
||||
" cost = cost / (2 * m) #scalar \n",
|
||||
" return cost"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# Compute and display cost using our pre-chosen optimal parameters. \n",
|
||||
"cost = compute_cost(X_train, y_train, w_init, b_init)\n",
|
||||
"print(f'Cost at optimal w : {cost}')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Result**: Cost at optimal w : 1.5578904045996674e-12"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_5\"></a>\n",
|
||||
"# 5 Gradient Descent With Multiple Variables\n",
|
||||
"Gradient descent for multiple variables:\n",
|
||||
"\n",
|
||||
"$$\\begin{align*} \\text{repeat}&\\text{ until convergence:} \\; \\lbrace \\newline\\;\n",
|
||||
"& w_j = w_j - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} \\tag{5} \\; & \\text{for j = 0..n-1}\\newline\n",
|
||||
"&b\\ \\ = b - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial b} \\newline \\rbrace\n",
|
||||
"\\end{align*}$$\n",
|
||||
"\n",
|
||||
"where, n is the number of features, parameters $w_j$, $b$, are updated simultaneously and where \n",
|
||||
"\n",
|
||||
"$$\n",
|
||||
"\\begin{align}\n",
|
||||
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \\tag{6} \\\\\n",
|
||||
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial b} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)}) \\tag{7}\n",
|
||||
"\\end{align}\n",
|
||||
"$$\n",
|
||||
"* m is the number of training examples in the data set\n",
|
||||
"\n",
|
||||
" \n",
|
||||
"* $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_5.1\"></a>\n",
|
||||
"## 5.1 Compute Gradient with Multiple Variables\n",
|
||||
"An implementation for calculating the equations (6) and (7) is below. There are many ways to implement this. In this version, there is an\n",
|
||||
"- outer loop over all m examples. \n",
|
||||
" - $\\frac{\\partial J(\\mathbf{w},b)}{\\partial b}$ for the example can be computed directly and accumulated\n",
|
||||
" - in a second loop over all n features:\n",
|
||||
" - $\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j}$ is computed for each $w_j$.\n",
|
||||
" "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def compute_gradient(X, y, w, b): \n",
|
||||
" \"\"\"\n",
|
||||
" Computes the gradient for linear regression \n",
|
||||
" Args:\n",
|
||||
" X (ndarray (m,n)): Data, m examples with n features\n",
|
||||
" y (ndarray (m,)) : target values\n",
|
||||
" w (ndarray (n,)) : model parameters \n",
|
||||
" b (scalar) : model parameter\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w. \n",
|
||||
" dj_db (scalar): The gradient of the cost w.r.t. the parameter b. \n",
|
||||
" \"\"\"\n",
|
||||
" m,n = X.shape #(number of examples, number of features)\n",
|
||||
" dj_dw = np.zeros((n,))\n",
|
||||
" dj_db = 0.\n",
|
||||
"\n",
|
||||
" for i in range(m): \n",
|
||||
" err = (np.dot(X[i], w) + b) - y[i] \n",
|
||||
" for j in range(n): \n",
|
||||
" dj_dw[j] = dj_dw[j] + err * X[i, j] \n",
|
||||
" dj_db = dj_db + err \n",
|
||||
" dj_dw = dj_dw / m \n",
|
||||
" dj_db = dj_db / m \n",
|
||||
" \n",
|
||||
" return dj_db, dj_dw"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#Compute and display gradient \n",
|
||||
"tmp_dj_db, tmp_dj_dw = compute_gradient(X_train, y_train, w_init, b_init)\n",
|
||||
"print(f'dj_db at initial w,b: {tmp_dj_db}')\n",
|
||||
"print(f'dj_dw at initial w,b: \\n {tmp_dj_dw}')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Result**: \n",
|
||||
"dj_db at initial w,b: -1.6739251122999121e-06 \n",
|
||||
"dj_dw at initial w,b: \n",
|
||||
" [-2.73e-03 -6.27e-06 -2.22e-06 -6.92e-05] "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_5.2\"></a>\n",
|
||||
"## 5.2 Gradient Descent With Multiple Variables\n",
|
||||
"The routine below implements equation (5) above."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters): \n",
|
||||
" \"\"\"\n",
|
||||
" Performs batch gradient descent to learn w and b. Updates w and b by taking \n",
|
||||
" num_iters gradient steps with learning rate alpha\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X (ndarray (m,n)) : Data, m examples with n features\n",
|
||||
" y (ndarray (m,)) : target values\n",
|
||||
" w_in (ndarray (n,)) : initial model parameters \n",
|
||||
" b_in (scalar) : initial model parameter\n",
|
||||
" cost_function : function to compute cost\n",
|
||||
" gradient_function : function to compute the gradient\n",
|
||||
" alpha (float) : Learning rate\n",
|
||||
" num_iters (int) : number of iterations to run gradient descent\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" w (ndarray (n,)) : Updated values of parameters \n",
|
||||
" b (scalar) : Updated value of parameter \n",
|
||||
" \"\"\"\n",
|
||||
" \n",
|
||||
" # An array to store cost J and w's at each iteration primarily for graphing later\n",
|
||||
" J_history = []\n",
|
||||
" w = copy.deepcopy(w_in) #avoid modifying global w within function\n",
|
||||
" b = b_in\n",
|
||||
" \n",
|
||||
" for i in range(num_iters):\n",
|
||||
"\n",
|
||||
" # Calculate the gradient and update the parameters\n",
|
||||
" dj_db,dj_dw = gradient_function(X, y, w, b) ##None\n",
|
||||
"\n",
|
||||
" # Update Parameters using w, b, alpha and gradient\n",
|
||||
" w = w - alpha * dj_dw ##None\n",
|
||||
" b = b - alpha * dj_db ##None\n",
|
||||
" \n",
|
||||
" # Save cost J at each iteration\n",
|
||||
" if i<100000: # prevent resource exhaustion \n",
|
||||
" J_history.append( cost_function(X, y, w, b))\n",
|
||||
"\n",
|
||||
" # Print cost every at intervals 10 times or as many iterations if < 10\n",
|
||||
" if i% math.ceil(num_iters / 10) == 0:\n",
|
||||
" print(f\"Iteration {i:4d}: Cost {J_history[-1]:8.2f} \")\n",
|
||||
" \n",
|
||||
" return w, b, J_history #return final w,b and J history for graphing"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"In the next cell you will test the implementation. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# initialize parameters\n",
|
||||
"initial_w = np.zeros_like(w_init)\n",
|
||||
"initial_b = 0.\n",
|
||||
"# some gradient descent settings\n",
|
||||
"iterations = 1000\n",
|
||||
"alpha = 5.0e-7\n",
|
||||
"# run gradient descent \n",
|
||||
"w_final, b_final, J_hist = gradient_descent(X_train, y_train, initial_w, initial_b,\n",
|
||||
" compute_cost, compute_gradient, \n",
|
||||
" alpha, iterations)\n",
|
||||
"print(f\"b,w found by gradient descent: {b_final:0.2f},{w_final} \")\n",
|
||||
"m,_ = X_train.shape\n",
|
||||
"for i in range(m):\n",
|
||||
" print(f\"prediction: {np.dot(X_train[i], w_final) + b_final:0.2f}, target value: {y_train[i]}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Expected Result**: \n",
|
||||
"b,w found by gradient descent: -0.00,[ 0.2 0. -0.01 -0.07] \n",
|
||||
"prediction: 426.19, target value: 460 \n",
|
||||
"prediction: 286.17, target value: 232 \n",
|
||||
"prediction: 171.47, target value: 178 "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# plot cost versus iteration \n",
|
||||
"fig, (ax1, ax2) = plt.subplots(1, 2, constrained_layout=True, figsize=(12, 4))\n",
|
||||
"ax1.plot(J_hist)\n",
|
||||
"ax2.plot(100 + np.arange(len(J_hist[100:])), J_hist[100:])\n",
|
||||
"ax1.set_title(\"Cost vs. iteration\"); ax2.set_title(\"Cost vs. iteration (tail)\")\n",
|
||||
"ax1.set_ylabel('Cost') ; ax2.set_ylabel('Cost') \n",
|
||||
"ax1.set_xlabel('iteration step') ; ax2.set_xlabel('iteration step') \n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"*These results are not inspiring*! Cost is still declining and our predictions are not very accurate. The next lab will explore how to improve on this."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"<a name=\"toc_15456_6\"></a>\n",
|
||||
"# 6 Congratulations!\n",
|
||||
"In this lab you:\n",
|
||||
"- Redeveloped the routines for linear regression, now with multiple variables.\n",
|
||||
"- Utilized NumPy `np.dot` to vectorize the implementations"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"dl_toc_settings": {
|
||||
"rndtag": "15456"
|
||||
},
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.6"
|
||||
},
|
||||
"toc-autonumbering": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
666
work2/C1_W2_Lab03_Feature_Scaling_and_Learning_Rate_Soln.ipynb
Normal file
@ -0,0 +1,666 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Feature scaling and Learning Rate (Multi-variable)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Goals\n",
|
||||
"In this lab you will:\n",
|
||||
"- Utilize the multiple variables routines developed in the previous lab\n",
|
||||
"- run Gradient Descent on a data set with multiple features\n",
|
||||
"- explore the impact of the *learning rate alpha* on gradient descent\n",
|
||||
"- improve performance of gradient descent by *feature scaling* using z-score normalization"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"You will utilize the functions developed in the last lab as well as matplotlib and NumPy. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"from lab_utils_multi import load_house_data, run_gradient_descent \n",
|
||||
"from lab_utils_multi import norm_plot, plt_equal_scale, plot_cost_i_w\n",
|
||||
"from lab_utils_common import dlc\n",
|
||||
"np.set_printoptions(precision=2)\n",
|
||||
"plt.style.use('./deeplearning.mplstyle')"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Notation\n",
|
||||
"\n",
|
||||
"|General <br /> Notation | Description| Python (if applicable) |\n",
|
||||
"|: ------------|: ------------------------------------------------------------||\n",
|
||||
"| $a$ | scalar, non bold ||\n",
|
||||
"| $\\mathbf{a}$ | vector, bold ||\n",
|
||||
"| $\\mathbf{A}$ | matrix, bold capital ||\n",
|
||||
"| **Regression** | | | |\n",
|
||||
"| $\\mathbf{X}$ | training example maxtrix | `X_train` | \n",
|
||||
"| $\\mathbf{y}$ | training example targets | `y_train` \n",
|
||||
"| $\\mathbf{x}^{(i)}$, $y^{(i)}$ | $i_{th}$Training Example | `X[i]`, `y[i]`|\n",
|
||||
"| m | number of training examples | `m`|\n",
|
||||
"| n | number of features in each example | `n`|\n",
|
||||
"| $\\mathbf{w}$ | parameter: weight, | `w` |\n",
|
||||
"| $b$ | parameter: bias | `b` | \n",
|
||||
"| $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ | The result of the model evaluation at $\\mathbf{x}^{(i)}$ parameterized by $\\mathbf{w},b$: $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) = \\mathbf{w} \\cdot \\mathbf{x}^{(i)}+b$ | `f_wb` | \n",
|
||||
"|$\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j}$| the gradient or partial derivative of cost with respect to a parameter $w_j$ |`dj_dw[j]`| \n",
|
||||
"|$\\frac{\\partial J(\\mathbf{w},b)}{\\partial b}$| the gradient or partial derivative of cost with respect to a parameter $b$| `dj_db`|"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Problem Statement\n",
|
||||
"\n",
|
||||
"As in the previous labs, you will use the motivating example of housing price prediction. The training data set contains many examples with 4 features (size, bedrooms, floors and age) shown in the table below. Note, in this lab, the Size feature is in sqft while earlier labs utilized 1000 sqft. This data set is larger than the previous lab.\n",
|
||||
"\n",
|
||||
"We would like to build a linear regression model using these values so we can then predict the price for other houses - say, a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old. \n",
|
||||
"\n",
|
||||
"## Dataset: \n",
|
||||
"| Size (sqft) | Number of Bedrooms | Number of floors | Age of Home | Price (1000s dollars) | \n",
|
||||
"| ----------------| ------------------- |----------------- |--------------|----------------------- | \n",
|
||||
"| 952 | 2 | 1 | 65 | 271.5 | \n",
|
||||
"| 1244 | 3 | 2 | 64 | 232 | \n",
|
||||
"| 1947 | 3 | 2 | 17 | 509.8 | \n",
|
||||
"| ... | ... | ... | ... | ... |\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# load the dataset\n",
|
||||
"X_train, y_train = load_house_data()\n",
|
||||
"X_features = ['size(sqft)','bedrooms','floors','age']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's view the dataset and its features by plotting each feature versus price."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"fig,ax=plt.subplots(1, 4, figsize=(12, 3), sharey=True)\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" ax[i].scatter(X_train[:,i],y_train)\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
"ax[0].set_ylabel(\"Price (1000's)\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Plotting each feature vs. the target, price, provides some indication of which features have the strongest influence on price. Above, increasing size also increases price. Bedrooms and floors don't seem to have a strong impact on price. Newer houses have higher prices than older houses."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name=\"toc_15456_5\"></a>\n",
|
||||
"## Gradient Descent With Multiple Variables\n",
|
||||
"Here are the equations you developed in the last lab on gradient descent for multiple variables.:\n",
|
||||
"\n",
|
||||
"$$\\begin{align*} \\text{repeat}&\\text{ until convergence:} \\; \\lbrace \\newline\\;\n",
|
||||
"& w_j := w_j - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} \\tag{1} \\; & \\text{for j = 0..n-1}\\newline\n",
|
||||
"&b\\ \\ := b - \\alpha \\frac{\\partial J(\\mathbf{w},b)}{\\partial b} \\newline \\rbrace\n",
|
||||
"\\end{align*}$$\n",
|
||||
"\n",
|
||||
"where, n is the number of features, parameters $w_j$, $b$, are updated simultaneously and where \n",
|
||||
"\n",
|
||||
"$$\n",
|
||||
"\\begin{align}\n",
|
||||
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial w_j} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \\tag{2} \\\\\n",
|
||||
"\\frac{\\partial J(\\mathbf{w},b)}{\\partial b} &= \\frac{1}{m} \\sum\\limits_{i = 0}^{m-1} (f_{\\mathbf{w},b}(\\mathbf{x}^{(i)}) - y^{(i)}) \\tag{3}\n",
|
||||
"\\end{align}\n",
|
||||
"$$\n",
|
||||
"* m is the number of training examples in the data set\n",
|
||||
"\n",
|
||||
" \n",
|
||||
"* $f_{\\mathbf{w},b}(\\mathbf{x}^{(i)})$ is the model's prediction, while $y^{(i)}$ is the target value\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Learning Rate\n",
|
||||
"<figure>\n",
|
||||
" <img src=\"./images/C1_W2_Lab06_learningrate.PNG\" style=\"width:1200px;\" >\n",
|
||||
"</figure>\n",
|
||||
"The lectures discussed some of the issues related to setting the learning rate $\\alpha$. The learning rate controls the size of the update to the parameters. See equation (1) above. It is shared by all the parameters. \n",
|
||||
"\n",
|
||||
"Let's run gradient descent and try a few settings of $\\alpha$ on our data set"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### $\\alpha$ = 9.9e-7"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#set alpha to 9.9e-7\n",
|
||||
"_, _, hist = run_gradient_descent(X_train, y_train, 10, alpha = 9.9e-7)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"It appears the learning rate is too high. The solution does not converge. Cost is *increasing* rather than decreasing. Let's plot the result:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"plot_cost_i_w(X_train, y_train, hist)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The plot on the right shows the value of one of the parameters, $w_0$. At each iteration, it is overshooting the optimal value and as a result, cost ends up *increasing* rather than approaching the minimum. Note that this is not a completely accurate picture as there are 4 parameters being modified each pass rather than just one. This plot is only showing $w_0$ with the other parameters fixed at benign values. In this and later plots you may notice the blue and orange lines being slightly off."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"### $\\alpha$ = 9e-7\n",
|
||||
"Let's try a bit smaller value and see what happens."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#set alpha to 9e-7\n",
|
||||
"_,_,hist = run_gradient_descent(X_train, y_train, 10, alpha = 9e-7)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Cost is decreasing throughout the run showing that alpha is not too large. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"plot_cost_i_w(X_train, y_train, hist)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"On the left, you see that cost is decreasing as it should. On the right, you can see that $w_0$ is still oscillating around the minimum, but it is decreasing each iteration rather than increasing. Note above that `dj_dw[0]` changes sign with each iteration as `w[0]` jumps over the optimal value.\n",
|
||||
"This alpha value will converge. You can vary the number of iterations to see how it behaves."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### $\\alpha$ = 1e-7\n",
|
||||
"Let's try a bit smaller value for $\\alpha$ and see what happens."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#set alpha to 1e-7\n",
|
||||
"_,_,hist = run_gradient_descent(X_train, y_train, 10, alpha = 1e-7)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Cost is decreasing throughout the run showing that $\\alpha$ is not too large. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"plot_cost_i_w(X_train,y_train,hist)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"On the left, you see that cost is decreasing as it should. On the right you can see that $w_0$ is decreasing without crossing the minimum. Note above that `dj_w0` is negative throughout the run. This solution will also converge, though not quite as quickly as the previous example."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"tags": []
|
||||
},
|
||||
"source": [
|
||||
"## Feature Scaling \n",
|
||||
"<figure>\n",
|
||||
" <img src=\"./images/C1_W2_Lab06_featurescalingheader.PNG\" style=\"width:1200px;\" >\n",
|
||||
"</figure>\n",
|
||||
"The lectures described the importance of rescaling the dataset so the features have a similar range.\n",
|
||||
"If you are interested in the details of why this is the case, click on the 'details' header below. If not, the section below will walk through an implementation of how to do feature scaling."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<details>\n",
|
||||
"<summary>\n",
|
||||
" <font size='3', color='darkgreen'><b>Details</b></font>\n",
|
||||
"</summary>\n",
|
||||
"\n",
|
||||
"Let's look again at the situation with $\\alpha$ = 9e-7. This is pretty close to the maximum value we can set $\\alpha$ to without diverging. This is a short run showing the first few iterations:\n",
|
||||
"\n",
|
||||
"<figure>\n",
|
||||
" <img src=\"./images/C1_W2_Lab06_ShortRun.PNG\" style=\"width:1200px;\" >\n",
|
||||
"</figure>\n",
|
||||
"\n",
|
||||
"Above, while cost is being decreased, its clear that $w_0$ is making more rapid progress than the other parameters due to its much larger gradient.\n",
|
||||
"\n",
|
||||
"The graphic below shows the result of a very long run with $\\alpha$ = 9e-7. This takes several hours.\n",
|
||||
"\n",
|
||||
"<figure>\n",
|
||||
" <img src=\"./images/C1_W2_Lab06_LongRun.PNG\" style=\"width:1200px;\" >\n",
|
||||
"</figure>\n",
|
||||
" \n",
|
||||
"Above, you can see cost decreased slowly after its initial reduction. Notice the difference between `w0` and `w1`,`w2`,`w3` as well as `dj_dw0` and `dj_dw1-3`. `w0` reaches its near final value very quickly and `dj_dw0` has quickly decreased to a small value showing that `w0` is near the final value. The other parameters were reduced much more slowly.\n",
|
||||
"\n",
|
||||
"Why is this? Is there something we can improve? See below:\n",
|
||||
"<figure>\n",
|
||||
" <center> <img src=\"./images/C1_W2_Lab06_scale.PNG\" ></center>\n",
|
||||
"</figure> \n",
|
||||
"\n",
|
||||
"The figure above shows why $w$'s are updated unevenly. \n",
|
||||
"- $\\alpha$ is shared by all parameter updates ($w$'s and $b$).\n",
|
||||
"- the common error term is multiplied by the features for the $w$'s. (not $b$).\n",
|
||||
"- the features vary significantly in magnitude making some features update much faster than others. In this case, $w_0$ is multiplied by 'size(sqft)', which is generally > 1000, while $w_1$ is multiplied by 'number of bedrooms', which is generally 2-4. \n",
|
||||
" \n",
|
||||
"The solution is Feature Scaling."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The lectures discussed three different techniques: \n",
|
||||
"- Feature scaling, essentially dividing each positive feature by its maximum value, or more generally, rescale each feature by both its minimum and maximum values using (x-min)/(max-min). Both ways normalizes features to the range of -1 and 1, where the former method works for positive features which is simple and serves well for the lecture's example, and the latter method works for any features.\n",
|
||||
"- Mean normalization: $x_i := \\dfrac{x_i - \\mu_i}{max - min} $ \n",
|
||||
"- Z-score normalization which we will explore below. "
|
||||
]
|
||||
},
|
||||
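{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick optional sketch (not part of the original lab), here is what the first two techniques look like when applied to `X_train`; only z-score normalization is implemented in the remainder of the lab."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional sketch: simple max scaling and mean normalization (z-score normalization is implemented below)\n",
"x_max = np.max(X_train, axis=0)\n",
"x_min = np.min(X_train, axis=0)\n",
"x_mu  = np.mean(X_train, axis=0)\n",
"X_div_max   = X_train / x_max                     # divide each (positive) feature by its maximum\n",
"X_mean_norm = (X_train - x_mu) / (x_max - x_min)  # mean normalization\n",
"print(f\"max-scaled feature ranges:      {np.ptp(X_div_max, axis=0)}\")\n",
"print(f\"mean-normalized feature ranges: {np.ptp(X_mean_norm, axis=0)}\")"
]
},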
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"### z-score normalization \n",
|
||||
"After z-score normalization, all features will have a mean of 0 and a standard deviation of 1.\n",
|
||||
"\n",
|
||||
"To implement z-score normalization, adjust your input values as shown in this formula:\n",
|
||||
"$$x^{(i)}_j = \\dfrac{x^{(i)}_j - \\mu_j}{\\sigma_j} \\tag{4}$$ \n",
|
||||
"where $j$ selects a feature or a column in the $\\mathbf{X}$ matrix. $µ_j$ is the mean of all the values for feature (j) and $\\sigma_j$ is the standard deviation of feature (j).\n",
|
||||
"$$\n",
|
||||
"\\begin{align}\n",
|
||||
"\\mu_j &= \\frac{1}{m} \\sum_{i=0}^{m-1} x^{(i)}_j \\tag{5}\\\\\n",
|
||||
"\\sigma^2_j &= \\frac{1}{m} \\sum_{i=0}^{m-1} (x^{(i)}_j - \\mu_j)^2 \\tag{6}\n",
|
||||
"\\end{align}\n",
|
||||
"$$\n",
|
||||
"\n",
|
||||
">**Implementation Note:** When normalizing the features, it is important\n",
|
||||
"to store the values used for normalization - the mean value and the standard deviation used for the computations. After learning the parameters\n",
|
||||
"from the model, we often want to predict the prices of houses we have not\n",
|
||||
"seen before. Given a new x value (living room area and number of bed-\n",
|
||||
"rooms), we must first normalize x using the mean and standard deviation\n",
|
||||
"that we had previously computed from the training set.\n",
|
||||
"\n",
|
||||
"**Implementation**"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"def zscore_normalize_features(X):\n",
|
||||
" \"\"\"\n",
|
||||
" computes X, zcore normalized by column\n",
|
||||
" \n",
|
||||
" Args:\n",
|
||||
" X (ndarray (m,n)) : input data, m examples, n features\n",
|
||||
" \n",
|
||||
" Returns:\n",
|
||||
" X_norm (ndarray (m,n)): input normalized by column\n",
|
||||
" mu (ndarray (n,)) : mean of each feature\n",
|
||||
" sigma (ndarray (n,)) : standard deviation of each feature\n",
|
||||
" \"\"\"\n",
|
||||
" # find the mean of each column/feature\n",
|
||||
" mu = np.mean(X, axis=0) # mu will have shape (n,)\n",
|
||||
" # find the standard deviation of each column/feature\n",
|
||||
" sigma = np.std(X, axis=0) # sigma will have shape (n,)\n",
|
||||
" # element-wise, subtract mu for that column from each example, divide by std for that column\n",
|
||||
" X_norm = (X - mu) / sigma \n",
|
||||
"\n",
|
||||
" return (X_norm, mu, sigma)\n",
|
||||
" \n",
|
||||
"#check our work\n",
|
||||
"#from sklearn.preprocessing import scale\n",
|
||||
"#scale(X_orig, axis=0, with_mean=True, with_std=True, copy=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's look at the steps involved in Z-score normalization. The plot below shows the transformation step by step."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"mu = np.mean(X_train,axis=0) \n",
|
||||
"sigma = np.std(X_train,axis=0) \n",
|
||||
"X_mean = (X_train - mu)\n",
|
||||
"X_norm = (X_train - mu)/sigma \n",
|
||||
"\n",
|
||||
"fig,ax=plt.subplots(1, 3, figsize=(12, 3))\n",
|
||||
"ax[0].scatter(X_train[:,0], X_train[:,3])\n",
|
||||
"ax[0].set_xlabel(X_features[0]); ax[0].set_ylabel(X_features[3]);\n",
|
||||
"ax[0].set_title(\"unnormalized\")\n",
|
||||
"ax[0].axis('equal')\n",
|
||||
"\n",
|
||||
"ax[1].scatter(X_mean[:,0], X_mean[:,3])\n",
|
||||
"ax[1].set_xlabel(X_features[0]); ax[0].set_ylabel(X_features[3]);\n",
|
||||
"ax[1].set_title(r\"X - $\\mu$\")\n",
|
||||
"ax[1].axis('equal')\n",
|
||||
"\n",
|
||||
"ax[2].scatter(X_norm[:,0], X_norm[:,3])\n",
|
||||
"ax[2].set_xlabel(X_features[0]); ax[0].set_ylabel(X_features[3]);\n",
|
||||
"ax[2].set_title(r\"Z-score normalized\")\n",
|
||||
"ax[2].axis('equal')\n",
|
||||
"plt.tight_layout(rect=[0, 0.03, 1, 0.95])\n",
|
||||
"fig.suptitle(\"distribution of features before, during, after normalization\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The plot above shows the relationship between two of the training set parameters, \"age\" and \"size(sqft)\". *These are plotted with equal scale*. \n",
|
||||
"- Left: Unnormalized: The range of values or the variance of the 'size(sqft)' feature is much larger than that of age\n",
|
||||
"- Middle: The first step removes the mean or average value from each feature. This leaves features that are centered around zero. It's difficult to see the difference for the 'age' feature, but 'size(sqft)' is clearly around zero.\n",
|
||||
"- Right: The second step divides by the standard deviation. This leaves both features centered at zero with a similar scale."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's normalize the data and compare it to the original data."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# normalize the original features\n",
|
||||
"X_norm, X_mu, X_sigma = zscore_normalize_features(X_train)\n",
|
||||
"print(f\"X_mu = {X_mu}, \\nX_sigma = {X_sigma}\")\n",
|
||||
"print(f\"Peak to Peak range by column in Raw X:{np.ptp(X_train,axis=0)}\") \n",
|
||||
"print(f\"Peak to Peak range by column in Normalized X:{np.ptp(X_norm,axis=0)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The peak to peak range of each column is reduced from a factor of thousands to a factor of 2-3 by normalization."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"fig,ax=plt.subplots(1, 4, figsize=(12, 3))\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" norm_plot(ax[i],X_train[:,i],)\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
"ax[0].set_ylabel(\"count\");\n",
|
||||
"fig.suptitle(\"distribution of features before normalization\")\n",
|
||||
"plt.show()\n",
|
||||
"fig,ax=plt.subplots(1,4,figsize=(12,3))\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" norm_plot(ax[i],X_norm[:,i],)\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
"ax[0].set_ylabel(\"count\"); \n",
|
||||
"fig.suptitle(\"distribution of features after normalization\")\n",
|
||||
"\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Notice, above, the range of the normalized data (x-axis) is centered around zero and roughly +/- 2. Most importantly, the range is similar for each feature."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Let's re-run our gradient descent algorithm with normalized data.\n",
|
||||
"Note the **vastly larger value of alpha**. This will speed up gradient descent."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"w_norm, b_norm, hist = run_gradient_descent(X_norm, y_train, 1000, 1.0e-1, )"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The scaled features get very accurate results **much, much faster!**. Notice the gradient of each parameter is tiny by the end of this fairly short run. A learning rate of 0.1 is a good start for regression with normalized features.\n",
|
||||
"Let's plot our predictions versus the target values. Note, the prediction is made using the normalized feature while the plot is shown using the original feature values."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"#predict target using normalized features\n",
|
||||
"m = X_norm.shape[0]\n",
|
||||
"yp = np.zeros(m)\n",
|
||||
"for i in range(m):\n",
|
||||
" yp[i] = np.dot(X_norm[i], w_norm) + b_norm\n",
|
||||
"\n",
|
||||
" # plot predictions and targets versus original features \n",
|
||||
"fig,ax=plt.subplots(1,4,figsize=(12, 3),sharey=True)\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" ax[i].scatter(X_train[:,i],y_train, label = 'target')\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
" ax[i].scatter(X_train[:,i],yp,color=dlc[\"dlorange\"], label = 'predict')\n",
|
||||
"ax[0].set_ylabel(\"Price\"); ax[0].legend();\n",
|
||||
"fig.suptitle(\"target versus prediction using z-score normalized model\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The results look good. A few points to note:\n",
|
||||
"- with multiple features, we can no longer have a single plot showing results versus features.\n",
|
||||
"- when generating the plot, the normalized features were used. Any predictions using the parameters learned from a normalized training set must also be normalized."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Prediction**\n",
|
||||
"The point of generating our model is to use it to predict housing prices that are not in the data set. Let's predict the price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old. Recall, that you must normalize the data with the mean and standard deviation derived when the training data was normalized. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# First, normalize out example.\n",
|
||||
"x_house = np.array([1200, 3, 1, 40])\n",
|
||||
"x_house_norm = (x_house - X_mu) / X_sigma\n",
|
||||
"print(x_house_norm)\n",
|
||||
"x_house_predict = np.dot(x_house_norm, w_norm) + b_norm\n",
|
||||
"print(f\" predicted price of a house with 1200 sqft, 3 bedrooms, 1 floor, 40 years old = ${x_house_predict*1000:0.0f}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"**Cost Contours** \n",
|
||||
"<img align=\"left\" src=\"./images/C1_W2_Lab06_contours.PNG\" style=\"width:240px;\" >Another way to view feature scaling is in terms of the cost contours. When feature scales do not match, the plot of cost versus parameters in a contour plot is asymmetric. \n",
|
||||
"\n",
|
||||
"In the plot below, the scale of the parameters is matched. The left plot is the cost contour plot of w[0], the square feet versus w[1], the number of bedrooms before normalizing the features. The plot is so asymmetric, the curves completing the contours are not visible. In contrast, when the features are normalized, the cost contour is much more symmetric. The result is that updates to parameters during gradient descent can make equal progress for each parameter. \n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"plt_equal_scale(X_train, X_norm, y_train)"
|
||||
]
|
||||
},
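  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Added illustration (not part of the original lab):* `plt_equal_scale` above is the lab's own plotting routine. The cell below is a rough, hand-rolled sketch of the same idea: evaluate the squared-error cost over a grid of `w[0]`, `w[1]` values for the normalized features, holding the other parameters at zero. The parameter ranges are guesses chosen only for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Rough sketch (not the lab's plt_equal_scale): squared-error cost over a grid of (w[0], w[1])\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "def cost_grid(X, y, w0_vals, w1_vals):\n",
    "    W0, W1 = np.meshgrid(w0_vals, w1_vals)\n",
    "    J = np.zeros_like(W0)\n",
    "    for i in range(W0.shape[0]):\n",
    "        for j in range(W0.shape[1]):\n",
    "            w = np.zeros(X.shape[1])\n",
    "            w[0], w[1] = W0[i, j], W1[i, j]      # remaining weights and b stay at zero\n",
    "            err = X @ w - y\n",
    "            J[i, j] = np.mean(err**2) / 2\n",
    "    return W0, W1, J\n",
    "\n",
    "# parameter ranges below are guesses for illustration only\n",
    "W0, W1, J = cost_grid(X_norm, y_train, np.linspace(-200, 400, 60), np.linspace(-200, 400, 60))\n",
    "plt.contour(W0, W1, J, levels=25)\n",
    "plt.xlabel(\"w[0]\"); plt.ylabel(\"w[1]\")\n",
    "plt.title(\"cost contours, normalized features (sketch)\")\n",
    "plt.show()"
   ]
  },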
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"## Congratulations!\n",
|
||||
"In this lab you:\n",
|
||||
"- utilized the routines for linear regression with multiple features you developed in previous labs\n",
|
||||
"- explored the impact of the learning rate $\\alpha$ on convergence \n",
|
||||
"- discovered the value of feature scaling using z-score normalization in speeding convergence"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Acknowledgments\n",
|
||||
"The housing data was derived from the [Ames Housing dataset](http://jse.amstat.org/v19n3/decock.pdf) compiled by Dean De Cock for use in data science education."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.6"
|
||||
},
|
||||
"toc-autonumbering": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
344
work2/C1_W2_Lab04_FeatEng_PolyReg_Soln.ipynb
Normal file
@ -0,0 +1,344 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Optional Lab: Feature Engineering and Polynomial Regression\n",
|
||||
"\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Goals\n",
|
||||
"In this lab you will:\n",
|
||||
"- explore feature engineering and polynomial regression which allows you to use the machinery of linear regression to fit very complicated, even very non-linear functions.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Tools\n",
|
||||
"You will utilize the function developed in previous labs as well as matplotlib and NumPy. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"import matplotlib.pyplot as plt\n",
|
||||
"from lab_utils_multi import zscore_normalize_features, run_gradient_descent_feng\n",
|
||||
"np.set_printoptions(precision=2) # reduced display precision on numpy arrays"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name='FeatureEng'></a>\n",
|
||||
"# Feature Engineering and Polynomial Regression Overview\n",
|
||||
"\n",
|
||||
"Out of the box, linear regression provides a means of building models of the form:\n",
|
||||
"$$f_{\\mathbf{w},b} = w_0x_0 + w_1x_1+ ... + w_{n-1}x_{n-1} + b \\tag{1}$$ \n",
|
||||
"What if your features/data are non-linear or are combinations of features? For example, Housing prices do not tend to be linear with living area but penalize very small or very large houses resulting in the curves shown in the graphic above. How can we use the machinery of linear regression to fit this curve? Recall, the 'machinery' we have is the ability to modify the parameters $\\mathbf{w}$, $\\mathbf{b}$ in (1) to 'fit' the equation to the training data. However, no amount of adjusting of $\\mathbf{w}$,$\\mathbf{b}$ in (1) will achieve a fit to a non-linear curve.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"<a name='PolynomialFeatures'></a>\n",
|
||||
"## Polynomial Features\n",
|
||||
"\n",
|
||||
"Above we were considering a scenario where the data was non-linear. Let's try using what we know so far to fit a non-linear curve. We'll start with a simple quadratic: $y = 1+x^2$\n",
|
||||
"\n",
|
||||
"You're familiar with all the routines we're using. They are available in the lab_utils.py file for review. We'll use [`np.c_[..]`](https://numpy.org/doc/stable/reference/generated/numpy.c_.html) which is a NumPy routine to concatenate along the column boundary."
|
||||
]
|
||||
},
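  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Added illustration (not part of the original lab):* a tiny demonstration of how `np.c_` stacks 1-D arrays as the columns of a 2-D matrix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# np.c_ concatenates 1-D arrays along the column boundary, producing a 2-D matrix\n",
    "import numpy as np\n",
    "\n",
    "a = np.array([1, 2, 3])\n",
    "print(np.c_[a, a**2, a**3])\n",
    "# [[ 1  1  1]\n",
    "#  [ 2  4  8]\n",
    "#  [ 3  9 27]]"
   ]
  },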
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0, 20, 1)\n",
|
||||
"y = 1 + x**2\n",
|
||||
"X = x.reshape(-1, 1)\n",
|
||||
"\n",
|
||||
"model_w,model_b = run_gradient_descent_feng(X,y,iterations=1000, alpha = 1e-2)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"no feature engineering\")\n",
|
||||
"plt.plot(x,X@model_w + model_b, label=\"Predicted Value\"); plt.xlabel(\"X\"); plt.ylabel(\"y\"); plt.legend(); plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Well, as expected, not a great fit. What is needed is something like $y= w_0x_0^2 + b$, or a **polynomial feature**.\n",
|
||||
"To accomplish this, you can modify the *input data* to *engineer* the needed features. If you swap the original data with a version that squares the $x$ value, then you can achieve $y= w_0x_0^2 + b$. Let's try it. Swap `X` for `X**2` below:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0, 20, 1)\n",
|
||||
"y = 1 + x**2\n",
|
||||
"\n",
|
||||
"# Engineer features \n",
|
||||
"X = x**2 #<-- added engineered feature"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"X = X.reshape(-1, 1) #X should be a 2-D Matrix\n",
|
||||
"model_w,model_b = run_gradient_descent_feng(X, y, iterations=10000, alpha = 1e-5)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"Added x**2 feature\")\n",
|
||||
"plt.plot(x, np.dot(X,model_w) + model_b, label=\"Predicted Value\"); plt.xlabel(\"x\"); plt.ylabel(\"y\"); plt.legend(); plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Great! near perfect fit. Notice the values of $\\mathbf{w}$ and b printed right above the graph: `w,b found by gradient descent: w: [1.], b: 0.0490`. Gradient descent modified our initial values of $\\mathbf{w},b $ to be (1.0,0.049) or a model of $y=1*x_0^2+0.049$, very close to our target of $y=1*x_0^2+1$. If you ran it longer, it could be a better match. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Selecting Features\n",
|
||||
"<a name='GDF'></a>\n",
|
||||
"Above, we knew that an $x^2$ term was required. It may not always be obvious which features are required. One could add a variety of potential features to try and find the most useful. For example, what if we had instead tried : $y=w_0x_0 + w_1x_1^2 + w_2x_2^3+b$ ? \n",
|
||||
"\n",
|
||||
"Run the next cells. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0, 20, 1)\n",
|
||||
"y = x**2\n",
|
||||
"\n",
|
||||
"# engineer features .\n",
|
||||
"X = np.c_[x, x**2, x**3] #<-- added engineered feature"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"model_w,model_b = run_gradient_descent_feng(X, y, iterations=10000, alpha=1e-7)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"x, x**2, x**3 features\")\n",
|
||||
"plt.plot(x, X@model_w + model_b, label=\"Predicted Value\"); plt.xlabel(\"x\"); plt.ylabel(\"y\"); plt.legend(); plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note the value of $\\mathbf{w}$, `[0.08 0.54 0.03]` and b is `0.0106`.This implies the model after fitting/training is:\n",
|
||||
"$$ 0.08x + 0.54x^2 + 0.03x^3 + 0.0106 $$\n",
|
||||
"Gradient descent has emphasized the data that is the best fit to the $x^2$ data by increasing the $w_1$ term relative to the others. If you were to run for a very long time, it would continue to reduce the impact of the other terms. \n",
|
||||
">Gradient descent is picking the 'correct' features for us by emphasizing its associated parameter\n",
|
||||
"\n",
|
||||
"Let's review this idea:\n",
|
||||
"- Intially, the features were re-scaled so they are comparable to each other\n",
|
||||
"- less weight value implies less important/correct feature, and in extreme, when the weight becomes zero or very close to zero, the associated feature is not useful in fitting the model to the data.\n",
|
||||
"- above, after fitting, the weight associated with the $x^2$ feature is much larger than the weights for $x$ or $x^3$ as it is the most useful in fitting the data. "
|
||||
]
|
||||
},
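  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "*Added illustration (not part of the original lab):* a small sketch of inspecting the weights learned in the run above to see which engineered feature gradient descent emphasized. It assumes `model_w` holds the three weights returned by `run_gradient_descent_feng` above; the feature names are written out directly here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Rank the engineered features by the magnitude of their learned weights\n",
    "import numpy as np\n",
    "\n",
    "feature_names = ['x', 'x**2', 'x**3']\n",
    "w = np.ravel(model_w)                 # flat 1-D view of the learned weights\n",
    "order = np.argsort(-np.abs(w))        # indices sorted by largest |w| first\n",
    "for idx in order:\n",
    "    print(f\"{feature_names[idx]:>5s} : w = {w[idx]:0.3f}\")"
   ]
  },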
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### An Alternate View\n",
|
||||
"Above, polynomial features were chosen based on how well they matched the target data. Another way to think about this is to note that we are still using linear regression once we have created new features. Given that, the best features will be linear relative to the target. This is best understood with an example. "
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0, 20, 1)\n",
|
||||
"y = x**2\n",
|
||||
"\n",
|
||||
"# engineer features .\n",
|
||||
"X = np.c_[x, x**2, x**3] #<-- added engineered feature\n",
|
||||
"X_features = ['x','x^2','x^3']"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"fig,ax=plt.subplots(1, 3, figsize=(12, 3), sharey=True)\n",
|
||||
"for i in range(len(ax)):\n",
|
||||
" ax[i].scatter(X[:,i],y)\n",
|
||||
" ax[i].set_xlabel(X_features[i])\n",
|
||||
"ax[0].set_ylabel(\"y\")\n",
|
||||
"plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Above, it is clear that the $x^2$ feature mapped against the target value $y$ is linear. Linear regression can then easily generate a model using that feature."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Scaling features\n",
|
||||
"As described in the last lab, if the data set has features with significantly different scales, one should apply feature scaling to speed gradient descent. In the example above, there is $x$, $x^2$ and $x^3$ which will naturally have very different scales. Let's apply Z-score normalization to our example."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# create target data\n",
|
||||
"x = np.arange(0,20,1)\n",
|
||||
"X = np.c_[x, x**2, x**3]\n",
|
||||
"print(f\"Peak to Peak range by column in Raw X:{np.ptp(X,axis=0)}\")\n",
|
||||
"\n",
|
||||
"# add mean_normalization \n",
|
||||
"X = zscore_normalize_features(X) \n",
|
||||
"print(f\"Peak to Peak range by column in Normalized X:{np.ptp(X,axis=0)}\")"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Now we can try again with a more aggressive value of alpha:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"x = np.arange(0,20,1)\n",
|
||||
"y = x**2\n",
|
||||
"\n",
|
||||
"X = np.c_[x, x**2, x**3]\n",
|
||||
"X = zscore_normalize_features(X) \n",
|
||||
"\n",
|
||||
"model_w, model_b = run_gradient_descent_feng(X, y, iterations=100000, alpha=1e-1)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"Normalized x x**2, x**3 feature\")\n",
|
||||
"plt.plot(x,X@model_w + model_b, label=\"Predicted Value\"); plt.xlabel(\"x\"); plt.ylabel(\"y\"); plt.legend(); plt.show()"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Feature scaling allows this to converge much faster. \n",
|
||||
"Note again the values of $\\mathbf{w}$. The $w_1$ term, which is the $x^2$ term is the most emphasized. Gradient descent has all but eliminated the $x^3$ term."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"### Complex Functions\n",
|
||||
"With feature engineering, even quite complex functions can be modeled:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"x = np.arange(0,20,1)\n",
|
||||
"y = np.cos(x/2)\n",
|
||||
"\n",
|
||||
"X = np.c_[x, x**2, x**3,x**4, x**5, x**6, x**7, x**8, x**9, x**10, x**11, x**12, x**13]\n",
|
||||
"X = zscore_normalize_features(X) \n",
|
||||
"\n",
|
||||
"model_w,model_b = run_gradient_descent_feng(X, y, iterations=1000000, alpha = 1e-1)\n",
|
||||
"\n",
|
||||
"plt.scatter(x, y, marker='x', c='r', label=\"Actual Value\"); plt.title(\"Normalized x x**2, x**3 feature\")\n",
|
||||
"plt.plot(x,X@model_w + model_b, label=\"Predicted Value\"); plt.xlabel(\"x\"); plt.ylabel(\"y\"); plt.legend(); plt.show()\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"\n",
|
||||
"## Congratulations!\n",
|
||||
"In this lab you:\n",
|
||||
"- learned how linear regression can model complex, even highly non-linear functions using feature engineering\n",
|
||||
"- recognized that it is important to apply feature scaling when doing feature engineering"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.7.6"
|
||||
},
|
||||
"toc-autonumbering": false
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|