{
|
|
"cells": [
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "specialized-stanley",
|
|
"metadata": {},
|
|
"source": [
|
|
"# Ungraded Lab: Model Representation\n",
|
|
"\n",
|
|
"In this ungraded lab, you will implement the model $f$ for linear regression with one variable.\n"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "active-bernard",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Problem Statement\n",
|
|
"\n",
"You will use the motivating example of housing price prediction again. For the sake of simplicity, let's assume that you have just two data points: a house with 1000 square feet sold for \\\\$200,000 and a house with 2000 square feet sold for \\\\$400,000.\n",
|
|
"\n",
"Your dataset therefore contains the following two points:\n",
|
|
"\n",
|
|
"| Size (feet$^2$) | Price (1000s of dollars) |\n",
|
|
"| -------------------| ------------------------ |\n",
|
|
"| 1000 | 200 |\n",
|
|
"| 2000 | 400 |\n",
|
|
"\n",
"You'd like to fit a linear regression model (represented by a straight line) through these two points, so you can then predict the price of other houses, for example a house with 1200 feet$^2$.\n",
|
|
"\n",
|
|
"### Notation: `X` and `y`\n",
|
|
"\n",
"For the next few labs, you will use Python lists to represent your dataset. As shown in the video:\n",
"- `X` represents the input variables, also called input features (in this case, Size (feet$^2$)), and\n",
"- `y` represents the output variables, also known as target variables (in this case, Price (1000s of dollars)).\n",
|
|
"\n",
|
|
"Please run the following code cell to create your `X` and `y` variables."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 10,
|
|
"id": "headed-custom",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"# X is the input variable (size in square feet)\n",
"# y is the output variable (price in 1000s of dollars)\n",
|
|
"X = [1000, 2000] \n",
|
|
"y = [200, 400]"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "dependent-attribute",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Number of training examples `m`\n",
|
|
"You will use `m` to denote the number of training examples. In Python, use the `len()` function to get the number of examples in a list. You can get `m` by running the next code cell."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 11,
|
|
"id": "novel-vessel",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"Number of training examples is: 2\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# m is the number of training examples\n",
|
|
"m = len(X)\n",
|
|
"print(\"Number of training examples is: %d\" %m)"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "permanent-uncertainty",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Training example `x_i, y_i`\n",
|
|
"\n",
"You will use (x$^i$, y$^i$) to denote the $i^{th}$ training example. Since Python is zero-indexed, (x$^0$, y$^0$) is (1000, 200) and (x$^1$, y$^1$) is (2000, 400). \n",
|
|
"\n",
"Run the code cell below to get the $i^{th}$ training example from the Python lists."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 12,
|
|
"id": "executive-chick",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"(x^(0), y^(0)) = (1000, 200)\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"i = 0 # Change this to 1 to see (x^1, y^1)\n",
|
|
"\n",
|
|
"x_i = X[i]\n",
|
|
"y_i = y[i]\n",
|
|
"print(\"(x^(%d), y^(%d)) = (%d, %d)\" %(i, i, x_i, y_i))"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "solid-sharing",
|
|
"metadata": {},
|
|
"source": [
|
|
"### Plotting the data\n",
"First, let's run the cell below to import [matplotlib](http://matplotlib.org), a widely used library for plotting graphs in Python. "
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 13,
|
|
"id": "designing-sociology",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
|
|
"import matplotlib.pyplot as plt"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "virgin-germany",
|
|
"metadata": {},
|
|
"source": [
|
|
"You can plot these two points using the `scatter()` function in the `matplotlib` library, as shown in the cell below. \n",
|
|
"- The function arguments `marker` and `c` show the points as red crosses (the default is blue dots).\n",
|
|
"\n",
|
|
"You can also use other functions in the `matplotlib` library to display the title and labels for the axes."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 14,
|
|
"id": "reduced-cartoon",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"Text(0.5, 0, 'Size (feet^2)')"
|
|
]
|
|
},
|
|
"execution_count": 14,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
},
|
|
{
|
|
"data": {
|
|
"image/png": "\n",
|
|
"text/plain": [
|
|
"<Figure size 432x288 with 1 Axes>"
|
|
]
|
|
},
|
|
"metadata": {
|
|
"needs_background": "light"
|
|
},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
|
"# Plot the data points\n",
|
|
"plt.scatter(X, y, marker='x', c='r')\n",
|
|
"\n",
|
|
"# Set the title\n",
|
|
"plt.title(\"Housing Prices\")\n",
|
|
"# Set the y-axis label\n",
|
|
"plt.ylabel('Price (in 1000s of dollars)')\n",
|
|
"# Set the x-axis label\n",
|
|
"plt.xlabel('Size (feet^2)')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "level-nirvana",
|
|
"metadata": {},
|
|
"source": [
|
|
"## Model function\n",
|
|
"\n",
|
|
"The model function for linear regression (which is a function that maps from `X` to `y`) is represented as \n",
|
|
"\n",
|
|
"$f(x) = w_0 + w_1x$\n",
|
|
"\n",
|
|
"The formula above is how you can represent straight lines - different values of $w_0$ and $w_1$ give you different straight lines on the plot. Let's try to get a better intuition for this through the code blocks below.\n",
|
|
"\n",
"Let's represent $w$ as a list in Python, with $w_0$ as the first item in the list and $w_1$ as the second. \n",
|
|
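"\n",
"As a rough illustration of how different parameter lists describe different lines, the sketch below evaluates $f(x)$ at both house sizes for two parameter lists. The helper name `line_value` and the parameter values are assumptions made only for this sketch; they are not part of the lab.\n",
"\n",
"```python\n",
"def line_value(w, x):\n",
"    # f(x) = w_0 + w_1 * x, with w given as the list [w_0, w_1]\n",
"    return w[0] + w[1] * x\n",
"\n",
"sizes = [1000, 2000]  # the two house sizes from the dataset\n",
"for w_try in ([0, 1], [100, 0.5]):  # hypothetical parameter choices\n",
"    values = [line_value(w_try, x) for x in sizes]\n",
"    print('w =', w_try, '-> f(x) =', values)\n",
"```\n",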
"\n",
"Let's start with $w_0 = 1$ and $w_1 = 0.2$ \n",
|
|
"\n",
"### Note: You can come back to the code cell below to adjust the model's w0 and w1 parameters"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 29,
|
|
"id": "temporal-investor",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"name": "stdout",
|
|
"output_type": "stream",
|
|
"text": [
|
|
"w_0: 1\n",
|
|
"w_1: 0.2\n"
|
|
]
|
|
}
|
|
],
|
|
"source": [
|
|
"# You can come back here later to adjust w0 and w1\n",
|
|
"w = [1, 0.2] \n",
|
|
"print(\"w_0:\", w[0])\n",
|
|
"print(\"w_1:\", w[1])"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "capable-westminster",
|
|
"metadata": {},
|
|
"source": [
"Now, let's calculate the value of $f(x)$ for your two data points. You can write this out explicitly for each data point:\n",
|
|
"\n",
|
|
"for $x^0$, `f = w[0]+w[1]*X[0]`\n",
|
|
"\n",
|
|
"for $x^1$, `f = w[0]+w[1]*X[1]`\n",
|
|
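"\n",
"As a worked example, assuming the values `w = [1, 0.2]` from the code cell above, these two expressions evaluate to:\n",
"\n",
"$f(x^0) = 1 + 0.2 \\times 1000 = 201$\n",
"\n",
"$f(x^1) = 1 + 0.2 \\times 2000 = 401$\n",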
"\n",
"For a large number of data points, this becomes unwieldy and repetitive. Instead, you can calculate the function output in a `for` loop as follows:\n",
|
|
"\n",
|
|
"```\n",
|
|
"f = []\n",
|
|
"for i in range(len(X)):\n",
"    f_x = w[0] + w[1]*X[i]\n",
"    f.append(f_x)\n",
|
|
"```\n",
|
|
"\n",
|
|
"Paste the code shown above in the `calculate_model_output` function below."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 30,
|
|
"id": "tracked-bubble",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": [
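"def calculate_model_output(w, X):\n",
"    # Compute f(x) = w_0 + w_1 * x for every input in X\n",
"    ### START CODE HERE ### \n",
"    f = []\n",
"    for i in range(len(X)):\n",
"        f_x = w[0] + w[1]*X[i]\n",
"        f.append(f_x)\n",
"    ### END CODE HERE ###\n",
"    return f"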
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "blind-vocabulary",
|
|
"metadata": {},
|
|
"source": [
"Now let's call the `calculate_model_output` function and plot the output using the `plot` function from the `matplotlib` library."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": 31,
|
|
"id": "blocked-franklin",
|
|
"metadata": {},
|
|
"outputs": [
|
|
{
|
|
"data": {
|
|
"text/plain": [
|
|
"Text(0.5, 0, 'Size (feet^2)')"
|
|
]
|
|
},
|
|
"execution_count": 31,
|
|
"metadata": {},
|
|
"output_type": "execute_result"
|
|
},
|
|
{
|
|
"data": {
|
|
"image/png": "\n",
|
|
"text/plain": [
|
|
"<Figure size 432x288 with 1 Axes>"
|
|
]
|
|
},
|
|
"metadata": {
|
|
"needs_background": "light"
|
|
},
|
|
"output_type": "display_data"
|
|
}
|
|
],
|
|
"source": [
|
|
"f = calculate_model_output(w, X)\n",
|
|
"\n",
|
|
"# Plot our hypothesis\n",
|
|
"plt.plot(X, f, c='b')\n",
|
|
"\n",
|
|
"# Plot the data points\n",
|
|
"plt.scatter(X, y, marker='x', c='r')\n",
|
|
"\n",
|
|
"# Set the title\n",
|
|
"plt.title(\"Housing Prices\")\n",
|
|
"# Set the y-axis label\n",
|
|
"plt.ylabel('Price (in 1000s of dollars)')\n",
|
|
"# Set the x-axis label\n",
|
|
"plt.xlabel('Size (feet^2)')"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "dominican-panel",
|
|
"metadata": {},
|
|
"source": [
"As you can see, the initial values of $w_0$ and $w_1$ do not result in a line that fits our data. \n",
|
|
"\n",
|
|
"### Challenge\n",
|
|
"Try experimenting with different values of $w_0$ and $w_1$. What should the values be for getting a line that fits our data?\n",
|
|
"\n",
|
|
"#### Tip:\n",
"Click the triangle to the left of the green \"Hints\" label below to reveal some hints for choosing w0 and w1."
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "markdown",
|
|
"id": "respected-median",
|
|
"metadata": {},
|
|
"source": [
|
|
"<details>\n",
|
|
"<summary>\n",
" <font size='3' color='darkgreen'><b>Hints</b></font>\n",
|
|
"</summary>\n",
|
|
" <p>\n",
|
|
" <ul>\n",
|
|
" <li>Try w0 = 1 and w1 = 0.5, or w = [0, 0.2] </li>\n",
|
|
" </ul>\n",
" </p>\n",
"</details>"
|
|
]
|
|
},
|
|
{
|
|
"cell_type": "code",
|
|
"execution_count": null,
|
|
"id": "recreational-tennis",
|
|
"metadata": {},
|
|
"outputs": [],
|
|
"source": []
|
|
}
|
|
],
|
|
"metadata": {
|
|
"kernelspec": {
|
|
"display_name": "Python 3",
|
|
"language": "python",
|
|
"name": "python3"
|
|
},
|
|
"language_info": {
|
|
"codemirror_mode": {
|
|
"name": "ipython",
|
|
"version": 3
|
|
},
|
|
"file_extension": ".py",
|
|
"mimetype": "text/x-python",
|
|
"name": "python",
|
|
"nbconvert_exporter": "python",
|
|
"pygments_lexer": "ipython3",
|
|
"version": "3.9.1"
|
|
}
|
|
},
|
|
"nbformat": 4,
|
|
"nbformat_minor": 5
|
|
}
|