# Ungraded Lab: Model Representation

In this ungraded lab, you will implement the model $f_w$ for linear regression with one variable.


## Problem Statement

You will use the motivating example of housing price prediction. There are two data points: a house of 1000 square feet that sold for \\$200,000 and a house of 2000 square feet that sold for \\$400,000.

Therefore, your dataset contains the following two points:

| Size (feet$^2$)     | Price (1000s of dollars) |
| -------------------| ------------------------ |
| 1000               | 200                      |
| 2000               | 400                      |

You would like to fit a linear regression model (represented by a straight line) through these two points, so that you can predict the price of other houses, for example a house with 1200 feet$^2$.

### Notation: `X` and `y`

For the next few labs, you will use Python lists to represent your dataset. As shown in the video:
- `X` represents input variables, also called input features (in this case - Size (feet$^2$)) and 
- `y` represents output variables, also known as target variables (in this case - Price (1000s of dollars)). 

Please run the following code cell to create your `X` and `y` variables.

In [None]:
# X is the input variable (size in square feet)
# y is the output variable (price in 1000s of dollars)
X = [1000, 2000] 
y = [200, 400]

### Number of training examples `m`
You will use `m` to denote the number of training examples. In Python, use the `len()` function to get the number of examples in a list.  You can get `m` by running the next code cell.

In [None]:
# m is the number of training examples
m = len(X)
print(f"Number of training examples is: {m}")

### Training example `x_i, y_i`

You will use (x$^i$, y$^i$) to denote the $i^{th}$ training example. Since Python is zero-indexed, (x$^0$, y$^0$) is (1000, 200) and (x$^1$, y$^1$) is (2000, 400). 

Run the next code cell to get the $i^{th}$ training example from the Python lists.

In [None]:
i = 0 # Change this to 1 to see (x^1, y^1)

x_i = X[i]
y_i = y[i]
print(f"(x^({i}), y^({i})) = ({x_i}, {y_i})")

### Plotting the data
First, run the cell below to import [matplotlib](http://matplotlib.org), a widely used library for plotting graphs in Python. 

In [None]:
import matplotlib.pyplot as plt

You can plot these two points using the `scatter()` function in the `matplotlib` library, as shown in the cell below. 
- The function arguments `marker` and `c` show the points as red crosses (the default is blue dots).

You can also use other functions in the `matplotlib` library to display the title and labels for the axes.

In [None]:
# Plot the data points
plt.scatter(X, y, marker='x', c='r')

# Set the title
plt.title("Housing Prices")
# Set the y-axis label
plt.ylabel('Price (in 1000s of dollars)')
# Set the x-axis label
plt.xlabel('Size (feet^2)')
plt.show()

## Model function

The model function for linear regression (which is a function that maps from `X` to `y`) is represented as 

$f(x) = w_0 + w_1x$

The formula above is how you can represent straight lines - different values of $w_0$ and $w_1$ give you different straight lines on the plot. Let's try to get a better intuition for this through the code blocks below.
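As a rough illustration of how the two parameters shape the line, the sketch below evaluates the formula for a few parameter pairs. The helper name `line_value` and the sample pairs are assumptions chosen for illustration; they are not part of the lab.

```python
# Hypothetical helper (not part of the lab): evaluate f(x) = w_0 + w_1 * x
def line_value(w_0, w_1, x):
    return w_0 + w_1 * x

# Illustrative (w_0, w_1) pairs, chosen arbitrarily
candidates = [(0, 1), (3, 1), (3, 2)]

for w_0, w_1 in candidates:
    # Value of the line at x = 0, where it crosses the vertical axis
    at_zero = line_value(w_0, w_1, 0)
    # Change in f when x grows by 1
    step = line_value(w_0, w_1, 1) - at_zero
    print(f"w_0={w_0}, w_1={w_1}: f(0)={at_zero}, change per unit x={step}")
```

In this sketch, $w_0$ shifts the line up or down, and $w_1$ sets how steeply it rises.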

Let's represent $w$ as a list in Python, with $w_0$ as the first item in the list and $w_1$ as the second. 

Let's start with $w_0 = 3$ and $w_1 = 1$ 

### Note: You can come back to this cell to adjust the model's w0 and w1 parameters

In [None]:
# You can come back here later to adjust w0 and w1
w = [3, 1] 
print("w_0:", w[0])
print("w_1:", w[1])

Now, let's calculate the value of $f(x)$ for your two data points. You can explicitly write this out for each data point as:

for $x^0$, `f = w[0]+w[1]*X[0]`

for $x^1$, `f = w[0]+w[1]*X[1]`
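Written out as a runnable sketch, using the starting values $w_0 = 3$ and $w_1 = 1$ from above (the names `f_0` and `f_1` are chosen here for illustration only):

```python
# Values taken from the earlier cells in this lab
X = [1000, 2000]
w = [3, 1]

# Evaluate the model once per data point, spelled out by hand
f_0 = w[0] + w[1] * X[0]   # 3 + 1 * 1000
f_1 = w[0] + w[1] * X[1]   # 3 + 1 * 2000

print(f_0, f_1)   # prints: 1003 2003
```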

For a large number of data points, this can get unwieldy and repetitive. Instead, you can calculate the function output in a `for` loop as follows:

```
f = []
for i in range(len(X)):
    f_x = w[0] + w[1]*X[i]
    f.append(f_x)
```

Paste the code shown above in the `calculate_model_output` function below.
Please recall that in Python, indentation is significant. Incorrect indentation may result in a Python error message.

In [None]:
def calculate_model_output(w, X):
    ### START CODE HERE ### 

    ### END CODE HERE ###
    return f

Now let's call the `calculate_model_output` function and plot the output using the `plot` function from the `matplotlib` library.

In [None]:
f = calculate_model_output(w, X)

# Plot our model prediction
plt.plot(X, f, c='b',label='Our Prediction')

# Plot the data points
plt.scatter(X, y, marker='x', c='r',label='Actual Values')

# Set the title
plt.title("Housing Prices")
# Set the y-axis label
plt.ylabel('Price (in 1000s of dollars)')
# Set the x-axis label
plt.xlabel('Size (feet^2)')
plt.legend()
plt.show()

As you can see, setting $w_0 = 3$ and $w_1 = 1$ does not result in a line that fits our data. 
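One way to see the mismatch numerically, rather than only on the plot, is to compare each prediction with the actual price. The sketch below is self-contained and does not rely on your `calculate_model_output`; the names `predictions` and `errors` are illustrative assumptions.

```python
X = [1000, 2000]   # sizes in square feet
y = [200, 400]     # prices in 1000s of dollars
w = [3, 1]         # starting parameters [w_0, w_1]

# Model output for every training example
predictions = [w[0] + w[1] * x for x in X]

# Difference between predicted and actual price for each example
errors = [p - actual for p, actual in zip(predictions, y)]

for x, p, actual, e in zip(X, predictions, y, errors):
    print(f"size={x}: predicted={p}, actual={actual}, difference={e}")
```

With these parameters the predictions are 1003 and 2003, which overshoot the actual prices of 200 and 400 by 803 and 1603 (in thousands of dollars).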

### Challenge
Try experimenting with different values of $w_0$ and $w_1$. What should the values be for getting a line that fits our data?

#### Tip:
You can use your mouse to click on the triangle to the left of the green "Hints" below to reveal some hints for choosing w0 and w1.

<details>
<summary>
    <font size='3' color='darkgreen'><b>Hints</b></font>
</summary>
    <p>
    <ul>
        <li>Try w0 = 1 and w1 = 0.5,  w = [1, 0.5] </li>
        <li>Try w0 = 0 and w1 = 0.2,  w = [0, 0.2] </li>
    </ul>
    </p>
</details>

### Prediction
Now that we have a model, we can use it to make our original prediction. Write an expression that predicts the price of a house with 1200 feet$^2$ and assign it to `cost_1200sqft`, the variable that the cell below prints. You can check your answer below.


In [None]:

print(f"{cost_1200sqft:.0f} thousand dollars")

<details>
<summary>
    <font size='3' color='darkgreen'><b>Answer</b></font>
</summary>

```
w = [0, 0.2]
cost_1200sqft = w[0] + w[1]*1200
```

240 thousand dollars

</details>
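As a cross-check on where `w = [0, 0.2]` comes from, a line through two points can be recovered from the slope between them. This is a minimal sketch for the two-point case only, not a general fitting method, and the variable names are assumptions chosen for illustration.

```python
X = [1000, 2000]   # sizes in square feet
y = [200, 400]     # prices in 1000s of dollars

# Slope: change in price divided by change in size
w_1 = (y[1] - y[0]) / (X[1] - X[0])
# Intercept: shift the line so that it passes through the first point
w_0 = y[0] - w_1 * X[0]

# Prediction for a house of 1200 square feet
cost_1200sqft = w_0 + w_1 * 1200
print(f"w_0 = {w_0:.1f}, w_1 = {w_1:.1f}")
print(f"{cost_1200sqft:.0f} thousand dollars")
```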