{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "zm7ji3Gcb7yz"
},
"source": [
"# Chapter 11 - Linear Regression: Development vs Culture\n",
"\n",
"This chapter we use visualizations to understand Linear Regression.\n",
"\n",
"We assume a linear relationship between variable X and Y in the form of:\n",
"\n",
" `y = 2 * x + 1`\n",
"\n",
" This means each time x increases by one unit, y increases by two units. This is a positive linear relationship and can be visualized as a straight upward line. We consider this equation represents the underlying theory about X and Y.\n",
"\n",
"However, in reality, when we measure X and Y (for example, a person's height and weight), the instruments we use may not have 100% precision, and our measuring and reading may not be 100% accurate. This inprecision and inaccuracy lead to erroroneous data collected. \n",
"\n",
"\n",
"\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "gBFexiv2RCWo"
},
"outputs": [],
"source": [
"import numpy as np\n",
"import plotly.graph_objects as go"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "LHLp9l03gjqv"
},
"outputs": [],
"source": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "K2o6dI5QdWaz"
},
"source": [
"# Generate Sample Data\n",
"First, let's use Numpy to generate some random samples for X and\n",
"calculate Y based on the X using the equation;\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "tvboUxVMUQZU",
"outputId": "dddc4ccb-0310-41a0-eead-a20fcde6429c"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[5 9 3 8 9 1 8 4 9 5]\n",
"[11 19 7 17 19 3 17 9 19 11]\n"
]
}
],
"source": [
"x= np.random.randint(low=1, high=10, size=100) # random integer between 1 and 10 (10 is not included)\n",
"\n",
"y = x * 2 + 1\n",
"\n",
"print(x[:10]) # print the first 10 x's\n",
"print(y[:10]) # print the last 10 y's"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "RI4vXarshITW"
},
"source": [
"## Visualize the sample data\n",
"\n",
"This shows a straight line since we did not account for any measurement errors."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "7H10sqvcUbX_",
"outputId": "42fdef27-9ad4-4d30-ab0b-e3ac0d502c38"
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"