{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Linear Regression from a Predictive Perspective\n", "\n", "In this Laboratory, we will see an example of linear regression from a predictive-analysis perspective. We well see all the best practices to split the data and select the model with best performance. \n", "\n", "## California Housing Dataset\n", "We'll use the California Housing dataset provided by the `scikit-learn` library. Let us load the data as a dataframe and have a look at the data description:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ".. _california_housing_dataset:\n", "\n", "California Housing dataset\n", "--------------------------\n", "\n", "**Data Set Characteristics:**\n", "\n", " :Number of Instances: 20640\n", "\n", " :Number of Attributes: 8 numeric, predictive attributes and the target\n", "\n", " :Attribute Information:\n", " - MedInc median income in block group\n", " - HouseAge median house age in block group\n", " - AveRooms average number of rooms per household\n", " - AveBedrms average number of bedrooms per household\n", " - Population block group population\n", " - AveOccup average number of household members\n", " - Latitude block group latitude\n", " - Longitude block group longitude\n", "\n", " :Missing Attribute Values: None\n", "\n", "This dataset was obtained from the StatLib repository.\n", "https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html\n", "\n", "The target variable is the median house value for California districts,\n", "expressed in hundreds of thousands of dollars ($100,000).\n", "\n", "This dataset was derived from the 1990 U.S. census, using one row per census\n", "block group. A block group is the smallest geographical unit for which the U.S.\n", "Census Bureau publishes sample data (a block group typically has a population\n", "of 600 to 3,000 people).\n", "\n", "A household is a group of people residing within a home. Since the average\n", "number of rooms and bedrooms in this dataset are provided per household, these\n", "columns may take surprisingly large values for block groups with few households\n", "and many empty houses, such as vacation resorts.\n", "\n", "It can be downloaded/loaded using the\n", ":func:`sklearn.datasets.fetch_california_housing` function.\n", "\n", ".. topic:: References\n", "\n", " - Pace, R. Kelley and Ronald Barry, Sparse Spatial Autoregressions,\n", " Statistics and Probability Letters, 33 (1997) 291-297\n", "\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MedIncHouseAgeAveRoomsAveBedrmsPopulationAveOccupLatitudeLongitude
08.325241.06.9841271.023810322.02.55555637.88-122.23
18.301421.06.2381370.9718802401.02.10984237.86-122.22
27.257452.08.2881361.073446496.02.80226037.85-122.24
35.643152.05.8173521.073059558.02.54794537.85-122.25
43.846252.06.2818531.081081565.02.18146737.85-122.25
...........................
206351.560325.05.0454551.133333845.02.56060639.48-121.09
206362.556818.06.1140351.315789356.03.12280739.49-121.21
206371.700017.05.2055431.1200921007.02.32563539.43-121.22
206381.867218.05.3295131.171920741.02.12320939.43-121.32
206392.388616.05.2547171.1622641387.02.61698139.37-121.24
\n", "

20640 rows × 8 columns

\n", "
" ], "text/plain": [ " MedInc HouseAge AveRooms AveBedrms Population AveOccup Latitude \\\n", "0 8.3252 41.0 6.984127 1.023810 322.0 2.555556 37.88 \n", "1 8.3014 21.0 6.238137 0.971880 2401.0 2.109842 37.86 \n", "2 7.2574 52.0 8.288136 1.073446 496.0 2.802260 37.85 \n", "3 5.6431 52.0 5.817352 1.073059 558.0 2.547945 37.85 \n", "4 3.8462 52.0 6.281853 1.081081 565.0 2.181467 37.85 \n", "... ... ... ... ... ... ... ... \n", "20635 1.5603 25.0 5.045455 1.133333 845.0 2.560606 39.48 \n", "20636 2.5568 18.0 6.114035 1.315789 356.0 3.122807 39.49 \n", "20637 1.7000 17.0 5.205543 1.120092 1007.0 2.325635 39.43 \n", "20638 1.8672 18.0 5.329513 1.171920 741.0 2.123209 39.43 \n", "20639 2.3886 16.0 5.254717 1.162264 1387.0 2.616981 39.37 \n", "\n", " Longitude \n", "0 -122.23 \n", "1 -122.22 \n", "2 -122.24 \n", "3 -122.25 \n", "4 -122.25 \n", "... ... \n", "20635 -121.09 \n", "20636 -121.21 \n", "20637 -121.22 \n", "20638 -121.32 \n", "20639 -121.24 \n", "\n", "[20640 rows x 8 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from sklearn.datasets import fetch_california_housing\n", "data = fetch_california_housing(as_frame=True)\n", "print(data['DESCR'])\n", "data['data']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The dataset contains $8$ variables. The independent variable is `MedInc`, the average value of houses in a given suburb, while all other variables are independent. For our aims, we will treat the data as a matrix of numerical variables. We could easily convert the dataframe in this format, but scikit-learn allows to load the data directly in this format:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(20640, 8) (20640,)\n" ] } ], "source": [ "# let us load the data without passing as_frame=True\n", "data = fetch_california_housing()\n", "X = data.data # the features\n", "y = data.target # the targets\n", "\n", "print(X.shape, y.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Splitting\n", "We will split the dataset into a training, a validation and a test set using the `train_test_split` function:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(12384, 8) (4128, 8) (4128, 8)\n", "0.6 0.2 0.2\n" ] } ], "source": [ "from sklearn.model_selection import train_test_split\n", "# We'll do a 60:20:20 split\n", "val_prop = 0.2\n", "test_prop = 0.2\n", "\n", "# We'll split the data in two steps - first let's create a test set and a combined trainval set\n", "X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=test_prop, random_state=42)\n", "\n", "# We'll now split the combined trainval into train and val set with the chosen proportions\n", "X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=test_prop/(1-test_prop), random_state=42)\n", "\n", "# Let us check shapes and proportions\n", "print(X_train.shape, X_val.shape, X_test.shape)\n", "print(X_train.shape[0]/X.shape[0], X_val.shape[0]/X.shape[0], X_test.shape[0]/X.shape[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `train_test_split` function will split the data randomly. We are passing a fixed `random_state` to be able to replicate the results, but, in general, we should avoid that if we want the split to be truly random (though it is common to use random seeds for splitting in research). 
Note that, while the split is random, the function makes sure that the i-th element of the y variable corresponds to the i-th element of the X variable after the split.\n", "\n", "We will now reason mainly on the validation set, comparing different models and parameter configurations. Once we are done with our explorations, we'll check the final results on the test set.\n", "\n", "### Data Normalization\n", "We'll start by normalizing the data with z-scoring. This will prove useful later when we use certain algorithms (e.g., regularization). Note that we have not normalized data before because we need to **make sure that even mean and standard deviation parameters are not computed on the validation or test set**. While this may seem a trivial detail, it is important to follow this rule as strictly as possible to avoid bias. We can normalize the data with the `StandardScaler` object:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.preprocessing import StandardScaler\n", "scaler = StandardScaler()\n", "scaler.fit(X_train) # tunes the internal parameters of the standard scaler\n", "\n", "X_train = scaler.transform(X_train) # does not tune the parameters anymore\n", "X_val = scaler.transform(X_val)\n", "X_test = scaler.transform(X_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Scikit-learn objects have a unified object-oriented interface. Each algorithm is an object (e.g., `StandardScaler`) with standard methods, such as:\n", " * A `fit` method to tune the internal parameters of the algorithm. In this case, it is a vector of means and a vector of standard deviations, but in the case of a linear regression it will be a vector of weights;\n", " * A `transform` method to transform the data. Note that in this stage no parameters are tuned, so we can safely apply this method to validation and test data. This method only applies to objects which transform the data, such as the standard scaler;\n", " * A `predict` method to obtain predictions. This applies only to predictive models, such as a linear regressor;\n", " * A `score` method to obtain a standard performance measure on the test or validation data. Also this only applies to predictive models.\n", "\n", "We will see examples of the last two methods later." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Linear Regressor\n", "\n", "We will start by training a linear regressor. We will use scikit-learn's implementation which does not provide statistical details (e.g., p-values) but is optimized for predictive modeling. 
The train/test interface is the same as above:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[ 0.86025287 0.1200073 -0.28039183 0.31208687 -0.00957447 -0.02615781\n", " -0.88821331 -0.86190739]\n", "2.0680774192504314\n" ] } ], "source": [ "from sklearn.linear_model import LinearRegression\n", "\n", "linear_regressor = LinearRegression()\n", "linear_regressor.fit(X_train, y_train) # this tunes the internal parameters of the model\n", "\n", "# Let us print the model's parameters\n", "print(linear_regressor.coef_) \n", "print(linear_regressor.intercept_)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can obtain predictions on the validation set using the `predict` method:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(4128,)\n" ] } ], "source": [ "y_val_pred = linear_regressor.predict(X_val)\n", "print(y_val_pred.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The function returns a vector of $4128$ predictions, one for each example in the validation set. We can now evaluate the predictions using regression evaluation measures. We will use the standard implementation of the main evaluation measures as provided by scikit-learn:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.533334644741504 0.5297481095803488 0.727837969317587\n" ] } ], "source": [ "from sklearn.metrics import mean_absolute_error\n", "from sklearn.metrics import mean_squared_error\n", "\n", "mae = mean_absolute_error(y_val, y_val_pred)\n", "mse = mean_squared_error(y_val, y_val_pred)\n", "rmse = mean_squared_error(y_val, y_val_pred, squared=False)\n", "\n", "print(mae, mse, rmse)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All evaluation measures in scikit-learn follow the `evaluation_measure(y_true, y_pred)` convention. Note that the target variable `MedInc` is measured in tens of thousands of dollars, so an MAE of about $0.5$ corresponds to an average error of about $5000$ dollars. This is not that bad if we consider the mean and standard deviation of targets:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(2.068077419250646, 1.1509151433486544)" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "y_train.mean(), y_train.std()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Each predictor in scikit-learn also provides a `score` method which takes as input the validation (or test) inputs and outputs and computes some standard evaluation measures. By default the linear regressor in scikit-learn returns the $R^2$ value:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.6142000785497264" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "linear_regressor.score(X_val, y_val)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While we are mainly interested in the performance of the model on the validation set (and ultimately on those on the test set), it is still useful to assess the performance on the training set for model diagnostics. 
For instance, if we see a big discrepancy between training and validation errors, then we can imagine that some overfitting is going on:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.5266487515751342 0.5143795055231386 0.7172025554354492\n" ] } ], "source": [ "y_train_pred = linear_regressor.predict(X_train)\n", "\n", "mae_train = mean_absolute_error(y_train, y_train_pred)\n", "mse_train = mean_squared_error(y_train, y_train_pred)\n", "rmse_train = mean_squared_error(y_train, y_train_pred, squared=False)\n", "\n", "print(mae_train, mse_train, rmse_train)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see that, while there are some differences between training and test performance, those are minor, so we can deduce that there is no significant overfitting going on.\n", "\n", "> We should note that we should **always expect a certain degree of overfitting, depending on the task, the data and the model**. When the difference between train and test error is large, and hence there is significant overfitting, we can try to reduce this effect with regularization techniques.\n", "\n", "To better compare models, we will now store the results of our analyses in a dataframe:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MethodParametersMAEMSERMSE
0Linear Regressor0.5333350.5297480.727838
\n", "
" ], "text/plain": [ " Method Parameters MAE MSE RMSE\n", "0 Linear Regressor 0.533335 0.529748 0.727838" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pandas as pd\n", "california_housing_val_results = pd.DataFrame({\n", " 'Method': ['Linear Regressor'],\n", " 'Parameters': [''],\n", " 'MAE': [mae],\n", " 'MSE': [mse],\n", " 'RMSE': [rmse]\n", "})\n", "\n", "california_housing_val_results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is common to use the word \"method\" to refer to a predictive algorithm or pipeline." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Non-Linear Regression\n", "Let us now try to fit a non-linear regressor. We will use polynomial regression with different polynomial degrees. To do so, we will perform an explicit polynomial expansion of the features using the `PolynomialFeatures` object. For convenience, we will define a function performing training and validation and returning both training and validation performance:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.preprocessing import PolynomialFeatures\n", "def trainval_polynomial(degree):\n", " pf = PolynomialFeatures(degree)\n", " # While the model does not have any learnable parameters, the \"fit\" method here is used to compute the output number of features\n", " pf.fit(X_train)\n", " X_train_poly = pf.transform(X_train)\n", " X_val_poly = pf.transform(X_val)\n", "\n", " polyreg = LinearRegression() # a Polynomial regressor is simply a linear regressor using polynomial features\n", " polyreg.fit(X_train_poly, y_train)\n", "\n", " y_poly_train_pred = polyreg.predict(X_train_poly)\n", " y_poly_val_pred = polyreg.predict(X_val_poly)\n", "\n", " mae_train = mean_absolute_error(y_train, y_poly_train_pred)\n", " mse_train = mean_squared_error(y_train, y_poly_train_pred)\n", " rmse_train = mean_squared_error(y_train, y_poly_train_pred, squared=False)\n", "\n", " mae_val = mean_absolute_error(y_val, y_poly_val_pred)\n", " mse_val = mean_squared_error(y_val, y_poly_val_pred)\n", " rmse_val = mean_squared_error(y_val, y_poly_val_pred, squared=False)\n", "\n", " return mae_train, mse_train, rmse_train, mae_val, mse_val, rmse_val" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us now see what happens with different degrees:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DEGREE: 1 \n", " MAE MSE RMSE\n", "TRAIN 0.53 0.51 0.72 \n", "VAL 0.53 0.53 0.73\n", "\n", "\n", "DEGREE: 2 \n", " MAE MSE RMSE\n", "TRAIN 0.46 0.42 0.65 \n", "VAL 0.48 0.91 0.95\n", "\n", "\n", "DEGREE: 3 \n", " MAE MSE RMSE\n", "TRAIN 0.42 0.34 0.58 \n", "VAL 23.48 2157650.15 1468.89\n", "\n", "\n" ] } ], "source": [ "for d in range(1,4):\n", " print(\"DEGREE: {} \\n {:>8s} {:>8s} {:>8s}\\nTRAIN {:8.2f} {:8.2f} {:8.2f} \\nVAL {:8.2f} {:8.2f} {:8.2f}\\n\\n\".format(d,\"MAE\", \"MSE\", \"RMSE\", *trainval_polynomial(d)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Ridge Regularization\n", "We can see that, as the polynomial gets larger, the effect of overfitting increases. We can try to reduce this effect with Ridge or Lasso regularization. We'll focus on degree $2$ and try to apply ridge regression to it. Since Ridge regression relies on a parameter, we will try some values of the regularization parameter $\\alpha$ (as it is called by sklearn). 
Let us define a function for convenience:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.linear_model import Ridge\n", "def trainval_polynomial_ridge(degree, alpha):\n", " pf = PolynomialFeatures(degree)\n", " # While the model does not have any learnable parameters, the \"fit\" method here is used to compute the output number of features\n", " pf.fit(X_train)\n", " X_train_poly = pf.transform(X_train)\n", " X_val_poly = pf.transform(X_val)\n", "\n", " polyreg = Ridge(alpha=alpha) # a Polynomial regressor is simply a linear regressor using polynomial features\n", " polyreg.fit(X_train_poly, y_train)\n", "\n", " y_poly_train_pred = polyreg.predict(X_train_poly)\n", " y_poly_val_pred = polyreg.predict(X_val_poly)\n", "\n", " mae_train = mean_absolute_error(y_train, y_poly_train_pred)\n", " mse_train = mean_squared_error(y_train, y_poly_train_pred)\n", " rmse_train = mean_squared_error(y_train, y_poly_train_pred, squared=False)\n", "\n", " mae_val = mean_absolute_error(y_val, y_poly_val_pred)\n", " mse_val = mean_squared_error(y_val, y_poly_val_pred)\n", " rmse_val = mean_squared_error(y_val, y_poly_val_pred, squared=False)\n", "\n", " return mae_train, mse_train, rmse_train, mae_val, mse_val, rmse_val" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us now see the results for different values of $\\alpha$. $\\alpha=0$ means no regularization:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "RIDGE, DEGREE: 2\n", "Alpha: 0.00 \n", " MAE MSE RMSE\n", "TRAIN 0.46 0.42 0.65 \n", "VAL 0.48 0.91 0.96\n", "\n", "\n", "Alpha: 100.00 \n", " MAE MSE RMSE\n", "TRAIN 0.47 0.43 0.66 \n", "VAL 0.48 0.54 0.74\n", "\n", "\n", "Alpha: 200.00 \n", " MAE MSE RMSE\n", "TRAIN 0.48 0.44 0.67 \n", "VAL 0.49 0.51 0.72\n", "\n", "\n", "Alpha: 300.00 \n", " MAE MSE RMSE\n", "TRAIN 0.49 0.46 0.68 \n", "VAL 0.50 0.50 0.71\n", "\n", "\n", "Alpha: 400.00 \n", " MAE MSE RMSE\n", "TRAIN 0.50 0.47 0.68 \n", "VAL 0.51 0.51 0.71\n", "\n", "\n" ] } ], "source": [ "print(\"RIDGE, DEGREE: 2\")\n", "for alpha in [0,100,200,300,400]:\n", " print(\"Alpha: {:0.2f} \\n {:>8s} {:>8s} {:>8s}\\nTRAIN {:8.2f} {:8.2f} {:8.2f} \\nVAL {:8.2f} {:8.2f} {:8.2f}\\n\\n\".format(alpha,\"MAE\", \"MSE\", \"RMSE\", *trainval_polynomial_ridge(2,alpha)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see how, as alpha increases, the error on the training set increases, while the error on the test set decreases. For $\\alpha=300$ we obtained a slightly better result than our linear regressor: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MethodParametersMAEMSERMSE
0Linear Regressor0.5333350.5297480.727838
\n", "
" ], "text/plain": [ " Method Parameters MAE MSE RMSE\n", "0 Linear Regressor 0.533335 0.529748 0.727838" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "california_housing_val_results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us see if we can improve the results with a polynomial of degree 3:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DEGREE: 3\n", "Alpha: 0.00 \n", " MAE MSE RMSE\n", "TRAIN 0.42 0.34 0.58 \n", "VAL 23.50 2162209.37 1470.45\n", "\n", "\n", "Alpha: 1.00 \n", " MAE MSE RMSE\n", "TRAIN 0.42 0.34 0.58 \n", "VAL 15.59 934580.07 966.74\n", "\n", "\n", "Alpha: 10.00 \n", " MAE MSE RMSE\n", "TRAIN 0.42 0.34 0.59 \n", "VAL 1.57 4867.65 69.77\n", "\n", "\n", "Alpha: 20.00 \n", " MAE MSE RMSE\n", "TRAIN 0.42 0.35 0.59 \n", "VAL 1.78 6690.78 81.80\n", "\n", "\n" ] } ], "source": [ "print(\"RIDGE, DEGREE: 3\")\n", "for alpha in [0,1,10,20]:\n", " print(\"Alpha: {:0.2f} \\n {:>8s} {:>8s} {:>8s}\\nTRAIN {:8.2f} {:8.2f} {:8.2f} \\nVAL {:8.2f} {:8.2f} {:8.2f}\\n\\n\".format(alpha,\"MAE\", \"MSE\", \"RMSE\", *trainval_polynomial_ridge(3,alpha)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us add the results of Polynomial regression of degree 2 with and without regularization:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MethodParametersMAEMSERMSE
0Linear Regressor0.5333350.5297480.727838
1Polynomial Regressordegree=20.4804480.9129760.955498
2Polynomial Ridge Regressordegree=2, alpha=3000.4992280.5041550.710039
\n", "
" ], "text/plain": [ " Method Parameters MAE MSE \\\n", "0 Linear Regressor 0.533335 0.529748 \n", "1 Polynomial Regressor degree=2 0.480448 0.912976 \n", "2 Polynomial Ridge Regressor degree=2, alpha=300 0.499228 0.504155 \n", "\n", " RMSE \n", "0 0.727838 \n", "1 0.955498 \n", "2 0.710039 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "poly2 = trainval_polynomial_ridge(2,0)\n", "poly2_ridge300 = trainval_polynomial_ridge(2,300)\n", "california_housing_val_results = pd.concat([\n", " california_housing_val_results,\n", " pd.DataFrame({'Method':'Polynomial Regressor', 'Parameters': 'degree=2', 'MAE':poly2[-3], 'MSE':poly2[-2], 'RMSE':poly2[-1]}, index=[1]),\n", " pd.DataFrame({'Method':'Polynomial Ridge Regressor', 'Parameters': 'degree=2, alpha=300', 'MAE':poly2_ridge300[-3], 'MSE':poly2_ridge300[-2], 'RMSE':poly2_ridge300[-1]}, index=[2])\n", "])\n", "california_housing_val_results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lasso Regression\n", "Let us now try the same with Lasso regression:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.linear_model import Lasso\n", "def trainval_polynomial_lasso(degree, alpha):\n", " pf = PolynomialFeatures(degree)\n", " # While the model does not have any learnable parameters, the \"fit\" method here is used to compute the output number of features\n", " pf.fit(X_train)\n", " X_train_poly = pf.transform(X_train)\n", " X_val_poly = pf.transform(X_val)\n", "\n", " polyreg = Lasso(alpha=alpha) # a Polynomial regressor is simply a linear regressor using polynomial features\n", " polyreg.fit(X_train_poly, y_train)\n", "\n", " y_poly_train_pred = polyreg.predict(X_train_poly)\n", " y_poly_val_pred = polyreg.predict(X_val_poly)\n", "\n", " mae_train = mean_absolute_error(y_train, y_poly_train_pred)\n", " mse_train = mean_squared_error(y_train, y_poly_train_pred)\n", " rmse_train = mean_squared_error(y_train, y_poly_train_pred, squared=False)\n", "\n", " mae_val = mean_absolute_error(y_val, y_poly_val_pred)\n", " mse_val = mean_squared_error(y_val, y_poly_val_pred)\n", " rmse_val = mean_squared_error(y_val, y_poly_val_pred, squared=False)\n", "\n", " return mae_train, mse_train, rmse_train, mae_val, mse_val, rmse_val" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "LSSO, DEGREE: 2\n", "Alpha: 0.02 \n", " MAE MSE RMSE\n", "TRAIN 0.52 0.51 0.71 \n", "VAL 0.54 1.19 1.09\n", "\n", "\n", "Alpha: 0.03 \n", " MAE MSE RMSE\n", "TRAIN 0.55 0.55 0.74 \n", "VAL 0.55 0.59 0.77\n", "\n", "\n", "Alpha: 0.04 \n", " MAE MSE RMSE\n", "TRAIN 0.56 0.57 0.76 \n", "VAL 0.57 0.59 0.77\n", "\n", "\n", "Alpha: 0.05 \n", " MAE MSE RMSE\n", "TRAIN 0.58 0.60 0.78 \n", "VAL 0.58 0.61 0.78\n", "\n", "\n", "Alpha: 0.06 \n", " MAE MSE RMSE\n", "TRAIN 0.60 0.63 0.80 \n", "VAL 0.60 0.64 0.80\n", "\n", "\n" ] } ], "source": [ "print(\"LSSO, DEGREE: 2\")\n", "for alpha in [0.02,0.03,0.04,0.05, 0.06]:\n", " print(\"Alpha: {:0.2f} \\n {:>8s} {:>8s} {:>8s}\\nTRAIN {:8.2f} {:8.2f} {:8.2f} \\nVAL {:8.2f} {:8.2f} {:8.2f}\\n\\n\".format(alpha,\"MAE\", \"MSE\", \"RMSE\", *trainval_polynomial_lasso(2,alpha)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lasso regression does not seem to improve results. Let us put the results obtained for $\\alpha=0.04$ to the dataframe:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MethodParametersMAEMSERMSE
0Linear Regressor0.5333350.5297480.727838
1Polynomial Regressordegree=20.4804480.9129760.955498
2Polynomial Ridge Regressordegree=2, alpha=3000.4992280.5041550.710039
3Polynomial Lasso Regressordegree=2, alpha=0.040.5673180.5901000.768180
\n", "
" ], "text/plain": [ " Method Parameters MAE MSE \\\n", "0 Linear Regressor 0.533335 0.529748 \n", "1 Polynomial Regressor degree=2 0.480448 0.912976 \n", "2 Polynomial Ridge Regressor degree=2, alpha=300 0.499228 0.504155 \n", "3 Polynomial Lasso Regressor degree=2, alpha=0.04 0.567318 0.590100 \n", "\n", " RMSE \n", "0 0.727838 \n", "1 0.955498 \n", "2 0.710039 \n", "3 0.768180 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "poly2_lasso004 = trainval_polynomial_lasso(2,0.04)\n", "california_housing_val_results = pd.concat([\n", " california_housing_val_results,\n", " pd.DataFrame({'Method':'Polynomial Lasso Regressor', 'Parameters': 'degree=2, alpha=0.04', 'MAE':poly2_lasso004[-3], 'MSE':poly2_lasso004[-2], 'RMSE':poly2_lasso004[-1]}, index=[3])\n", "])\n", "california_housing_val_results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Grid Search\n", "\n", "Polynomial regression and ridge regression have parameters to optimize. We have so far optimized them manually. However, in practice, it is common to perform a grid search. This consists in defining a grid of possible values to try and train/validate many models, to finally choose the one with best performance.\n", "\n", "This can be done manually as shown in the following example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Evaluating a=200 d=0 MSE=1.37\n", "Evaluating a=200 d=1 MSE=0.53\n", "Evaluating a=200 d=2 MSE=0.51\n", "Evaluating a=200 d=3 MSE=8161.95\n", "Evaluating a=200 d=4 MSE=10884172.40\n", "Evaluating a=225 d=0 MSE=1.37\n", "Evaluating a=225 d=1 MSE=0.53\n", "Evaluating a=225 d=2 MSE=0.51\n", "Evaluating a=225 d=3 MSE=6710.18\n", "Evaluating a=225 d=4 MSE=7243991.68\n", "Evaluating a=250 d=0 MSE=1.37\n", "Evaluating a=250 d=1 MSE=0.53\n", "Evaluating a=250 d=2 MSE=0.51\n", "Evaluating a=250 d=3 MSE=5542.92\n", "Evaluating a=250 d=4 MSE=4880680.92\n", "Evaluating a=275 d=0 MSE=1.37\n", "Evaluating a=275 d=1 MSE=0.53\n", "Evaluating a=275 d=2 MSE=0.50\n", "Evaluating a=275 d=3 MSE=4598.99\n", "Evaluating a=275 d=4 MSE=3312886.74\n", "Evaluating a=300 d=0 MSE=1.37\n", "Evaluating a=300 d=1 MSE=0.54\n", "Evaluating a=300 d=2 MSE=0.50\n", "Evaluating a=300 d=3 MSE=3831.00\n", "Evaluating a=300 d=4 MSE=2255901.13\n", "Evaluating a=325 d=0 MSE=1.37\n", "Evaluating a=325 d=1 MSE=0.54\n", "Evaluating a=325 d=2 MSE=0.50\n", "Evaluating a=325 d=3 MSE=3202.40\n", "Evaluating a=325 d=4 MSE=1534556.72\n", "Evaluating a=350 d=0 MSE=1.37\n", "Evaluating a=350 d=1 MSE=0.54\n", "Evaluating a=350 d=2 MSE=0.51\n", "Evaluating a=350 d=3 MSE=2684.99\n", "Evaluating a=350 d=4 MSE=1038222.45\n", "Evaluating a=375 d=0 MSE=1.37\n", "Evaluating a=375 d=1 MSE=0.54\n", "Evaluating a=375 d=2 MSE=0.51\n", "Evaluating a=375 d=3 MSE=2256.89\n", "Evaluating a=375 d=4 MSE=695227.56\n" ] }, { "data": { "text/plain": [ "(0.504155356305115, 300, 2)" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "def grid_search(alphas=range(200,400,25), degrees=range(5)):\n", " best_mse = np.inf\n", " for a in alphas:\n", " for d in degrees:\n", " print(f\"Evaluating a={a} d={d} MSE=\", end='')\n", " results = trainval_polynomial_ridge(d,a)\n", " mse = results[-2]\n", " print(f\"{mse:0.2f}\")\n", " if mse#sk-container-id-2 {color: black;}#sk-container-id-2 pre{padding: 0;}#sk-container-id-2 div.sk-toggleable {background-color: white;}#sk-container-id-2 label.sk-toggleable__label {cursor: pointer;display: block;width: 
100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-2 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-2 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-2 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-2 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-2 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-2 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-2 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-2 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-2 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-2 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-2 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-2 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-2 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-2 div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-2 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-2 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-2 div.sk-item {position: relative;z-index: 1;}#sk-container-id-2 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-2 div.sk-item::before, #sk-container-id-2 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-2 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-2 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-2 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-2 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-2 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-2 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-2 div.sk-label-container {text-align: center;}#sk-container-id-2 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on 
the sphinx rendered scikit-learn.org. See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-2 div.sk-text-repr-fallback {display: none;}
Pipeline(steps=[('scaler', StandardScaler()),\n",
       "                ('polynomial_expansion', PolynomialFeatures()),\n",
       "                ('ridge_regression', Ridge(alpha=300))])
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
" ], "text/plain": [ "Pipeline(steps=[('scaler', StandardScaler()),\n", " ('polynomial_expansion', PolynomialFeatures()),\n", " ('ridge_regression', Ridge(alpha=300))])" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# We use the notation \"object__parameter\" to identify parameter names\n", "polynomial_regressor.set_params(polynomial_expansion__degree=2, ridge_regression__alpha=300)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now fit and test the model as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0.4992282794518701, 0.504155356305115, 0.7100389822433096)" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "polynomial_regressor.fit(X_train, y_train)\n", "y_val_pred = polynomial_regressor.predict(X_val)\n", "\n", "mean_absolute_error(y_val, y_val_pred), mean_squared_error(y_val, y_val_pred), mean_squared_error(y_val, y_val_pred, squared=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Grid Search with Cross Validation\n", "Scikit-learn offers a powerful interface to perform grid search with cross validation. In this case, rather than using a fixed training set, a K-Fold validation is performed for each parameter choice in order to find the best performing parameter combination. This is convenient when a validation set is not available. We will combine this approach with the pipelines to easily automate the search of optimal parameters:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from sklearn.model_selection import GridSearchCV\n", "from sklearn.metrics import make_scorer\n", "\n", "gs = GridSearchCV(polynomial_regressor, param_grid={'polynomial_expansion__degree':range(0,5), 'ridge_regression__alpha':range(200,400,25)}, scoring=make_scorer(mean_squared_error,greater_is_better=False))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will now fit the model on the union of training and validation set:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
GridSearchCV(estimator=Pipeline(steps=[('scaler', StandardScaler()),\n",
       "                                       ('polynomial_expansion',\n",
       "                                        PolynomialFeatures(degree=0)),\n",
       "                                       ('ridge_regression', Ridge(alpha=200))]),\n",
       "             param_grid={'polynomial_expansion__degree': range(0, 5),\n",
       "                         'ridge_regression__alpha': range(200, 400, 25)},\n",
       "             scoring=make_scorer(mean_squared_error, greater_is_better=False))
In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.
" ], "text/plain": [ "GridSearchCV(estimator=Pipeline(steps=[('scaler', StandardScaler()),\n", " ('polynomial_expansion',\n", " PolynomialFeatures(degree=0)),\n", " ('ridge_regression', Ridge(alpha=200))]),\n", " param_grid={'polynomial_expansion__degree': range(0, 5),\n", " 'ridge_regression__alpha': range(200, 400, 25)},\n", " scoring=make_scorer(mean_squared_error, greater_is_better=False))" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "gs.fit(X_trainval, y_trainval)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us check the best parameters:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'polynomial_expansion__degree': 2, 'ridge_regression__alpha': 250}" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "gs.best_params_" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These are similar to the ones found with our previous grid search. We can now fit the final model on the training set and evaluate on the validation set:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.49556977845052963 0.506286626421441 0.711538211497767\n" ] } ], "source": [ "polynomial_regressor.set_params(**gs.best_params_)\n", "polynomial_regressor.fit(X_train, y_train)\n", "y_val_pred = polynomial_regressor.predict(X_val)\n", "mae, mse, rmse = mean_absolute_error(y_val, y_val_pred), mean_squared_error(y_val, y_val_pred), mean_squared_error(y_val, y_val_pred, squared=False)\n", "\n", "print(mae, mse, rmse)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us add this result to our dataframe:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MethodParametersMAEMSERMSE
0Linear Regressor0.5333350.5297480.727838
1Polynomial Regressordegree=20.4804480.9129760.955498
2Polynomial Ridge Regressordegree=2, alpha=3000.4992280.5041550.710039
3Polynomial Lasso Regressordegree=2, alpha=0.040.5673180.5901000.768180
4Cross-Validated Polynomial Ridge Regressordegree=2, alpha=2500.4955700.5062870.711538
\n", "
" ], "text/plain": [ " Method Parameters MAE \\\n", "0 Linear Regressor 0.533335 \n", "1 Polynomial Regressor degree=2 0.480448 \n", "2 Polynomial Ridge Regressor degree=2, alpha=300 0.499228 \n", "3 Polynomial Lasso Regressor degree=2, alpha=0.04 0.567318 \n", "4 Cross-Validated Polynomial Ridge Regressor degree=2, alpha=250 0.495570 \n", "\n", " MSE RMSE \n", "0 0.529748 0.727838 \n", "1 0.912976 0.955498 \n", "2 0.504155 0.710039 \n", "3 0.590100 0.768180 \n", "4 0.506287 0.711538 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "california_housing_val_results = pd.concat([\n", " california_housing_val_results,\n", " pd.DataFrame({'Method':'Cross-Validated Polynomial Ridge Regressor', 'Parameters': 'degree=2, alpha=250', 'MAE':mae, 'MSE':mse, 'RMSE':rmse}, index=[4])\n", "])\n", "california_housing_val_results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Other Regression Algorithms\n", "\n", "Thanks to the unified interface of scikit-learn objects, we can easily train other algorithms even if we do not know how they work inside. Of course, to be able to optimize them in the most complex situations we will need to know how they work internally. The following code shows how to train a neural network (we will not see the algorithm formally):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/furnari/opt/anaconda3/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:691: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (200) reached and the optimization hasn't converged yet.\n", " warnings.warn(\n" ] }, { "data": { "text/plain": [ "(0.3802246120708036, 0.3067554686836898, 0.5538550971903119)" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from sklearn.neural_network import MLPRegressor\n", "\n", "mlp_regressor = Pipeline([\n", " ('scaler', StandardScaler()),\n", " ('mlp_regression', MLPRegressor())\n", "])\n", "\n", "mlp_regressor.fit(X_train,y_train)\n", "y_val_pred = mlp_regressor.predict(X_val)\n", "mae, mse, rmse = mean_absolute_error(y_val, y_val_pred), mean_squared_error(y_val, y_val_pred), mean_squared_error(y_val, y_val_pred, squared=False)\n", "mae, mse, rmse" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MethodParametersMAEMSERMSE
0Linear Regressor0.5333350.5297480.727838
1Polynomial Regressordegree=20.4804480.9129760.955498
2Polynomial Ridge Regressordegree=2, alpha=3000.4992280.5041550.710039
3Polynomial Lasso Regressordegree=2, alpha=0.040.5673180.5901000.768180
4Cross-Validated Polynomial Ridge Regressordegree=2, alpha=2500.4955700.5062870.711538
4Neural Network0.3802250.3067550.553855
\n", "
" ], "text/plain": [ " Method Parameters MAE \\\n", "0 Linear Regressor 0.533335 \n", "1 Polynomial Regressor degree=2 0.480448 \n", "2 Polynomial Ridge Regressor degree=2, alpha=300 0.499228 \n", "3 Polynomial Lasso Regressor degree=2, alpha=0.04 0.567318 \n", "4 Cross-Validated Polynomial Ridge Regressor degree=2, alpha=250 0.495570 \n", "4 Neural Network 0.380225 \n", "\n", " MSE RMSE \n", "0 0.529748 0.727838 \n", "1 0.912976 0.955498 \n", "2 0.504155 0.710039 \n", "3 0.590100 0.768180 \n", "4 0.506287 0.711538 \n", "4 0.306755 0.553855 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "california_housing_val_results = pd.concat([\n", " california_housing_val_results,\n", " pd.DataFrame({'Method':'Neural Network', 'Parameters': '', 'MAE':mae, 'MSE':mse, 'RMSE':rmse}, index=[4])\n", "])\n", "california_housing_val_results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Comparison and Model Selection\n", "We can now compare the performance of the different models using the table:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
MethodParametersMAEMSERMSE
0Linear Regressor0.5333350.5297480.727838
1Polynomial Regressordegree=20.4804480.9129760.955498
2Polynomial Ridge Regressordegree=2, alpha=3000.4992280.5041550.710039
3Polynomial Lasso Regressordegree=2, alpha=0.040.5673180.5901000.768180
4Cross-Validated Polynomial Ridge Regressordegree=2, alpha=2500.4955700.5062870.711538
4Neural Network0.3802250.3067550.553855
\n", "
" ], "text/plain": [ " Method Parameters MAE \\\n", "0 Linear Regressor 0.533335 \n", "1 Polynomial Regressor degree=2 0.480448 \n", "2 Polynomial Ridge Regressor degree=2, alpha=300 0.499228 \n", "3 Polynomial Lasso Regressor degree=2, alpha=0.04 0.567318 \n", "4 Cross-Validated Polynomial Ridge Regressor degree=2, alpha=250 0.495570 \n", "4 Neural Network 0.380225 \n", "\n", " MSE RMSE \n", "0 0.529748 0.727838 \n", "1 0.912976 0.955498 \n", "2 0.504155 0.710039 \n", "3 0.590100 0.768180 \n", "4 0.506287 0.711538 \n", "4 0.306755 0.553855 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "california_housing_val_results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, we can visualize the results graphically:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAlMAAAD4CAYAAADIBWPsAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAA3MElEQVR4nO3deZzVZd3/8dc7QEFRvBXzVpFATVyAcNdfWoNL7ku5pylgqdyJ5lrdGeJyl+aWS95l3ilaopmmiaaYekS9xR0ZDOnWGLdMc0MHAQU+vz+ua2YO4yxnOMycA7yfj8c85pzvcl2f72fOzHzmuq7vGUUEZmZmZrZkPlfpAMzMzMyWZS6mzMzMzMrgYsrMzMysDC6mzMzMzMrgYsrMzMysDN0rHYCZdY411lgjNt5440qHURXmzJnDqquuWukwqoJzkTgPTZyLJnPmzOHFF198JyLW7sh5LqbMllPrrLMOTz/9dKXDqAqFQoGamppKh1EVnIvEeWjiXDQpFAoMHz78lY6e52k+MzMzszK4mDIzMzMrg4spMzMzszJ4zZSZmdkK6tNPP6V3797MmDGj0qF0uZ49e9KvXz969OhRdlsupszMzFZQr7/+Ouussw79+vVDUqXD6TIRwbvvvsvrr7/OwIEDy27P03xmZmYrqHnz5tGnT58VqpACkMRaa63FvHnzlkp7LqbMzMxWYCtaIdVgaV63p/nMllPzFsxjyPghlQ6jKozuPZox48dUOoyqsDzlovaY2kqHYAa4mDIzM7NswA/uXqrt1V2wT7vHSOKoo47ixhtvBGDBggWsu+66bL/99kycOLHxuAMOOIC3336bxx9/vHHbuHHj+PWvf83aaze9YXmhUGCNNdZYehdRAhdTZmZmVjGrrroq06dPZ+7cufTq1Yv777+f9ddff7FjPvjgA5599ll69+7NrFmzFls0fsopp3D66ad3ddiL8ZopMzMzq6i99tqLu+9Oo2ITJkzgiCOOWGz/bbfdxn777cfhhx/OzTffXIkQ2+RiyszMzCqqoUiaN28e06ZNY/vtt19sf0OBdcQRRzBhwoTF9l122WUMGzaMYcOGMXz48K4Mu5Gn+czMzKyihg4dSl1dHRMmTGDvvfdebN9bb73FSy+9xE477YQkunfvzvTp0xk8eDBQHdN8LqasakgK4NKIOC0/Px3oHRHjOrnfAnB6RDzdwvbeEbFNfr4NcHFE1LTR1gDg/0XETUs5xgHAxIgYXOo5PWMRtbNeXZphLLMKgz6pnlyMm13R7guFArUH+S44qz77778/p59+OoVCgXfffbdx+y233ML777/fuE7qww8/5Oabb+b888+vVKif4Wk+qybzgW9I6rs0G1WypK/1z0vaqwPHDwC+uYR9tUhSt6XZnplZNRo1ahRjx45lyJDF39JlwoQJ3HvvvdTV1VFXV8czzzxTdeumPDJl1WQBcA1wCvCj4h2S1gZ+CfTPm74XEY9JGgfUR8TF+bjpwL75mD8DDwE7AgdK+gGwLdAL+ENEnF1CTBcBZ+W2iuPpBlwA1AArA7+IiF/lbZtJmgqMB3YHfhAR0yQ9B/wxIs6VdB7wCvA/wM+AvYAAzo+IWyTVAGcDbwLDgL2L+t4QuA04LiKeKuEazMxKUspbGXSWfv36cfLJJy+2ra6ujldffZUddtihcdvAgQNZffXVeeKJJ4C0Zuq3v/1t4/477riDAQMGdEnMDRQRXdqhWWsk1QPrAdOALwHfIU/zSboJuDoiHpXUH7gvIjZrp5j6O2nKbUret2ZEvJcLoQeAk3KRU6D1ab7TScXOecBH5Gk+SccBn4+I8yWtDDwGHAJ8Ibe1b27jB/m8G3Of70XEHpIeAk4ABufPewJ9gaeA7YFBwN3A4IiY1TDNBxwE3AyMjIipLeTwOOA4gLX79t3691f8qPkhK6T6ldej9/x/VDqMZN1hFe2+vr6e3r17VzSGauA8JH369GHgwIF067ZiDoC/9NJLzJ7dNPVeX1/Pfvvt90zD8o5SeWTKqkpEfCjpBuAkYG7Rrt2AzYve/n91Sau109wrDYVUdmguNroD6wKbkwq39pxPGp36ftG2rwFDJR2cn/cBvgh80uzcR/K1zCIVR7tLWgUYEBEzJZ0ATIiIhcBbkh4mjZ59CDwZEbOK2lobuBM4KCJeaCnQiLiGNLrHoA03iJqZpQy+Lf8Kg86hanJxROXXTNXU1FQ0hmrgPCQzZsygW7durLZaez9Ol089e/Zkyy23bHxeKBSWqB0XU1aNfg48C1xXtO1zwI4RUVxgIWkBi6/961n0eE7RcQNJo0zbRsT7kq5vdmyrIuLBPC23Q9FmAWMi4r5m8dQ0O/0pYBvSKNn9pNGn7wDPFLXTmjnNns8GXgO+DLRYTJmZWdfzAnSrOhHxHvB74NiizZOAExueSBqWH9YBW+VtWwEDadnqpOJktqR1SGuUOuK/gDOLnt8HjJbUI/e9iaRVSVN6jX/iRcQnpALoUGAKaaTq9PwZYDJwmKRueV3YV4AnW4nhE+BA4GhJS3WRu5mZLTmPTFm1uoSi4ok0VfYLSdNIr9vJpLVGt5GKi6mkUaC/tdRYRDyfF4C/QBoleqwjwUTEPZL+VbTpWtKde88qzT3+i1ToTAMWSHoeuD4iLiMVTrtGxMeSHgH60VRM/ZG0QP550gL0MyPin5
I2bSWOOZL2Be6XNCci7mwt5rmszIB5S/UdGpZZpy1awIhqycVS/t9nLankImKzFZGLKasaEdG76PFbwCpFz98BDmvhnLmk9UstGdzs2BGt9FtTyvaI2Lro8SLgP/NHc7s2O+/HwI/z439QNLUX6Q6QM/JH8TkFoFD0vK7heiLiA9K6KjMzqwKe5jMzMzMrg0emzMzMLBnXZym31/7dq5I46qijuPHGGwFYsGAB6667Lttvvz0TJ07krbfe4thjj+W1117j008/ZcCAAdxzzz3U1dWx2WabMWjQoMa2Tj31VI4++uilew0lcDFlZmZmFbPqqqsyffp05s6dS69evbj//vtZf/31G/ePHTuW3XffvfENPadNa3pHm4022oipU6d2dcif4Wk+MzMzq6i99tqLu+9ON2dMmDCBI444onHfm2++Sb9+/RqfDx06tMvja49HpsyWU716dGOm7+oC0hvx1R1ZU+kwzKwVhx9+OOeeey777rsv06ZNY9SoUTzySLrp+bvf/S6HHXYYV111FbvtthsjR45kvfXWA+Dll19m2LBhje1ceeWV7Lzzzl0ev4spMzMzq6ihQ4dSV1fHhAkT2HvvvRfbt8cee/D3v/+de++9lz//+c9sueWWTJ8+HfA0n5mZmVmj/fffn9NPP32xKb4Ga665Jt/85je58cYb2XbbbZk8eXIFImydiykzMzOruFGjRjF27FiGDBmy2PYHH3yQjz/+GICPPvqIl19+mf79+1cixFZ5ms/MzMySEt7KoLP069ev8Y69Ys888wwnnngi3bt3Z9GiRXz7299m2223pa6u7jNrpkaNGsVJJ53UhVEnLqbMzMysYurr6z+zraamhpqaGgDOOOMMzjjjjM8cM2DAAObOndvZ4ZXE03xmZmZmZXAxZWZmZlYGF1NmZmZmZXAxZWZmZlYGF1NmZmZmZXAxZWZmZlYGvzWCmZmZATBk/JD2D+qA2mNq2z2mW7duDBkyhAULFjBw4EBuvPFG1lhjDerq6hg4cCBnnXUW5513HgDvvPMO6667LscffzxXXXUVM2fO5Pjjj+eDDz5g/vz57LzzzlxzzTUUCgUOOOAABg4c2NjPxRdfzG677bZUr6+BR6bMzMysYnr16sXUqVOZPn06a665Jr/4xS8a92244YZMnDix8fmtt97KFlts0fj8pJNO4pRTTmHq1KnMmDGDMWPGNO7beeedmTp1auNHZxVS4GLKzMzMqsSOO+7IG2+80fi8V69ebLbZZjz99NMA3HLLLRx66KGN+99880369evX+Lz5v6LpKi6mzMzMrOIWLlzIAw88wP7777/Y9sMPP5ybb76Z119/nW7durHeeus17jvllFPYZZdd2Guvvbjsssv44IMPGvc98sgjDBs2rPHj5Zdf7rTYXUyZmZlZxcydO5dhw4ax1lpr8d5777H77rsvtn/PPffk/vvvZ8KECRx22GGL7Rs5ciQzZszgkEMOoVAosMMOOzB//nzgs9N8G220Uaddg4spMzMzq5iGNVOvvPIKn3zyyWJrpgBWWmkltt56ay655BIOOuigz5y/3nrrMWrUKO688066d+/O9OnTuyr0Rr6bz2w5NW/BvKV+Z86yanTv0YwZP6b9A1cAzkVSrXko5e635VWfPn244oorOOCAAxg9evRi+0477TS++tWvstZaay22/d5772XXXXelR48e/POf/+Tdd99l/fXX58UXX+zK0F1MmZmZWVLpYm7LLbfkS1/6EjfffDM777xz4/Yttthisbv4GkyaNImTTz6Znj17AnDRRRfx7//+77z44ouNa6YanHXWWRx88MGdEreLKTMzM6uY+vr6xZ7fddddjY9bmrIbMWIEI0aMAODSSy/l0ksv/cwxNTU1zJ49e+kG2gavmTIzMzMrg4spMzMzszK4mDIzM1uBRUSlQ6iIpXndnbZmStK/Az8HtgXmA3XA9yLib53Y5zhg5Yj4YdG2YcCEiNisjXPqI+JiSecCkyPiL82OqQFOj4h92+h7GLBeRNzTwZgLue2nW9i+LjAPqAdGRcTMVtoYAEyMiMEd6XtpkLQ/sHlEXNDGMSOAbSLixBa2XwS8AfQEfhURl+V9JwAfR8QNzc4ZwFK6VknXA18FZgMCTo2IB8ptt1r0jEXUznq10mFUhcKgT5yLrKpyMa7r1rQ0VygUqD1oxb1zrkHPnj2ZPXs2q622GpIqHU6XiQjefffdxoXr5eqUYkrpK/JHYHxEHJ63DQPWAf5WdFy3iFi4FLueAPwZ+GHRtsOBm0o5OSLGltH3MGAboEPFVDuOjIinJR1HKjr2b++ErhYRfwL+VEYTt0TEiZLWAmZK+kNEvBYRv1xKIbbnjIj4g6ThwDXAF8ttsBNe16310z0iFnR2P2a2/OrXrx/PP//8ZxaBrwh69uy52L+iKUdnjUwNBz4t/oUYEVOhcZTnbOBNYJikrYD/JhUiC0ijAw9J2gK4DliJNB15EPAP4PdAP6AbcF5E3FLUx0xJH0jaPiKeyJsPBfaQ9B3guNzeS8C3IuLj4qDzSMXE/Mt1T9LI2jvAs0XHbJe39wLmAiOBWcC5QC9JOwE/BSYCVwJDSHkeFxF3SuqVr2tzYEZupz2Tge/lIvVnwF5AAOcXX3+O7xFgTFG+HwNGA98A+gMb5s8/j4gr8jGnAqNyE9dGxM/zCNC9wKPADsDzOe5zgM+TCr0ni0edJO0HnJVz/G4+5q0Sro+IeFfSS6TRuNeajRhuDfwG+DjH03CtqwDXA5uScjkA+G4uQL+WY10ZeBkYGRFt/bR4HFg/t9sNuACoyef/IiJ+JelzwFWk0axZpNflb/LrpS7H+DXgKknvtdS/pAtIRfECYFJEnC7pENL3xEJgdkR8RVJPWv6+GAHsQxrJWxXYpZT8mpm1pEePHtTX17PNNttUOpRlWmcVU4OBZ9rYvx0wOCJmSToNICKGSNoUmCRpE+AE4PKI+J2klUjF097APyJiHwBJfVpoewJpNOoJSTsA70bE/0l6LyJ+nc87HziWVOx8Rv5F9mvSL6qXgOKC5UXgKxGxQNJuwE8i4iBJYymaypL0E+DBiBglaQ3gSUl/AY4nTV8NlTSUokKtDfsBtaSCaBjwJaAv8JSkyc2OvRYYQSq+NiFNe06T9A1S0TEcWI00CvTfwFBSQbg9aarrCUkPA+8DGwOHkIrQp4BvAjuRioH/BA5s1vejwA4REZK+DZwJnFbC9SGpP6lAmNbC7utIBeLDki4q2v4fwPs5l4OBqbmtvqSibreImCPp+8CppIK3NXsCd+THx5KKmm0lrQw8JmkSsDWpYBtCKihnkAqoBvMiYqfc/+3N+5d0FfB1YNOcozXyeWOBPSLijaJt34UWvy8AdgSGRsR7zS8ij2IeB7B2374UBp3TxiWvOOpXXs+5yKoqF4VCxbqur6+nUMH+q4lz0WRJR+gq9T5TT0bErPx4J3JRExEvSnoF2IQ0UvAjSf2A23NBVAtcLOlC0gjSIy20fTPwv7lIO5xUXAEMzkXUGkBv4L424tsUmBUR/wcg6bfkX1BAH2C8p
...base64-encoded PNG data omitted (horizontal bar chart comparing the validation metrics of the candidate methods)...", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "california_housing_val_results.plot.barh(x='Method')\n", "plt.grid()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the analysis of the validation performance, it is clear that the neural network performs better. We can now compute the final performance on the test set:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(0.37814683358719486, 0.31599512987902556, 0.562134441107308)" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "y_test_pred = mlp_regressor.predict(X_test)\n", "mae, mse, rmse = mean_absolute_error(y_test, y_test_pred), mean_squared_error(y_test, y_test_pred), mean_squared_error(y_test, y_test_pred, squared=False)\n", "mae, mse, rmse" ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 2 }