diff --git a/HW4.ipynb b/HW4.ipynb index c1b666c..08bc8d4 100644 --- a/HW4.ipynb +++ b/HW4.ipynb @@ -1099,7 +1099,7 @@ }, { "source": [ - "**2.2** Now write a function that returns the predicted rating for a user and an item using the formula at the beginning of this problem. Include code to deal with the possibility that the sum of scores that goes in the denominator is 0: return an predicted rating of the baseline portion of the formula in that case. This function `rating` takes as arguments the dataframe, the database, the wanted `restaurant_id` and `user_id`, and `k` as well as the regularizer." + "**2.2** Now write a function that returns the predicted rating for a user and an item using the formula at the beginning of this problem. Include code to deal with the possibility that the sum of scores that goes in the denominator is 0: return a predicted rating of the baseline portion of the formula in that case. This function `rating` takes as arguments the dataframe, the database, the wanted `restaurant_id` and `user_id`, and `k` as well as the regularizer." ], "cell_type": "markdown", "metadata": {} @@ -1230,7 +1230,7 @@ "source": [ "###Error Analysis\n", "\n", - "This next function takes a set of actual ratings, and a set of predicted ratings, and plots the latter against the former. We can use a graph of this kind to see how well or badly we do in our predictions. Since the nearest neighbor models can have alternating positive and negative similarities (the sum of similarity weights in the denominator can get large), the ratings can get very large. Thus we restrict ourselves to be between -10 and 15 in our ratings and calculate the fraction within these bounds. We also plot the line with unit slope, line sehments joining the means, and a filled in area representing one standard deviation from the mean.\n", + "This next function takes a set of actual ratings, and a set of predicted ratings, and plots the latter against the former. We can use a graph of this kind to see how well or badly we do in our predictions. Since the nearest neighbor models can have alternating positive and negative similarities (the sum of similarity weights in the denominator can get large), the ratings can get very large. Thus we restrict ourselves to be between -10 and 15 in our ratings and calculate the fraction within these bounds. We also plot the line with unit slope, line segments joining the means, and a filled in area representing one standard deviation from the mean.\n", "\n", "The first argument to `compare_results` is a numpy array of the actual star ratings obtained from the dataframe, while the second argument is the numpy array of the predicted ones. (*Feel free to improve this function for your display*)" ],