Commit ee96fe2

w4

oballinger committed Oct 18, 2024
1 parent d0a3809 commit ee96fe2
Showing 3 changed files with 15 additions and 80 deletions.
2 changes: 1 addition & 1 deletion _quarto.yml
@@ -12,7 +12,7 @@ book:
- ./notebooks/W01. Python Recap.ipynb
- ./notebooks/W02. Pandas.ipynb
- ./notebooks/W03. Spatial Data.ipynb
- # - ./notebooks/W04. Natural Language Processing.ipynb
+ - ./notebooks/W04. Natural Language Processing.ipynb
# - ./notebooks/W05. Distributions and Basic Statistics.ipynb
# - ./notebooks/W06. Merging and Joining.ipynb
# - ./notebooks/W07. Hypothesis Testing.ipynb
23 changes: 13 additions & 10 deletions notebooks/W03. Spatial Data.ipynb
@@ -1,15 +1,5 @@
{
"cells": [
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "view-in-github"
- },
- "source": [
- "<a href=\"https://colab.research.google.com/github/oballinger/QM2/blob/main/notebooks/W03.%20Spatial%20Data.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
- ]
- },
{
"cell_type": "markdown",
"metadata": {
@@ -2296,6 +2286,19 @@
"outputs": [],
"source": []
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Assessed Question\n",
+ "\n",
+ "Earlier, we created a dataframe called `daily` in which we calculated the average daily AQI across the state for every day of the year. (Hint: try regenerating this dataframe using the `Date` column rather than the `Day` column in the `.groupby()` function.)\n",
+ "\n",
+ "1. Sort that dataframe to figure out which day had the worst AQI.\n",
+ "2. Plug that date into the `satellite_plot()` function to visualize the corresponding satellite image. If you've done things correctly so far, you should see a big plume of smoke emanating from the central region of California and spreading northwards.\n",
+ "3. Clicking on a sensor will reveal its name; three sensors just north of the fire are caught under the plume. What are their names?"
+ ]
+ },
{
"cell_type": "code",
"execution_count": null,
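For reference, here is a minimal sketch of the pattern the new assessed question hints at. The dataframe and column names (`df`, `Date`, `AQI`) are illustrative assumptions, not identifiers confirmed by the diff, and `satellite_plot()` is the function the question itself references:

```python
# A hedged sketch, not the notebook's official solution. Assumes the raw
# sensor readings live in a dataframe `df` with `Date` and `AQI` columns
# (hypothetical names).
daily = df.groupby('Date')['AQI'].mean()  # statewide mean AQI per calendar date

# Step 1: sort to find the date with the worst (highest) average AQI.
worst_day = daily.sort_values(ascending=False).index[0]
print(worst_day, daily.loc[worst_day])

# Step 2: visualize the satellite image for that date.
satellite_plot(worst_day)
```

Grouping on `Date` rather than `Day` presumably yields one row per calendar date rather than one per day-of-month, which is why the hint matters.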
70 changes: 1 addition & 69 deletions notebooks/W04. Natural Language Processing.ipynb
@@ -9694,75 +9694,7 @@
"id": "rIF7MLm-safJ"
},
"source": [
"We can see that the model has deemed this tweet to be expressing negative sentiment: it has a polarity of -0.15. It also deems this to be a pretty subjective tweet, with a subjectivity score of 0.8. It does indeed appear to be expressing a subjective opinion. Finally, we can see which words are leading to this assessment. The word \"good\" is leading to a 0.7 increase in the polarity score, and a 0.6 increase in the subjectivity score. The word \"worst\" is leading to a -1 change polarity, and a +1 change in subjectivity. The overall scores are weighted averages of these values. Though these scores do roughly align with the actual sentiment of this tweet, **ALWAYS** pay attention to whats going on inside of your sentiment analysis pipeline. Even though the overall sentiment score here is negative, it should probably be even more negative; the algorithm picked up on the word \"good\" in this tweet, and this improved the polarity score by 0.7. But the context in which \"good\" was uttered in this tweet is actually negative! the person is saying \"stop saying #Tillerson is good on climate\"-- this is expressing negative sentiment!\n",
"\n",
"---\n",
"\n",
"### Assessed Question\n",
"\n",
"In this assessed question, we want to use NLP and spatial analysis to map out where the people who are most angry about Exxon are located. \n",
"\n",
"In the code cell below:\n",
"\n",
"1. create a dataframe called \"sample\" which contains the first 1000 tweets from the \"tweets\" dataframe. \n",
"2. Using a lambda function, `.apply(lambda x: nlp(x)._.blob.polarity)`, create a column in the sample dataframe that contains the polarity of each tweet. \n",
"3. Create a column that contains the subjectivity score for each tweet. \n",
"4. Filter the dataframe to keep only the tweets that are subjective (subjectivity score > 0.5) *and* tweets that have negative sentiment (polarity score < 0)."
]
},
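For reference, a minimal sketch of steps 1-4 above. The `text` column name is an assumption for illustration, and in the notebook an `nlp` pipeline with spacytextblob attached already exists, so the setup lines may be redundant:

```python
# A hedged sketch, not the official solution. Assumes `tweets` holds the
# notebook's tweet dataframe with a hypothetical `text` column.
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob  # registers the component

nlp = spacy.load('en_core_web_sm')
nlp.add_pipe('spacytextblob')

sample = tweets.head(1000).copy()  # step 1: first 1000 tweets

# Steps 2-3: polarity and subjectivity columns, using the same extension
# attribute the question itself quotes.
sample['polarity'] = sample['text'].apply(lambda x: nlp(x)._.blob.polarity)
sample['subjectivity'] = sample['text'].apply(lambda x: nlp(x)._.blob.subjectivity)

# Step 4: keep subjective (> 0.5) *and* negative-sentiment (< 0) tweets.
negative = sample[(sample['subjectivity'] > 0.5) & (sample['polarity'] < 0)]
```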
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "id": "ldyS7EtqnsTv"
- },
- "outputs": [],
- "source": []
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "id": "Ebi_SrpkmyFc"
- },
- "source": [
- "5. Modify the code below to plot the distribution of the #ExxonKnew campaign. Color the points according to sentiment polarity. Only a handful of the tweets have location information, so don't worry if you don't see loads of points."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/",
- "height": 131
- },
- "id": "za3zlZ95s4jB",
- "outputId": "8e7a8f3d-3de2-4f95-a965-c411a98b55f6"
- },
- "outputs": [],
- "source": [
- "import matplotlib.pyplot as plt\n",
- "from mpl_toolkits.basemap import Basemap\n",
- "\n",
- "map = Basemap(projection='mill')\n",
- "\n",
- "map.drawcountries()\n",
- "map.drawcoastlines()\n",
- "\n",
- "map.scatter(\n",
- " x=, \n",
- " y=, \n",
- " c=,\n",
- " latlon=True, \n",
- " vmin=-1, \n",
- " vmax=0)"
- ]
- },
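The skeleton above deliberately leaves `x`, `y`, and `c` blank for students to fill in. A hedged completion follows, assuming the filtered dataframe from steps 1-4 is named `negative` and carries hypothetical `longitude` and `latitude` columns:

```python
# A hedged completion of the exercise skeleton, not the official answer.
# `longitude`, `latitude`, and `polarity` are assumed column names.
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

m = Basemap(projection='mill')  # Miller cylindrical world projection
m.drawcountries()
m.drawcoastlines()

# With latlon=True, Basemap interprets x/y as longitude/latitude in degrees.
m.scatter(
    x=negative['longitude'].values,  # assumed column
    y=negative['latitude'].values,   # assumed column
    c=negative['polarity'].values,   # color points by sentiment polarity
    latlon=True,
    vmin=-1,  # most negative polarity
    vmax=0)   # cap at neutral, since only negative tweets remain
plt.show()
```

Renaming `map` to `m` also avoids shadowing Python's built-in `map` function.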
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "*Question:* What country are these tweets located in?"
"We can see that the model has deemed this tweet to be expressing negative sentiment: it has a polarity of -0.15. It also deems this to be a pretty subjective tweet, with a subjectivity score of 0.8. It does indeed appear to be expressing a subjective opinion. Finally, we can see which words are leading to this assessment. The word \"good\" is leading to a 0.7 increase in the polarity score, and a 0.6 increase in the subjectivity score. The word \"worst\" is leading to a -1 change polarity, and a +1 change in subjectivity. The overall scores are weighted averages of these values. Though these scores do roughly align with the actual sentiment of this tweet, **ALWAYS** pay attention to whats going on inside of your sentiment analysis pipeline. Even though the overall sentiment score here is negative, it should probably be even more negative; the algorithm picked up on the word \"good\" in this tweet, and this improved the polarity score by 0.7. But the context in which \"good\" was uttered in this tweet is actually negative! the person is saying \"stop saying #Tillerson is good on climate\"-- this is expressing negative sentiment!"
]
}
],
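The retained paragraph reads per-word contributions off the model. A minimal sketch of how those assessments can be inspected with spacytextblob, assuming the notebook's `nlp` pipeline and using the fragment the paragraph quotes:

```python
# A hedged sketch, assuming `nlp` has the spacytextblob component attached.
doc = nlp('stop saying #Tillerson is good on climate')

print(doc._.blob.polarity)      # overall polarity, a weighted average
print(doc._.blob.subjectivity)  # overall subjectivity

# Each assessment pairs the matched words with their polarity and
# subjectivity contributions -- e.g. "good" contributing 0.7 and 0.6,
# as described in the paragraph above.
print(doc._.blob.sentiment_assessments.assessments)
```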
