---
title: "Sentiment Analysis"
output:
  revealjs::revealjs_presentation:
    theme: solarized
    center: true
    transition: fade
    slide_level: 2
---
## Sentiment
Sentiment analysis is used when we want to know the general *feelings* expressed in a text.

It is often applied to Twitter and other social media posts, but it can be used anywhere people have written or said something.

Sentiment can take many different forms: positive/negative affect, emotional states, and even domain-specific uses such as financial contexts.
##
We will cover two forms of sentiment analysis:

- Word level (simple)
- Sentence level (complex)
## Skipping Around
We are not going to get into text prep now.
It is practically its own lecture.
## Helpful Packages
```{r, eval = FALSE}
install.packages(c("tidytext", "sentimentr"))
```
## Simple
Let's consider the following statements:
```{r, warning = FALSE, message = FALSE}
library(dplyr); library(tidyr); library(tidytext)

statement <- "I dislike beer, but I really love the shine."

tokens <- tibble(text = statement) %>%
  unnest_tokens(tbl = ., output = word, input = text)

tokens
```
##
Using our tokens against a pre-defined dictionary:
```{r, warning = FALSE, message = FALSE}
tokens %>%
  inner_join(get_sentiments("bing")) %>%
  count(sentiment) %>%
  spread(sentiment, n, fill = 0) %>%
  mutate(sentiment = positive - negative)
```
## Thinking About The Output
Do you think that "dislike" and "love" are of the same magnitude?

I might say that "love" is stronger than "dislike".

Let's switch to a sentiment dictionary with a better notion of polarity.
##
```{r, warning = FALSE, message = FALSE}
tokens %>%
  inner_join(get_sentiments("afinn"))
```
Now this looks a bit more interesting! "Love" has a stronger positive polarity than "dislike" has negative polarity, so we might guess that the sentence leans positive overall.
##
We can get an idea of the sentence's overall sentiment by dividing the sum of the word sentiments by the number of words matched in the dictionary.
```{r, warning = FALSE, message = FALSE}
tokens %>%
  inner_join(get_sentiments("afinn")) %>%
  summarize(n = n(), sentSum = sum(value)) %>% # the afinn score column is named "value"
  mutate(sentiment = sentSum / n)
```
##
These simple sentiment analyses provide decent measures of the sentiment of our text.

However, by just counting keywords, we ignore big chunks of the text and the context around each word.
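As a hypothetical illustration of what keyword counting misses, consider a negated phrase scored against the same bing lexicon as above:

```r
library(dplyr); library(tidytext)

# "not happy" reads as negative to a human,
# but word-level matching only ever sees "happy".
negated <- tibble(text = "I am not happy about this.") %>%
  unnest_tokens(output = word, input = text)

negated %>%
  inner_join(get_sentiments("bing"), by = "word")
```

Only "happy" matches the dictionary (as positive); "not" carries no sentiment on its own, so the sentence would be scored as positive.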
## Smarter Sentiment Analysis
```{r, warning = FALSE, message = FALSE}
library(sentimentr); library(lexicon); library(magrittr)
statement <- "I dislike beer, but I really love the shine."
sentiment(statement, polarity_dt = lexicon::hash_sentiment_jockers)
```
##
The first part of our sentence starts out negative ("dislike" has a sentiment value of -1 in this dictionary).

The adversative "but" downweights whatever is in the initial clause.

"Really" amplifies the sentiment of "love" (which has a weight of .75 in our dictionary).

With all of this together, we get a much better idea about the sentiment of our text.
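To see those valence shifters in action, we can score a few variants of the sentence side by side (a quick sketch; the exact numbers depend on the dictionary version):

```r
library(sentimentr); library(lexicon)

variants <- c(
  "I love the shine.",        # plain positive
  "I really love the shine.", # "really" amplifies
  "I do not love the shine."  # "not" negates
)

sentiment(variants, polarity_dt = lexicon::hash_sentiment_jockers)
```

We would expect the amplified version to score higher than the plain one, and the negated version to come out negative.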
## An Added Bonus
I want to show you gganimate!
```{r, eval = FALSE}
library(gganimate); library(ggplot2)

ggplot(mtcars, aes(mpg, wt, color = as.factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_color_brewer(type = "qual") +
  theme_minimal() +
  transition_states(cyl)
```
##
```{r, echo = FALSE, warning = FALSE, message = FALSE}
library(gganimate); library(ggplot2)

ggplot(mtcars, aes(mpg, wt, color = as.factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_color_brewer(type = "qual") +
  theme_minimal() +
  transition_states(cyl)
```