Skip to content

A guide to apply sentiment analysis on text data and make visualizations to get more insights into the data.

Notifications You must be signed in to change notification settings

imnileshd/nlp-sentiment-analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Sentiment Analysis on Earnings Call Transcript

Earning call is a conference call between executives and major investors where they discuss their quarterly financial performance and future outlook. It is usually held quarterly by most publicly traded corporations. These calls provide valuable insight to investors and analysts to make trading decisions. However, listening an earnings call is a time consuming process. What if the overall sentiment score of an earnings call could be determined in order to get an idea of the company's future outlook?

This article will show the sentiment analysis on the earnings call, visualize the points for analysts or investors to make trading decisions.

Getting Transcripts Data

To get data for sentiment analysis, I manually scraped earnings call transcripts from here. You can also subscribe to APIS that give call transcripts and use that data to apply sentiment analysis.

Load Data

We have the transcript text, now we will load that into dataframe for further processing:

transcript = """
Operator: Good day, everyone. Welcome to the Apple Incorporated Third Quarter Fiscal Year 2020 Earnings Conference Call. 
    Today's call is being recorded. At this time, for opening remarks and introductions, .......
"""
# split the transcript into sentences
sentences = [' '.join(sent.split()).strip() for sent in transcript.replace('\n', '').split('. ')]

# convert to dataframe
df = pd.DataFrame(sentences, columns=['content'])

Apply Sentiment Analysis

Now, we have to perform necessary steps to get sentiment of the data, We will use NLTK stands for Natural Language ToolKit. It is a library that helps us to manage and analyse languages.

Clean and Prepare for Sentiment Analysis

The first step in any NLP task is text preprocessing: removing noise and correctly formatting the text for use by your models. Below are the text preprocessing measures that we will take on the earnings call transcripts:

  • Lowercase - Convert all words to lowercase to avoid storing both uppercase and lowercase forms of the same word.
  • Tokenization - Convert the sentences into tokens.
  • Punctuation - Remove all punctuations to avoid multiple representations of the same word.
  • Stop Words - Common words which should be ignored, such as "a", "an", and "the".
  • Lemmatization - Removing inflection to reduce words down to their "dictionary form".

Here is the code that we can use to preprocess the text before applying sentiment analysis:

import string
from nltk import pos_tag
from nltk.corpus import wordnet
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

# return the wordnet object value corresponding to the POS tag
def get_wordnet_pos(pos_tag):
    if pos_tag.startswith('J'):
        return wordnet.ADJ
    elif pos_tag.startswith('V'):
        return wordnet.VERB
    elif pos_tag.startswith('N'):
        return wordnet.NOUN
    elif pos_tag.startswith('R'):
        return wordnet.ADV
    else:
        return wordnet.NOUN

# return the cleaned text 
def clean_text(text, digits=False, stop_words=False, lemmatize=False, only_noun=False):
    # lower text
    text = str(text).lower()
    
    # tokenize text and remove puncutation
    text = [word.strip(string.punctuation) for word in text.split(" ")]
    
    # remove words that contain numbers
    if digits:
        text = [word for word in text if not any(c.isdigit() for c in word)]
        
    # remove stop words
    if stop_words:
        stop = stopwords.words('english')
        text = [x for x in text if x not in stop]
    
    # remove empty tokens
    text = [t for t in text if len(t) > 0]
    
    # pos tag text
    if lemmatize:
        pos_tags = pos_tag(text)    
        # lemmatize text
        text = [WordNetLemmatizer().lemmatize(t[0], get_wordnet_pos(t[1])) for t in pos_tags]
        
    if only_noun:
        # select only nouns
        is_noun = lambda pos: pos[:2] == 'NN'
        text = [word for (word, pos) in pos_tag(text) if is_noun(pos)]
    
    # remove words with only one letter
    text = [t for t in text if len(t) > 1]
    
    # join all
    text = " ".join(text)
    
    return(text)

We will use apply function to apply clean_text over each row of dataframe as per below:

# clean text data
df['content_clean'] = df['content'].apply(lambda x: clean_text(x, digits=True, stop_words=True, lemmatize=True))

Now we have cleaned text in content_clean column, our next task is to apply sentiment analysis.

Sentiment Intensity Analyser

Here, We will use the Sentiment Intensity Analyser which uses the VADER Lexicon. VADER is a rule-based sentiment analysis tool. VADER calculates text emotions and determines whether the text is positive, neutral or, negative. This analyzer calculates text sentiment and produces four different classes of output scores: positive, negative, neutral, and compound.

# import NLTK library for importing the SentimentIntensityAnalyzer
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# create the instance of SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()

# get sentiment score for each category
df['sentiment']= df['content_clean'].apply(lambda x: sid.polarity_scores(x))
df = pd.concat([df.drop(['sentiment'], axis=1), df['sentiment'].apply(pd.Series)], axis=1)
df = df.rename(columns={'neu': 'neutral', 'neg': 'negative', 'pos': 'positive'})

# add sentiment based on max score
df['confidence'] = df[["negative", "neutral", "positive"]].max(axis=1)
df['sentiment'] = df[["negative", "neutral", "positive"]].idxmax(axis=1)

We dit it! Look at the output below for the steps performed so far...

Output

Now in next section we will visualizing the results obtained by performing sentiment analysis to get more insights out of the data.

Insights

We have sentiments data for sentences from the earnings call, let us get more insights out of it.

Display percentage of positive, negative and neutral sentiments

Let us understand the amount of positive, neutral, or negative sentiments that are involved in the sentences from the earnings call of the data. For visualizing the results, we will use plotly python package:

import plotly.express as px
import plotly.graph_objects as go

# create data for plot
grouped = pd.DataFrame(df['sentiment'].value_counts()).reset_index()
grouped.columns = ['sentiment', 'count']

# Display percentage of positive, negative and neutral sentiments
fig = px.pie(grouped, values='count', names='sentiment', title='Sentiments')
fig.show()

And this plot will show you what is the percentage of each sentiments:

Display percentage of Sentiments

Display sentiment score

We will also plot an indicator that will signify if we have more positive or negative responses. We will calculate the sentiment score as shown in the below code block and analyze the indicator.

# calculate sentiment ratio
sentiment_ratio = df['sentiment'].value_counts(normalize=True).to_dict()
for key in ['negative', 'neutral', 'positive']:
    if key not in sentiment_ratio:
        sentiment_ratio[key] = 0.0

## Display sentiment score
sentiment_score = (sentiment_ratio['neutral'] + sentiment_ratio['positive']) - sentiment_ratio['negative']

fig = go.Figure(go.Indicator(
    mode = "number+delta",
    value = sentiment_score,
    delta = {"reference": 0.5},
    title = {"text": "Sentiment Score"},))

fig.show()

The up arrow signifies a positive indication, whereas the negative indication will be represented through the down indicator.

Display sentiment score

Display sentence locations

Now with the help of scatter plot we can show the sentiments timeline, please see the output below:

## Display negative sentence locations
fig = px.scatter(df, y='sentiment', color='sentiment', size='confidence', hover_data=['content'], color_discrete_map={"negative":"firebrick","neutral":"navajowhite","positive":"darkgreen"})


fig.update_layout(
    width=800,
    height=300,
)

Display sentence locations

Let's put it all together

To put it all together, I have developed an interactive app using Streamlit. I would recommend checking out the link here that covers the entire code and the requirements that are necessary to successfully deploy this app.

Deploy an app

Now that we have created our data app, and ready to share it! We can use Streamlit Cloud to deploy and share our app. Streamlit Cloud has multiple tiers, the free Community tier is the perfect solution if your app is hosted in a public GitHub repo and you'd like anyone in the world to be able to access it.

You can check steps to deploy apps with the free Community tier here.

Now my app is live and you can interact with it here.

Conclusion

Thank you for reading! I hope that this project walkthrough inspires you to create your own sentiment analysis model.

Happy Coding!

About

A guide to apply sentiment analysis on text data and make visualizations to get more insights into the data.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published