Sentiment Analysis - Drug Review using BERT Extension

The following workflow demonstrates how to use the BERT extension within KNIME to perform sentiment analysis on a drug review dataset.

Dataset Link

Drug Review Dataset: https://www.kaggle.com/jessicali9530/kuc-hackathon-winter-2018

Workflow Link

Drug Review Sentiment Analysis Workflow: https://tinyurl.com/2p8krhx5

Drug Review - BERT

To perform sentiment analysis on a dataset of drug reviews, we will use the BERT Extension available within KNIME. There are many different BERT models, but we will use a general-purpose one for our analysis. BERT stands for Bidirectional Encoder Representations from Transformers, and it has been pretrained on a very large text corpus that includes English Wikipedia. During training, BERT looks at the context on both the left and the right of each token, which is why the first word of its name is Bidirectional.
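As a toy illustration of what "bidirectional" means here, the sketch below lists which tokens are visible around a target position for a left-to-right model versus a bidirectional one. The helper `visible_context` and the example sentence are hypothetical and are not part of KNIME or of BERT itself; a real BERT model attends over all positions internally rather than slicing lists.

```python
def visible_context(tokens, index, bidirectional=True):
    """Return the (left, right) tokens a model could use for position `index`.

    Hypothetical helper for illustration only: a left-to-right model sees
    just the left context, a bidirectional model sees both sides.
    """
    left = tokens[:index]
    right = tokens[index + 1:] if bidirectional else []
    return left, right


# Made-up example sentence, not taken from the dataset.
tokens = "this drug relieved my pain within days".split()
target = tokens.index("relieved")

left_only = visible_context(tokens, target, bidirectional=False)
both_sides = visible_context(tokens, target, bidirectional=True)

print("left-to-right :", left_only)
print("bidirectional :", both_sides)
```

In the bidirectional case the words after the target ("my pain within days") are also available, which is the extra context the name refers to.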

Text Pre-processing & Partitioning

The pre-processing for this workflow is minimal, because the data needs little preparation before it is fed into the BERT Extension. We first use Row Filter nodes to remove any rows with an empty value in one of the following columns: text, drugs, sentiment.

We then use the String Manipulation node to convert all of the text to lowercase. If you are using the distil-bert-uncased model this is not necessary, but it is still good practice. Next, we use the Shuffle node to shuffle the rows and the Number To String node to convert the sentiment column, originally an integer, to a string. We then partition the data into an 80-20 split.
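For readers who want to see the logic of the steps above outside KNIME, here is a minimal sketch in plain Python. It is not the workflow itself: the function `preprocess`, the `seed` and `train_fraction` parameters, and the sample rows are assumptions made for illustration, and treating the 80% portion as the training set is also an assumption. Only the column names (text, drugs, sentiment) come from the workflow description.

```python
import random

REQUIRED_COLUMNS = ("text", "drugs", "sentiment")


def preprocess(rows, seed=42, train_fraction=0.8):
    """Filter, lowercase, shuffle, cast and split a list of row dicts."""
    # Row Filter: keep only rows with a non-empty value in every required column.
    kept = [
        row for row in rows
        if all(
            row.get(col) is not None and str(row.get(col)).strip() != ""
            for col in REQUIRED_COLUMNS
        )
    ]
    # String Manipulation: lowercase the review text.
    cleaned = [dict(row, text=row["text"].lower()) for row in kept]
    # Shuffle: reorder the rows (seeded so the sketch is repeatable).
    random.Random(seed).shuffle(cleaned)
    # Number To String: the sentiment label becomes a string.
    cleaned = [dict(row, sentiment=str(row["sentiment"])) for row in cleaned]
    # Partitioning: first 80% in one partition, remaining 20% in the other.
    cut = int(len(cleaned) * train_fraction)
    return cleaned[:cut], cleaned[cut:]


# Made-up sample rows, not taken from the dataset.
rows = [
    {"text": "Worked GREAT for me", "drugs": "DrugA", "sentiment": 1},
    {"text": "Terrible side effects", "drugs": "DrugB", "sentiment": 0},
    {"text": "", "drugs": "DrugC", "sentiment": 1},            # empty text
    {"text": "No change at all", "drugs": "DrugA", "sentiment": 0},
    {"text": "Helped my Migraines", "drugs": "DrugD", "sentiment": 1},
    {"text": "Felt dizzy", "drugs": "DrugB", "sentiment": None},  # missing label
    {"text": "Would recommend", "drugs": "DrugE", "sentiment": 1},
]

train, test = preprocess(rows)
print(len(train), "training rows,", len(test), "test rows")
```

The two incomplete sample rows are dropped before the split, so the 80-20 division is applied only to usable rows.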

Conda Environment Propagation & BERT Extension

Drag and drop the Conda Environment Propagation Node into the workflow. This ensures that all the packages required in the Conda environment are installed. Link its output variable to the BERT Model Selector Node. Configure the BERT Model Selector Node as follows:

Use the following configuration for the BERT Classification Learner Node. You can choose to fine-tune the BERT model to get better results (around 8% better accuracy), but this comes at the expense of a much longer computation time.

Post Processing & Model Evaluation

Finally, add the Scorer node to the workflow. This workflow should reach an accuracy of 82% when the sentiment column has 2 classes, and 74% when it has 3 classes.
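To make the accuracy figure concrete, the sketch below computes accuracy and confusion counts from two lists of labels. The function `score` is a hypothetical stand-in written for this example, not the Scorer node's implementation, and the labels are made up, so the number it prints is not a result of this workflow.

```python
from collections import Counter


def score(actual, predicted):
    """Return (accuracy, confusion counts) for two equal-length label lists."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("label lists must be non-empty and of equal length")
    # Count each (actual, predicted) pair; matching pairs are correct predictions.
    confusion = Counter(zip(actual, predicted))
    correct = sum(n for (a, p), n in confusion.items() if a == p)
    return correct / len(actual), confusion


# Made-up string labels for a 2-class case (sentiment is a string after pre-processing).
actual = ["1", "0", "1", "1", "0"]
predicted = ["1", "0", "0", "1", "0"]

accuracy, confusion = score(actual, predicted)
print(f"accuracy = {accuracy:.2f}")
for (a, p), n in sorted(confusion.items()):
    print(f"actual={a} predicted={p}: {n}")
```

The confusion counts show which classes are mistaken for each other, which a single accuracy figure hides.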
