- extraction of single web page content
- analysis of single opinion structure
Component | CSS selector | Variable name | Data type |
---|---|---|---|
Opinion | div.js_product-review | opinion | dict |
Opinion id | ["data-entry-id"] | opinion_id | str |
Author | span.user-post__author-name | author | str |
Recommendation | span.user-post__author-recomendation > em | recommendation | bool |
Stars rating | span.user-post__score-count | stars | float |
Content | div.user-post__text | content | str |
Advantages | div.review-feature__col:has(> div[class$="positives"]) > div.review-feature__item | pros | list(str) |
Disadvantages | div.review-feature__col:has(> div[class$="negatives"]) > div.review-feature__item | cons | list(str) |
Verification | div.review-pz | verified | bool |
Post date | span.user-post__published > time:nth-child(1)["datetime"] | post_date | str |
Purchase date | span.user-post__published > time:nth-child(2)["datetime"] | purchase_date | str |
Usefulness count | span[id^="votes-yes"] | usefulness | int |
Uselessness count | span[id^="votes-no"] | uselessness | int |
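The data types in the table imply a conversion step between the raw text taken from the page and the values stored for each opinion. A minimal sketch of that step follows. The raw formats it assumes (a score written as `4,5/5`, the recommendation text `Polecam`) and the helper names `to_stars`, `to_recommendation` and `to_count` are illustrative assumptions, not taken from the table.

```python
from typing import Optional


def to_stars(raw: str) -> float:
    """Convert a score such as '4,5/5' to 4.5 (assumed raw format)."""
    score = raw.split("/")[0].strip()
    return float(score.replace(",", "."))


def to_recommendation(raw: Optional[str]) -> Optional[bool]:
    """Map 'Polecam' to True and any other text to False.

    Returning None for a missing recommendation is an assumption.
    """
    if raw is None:
        return None
    return raw.strip() == "Polecam"


def to_count(raw: str) -> int:
    """Convert a vote counter such as ' 12 ' to 12."""
    return int(raw.strip())


# Hypothetical raw values, as they might come out of the extraction step.
raw_opinion = {
    "opinion_id": "123456",
    "recommendation": "Polecam",
    "stars": "4,5/5",
    "pros": [" solid build ", "quiet"],
    "usefulness": " 12 ",
}

# Variable names follow the "Variable name" column of the table.
opinion = {
    "opinion_id": raw_opinion["opinion_id"],
    "recommendation": to_recommendation(raw_opinion["recommendation"]),
    "stars": to_stars(raw_opinion["stars"]),
    "pros": [item.strip() for item in raw_opinion["pros"]],
    "usefulness": to_count(raw_opinion["usefulness"]),
}
```

Keeping each conversion in its own small function makes it easy to adjust one rule if the page turns out to use a different text format.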
- extraction of single opinion components
- transformation of extracted data to given data types
- definition of dictionary to store all components of single opinion
- definition of a list for storing the opinions' dictionaries
- implementation of loop traversing through all opinions from single page
- implementation of loop traversing through consecutive pages with opinions
- saving extracted opinions to .json file
- parametrization of product id and reading product id from standard input
- implementation of component extraction function
- using a dictionary of component selectors and a dictionary comprehension to represent a single opinion
- displaying list of products for which opinions have been extracted
- reading data from .json file representing single product to dataframe
- calculating basic statistics
  - average score
  - number of opinions for which list of advantages was given
  - number of opinions for which list of disadvantages was given
- pie chart showing share of particular recommendations
- bar chart showing frequency of individual ratings
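The page-traversal and storage steps from the list above can be sketched as follows. This is only an outline under stated assumptions: `collect_opinions`, `save_opinions` and the `fetch_page` callback are hypothetical names. The callback stands in for the real download-and-parse step, so the example runs without network access. The stop condition (a page that yields no opinions) is also an assumption.

```python
import json
import tempfile
from pathlib import Path


def collect_opinions(product_id, fetch_page):
    """Gather opinions from consecutive pages until a page yields none.

    fetch_page(product_id, page_number) is a hypothetical callback that
    returns a list of opinion dictionaries for one page.
    """
    all_opinions = []
    page = 1
    while True:
        opinions = fetch_page(product_id, page)
        if not opinions:
            break
        all_opinions.extend(opinions)
        page += 1
    return all_opinions


def save_opinions(product_id, opinions, directory="opinions"):
    """Write the opinions of one product to <directory>/<product_id>.json."""
    path = Path(directory) / f"{product_id}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as file:
        # ensure_ascii=False keeps non-ASCII characters readable in the file.
        json.dump(opinions, file, indent=4, ensure_ascii=False)
    return path


def fake_fetch_page(product_id, page):
    """Stand-in for real scraping: two pages of made-up opinions."""
    pages = {
        1: [{"opinion_id": "1", "stars": 4.5}, {"opinion_id": "2", "stars": 3.0}],
        2: [{"opinion_id": "3", "stars": 5.0}],
    }
    return pages.get(page, [])


# Demonstration in a temporary directory, so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    opinions = collect_opinions("demo-product", fake_fetch_page)
    path = save_opinions("demo-product", opinions, directory=tmp)
    with open(path, encoding="utf-8") as file:
        saved = json.load(file)
```

In an interactive run the product id could be read with `input()` instead of being hard-coded, and `fake_fetch_page` would be replaced by a function that downloads and parses one page of opinions.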