About the Project

Aspect Extraction and Opinion Analysis is an academic project as a part of the CSE6242 course offered by Dr Polo Chau at Georgia Institute of Technology.

Important Links of the project -

Opinion mining or sentiment analysis is the computational analysis of a person’s emotion towards entities like products and services. It can be done at three levels - document, sentence and aspect. We have implemented an aspect-based analysis system to extract various aspects of an entity from Amazon product reviews, group them and determine the respective polarities.

  1. Why this Project?
  2. Solution Overview

Why this Project?

Giving a bad review for a product seldom means that it is bad in every aspect. If an Amazon delivery was delayed by a week, the bad review would not necessarily reflect product quality. It is important for customers and sellers to understand what exactly the negative review was about.

Consumers and sellers spend a large amount of time reading through long reviews to find out what is perceived as good and bad about a product. Amazon currently has a feature that lets users filter reviews by popular keywords, which is still tedious and time-consuming for customers. The users have to read through numerous reviews to get the relevant information about the products that they need.

Looking at the example below, we can notice that a mobile can have two main aspects - battery & camera. Customers and sellers both can gain insights if a model can identify the opinions of such aspects -

Solution Overview

When we step back and think about different steps involved in the process, the pipeline seems very complicated. The intuition behind our model is that the aspects extracted from a set of reviews of a product can be similar or related to one other. Users may discuss the same features of a product in different words; clustering the aspects before determining the most frequently mentioned aspects would prevent the omission of key aspects. Additionally, it will also ensure that there is no redundancy.

We broke down the whole process into sub modules -

  • Getting the data ready

    Amazon has made all the product reviews from 1995-2015 available on S3 buckets. This made the first part of any data science pipeline - Getting the data, a bit easy for the project. However, the massive size of the dataset brought its own set of challenges.

  • Identifying Aspects

    The objective of this step was to extract instances of product aspects and modifiers that express the opinion about a particular aspect.

  • Grouping Aspects into clusters and giving polarity scores

    The extracted aspects were grouped into different clusters, each denoting a distinct property of the product. And based on the modifiers, polarity scores were given to each cluster.

  • Visualising the results

    With the motive of developing an end-product, we modelled an interactive UI so that users can gain insights from reviews.