Why this Project?
A bad review seldom means that a product is bad in every respect. If an Amazon delivery is delayed by a week, the resulting bad review does not necessarily reflect product quality. Customers and sellers both need to understand what exactly a negative review is about.
Consumers and sellers spend a large amount of time reading through long reviews to find out what is perceived as good and bad about a product. Amazon currently has a feature that lets users filter reviews by popular keywords, but this is still tedious and time-consuming: users have to read through numerous reviews to get the information they need about a product.
As the example below shows, a mobile phone can have two main aspects: battery and camera. Customers and sellers can both gain insights if a model can identify the opinions expressed about such aspects.
When we step back and think about the different steps involved, the pipeline seems very complicated. The intuition behind our model is that the aspects extracted from a set of reviews of a product can be similar or related to one another. Users may discuss the same feature of a product in different words, so clustering the aspects before determining the most frequently mentioned ones prevents key aspects from being omitted. It also ensures that there is no redundancy among the reported aspects.
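To make this intuition concrete, here is a minimal sketch of how grouping synonymous aspect terms changes a frequency ranking. The mention list, the `CLUSTER_OF` mapping and the `top_aspects` helper are all hypothetical and written only for illustration; the actual clustering would be learned from the data rather than hard-coded.

```python
from collections import Counter

# Hypothetical aspect mentions extracted from reviews of one phone.
mentions = ["battery", "screen", "battery life", "camera", "charge",
            "picture quality", "camera", "battery backup", "photos", "screen"]

# Hypothetical hand-written mapping from surface form to cluster label.
# Terms missing from the mapping are kept as their own cluster.
CLUSTER_OF = {
    "battery": "battery", "battery life": "battery",
    "battery backup": "battery", "charge": "battery",
    "camera": "camera", "picture quality": "camera", "photos": "camera",
}

def top_aspects(terms, k, cluster_of=None):
    """Return the k most frequent aspects, optionally merging synonyms first."""
    if cluster_of is not None:
        terms = [cluster_of.get(t, t) for t in terms]
    return Counter(terms).most_common(k)

raw_top = top_aspects(mentions, 2)                    # counts raw surface forms
clustered_top = top_aspects(mentions, 2, CLUSTER_OF)  # counts merged clusters
print(raw_top)
print(clustered_top)
```

Without clustering, the battery mentions are split across four different phrases and the aspect drops out of the top two. After clustering, battery and camera both surface, each counted once under a single label.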
We broke the whole process down into the following sub-modules:
Getting the data ready
Amazon has made all the product reviews from 1995-2015 available on S3 buckets. This made the first part of any data science pipeline, getting the data, fairly easy for this project. However, the massive size of the dataset brought its own set of challenges.
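One common way to cope with a dataset of this size is to stream it row by row instead of loading it all into memory. The sketch below assumes the reviews arrive as tab-separated text with `product_id`, `star_rating` and `review_body` columns; these column names, the `iter_reviews` helper and the sample rows are assumptions made for illustration, not a description of how the project actually read the data.

```python
import csv
import io

def iter_reviews(lines, product_id):
    """Lazily yield (star_rating, review_body) for a single product."""
    # QUOTE_NONE keeps stray quote characters in review text from confusing the parser.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row["product_id"] == product_id:
            yield int(row["star_rating"]), row["review_body"]

# Tiny in-memory stand-in for a large review file (hypothetical rows).
sample = io.StringIO(
    "product_id\tstar_rating\treview_body\n"
    "P1\t2\tDelivery was late but the camera is great\n"
    "P2\t5\tLoved it\n"
    "P1\t4\tBattery lasts long\n"
)

reviews = list(iter_reviews(sample, "P1"))
print(reviews)
```

Because `iter_reviews` is a generator, the same function works unchanged on an open file handle, and only one row is held in memory at a time.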
The objective of the aspect extraction step was to extract instances of product aspects, together with the modifiers that express an opinion about a particular aspect.
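As a rough illustration of what an aspect and modifier pair looks like, the sketch below uses two simple word-order rules over small hand-written word lists. This is a toy stand-in, not the extraction method used in the project; the lexicons `ASPECT_WORDS` and `OPINION_WORDS` and the `extract_pairs` helper are hypothetical.

```python
import re

# Hypothetical mini-lexicons, for illustration only.
ASPECT_WORDS = {"battery", "camera", "screen", "delivery"}
OPINION_WORDS = {"great", "poor", "slow", "amazing", "bad", "late"}

def extract_pairs(review):
    """Return (aspect, modifier) pairs found by two simple word-order rules."""
    tokens = re.findall(r"[a-z']+", review.lower())
    pairs = []
    for i, tok in enumerate(tokens):
        if tok not in ASPECT_WORDS:
            continue
        # Rule 1: modifier directly before the aspect, e.g. "great camera".
        if i > 0 and tokens[i - 1] in OPINION_WORDS:
            pairs.append((tok, tokens[i - 1]))
        # Rule 2: "<aspect> is/was <modifier>", e.g. "battery is poor".
        if (i + 2 < len(tokens)
                and tokens[i + 1] in {"is", "was"}
                and tokens[i + 2] in OPINION_WORDS):
            pairs.append((tok, tokens[i + 2]))
    return pairs

pairs = extract_pairs("Great camera, but the battery is poor and delivery was late.")
print(pairs)
```

Note that the delivery complaint is captured as its own aspect, separate from camera and battery, which is exactly the distinction a single star rating cannot express.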
Grouping Aspects into clusters and giving polarity scores
The extracted aspects were grouped into clusters, each denoting a distinct property of the product. Each cluster was then given a polarity score based on the modifiers attached to its aspects.
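A minimal sketch of this scoring step, assuming each modifier carries a fixed polarity value and a cluster's score is the plain average over its modifiers. The `CLUSTER_OF` mapping, the `POLARITY` lexicon, the averaging rule and the `score_clusters` helper are assumptions for illustration; the project may weight or aggregate scores differently.

```python
from collections import defaultdict

# Hypothetical aspect-to-cluster mapping and modifier polarity lexicon.
CLUSTER_OF = {"battery": "battery", "charge": "battery",
              "camera": "camera", "photos": "camera"}
POLARITY = {"great": 1.0, "amazing": 1.0, "poor": -1.0, "slow": -0.5}

def score_clusters(pairs):
    """Average the polarity of known modifiers within each aspect cluster."""
    scores = defaultdict(list)
    for aspect, modifier in pairs:
        if modifier in POLARITY:  # skip modifiers with no known polarity
            cluster = CLUSTER_OF.get(aspect, aspect)
            scores[cluster].append(POLARITY[modifier])
    return {cluster: sum(vals) / len(vals) for cluster, vals in scores.items()}

# Hypothetical (aspect, modifier) pairs from the extraction step.
pairs = [("battery", "poor"), ("charge", "slow"),
         ("camera", "great"), ("photos", "amazing")]

cluster_scores = score_clusters(pairs)
print(cluster_scores)
```

Averaging keeps scores comparable between clusters with very different numbers of mentions; a sum would instead favor whichever aspect is discussed most.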
Visualising the results
With the aim of developing an end product, we built an interactive UI so that users can gain insights from reviews.