Text mining, also known as text data mining or knowledge discovery from textual corpora, is the process of extracting interesting and non-trivial patterns or knowledge from text documents. Text analysis for understanding sentiment toward products and entertainment, and for supporting decision making, is gaining popularity. For this data mining project, we want to explore the Amazon movie review text data set together with the movie ratings given by different users. Sentiment analysis of the movie reviews can potentially reveal important patterns in movie ratings.
Name: Paromita Nitu
Graduate Teaching Assistant
MSCS Department, Marquette University
Email: paromita.nitu@marquette.edu
Name: Zachary Boyd
Graduate student
Department of Biophysics, Medical College of Wisconsin
AND
Medical College of Wisconsin
Name: Nihel Charfi
Graduate student
MSCS Department, Marquette University
Email: nihel.charfi@marquette.edu
Name: Matthew Shafis
Graduate student
MSCS Department, Marquette University
Email: matthew.shafis@marquette.edu
-The dataset consists of movie reviews from Amazon.
-The Amazon Movie Reviews dataset consists of 7,911,684 reviews of 253,059 products, left by Amazon users between Aug 1997 and Oct 2012.
-The data format is as follows, with each column described below:
Product/Product Id: A unique identifier generated by Amazon and assigned to each movie.
User Id: The ID of the user who wrote the review.
Profile Name: The profile name of the user who wrote the review.
Score: The rating the user gave the movie.
Time: The time at which the review was written.
Summary: A short summary of the review.
Text: The comments and review written by the user about the movie.
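To make the column layout concrete, the sketch below reads records into Python dictionaries using only the standard library. It assumes the raw file stores each review as a block of "key: value" lines separated by blank lines, with "product/" and "review/" prefixes on the keys, and that the time is a Unix timestamp; these are assumptions about the file layout, not details confirmed above. The function name parse_reviews and the sample records are hypothetical.

```python
import io
from datetime import datetime, timezone


def parse_reviews(lines):
    """Yield one dict per review from 'key: value' lines separated by blank lines."""
    record = {}
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current record.
            if record:
                yield record
                record = {}
            continue
        # Split on the first colon only, so colons inside the review text survive.
        key, sep, value = line.partition(":")
        if sep:
            # Drop the assumed "product/" or "review/" prefix from the key.
            record[key.split("/")[-1]] = value.strip()
    if record:
        yield record


# Made-up sample records, for illustration only (not taken from the dataset).
SAMPLE = """\
product/productId: B00000EXAMPLE
review/userId: A0000EXAMPLE
review/profileName: Example User
review/score: 5.0
review/time: 1000000000
review/summary: Great film
review/text: I loved every minute of it.

product/productId: B00000EXAMPLE
review/userId: A1111EXAMPLE
review/profileName: Another User
review/score: 2.0
review/time: 1100000000
review/summary: Disappointing
review/text: The plot dragged: too long and too dull.
"""

reviews = list(parse_reviews(io.StringIO(SAMPLE)))
for review in reviews:
    review["score"] = float(review["score"])
    # Assumes the time column holds seconds since the Unix epoch.
    review["time"] = datetime.fromtimestamp(int(review["time"]), tz=timezone.utc)
```

The resulting list of dictionaries can be passed directly to a tabular library for further analysis.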
-All coding, data manipulation, and processing will be performed using Python v3.6. Several tools and packages have been identified as potentially useful for the goals of the project:
-The Natural Language Toolkit (NLTK): The toolkit provides several useful tools for working with text in Python. Specifically, it contains an implementation of a naïve Bayes classifier that can be used for sentiment analysis of the data set.
Additionally, its tokenization and word-frequency tools can supply the word counts needed to generate word clouds for easy data visualization when working with text.
-The scikit-learn machine learning library
-The pandas and NumPy libraries will likely be used throughout the project for general data handling.
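As a rough illustration of how the NLTK naïve Bayes classifier might be applied to review text, the sketch below trains on a handful of hand-written sentences. The training sentences, the "pos"/"neg" labels, and the bag_of_words helper are all hypothetical; in the project the labels would presumably be derived from the review scores, which is an assumption here. Whitespace tokenization is used so that no NLTK corpus downloads are needed.

```python
from nltk.classify import NaiveBayesClassifier


def bag_of_words(text):
    """Map a review to the word-presence feature dict that NLTK classifiers expect."""
    return {word: True for word in text.lower().split()}


# Tiny hand-written examples, purely illustrative (not from the dataset).
train = [
    ("a wonderful and moving film", "pos"),
    ("great acting and a great story", "pos"),
    ("wonderful cast and great fun", "pos"),
    ("a boring and terrible film", "neg"),
    ("terrible acting and a dull story", "neg"),
    ("boring plot and dull dialogue", "neg"),
]

# NLTK expects a list of (feature_dict, label) pairs.
classifier = NaiveBayesClassifier.train(
    [(bag_of_words(text), label) for text, label in train]
)

positive_guess = classifier.classify(bag_of_words("a great and wonderful story"))
negative_guess = classifier.classify(bag_of_words("a boring and dull film"))
```

With real data, a held-out test split would be needed to judge how well such a classifier tracks the actual ratings.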