Sentiment Analysis of movie reviews
Introduction
• In an era when the digital landscape is flooded with user-generated
content, understanding the sentiments expressed in movie reviews
provides valuable insight into audience reactions and preferences, and
gives filmmakers feedback on how their work is being received.
• Sentiment analysis is a technique for analyzing a piece of text to
determine the sentiment it expresses. In our case, we have been given
an IMDB movie review dataset that contains about 50k movie reviews,
each labeled as positive or negative.
• Our aim is to study the given dataset, then build and train a model
that can accurately classify a new, unseen review as positive or
negative.
Problem Statement
• As user-generated content proliferates across the digital landscape,
the film industry faces a pressing need to systematically understand
and analyze the sentiments expressed in movie reviews.
• Moviegoers share their opinions across diverse platforms, including
websites, social media, and online forums, offering a rich tapestry of
sentiments ranging from enthusiastic praise to critical evaluation.
• The problem at hand involves developing an effective sentiment analysis
system tailored specifically for movie reviews. This system must
automatically interpret the sentiment expressed in a review's text and
classify it as positive or negative.
How will the ML model help
• Insight Generation:- By accurately classifying sentiments, the model will
generate actionable insights into how audiences perceive and react to
movies. Filmmakers and studios can gain a comprehensive understanding of
the strengths and weaknesses of their films, helping them make informed
decisions for future projects.
• Box Office Predictions:- The model's analysis of sentiments can contribute to
predicting box office performance. Positive sentiments often correlate with
higher audience interest, potentially leading to increased box office revenue.
This predictive capability provides stakeholders with valuable foresight into a
film's commercial success.
• Marketing Strategy Optimization:- The model's outputs can guide the
optimization of marketing and promotional strategies. Positive sentiments can
be leveraged to create compelling promotional content, while addressing
negative sentiments allows for targeted improvements and strategic
communication to manage public perception.
Challenges Faced
• Nuanced language:- Movie reviews often contain nuanced language,
sarcasm, irony, or humor. Capturing these subtleties can be challenging
for sentiment analysis models, as they may misinterpret the intended
sentiment.
• Subjectivity:- Sentiment is inherently subjective, and individuals may
express their opinions in diverse ways. Differentiating between personal
opinions and objective statements poses a challenge, as models need
to navigate the subjective nature of language.
• Contextual understanding:- Understanding the context in which certain
phrases or cultural references are used is crucial. A lack of contextual
understanding may lead to misinterpretation of sentiments, especially
when specific references are involved.
Proposed System
• Data collection:- The dataset should encompass a wide range of
reviews, covering both positive and negative outcomes. Diverse data
lets the model learn general patterns from the reviews.
• EDA:- Exploratory Data Analysis (EDA) plays a crucial role in
understanding the dataset and extracting meaningful insights, which can
aid in predicting the sentiment of the reviews.
• Data preprocessing:-
➢ Removing HTML tags:- The reviews were scraped from the internet and
still contain HTML tags, so we remove them.
➢ Converting everything to lower case:- We convert all words to lower
case.
➢ Removing punctuation:- We remove all punctuation from the reviews, as
it is not useful for training the model.
➢ Spelling correction:- We correct spelling mistakes in the reviews
using the .correct() function.
➢ Tokenization:- We break each review down into smaller units called
tokens.
➢ Removing stop words:- Stop words such as "and", "or", "the", and
"from" occur throughout the reviews but are of little use in training
the model, so we remove them.
➢ Stemming:- We reduce related word forms to a common base form. For
example, "playing" and "played" are both converted to "play".
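The preprocessing steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the code used in the project: the stop-word list is a tiny hypothetical sample, `naive_stem` is a crude stand-in for a real stemmer, and the spelling-correction step is left out because it depends on an external library.

```python
import re
import string

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "or", "the", "from", "is", "was", "it",
              "this", "of", "to", "in"}


def naive_stem(word):
    """Crude suffix stripping; a hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(review):
    """Apply the steps in order and return a list of stemmed tokens."""
    text = re.sub(r"<[^>]+>", " ", review)   # remove HTML tags
    text = text.lower()                      # convert to lower case
    text = text.translate(str.maketrans("", "", string.punctuation))  # drop punctuation
    tokens = text.split()                    # whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # remove stop words
    return [naive_stem(t) for t in tokens]   # stemming


print(preprocess("This movie was <br />GREAT, and the actors played well!"))
# → ['movie', 'great', 'actor', 'play', 'well']
```

The order matters here: tags are removed before punctuation so that the angle brackets are still present for the regular expression to match.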
• Feature Extraction:-
➢ Bag of Words:- The Bag of Words (BoW) model is a common and
straightforward technique in natural language processing (NLP) for
representing textual data. The basic idea is to represent a document
as an unordered collection of words: word order is discarded, but the
number of times each word occurs is kept. This converts raw text into
a numerical format that machine learning algorithms can work with.
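As a small illustration of the idea, the sketch below builds a vocabulary from already-tokenized reviews and turns each review into a vector of word counts. The function names and the toy reviews are assumptions made for this example; the slides do not say which implementation was used.

```python
from collections import Counter


def build_vocabulary(tokenized_docs):
    """Map each distinct token to a column index (sorted for reproducibility)."""
    vocab = sorted({tok for doc in tokenized_docs for tok in doc})
    return {tok: i for i, tok in enumerate(vocab)}


def bag_of_words(tokens, vocab):
    """Return a count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(tokens)
    # Iterating a dict follows insertion order, which matches the column index.
    return [counts.get(tok, 0) for tok in vocab]


docs = [["great", "movie", "great", "cast"], ["boring", "movie"]]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(doc, vocab) for doc in docs]

print(vocab)    # → {'boring': 0, 'cast': 1, 'great': 2, 'movie': 3}
print(vectors)  # → [[0, 1, 2, 1], [1, 0, 0, 1]]
```

Note that "great" contributes a count of 2 in the first review, while the position of each word in the review plays no role.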
• Model Training:- Model training is the step in which a model learns to
make predictions by being exposed to a labeled dataset. In sentiment
analysis, training teaches the model to associate features extracted
from the text (such as bag of words or word embeddings) with the
corresponding sentiment label (positive or negative).
• Model Selection:-
➢ Multinomial Naive Bayes:- Sentiment analysis is essentially a text
classification task where the goal is to assign a sentiment label
(positive or negative) to a given document. Multinomial Naive Bayes is
often considered a suitable choice for this task, including for movie
reviews, because it models each document through its word counts,
which is exactly what bag-of-words features provide.
➢ Logistic Regression:- Logistic Regression is another commonly used
model for sentiment analysis, including the analysis of movie reviews.
Sentiment analysis is often treated as a binary classification task
where the goal is to predict whether a document (e.g., a movie review)
expresses positive or negative sentiment, and Logistic Regression is
well suited to binary classification problems.
➢ Random Forest:- Random Forest is an ensemble of decision trees. It
combines the predictions of many individual decision trees to create a
more robust and accurate model, and ensemble methods often generalize
better than a single tree. Random Forest also provides a feature
importance ranking, indicating the contribution of each feature (word)
to the overall predictive performance. This can be valuable for
understanding which words play a crucial role in determining a
sentiment.
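To make the first of these models concrete, here is a minimal from-scratch sketch of Multinomial Naive Bayes over tokenized reviews. It is an illustration under stated assumptions rather than the project's implementation: the class name `TinyMultinomialNB`, the add-one smoothing default, and the four toy reviews are all hypothetical.

```python
import math
from collections import Counter


class TinyMultinomialNB:
    """Multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength (assumed default of 1)

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        # Per-class word counts, accumulated over all training documents.
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        class_docs = Counter(labels)
        self.log_prior = {c: math.log(class_docs[c] / len(docs))
                          for c in self.classes}
        self.total = {c: sum(self.word_counts[c].values())
                      for c in self.classes}
        return self

    def _log_likelihood(self, token, c):
        num = self.word_counts[c][token] + self.alpha
        den = self.total[c] + self.alpha * len(self.vocab)
        return math.log(num / den)

    def predict(self, doc):
        # Score each class in log space; words never seen in training are skipped.
        scores = {
            c: self.log_prior[c]
            + sum(self._log_likelihood(t, c) for t in doc if t in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)


train_docs = [["great", "fun", "movie"], ["loved", "great", "cast"],
              ["boring", "bad", "movie"], ["awful", "bad", "plot"]]
train_labels = ["positive", "positive", "negative", "negative"]

model = TinyMultinomialNB().fit(train_docs, train_labels)
print(model.predict(["great", "cast"]))  # → positive
print(model.predict(["bad", "plot"]))    # → negative
```

Working in log space avoids multiplying many small probabilities together, and the smoothing term keeps a word that never appeared in one class from driving that class's probability to zero.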
• Model Evaluation:-
➢ Multinomial Naive Bayes:- This model reached an accuracy of 84%. For
negative sentiment, precision was 83% and recall was 87%; for positive
sentiment, precision was 86% and recall was 82%.
➢ Logistic Regression:- This model reached an accuracy of 85%. For
negative sentiment, precision was 86% and recall was 84%; for positive
sentiment, precision was 84% and recall was 86%.
➢ Random Forest:- This model reached an accuracy of 83%. For negative
sentiment, precision was 82% and recall was 83%; for positive
sentiment, precision was 83% and recall was 82%.
From these figures, we select Logistic Regression as our final model
because it gives the highest accuracy. Precision and recall are also
within a few points of each other across all three models.
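For readers who want to see how such figures are derived, the sketch below computes accuracy, precision, and recall for one class from lists of true and predicted labels. The labels are a made-up toy example, not the project's predictions, and `classification_metrics` is a hypothetical helper name.

```python
def classification_metrics(y_true, y_pred, target):
    """Accuracy plus precision and recall for the class `target`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == target and p == target for t, p in pairs)  # true positives
    fp = sum(t != target and p == target for t, p in pairs)  # false positives
    fn = sum(t == target and p != target for t, p in pairs)  # false negatives
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy labels for illustration only.
y_true = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "pos", "neg", "neg", "neg", "neg", "pos"]

print(classification_metrics(y_true, y_pred, "pos"))
# → {'accuracy': 0.875, 'precision': 1.0, 'recall': 0.75}
print(classification_metrics(y_true, y_pred, "neg"))
# → {'accuracy': 0.875, 'precision': 0.8, 'recall': 1.0}
```

Precision asks how many reviews predicted as a class truly belong to it, while recall asks how many reviews of that class were found, which is why the two values swap roles between the positive and negative rows.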
• Error analysis:-
➢ Multinomial Naive Bayes:- 2530 correct predictions and 470 incorrect
predictions.
➢ Logistic Regression:- 2549 correct predictions and 451 incorrect
predictions.
➢ Random Forest:- 2504 correct predictions and 496 incorrect
predictions.
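These counts can be cross-checked against the reported accuracies with a few lines of arithmetic. The sketch below uses only the numbers stated in the error analysis; the helper name `accuracy_from_counts` is an assumption for this example.

```python
# Correct / incorrect prediction counts as reported in the error analysis.
counts = {
    "Multinomial Naive Bayes": (2530, 470),
    "Logistic Regression": (2549, 451),
    "Random Forest": (2504, 496),
}


def accuracy_from_counts(correct, incorrect):
    """Accuracy = correct predictions / all predictions."""
    return correct / (correct + incorrect)


for name, (correct, incorrect) in counts.items():
    total = correct + incorrect
    acc = accuracy_from_counts(correct, incorrect)
    print(f"{name}: {total} reviews, accuracy {acc:.1%}")
# → Multinomial Naive Bayes: 3000 reviews, accuracy 84.3%
# → Logistic Regression: 3000 reviews, accuracy 85.0%
# → Random Forest: 3000 reviews, accuracy 83.5%
```

Each pair sums to 3,000, which suggests the models were evaluated on a 3,000-review set, and the resulting accuracies round to the 84%, 85%, and 83% reported in the model evaluation.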
• Hyperparameter tuning:-
➢ Multinomial Naive Bayes:- Tuning the hyperparameters increased the
accuracy by 1 percentage point, from 84% to 85%.
➢ Logistic Regression:- Tuning the hyperparameters increased the
accuracy by 1 percentage point, from 85% to 86%.
➢ Random Forest:- Hyperparameter tuning produced no change; the
accuracy remained the same.
• Experimenting with the Term Frequency - Inverse Document Frequency
(TF-IDF) Vectorizer:-
➢ Multinomial Naive Bayes:- Using the TF-IDF vectorizer increased the
accuracy by 2 percentage points, from 84% to 86%.
➢ Logistic Regression:- Using TF-IDF increased the accuracy by 2
percentage points, to 87%.
➢ Random Forest:- Using the TF-IDF vectorizer increased the accuracy
by 1 percentage point.
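For context on what changes when TF-IDF replaces raw counts, here is a minimal sketch of the textbook weighting: term frequency within a review multiplied by the log of the inverse document frequency. The formula variant, the function name `tf_idf`, and the toy reviews are assumptions for illustration; library vectorizers often use smoothed or normalized variants, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(tokenized_docs):
    """Return one {token: weight} dict per document.

    tf  = count of token in document / document length
    idf = log(number of documents / number of documents containing token)
    """
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in tokenized_docs for tok in set(doc))
    idf = {tok: math.log(n_docs / df[tok]) for tok in df}

    weighted = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        weighted.append({tok: (cnt / len(doc)) * idf[tok]
                         for tok, cnt in counts.items()})
    return weighted


docs = [["great", "movie"], ["boring", "movie"], ["great", "movie", "cast"]]
weights = tf_idf(docs)

for doc_weights in weights:
    print({tok: round(w, 3) for tok, w in doc_weights.items()})
```

In this toy corpus "movie" appears in every review, so its weight is zero, while rarer words such as "boring" and "cast" receive the largest weights. That down-weighting of ubiquitous words is the main difference from plain bag-of-words counts.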
• Conclusion:- In this project, we assessed the performance of our
sentiment analysis models on a diverse set of movie reviews and gained
valuable insight into both the correctly and the incorrectly classified
instances. The best result came from the Logistic Regression model,
which achieved an accuracy of 87%. Positive sentiments were captured
with a precision of 86% and a recall of 89%; negative sentiments had a
precision of 89% and a recall of 85%.
• Limitations of the model:-
➢ Lack of Context Understanding:- Sentiment analysis models may struggle
to understand the context in which certain words or phrases are used in
movie reviews. For instance, positive words in a sarcastic context might
be misinterpreted.
➢ Overemphasis on Keywords:- Some models may rely heavily on specific
keywords, potentially leading to misclassifications when sentiments are
expressed through less common or synonymous terms.
• Future scope:-
➢ Fine-grained Sentiment Analysis:- Future models can aim for more
fine-grained sentiment analysis, capturing not only positive and
negative sentiments but also nuanced emotions across a spectrum. This
could involve more granular sentiment categories or intensity levels.
➢ Multimodal Sentiment Analysis:- Integrating information from multiple
modalities, such as text, images, and possibly even video clips from
movie reviews, can provide a more comprehensive understanding of
sentiment. This could improve accuracy by taking visual cues and
expressions into account.
➢ Domain-specific Adaptations:- Designing sentiment analysis models
tailored specifically to movie reviews can lead to improved accuracy.
Accounting for film-related terminology, genre-specific sentiments, and
cinematic nuances can enhance the model's performance in this context.
Analyzing Movie Reviews : Machine learning project

Z Score,T Score, Percential Rank and Box Plot Graph
 

Analyzing Movie Reviews: Machine Learning Project

  • 1. Sentiment Analysis of movie reviews
  • 2. Introduction • In an era where the digital landscape is flooded with user-generated content, understanding the sentiments expressed in movie reviews provides valuable insight into audience reactions and preferences, and gives filmmakers feedback on how their work is being received. • Sentiment analysis is a technique for analyzing a piece of text to determine the sentiment it expresses. In our case, we have been given an IMDB movie review dataset that contains about 50k movie reviews, each labeled as positive or negative. • Our aim is to study the given dataset, then build and train a model that can accurately classify a new, unseen review as positive or negative.
  • 3. Problem Statement • In the realm of the ever-expanding digital landscape and the proliferation of user-generated content, the film industry faces a pressing need to systematically understand and analyze the sentiments expressed in movie reviews. • Moviegoers share their opinions across diverse platforms, including websites, social media, and online forums, offering a rich tapestry of sentiments ranging from enthusiastic praise to critical evaluation. • The problem at hand involves developing an effective sentiment analysis system tailored specifically for movie reviews. This system must automatically categorize and interpret the sentiments expressed in textual content, classifying them as positive or negative.
  • 4. How will the ML model help • Insight Generation:- By accurately classifying sentiments, the model will generate actionable insights into how audiences perceive and react to movies. Filmmakers and studios can gain a comprehensive understanding of the strengths and weaknesses of their films, helping them make informed decisions for future projects. • Box Office Predictions:- The model's analysis of sentiments can contribute to predicting box office performance. Positive sentiments often correlate with higher audience interest, potentially leading to increased box office revenue. This predictive capability provides stakeholders with valuable foresight into a film's commercial success. • Marketing Strategy Optimization:- The model's outputs can guide the optimization of marketing and promotional strategies. Positive sentiments can be leveraged to create compelling promotional content, while addressing negative sentiments allows for targeted improvements and strategic communication to manage public perception.
  • 5. Challenges Faced • Nuanced language:- Movie reviews often contain nuanced language, sarcasm, irony, or humor. Capturing these subtleties can be challenging for sentiment analysis models, as they may misinterpret the intended sentiment. • Subjectivity:- Sentiment is inherently subjective, and individuals may express their opinions in diverse ways. Differentiating between personal opinions and objective statements poses a challenge, as models need to navigate the subjective nature of language. • Contextual understanding:- Understanding the context in which certain phrases or cultural references are used is crucial. A lack of contextual understanding may lead to misinterpretation of sentiments, especially when specific references are involved.
  • 6. Proposed System • Data collection:- The dataset should encompass a wide range of reviews, covering both positive and negative outcomes. With diverse data, the model can learn patterns from the reviews. • EDA:- Exploratory Data Analysis (EDA) plays a crucial role in understanding the dataset and extracting meaningful insights, which can aid in predicting the sentiment of the reviews.
  • 7. • Data preprocessing:- Removing HTML tags:- The reviews contain HTML tags because the data was scraped from the internet, so we remove them. Converting everything to lower case:- We convert all words to lower case.
  • 8. Removing all punctuation:- We remove all punctuation from the reviews, as it is of no use in training the model. Spelling correction:- We correct spelling mistakes in the reviews using the .correct() function. Tokenization:- This involves breaking a text down into smaller units called tokens.
  • 9. Removing stop words:- Stop words are words such as "and", "or", "the", and "from", which appear throughout the reviews but are of no use in training the model, so we remove them. Stemming:- This reduces related word forms to a common base form. Example: "playing" and "played" are both converted to "play".
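The preprocessing steps on slides 7 to 9 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the stop word list is a tiny hypothetical sample, `naive_stem` is a crude suffix stripper standing in for a real stemmer (such as a Porter stemmer), and spelling correction is left out because the slides do not say which library provides `.correct()`.

```python
import re
import string

# Hypothetical, deliberately tiny stop word list; a real project would use a fuller one.
STOP_WORDS = {"and", "or", "the", "from", "a", "an", "is", "was", "it", "this", "of", "to", "in"}


def strip_html(text):
    """Replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def remove_punctuation(text):
    """Delete every ASCII punctuation character."""
    return text.translate(str.maketrans("", "", string.punctuation))


def naive_stem(token):
    """Crude stand-in for a stemmer: strip one common suffix if enough of the word remains."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(review):
    """HTML removal, lower-casing, punctuation removal, tokenization, stop word removal, stemming."""
    text = strip_html(review).lower()
    text = remove_punctuation(text)
    tokens = text.split()  # whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]


print(preprocess("<br />The actors were PLAYING well, and I played along!"))
```

Whitespace tokenization is the simplest possible choice here; a tokenizer from an NLP library would handle contractions and similar cases more carefully.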
  • 10. • Feature Extraction:- Bag Of Words:- The "Bag of Words" (BoW) model is a common and straightforward technique used in natural language processing (NLP) for representing textual data. The basic idea is to represent a document as an unordered collection of its words, keeping how often each word occurs but discarding word order. This converts raw text into a numerical format that machine learning algorithms can work with.
  • 11. • Model Training:- Model training is a crucial step in machine learning where a model learns to make predictions or decisions by being exposed to a labeled dataset. In the context of sentiment analysis, the training process involves teaching the model to associate features extracted from text data (such as bag of words or word embeddings) with corresponding sentiment labels (positive or negative).
  • 12. • Model Selection:- Multinomial Naïve Bayes:- Multinomial Naive Bayes is often considered a suitable choice for sentiment analysis of text, including movie reviews, due to several characteristics that align well with the nature of the task. Sentiment analysis is essentially a text classification task where the goal is to assign a sentiment label (positive or negative) to a given document. Multinomial Naive Bayes is particularly well-suited for such classification tasks.
  • 13. Logistic Regression:- Logistic Regression is another commonly used model for sentiment analysis, including the analysis of movie reviews. Sentiment analysis is often treated as a binary classification task where the goal is to predict whether a document (e.g., a movie review) expresses positive or negative sentiment. Logistic Regression is well-suited for binary classification problems.
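To make the Bag of Words idea concrete, here is a small sketch that builds a vocabulary from tokenized reviews and turns each review into a vector of word counts. The function names are illustrative only, and the slides do not show which vectorizer the project used.

```python
from collections import Counter


def build_vocabulary(tokenized_docs):
    """Map every distinct token to a column index (alphabetical order)."""
    tokens = sorted({tok for doc in tokenized_docs for tok in doc})
    return {tok: i for i, tok in enumerate(tokens)}


def to_count_vector(tokens, vocab):
    """Count how often each vocabulary word occurs; unknown words are ignored."""
    vector = [0] * len(vocab)
    for tok, n in Counter(tokens).items():
        if tok in vocab:
            vector[vocab[tok]] = n
    return vector


docs = [["good", "movie", "good"], ["bad", "movie"]]
vocab = build_vocabulary(docs)
vectors = [to_count_vector(doc, vocab) for doc in docs]
print(vocab)    # column index for each word
print(vectors)  # one row of counts per review
```

Word order is lost in this representation: "good movie good" and "good good movie" produce the same vector.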
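As a rough illustration of how Multinomial Naive Bayes classifies a review, the sketch below trains on token counts with add-one (Laplace) smoothing and picks the label with the highest log-probability. It is a from-scratch toy under assumed names and toy data, not the implementation used in the project.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log likelihoods from tokenized documents."""
    vocab = {tok for doc in docs for tok in doc}
    class_doc_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)

    model = {"vocab": vocab, "log_prior": {}, "log_likelihood": {}}
    for label, n_docs in class_doc_counts.items():
        model["log_prior"][label] = math.log(n_docs / len(docs))
        # alpha is added to every word count so unseen words never get probability zero
        total = sum(token_counts[label].values()) + alpha * len(vocab)
        model["log_likelihood"][label] = {
            tok: math.log((token_counts[label][tok] + alpha) / total) for tok in vocab
        }
    return model


def predict(model, doc):
    """Return the label with the highest posterior score; out-of-vocabulary tokens are skipped."""
    scores = {}
    for label, prior in model["log_prior"].items():
        likelihood = model["log_likelihood"][label]
        scores[label] = prior + sum(likelihood[tok] for tok in doc if tok in model["vocab"])
    return max(scores, key=scores.get)


train_docs = [["great", "fun"], ["great", "acting"], ["boring", "plot"], ["boring", "slow"]]
train_labels = ["positive", "positive", "negative", "negative"]
nb = train_multinomial_nb(train_docs, train_labels)
print(predict(nb, ["great", "fun", "movie"]))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together.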
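The binary classification view of Logistic Regression can be sketched as below: a weighted sum of word counts is passed through a sigmoid to give the probability of the positive class, and the weights are fitted by plain batch gradient descent. The learning rate, epoch count, and toy data are assumptions chosen for illustration; the project's actual training setup is not given in the slides.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability that feature vector x belongs to the positive class (label 1)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X, y, lr=0.5, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of log loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy counts: column 0 = occurrences of "great", column 1 = occurrences of "boring".
X = [[2, 0], [1, 0], [0, 1], [0, 2]]
y = [1, 1, 0, 0]  # 1 = positive, 0 = negative
w, b = train_logistic_regression(X, y)
print(predict_proba(w, b, [1, 0]) > 0.5)  # review mentioning "great"
print(predict_proba(w, b, [0, 1]) > 0.5)  # review mentioning "boring"
```

The learned weights are directly interpretable: a positive weight means the word pushes the prediction toward positive sentiment, a negative weight toward negative sentiment.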
  • 14.  Random Forest:- Random Forest is an ensemble of decision trees. It combines the predictions of multiple weak learners (individual decision trees) to create a more robust and accurate model. Ensemble methods often lead to improved generalization performance. Random Forest provides a feature importance ranking, indicating the contribution of each feature (word) to the overall predictive performance. This can be valuable for understanding which words play a crucial role in determining a sentiment.
  • 15. • Model Evaluation:- Multinomial Naive Bayes:- This model achieved an accuracy of 84%. For negative sentiments, the precision was 83% and the recall was 87%; for positive sentiments, the precision was 86% and the recall was 82%.
  • 16. Logistic Regression:- This model achieved an accuracy of 85%. For negative sentiments, the precision was 86% and the recall was 84%; for positive sentiments, the precision was 84% and the recall was 86%.
  • 17. Random forest:- This model achieved an accuracy of 83%. For negative sentiments, the precision was 82% and the recall was 83%; for positive sentiments, the precision was 83% and the recall was 82%. From the above figures, it is clear that we should select Logistic Regression as our final model, as it gives the highest accuracy. We can also see that the precision and recall values are almost the same across all the models.
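For reference, accuracy, precision, and recall as reported above can be computed from true and predicted labels as in this sketch. The label values and the toy predictions are made up for illustration and are not the project's data.

```python
def classification_metrics(y_true, y_pred, target_label):
    """Accuracy over all examples, plus precision and recall for one target label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == target_label and p == target_label)
    fp = sum(1 for t, p in pairs if t != target_label and p == target_label)
    fn = sum(1 for t, p in pairs if t == target_label and p != target_label)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # precision: of everything predicted as the target, how much was right
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # recall: of everything that truly is the target, how much was found
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


y_true = ["pos", "pos", "pos", "neg", "neg"]
y_pred = ["pos", "pos", "neg", "neg", "pos"]
print(classification_metrics(y_true, y_pred, "pos"))
print(classification_metrics(y_true, y_pred, "neg"))
```

Precision and recall are computed per class, which is why the slides report one pair of values for negative sentiments and another for positive sentiments.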
  • 18. • Error analysis:- Multinomial Naïve Bayes:- In Multinomial Naive Bayes, we got 2530 correct predictions and 470 wrong predictions.
  • 19. Logistic Regression:- In Logistic Regression, we got 2549 correct predictions and 451 erroneous predictions.
  • 20. Random Forest:- In Random Forest, we got 2504 correct predictions and 496 incorrect predictions.
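The correct and incorrect prediction counts on slides 18 to 20 each sum to 3,000, so the implied accuracy can be checked with simple arithmetic. The sketch below only uses the counts quoted in the slides; the helper name is illustrative.

```python
def accuracy_from_counts(correct, wrong):
    """Fraction of predictions that were correct."""
    return correct / (correct + wrong)


# (correct, wrong) counts as reported on the error analysis slides.
reported = {
    "Multinomial Naive Bayes": (2530, 470),
    "Logistic Regression": (2549, 451),
    "Random Forest": (2504, 496),
}

for name, (correct, wrong) in reported.items():
    print(f"{name}: {accuracy_from_counts(correct, wrong):.1%}")
```

The results (about 84.3%, 85.0%, and 83.5%) are consistent with the rounded accuracies of 84%, 85%, and 83% given in the model evaluation.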
  • 21. • Hyperparameter tuning:- Multinomial Naive Bayes:- Tuning the hyperparameters increased the accuracy by 1 percentage point, from 84% to 85%.
  • 22. Logistic Regression:- Tuning the hyperparameters of logistic regression increased the accuracy by 1 percentage point, to 86%.
  • 23. Random Forest:- Hyperparameter tuning produced no change for random forest; the accuracy remained the same.
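The slides do not say which hyperparameters were tuned or which search method was used. One common approach is an exhaustive grid search, sketched below under that assumption. The parameter names `alpha` and `min_count` and the `toy_score` function are hypothetical placeholders; in practice `score_fn` would train a model with the given settings and return its validation accuracy.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination of parameter values and keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical stand-in for validation accuracy; peaks at alpha=0.5, min_count=2."""
    return 0.85 - (params["alpha"] - 0.5) ** 2 - 0.01 * abs(params["min_count"] - 2)


grid = {"alpha": [0.1, 0.5, 1.0], "min_count": [1, 2, 5]}
print(grid_search(grid, toy_score))
```

Scoring on a held-out validation set, rather than the test set, keeps the final test accuracy an honest estimate.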
  • 24. • Experimenting with Term Frequency - Inverse Document Frequency (TF-IDF) Vectorizer:- Multinomial Naive Bayes:- Using the TF-IDF vectorizer increased the accuracy by 2 percentage points, from 84% to 86%.
  • 25. Logistic Regression:- Using TF-IDF gave 2 percentage points higher accuracy, reaching 87% for Logistic Regression.
  • 26. Random Forest:- For random forest, the accuracy increased by 1 percentage point when the TF-IDF vectorizer was used.
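To show what TF-IDF weighting changes relative to raw counts, the sketch below uses the textbook formulation: term frequency (count divided by document length) multiplied by log(N / document frequency). This is an assumption for illustration; library vectorizers often use smoothed or normalized variants, and the slides do not specify which one was used.

```python
import math
from collections import Counter


def tf_idf(tokenized_docs):
    """Return one {token: weight} dict per document using tf * log(N / df)."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for doc in tokenized_docs:
        doc_freq.update(set(doc))  # count each token once per document

    weighted = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        weighted.append({
            tok: (n / len(doc)) * math.log(n_docs / doc_freq[tok])
            for tok, n in counts.items()
        })
    return weighted


docs = [["good", "movie"], ["bad", "movie"]]
for weights in tf_idf(docs):
    print(weights)
```

In this toy example "movie" appears in every review, so its weight is zero, while "good" and "bad" keep a positive weight. Down-weighting words that occur everywhere is the intuition behind TF-IDF.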
  • 27. • Conclusion:- In this sentiment analysis of movie reviews, we aimed to assess the performance of our sentiment analysis model on a diverse set of movie reviews. Through our investigation, we have gained valuable insights into both correctly and incorrectly classified instances. Our final model, Logistic Regression, achieved an accuracy of 87%. Positive sentiments were well captured, with a precision of 86% and a recall of 89%. Negative sentiments had a precision of 89% and a recall of 85%. • Limitations of the model:- Lack of Context Understanding:- Sentiment analysis models may struggle to understand the context in which certain words or phrases are used in movie reviews. For instance, positive words in a sarcastic context might be misinterpreted. Overemphasis on Keywords:- Some models may rely heavily on specific keywords, potentially leading to misclassifications when sentiments are expressed through less common or synonymous terms.
  • 28. • Future scope:- Fine-grained Sentiment Analysis:- Future models can aim for more fine-grained sentiment analysis, capturing not only positive and negative sentiments but also nuanced emotions and sentiments across a spectrum. This could involve incorporating more granular sentiment categories or intensity levels. Multimodal Sentiment Analysis:- Integrating information from multiple modalities, such as text, images, and possibly even video clips from movie reviews, can provide a more comprehensive understanding of sentiment. This could enhance the model's accuracy by considering visual cues and expressions. Domain-specific Adaptations:- Designing sentiment analysis models specifically tailored for the domain of movie reviews can lead to improved accuracy. Considering film-related terminology, genre-specific sentiments, and cinematic nuances can enhance the model's performance in this context.