Apache Storm based Real Time Analytics for Recommending Trending Topics and Sentiment Analysis on Cloud Computing Environment
1. Akhmedov Khumoyun
Storm based Real Time Analytics for Recommending Trending Topics and Sentiment Analysis on Cloud Computing Environment
Konkuk 2015 humoyun@konkuk.ac.kr
SMCC Lab, Social Media Cloud Computing Research Center
2. Outline
• Motivation
• Real Time Systems and CEP
• Storm Introduction
• Used Technologies
• Related Work
• System Overview
• System Architecture
• Use Case: Social Media Analytics by SAS
3. Motivation
• Real-time computation is in demand
• Responding to the problem almost instantly
• Business value
• Tightly connected to Cloud Computing
• Batch processing limitations
• and …
4. Real Time Systems and CEP
• Real Time System?
A real-time system has been described as one which “controls an environment
by receiving data, processing them, and returning the results sufficiently
quickly to affect the environment at that time”. Real-time response latency is
often on the order of seconds or milliseconds.
• CEP (Complex Event Processing)?
CEP is event processing that combines data from multiple sources to infer
events or patterns that suggest more complicated circumstances. The goal of
CEP is to identify meaningful events (such as threats of attacks) and respond
to them as quickly as possible.
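A minimal sketch of the CEP idea above, assuming two hypothetical event sources ("auth" and "firewall") and a hypothetical 60-second correlation window. It does not use any CEP engine; the class and field names are illustrative only, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str       # hypothetical source name, e.g. "auth" or "firewall"
    key: str          # correlation key, e.g. an IP address
    timestamp: float  # seconds


class ThreatDetector:
    """Flags a key once every required source has reported it within the window."""

    def __init__(self, required_sources, window_seconds):
        self.required = set(required_sources)
        self.window = window_seconds
        self.recent = defaultdict(deque)  # key -> recent events for that key

    def process(self, event):
        events = self.recent[event.key]
        events.append(event)
        # Drop events that have fallen out of the time window.
        while events and event.timestamp - events[0].timestamp > self.window:
            events.popleft()
        seen_sources = {e.source for e in events}
        return self.required <= seen_sources


detector = ThreatDetector({"auth", "firewall"}, window_seconds=60)
stream = [
    Event("auth", "10.0.0.7", 0.0),
    Event("auth", "10.0.0.9", 5.0),
    Event("firewall", "10.0.0.7", 30.0),   # both sources within 60 s
    Event("firewall", "10.0.0.9", 120.0),  # too late to correlate
]
alerts = [e.key for e in stream if detector.process(e)]
print(alerts)
```

Only the first address raises an alert, because its two events fall inside the same window; the second address is reported by both sources, but too far apart in time.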
5. Apache Storm is a distributed real-time computation system
• Fast & scalable
• Fault-tolerant
• Guarantees messages will be processed
• Easy to set up & operate
• Free & open source
- originally developed by Nathan Marz at BackType
(acquired by Twitter)
8. Why we need Kafka
Apache Kafka is an ideal source for Storm topologies. It provides everything
necessary for:
- At most once processing
- At least once processing
- Exactly once processing
Apache Storm includes Kafka spout implementations for all levels of reliability.
Kafka supports a wide variety of languages and integration points for both
producers and consumers.
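The difference between these reliability levels largely comes down to when a consumer records its position relative to when it processes a message. The sketch below is a plain-Python simulation under that assumption; it does not use the Kafka or Storm APIs, and the function name, the single simulated crash, and the de-duplication step are all hypothetical.

```python
def consume(log, commit_first, crash_at):
    """Replay `log` through a consumer that crashes once at offset `crash_at`.

    commit_first=True  -> offset is committed before processing (at most once)
    commit_first=False -> offset is committed after processing (at least once)
    """
    committed = 0    # next offset to read after a restart
    processed = []   # (offset, message) pairs whose side effects happened
    crashed = False
    while committed < len(log):
        offset = committed
        try:
            if commit_first:
                committed = offset + 1
                if offset == crash_at and not crashed:
                    crashed = True
                    raise RuntimeError("crash before processing")
                processed.append((offset, log[offset]))
            else:
                processed.append((offset, log[offset]))
                if offset == crash_at and not crashed:
                    crashed = True
                    raise RuntimeError("crash before commit")
                committed = offset + 1
        except RuntimeError:
            continue  # "restart": resume from the last committed offset
    return processed


log = ["a", "b", "c"]
at_most_once = consume(log, commit_first=True, crash_at=1)
at_least_once = consume(log, commit_first=False, crash_at=1)
# Exactly-once *effect*: at-least-once delivery plus de-duplication by offset.
exactly_once = list(dict.fromkeys(at_least_once))

print(at_most_once)   # message "b" is lost
print(at_least_once)  # message "b" is processed twice
print(exactly_once)   # each message takes effect once
```

In a real deployment the de-duplication step would typically be provided by an idempotent or transactional sink rather than an in-memory list.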
9. Used Technologies
• Apache Storm
• Apache HBase
• MySQL
• Hadoop 2
• Apache ZooKeeper
• Apache Kafka (message broker)
• Java and some Python
• jQuery and Bootstrap
• Play Framework (Java) or Django (Python)
10. System Overview
• Trending Topics?
“Twitter Trends are automatically generated by an algorithm that attempts to
identify topics that are being talked about more now than were previously.”
The Trends list is designed to help people discover the hottest topics and
breaking news from across the world, in real time.
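One simple way to read "talked about more now than previously" is to compare topic counts in the current time window against the previous one. The sketch below assumes hashtags stand in for topics and uses a smoothed ratio as the score; this is an illustrative heuristic, not Twitter's algorithm and not necessarily the one used in this system. The function names, window contents, and smoothing constant are assumptions.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def count_topics(tweets):
    """Count case-insensitive hashtag occurrences across a batch of tweets."""
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    return counts


def trending(previous_tweets, current_tweets, top_n=10, smoothing=1.0):
    """Rank topics by how much more often they appear now than before."""
    before = count_topics(previous_tweets)
    now = count_topics(current_tweets)
    # Smoothing avoids division by zero for topics unseen in the previous window.
    scores = {t: (now[t] + smoothing) / (before[t] + smoothing) for t in now}
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_n]


previous_window = ["#hadoop release notes", "#hadoop tips", "#storm intro"]
current_window = [
    "#storm is fast",
    "#storm topology",
    "#Storm on the cloud",
    "#storm and #kafka",
    "#hadoop news",
]
top_topics = trending(previous_window, current_window, top_n=3)
print(top_topics)
```

In a streaming setting the two windows would slide forward continuously, for example as rolling counts maintained per topic, instead of being passed in as fixed lists.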
• Sentiment Analysis?
Generally speaking, sentiment analysis aims to determine the attitude of a
speaker or a writer with respect to some topic or the overall contextual
polarity of a document.
14. Top Ten Trending Tweets
N | User | Tweet | Sentiment
1 | BigData | Red Hat Offers Apache Hadoop Big Data Services For Business Critical Workloads : http://tinyurl.com/qb83boj | Positive
2 | Checkmax | Secure your source code. http://bit.ly/1MnVRwQ Get a full vulnerability report and prevent security breaches | Negative
3 | Time.com | 5 players to follow in the Women’s World Cup http://ti.me/1LkM0Ku | Neutral
4 | …. | …. | ….
. | …. | …. | ….
. | …. | …. | ….
8 | …. | …. | ….
9 | Iran | #Iran, #Russia discuss regional development, #SCO membership http://theiranproject.com/blog/2015/06/20/iran-russia-discuss-regional-development-sco-membership/ … | Negative
10 | ….. | …. | ….
15. Sentiment Analysis
To determine the sentiment of incoming tweets, I will use machine learning
algorithms such as the Naïve Bayes algorithm (predictive learning) and other
related techniques.
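A minimal sketch of a multinomial Naïve Bayes text classifier with add-one smoothing, written in plain Python. The class name, the whitespace tokenizer, and the four training sentences are hypothetical and far too small for real use; they only illustrate the mechanics.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # log P(label) + sum of log P(word | label), with add-one smoothing
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data.
model = NaiveBayes()
model.train("great service love it", "positive")
model.train("happy with the new release", "positive")
model.train("terrible outage hate it", "negative")
model.train("security breach is bad news", "negative")

print(model.predict("love the new release"))
print(model.predict("terrible security breach"))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.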
In addition, I will use a predefined reference sentiment dictionary as a model
to determine the sentiment value of tweets efficiently.
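A dictionary-based scorer can be sketched as a word-to-weight lookup summed over the tweet. The lexicon entries, weights, and label names below are hypothetical placeholders, not the reference dictionary used in this work.

```python
import re

# Hypothetical miniature lexicon: word -> sentiment weight.
LEXICON = {
    "good": 1, "great": 2, "love": 2, "secure": 1,
    "bad": -1, "breach": -2, "hate": -2, "terrible": -2,
}


def sentiment(text, lexicon=LEXICON):
    """Label text by the sign of its summed word weights."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(lexicon.get(word, 0) for word in words)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


print(sentiment("I love this great release"))
print(sentiment("A terrible security breach"))
print(sentiment("5 players to follow"))
```

A lookup like this is cheap enough to run per tweet in a stream, but it ignores negation and word forms not in the dictionary (for example "breaches"), which is one reason to pair it with a trained classifier.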