The document discusses streaming data architectures and streaming engines. It provides an overview of classic batch architectures like Hadoop and Spark and new streaming architectures using technologies like Kafka, Flink and Beam. It then examines different streaming engines, considering factors like latency, volume, data processing needs, and preferred application architecture. Key streaming engines highlighted include Apache Beam, Flink, Spark, Akka Streams and Kafka Streams.
Akka, Spark or Kafka? Selecting The Right Streaming Engine For the Job
2. Check out these resources:
Dean’s book
Webinars
etc.
Fast Data Architectures
for Streaming Applications
Getting Answers Now from Data Sets that Never End
By Dean Wampler, Ph. D., VP of Fast Data Engineering
lightbend.com/products/fast-data-platform
16. • Why Kafka?
[Diagram, before: Service 1, Log & Other Files, Internet Services, Service 2, Service 3 and other Services, with producers wired directly to consumers: N * M links.]
[Diagram, after: the same producers and consumers: N + M links.]
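The link counts on this slide can be checked with a few lines of arithmetic. The sketch below assumes N producers and M consumers, and assumes that the "After" picture routes everything through one shared hub (a Kafka cluster, going by the slide title). The object and method names are hypothetical; only the N * M and N + M formulas come from the slide.

```scala
// Hypothetical helper; only the two formulas are taken from the slide.
object LinkCount {
  // Before: every producer is wired directly to every consumer.
  def direct(producers: Int, consumers: Int): Int = producers * consumers

  // After: every producer and every consumer holds one link to the shared hub.
  def viaHub(producers: Int, consumers: Int): Int = producers + consumers

  def main(args: Array[String]): Unit = {
    val (n, m) = (10, 20) // illustrative values, not from the deck
    println(s"direct: ${direct(n, m)} links, via hub: ${viaHub(n, m)} links")
  }
}
```

The gap widens quickly: adding one more consumer costs N new links in the direct layout but only one link when a hub sits in the middle.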
22. • Low latency? How low?
• High volume? How high?
• Which kinds of data processing, analytics?
• Process data in bulk or individually?
• Bulk processing of records?
• Individual processing of events?
• Preferred application architecture?
28. • Low latency? How low?
• < 1 second to minutes
[Diagram labels: ETL: Logs, Kafka (Raw Logs Topic, Parsed Logs Topic), Kafka Streams Job. Model Training: Data storage, Model Training, Model Serving, Other Logic.]
29. • Low latency? How low?
• > 1 minute?
• Use short batch jobs
31. • High volume? How high?
• < 10K - 100K per second?
drdobbs.com/web-development/soa-web-services-and-restful-systems/199902676
32. • High volume? How high?
• > 1M per second?
https://store.nest.com/product/thermostat/
33. • Which kinds of data processing, analytics?
• SQL?
SELECT COUNT(*)
FROM `my-iot-data`
GROUP BY `zip-code`
// Streaming file sources also need an explicit .schema(...)
val input = spark.readStream.
  format("parquet").
  load("my-iot-data")
input.groupBy("zip-code").
  count()
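To make concrete what the query and the DataFrame call compute, here is a minimal sketch of the same group-and-count over an in-memory collection, with no Spark dependency. The `Reading` record type and its fields are assumptions made for illustration; the slide names only a `zip-code` column.

```scala
object ZipCodeCounts {
  // Hypothetical record shape; the deck only mentions a "zip-code" column.
  final case class Reading(deviceId: String, zipCode: String)

  // Plain-collections equivalent of: SELECT COUNT(*) ... GROUP BY zip-code
  def countByZip(readings: Seq[Reading]): Map[String, Int] =
    readings.groupBy(_.zipCode).map { case (zip, rs) => zip -> rs.size }
}
```

A streaming engine maintains the same per-key counts incrementally as records arrive, rather than recomputing them over a finished collection.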
34. • Which kinds of data processing, analytics?
• “Dataflow”?
import org.apache.spark.SparkContext
val sc = new SparkContext("local[*]", "Inverted Idx")
sc.textFile("data/crawl")
  .map { line => val Array(path, text) = line.split("\t", 2); (path, text)
  } flatMap {
    case (path, text) => text.split("""\W+""").map((_, path))
  } map {
    case (w, p) => ((w, p), 1)
  } reduceByKey {
    case (n1, n2) => n1 + n2
  } map {
35. • Which kinds of data processing, analytics?
• ETL?
[Diagram labels: ETL: Logs, Kafka (Raw Logs Topic, Parsed Logs Topic), Kafka Streams Job.]
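The transform step in such an ETL job can be sketched as a pure function from raw log lines to parsed records. This is only an illustration: the log format ("timestamp, level, message" separated by whitespace) and all names here are assumptions, and a real job would apply the same function between the raw and parsed topics using a streaming library rather than a `Seq`.

```scala
object LogEtl {
  // Hypothetical parsed shape; the deck does not specify a log format.
  final case class ParsedLog(timestamp: String, level: String, message: String)

  // Assumed raw format: "<timestamp> <LEVEL> <message...>".
  // Lines that do not have all three parts are rejected with None.
  def parse(raw: String): Option[ParsedLog] =
    raw.trim.split("\\s+", 3) match {
      case Array(ts, level, msg) => Some(ParsedLog(ts, level, msg))
      case _                     => None
    }

  // Stand-in for "raw logs topic in, parsed logs topic out": malformed lines are dropped.
  def transform(rawLines: Seq[String]): Seq[ParsedLog] = rawLines.flatMap(parse)
}
```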
36. • Which kinds of data processing, analytics?
• Train and serve ML models?
[Diagram labels: Data storage, Model Training, Model Serving, Other Logic.]
37. • Process data in bulk or individually?
• Individual events (i.e., complex event processing, CEP).
• Records in bulk (i.e., each datum’s identity is unimportant).
[Diagram labels: Microservices; Events flowing through a Router Actor to Service Actor 1 and Service Actor 2, …; SA11, SA12, SA13, SA21, SA22, SA23.]
SELECT COUNT(*)
FROM `my-iot-data`
GROUP BY `zip-code`
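The two processing styles on this slide can be contrasted in a small sketch. The `Event` type, the temperature field, and the threshold rule are all assumptions chosen for illustration; they are not from the deck.

```scala
object EventVsBulk {
  // Hypothetical event shape for illustration only.
  final case class Event(deviceId: String, temperature: Double)

  // Individual processing: each event is inspected on its own and may
  // trigger its own action, so its identity matters.
  def alerts(events: Seq[Event], threshold: Double): Seq[String] =
    events.collect { case Event(id, t) if t > threshold => s"ALERT $id: $t" }

  // Bulk processing: only the aggregate matters, not any single record.
  def averageTemperature(events: Seq[Event]): Option[Double] =
    if (events.isEmpty) None
    else Some(events.map(_.temperature).sum / events.size)
}
```

The first style maps naturally onto actors or microservices that react per message; the second maps onto SQL-like queries over many records at once.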
47. [Architecture diagram labels: Low Latency and Mini-batch (Spark Streaming); Batch (Spark, …); Low Latency (Flink, Kafka Streams, Akka Streams, Beam, …); Persistence (S3, HDFS, Disk, SQL/NoSQL, Search); Kafka Cluster; Beam.]
• Spark or Flink?
• Best for massive data sets
• Rich analytics
• Akka Streams or Kafka Streams?
• Best for microservice integration
• Wider flexibility
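As a compact restatement of these bullets, the sketch below maps the two workload profiles to the engines the slide recommends for them. The type and method names are hypothetical, and the mapping is a literal transcription of the bullets rather than a complete decision procedure; latency, volume, and the other criteria discussed earlier are deliberately left out.

```scala
object EngineChoice {
  // Hypothetical workload categories, named after the slide's bullets.
  sealed trait Workload
  case object MassiveDataRichAnalytics extends Workload
  case object MicroserviceIntegration extends Workload

  // Engines the summary slide pairs with each kind of workload.
  def candidates(workload: Workload): Seq[String] = workload match {
    case MassiveDataRichAnalytics => Seq("Spark", "Flink")
    case MicroserviceIntegration  => Seq("Akka Streams", "Kafka Streams")
  }
}
```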
49. For more information on
Lightbend Fast Data Platform:
lightbend.com/fast-data-platform