Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Real-Time Streaming in Any and
All Clouds, Hybrid and Beyond
Timothy Spann | Developer Advocate
Tim Spann, Developer Advocate
DZone Zone Leader and Big Data MVB Data DJay
USE CASE
IoT Ingestion: High-volume
streaming sources, sensors,
multiple message formats,
diverse protocols and
multi-vend...
KEY CHALLENGES
Visibility: Lack visibility of end-to-end streaming data flows,
inability to troubleshoot bottlenecks, consu...
Multiple users, protocols, frameworks, languages, clouds, data sources & clusters
CLOUD DATA ENGINEER
• Experience in ETL/...
StreamNative Solution
Application Messaging Data Pipelines Real-time Contextual Analytics
Tiered Storage
APP Layer
Computi...
● Apache Flink
● Apache Pulsar
● StreamNative's Flink Connector for Pulsar
● Apache NiFi
● Apache +++
FLiP(N) Stack
What is Apache NiFi?
Apache NiFi is a scalable, real-time streaming data
platform that collects, curates, and analyzes dat...
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Laten...
APACHE NIFI HIGH LEVEL CAPABILITIES
• Scale horizontal and vertically
• Scale your data flow to millions event/s
• Ingest T...
Apache NiFi
Enable easy ingestion, routing, management and delivery of any data anywhere (Edge, cloud,
data center) to any...
What is Apache Pulsar?
Apache Pulsar is an open source, cloud-native
distributed messaging and streaming platform.
A Unified Messaging Platform
Message
Queuing
Data
Streaming
Apache Pulsar
● Pub-Sub
● Geo-Replication
● Pulsar Functions
● Horizontal Scalability
● Multi-tenancy
● Tiered Persistent ...
● “Bookies”
● Stores messages and
cursors
● Messages are grouped in
segments/ledgers
● A group of bookies form an
“ensembl...
Reader and
Batch API
Pulsar
IO/Connectors
Stream Processor
Applications
Prebuilt Connectors Custom Connectors
Microservice...
Subscription Modes
Different subscription modes
have different semantics:
Exclusive/Failover -
guaranteed order, single
ac...
Unified Messaging Model
Streaming
Messaging
Producer 1
Producer 2
Pulsar
Topic/Partition
m0
m1
m2
m3
m4
Consumer D-1
Consum...
A cloud-native, real-time
messaging and streaming
platform to support
multi-cloud and hybrid
cloud strategies.
Powered
by ...
21
Apache Pulsar - Cloud Storage Sink
https://hub.streamnative.io/connectors/cloud-storage-sink/2.5.1/
● Ensure exactly-on...
22
Apache Pulsar - Other Sinks
https://hub.streamnative.io/connectors/cloud-storage-sink/2.5.1/
● mongoDB
● AWS Lambda
● r...
23
Apache Pulsar - Other Sinks
https://hub.streamnative.io/connectors/cloud-storage-sink/2.5.1/
mongoDB
AWS Lambda
redis
A...
StreamNative Hub
StreamNative Cloud
Unified Batch and Stream
COMPUTING
Batch
(Batch + Stream)
Unified Batch and Stream STORA...
25
{"ir": "252.0", "id": "20210914001822_5e4882ee-22d9-432c-9074-19f12be62006", "end":
"1631578962.03", "uuid": "nano_uuid...
26
Show Me More Data
https://github.com/tspannhw/minifi-xaviernx/
https://github.com/tspannhw/minifi-jetson-nano
https://github.com/tspannhw/Flip...
Deeper Content
● https://www.datainmotion.dev/2020/10/running-flink-sql-against-kafka-using.html
● https://www.datainmotion...
Connect with the Community
& Stay Up-To-Date
● Join the Pulsar Slack channel -
Apache-Pulsar.slack.com
● Follow @streamnat...
streamnative.io
Pulsar Summit Europe
October 6, 2021
Pulsar Summit Asia
November 20-21, 2021
Contact us at partners@pulsar...
Let’s Keep in Touch!
https://github.com/tspannhw
Tim Spann
Developer Advocate
https://twitter.com/paasDev
https://www.link...
Q&A
Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond
Upcoming SlideShare
Loading in …5
×

of

Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 1 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 2 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 3 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 4 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 5 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 6 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 7 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 8 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 9 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 10 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 11 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 12 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 13 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 14 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 15 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 16 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 17 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 18 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 19 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 20 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 21 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 22 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 23 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 24 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 25 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 26 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 27 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 28 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 29 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 30 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 31 Big data conference europe   real-time streaming in any and all clouds, hybrid and beyond Slide 32
Upcoming SlideShare
What to Upload to SlideShare
Next
Download to read offline and view in fullscreen.

1 Like

Share

Download to read offline

Big data conference europe real-time streaming in any and all clouds, hybrid and beyond

Download to read offline

Biography

Tim Spann is a Principal DataFlow Field Engineer at Cloudera where he works with Apache NiFi, MiniFi, Pulsar, Apache Flink, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a senior solutions architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.

Talk

Real-Time Streaming in Any and All Clouds, Hybrid and Beyond
Today, data is being generated from devices and containers living at the edge of networks, clouds and data centers. We need to run business logic, analytics and deep learning at the scale and as events arrive.

Tools:
Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, DJL.ai Apache MXNet.

References:

https://www.datainmotion.dev/2019/11/introducing-mm-flank-apache-flink-stack.html
https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html
https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html
https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html
https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html
Source Code: https://github.com/tspannhw/MmFLaNK

FLiP Stack
StreamNative

Related Books

Free with a 30 day trial from Scribd

See all

Related Audiobooks

Free with a 30 day trial from Scribd

See all

Big data conference europe real-time streaming in any and all clouds, hybrid and beyond

  1. 1. Real-Time Streaming in Any and All Clouds, Hybrid and Beyond Timothy Spann | Developer Advocate
  2. 2. Tim Spann, Developer Advocate DZone Zone Leader and Big Data MVB Data DJay
  3. 3. USE CASE IoT Ingestion: High-volume streaming sources, sensors, multiple message formats, diverse protocols and multi-vendor devices creates data ingestion challenges. Other Sources: Transit data, news, twitter, status feeds, REST data, stock data and more.
  4. 4. KEY CHALLENGES Visibility: Lack visibility of end-to-end streaming data flows, inability to troubleshoot bottlenecks, consumption patterns etc. Data Ingestion: High-volume streaming sources, multiple message formats, diverse protocols and multi-vendor devices creates data ingestion challenges. Real-time Insights: Analyzing continuous and rapid inflow (velocity) of streaming data at high volumes creates major challenges for gaining real-time insights.
  5. 5. Multiple users, protocols, frameworks, languages, clouds, data sources & clusters CLOUD DATA ENGINEER • Experience in ETL/ELT • Coding skills in Python or Java • Knowledge of database query languages such as SQL • Experience with Streaming • Knowledge of Cloud Tools • Expert in ETL (Eating, Ties and Laziness) • Edge Camera Interaction • Typical User • No Coding Skills • Can use NiFi • Questions your cloud spend CAT AI / Deep Learning / ML / DS • Can run in Apache NiFi • Can run in Apache Pulsar Functions • Can run in Apache Flink • Can run in Apache Flink SQL • Can run in Apache Pulsar Clients • Can run in Apache Pulsar Microservices • Can run in Function Mesh https://functionmesh.io/ FLiP(N) Stack for Data Engineers
  6. 6. StreamNative Solution Application Messaging Data Pipelines Real-time Contextual Analytics Tiered Storage APP Layer Computing Layer Storage Layer StreamNative Platform IaaS Layer Micro Service Notification Dashboard Risk Control Auditing Payment ETL
  7. 7. ● Apache Flink ● Apache Pulsar ● StreamNative's Flink Connector for Pulsar ● Apache NiFi ● Apache +++ FLiP(N) Stack
  8. 8. What is Apache NiFi? Apache NiFi is a scalable, real-time streaming data platform that collects, curates, and analyzes data so customers gain key insights for immediate actionable intelligence.
  9. 9. • Guaranteed delivery • Data buffering - Backpressure - Pressure release • Prioritized queuing • Flow specific QoS - Latency vs. throughput - Loss tolerance • Data provenance • Supports push and pull models • Hundreds of processors • Visual command and control • Over a sixty sources • Flow templates • Pluggable/multi-role security • Designed for extension • Clustering • Version Control Why Apache NiFi?
  10. 10. APACHE NIFI HIGH LEVEL CAPABILITIES • Scale horizontal and vertically • Scale your data flow to millions event/s • Ingest TB to PB of data per day • Adapt to your flow requirements • Back pressure & Dynamic prioritization • Loss tolerant vs guaranteed delivery • Low latency vs high throughput • Secure • SSL, HTTPS, SFTP, etc. • Governance and data provenance • Extensible • Build your own processors and Controller services (providers) • Integrate with external systems (Security, Monitoring, Governance, etc)
  11. 11. Apache NiFi Enable easy ingestion, routing, management and delivery of any data anywhere (Edge, cloud, data center) to any downstream system with built in end-to-end security and provenance. ACQUIRE PROCESS DELIVER • Over 300 Prebuilt Processors • Easy to build your own • Parse, Enrich & Apply Schema • Filter, Split, Merger & Route • Throttle & Backpressure • Guaranteed Delivery • Full data provenance from acquisition to delivery • Diverse, Non-Traditional Sources • Eco-system integration Advanced tooling to industrialize flow development (Flow Development Life Cycle) FTP SFTP HL7 UDP XML HTTP EMAIL HTML IMAGE SYSLOG FTP SFTP HL7 UDP XML HTTP EMAIL HTML IMAGE SYSLOG HASH MERGE EXTRACT DUPLICATE SPLIT ROUTE TEXT ROUTE CONTENT ROUTE CONTEXT CONTROL RATE DISTRIBUTE LOAD GEOENRICH SCAN REPLACE TRANSLATE CONVERT ENCRYPT TALL EVALUATE EXECUTE
  12. 12. What is Apache Pulsar? Apache Pulsar is an open source, cloud-native distributed messaging and streaming platform.
  13. 13. A Unified Messaging Platform Message Queuing Data Streaming
  14. 14. Apache Pulsar ● Pub-Sub ● Geo-Replication ● Pulsar Functions ● Horizontal Scalability ● Multi-tenancy ● Tiered Persistent Storage ● Pulsar Connectors ● REST API ● CLI ● Many clients available ● Four Different Subscription Types ● Multi-Protocol Support ○ MQTT ○ AMQP ○ JMS ○ Kafka ○ ...
  15. 15. ● “Bookies” ● Stores messages and cursors ● Messages are grouped in segments/ledgers ● A group of bookies form an “ensemble” to store a ledger ● “Brokers” ● Handles message routing and connections ● Stateless, but with caches ● Automatic load-balancing ● Topics are composed of multiple segments ● Stores metadata for both Pulsar and BookKeeper ● Service discovery Store Messages Metadata & Service Discovery Metadata & Service Discovery Pulsar Cluster
  16. 16. Reader and Batch API Pulsar IO/Connectors Stream Processor Applications Prebuilt Connectors Custom Connectors Microservices or Event-Driven Architecture Pub/Sub API Publisher Subscriber Admin API Operators & Administrators Teams Tenant Pulsar API Design
  17. 17. Subscription Modes Different subscription modes have different semantics: Exclusive/Failover - guaranteed order, single active consumer Shared - multiple active consumers, no order Key_Shared - multiple active consumers, order for given key Producer 1 Producer 2 Pulsar Topic Subscription D Consumer D-1 Consumer D-2 Key-Shared < K 1 ,V 1 0 > < K 1 ,V 1 1 > < K 1 ,V 1 2 > < K 2 ,V 2 0 > < K 2 ,V 2 1 > < K 2 ,V 2 2 > Subscription C Consumer C-1 Consumer C-2 Shared < K 1 ,V 1 0 > < K 2 ,V 2 1 > < K 1 ,V 1 2 > < K 2 ,V 2 0 > < K 1 ,V 1 1 > < K 2 ,V 2 2 > Subscription A Consumer A Exclusive Subscription B Consumer B-1 Consumer B-2 In case of failure in Consumer B-1 Failover
  18. 18. Unified Messaging Model Streaming Messaging Producer 1 Producer 2 Pulsar Topic/Partition m0 m1 m2 m3 m4 Consumer D-1 Consumer D-2 Consumer D-3 Subscription D < k 2 , v 1 > < k 2 , v 3 > <k3,v2 > < k 1 , v 0 > < k 1 , v 4 > Key-Shared Consumer C-1 Consumer C-2 Consumer C-3 Subscription C m1 m2 m3 m4 m0 Shared Failover Consumer B-1 Consumer B-0 Subscription B m1 m2 m3 m4 m0 In case of failure in Consumer B-0 Consumer A-1 Consumer A-0 Subscription A m1 m2 m3 m4 m0 Exclusive X
  19. 19. A cloud-native, real-time messaging and streaming platform to support multi-cloud and hybrid cloud strategies. Powered by Pulsar Built for Containers Flink SQL Cloud Native
  20. 20. 21 Apache Pulsar - Cloud Storage Sink https://hub.streamnative.io/connectors/cloud-storage-sink/2.5.1/ ● Ensure exactly-once delivery. Records, which are exported using a deterministic partitioner, are delivered with exactly-once semantics regardless of the eventual consistency of cloud storage. ● Support data formats with or without a Schema. The Cloud Storage sink connector supports writing data to objects in cloud storage in either Avro, JSON, or Parquet format. Generally, the Cloud Storage sink connector may accept any data format that provides an implementation of the Format interface. ● Support time-based partitioner. The Cloud Storage sink connector supports the TimeBasedPartitioner class based on the publishTime timestamp of Pulsar messages. Time-based partitioning options are daily or hourly. ● Support more kinds of object storage. The Cloud Storage sink connector uses jclouds as an implementation of cloud storage. You can use the JAR package of the jclouds object storage to connect to more types of object storage. If you need to customize credentials, you can register ʻorg.apache.pulsar.io.jcloud.credential.JcloudsCredential` via the Service Provider Interface (SPI).
  21. 21. 22 Apache Pulsar - Other Sinks https://hub.streamnative.io/connectors/cloud-storage-sink/2.5.1/ ● mongoDB ● AWS Lambda ● redis ● AWS S3 ● GCS
  22. 22. 23 Apache Pulsar - Other Sinks https://hub.streamnative.io/connectors/cloud-storage-sink/2.5.1/ mongoDB AWS Lambda redis AWS S3 GCS
  23. 23. StreamNative Hub StreamNative Cloud Unified Batch and Stream COMPUTING Batch (Batch + Stream) Unified Batch and Stream STORAGE Offload (Queuing + Streaming) End-to-End Streaming FLiPN Edge AI Application Apache Flink - Apache Pulsar - Apache NiFi <-> Devices - GPU/TPU - Python/Go/Java Tiered Storage Pulsar --- KoP --- MoP --- Websocket --- HTTP Pulsar Sink Pulsar Sink Streaming Edge Gateway Protocols
  24. 24. 25 {"ir": "252.0", "id": "20210914001822_5e4882ee-22d9-432c-9074-19f12be62006", "end": "1631578962.03", "uuid": "nano_uuid_koo_20210914001822", "lux": "0", "gputemp": "26.0", "cputemp": "25.5", "te": "259.676094055", "systemtime": "09/13/2021 20:22:42", "hum": "52.31", "memory": 20.1, "gas": "29671.0", "pressure": "1013.62", "host": "nano2gb-desktop", "diskusage": "33312.5 MB", "ipaddress": "192.168.1.170", "macaddress": "1c:bf:ce:1a:7f:a0", "temp": "22.92", "uv": "0.01", "gputempf": "79.0", "host_name": "nano2gb-desktop", "runtime": "260.0", "cpu": 3.8, "cputempf": "78.0"} Show Me The Data!
  25. 25. 26 Show Me More Data
  26. 26. https://github.com/tspannhw/minifi-xaviernx/ https://github.com/tspannhw/minifi-jetson-nano https://github.com/tspannhw/Flip-iot https://github.com/tspannhw/FLiP-EdgeAI https://github.com/tspannhw/FLiP-CloudIngest https://github.com/tspannhw/FLiP-Transit https://github.com/tspannhw/FLiP-Jetson https://www.datainmotion.dev/2020/10/flank-streaming-edgeai-on-new-nvidia.html DEMO TIME Using NVIDIA Jetson Devices With Pulsar
  27. 27. Deeper Content ● https://www.datainmotion.dev/2020/10/running-flink-sql-against-kafka-using.html ● https://www.datainmotion.dev/2020/10/top-25-use-cases-of-cloudera-flow.html ● https://github.com/tspannhw/EverythingApacheNiFi ● https://github.com/tspannhw/CloudDemo2021 ● https://github.com/tspannhw/StreamingSQLExamples ● https://www.linkedin.com/pulse/2021-schedule-tim-spann/ ● https://github.com/tspannhw/StreamingSQLExamples/blob/8d02e62260e82b027b43abb911b5c366 a3081927/README.md ● https://www.pulsardeveloper.com/
  28. 28. Connect with the Community & Stay Up-To-Date ● Join the Pulsar Slack channel - Apache-Pulsar.slack.com ● Follow @streamnativeio and @apache_pulsar on Twitter ● Subscribe to Monthly Pulsar Newsletter for major news, events, project updates, and resources in the Pulsar community
  29. 29. streamnative.io Pulsar Summit Europe October 6, 2021 Pulsar Summit Asia November 20-21, 2021 Contact us at partners@pulsar-summit.org to become a sponsor or partner
  30. 30. Let’s Keep in Touch! https://github.com/tspannhw Tim Spann Developer Advocate https://twitter.com/paasDev https://www.linkedin.com/in/timothyspann ● https://www.datainmotion.dev/ ● https://github.com/tspannhw/SpeakerProfile ● https://dev.to/tspannhw ● https://sessionize.com/tspann/ ● https://www.slideshare.net/bunkertor Other Resources:
  31. 31. Q&A
  • bunkertor

    Sep. 27, 2021

Biography Tim Spann is a Principal DataFlow Field Engineer at Cloudera where he works with Apache NiFi, MiniFi, Pulsar, Apache Flink, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a senior solutions architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, Data Works Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science. Talk Real-Time Streaming in Any and All Clouds, Hybrid and Beyond Today, data is being generated from devices and containers living at the edge of networks, clouds and data centers. We need to run business logic, analytics and deep learning at the scale and as events arrive. Tools: Apache Flink, Apache Pulsar, Apache NiFi, MiNiFi, DJL.ai Apache MXNet. References: https://www.datainmotion.dev/2019/11/introducing-mm-flank-apache-flink-stack.html https://www.datainmotion.dev/2019/08/rapid-iot-development-with-cloudera.html https://www.datainmotion.dev/2019/09/powering-edge-ai-for-sensor-reading.html https://www.datainmotion.dev/2019/05/dataworks-summit-dc-2019-report.html https://www.datainmotion.dev/2019/03/using-raspberry-pi-3b-with-apache-nifi.html Source Code: https://github.com/tspannhw/MmFLaNK FLiP Stack StreamNative

Views

Total views

278

On Slideshare

0

From embeds

0

Number of embeds

61

Actions

Downloads

2

Shares

0

Comments

0

Likes

1

×