Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

of

DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 1 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 2 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 3 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 4 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 5 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 6 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 7 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 8 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 9 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 10 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 11 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 12 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 13 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 14 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 15 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 16 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 17 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 18 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 19 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 20 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 21 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 22 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 23 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 24 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 25 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 26 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 27 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 28 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 29 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 30 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 31 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 32 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 33 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 34 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 35 DBCC 2021 - FLiP Stack for Cloud Data Lakes Slide 36
Upcoming SlideShare
What to Upload to SlideShare
Next
Download to read offline and view in fullscreen.

0 Likes

Share

Download to read offline

DBCC 2021 - FLiP Stack for Cloud Data Lakes

Download to read offline

DBCC 2021 - FLiP Stack for Cloud Data Lakes
With Apache Pulsar, Apache NiFi, Apache Flink. The FLiP(N) Stack for Event processing and IoT. With StreamNative Cloud.

DBCC International – Friday 15.10.2021
Powered by Apache Pulsar, StreamNative provides a cloud-native, real-time messaging and streaming platform to support multi-cloud and hybrid cloud strategies.

Related Books

Free with a 30 day trial from Scribd

See all

Related Audiobooks

Free with a 30 day trial from Scribd

See all
  • Be the first to like this

DBCC 2021 - FLiP Stack for Cloud Data Lakes

  1. 1. FLiP Stack for Cloud Data Lakes Timothy Spann DBCC International – Friday 15.10.2021
  2. 2. streamnative.io Tim Spann, Developer Advocate DZone Zone Leader and Big Data MVB Data DJay
  3. 3. streamnative.io Founded by the original developers of Apache Pulsar and Apache BookKeeper, StreamNative builds a cloud-native event streaming platform that enables enterprises to easily access data as real-time event streams.
  4. 4. ● Apache Flink ● Apache Pulsar ● StreamNative's Flink Connector for Pulsar ● Apache NiFi ● Apache +++ FLiP(N) Stack
  5. 5. Apache Pulsar
  6. 6. Apache is an open source, cloud-native distributed messaging and streaming platform.
  7. 7. What are the Benefits of Pulsar? Data Durability Scalability Geo-Replication Multi-Tenancy Unified Messaging Model
  8. 8. Apache Pulsar
  9. 9. A Unified Messaging Platform Message Queuing Data Streaming
  10. 10. Pulsar is built for easy scale-out. *Illustrations by Jack Vanlightly
  11. 11. Apache Pulsar Overview Enable Geo-Replicated Messaging ● Pub-Sub ● Geo-Replication ● Pulsar Functions ● Horizontal Scalability ● Multi-tenancy ● Tiered Persistent Storage ● Pulsar Connectors ● REST API ● CLI ● Many clients available ● Four Different Subscription Types ● Multi-Protocol Support ○ MQTT ○ AMQP ○ JMS ○ Kafka ○ ...
  12. 12. What is the Pulsar Ecosystem? ● Functions and Connectors ○ Functions: Lightweight stream processing ○ Connectors: Part of “Pulsar IO”, includes “Source” and “Sink” APIs ■ Files, Databases, Data tools, Cloud Services, etc ● Protocol Handlers ○ Allows Pulsar to handle additional protocols by an extendable API running in the broker ■ AoP (AMQP), KoP (Kafka), MoP (MQTT)
  13. 13. What is the Pulsar Ecosystem? (cont’d) ● Processing Engines ○ Supports modern processing engines ■ Flink and Spark, as well as Pulsar SQL (Presto/Trino) ● Offloaders ○ Allows data to be offloaded to cloud storage and used with existing Pulsar APIs ■ S3, GCP Cloud Storage, HDFS, File (NFS), Azure Blob Storage (in Pulsar 2.7.0)
  14. 14. Pulsar Functions Provides a simple API to: ● Receive a message (consume) ● Process the message using your own code ● Send a message (produce) Takes care of the boilerplate code so there is no need to create producers and consumers.
  15. 15. Moving Data In and Out of Pulsar IO/Connectors are a simple way to integrate with external systems and move data in and out of Pulsar. ● Built on top of Pulsar Functions ● Built-in connectors - hub.streamnative.io Source Sink
  16. 16. Pulsar SQL Presto/Trino workers can read segments directly from bookies (or offloaded storage) in parallel. Bookie 1 Segment 1 Producer Consumer Broker 1 Topic1-Part1 Broker 2 Topic1-Part2 Broker 3 Topic1-Part3 Segment 2 Segment 3 Segment 4 Segment X Segment 1 Segment 1 Segment 1 Segment 3 Segment 3 Segment 3 Segment 2 Segment 2 Segment 2 Segment 4 Segment 4 Segment 4 Segment X Segment X Segment X Bookie 2 Bookie 3 Query Coordinator ... ... SQL Worker SQL Worker SQL Worker SQL Worker Query Topic Metadata
  17. 17. Query Your Topics with Pulsar SQL (Trino)
  18. 18. • Guaranteed delivery • Data buffering - Backpressure - Pressure release • Prioritized queuing • Flow specific QoS - Latency vs. throughput - Loss tolerance • Data provenance • Supports push and pull models • Hundreds of processors • Visual command and control • Over a 300 sources • Flow templates • Pluggable/multi-role security • Designed for extension • Clustering • Version Control Why Apache NiFi?
  19. 19. ● Unified computing engine ● Batch processing is a special case of stream processing ● Stateful processing ● Massive Scalability ● Flink SQL for queries, inserts against Pulsar Topics ● Streaming Analytics ● Continuous SQL ● Continuous ETL ● Complex Event Processing ● Standard SQL Powered by Apache Calcite Why Apache Flink?
  20. 20. https://flink.apache.org/2019/05/03/pulsar-flink.html https://github.com/streamnative/pulsar-flink https://streamnative.io/en/blog/release/2021-04-20-flink-sql-on-st reamnative-cloud Flink + Pulsar
  21. 21. StreamNative Cloud
  22. 22. StreamNative Hub StreamNative Cloud Unified Batch and Stream COMPUTING Batch (Batch + Stream) Unified Batch and Stream STORAGE Offload (Queuing + Streaming) Apache Flink - Apache Pulsar - Apache NiFi <-> Devices <-> Cloud Data Lake Tiered Storage Pulsar --- KoP --- MoP --- Websocket --- HTTP Pulsar Sink Pulsar Sink Streaming Edge Gateway Protocols End-to-End Streaming Edge App
  23. 23. Powered by Apache Pulsar, StreamNative provides a cloud-native, real-time messaging and streaming platform to support multi-cloud and hybrid cloud strategies. Built for Containers Cloud Native Flink SQL StreamNative Cloud
  24. 24. Demo
  25. 25. Demo Walkthrough {"entriesAddedCounter":1,"numberOfEntries":1,"totalSize":651,"currentLedgerEntri es":1,"currentLedgerSize":651,"lastLedgerCreatedTimestamp":"2021-09-13T16:13: 06.6-04:00","waitingCursorsCount":0,"pendingAddEntriesCount":0,"lastConfirm edEntry":"7076:0","state":"LedgerOpened","ledgers":[{"ledgerId":7076,"entries":0," size":0,"offloaded":false,"underReplicated":false}],"cursors":{},"schemaLedgers":[], "compactedLedger":{"ledgerId":-1,"entries":-1,"size":-1,"offloaded":false,"underRe plicated":false}}
  26. 26. Real-Time Cloud Streaming Pipeline TRANSCOM Traffic Sensors Aggregates Status SQL Analytics APIs REST
  27. 27. Wrap-Up
  28. 28. ● https://www.datainmotion.dev/2020/04/building-search-indexes-with-apache.html ● https://github.com/tspannhw/nifi-solr-example ● https://github.com/streamnative/pulsar-flink ● https://www.linkedin.com/pulse/2021-schedule-tim-spann/ ● https://github.com/tspannhw/SpeakerProfile/blob/main/2021/talks/20210729_HailHydr ate!FromStreamtoLake_TimSpann.pdf ● https://streamnative.io/en/blog/release/2021-04-20-flink-sql-on-streamnative-cloud ● https://docs.streamnative.io/cloud/stable/compute/flink-sql Deeper Content @PaasDev https://www.pulsardeveloper.com/ timothyspann
  29. 29. Interested In Learning More? Flink SQL Cookbook The Github Source for Flink SQL Demo The GitHub Source for Demo Manning's Apache Pulsar in Action O’Reilly Book [10/21] Trino Summit Resources Free eBooks Upcoming Events
  30. 30. 30 Pulsar Summit Asia November 20-21, 2021 Contact us at partners@pulsar-summit.org to become a sponsor or partner
  31. 31. streamnative.io Questions
  32. 32. Let’s Keep in Touch! Speaker Name Speaker title @PassDev https://www.linkedin.com/in/timothyspann https://github.com/tspannhw
  33. 33. Thank you! Please submit your feedback here: https://bit.ly/dbcc2021-feedback

DBCC 2021 - FLiP Stack for Cloud Data Lakes With Apache Pulsar, Apache NiFi, Apache Flink. The FLiP(N) Stack for Event processing and IoT. With StreamNative Cloud. DBCC International – Friday 15.10.2021 Powered by Apache Pulsar, StreamNative provides a cloud-native, real-time messaging and streaming platform to support multi-cloud and hybrid cloud strategies.

Views

Total views

160

On Slideshare

0

From embeds

0

Number of embeds

0

Actions

Downloads

0

Shares

0

Comments

0

Likes

0

×