Apache Kafka is a scalable publish-subscribe messaging system
whose core architecture is a distributed commit log.
It was originally built at LinkedIn as a centralized event-pipelining
platform for online data-integration tasks. Over
the past years of developing and operating Kafka, we have extended
its log-structured architecture into a replicated logging backbone
for a much wider range of applications in distributed
environments. This talk covers our design
and engineering experience in replicating Kafka logs for various
distributed data-driven systems, including
source-of-truth data storage and stream processing.
Building a Replicated Logging System with Apache Kafka
1. Building a Replicated Logging System with Apache Kafka
Guozhang Wang, Joel Koshy, Sriram Subramanian, Kartik Paramasivam
Mammad Zadeh, Neha Narkhede, Jun Rao, Jay Kreps, Joe Stein
42. Configurable ISR Commits

ACK mode   Latency                  On failures
"no"       no network delay         some data loss
"leader"   1 network round trip     a few messages may be lost
"all"      ~2 network round trips   no data loss
43. Membership Management
• Use an embedded controller
• Detect broker failures via ZooKeeper
• Leader failure => elect a new leader from the ISR
• Leader and ISR persisted in ZooKeeper
  • for controller fail-over
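A simplified sketch of the election rule above: on leader failure, the controller picks a live replica from the ISR. The LeaderElector class and its integer broker ids are illustrative, not Kafka's actual controller code:

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Illustrative election rule: pick the first replica in the ISR that is
// still a live broker; if none is left, the partition has no leader.
final class LeaderElector {
    static Optional<Integer> electLeader(List<Integer> isr, Set<Integer> liveBrokers) {
        return isr.stream()
                .filter(liveBrokers::contains)
                .findFirst();
    }
}
```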
58. Example: Espresso
• A distributed document store
• Primary online data-serving platform at LinkedIn
• Member profiles, homepage, InMail, etc. [SIGMOD 2013]
59. Old Espresso Replication
[Diagram: two data centers, each with storage nodes kept in sync via MySQL replication; Databus feeds downstream consumers (search index, Hadoop, ...) and a cross-DC replicator carries changes between the data centers.]
68. New Espresso Replication
[Diagram: Data Center-1 through Data Center-n, each with storage nodes (MySQL) writing to local Kafka logs; Kafka MirrorMaker replicates the logs across data centers, and the logs feed downstream consumers (search index, Hadoop, ...).]
* In Progress
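A sketch of what a downstream subscriber (e.g., a search indexer) looks like in this design: it simply tails the change log that the storage nodes publish to Kafka. The topic name, group id, and broker address are hypothetical placeholders:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Tails a change-log topic and applies each record to a derived store.
public class ChangeLogTailer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("group.id", "search-indexer");          // hypothetical group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("espresso-changelog")); // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    // Apply the change to the local index / derived store.
                    System.out.printf("key=%s value=%s offset=%d%n", r.key(), r.value(), r.offset());
                }
            }
        }
    }
}
```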
78. Take-aways
• Log-centric data flow helps scale your systems
• Kafka: replicated log streams for real-time platforms
THANKS!
Editor's Notes
Thank you, and good morning. Today I am going to talk about Kafka, and how it can be used as a general replicated log stream for a wide range of scalable systems.
This is joint work from the Apache Kafka community.
First of all, being in this room, I think it is safe to say “we all love logs”.
Logs have been around almost as long as this research community.
No-overwrite in POSTGRES
ARIES: Write-Ahead-Logging in the 80’s
Today, reading the 50-page ARIES paper is a must for every database graduate student, myself included.
Similarly, log-structured storage architectures.
Replicated state machines.
And in all these examples, the log is used as the source-of-truth data change log to scale systems while providing durability and consistency.
So that is all good stuff about logs, but where is Kafka in this big picture?
Well, Kafka is an Apache open-source distributed messaging system that stores messages as a commit log.
Data-serving websites: LinkedIn has a lot of data.
We have this variety of data, and we need to build all these products around it.
Messaging: ActiveMQ
User activity: in-house log aggregation
Logging: Splunk
Metrics: JMX => Zenoss
Database data: Databus, custom ETL
This idea of using logs for data flow has been floating around LinkedIn: make the data flow log-centric.
Take all the organization's data and put it into a central log for real-time subscription.
Data integration, replication, real-time stream processing.
Disks are fast when used sequentially
File system caching
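Sequential access is what makes an append-only log fast on disk. A minimal illustration of the access pattern; the segment file name is hypothetical:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Append-only writes keep the disk access pattern sequential, which the
// OS page cache then serves efficiently for recent reads.
public class AppendOnlyLog {
    public static void main(String[] args) throws IOException {
        try (FileChannel log = FileChannel.open(
                Path.of("segment.log"), // hypothetical log segment file
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            log.write(ByteBuffer.wrap("event-1\n".getBytes()));
            log.write(ByteBuffer.wrap("event-2\n".getBytes())); // sequential append
        }
    }
}
```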
Topic = message stream.
A topic has partitions; partitions are distributed to brokers for higher availability and durability, and are evenly distributed across the cluster (see the sketch below).
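A sketch of creating a partitioned, replicated topic, using Kafka's Java AdminClient as an illustration; the broker address, topic name, partition count, and replication factor are arbitrary placeholders:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreatePartitionedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        try (AdminClient admin = AdminClient.create(props)) {
            // 8 partitions spread across brokers, each replicated 3 ways.
            NewTopic topic = new NewTopic("page-views", 8, (short) 3); // arbitrary values
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```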
Replicated log => replicated state machine.
One of the replicas is the leader; leaders are spread evenly across brokers.
All writes go to the leader.
The leader propagates writes to the followers in order.
The leader decides when to commit a message (see the sketch below).
The size of the ISR is decoupled from the size of the replica set, hence the number of replicas and the number of acknowledgements are independent.
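A simplified view of the commit decision: the leader's high watermark (the last committed offset) is the minimum log-end offset across the current ISR. This is a sketch of the rule, not Kafka's actual broker code:

```java
import java.util.Map;

// Simplified commit rule: a message is committed once every replica in
// the current ISR has replicated it, i.e., the high watermark is the
// minimum log-end offset among the in-sync replicas.
final class HighWatermark {
    static long compute(Map<Integer, Long> isrLogEndOffsets) {
        return isrLogEndOffsets.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(0L);
    }
}
```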
ack=3
Only committed messages are delivered to consumers.
When messages are committed is independent of the ack mode chosen by the producer.
ack=1
ack=1
ack=3, follower slow
under-replicated partitions
ack=3, broker failure
load balancing
cluster expansion
This is a major initiative that will put Kafka on the critical path for site-latency-sensitive data flows, which also require much stronger message-delivery guarantees.
Data standardization, site monitoring
Data-flow graph. The flow rate may overwhelm the query processor: batch processing, sampling, synopses, etc.
In-memory storage constraints: single-pass algorithms, no stream backtracking