Wednesday, March 9, 2022

Apache Kafka

Apache Kafka is an open-source, distributed streaming platform that allows for the development of real-time, event-driven applications.

The main characteristics of Kafka are as follows:

1. Produce – consume

a. Specifically, it allows developers to build applications that continuously produce and consume streams of data records.

2. Kafka is distributed.

a. It runs as a cluster that can span multiple servers or even multiple data centers.

3. Fast

a. The records produced are replicated and partitioned in a way that allows a high volume of users to use the application simultaneously without any perceptible lag in performance.

4. High accuracy

a. Maintains a high level of accuracy within data records.

5. In order

a. Maintains records in the order of their occurrence.

6. Resilient and fault tolerant

a. Because the data is replicated, Kafka is resilient and fault tolerant.

 

USE CASE 1. Decoupling


1. User checkout

2. Then the order gets shipped

 

Here, we need to write the integration, considering the shape of the data, the way the data is transported, and the format of the data. That's not a big deal when there is only one integration.

 



1. User checkout

2. Then

a. the order gets shipped

b. an automated email receipt is sent when checkout happens

c. the inventory is updated when checkout happens

As frontend and backend services get added and the application grows, more and more integrations need to be built, and it can get very messy.

Also, the teams become dependent on each other before they can make any changes, and development is slow.

 

Solution: Decoupling system dependencies


 

We'll remove all those dependencies; instead, checkout will stream events.

Every time a checkout happens, that event gets streamed, and checkout is not concerned with who's listening to that stream. It's broadcasting those events.

Then the other services – email, shipment, inventory – subscribe to that stream, choose to listen to it, get the information they need, and are triggered to act accordingly.
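A minimal sketch of the producer side of this pattern, using the plain Java client (kafka-clients). The broker address, topic name ("checkout-events"), and payload shape are assumptions for illustration:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class CheckoutEventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Broadcast the event. The checkout service never references email,
            // shipment, or inventory - whoever subscribes to the topic gets it.
            producer.send(new ProducerRecord<>("checkout-events", "order-42",
                    "{\"orderId\":\"order-42\",\"status\":\"CHECKED_OUT\"}"));
        }
    }
}
```

Each downstream service then subscribes to "checkout-events" under its own consumer group, so every service independently receives every event.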

 

USE CASE 2. Messaging

Our application uses messaging to move the checkout experience along.

E.g., say we have 2 APIs; for them, we'll have a Kafka topic and a KafkaTemplate created, with their ids.

We will refer to them by their ids only (a minimal sketch follows the list below).

1. search hotel

2. search price
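Since the notes mention a KafkaTemplate, here is a minimal sketch using Spring for Apache Kafka, assuming a KafkaTemplate<String, String> bean is already configured; the topic ids ("search-hotel", "search-price") are hypothetical:

```java
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class SearchMessagingService {

    private static final String HOTEL_TOPIC = "search-hotel"; // hypothetical topic id
    private static final String PRICE_TOPIC = "search-price"; // hypothetical topic id

    private final KafkaTemplate<String, String> kafkaTemplate;

    public SearchMessagingService(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Publish a hotel-search message; other services refer to it by the topic id only.
    public void searchHotel(String payload) {
        kafkaTemplate.send(HOTEL_TOPIC, payload);
    }

    // Publish a price-search message.
    public void searchPrice(String payload) {
        kafkaTemplate.send(PRICE_TOPIC, payload);
    }
}
```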

 

USE CASE 3. Location Tracking

E.g., a ride-share service:

· A driver using the application would turn on their app, and maybe every second a new event would get emitted with their current location (a minimal sketch follows this list).

· At small scale, this lets an individual user know how close their particular ride is.

· At large scale,

o to calculate surge pricing

o to show the user a map before they choose which ride they want
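A minimal sketch of the driver's side: emit one location event per second, keyed by driver id so one driver's events stay in order on a single partition. The topic name ("driver-locations"), broker address, and payload are assumptions:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DriverLocationEmitter {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String driverId = "driver-17"; // hypothetical driver id used as the record key
            while (true) {
                // In a real app this would come from the device's GPS.
                String location = "{\"lat\":40.7128,\"lon\":-74.0060}";
                producer.send(new ProducerRecord<>("driver-locations", driverId, location));
                Thread.sleep(1000); // one event per second, as described above
            }
        }
    }
}
```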

 

USE CASE 4. Data Gathering

· to collect analytics, to optimize your website

· It can be used in more complex ways, e.g. in a music streaming service, where every song a user listens to becomes a record in a stream, and your application could use that stream to give real-time recommendations to that user.

· It can take data records from all the users, aggregate them, and then come up with a list of an artist's top songs (see the aggregation sketch after this list).
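As a hedged sketch of that aggregation idea, using Kafka Streams (introduced under the Streams API below), here is a running count of how many times each song has been played across all users; the topic names ("song-plays", "song-play-counts") are assumptions, and a real recommendation pipeline would be far more involved:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Produced;

public class SongPlayCounter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "song-play-counter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("song-plays")   // key = song id, value = user id
               .groupByKey()                           // group all plays of the same song
               .count()                                // maintain a running count per song
               .toStream()
               .to("song-play-counts", Produced.with(Serdes.String(), Serdes.Long()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```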

 

 

KAFKA ARCHITECTURE



4 core APIs

1. Producer API

a. Allows your application to produce – to make – these streams of data.

b. So, it creates the records and produces them to topics.

 

Topic: an ordered list of events. The records can be kept for a minute if they're going to be consumed immediately, or for hours, days, or even forever – as long as you have enough storage, since topics are persisted to physical storage.
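As a sketch of how retention ties to storage, here is how such a topic might be created with the Java AdminClient, keeping records for 7 days; the topic name, partition count, and replication factor are assumptions (a retention.ms of -1 keeps records forever):

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateCheckoutTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, replication factor 1; retention.ms is in milliseconds.
            NewTopic topic = new NewTopic("checkout-events", 3, (short) 1)
                    .configs(Map.of("retention.ms", String.valueOf(7L * 24 * 60 * 60 * 1000)));
            admin.createTopics(List.of(topic)).all().get(); // block until creation completes
        }
    }
}
```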

 

2. Consumer API

a. Subscribes to one or more topics, listens, and ingests that data.

b. It can consume from topics in real time, or it can consume the old data records that are saved to the topic (a minimal sketch follows).
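A minimal consumer sketch (kafka-clients): subscribe to the topic and poll for records. The group id and topic name are assumptions; setting auto.offset.reset to "earliest" is what lets a new consumer also read the old records already saved to the topic:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CheckoutEventConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "email-service");     // each service uses its own group
        props.put("auto.offset.reset", "earliest"); // also read records produced before startup
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("checkout-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```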

3. Streams API

a. Consumers are able to consume data from a Kafka topic in the original format in which the producer wrote it. But to transform that data, we need the Streams API.

b. It builds on both the Producer and Consumer APIs.

c. So, the Streams API will consume from a topic or topics, then analyze, aggregate, or otherwise transform the data in real time, and then produce the resulting streams to a topic – either the same topics or new topics (a minimal sketch follows).
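A minimal transform sketch to go with point c: consume a topic, transform each value in real time, and produce the result to a new topic. Topic names and the uppercasing transform are assumptions for illustration:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class CheckoutEventTransformer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "checkout-transformer");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("checkout-events");
        // Transform each record as it arrives, then produce to a new topic.
        source.mapValues(value -> value.toUpperCase())
              .to("checkout-events-transformed");

        new KafkaStreams(builder.build(), props).start();
    }
}
```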

4. Connector API

a. Enables developers to write connectors, which are reusable producers and consumers.

b. So, in a Kafka cluster, many developers might need to integrate the same type of data source – MongoDB, for example.

c. Not every single developer should have to write that integration; the Connector API allows that integration to be written once. The code is there, and then all developers need to do is configure it in order to get that data source into their cluster (a hedged sketch follows).
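In practice that configuration is usually submitted to the Kafka Connect REST API (port 8083 by default). Here is a hedged sketch that registers a MongoDB source connector; the connector class and config keys are from the MongoDB Kafka connector and should be checked against its docs, and the connection URI, database, and collection names are made up:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterMongoConnector {
    public static void main(String[] args) throws Exception {
        // Hypothetical connector config; only configuration is needed, no new code.
        String config = """
            {
              "name": "mongo-source",
              "config": {
                "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
                "connection.uri": "mongodb://localhost:27017",
                "database": "shop",
                "collection": "orders"
              }
            }
            """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(config))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```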




