Apache Kafka is an open-source, distributed streaming platform that lets developers build real-time, event-driven applications.
The main characteristics of Kafka are as follows:
1. Produce – Consume
a. It allows developers to build applications that continuously produce and consume streams of data records.
2. Distributed
a. Kafka runs as a cluster that can span multiple servers or even multiple data centers.
3. Fast
a. The records that are produced are replicated and partitioned in a way that allows a high volume of users to use the application simultaneously, without any perceptible lag in performance.
4. High accuracy
a. Kafka maintains a high level of accuracy within the data records.
5. Ordered
a. Kafka maintains the records in the order of their occurrence.
6. Resilient and fault-tolerant
a. Because the records are replicated, the cluster can tolerate failures.
USE CASE 1. Decoupling
1. User checks out
2. Then the order gets shipped
Here, we need to write the integration between the two services, considering the shape of the data, the way the data is transported, and the format of the data. That is not a big deal when there is only one integration.
Now the flow grows:
1. User checks out
2. Then
a. the order gets shipped
b. an automated email receipt is sent when checkout happens
c. the inventory is updated when checkout happens
As frontend and backend services get added and the application grows, more and more integrations need to be built, and it can get very messy.
Also, each team is dependent on the others before it can make any changes, so development is slow.
Solution: Decoupling system dependencies
We remove all the dependencies; instead, checkout will stream events. Every time a checkout happens, an event gets streamed, and checkout is not concerned with who is listening to that stream. It is simply broadcasting those events. The other services – email, shipment, inventory – subscribe to that stream, choose to listen to it, get the information they need, and are triggered to act accordingly.
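A minimal sketch of the producing side, assuming the standard Java client and an illustrative topic named checkout-events:

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    import java.util.Properties;

    public class CheckoutEventPublisher {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Checkout just broadcasts the event; it does not know who is listening.
                producer.send(new ProducerRecord<>("checkout-events", "order-1001",
                        "{\"orderId\":\"order-1001\",\"event\":\"CHECKOUT_COMPLETED\"}"));
            }
        }
    }

The email, shipment, and inventory services would each subscribe to checkout-events under their own consumer group id (for example email-service, shipping-service, inventory-service), so each of them independently receives every event.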
USE CASE 2. Messaging
Our application can use messaging to move the checkout experience along.
E.g., suppose we have 2 APIs; for them we'll have a Kafka topic and a KafkaTemplate created, with their ids. We will refer to them by their ids only.
1. search hotel
2. search price
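A sketch of what that could look like with Spring for Apache Kafka, which is where KafkaTemplate comes from; the topic ids search-hotel and search-price are assumptions:

    import org.apache.kafka.clients.admin.NewTopic;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.kafka.config.TopicBuilder;
    import org.springframework.kafka.core.KafkaTemplate;
    import org.springframework.stereotype.Service;

    @Configuration
    class TopicConfig {
        // Topic ids are illustrative; the rest of the code refers to them by id only.
        @Bean
        NewTopic searchHotelTopic() {
            return TopicBuilder.name("search-hotel").build();
        }

        @Bean
        NewTopic searchPriceTopic() {
            return TopicBuilder.name("search-price").build();
        }
    }

    @Service
    class SearchMessenger {
        private final KafkaTemplate<String, String> kafkaTemplate;

        SearchMessenger(KafkaTemplate<String, String> kafkaTemplate) {
            this.kafkaTemplate = kafkaTemplate;
        }

        void searchHotel(String query) {
            kafkaTemplate.send("search-hotel", query);   // hand off to the hotel-search API
        }

        void searchPrice(String hotelId) {
            kafkaTemplate.send("search-price", hotelId); // hand off to the price-search API
        }
    }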
USE CASE 3. Location Tracking
E.g., a ride-share service:
· A driver using the application would turn on their app, and maybe every second a new event would get emitted with their current location.
· At small scale, this lets an individual user know how close their particular ride is.
· At large scale, it can be used:
o to calculate surge pricing
o to show the user a map before they choose which ride they want
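A sketch of the emitting side, assuming an illustrative driver-locations topic; keying each record by driver id keeps one driver's events in order on a single partition:

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    import java.util.Properties;

    public class DriverLocationEmitter {
        public static void main(String[] args) throws InterruptedException {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            String driverId = "driver-17"; // hypothetical driver
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                while (true) {
                    // Fixed sample coordinates; a real app would read the GPS here.
                    String location = String.format("{\"lat\":%.5f,\"lon\":%.5f}",
                            40.71427, -74.00597);
                    producer.send(new ProducerRecord<>("driver-locations", driverId, location));
                    Thread.sleep(1000); // roughly one event per second
                }
            }
        }
    }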
USE CASE 4. Data Gathering
· It can collect analytics to optimize your website.
· It can be used in a more complex way with, say, a music streaming service: for one user, every song they listen to can be a stream of records, and your application could use that stream to give real-time recommendations to that user.
· It can take data records from all the users, aggregate them, and then come up with a list of an artist's top songs.
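A sketch of that aggregation using Kafka Streams (the Streams API is introduced below); the topic names and the record layout (key = user id, value = song id) are assumptions:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Grouped;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Produced;

    import java.util.Properties;

    public class TopSongsAggregator {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "top-songs");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            StreamsBuilder builder = new StreamsBuilder();
            // Each record is one listen: key = user id, value = song id.
            KStream<String, String> plays =
                    builder.stream("song-plays", Consumed.with(Serdes.String(), Serdes.String()));

            // Re-key by song and count plays across all users.
            KTable<String, Long> playCounts = plays
                    .groupBy((userId, songId) -> songId,
                             Grouped.with(Serdes.String(), Serdes.String()))
                    .count();

            // A downstream service can sort these running counts into a top-songs list.
            playCounts.toStream().to("song-play-counts",
                    Produced.with(Serdes.String(), Serdes.Long()));

            new KafkaStreams(builder.build(), props).start();
        }
    }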
KAFKA ARCHITECTURE
4 core APIs
1. Producer API
a. Allows your application to produce, i.e. make, these streams of data.
b. It creates the records and produces them to topics.
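A sketch of point b with the Java Producer API; the page-views topic is illustrative, and the send callback just confirms which topic, partition, and offset the record landed on:

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    import java.util.Properties;

    public class ProducerApiExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("page-views", "user-7", "/pricing");
                // The callback reports where the record was appended.
                producer.send(record, (metadata, exception) -> {
                    if (exception != null) {
                        exception.printStackTrace();
                    } else {
                        System.out.printf("appended to %s-%d at offset %d%n",
                                metadata.topic(), metadata.partition(), metadata.offset());
                    }
                });
            }
        }
    }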
Topic: an ordered list of events. A topic can be retained for a minute if it is going to be consumed immediately, or for hours, days, or even forever, as long as you have enough storage, since topics are persisted to physical storage.
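Retention is a per-topic setting. A sketch using the Java AdminClient, assuming a 7-day retention (retention.ms is in milliseconds; -1 means keep forever):

    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    import java.util.Map;
    import java.util.Properties;
    import java.util.Set;

    public class CreateTopicExample {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                NewTopic topic = new NewTopic("checkout-events", 3, (short) 1)
                        // Keep records for 7 days; use "-1" to retain them forever.
                        .configs(Map.of("retention.ms",
                                String.valueOf(7L * 24 * 60 * 60 * 1000)));
                admin.createTopics(Set.of(topic)).all().get();
            }
        }
    }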
2. Consumer API
a. Subscribes to one or more topics, listens, and ingests that data.
b. It can consume records in real time, or it can consume the old data records that are saved to the topic.
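A sketch of the email service from earlier as a consumer; auto.offset.reset=earliest makes a brand-new consumer group start from the oldest records still saved on the topic:

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    public class EmailService {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "email-service");     // each service uses its own group id
            props.put("auto.offset.reset", "earliest"); // also read records saved before we started
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("checkout-events"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("sending receipt for %s: %s%n",
                                record.key(), record.value());
                    }
                }
            }
        }
    }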
3. Streams API
a. Consumers can consume the data from a Kafka topic in the original format, as it was produced by the producer. But to transform that data, we need the Streams API.
b. It builds on both the Producer and Consumer APIs.
c. The Streams API consumes from a topic or topics, then analyzes, aggregates, or otherwise transforms the data in real time, and then produces the resulting streams to a topic – either the same topics or new topics.
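A minimal sketch of point c: consume from one topic, transform each record in real time, and produce the result to a new topic (topic names are illustrative):

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Consumed;
    import org.apache.kafka.streams.kstream.Produced;

    import java.util.Properties;

    public class CheckoutEnricher {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "checkout-enricher");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            StreamsBuilder builder = new StreamsBuilder();
            builder.stream("checkout-events", Consumed.with(Serdes.String(), Serdes.String()))
                   // Transform each record in real time (here: tag it with a processing timestamp).
                   .mapValues(value -> value + " @" + System.currentTimeMillis())
                   // Produce the resulting stream to a new topic.
                   .to("checkout-events-enriched", Produced.with(Serdes.String(), Serdes.String()));

            new KafkaStreams(builder.build(), props).start();
        }
    }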
4. Connector API
a. Enables developers to write connectors, which are reusable producers and consumers.
b. In a Kafka cluster, many developers might need to integrate the same type of data source, MongoDB for example.
c. Not every single developer should have to write that integration. What the Connector API allows is for that integration to get written once; the code is there, and then all developers need to do is configure it in order to get that data source into their cluster.
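Deploying a connector is configuration rather than code. A sketch for the MongoDB source connector (the property names follow the official MongoDB Kafka connector; the values are placeholders):

    name=mongo-orders-source
    connector.class=com.mongodb.kafka.connect.MongoSourceConnector
    connection.uri=mongodb://localhost:27017
    database=shop
    collection=orders
    # Records from shop.orders get produced to a Kafka topic; every team that
    # needs this data deploys a config like this instead of writing an integration.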