Apache Kafka - Event Streaming Platform
Apache Kafka is a distributed event streaming platform for building real-time data pipelines and applications. It's the backbone of modern data infrastructure for handling high-throughput, low-latency event streams.
Core Concepts
What is Apache Kafka?
Kafka is a distributed publish-subscribe messaging system that provides:
- Real-time data: Process and react to data in real-time
- Durability: Store streams of data safely across distributed clusters
- Scalability: Handle massive volumes of data with horizontal scaling
- Decoupling: Producers and consumers work independently
Key Terminology
Topic: A named category of events (similar to a database table)
Partition: An ordered subset of a topic; the unit of parallelism and scalability
Broker: An individual server in a Kafka cluster
Producer: An application that publishes events to topics
Consumer: An application that subscribes to topics and processes events
Consumer Group: A set of consumers that share the work of a topic; each partition is read by exactly one member of the group
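These pieces fit together through keyed routing: a producer sends each keyed event to a partition chosen by hashing the key, which is what keeps events for the same key in order. A minimal sketch of that idea (Kafka's Java client actually uses a murmur2 hash; zlib.crc32 here just illustrates hash-then-modulo):

```python
import zlib

def pick_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition deterministically.

    Illustration only: real Kafka clients use murmur2, not crc32,
    but the hash-then-modulo structure is the same.
    """
    return zlib.crc32(key) % num_partitions

# Events with the same key always land in the same partition,
# which preserves per-key ordering.
assert pick_partition(b"user-42", 3) == pick_partition(b"user-42", 3)
```

Because the partition count appears in the modulo, adding partitions later changes which partition a key maps to, so choose partition counts with growth in mind.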
Getting Started
Installation
# Download Kafka
wget https://archive.apache.org/dist/kafka/3.6.0/kafka_2.13-3.6.0.tgz
tar -xzf kafka_2.13-3.6.0.tgz
cd kafka_2.13-3.6.0
# Start ZooKeeper
bin/zookeeper-server-start.sh config/zookeeper.properties &
# Start Kafka broker
bin/kafka-server-start.sh config/server.properties &
Create Topic
bin/kafka-topics.sh --create \
--topic my-topic \
--bootstrap-server localhost:9092 \
--partitions 3 \
--replication-factor 1
Producer Example
from kafka import KafkaProducer
import json
import time
# Create producer
producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

# Send events
for i in range(100):
    event = {
        'id': i,
        'timestamp': time.time(),
        'value': f'Event {i}'
    }
    producer.send('my-topic', value=event)

producer.flush()
producer.close()
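The value_serializer above turns each dict into UTF-8 JSON bytes before it goes over the wire, and the consumer's value_deserializer must invert it exactly. The pair can be sanity-checked as a round trip with no broker involved:

```python
import json

# Producer side: dict -> UTF-8 JSON bytes (what Kafka stores)
serialize = lambda v: json.dumps(v).encode('utf-8')
# Consumer side: bytes -> dict (must invert the serializer exactly)
deserialize = lambda m: json.loads(m.decode('utf-8'))

event = {'id': 7, 'value': 'Event 7'}
wire_bytes = serialize(event)
assert deserialize(wire_bytes) == event
```

Keeping both lambdas next to each other in shared code is an easy way to prevent producer and consumer from drifting apart.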
Consumer Example
from kafka import KafkaConsumer
import json
# Create consumer
consumer = KafkaConsumer(
    'my-topic',
    bootstrap_servers=['localhost:9092'],
    auto_offset_reset='earliest',
    value_deserializer=lambda m: json.loads(m.decode('utf-8')),
    group_id='my-group'
)

# Consume events
for message in consumer:
    print(f"Received: {message.value}")
Advanced Topics
Stream Processing
Kafka Streams itself is a Java/Scala library; from Python, the equivalent pattern is a consume-transform-produce loop:
from kafka import KafkaConsumer, KafkaProducer
import json

def stream_processing():
    # Read from input topic
    consumer = KafkaConsumer(
        'input-topic',
        bootstrap_servers=['localhost:9092'],
        value_deserializer=lambda m: json.loads(m.decode('utf-8'))
    )
    producer = KafkaProducer(
        bootstrap_servers=['localhost:9092'],
        value_serializer=lambda v: json.dumps(v).encode('utf-8')
    )
    for message in consumer:
        # Process each event, then republish it
        processed = process_event(message.value)
        producer.send('output-topic', value=processed)
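The process_event function above is left undefined; any pure transformation of the event dict will do. A hypothetical example, purely for illustration:

```python
def process_event(event: dict) -> dict:
    """Example transformation: uppercase the value and tag the event.

    Hypothetical logic -- substitute your own business rules here.
    """
    return {
        **event,
        'value': event.get('value', '').upper(),
        'processed': True,
    }

# process_event({'id': 1, 'value': 'Event 1'})
# -> {'id': 1, 'value': 'EVENT 1', 'processed': True}
```

Keeping the transformation a pure function of the input dict makes it trivial to unit-test without a running broker.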
Best Practices
- Partition Key: Use consistent keys to ensure ordering
- Replication: Set replication-factor to 3 (the common production default) so the cluster tolerates broker failures
- Consumer Groups: Use consumer groups for parallel processing
- Error Handling: Implement retry logic and dead-letter queues
- Monitoring: Track lag, throughput, and error rates
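The retry and dead-letter advice can be sketched as a small wrapper: attempt the send a few times, and on repeated failure divert the event to a dead-letter handler instead of dropping it. The wrapper itself is an assumption, not a kafka-python feature; it takes plain callables so it works with producer.send or anything else:

```python
def send_with_retry(send, event, retries=3, dead_letter=None):
    """Call send(event) up to `retries` times; if every attempt
    raises, hand the event to `dead_letter` instead of losing it.
    Returns the send result on success, None on exhaustion.
    """
    for attempt in range(retries):
        try:
            return send(event)
        except Exception:
            if attempt == retries - 1 and dead_letter is not None:
                dead_letter(event)
    return None

# Usage with a kafka-python producer (topic names are hypothetical):
# send_with_retry(
#     lambda e: producer.send('my-topic', value=e),
#     event,
#     dead_letter=lambda e: producer.send('my-topic.dlq', value=e),
# )
```

A real dead-letter topic lets you inspect and replay failed events later, which is far safer than silently retrying forever or discarding them.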
Last updated: April 12, 2026