Distributed data stream processing with Apache Kafka



Training is conducted by Trayan Iliev, an Oracle® certified Java programmer with 20+ years of experience, Python & Java ML/AI developer, full-stack developer (Spring, TypeScript, Angular, React, and Vue), and IT trainer for international software companies with 15+ years of training experience. He is a regular speaker at developer conferences (Voxxed Days, Java2Days, jPrime, jProfessionals, BGOUG, BGJUG, DEV.BG) on topics such as SOA & REST, Spring/WebFlux, machine learning, robotics, and distributed high-performance computing.

Duration: 48 hours

Target audience: Java developers

Description: This comprehensive training equips you with the expertise to design, implement, and operate robust, scalable, and efficient stream processing solutions using Apache Kafka.

Key takeaways:

  • Understand the fundamentals of distributed systems and the role of stream processing in modern data architectures.
  • Gain in-depth knowledge of Apache Kafka’s architecture, including brokers, topics, partitions, producers, and consumers.
  • Learn to set up, configure, and manage Kafka clusters for high availability, scalability, and fault tolerance.
  • Master Kafka Producer and Consumer APIs and their customization for various real-time data streaming scenarios.
  • Develop stream processing applications using the Kafka Streams API, including transformations, aggregations, joins, and windowing.
  • Explore Kafka ecosystem components such as Kafka Connect for data integration.
  • Implement security best practices in Kafka clusters, including authentication, authorization, and encryption.
  • Monitor and tune Kafka clusters and stream processing applications for optimal performance and reliability.
  • Apply best practices for deploying, scaling, and troubleshooting Kafka in production environments.
  • Build real-world projects involving end-to-end data pipelines, real-time analytics, and event-driven architectures using Kafka and its ecosystem.
  • Acquire hands-on experience through practical labs and projects to consolidate theoretical knowledge and develop job-ready skills in distributed stream processing with Apache Kafka.

Training program:

Module 1: Kafka Core Concepts and Java Setup (6 hours)

  • Overview of brokers, topics, partitions, and their role in Kafka’s distributed architecture

  • Setting up Kafka cluster and Java development environment

  • Java Kafka client basics: Producer and Consumer APIs

  • Writing simple Java producers and consumers with asynchronous send and poll methods

  • Hands-on: Build and run a basic Java producer and consumer application with Kafka
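
    The producer and consumer basics covered in this module can be sketched as follows. This is a minimal illustration, not course material: the broker address (localhost:9092), topic name ("events"), and group id ("demo-group") are assumed placeholders, and running it requires a live Kafka cluster and the kafka-clients library on the classpath.

    ```java
    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class BasicProducerConsumer {
        public static void main(String[] args) {
            Properties producerProps = new Properties();
            producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

            // Asynchronous send: the callback fires once the broker acknowledges the record
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
                producer.send(new ProducerRecord<>("events", "key-1", "hello, kafka"),
                    (metadata, exception) -> {
                        if (exception != null) exception.printStackTrace();
                        else System.out.printf("Sent to %s-%d @ offset %d%n",
                                metadata.topic(), metadata.partition(), metadata.offset());
                    });
            } // close() flushes pending sends

            Properties consumerProps = new Properties();
            consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
            consumerProps.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
            consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

            // Poll loop (single iteration here for brevity)
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
                consumer.subscribe(List.of("events"));
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("Received %s = %s%n", record.key(), record.value());
                }
            }
        }
    }
    ```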

Module 2: Advanced Producer and Consumer Features in Java (10 hours)

  • Detailed Java API usage for KafkaProducer and KafkaConsumer

  • Implementing asynchronous data production and consumption with callbacks

  • Using interceptors in Java producers and consumers for monitoring and modifying records

  • Implementing exactly-once semantics using Kafka transactions and idempotent producers

  • Handling failures and retries in asynchronous processing

  • Handling compacted topics and understanding their use cases

  • Hands-on: Code exercises on transactional producers, interceptors, and compacted topic usage

  • Hands-on: Develop robust Java applications ensuring exactly-once event processing guarantees
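
    The exactly-once pattern taught in this module follows the standard transactional-producer flow from the Kafka client documentation. The sketch below is illustrative only: the transactional id and topic names are invented placeholders, and the code needs a running cluster to execute. Setting transactional.id implies an idempotent producer; records sent between beginTransaction() and commitTransaction() become visible atomically to read_committed consumers.

    ```java
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.KafkaException;
    import org.apache.kafka.common.errors.AuthorizationException;
    import org.apache.kafka.common.errors.OutOfOrderSequenceException;
    import org.apache.kafka.common.errors.ProducerFencedException;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class TransactionalProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            // Setting a transactional.id also enables idempotence
            props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "orders-tx-1");

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            producer.initTransactions(); // register with the transaction coordinator

            try {
                producer.beginTransaction();
                producer.send(new ProducerRecord<>("orders", "o-42", "created"));
                producer.send(new ProducerRecord<>("order-audit", "o-42", "logged"));
                producer.commitTransaction(); // both records become visible atomically
            } catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
                producer.close(); // fatal errors: the producer cannot recover
            } catch (KafkaException e) {
                producer.abortTransaction(); // abortable: neither record reaches read_committed consumers
            }
            producer.close();
        }
    }
    ```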

Module 3: Kafka Admin Client and Programmatic Administration (6 hours)

  • Introduction to Kafka AdminClient API for managing topics, partitions, and configurations programmatically

  • Creating, deleting, and modifying topics using Java AdminClient

  • Managing ACLs, quotas, and broker configurations via Java APIs

  • Hands-on: Write Java programs to automate Kafka cluster administration tasks
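
    A programmatic administration task of the kind automated in this module might look like the sketch below: creating a compacted, 3-partition topic and listing the cluster's topics with AdminClient. The topic name and broker address are illustrative assumptions, and the code requires a live cluster.

    ```java
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import java.util.Set;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    public class TopicAdminSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            try (AdminClient admin = AdminClient.create(props)) {
                // 3 partitions, replication factor 1, log compaction enabled
                NewTopic topic = new NewTopic("user-profiles", 3, (short) 1)
                        .configs(Map.of("cleanup.policy", "compact"));
                admin.createTopics(List.of(topic)).all().get(); // block until the broker confirms

                Set<String> names = admin.listTopics().names().get();
                System.out.println("Topics: " + names);
            }
        }
    }
    ```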

Module 4: Performance Monitoring and Visualization (6 hours)

  • Kafka metrics overview: producer, consumer, broker, and stream metrics

  • Configuring Prometheus to scrape Kafka metrics via JMX exporter

  • Setting up Grafana dashboards for real-time Kafka monitoring

  • Writing Java code to expose custom metrics and integrate with monitoring tools

  • Hands-on: Build a monitoring stack with Prometheus and Grafana for Kafka clusters and applications
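
    One way to expose an application-level metric to the monitoring stack described above is to register a custom JMX MBean, which the Prometheus JMX exporter can then scrape alongside Kafka's built-in client metrics. The sketch below uses only the JDK; the MBean name and counter are hypothetical examples, not part of the Kafka API.

    ```java
    import java.lang.management.ManagementFactory;
    import java.util.concurrent.atomic.AtomicLong;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    // Minimal MBean exposing an application-level counter; the Prometheus
    // JMX exporter (or any JMX tool) can scrape it via the attribute
    // ProcessedEvents on the registered ObjectName.
    public class ProcessedEventsMetric implements ProcessedEventsMetricMBean {
        private final AtomicLong processed = new AtomicLong();

        @Override
        public long getProcessedEvents() { return processed.get(); }

        public void increment() { processed.incrementAndGet(); }

        public static void main(String[] args) throws Exception {
            ProcessedEventsMetric metric = new ProcessedEventsMetric();
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            server.registerMBean(metric,
                    new ObjectName("demo.kafka:type=StreamMetrics,name=ProcessedEvents"));

            metric.increment(); // in practice, called from the record-processing loop
            System.out.println("Processed events: " + metric.getProcessedEvents());
        }
    }

    // By JMX convention, the management interface of class Foo is named FooMBean
    interface ProcessedEventsMetricMBean {
        long getProcessedEvents();
    }
    ```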

Module 5: Kafka Streams – KStream and KTable (8 hours)

  • Introduction to Kafka Streams API and its programming model

  • Differences between KStream and KTable abstractions

  • Writing Java stream processing applications: filtering, mapping, aggregations, joins, and windowing

  • Managing state stores and fault tolerance in Kafka Streams

  • Exactly-once processing guarantees with Kafka Streams

  • Hands-on: Build and deploy Java Kafka Streams applications using KStream and KTable APIs
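
    The KStream/KTable distinction central to this module can be illustrated with a classic word-count topology: a KStream models the unbounded stream of input records, while the KTable produced by the aggregation is a continuously updated changelog of counts. Topic names, application id, and broker address below are placeholder assumptions; running it requires a live cluster.

    ```java
    import java.util.Arrays;
    import java.util.Properties;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.Produced;

    public class WordCountSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-demo");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            // KStream: an unbounded stream of records read from the input topic
            KStream<String, String> lines = builder.stream("text-lines");

            // KTable: a continuously updated table of counts per word,
            // backed by a fault-tolerant state store
            KTable<String, Long> counts = lines
                    .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
                    .groupBy((key, word) -> word)
                    .count();

            // Emit the table's changelog back to a topic
            counts.toStream().to("word-counts",
                    Produced.with(Serdes.String(), Serdes.Long()));

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        }
    }
    ```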

Module 6: Practical Problem Solving and Project Work (8 hours)

  • Real-world problem scenarios involving brokers, partitions, and topic design

  • Debugging and troubleshooting Java Kafka clients and stream applications

  • Project 1: End-to-end Kafka pipeline with transactional producers and consumers

  • Project 2: Stream processing with KStream and KTable including joins and aggregations

  • Code reviews, performance tuning, and best practices discussion

Module 7: Review, Assessment, and Next Steps (4 hours)

  • Recap of key Java APIs and Kafka concepts

  • Hands-on coding assessment and problem-solving exercises

  • Q&A and troubleshooting common Kafka Java client issues

  • Guidance on further learning and Kafka ecosystem tools

This program ensures participants gain deep hands-on expertise in Kafka’s Java client APIs; in advanced Kafka features such as transactions, interceptors, compacted topics, and Kafka Streams; and in practical monitoring and administration skills using Prometheus and Grafana.

For more information and registration, please send us an e-mail at: office@iproduct.org