Beyond its core publish-subscribe messaging functionality, Apache Kafka is gaining remarkable momentum as a distributed event streaming platform. It is being leveraged for high-performance data pipelines, streaming analytics, data integration, and more. It also stands as the backbone of IoT data platforms, handling massive volumes of heterogeneous data ingestion.

Despite Kafka's tremendous usefulness for data transportation, and the numerous Kafka connectors available for importing/exporting data from various data systems, the Kafka client library remains an obstacle for programming languages other than Java that want to produce or consume messages on a Kafka topic.

Confluent’s Kafka REST Proxy and Its Importance

Apache Kafka offers its functionality through a well-defined…
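To illustrate what the REST Proxy buys a non-Java client, here is a minimal sketch of producing a JSON record purely over HTTP. The proxy's default port 8082, the topic name, and the record payload are assumptions for illustration, not values from the article:

# Produce a JSON record to the topic "test-topic" via the Kafka REST Proxy
# (assumes the proxy is running on its default port 8082 and the topic exists)
curl -X POST http://localhost:8082/topics/test-topic \
  -H "Content-Type: application/vnd.kafka.json.v2+json" \
  -d '{"records":[{"value":{"device":"sensor-1","temp":27.5}}]}'

Any language with an HTTP client, Python, Go, even a shell script, can publish this way without linking against the Java client library.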


Kafka Connect plays a significant role in streaming data between Apache Kafka and other data systems. As a tool, it provides a scalable and reliable way to move data in and out of Apache Kafka. Importing data from a database into Apache Kafka is perhaps the most well-known use case for the JDBC Connector (Source & Sink) that belongs to Kafka Connect.

This article elaborates the steps and procedure to integrate Confluent's JDBC Kafka connector with an operational multi-broker Apache Kafka cluster, for ingesting data from a running MySQL database into Kafka…
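As an illustrative sketch of the end result, a JDBC source connector for a MySQL table can be registered through the Kafka Connect REST interface. The connector name, database host, credentials, and table below are placeholder assumptions:

# Register a JDBC source connector with the Connect worker (default REST port 8083)
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "mysql-jdbc-source",
    "config": {
      "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
      "connection.url": "jdbc:mysql://localhost:3306/demo_db",
      "connection.user": "demo_user",
      "connection.password": "demo_password",
      "table.whitelist": "orders",
      "mode": "incrementing",
      "incrementing.column.name": "id",
      "topic.prefix": "mysql-"
    }
  }'

With mode set to incrementing, the connector polls the orders table for rows with a growing id column and writes each new row to the topic mysql-orders.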


There are multiple ways to ingest data streams into a Kafka topic and subsequently deliver them to the various types of consumers hooked to the topic. The data that consumers continuously collect from the topic passes through multiple data pipelines and then through stream processing engines like Apache Spark, Apache Flink, Amazon Kinesis, etc., eventually landing in real-time applications that deliver the final data-driven decision. From finance, manufacturing, insurance, telecom, healthcare, and commerce to beyond, real-time applications are becoming the preferred way for organizations to take immediate action and gain insights from up-to-date data. …
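The simplest of those ingestion paths is Kafka's own console scripts; a minimal sketch, with the broker address and topic name assumed:

# Publish messages to a topic from the terminal (each line becomes one record)
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic events

# In another terminal, continuously consume the same stream from the beginning
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic events --from-beginning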


This article explains the steps to couple the Confluent Schema Registry with an existing/operational multi-broker Apache Kafka cluster (local deployment). Confluent is an integrated platform bundled with Apache Kafka and multiple additional components, from ksqlDB for stream processing to numerous connectors (database, file, AWS, Azure, Google, etc.), the Schema Registry, the Control Center, and more. Please click here to learn more about the Confluent Platform.

In short, Schema Registry preserves a versioned history of all schemas, provides multiple compatibility settings, allows the evolution of schemas, etc. It supports Avro, JSON Schema, and Protobuf schemas. …
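For instance, registering the first version of an Avro schema is a single HTTP call against the registry. The subject name and the registry's default port 8081 are assumptions for illustration:

# Register an Avro schema for the subject "orders-value" (default registry port 8081)
curl -X POST http://localhost:8081/subjects/orders-value/versions \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"schema": "{\"type\":\"record\",\"name\":\"Order\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"}]}"}'

# List all versions recorded for that subject
curl http://localhost:8081/subjects/orders-value/versions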


Needless to say, Apache Kafka delivers messages to both real-time and batch consumers without performance degradation, and in addition it is gaining enormous momentum as a foremost component of data streaming pipelines.

Credit card fraud detection, predictive maintenance, real-time analytics, and streaming IoT platforms are examples of real-time use cases. To handle massive amounts of data ingestion, Apache Kafka is the cornerstone of a robust IoT data platform. A schema defines the structure of the data format, and schema evolution is a feature that allows updating the schema used to write new data while maintaining backward…
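As a sketch of what maintaining backward compatibility looks like in practice with the Schema Registry (the subject name and port are assumptions), the compatibility level can be set per subject, and a candidate schema can be tested against the latest registered version before producers start using it:

# Enforce BACKWARD compatibility for the subject "orders-value"
curl -X PUT http://localhost:8081/config/orders-value \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"compatibility": "BACKWARD"}'

# Test whether an evolved schema (a new field with a default value) stays compatible
curl -X POST http://localhost:8081/compatibility/subjects/orders-value/versions/latest \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  -d '{"schema": "{\"type\":\"record\",\"name\":\"Order\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"status\",\"type\":\"string\",\"default\":\"NEW\"}]}"}'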


This short article highlights the commands for managing a running multi-broker, multi-topic Kafka cluster using the built-in scripts. These commands are helpful when the cluster is not integrated or hooked up with any third-party administrative tool offering GUI facilities to administer or monitor it on the fly; of course, most such tools are not free to use. You can refer here to set up a multi-broker Kafka cluster.
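A few representative examples of those built-in scripts are sketched below. The broker address, topic name, and consumer group are placeholders, and older Kafka versions take --zookeeper where newer ones take --bootstrap-server:

# List every topic known to the cluster
bin/kafka-topics.sh --bootstrap-server localhost:9092 --list

# Describe a topic: partitions, leaders, replicas, in-sync replicas
bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic events

# Create a replicated topic across a multi-broker cluster
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic events --partitions 3 --replication-factor 3

# Check consumer-group lag per partition
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group analytics-app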

Before expounding the steps, I should extend my gratitude to all the frontline teams, from the cleaning/sanitation staff to the nurses, doctors, and others who are constantly fighting to…


Over the last couple of years, there has been huge growth in the adoption of Apache Kafka. Kafka is a scalable pub/sub system; in a nutshell, it is designed as a distributed multi-subscription system where data persists to disk. As a highlight, Kafka delivers messages to both real-time and batch consumers at the same time without performance degradation. Current users of Kafka include Uber, Twitter, Netflix, LinkedIn, Yahoo, Cisco, Goldman Sachs, and so forth. You can refer here to learn more about Apache Kafka.

This article explains the steps of how we can install…
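The full steps are truncated here, but as a rough sketch of the usual quick-start portion of such an installation (the release version and paths are illustrative assumptions):

# Download and unpack a Kafka release (version shown is illustrative)
tar -xzf kafka_2.12-2.5.0.tgz
cd kafka_2.12-2.5.0

# Start the bundled single-node Zookeeper, then a Kafka broker,
# each in its own terminal since both commands run in the foreground
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties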


Although Apache Zookeeper's functionality is not directly visible to end users, it remains the backbone of widely used components such as Hadoop (to manage automatic failover), Kafka (broker coordination), Solr, HBase, Apache S4, and more. Besides that, Zookeeper is extensively used by many free software projects like AdroitLogic UltraESB, Akka, GoldenOrb (massive-scale graph analysis), Neo4j (graph database), etc. On top of that, many esteemed companies like Yahoo, Rackspace, Box, Midokura, etc. use it for their operational purposes.
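To make the multi-node setup described below concrete, here is a minimal sketch of a three-server ensemble; the hostnames and data directory are assumptions. Each server shares the same zoo.cfg and identifies itself through a myid file:

# conf/zoo.cfg — identical on all three nodes (hostnames are placeholders)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk-node-1:2888:3888
server.2=zk-node-2:2888:3888
server.3=zk-node-3:2888:3888

# On each node, write its own id (1, 2, or 3) into dataDir/myid, e.g. on node 1:
echo 1 > /var/lib/zookeeper/myid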

Before expounding the steps to set up a multi-node Zookeeper cluster, I should extend my gratitude to all the frontline…


Before penning down this article, I would like to extend my appreciation to all the health teams, from the cleaning/sanitation staff to the nurses, doctors, and others who are constantly fighting to save mankind from the ongoing Covid-19 pandemic across the globe.

The fundamental goal of this article is to show how we can load or import data into Hive tables without explicitly executing the "load" command. With this approach, data scientists can query, or even visualize directly in various data visualization tools, for quick investigation in scenarios where raw data is continuously ingested into an HDFS-based data lake from…
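A minimal sketch of this approach, assuming raw delimited files continuously land under one HDFS directory (the table name, columns, and path are illustrative): an external table pointed at that directory makes each newly landed file queryable immediately, with no explicit load step.

# Create an external Hive table over the HDFS landing directory (names/paths are placeholders)
hive -e "
CREATE EXTERNAL TABLE IF NOT EXISTS sensor_readings (
  device_id STRING,
  reading   DOUBLE,
  event_ts  BIGINT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/lake/sensor_readings';"

# Any file later ingested into /data/lake/sensor_readings is immediately queryable
hive -e "SELECT COUNT(*) FROM sensor_readings;"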


This short article explains how to resolve the error “Error: Could not find or load main class org.apache.zookeeper.server.quorum.QuorumPeerMain” when starting Apache Zookeeper (apache-zookeeper-3.5.6.tar.gz) installed on a multi-node cluster. Distributed systems/applications leverage the services offered by Apache Zookeeper to manage their synchronization, configuration, and naming registries. Apache Zookeeper is a primary backbone for Hadoop, Kafka, HBase, Tableau, etc. The above-mentioned error appears on the console when executing the zkServer.sh script (available inside the bin directory) with the input parameter “start”. It happens because of the unavailability of the lib directory that holds the zookeeper-3.5.6.jar file…
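One way to confirm and work around this, sketched below under the assumption of a default extraction directory (the mirror URL may vary): check whether the lib directory with the server jars exists, and if the source-only tarball was downloaded, switch to the -bin release that ships the compiled jars.

# Check whether the jar zkServer.sh puts on the classpath is actually present
ls apache-zookeeper-3.5.6/lib/zookeeper-3.5.6.jar

# If not, the source-only tarball was likely used; fetch the binary release instead
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.5.6/apache-zookeeper-3.5.6-bin.tar.gz
tar -xzf apache-zookeeper-3.5.6-bin.tar.gz
apache-zookeeper-3.5.6-bin/bin/zkServer.sh start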

Gautam Goswami

Enthusiastic about learning and sharing knowledge on Big Data and related advancements. Previously worked as a Sr. Technical Architect. Created https://dataview.in
