What's the purpose of Kafka's key/value pair-based messaging?


Apache Kafka Problem Overview


All of the examples of Kafka producers show the ProducerRecord's key and value not only being the same type (all examples show <String,String>), but also having the same value. For example:

producer.send(new ProducerRecord<String, String>("someTopic", Integer.toString(i), Integer.toString(i)));

But in the Kafka docs, I can't seem to find where the key/value concept (and its underlying purpose/utility) is explained. In traditional messaging (ActiveMQ, RabbitMQ, etc.) I've always fired a message at a particular topic/queue/exchange. But Kafka is the first broker that seems to require key/value pairs instead of just a regular ol' string message.

So I ask: What is the purpose/usefulness of requiring producers to send KV pairs?

Apache Kafka Solutions


Solution 1 - Apache Kafka

Kafka uses the abstraction of a distributed log that consists of partitions. Splitting a log into partitions allows the system to scale out.
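To make the "log split into partitions" idea concrete, here is a toy, in-memory sketch. It is only an illustration under simplifying assumptions (no brokers, replication, or persistence), and the class name `PartitionedLog` is hypothetical, not part of Kafka's API. Each partition is an independent append-only list, and an offset is only meaningful within its own partition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of one topic: N independent append-only partitions.
// Not Kafka's implementation: no brokers, replication, or persistence.
class PartitionedLog {
    private final List<List<String>> partitions = new ArrayList<>();

    PartitionedLog(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // Appends to the end of one partition and returns the record's offset,
    // which is only unique within that partition.
    long append(int partition, String value) {
        List<String> log = partitions.get(partition);
        log.add(value);
        return log.size() - 1;
    }

    // Reads the record stored at a given offset of a given partition.
    String read(int partition, long offset) {
        return partitions.get(partition).get((int) offset);
    }

    int numPartitions() {
        return partitions.size();
    }
}
```

Because the partitions share no state, they can be placed on different brokers and read by different consumers in parallel, which is what makes scaling out possible.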

Keys are used to determine the partition within a log to which a message gets appended, while the value is the actual payload of the message. The examples are actually not very "good" in this regard; usually you would have a complex type as the value (like a tuple type, JSON, or similar) and you would extract one field as the key.
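A minimal sketch of pulling a key out of a richer value and mapping it to a partition. Kafka's default partitioner hashes the serialized key bytes with murmur2 and takes the result modulo the number of partitions; to stay dependency-free, this sketch substitutes `Arrays.hashCode` for murmur2, so its partition numbers will not match a real cluster. The `Order` type, its `customerId` field, and the `KeyedRouting` helper are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical complex value type; in practice this might be JSON, Avro, etc.
final class Order {
    final String customerId;
    final String item;
    final int quantity;

    Order(String customerId, String item, int quantity) {
        this.customerId = customerId;
        this.item = item;
        this.quantity = quantity;
    }
}

final class KeyedRouting {
    // Extract one field of the value to use as the record key.
    static String keyOf(Order order) {
        return order.customerId;
    }

    // Hash the serialized key and map it onto [0, numPartitions).
    // Stand-in: Kafka's default partitioner uses murmur2, not Arrays.hashCode.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        // Clear the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }
}
```

Every order for the same customer gets the same key and therefore the same partition, whatever item it contains. With the real client you would pass `keyOf(order)` as the `ProducerRecord` key and the serialized order as the value.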

See: http://kafka.apache.org/intro#intro_topics and http://kafka.apache.org/intro#intro_producers

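In general, the key and/or the value can also be null. If the key is null, the partition is not derived from the message; the producer spreads such messages across partitions itself (round-robin in older clients, "sticky" per-batch assignment since Kafka 2.4). If the value is null, it can have special "delete" semantics if you enable log compaction instead of a log-retention policy for a topic (http://kafka.apache.org/documentation#compaction).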
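The "delete" semantics of a null value under log compaction can be sketched as follows. This is a simplified, in-memory model, not Kafka's log cleaner: it keeps only the latest value per key and treats a null value (a "tombstone") as removing the key. A real compacted topic compacts lazily in the background and keeps tombstones around for a while (governed by `delete.retention.ms`) so consumers can observe the delete. The `LogEntry` and `Compaction` names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record: a null value marks a tombstone.
final class LogEntry {
    final String key;
    final String value;

    LogEntry(String key, String value) {
        this.key = key;
        this.value = value;
    }
}

final class Compaction {
    // Replays entries in log order and returns what survives compaction:
    // the latest non-null value for each key.
    static Map<String, String> compact(List<LogEntry> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (LogEntry e : log) {
            if (e.value == null) {
                latest.remove(e.key);       // tombstone: the key is deleted
            } else {
                latest.put(e.key, e.value); // a newer value replaces the older one
            }
        }
        return latest;
    }
}
```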

Solution 2 - Apache Kafka

Late addition: specifying the key so that all messages with the same key go to the same partition is very important for processing messages in the proper order if you have multiple consumers in a consumer group on a topic.

Kafka only guarantees ordering within a partition. Without a key, two messages about the same entity could go to different partitions and be processed out of order by different consumers in the group.
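The ordering argument can be checked with a small simulation. Assumptions, all hypothetical: 4 partitions consumed by a group of 2 consumers, partition p assigned to consumer p % 2 (a stand-in for Kafka's real assignors), keyless events spread round-robin, and keyed events hashed with `String.hashCode` instead of murmur2. The `OrderingDemo` class is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

final class OrderingDemo {
    static final int PARTITIONS = 4;
    static final int CONSUMERS = 2;

    // Keyed routing: the same key always hashes to the same partition.
    static int keyedPartition(String key) {
        return (key.hashCode() & 0x7fffffff) % PARTITIONS;
    }

    // Keyless routing (simplified): the i-th event goes to partition i % PARTITIONS.
    static int roundRobinPartition(int sequence) {
        return sequence % PARTITIONS;
    }

    // Hypothetical static assignment of partitions to group members.
    static int consumerFor(int partition) {
        return partition % CONSUMERS;
    }

    // For eventCount consecutive events about one key, returns which consumer
    // would process each of them.
    static List<Integer> consumersHandling(String key, int eventCount, boolean useKey) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < eventCount; i++) {
            int partition = useKey ? keyedPartition(key) : roundRobinPartition(i);
            result.add(consumerFor(partition));
        }
        return result;
    }
}
```

With the key, "created", "updated", and "deleted" for one account all land on one partition and are read in order by a single consumer. Without it, they are split across both consumers, so one consumer may process "updated" before the other has processed "created".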

Solution 3 - Apache Kafka

Another interesting use case:

We could use the key attribute in Kafka topics for sending user_ids, and then plug in a consumer to fetch the streaming events (stored in the value attribute). This could allow you to process each user's event sequence, up to whatever maximum history length you need, when creating features for your machine learning models.

I still have to find out whether this is possible. I will keep updating my answer with further details.
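Since the idea above is explicitly unverified, here is only a hedged sketch of the consumer-side half: given records whose key is a user_id and whose value is an event, keep the most recent N events per user as a feature sequence. It does not use the Kafka client; the `UserHistory` class and its `maxHistory` parameter are hypothetical, and in practice `onRecord` would be called from a consumer poll loop. Because all events for one user_id share a key, they land in one partition (as long as the partition count does not change), so a single consumer sees them in order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class UserHistory {
    private final int maxHistory;
    private final Map<String, Deque<String>> eventsByUser = new HashMap<>();

    UserHistory(int maxHistory) {
        this.maxHistory = maxHistory;
    }

    // Called once per consumed record: key = user_id, value = event.
    void onRecord(String userId, String event) {
        Deque<String> events = eventsByUser.computeIfAbsent(userId, id -> new ArrayDeque<>());
        events.addLast(event);
        // Drop the oldest events once the window exceeds maxHistory.
        while (events.size() > maxHistory) {
            events.removeFirst();
        }
    }

    // Oldest-to-newest events for one user, e.g. as input to a feature extractor.
    List<String> historyOf(String userId) {
        return new ArrayList<>(eventsByUser.getOrDefault(userId, new ArrayDeque<>()));
    }
}
```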

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | smeeb | View Question on Stackoverflow
Solution 1 - Apache Kafka | Matthias J. Sax | View Answer on Stackoverflow
Solution 2 - Apache Kafka | MikeK | View Answer on Stackoverflow
Solution 3 - Apache Kafka | Utkarsh Gupta | View Answer on Stackoverflow