Kafka Producer Configuration

Kim Quy
Jul 4, 2019

This post focuses only on the most important producer configurations. You can find the full list at https://kafka.apache.org/documentation/#producerconfigs.

acks

acks = 0

  • The producer does not request a response (highest speed, worst integrity: data can be lost).
  • The producer won’t know if the broker is down, so messages sent in the meantime are lost.
  • Use this setting only for data that can tolerate lost messages. Ex: logs, metrics, GPS tracking.

acks = 1 (the default before Kafka 3.0; newer versions default to all):

  • The leader confirms that it received the message, but replication is not guaranteed (moderate speed, limited data loss).
  • If an ACK is not received, the producer may retry.
  • If the leader goes down before the new message is synced to the replicas, you will lose that message.

acks = all:

  • The leader and all in-sync replicas confirm that they received the message (lowest speed, highest integrity: no data loss).
  • With this setting, the network between the leader and the replicas and the number of replicas both affect throughput, so keep an eye on them.
  • This setting is a good fit for loading data into other systems such as Hadoop or a data warehouse for analytics, processing, or reporting.
  • If you use this setting, you must also pay attention to the min.insync.replicas config.
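The three acks levels above can be written down as plain producer settings. This is a minimal sketch using a Python dict keyed by the property name from the Kafka documentation; the profile names and the helper `pick_acks` are made up for illustration and are not part of any Kafka client.

```python
# Hypothetical profiles; only the "acks" key and its values come from Kafka.
ACKS_PROFILES = {
    "fire_and_forget": {"acks": "0"},   # no broker response: fastest, may lose data
    "leader_only": {"acks": "1"},       # leader confirms, replicas may still lag
    "all_replicas": {"acks": "all"},    # leader waits for the in-sync replicas
}


def pick_acks(can_lose_messages: bool, needs_replication: bool) -> dict:
    """Return producer settings matching the trade-offs described above."""
    if can_lose_messages:
        return dict(ACKS_PROFILES["fire_and_forget"])
    if needs_replication:
        return dict(ACKS_PROFILES["all_replicas"])
    return dict(ACKS_PROFILES["leader_only"])


# Metrics or GPS pings can tolerate loss; data warehouse loads cannot.
print(pick_acks(can_lose_messages=True, needs_replication=False))   # {'acks': '0'}
print(pick_acks(can_lose_messages=False, needs_replication=True))   # {'acks': 'all'}
```

The returned dict would normally be merged with the rest of the producer settings (bootstrap servers, serializers, and so on), which are left out here.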

min.insync.replicas

  • You should configure this parameter when acks=all.
  • It is a broker config that can be overridden at the topic level.
  • Its value should not exceed the topic's replication factor (leader + followers); otherwise writes with acks=all can never succeed.
  • min.insync.replicas=3 means that at least 3 brokers, including the leader, must confirm that they have the data. Otherwise, you’ll get an error message.
  • Ex: With min.insync.replicas=3, acks=all, and replication.factor=3, if even one broker goes down you will receive an exception when sending a message. The exception in this case is “Not Enough Replicas” (NotEnoughReplicasException).
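The arithmetic behind that example can be sketched in a few lines. This is a toy check, not Kafka's code: the function name `broker_accepts_write` is hypothetical, and it assumes the leader itself is still reachable and counted among the in-sync replicas.

```python
def broker_accepts_write(acks: str, in_sync_replicas: int, min_insync_replicas: int) -> bool:
    """Toy model: min.insync.replicas is only enforced for acks=all."""
    if acks != "all":
        return True
    return in_sync_replicas >= min_insync_replicas


replication_factor = 3

# min.insync.replicas=3: every replica must be in sync.
print(broker_accepts_write("all", replication_factor, 3))      # True, all brokers up
print(broker_accepts_write("all", replication_factor - 1, 3))  # False, one broker down

# min.insync.replicas=2 tolerates the loss of one broker.
print(broker_accepts_write("all", replication_factor - 1, 2))  # True
```

The `False` case corresponds to the "Not Enough Replicas" error described above. Lowering min.insync.replicas to 2 with a replication factor of 3 leaves room for one broker to fail.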

retries

  • The name says it all: if a send fails with a transient (retriable) error, the producer retries up to the number of times set here.
  • Default is 0 in Kafka versions before 2.1; since 2.1 the default is Integer.MAX_VALUE.
  • The highest retry value is Integer.MAX_VALUE. With it, the producer effectively keeps retrying until the send succeeds (newer versions also stop once delivery.timeout.ms expires).
  • When retries is higher than 0 and a batch fails to send, messages can end up out of order. To prevent this, set max.in.flight.requests.per.connection = 1.
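To make the meaning of the retries count concrete, here is a minimal sketch of a retry loop. It is not how the Kafka producer is implemented; `send_with_retries` and `make_flaky_send` are hypothetical helpers, and `ConnectionError` stands in for any transient broker error.

```python
def send_with_retries(send, retries: int) -> int:
    """Call send() until it succeeds or retries run out; return attempts used."""
    attempts = 0
    while True:
        attempts += 1
        try:
            send()
            return attempts
        except ConnectionError:
            # The first attempt is not a retry, so allow retries + 1 attempts.
            if attempts > retries:
                raise


def make_flaky_send(failures_before_success: int):
    """Build a fake send() that fails a fixed number of times, then succeeds."""
    remaining = [failures_before_success]

    def send():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise ConnectionError("transient broker error")

    return send


print(send_with_retries(make_flaky_send(2), retries=3))  # 3

try:
    send_with_retries(make_flaky_send(2), retries=0)
except ConnectionError as err:
    print("gave up:", err)  # gave up: transient broker error
```

With retries=0 the first failure is final, which is why the message is lost unless the application handles the error itself.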

max.in.flight.requests.per.connection

  • This setting controls how many unacknowledged requests the producer can have in flight on a single connection (that is, to one broker) at the same time.
  • Default is 5.
  • If the value is 1: throughput is lower, but message ordering is guaranteed.
  • If the value is > 1: throughput is better, but a retry may cause out-of-order delivery. Setting it too high can lead to excessive pipelining.
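The reordering effect can be illustrated with a small simulation. This is a toy model under simplifying assumptions, not the real producer: batches are sent in windows of `max_in_flight`, a batch listed in `fails_once` fails on its first attempt, and failed batches are retried at the front of the queue. The function name `delivery_order` is hypothetical.

```python
def delivery_order(batches, max_in_flight, fails_once):
    """Return the order in which batches end up in the partition log."""
    pending = list(batches)
    failed_already = set()
    log = []
    while pending:
        # Send up to max_in_flight batches without waiting for acknowledgements.
        window = pending[:max_in_flight]
        pending = pending[max_in_flight:]
        retry = []
        for batch in window:
            if batch in fails_once and batch not in failed_already:
                failed_already.add(batch)
                retry.append(batch)   # will be sent again in the next round
            else:
                log.append(batch)     # written to the log
        pending = retry + pending
    return log


print(delivery_order(["A", "B", "C"], max_in_flight=1, fails_once={"A"}))
# ['A', 'B', 'C']
print(delivery_order(["A", "B", "C"], max_in_flight=2, fails_once={"A"}))
# ['B', 'A', 'C']
```

With one request in flight, the failed batch A is retried before B is ever sent. With two in flight, B is already written when A is retried, so A lands behind it.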
