In this topic I only want to focus on the important configurations. You can find more information at https://kafka.apache.org/documentation/#producerconfigs.
ACK
ACK = 0:
- No response is requested (fastest, weakest integrity: data can be lost).
- You won’t know if the broker is down, so the data will be lost.
- This configuration is only suitable for data where losing some messages is acceptable. Ex: logs, metrics, GPS tracking.
ACK = 1 (the default before Kafka 3.0; since 3.0 the default is all):
- The leader confirms that it received the message, but replication is not guaranteed (moderate speed, limited data loss).
- If an ACK is not received, the producer may retry.
- If the leader partition goes down before the new message has been synced to the replicas, you will lose that message.
ACK = all:
- The leader and the in-sync replicas confirm that they received the message (slowest, highest integrity: no data loss).
- With this configuration, pay attention to the network between the leader and the replicas and to the number of replicas, since both affect speed.
- This configuration is a good fit for loading data into other systems, such as Hadoop or a data warehouse, for analytics, processing, or reporting.
- If you use this configuration, you must also take care of the min.insync.replicas config.
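The three ACK levels can be sketched as producer settings. This is a minimal illustration, not a full client setup: the key names "acks" and "bootstrap.servers" are real Kafka producer config names, while the profile names, the producer_config helper, and the localhost:9092 address are made up for this example.

```python
# Hypothetical profiles: the names on the left are invented for illustration,
# the "acks" values on the right are the real producer settings.
ACK_PROFILES = {
    "fire_and_forget": {"acks": "0"},     # no broker response; data loss possible
    "leader_only": {"acks": "1"},         # leader confirms; followers may lag behind
    "fully_replicated": {"acks": "all"},  # leader waits for the in-sync replicas
}


def producer_config(profile, bootstrap_servers="localhost:9092"):
    """Return a producer config dict for one of the profiles above."""
    config = {"bootstrap.servers": bootstrap_servers}
    config.update(ACK_PROFILES[profile])
    return config


print(producer_config("fully_replicated"))
```

A dict like this is the usual shape of the configuration handed to a Kafka client when the producer is created.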
min.insync.replicas
- You should set this parameter when ACK=all. With its default of 1, a write can succeed even when the leader is the only in-sync replica.
- This config is set on the broker and can be overridden at the topic level.
- The highest useful value of this config equals the topic's replication factor (leader + followers).
- min.insync.replicas=3 implies that at least 3 brokers, including the leader, must respond that they have the data. Otherwise, you’ll get an error message.
- Ex: You have min.insync.replicas=3, ACK=all, replication.factor=3. If just one broker goes down, you will receive an exception when sending a message. The exception for this case is “Not Enough Replicas” (NotEnoughReplicasException in the Java client).
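The example above can be sketched as a small simulation. This is not broker code: check_write and the NotEnoughReplicas class are hypothetical stand-ins that only mimic the rule that a write with ACK=all is rejected when fewer replicas are in sync than min.insync.replicas requires.

```python
class NotEnoughReplicas(Exception):
    """Stand-in for the broker's "Not Enough Replicas" error (hypothetical class)."""


def check_write(acks, min_insync_replicas, in_sync_replicas):
    """Mimic the broker-side check: only acks=all enforces min.insync.replicas."""
    if acks == "all" and in_sync_replicas < min_insync_replicas:
        raise NotEnoughReplicas(
            f"{in_sync_replicas} in-sync replicas, need {min_insync_replicas}"
        )
    return True


# replication.factor=3, min.insync.replicas=3, all brokers up: accepted.
check_write("all", min_insync_replicas=3, in_sync_replicas=3)

# One broker goes down, so only 2 replicas are in sync: rejected.
try:
    check_write("all", min_insync_replicas=3, in_sync_replicas=2)
except NotEnoughReplicas as error:
    print("send failed:", error)

# With min.insync.replicas=2 the same outage is tolerated.
check_write("all", min_insync_replicas=2, in_sync_replicas=2)
```

This is why min.insync.replicas is often set lower than the replication factor: the topic then stays writable while one broker is down.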
retries
- The name says it all. If a transient error happens, the producer will retry up to the number of times set in this config.
- Default is 0 before Kafka 2.1 and Integer.MAX_VALUE since 2.1.
- The highest retry value is Integer.MAX_VALUE. With that value the producer keeps retrying until it succeeds (or, since 2.1, until delivery.timeout.ms expires).
- When retries is higher than 0 and a batch fails to be sent, its messages can be delivered out of order, because a later batch may succeed before the retry does. If you don’t want that to happen, you can set max.in.flight.requests.per.connection = 1.
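The meaning of the retries number can be illustrated with a small loop. The real producer retries internally, so this is only a sketch of the idea: send_with_retries and flaky_send are hypothetical names, and ConnectionError stands in for any transient send failure.

```python
def send_with_retries(send, message, retries):
    """Call send(message); on failure, try again up to `retries` more times.

    Returns the number of attempts that were needed.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            send(message)
            return attempts
        except ConnectionError:
            if attempts > retries:
                raise  # retries exhausted: surface the error to the caller


# Hypothetical flaky sender: fails twice, then succeeds.
failures_left = [2]


def flaky_send(message):
    if failures_left[0] > 0:
        failures_left[0] -= 1
        raise ConnectionError("broker unavailable")


print(send_with_retries(flaky_send, "hello", retries=3))  # prints 3
```

With retries=0 the first failure would be raised to the caller straight away.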
max.in.flight.requests.per.connection
- This setting controls how many unacknowledged requests the producer can have in flight on a single connection, that is, to a single broker.
- Default is 5.
- If this setting value is 1: Throughput is lower, but message ordering is guaranteed.
- If this setting value is > 1: Throughput is better, but messages may be delivered out of order when a retry occurs. Very high values lead to excessive pipelining.
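The ordering effect can be shown with a toy model. It is a deliberate simplification rather than how the client is implemented: the deliver function is hypothetical, it sends up to max_in_flight batches together, and a batch listed in fails_once fails on its first attempt and is re-sent after the rest of its group has been written.

```python
def deliver(batches, max_in_flight, fails_once):
    """Return the order in which batches land in the partition log."""
    log = []
    pending = list(batches)
    failed_before = set()
    while pending:
        # Take the next group of batches that are in flight at the same time.
        window, pending = pending[:max_in_flight], pending[max_in_flight:]
        retry = []
        for batch in window:
            if batch in fails_once and batch not in failed_before:
                failed_before.add(batch)
                retry.append(batch)  # first attempt failed
            else:
                log.append(batch)    # written to the log
        pending = retry + pending    # failed batches are re-sent next
    return log


print(deliver(["A", "B", "C"], max_in_flight=1, fails_once={"A"}))  # ['A', 'B', 'C']
print(deliver(["A", "B", "C"], max_in_flight=5, fails_once={"A"}))  # ['B', 'C', 'A']
```

With one request in flight, the failed batch A is retried before B is sent, so order is kept. With five in flight, B and C are already written when A is retried, so A lands last.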