For other versions, see the Versioned plugin docs.
For questions about the plugin, open a topic in the Discuss forums. For bugs or feature requests, open an issue in GitHub. For the list of Elastic supported plugins, please consult the Elastic Support Matrix.
This input will read events from a Kafka topic.
This plugin uses Kafka Client 2.1.0. For broker compatibility, see the official Kafka compatibility reference. If the linked compatibility wiki is not up-to-date, please contact Kafka support/community to confirm compatibility.
If you require features not yet available in this plugin (including client version upgrades), please file an issue with details about what you need.
This input supports connecting to Kafka over SSL and Kerberos SASL.
By default security is disabled but can be turned on as needed.
The Logstash Kafka consumer handles group management and uses the default offset management strategy using Kafka topics.
Logstash instances by default form a single logical group to subscribe to Kafka topics. Each Logstash Kafka consumer can run multiple threads to increase read throughput. Alternatively, you could run multiple Logstash instances with the same group_id to spread the load across physical machines. Messages in a topic will be distributed to all Logstash instances with the same group_id.
Ideally you should have as many threads as the number of partitions for a perfect balance; more threads than partitions means that some threads will be idle.
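As a rough illustration of the thread guidance above, the sketch below assumes a hypothetical topic named app_logs with four partitions. The topic name, the partition count, and the broker address are assumptions made only for this example.

```
input {
  kafka {
    bootstrap_servers => "localhost:9092"   # assumed broker address
    topics => ["app_logs"]                  # hypothetical topic, assumed to have 4 partitions
    group_id => "logstash"                  # instances sharing this value split the partitions
    consumer_threads => 4                   # one thread per partition
  }
}
```

If two Logstash instances shared this group_id, setting consumer_threads => 2 on each would again give one thread per partition in total.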
For more information see http://kafka.apache.org/documentation.html#theconsumer
Kafka consumer configuration: http://kafka.apache.org/documentation.html#consumerconfigs
The following metadata from the Kafka broker is added under the [@metadata] field:
[@metadata][kafka][topic]: Original Kafka topic from which the message was consumed.
[@metadata][kafka][consumer_group]: Consumer group.
[@metadata][kafka][partition]: Partition info for this message.
[@metadata][kafka][offset]: Original record offset for this message.
[@metadata][kafka][key]: Record key, if any.
[@metadata][kafka][timestamp]: Timestamp in the record. Depending on your broker configuration, this can be either when the record was created (default) or when it was received by the broker. See more about the property log.message.timestamp.type at https://kafka.apache.org/10/documentation.html#brokerconfigs
Please note that @metadata fields are not part of any of your events at output time. If you need this information to be inserted into your original event, you’ll have to use the mutate filter to manually copy the required fields into your event.
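The copy step described above could look roughly like the following. This is a minimal sketch, not a canonical configuration: the target field names kafka_topic and kafka_offset and the broker address are illustrative assumptions, and the sketch assumes decorate_events must be enabled for the [@metadata][kafka] fields to be populated.

```
input {
  kafka {
    bootstrap_servers => "localhost:9092"   # assumed broker address
    topics => ["logstash"]
    decorate_events => true                 # assumed to be required for Kafka metadata
  }
}

filter {
  mutate {
    # Copy selected metadata into regular event fields so they survive to the output.
    # The target field names are hypothetical.
    add_field => {
      "kafka_topic"  => "%{[@metadata][kafka][topic]}"
      "kafka_offset" => "%{[@metadata][kafka][offset]}"
    }
  }
}
```

Values copied with the %{...} reference syntax arrive as strings, so a later convert step may be needed if the offset should be numeric.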
This plugin supports these configuration options plus the Common Options described later.
Some of these options map to a Kafka option. See https://kafka.apache.org/documentation for more details.
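Setting | Input type | Required |
---|---|---|
auto_commit_interval_ms | string | No |
auto_offset_reset | string | No |
bootstrap_servers | string | No |
check_crcs | string | No |
client_id | string | No |
connections_max_idle_ms | string | No |
consumer_threads | number | No |
decorate_events | boolean | No |
enable_auto_commit | string | No |
exclude_internal_topics | string | No |
fetch_max_bytes | string | No |
fetch_max_wait_ms | string | No |
fetch_min_bytes | string | No |
group_id | string | No |
heartbeat_interval_ms | string | No |
jaas_path | a valid filesystem path | No |
kerberos_config | a valid filesystem path | No |
key_deserializer_class | string | No |
max_partition_fetch_bytes | string | No |
max_poll_interval_ms | string | No |
max_poll_records | string | No |
metadata_max_age_ms | string | No |
partition_assignment_strategy | string | No |
poll_timeout_ms | number | No |
receive_buffer_bytes | string | No |
reconnect_backoff_ms | string | No |
request_timeout_ms | string | No |
retry_backoff_ms | string | No |
sasl_kerberos_service_name | string | No |
sasl_mechanism | string | No |
security_protocol | string, one of ["PLAINTEXT", "SSL", "SASL_PLAINTEXT", "SASL_SSL"] | No |
send_buffer_bytes | string | No |
session_timeout_ms | string | No |
ssl_endpoint_identification_algorithm | string | No |
ssl_key_password | password | No |
ssl_keystore_location | a valid filesystem path | No |
ssl_keystore_password | password | No |
ssl_keystore_type | string | No |
ssl_truststore_location | a valid filesystem path | No |
ssl_truststore_password | password | No |
ssl_truststore_type | string | No |
topics | array | No |
topics_pattern | string | No |
value_deserializer_class | string | No |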
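Also see Common Options for a list of options supported by all input plugins.
auto_commit_interval_ms
Default value is "5000".
The frequency in milliseconds that the consumer offsets are committed to Kafka.
auto_offset_reset
What to do when there is no initial offset in Kafka or if an offset is out of range:
earliest: automatically reset the offset to the earliest offset.
latest: automatically reset the offset to the latest offset.
none: throw an exception to the consumer if no previous offset is found for the consumer’s group.
anything else: throw an exception to the consumer.
bootstrap_servers
Default value is "localhost:9092".
A list of URLs of Kafka instances to use for establishing the initial connection to the cluster. This list should be in the form of host1:port1,host2:port2. These URLs are just used for the initial connection to discover the full cluster membership (which may change dynamically), so this list need not contain the full set of servers (you may want more than one, though, in case a server is down).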
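A short sketch of the host1:port1,host2:port2 form described above, combined with an explicit offset reset policy. The host names are hypothetical placeholders, and "earliest" is chosen only to illustrate the setting.

```
input {
  kafka {
    # Two brokers are listed so the initial connection still works if one is down.
    # Host names are hypothetical.
    bootstrap_servers => "kafka1.example.com:9092,kafka2.example.com:9092"
    topics => ["logstash"]
    auto_offset_reset => "earliest"   # start from the oldest record when no offset is stored
  }
}
```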
check_crcs
Automatically check the CRC32 of the records consumed. This ensures no on-the-wire or on-disk corruption to the messages occurred. This check adds some overhead, so it may be disabled in cases seeking extreme performance.
client_id
Default value is "logstash".
The id string to pass to the server when making requests. The purpose of this is to be able to track the source of requests beyond just ip/port by allowing a logical application name to be included.
connections_max_idle_ms
Close idle connections after the number of milliseconds specified by this config.
consumer_threads
Default value is 1.
Ideally you should have as many threads as the number of partitions for a perfect balance; more threads than partitions means that some threads will be idle.
decorate_events
Default value is false.
Option to add Kafka metadata like topic, message size to the event. This will add a field named kafka to the logstash event containing the following attributes:
topic: The topic this message is associated with.
consumer_group: The consumer group used to read in this event.
partition: The partition this message is associated with.
offset: The offset from the partition this message is associated with.
key: A ByteBuffer containing the message key.
enable_auto_commit
Default value is "true".
If true, periodically commit to Kafka the offsets of messages already returned by the consumer. This committed offset will be used when the process fails as the position from which the consumption will begin.
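The two commit-related settings described above might be combined as in the sketch below. The one-second interval is an illustrative assumption, not a recommendation; values are quoted because the defaults in this document are shown as strings.

```
input {
  kafka {
    topics => ["logstash"]
    enable_auto_commit => "true"
    auto_commit_interval_ms => "1000"   # commit every second instead of the default "5000"
  }
}
```

A shorter interval narrows the window of records that would be read again after a failure, at the cost of more frequent commit requests.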
exclude_internal_topics
Whether records from internal topics (such as offsets) should be exposed to the consumer. If set to true, the only way to receive records from an internal topic is subscribing to it.
fetch_max_bytes
The maximum amount of data the server should return for a fetch request. This is not an absolute maximum: if the first message in the first non-empty partition of the fetch is larger than this value, the message will still be returned to ensure that the consumer can make progress.
fetch_max_wait_ms
The maximum amount of time the server will block before answering the fetch request if there isn’t sufficient data to immediately satisfy fetch_min_bytes. This should be less than or equal to the timeout used in poll_timeout_ms.
fetch_min_bytes
The minimum amount of data the server should return for a fetch request. If insufficient data is available, the request will wait for that much data to accumulate before answering the request.
group_id
Default value is "logstash".
The identifier of the group this consumer belongs to. A consumer group is a single logical subscriber that happens to be made up of multiple processors. Messages in a topic will be distributed to all Logstash instances with the same group_id.
heartbeat_interval_ms
The expected time between heartbeats to the consumer coordinator. Heartbeats are used to ensure that the consumer’s session stays active and to facilitate rebalancing when new consumers join or leave the group. The value must be set lower than session.timeout.ms, but typically should be set no higher than 1/3 of that value. It can be adjusted even lower to control the expected time for normal rebalances.
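To make the one-third guideline above concrete, the sketch below pairs heartbeat_interval_ms with session_timeout_ms (the plugin setting that corresponds to session.timeout.ms). Both numbers are illustrative assumptions chosen only to show the ratio.

```
input {
  kafka {
    topics => ["logstash"]
    group_id => "logstash"
    session_timeout_ms => "30000"      # illustrative value
    heartbeat_interval_ms => "10000"   # one third of the session timeout
  }
}
```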
jaas_path
The Java Authentication and Authorization Service (JAAS) API supplies user authentication and authorization services for Kafka. This setting provides the path to the JAAS file. Sample JAAS file for Kafka client:
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true
  renewTicket=true
  serviceName="kafka";
};
Please note that specifying jaas_path and kerberos_config in the config file will add these to the global JVM system properties. This means if you have multiple Kafka inputs, all of them would be sharing the same jaas_path and kerberos_config. If this is not desirable, you would have to run separate instances of Logstash on different JVM instances.
kerberos_config
Optional path to kerberos config file. This is krb5.conf style as detailed in https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html
key_deserializer_class
Default value is "org.apache.kafka.common.serialization.StringDeserializer".
Java Class used to deserialize the record’s key.
max_partition_fetch_bytes
The maximum amount of data per-partition the server will return. The maximum total memory used for a request will be #partitions * max.partition.fetch.bytes. This size must be at least as large as the maximum message size the server allows or else it is possible for the producer to send messages larger than the consumer can fetch. If that happens, the consumer can get stuck trying to fetch a large message on a certain partition.
max_poll_interval_ms
The maximum delay between invocations of poll() when using consumer group management. This places an upper bound on the amount of time that the consumer can be idle before fetching more records. If poll() is not called before expiration of this timeout, then the consumer is considered failed and the group will rebalance in order to reassign the partitions to another member. The value of the configuration request_timeout_ms must always be larger than max_poll_interval_ms.
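The relationship stated above between max_poll_interval_ms and request_timeout_ms can be sketched as follows. All numbers are illustrative assumptions; max_poll_records is included only to show one way of keeping the work done between poll() calls small.

```
input {
  kafka {
    topics => ["logstash"]
    max_poll_records => "100"          # fewer records per poll(), so each batch finishes sooner
    max_poll_interval_ms => "300000"   # allow up to 5 minutes between polls
    request_timeout_ms => "305000"     # kept larger than max_poll_interval_ms
  }
}
```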
max_poll_records
The maximum number of records returned in a single call to poll().
metadata_max_age_ms
The period of time in milliseconds after which we force a refresh of metadata even if we haven’t seen any partition leadership changes to proactively discover any new brokers or partitions.
partition_assignment_strategy
The class name of the partition assignment strategy that the client uses to distribute partition ownership amongst consumer instances. Maps to the Kafka partition.assignment.strategy setting, which defaults to org.apache.kafka.clients.consumer.RangeAssignor.
poll_timeout_ms
Default value is 100.
Time the Kafka consumer will wait to receive new messages from topics.
receive_buffer_bytes
The size of the TCP receive buffer (SO_RCVBUF) to use when reading data.
reconnect_backoff_ms
The amount of time to wait before attempting to reconnect to a given host. This avoids repeatedly connecting to a host in a tight loop. This backoff applies to all requests sent by the consumer to the broker.
request_timeout_ms
The configuration controls the maximum amount of time the client will wait for the response of a request. If the response is not received before the timeout elapses, the client will resend the request if necessary or fail the request if retries are exhausted.
retry_backoff_ms
The amount of time to wait before attempting to retry a failed fetch request to a given topic partition. This avoids repeated fetching-and-failing in a tight loop.
sasl_kerberos_service_name
The Kerberos principal name that the Kafka broker runs as. This can be defined either in Kafka’s JAAS config or in Kafka’s config.
sasl_mechanism
Default value is "GSSAPI".
SASL mechanism used for client connections. This may be any mechanism for which a security provider is available. GSSAPI is the default mechanism.
security_protocol
Value can be any of: PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL.
Default value is "PLAINTEXT".
Security protocol to use, which can be any of PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL.
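The Kerberos-related settings described above (jaas_path, kerberos_config, sasl_kerberos_service_name, sasl_mechanism, and security_protocol) might fit together as in this sketch. The broker address and both file paths are hypothetical, and the service name "kafka" is taken from the sample JAAS file rather than from any particular deployment.

```
input {
  kafka {
    bootstrap_servers => "kafka1.example.com:9092"        # hypothetical broker
    topics => ["logstash"]
    security_protocol => "SASL_PLAINTEXT"
    sasl_mechanism => "GSSAPI"
    sasl_kerberos_service_name => "kafka"
    jaas_path => "/etc/logstash/kafka_client_jaas.conf"   # hypothetical path
    kerberos_config => "/etc/krb5.conf"                   # hypothetical path
  }
}
```

Choosing "SASL_SSL" instead would presumably also require the ssl_* truststore settings. Because jaas_path and kerberos_config become global JVM system properties, every Kafka input in the same Logstash process shares them.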
send_buffer_bytes
The size of the TCP send buffer (SO_SNDBUF) to use when sending data.
session_timeout_ms
The timeout after which, if the poll_timeout_ms is not invoked, the consumer is marked dead and a rebalance operation is triggered for the group identified by group_id.
ssl_endpoint_identification_algorithm
Default value is "https".
The endpoint identification algorithm, defaults to "https". Set to empty string "" to disable endpoint verification.
ssl_key_password
The password of the private key in the key store file.
ssl_keystore_location
If client authentication is required, this setting stores the keystore path.
ssl_keystore_password
If client authentication is required, this setting stores the keystore password.
ssl_keystore_type
The keystore type.
ssl_truststore_location
The JKS truststore path to validate the Kafka broker’s certificate.
ssl_truststore_password
The truststore password.
ssl_truststore_type
The truststore type.
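A minimal sketch of an SSL connection using the settings described above. The broker address, file paths, and environment variable names are hypothetical. The keystore lines apply only when the broker requires client authentication.

```
input {
  kafka {
    bootstrap_servers => "kafka1.example.com:9093"                        # hypothetical broker
    topics => ["logstash"]
    security_protocol => "SSL"
    ssl_truststore_location => "/etc/logstash/kafka.truststore.jks"       # hypothetical path
    ssl_truststore_password => "${KAFKA_TRUSTSTORE_PASSWORD}"             # hypothetical variable
    # Only needed if the broker requires client authentication:
    ssl_keystore_location => "/etc/logstash/kafka.keystore.jks"           # hypothetical path
    ssl_keystore_password => "${KAFKA_KEYSTORE_PASSWORD}"                 # hypothetical variable
    ssl_key_password => "${KAFKA_KEY_PASSWORD}"                           # hypothetical variable
  }
}
```

The ${...} references read the values from environment variables (or the Logstash keystore), which keeps the passwords out of the pipeline file.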
topics
Default value is ["logstash"].
A list of topics to subscribe to, defaults to ["logstash"].
topics_pattern
A topic regex pattern to subscribe to. The topics configuration will be ignored when using this configuration.
value_deserializer_class
Default value is "org.apache.kafka.common.serialization.StringDeserializer".
Java Class used to deserialize the record’s value.
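The difference between the topics and topics_pattern settings described above can be sketched as follows. The app- naming scheme is a hypothetical example.

```
input {
  kafka {
    # Subscribes to every topic whose name matches the regex.
    # Any topics setting in the same block would be ignored.
    topics_pattern => "app-.*"   # hypothetical naming scheme
    group_id => "logstash"
  }
}
```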
The following configuration options are supported by all input plugins:
codec
Default value is "plain".
The codec used for input data. Input codecs are a convenient method for decoding your data before it enters the input, without needing a separate filter in your Logstash pipeline.
enable_metric
Default value is true.
Disable or enable metric logging for this specific plugin instance. By default we record all the metrics we can, but you can disable metrics collection for a specific plugin.
id
Add a unique ID to the plugin configuration. If no ID is specified, Logstash will generate one. It is strongly recommended to set this ID in your configuration. This is particularly useful when you have two or more plugins of the same type, for example, if you have 2 kafka inputs. Adding a named ID in this case will help in monitoring Logstash when using the monitoring APIs.
input {
  kafka {
    id => "my_plugin_id"
  }
}
tags
Add any number of arbitrary tags to your event. This can help with processing later.
type
Add a type field to all events handled by this input. Types are used mainly for filter activation. The type is stored as part of the event itself, so you can also use the type to search for it in Kibana. If you try to set a type on an event that already has one (for example when you send an event from a shipper to an indexer) then a new input will not override the existing type. A type set at the shipper stays with that event for its life even when sent to another Logstash server.
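As an illustration of using type for filter activation, the sketch below sets a type and a tag on the input and then applies a filter only to matching events. The type value kafka_event and both tag names are hypothetical.

```
input {
  kafka {
    topics => ["logstash"]
    type => "kafka_event"      # hypothetical type value
    tags => ["from_kafka"]     # hypothetical tag
  }
}

filter {
  # Runs only for events whose type was set by the input above.
  if [type] == "kafka_event" {
    mutate {
      add_tag => ["kafka_processed"]   # hypothetical tag marking filtered events
    }
  }
}
```

An event that already carries a type from an upstream shipper keeps that value, so the conditional would not match it.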