For other versions, see the Versioned plugin docs.
For questions about the plugin, open a topic in the Discuss forums. For bugs or feature requests, open an issue in GitHub. For the list of Elastic supported plugins, please consult the Elastic Support Matrix.
This plugin sends Logstash events into files in HDFS via the webhdfs REST API.
This plugin has no dependency on jars from Hadoop, thus reducing configuration and compatibility problems. It uses the webhdfs gem from Kazuki Ohta and TAGOMORI Satoshi (see: https://github.com/kzk/webhdfs). Optional dependencies are the zlib and snappy gems if you use the compression functionality.
If you get an error like:
```
Max write retries reached. Exception: initialize: name or service not known {:level=>:error}
```
make sure that the hostname of your namenode is resolvable on the host running Logstash. When creating or appending to a file, webhdfs sometimes sends a 307 TEMPORARY_REDIRECT with the HOSTNAME of the machine it is running on.
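If that hostname does not resolve, one workaround (a sketch; `namenode01` and the IP address are placeholders for your own namenode) is to map it manually on the host running Logstash:

```
# /etc/hosts on the host running Logstash
# Map the namenode's HOSTNAME (as returned in the 307 redirect) to its IP.
192.0.2.10  namenode01
```

A DNS entry for the namenode's hostname works equally well; the point is only that the redirect target must be resolvable from the Logstash host.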
This is an example of Logstash config:
```
input {
  ...
}
filter {
  ...
}
output {
  webhdfs {
    host => "127.0.0.1"        # (required)
    port => 50070              # (optional, default: 50070)
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"  # (required)
    user => "hue"              # (required)
  }
}
```
This plugin supports the following configuration options plus the Common Options described later.
| Setting | Input type | Required |
|---|---|---|
| `compression` | string, one of `["none", "snappy", "gzip"]` | No |
| `flush_size` | number | No |
| `host` | string | Yes |
| `idle_flush_time` | number | No |
| `kerberos_keytab` | string | No |
| `open_timeout` | number | No |
| `path` | string | Yes |
| `read_timeout` | number | No |
| `retry_interval` | number | No |
| `retry_known_errors` | boolean | No |
| `retry_times` | number | No |
| `single_file_per_thread` | boolean | No |
| `snappy_bufsize` | number | No |
| `snappy_format` | string, one of `["stream", "file"]` | No |
| `ssl_cert` | string | No |
| `ssl_key` | string | No |
| `standby_host` | string | No |
| `standby_port` | number | No |
| `use_httpfs` | boolean | No |
| `use_kerberos_auth` | boolean | No |
| `use_ssl_auth` | boolean | No |
| `user` | string | Yes |
Also see Common Options for a list of options supported by all output plugins.
`compression`

- Value type is string
- Value can be any of: `none`, `snappy`, `gzip`
- Default value is `"none"`

Compress output. One of [none, snappy, gzip].
`flush_size`

- Value type is number
- Default value is `500`

Send data to webhdfs when the event count exceeds this value, even if store_interval_in_secs has not been reached.
`host`

- This is a required setting.
- Value type is string
- There is no default value for this setting.

The server name for webhdfs/httpfs connections.
`idle_flush_time`

- Value type is number
- Default value is `1`

Send data to webhdfs at intervals of x seconds.
`kerberos_keytab`

- Value type is string
- There is no default value for this setting.

Set the kerberos keytab file. Note that the gssapi library needs to be available to use this.
`path`

- This is a required setting.
- Value type is string
- There is no default value for this setting.

The path to the file to write to. Event fields can be used here, as well as date fields in the joda time format, e.g.: `/user/logstash/dt=%{+YYYY-MM-dd}/%{@source_host}-%{+HH}.log`
`retry_known_errors`

- Value type is boolean
- Default value is `true`

Retry some known webhdfs errors. These may be caused by race conditions when appending to the same file, etc.
`retry_times`

- Value type is number
- Default value is `5`

How many times should we retry. If `retry_times` is exceeded, an error will be logged and the event will be discarded.
`single_file_per_thread`

- Value type is boolean
- Default value is `false`

Avoid appending to the same file from multiple threads. This solves some problems with multiple Logstash output threads and locked file leases in webhdfs. If this option is set to true, `%{[@metadata][thread_id]}` needs to be used in the path config setting.
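For example, a path that gives each output thread its own file might look like this (a sketch; the host, user, and directory layout are placeholders, not recommendations):

```
output {
  webhdfs {
    host => "127.0.0.1"
    user => "hue"
    single_file_per_thread => true
    # Each output thread writes to a distinct file, keyed by its thread id,
    # so threads never contend for the same webhdfs file lease.
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{[@metadata][thread_id]}.log"
  }
}
```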
`snappy_bufsize`

- Value type is number
- Default value is `32768`

Set the snappy chunksize. Only necessary for the stream format. Defaults to 32k. Max is 65536. See: http://code.google.com/p/snappy/source/browse/trunk/framing_format.txt
`snappy_format`

- Value type is string
- Value can be any of: `stream`, `file`
- Default value is `"stream"`

Set the snappy format. One of "stream", "file". Set to `stream` to be Hive compatible.
`use_httpfs`

- Value type is boolean
- Default value is `false`

Use httpfs mode if set to true, else webhdfs.
`use_ssl_auth`

- Value type is boolean
- Default value is `false`

Set ssl authentication. Note that the openssl library needs to be available to use this.
`user`

- This is a required setting.
- Value type is string
- There is no default value for this setting.

The username for webhdfs.
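Putting several of the settings above together, an output block tuned for batching and compression might look like the following sketch (the hostname is a placeholder and the values are illustrative, not recommendations):

```
output {
  webhdfs {
    host => "namenode.example.org"   # placeholder: your namenode
    port => 50070
    user => "logstash"
    path => "/user/logstash/dt=%{+YYYY-MM-dd}/logstash-%{+HH}.log"
    compression => "gzip"            # one of none, snappy, gzip
    flush_size => 500                # flush once this many events are buffered...
    idle_flush_time => 1             # ...or after this many seconds of buffering
    retry_times => 5                 # discard the event after this many retries
  }
}
```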
The following configuration options are supported by all output plugins:
`codec`

- Value type is codec
- Default value is `"line"`

The codec used for output data. Output codecs are a convenient method for encoding your data before it leaves the output without needing a separate filter in your Logstash pipeline.
`enable_metric`

- Value type is boolean
- Default value is `true`

Disable or enable metric logging for this specific plugin instance. By default we record all the metrics we can, but you can disable metrics collection for a specific plugin.
`id`

- Value type is string
- There is no default value for this setting.

Add a unique ID to the plugin configuration. If no ID is specified, Logstash will generate one. It is strongly recommended to set this ID in your configuration. This is particularly useful when you have two or more plugins of the same type. For example, if you have 2 webhdfs outputs. Adding a named ID in this case will help in monitoring Logstash when using the monitoring APIs.
```
output {
  webhdfs {
    id => "my_plugin_id"
  }
}
```