For other versions, see the Versioned plugin docs.

For plugins not bundled by default, it is easy to install by running `bin/logstash-plugin install logstash-output-google_bigquery`. See Working with plugins for more details.
For questions about the plugin, open a topic in the Discuss forums. For bugs or feature requests, open an issue in GitHub. For the list of Elastic supported plugins, please consult the Elastic Support Matrix.
This Logstash plugin uploads events to Google BigQuery using the streaming API, so data can become available to query nearly immediately.

You can configure it to flush periodically, after N events, or after a certain amount of data is ingested.

You must enable BigQuery on your Google Cloud account and create a dataset to hold the tables this plugin generates.

You must also grant the service account this plugin uses access to the dataset.

You can use Logstash conditionals and multiple configuration blocks to upload events with different structures.
This is an example of a Logstash config:

```
output {
  google_bigquery {
    project_id => "folkloric-guru-278"                     # required
    dataset => "logs"                                      # required
    csv_schema => "path:STRING,status:INTEGER,score:FLOAT" # required
    json_key_file => "/path/to/key.json"                   # optional
    error_directory => "/tmp/bigquery-errors"              # required
    date_pattern => "%Y-%m-%dT%H:00"                       # optional
    flush_interval_secs => 30                              # optional
  }
}
```
Specify either a `csv_schema` or a `json_schema`.

If the key is not used, then the plugin tries to find Application Default Credentials.
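As mentioned above, Logstash conditionals with multiple `google_bigquery` blocks can route events with different structures to different tables. A minimal sketch, assuming a hypothetical `type` field and made-up project, schemas, and paths:

```
output {
  if [type] == "access" {
    google_bigquery {
      project_id => "my-project"                  # hypothetical
      dataset => "logs"
      table_prefix => "access"
      csv_schema => "path:STRING,status:INTEGER"
      error_directory => "/tmp/bigquery-errors-access"
    }
  } else {
    google_bigquery {
      project_id => "my-project"                  # hypothetical
      dataset => "logs"
      table_prefix => "app"
      csv_schema => "message:STRING,severity:STRING"
      error_directory => "/tmp/bigquery-errors-app"
    }
  }
}
```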
The plugin buffers events and uploads them in batches. An upload is triggered when `batch_size`, `batch_size_bytes`, or `flush_interval_secs` is met, whichever comes first. If you notice a delay in your processing or low throughput, try adjusting those settings.

This plugin supports the following configuration options plus the Common Options described later.
Setting | Input type | Required
---|---|---
`batch_size` | number | No
`batch_size_bytes` | number | No
`csv_schema` | string | No
`dataset` | string | Yes
`date_pattern` | string | No
`deleter_interval_secs` | number | Deprecated
`error_directory` | string | Yes
`flush_interval_secs` | number | No
`ignore_unknown_values` | boolean | No
`json_key_file` | a valid filesystem path | No
`json_schema` | hash | No
`key_password` | string | Deprecated
`key_path` | string | Obsolete
`project_id` | string | Yes
`service_account` | string | Deprecated
`skip_invalid_rows` | boolean | No
`table_prefix` | string | No
`table_separator` | string | No
`temp_directory` | string | Deprecated
`temp_file_prefix` | string | Deprecated
`uploader_interval_secs` | number | Deprecated
Also see Common Options for a list of options supported by all output plugins.
`batch_size`

Added in 4.0.0.

Default value is `128`.

The maximum number of messages to upload at a single time. This number must be less than 10,000. Batching can increase performance and throughput up to a point, but at the cost of per-request latency. Too few rows per request, and the overhead of each request can make ingestion inefficient. Too many rows per request, and throughput may drop. BigQuery recommends using about 500 rows per request, but experimentation with representative data (schema and data sizes) will help you determine the ideal batch size.
`batch_size_bytes`

Added in 4.0.0.

Default value is `1_000_000`.

An approximate number of bytes to upload as part of a batch. This number should be less than 10 MB, or inserts may fail.
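The batching thresholds interact: an upload is triggered by whichever of `batch_size`, `batch_size_bytes`, or `flush_interval_secs` is reached first. A sketch of tuning them together; the values below are illustrative, not recommendations:

```
output {
  google_bigquery {
    project_id => "my-project"                  # hypothetical
    dataset => "logs"
    csv_schema => "path:STRING,status:INTEGER"
    error_directory => "/tmp/bigquery-errors"
    batch_size => 500             # upload after 500 events,
    batch_size_bytes => 5_000_000 # or after ~5 MB of data,
    flush_interval_secs => 10     # or after 10 seconds, whichever comes first
  }
}
```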
`csv_schema`

Default value is `nil`.

Schema for log data. It must follow the format `name1:type1(,name2:type2)*`. For example, `path:STRING,status:INTEGER,score:FLOAT`.
`dataset`

The BigQuery dataset that tables for the events will be added to.
"%Y-%m-%dT%H:00"
Time pattern for BigQuery table, defaults to hourly tables.Must Time.strftime patterns: www.ruby-doc.org/core-2.0/Time.html#method-i-strftime
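For example, a coarser pattern yields daily instead of hourly tables. A sketch using hypothetical project and schema values:

```
output {
  google_bigquery {
    project_id => "my-project"                  # hypothetical
    dataset => "logs"
    csv_schema => "path:STRING,status:INTEGER"
    error_directory => "/tmp/bigquery-errors"
    date_pattern => "%Y-%m-%d"    # one table per day instead of per hour
  }
}
```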
`deleter_interval_secs`

Deprecated. Events are uploaded in real time without being stored to disk.
`error_directory`

Added in 4.0.0.

Default value is `"/tmp/bigquery"`.

The location to store events that could not be uploaded due to errors. By default, if any message in an insert is invalid, all will fail. You can use `skip_invalid_rows` to allow partial inserts.

Consider using an additional Logstash input to pipe the contents of these files to an alert platform so you can manually fix the events. Or use GCS FUSE to transparently upload them to a GCS bucket.

File names follow the pattern `[table name]-[UNIX timestamp].log`.
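One way to act on those error files, as suggested above, is a separate Logstash pipeline that reads the error directory. A minimal sketch that simply prints the failed events (the path matches the default `error_directory`; the stdout output is a placeholder for your alerting destination):

```
input {
  file {
    path => "/tmp/bigquery/*.log"
    mode => "read"
  }
}
output {
  stdout { codec => rubydebug }
}
```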
`flush_interval_secs`

Default value is `5`.

Uploads all data this often, even if other upload criteria aren't met.
`ignore_unknown_values`

Default value is `false`.

Indicates whether BigQuery should ignore values that are not represented in the table schema. If true, the extra values are discarded. If false, BigQuery rejects records with extra fields and the job fails.

You may want to add a Logstash filter like the following to remove the common fields Logstash adds:

```
mutate {
  remove_field => ["@version", "@timestamp", "path", "host", "type", "message"]
}
```
`json_key_file`

Replaces `key_password`, `key_path`, and `service_account`.

Default value is `nil`.

If Logstash is running within Google Compute Engine, the plugin can use GCE's Application Default Credentials. Outside of GCE, you will need to specify a Service Account JSON key file.
`json_schema`

Default value is `nil`.

Schema for log data as a hash. It can include nested records, descriptions, and modes.

Example:

```
json_schema => {
  fields => [{
    name => "endpoint"
    type => "STRING"
    description => "Request route"
  }, {
    name => "status"
    type => "INTEGER"
    mode => "NULLABLE"
  }, {
    name => "params"
    type => "RECORD"
    mode => "REPEATED"
    fields => [{
      name => "key"
      type => "STRING"
    }, {
      name => "value"
      type => "STRING"
    }]
  }]
}
```
`key_password`

Deprecated. Replaced by `json_key_file` or by using ADC. See `json_key_file`.
`key_path`

Obsolete: The PKCS12 key file format is no longer supported. Please use one of the following mechanisms:

- Application Default Credentials (ADC).
- A Service Account JSON key file, which you can generate in the Cloud Console for the same service account as your existing `.P12` file, or with the following command: `gcloud iam service-accounts keys create key.json --iam-account my-sa-123@my-project-123.iam.gserviceaccount.com`
`project_id`

Google Cloud Project ID (number, not Project Name!).
`service_account`

Deprecated. Replaced by `json_key_file` or by using ADC. See `json_key_file`.
`skip_invalid_rows`

Added in 4.1.0.

Default value is `false`.

Insert all valid rows of a request, even if invalid rows exist. The default value is false, which causes the entire request to fail if any invalid rows exist.
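Combined with `error_directory`, enabling this option lets the valid rows of a request be inserted while the invalid ones are captured for later inspection. A sketch with hypothetical values:

```
output {
  google_bigquery {
    project_id => "my-project"                  # hypothetical
    dataset => "logs"
    csv_schema => "path:STRING,status:INTEGER"
    error_directory => "/tmp/bigquery-errors"
    skip_invalid_rows => true     # insert valid rows even if some rows are invalid
  }
}
```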
"logstash"
BigQuery table ID prefix to be used when creating new tables for log data.Table name will be <table_prefix><table_separator><date>
"_"
BigQuery table separator to be added between the table_prefix and thedate suffix.
`temp_directory`

Deprecated. Events are uploaded in real time without being stored to disk.
`temp_file_prefix`

Deprecated. Events are uploaded in real time without being stored to disk.
`uploader_interval_secs`

Deprecated. This field is no longer used.

Default value is `60`.

Uploader interval when uploading new files to BigQuery. Adjust the time based on your date pattern (for example, for hourly files, this interval can be around one hour).
The following configuration options are supported by all output plugins:
"plain"
The codec used for output data. Output codecs are a convenient method for encoding your data before it leaves the output without needing a separate filter in your Logstash pipeline.
`enable_metric`

Default value is `true`.

Disable or enable metric logging for this specific plugin instance. By default we record all the metrics we can, but you can disable metrics collection for a specific plugin.
`id`

Add a unique ID to the plugin configuration. If no ID is specified, Logstash will generate one. It is strongly recommended to set this ID in your configuration. This is particularly useful when you have two or more plugins of the same type, for example, if you have two google_bigquery outputs. Adding a named ID in this case will help in monitoring Logstash when using the monitoring APIs.
```
output {
  google_bigquery {
    id => "my_plugin_id"
  }
}
```