A pretty raw post about one of many ways of sending data to Elasticsearch. Possibly the way that requires the least amount of setup (read: effort) while still producing decent results. It’s hardly AWS specific, but it assumes an AWS Elasticsearch cluster and has a few notes regarding that.
It involves an Elasticsearch cluster and a server to send logs from, Nginx in this example. No Logstash, CloudWatch, Kibana, Firehose or any other thing like that. All of these have their place and advantages, but they might not be needed right away. Basically, it's a good setup for a proof of concept or for getting started with Elasticsearch.
Spinning up a cluster is out of scope for this post. AWS makes it pretty easy.
In AWS there are more options, like a Lambda function that gets triggered when a log is uploaded to S3 or CloudWatch, or Firehose loading logs into Elasticsearch. Those won't be covered here.
This post uses Filebeat together with an ingest pipeline on the Elasticsearch side. Steps:
- Define a pipeline on the Elasticsearch cluster. The pipeline translates a log line to JSON, telling Elasticsearch what each field represents, e.g. that the first field is the client IP address.
- Install and configure Filebeat to read the Nginx access logs and send them to Elasticsearch through the pipeline created above.
- Start Filebeat and confirm that it all works as expected.
Interacting with Elasticsearch is done through API calls. One convenient way to do that is Kibana's Console, under "Dev Tools" in the left side menu. To get to Kibana on Amazon Elasticsearch, go to https://cluster.url/_plugin/kibana. The API calls below are presented in Console format.
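If Kibana isn't handy, the same calls can be made with curl, assuming the cluster's access policy allows plain HTTPS requests from your host (cluster.url is a placeholder). For example, the PUT below, with its JSON body saved to a file named pipeline.json, would look roughly like:
curl -XPUT "https://cluster.url/_ingest/pipeline/weblog_combined" \
  -H "Content-Type: application/json" \
  -d @pipeline.json
Note that the triple-quoted strings in the examples are a Console convenience; for curl the body has to be plain JSON with the inner quotes escaped.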
PUT _ingest/pipeline/weblog_combined
{
"description": "Ingest pipeline for Combined Log Format",
"processors": [
{
"grok": {
"field": "message",
"patterns": [
"""%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{DATA:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response:int} (?:-|%{NUMBER:bytes:int}) %{QS:referrer} %{QS:agent}"""
]
}
},
{
"date": {
"field": "timestamp",
"formats": [
"dd/MMM/YYYY:HH:mm:ss Z"
]
}
},
{
"user_agent": {
"field": "agent"
}
}
]
}
This defines three processors:
- grok parses the raw log line into named fields using the pattern above,
- date takes the extracted timestamp field and turns it into the document's @timestamp,
- user_agent breaks the user agent string into browser, OS and device fields.
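Once created, a pipeline can be fetched back to confirm it was stored:
GET _ingest/pipeline/weblog_combined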
This is the pipeline for Nginx error logs:
PUT _ingest/pipeline/weblog_nginx_error
{
"description": "Ingest pipeline Nginx error logs",
"processors": [
{
"grok": {
"field": "message",
"patterns": [
"""^(?<timestamp>%{YEAR}[./]%{MONTHNUM}[./]%{MONTHDAY} %{TIME}) \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER:threadid}\:( \*%{NUMBER:connectionid})? %{DATA:message}(,|$)( client: %{IPORHOST:client})?(, server: %{IPORHOST:server})?(, request: "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion}))")?(, upstream: "%{DATA:upstream}")?(, host: "%{IPORHOST:vhost}")?"""
]
}
},
{
"date": {
"field": "timestamp",
"formats": [
"YYYY/MM/dd HH:mm:ss"
]
}
}
]
}
What is notable here is that the error log format is more dynamic: some fields are optional. For example, there is no "connectionid" if the error isn't about a connection. In the combined log format there would be a "-" as a placeholder, but in Nginx error logs the field is simply missing. Optional fields are marked by wrapping them in ( )? groups in the grok pattern.
To see the list of available ingest processors and plugins: GET _nodes/ingest. The Amazon Elasticsearch service does not allow installing additional plugins. The ingest-geoip plugin is not included as of version 6.3, so it can't be used in a pipeline; this is one case where Logstash comes in, if that functionality is needed.
After a pipeline is created it can be tested by using the simulate API:
POST _ingest/pipeline/weblog_combined/_simulate
{
"docs": [
{
"_source": {
"message": "6.6.6.6 - - [07/Aug/2017:10:11:12 +0000] \"GET /login HTTP/1.1\" 200 1062 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:61.0) Gecko/20100101 Firefox/61.0\""
}
}
]
}
Which should result in:
{
"docs": [
{
"doc": {
"_index": "_index",
"_type": "_type",
"_id": "_id",
"_source": {
"request": "/login",
"agent": """"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:61.0) Gecko/20100101 Firefox/61.0"""",
"auth": "-",
"ident": "-",
"verb": "GET",
"message": """6.6.6.6 - - [07/Aug/2017:10:11:12 +0000] "GET /login HTTP/1.1" 200 1062 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:61.0) Gecko/20100101 Firefox/61.0"""",
"referrer": """"-"""",
"@timestamp": "2017-08-07T10:11:12.000Z",
"response": 200,
"bytes": 1062,
"clientip": "6.6.6.6",
"httpversion": "1.1",
"user_agent": {
"major": "61",
"minor": "0",
"os": "Mac OS X 10.13",
"os_minor": "13",
"os_major": "10",
"name": "Firefox",
"os_name": "Mac OS X",
"device": "Other"
},
"timestamp": "07/Aug/2017:10:11:12 +0000"
},
"_ingest": {
"timestamp": "2017-08-07T12:17:39.029Z"
}
}
}
]
}
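The error-log pipeline can be exercised the same way. Here is an example request with a made-up error line that the pattern above should match (output omitted):
POST _ingest/pipeline/weblog_nginx_error/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "2017/08/07 10:11:12 [error] 1234#0: *5678 open() \"/usr/local/www/favicon.ico\" failed (2: No such file or directory), client: 6.6.6.6, server: example.com, request: \"GET /favicon.ico HTTP/1.1\", host: \"example.com\""
      }
    }
  ]
}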
Next, Filebeat needs to be installed on the server that produces the logs. On FreeBSD the package is called beats.
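Installing it is the usual package manager affair; on FreeBSD something like the following should do (on Linux, Elastic's own repositories provide a filebeat package):
pkg install beats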
filebeat.yml:
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - /var/log/nginx/access.log
    - /var/log/nginx/*_access.log
  exclude_lines:
    - 'GET.*ELB-HealthChecker\/'
  tags:
    - weblogs
    - nginx
  fields:
    index_name: weblog_access
  pipeline: "weblog_combined"

- type: log
  enabled: true
  paths:
    - /var/log/nginx/error.log
    - /var/log/nginx/*_error.log
  exclude_lines:
    - 'newsyslog\[.*\]: logfile turned over'
  tags:
    - weblogs
    - nginx
  fields:
    index_name: weblog_nginx_error
  pipeline: "weblog_nginx_error"

### Index templates
setup.template.enabled: false

### Outputs
output.elasticsearch:
  hosts: ["logses.internal.domain:443"]
  protocol: "https"
  #username: "elastic"
  #password: "changeme"
  ssl.verification_mode: none
  # use index_name defined in the input section
  index: "%{[fields.index_name]:logs}-%{+YYYY.MM.dd}-fbt_%{[beat.version]}"

### Logging
#logging.level: debug
#logging.selectors: ["*"]
logging.to_syslog: true
logging.to_files: false
A Filebeat configuration should have at least an input and an output section. Filebeat starts a harvester for each file configured in the inputs section. The config above will:
- read the Nginx access and error logs, including any per-site *_access.log and *_error.log files,
- skip lines matching exclude_lines (ELB health checks in the access logs, newsyslog rotation notices in the error logs),
- tag the events and attach an index_name field and the name of the ingest pipeline to use,
- send everything to the Elasticsearch cluster over HTTPS, into a daily index named after index_name, the date and the Filebeat version.
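Before starting it for real, Filebeat can check its own configuration and the connection to the Elasticsearch output. Assuming the config lives in /usr/local/etc/filebeat.yml, as on FreeBSD:
filebeat test config -c /usr/local/etc/filebeat.yml
filebeat test output -c /usr/local/etc/filebeat.yml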
A note here. Sending documents that the pipeline can't process to Elasticsearch will result in:
ERR Failed to publish events: temporary bulk send failure
BSD's newsyslog (the log rotation system) might append a message at the end of a log it rotates, saying that it was turned over and why. This is very likely to cause the pipeline to return an error, resulting in the above message in the Filebeat logs, and it will stop further processing. The solution is to either add the "B" flag to the newsyslog config or to add that line to exclude_lines in the Filebeat config. Or both.
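For reference, a newsyslog.conf entry with the "B" flag could look roughly like this (owner, mode, rotation schedule and pid file are just examples; signal 30 is USR1, which makes nginx reopen its log files):
/var/log/nginx/access.log   www:www   644  7  *  @T00  BJC  /var/run/nginx.pid  30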
To start Filebeat with its log output going to the terminal instead of syslog or files, pass it the -e option. On FreeBSD it would be:
filebeat -path.config /usr/local/etc -path.home /var/db/beats/filebeat -e
Set logging.level to debug in the config file for more verbose output.
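Once the output looks sane, run it as a regular service. On FreeBSD, assuming the rc script installed by the beats package is named filebeat:
sysrc filebeat_enable=YES
service filebeat start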
To see the list of indices:
GET _cat/indices
A weblog_access-{date} index should be in the list.
To search for documents in an index:
GET weblog_access-*/_search?pretty=true&q=*:*
By default it only returns 10 documents.
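To return more than that, or to just count matches, the size parameter and the _count API help, e.g.:
GET weblog_access-*/_count
GET weblog_access-*/_search?size=50&q=response:404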
To see the logs in Kibana, an index pattern must be defined. Go to Management -> Index Patterns and add a new pattern, weblog_access-*. Set @timestamp as the time filter field name (this is the field the date processor writes when the pipeline runs). Logs can then be seen, searched and filtered under Discover.