Mesos container log forwarding with Filebeat

The next level

In the previous post we managed to forward container logs to journald, but we could have done the same thing and much more if we could have forwarded the logs to systemd-cat instead. We wouldn't have had to write any code at all if that was possible.

So we're going to make that possible by backporting an upcoming "external" container logger, being developed for the next release of Mesos, as a module compiled for the currently stable release. That's version 1.0.1 at the time of writing.

Then we'll use this module to set up log forwarding from Mesos containers to Graylog using Filebeat 1.3 and include additional fields from the Mesos executor's environment.

Mesos and external container loggers

Apache Mesos has had support for a LogrotateContainerLogger since version 0.27, but if you look at what that container logger you'll find that all it does is receive input from standard input and standard error, counts the number of bytes received, and writes to the stdout and stdout files in the sandbox directory.

It calls logrotate once a configurable number of bytes has been received. This logic is implemented by mesos-logrotate-logger which is a stand-alone program. Why do we need a separate module to feed logs to this specific external program?

MESOS-6003 was created to add a logging module for logging to any external program. Thanks to the hard work of Will Rouesnel, shepherded by Joseph Wu, this is almost completed.

What if we can't wait for the next release of Mesos? Maybe we're stuck on 1.0.1 for a while for reasons beyond our control, but we'd love to use Filebeat for log forwarding. Unfortunately we can't, because the recommended setup doesn't work.

Even if it did we'd still be missing out on important context, like what the Marathon app ID and the Mesos task ID are. Maybe there's a specific version of the Marathon app which is having issues. It would be nice if we could find all log messages from that application and version with a simple search.

We'll make that possible by backporting the external container logging module to Mesos 1.0.1 and setting it up to use a small wrapper around Filebeat to create configuration which includes the task's environment variables and forwards the log messages to Graylog. It will also work if you are using Logstash.

Creating and using the backported module

By reusing much of what we did in the last post we can create a module for Mesos 1.0.1 using the code available in the review request for the external logger module. It doesn't compile using Mesos 1.0.1 as is, but the necessary changes are trivial.

The module has been compiled for Ubuntu 14.04 (Trusty Tahr) and Debian 8 and can be downloaded from the mesos-external-container-logger repository on GitHub, but using the provided scripts you can compile it for your system.

It is enabled by starting mesos-slave with a modules config similar to this one and using --container-logger=org_apache_mesos_ExternalContainerLogger.

{
  "libraries": [{
    "file": "/usr/lib/libexternal_container_logger-0.1.0.so",
    "modules": [{
      "name": "org_apache_mesos_ExternalContainerLogger",
      "parameters": [{"value": "/usr/bin/filebeat-container-logger.sh",
                      "key": "external_logger_cmd"}]
    }]
  }]
}

The ExternalContainerLogger module creates an environment variable called MESOS_LOG_STREAM which will have the value "STDOUT" or "STDERR" depending on where the log is coming from. To emulate the default sandbox container logger you could use this.

#!/bin/sh
cat > $(echo "$MESOS_LOG_STREAM" | tr '[:upper:]' '[:lower:]')

We can also emulate the LogrotateContainerLogger.

#!/bin/sh
/usr/libexec/mesos/mesos-logrotate-logger \
  --log_filename=$(echo "$MESOS_LOG_STREAM" | tr '[:upper:]' '[:lower:]')

Filebeat as the external logger process

We can use Filebeat to forward our logs to a central log server like Graylog if we're not happy with local files. We'll create a small bootstrap script to create the Filebeat configuration file and start it.

We extract all environment variables for the task from MESOS_EXECUTORINFO_JSON, which is set by the module, using jq to parse the JSON and a bit of awk to convert it to the proper format.

The script is simple enough to include in full here but you can also find it on GitHub as examples/filebeat-container-logger.sh

#!/bin/bash

config_path="$MESOS_LOG_SANDBOX_DIRECTORY/filebeat-$MESOS_LOG_STREAM.yml"

mesos_fields=$(echo "$MESOS_EXECUTORINFO_JSON" | \
                jq -r ".command.environment.variables
                         |map(\"\(.name):\(.value|tostring)\")|.[]" | \
                # Skip empty variables, use mesos_ prefix, convert to lowercase
                awk -F: 'length($2) > 0 {
                           $1=tolower($1);
                           if (!match($1, "^mesos_.*")) {
                             $1="mesos_" $1;
                           }
                           printf("%s: \"%s\"\n        ", $1, $2);
                         }')

cat <<EOF > $config_path
filebeat:
  prospectors:
    -
      paths:
        - "-"
      input_type: stdin
      close_eof: true
      fields:
        mesos_log_stream: $MESOS_LOG_STREAM
        mesos_log_sandbox_directory: $MESOS_LOG_SANDBOX_DIRECTORY
        $mesos_fields

output:
  logstash:
    hosts: ["graylog.aws.wjoel.com:5044"]
EOF

/usr/bin/filebeat -c $config_path

Change the hosts variable to your Graylog or Logstash server. More output options are available if you need to send the logs to something else.

All environment variables in the ExecutorInfo JSON description, prefixed with mesos_ and in lowercase, are included as extra fields. This makes it easy to find all log messages for any executor, Marathon application, or Mesos agent.

This is what we get in Graylog when we run three instances of the "hello world" Marathon app from the previous post with all of the above in place.

Logs from Marathon in Graylog

Making the previous post irrelevant with systemd-cat

At the end of the previous post we noted that we had implemented a worse systemd-cat. We can use the external logger module to provide logging to journald with systemd-cat. Note that this example requires that jq is installed on the Mesos agent, just like the example above.

#!/bin/bash

log_level="info"
if [ "$MESOS_LOG_STREAM" = "STDERR" ]; then
  log_level="err"
fi

task_id=$(echo "$MESOS_EXECUTORINFO_JSON" | \
  jq -r '.command.environment.variables[]|select(.name=="MESOS_TASK_ID").value')

if [ "$task_id" ]; then
    task_identifier="-t $task_id"
fi

systemd-cat $task_identifier -p $log_level

Finally, we can combine Filebeat with mesos-logrotate-logger by changing the last line of filebeat-container-logger.sh to the following.

log_stream=$(echo $MESOS_LOG_STREAM | tr '[:upper:]' '[:lower:]')
tee >(/usr/libexec/mesos/mesos-logrotate-logger \
        --log_filename="$MESOS_LOG_SANDBOX_DIRECTORY/$log_stream" \
        --logrotate_options="rotate 5" \
        --max_size="10MB") | \
  /usr/bin/filebeat -c $config_path

This way the logs are available both in Graylog and the sandbox, so we can view logs in the Mesos web interface as usual. In addition, log rotation will ensure that don't out of disk space due to unbounded container logs.

Caveats and possible improvements

The cost for this setup is two Filebeat processes per container. That's acceptable for most setups since it was created to be lightweight, but in some cases it could be worth exploring other options.

For example, you could run just one Filebeat process on each Mesos agent and create config for each container in an external logger script which is then loaded by Filebeat. You would then have to find a way to clean up configs after containers are terminated, but this should be possible (although ugly) with a cron job.

Filebeat unfortunately does not terminate when standard input is closed, even with close_eof. The newly added -once flag might help, but it's so new that you would currently have to compile Filebeat from source to enable it.

This should not be much of an issue if you have long-running services but otherwise you should find a way to solve this. Again, perhaps by using a cron to kill off old Filebeat processes.

Summary

By backporting the upcoming external container logger module to Mesos 1.0.1 we can centralize container logs by using Filebeat and Graylog today.

The external container logger is flexible enough to replace all existing container logger modules and the one for journald which we created in the previous post. It can even be used to combine the functionality of several other container loggers.

By centralizing the container logs with Filebeat we can include environment variables provided by Mesos, Marathon and other frameworks to provide more context.

The future of logging is bright for Mesos, and we can have much of it today thanks to modules. You can get this module on GitHub in the mesos-external-container-logger repository and follow the development by Will Rouesnel in MESOS-6003.

It's not perfect yet, but I hope you find this useful.