A Mesos logging module for journald

A short introduction to Mesos

We're going to send logs from Mesos containers to systemd-journald by writing a container logging module in C++. You can skip this section if you already know what Mesos is.

Apache Mesos describes itself as a distributed systems kernel, which is appropriate in a lot of ways. To some that may sound intimidating and complicated, which I think is unfortunate, because it can be explained very simply without losing too much.

Mesos offers resources in a cluster. Resource types include CPU cores, memory, disk space, network ports, GPUs, etc. Let's say I'm a developer and I want to run my application on the cluster. I get an offer from Mesos of 4 CPU cores, 8 GB of memory, 40 GB of disk space and a range of ports from 10000 to 20000.

I don't need all of it, so I reply that I accept 1 CPU core, 2 GB of memory, 200 MB of disk space and one port, port 10000, and that I want to fetch https://wjoel.com/foo-standalone.jar (a self-contained "fat JAR" with no external dependencies) and run it with the command java -jar foo-standalone.jar. Mesos will create a container using cgroups (if running on Linux) to enforce limits based on the resource constraints I accepted. The container is also known as a sandbox, and we get to play in it as long as we stay within the resource limits.
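To make the exchange concrete, here is a minimal sketch in plain C++ of the check behind "I don't need all of it". The Resources struct, the fits function and the two constants are hypothetical names made up for illustration. They are not part of the Mesos API, which describes offers with protobuf messages.

```cpp
#include <cstdint>

// Hypothetical, simplified view of a set of resources. Real Mesos offers are
// protobuf messages; this struct exists only for illustration.
struct Resources {
  double cpus;
  std::uint64_t memMb;
  std::uint64_t diskMb;
  std::uint32_t portBegin;  // first port in the range, inclusive
  std::uint32_t portEnd;    // last port in the range, inclusive
};

// True if everything in `wanted` is covered by `offer`.
bool fits(const Resources& offer, const Resources& wanted) {
  return wanted.cpus <= offer.cpus &&
         wanted.memMb <= offer.memMb &&
         wanted.diskMb <= offer.diskMb &&
         wanted.portBegin <= wanted.portEnd &&
         wanted.portBegin >= offer.portBegin &&
         wanted.portEnd <= offer.portEnd;
}

// The offer from the text: 4 cores, 8 GB memory, 40 GB disk, ports 10000-20000.
const Resources kOffer{4.0, 8 * 1024, 40 * 1024, 10000, 20000};

// What I accept: 1 core, 2 GB memory, 200 MB disk, port 10000 only.
const Resources kAccepted{1.0, 2 * 1024, 200, 10000, 10000};
```

Calling fits(kOffer, kAccepted) returns true, since every accepted amount is within the offer. Whatever I leave unused stays available for Mesos to offer to someone else.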

Developers typically don't want to bother with resource offers from Mesos. Programs that respond to resource offers from Mesos are called frameworks. One such framework is Mesosphere's Marathon, and its application specifications are essentially lists of resources and the command to run. Marathon can also ensure that applications are restarted if they die for any reason, do rolling updates, and many other useful things that developers like to have.

You may have noticed that I told Mesos to run my JAR file using Java, but didn't specify that I wanted Java to be downloaded. Hence, my application will only run if Mesos decides to run it somewhere where Java is already installed.

I could create a Docker image which includes foo-standalone.jar, Java, and any other dependencies I might need. Mesos can run Docker containers as well, either on its own or by using Docker for isolation. Alternatively, I could have included an additional URL in my reply, containing the location of an archive with a full Java installation, and used that instead, all from within the container.

Container logging in Mesos

The output from my program will end up in the directory of the sandbox Mesos created, in the files stdout and stderr. That's fine in a lot of cases, since the Mesos UI has an interface to view the contents of those files and even updates the view when the files are changed.

Some people prefer to have all their logs go through systemd-journald, or journald for short, perhaps because they have already solved the problems of log forwarding and archiving for journald. We can get this today, instead of having to wait for a release that has it, because there is support for many types of modules in Mesos. There is a module type for container loggers, so let's make one for journald.

The default behavior of logging to stdout and stderr is implemented by the sandbox logger, which can be found in src/slave/container_logger/sandbox.cpp. It's alright if you don't know (or dislike) C++, because the important lines are simple enough.

  process::Future<ContainerLogger::SubprocessInfo> prepare(
      const ExecutorInfo& executorInfo,
      const std::string& sandboxDirectory)
  {
    ContainerLogger::SubprocessInfo info;

    info.out = SubprocessInfo::IO::PATH(path::join(sandboxDirectory, "stdout"));
    info.err = SubprocessInfo::IO::PATH(path::join(sandboxDirectory, "stderr"));

    return info;
  }

In other words, direct everything written to standard output into the file stdout in the sandbox directory, and everything written to standard error into the file stderr in the same directory.

A Mesos module for container logging to systemd-journald

At first I thought I'd have to intercept everything written to info.out and info.err, split it on newlines, and then send each line to journald using sd_journal_print. I was sufficiently disgusted by the idea of going back to the ancient C world of reading bytes from file descriptors to go looking for prior art. There is already a command for sending lines of text to journald called systemd-cat, and it is straightforward: it gets a file descriptor connected to the journal from sd_journal_stream_fd and lets the command write to that.

Using sd_journal_stream_fd doesn't quite work with the example above, since it is using (file) paths, but it's possible to assign a file descriptor to info.out and info.err.

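  // See the GitHub repository for how to set this to the Mesos task identifier.
  std::string identifier;

  // Each call returns a file descriptor connected to journald, or a negative
  // errno-style value on failure. The last argument (1) tells journald to
  // interpret kernel-style priority prefixes such as <3> in the stream.
  int journal_out = sd_journal_stream_fd(identifier.c_str(), LOG_INFO, 1);
  int journal_err = sd_journal_stream_fd(identifier.c_str(), LOG_ERR, 1);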

  info.out = SubprocessInfo::IO::FD(journal_out);
  info.err = SubprocessInfo::IO::FD(journal_err);
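For contrast, here is roughly what the approach I wanted to avoid would involve: reading bytes from a file descriptor, splitting them on newlines, and handing each line to something else. This is a sketch only. forwardLines is a hypothetical helper, and the sink is a plain callback so that the example doesn't depend on libsystemd. In a real logger the sink might wrap sd_journal_print.

```cpp
#include <unistd.h>

#include <functional>
#include <string>

// Reads from `fd` until end of file (or a read error), calling `sink` once
// per complete line, without the trailing newline. A final line that lacks a
// newline is passed on as well. Interrupted reads (EINTR) are not retried,
// to keep the sketch short.
void forwardLines(
    int fd,
    const std::function<void(const std::string&)>& sink)
{
  std::string pending;
  char buffer[4096];

  ssize_t n;
  while ((n = ::read(fd, buffer, sizeof(buffer))) > 0) {
    pending.append(buffer, static_cast<size_t>(n));

    // Emit every complete line currently in the buffer.
    std::string::size_type newline;
    while ((newline = pending.find('\n')) != std::string::npos) {
      sink(pending.substr(0, newline));
      pending.erase(0, newline + 1);
    }
  }

  if (!pending.empty()) {
    sink(pending);
  }
}
```

On top of this, something has to own the reading end of a pipe and keep running for as long as the container does. Handing journald's file descriptors straight to the container avoids all of that.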

The Mesos documentation on modules is good, but some extra steps are needed for the compilation to succeed. First compile Mesos, but configure it with ../configure --enable-install-module-dependencies. You might be forgiven for missing that step, as I did, since it's a new flag and not (yet?) included in the documentation. I also had to install libz-dev on a clean Debian 8 installation, but once Mesos has been compiled and installed we can compile the module.

g++ -I/usr/local/lib/mesos/3rdparty/include -c -fpic \
 -o journald_container_logger.o journald_container_logger.cpp
gcc -shared -o mesos_journald_container_logger.so journald_container_logger.o -lsystemd

The GitHub repository for this code uses CMake, but the above is the essence of what it does and it's where I started before going down the CMake route.

A quick demo

First we start mesos-master as usual. I use an internal IP address here. Then we start mesos-agent with two additional flags, perhaps using sudo. We'll use Marathon to run a simple shell script.

$ mesos-master --ip=10.0.1.3
$ mesos-agent --master=10.0.1.3:5050 \
  --modules='{"libraries":[{"file":"/path/to/mesos_journald_container_logger.so",
              "modules":[{"name":"com_wjoel_JournaldLogger"}]}]}' \
  --container_logger=com_wjoel_JournaldLogger
$ /path/to/marathon-1.1.1/bin/start --master 10.0.1.3:5050

Once everything is up and running we can create an application in Marathon which echoes "hello world" every 5 seconds using a simple shell loop, with a journalctl -f running in the background.

(Screenshot: Marathon app creation)

In the terminal running journalctl -f we can see the output from our application (along with a failed ssh login from Colombia).

(Screenshot: journald log)

What we win and what we lose

The output from our container is now forwarded to journald. That means the output is no longer written to files in the sandbox, so we can't view it in the Mesos web interface.

This isn't an issue if log forwarding has been set up on all machines where a container might run, but wouldn't it be nice if we could have both? And while our change to make this happen was most delightfully simple, isn't it just a worse systemd-cat?

Of course it would, and sure it is. We'll take care of all that and more next time. Until then, you can find the mesos-journald-container-logger on GitHub.