I recently wrote a log collector that uses Docker’s Container Logs API to subscribe to the output streams of containers. Most people don’t collect logs from containers this way, because they can use the well-known logging drivers that Docker supports, so it’s hard to find programs out there that consume this logs API.
The stream format of this endpoint is not documented anywhere yet, and it is rather cryptic. Docker API clients, such as the one written for Go or the one I wrote myself for .NET, do not help you parse this output.
So here I’m going to explain the output format of the /containers/<id>/logs endpoint and offer a Go library to consume it easily.
Docker Logs Format
To examine the logs format, let’s start a container that produces some output:
$ docker run --name=foo busybox sh -c "echo Roses are red violets are blue; sleep 1; echo I have a message that’s from stderr to you >&2"
Roses are red violets are blue
I have a message that’s from stderr to you
Now, let’s use curl to query the logs, and save them to a file.
$ curl -sN --unix-socket /var/run/docker.sock \
http://./containers/foo/logs\?stdout\=1\&stderr\=1 | tee logs.txt
Roses are red violets are blue
-I have a message that’s from stderr to you
Where did that - come from? If you look at the hex dump of the file, you will see that our messages are prefixed with some non-printable bytes:
$ hexdump -C logs.txt
00000000 01 00 00 00 00 00 00 1f 52 6f 73 65 73 20 61 72 |........Roses ar|
00000010 65 20 72 65 64 20 76 69 6f 6c 65 74 73 20 61 72 |e red violets ar|
00000020 65 20 62 6c 75 65 0a 02 00 00 00 00 00 00 2d 49 |e blue........-I|
00000030 20 68 61 76 65 20 61 20 6d 65 73 73 61 67 65 20 | have a message |
00000040 74 68 61 74 e2 80 99 73 20 66 72 6f 6d 20 73 74 |that...s from st|
00000050 64 65 72 72 20 74 6f 20 79 6f 75 0a |derr to you.|
This prefix is called the message header and it is 8 bytes long. Its layout is as follows:
- first byte indicates the message stream (1=stdout, 2=stderr)
- next 3 bytes unused
- last 4 bytes indicate length of the message in big endian layout
You can find the implementation of this in the github.com/docker/docker/pkg/stdcopy package.
Here is the first message:
01 00 00 00 00 00 00 1f 52 6f 73 65 73 20 61 72 65 ...
│  ─────┬── ─────┬───── R  o  s  e  s     a  r  e  ...
│       │        │
└stdout │        │
        │        └─ 0x0000001f = 31 bytes (including the \n at the end)
      unused
Here is the second message:
02 00 00 00 00 00 00 2d 49 20 68 61 76 65 20 61 ...
│  ─────┬── ─────┬───── I     h  a  v  e     a  ...
│       │        │
└stderr │        │
        │        └─ 0x0000002d = 45 bytes (including the \n at the end)
      unused
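To make the layout concrete, here is a minimal sketch that decodes the two headers from the hex dump above using only the Go standard library. The function name parseHeader is my own choice for illustration; it is not taken from the stdcopy package.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// parseHeader decodes an 8-byte message header.
// It returns the stream ID (1=stdout, 2=stderr) and the message length.
// The name is hypothetical and used only for this illustration.
func parseHeader(h [8]byte) (stream byte, size uint32) {
	stream = h[0]
	// h[1:4] is unused; the last 4 bytes hold a big-endian length.
	size = binary.BigEndian.Uint32(h[4:8])
	return stream, size
}

func main() {
	// The two headers copied from the hex dump.
	headers := [][8]byte{
		{0x01, 0, 0, 0, 0, 0, 0, 0x1f},
		{0x02, 0, 0, 0, 0, 0, 0, 0x2d},
	}
	for _, h := range headers {
		stream, size := parseHeader(h)
		fmt.Printf("stream=%d size=%d\n", stream, size)
	}
}
```

Running this prints stream=1 size=31 and stream=2 size=45, matching the two breakdowns above.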
I must note that since this format is not documented as part of the API, it may change and break your code. That said, somebody gets broken pretty much every time the Docker API is revised, so there is that.
dlog: Go library to parse Docker logs stream
Since there are no tools available out there to parse this log stream, I decided to write a Go package called dlog. Check it out on GitHub.
It wraps an io.Reader (the raw logs stream) and returns another io.Reader that serves the message entries and strips off the header, which contains the stdout/stderr byte.
It is very easy to use. Here is how you print each line of output from the container:
stream := resp.Body          // raw logs stream
rr := dlog.NewReader(stream) // wrap it!
s := bufio.NewScanner(rr)
for s.Scan() {
	// s.Text() returns the current line as a newly allocated string
	fmt.Printf("%q\n", s.Text())
}
if err := s.Err(); err != nil {
	panic(err)
}
Just a couple of performance notes:
- dlog is optimized for memory/speed. Internally, it uses a single buffer for each line and does not make any allocations per message. Therefore the memory usage is constant.
- In the code above, you can use s.Bytes() in the loop to avoid the extra string allocation, which duplicates the line contents in memory.
This design drops the stdout/stderr byte. If you need it, I welcome contributions. We can probably design a for-loop based scanner that returns a struct that has the stream byte and the message.