Sidecar Logging in OpenShift with Fluentd and Elasticsearch

We use Elasticsearch for logging in our OpenShift environments. This means that each application needs to work out how to aggregate its logs and ship them to Elasticsearch. This duplication of effort is wasteful and increases maintenance costs, as a change at the Elasticsearch level must then be absorbed by every application. It also amplifies the impact of a security vulnerability such as the recent Log4j vulnerability, because the fix has to be applied in multiple places. One solution would be a single ‘certified’ logging library, but we would need one for each technology stack we use. There is another way, however: the sidecar deployment.

Kubernetes (k8s) allows multiple containers to run in the same pod. These containers can share resources and dependencies, communicate with one another, and coordinate when and how they are terminated. In a ‘sidecar deployment’ one container provides a service for another. In our case the main application writes its logs to a file and the sidecar manages the process of shipping those logs to Elasticsearch.

The solution looks like this: the application container writes its logs to a file on a shared volume, and a Fluentd sidecar in the same pod tails that file and ships the entries to Elasticsearch.

This approach means that we can set up our logging container once and then use it with all of our application containers, regardless of technology stack. In addition, we can inject the sidecar via our Policy Framework when a pod is deployed, so even if a team forgets to enable logging it will be there automatically. Lastly, this provides isolation: if the logging process fails, the main application can continue working.

Logging Container

I’m going to use Fluentd to process the logs and send them to Elasticsearch. To do this I need to add the Elasticsearch plugin to the Fluentd Docker image and add my config file.

# Start from the official Fluentd image and install the Elasticsearch output plugin
FROM fluent/fluentd:edge-debian
USER root
RUN ["gem", "install", "fluent-plugin-elasticsearch", "--no-document", "--version", "5.3.0"]

# Add our Fluentd configuration
COPY fluent.conf /fluentd/etc/

# Add our entrypoint script and make it executable
COPY entrypoint.sh /bin/
RUN chmod +x /bin/entrypoint.sh

# Drop back to the unprivileged fluent user
USER fluent
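The entrypoint.sh script itself isn’t shown here. As a minimal sketch, assuming it does nothing more than launch Fluentd with the bundled configuration, it could look like this:

#!/bin/sh
# Minimal entrypoint sketch: start Fluentd with the configuration copied into the image.
# The real script may do more (for example, installing extra plugins or templating the config).
exec fluentd -c /fluentd/etc/fluent.conf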

Fluentd Configuration

I’m starting with a minimal configuration. We tail the log at ‘/var/log/mcb.log’ and tag the lines with ‘mcb’. Fluentd will then match any log tagged with ‘mcb’ and send it to Elasticsearch.

I ran into some challenges here as I needed a specific index name to be used. Although I set the index_name property, it did not seem to have any effect. The culprit was the logstash_format property: if it is set to ‘true’ then index_name is ignored. Switching it to ‘false’ fixed the problem.

If you do run into problems, set @log_level to debug at the plugin level. In addition, setting with_transporter_log to true will show a detailed trace of how the data is sent to Elasticsearch.

<source>
  @type tail
  path /var/log/mcb.log
  pos_file /var/log/td-agent/mcb.log.pos
  tag mcb
  <parse>
    @type none
  </parse>
</source>

<match mcb>
  @type elasticsearch
  @log_level debug
  host es-dev.mcb.local
  port 9200
  user elastic
  password elastic
  scheme https
  ssl_verify false
  logstash_format false
  index_name pfmdevlog
  with_transporter_log true
</match>

Configuring the sidecar

We start by creating a volume that the containers can use to share data. This can be added to your k8s Deployment as follows:

volumes:
  - name: applog
    emptyDir: {}

An emptyDir volume is first created when a Pod is assigned to a node, and exists as long as that Pod is running on that node.

Next we add the sidecar container to the deployment:

containers:
  - name: app-container
    ...
  - name: mcb-logger
    image: >-
      docker.digitalfactory.mcb.local/mcb-sre/mcb-logger@sha256:cfd8bdd41f4fa5fbe031856444a1d272a61e7f70d6f1e42d435a61c8f856ba6d

Finally we add a volumeMount to both containers:

volumeMounts:
  - name: applog
    mountPath: /var/log/

In summary, we now have two containers running in the pod that share a directory located at ‘/var/log’.
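Putting the pieces together, the relevant part of the Deployment spec looks roughly like this (a sketch; the application container image is a placeholder):

spec:
  template:
    spec:
      containers:
        - name: app-container
          image: my-app-image:latest        # placeholder - your application image
          volumeMounts:
            - name: applog
              mountPath: /var/log/          # the app writes mcb.log here
        - name: mcb-logger
          image: >-
            docker.digitalfactory.mcb.local/mcb-sre/mcb-logger@sha256:cfd8bdd41f4fa5fbe031856444a1d272a61e7f70d6f1e42d435a61c8f856ba6d
          volumeMounts:
            - name: applog
              mountPath: /var/log/          # the sidecar tails mcb.log from here
      volumes:
        - name: applog
          emptyDir: {}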

Use of Environment Variables

To this point I’ve hardcoded the Fluentd settings. This will not work for everyone as we use different indexes for the various applications, so I need to allow each application to inject these settings when the pod is started. Fortunately, Fluentd allows you to execute arbitrary Ruby code when the configuration file is parsed, which gives us the opportunity to substitute values from the local environment. Here’s the updated configuration:

<source>
  @type tail
  path "#{ENV['LOG_FILE'] || '/var/log/mcb.log'}"
  pos_file /var/log/td-agent/mcb.log.pos
  tag mcb
  <parse>
    @type none
  </parse>
</source>

<match mcb>
  @type elasticsearch
  @log_level "#{ENV['LOG_LEVEL'] || 'debug'}"
  host "#{ENV['ES_HOST'] || 'es-dev.mcb.local'}"
  port "#{ENV['ES_PORT'] || 9200}"
  user "#{ENV['ES_USER'] || 'elastic'}"
  password "#{ENV['ES_PASSWORD'] || 'elastic'}"
  scheme https
  ssl_verify false
  logstash_format false
  index_name pfmdevlog
  with_transporter_log true
</match>
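These values can then be supplied on the sidecar container in the Deployment. For example (the Secret name below is illustrative; in practice the credentials would come from a Secret rather than being hardcoded):

- name: mcb-logger
  image: >-
    docker.digitalfactory.mcb.local/mcb-sre/mcb-logger@sha256:cfd8bdd41f4fa5fbe031856444a1d272a61e7f70d6f1e42d435a61c8f856ba6d
  env:
    - name: ES_HOST
      value: es-dev.mcb.local
    - name: ES_PORT
      value: "9200"
    - name: LOG_FILE
      value: /var/log/mcb.log
    - name: ES_USER
      valueFrom:
        secretKeyRef:
          name: es-credentials          # illustrative Secret name
          key: username
    - name: ES_PASSWORD
      valueFrom:
        secretKeyRef:
          name: es-credentials
          key: password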

Testing

Once the pod has started we can see both containers running.
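One way to check this from the command line is with the OpenShift CLI; the READY column should report 2/2 for the pod:

oc get pods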

If we then jump into the application container we can test by manually adding a log entry.
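One way to do this is with oc exec (the pod name below is a placeholder):

oc exec <pod-name> -c app-container -- sh -c 'echo "test log entry" >> /var/log/mcb.log'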

Fluentd, running in the logging container, picks up this log line and sends it to Elasticsearch.

Next steps

At present the Fluentd configuration is minimal, so I’ll need to provide more flexibility. The simplest method would be to use a ConfigMap to replace the Fluentd configuration entirely. There may be other options - let me know in the comments if you have a suggestion. I also need to set up our policy framework to automatically inject the sidecar if it’s not part of the application’s configuration.
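As a rough sketch of the ConfigMap approach mentioned above (names here are illustrative), the configuration could be mounted over the file bundled in the image:

apiVersion: v1
kind: ConfigMap
metadata:
  name: mcb-logger-config            # illustrative name
data:
  fluent.conf: |
    # full Fluentd configuration goes here

The Deployment would then add a volume for the ConfigMap and mount it on the sidecar:

volumes:
  - name: fluentd-config
    configMap:
      name: mcb-logger-config

volumeMounts:
  - name: fluentd-config
    mountPath: /fluentd/etc/fluent.conf
    subPath: fluent.conf             # replace only the config file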

