Logging troubleshooting

Generate test logs with counter, timestamp, and random payload

This simple pod will write logs like the following:

7 2019-05-06 10:47:49 <----- fqVcCIWiSrUtXoNm
8 2019-05-06 10:47:50 <----- ZIJkKNxPX6OUhJ4j
9 2019-05-06 10:47:51 <----- qYdyB6ijFEJI5pXW
10 2019-05-06 10:47:52 <----- kVTFLKfd7Y609svO
11 2019-05-06 10:47:53 <----- 31dZGoWNdQP5qyoI
12 2019-05-06 10:47:54 <----- TFyicTTNRN9hrKPa

To deploy it into your cluster, edit the loggenerator.yaml file (attached):

  • INVERVAL_SECONDS - the interval between log records, in seconds; the default value is 1
  • PAYLOAD_SIZE - the size of the additional random string written after the counter and timestamp; the default value is 16, the maximum is 16340
  • ONETIME_LOG_COUNT - the number of log records written per INVERVAL_SECONDS period
  • MAX_LOG_COUNT - the total number of log records to generate

Then apply using the following command:

kubectl apply -f ./loggenerator.yaml
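
To check that the generator is writing output before searching in Kibana, you can tail the pod logs directly. The command below is only a sketch and assumes the pod defined in loggenerator.yaml is named loggenerator:

kubectl logs -f loggenerator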

Find generated logs in Kibana using the filter “kubernetes.container_name is loggenerator”.

See how logging components were initiated

Kibana: see ~/init-kibana.log in the kublr-logging-kibana* pod in the kublr namespace.

Search Guard index initialization job: kublr-logging-sg-job in the kublr namespace.
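
A quick way to inspect both is shown below; the exec command is only a sketch and assumes a single matching Kibana pod:

# Kibana initialization log
kubectl exec -it -n kublr $(kubectl get pods -n kublr \
           -o=custom-columns=NAME:.metadata.name | grep kublr-logging-kibana) -- sh -c 'cat ~/init-kibana.log'

# Search Guard index initialization job logs
kubectl logs -n kublr job/kublr-logging-sg-job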

Search Guard configuration research

How it works

The configuration of all components is managed by kublr-logging-sg-job, which is executed after the logging chart is deployed. The job runs the root-and-clients.sh script, which can be found in the kublr-logging-sg-config ConfigMap. As a result, kublr-logging-searchguard is created, containing the secrets used by the logging components: Kibana, Logstash, Elasticsearch nodes, and any other client that connects to Elasticsearch (Elasticsearch is protected by Search Guard and works over HTTPS).
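
To see exactly what the job runs, you can dump the ConfigMap and the job definition; this is only a sketch and assumes both objects live in the kublr namespace:

# View the root-and-clients.sh script shipped with the chart
kubectl get configmap -n kublr kublr-logging-sg-config -o yaml

# View the job definition and status
kubectl describe job -n kublr kublr-logging-sg-job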

After that, as soon as a new cluster is created in some Kublr space, the kublr-logging-controller creates a separate role in the Search Guard config for that cluster. If a cluster is deleted and purged, the role is removed, and access to the remaining indices has to be handled manually (see the 'Cluster removed and purged case' section below).

Manual customizations

The administrator can modify the Search Guard roles model, except for the kublr:* roles, because those roles are overridden by the logging-controller.

Get the Search Guard config: the easiest way

The simplest way to view the actual Search Guard configuration is to enter the kublr-logging-controller-xxxx pod and look at the files in /tmp. To fetch the current config, execute /opt/logging-controller/sg_retrieve.sh. The logging-controller checks for new clusters every 3 minutes (centralizedLoggingUpdateInterval). If a new cluster is detected, it fetches the current config (sg_retrieve.sh), adds the new roles and roles mapping, and then uploads the files back to Search Guard (sg_apply.sh). The following commands may be helpful:

$ kubectl exec -it -n kublr $(kubectl get pods -n kublr \
           -o=custom-columns=NAME:.metadata.name | grep logging-controller) -- /bin/bash
bash-4.4$ cd /tmp
bash-4.4$ /opt/logging-controller/sg_retrieve.sh
bash-4.4$ ls
action_groups.yml  config.yml  internal_users.yml  roles.yml  roles_mapping.yml
# modify the necessary files using vi
bash-4.4$ /opt/logging-controller/sg_apply.sh

The logging controller modifies the following files: sg_roles_mapping.yml and sg_roles.yml. New Search Guard roles such as “kublr:kublr-system” or “kublr:default” are created as soon as the controller detects that a cluster has appeared in some Kublr space. The role name contains the space name.

Get the Search Guard config: a more convenient way

The administrator can also edit the Search Guard files locally:

  1. Create an sgadmin directory and navigate to it; place getsecret.sh, retrieve.sh, and apply.sh there.
  2. Download the SG admin tool from https://docs.search-guard.com/latest/search-guard-versions and unpack it.
  3. Get the certificate from the secret using the ./getsecret.sh script.
  4. In a separate window, run “kubectl port-forward service/kublr-logging-elasticsearch-discovery -n kublr 9300”.
  5. Run ./retrieve.sh to download the SG config from Elasticsearch into the ./config directory (see the sketch after this list).
  6. Edit the config files in the ./config directory.
  7. Execute ./apply.sh to upload the changes.
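
The helper scripts themselves are not reproduced here; the following is only a minimal sketch of what retrieve.sh and apply.sh might look like, assuming sgadmin.sh ends up at ./sgadmin/tools/sgadmin.sh and getsecret.sh extracted the admin certificates as root-ca.pem, admin.pem, and admin.key. Adjust the paths and option names for your Search Guard version.

# retrieve.sh - dump the current Search Guard config
# (the real script presumably places the resulting files into ./config)
./sgadmin/tools/sgadmin.sh -r \
    -cacert root-ca.pem -cert admin.pem -key admin.key \
    -h localhost -p 9300 -icl -nhnv

# apply.sh - upload the edited files from ./config back to Elasticsearch
./sgadmin/tools/sgadmin.sh -cd ./config \
    -cacert root-ca.pem -cert admin.pem -key admin.key \
    -h localhost -p 9300 -icl -nhnv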

How Kublr (Control Plane) authorizes users

Kublr passes the x-proxy-roles header to Kibana via sg-auth-proxy (part of the Kibana deployment). Here is an sg-auth-proxy log example:

2019/07/02 18:27:00.809099 proxy.go:108: User '383f7ac8-8e32-4157-99c8-221c28fc1417': name=michael, roles=[uma_authorization user kublr:default]

If you are unsure which attributes are accessible, you can always query the /_searchguard/authinfo endpoint to check. The endpoint lists all attribute names for the currently logged-in user. You can use Kibana Dev Tools and run GET _searchguard/authinfo.
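
The same endpoint can also be queried directly against Elasticsearch; the command below is only a sketch, and the service name, port, and credentials are assumptions:

curl -sk -u <user>:<password> "https://kublr-logging-elasticsearch:9200/_searchguard/authinfo?pretty"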

If the initialization job failed

The Search Guard initialization job can fail in some cases. For example, the administrator did not take into account the resources Elasticsearch needs, and some components could not be scheduled in the cluster for a long time; the initialization job then fails due to a timeout, and the administrator needs to restart the initialization. Another case is when all Elasticsearch data files (the PVC) were removed and the Search Guard index needs to be reinitialized.

To restart the job, the following script can be used:

kubectl get job -n kublr kublr-logging-sg-job -o json | jq 'del(.spec.selector)' | jq 'del(.spec.template.metadata.labels)' | kubectl replace --force -f -

Check the job logs to make sure it has completed. Some connection errors may appear in the middle of the logs; this is not a problem.
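
A minimal sketch of how to check that the job has completed:

kubectl wait --for=condition=complete -n kublr job/kublr-logging-sg-job --timeout=15m
kubectl logs -n kublr job/kublr-logging-sg-job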

Cluster removed and purged case

After a cluster is removed AND PURGED, the corresponding grants are deleted too, and the user is not able to search logs using the shared index pattern, because the old indices are still in the database but the user no longer has access to them.

For example, a user creates two clusters, kublr_default_oltr-5778-1 and kublr_default_oltr-5778-2, in the “default” Kublr space. kublr-logging-controller creates the corresponding grants in the Elasticsearch Search Guard index.

(Screenshots: purged_case-l1-1, purged_case-l1-2, purged_case-l1-3)

Next, the user deletes and purges the kublr_default_oltr-5778-1 cluster. As soon as it is removed from the Kublr database, kublr-logging-controller removes the corresponding sections from the sg_roles.yml and sg_roles_mapping.yml files and then applies this config to the Elasticsearch Search Guard index.

(Screenshots: purged_case-l2-1, purged_case-l2-2, purged_case-l2-3)

So, when the user uses the “kublr_default*” index pattern, it also matches the kublr_default_oltr-5778-1* indices. But the user no longer has access to those indices, only to kublr_default_oltr-5778-2*, so searches with the shared pattern fail.

There are several solutions:

  • Remove the indices belonging to the purged clusters, for example using Kibana Dev Tools and the DELETE command (in this example, remove all kublr_default_oltr-5778-1* indices); see the sketch after this list.
  • Create a narrower index pattern for the remaining clusters in this space and do not use the general index pattern (in this example, kublr_default*) until those indices are removed by the curator (after 7 days by default).
  • Create a custom Search Guard roles config manually to give the user access to the old indices. Make sure the role names do not include the “kublr:” prefix, as those sections are managed by Kublr.
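
For the first option, the request can be issued from Kibana Dev Tools as DELETE kublr_default_oltr-5778-1*, or sent directly to Elasticsearch. The curl command below is only a sketch; the service name, port, and credentials are assumptions:

curl -sk -u <admin_user>:<password> -X DELETE "https://kublr-logging-elasticsearch:9200/kublr_default_oltr-5778-1*"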

Logstash Dead Letter Queue

If Elasticsearch rejects some log entries, this is reflected in the monitoring and the corresponding alert is fired.

Possible reasons for rejecting log entries are:

  • The limit of the number of shards has been reached.
  • The record is too large.
  • The format of the field does not match the format of the field with the same name in the index where the record is inserted.
  • Other reasons.

The lost records can be found in the Logstash container at /usr/share/logstash/custom_dead_letter_queue/main/. Note that the oldest log files are deleted when a threshold of 128 MB or 1000 files is exceeded.
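
To inspect the queue, you can list the files inside the Logstash pod; the command below is only a sketch and assumes the Logstash pod name contains "logstash":

kubectl exec -it -n kublr $(kubectl get pods -n kublr \
           -o=custom-columns=NAME:.metadata.name | grep logstash | head -1) \
           -- ls -lh /usr/share/logstash/custom_dead_letter_queue/main/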

In Grafana, the Logstash DeadLetterQueue (lost messages) metric shows the number of log entries sent to the dead letter queue.