Note: The following workaround persists through TKGI tile and cluster upgrades. However, during normal operations, if the
observability-manager pod is recreated or restarted for any reason (for example, a worker VM reboot), the
fluent-bit daemonset is recreated and reverts to its default spec. A permanent fix will be included in TKGI v1.12. Upgrading to v1.12 is recommended once it is available.
The workaround entails creating a container image that contains the custom CA,
the fluent-bit patch, and a shell script that deploys the patch. It also requires a registry, such as a local Harbor registry, to host the custom container image. The container image runs in the
pks-system namespace of the cluster, which ensures that the patch persists through upgrades.
1. On a host where Docker is running, create a directory for this workaround and 'cd' into it. All files created in the following steps are stored in this directory.
$ mkdir fluent-bit-workaround; cd fluent-bit-workaround
2. Save the CA PEM data into a file named cert.pem. The PEM data below is only an example. Replace it with the PEM data of your own CA.
$ cat > cert.pem <<EOF
-----BEGIN CERTIFICATE-----
MIIGPjCCBCagAwIBAgIJAKMduaqpCYfYMA0GCSqGSIb3DQEBCwUAMIGrMQswCQYD
VQQGEwJVUzETMBEGA1UECAwKQ2FsaWZvcm5pYTESMBAGA1UEBwwJUGFsbyBBbHRv
MQ8wDQYDVQQKDAZWTXdhcmUxFjAUBgNVBAsMDU1BUEJVIFN1cHBvcnQxHTAbBgNV
BAMMFFN1cHBvcnQgTGFicyBSb290IENBMSswKQYJKoZIhvcNAQkBFhxwdnRsLXN1
TX4MyEfPygH8R/eh#################################5wUzJp0a+6SIG90
1WzZFzzhnb9891F+9BNSKJy7R/8uIYhFVsP565+IesUSpW+nAyBfteLJQKFbWJyx
T/EjLE36+yNcvXpox9DjA0D1EHCCYfR8aEwB2EbiOgDSF1WjP0VNMJt+Bg016gCe
mfKpuEZdVaJWJFl/9AmQn0ChLDoQ6GAtpziFuKtHdXoqgc6WJZWuUiwMWVaU08I0
34tgDtJ/PGwEsybbnX7dkIEAhT/oUB55stiQ8SB8/FwgkAUe8JGnxJITL82CO26/
J3S9Zf4F50HbrhncESiTXyXW
-----END CERTIFICATE-----
EOF
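Before building the image, it can help to confirm that cert.pem was saved intact. The following is a minimal sketch, not part of the original workaround: check_pem_markers is a hypothetical helper that only verifies that the file contains at least one certificate and that the BEGIN and END markers are balanced. It does not validate the certificate itself.

```shell
# Hypothetical helper (an assumption, not part of the workaround):
# succeeds if the file has at least one BEGIN CERTIFICATE marker and
# the same number of END CERTIFICATE markers.
check_pem_markers() {
    local file="$1"
    local begins ends
    [ -f "$file" ] || return 1
    # '--' stops option parsing, because the pattern starts with dashes
    begins=$(grep -c -- '-----BEGIN CERTIFICATE-----' "$file")
    ends=$(grep -c -- '-----END CERTIFICATE-----' "$file")
    [ "$begins" -ge 1 ] && [ "$begins" -eq "$ends" ]
}
```

For example, 'check_pem_markers cert.pem && echo "cert.pem looks complete"'. If openssl is installed on the host, 'openssl x509 -in cert.pem -noout -subject -enddate' gives a stricter check by actually parsing the certificate.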
3. Create a file named Dockerfile with the following contents:
ARG BASE_IMAGE=ubuntu:bionic
FROM $BASE_IMAGE as builder

# Install the tools needed to download kubectl
RUN apt-get update && \
    apt-get install --no-install-recommends -y wget zip unzip ca-certificates && \
    update-ca-certificates && \
    apt-get clean

# Install kubectl
ARG KUBECTL_SOURCE=https://storage.googleapis.com/kubernetes-release/release/v1.15.11/bin/linux/amd64/kubectl
RUN wget $KUBECTL_SOURCE && mv kubectl /usr/local/bin/kubectl && \
    chmod +x /usr/local/bin/kubectl

# Copy the CA, the patch, and the script that applies the patch
COPY cert.pem /cert.pem
COPY fluent-bit-patch.json /fluent-bit-patch.json
COPY test.sh /test.sh

# test.sh reads cert.pem and fluent-bit-patch.json from the working directory (/)
CMD ["/bin/bash", "/test.sh"]
4. Create a file named fluent-bit-patch.json with the following contents:
{
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "name": "fluent-bit",
            "volumeMounts": [
              {
                "mountPath": "/etc/ssl/certs/cert.pem",
                "subPath": "cert.pem",
                "name": "ca-pemstore"
              }
            ]
          }
        ],
        "volumes": [
          {
            "name": "ca-pemstore",
            "configMap": {
              "name": "ca-pemstore",
              "defaultMode": 420
            }
          }
        ]
      }
    }
  }
}
5. Create a file named test.sh with the following contents:
#!/bin/bash
set -x

# Wait for the observability-manager deployment to finish rolling out (up to 60 attempts).
timeout=60
while [ "$timeout" != 0 ]
do
    kubectl rollout status deployment observability-manager -n pks-system --watch=true
    if [ $? == 1 ]
    then
        ((timeout--))
        sleep 1
    else
        break
    fi
done
if [ "$timeout" == 0 ]
then
    echo "observability-manager did not start after 60 attempts; fluent-bit-patch deployment failed!"
    sleep infinity
fi

# Wait for the fluent-bit daemonset to finish rolling out.
# TIMEOUT is set in the add-on Deployment spec (step 9); fall back to 600 if it is unset.
timeout=${TIMEOUT:-600}
while [ "$timeout" != 0 ]
do
    kubectl rollout status daemonset fluent-bit -n pks-system --watch=true
    if [ $? == 1 ]
    then
        ((timeout--))
        sleep 1
    else
        break
    fi
done
if [ "$timeout" == 0 ]
then
    echo "fluent-bit daemonset did not start; fluent-bit-patch deployment failed!"
    sleep infinity
fi

# Save a description of the existing ca-pemstore configmap, if there is one.
kubectl -n pks-system describe configmap ca-pemstore > ca.bk
if [ $? == 1 ]
then
    # The configmap does not exist yet: create it, then patch the daemonset.
    kubectl -n pks-system create configmap ca-pemstore --from-file=cert.pem
    kubectl -n pks-system patch ds fluent-bit --patch "$(cat fluent-bit-patch.json)"
else
    # The configmap already exists: patch the daemonset, then recreate the configmap from cert.pem.
    kubectl -n pks-system patch ds fluent-bit --patch "$(cat fluent-bit-patch.json)"
    kubectl -n pks-system delete configmap ca-pemstore --wait=true
    kubectl -n pks-system create configmap ca-pemstore --from-file=cert.pem
    kubectl -n pks-system describe configmap ca-pemstore > ca.now
    # Restart the daemonset only if the CA changed.
    bk_md5sum=$(md5sum ca.bk | awk '{print $1}')
    now_md5sum=$(md5sum ca.now | awk '{print $1}')
    if [ "$bk_md5sum" != "$now_md5sum" ]
    then
        kubectl rollout restart daemonset/fluent-bit -n pks-system
    fi
fi

kubectl rollout status daemonset fluent-bit -n pks-system --watch=true
#sleep infinity
# Remove this deployment once the patch has been applied.
kubectl -n pks-system delete deployment persist-fluent-bit-patch
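The two wait loops in test.sh follow the same pattern: run a command, and if it fails, decrement a counter, sleep, and try again. The sketch below factors that pattern into a single function for illustration only. The name retry and its arguments are hypothetical, and unlike test.sh (which retries only on exit code 1) it retries on any non-zero exit code.

```shell
# Hypothetical helper (an assumption, not part of the workaround):
#   retry <attempts> <delay-seconds> <command> [args...]
# Runs the command until it succeeds or the attempts are used up.
retry() {
    local attempts="$1" delay="$2"
    shift 2
    while [ "$attempts" -gt 0 ]
    do
        if "$@"
        then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    # All attempts failed
    return 1
}
```

With this helper, the first loop in test.sh could be written as 'retry 60 1 kubectl rollout status deployment observability-manager -n pks-system --watch=true'.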
6. List all the files in the current directory and make sure you have the following files.
$ ls -l
-rw-rw-r-- 1 ubuntu ubuntu 2224 Jul 20 14:51 cert.pem
-rw-rw-r-- 1 ubuntu ubuntu 565 Jul 20 15:20 Dockerfile
-rw-rw-r-- 1 ubuntu ubuntu 693 Jul 20 15:20 fluent-bit-patch.json
-rw-rw-r-- 1 ubuntu ubuntu 1591 Jul 20 15:20 test.sh
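The same check can be scripted instead of reading the listing by eye. This is a minimal sketch: check_required_files is a hypothetical helper name, and it only tests that the four files from steps 2 through 5 exist, not that their contents are correct.

```shell
# Hypothetical helper (an assumption, not part of the workaround):
# prints each required file that is missing from the given directory
# (default: current directory) and fails if any is missing.
check_required_files() {
    local dir="${1:-.}" missing=0 f
    for f in cert.pem Dockerfile fluent-bit-patch.json test.sh
    do
        if [ ! -f "$dir/$f" ]
        then
            echo "missing: $f"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
}
```

Running 'check_required_files && echo "all files present"' from inside the fluent-bit-workaround directory prints the confirmation only when all four files exist.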
7. Build the Docker image using the 'docker build' command. Make sure to change the value given to the '-t' flag. In this example, 'harbor.lab-xx.vmware.com/library/persist-fluent-bit-patch' is the repository path of the image that will be created and 'dev' is the tag.
$ docker build -f Dockerfile . -t harbor.lab-xx.vmware.com/library/persist-fluent-bit-patch:dev
8. Push the image to the registry using the 'docker push' command. The argument provided to this command is the same <path>:<tag> value given to the '-t' flag of the 'docker build' command in step 7.
$ docker push harbor.lab-xx.vmware.com/library/persist-fluent-bit-patch:dev
9. Prepare the add-on definition YAML that will be set in the TKGI Plan(s). Paste the following into a text editor and change the 'image' value so that it matches the <path>:<tag> value given to the '-t' flag of the 'docker build' command in step 7.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: persist-fluent-bit-patch
  namespace: pks-system
  labels:
    app: persist-fluent-bit-patch
spec:
  replicas: 1
  selector:
    matchLabels:
      app: persist-fluent-bit-patch
  template:
    metadata:
      labels:
        app: persist-fluent-bit-patch
    spec:
      serviceAccountName: observability-manager
      containers:
      - name: persist-fluent-bit-patch
        image: harbor.lab-xx.vmware.com/library/persist-fluent-bit-patch:dev
        imagePullPolicy: Always
        env:
        - name: TIMEOUT
          value: "600"
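If the YAML above is saved to a file, the 'image' value can be changed with a short script rather than by hand, which keeps it in sync with the value used in steps 7 and 8. This is only a sketch: set_addon_image is a hypothetical helper, and it assumes that the file contains a single 'image:' key and that the image reference contains no '|', '&', or backslash characters.

```shell
# Hypothetical helper (an assumption, not part of the workaround):
#   set_addon_image <yaml-file> <path>:<tag>
# Rewrites the value of the 'image:' key and keeps its indentation.
# 'imagePullPolicy:' is not touched, because the pattern requires a
# colon directly after 'image'.
set_addon_image() {
    local file="$1" image="$2" tmp
    tmp=$(mktemp) || return 1
    sed "s|^\([[:space:]]*image:\).*|\1 $image|" "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For example, with the YAML saved under the hypothetical file name addon.yaml: 'set_addon_image addon.yaml harbor.lab-xx.vmware.com/library/persist-fluent-bit-patch:dev'.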
10. Depending on whether you are using EPMC or Ops Manager, follow the instructions below:
EPMC
a. In the TKGI Configuration, complete any necessary settings in the Wizard page and then click "GENERATE CONFIGURATION".
b. The TKGI Configuration YAML Editor page appears, where you can add the YAML data. In the Editor, scroll down to the "plans" section.
c. Within the plans section, go to the plan that should have this workaround. Within that plan, change the "addons-spec" value to contain the YAML that was prepared in step 9.
d. Make sure that a "|" (pipe) character and a new line precede the YAML data, and that 4 space characters precede each line of the YAML data.
See the following screenshot as an example. This needs to be done in every plan where you want to persist this workaround.
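The 4-space indentation required in step d for the EPMC editor can be produced mechanically instead of by hand. The sketch below assumes that the YAML from step 9 has been saved to a file. The function name indent_for_addons_spec is hypothetical.

```shell
# Hypothetical helper (an assumption, not part of the workaround):
# prints the given file with 4 spaces added in front of every line,
# ready to paste below the "|" of the "addons-spec" value.
indent_for_addons_spec() {
    sed 's/^/    /' "$1"
}
```

The output keeps the relative indentation of the YAML intact, so a line that was indented by 2 spaces ends up indented by 6.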
Ops Manager
a. In the TKGI tile settings, go to the tab of the Plan where you want to persist this workaround. This is the plan used by the clusters that need this workaround.
b. In the field named "(Optional) Add-ons - Use with caution", paste the YAML from step 9. Save the Plan settings. This needs to be done in every plan where you want to persist this workaround.
11. Click "Apply Configuration" (in EPMC) or "Apply Changes" (in Ops Manager), and upgrade all clusters that use the changed plan(s).
Afterwards, the fluent-bit pods in the updated clusters should have the necessary CA, and the change should persist through succeeding upgrades.
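One way to confirm the result on an upgraded cluster is to check whether the fluent-bit daemonset now lists the ca-pemstore volume that fluent-bit-patch.json adds. This is a minimal sketch, assuming kubectl is already pointed at the upgraded cluster. The function name fluent_bit_has_ca_volume is hypothetical.

```shell
# Hypothetical check (an assumption, not part of the workaround):
# succeeds if the fluent-bit daemonset in pks-system has a volume
# named ca-pemstore.
fluent_bit_has_ca_volume() {
    local volumes
    volumes=$(kubectl -n pks-system get daemonset fluent-bit \
        -o jsonpath='{.spec.template.spec.volumes[*].name}') || return 1
    # jsonpath prints the volume names separated by spaces
    case " $volumes " in
        *" ca-pemstore "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: 'fluent_bit_has_ca_volume && echo "fluent-bit is patched"'. If the check fails after the observability-manager pod has restarted, that matches the limitation described in the note at the top of this article.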