APM 11.x and 19.x require Kubernetes as an external prerequisite prior to installation.
Kubernetes is a platform dependency that is external to the APM product, similar to an operating system.
The installation of Kubernetes itself is outside of the supported bounds of the product, and this document is purely an example provided for the convenience of administrators who wish to learn more about Kubernetes.
It is NEITHER a representation of production sizing NOR intended for production use.
For more information about APM sizing, refer to: http://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-enterprise-software/it-operations-management/dx-platform-on-premise/dx-platform-on-premise/installing/sizing-recommendations.html
APM 11.x, 19.x
Assumptions
We have been provisioned with the below 4 new servers to install a small setup with APM 11.x, 19.x:
- Each server has CentOS 7.6 + 64 GB memory + 16 CPUs + 500 GB disk
- Apart from the OS, nothing else is installed
- There are many ways to install Kubernetes; in this example, we cover one possibility or use case
- We will install all the required software and prerequisites step by step: NFS, firewall configuration, Docker, Kubernetes using Kubespray, and so on
- We will create a private Docker registry on the Master
- We will install an NFS server on the Master
- We will install a small DX Platform setup that contains only 1 Elasticsearch node
- We will enable Operational Intelligence (OI) during the installation
- We will not use secure routes or SMTP
Step # 1: Install Kubernetes Cluster
Prerequisites: sudo/root access on all machines, and SSH access to all of them.
Execute the below steps on all servers:
a) Enable and start NetworkManager:
systemctl enable NetworkManager
systemctl start NetworkManager
systemctl is-enabled NetworkManager
systemctl status NetworkManager
b) (Recommended) Update the installed OS packages:
yum update
c) Update /etc/hosts with the list of your servers (master and nodes)
Syntax:
<ip> <hostname> <fqdn>
In this example:
10.109.33.74 server1 server1.acme.com
10.109.33.75 server2 server2.acme.com
10.109.33.76 server3 server3.acme.com
10.109.33.77 server4 server4.acme.com
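Optionally, verify that each entry resolves from every server; this quick check is a convenience and not part of the product documentation (hostnames are the ones used in this example):
getent hosts server2.acme.com
ping -c 1 server3.acme.com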
Install Docker on all servers
You can install a supported version from the OS repository using: yum install -y docker
In this example, we install the community edition docker-ce-17.03.2:
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install --setopt=obsoletes=0 -y docker-ce-17.03.2.ce-1.el7.centos
sudo systemctl start docker && sudo systemctl enable docker && sudo chkconfig docker on && sudo systemctl status docker
NOTE: If you need to uninstall docker
a) (Optional) Remove all docker images, containers, volume, networks:
docker stop $(docker ps -a -q)
docker system prune -a -f
docker rm -vf $(docker ps -a -q)
docker rmi -f $(docker images -a -q)
b) Uninstall docker:
sudo yum remove docker \
docker-client \
docker-client-latest \
docker-common \
docker-latest \
docker-latest-logrotate \
docker-logrotate \
docker-selinux \
docker-engine-selinux \
docker-engine \
docker-ce \
docker-ce-cli
c) Remove the Docker data directory:
rm -rf /var/lib/docker/
If you do not need a firewall:
a) Stop and disable the firewalld service:
systemctl stop firewalld && systemctl disable firewalld
b) Flush all rules in iptables, then restart Docker:
sudo iptables -t filter -F
sudo iptables -t filter -X
systemctl restart docker
If you need a firewall:
a) Enable and start firewalld:
systemctl enable firewalld && systemctl start firewalld
b) Open the required ports as below:
On Master:
### For Kubernetes ###
firewall-cmd --permanent --add-port=6443/tcp
firewall-cmd --permanent --add-port=2379-2380/tcp
firewall-cmd --permanent --add-port=10250-10255/tcp
### For Calico CNI ###
firewall-cmd --permanent --add-port=179/tcp
firewall-cmd --permanent --add-port=5473/tcp
firewall-cmd --permanent --add-port=4789/udp
### For ingress-controller ###
firewall-cmd --permanent --add-port=80/tcp
firewall-cmd --permanent --add-port=443/tcp
### For Docker registry ###
firewall-cmd --permanent --add-port=5000/tcp
### For NFS Server ###
firewall-cmd --permanent --add-port=111/tcp
firewall-cmd --permanent --add-port=2049/tcp
firewall-cmd --permanent --add-port=20048/tcp
firewall-cmd --reload
firewall-cmd --list-ports
On Nodes:
### For Kubernetes ###
firewall-cmd --permanent --add-port=10250/tcp
firewall-cmd --permanent --add-port=10255/tcp
firewall-cmd --permanent --add-port=30000-32767/tcp
firewall-cmd --permanent --add-port=6783/tcp
### For Calico CNI ###
firewall-cmd --permanent --add-port=179/tcp
firewall-cmd --permanent --add-port=5473/tcp
firewall-cmd --permanent --add-port=4789/udp
firewall-cmd --reload
firewall-cmd --list-ports
In this example, we use the Master server as the NFS server.
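If the nfs-utils package (which provides the nfs-server service on CentOS 7) is not already installed on the Master, install it first; this assumes the default CentOS repositories are reachable, as they are in this online example:
yum install -y nfs-utils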
On Master
systemctl enable nfs-server && systemctl start nfs-server && systemctl status nfs-server
On nodes / workers
yum install nfs-utils
Set up passwordless SSH from the Master to all servers. This step is required for Ansible.
On Master:
ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): <press enter here>
/root/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase): <press enter here>
Enter same passphrase again: <press enter here>
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:GIfiUSAq9/sY/GKdNv444IxTS3d5ZNXcuyMNQ5I5JAk root@server1.acme.com
The key's randomart image is:
+---[RSA 2048]----+
| . ...E..o.oo . |
| . . . . ..=..o .|
|o . o o . .+ .|
|.. o o + o o . |
| o . S+ + .|
| .+.. o . . + |
| *+= o . . .|
| o *=*. |
| o.+=+. |
+----[SHA256]-----+
Copy the generated public key to all the nodes
Syntax:
for host in host1 host2 host3 ... hostx; do ssh-copy-id -i ~/.ssh/id_rsa.pub $host; done
In this example:
for host in server1.acme.com server2.acme.com server3.acme.com server4.acme.com ; do ssh-copy-id -i ~/.ssh/id_rsa.pub $host; done
The authenticity of host 'server1.acme.com (10.109.33.74)' can't be established.
ECDSA key fingerprint is SHA256:sthTmmg5bk2Qd1EpJsaESh5QloLO6dQ0ik7ZgmOCWw0.
ECDSA key fingerprint is MD5:71:a4:78:84:0b:79:6c:21:85:71:06:4f:e3:81:5d:6b.
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@server1.acme.com's password: <enter the root password for each server>
Number of key(s) added: 1
Verify that you can log in to each server node without entering the root password, then exit.
For example:
ssh server2.acme.com
exit
Configure SELinux contexts. This step is required for Kubernetes.
Check SELinux status:
sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: permissive
Mode from config file: permissive
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 31
If disabled, continue to the next step.
If enabled, run the below commands on each server:
mkdir -p /etc/kubernetes/
chcon -R -t svirt_sandbox_file_t /etc/kubernetes/
mkdir -p /var/lib/etcd
chcon -R -t svirt_sandbox_file_t /var/lib/etcd
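As an optional sanity check (not required by the product), confirm that the SELinux context was applied; the exact output columns can vary by OS release:
ls -Zd /etc/kubernetes /var/lib/etcd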
On Master:
mkdir /root/registry_certs
cd /root
openssl req -newkey rsa:4096 -nodes -sha256 -keyout registry_certs/domain.key \
-x509 -days 1095 -out registry_certs/domain.crt
Generating a 4096 bit RSA private key
...........................................................++
........................................................................................................++
writing new private key to 'registry_certs/domain.key'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [XX]:
State or Province Name (full name) []:
Locality Name (eg, city) [Default City]:
Organization Name (eg, company) [Default Company Ltd]:
Organizational Unit Name (eg, section) []:
Common Name (eg, your name or your server's hostname) []:server1.acme.com
Email Address []:
ls /root/registry_certs/*
/root/registry_certs/domain.crt /root/registry_certs/domain.key
mkdir /root/registrydata
docker run -d -p 5000:5000 \
-e REGISTRY_STORAGE_FILESYSTEM_ROOTDIRECTORY=/var/lib/registry \
-v /root/registrydata:/var/lib/registry:Z \
-v /root/registry_certs:/certs \
-e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/domain.crt \
-e REGISTRY_HTTP_TLS_KEY=/certs/domain.key \
--restart=always \
--security-opt label:disable \
--name registry registry:2
Unable to find image 'registry:2' locally
2: Pulling from library/registry
c87736221ed0: Pull complete
1cc8e0bb44df: Pull complete
54d33bcb37f5: Pull complete
e8afc091c171: Pull complete
b4541f6d3db6: Pull complete
Digest: sha256:8004747f1e8cd820a148fb7499d71a76d45ff66bac6a29129bfdbfdc0154d146
Status: Downloaded newer image for registry:2
7d189672495eaf0ee19ec69c856934094326735c1fad5c83cac0c2717e5dba8d
Configure the Docker service on each server to trust the new registry:
Copy domain.crt to each server.
Syntax:
target="/etc/docker/certs.d/<your-private-registry-hostname>:5000"
source="/root/registry_certs/domain.crt"
for host in host1 host2 … hostx; do ssh "$host" "mkdir -p $target" && scp "$source" "$host:$target"; done
In this example:
target="/etc/docker/certs.d/server1.acme.com:5000"
source="/root/registry_certs/domain.crt"
for host in server1.acme.com server2.acme.com server3.acme.com server4.acme.com; do ssh "$host" "mkdir -p $target" && scp "$source" "$host:$target"; done
On Master and Nodes:
sudo service docker reload
Go to one of the server nodes and verify that you can log in to the private Docker registry:
Syntax:
docker login <your-private-registry-hostname>:5000 -u admin -p admin
In this example:
docker login server1.acme.com:5000 -u admin -p admin
Login Succeeded
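Optionally, you can also confirm that the registry accepts pushes by tagging and pushing a small image; hello-world is used here only as an assumption that internet access is available (as it is in this online example), and the curl call lists the repositories stored in the registry:
docker pull hello-world
docker tag hello-world server1.acme.com:5000/hello-world:test
docker push server1.acme.com:5000/hello-world:test
curl -k https://server1.acme.com:5000/v2/_catalog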
On Master ONLY
a) Log in to the Broadcom Support Portal, then download the below package: https://ftpdocs.broadcom.com/phpdocs/7/caapm/k8s-offline-dependencies-v2_6_0.tar.gz
Transfer the file to a location on the Master server; in this example: /root
b) Create a directory named “k8s-offline-dependencies”:
mkdir -p /root/k8s-offline-dependencies
c) Extract the tar file.
tar -xvf k8s-offline-dependencies-v2_6_0.tar.gz -C /root/k8s-offline-dependencies
k8s-offline-dependencies
images.json
images.tar.gz
d) Navigate to the k8s-offline-dependencies directory.
cd /root/k8s-offline-dependencies
e) Run the following command:
cat images.tar.gz | docker load
03901b4a2ea8: Loading layer 5.844 MB/5.844 MB
769ff15c5877: Loading layer 318.4 MB/318.4 MB
735a9d4ccace: Loading layer 3.474 MB/3.474 MB
…
…
...
a03d7e02b0d4: Loading layer 3.584 kB/3.584 kB
93948accb38b: Loading layer 6.799 MB/6.799 MB
63155f02c4cf: Loading layer 8.704 kB/8.704 kB
Loaded image: gcr.io/google_containers/kube-registry-proxy:0.4
On Master ONLY
The Kubespray Container folder will be mounted to /tmp/docker-volume-kubespray.
Any changes made here will be reflected in the next running container (not in the image).
mkdir -p /tmp/docker-volume-kubespray
docker volume create --opt type=none --opt device=/tmp/docker-volume-kubespray --opt o=bind kubespray
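A quick optional check that the bind-mounted volume was created (informational only):
docker volume inspect kubespray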
On Master ONLY
a) Generate the Ansible hosts.ini file with the list of server IPs in your cluster.
Syntax:
docker run --privileged=true --rm -it -v /root/.ssh/:/root/.ssh/ -v kubespray:/root/kubespray-offline --net=host --name kubespray docker.io/karthik101/kubespray-alpine:v2 /bin/bash -c 'declare -a IPS=(<master-ip> <node1-ip> <node2-ip> <node3-ip> <node4-ip>); CONFIG_FILE=inventory/mycluster/hosts.ini /usr/bin/python3 contrib/inventory_builder/inventory.py ${IPS[@]}'
In this example:
docker run --privileged=true --rm -it -v /root/.ssh/:/root/.ssh/ -v kubespray:/root/kubespray-offline --net=host --name kubespray docker.io/karthik101/kubespray-alpine:v2 /bin/bash -c 'declare -a IPS=(10.109.33.74 10.109.33.75 10.109.33.76 10.109.33.77); CONFIG_FILE=inventory/mycluster/hosts.ini /usr/bin/python3 contrib/inventory_builder/inventory.py ${IPS[@]}'
DEBUG: Adding group all
DEBUG: Adding group kube-master
DEBUG: Adding group kube-node
DEBUG: Adding group etcd
DEBUG: Adding group k8s-cluster:children
DEBUG: Adding group calico-rr
DEBUG: Adding group vault
DEBUG: adding host node1 to group all
DEBUG: adding host node2 to group all
DEBUG: adding host node3 to group all
DEBUG: adding host node4 to group all
DEBUG: adding host kube-node to group k8s-cluster:children
DEBUG: adding host kube-master to group k8s-cluster:children
DEBUG: adding host node1 to group etcd
DEBUG: adding host node1 to group vault
DEBUG: adding host node2 to group etcd
DEBUG: adding host node2 to group vault
DEBUG: adding host node3 to group etcd
DEBUG: adding host node3 to group vault
DEBUG: adding host node1 to group kube-master
DEBUG: adding host node2 to group kube-master
DEBUG: adding host node1 to group kube-node
DEBUG: adding host node2 to group kube-node
DEBUG: adding host node3 to group kube-node
DEBUG: adding host node4 to group kube-node
b) Update the generated Ansible hosts.ini as below:
- Configure only 1 master and 1 etcd node in the [kube-master] and [etcd] sections respectively.
- In a production environment, you would remove node1 from the [kube-node] section to disable scheduling of pods on the master. As this is a test environment, we keep node1 as a node.
vi /tmp/docker-volume-kubespray/inventory/mycluster/hosts.ini
[all]
node1 ansible_host=10.109.33.74 ip=10.109.33.74
node2 ansible_host=10.109.33.75 ip=10.109.33.75
node3 ansible_host=10.109.33.76 ip=10.109.33.76
node4 ansible_host=10.109.33.77 ip=10.109.33.77
[kube-master]
node1
[kube-node]
node1
node2
node3
node4
[etcd]
node1
[k8s-cluster:children]
kube-node
kube-master
vi /tmp/docker-volume-kubespray/inventory/mycluster/group_vars/k8s-cluster.yml
Set:
registry_enabled: false
On Master ONLY
a) Update Kubespray configuration with the location of images.tar.gz
vi /tmp/docker-volume-kubespray/inventory/mycluster/group_vars/k8s-cluster.yml
Update “local_release_dir_offline” property with the directory containing the images.tar.gz:
local_release_dir_offline: "/root/offline_image_bundle/"
In this example:
local_release_dir_offline: "/root/k8s-offline-dependencies/"
b) Update the find patterns:
vi /tmp/docker-volume-kubespray/roles/upload-images/tasks/main.yml
Locate line:
find: paths="{{local_release_dir_offline}}" patterns="k8s-offline-*.tar"
Update pattern value as below:
find: paths="{{local_release_dir_offline}}" patterns="*.tar.gz"
Locate line:
find: paths="{{offline_images}}" patterns="k8s-offline-*.tar"
Update pattern value as below:
find: paths="{{offline_images}}" patterns="*.tar"
c) Execute the pre-req Ansible playbook to copy images.tar.gz to all the servers:
docker run --privileged=true --rm -it -v /root/.ssh/:/root/.ssh/ -v kubespray:/root/kubespray-offline --net=host --name kubespray docker.io/karthik101/kubespray-alpine:v2 /bin/bash -c "ansible-playbook -vvv -i inventory/mycluster/hosts.ini pre-req.yml --flush-cache --tags=upload-images"
PLAY RECAP *********************************************************************
node1 : ok=6 changed=3 unreachable=0 failed=0
node2 : ok=6 changed=3 unreachable=0 failed=0
node3 : ok=6 changed=3 unreachable=0 failed=0
node4 : ok=6 changed=3 unreachable=0 failed=0
Tuesday 24 September 2019 17:13:20 +0000 (0:01:28.805) 0:02:18.199 *****
===============================================================================
upload-images : upload_images | Load container images to docker registry -- 88.81s
/root/kubespray-offline/roles/upload-images/tasks/main.yml:36 -----------------
upload-images : upload_images | Ansible copy image tar to each node ---- 43.88s
/root/kubespray-offline/roles/upload-images/tasks/main.yml:15 -----------------
Gathering Facts --------------------------------------------------------- 2.77s
/root/kubespray-offline/pre-req.yml:1 -----------------------------------------
upload-images : upload_images | Find k8s image tar package -------------- 0.87s
/root/kubespray-offline/roles/upload-images/tasks/main.yml:10 -----------------
upload-images : upload_images | Find k8s image tar package -------------- 0.68s
/root/kubespray-offline/roles/upload-images/tasks/main.yml:25 -----------------
upload-images : upload_images | Create dest directory for saved/loaded container images --- 0.66s
/root/kubespray-offline/roles/upload-images/tasks/main.yml:2 ------------------
upload-images : upload_images | Extract archive ------------------------- 0.36s
/root/kubespray-offline/roles/upload-images/tasks/main.yml:29 -----------------
[root@server1 /]#
On Master ONLY
Execute the cluster ansible playbook to install the kubernetes cluster:
docker run --privileged=true --rm -it -v /root/.ssh/:/root/.ssh/ -v kubespray:/root/kubespray-offline --net=host --name kubespray docker.io/karthik101/kubespray-alpine:v2 /bin/bash -c "ansible-playbook -v -i inventory/mycluster/hosts.ini cluster.yml --flush-cache -b --extra-vars 'override_system_hostname=false'"
PLAY RECAP *********************************************************************
localhost : ok=2 changed=0 unreachable=0 failed=0
node1 : ok=322 changed=72 unreachable=0 failed=0
node2 : ok=262 changed=49 unreachable=0 failed=0
node3 : ok=228 changed=37 unreachable=0 failed=0
node4 : ok=228 changed=37 unreachable=0 failed=0
Tuesday 24 September 2019 17:23:00 +0000 (0:00:00.025) 0:08:20.474 *****
===============================================================================
kubernetes/master : Master | wait for the apiserver to be running ------ 21.24s
etcd : wait for etcd up ------------------------------------------------- 9.25s
gather facts from all instances ----------------------------------------- 7.90s
kubernetes-apps/ingress_controller/ingress_nginx : NGINX Ingress Controller | Apply manifests --- 6.95s
kubernetes-apps/ingress_controller/ingress_nginx : NGINX Ingress Controller | Create manifests --- 6.64s
download : container_download | Download containers if pull is required or told to always pull (all nodes) --- 6.39s
etcd : Configure | Check if etcd cluster is healthy --------------------- 5.78s
kubernetes/master : Master | wait for kube-scheduler -------------------- 5.64s
kubernetes-apps/ansible : Kubernetes Apps | Lay Down KubeDNS Template --- 4.23s
download : container_download | Download containers if pull is required or told to always pull (all nodes) --- 3.91s
download : container_download | Download containers if pull is required or told to always pull (all nodes) --- 3.78s
download : container_download | Download containers if pull is required or told to always pull (all nodes) --- 3.77s
download : container_download | Download containers if pull is required or told to always pull (all nodes) --- 3.75s
download : container_download | Download containers if pull is required or told to always pull (all nodes) --- 3.71s
download : container_download | Download containers if pull is required or told to always pull (all nodes) --- 3.70s
kubernetes-apps/ansible : Kubernetes Apps | Start Resources ------------- 3.69s
network_plugin/calico : Calico | Create calico manifests ---------------- 3.35s
download : Download items ----------------------------------------------- 2.92s
Gathering Facts --------------------------------------------------------- 2.74s
kubernetes-apps/network_plugin/calico : Start Calico resources ---------- 2.54s
[root@server1 /]#
IMPORTANT:
Make sure to review the PLAY RECAP. If you find a failure (failed=1), investigate and resolve it before you continue.
1. Verify that the cluster is up and running
On Master
kubectl cluster-info
Kubernetes master is running at https://10.109.33.74:6443
KubeDNS is running at https://10.109.33.74:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
kubernetes-dashboard is running at https://10.109.33.74:6443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
KubeRegistry is running at https://10.109.33.74:6443/api/v1/namespaces/kube-system/services/registry:registry/proxy
kubectl get nodes
NAME STATUS ROLES AGE VERSION
node1 Ready master,node 6m v1.10.11
node2 Ready node 6m v1.10.11
node3 Ready node 6m v1.10.11
node4 Ready node 6m v1.10.11
On Master
-Create the service account:
kubectl create serviceaccount cluster-admin-dashboard-sa
-Run the following command to create a cluster role binding:
kubectl create clusterrolebinding cluster-admin-dashboard-sa --clusterrole=cluster-admin --serviceaccount=default:cluster-admin-dashboard-sa
-Generate the token; you will need it to log in to the Kubernetes dashboard web admin UI. (An optional check of the token is shown after these steps.)
TOKEN=$(kubectl describe secret $(kubectl get secret | awk '/^cluster-admin-dashboard-sa-token-/{print $1}') | awk '$1=="token:"{print $2}') && echo $TOKEN
For example:
eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJkZWZhdWx0Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6ImNsdXN0ZXItYWRtaW4tZGFzaGJvYXJkLXNhLXRva2VuLXg3d3d3Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImNsdXN0ZXItYWRtaW4tZGFzaGJvYXJkLXNhIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiOTljNDcwYzktZGY5Yi0xMWU5LWI1ZDgtZmExNjNlNzUzMjJhIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50OmRlZmF1bHQ6Y2x1c3Rlci1hZG1pbi1kYXNoYm9hcmQtc2EifQ.qujUsGlX-XXxqqZVkno4kC2I7-SbRburNmrDMkXpiW0jumWyORrrqKoOHOdbXC60AhPRhgIp4BmGBdPRDRE_rlJrmhp3I4QiUjsVwflVnh5f7Yc6RusEkfcO4XUbJFZcPJyXrDkvMz68JWK_f59vkNgeymvHR260tpeaTcosf5W-Axl9sgdgy1BkoA97zJhZJ-A1GKbPQyuYOEck05wYc75DbP7hY8UXTSb1-YuxWJgOYEAWTvJ5z677lw5G70lGVRdhjlRd_jPVOGv8p2qODKmjH47EQgL4fYHbsWx7uP5-LxY9l8-Ll9hnNx2n2UKQm-Hjrr5yUw2tn280Q7CaKQ eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJkZWZhdWx0Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6ImRlZmF1bHQtdG9rZW4tZ2Y0MmwiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGVmYXVsdCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6ImFjNzcyNmYxLWRmOTItMTFlOS1iNWQ4LWZhMTYzZTc1MzIyYSIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDpkZWZhdWx0OmRlZmF1bHQifQ.ATsgL0f2JHnQrSi80UIfVwGX0KH17ugVwiFl-b9PofxTWwts3jBA0Xo7qcwiUijhEZp8oVyYfrKmsqOoqKCgc7IpghwVihilNRRGs0i_ahp5iCpahgybTmp0iduXpMZhCoJalb96KPjpcw6oZQgYsx24g0SVRc8t-vKqYMNulgWUc2s4Ft5tHgsyI2FgCQFrpFDxgOsUUQROmUZPK7QHfhmUsIFHdaqdVcAK_rdztZFcl3-Lz5dd--2ilyXFO5BL6bAl7p4oUDxcxfxlMbRsD4PHw_KmAB5NhllNSHod2A3Qa8p2vwKsxmbaFtWYPRHiaZejgddRiWGegmC-L5knzg
-Go to the dashboard URL, for example:
https://10.109.33.74:6443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
Select the Token option and enter the token you created in the previous step.
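Optionally, you can sanity-check the token directly against the API server; the URL is this example's master, a successful call returns the cluster version, and a 401 response means the token was not accepted:
curl -k -H "Authorization: Bearer $TOKEN" https://10.109.33.74:6443/version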
(Optional) Add a new Node
a) Make sure to apply all the “Server Prerequisites” on the new server(s).
b) Update the Ansible hosts file with the new server configuration details as below example:
On Master:
vi /tmp/docker-volume-kubespray/inventory/mycluster/hosts.ini
[all]
node1 ansible_host=10.109.33.74 ip=10.109.33.74
node2 ansible_host=10.109.33.75 ip=10.109.33.75
node3 ansible_host=10.109.33.76 ip=10.109.33.76
node4 ansible_host=10.109.33.77 ip=10.109.33.77
node5 ansible_host=10.109.33.78 ip=10.109.33.78
[kube-master]
node1
[kube-node]
node1
node2
node3
node4
node5
[etcd]
node1
[k8s-cluster:children]
kube-node
kube-master
c) Execute the scale Ansible playbook to add the new server to the Kubernetes cluster:
docker run --privileged=true --rm -it -v /root/.ssh/:/root/.ssh/ -v kubespray:/root/kubespray-offline --net=host --name kubespray docker.io/karthik101/kubespray-alpine:v2 /bin/bash -c "ansible-playbook -v -i inventory/mycluster/hosts.ini scale.yml --flush-cache -b"
NOTE: If you are interested in high availability, it is recommended to configure your Kubernetes cluster with 3 master servers and 3 etcd nodes; see the "(Optional) Add a new Node" section.
Running 3 masters also means that your cluster runs 3 ingress controllers, one on each master node. Change your wildcard DNS to round-robin across all master node IP addresses, or use a load balancer to do the same (an illustrative DNS snippet follows).
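For illustration only, the round-robin option could look like the following BIND-style zone snippet, where the wildcard name and the additional master IPs are purely hypothetical:
*.apm.acme.com.    IN A    10.109.33.74
*.apm.acme.com.    IN A    10.109.33.78
*.apm.acme.com.    IN A    10.109.33.79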
If you need to reinstall, use the following to uninstall Kubernetes:
On Master ONLY
docker run --privileged=true --rm -it -v /root/.ssh/:/root/.ssh/ -v kubespray:/root/kubespray-offline --net=host --name kubespray docker.io/karthik101/kubespray-alpine:v2 /bin/bash -c "ansible-playbook -v -i inventory/mycluster/hosts.ini reset.yml --flush-cache -b"
rm -rf /tmp/docker-volume-kubespray
rm -rf /root/k8s-offline-dependencies/
On all servers:
rm -rf /tmp/offline_images/
(Optional) Remove all docker images, containers, volume, networks:
docker stop $(docker ps -a -q)
docker system prune -a -f
docker rm -v $(docker ps -a -q -f status=exited)
docker rmi -f $(docker images -f "dangling=true" -q)
docker volume ls -qf dangling=true | xargs -r docker volume rm
On Master Only:
rm -rf /root/registrydata
Step # 2: Install APM 11.1
On master and nodes:
mkdir -p /nfs/ca/dxi
On Master only:
vi /etc/exports
/nfs/ca/dxi server1.acme.com(rw,sync,no_root_squash,no_all_squash)
/nfs/ca/dxi server2.acme.com(rw,sync,no_root_squash,no_all_squash)
/nfs/ca/dxi server3.acme.com(rw,sync,no_root_squash,no_all_squash)
/nfs/ca/dxi server4.acme.com(rw,sync,no_root_squash,no_all_squash)
Run the following commands to export the share and restart the NFS service:
exportfs -ra
systemctl restart nfs
Verify:
showmount -e server1.acme.com
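Optionally, test-mount the export from any worker node to confirm read/write access; the temporary mount point below is arbitrary and this check is not a required step:
mkdir -p /mnt/dxi-test
mount -t nfs server1.acme.com:/nfs/ca/dxi /mnt/dxi-test
touch /mnt/dxi-test/nfs-write-test && rm /mnt/dxi-test/nfs-write-test
umount /mnt/dxi-test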
On Master:
Copy:
- the Kubernetes config file to <DX installer root>
- your own certificates (if any) to the <DX installer root>/.kube directory (a sketch follows the example below)
In this example:
mkdir /dx-installer
cp /root/.kube/config /dx-installer/
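If you do use your own certificates (this example does not, because secure routes are disabled), the copy step might look like the following sketch; the certificate paths are placeholders:
mkdir -p /dx-installer/.kube
cp /path/to/your-certificate.crt /path/to/your-certificate.key /dx-installer/.kube/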
In this example, we install a small cluster, so we configure and install only 1 Elasticsearch node.
a) Label the target Elastic server (server2):
On Master:
kubectl label nodes node2 dxi-es-node=master-data-1
kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
node1 Ready master 23m v1.10.11 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node1,node-role.kubernetes.io/master=true
node2 Ready node 23m v1.10.11 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,dxi-es-node=master-data-1,kubernetes.io/hostname=node2,node-role.kubernetes.io/node=true
node3 Ready node 23m v1.10.11 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node3,node-role.kubernetes.io/node=true
node4 Ready node 23m v1.10.11 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/hostname=node4,node-role.kubernetes.io/node=true
b) Modify the max_map_count Parameter for Elasticsearch
On Node#2 (server2.acme.com):
cat /proc/sys/vm/max_map_count
-If the value is lower than 262144, update /etc/sysctl.conf and add:
vm.max_map_count=262144
-Run the following command to apply the changes without restarting the node.
sysctl -q -w vm.max_map_count=262144
c) Increase the Max Open Files Limit to 65536
On Node#2 (server2.acme.com):
cat /proc/sys/fs/file-max
-If the limit is lower than 65536, open /etc/sysctl.conf and add the following line at the end of the file:
fs.file-max=65536
- Run the following command to apply the sysctl limits:
sysctl -p
- Add the following lines to the /etc/security/limits.conf file:
* soft nproc 65536
* hard nproc 65536
* soft nofile 65536
* hard nofile 65536
IMPORTANT: Log out and log back in for the changes to take effect.
To verify the changes made in /etc/security/limits.conf, run the following command:
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 256614
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 65536
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Download DXCommonPlatformInstaller-x.x.x.x-online.tar.gz
Copy the installer package to a target location on the Master, for example: /dx-installer
Extract the package, for example:
tar xvf DXCommonPlatformInstaller-x.x.x.x-online.tar.gz
post_install/1.reset.masteradmin.password.sh
post_install/2.create.ga.account.sh
post_install/3.apm.onboard.sh
nfs_checks/nfs_checker.sh
nfs_checks/dxicc-nfs-pv.yml
…
…
...
files/meta/plain/nextgen/base/core-pvc.main.yml
files/meta/plain/nextgen/base/core-sa.yml
files/meta/plain/nextgen/meta.yml
files/images.json
bin/kubectl
Run the installer from the extraction directory:
./install.sh --ignore-errors
Welcome to the DX Platform installer.
DO YOU ACCEPT THE TERMS OF THIS LICENSE AGREEMENT? (Y/N) [Y]: Y
License Agreement in other languages is located in the installer directory. (Enter to proceed):
Installation type:
* 1: Kubernetes
Enter your choice:
Registry installation:
* 1: install
2: use_existing
Enter your choice: 2
Specify the registry URL: [localhost:5000]: server1.acme.com:5000
[registry-validate] [ OK ] Registry has been validated successfully
Specify the namespace of DXI [dxi]: <press enter>
Master IP: 10.109.33.74
[nginx-version] [ OK ] Nginx ingress controller version has been verified successfully
Specify the Loadbalancer Hostname (or <ingressControllerIP>.nip.io) [10.109.33.74.nip.io]: <press enter>
Do you want secure routes (Y/N) [N]: <press enter>
Specify the Name of NFS Server IP/Host [10.109.33.74]: <press enter>
Specify the NFS Folder [/nfs/ca/dxi]: <press enter>
[node-info] [ INFO ] Node node1 is unschedulable
[nfs-validate] [ OK ] Node: node4
[nfs-validate] [ OK ] Node: node2
[nfs-validate] [ OK ] Node: node3
[nfs-validate] [ OK ] NFS storage validated successfully on all nodes
[node-info] [ INFO ] Node node1 is unschedulable
[nfs-nonempty-validate] [ INFO ] NFS Directory seems to be empty
Specify the size of the Elasticsearch:
* 1: Small Installation: One Elasticsearch Hot node is installed.
2: Medium Installation: Three Elasticsearch Hot nodes are installed
Enter your choice:<press enter to select Small installation>
Do you want to enable OI? (Y/N) [N]: Y
[ | ] Validating Nodes for ElasticSearch Prerequisites...
[es-validate] [ OK ] Node: node2 [Max Open Files Limit: 6512585, max_map_count: 262144]
[es-validate] [ OK ] ElasticSearch prerequisites validated successfully on all nodes
Required hardware:
---------------------------------------------------------
Elasticsearch 1 - 1 CPU | 13 GB RAM
Application Services - 18 CPU | 181 GB RAM
---------------------------------------------------------
Available hardware:
[ PASSED ] Elasticsearch 1 - 14 CPU | 60 GB RAM (node2)
[ PASSED ] Application services - 58 CPU | 231 GB RAM
---------------------------------------------------------
Please enter your SMTP server details. You can press ENTER to skip the SMTP details.
Specify the SMTP Service URL("smtp|smtps://<host>:<port>"): <press enter>
Do you want to use Symbolicator for AXA? Symbolicator enables you to view detailed crash information of iOS applications. You need an MacOS host to enable this.
For more information, see the CA App Experience Analytics documentation. (Y/N): N
Enter the password for the masteradmin
The password must be 6 - 25 characters long. The password must be a combination of all the following characters:
at least one uppercase character
at least one lowercase character
at least one number, and
at least one special character(!, @, #, $, %, ^, &, *, _, +): <enter your password>
Re-enter the password for the masteradmin: <enter your password>
The postgres database password has been stored as Base64 encoded value in dxi.input.vars.yml file.
Successfully written properties into /root/./dxi.input.vars.yml file.
Console user input collection completed successfully
Handling registry setup.
[registry] Non-installation mode.
Pulling docker images.
[descriptor] resolved into: files/images.json
[oerth-scx.ca.com:4443/apmservices/at:11.1.3.17-linux-amd64] Pulling image.
Pulling docker images.
…
…
...
[create] [ OK ] Deployment selfmonitoring-filebeat
[create] [ OK ] Deployment selfmonitoring-kibana
[create] [ OK ] Deployment selfmonitoring-kube-state-metrics
[dxi-platform-init] [ OK ] Check log in 10.109.33.74:/nfs/ca/dxi/dxi-platform-init-pod/dxi-platform-init-pod.log before login to DXI Manager
Installation succeeded
To create your first tenant, open this URL: http://apmservices-gateway.10.109.33.74.nip.io/dxiportal. Log in using following credentials: "masteradmin" tenant and "masteradmin" user. Use the password that you entered as a masteradmin password during the installation
-Check that all pods are running:
kubectl get pods -n dxi
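A convenient way to spot anything that is not yet healthy (a simple filter, not taken from the product documentation):
kubectl get pods -n dxi | grep -v -E 'Running|Completed'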
-Access the Admin console. In this example:
http://apmservices-gateway.10.109.33.74.nip.io/dxiportal
Log in to the admin console.
Select the “Tenant Services” option to create a new tenant.
Click “Add Tenant” and enter the details for your new APM cluster or tenant. In this example:
Tenant = standalone
admin/admin
Click “Create”
If you need to uninstall the DX Platform (in this example, the Kubernetes namespace is “dxi”):
kubectl delete all -n dxi --all --force --grace-period=0
kubectl delete pvc --all -n dxi
kubectl delete ns dxi
kubectl delete persistentvolume/dxi
kubectl delete persistentvolume/dxi-backup
kubectl delete persistentvolume/pv.dxi-axaservices-amq-data
kubectl delete persistentvolume/pv.dxi-axaservices-pg-data
kubectl delete persistentvolume/pv.dxi-jaf-clientnodemanager-0
kubectl delete persistentvolume/pv.dxi-jaf-clientnodemanager-1
kubectl delete persistentvolume/pv.dxi-jaf-clientnodemanager-2
kubectl delete persistentvolume/pv.dxi-jaf-clientnodemanager-3
rm -rf /dxi/jarvis
rm -rf /dxi/jaf
rm -rf /nfs/ca/dxi/*
If the masteradmin account gets locked, locate the postgres pod, open a shell in the running container, and run the following SQL statements to unlock it:
kubectl get pods -n <your-namespace> | grep postgres
kubectl exec -it <posgres-pod> -n <your-namespace> bash
psql -U aopuser -d aoplatform
SELECT strikecount FROM aradminbasicauthuser WHERE userid = 'MASTERADMIN';
UPDATE aradminbasicauthuser SET strikecount = 0 WHERE userid = 'MASTERADMIN';
To force-delete all pods in the namespace, if needed:
kubectl delete pods --all -n <namespace> --force --grace-period=0
For more information:
https://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-enterprise-software/it-operations-management/dx-platform-on-premise/1-0/installing/set-up-the-environment.html
http://techdocs.broadcom.com/content/broadcom/techdocs/us/en/ca-enterprise-software/it-operations-management/dx-platform-on-premise/dx-platform-on-premise/installing/sizing-recommendations.html