Rally - On-premises: Dashboard will not load with new installation or after being powered off for a length of time

Products

Rally On-Premise

Issue/Introduction

New appliances provisioned after June 5th, 2019 may fail to load the dashboard, which prevents the setup procedure from starting.
Appliances that have been powered off for an extended period may fail to start the services that allow login to Rally or access to the dashboard.

When you try to access the appliance from a web browser, you may receive an error similar to the following examples from commonly used browsers.

Chrome:

This site can’t be reached

<YOUR_IP_ADDRESS> refused to connect.

Try:

- Checking the connection
- Checking the proxy and the firewall

ERR_CONNECTION_REFUSED

Firefox:

Unable to connect

Firefox can’t establish a connection to the server at <YOUR_IP_ADDRESS>:8800.

- The site could be temporarily unavailable or too busy. Try again in a few moments.
- If you are unable to load any pages, check your computer’s network connection.
- If your computer or network is protected by a firewall or proxy, make sure that Firefox is permitted to access the Web.

Internet Explorer:

Can’t reach this page

- Make sure the web address https://<YOUR_IP_ADDRESS>:8800 is correct
- Search for this site on Bing
- Refresh the page

Environment

Release: 2.0, 2.01

Cause

This is caused by the docker swarm failing to load because its SSL certificate has expired.
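To confirm this cause on an affected appliance, the validity window of the swarm certificate can be read directly. The following is a minimal sketch, assuming openssl and GNU date are installed on the application server. The certificate path is the one reported by dockerd in the sample output later in this article; the function names show_cert_dates and cert_line_to_ymd are hypothetical helpers and not part of Docker or Rally.

```shell
# Path as reported by dockerd in the error message quoted in this article.
SWARM_CERT=/var/lib/docker/swarm/certificates/swarm-node.crt

# Print the notBefore= and notAfter= lines of a certificate.
# Defaults to the swarm node certificate when no path is given.
show_cert_dates() {
    sudo openssl x509 -in "${1:-$SWARM_CERT}" -noout -dates
}

# Hypothetical helper: convert one "notBefore=..." or "notAfter=..." line
# to YYYYMMDD, evaluated in UTC.
cert_line_to_ymd() {
    date -u -d "${1#*=}" +%Y%m%d
}
```

Running show_cert_dates prints the two boundary dates. If the current system date falls outside that range, the swarm component is expected to fail with the x509 error described in this article.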

Resolution

To resolve this issue, the certificates must be refreshed. Refreshing the certificate involves setting the system date back to a time when the certificate was valid, starting the docker service, and then restoring the current date. Once this is done, a command to refresh the certificate can be given.

The commands below must be executed from the command line on the application server as the ops user. These commands roll back the system date so that the docker swarm can be restarted, reset the system date to the “real” date, and renew the certificate. Once these five commands have been successfully executed, the installation from the main dashboard panel (<application VM IP address>:8800) can continue.

These commands must be executed one line at a time as they will fail if you try to run them all at once:

$ REAL_DATE=$(date +%Y%m%d)
$ sudo date +%Y%m%d -s "20190604"
$ sudo systemctl restart docker
$ sudo date +%Y%m%d -s "$REAL_DATE"
$ docker swarm ca --rotate

If the last command completes without an error, the docker services have started correctly and you may begin using the device once all containers have initialized. In that case, there is no need to continue following this article.

In some situations, the docker swarm ca --rotate command returns the following error and the web pages are still not accessible:

Error response from daemon: This node is not a swarm manager. Use "docker swarm init" or "docker swarm join" to connect this node to swarm and try again.

This typically indicates that the certificate was renewed at some point, so the date set in the commands above is too far in the past and docker does not yet consider the certificate valid. This is seen most often when the appliance was operational and then powered off for a period of time. When this happens, you need to determine the valid start date of the certificate. To do this, run the following command:

$ sudo systemctl status docker -l

Sample Output:

● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2019-06-04 00:00:17 MDT; 1min 18s ago
     Docs: https://docs.docker.com
 Main PID: 6510 (dockerd)
    Tasks: 10
   Memory: 35.3M
   CGroup: /system.slice/docker.service
           └─6510 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

Jun 04 00:00:16 dhcp-192-0-2-100 dockerd[6510]: time="2019-06-04T00:00:16.588463952-06:00" level=warning msg="Your kernel does not support cgroup blkio weight_device"
Jun 04 00:00:16 dhcp-192-0-2-100 dockerd[6510]: time="2019-06-04T00:00:16.588817242-06:00" level=info msg="Loading containers: start."
Jun 04 00:00:17 dhcp-192-0-2-100 dockerd[6510]: time="2019-06-04T00:00:17.438744066-06:00" level=info msg="Default bridge (docker0) is assigned with an IP address 198.51.100.0/16. Daemon option --bip can be used to set a preferred IP address"
Jun 04 00:00:17 dhcp-192-0-2-100 dockerd[6510]: time="2019-06-04T00:00:17.694172165-06:00" level=info msg="Loading containers: done."
Jun 04 00:00:17 dhcp-192-0-2-100 dockerd[6510]: time="2019-06-04T00:00:17.721731922-06:00" level=info msg="Docker daemon" commit=e8ff056 graphdriver(s)=overlay2 version=18.09.5
Jun 04 00:00:17 dhcp-192-0-2-100 dockerd[6510]: time="2019-06-04T00:00:17.739777909-06:00" level=error msg="cluster exited with error: error while loading TLS certificate in /var/lib/docker/swarm/certificates/swarm-node.crt: certificate (1 - uu3zsh2qs34jkk60xasf0aorx) not valid before Tue, 30 Jul 2019 14:07:00 UTC, and it is currently Tue, 04 Jun 2019 00:00:17 MDT: x509: certificate has expired or is not yet valid"
Jun 04 00:00:17 dhcp-192-0-2-100 dockerd[6510]: time="2019-06-04T00:00:17.739825903-06:00" level=error msg="swarm component could not be started" error="error while loading TLS certificate in /var/lib/docker/swarm/certificates/swarm-node.crt: certificate (1 - uu3zsh2qs34jkk60xasf0aorx) not valid before Tue, 30 Jul 2019 14:07:00 UTC, and it is currently Tue, 04 Jun 2019 00:00:17 MDT: x509: certificate has expired or is not yet valid"
Jun 04 00:00:17 dhcp-192-0-2-100 dockerd[6510]: time="2019-06-04T00:00:17.739860347-06:00" level=info msg="Daemon has completed initialization"
Jun 04 00:00:17 dhcp-192-0-2-100 dockerd[6510]: time="2019-06-04T00:00:17.748871855-06:00" level=info msg="API listen on /var/run/docker.sock"
Jun 04 00:00:17 dhcp-192-0-2-100 systemd[1]: Started Docker Application Container Engine.

Note the date that follows “not valid before” in the error lines of your output, as in the sample output above. This is the date from which the current docker certificate is valid, and your system date must be later than this for the docker swarm to initialize. Take this date and add 1 day to it. In the example above the date is July 30, 2019, so for this example you would set the system date to July 31, 2019 (20190731) in the commands below.
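The date lookup and the one-day adjustment can also be scripted. The following is a minimal sketch, assuming GNU date, grep, and sed are available on the application server and that the error message has the same wording as the sample output above. The function names not_valid_before and next_day_ymd are hypothetical helpers, not part of Docker or Rally.

```shell
# Hypothetical helper: read text on stdin and print the first timestamp
# that follows "not valid before", e.g. "Tue, 30 Jul 2019 14:07:00 UTC".
not_valid_before() {
    grep -o 'not valid before [^,]*, [^,]*' | head -n 1 | sed 's/^not valid before //'
}

# Hypothetical helper: add 24 hours to the given timestamp and print the
# result as YYYYMMDD (evaluated in UTC), the format used by the date
# commands in this article. Fails if no timestamp is supplied.
next_day_ymd() {
    local start
    [ -n "$1" ] || return 1
    start=$(date -u -d "$1" +%s) || return 1
    date -u -d "@$((start + 86400))" +%Y%m%d
}
```

For example, next_day_ymd "$(sudo systemctl status docker -l | not_valid_before)" prints 20190731 for the sample output above. Check the printed value against your own log before using it in a date command.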

Repeat the date commands, substituting the date calculated from your docker status output for the rollback date. The REAL_DATE variable must still be set in your shell; if you have opened a new session, set it again first.

$ sudo date +%Y%m%d -s "20190731"
$ sudo systemctl restart docker
$ sudo date +%Y%m%d -s "$REAL_DATE"
$ docker swarm ca --rotate

At this point, the command should show the certificate rotation output and you should be able to log in to the system after all containers have initialized.

From this point forward the system will automatically renew its certificates, and the steps outlined above will not be needed again unless the same 2.0 system is shut down and not restarted for a prolonged period of time (generally greater than 60 days).
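Before trying to log in, the swarm state can be checked from the command line. The following is a minimal sketch, assuming the ops user can run docker commands without sudo, as in the commands above. The function names swarm_state and swarm_is_active are hypothetical helpers.

```shell
# Print the local swarm state reported by the Docker daemon,
# for example "active" or "inactive".
swarm_state() {
    docker info --format '{{.Swarm.LocalNodeState}}'
}

# Hypothetical helper: succeed only when the given state is "active".
swarm_is_active() {
    [ "$1" = "active" ]
}
```

For example, swarm_is_active "$(swarm_state)" && docker ps lists the running containers only when the node reports an active swarm.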

Additional Information

On occasion this issue may occur after each reboot. In those situations, you should verify that the swarm IP address is correct.
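One possible way to check the swarm address is to compare the address the node advertises to the swarm with the addresses configured on the appliance. The following is a minimal sketch of that interpretation, assuming hostname -I is available on the application server. The function names swarm_node_addr and addr_in_list are hypothetical helpers.

```shell
# Print the address this node advertises to the swarm.
swarm_node_addr() {
    docker info --format '{{.Swarm.NodeAddr}}'
}

# Hypothetical helper: succeed if address $1 appears in the
# whitespace-separated list $2 (for example the output of hostname -I).
addr_in_list() {
    local a
    for a in $2; do
        [ "$a" = "$1" ] && return 0
    done
    return 1
}
```

For example, addr_in_list "$(swarm_node_addr)" "$(hostname -I)" || echo "swarm address does not match a local interface" reports a mismatch, which may happen if the IP address of the appliance changed after the swarm was initialized.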