GenAI/AI Services installation fails in Tanzu Platform

Article ID: 423071


Products

VMware Tanzu Platform Core

Issue/Introduction

GenAI installation fails with an error connecting to the Postgres DB. The ai-server job fails on the controller VM with errors similar to the following:

cat ai-server.stdout.log | grep ERROR  | grep 'request timed out' | head -1
10:49:50.660 [tomcat-handler-34] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - HikariPool-1 - Connection is not available, request timed out after 30000ms (total=0, active=0, idle=0, waiting=0)

cat ai-server.stdout.log | grep ERROR  | grep 'request timed out' | tail -1
13:16:46.560 [tomcat-handler-278] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - HikariPool-1 - Connection is not available, request timed out after 30000ms (total=0, active=0, idle=0, waiting=0)
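To see how long the timeouts have been occurring, the two grep commands above can be combined into one summary. This is a minimal sketch, assuming the log format shown above, where the first whitespace-separated field is the timestamp; the function name summarize_timeouts is hypothetical and not part of the product.

```shell
# Summarize HikariPool "request timed out" errors in an ai-server log.
# Prints the number of matching lines plus the first and last timestamps.
# Assumes the timestamp is the first field of each log line.
summarize_timeouts() {
  log_file="$1"
  # "|| true" keeps the function usable under "set -e" when nothing matches
  matches=$(grep 'ERROR' "$log_file" | grep 'request timed out' || true)
  if [ -z "$matches" ]; then
    echo "count=0"
    return 0
  fi
  count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
  first=$(printf '%s\n' "$matches" | head -1 | awk '{print $1}')
  last=$(printf '%s\n' "$matches" | tail -1 | awk '{print $1}')
  echo "count=$count first=$first last=$last"
}
```

Usage: summarize_timeouts ai-server.stdout.log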

 

 

Cause

This can happen if there is a connectivity issue between the GenAI deployment network and the Postgres DB, or if the Postgres hostname fails to resolve through DNS from the controller VM. Use the steps below to troubleshoot and scope the issue.

1) Using the cf CLI, target the GenAI space and list its services:

cf services

Note the service instance (SI) name, then fetch its service key:

cf service-key <SI-name> <SI-Key>

Replace <SI-name> and <SI-Key> with the values from your environment.

2) The output contains the DB credentials, similar to the following:

{
  "credentials": {
    "db": "postgres",
    "hosts": [
      "q-s0.postgres-instance.infra.service-instance-04d4ba53-7740-40a7-b402-1a4bb41da825.bosh"
    ],
    "jdbcUrl": "jdbc:postgresql://q-s0.postgres-instance.infra.service-instance-04d4ba53-7740-40a7-b402-1a4bb41da825.bosh:5432/postgres?user=pgadmin&password=5zRHRAX0QJDsLJGawOhu80vR3fTnAs",
    "password": "5zRHRAX0QJDsLJGawOhu80vR3fTnAs",
    "port": 5432,
    "primary_host": "04d4ba53-7740-40a7-b402-1a4bb41da825.postgres.service.internal",
    "uri": "postgresql://pgadmin:5zRHRAX0QJDsLJGawOhu80vR3fTnAs@q-s0.postgres-instance.infra.service-instance-04d4ba53-7740-40a7-b402-1a4bb41da825.bosh:5432/postgres",
    "user": "pgadmin"
  }
}
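The host and port needed for the connectivity check can be pulled out of this output without copying them by hand. The following is a rough sketch, assuming the JSON layout shown above (a "hosts" array and a numeric "port" field); extract_db_endpoint is a hypothetical helper name, and it uses only sed and tr so that it does not depend on a JSON tool being installed.

```shell
# Print "<host> <port>" from service-key JSON read on stdin.
# Assumes the layout shown above: a "hosts" array and a numeric "port".
extract_db_endpoint() {
  json=$(cat)
  # Join all lines, then capture the first quoted entry of the "hosts" array
  host=$(printf '%s\n' "$json" | tr -d '\n' |
    sed -n 's/.*"hosts"[[:space:]]*:[[:space:]]*\[[[:space:]]*"\([^"]*\)".*/\1/p')
  # Capture the digits that follow the "port" key
  port=$(printf '%s\n' "$json" |
    sed -n 's/.*"port"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -1)
  echo "$host $port"
}
```

Usage: cf service-key <SI-name> <SI-Key> | extract_db_endpoint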

3) SSH to the controller VM in the GenAI deployment using the BOSH CLI, then run the following command to check network connectivity:

nc -vz q-s0.postgres-instance.infra.service-instance-04d4ba53-7740-40a7-b402-1a4bb41da825.bosh 5432

Replace q-s0.postgres-instance.infra.service-instance-04d4ba53-7740-40a7-b402-1a4bb41da825.bosh, the host used in this example, with the host from your service key.
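Because the nc check fails both when the hostname does not resolve and when the port is unreachable, it can help to test the two layers separately. The sketch below is one possible way to do that, assuming getent and nc are available on the controller VM; check_db_path is a hypothetical helper name, and the 5 second timeout is an arbitrary choice.

```shell
# Report which layer fails when reaching the Postgres host.
# Prints "dns" if the hostname does not resolve, "tcp" if it resolves
# but the port is unreachable, and "ok" otherwise.
check_db_path() {
  host="$1"
  port="$2"
  if ! getent hosts "$host" >/dev/null 2>&1; then
    echo "dns"   # hostname did not resolve from this VM
  elif ! nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
    echo "tcp"   # resolved, but no TCP connection within 5 seconds
  else
    echo "ok"
  fi
}
```

Usage: check_db_path <postgres-host> 5432

A result of "dns" points toward BOSH DNS, while "tcp" points toward firewalls or routing between the two networks.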

Resolution

If it is a connectivity issue, allow traffic from the GenAI deployment network to the Postgres host and port in your firewalls or routers. If Postgres is also deployed as a tile and DNS resolution of its hostname fails, check BOSH DNS.

If it is a DNS resolution issue, verify that the BOSH DNS certificates have not expired. If a certificate rotation has been performed, confirm that the new certificates have been pushed to all the VMs, then rerun Apply Changes on the GenAI tile.
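Certificate expiry can be checked with openssl once you have a certificate file in PEM format. This is a minimal sketch, assuming openssl is installed where the check runs; cert_status is a hypothetical helper name, and the location of the BOSH DNS certificates is not covered by this article, so the path below is a placeholder.

```shell
# Report whether a PEM certificate file is still within its validity period.
# Prints "valid", "expired", or "unreadable" (missing file or not a certificate).
cert_status() {
  cert_file="$1"
  if [ ! -r "$cert_file" ] || ! openssl x509 -noout -in "$cert_file" >/dev/null 2>&1; then
    echo "unreadable"
  elif openssl x509 -checkend 0 -noout -in "$cert_file" >/dev/null 2>&1; then
    echo "valid"     # -checkend 0 succeeds when the certificate has not expired yet
  else
    echo "expired"
  fi
}
```

Usage: cert_status <path-to-certificate.pem>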