Cloud Controller Workers fail to start during a PCF Upgrade with error "Applied migration files not in file system"

Article ID: 297537

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

Symptoms:
You see the following error while upgrading the PAS tile:
Sequel::Migrator::Error: Applied migration files not in file system:
20170920143711_create_service_instance_shares.rb, 
20171013223336_remove_deprecated_v1_service_fields.rb, 
20171103163351_add_name_to_service_bindings.rb, 
20171106202032_change_processes_with_health_check_timeout_0_to_nil.rb, 
20171120214253_remove_buildpack_receipt_stack_name_from_droplets.rb, 
20171123191651_add_index_to_service_binding_on_name.rb, 
20171220183100_add_encryption_key_label_column_to_tables_with_encrypted_columns.rb, 
20180125181819_add_internal_to_domains.rb

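The error lists every migration that has already been applied to the CCDB (the Cloud Controller database) but is missing from the file system of the VM that fails to start.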
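
When comparing the error against the migrations shipped in a given release, it can help to pull the file names out of the log text. The following is a minimal sketch, assuming the error text has been copied into a string; the helper name `missing_migrations` is hypothetical and not part of any PAS or BOSH tooling.

```python
import re

def missing_migrations(error_text):
    """Return (timestamp, description) pairs for each migration file in the error.

    Assumes files are named <14-digit timestamp>_<description>.rb,
    as they appear in the error message above.
    """
    return re.findall(r"(\d{14})_(\w+)\.rb", error_text)

# Shortened copy of the error from this article.
error = """Sequel::Migrator::Error: Applied migration files not in file system:
20170920143711_create_service_instance_shares.rb,
20180125181819_add_internal_to_domains.rb"""

for timestamp, description in missing_migrations(error):
    print(timestamp, description)
```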

Environment


Cause

1. During the deployment, the Cloud Controller API VMs (cloud_controller in PAS) deployed successfully and ran the database migrations that update the CCDB schema.
2. The Cloud Controller Worker VMs deploy after the API VMs and can be queued behind other instance groups, such as the routers, so a considerable amount of time can pass before they deploy.
3. The deployment then failed. One possible reason is that the BOSH agent was unavailable on a worker VM.
4. The user ran a bosh recreate on the Cloud Controller Worker VMs to bring them back on new VMs with healthy agents.
5. A bosh recreate rebuilds the VMs from the last *successful* deployment, which in this case is the previous version of PAS with the previous Cloud Controller code.
6. Before starting, the Cloud Controller code (whether it runs in the api job, worker job, clock job, etc.) checks that it is consistent with the schema of the CCDB.
7. Because the CCDB had already been updated when the API VMs deployed successfully, the older Cloud Controller code on the recreated workers did not recognize the newer migrations that had been applied to the CCDB, and it failed to start with the error above.
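
The consistency check in steps 6 and 7 can be modeled as a set difference between the migrations recorded in the database and the migration files shipped with the code. The following is a simplified Python illustration of that idea, not the actual Sequel (Ruby) implementation. The names `MigratorError` and `check_schema_consistency` and the older migration file name are hypothetical.

```python
class MigratorError(Exception):
    """Stand-in for Sequel::Migrator::Error in this illustration."""

def check_schema_consistency(applied_in_db, files_on_disk):
    """Raise if the database records migrations that the code does not ship."""
    missing = sorted(set(applied_in_db) - set(files_on_disk))
    if missing:
        raise MigratorError(
            "Applied migration files not in file system: " + ", ".join(missing)
        )

# Hypothetical migration shipped with the previous PAS version.
old_release_files = ["20170101000000_hypothetical_older_migration.rb"]
# The new release adds a migration, which the API VMs already applied to the CCDB.
new_release_files = old_release_files + ["20180125181819_add_internal_to_domains.rb"]
applied_in_ccdb = list(new_release_files)

# A worker recreated with the old code fails the check.
try:
    check_schema_consistency(applied_in_ccdb, old_release_files)
except MigratorError as err:
    print(err)

# A worker deployed with the new code passes the check.
check_schema_consistency(applied_in_ccdb, new_release_files)
```

In this model, the old code fails because the database is ahead of its files, which is why recreating the workers from the previous deployment cannot succeed.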

Resolution

In this case, the only option to bring the workers back online is to deploy them with the *new* code.

To do this, disable the BOSH resurrector and manually delete the worker VMs, which leaves them temporarily missing. Then click "Apply Changes". BOSH detects the missing VMs and deploys them with the new Cloud Controller code from the PAS tile version that the upgrade was targeting.

With the new code, the workers are consistent with the CCDB schema and the deployment should succeed.
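
The resolution steps can be written out as an ordered plan. The sketch below only builds and prints the steps; it does not run anything. It assumes the BOSH CLI v2 commands `bosh update-resurrection` and `bosh delete-vm` are used for the first two steps, although the resurrector can also be toggled from Ops Manager. The deployment name and VM CIDs are hypothetical placeholders, and re-enabling the resurrector at the end is an assumption that is not stated in this article.

```python
def recovery_plan(deployment, worker_vm_cids):
    """Return the ordered recovery steps as strings; nothing is executed."""
    steps = ["bosh update-resurrection off"]
    # Delete each stuck worker VM so BOSH sees it as missing.
    steps += [f"bosh -d {deployment} delete-vm {cid}" for cid in worker_vm_cids]
    # Redeploy the missing VMs with the new Cloud Controller code.
    steps.append('Ops Manager: click "Apply Changes"')
    # Assumption: restore the resurrector if it was enabled before.
    steps.append("bosh update-resurrection on")
    return steps

# Hypothetical deployment name and VM CIDs, for illustration only.
for step in recovery_plan("cf-example", ["vm-cid-1", "vm-cid-2"]):
    print(step)
```

The real deployment name and VM CIDs can be read from the output of `bosh deployments` and `bosh -d <deployment> vms`.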