During an upgrade of the NSX-T Managers from 2.x.x to 3.x.x, the data migration step (Step 7, run_migration_tool in the upgrade status) fails. This issue can occur when migrating from earlier versions to later versions.
The logs were as follows:
Upgrade steps:
download_os [YYYY-MM-DD 05:25:04 - YYYY-MM-DD 05:25:39] SUCCESS
shutdown_manager [YYYY-MM-DD 05:25:47 - YYYY-MM-DD 05:27:32] SUCCESS
install_os [YYYY-MM-DD 05:27:32 - YYYY-MM-DD 05:29:28] SUCCESS
migrate_manager_config [YYYY-MM-DD 05:29:28 - YYYY-MM-DD 05:29:33] SUCCESS
switch_os [YYYY-MM-DD 05:29:33 - YYYY-MM-DD 05:29:38] SUCCESS
reboot [YYYY-MM-DD 05:29:38 - YYYY-MM-DD 05:30:22] SUCCESS
run_migration_tool [YYYY-MM-DD 05:31:27 - ] FAILED
run_migration_tool [YYYY-MM-DD 05:51:30 - ] FAILED
run_migration_tool [YYYY-MM-DD 06:07:53 - ] FAILED
YYYY-MM-DDT07:16:55.646Z ERROR main ObjectsView 4506 TXEnd[TX[f69d]] server quota exceeded org.corfudb.runtime.exceptions.QuotaExceededException: Disk usage has exceeded the quota set, system is now in read-only mode. Quota of 7769284608 bytes
YYYY-MM-DDT07:16:55.646Z WARN main CorfuCompileProxy 4506 TXExecute[CorfuTable[f34]] Abort with exception org.corfudb.runtime.exceptions.TransactionAbortedException: TX ABORT | Snapshot Time = Token(epoch=267, sequence=3981819198) | Failed Transaction ID = <UUID> | Offending Address = -1 | Conflict Key = 00 | Conflict Stream = 00000000-0000-0000-0000-000000000000 | Cause = QUOTA_EXCEEDED | Time = 10049 ms | Message = Disk usage has exceeded the quota set, system is now in read-only mode. Quota of 7769284608 bytes
YYYY-MM-DDT07:17:06.683Z WARN main ChainReplicationProtocol 4506 fillHole[Token(epoch=267, sequence=3981819199)]: chain head 1/1
YYYY-MM-DDT07:17:06.690Z WARN netty-16 BaseHandler 4506 Server threw exception for request 3396
java.util.concurrent.CompletionException: org.corfudb.runtime.exceptions.QuotaExceededException: Disk usage has exceeded the quota set, system is now in read-only mode. Quota of 7769284608 bytes
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_301]
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_301]
at java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:714) ~[?:1.8.0_301]
at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:701) ~[?:1.8.0_301]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_301]
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_301]
at org.corfudb.infrastructure.BatchProcessor.processor(BatchProcessor.java:165) ~[data-migration-fs.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_301]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_301]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_301]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_301]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_301]
Caused by: org.corfudb.runtime.exceptions.QuotaExceededException: Disk usage has exceeded the quota set, system is now in read-only mode. Quota of 7769284608 bytes
at org.corfudb.infrastructure.BatchProcessor.processor(BatchProcessor.java:167) ~[data-migration-fs.jar:?]
... 5 more
YYYY-MM-DDT07:17:06.690Z WARN main AddressSpaceView 4506 write: write failed
org.corfudb.runtime.exceptions.QuotaExceededException: Disk usage has exceeded the quota set, system is now in read-only mode. Quota of 7769284608 bytes
at org.corfudb.infrastructure.BatchProcessor.processor(BatchProcessor.java:167) ~[data-migration-fs.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_301]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_301]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_301]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_301]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_301]
YYYY-MM-DDT07:17:06.691Z ERROR main ObjectsView 4506 TXEnd[TX[c52b]] server quota exceeded org.corfudb.runtime.exceptions.QuotaExceededException: Disk usage has exceeded the quota set, system is now in read-only mode. Quota of 7769284608 bytes
YYYY-MM-DDT07:17:06.691Z WARN main CorfuCompileProxy 4506 TXExecute[CorfuTable[f34]] Abort with exception org.corfudb.runtime.exceptions.TransactionAbortedException: TX ABORT | Snapshot Time = Token(epoch=267, sequence=3981819199) | Failed Transaction ID = 5793114b-b895-499c-9f50-4220be1cc52b | Offending Address = -1 | Conflict Key = 00 | Conflict Stream = 00000000-0000-0000-0000-000000000000 | Cause = QUOTA_EXCEEDED | Time = 10045 ms | Message = Disk usage has exceeded the quota set, system is now in read-only mode. Quota of 7769284608 bytes
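To confirm that a failed run_migration_tool attempt hit this Corfu quota, and not some other error, the log lines can be searched for QuotaExceededException and the reported quota extracted. The Python sketch below does that. The function name find_quota_exceeded is hypothetical, and the regular expression only assumes the message format shown in the log excerpt above. Because any iterable of strings works as input, an open log file can be passed directly. This article does not say where the migration log is stored, so no path is assumed.

```python
import re
from typing import Iterable, Optional

# Matches the Corfu error text seen in the migration log, e.g.
# "...QuotaExceededException: Disk usage has exceeded ... Quota of 7769284608 bytes"
QUOTA_RE = re.compile(r"QuotaExceededException.*?Quota of (\d+) bytes")


def find_quota_exceeded(lines: Iterable[str]) -> Optional[int]:
    """Return the quota in bytes from the first QuotaExceededException line, else None."""
    for line in lines:
        match = QUOTA_RE.search(line)
        if match:
            return int(match.group(1))
    return None


if __name__ == "__main__":
    # Two lines copied from the log excerpt in this article.
    sample = [
        "YYYY-MM-DDT07:17:06.690Z WARN main AddressSpaceView 4506 write: write failed",
        "org.corfudb.runtime.exceptions.QuotaExceededException: Disk usage has exceeded "
        "the quota set, system is now in read-only mode. Quota of 7769284608 bytes",
    ]
    quota = find_quota_exceeded(sample)
    print(f"{quota} bytes = {quota / 2**30:.2f} GiB")  # 7769284608 bytes = 7.24 GiB
```

The reported quota of 7769284608 bytes is about 7.24 GiB. The /config partition in the df output already holds about 13G, which is far above that quota.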
root@nsx1:/# df -h
Filesystem Size Used Avail Use% Mounted on
udev 24G 0 24G 0% /dev
tmpfs 4.8G 7.3M 4.8G 1% /run
/dev/sda2 11G 6.1G 3.7G 63% /
tmpfs 24G 572K 24G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 24G 0 24G 0% /sys/fs/cgroup
/dev/sda3 11G 7.3G 2.4G 76% /os_bak
/dev/sda1 930M 8.3M 857M 1% /boot
/dev/mapper/nsx-config__bak 29G 13G 15G 46% /config_bak
/dev/mapper/nsx-image 42G 16G 24G 40% /image
/dev/mapper/nsx-repository 31G 13G 17G 44% /repository
/dev/mapper/nsx-var+log 27G 9.5G 16G 38% /var/log
/dev/mapper/nsx-tmp 3.7G 8.7M 3.5G 1% /tmp
/dev/mapper/nsx-secondary 98G 61M 93G 1% /nonconfig
/dev/mapper/nsx-config 29G 13G 16G 46% /config <<< usage above 25% on /config causes the data migration step to fail
/dev/mapper/nsx-var+dump 9.2G 22M 8.6G 1% /var/dump
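Before starting the upgrade, the /config usage can be compared against the 25% threshold. The Python sketch below parses df -h output and flags /config when its Use% exceeds 25%. The helpers use_percent and read_df and the constant THRESHOLD_PCT are hypothetical and not part of NSX-T. One caveat: df computes Use% from used and available space, so it only approximates how Corfu measures its quota. Treat the result as an early warning, not an exact prediction.

```python
import subprocess
from typing import Optional

# Threshold taken from this article: after the new version is installed,
# log-size-quota-percentage is 25% for /config.
THRESHOLD_PCT = 25


def use_percent(df_output: str, mount_point: str) -> Optional[int]:
    """Return the Use% column for mount_point from df -h output, or None if not listed."""
    for line in df_output.splitlines():
        fields = line.split()
        # Expected columns: Filesystem Size Used Avail Use% Mounted-on; extra text is ignored.
        if len(fields) >= 6 and fields[5] == mount_point and fields[4].endswith("%"):
            return int(fields[4].rstrip("%"))
    return None


def read_df(mount_point: str = "/config") -> str:
    """Run df -h <mount_point> on the local machine and return its text output."""
    return subprocess.run(["df", "-h", mount_point], capture_output=True,
                          text=True, check=True).stdout


if __name__ == "__main__":
    # Two rows copied from the df -h output above; on a manager, use read_df() instead.
    sample = (
        "Filesystem Size Used Avail Use% Mounted on\n"
        "/dev/mapper/nsx-config 29G 13G 16G 46% /config\n"
    )
    pct = use_percent(sample, "/config")
    if pct is not None and pct > THRESHOLD_PCT:
        print(f"/config is {pct}% used, above {THRESHOLD_PCT}%: data migration is likely to fail")
```

With the sample rows, the script reports /config at 46%, which is above the 25% threshold.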
In earlier versions of Corfu, data was not compressed when it was persisted. The disk usage threshold for the /config partition was set to 69% through the log-size-quota-percentage parameter. When usage of the /config partition exceeds 69% of its allocated space, Corfu goes into read-only mode. Below that threshold, Corfu works as expected, so storing uncompressed data was not a problem in earlier versions. In later versions, however, data is compressed when it is persisted in Corfu, so the threshold for the /config partition was lowered to 25%.
During the data migration step, the log-size-quota-percentage parameter is already set to 25% because the new version has already been installed. The existing data, however, has not yet been migrated to the new version, so it still exceeds the 25% threshold and Corfu switches to read-only mode. In the output above, /config is 46% used, well above 25%. As a result, the upgrade fails at this step.
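The figures in the logs match a quota of 25% of the /config partition size. Four times 7769284608 bytes is 31077138432 bytes, or about 28.9 GiB, which df -h rounds to 29G. The Python sketch below models that relationship. It assumes, without confirmation from this article, that the quota is simply log-size-quota-percentage multiplied by the partition size. It then checks the roughly 13 GiB in use against both the old 69% threshold and the new 25% threshold. The names quota_bytes and would_be_read_only are hypothetical.

```python
GIB = 2**30


def quota_bytes(partition_bytes: int, quota_pct: float) -> int:
    """Quota as a percentage of partition size (assumed model of log-size-quota-percentage)."""
    return int(partition_bytes * quota_pct / 100)


def would_be_read_only(used_bytes: int, partition_bytes: int, quota_pct: float) -> bool:
    """True if usage exceeds the quota, i.e. Corfu would switch to read-only mode."""
    return used_bytes > quota_bytes(partition_bytes, quota_pct)


if __name__ == "__main__":
    # The log reports a quota of 7769284608 bytes at 25%, which implies a
    # /config size of 4 * 7769284608 bytes (about 28.9 GiB, shown as 29G by df -h).
    logged_quota = 7769284608
    partition = logged_quota * 100 // 25
    used = 13 * GIB  # approximate, from "13G" in the df -h output

    for pct in (69, 25):
        q = quota_bytes(partition, pct)
        print(f"{pct}%: quota {q / GIB:.1f} GiB, read-only: {would_be_read_only(used, partition, pct)}")
```

This prints "69%: quota 20.0 GiB, read-only: False" followed by "25%: quota 7.2 GiB, read-only: True". The same uncompressed data fits comfortably under the old 69% quota but exceeds the new 25% quota.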