Corfu read-only mode in data migration - upgrade from 2.x.x to 3.x.x

Article ID: 345784

Products

VMware NSX Networking

Issue/Introduction

Symptoms:

During an upgrade of the NSX-T Managers from 2.x.x to 3.x.x, the data_migration step (Step 7) of the upgrade process fails. This issue can occur when migrating from an earlier version to a later version.

The upgrade steps show the following:

Upgrade steps:
download_os [YYYY-MM-DD 05:25:04 - YYYY-MM-DD 05:25:39] SUCCESS
shutdown_manager [YYYY-MM-DD 05:25:47 - YYYY-MM-DD 05:27:32] SUCCESS
install_os [YYYY-MM-DD 05:27:32 - YYYY-MM-DD 05:29:28] SUCCESS
migrate_manager_config [YYYY-MM-DD 05:29:28 - YYYY-MM-DD 05:29:33] SUCCESS
switch_os [YYYY-MM-DD 05:29:33 - YYYY-MM-DD 05:29:38] SUCCESS
reboot [YYYY-MM-DD 05:29:38 - YYYY-MM-DD 05:30:22] SUCCESS
run_migration_tool [YYYY-MM-DD 05:31:27 - ] FAILED
run_migration_tool [YYYY-MM-DD 05:51:30 - ] FAILED
run_migration_tool [YYYY-MM-DD 06:07:53 - ] FAILED

Logs from /var/log/proton/data-migration.log

YYYY-MM-DDT07:16:55.646Z ERROR main ObjectsView 4506 TXEnd[TX[f69d]] server quota exceeded org.corfudb.runtime.exceptions.QuotaExceededException: Disk usage has exceeded the quota set, system is now in read-only mode. Quota of 7769284608 bytes
YYYY-MM-DDT07:16:55.646Z WARN main CorfuCompileProxy 4506 TXExecute[CorfuTable[f34]] Abort with exception org.corfudb.runtime.exceptions.TransactionAbortedException: TX ABORT | Snapshot Time = Token(epoch=267, sequence=3981819198) | Failed Transaction ID = <UUID> | Offending Address = -1 | Conflict Key = 00 | Conflict Stream = 00000000-0000-0000-0000-000000000000 | Cause = QUOTA_EXCEEDED | Time = 10049 ms | Message = Disk usage has exceeded the quota set, system is now in read-only mode. Quota of 7769284608 bytes
YYYY-MM-DDT07:17:06.683Z WARN main ChainReplicationProtocol 4506 fillHole[Token(epoch=267, sequence=3981819199)]: chain head 1/1
YYYY-MM-DDT07:17:06.690Z WARN netty-16 BaseHandler 4506 Server threw exception for request 3396
java.util.concurrent.CompletionException: org.corfudb.runtime.exceptions.QuotaExceededException: Disk usage has exceeded the quota set, system is now in read-only mode. Quota of 7769284608 bytes
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_301]
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_301]
    at java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:714) ~[?:1.8.0_301]
    at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:701) ~[?:1.8.0_301]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_301]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_301]
    at org.corfudb.infrastructure.BatchProcessor.processor(BatchProcessor.java:165) ~[data-migration-fs.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_301]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_301]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_301]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_301]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_301]
Caused by: org.corfudb.runtime.exceptions.QuotaExceededException: Disk usage has exceeded the quota set, system is now in read-only mode. Quota of 7769284608 bytes
    at org.corfudb.infrastructure.BatchProcessor.processor(BatchProcessor.java:167) ~[data-migration-fs.jar:?]
    ... 5 more
YYYY-MM-DDT07:17:06.690Z WARN main AddressSpaceView 4506 write: write failed
org.corfudb.runtime.exceptions.QuotaExceededException: Disk usage has exceeded the quota set, system is now in read-only mode. Quota of 7769284608 bytes
    at org.corfudb.infrastructure.BatchProcessor.processor(BatchProcessor.java:167) ~[data-migration-fs.jar:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_301]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_301]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_301]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_301]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_301]
YYYY-MM-DDT07:17:06.691Z ERROR main ObjectsView 4506 TXEnd[TX[c52b]] server quota exceeded org.corfudb.runtime.exceptions.QuotaExceededException: Disk usage has exceeded the quota set, system is now in read-only mode. Quota of 7769284608 bytes
YYYY-MM-DDT07:17:06.691Z WARN main CorfuCompileProxy 4506 TXExecute[CorfuTable[f34]] Abort with exception org.corfudb.runtime.exceptions.TransactionAbortedException: TX ABORT | Snapshot Time = Token(epoch=267, sequence=3981819199) | Failed Transaction ID = 5793114b-b895-499c-9f50-4220be1cc52b | Offending Address = -1 | Conflict Key = 00 | Conflict Stream = 00000000-0000-0000-0000-000000000000 | Cause = QUOTA_EXCEEDED | Time = 10045 ms | Message = Disk usage has exceeded the quota set, system is now in read-only mode. Quota of 7769284608 bytes

root@nsx1:/# df -h
Filesystem                   Size  Used Avail Use% Mounted on
udev                          24G     0   24G   0% /dev
tmpfs                        4.8G  7.3M  4.8G   1% /run
/dev/sda2                     11G  6.1G  3.7G  63% /
tmpfs                         24G  572K   24G   1% /dev/shm
tmpfs                        5.0M     0  5.0M   0% /run/lock
tmpfs                         24G     0   24G   0% /sys/fs/cgroup
/dev/sda3                     11G  7.3G  2.4G  76% /os_bak
/dev/sda1                    930M  8.3M  857M   1% /boot
/dev/mapper/nsx-config__bak   29G   13G   15G  46% /config_bak
/dev/mapper/nsx-image         42G   16G   24G  40% /image
/dev/mapper/nsx-repository    31G   13G   17G  44% /repository
/dev/mapper/nsx-var+log       27G  9.5G   16G  38% /var/log
/dev/mapper/nsx-tmp          3.7G  8.7M  3.5G   1% /tmp
/dev/mapper/nsx-secondary     98G   61M   93G   1% /nonconfig
/dev/mapper/nsx-config        29G   13G   16G  46% /config <<< anything greater than 25% will cause the data migration step to fail
/dev/mapper/nsx-var+dump     9.2G   22M  8.6G   1% /var/dump
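
The check described by the annotation on the /config row can be sketched in Python. This is a minimal illustration, not a VMware tool: the function name config_usage_percent and the embedded sample text are assumptions made for the example, and the 25% figure is the threshold stated in this article.

```python
def config_usage_percent(df_output: str, mount_point: str = "/config") -> int:
    """Return the Use% value for mount_point from `df -h` style output."""
    for line in df_output.splitlines():
        fields = line.split()
        # A data row has at least six columns:
        # filesystem, size, used, avail, use%, mount point.
        if len(fields) >= 6 and fields[5] == mount_point and fields[4].endswith("%"):
            return int(fields[4].rstrip("%"))
    raise ValueError(f"mount point {mount_point!r} not found")


# Rows copied from the df output in this article.
SAMPLE = """\
Filesystem                   Size  Used Avail Use% Mounted on
/dev/mapper/nsx-config__bak   29G   13G   15G  46% /config_bak
/dev/mapper/nsx-config        29G   13G   16G  46% /config
"""

QUOTA_PERCENT = 25  # threshold in effect after the 3.x.x install, per this article

usage = config_usage_percent(SAMPLE)
print(f"/config at {usage}%, above quota: {usage > QUOTA_PERCENT}")
```

The mount point is matched exactly, so the /config_bak row is not mistaken for /config.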


Cause

In earlier versions of Corfu, data was not compressed when it was persisted. The disk usage threshold for the /config partition was set to 69% through the log-size-quota-percentage parameter: when usage of the /config partition exceeds 69% of its allocated space, Corfu goes into read-only mode. Below that threshold Corfu works as expected, so storing uncompressed data was not a problem in earlier versions. In later versions, data is compressed when it is persisted in Corfu, and the threshold for the /config partition was therefore reduced to 25%.

During the data_migration step, the log-size-quota-percentage parameter is already set to 25% because the new version has been installed. However, the data is unchanged and has not yet been migrated to the new version, so usage exceeds the threshold and Corfu goes into read-only mode. The upgrade therefore fails at this step.
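
The numbers in this article can be checked with a short Python calculation. It assumes the quota is simply the configured percentage of the partition's total size. The article does not state this, but the logged quota of 7769284608 bytes is consistent with 25% of a partition of about 29G. The function names and the 13 GiB "used" figure (df -h reports a rounded 13G) are illustrative.

```python
GIB = 1024 ** 3

QUOTA_BYTES = 7_769_284_608                 # "Quota of 7769284608 bytes" from data-migration.log
PARTITION_BYTES = QUOTA_BYTES * 100 // 25   # assumption: quota = 25% of the partition size
USED_BYTES = 13 * GIB                       # df -h shows 13G used on /config (rounded)


def quota_bytes(partition_bytes: int, percent: int) -> int:
    """Quota in bytes for a given log-size-quota-percentage value."""
    return partition_bytes * percent // 100


def is_read_only(used_bytes: int, partition_bytes: int, percent: int) -> bool:
    """True when usage exceeds the quota, the condition for read-only mode."""
    return used_bytes > quota_bytes(partition_bytes, percent)


print(f"implied partition size: {PARTITION_BYTES / GIB:.2f} GiB")
for percent in (69, 25):
    limit = quota_bytes(PARTITION_BYTES, percent) / GIB
    state = "read-only" if is_read_only(USED_BYTES, PARTITION_BYTES, percent) else "writable"
    print(f"quota at {percent}%: {limit:.2f} GiB -> {state}")
```

Under these assumptions the same 13G of data stays under the 69% quota but exceeds the 25% quota, which is the situation described above.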

Identifying the bug

  1. Check whether the logs contain lines related to 'QuotaExceededException' and Corfu read-only mode, like the log lines shown above.
  2. Check whether /config partition usage before the upgrade was below 69%.
  3. Check whether /config partition usage at the time of the error is above 25%.
  4. Check whether the value of the log-size-quota-percentage parameter in the /usr/tanuki/conf/corfu-server-wrapper.conf file is lower than the current /config partition usage.
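
The four checks above can be combined into one small Python sketch. Several parts are assumptions. The article does not show how log-size-quota-percentage appears inside corfu-server-wrapper.conf, so the sample line and the regular expression that reads it are hypothetical. The function names are illustrative, and the usage percentages are passed in by hand rather than measured.

```python
import re

# Matches the exception text seen in /var/log/proton/data-migration.log.
QUOTA_RE = re.compile(r"QuotaExceededException: .*?Quota of (\d+) bytes")

# Assumption: the parameter name is followed by "=" or whitespace and an integer.
PERCENT_RE = re.compile(r"log-size-quota-percentage[=\s]+(\d+)")


def quota_errors(log_text: str) -> list:
    """Return the quota (in bytes) from every QuotaExceededException line."""
    return [int(m.group(1)) for m in QUOTA_RE.finditer(log_text)]


def configured_percent(conf_text: str):
    """Return the configured quota percentage, or None if it is not found."""
    match = PERCENT_RE.search(conf_text)
    return int(match.group(1)) if match else None


def matches_issue(log_text: str, conf_text: str,
                  usage_before: int, usage_now: int) -> bool:
    """Apply checks 1 to 4 from the list above."""
    percent = configured_percent(conf_text)
    return (
        bool(quota_errors(log_text))   # 1. quota errors are logged
        and usage_before < 69          # 2. below the old threshold before the upgrade
        and usage_now > 25             # 3. above the new threshold at failure time
        and percent is not None
        and percent < usage_now        # 4. configured quota below current usage
    )


SAMPLE_LOG = (
    "org.corfudb.runtime.exceptions.QuotaExceededException: Disk usage has "
    "exceeded the quota set, system is now in read-only mode. "
    "Quota of 7769284608 bytes\n"
)
SAMPLE_CONF = "wrapper.app.parameter.9=--log-size-quota-percentage=25\n"  # hypothetical line

print(matches_issue(SAMPLE_LOG, SAMPLE_CONF, usage_before=46, usage_now=46))
```

On a real system the log and configuration text would be read from the files named above, and the usage values taken from df output recorded before and after the failure.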

Resolution

This issue is resolved with upgrades from 3.x.x onward.

Workaround:
To work around this issue, contact VMware Support.