
Copy cluster getting rsync errors


Article ID: 237821


Updated On:

Products

CA Performance Management - Usage and Administration

Issue/Introduction

The copycluster task was failing with the following rsync errors in the vbr log:

  VbrError: On host 10.73.87.137: Error accessing remote storage: failed accessing remote storage on hostname009: rsync: read errors mapping "/data/drdata/v_drdata_node0002_data/169/02f4d452012bc54adfb7d8d989929a2900b000060592f871_0.gt": Input/output error (5)
  rsync: read errors mapping "/CA/data/drdata/v_drdata_node0002_data/673/02f4d452012bc54adfb7d8d989929a2900b0000605b2abb1_0.gt": Input/output error (5)
  rsync: read errors mapping "/CA/data/drdata/v_drdata_node0002_data/743/02f4d452012bc54adfb7d8d989929a2900b0000605b261bf_0.gt": Input/output error (5)
  rsync: read errors mapping "/CA/data/drdata/v_drdata_node0002_data/753/02f4d452012bc54adfb7d8d989929a2900b0000605b261c9_0.gt": Input/output error (5)
  rsync: read errors mapping "/CA/data/drdata/v_drdata_node0002_data/169/02f4d452012bc54adfb7d8d989929a2900b000060592f871_0.gt": Input/output error (5)
  rsync: read errors mapping "/CA/data/drdata/v_drdata_node0002_data/673/02f4d452012bc54adfb7d8d989929a2900b0000605b2abb1_0.gt": Input/output error (5)
  rsync: read errors mapping "/CA/data/drdata/v_drdata_node0002_data/743/02f4d452012bc54adfb7d8d989929a2900b0000605b261bf_0.gt": Input/output error (5)
  rsync: read errors mapping "/CA/data/drdata/v_drdata_node0002_data/753/02f4d452012bc54adfb7d8d989929a2900b0000605b261c9_0.gt": Input/output error (5)
  ERROR: 169/02f4d452012bc54adfb7d8d989929a2900b000060592f871_0.gt failed verification -- update discarded.
  ERROR: 673/02f4d452012bc54adfb7d8d989929a2900b0000605b2abb1_0.gt failed verification -- update discarded.
  ERROR: 743/02f4d452012bc54adfb7d8d989929a2900b0000605b261bf_0.gt failed verification -- update discarded.
  ERROR: 753/02f4d452012bc54adfb7d8d989929a2900b0000605b261c9_0.gt failed verification -- update discarded.
  rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1042) [sender=3.0.7]
  : returncode=23

Cause

Checked the /var/log/messages on the source system 

Mar 19 22:35:38 txanunxlipcp004 kernel: end_request: I/O error, dev sdb, sector 468409304
Mar 19 22:36:27 txanunxlipcp004 kernel: megaraid_sas 0000:03:00.0: 1251514 (701058948s/0x0002/FATAL) - Unrecoverable medium error during recovery on PD 02(e0x20/s2) at f8b3371
Mar 19 22:36:37 txanunxlipcp004 kernel: megaraid_sas 0000:03:00.0: 1251520 (701058959s/0x0002/FATAL) - Unrecoverable medium error during recovery on PD 02(e0x20/s2) at f8b3371
Mar 19 22:36:39 txanunxlipcp004 kernel: megaraid_sas 0000:03:00.0: 1251522 (701058961s/0x0002/FATAL) - Unrecoverable medium error during recovery on PD 02(e0x20/s2) at f8b33a8
Mar 19 22:36:49 txanunxlipcp004 kernel: megaraid_sas 0000:03:00.0: 1251528 (701058971s/0x0002/FATAL) - Unrecoverable medium error during recovery on PD 02(e0x20/s2) at f8b3371
Mar 19 22:36:51 txanunxlipcp004 kernel: megaraid_sas 0000:03:00.0: 1251530 (701058973s/0x0002/FATAL) - Unrecoverable medium error during recovery on PD 02(e0x20/s2) at f8b33a8

This indicated that there was a disk issue on the source system.
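The disk fault can be spotted by scanning the kernel log for the two error signatures seen above. A minimal sketch (the sample lines are copied from this article so the snippet is self-contained; on a live source node, point grep at /var/log/messages directly):

```shell
# Write two sample kernel-log lines (taken from the output above) to a
# scratch file; on the actual source node, grep /var/log/messages instead.
cat > /tmp/sample_messages <<'EOF'
Mar 19 22:35:38 txanunxlipcp004 kernel: end_request: I/O error, dev sdb, sector 468409304
Mar 19 22:36:27 txanunxlipcp004 kernel: megaraid_sas 0000:03:00.0: 1251514 (701058948s/0x0002/FATAL) - Unrecoverable medium error during recovery on PD 02(e0x20/s2) at f8b3371
EOF

# Case-insensitive scan for the I/O error and medium error signatures.
grep -Ei 'I/O error|medium error' /tmp/sample_messages
```

Any matches against the real /var/log/messages point at failing media on the source system, as in this case.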

We then shut down the database on that node and set the Ancient History Mark (AHM) in the database before rebuilding the node, hopefully bypassing the bad parts of the disk so the copy cluster could run.

Environment

Release : 20.2

Component :

Resolution

Stop Vertica on the problem node using the following command:
 /opt/vertica/bin/admintools -t stop_node --hosts xx.xx.xx.xx
Run the following vsql command to set the Ancient History Mark (AHM) in the database. First enter the vsql prompt, then run the command, then quit to exit the prompt:
 /opt/vertica/bin/vsql -U <dbAdminUser> -w <dbAdminPassword>
 select make_ahm_now(true);
 \q
Restart the database on the problem node with the following command:
 /opt/vertica/bin/admintools -t restart_node -d drdata --hosts x.x.x.x
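The three steps above can be collected into one script. This is only a sketch: the node IP and the dbadmin user are placeholders you must substitute, the database name drdata comes from the restart command above, and the script defaults to a dry run that just prints each command rather than executing it.

```shell
# Dry-run sketch of the recovery sequence. DRY_RUN defaults to 1 so the
# commands are only printed and logged; set DRY_RUN=0 on the cluster node
# itself to actually execute them.
DRY_RUN="${DRY_RUN:-1}"
NODE_IP="${NODE_IP:-x.x.x.x}"   # placeholder: IP of the problem node
DB_NAME="drdata"                # database name from the article

: > /tmp/recovery_cmds.log      # start with an empty command log

run() {
    echo "+ $*" | tee -a /tmp/recovery_cmds.log
    [ "$DRY_RUN" = "1" ] || "$@"
}

# 1. Stop Vertica on the problem node.
run /opt/vertica/bin/admintools -t stop_node --hosts "$NODE_IP"
# 2. Advance the AHM; the true argument allows it even with a node down.
run /opt/vertica/bin/vsql -U dbadmin -c "SELECT MAKE_AHM_NOW(true);"
# 3. Restart the database on the problem node.
run /opt/vertica/bin/admintools -t restart_node -d "$DB_NAME" --hosts "$NODE_IP"
```

Running it with DRY_RUN left at 1 prints the three commands so they can be reviewed before executing anything against the cluster.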

 

Additional Information

See the following KB article to rebuild the problem Vertica node:

https://knowledge.broadcom.com/external/article?articleId=8078