When using gprecoverseg to recover segments, the following error is reported: "Cannot write: No space left on device."
20170126:11:19:30:120429 gprecoverseg:hawqmaster:gpadmin-[INFO]:-Starting gprecoverseg with args: -i /tmp/gprecoverseg -F
(...)
20170126:11:19:50:120429 gprecoverseg:hawqmaster:gpadmin-[INFO]:-2 segment(s) to recover
20170126:11:19:50:120429 gprecoverseg:hawqmaster:gpadmin-[INFO]:-Ensuring 2 failed segment(s) are stopped
...
20170126:11:19:54:120429 gprecoverseg:hawqmaster:gpadmin-[INFO]:-Cleaning files from 2 segment(s)
.........
20170126:11:20:03:120429 gprecoverseg:hawqmaster:gpadmin-[INFO]:-Building template directory
20170126:11:20:03:120429 gprecoverseg:hawqmaster:gpadmin-[INFO]:-Creating template
20170126:11:20:04:120429 gprecoverseg:hawqmaster:gpadmin-[INFO]:-Starting copy of segment dbid 2 to location /tmp/GPSQL/gpsql_template20170126_112003
20170126:11:21:10:120429 gprecoverseg:hawqmaster:gpadmin-[CRITICAL]:-Error occurred: non-zero rc: 2
 Command was: '/bin/tar -C /tmp/GPSQL/gpsql_template20170126_112003 -xf /tmp/GPSQL/gpsql_template20170126_112003/hawq_template20170126_112004'
rc=2, stdout='', stderr='/bin/tar: ./pg_distributedlog/016F: Wrote only 7680 of 10240 bytes
/bin/tar: ./pg_distributedlog/0170: Cannot write: No space left on device
/bin/tar: ./pg_distributedlog/0171: Cannot write: No space left on device
/bin/tar: ./pg_distributedlog/0172: Cannot write: No space left on device
/bin/tar: ./pg_distributedlog/0173: Cannot write: No space left on device
/bin/tar: ./pg_distributedlog/0174: Cannot write: No space left on device
/bin/tar: ./pg_distributedlog/0175: Cannot write: No space left on device
(...)
/bin/tar: ./postgresql.conf: Cannot write: No space left on device
/bin/tar: ./postmaster.pid: Cannot write: No space left on device
/bin/tar: Exiting with failure status due to previous errors
'
Traceback (most recent call last):
  File "/usr/local/hawq/ext/python/lib/python2.6/logging/__init__.py", line 769, in emit
    stream.write(fs % msg)
IOError: [Errno 28] No space left on device
The / partition will be 100% full:

[root@hawq21 ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5       9.8G  9.8G    0G 100% /
tmpfs           2.9G     0  2.9G   0% /dev/shm
/dev/sda1       477M   41M  411M   9% /boot
/dev/sda7        55G   22G   31G  41% /data
/dev/sda2        20G   45M   19G   1% /home
/dev/sda3       9.8G   24M  9.2G   1% /tmp
[root@hawq21 ~]#
When gprecoverseg runs with HAWQ 1.x, the master copies the whole segment directory from one of the running segments into the master's /tmp directory to create a template. The template is extracted into /tmp/GPSQL/gpsql_template<TIMESTAMP>. pg_log and other directories are removed. As the uncompressed size may be large and the template is written under the /tmp directory, this may lead to "out of space" errors.

Make sure there is enough free space on / compared to the size of the segment directory that is chosen as the copy source. If there is not enough free space, move log files out of the pg_log directory on the running segment that the files are being copied from, and use du -sh ./* to understand where space is being used.
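The space check described above can be sketched as a small shell script. This is only an illustration, not part of gprecoverseg: the script name check_space.sh and the function names are made up for this example, and it assumes both paths are visible from the host where it runs. When the segment directory lives on another host, take its du -sk figure there and compare it with the df -Pk figure from the master by hand.

```shell
#!/bin/sh
# Sketch: does a segment directory fit into the free space of a target path?
# All function names here are hypothetical helpers, not HAWQ utilities.

# Print the size of directory $1 in KiB.
dir_size_kb() {
    du -sk "$1" | awk '{print $1}'
}

# Print the free space, in KiB, of the filesystem that holds path $1.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 {print $4}'
}

# Succeed when $1 (needed KiB) is smaller than $2 (available KiB).
has_room() {
    [ "$1" -lt "$2" ]
}

# Usage: sh check_space.sh <segment_dir> <target_dir>
# Both paths are placeholders supplied by the caller.
if [ "$#" -eq 2 ]; then
    needed=$(dir_size_kb "$1")
    available=$(free_kb "$2")
    if has_room "$needed" "$available"; then
        echo "OK: $needed KiB needed, $available KiB free under $2"
    else
        echo "NOT ENOUGH SPACE: $needed KiB needed, $available KiB free under $2"
    fi
fi
```

Passing the segment directory and /tmp (or /) as the two arguments gives a quick yes/no answer before recovery starts. The comparison is deliberately strict, so a directory that exactly fills the free space is reported as not fitting.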
Once gprecoverseg is complete, the log files can be placed back in the pg_log directory on the source segment.
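Moving the log files aside and putting them back could look roughly like the sketch below. The function names and the holding directory are hypothetical. The age filter (skip files modified in the last N minutes) is an added precaution, not something the procedure above requires; it is meant to leave alone the log file that the running segment is presumably still writing to. The -maxdepth and -mmin options are assumed to be available, as they are in GNU find.

```shell
#!/bin/sh
# Sketch: park older pg_log files elsewhere during recovery, then restore them.
# Function names and directory arguments are hypothetical.

# Move regular files older than $3 minutes from $1 (pg_log) to $2 (holding dir).
stash_old_logs() {
    mkdir -p "$2" || return 1
    find "$1" -maxdepth 1 -type f -mmin +"$3" -exec mv {} "$2"/ \;
}

# Move every regular file from $1 (holding dir) back to $2 (pg_log).
restore_logs() {
    find "$1" -maxdepth 1 -type f -exec mv {} "$2"/ \;
}
```

For example, stash_old_logs <segment_dir>/pg_log <holding_dir> 60 would be run on the source segment before gprecoverseg, and restore_logs <holding_dir> <segment_dir>/pg_log afterwards. The holding directory should sit on a filesystem with enough free space, such as /data in the df output above.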
Alternatively, segments can be recovered in smaller groups, instead of all at once, with gprecoverseg -i <file>.
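One way to build the smaller groups is to split an existing recovery file into chunks and run gprecoverseg once per chunk. The sketch below assumes the file passed to -i lists one segment entry per line and that blank lines and lines starting with # can be dropped; check that assumption against the actual file before using it. The function name and the chunk prefix are made up for this example.

```shell
#!/bin/sh
# Sketch: split a recovery file into chunks of at most N entries each.
# Assumes one segment entry per line (an assumption, verify for your file).

# $1 = recovery file, $2 = entries per chunk, $3 = output prefix
split_recovery_file() {
    # Drop comment and blank lines, then write <prefix>aa, <prefix>ab, ...
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" | split -l "$2" - "$3"
}

# Example (not executed here): recover one segment at a time.
#   split_recovery_file /tmp/gprecoverseg 1 /tmp/gprecoverseg.part.
#   for chunk in /tmp/gprecoverseg.part.*; do
#       gprecoverseg -i "$chunk"    # add -F for a full recovery, as in the log above
#   done
```

Running the groups one after another keeps the amount of template data handled per run smaller, and each run can be checked before the next one starts.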