Pivotal HDB initialization fails. The following messages are shown during initialization:
[gpadmin@hawq-mdw utils]$ gpinitsystem -c gpinitsystem_config -h hostfile
...
...
20131010:16:44:49:021342 gpinitsystem:hawq-mdw:gpadmin-[INFO]:-Create filespace dfs_system
20131010:16:44:57:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:
20131010:16:44:57:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:-Failed to create dfs filespace; review gpinitsystem output to
20131010:16:44:57:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:- determine why this step failed and reinitialize cluster after resolving
20131010:16:44:57:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:- issues. Not all initialization tasks have completed so the cluster
20131010:16:44:57:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:- should not be used.
20131010:16:44:57:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:-gpinitsystem will now try to stop the cluster
20131010:16:44:57:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:
20131010:16:44:58:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-Starting gpstop with args: -a -i -d /data/master/gpseg-1
..
..
20131010:16:45:00:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-Greenplum Version: 'postgres (HAWQ) 4.2.0 build 1'
20131010:16:45:00:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-There are 0 connections to the database
20131010:16:45:00:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-Commencing Master instance shutdown with mode='immediate'
20131010:16:45:00:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-Master host=hawq-mdw
20131010:16:45:00:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-Commencing Master instance shutdown with mode=immediate
20131010:16:45:00:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-Master segment instance directory=/data/master/gpseg-1
20131010:16:45:01:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-No standby master host configured
20131010:16:45:01:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-Commencing parallel segment instance shutdown, please wait...
...
20131010:16:45:04:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-----------------------------------------------------
20131010:16:45:04:001190 gpstop:hawq-mdw:gpadmin-[INFO]:- Segments stopped successfully = 2
20131010:16:45:04:001190 gpstop:hawq-mdw:gpadmin-[INFO]:- Segments with errors during stop = 0
20131010:16:45:04:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-----------------------------------------------------
20131010:16:45:04:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-Successfully shutdown 2 of 2 segment instances
20131010:16:45:04:001190 gpstop:hawq-mdw:gpadmin-[INFO]:-Database successfully shutdown with no errors reported
20131010:16:45:04:021342 gpinitsystem:hawq-mdw:gpadmin-[INFO]:-Successfully shutdown the Greenplum instance
20131010:16:45:04:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:
20131010:16:45:04:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:-Failed to create dfs filespace; review gpinitsystem output to
20131010:16:45:04:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:- determine why this step failed and reinitialize cluster after resolving
20131010:16:45:04:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:- issues. Not all initialization tasks have completed so the cluster
20131010:16:45:04:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:- should not be used.
20131010:16:45:04:021342 gpinitsystem:hawq-mdw:gpadmin-[WARN]:
20131010:16:45:04:gpinitsystem:hawq-mdw:gpadmin-[FATAL]: create dfs filespace failed; Script Exiting!
The gpinitsystem log in /home/gpadmin/gpAdminLogs/ shows the following:
20131010:16:44:49:021342 gpinitsystem:hawq-mdw:gpadmin-[INFO]:-DFS_PATH_LIST: 1:'/data/master/dfs/gpseg-1',2:'hawq-mdw:9000/hawq/gpseg0',3:'hawq-mdw:9000/hawq/gpseg1'
20131010:16:44:49:021342 gpinitsystem:hawq-mdw:gpadmin-[INFO]:-Create filespace dfs_system
WARNING: function 1 returned error: -1
WARNING: fail to connect hdfs at hawq-mdw:9000, errno = 5
WARNING: function 1 returned error: -1
WARNING: fail to connect hdfs at hawq-mdw:9000, errno = 5
WARNING: function 1 returned error: -1
WARNING: fail to connect hdfs at hawq-mdw:9000, errno = 5
WARNING: function 1 returned error: -1
CONTEXT: Dropping file-system object -- Filespace Directory: '16384'
WARNING: fail to connect hdfs at hawq-mdw:9000, errno = 5
CONTEXT: Dropping file-system object -- Filespace Directory: '16384'
WARNING: could not remove filespace directory 16384: Input/output error
CONTEXT: Dropping file-system object -- Filespace Directory: '16384'
ERROR: could not create filespace directory hdfs://hawq-mdw:9000/hawq/gpseg0: Input/output error
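When the log is long, it can help to pull out only the HDFS endpoints that the connection warnings mention. The following is a minimal sketch, assuming the log keeps one message per line and uses the "fail to connect hdfs at HOST:PORT, errno" wording shown above; the function name failed_hdfs_endpoints is hypothetical and not part of any HDB tool.

```shell
# Print each distinct HDFS endpoint named in "fail to connect hdfs" warnings.
# Assumes one log message per line, in the format shown in the snippet above.
failed_hdfs_endpoints() {
    log_file="$1"
    # Keep only the text between "at " and ", errno", then de-duplicate.
    sed -n 's/.*fail to connect hdfs at \([^,]*\), errno.*/\1/p' "$log_file" | sort -u
}
```

Called with the path to a gpinitsystem log file under /home/gpadmin/gpAdminLogs/, it would print hawq-mdw:9000 once for the log above.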
During initialization, HDB was unable to create the directory structure in HDFS using the URI hdfs://hawq-mdw:9000/. In other words, initialization failed while accessing the HDFS filesystem at the given URI.
In this case, the port number 9000 configured for the DFS_URL parameter in /etc/gphd/hawq/conf/gpinitsystem_config is not correct. Listing that URI directly fails with a connection error:
[gpadmin@hawq-mdw hadoop]$ hadoop fs -ls hdfs://hawq-mdw:9000/
ls: Call From hawq-mdw.saturn.local/192.165.100.31 to hawq-mdw:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
[gpadmin@hawq-mdw hadoop]$
The DFS_URL parameter in the gpinitsystem_config file:
[gpadmin@hawq-mdw utils]$ egrep DFS_URL gpinitsystem_config
DFS_URL=hawq-mdw:9000/hawq
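To compare the configured endpoint against the NameNode address, only the host:port part of DFS_URL matters. A minimal sketch, assuming DFS_URL is written as host:port/path on its own line as in the output above; the function name dfs_url_hostport is hypothetical.

```shell
# Print the host:port portion of DFS_URL from a gpinitsystem_config file.
# Assumes a line of the form DFS_URL=host:port/path (no quotes, no spaces).
dfs_url_hostport() {
    config_file="$1"
    # Strip the key, use the last assignment if several exist, drop the path.
    sed -n 's/^DFS_URL=//p' "$config_file" | tail -n 1 | cut -d/ -f1
}
```

For the configuration shown here, calling it on gpinitsystem_config would print hawq-mdw:9000.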
Identify the correct host and port from the cluster's /etc/gphd/hadoop/conf/core-site.xml:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hawq-mdw:8020</value>
</property>
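The fs.defaultFS value can also be read from core-site.xml with standard text tools rather than by eye. This is a rough sketch, assuming the <name> element is immediately followed by its <value> element as in the property above; it is a text match, not a real XML parser, and the function name core_site_default_fs is hypothetical.

```shell
# Print the fs.defaultFS value from a core-site.xml file.
# Assumes <name>fs.defaultFS</name> is directly followed by its <value> element.
core_site_default_fs() {
    xml_file="$1"
    # Join all lines so the match works whether or not the XML is indented.
    { tr '\n' ' ' < "$xml_file"; echo; } |
        sed -n 's|.*<name>fs\.defaultFS</name>[[:space:]]*<value>\([^<]*\)</value>.*|\1|p'
}
```

On the file above this would print hdfs://hawq-mdw:8020. Where the hdfs client is installed, hdfs getconf -confKey fs.defaultFS should report the same setting as resolved by Hadoop itself.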
Verify that the URI from core-site.xml is reachable:
[gpadmin@hawq-mdw conf]$ hadoop fs -ls hdfs://hawq-mdw:8020/
Found 3 items
drwxr--r--   - hdfs   supergroup 0 2013-10-10 17:43 hdfs://hawq-mdw:8020/hawq
drwxr-xr-x   - mapred hadoop     0 2013-10-10 17:31 hdfs://hawq-mdw:8020/mapred
drwxr-xr-x   - hdfs   supergroup 0 2013-10-10 16:24 hdfs://hawq-mdw:8020/user
Change the DFS_URL value in gpinitsystem_config to the following and run gpinitsystem again:
DFS_URL=hawq-mdw:8020/hawq
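The edit can be made by hand in any editor. If it needs to be scripted, a sketch along these lines might work, assuming DFS_URL sits on its own line and the new value contains no '|' or '&' characters; the function name set_dfs_url is hypothetical.

```shell
# Rewrite the DFS_URL line in a gpinitsystem_config file, keeping a .bak copy.
# Assumes the new value contains no '|' or '&' (both are special to sed here).
set_dfs_url() {
    config_file="$1"
    new_url="$2"
    # -i.bak edits in place and saves the original as <config_file>.bak.
    sed -i.bak "s|^DFS_URL=.*|DFS_URL=${new_url}|" "$config_file"
}
```

For this case the call would be set_dfs_url gpinitsystem_config hawq-mdw:8020/hawq, after which the original file remains available as gpinitsystem_config.bak.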