PXF server error : This feature is disabled. Please refer to dfs.client.block.write.replace-datanode-on-failure.enable

Article ID: 381509


Products

VMware Tanzu Data Suite

Issue/Introduction

A customer frequently encountered the following error when running specific PXF jobs: "PXF server error: This feature is disabled. Please refer to dfs.client.block.write.replace-datanode-on-failure.enable".

 

UTC,"generic_gpdb_load_utility","gpdb_prd",p46594,th1531054208,"10.x.x.x","58996",2024-09-04 16:47:40 UTC,0,con219152,cmd17,seg-1,,dx1118568,,sx1,"ERROR","P0001","Error during transaction

             Error Code : 08000

             Message    : PXF server error : This feature is disabled.  Please refer to dfs.client.block.write.replace-datanode-on-failure.enable configuration property.  (seg99 10.x.x.x:40003 pid=311859)

             Detail     :

             Hint       : Check the PXF logs located in the '/usr/local/pxf-gp6/logs' directory on host 'localhost' or 'set client_min_messages=LOG' for additional details.

             Context    : SQL statement ""insert into gpdb.platform_attribute_client_12_ext (user_identity_key,user_identity_type_id,attribute_id) select user_identity_key,user_identity_type_id,attribute_id from gpdb._platform_attribute_client_12""

PL/pgSQL function stg.usp_pop_hdfs_generic(text,text,boolean) line 67 at EXECUTE statement",,,,,,"select stg.usp_pop_hdfs_generic('mesobase', 'fact_platform_attribute_client_12', true);  --DAG:dw-mesobase620_lowes_prospect_srf TASK:hdfs_attribute_client.postgres",0,,"pl_exec.c",3072,

Environment

Prod: 

GPDB: 6.25.1
PXF: 6.10.2

Cause

The issue is likely due to high workload on the HDFS cluster, or to network problems between the PXF hosts and the HDFS cluster. Either condition leads to a high rate of timeouts between the HDFS client (PXF) and the HDFS datanodes.

Greenplum DB (GPDB) engineering recommended that the customer set dfs.client.block.write.replace-datanode-on-failure.enable to true on their Hadoop cluster. However, the customer declined to implement this change because of the large size of their Hadoop cluster and the significant time and effort required to apply the modification.

 

Resolution

On the Greenplum/PXF side, only a few parameters can be adjusted to address this issue. As previously attempted, increasing the values of dfs.datanode.socket.write.timeout (specified in milliseconds) and dfs.client.block.write.retries may reduce the number of errors. These parameters can be raised further and the workload monitored to confirm that the errors subside.

 

Adjusting the following settings allowed the customer's PXF view to complete successfully.

Original settings: 

  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>9000000</value>
  </property>

  <property>
    <name>dfs.client.block.write.retries</name>
    <value>16</value>
  </property>

 

Updated settings: 

  <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>240000000</value>
  </property>

  <property>
    <name>dfs.client.block.write.retries</name>
    <value>32</value>
  </property>
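If these overrides are applied through PXF's per-server configuration rather than directly on the Hadoop cluster, the updated hdfs-site.xml must be pushed to every segment host and PXF restarted for the new values to take effect. A minimal sketch, assuming the default PXF server directory under $PXF_BASE (the exact path and server name depend on your installation):

```shell
# Edit the HDFS client settings for the PXF "default" server configuration.
# On a GPDB 6 install, $PXF_BASE is typically /usr/local/pxf-gp6 (matching
# the log directory shown in the error hint above).
vi $PXF_BASE/servers/default/hdfs-site.xml

# Distribute the changed configuration to all segment hosts in the cluster...
pxf cluster sync

# ...and restart PXF so the new timeout and retry values are picked up.
pxf cluster restart
```

Note that pxf cluster sync copies configuration from the coordinator to the segment hosts; editing the file on a single segment host without syncing would leave the cluster inconsistent.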