How to migrate Hive from one Hadoop cluster to another
search cancel

How to migrate Hive from one Hadoop cluster to another

book

Article ID: 294648

calendar_today

Updated On:

Products

Services Suite

Issue/Introduction

This article shows how to migrate a Hive database from one cluster to another. Hive stores its metadata in its Metastore database. Metastore can be Oracle, MySQL, Derby, or PostgreSQL. In this case, the migration of Hive with the Metastore database in MySQL is explained.

Environment


Resolution

Follow the steps below to migrate a Hive database from one cluster to another: 
 

1. Install Hive on the new cluster and make sure both the source and destination clusters are identical.

2. Transfer the data present in the Hive warehouse directory (/user/hive/warehouse) to the new Hadoop cluster. 

hadoop distcp <src> <dst> 

3. Take a backup of the Hive Metastore.     

mysqldump hive > /tmp/mydir/backup_hive.sql

 4. Install MySQL on the new Hadoop cluster.


 5. Open the Hive MySQL-Metastore dump file and replace the source NameNode hostname with the destination hostname.

 hdfs://ip-address-old-namenode:port  ---> hdfs://ip-address-new-namenode:port 

 6. Restore the edited MySQL dump into the MySQL of new the Hadoop cluster.     

 mysql hive < /tmp/mydir/backup_hive.sql

 7. Configure Hive as normal and perform the Hive schema upgrade if needed

 

Impact 

Hive metadata contains the information about the database objects, and the contents are stored in the Hadoop Distributed File System (HDFS). Metadata contains HDFS URI and other details.

Therefore, if you migrate Hive from one cluster to another cluster, you have to point the metadata to the HDFS of the new cluster. If you don't do this, it will point to the HDFS of the older cluster and the migration will fail.

In case of any failure, initialize the Hive Metastore of the destination cluster and resume the migration following the correct steps.

/bin/schematool -initSchema -dbType mysql


Frequently faced problem

During data validation, the counts might not match between the source and target databases. This often occurs because the absence of the latest statistics.

Please use  analyze table <table_name> compute statistics on both the source and the target tables before comparing the counts.


Additional Information

Please refer to the links below to know more about the Hive tools that address other issues with migration: