Follow the steps below to migrate a Hive database from one cluster to another:
1. Install Hive on the new cluster and make sure both the source and destination clusters are identical.
2. Transfer the data present in the Hive warehouse directory (/user/hive/warehouse) to the new Hadoop cluster.
hadoop distcp <src> <dst>
3. Take a backup of the Hive Metastore.
mysqldump hive > /tmp/mydir/backup_hive.sql
4. Install MySQL on the new Hadoop cluster.
5. Open the Hive MySQL-Metastore dump file and replace the source NameNode hostname with the destination hostname.
hdfs://ip-address-old-namenode:port ---> hdfs://ip-address-new-namenode:port
6. Restore the edited MySQL dump into the MySQL of new the Hadoop cluster.
mysql hive < /tmp/mydir/backup_hive.sql
7. Configure Hive as normal and perform the Hive schema upgrade if needed
Hive metadata contains the information about the database objects, and the contents are stored in the Hadoop Distributed File System (HDFS). Metadata contains HDFS URI and other details.
Therefore, if you migrate Hive from one cluster to another cluster, you have to point the metadata to the HDFS of the new cluster. If you don't do this, it will point to the HDFS of the older cluster and the migration will fail.
In case of any failure, initialize the Hive Metastore of the destination cluster and resume the migration following the correct steps.
/bin/schematool -initSchema -dbType mysql
During data validation, the counts might not match between the source and target databases. This often occurs because the absence of the latest statistics.
Please use analyze table <table_name> compute statistics
on both the source and the target tables before comparing the counts.
Please refer to the links below to know more about the Hive tools that address other issues with migration: