1. Choose a server to host the Sqoop metastore. It is best to choose a master or administrative server.
Slave nodes are not recommended because they typically run under heavy load and are expected to fail at some point. Colocating the Sqoop metastore with the Ambari server is acceptable.
2. Set up the Sqoop metastore.
First, decide which user will run the metastore. Running it as the sqoop user is recommended; running it as root is strongly discouraged. Then create that user and its home directory (if needed), plus a folder to store the metastore database (DB) files.
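As a minimal sketch of the folder preparation, the Python below creates the folder that will hold the database files and optionally hands it to the service account. It assumes the example location used in the next step, /home/sqoop/meta-store/shared.db, where shared.db is a file-name prefix inside /home/sqoop/meta-store; the helper names user_exists and prepare_metastore_dir are hypothetical. Creating the sqoop user itself is left to your OS tools (for example useradd), and changing ownership requires root.

```python
import os
import pwd
import shutil
from typing import Optional


def user_exists(name: str) -> bool:
    """Return True if a local account with this name exists (Unix only)."""
    try:
        pwd.getpwnam(name)
    except KeyError:
        return False
    return True


def prepare_metastore_dir(location: str, owner: Optional[str] = None) -> str:
    """Create the parent folder of `location` (hypothetical helper).

    `location` is the value planned for sqoop.metastore.server.location.
    """
    folder = os.path.dirname(os.path.abspath(location))
    # Refuse to continue if the service account has not been created yet
    if owner is not None and not user_exists(owner):
        raise ValueError(f"user {owner!r} does not exist; create it first")
    # mode applies only to a newly created leaf folder and is masked by umask
    os.makedirs(folder, mode=0o750, exist_ok=True)
    if owner is not None:
        # Give the folder to the service account (requires root)
        shutil.chown(folder, user=owner)
    return folder
```

For example, running prepare_metastore_dir("/home/sqoop/meta-store/shared.db", owner="sqoop") as root would create /home/sqoop/meta-store and make sqoop its owner.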
Next, configure the metastore in sqoop-site.xml on the metastore server. The first property to set is sqoop.metastore.server.location, the path of the metastore database files, for example: /home/sqoop/meta-store/shared.db
The second property is sqoop.metastore.server.port. Leave it at the default value, 16000.
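The same two server-side settings can be applied with a short script. This is a sketch for Python 3.8 or later, assuming sqoop-site.xml uses the usual Hadoop configuration layout (a root configuration element containing property entries, each with a name and a value); set_properties and SERVER_PROPS are hypothetical names. Comments inside the configuration element are preserved, but the XML declaration and anything outside the root element are dropped, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def set_properties(config_xml: str, props: Dict[str, str]) -> str:
    """Add or update <property> entries inside a <configuration> document."""
    # Keep comments found inside <configuration> (Python 3.8+)
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(config_xml, parser=parser)
    # Index existing properties by their (stripped) name
    existing = {(p.findtext("name") or "").strip(): p for p in root.findall("property")}
    for name, value in props.items():
        prop = existing.get(name)
        if prop is None:
            # Property not present yet: append a new <property> element
            prop = ET.SubElement(root, "property")
            ET.SubElement(prop, "name").text = name
        value_el = prop.find("value")
        if value_el is None:
            value_el = ET.SubElement(prop, "value")
        value_el.text = value
    return ET.tostring(root, encoding="unicode")


# Server-side settings, only for sqoop-site.xml on the metastore host
SERVER_PROPS = {
    "sqoop.metastore.server.location": "/home/sqoop/meta-store/shared.db",
    "sqoop.metastore.server.port": "16000",
}
```

On the metastore host you might read sqoop-site.xml, pass its text and SERVER_PROPS to set_properties, and write the result back to the file.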
For the client properties, you need to set the following properties:
The auto-connect URL is a connect string for an HSQLDB database in the following format:
jdbc:hsqldb:hsql://<hostname_fqdn>:<port>/sqoop
Here, <hostname_fqdn> is the fully qualified domain name (FQDN) of the host chosen in step 1, and <port> is the value of sqoop.metastore.server.port set in the previous step (16000 by default). For example:
jdbc:hsqldb:hsql://hdw1.hdp.local:16000/sqoop
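The URL format can be captured in a small helper that also rejects obvious mistakes, such as a short hostname without a domain. This is a sketch; metastore_url and client_props are hypothetical names, and the property name sqoop.metastore.client.autoconnect.url is taken from the sqoop-site.xml template shipped with Sqoop 1.4 rather than from this text, so check it against your version.

```python
from typing import Dict


def metastore_url(hostname_fqdn: str, port: int = 16000) -> str:
    """Build jdbc:hsqldb:hsql://<hostname_fqdn>:<port>/sqoop."""
    # A fully qualified name must contain at least one dot (host plus domain)
    if "." not in hostname_fqdn:
        raise ValueError(f"{hostname_fqdn!r} is not a fully qualified hostname")
    if not 0 < port < 65536:
        raise ValueError(f"invalid port {port}")
    return f"jdbc:hsqldb:hsql://{hostname_fqdn}:{port}/sqoop"


def client_props(hostname_fqdn: str, port: int = 16000) -> Dict[str, str]:
    """Client-side auto-connect setting (property name assumed from the Sqoop template)."""
    return {"sqoop.metastore.client.autoconnect.url": metastore_url(hostname_fqdn, port)}
```

For instance, metastore_url("hdw1.hdp.local") returns jdbc:hsqldb:hsql://hdw1.hdp.local:16000/sqoop, matching the example above.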
Ambari cannot be used to configure these settings; you must edit sqoop-site.xml manually.
3. Log on to another node in the cluster and set the client-access properties described above in its sqoop-site.xml.
Do not set up the server properties on this node. The properties sqoop.metastore.server.location and sqoop.metastore.server.port should be set only on the node running the Sqoop metastore.
Copy this new sqoop-site.xml file to all other nodes except the Sqoop metastore server.
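Before copying the file, a quick check can confirm that no server-only property slipped into the client configuration. This is a sketch in Python, assuming the standard Hadoop configuration layout; server_props_in_client_config and SERVER_ONLY are hypothetical names. Commented-out properties are ignored, because the default ElementTree parser discards comments.

```python
import xml.etree.ElementTree as ET
from typing import List

# Properties that belong only on the metastore host (see step 2)
SERVER_ONLY = ("sqoop.metastore.server.location", "sqoop.metastore.server.port")


def server_props_in_client_config(config_xml: str) -> List[str]:
    """Return any server-only property names set in a client sqoop-site.xml."""
    root = ET.fromstring(config_xml)  # default parser drops XML comments
    names = [(p.findtext("name") or "").strip() for p in root.findall("property")]
    return [n for n in names if n in SERVER_ONLY]
```

An empty list means the file contains no active server-side properties and can be distributed to the client nodes.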
4. Now run sudo -u sqoop sqoop-metastore to test that the server starts successfully. Once started, the server stays attached to the terminal, writing to standard output, and runs as a foreground process. This is undesirable for a server process: it must be started in the background and left running. There are several valid ways to achieve this; the recommended approach is as follows: