How to set up a Sqoop metastore
search cancel

How to set up a Sqoop metastore

book

Article ID: 294994

calendar_today

Updated On:

Products

Services Suite

Issue/Introduction

A Sqoop metastore is used to store Sqoop job information in a central place. This helps fuel collaboration between Sqoop users and developers.

For example, a user can create a job to load some specific data. Then any other user can access from any node in the cluster the same job and just run it again. This is very convenient when using Sqoop in Oozie workflows.

Environment


Resolution

1. Choose a server to host the Sqoop metastore. It is best to choose a master or administrative server.


Slave nodes are not recommended because they are expected to have a heavy load and to fail at some point. Colocating the Sqoop metastore with Ambari server is acceptable.


2. Setup the Sqoop metastore.


You need to decide which user will execute the metastore. It is recommended to run the metastore as sqoop user. It is strongly discouraged to run as root. Once you have decided which user will run the metastore, the next step is to create the user and the home directory (if needed), and a folder to store the database (DB) information.


The next step is to configure the metastore details in sqoop-site.xml; the relevant properties to be set up are sqoop.metastore.server.location, for example: /home/sqoop/meta-store/shared.db


The other configuration property to set is sqoop.metastore.server.port. Leave it set to the default 16000.


For the client properties, you need to set the following properties:

  • sqoop.metastore.client.autoconnect.url
  • sqoop.metastore.client.autoconnect.username
  • sqoop.metastore.client.autoconnect.password


The auto-connect URL is a connect string for an HSQL DB with the following format:

jdbc:hsqldb:hsql://<hostname_fqdn>:<port>/sqoop

Where hostname_fqdn is the hostname with domain from the host chosen in step 1. port is the port we set in the previous step, which is by default 16000. An example for this is shown below:

jdbc:hsqldb:hsql://hdw1.hdp.local:16000/sqoop
The username and password should be left set to their default designations.

3. Update the service configuration to access the meta store automatically.


It is not possible to use Ambari to configure these settings. You have to update the files manually.


Log on to another node in the cluster and update the properties for client access:

  • sqoop.metastore.client.autoconnect.url
  • sqoop.metastore.client.autoconnect.username
  • sqoop.metastore.client.autoconnect.password


Do not setup the properties for server configuration. The properties sqoop.metastore.server.location and sqoop.metastore.server.port should be set only in the node running Sqoop metastore.


Copy this new sqoop-site.xml file to all other nodes except the Sqoop metastore server.


4. Now run sudo -u sqoop sqoop-metastore to test that the server comes up successfully. Once the server comes up, it binds to standard output and remains as a foreground process. This is undesirable for a server process, and you must start and leave the server process running in the background. There are many ways to achieve this, all of them are correct. The recommend approach is as follows:

a. Log on as the user who will run the metastore: su - sqoop
b. Enter into the metastore folder
c. Start the server process. Redirect stdout and stderr to a file and leave it in the background: nohup sqoop-metastore &>> shared.db.out &
d. If at any point you want to shut down the metastore gracefully, use sqoop-metastore --shutdown as the user running the process.


Additional Information

Sqoop Metastore User guide