CA Directory Data Bulk Load with Performance Considerations

Products

CA Directory

Issue/Introduction

We are planning a new CA Directory project. Which tools should we use to bulk load the initial LDAP database: ldapadd/ldapmodify or dxmodify?

Environment

Release: 14.x
Component: CA Directory

Resolution

For a new CA Directory implementation, the first step is to confirm or create a data model to hold the data for the LDAP service it will provide.

Custom LDAP Schemas

At times, the project's data architect may find that a custom LDAP schema is needed to host the data appropriately. If so, the following product documentation can get you started:

Directory Schemas

You may also want to refer to the following KB article to help plan your efforts:

How to Create a Custom CA Directory LDAP Schema for a New CA Directory Implementation?

The fastest way to load a large number of entries for a CA Directory implementation is to use the dxloaddb command.

Backup/Restore a CA Directory Data DSA

The dxloaddb command, together with dxdumpdb, is often used to back up and restore a CA Directory LDAP service. dxloaddb is commonly invoked as:

dxloaddb dsaName ldifFile

See DXloaddb Tool -- Load a Data Store from an LDIF File for more details.

This LDIF file is very often a backup of a CA Directory LDAP service created with the dxdumpdb command; see DXdumpdb Tool -- Export Data from a Datastore to an LDIF File. An LDIF file created by dxdumpdb also contains the relevant CA Directory operational attributes that were not part of the original data set, which allows the LDAP service to be restored exactly.

Load a CA Directory Data DSA from Scratch

For a new CA Directory LDAP service, an administrator will have to create an LDIF file from scratch to take advantage of the dxloaddb performance boost.

By design, the entries in an LDIF file are organized as a Directory Information Tree (DIT) made up of containers and sub-containers that hold objects. As a result, building an LDIF file from scratch can be challenging, or at least tedious.

Because Excel and relational databases are so widely used, administrators often collect the initial data set in a CSV file. To ease the creation of an LDIF file, CA Directory provides the csv2ldif utility, which converts a CSV file into an LDIF file:

csv2ldif Tool -- Create an LDIF File from a CSV File

To see how this tool works, administrators are encouraged to try out the democorp and unspsc samples included in the samples subdirectory of a Directory Server installation. Essentially, in addition to a properly formatted CSV file, an administrator needs to create a custom .ldt template file. The following is the sample democorp.ldt from samples/democorp:

# organization node
dn:o=DEMOCORP,c=AU
objectClass:organization

# Division node
dn:ou=$1,o=DEMOCORP,c=AU
objectClass:organizationalUnit

# Department node
dn:ou=$2,ou=$1,o=DEMOCORP,c=AU
objectClass:organizationalUnit

# Person (leaf node)
dn:cn=$4 $5,ou=$2,ou=$1,o=DEMOCORP,c=AU
objectClass:inetOrgPerson
cn:$4 $5
sn:$5
title:$6 $7
telephoneNumber:$8 $9
description:$3
mail:$4.$5@DEMOCORP.com
postalAddress:$10 $11 $12\$$14 $13
postalCode:$15

This template shows that the DIT of the democorp DSA starts from an organization node that contains division nodes. Each division node in turn contains department nodes, and person nodes are placed under the department nodes.

The placeholders $1, $2, ..., $15 refer to field numbers within each line of the democorp.csv file.
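The placeholder substitution can be sketched in Python. This is not csv2ldif's actual implementation; it assumes that $N refers to the N-th (1-based) field of a CSV line, that a missing field becomes an empty string, and that \$ in the template (as on the postalAddress line, where $ is the LDAP line separator) stands for a literal $. The function name fill_template and the sample row are hypothetical.

```python
from __future__ import annotations

import re

# Matches either "\$" (assumed: an escaped, literal dollar sign) or "$<digits>" (a field reference).
_PLACEHOLDER = re.compile(r"\\\$|\$(\d+)")


def fill_template(template: str, fields: list[str]) -> str:
    """Replace each $N with the N-th (1-based) CSV field; missing fields become ""."""

    def substitute(match: re.Match) -> str:
        if match.group(1) is None:  # matched the escaped "\$"
            return "$"
        index = int(match.group(1))  # \d+ is greedy, so "$10" means field 10, not field 1 + "0"
        return fields[index - 1] if 0 < index <= len(fields) else ""

    return _PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    # Hypothetical 15-field row, for illustration only (not taken from democorp.csv).
    row = ["Sales", "Retail", "Retail team", "Jane", "Doe", "Sales", "Manager",
           "03", "5550 1234", "1", "Example", "Street", "Springfield", "VIC", "3000"]
    person = ("dn:cn=$4 $5,ou=$2,ou=$1,o=DEMOCORP,c=AU\n"
              "postalAddress:$10 $11 $12\\$$14 $13\n")
    print(fill_template(person, row))
```

Matching $ followed by all of its digits is what keeps $10 through $15 from being read as $1 followed by a literal digit.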

In the setup script under samples/democorp, the command

csv2ldif -i1 15 democorp.ldt democorp.csv > democorp.ldi

skips the first line of democorp.csv (-i1), processes up to 15 fields on each remaining line using the democorp.ldt template shown above, and writes an unsorted democorp.ldi file. The output is left unsorted because the CSV file is not assumed to be sorted, which makes the CSV file easier to prepare. Next,

ldifsort -u democorp.ldi democorp_sorted.ldi

checks the uniqueness of the entries in democorp.ldi and generates a sorted LDIF file; see ldifsort Tool -- Sort LDIF Records for more details.

Finally, the sorted democorp_sorted.ldi file is loaded into the democorp DSA with the following command:

dxloaddb democorp democorp_sorted.ldi

dxmodify vs ldapadd/ldapmodify

In general, dxmodify is very similar to ldapadd/ldapmodify. The most important difference is that dxmodify ships with CA Directory and is therefore an officially supported tool, whereas ldapadd/ldapmodify are usually open-source tools, which may raise compliance concerns for some enterprises.

Either way, these tools typically require prepared input files. See the following for more details, including examples of dxmodify input files:

DXmodify Tool -- Add New or Changed Information to a Directory

After the initial deployment of a CA Directory LDAP service, there may be times when a large number of updates must be made to the DIT. In this scenario, popular browser-type tools such as JXplorer and Apache Directory Studio may not be adequate. It is generally recommended to prepare input files that contain multiple entries and use dxmodify to apply the changes to the directory.
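As an illustration of such an input file, the following Python sketch generates standard LDIF change records (RFC 2849 changetype: modify, with one replace per attribute) for many entries at once, base64-encoding any value that is not an LDIF safe string. The function names and sample DN are hypothetical, and the sketch does not cover the dxmodify command line itself; check the DXmodify Tool documentation for the options used to apply the resulting file.

```python
from __future__ import annotations

import base64


def ldif_line(name: str, value: str) -> str:
    """Format one attribute line, base64-encoding values that are not RFC 2849 SAFE-STRINGs."""
    safe = (
        value.isascii()
        and not any(ch in value for ch in "\0\r\n")
        and not value.startswith((" ", ":", "<"))
        and not value.endswith(" ")
    )
    if safe:
        return f"{name}: {value}"
    encoded = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return f"{name}:: {encoded}"


def modify_records(changes: dict[str, dict[str, str]]) -> str:
    """Build one 'changetype: modify' record per DN that replaces each listed attribute."""
    out = []
    for dn, attrs in changes.items():
        lines = [ldif_line("dn", dn), "changetype: modify"]
        for name, value in attrs.items():
            lines += [f"replace: {name}", ldif_line(name, value), "-"]
        out.append("\n".join(lines))
    return "\n\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical bulk update written to a hypothetical file name.
    changes = {"cn=Jane Doe,ou=Retail,ou=Sales,o=DEMOCORP,c=AU": {"title": "Sales Manager"}}
    with open("updates.ldif", "w", encoding="utf-8") as fh:
        fh.write(modify_records(changes))
```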

Add/Delete a DIT Container and Entries Beneath it

For performance's sake, when a bulk load involves a complete container of a DIT, you may want to try the following approaches and see whether they perform better for your particular use case.

An enterprise that uses containers such as organizational units to hold data may, at some point, form a new organizational unit and need to load a large amount of data into the overall Directory Information Tree. In this case, modifying the production LDAP database through a large number of regular ldapadd-type operations can have a negative impact on its performance.

Two approaches can help reduce the performance impact in this scenario:

Use the Distribution Feature of CA Directory

This allows an administrator to create a separate Data DSA for each new container that needs to be added to the DIT. All the new entries can then be put into an LDIF file, and a simple dxloaddb adds all the data to the DIT without incurring the performance penalty of regular ldapadd operations.
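When each new container gets its own Data DSA, its entries need to end up in their own LDIF file. The following Python sketch splits one LDIF into per-container LDIF text, assuming plain dn: lines, DNs without escaped commas, and containers that do not nest inside one another. Records keep their input order, so a sorted input yields sorted output. The function names and container DNs are hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict


def records(text: str) -> list[list[str]]:
    """Split LDIF text into records, unfolding continuation lines and dropping comments."""
    text = text.replace("\r\n", "\n").replace("\n ", "")  # RFC 2849 line folding
    blocks = re.split(r"\n\s*\n", text)
    out = [[ln for ln in b.splitlines() if ln and not ln.startswith("#")] for b in blocks]
    return [r for r in out if r]


def dn_key(dn: str) -> tuple[str, ...]:
    """RDNs from the root down, lower-cased; a container's key is a prefix of its entries' keys."""
    return tuple(rdn.strip().lower() for rdn in reversed(dn.split(",")))


def dn_of(record: list[str]) -> str:
    head = record[0]
    if not head.lower().startswith("dn:") or head[3:4] == ":":
        raise ValueError(f"expected a plain 'dn:' line, got {head!r}")
    return head[3:].strip()


def split_by_container(text: str, containers: list[str]) -> dict[str, str]:
    """Group LDIF records by the new container (from `containers`) that holds them."""
    keys = {c: dn_key(c) for c in containers}
    groups: dict[str, list[str]] = defaultdict(list)
    for record in records(text):
        key = dn_key(dn_of(record))
        owner = next((c for c, k in keys.items() if key[: len(k)] == k), None)
        if owner is None:
            raise ValueError(f"{record[0]!r} is not under any of the new containers")
        groups[owner].append("\n".join(record))
    return {c: "\n\n".join(recs) + "\n" for c, recs in groups.items()}
```

Each resulting LDIF text could then be written to a file and loaded into its own Data DSA with dxloaddb.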

Merge an LDIF Directly into a Production LDAP Service

A production LDAP service (as a DIT) is assumed to have failover/load balancing built in. Because the container is new, we can assume that no production modifications will be applied to it or to the entries beneath it. Therefore, the following procedure can be used against each replicated Data DSA to merge the new container into the production DIT:

Assume the new LDIF has been prepared and sorted.

On a production Data DSA, make sure its multi-write queue holds no updates waiting to go out to its peer DSAs. This can be achieved by setting wait-for-multiwrite to true; see set wait-for-multiwrite Command.
Shut down the DSA. While it is down, any updates that other DSAs need to send to it are queued on those source DSAs.
Run dxdumpdb against the stopped DSA to create an LDIF file.
Append the prepared new LDIF to the end of the dumped LDIF.
Run dxloaddb with the combined LDIF. dxloaddb clears the DSA's multi-write queue in the process, which is why the queue must be empty before the DSA is shut down.
Restart the Data DSA.
When the DSA starts, it resumes receiving multi-write updates from its peer DSAs.
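Before running dxloaddb on the combined file, a quick check of the append step can catch mistakes. The following Python sketch rejects new entries whose DNs already exist in the dumped LDIF, and entries whose parent appears neither in the dump nor earlier in the new LDIF, and otherwise returns the concatenated text. It assumes plain dn: lines and DNs without escaped commas; merge_ldif and its known_parents parameter (for parent entries that are held outside this dump) are hypothetical.

```python
from __future__ import annotations

import re


def records(text: str) -> list[list[str]]:
    """Split LDIF text into records, unfolding continuation lines and dropping comments."""
    text = text.replace("\r\n", "\n").replace("\n ", "")  # RFC 2849 line folding
    blocks = re.split(r"\n\s*\n", text)
    out = [[ln for ln in b.splitlines() if ln and not ln.startswith("#")] for b in blocks]
    return [r for r in out if r]


def dn_key(dn: str) -> tuple[str, ...]:
    """RDNs from the root down, lower-cased; dropping the last RDN gives the parent's key."""
    return tuple(rdn.strip().lower() for rdn in reversed(dn.split(",")))


def dn_of(record: list[str]) -> str:
    head = record[0]
    if not head.lower().startswith("dn:") or head[3:4] == ":":
        raise ValueError(f"expected a plain 'dn:' line, got {head!r}")
    return head[3:].strip()


def merge_ldif(dump_text: str, new_text: str, known_parents: tuple[str, ...] = ()) -> str:
    """Append new_text to a dumped LDIF after checking for DN collisions and missing parents."""
    existing = {dn_key(dn_of(rec)) for rec in records(dump_text)}
    existing.update(dn_key(dn) for dn in known_parents)
    for rec in records(new_text):
        key = dn_key(dn_of(rec))
        if key in existing:
            raise ValueError(f"{rec[0]!r} already exists in the dumped LDIF")
        if key[:-1] not in existing:
            raise ValueError(f"parent of {rec[0]!r} is missing or appears after it")
        existing.add(key)  # later new entries may use this one as their parent
    return dump_text.rstrip("\n") + "\n\n" + new_text.lstrip("\n")
```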