Data to collect for Cloud Data Protection Performance-Related Issues

Article ID: 184721

Products

CDP for ServiceNow, CDP for Salesforce, CDP Communication Server, CDP for Oracle Sales Cloud

Issue/Introduction

This article is for any customer facing performance issues with their Cloud Data Protection (CDP) deployment.

Please make sure to collect this data for any customer who appears to be having performance-related issues.

Environment

CDP 4.13.x or later

Resolution

Examples of performance issues include, but are not limited to:

  • Needing to restart CDP frequently for any reason
  • High resource utilization – Memory / CPU / Disk IO
  • System performance is slow (slow response time)
  • System outage (CDP shuts down unexpectedly)

Data to collect:

  • Infrastructure and Environment configuration
  • Network Topology (Number of forward proxies, number of reverse proxies, load balancers, etc.)
  • Services running on each node (Container, MTA, MC, Cluster Manager)
  • Confirm that they are running one node with MC and container only
  • Confirm they are running one reverse proxy on a separate node
  • Server roles for containers: Traffic handling / generic / background service
  • Get the arguments files and properties files for all services (example commands below)
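
A quick way to capture the running services and their configuration files on each node is something along these lines (a sketch only; <CDP install directory> is a placeholder, and the file-name patterns are assumptions that may need adjusting for your installation):

ps -ef | grep -i java                                                        # identify the Container, MTA, MC, and Cluster Manager processes and their JVM arguments
find <CDP install directory> -name "*.properties" -o -name "*arguments*"    # locate the properties and arguments files to collect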

Management Console screenshots:

  • Proxy module configuration page (Server and Client tabs)
  • Instance configuration page (take multiple screenshots to capture the whole page)
  • Search Engine page
  • Job Management page

JMX Data: Make sure the JMX Monitoring script is running continuously in the background and capturing data properly. Provide the CSV data file.

  • Confirm the script is running and collecting data by running "tail -f <hostname>.txt". Make sure it is writing data every minute and the data looks good (not zeroes, not JMXNOK).
  • Provide the <hostname>.txt file
  • If not already present, add the following line to the container arguments file: -XX:+HeapDumpOnOutOfMemoryError and restart the container
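
Once that line is added, the container arguments file should contain the flag exactly as shown, and it can be verified before restarting (a sketch; <container arguments file> is a placeholder for the actual file name and path in your installation):

-XX:+HeapDumpOnOutOfMemoryError
grep HeapDumpOnOutOfMemoryError <container arguments file>    # should print the line above once the change is in place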

Logs: Zip up the entire logs directory on each node, containing the container, access, cassandra, and console logs

  • Output of top command
  • Output of free -h
  • Output of ulimit -a
  • Thread Dump: Create a thread dump by issuing the following command: kill -3 <pid of container>
  • Create a thread dump every 4 hours for 24 hours, and during a system performance event, if possible
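
Where scripting helps, the system snapshots and the periodic thread dumps can be collected along these lines (a sketch only; <logs directory> and <pid of container> are placeholders, and a JVM thread dump triggered by kill -3 typically goes to the container's stdout/console log rather than to a separate file):

tar czf logs_$(hostname).tar.gz <logs directory>                          # one archive of the logs directory per node
top -b -n 1 > top_$(date +%Y%m%d_%H%M).txt                                # batch-mode snapshot of top
free -h > free_$(date +%Y%m%d_%H%M).txt
ulimit -a > ulimit_$(date +%Y%m%d_%H%M).txt
for i in 1 2 3 4 5 6; do kill -3 <pid of container>; sleep 14400; done    # one thread dump every 4 hours for 24 hours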

Heap Dumps: There are two types of heap dumps – one that contains everything currently in the heap, and one that runs garbage collection first and then produces a heap dump. Both are useful.

  • Command for normal heap dump: jmap -dump:format=b,file=container.hprof <pid of container>
  • Command for garbage collected heap dump: jmap -dump:live,format=b,file=container_live.hprof <pid of container>
  • Please run both commands during normal operations to get a baseline, and then try to run both commands again during a system performance event, if possible
  • The "HeapDumpOnOutOfMemoryError" argument automatically creates a heap dump on an out-of-memory error. Grab this dump as well.
  • These memory dumps will be named something like "java_pid3756.hprof" and will be located in the container folder. Any new ones created will have a recent timestamp.
  • Make sure there is enough disk space for all of this
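
As an example of a baseline capture, the two jmap commands can be run with timestamped file names after checking free space (a sketch; <pid of container> is a placeholder, and each .hprof file can be roughly as large as the container heap):

df -h .                                                                                # confirm there is enough free disk space first
jmap -dump:format=b,file=container_$(date +%Y%m%d_%H%M).hprof <pid of container>
jmap -dump:live,format=b,file=container_live_$(date +%Y%m%d_%H%M).hprof <pid of container>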

As for memory configuration, the only change we recommend is reducing Cassandra's memory from 8 GB to 4 GB. Here is how to do that:

  • In the cassandra/conf/cassandra-env.sh file there are two options, commented out by default:

#MAX_HEAP_SIZE="4G"

#HEAP_NEWSIZE="800M"

  • Uncomment these and change them to the following:

MAX_HEAP_SIZE="4G"

HEAP_NEWSIZE="1G"

  • Restart Cassandra. Cassandra could take up to 2 hours to start, so this should be done during a maintenance window.
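
If scripting the edit is preferred, the two lines can be uncommented and updated in place with something like the following, run from the CDP installation directory (a sketch; back up the file first, and the sed patterns only match if the commented lines appear exactly as shown above):

cp cassandra/conf/cassandra-env.sh cassandra/conf/cassandra-env.sh.bak
sed -i 's/^#MAX_HEAP_SIZE="4G"/MAX_HEAP_SIZE="4G"/' cassandra/conf/cassandra-env.sh
sed -i 's/^#HEAP_NEWSIZE="800M"/HEAP_NEWSIZE="1G"/' cassandra/conf/cassandra-env.sh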