PostgreSQL directory consumes a large amount of space

Article ID: 204062

Products

CA Application Performance Management Agent (APM / Wily / Introscope)
CA Application Performance Management (APM / Wily / Introscope)
INTROSCOPE
DX Application Performance Management

Issue/Introduction

I have been running the 20.2 cluster since Oct 1st, and very few agents connect to it.

The Postgres data is also using 94% of the filesystem, which leaves too little free space to vacuum the database.

I believe this was caused by running the fake agent process to reproduce agent connections and metric counts.

[email protected]:/APM_dxi/axaservices/pg-data> du -hsx * | sort -rh | head -10
472G    userdata
12K     dxi
[email protected]:/APM_dxi/axaservices/pg-data> df -kh /APM_dxi/
Filesystem                             Size  Used Avail Use% Mounted on
cdlenc1inasv44.es.oneadp.com:/APM_dxi  542G  509G   34G  94% /APM_dxi
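
The usage figure shown above can also be checked from a script. The following is a minimal sketch, assuming df output in the column layout shown above; the function names and the threshold of 90 in the usage example are illustrative, not product settings.

```shell
# Print the Use% value (digits only) from df output read on stdin.
# Assumes the data row is the second line and Use%/Capacity is column 5,
# as in the output above. `df -P` keeps long device names on one line.
use_percent_from_df() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when the usage read from stdin is at or above the threshold in $1.
# The body runs in a subshell so the helper variable stays local.
over_threshold() (
    used="$(use_percent_from_df)"
    [ -n "$used" ] && [ "$used" -ge "$1" ]
)
```

For example, `df -P /APM_dxi | over_threshold 90 && echo "filesystem nearly full"` would print the warning for the filesystem shown above.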

After vacuuming the DB, the pods will not start up.  All pods are down.

Environment

Release : 20.2

Component : APM Agents

Resolution

This looks very close to a known ZooKeeper bug:

https://issues.apache.org/jira/browse/ZOOKEEPER-2332



Check whether a zero-length transaction log (TxnLog) file is present in the ZooKeeper log directory (/nfs/ca/dxi/zookeeper/datalog/version-2).

If one is present, delete it.
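
One way to locate such a file is sketched below. It assumes the transaction logs follow ZooKeeper's usual log.<zxid> naming; the function names are illustrative. Listing the matches first and reviewing them before deleting anything is the safer order.

```shell
# List zero-length transaction log files directly inside the directory in $1.
# Snapshot files (snapshot.<zxid>) and non-empty logs are not matched.
find_empty_txnlogs() {
    find "$1" -maxdepth 1 -type f -name 'log.*' -size 0 -print
}

# Delete the same set of files. Run find_empty_txnlogs first and check
# its output, because this removal cannot be undone.
delete_empty_txnlogs() {
    find "$1" -maxdepth 1 -type f -name 'log.*' -size 0 -exec rm -f -- {} +
}
```

With the directory named above, this would be `find_empty_txnlogs /nfs/ca/dxi/zookeeper/datalog/version-2`, followed by `delete_empty_txnlogs` on the same path once the listed file has been confirmed.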

Steps to follow:

1. Delete the zero-size log file
2. Delete the apmservices-zookeeper pod
3. Scale down the pods that were still in CrashLoopBackOff status
4. Scale the pods back up
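
Steps 2 to 4 above could be sketched as the kubectl commands below. The function only prints the commands so they can be reviewed before being run. The namespace and the deployment names passed in are assumptions to be replaced with the values from the affected cluster, and a replica count of 1 is assumed for the scale-up; use each deployment's original count if it differs.

```shell
# Print (not run) the kubectl commands for steps 2-4.
#   $1     namespace (assumption: the article does not name one)
#   $2     name of the ZooKeeper pod to delete
#   $3...  deployments whose pods were in CrashLoopBackOff
# Step 1, deleting the zero-size log file, is done on the filesystem beforehand.
print_recovery_plan() (
    ns="$1"
    zk_pod="$2"
    shift 2
    # Step 2: delete the ZooKeeper pod; its deployment recreates it.
    echo "kubectl -n $ns delete pod $zk_pod"
    # Step 3: scale the failing deployments down.
    for d in "$@"; do
        echo "kubectl -n $ns scale deployment $d --replicas=0"
    done
    # Step 4: scale them back up (assumed replica count of 1).
    for d in "$@"; do
        echo "kubectl -n $ns scale deployment $d --replicas=1"
    done
)
```

For example, `print_recovery_plan dxi apmservices-zookeeper-847dd6c5c6-lv9zp apmservices-gateway` prints four commands, where `dxi` and `apmservices-gateway` are hypothetical names used only for illustration.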

 

The environment is now back up and running.

Additional Information

The pods in CrashLoopBackOff status are all APM pods, mostly because they cannot connect to the gateway or to ZooKeeper. The gateway pod also fails to connect to ZooKeeper.
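
To see which pods are affected, the default `kubectl get pods` listing can be filtered on its STATUS column. This is a minimal sketch, assuming the default column order (NAME, READY, STATUS, RESTARTS, AGE) for a single namespace; the function name is illustrative.

```shell
# Read `kubectl get pods` output on stdin and print the names of pods
# whose STATUS column is CrashLoopBackOff. The header row is skipped.
# With --all-namespaces a NAMESPACE column is prepended, so the field
# numbers would need to shift by one.
crashloop_pods() {
    awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1 }'
}
```

Typical use would be `kubectl get pods -n <namespace> | crashloop_pods`, with the namespace supplied for the cluster in question.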

The apmservices-zookeeper pod did not appear in the list of pods in NODE_HEALTH.log.

After checking whether this pod exists, the pod was started with:

kubectl scale deploy apmservices-zookeeper --replicas=1

The environment still did not come up. The latest whatsupdxi script was then run and produced the following output. See Resolution for the fix.

 

[ApmCosAgent] Agent configuration check.

[ApmCosAgent] Forking to backround

[ApmCosAgent] Agent starting

[ApmCosAgent] Agent status check (1)

agent down

[INFO] [MainThread] [ApmCOsAgent] Agent Running. PID: 7

[ApmCosAgent] Agent status check (2)

agent up (PID: 7)

[ApmCosAgent] Agent is up.

ZooKeeper JMX enabled by default

Using config: /conf/zoo.cfg

[myid:] - INFO  [main:[email protected]] - Reading configuration from: /conf/zoo.cfg

[myid:] - INFO  [main:[email protected]] - autopurge.snapRetainCount set to 3

[myid:] - INFO  [main:[email protected]] - autopurge.purgeInterval set to 0

[myid:] - INFO  [main:[email protected]] - Purge task is not scheduled.

[myid:] - WARN  [main:[email protected]] - Either no config or no quorum defined in config, running  in standalone mode

[myid:] - INFO  [main:[email protected]] - Reading configuration from: /conf/zoo.cfg

[myid:] - INFO  [main:[email protected]] - Starting server

[myid:] - INFO  [main:[email protected]] - Server environment:zookeeper.version=3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on 06/29/2018 04:05 GMT

[myid:] - INFO  [main:[email protected]] - Server environment:host.name=apmservices-zookeeper-847dd6c5c6-lv9zp

[myid:] - INFO  [main:[email protected]] - Server environment:java.version=11.0.8

[myid:] - INFO  [main:[email protected]] - Server environment:java.vendor=AdoptOpenJDK

[myid:] - INFO  [main:[email protected]] - Server environment:java.home=/opt/jdk

[myid:] - INFO  [main:[email protected]] - Server environment:java.class.path=/opt/zookeeper/bin/../build/classes:/opt/zookeeper/bin/../build/lib/*.jar:/opt/zookeeper/bin/../lib/slf4j-log4j12-1.7.25.jar:/opt/zookeeper/bin/../lib/slf4j-api-1.7.25.jar:/opt/zookeeper/bin/../lib/netty-3.10.6.Final.jar:/opt/zookeeper/bin/../lib/log4j-1.2.17.jar:/opt/zookeeper/bin/../lib/jline-0.9.94.jar:/opt/zookeeper/bin/../lib/audience-annotations-0.5.0.jar:/opt/zookeeper/bin/../zookeeper-3.4.13.jar:/opt/zookeeper/bin/../src/java/lib/*.jar:/conf:

[myid:] - INFO  [main:[email protected]] - Server environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib

[myid:] - INFO  [main:[email protected]] - Server environment:java.io.tmpdir=/tmp

[myid:] - INFO  [main:[email protected]] - Server environment:java.compiler=<NA>

[myid:] - INFO  [main:[email protected]] - Server environment:os.name=Linux

[myid:] - INFO  [main:[email protected]] - Server environment:os.arch=amd64

[myid:] - INFO  [main:[email protected]] - Server environment:os.version=3.10.0-1127.8.2.el7.x86_64

[myid:] - INFO  [main:[email protected]] - Server environment:user.name=default

[myid:] - INFO  [main:[email protected]] - Server environment:user.home=/home/default

[myid:] - INFO  [main:[email protected]] - Server environment:user.dir=/opt/zookeeper

[myid:] - INFO  [main:[email protected]] - tickTime set to 2000

[myid:] - INFO  [main:[email protected]] - minSessionTimeout set to -1

[myid:] - INFO  [main:[email protected]] - maxSessionTimeout set to -1

[myid:] - INFO  [main:[email protected]] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory

[myid:] - INFO  [main:[email protected]] - binding to port 0.0.0.0/0.0.0.0:2181

[myid:] - ERROR [main:[email protected]] - Unexpected exception, exiting abnormally

java.io.EOFException

        at java.base/java.io.DataInputStream.readInt(DataInputStream.java:397)

        at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)

        at org.apache.zookeeper.server.persistence.FileHeader.deserialize(FileHeader.java:66)

        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:585)

        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:604)

        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:570)

        at org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.next(FileTxnLog.java:650)

        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:219)

        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:176)

        at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:217)

        at org.apache.zookeeper.server.ZooKeeperServer.loadData(ZooKeeperServer.java:284)

        at org.apache.zookeeper.server.ZooKeeperServer.startdata(ZooKeeperServer.java:407)

        at org.apache.zookeeper.server.NIOServerCnxnFactory.startup(NIOServerCnxnFactory.java:118)

        at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:122)

        at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:89)

        at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:55)

        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:119)

        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
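
The stack trace above shows a java.io.EOFException raised from FileHeader.deserialize while FileTxnLog$FileTxnIterator moves to the next log file, which is consistent with an empty transaction log. A minimal sketch for spotting that combination in a saved log follows; the function name is illustrative, and a match is only a hint to inspect the log directory, not proof of the cause.

```shell
# Read a ZooKeeper log on stdin and succeed only when all three markers
# from the stack trace above are present somewhere in it.
has_empty_txnlog_signature() {
    awk '
        /java\.io\.EOFException/        { e = 1 }
        /FileHeader\.deserialize/       { h = 1 }
        /FileTxnLog\$FileTxnIterator/   { i = 1 }
        END { exit !(e && h && i) }
    '
}
```

For example, `kubectl logs <zookeeper-pod> --previous | has_empty_txnlog_signature && echo "check for a zero-length log file"` would flag the crashed container's output, where the pod name is whatever the cluster currently reports.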