Apps are missing logs in App Metrics Log Store in Tanzu Application Service for VMs

Article ID: 297943

Products

VMware Tanzu Application Service for VMs

Issue/Introduction

Symptoms

  • Some apps are missing logs in App Metrics, although their logs are available in Apps Manager.
  • This issue was observed on a foundation running Tanzu Application Service (TAS) for VMs v2.9+ and App Metrics v2.0.5-build.6 or Metric Store v1.4.4.

How to identify this issue

Run node-locator on a log-store VM with the app GUID as SOURCE_ID. All of the affected apps map to the same two partition IDs, 63 and 0. For example, for the affected app GUID 44f5d477-d3b1-4499-835f-3a59642424cc:
vcap@10fddf38-4c12-4fc2-95f4-58f42932ecae:/var/vcap/jobs/log-store/packages/node-locator$ SOURCE_ID=44f5d477-d3b1-4499-835f-3a59642424cc ./node-locator 
FIELD NAME:               TYPE:           ENV:                  REQUIRED:  VALUE:
Config.Instances          map[int]string  LOG_STORE_INSTANCES   true       map[0:10.xxx.xxx.xx:8080 1:10.xxx.xxx.xx:8080 2:10.xxx.xxx.xx:8080]
Config.LocalAddr          string          LOG_STORE_LOCAL_ADDR  true       10.xxx.xxx.xx:8080
Config.PartitionCount     int             PARTITION_COUNT       true       64
Config.ReplicationFactor  int             REPLICATION_FACTOR    true       2
Config.SourceID           string          SOURCE_ID             true       44f5d477-d3b1-4499-835f-3a59642424cc
Source ID 44f5d477-d3b1-4499-835f-3a59642424cc lives on the following partitions
	 ID 	 ADDRESS
	 63 	 10.xxx.xxx.xx:8080
	 0 	  10.xxx.xxx.xx:8080

Run the influx_inspect verify command on all log-store VMs and confirm that it reports no errors.

Output is similar to the following:
/var/vcap/store/log-store/influxdb/8/data/logs/default/16122240000000/0000003-000001.tsm: healthy
Broken Blocks: 0 / 17739, in 0.100052868s
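
With several log-store VMs and many TSM files, the verify output can be long. The following is a minimal sketch of a filter that prints only the lines that need attention. The function name verify_problems is hypothetical, and the filter assumes the output format shown above (one "<file>.tsm: healthy" line per file plus a "Broken Blocks: N / M" summary); other output formats are not handled.

```shell
# Reads `influx_inspect verify` output on stdin and prints any line that
# signals trouble. Assumption: the output looks like the sample above.
# Exits non-zero if at least one problem line was found.
verify_problems() {
  awk '
    /\.tsm: /          && $NF != "healthy" { print "unhealthy file: " $0; bad = 1 }
    /^Broken Blocks: / && $3 != 0          { print "broken blocks: " $0;  bad = 1 }
    END { exit bad }
  '
}

# Usage: pipe the verify output from each log-store VM into the function, e.g.
#   <your influx_inspect verify invocation> | verify_problems
```

No output and a zero exit status mean that every file was reported healthy and no broken blocks were counted.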

Check whether the app's data is present in each of its partitions. With this issue, one of the two partitions (63 in this example) has no data for the app:
vcap@10fddf38-4c12-4fc2-95f4-58f42932ecae:/var/vcap/data/packages/influx-inspect/cfd412d2b0968da3c4031c97c5699e5b3c54e738$ grep -r 44f5d477-d3b1-4499-835f-3a59642424cc /tmp/step5-63 | wc -l
0
vcap@10fddf38-4c12-4fc2-95f4-58f42932ecae:/var/vcap/data/packages/influx-inspect/cfd412d2b0968da3c4031c97c5699e5b3c54e738$ grep -r 44f5d477-d3b1-4499-835f-3a59642424cc /tmp/step5-0 | wc -l
441
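
The two grep checks above can be wrapped in a small loop so that several partition dumps are compared in one pass. This is only a sketch: the function name count_source_hits is hypothetical, and the /tmp/step5-<partition> paths are taken from the example above and may differ on your VMs.

```shell
# Count lines matching a source ID in each partition dump directory and flag
# replicas with none. Arguments: the source ID, then one or more directories.
count_source_hits() {
  source_id=$1
  shift
  for dir in "$@"; do
    hits=$(grep -r -- "$source_id" "$dir" 2>/dev/null | wc -l | tr -d ' ')
    if [ "$hits" -eq 0 ]; then
      echo "$dir: 0 (replica has no data for this source ID)"
    else
      echo "$dir: $hits"
    fi
  done
}

# Example, using the paths from this article:
#   count_source_hits 44f5d477-d3b1-4499-835f-3a59642424cc /tmp/step5-63 /tmp/step5-0
```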


Environment

Product Version: 2.1

Resolution

App Metrics v2 was rearchitected so that the log-store replicates data across nodes according to the configured replication factor. By default, the replication factor is 2 across 3 nodes.

In this edge case, logs were not written correctly when both of a source ID's partitions were located on the same node. This happens only when a source ID is hashed onto partitions 63 and 0, both on log-store-vms/0, in the default deployment of 3 nodes, 64 partitions, and a replication factor of 2. This is why no logs were propagated to partition 63.
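
To illustrate why only the 63/0 pair is affected, here is a rough sketch under an assumed placement model: partition p lives on node p mod N, and a source ID is replicated to two consecutive partitions, wrapping from the last partition back to 0. This model is an assumption chosen to match the behavior described above, not the actual log-store implementation, and colliding_pairs is a hypothetical helper.

```shell
# Assumed model (not taken from the log-store source):
#   - partition p is owned by node (p mod nodes)
#   - replicas go to partitions p and (p + 1) mod partitions
# Prints every replica pair whose two partitions share a node.
colliding_pairs() {
  nodes=$1
  partitions=$2
  p=0
  while [ "$p" -lt "$partitions" ]; do
    next=$(( (p + 1) % partitions ))
    if [ $(( p % nodes )) -eq $(( next % nodes )) ]; then
      echo "partitions $p and $next both map to node $(( p % nodes ))"
    fi
    p=$(( p + 1 ))
  done
}

colliding_pairs 3 64   # default layout: 3 nodes, 64 partitions
colliding_pairs 4 64   # layout with 4 nodes
```

Under this assumed model, the 3-node layout reports only the 63/0 pair on node 0, and the 4-node layout reports no colliding pairs.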
 

Workaround

A workaround for this issue is to change the configuration from 3 nodes to 4 nodes.

Note: Log store does not support horizontal scaling, so this change results in total data loss. We recommend doing this with new disks.

If you require the existing logs, you can retrieve them through the download endpoint:

# download the gz file to the /tmp directory
# (-k skips server certificate verification; --cacert takes a CA file, whereas --capath expects a directory)
curl -k --fail --output /tmp/logs.gz -vvv --cacert /var/vcap/jobs/log-store/config/certs/ca.crt --cert /var/vcap/jobs/log-store/config/certs/api.crt --key /var/vcap/jobs/log-store/config/certs/api.key  https://localhost:8080/v1/sources/<your-source-id>/download
 
# head over to the /tmp folder and unzip
cd /tmp
gzip -d logs.gz
 
# check that the logs are uncorrupted and human readable
cat logs
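
As an optional extra check before decompressing, the archive itself can be tested with gzip -t, which exits non-zero on a truncated or corrupt file. The sketch below wraps that check; the function name check_archive is hypothetical, and the path in the example is the one used above.

```shell
# Test a gzip archive without extracting it.
# Returns 0 if the archive is intact, 1 otherwise.
check_archive() {
  if gzip -t "$1" 2>/dev/null; then
    echo "$1: archive OK"
  else
    echo "$1: archive is corrupt or incomplete" >&2
    return 1
  fi
}

# Example, run before `gzip -d logs.gz`:
#   check_archive /tmp/logs.gz
```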


Conclusion

This issue is addressed in App Metrics v2.0.6. For more information, refer to the App Metrics v2.0.6 Release Notes under Maintenance Updates.

Log Store has the following fixes:

"Fixed an issue causing missing application logs when the partition count was not divisible by 3. Note that this only affected a small percentage of applications and was not a widespread issue."