oauth_refresh_token_view_client_key table - CPU spikes and long-running streaming between Cassandra nodes during the Cassandra repair process



Article ID: 419931


Updated On:

Products

CA API Gateway

Issue/Introduction

The DBA is concerned about the large partition in oauth_refresh_token_view_client_key, which caused CPU spikes and long-running streaming between the Cassandra nodes during the Cassandra repair process.
 
This issue could cause a production outage, because CPU usage spikes on the Cassandra nodes while they attempt to repair a table used by OTK.

What is known: 
 
1. oauth_refresh_token_view_client_key is used as part of the Authorization Code flow, in which clients are given a Refresh Token so that they can generate a new Access Token without repeating the authorization code flow.
2. Some clients can have up to 10 million Resource Owners and a Refresh Token TTL of 1.5 years.
3. In this use case, a single client_key can have up to 10 million records in oauth_refresh_token_view_client_key, each with a long-lived TTL on the refresh token. Because of the way the table is set up, the partition key is tied to a single key, which produces a very large SSTable during repair and compaction (estimated at 147 GB). That SSTable is then streamed to the other nodes to satisfy the Replication Factor, and streaming this much data caused the spike in CPU usage on the nodes.
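
The skew described in point 3 can be illustrated with a small, scaled-down sketch. It is not taken from the OTK schema: the row layout, client names, and row counts below are hypothetical and only show how partitioning by client_key alone concentrates almost all rows in one partition when one client dominates.

```python
from collections import Counter

def partition_sizes(rows, partition_key):
    """Count how many rows land in each partition for a given key function."""
    return Counter(partition_key(row) for row in rows)

# Hypothetical, scaled-down workload: one large client and ten small ones.
# Each row is (client_key, resource_owner).
rows = [("big-client", f"user-{i}") for i in range(100_000)]
rows += [(f"small-client-{c}", f"user-{i}") for c in range(10) for i in range(100)]

# Partition by client_key alone, as described for the view table.
by_client = partition_sizes(rows, lambda row: row[0])

largest_key, largest_count = by_client.most_common(1)[0]
share = largest_count / len(rows)
print(f"{largest_key}: {largest_count} rows ({share:.1%} of all rows)")
```

In this toy data set nearly every row belongs to the partition of the large client, which mirrors the situation where one partition grows far beyond the others and dominates repair streaming.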
 
Work Done
 
Cassandra support gave several suggestions to remediate the situation. One suggestion was to update the oauth_refresh_token_view_client_key table schema so that the partition key is more unique, spreading the records of a single client across more partitions.
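
One common way to make a partition key more unique is to add a bucket component to it. The following is a minimal sketch of that general technique only, not the schema change suggested by Cassandra support or planned for OTK. The bucket count, the use of the resource owner as hash input, and the function names are all assumptions made for illustration.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 64  # assumed value; a real deployment would need to size this

def bucket_for(resource_owner, num_buckets=NUM_BUCKETS):
    """Derive a stable bucket number from the resource owner (hypothetical input)."""
    digest = hashlib.sha256(resource_owner.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

def partition_key(client_key, resource_owner):
    """Hypothetical composite partition key: (client_key, bucket)."""
    return (client_key, bucket_for(resource_owner))

# The same large client as one partition becomes NUM_BUCKETS smaller partitions.
counts = Counter(
    partition_key("big-client", f"user-{i}") for i in range(100_000)
)
print(f"partitions: {len(counts)}, largest partition: {max(counts.values())} rows")
```

The trade-off with this kind of design is that a query for all refresh tokens of one client has to read every bucket for that client, so the bucket count would need to balance partition size against read fan-out.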
 
 

Environment

CA API Gateway - OTK Plugin Version: 4.6.0

Cause

Product limitation

Resolution

Product Management has confirmed that this is an enhancement and that it is in the "to be delivered" (TBD) stage. There is no specific ETA; it may possibly be included in the next release, in Q2 of next year.