If you would like CloudHealth to read from a BigQuery View off the standard BigQuery Billing Export table, this is supported. But! You must keep in mind partitioning so that we incur the least amount of costs. Otherwise, every query (one per day) will scan the entire dataset and we will incur significant cost for the customer.
Why would you do this?
As a direct customer, you wish for CloudHealth to provide reports solely on a specific selection of projects within your Billing Account. To achieve this, it is necessary to establish a BigQuery View derived from the standard BigQuery billing export, applying a filter based on Project ID. Conversely, as a partner, you prefer that CloudHealth does not charge you for all your sub billing accounts, but rather for a selected few. In this case, you will need to create a BigQuery view based on the master billing account export, utilizing a filter on Billing Account ID.
What is a partition, and why should I care?
Partitioning is used in BigQuery for cost and performance reasons. In BigQuery, you are charged for how much data is scanned. By limiting the query to a partition, you are only charged for the amount of data in that partition and it only takes the time to scan that one partition vs the entire dataset.
In the Billing export, GCP partitions by day. So when we collect data, we collect each day's partition in a separate query. If the customer does not partition by day, every query we make will scan their entire dataset, which could incur significant cost
What are the overall steps to follow
Constructing the Query
Depending on if this is a direct/partner customer and the use case, there are two different types of queries:
A. Direct Customer looking to limit which projects CloudHealth collects
If you have a direct customer that does not want to give access (or be charged by) CloudHealth, you can create a view separate from their BigQuery export filtered to the desired projects.
SELECT *, EXTRACT(DATE FROM _PARTITIONTIME) as PartitionTime
FROM `project_id.dataset_name.table_name` WHERE project.id IN (‘project-1’,‘project-2’)
The query above does the following:
- Create a new column in the view titled “PartitionTime” (note spelling/capitalization must match)
- Sets the value in that new column equal to the value in the _PARTITIONTIME pseudo column in the standard BigQuery Billing Export
- Copies over all cost data incurred by projects mentioned in the query
B. Partner Customer looking to limit which Billing Accounts CloudHealth collects
For partners, they may have customers under their master billing account that they do not want to run through CloudHealth. If that is the case, they can create a View and CloudHealth will only report on those Billing Accounts.
SELECT *, EXTRACT(DATE FROM _PARTITIONTIME) as PartitionTime
FROM `project_id.dataset_name.table_name` WHERE billing_account_id IN (‘ba-id1’,’ba-id2’)
The query above does the following:
- Create a new column in the view titled “PartitionTime” (note spelling/capitalization must match)
- Sets the value in that new column equal to the value in the _PARTITIONTIME pseudo column in the standard BigQuery Billing Export
- Copies over all cost data incurred by Billing Accounts mentioned in the filters