Fastest way to get Google Storage bucket size?

Google Cloud-Storage | Gsutil

Google Cloud-Storage Problem Overview


I'm currently doing this, but it's VERY slow since I have several terabytes of data in the bucket:

gsutil du -sh gs://my-bucket-1/

And the same for a sub-folder:

gsutil du -sh gs://my-bucket-1/folder

Is it possible to obtain the total size of a complete bucket (or a sub-folder) in some other, much faster way?

Google Cloud-Storage Solutions


Solution 1 - Google Cloud-Storage

The visibility Google gives you into storage here is pretty bad.

The fastest way is actually to pull the Stackdriver metrics and look at the total size in bytes for the bucket.

Unfortunately, there is practically no filtering you can do in Stackdriver: you can't wildcard the bucket name, and the almost-useless bucket resource labels are NOT aggregatable in Stackdriver metrics.

Also, this is bucket level only, not per prefix.

The Stackdriver metrics are updated daily, so unless you can wait a day, you can't use this to get the current size right now.

UPDATE: Stackdriver metrics now support user metadata labels, so you can label your GCS buckets and aggregate those metrics by the custom labels you apply.
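
If you'd rather script this than click around the console, here is a minimal sketch of pulling that metric through the Cloud Monitoring REST API. The project and bucket names are placeholders, storage.googleapis.com/storage/total_bytes is the bucket-level metric, and the date -d flag assumes GNU date (Linux):

PROJECT=my-project    # placeholder: your project ID
BUCKET=my-bucket-1    # placeholder: your bucket
NOW=$(date -u +%Y-%m-%dT%H:%M:%SZ)
START=$(date -u -d '25 hours ago' +%Y-%m-%dT%H:%M:%SZ)
# Ask Cloud Monitoring for the bucket's total_bytes time series
curl -s -G -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://monitoring.googleapis.com/v3/projects/${PROJECT}/timeSeries" \
  --data-urlencode "filter=metric.type=\"storage.googleapis.com/storage/total_bytes\" AND resource.labels.bucket_name=\"${BUCKET}\"" \
  --data-urlencode "interval.startTime=${START}" \
  --data-urlencode "interval.endTime=${NOW}"

Remember the caveat above: since the metric is only written about once a day, the window has to be wide enough to catch the latest point.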

Solution 2 - Google Cloud-Storage

Unfortunately, no. If you need to know what size the bucket is right now, there's no faster way than what you're doing.

If you need to check on this regularly, you can enable bucket logging. Google Cloud Storage will generate a daily storage log that you can use to check the size of the bucket. If that would be useful, you can read more about it here: https://cloud.google.com/storage/docs/accesslogs#delivery
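
A minimal sketch of that setup, assuming my-logs-bucket is a placeholder for a bucket you own to receive the logs. Per the docs, the storage log's storage_byte_hours field divided by 24 gives the bucket's average size in bytes for that day:

# One-time setup: deliver daily storage logs to a separate bucket
gsutil logging set on -b gs://my-logs-bucket gs://my-bucket-1

# A day or so later, grab the storage logs and convert byte-hours to bytes
# (the awk reads the first data row of the first matching CSV file)
gsutil cp gs://my-logs-bucket/my-bucket-1_storage_* .
awk -F'"' 'NR==2 {print $4 / 24, "bytes"}' my-bucket-1_storage_*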

Solution 3 - Google Cloud-Storage

If the daily storage log you get from enabling bucket logging (per Brandon's suggestion) won't work for you, one thing you could do to speed things up is to shard the du request. For example, you could do something like:

gsutil du -s gs://my-bucket-1/a* > a.size &
gsutil du -s gs://my-bucket-1/b* > b.size &
...
gsutil du -s gs://my-bucket-1/z* > z.size &
wait
awk '{sum+=$1} END {print sum}' *.size

(assuming your subfolders are named starting with letters of the English alphabet; if not, you'd need to adjust how you run the above commands).
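
If your top-level names aren't neatly alphabetic, a variant that shards by whatever prefixes actually exist might look like this (a bash sketch; it assumes no spaces in the top-level names, and part_*.size is an arbitrary naming scheme):

# Size every top-level prefix (and object) in parallel
i=0
for prefix in $(gsutil ls gs://my-bucket-1/); do
  gsutil du -s "$prefix" > "part_$((i++)).size" &
done
wait
# Sum the byte counts from the first column of each result
awk '{sum+=$1} END {print sum}' part_*.size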

Solution 4 - Google Cloud-Storage

Use the built-in dashboard: Operations -> Monitoring -> Dashboards -> Cloud Storage

The graph at the bottom shows the bucket size for all buckets, or you can select an individual bucket to drill down.


Solution 5 - Google Cloud-Storage

I found that the CLI was frequently timing out, but that may be because I was reviewing Coldline storage.

For a GUI solution, look at CloudBerry Explorer.


Solution 6 - Google Cloud-Storage

For me, the following command helped:

gsutil ls -l gs://{bucket_name}

After listing all the objects, it prints a total like this:

TOTAL: 6442 objects, 143992287936 bytes (134.1 GiB)
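
If you only want the byte count for use in a script, you can pull it off that summary line; this sketch assumes the TOTAL line format shown above:

# Print just the total byte count from the summary line
gsutil ls -l gs://my-bucket-1 | awk '/^TOTAL:/ {print $4}'

Note that this still lists every object under the hood, so for multi-terabyte buckets it is no faster than gsutil du.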

Attributions

All content on this page is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type                         Original Author
Question                             fredrik
Solution 1 - Google Cloud-Storage    red888
Solution 2 - Google Cloud-Storage    Brandon Yarbrough
Solution 3 - Google Cloud-Storage    Mike Schwartz
Solution 4 - Google Cloud-Storage    dan carter
Solution 5 - Google Cloud-Storage    needcaffeine
Solution 6 - Google Cloud-Storage    Anton Kumpan