Web Analytics with Google Cloud Functions and BigQuery
Following up on my last post and a Twitter exchange (and, of course, musing on commodity pricing), I decided to play around with Google Cloud Functions to see whether they were a viable alternative to tools like Segment and Mixpanel.
My goals were:
- Cheap: <$10/month
- Easy: Little to no maintenance
- Flexible: SQL access (near real-time if possible)
Chord aims to do all three. Setup is surprisingly easy, though it takes a few more screenshots than I feel like taking at 2am. And the cost is a fraction of a proprietary tool's.
My assumptions for the estimates below:
- Once warmed up, requests take <200 milliseconds.
- Based on Sentry’s BigQuery data, 2 million rows of pageview data are ~1 GB.
- Each request returns 1 KB (responses have no body data, but I assume header data takes 1 KB).
- Each function instance uses 128 MB of memory and a 200 MHz CPU.
- For estimating costs with Segment, I assume ~20 pageviews per MTU (monthly tracked user).
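To make the arithmetic behind these assumptions explicit, here is a minimal sketch that turns them into per-request quantities. The constant and function names are mine, not part of Chord, and I am treating "<200 milliseconds" as a flat 200 ms and 1 KB as one millionth of a GB.

```python
# Per-request resource usage implied by the assumptions listed above.
# All names here are illustrative; none of them come from Chord itself.
EXECUTION_SECONDS = 0.2        # "<200 milliseconds", taken as a flat 200 ms
MEMORY_GB = 128 / 1024         # 128 MB expressed in GB (0.125)
CPU_GHZ = 0.2                  # 200 MHz expressed in GHz
RESPONSE_GB = 1 / 1_000_000    # 1 KB per response (decimal units, my choice)
ROWS_PER_GB = 2_000_000        # 2 million pageview rows are ~1 GB


def per_request_usage() -> dict:
    """Return the resources consumed by a single pageview request."""
    return {
        "gb_seconds": MEMORY_GB * EXECUTION_SECONDS,   # memory time
        "ghz_seconds": CPU_GHZ * EXECUTION_SECONDS,    # CPU time
        "egress_gb": RESPONSE_GB,                      # outbound data
        "stored_gb": 1 / ROWS_PER_GB,                  # one BigQuery row
    }


if __name__ == "__main__":
    for name, value in per_request_usage().items():
        print(f"{name}: {value:.7f}")
```

These are the same per-request quantities that appear in the cost footnote at the end of the post.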
2 million pageviews
- Chord: $0
- Segment: $1,125/month (for 100k users)
5 million pageviews
- Chord: $2.20
- Segment: $2,241/month (for 250k users)
Interestingly, Cloud Functions definitely have a warm-up time. A warmed-up instance might finish execution in under 100ms; a cold start might take 1481ms! There are about 2.6 million seconds in a month. A request a second should keep the function relatively active, so as you send more requests, the average execution time should settle around 200ms, keeping you under the free tier, apart from the charge for the requests themselves.
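As a rough illustration of why cold starts matter less as traffic grows, the sketch below blends the two timings quoted above. The `cold_fraction` parameter is hypothetical (I did not measure how often cold starts happen), and the warm time is rounded up to 100 ms.

```python
# Seconds in a 30-day month: the "about 2.6 million" figure.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def blended_execution_ms(cold_fraction: float,
                         warm_ms: float = 100.0,
                         cold_ms: float = 1481.0) -> float:
    """Average execution time when a share of requests hit a cold start.

    cold_fraction is a hypothetical input between 0 and 1; the defaults
    are the warm and cold timings mentioned in the text.
    """
    if not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("cold_fraction must be between 0 and 1")
    return cold_fraction * cold_ms + (1.0 - cold_fraction) * warm_ms


if __name__ == "__main__":
    print(f"seconds per month: {SECONDS_PER_MONTH:,}")
    for fraction in (0.0, 0.01, 0.05, 0.5):
        avg = blended_execution_ms(fraction)
        print(f"{fraction:>5.0%} cold starts -> {avg:.1f} ms average")
```

Under these assumptions the average stays below 200ms as long as only a small share of requests are cold, which is the situation steady traffic should produce.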
>20 million pageviews
Eventually, though, this approach gets more expensive. After the free tier, the cost comes to about $1 / million requests [1]. For that scale, Google has detailed a better option. It’s a bit more expensive at first (Load Balancer rules cost a flat $18/month), but the cost per million events drops by an order of magnitude, because the “most expensive” part of the event pipeline (Cloud Functions) is removed. Beyond that, you can even grab log files and bulk load them into BigQuery once an hour, removing the BigQuery streaming inserts from the costs, though at this point you’re probably also doing a lot of transforms on the data to make it usable. Still, the cost can be brought down by another order of magnitude, to around $0.10/million events.
[1] $0.0000004/request + 0.025 GB-seconds/request * $0.0000025/GB-second + 0.04 GHz-seconds/request * $0.0000100/GHz-second + 0.000001 GB/request * $0.12/GB + 0.0000005 GB/row * $0.05/GB = $0.0000010075/request = ~$1 / million requests
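The footnote's sum is easy to check in a few lines. This is a minimal sketch using the rates and per-request quantities exactly as they appear in the footnote. The variable names and the labels in the comments (invocation, egress, streaming insert) are my reading of each term, and the free tier is ignored.

```python
# Prices as quoted in the footnote (USD).
PRICE_PER_INVOCATION = 0.0000004
PRICE_PER_GB_SECOND = 0.0000025
PRICE_PER_GHZ_SECOND = 0.0000100
PRICE_PER_GB_EGRESS = 0.12      # read here as outbound data
PRICE_PER_GB_STREAMED = 0.05    # read here as BigQuery streaming inserts

# Per-request usage as quoted in the footnote.
GB_SECONDS = 0.025
GHZ_SECONDS = 0.04
EGRESS_GB = 0.000001
ROW_GB = 0.0000005


def cost_per_request() -> float:
    """Marginal cost of one request once the free tier is used up."""
    return (
        PRICE_PER_INVOCATION
        + GB_SECONDS * PRICE_PER_GB_SECOND
        + GHZ_SECONDS * PRICE_PER_GHZ_SECOND
        + EGRESS_GB * PRICE_PER_GB_EGRESS
        + ROW_GB * PRICE_PER_GB_STREAMED
    )


def cost_per_million() -> float:
    """The same cost scaled to one million requests."""
    return cost_per_request() * 1_000_000


if __name__ == "__main__":
    print(f"per request: ${cost_per_request():.10f}")
    print(f"per million: ${cost_per_million():.4f}")
```

The invocation fee and CPU time are the two largest terms at $0.40 per million each, which is why dropping Cloud Functions from the pipeline cuts the cost so sharply.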