Web Analytics with Google Cloud Functions and BigQuery


Following up on my last post and a Twitter exchange (and, of course, musing on commodity pricing), I decided to play around with Google Cloud Functions to see whether it was a viable alternative to tools like Segment or Mixpanel.

My goals were:

  1. Cheap: <$10/month
  2. Easy: Little to no maintenance
  3. Flexible: SQL access (near real-time if possible)

Chord aims to do all three. Setup is surprisingly easy, though it takes a few more screenshots than I feel like taking at 2am. And the cost is a fraction of a proprietary tool's.

Cost Estimates

Assumptions

  • Once warmed up, requests take <200 milliseconds.
  • Based on Sentry’s BigQuery data, 2 million rows of pageview data are ~1 GB.
  • I’m assuming we return 1 KB per request (responses have no body, but I assume the headers take about 1 KB).
  • I’m assuming we use 128MB of memory and a 200MHz CPU.
  • For estimating costs with Segment, I’m assuming ~20 pageviews per monthly tracked user (MTU).
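To make the arithmetic behind these assumptions explicit, here is a minimal sketch that turns a monthly pageview count into the usage figures the pricing is based on. The constant and function names are mine, and I'm treating 1 KB as one millionth of a GB, so read it as a back-of-the-envelope helper rather than anything official.

```python
# Assumptions copied from the list above; all names here are illustrative.
SECONDS_PER_REQUEST = 0.2              # warmed-up execution time (200 ms)
MEMORY_GB = 128 / 1024                 # 128 MB expressed in GB
CPU_GHZ = 0.2                          # 200 MHz expressed in GHz
RESPONSE_GB = 1e-6                     # ~1 KB of response headers, in GB
BIGQUERY_GB_PER_ROW = 1 / 2_000_000    # 2 million rows are ~1 GB


def monthly_usage(pageviews):
    """Estimate gross monthly usage for a given number of pageviews."""
    compute_seconds = pageviews * SECONDS_PER_REQUEST
    return {
        "requests": pageviews,
        "gb_seconds": compute_seconds * MEMORY_GB,
        "ghz_seconds": compute_seconds * CPU_GHZ,
        "networking_gb": pageviews * RESPONSE_GB,
        "bigquery_gb": pageviews * BIGQUERY_GB_PER_ROW,
    }


for pageviews in (2_000_000, 5_000_000):
    print(pageviews, monthly_usage(pageviews))
```

Changing any one assumption (say, 256MB of memory) only means editing the matching constant.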

2 million pageviews

  • Chord: $0
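  • Segment: $1,125/month (for 100k users)

| Metric        | Gross  | Free    | Net | Price ($/unit) | Total |
|---------------|--------|---------|-----|----------------|-------|
| Requests      | 2m     | 2m      | 0   | 0.0000004      | $0    |
| GB-Seconds    | 50,000 | 400,000 | 0   | 0.0000025      | $0    |
| GHz-Seconds   | 80,000 | 200,000 | 0   | 0.0000100      | $0    |
| Networking    | 2      | 5       | 0   | 0.12           | $0    |
| BigQuery (GB) | 1      | 10      | 0   | 0.05           | $0    |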

5 million pageviews

  • Chord: $1.20
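  • Segment: $2,241/month (for 250k users)

| Metric        | Gross   | Free    | Net | Price ($/unit) | Total |
|---------------|---------|---------|-----|----------------|-------|
| Requests      | 5m      | 2m      | 3m  | 0.0000004      | $1.20 |
| GB-Seconds    | 125,000 | 400,000 | 0   | 0.0000025      | $0    |
| GHz-Seconds   | 200,000 | 200,000 | 0   | 0.0000100      | $0    |
| Networking    | 5       | 5       | 0   | 0.12           | $0    |
| BigQuery (GB) | 2.5     | 10      | 0   | 0.05           | $0    |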
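The rows above all follow the same rule: subtract the free tier from gross usage, floor the result at zero, and multiply by the unit price. A rough sketch of that rule, using the free tiers and prices from the tables (the dictionary layout and function name are my own):

```python
# (free tier, price per unit in dollars), copied from the tables above.
PRICING = {
    "requests": (2_000_000, 0.0000004),
    "gb_seconds": (400_000, 0.0000025),
    "ghz_seconds": (200_000, 0.0000100),
    "networking_gb": (5, 0.12),
    "bigquery_gb": (10, 0.05),
}


def monthly_cost(gross):
    """Return the dollar cost per metric, plus a 'total', for gross usage."""
    costs = {}
    for metric, amount in gross.items():
        free, price = PRICING[metric]
        net = max(amount - free, 0)  # usage inside the free tier costs nothing
        costs[metric] = net * price
    costs["total"] = sum(costs.values())
    return costs


# Gross usage figures from the two tables.
two_million = {"requests": 2_000_000, "gb_seconds": 50_000,
               "ghz_seconds": 80_000, "networking_gb": 2, "bigquery_gb": 1}
five_million = {"requests": 5_000_000, "gb_seconds": 125_000,
                "ghz_seconds": 200_000, "networking_gb": 5, "bigquery_gb": 2.5}

for label, gross in (("2m", two_million), ("5m", five_million)):
    print(label, f"${monthly_cost(gross)['total']:.2f}")
```

At 5 million pageviews, the only metric that has left its free tier is the request count, so that is the only line that costs anything.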

Interestingly, Cloud Functions definitely have a warm-up time. A warmed-up instance might finish execution in under 100ms; a cold start might take 1481ms! There are about 2.6 million seconds in a month. A request a second should keep the function relatively active, so as you send more requests, the average execution time should settle at around 200ms, keeping you within the free tier for everything except the requests themselves.
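For a feel of how much cold-start traffic that 200ms average can absorb, here is a small sketch. It assumes traffic is spread evenly over a 30-day month (real traffic is burstier) and takes 100ms as the warm time and 1481ms as the cold time; the function names are made up for illustration.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000, i.e. about 2.6 million


def requests_per_second(monthly_pageviews):
    """Average request rate, assuming perfectly even traffic."""
    return monthly_pageviews / SECONDS_PER_MONTH


def max_cold_start_share(target_ms, warm_ms, cold_ms):
    """Largest share of cold starts that keeps the mean time at the target.

    Solves share * cold_ms + (1 - share) * warm_ms = target_ms for share.
    """
    return (target_ms - warm_ms) / (cold_ms - warm_ms)


for pageviews in (2_000_000, 5_000_000, 20_000_000):
    print(pageviews, round(requests_per_second(pageviews), 2), "req/s")

share = max_cold_start_share(target_ms=200, warm_ms=100, cold_ms=1481)
print(f"cold starts can be up to {share:.1%} of requests")
```

Under those numbers, the average stays at or below 200ms as long as only a small minority of requests hit a cold instance.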

>20 million pageviews

Eventually, though, this approach gets more expensive. After the free tier, the costs become about $1 / million requests [1]. At this point, Google has detailed a better option for larger scales. It’s a bit more expensive at first (Load Balancer rules cost a flat $18/month), but the cost per million events drops by an order of magnitude, as the “most expensive” part of the event pipeline (Cloud Functions) is removed. Beyond that, you can even grab log files and bulk load them into BigQuery once an hour, removing the BigQuery streaming inserts from the costs, though at this point you’re probably also doing a lot of transforms on the data to make it usable. Still, the cost can be brought down by another order of magnitude, to around $0.10/million events.

  1. $0.0000004/request + 0.025 GB-seconds/request * $0.0000025/GB-second + 0.04 GHz-seconds/request * $0.0000100/GHz-second + 0.000001 GB/request * $0.12/GB + 0.0000005 GB/row * $0.05/GB = $0.0000010075/request ≈ $1 / million requests
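The footnote's sum is easier to read broken out per million requests. A quick sketch using the same per-request figures; the component labels and the "compute" grouping (invocations, memory, CPU) are my own shorthand for the Cloud Functions share.

```python
# Per-request dollar cost of each component, using the footnote's figures.
PER_REQUEST = {
    "invocation": 0.0000004,
    "memory": 0.025 * 0.0000025,      # GB-seconds/request * $/GB-second
    "cpu": 0.04 * 0.0000100,          # GHz-seconds/request * $/GHz-second
    "networking": 0.000001 * 0.12,    # GB/request * $/GB
    "bigquery": 0.0000005 * 0.05,     # GB/row * $/GB
}

per_million = {name: cost * 1_000_000 for name, cost in PER_REQUEST.items()}
total = sum(per_million.values())
compute = per_million["invocation"] + per_million["memory"] + per_million["cpu"]

for name, cost in per_million.items():
    print(f"{name:<11} ${cost:.4f} per million")
print(f"{'total':<11} ${total:.4f} per million")
print(f"compute share: {compute / total:.0%}")
```

Invocations and CPU time dominate the total, which is why taking Cloud Functions out of the pipeline is where most of the savings come from.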