v8go and parallelism

June 27th, 2023

Parallelism with v8go is rarely worth the cost. In my very simple case, I’m passing a number of JSON objects as input to an arbitrary number of JavaScript functions.

v8go is a wrapper around V8, so v8go calls are cgo calls into V8's C++ API rather than ordinary Go function calls. That makes them unsafe to issue from arbitrary goroutines: the Go scheduler can move a goroutine between OS threads, and pointers to V8 values can then become invalid. You therefore need to thread-lock all of the code that calls V8.
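A minimal sketch of that thread-locking pattern, assuming "thread lock" means Go's `runtime.LockOSThread`. The `job`, `startPinnedWorker`, and `runOn` names are hypothetical, and the V8 work is stood in for by a plain Go callback so the example runs without cgo.

```go
package main

import "runtime"

// job is a hypothetical unit of work. In real code the run callback
// would contain the v8go calls.
type job struct {
	run  func() string
	done chan string
}

// startPinnedWorker starts a goroutine that locks itself to one OS thread
// and runs every job it receives on that same thread.
func startPinnedWorker() chan<- job {
	jobs := make(chan job)
	go func() {
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		for j := range jobs {
			j.done <- j.run()
		}
	}()
	return jobs
}

// runOn sends fn to the pinned worker and waits for its result.
func runOn(jobs chan<- job, fn func() string) string {
	done := make(chan string, 1)
	jobs <- job{run: fn, done: done}
	return <-done
}
```

Everything that touches a given isolate would go through `runOn`, so it always executes on the thread the worker locked. Closing the `jobs` channel stops the worker.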

A moderately complicated function call takes about 20 µs. The cost of parallelism is 70x the cost of running a function on its own. Creating an isolate in parallel is 10x the cost of calling a function, but the largest cost is copying large amounts of data around, which has a base cost of 50 µs and scales at ~10 µs/KB. This means that for any JSON input of 100 KB or more, the overhead of concurrency is significant compared to the compute cost. Once the data is in the VM, though, computing on large objects is relatively cheap.
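To make the copy cost concrete, here is a small sketch that plugs the figures above into a linear model (base cost plus a per-KB rate). The linear form and the function names are my own assumptions; the constants are the measured numbers quoted in the paragraph.

```go
package main

// Figures from the benchmarks above, in microseconds.
const (
	callCostUs  = 20.0 // one moderately complicated function call
	copyBaseUs  = 50.0 // fixed cost of copying data across
	copyPerKBUs = 10.0 // additional copy cost per KB of input
)

// copyCostUs estimates the cost of copying sizeKB of JSON,
// assuming cost = base + rate * size.
func copyCostUs(sizeKB float64) float64 {
	return copyBaseUs + copyPerKBUs*sizeKB
}

// copyToCallRatio expresses the copy cost as a multiple of one function call.
func copyToCallRatio(sizeKB float64) float64 {
	return copyCostUs(sizeKB) / callCostUs
}
```

Under this model a 100 KB input costs 50 + 10 × 100 = 1,050 µs to copy, roughly 52 times the cost of a single 20 µs call, while a 1 KB input costs 60 µs, or 3 calls.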

None of this is surprising. When the data is small, compute is the primary cost and parallelism is worth it. When the data is large, copying the data around is the primary cost and parallelism is not worth it. The admittedly crude benchmarks put the break-even ratio at 1 reducer per bin per KB. So for a 1 KB object, you need to be able to put at least 1 reducer in each bin to “break even” on copy costs, plus the fixed cost of spinning up isolates.

The catch is that if the number of available cores is small, binning the work will actually slow execution, because you incur the additional startup cost without gaining any real parallelism. The number of bins should therefore never exceed the number of available cores, which makes “locking” the cores for usage necessary.

This really underlines why goroutines are so fantastic. They’re cheap mentally, memory-wise, and computation-wise. V8 concurrency stands in stark contrast to this.