Lattice 0.6 - Automatic, Fine-Grained Parallelization

Lattice is a high-performance visual scripting system targeting Unity ECS. Read more here.

Lattice 0.6 has just been released! This is largely a stability and performance update, with one big exception: Automatic Parallelization of Lattice Scripts. What does that mean?

Well, a Lattice Script normally looks something like this:

Before 0.6, the Lattice compiler generated .NET IL that executed each node one at a time, serially. While that sounds slow, if you think about it, it’s how all programming languages work! Unless you tell it otherwise, code generally executes in one big line from beginning to end, on a single thread. In fact, Lattice is a vector execution engine: it executes each node across all entities as a chunk, so it’s even a bit faster than that.
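To make the "one node at a time, across all entities" idea concrete, here is a minimal Python sketch. It is not Lattice's generated IL; the `run_serial` function, the node tuples, and the column names are all made up for illustration. Each node is applied to a whole column of per-entity values before the next node starts.

```python
# Hypothetical sketch: a "node" is a function applied to whole columns of
# per-entity values, and nodes run one after another on a single thread.

def run_serial(nodes, columns):
    """Execute nodes in listed order; each node processes every entity at once."""
    for name, func, input_names in nodes:
        inputs = [columns[i] for i in input_names]
        # One pass handles the whole chunk of entities for this node.
        columns[name] = [func(*args) for args in zip(*inputs)]
    return columns

# Three entities, two input columns (illustrative data only).
columns = {"health": [10, 20, 30], "damage": [1, 2, 3]}
nodes = [
    ("remaining", lambda h, d: h - d, ["health", "damage"]),
    ("is_alive", lambda r: r > 0, ["remaining"]),
]
result = run_serial(nodes, columns)
```

Here `result["remaining"]` holds one value per entity, computed in a single pass before `is_alive` runs.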

In the new version of Lattice, all nodes run maximally in parallel, without any annotations from the scripter. The Lattice compiler automatically figures out how to optimally split things up into jobs and schedule them under the hood with the Unity job system.

If you’re familiar with dataflow, it’s a variant of that style with some extra goodies.

So that means if you have a body of Lattice scripts in 0.5, in 0.6 they will simply run faster, with no work needed on your end.

Fine-Grained Parallelism

Parallelization is “fine-grained”. Normally when you work with parallel code (like with Unity jobs) you split out larger pieces of code to run in parallel. Another way to say it: the fundamental unit of parallelization is a larger routine or function.

In Lattice, the fundamental unit of parallelization is much smaller: it’s able to schedule each node separately. Actually, it’s even smaller than that, because most Lattice nodes (that you would write in a script) compile down to several “IR Nodes” in the compiler. Each of the nodes below represents a vector block of values in a Lattice script.

The above script, as the compiler sees it.

Here’s the same script, but colored by ‘work unit’ (that’s what we call this fundamental parallelizable unit).

Each different color is eligible for scheduling in a different Unity job. Crucially, the compiler is not required to schedule them separately. If work is scheduled too finely, the overhead of scheduling and moving data around will eat into the benefits of parallel execution. But it is allowed to, and so far I’ve found the best speedups come from maximal parallelization.
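As an illustration of how fine-grained units could be grouped into coarser ones, here is a small Python sketch that fuses straight-line chains of nodes into a single unit. This is only one possible heuristic; how Lattice actually forms work units is not described here, and `fuse_chains` and the graph shape are assumptions made for the example.

```python
def fuse_chains(deps):
    """Group nodes into units by fusing straight-line chains.

    deps maps each node to the list of nodes it reads, and is assumed to be
    listed in dependency order (inputs before their consumers).
    """
    consumers = {node: [] for node in deps}
    for node, inputs in deps.items():
        for src in inputs:
            consumers[src].append(node)

    units = []
    unit_of = {}
    for node, inputs in deps.items():
        # A node joins its producer's unit only when the link is exclusive:
        # it reads one input, and nobody else reads that input.
        if len(inputs) == 1 and len(consumers[inputs[0]]) == 1:
            unit = unit_of[inputs[0]]
        else:
            unit = []
            units.append(unit)
        unit.append(node)
        unit_of[node] = unit
    return units

graph = {"a": [], "b": ["a"], "c": ["b"], "d": [], "e": ["c", "d"]}
units = fuse_chains(graph)
```

In this example `a`, `b`, and `c` end up in one unit because nothing could run alongside them anyway, while `d` and `e` stay separate.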

Here’s what the Lattice frame tick looks like now in our project, with parallelization enabled:

Hundreds of Lattice nodes split across all threads.

“Fearless Concurrency”

Writing parallel code is really hard. You have to keep track of which values are being written from which threads, otherwise you’ll run into race conditions, deadlocks, or worse. If you’ve ever worked with Unity’s job system directly, you’ll know to what lengths Unity goes to keep race conditions from happening.

What’s really cool about Lattice is that it is completely data-race free. You can just write your scripts; whether they run in parallel or serially is an afterthought. One note: if any of your nodes call non-thread-safe Unity APIs, you’ll need to tag those nodes with [MainThread]. That will force those nodes onto the main thread.

Lattice is able to sidestep all this usual complexity because it’s a pure dataflow language. Data dependencies are fully defined under the hood, and so scheduling is simply a topological sort across the entire compilation graph. Even cooler, the schedule for a graph is built at compile time, rather than at runtime, as Unity’s job system usually requires!
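For readers who haven't seen scheduling by topological sort, here is a minimal Python sketch using Kahn's algorithm. It groups nodes into "waves" whose members don't depend on each other, so each wave could in principle run in parallel. The `build_schedule` function and the graph are hypothetical, and a real scheduler would more likely hand per-node dependencies to the job system than run in strict waves.

```python
from collections import defaultdict

def build_schedule(deps):
    """Group nodes into waves with Kahn's algorithm.

    deps maps each node to the nodes whose outputs it reads; every input
    is assumed to appear as a key. Nodes in one wave are independent.
    """
    indegree = {node: len(inputs) for node, inputs in deps.items()}
    consumers = defaultdict(list)
    for node, inputs in deps.items():
        for src in inputs:
            consumers[src].append(node)

    wave = sorted(n for n, d in indegree.items() if d == 0)
    schedule = []
    while wave:
        schedule.append(wave)
        next_wave = []
        for node in wave:
            for consumer in consumers[node]:
                indegree[consumer] -= 1
                if indegree[consumer] == 0:
                    next_wave.append(consumer)
        wave = sorted(next_wave)  # sorted only to make the output stable

    if sum(len(w) for w in schedule) != len(deps):
        raise ValueError("graph contains a cycle")
    return schedule

graph = {
    "position": [],
    "velocity": [],
    "speed": ["velocity"],
    "next_position": ["position", "velocity"],
    "is_fast": ["speed"],
}
schedule = build_schedule(graph)
```

Because the graph is known up front, this ordering depends only on the script, which is what makes it possible to compute once at compile time.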

In a lot of ways, Lattice is just a really fancy scheduler for Unity’s job system. It manages the inputs and outputs of each function, and figures out where to store intermediate values.
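To give a feel for the "where to store intermediate values" part, here is a hedged Python sketch of one classic approach: assign each node's output a buffer slot, and reuse a slot once its last reader has run. It assumes a fixed serial order for simplicity. A parallel schedule would need to keep slots alive across concurrently running work units, and nothing here is taken from Lattice's actual allocator; `assign_slots` is a made-up name.

```python
def assign_slots(order, deps):
    """Assign each node's output a buffer slot, reusing slots once free.

    order: nodes in execution order; deps: node -> nodes it reads.
    """
    # Index of the last step that reads each node's output.
    last_use = {node: i for i, node in enumerate(order)}
    for i, node in enumerate(order):
        for src in deps[node]:
            last_use[src] = max(last_use[src], i)

    free, slots, next_slot = [], {}, 0
    for i, node in enumerate(order):
        # Pick the output slot first so it never aliases this node's inputs.
        if free:
            slots[node] = free.pop()
        else:
            slots[node] = next_slot
            next_slot += 1
        # Release inputs whose final reader is this node.
        for src in set(deps[node]):
            if last_use[src] == i:
                free.append(slots[src])
    return slots

order = ["a", "b", "c", "d"]
deps = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
slots = assign_slots(order, deps)
```

In this four-node chain only two buffers are ever needed, because each intermediate value is dead as soon as the next node has consumed it.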

Also in 0.6: Full Source!

Folks will be glad to hear that the source for the compiler is now distributed as a part of the package. This should make integrating the library much easier, and means you can quickly fix bugs, submit PRs against the repo, etc. Keep in mind the license is not MIT, but it’s intended to be free for most reasonable use cases. Check the readme for more info and let me know if you have any questions.

Distributing the source also works around a pernicious bug in the latest version of Unity ECS, which did not play well with precompiled DLLs.

Overall Status and Roadmap

Lattice is coming along nicely. A big change that isn’t visible is an underlying refactor preparing the way for subgraphs! I’m very excited about this because it’s going to be a major improvement in ergonomics, and will allow making larger, reusable components that you wouldn’t easily be able to define with C#.

I’ve also figured out a way to do subgraphs with zero overhead. To the compiler, subgraphs will just look like normal, inlined code, and as we add optimizations to the compiler, those optimizations will be able to work across subgraph boundaries. Excited to share more about this soon.
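Here is a rough Python sketch of what "subgraphs look like inlined code" could mean: every call to a subgraph is replaced by a copy of its body, with parameters wired to the caller's inputs, so later passes see one flat graph. The data layout, the `inline` function, and the `clamp01` subgraph are all invented for this example, since the actual design hasn't been published yet.

```python
def inline(nodes, subgraphs, bindings=None, prefix=""):
    """Flatten `nodes`, expanding every call to a named subgraph.

    nodes: list of (name, op, inputs) in dependency order.
    subgraphs: op name -> {"params": [...], "nodes": [...], "output": name}.
    Returns (flat node list, mapping from local names to flattened names).
    """
    env = dict(bindings or {})
    flat = []
    for name, op, inputs in nodes:
        resolved = [env[i] for i in inputs]
        full = prefix + name
        if op in subgraphs:
            sub = subgraphs[op]
            # Bind parameters to the caller's inputs, then splice in the body.
            inner, inner_env = inline(
                sub["nodes"], subgraphs,
                dict(zip(sub["params"], resolved)), full + "/",
            )
            flat.extend(inner)
            env[name] = inner_env[sub["output"]]
        else:
            flat.append((full, op, resolved))
            env[name] = full
    return flat, env

subgraphs = {
    "clamp01": {
        "params": ["x"],
        "nodes": [("lo", "max0", ["x"]), ("hi", "min1", ["lo"])],
        "output": "hi",
    }
}
nodes = [("raw", "input", []), ("c", "clamp01", ["raw"]), ("scaled", "times10", ["c"])]
flat, _ = inline(nodes, subgraphs)
```

After inlining, no node refers to `clamp01` anymore; `scaled` reads directly from `c/hi`, so nothing remains at the call boundary for an optimizer or scheduler to trip over.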

Thanks for reading!

You can download the latest Lattice release on GitHub, and you can chat with us on the Unity forum thread, or on our Discord channel! Come say hi!