August 13, 2024
Game Developer Deep Dives are an ongoing series with the goal of shedding light on specific design, art, or technical features within a video game in order to show how seemingly simple, fundamental design decisions aren’t really that simple at all.
Earlier installments cover topics such as how GOG perfected the imperfect with the re-release of Alpha Protocol, how Ishtar Games designed a new dwarf race in The Last Spell, and how Krillbite Studio cooked up a palatable food prep experience in Fruitbus.
In this edition, the team at Owlchemy Labs tells us in-depth about the technical challenges in porting their VR titles to the Apple Vision Pro.
The launch of the Apple Vision Pro in February marked a big moment in the VR community: it was the first major six-degrees-of-freedom headset to ship without controllers.
Senior platform engineer Phillip Johnson explains how we brought Job Simulator and Vacation Simulator to the Apple Vision Pro. We will go through the techniques we used to implement hand tracking and the challenges we faced with the shader and audio systems. By sharing our experience, we hope to see more great fully immersive titles come to the visionOS platform.
30 Hz hand tracking in a 90 Hz game
Arguably the biggest challenge we faced during the entire production of this port was compensating for hand tracking updating at 30 Hz. Both Job Simulator and Vacation Simulator are deeply interactive experiences, and updating hand poses only once every three frames had several consequences when we began work on these ports. Grabbing and throwing objects was almost impossible. Hand velocity was exaggerated, causing destructible objects such as plates to break in our hands. We’d also routinely lose tracking when looking away from the hands. Our titles were unplayable, and it was unclear when there would be an update, so our team set out to solve our hand tracking issues with what was available then.
Images via Owlchemy Labs.
Senior gameplay engineer Greg Tamargo reflects on smoothing out hand tracking with extrapolation.
Because hand tracking updated at 30 Hz while the rest of the game updated at 90 Hz, every frame containing hand pose data was followed by at least two frames with no updated data at all. We therefore had to modify the Unity XR visionOS package to tell us whether data was “fresh” or “stale” and compensate accordingly. We found that simply covering up “stale” frames by blending between the most recent “fresh” hand poses was too slow and felt unresponsive, so we opted to use extrapolation instead, predicting where the hands would be before the next “fresh” pose arrived. Anyone who has experience programming online multiplayer games might be familiar with this technique.
By keeping track of at least two recent fresh hand poses, we can calculate a velocity and angular velocity and use those to infer what the pose might be, given how much time has passed since the most recent frame of fresh data. Implementing this massively improved the functionality and feel of the game. It’s also worth noting that by keeping track of additional frames of fresh hand data, we can create more complex extrapolations to predict where the next hand pose will be, but when we implemented this, it wasn’t obvious that it improved the game feel any further.
Regardless, simply implementing this pose extrapolation on the hand’s wrist poses made a massive improvement. When we attempted to continue this approach with the rest of the hand data to smooth out the movement of each individual finger, the results were much less promising. So, we decided to try something else to smooth out the poses of all the finger joints.
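A minimal sketch of this kind of wrist extrapolation is shown below, written in Python for brevity rather than taken from the game's Unity code. The `Pose` type, the function names, and the `(w, x, y, z)` quaternion layout are all hypothetical choices made for illustration. It assumes constant linear and angular velocity between the two most recent fresh poses and projects that motion forward to the current frame time.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z), unit length


@dataclass(frozen=True)
class Pose:
    time: float       # seconds
    position: Vec3
    rotation: Quat


def quat_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product: the rotation b followed by the rotation a."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def quat_conj(q: Quat) -> Quat:
    w, x, y, z = q
    return (w, -x, -y, -z)


def quat_pow(q: Quat, t: float) -> Quat:
    """Scale the rotation angle of unit quaternion q by t (t may exceed 1)."""
    w, x, y, z = q
    if w < 0.0:  # q and -q are the same rotation; pick the shorter arc
        w, x, y, z = -w, -x, -y, -z
    s = math.sqrt(x * x + y * y + z * z)
    if s < 1e-9:  # no measurable rotation
        return (1.0, 0.0, 0.0, 0.0)
    half_angle = math.atan2(s, w) * t
    k = math.sin(half_angle) / s
    return (math.cos(half_angle), x * k, y * k, z * k)


def extrapolate(prev: Pose, last: Pose, now: float) -> Pose:
    """Predict the pose at `now` from the two most recent fresh poses."""
    dt = last.time - prev.time
    if dt <= 0.0:
        return Pose(now, last.position, last.rotation)
    t = (now - last.time) / dt  # elapsed time in units of one tracking interval
    position = tuple(
        l + (l - p) * t for p, l in zip(prev.position, last.position)
    )
    # Rotation that carried prev onto last, scaled by t and applied again.
    delta = quat_mul(last.rotation, quat_conj(prev.rotation))
    rotation = quat_mul(quat_pow(delta, t), last.rotation)
    return Pose(now, position, rotation)
```

With fresh poses arriving every 1/30 s and frames rendered every 1/90 s, the two stale frames after each fresh pose would call `extrapolate` with `t` of roughly 1/3 and 2/3. A production version would likely also limit how far ahead it predicts when tracking is lost, which this sketch does not attempt.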
Images via Owlchemy Labs.
Expert system engineer and hand tracking specialist Marc Huet gives some insight into the decisions we made regarding poses.
For the hand poses themselves, we wanted to avoid the possibility of creating unnatural poses, so we tried to work with real pose data from the device as much as possible instead of generating our own.
To address the low frequency of updates, we introduced a delay so we could interpolate joint rotations between the two most recent poses while we waited for the next one. To make sure this delay didn’t negatively affect gameplay, we only use the most recently received pose when detecting actions like “grab” and “release,” while the smoothed pose is reserved for presentation through the hand model.
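The split between a delayed, interpolated pose for display and a raw pose for gameplay can be sketched as follows. This is an illustrative Python sketch, not the shipped implementation: the `DelayedHandSmoother` class, its method names, and the choice of a delay equal to one tracking interval are assumptions, and joint rotations are assumed to be unit quaternions stored as `(w, x, y, z)` tuples.

```python
import math


def slerp(a, b, t):
    """Spherical interpolation between unit quaternions a and b, t in [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    if dot < 0.0:  # flip one input so we interpolate along the shorter arc
        b = tuple(-x for x in b)
        dot = -dot
    if dot > 0.9995:  # nearly identical: linear blend avoids dividing by ~0
        out = tuple(x + (y - x) * t for x, y in zip(a, b))
    else:
        theta = math.acos(dot)
        wa = math.sin((1.0 - t) * theta) / math.sin(theta)
        wb = math.sin(t * theta) / math.sin(theta)
        out = tuple(wa * x + wb * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in out))
    return tuple(x / norm for x in out)


class DelayedHandSmoother:
    """Keeps the two most recent tracked poses (joint name -> quaternion)."""

    def __init__(self):
        self.previous = None  # (timestamp, joints)
        self.latest = None

    def push(self, timestamp, joints):
        """Record a fresh pose from the tracker."""
        self.previous, self.latest = self.latest, (timestamp, dict(joints))

    def gameplay_pose(self):
        """Undelayed pose, for detecting actions such as grab and release."""
        return self.latest[1] if self.latest else None

    def display_pose(self, now):
        """Pose for the hand model, one tracking interval behind real time."""
        if self.latest is None:
            return None
        if self.previous is None:
            return dict(self.latest[1])
        t0, older = self.previous
        t1, newer = self.latest
        interval = t1 - t0
        if interval <= 0.0:
            return dict(newer)
        # At now == t1 show the older pose; reach the newer one an interval later.
        t = min(max((now - t1) / interval, 0.0), 1.0)
        return {
            name: slerp(older[name], rot, t) if name in older else rot
            for name, rot in newer.items()
        }
```

Because `display_pose` only ever blends between two poses the device actually reported, it cannot produce a finger configuration outside the range the tracker observed, which is the property the paragraph above is after.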
We also took a conservative approach to filling in gaps when individual joints lose tracking. Instead of attempting to create new data for the missing joint with IK, we copy the missing parent-child relationships from the previous pose while leaving all parent-child relationships down the chain intact (see graphic).
Both of these techniques were made significantly easier by storing and working with joint orientations relative to the parent joint instead of relative to the wrist or world origin.
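To see why parent-relative storage makes the gap filling simple, consider the following sketch. It is a hypothetical Python illustration: the joint names, the `PARENTS` table, and the convention that an untracked joint is reported as `None` are all assumptions. The previous pose is assumed to be complete, for example because it is the already-filled result of the previous update.

```python
import math

IDENTITY = (1.0, 0.0, 0.0, 0.0)

# Hypothetical single-finger chain: joint -> parent, listed parent-first.
PARENTS = {
    "wrist": None,
    "index_knuckle": "wrist",
    "index_middle": "index_knuckle",
    "index_tip": "index_middle",
}


def quat_mul(a, b):
    """Hamilton product of two (w, x, y, z) quaternions."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def rot_x(degrees):
    """Helper for the example: rotation about the x axis."""
    half = math.radians(degrees) / 2.0
    return (math.cos(half), math.sin(half), 0.0, 0.0)


def fill_gaps(current_local, previous_local):
    """Replace untracked joints (None) with their previous local rotation.

    Tracked joints further down the chain keep their current local
    rotations, so only the missing link is borrowed from the past.
    """
    return {
        joint: rot if rot is not None else previous_local[joint]
        for joint, rot in current_local.items()
    }


def to_wrist_space(local):
    """Compose parent-relative rotations into wrist-relative rotations."""
    resolved = {}
    for joint, parent in PARENTS.items():
        parent_rot = resolved[parent] if parent is not None else IDENTITY
        resolved[joint] = quat_mul(parent_rot, local[joint])
    return resolved


previous = {"wrist": IDENTITY, "index_knuckle": rot_x(30),
            "index_middle": rot_x(20), "index_tip": rot_x(10)}
current = {"wrist": IDENTITY, "index_knuckle": rot_x(40),
           "index_middle": None, "index_tip": rot_x(15)}

filled = fill_gaps(current, previous)
fingertip = to_wrist_space(filled)["index_tip"]  # 40 + 20 + 15 degrees of curl
```

Had the rotations been stored relative to the wrist instead, copying the old value for the middle joint would leave it inconsistent with the freshly tracked knuckle above it and fingertip below it. In local space the borrowed rotation simply slots into the chain.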
Images via Owlchemy Labs.
Apple has since announced that it will support 90 Hz hand tracking in the visionOS 2.0 update, and we will be sure to update our content when that update goes live.
Building shaders and jokes
Unity compiles and caches shaders the first time they are displayed. This compilation causes brief framerate hitches, which are unacceptable on spatial platforms because they can cause motion sickness. The spatial nature of visionOS also imposes some restrictions that required us to rethink how and when we can build shaders. Applications on visionOS must draw a frame at least every two seconds or they will be terminated. This makes sense in a spatial environment where users may have multiple applications running, but in a game it is common to hide shader builds during loading sequences. With the two-second restriction, we were unable to use the standard shader building procedure, so we had to develop a new method from scratch.
Our principal graphics engineer, Ben Hopkins, led the way on our solution. To build shaders correctly, we needed every unique combination of vertex layout and shader variant to be rendered, one at a time, off-screen during the boot-up sequence. To do this, we developed a simple tool that collects and logs vertex layouts from every mesh in the game. These logs are fed into our warm-up system, so players encounter one big shader warm-up the first time they run Vacation Simulator. The sequence dynamically creates a quad for each vertex layout and cycles our shader variants through each one of them. It is admittedly a painful three to four minutes before it completes, so we tried to soften the experience a bit with the very best jokes the port team could write in one hour to keep the player occupied. Once the shaders are built, the game boots up instantaneously on subsequent launches.
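One plausible way to structure such a warm-up so that frames keep flowing is to process the layout and variant combinations in small time-budgeted batches, handing control back to the renderer between batches. The Python sketch below illustrates that scheduling idea only. The `warm_up_batches` generator, the `compile_one` callback, and the half-second budget are assumptions, not details from the actual Unity implementation.

```python
import itertools
import time


def warm_up_batches(vertex_layouts, shader_variants, compile_one,
                    budget_seconds=0.5, clock=time.perf_counter):
    """Warm every (layout, variant) pair, yielding progress once per frame.

    compile_one(layout, variant) stands in for drawing one off-screen quad
    with that combination. After each yield the caller is expected to
    present a frame (for example a progress bar and a joke) before asking
    for the next batch, so the gap between frames stays near the budget.
    """
    total = len(vertex_layouts) * len(shader_variants)
    # product() cycles every variant through one layout before moving on.
    pending = itertools.product(vertex_layouts, shader_variants)
    done = 0
    while done < total:
        frame_start = clock()
        for layout, variant in pending:
            compile_one(layout, variant)
            done += 1
            if clock() - frame_start >= budget_seconds:
                break  # out of time for this frame; resume after the yield
        yield done, total
```

A caller would iterate the generator from its frame loop, for example `for done, total in warm_up_batches(layouts, variants, draw_quad): present_frame(done / total)`, where `draw_quad` and `present_frame` are likewise placeholders. Note that a single compile that takes longer than the platform limit would still stall a frame; the budget only bounds how many compiles are grouped together.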
Images via Owlchemy Labs.
Spatialization
Daniel Perry, audio director for Owlchemy Labs, explains how we were able to solve the audio issues for our visionOS ports.
The biggest challenge we needed to solve in audio is that Fully Immersive mode did not have access to the Apple Spatializer in Unity, and spatialized audio is crucial for our experiences to be able to bring out the environment and create a vivid and responsive soundfield. We needed to find a solution that would be compatible with the architecture of both Job Simulator and Vacation Simulator. Apple has PHASE (Physical Audio Spatialization Engine) that works with Unity, but using it would require significant changes to our audio flow, including routing, processing, and file loading features.
Currently, there are still few spatializer solutions available for Unity, and most of the existing ones don’t support visionOS.
The Resonance Audio spatializer is open source and multi-platform, but it had seen little maintenance for some time and had not been compiled for visionOS. Fortunately, because the source is available, we were able to modify it so that it could be built for visionOS.
Because of Resonance’s limited routing approach, we had to create a custom solution for reverb. For performance reasons on mobile platforms, we have always used a minimal number of simple reverb algorithms with presets for the different rooms and environments, and different audio mixer groups to sum effects in the game. While we couldn’t replicate all effects in the chain of the audio mixer groups, it was crucial to maintain the general atmosphere and feel of the world, so we created our own pre-spatialized send/receive system that sends audio from all audio sources to summed streaming audio sources, which are then sent to a non-spatialized reverb AudioMixer.
While it is not the ideal order of processing, it allowed us to use Resonance and still get similar group post-processing abilities, and to keep the game broadly similar to how it sounds on other platforms, while maintaining optimized audio processing performance. Resonance ended up being a bit more compatible with how our audio system is structured.
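The send/receive idea described above can be illustrated with a toy Python sketch. It is not the Unity implementation: the function names are hypothetical, audio is represented as plain lists of mono samples, and a single feedback comb filter stands in for whatever reverb algorithm the mixer actually runs. Each source contributes its dry, not yet spatialized signal to a shared bus at its own send level, and the reverb processes that bus once.

```python
def mix_reverb_send(sources, frame_count):
    """Sum pre-spatialization audio from every source into one reverb bus.

    sources is a list of (samples, send_level) pairs, where samples is the
    source's dry mono signal and send_level scales how much reaches the bus.
    """
    bus = [0.0] * frame_count
    for samples, send_level in sources:
        for i in range(min(frame_count, len(samples))):
            bus[i] += samples[i] * send_level
    return bus


def toy_reverb(bus, delay_frames, feedback):
    """Stand-in reverb: one feedback comb filter, y[n] = x[n] + g * y[n - d]."""
    out = list(bus)
    for n in range(delay_frames, len(out)):
        out[n] += feedback * out[n - delay_frames]
    return out


sources = [
    ([1.0, 0.0, 0.0, 0.0], 0.5),  # quieter send
    ([0.0, 1.0, 0.0, 0.0], 1.0),  # full send
]
bus = mix_reverb_send(sources, frame_count=4)
wet = toy_reverb(bus, delay_frames=2, feedback=0.5)
```

The cost of the reverb is paid once per bus rather than once per source, which is why this kind of routing suits performance-constrained platforms. The trade-off is the processing order noted above: the reverb tail is not itself spatialized.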
Conclusion
When we began porting to the Apple Vision Pro, we had no idea if the issues preventing us from launching would be solved in one month or one year, but we knew that we wanted to be there as early as possible. Apple shares our passion for hand tracking-only experiences, which we feel are more approachable for a mainstream audience. Because we were able to create our own tools to address some of our issues, our titles launched on Apple Vision Pro months before the visionOS 2.0 update. We are proud of the work we’ve done bringing Job Simulator and Vacation Simulator to visionOS and are excited for new players to experience our award-winning titles.