Waterloo Reality Labs, based at the University of Waterloo, specializes in XR (extended reality) and spatial computing. Our team offers students hands-on experience with consumer VR (virtual reality) and AR (augmented reality) technologies. Our focus spans a broad range of areas, including hardware, optics, mechanical design, firmware, software, and machine learning applications.
Based on open-source guides, we are building our very own, fully custom VR headset. We call it Reality from Scratch.
We first soldered an inertial measurement unit (IMU) and microcontroller unit (MCU) together, and got real-time motion vector data translated into SteamVR with drivers forked from the OpenVR SDK. Then, the VR Compositor output was routed to our VR displays, which come with accompanying lenses and a custom 3D-printed housing. In addition to the HMD, we are building Vive Wand-style controllers, which we'll dive into deeper below.
From this headset, we plan to build other systems, such as an inside-out 6DoF tracking solution using visual-inertial odometry, or a varifocal optical stack using eye tracking and motors or voice coils.
A key factor of immersion in VR is field of view (FOV). The eyesight of a human typically covers 220 degrees horizontally and 130 degrees vertically - however, the vast majority of consumer VR headsets today can only cover 100-120 degrees horizontally and 90-100 degrees vertically.
We aim to explore the importance of FOV by creating our own wide FOV prototype. We will be stitching 2 horizontal 4K displays together and using large fresnel panels, along with a custom-designed shell. We will explore custom distortion correction in SteamVR, as well as DFR (dynamic foveated rendering) with built-in eye tracking (using EyeTrackVR).
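As a rough illustration of what custom distortion correction involves, the sketch below applies a radial polynomial model, r' = r(1 + k1·r² + k2·r⁴), to a lens-centred coordinate. The function name and the coefficients `k1` and `k2` are hypothetical placeholders; real values would have to be measured for our Fresnel panels, and this is not the SteamVR driver interface itself.

```python
def predistort(u, v, k1=0.22, k2=0.05):
    """Radially rescale a normalized, lens-centred coordinate (u, v).

    (0, 0) is the lens centre. k1 and k2 are made-up coefficients used
    only for illustration; they are not calibrated for any real lens.
    """
    r2 = u * u + v * v                      # squared distance from the lens centre
    scale = 1.0 + k1 * r2 + k2 * r2 * r2    # radial polynomial 1 + k1*r^2 + k2*r^4
    return u * scale, v * scale


if __name__ == "__main__":
    # Points near the centre barely move; points near the edge move the most.
    for u in (0.0, 0.25, 0.5, 1.0):
        print(u, predistort(u, 0.0))
```

In practice, each colour channel usually gets slightly different coefficients so that chromatic aberration from the lens is corrected as well.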
The goal of this project is to create an importable Unity package which expands and simplifies the hand gesture recognition system provided by Meta. Instead of using an on-or-off approach to finger poses, we will take the float values given by the sensors on the headset for finger positions and apply a machine learning approach. By collecting a training set of popular hand gestures, we will train neural networks to classify them. The output of these trained neural networks will be accessible to scripts that we create in C#, which will finally be attachable to GameObjects within Unity.
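To make the idea concrete, here is a minimal sketch of classifying gestures from per-finger float values. It is written in plain Python rather than C#, and it uses a single-layer softmax network as a small stand-in for the neural networks we plan to train. The gesture names, the five "curl" values per hand, and the synthetic training data are all assumptions made for illustration; they are not the values Meta's SDK reports.

```python
import math
import random

GESTURES = ["fist", "open_palm", "point"]  # hypothetical label set

# Hypothetical per-finger curl values (thumb..pinky), 0 = straight, 1 = fully curled.
PROTOTYPES = {
    "fist": [0.9, 0.9, 0.9, 0.9, 0.9],
    "open_palm": [0.1, 0.1, 0.1, 0.1, 0.1],
    "point": [0.8, 0.1, 0.9, 0.9, 0.9],
}


def make_dataset(n_per_class, rng, noise=0.08):
    """Generate noisy synthetic samples around each prototype pose."""
    data = []
    for label, proto in PROTOTYPES.items():
        for _ in range(n_per_class):
            sample = [min(1.0, max(0.0, c + rng.gauss(0.0, noise))) for c in proto]
            data.append((sample, GESTURES.index(label)))
    return data


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, x):
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


def train(data, epochs=100, lr=0.5, seed=0):
    """Fit the softmax layer with plain stochastic gradient descent."""
    rng = random.Random(seed)
    data = list(data)  # copy so the caller's list is not shuffled
    n_in, n_out = len(data[0][0]), len(GESTURES)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            probs = predict_proba(weights, biases, x)
            for k in range(n_out):
                grad = probs[k] - (1.0 if k == y else 0.0)  # cross-entropy gradient
                biases[k] -= lr * grad
                for i in range(n_in):
                    weights[k][i] -= lr * grad * x[i]
    return weights, biases


def classify(weights, biases, x):
    """Return (gesture name, confidence) for one vector of finger values."""
    probs = predict_proba(weights, biases, x)
    best = max(range(len(probs)), key=probs.__getitem__)
    return GESTURES[best], probs[best]


if __name__ == "__main__":
    w, b = train(make_dataset(30, random.Random(1)))
    print(classify(w, b, [0.85, 0.15, 0.9, 0.95, 0.9]))
```

Returning a confidence alongside the label is what distinguishes this from an on-or-off pose check: a script attached to a GameObject could ignore low-confidence predictions instead of firing on every borderline pose.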
When you prompt a virtual assistant (for example, Meta AI on Ray-Ban glasses), what happens when you ask “What am I looking at?” Currently, the pipeline seems rather simplistic. The cameras on the glasses take a picture, that picture is passed through a model that can assign text labels to images, and finally that text label describing the whole image is passed into an LLM. This process, especially the step where a model must describe everything in an image using words, is often inaccurate.
What if we could build a system that...
- ...provides a richer text summary of a virtual environment, complete with descriptions of how objects are composed of each other and placed within, next to, or on top of each other?
- ...also describes how you, the user, are interacting with that environment at any moment? Could we assign additional text to describe that you are pointing at a specific object, or reaching out for one?
- ...runs in real time, that is, can update every frame to provide a current description, so that we wouldn’t have to wait for text generation and could create a live captioning system?
- ...runs entirely on-device, meaning this information is never sent to the cloud?
If we created this, we could use it for...
- ...in-application virtual assistants that make use of a rich text summary for high-accuracy responses
- ...virtual science labs where users could receive detailed auto-generated scientific explanations about tools and objects they interact with
- ...dynamic VR scene descriptions for the visually impaired, describing layout and objects, or even what they’re holding, pointing at, or standing near
- ...and so much more
Universal Text aims to explore this. We will develop a structured software package for Unity, which is composed of several scripts. We will begin with fully-virtual environments—artificial scenes that we build and label ourselves. The goal is to create a system that allows Unity VR developers to easily label their GameObjects with descriptions, and seamlessly integrate tutorials, live captioning for accessibility, or virtual assistants into their application.
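The sketch below illustrates the kind of output we have in mind: a tree of labelled objects is flattened into a text summary, with one extra line describing the user's current interaction. It is plain Python standing in for the eventual C# scripts, and every name in it (`LabeledObject`, `relation_to_parent`, `summarize`, the demo scene) is hypothetical rather than part of the actual package.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LabeledObject:
    """A scene object carrying a developer-written description."""
    name: str
    description: str
    relation_to_parent: str = "in"  # e.g. "on top of", "inside", "next to"
    children: List["LabeledObject"] = field(default_factory=list)


def describe(obj, parent=None, depth=0):
    """Return one indented line per object, walking the hierarchy depth-first."""
    indent = "  " * depth
    if parent is None:
        line = f"{indent}{obj.name}: {obj.description}"
    else:
        line = (f"{indent}{obj.name} ({obj.relation_to_parent} {parent.name}): "
                f"{obj.description}")
    lines = [line]
    for child in obj.children:
        lines.extend(describe(child, obj, depth + 1))
    return lines


def summarize(root, interaction: Optional[Tuple[str, LabeledObject]] = None):
    """Build the full text summary, optionally noting what the user is doing."""
    lines = describe(root)
    if interaction is not None:
        verb, target = interaction
        lines.append(f"User is {verb} {target.name}.")
    return "\n".join(lines)


def build_demo_scene():
    """A small hand-labelled scene used only for this example."""
    table = LabeledObject("Lab table", "a steel workbench")
    beaker = LabeledObject("Beaker", "a 250 mL glass beaker", "on top of")
    beaker.children.append(LabeledObject("Water", "clear liquid", "inside"))
    table.children.append(beaker)
    return table, beaker


if __name__ == "__main__":
    scene, beaker = build_demo_scene()
    print(summarize(scene, ("pointing at", beaker)))
```

Because the summary is assembled from labels that already exist in the scene, it can be regenerated every frame without running any text-generation model, which is what makes the real-time, on-device goal plausible.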
Components | Our Choice | Count |
---|---|---|
IMU | MPU-9250 | 1 |
MCU | Arduino Pro Micro | 1 |
Display | 1440x1440 90Hz LCDs | 2 |
Housing | 3D-printed | varies |
Lenses | | 2 |
Components | Count |
---|---|
PS3 Eye Camera | min 1, max 7 |
HadesVR/PS3 Move Controllers | 2 |
USB Hubs | varies |
Here's a simple flow chart of how the different components of the VR device interact with each other:
Our team has received custom PCBs and various other electrical components for building 2 DIY Vive Wand-like controllers, which will be based on this WIP open-source guide by LiquidCGS (creator of HadesVR). Each controller will have an IMU, a rechargeable battery, RF transceivers, tactile buttons, triggers, and joysticks. The HMD's microcontroller will also be upgraded and moved onto a central PCB. We are almost done with soldering.
SteamVR is the only universal platform with accessible driver SDKs (from Valve's OpenVR). It is an easy choice for an open-source VR project. The drivers for Reality from Scratch can be found here. We use them in conjunction with the FastIMU Arduino library.
We recommend fresnel lenses for any DIY VR build, since they are readily available for very low prices on platforms like Amazon and Aliexpress, and are thin and lightweight. Traditional biconvex lenses are wider, heavier (when built with glass), and usually cost more, but they may offer better visual clarity without god rays. Fresnel lenses are prone to these because of their design: their fine concentric ridges can introduce god rays and other distracting artifacts. Below is a visual comparison that should help explain the design of fresnel lenses further.
Figure 1 (left): Fresnel lens (left) and equivalent power plano-convex lens (right).
Figure 2 (right): Collapsing a conventional lens into an equivalent power Fresnel lens.
Outside of biconvex and fresnel lenses, there are not many options, though there are some advanced DIY stacked lens solutions out in the wild.
A lot of these simpler lenses can be purchased in a wide assortment of focal lengths; we hope to test multiple. Depending on the focal length, the length of the headset's housing will vary, so there may be some merit in seeing the difference between shorter and longer housings (in terms of image quality, FOV, perceived weight of the HMD and how this affects comfort, etc.). Creating the mechanical design for the housing of the VR headset will be challenging, given that we'd like to include user-adjustable IPD, as well as test different housings for different lenses. However, we are very excited to begin testing this once we finish modelling the 3D prints for our headset.
For a comprehensive guide on how to build your own Reality from Scratch (our open-source, DIY HMD), check out the guide here, or join the UW Reality Labs team if you're a student at the University of Waterloo!
This section dives into the various technological barriers to designing comfortable and user-friendly VR products, and how these obstacles can be overcome. We talk about the limitations of IMUs when considering 6DoF tracking and the importance of lenses and optical technologies in VR.
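To get a feel for how focal length drives housing depth, the sketch below uses the ideal thin-lens equation to estimate how far the display must sit from the lens so that the virtual image appears at a chosen distance. The 2 m default and the listed focal lengths are assumptions for illustration, and real fresnel or biconvex lenses will deviate from the thin-lens idealization.

```python
def display_distance_mm(focal_length_mm, virtual_image_mm=2000.0):
    """Lens-to-display spacing that places the virtual image at virtual_image_mm.

    Thin-lens equation with a virtual image on the display side of the lens:
        1/f = 1/d_display - 1/d_image
    so  d_display = f * d_image / (f + d_image).
    """
    if focal_length_mm <= 0 or virtual_image_mm <= 0:
        raise ValueError("focal length and image distance must be positive")
    return focal_length_mm * virtual_image_mm / (focal_length_mm + virtual_image_mm)


if __name__ == "__main__":
    # Hypothetical focal lengths we might order; not a list of parts we own.
    for f in (40.0, 50.0, 60.0, 70.0):
        print(f"f = {f:.0f} mm -> display about {display_distance_mm(f):.1f} mm from lens")
```

The spacing always comes out slightly shorter than the focal length, so the housing depth is roughly the focal length plus eye relief and the thickness of the display assembly.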
While the accelerometer in an IMU can technically be used for positional tracking, it cannot be accurate enough on its own. A positional tracking demo found on YouTube that used the IMU of the Oculus DK1 showed why: acceleration data needs to be integrated twice in order to become displacement (positional) data, so a significant amount of (quadratic) error is introduced into the tracking over time (on the order of meters per second), which causes drift. Therefore, reference points need to be set up in the space surrounding the user, through either inside-out (visual-inertial odometry) or outside-in (base-station/lighthouse) tracking.
Knowing this helps us understand how modern standalone and PC VR headsets handle positional tracking. On Oculus's "Crescent Bay" prototype and (consequently) the Rift CV1, for example, the IMU was used almost purely for positionally tracking fast movements, while the external camera tracked the "constellations" (IR LEDs) on the HMD and Touch controllers and continuously corrected the IMU's positional error, which would otherwise increase quadratically over time.
The Oculus Crescent Bay prototype, with its IR LEDs.
We hope to achieve a full 6DoF positional tracking system with our headset by initially using PSMoveServiceEx. This is similar to outside-in tracking using IR LEDs, except that there is only one 'giant' LED, which emits light in the visible spectrum. Next, we want to work on implementing existing open-source SLAM research projects (like Basalt) that can create 'point clouds' (or similar) of the surrounding space and use this information as reference points for inside-out positional tracking.
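The quadratic growth is easy to reproduce. The sketch below simulates a perfectly stationary IMU whose accelerometer has a small constant bias, then integrates twice. The bias value and sample rate are made-up numbers for illustration, not measurements from our MPU-9250.

```python
def drift_m(duration_s, bias_m_s2=0.05, dt=0.001):
    """Position error after double-integrating a constant accelerometer bias.

    The device never moves, so every metre of reported displacement is error.
    """
    velocity = 0.0
    position = 0.0
    for _ in range(round(duration_s / dt)):
        velocity += bias_m_s2 * dt   # first integration: acceleration -> velocity
        position += velocity * dt    # second integration: velocity -> position
    return position


if __name__ == "__main__":
    for seconds in (1, 5, 10, 20):
        print(f"after {seconds:>2} s: {drift_m(seconds):.3f} m of drift")
```

The result tracks the closed form 0.5 · bias · t², so doubling the elapsed time roughly quadruples the error. This is why an external reference has to keep resetting the estimate.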
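The correction idea can be sketched in one dimension: the IMU is integrated at a high rate, and each time a camera fix arrives the estimate is nudged back toward it. The example uses a simple alpha-beta filter as a stand-in. We are not claiming this is the filter Oculus used, and the rates, gains, and bias below are hypothetical.

```python
def final_error_m(duration_s, use_camera, bias_m_s2=0.05, imu_dt=0.001,
                  steps_per_fix=16, alpha=0.5, beta=0.1):
    """Absolute position error for a stationary headset (true position = 0).

    The IMU is dead-reckoned every imu_dt seconds. If use_camera is True,
    a perfect camera measurement corrects position and velocity once every
    steps_per_fix IMU samples.
    """
    position = 0.0
    velocity = 0.0
    fix_dt = imu_dt * steps_per_fix
    for step in range(1, round(duration_s / imu_dt) + 1):
        # IMU prediction: integrate the (biased) acceleration twice.
        velocity += bias_m_s2 * imu_dt
        position += velocity * imu_dt
        if use_camera and step % steps_per_fix == 0:
            residual = 0.0 - position              # camera sees the true position
            position += alpha * residual           # pull position toward the fix
            velocity += (beta / fix_dt) * residual # bleed off accumulated velocity error
    return abs(position)


if __name__ == "__main__":
    print("IMU only:     ", final_error_m(10, use_camera=False))
    print("IMU + camera: ", final_error_m(10, use_camera=True))
```

With the camera in the loop the error stays bounded instead of growing with the square of time, while the IMU still supplies the low-latency motion between fixes.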
The interactions between the lenses and the user can often make or break the experience of a lot of consumer VR headsets. Popular consumer VR products like the Valve Index have fully adjustable lens interpupillary distance (IPD), as well as adjustable eye relief (distance of the lenses from the user's eyes). This makes the product much more accessible to a wide range of users, and more comfortable for individuals sensitive to such changes. However, the further the lenses are from the user's eyes, the smaller the field of view (FoV) is (given the lenses stay the same size), and this is why it is especially important that the VR headset sits at a reasonable distance from the user's face. The Valve Index also has canted (tilted) displays, to further conform around the face of the user. This is beneficial for many reasons (including FoV), but also introduces new types of optical distortions that are difficult to correct.
The issues regarding comfort go further. The vergence-accommodation conflict is one phenomenon that VR researchers have been working to solve for many years. It occurs when a user perceives an object in VR to be a certain distance away, but the user's eyes are focused at a different distance, because the lenses in the VR headset can only present one fixed focal distance (the distance at which the eyes must focus, regardless of how far away the virtual object appears). This focal distance is said to be around 2m for most consumer VR headsets today, meaning that any object closer than or further than 2m away in a virtual scene may look blurry and be difficult to focus on naturally. This can cause eye strain and motion sickness, which is likely a deal-breaker for many would-be new VR users.
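The eye relief trade-off follows from simple geometry. Treating the eye as a point and the lens as a circular window, the widest angle the lens can subtend is 2·atan((D/2)/e), where D is the lens diameter and e is the eye relief. The sketch below evaluates this upper bound; the 50 mm lens diameter is an assumed value, and a real headset's FoV is further limited by the display size and lens design.

```python
import math


def max_fov_deg(lens_diameter_mm, eye_relief_mm):
    """Upper bound on field of view through a lens of the given diameter."""
    if lens_diameter_mm <= 0 or eye_relief_mm <= 0:
        raise ValueError("lens diameter and eye relief must be positive")
    half_angle = math.atan((lens_diameter_mm / 2.0) / eye_relief_mm)
    return math.degrees(2.0 * half_angle)


if __name__ == "__main__":
    # Hypothetical 50 mm lens at several eye relief settings.
    for relief in (10.0, 15.0, 20.0, 25.0):
        print(f"eye relief {relief:.0f} mm -> at most {max_fov_deg(50.0, relief):.1f} degrees")
```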
Vergence-accommodation conflict example. [1]
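One common way to quantify the conflict is in diopters (1/metres): the difference between where the eyes converge and where the optics force them to focus. The sketch below computes both the vergence angle and the mismatch, assuming a 63 mm IPD and the roughly 2 m fixed focal distance mentioned above. Both defaults are illustrative assumptions.

```python
import math


def vergence_angle_deg(object_distance_m, ipd_m=0.063):
    """Angle between the two eyes' lines of sight when fixating an object."""
    return math.degrees(2.0 * math.atan(ipd_m / (2.0 * object_distance_m)))


def conflict_diopters(object_distance_m, focal_distance_m=2.0):
    """Mismatch between vergence demand and accommodation demand, in diopters."""
    return abs(1.0 / object_distance_m - 1.0 / focal_distance_m)


if __name__ == "__main__":
    for d in (0.3, 0.5, 1.0, 2.0, 10.0):
        print(f"object at {d:>4} m: vergence {vergence_angle_deg(d):5.2f} deg, "
              f"conflict {conflict_diopters(d):.2f} D")
```

The mismatch is zero at 2 m and grows fastest for nearby objects, which matches the experience that close-up interactions in VR are the most tiring.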
Meta Reality Labs revealed that they were working on a varifocal VR headset prototype as early as 2018 (called Half-Dome). The idea of the first Half-Dome prototype was to use eye tracking built into the headset to figure out where the user was looking and which in-game object they were attempting to focus on. Then, through Oculus's API, the focal distance to that object could be found, and stepper motors within the headset would move the lenses accordingly, setting them to that focal distance.
This Half-Dome prototype is exactly the type of concept and the level of fidelity that the team wishes to implement into our headset at some point, whether or not it is universally compatible with SteamVR. This will begin with implementing eye-tracking hardware into Reality from Scratch.
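A very rough sketch of the last step, turning a target focal distance into motor motion, is shown below. It assumes an ideal thin lens, a 50 mm focal length, and a stepper stage that moves 0.01 mm per step. None of these figures come from Half-Dome or from our own hardware, and the function names are hypothetical.

```python
def display_spacing_mm(focal_length_mm, focus_distance_mm):
    """Lens-to-display spacing that puts the virtual image at focus_distance_mm."""
    return focal_length_mm * focus_distance_mm / (focal_length_mm + focus_distance_mm)


def steps_to_focus(current_focus_mm, target_focus_mm,
                   focal_length_mm=50.0, mm_per_step=0.01):
    """Signed number of stepper steps needed to refocus.

    Negative values mean the lens-to-display spacing must shrink, which is
    the case when the user looks at something closer than the current focus.
    """
    travel_mm = (display_spacing_mm(focal_length_mm, target_focus_mm)
                 - display_spacing_mm(focal_length_mm, current_focus_mm))
    return round(travel_mm / mm_per_step)


if __name__ == "__main__":
    # The eye tracker (not modelled here) reports the user is now looking at
    # an object 0.5 m away while the optics are focused at 2 m.
    print(steps_to_focus(current_focus_mm=2000.0, target_focus_mm=500.0))
```

Only a few millimetres of travel separate a 2 m focus from a 0.5 m focus in this model, so the mechanical challenge is less about range and more about moving quickly and quietly enough to keep up with the eyes.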
The Reality from Scratch HMD draws inspiration from and utilizes resources and knowledge from existing open-source projects, such as Relativty and HadesVR. We really appreciate the members of the DIY VR community who have put in their best efforts to maintain this open-source knowledge.
In addition, the UW Reality Labs design team draws obvious inspiration from Meta's own Reality Labs.
A large portion of our research comes from helpful articles and sources from companies like Valve, and from initiatives like XinReality. These and other useful resources are cited throughout this document, and repeated below; we encourage you to view them.
Hands-On with Meta's New VR Headset Prototypes! - Tested (YouTube) This one we really recommend watching.
OSC Colloquium: Douglas Lanman, "How to Pass the Visual Turing Test in AR/VR"
Vergence-accommodation conflict - Wikipedia
Doc-Ok.org - Hacking the Oculus DK2
Valve Software - Valve Index Deep Dive
WalkerDev (Hackaday.io) - Easy "Pancake Lenses"
Dispersion (optics) - Wikipedia
Augmented and Mixed Reality Optical See-Through Combiners Based on Plastic Optics
Project Caliper - Prototyping VR Controllers
[1] By Rosedaler - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=123242579
Repository originally created on November 23, 2023 on 'kennynahh'. Moved to UW Reality Labs organization on January 17, 2024.