Designs for augmented reality applications need to consider all the uses to which they can be put, write Nick Ni and Adam Taylor.
Research has shown that humans interact with the world visually because we process visual images many times faster than information presented in other forms.
Augmented reality (AR), like its virtual reality (VR) cousin, enables us to experience an increased perception of our environment.
The major difference is that AR augments the natural world with virtual objects such as text or other visuals, equipping us to interact safely and more efficiently with it, whereas VR immerses us in a created environment.
The mixed environment
Combinations of AR and VR are often described as presenting us with a mixed reality (MR).
Many people may have already used AR without being aware of it, such as when we use mobile devices for street‑level navigation or to play AR games such as Pokémon Go.
One of the best examples of AR and its applications is the head-up display (HUD). Among simpler AR applications, HUDs are used in aviation and automotive applications to make pertinent vehicle information available without the need to glance at the instrument cluster.
AR applications with more advanced capabilities, including wearable technologies (often called smart AR), are predicted to be worth $2.3bn by 2020 according to Tractica, a market intelligence firm that focuses on human interaction with technology.
AR can be smart
AR is finding many applications across industrial, military, manufacturing, medical and commercial sectors. Commercially it is used in social media applications, to add biographical information to assist with user identification.
Many AR applications are based on the use of smart glasses that can be worn by an operative. They can increase efficiency in manufacturing by replacing manuals, or can be used to illustrate how to assemble parts.
In the medical field, smart glasses provide the potential to share medical records and wound and injury details. This can give on-scene emergency service personnel the benefit of information that can later be made available to hospital medics.
A large parcel delivery company is already using AR in smart glasses to read bar codes on shipping labels. Once the code has been scanned, the glasses can communicate with the company’s servers via Wi-Fi to determine the intended destination of the package and can then give directions as to where it should be stacked in readiness for its planned onward shipment.
When considering the application of AR systems it is important to factor in requirements including performance, security, power and future-proofing. Sometimes designers can confront competing pressures from such requirements.
Complex AR systems must be able to interface with and process data from multiple camera sensors that interpret their surroundings.
Such sensors may also operate across different elements of the electromagnetic (EM) spectrum, such as the infrared or near infrared. Additionally, sensors may furnish information from outside the EM spectrum, providing inputs for the detection of movement and rotation – as is the case with MEMS accelerometers and gyroscopes.
They can combine this information with location data dispensed by global navigation satellite systems (GNSS).
Embedded vision systems that combine information from several different sensor types such as these are known as heterogeneous sensor fusion systems. AR systems also require high frame‑rates, along with the ability to perform real-time analysis frame by frame, to extract and process the information contained in each frame.
Equipping systems with the processing capability to achieve these requirements becomes a critical factor in component selection.
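As an illustration of fusing data from different sensor types, the sketch below blends MEMS gyroscope and accelerometer readings with a complementary filter. This is one common fusion technique rather than a method named in this article, and the function names, the axis convention and the 0.98 blend factor are assumptions chosen for the example.

```python
import math


def accel_pitch_deg(ax: float, ay: float, az: float) -> float:
    """Estimate pitch (degrees) from the gravity vector seen by the accelerometer."""
    return math.degrees(math.atan2(-ax, math.sqrt(ay * ay + az * az)))


def complementary_filter(pitch_deg, gyro_rate_dps, accel, dt, alpha=0.98):
    """Fuse one gyroscope sample and one accelerometer sample into a pitch estimate.

    pitch_deg     -- previous fused estimate, in degrees
    gyro_rate_dps -- gyroscope pitch rate, in degrees per second
    accel         -- (ax, ay, az) accelerometer reading, in g
    dt            -- time since the previous sample, in seconds
    alpha         -- assumed blend factor: weight given to the gyroscope path
    """
    # Integrating the gyroscope is smooth but drifts over time.
    gyro_estimate = pitch_deg + gyro_rate_dps * dt
    # The accelerometer is noisy but has no long-term drift.
    accel_estimate = accel_pitch_deg(*accel)
    return alpha * gyro_estimate + (1.0 - alpha) * accel_estimate
```

Called once per sample period, the filter follows the gyroscope over short intervals while the accelerometer term slowly corrects accumulated drift.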
The human connection
Designers must also consider the unique aspects of AR systems. In addition to interfacing with cameras and sensors and executing algorithms they must also be able to track users’ eyes and determine the direction of their gaze.
This is normally achieved by using additional cameras that monitor the user’s face and an eye tracking algorithm that allows the AR system to follow the user’s gaze and determine the content to be delivered to the AR display.
This makes efficient use of bandwidth and processing resources in what can be a computationally intensive task.
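The idea of using gaze direction to decide what content to deliver can be sketched as selecting a display region centred on the gaze point. The snippet is a minimal illustration, assuming the eye tracker reports a normalised gaze position between 0 and 1 on each axis; the `Region` type and `gaze_region` function are hypothetical names, not part of any particular AR SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    left: int
    top: int
    width: int
    height: int


def gaze_region(gaze_x, gaze_y, display_w, display_h, roi_w, roi_h) -> Region:
    """Return a roi_w x roi_h window centred on the gaze point and kept on screen.

    gaze_x and gaze_y are assumed to be normalised to the range 0..1.
    """
    # The window can never be larger than the display itself.
    roi_w = min(roi_w, display_w)
    roi_h = min(roi_h, display_h)
    # Clamp the gaze estimate, then convert it to pixel coordinates.
    cx = int(min(max(gaze_x, 0.0), 1.0) * (display_w - 1))
    cy = int(min(max(gaze_y, 0.0), 1.0) * (display_h - 1))
    # Centre the window on the gaze point, shifting it back inside the display.
    left = min(max(cx - roi_w // 2, 0), display_w - roi_w)
    top = min(max(cy - roi_h // 2, 0), display_h - roi_h)
    return Region(left, top, roi_w, roi_h)
```

Only the returned region then needs full-detail rendering or analysis, which is where the saving in bandwidth and processing comes from.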
Most AR systems are also portable, untethered and, in many instances, wearable – as is the case with smart glasses. This introduces the unique challenge of implementing the necessary processing in a power-constrained environment.
There are many options for processing within an AR headset. One of these is FPGA-based devices, for example Xilinx’s Zynq-7000 and Zynq UltraScale+ devices (see box, below). Both Zynq device families offer high performance per watt. They can further reduce power during operation by exercising options ranging from placing processors into stand-by mode, to be awoken by one of several sources, to powering down the programmable logic half of the device.
By detecting when they are no longer being used and entering these low-power modes, AR systems can extend battery life.
During operation of the AR system, elements of the processor not being used can be clock-gated to reduce power consumption.
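The power-saving options described above can be pictured as a simple idle-time policy. The sketch below is illustrative only: the state names, the two timeout values and the `next_power_state` function are assumptions made for the example, not a Xilinx power-management API.

```python
from enum import Enum


class PowerState(Enum):
    ACTIVE = "active"
    STANDBY = "standby"      # processors idle, waiting for a wake source
    LOGIC_OFF = "logic_off"  # programmable logic powered down as well


def next_power_state(idle_seconds, wake_event,
                     standby_after=5.0, logic_off_after=60.0) -> PowerState:
    """Choose a power state from how long the headset has been unused.

    The thresholds are hypothetical defaults; a real design would tune them
    against wake-up latency and battery capacity.
    """
    if wake_event:
        # Any wake source brings the system straight back to full operation.
        return PowerState.ACTIVE
    if idle_seconds >= logic_off_after:
        return PowerState.LOGIC_OFF
    if idle_seconds >= standby_after:
        return PowerState.STANDBY
    return PowerState.ACTIVE
```

The longer the system sits unused, the deeper the state it falls into, trading a slower wake-up for lower consumption.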
Designers can achieve power efficiency in the programmable logic element by following simple design rules such as making efficient use of hard macros, carefully planning control signals and considering intelligent clock-gating for device regions when not required. This provides a more power-efficient and responsive single‑chip solution when compared with a CPU or GPU based approach.
Designs based on Zynq devices that use the ReVision acceleration stack (see box, below) to accelerate machine learning and image processing elements can achieve between 6x (machine learning) and 42x (image processing) the frames per second per watt, with one fifth of the latency, compared with a GPU-based solution.
Privacy and security considerations
AR applications like sharing patient medical records or manufacturing data call for a high level of security, both in information assurance and threat protection, especially as AR systems will be highly mobile and could be misplaced.
Information assurance requires that the information stored in, received and transmitted by the system is trustworthy.
For comprehensive information assurance, the Zynq devices’ secure boot capabilities enable the use of encryption. Verification is through the advanced encryption standard (AES), keyed-hash message authentication code (HMAC) and the RSA public key cryptography algorithm.
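To show what keyed-hash message authentication does in principle, the sketch below tags a block of data with an HMAC and later checks that the data is unchanged. It uses Python's standard `hmac` and `hashlib` modules with SHA-256 as an assumed hash; it is not the Zynq secure boot implementation, and `sign_image` and `verify_image` are hypothetical helper names.

```python
import hashlib
import hmac


def sign_image(key: bytes, image: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the image using a shared secret key."""
    return hmac.new(key, image, hashlib.sha256).digest()


def verify_image(key: bytes, image: bytes, expected_tag: bytes) -> bool:
    """Return True only if the image still matches the tag it was issued with."""
    # compare_digest runs in constant time, which avoids leaking how many
    # leading bytes of a forged tag happened to be correct.
    return hmac.compare_digest(sign_image(key, image), expected_tag)
```

Any change to the data or the key yields a different tag, so a tampered image fails verification before it is trusted.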
Once the device is correctly configured and running, developers can use ARM TrustZone and hypervisors to implement orthogonal worlds, where one is secure and cannot be accessed by the other.
When it comes to threat protection, designers can use the built-in Xilinx ADC macro to monitor supply voltages, currents and temperatures and detect attempts to tamper with the AR system.
Should a threatening event occur, the Zynq device has protective options ranging from logging the attempt to erasing secure data and preventing the AR system from connecting again to the supporting infrastructure.
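The monitoring and escalation described above can be sketched as range checks on measured values followed by a graded response. The channel names, limit values and the three-step escalation are assumptions for illustration only; they do not describe the behaviour of the Xilinx ADC macro or any built-in Zynq policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    low: float
    high: float


def find_violations(readings: dict, limits: dict) -> list:
    """Return the sorted names of monitored channels that are out of range."""
    return sorted(
        name for name, value in readings.items()
        if name in limits
        and not (limits[name].low <= value <= limits[name].high)
    )


def tamper_response(violations: list, previous_events: int) -> str:
    """Escalate from logging to erasing data as suspicious events repeat."""
    if not violations:
        return "none"
    if previous_events == 0:
        return "log"
    if previous_events == 1:
        return "erase_secure_data"
    return "erase_and_block_reconnect"
```

A supply voltage or temperature outside its expected window is treated as a possible tampering attempt, and repeated events trigger progressively stronger protective action.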
Uniting conflicting business challenges
Augmented reality systems are becoming commonplace across commercial, industrial and military sectors. These systems present the often mutually exclusive challenges of high performance, system-level security and power efficiency. But processors now exist to help designers address these challenges with an acceleration stack to support embedded vision and machine learning applications.
SoC has video encoder and acceleration stack
The programmable Zynq-7000 SoC and the next-generation Zynq UltraScale+ MPSoC, which can serve as the processing cores of augmented reality (AR) systems, are heterogeneous processing systems that combine ARM processors with high-performance programmable logic.
Zynq UltraScale+ MPSoC also has an ARM Mali-400 GPU, and some variants also offer a hardened video encoder that supports H.265, the high-efficiency video coding (HEVC) standard.
These devices enable designers to segment their system architectures using processors for real-time analytics and transferring traditional processor tasks to the ecosystem. They can use the programmable logic for sensor interfaces and processing functions. Benefits include:
- Parallel implementation of N image processing pipelines
- Any-to-any connectivity, the ability to define and interface with any sensor, communication protocol or display standard
- Support for embedded vision and machine learning.
To implement an image processing pipeline and sensor fusion algorithms, developers can use the ReVision acceleration stack, which supports both embedded vision and machine learning applications. This permits designers to use industry-standard frameworks such as OpenVX for cross-platform acceleration of vision processing, the OpenCV computer vision library and the Caffe machine learning framework to target both the processor system and programmable logic.
The stack can accelerate a large number of OpenCV functions (including the core OpenVX functions) in the programmable logic and supports implementation of the machine learning inference engine in the programmable logic directly from the Caffe prototxt file.
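The notion of N image processing pipelines running side by side can be pictured in software as the same chain of stages applied to several camera frames at once. The sketch below is a pure-Python analogy using a thread pool, not the ReVision stack or programmable logic; the stage functions and frame format (lists of pixel rows) are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def threshold(frame, level=128):
    """Binarise a greyscale frame: pixels at or above level become 255, else 0."""
    return [[255 if p >= level else 0 for p in row] for row in frame]


def invert(frame):
    """Invert an 8-bit greyscale frame."""
    return [[255 - p for p in row] for row in frame]


def run_pipeline(frame, stages):
    """Pass one frame through each stage in order."""
    for stage in stages:
        frame = stage(frame)
    return frame


def run_parallel(frames, stages):
    """Run the same pipeline over N frames concurrently, one worker per frame."""
    with ThreadPoolExecutor(max_workers=len(frames) or 1) as pool:
        return list(pool.map(lambda f: run_pipeline(f, stages), frames))
```

In programmable logic each pipeline would be a separate block of hardware operating simultaneously; the thread pool here only mimics that structure.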
Nick Ni is senior product manager, SDSoC and embedded vision and Adam Taylor is embedded systems consultant, both at Xilinx International