Edge AI chip startup Deep Vision has raised $35 million in a series B round of funding led by Tiger Global, joined by existing investors Exfinity Venture Partners, Silicon Motion and Western Digital.
The company began shipping its first-generation chip last year. ARA-1 is designed for power-efficient, low-latency edge AI processing in applications like smart retail, smart city and robotics. While the company’s name suggests a focus on convolutional neural networks, ARA-1 can also accelerate natural language processing with support for complex networks such as long short-term memory networks (LSTMs) and recurrent neural networks (RNNs).
A second-generation chip, ARA-2, with additional features for accelerating LSTMs and RNNs, will launch next year.
“When ARA-1 started sampling last year, we wanted to make sure that we have a fantastic product-market fit with customers,” Deep Vision CEO Ravi Annavajjhala told EE Times. “We have several customers, including one big FAANG customer that we are shipping to in volume, our product is qualified, it’s in mass production, and now what we are doing is taking that story and sort of replicating across different segments and different customers.”
“FAANG” refers to hyper-scalers Facebook, Amazon, Apple, Netflix and Google.
Annavajjhala said smart retail is the company’s top target application, adding that the company already has a “very large customer” in that sector along with “multiple other engagements.” He declined to identify the retail customer or to say whether it and the FAANG company were one and the same.
Retail applications for Deep Vision’s chips include checkout analytics and inventory management, with deployments in shelf cameras as well as digital signage. While shelf-cams don’t require extremely low latency, Annavajjhala said they must process millions of images to track products using fairly complex AI models, so processing needs are relatively high.
“A shelf cam is not just [detecting empty shelves], it’s combining that model with a number of other models, often with many filters,” he said. “Compute itself is a problem – we can’t take this to the cloud as it’s prohibitively expensive – but the network bandwidth and power-over-Ethernet switches required have also become very expensive.” Hence, “it’s in the best interests of the store to reduce the total cost of ownership, and the way to do it is do as much inference at the edge as possible.”
Other applications include smart city (high resolution, high frame rate surveillance cameras), driver-monitoring systems, robotics, drones and factory automation.
ARA-1’s performance for ResNet-50 measured 100 images per second, or 40 images per second per watt. The chip is shipping now in USB modules, M.2 modules and U.2 PCIe modules (two or four chips to a card). The company offers two SKUs of the chip – an 800-MHz version plus a 600-MHz version for power-sensitive edge applications.
Deep Vision’s compute architecture is designed to minimize the amount of data movement to and from memory.
“Data movement minimization happens across software, at the system level and at the compute core level to ensure any data that’s brought onto the chip stays as close to core compute as possible and stays for as long as possible,” said Annavajjhala. “We have abstracted all of this out on the hardware and ensured that the data flows minimize the data movement between compute and different levels of memory hierarchy.”
Data can be reused many times when computing AI inference, he said, noting that for image convolutions around 90 percent of the data is the same between one frame (or one convolution) and the next. Deep Vision’s software scans models looking at many different combinations of data movement, scheduling the most efficient combination for the processor. A compiler, tracking power consumption and performance of the processing and memory subunits, optimizes for the desired result. Deep Vision’s software flow supports TensorFlow, Caffe, PyTorch, MXNet and ONNX.
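To illustrate why data reuse matters for convolutions, here is a minimal sketch in Python. It is not Deep Vision's compiler or architecture: the function names, the 224 x 224 single-channel input and the 3 x 3 kernel are all assumptions chosen for illustration. It compares how many input values a naive approach would fetch from memory (re-reading every window) with how many are needed if each value is fetched once and kept near the compute core.

```python
def window_overlap_fraction(kernel_width: int, stride: int) -> float:
    """Fraction of a window's columns shared with the next window along a row."""
    shared_columns = max(kernel_width - stride, 0)
    return shared_columns / kernel_width


def fetch_counts(height: int, width: int, kernel: int, stride: int = 1):
    """Return (naive_fetches, reused_fetches) for one single-channel input.

    naive_fetches: every output position re-reads its full kernel x kernel window.
    reused_fetches: each input value is read once and held on chip
                    (assumes stride <= kernel, so every value is needed).
    """
    out_h = (height - kernel) // stride + 1
    out_w = (width - kernel) // stride + 1
    naive_fetches = out_h * out_w * kernel * kernel
    reused_fetches = height * width
    return naive_fetches, reused_fetches


if __name__ == "__main__":
    naive, reused = fetch_counts(224, 224, kernel=3)
    print(f"adjacent-window overlap: {window_overlap_fraction(3, 1):.0%}")
    print(f"naive fetches:  {naive}")
    print(f"reused fetches: {reused}")
    print(f"reduction:      {naive / reused:.1f}x")
```

The exact share of reused data depends on kernel size, stride and how many channels and filters are involved; the roughly 90 percent figure quoted above is the company's and is not reproduced by this toy example.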
The time crunch on getting ARA-1 to market meant that some advanced features were left out, according to Annavajjhala. Those features, primarily related to accelerating LSTMs and RNNs, will be added in the ARA-2 chip, which is based on the same overall architecture. ARA-2 will also move from 28- to 16-nm process technology. The result will be a three-fold boost in performance per watt, yielding a five-fold performance increase compared to ARA-1. (Applications currently running on ARA-1 will be compatible with ARA-2, the company said.)
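Taken at face value, the quoted figures imply a rough power envelope, which the short Python calculation below works out. This is back-of-the-envelope arithmetic, not a company specification: it assumes the ResNet-50 throughput and efficiency figures for ARA-1 were measured at the same operating point, and that the three-fold and five-fold gains for ARA-2 apply to the same workload. The variable and function names are illustrative.

```python
def implied_power_watts(images_per_second: float, images_per_second_per_watt: float) -> float:
    """Power implied by a throughput figure and an efficiency figure."""
    return images_per_second / images_per_second_per_watt


def implied_power_ratio(performance_gain: float, efficiency_gain: float) -> float:
    """If performance rises by one factor and performance per watt by another,
    power must change by the ratio of the two."""
    return performance_gain / efficiency_gain


# Figures quoted in the article for ARA-1 on ResNet-50.
ARA1_THROUGHPUT = 100.0   # images per second
ARA1_EFFICIENCY = 40.0    # images per second per watt

# Gains quoted for ARA-2 relative to ARA-1.
PERFORMANCE_GAIN = 5.0
EFFICIENCY_GAIN = 3.0

ara1_power = implied_power_watts(ARA1_THROUGHPUT, ARA1_EFFICIENCY)
power_ratio = implied_power_ratio(PERFORMANCE_GAIN, EFFICIENCY_GAIN)
ara2_power_estimate = ara1_power * power_ratio  # assumption-laden estimate

if __name__ == "__main__":
    print(f"ARA-1 implied power:   {ara1_power:.2f} W")
    print(f"ARA-2 / ARA-1 power:   {power_ratio:.2f}x")
    print(f"ARA-2 estimated power: {ara2_power_estimate:.2f} W")
```

Under those assumptions, ARA-1 would draw about 2.5 W on ResNet-50 and ARA-2 about 1.67 times as much, or a little over 4 W. Actual figures could differ depending on clock speed, workload and how each number was measured.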
Founded in 2018 by Stanford University PhDs Rehan Hameed and Wajahat Qadeer, Deep Vision has now raised a total of $54 million. The company currently employs 57 and expects to grow to 75 by the end of the year. ARA-1 is shipping in volume today, while ARA-2 is expected to begin sampling in 2022.