Uber Calls for AI Standard

Will Nvidia accept a ride from CCIX?

By Rick Merritt

Deep-learning accelerators need a standard interface, said a top engineering manager at Uber who sketched a picture of the company’s use of AI, its data centers, and their challenges.

“AI is really disrupting our industry” across the design of chips, boards, systems, and services, said Gloria Lau, head of hardware engineering at Uber, in a keynote at DesignCon here.

Like many other web giants, Uber uses banks of Nvidia GPUs for deep learning today, often riding Nvidia’s NVLink interface. Also like other large data center operators, it is testing FPGAs and ASICs from startups including Eyeris, Graphcore, and Wave Computing in its search for more performance and efficiency.

“I would love to see a standard interface for all AI chips — NVLink is just for Nvidia,” Lau said.

In a brief encounter after her keynote, she said that she is familiar with the CCIX standard supported by AMD, Arm, IBM, Xilinx, and others. But to date, Nvidia is not using it, she noted.

The many deep-learning algorithms that Uber uses need to “settle down” before the company can pick an ASIC accelerator. In the meantime, she noted challenges programming both FPGAs and the tensor cores in Nvidia’s Volta chips.
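For context on why the tensor cores take extra programming effort: they only engage for mixed-precision math. The snippet below is a generic PyTorch illustration, not Uber’s workload; it casts the matrix-multiply operands to FP16 so the libraries can dispatch the operation to a tensor-core kernel on Volta-class GPUs.

```python
# Generic illustration (not Uber's code) of the mixed-precision math
# that Volta tensor cores require: FP16 inputs, with cuBLAS picking a
# tensor-core kernel when the shapes and dtypes are eligible.
import torch

a = torch.randn(1024, 1024, device="cuda").half()
b = torch.randn(1024, 1024, device="cuda").half()
c = a @ b  # FP16 matmul; eligible for tensor cores on V100-class GPUs
print(c.dtype)  # torch.float16
```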

Uber considers itself “on the bleeding edge of AI,” maintaining its own dedicated AI research team. It runs more than a dozen deep-learning models in its data centers, including recommendation engines for Uber Eats, fraud-detection services, and features to improve estimates of when a driver will arrive.

The algorithms span a half-dozen varieties, implemented across a laundry list of mainly open-source frameworks and libraries. The underlying AI hardware today consumes as much as 40 kW in a rack of systems — twice the power that standard servers use — and can require flows of more than 100 petabytes of data.

Amid the complexity, Uber is seeking simplicity. “We are architecting our next-generation AI server so that people other than data scientists can do the AI work,” Lau said.

Typically, Uber downloads data sets for AI training via PCIe Gen 3, but it uses Nvidia’s NVLink for gradient averaging among pools of four GPUs. (Source: Uber)
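The gradient averaging the caption refers to is the heart of data-parallel training: each GPU computes gradients on its own slice of a batch, and an all-reduce then averages them across the pool. As a rough sketch (this is generic PyTorch, not Uber’s code), NCCL, PyTorch’s GPU communication backend, carries that all-reduce traffic over NVLink when the GPUs are linked.

```python
# Minimal sketch (not Uber's code) of gradient averaging across a pool
# of four GPUs: one process per GPU, launched e.g. with
# `torchrun --nproc_per_node=4 train.py`, using the NCCL backend,
# which routes the all-reduce over NVLink where available.
import torch
import torch.distributed as dist

def average_gradients(model: torch.nn.Module) -> None:
    """Sum each gradient across all ranks, then divide by the pool size."""
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size

if __name__ == "__main__":
    dist.init_process_group(backend="nccl")   # one process per GPU
    torch.cuda.set_device(dist.get_rank())    # rank == local GPU here
    model = torch.nn.Linear(128, 10).cuda()
    out = model(torch.randn(32, 128, device="cuda"))
    out.sum().backward()
    average_gradients(model)  # gradients now identical on all four ranks
```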

Like Google, Uber is starting to design its own compute and storage servers and have them made by ODMs. It has even designed a variant of a 19-inch rack to house them along with vertically mounted switches.

Lau described Uber’s “super-hot” storage server, built around two Intel Cascade Lake processors. Each 1U system packs more than 16 TB of NAND flash, and 96 systems fit in a rack.

A so-called warm storage system uses older Broadwell processors and up to 70 8-TB SATA hard-disk drives in a 4U design that packs 6 petabytes in a rack. “Warm storage cuts our cost in half,” Lau said.
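Lau’s rack-level figures square with simple arithmetic. In the back-of-envelope check below, the per-node numbers come from her talk; the chassis-per-rack count for the warm tier is an assumption (roughly eleven 4U chassis in a standard rack).

```python
# Back-of-envelope check on the quoted rack capacities (units: TB).
# Assumption: about eleven 4U warm-storage chassis per rack; the
# per-node figures are from Lau's talk.
hot_node_tb = 16                    # >16 TB of NAND flash per 1U node
hot_rack_tb = 96 * hot_node_tb      # 96 nodes -> ~1.5 PB of flash/rack

warm_node_tb = 70 * 8               # 70 x 8-TB SATA drives = 560 TB per 4U
warm_rack_tb = 11 * warm_node_tb    # ~11 chassis -> ~6.2 PB/rack,
                                    # matching the "6 petabytes" figure
print(hot_rack_tb / 1000, warm_rack_tb / 1000)  # PB per rack: 1.536 6.16
```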

Uber operates two large data centers in the U.S. and one in Europe, with a half-dozen smaller centers sprinkled around the world. It runs copies of its 3,000 corporate microservices in separate regions, each region made up of at least three 5-MW data centers.

The systems have served more than 10 billion Uber rides to date at a rate of about 15 million a day across 600 cities in 65 countries. Given the scale, it’s no surprise that Lau also called for a common management interface for all servers, faster networks, and more power per rack.

Two Intel Cascade Lake processors drive Uber’s custom all-flash arrays. (Source: Uber)

— Rick Merritt, Silicon Valley Bureau Chief, EE Times
