Microsoft made a splash in the world of dedicated AI hardware today when it unveiled Brainwave, a system for high-speed, low-latency serving of machine learning models. Brainwave will allow developers to deploy machine learning models onto programmable silicon and achieve higher performance than they’d be able to get from a CPU or GPU.
Via TechinBiz
The model that Microsoft chose for its demonstration is several times larger than convolutional neural networks like AlexNet and ResNet-50, which other companies have used to benchmark their own hardware.

Providing low-latency insights is important for deploying machine learning systems at scale, because users don’t want to wait long for their apps to respond. “We call it real-time AI because the idea here is that you send in a request, you want the answer back,” said Doug Burger, a distinguished engineer with Microsoft Research. “If it’s a video stream, if it’s a conversation, if it’s looking for intruders, anomaly detection, all the things where you care about interaction and quick results, you want those in real time,” he said.

However, some previously published results on hardware-accelerated machine learning have optimized for throughput at the cost of latency. In Burger’s view, more people should ask how a machine learning accelerator performs without bundling requests into a batch and processing them all at once. “All of the numbers [other] people are throwing around are juiced,” he said.

Microsoft is using Brainwave across the army of FPGAs it has installed in its data centers. According to Burger, Brainwave will allow Microsoft services to support artificial intelligence features more rapidly. In addition, the company is working to make Brainwave available to third-party customers through its Azure cloud platform.
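Burger’s point about batching can be illustrated with a toy latency model. The sketch below is not how Brainwave or any specific accelerator works. It assumes a hypothetical accelerator that pays a fixed overhead per batch plus a fixed cost per request, and requests that arrive at a steady rate, with a batch not starting until it is full. All function names and numbers are made up for illustration. Larger batches raise peak throughput, but every request in the batch waits for the batch to fill, so worst-case latency grows too.

```python
from dataclasses import dataclass


@dataclass
class BatchStats:
    batch_size: int
    peak_throughput_rps: float  # requests/second if the hardware is always busy
    worst_latency_ms: float     # first request's wait for the batch to fill, plus compute


def batch_tradeoff(batch_size: int, overhead_ms: float, per_item_ms: float,
                   arrival_interval_ms: float) -> BatchStats:
    """Hypothetical cost model: fixed per-batch overhead plus per-request cost.

    Requests arrive one every `arrival_interval_ms`, and a batch is only
    processed once it holds `batch_size` requests.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    # Time the accelerator spends computing one batch.
    compute_ms = overhead_ms + per_item_ms * batch_size
    # The first request in a batch waits for the other batch_size - 1 arrivals.
    fill_wait_ms = arrival_interval_ms * (batch_size - 1)
    return BatchStats(
        batch_size=batch_size,
        peak_throughput_rps=batch_size * 1000.0 / compute_ms,
        worst_latency_ms=fill_wait_ms + compute_ms,
    )


if __name__ == "__main__":
    # Made-up parameters: 2 ms overhead, 0.5 ms per request, one request per ms.
    for size in (1, 8, 32):
        s = batch_tradeoff(size, overhead_ms=2.0, per_item_ms=0.5, arrival_interval_ms=1.0)
        print(f"batch={s.batch_size:>2}  peak throughput={s.peak_throughput_rps:7.1f} req/s  "
              f"worst-case latency={s.worst_latency_ms:5.1f} ms")
```

With these assumed parameters, going from a batch of 1 to a batch of 8 roughly triples peak throughput, while the worst-case latency for a single request grows about fivefold. A headline throughput figure measured at a large batch size therefore says little about how quickly one request gets its answer.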