As neural networks continue to evolve, they will most likely proliferate in cloud-based applications and extend into real-time embedded functions. In addition, processor platforms will need to be optimized for CNNs so that they meet tight power constraints and extreme throughput demands. As a step on this evolutionary path, Cadence has launched a DSP designed specifically for CNN applications. Compared to its predecessor, the Tensilica Vision P6 DSP (Figure 2) offers up to 4X better performance with quadruple the available multiply-accumulate (MAC) horsepower (MACs are a major computation block for CNN applications). When compared to commercially available GPUs, the Vision P6 DSP provides twice the frame rate at much lower power consumption on a typical neural network implementation.
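To see why MAC throughput dominates CNN performance, consider a minimal sketch (not Cadence code, and deliberately naive) of a 2D convolution: every output pixel requires one multiply-accumulate per kernel element, so the total MAC count, and therefore the hardware's MAC throughput, directly bounds the achievable frame rate.

```python
# Naive valid-mode 2D convolution that counts its multiply-accumulate (MAC)
# operations. Illustrative only: real CNN engines vectorize these inner loops
# across many parallel MAC units.

def conv2d(image, kernel):
    """Return (output, mac_count) for a valid-mode 2D convolution."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    macs = 0
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            acc = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]  # one MAC
                    macs += 1
            out[y][x] = acc
    return out, macs

# A 4x4 image with a 3x3 kernel yields a 2x2 output: 4 pixels * 9 MACs = 36.
img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
k = [[0, 0, 0],
     [0, 1, 0],
     [0, 0, 0]]  # identity kernel: each output equals the window's center pixel
result, macs = conv2d(img, k)
print(result, macs)  # → [[6.0, 7.0], [10.0, 11.0]] 36
```

Scaling this up explains the arms race in MAC hardware: a single 3x3 convolution layer on a 1080p frame already needs tens of millions of MACs, and a full network runs many such layers per frame.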
While embedded devices such as smart watches have much to gain from CNN capabilities, their small form factor and power constraints make them a challenging environment for these compute-intensive algorithms. What’s more, expertise in neural networks has traditionally been concentrated in academia. As a result, neural networks aren’t yet deeply understood by embedded architects. There’s opportunity here for software differentiation and for the emergence of optimized SoCs for low-cost, mass-produced embedded supercomputers. GPU and specialized DSP suppliers are ready to meet this demand, which, in turn, creates a need for new hardware, IP, memory, and interconnect technology for embedded devices. We are also seeing a need for deep-learning algorithms that are designed to specifically address embedded requirements.
For CNNs to become pervasive and deliver a broad impact, now is the time to address the complexity of the technology. Industry leaders are researching ways to leverage automation to simplify the development of neural networks while maintaining their accuracy. There are also opportunities to improve the efficiency of these