Paper - DeepCache: Principled Cache for Mobile Deep Vision

  • Metadata:
    • author: Mengwei Xu, Mengze Zhu, Yunxin Liu, Felix Lin, Xuanzhe Liu
    • title: DeepCache: Principled Cache for Mobile Deep Vision
    • year: 2018

  • Essay:
    • While on-device deep learning inference is desirable for a variety of reasons (e.g., privacy), its resource intensity is still a limiting factor for most mobile applications.
    • For video analysis, one way to reduce the required computation is to exploit the temporal locality of video and reuse results already computed for regions shared between adjacent frames.
    • Caching these regions can reduce the computation needed in a convolutional network. However, this is challenging for two reasons: the cache lookup has to be based on similarity rather than identity, and performing lookups on internal layers is difficult, as locality there is not as clearly defined as it is for the input images.
    • The presented approach, DeepCache, addresses these issues by offering a robust way of identifying and merging reusable image regions and by propagating the found regions through the convolution operations into deeper layers. This makes reuse at internal layers possible without requiring separate lookups and significantly boosts the potential of caching for CNNs.
    • Thanks to the reduced processing time, DeepCache substantially improves on-device inference in both performance and energy consumption while incurring only negligible accuracy loss.
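    • As a concrete illustration of the two ingredients above, here is a minimal sketch (mine, not the authors' code) of block-level similarity matching between consecutive frames and of mapping a reusable region through a convolution, which shrinks it (cache erosion). The block size, threshold, mean-absolute-difference metric, and co-located matching are simplifying assumptions; the paper itself searches a neighbourhood for the best-matching block and merges matched blocks into larger rectangular regions.

      ```python
      import numpy as np
      from math import ceil, floor

      # Illustrative values; the real system exposes block size and matching
      # threshold as tunable parameters.
      BLOCK = 16          # block side length in pixels
      THRESHOLD = 8.0     # mean-absolute-difference threshold for "similar enough"

      def matching_blocks(prev_frame, cur_frame, block=BLOCK, thr=THRESHOLD):
          """Return (row, col) offsets of blocks in cur_frame that are close enough
          to the co-located block in prev_frame to be served from the cache.
          Simplification: DeepCache searches a neighbourhood for the best match;
          here we only compare co-located blocks."""
          h, w = cur_frame.shape[:2]
          reusable = []
          for by in range(0, h - block + 1, block):
              for bx in range(0, w - block + 1, block):
                  diff = np.abs(cur_frame[by:by + block, bx:bx + block].astype(np.float32)
                                - prev_frame[by:by + block, bx:bx + block].astype(np.float32))
                  if diff.mean() < thr:
                      reusable.append((by, bx))
          return reusable

      def propagate_region(start, end, kernel, stride=1, padding=0):
          """Map a reusable 1-D interval [start, end) through a conv layer.
          Only output positions whose entire receptive field lies inside the
          reusable input interval are kept (a conservative rule, applied per
          spatial axis), so the interval shrinks layer by layer: cache erosion."""
          out_start = ceil((start + padding) / stride)
          out_end = floor((end + padding - kernel) / stride) + 1
          return (out_start, out_end) if out_end > out_start else None
      ```

      Chaining propagate_region across many layers shows why even a large merged region at the input can erode to nothing deep in the network, which is exactly why merging small matched blocks into bigger regions matters.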
    • One of the paper's more prominent strengths is the deployability of the approach. While many papers present prototypes or have limited applicability, DeepCache can be used transparently with any pre-trained model and is openly available. Their solution can be directly integrated into any project using ncnn, thereby letting everyone make use of the results of their research.
    • The fact that they expose their parameters (matching threshold, block size) further aids the development of mobile apps. It gives developers another way to fine-tune the performance vs. accuracy trade-off of their applications, which is essential when dealing with the differing hardware (camera quality, processing power) of modern smartphones.
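    • As an illustration of how such knobs could be surfaced to app developers, here is a hypothetical tuning table; the parameter names and values are mine and not the actual DeepCache/ncnn API.

      ```python
      # Hypothetical per-device profiles: a high-end phone can afford smaller
      # blocks and a stricter matching threshold (better accuracy), while a
      # low-end phone trades accuracy for more cache hits and lower latency.
      PROFILES = {
          "high_end": {"block_size": 8,  "match_threshold": 4.0},
          "low_end":  {"block_size": 32, "match_threshold": 16.0},
      }

      def pick_profile(device_tier: str) -> dict:
          """Select cache parameters for the current device; a real app might
          additionally adapt these at runtime based on measured frame latency."""
          return PROFILES.get(device_tier, PROFILES["high_end"])
      ```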
    • One of the more interesting applications I can see for DeepCache is deconvolution. The upscaling layers in the latter part of hourglass networks restore the feature maps to the spatial dimensions of earlier layers. This would relegate the problem of cache erosion (a limitation of propagating the regions into deeper layers) to the first part of the network, since the cached regions of intermediate features grow back to their previous sizes during upsampling. This shift in the importance of cache erosion could change which merging policy for reusable blocks works best.
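    • A rough sketch of why upsampling counteracts erosion, assuming nearest-neighbour upsampling: every reusable input position expands to several reusable output positions, so the cached region's bounds scale back up in the decoder half of an hourglass network. (A transposed convolution whose kernel is larger than its stride would still erode the boundary slightly; the numbers below are purely illustrative.)

      ```python
      def upsample_region(start, end, scale=2):
          """Map a reusable 1-D interval through a nearest-neighbour upsampling
          layer: the region grows by the scale factor instead of eroding."""
          return (start * scale, end * scale)

      # Toy example: a region that eroded to (36, 92) in the encoder grows back
      # through two 2x upsampling layers in the decoder.
      region = (36, 92)
      for _ in range(2):
          region = upsample_region(*region)
      print(region)  # (144, 368)
      ```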
    • Also, videos taken with phone cameras often return to previously seen viewpoints. This longer-range locality could justify a more global cache rather than one that only exploits similarities between mostly adjacent frames.
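    • A speculative sketch of such a global cache, and not something the paper implements: keep a small pool of recent keyframes (with their cached activations) and match each new frame against the most similar pool entry rather than only the immediately preceding frame. The class name, pool size, and pixel-difference metric are placeholders.

      ```python
      from collections import deque
      import numpy as np

      class KeyframePool:
          """Small pool of past keyframes to exploit recurring viewpoints."""

          def __init__(self, capacity=4):
              self.pool = deque(maxlen=capacity)  # (frame, cached_activations) pairs

          def add(self, frame, activations):
              self.pool.append((frame, activations))

          def best_match(self, frame):
              """Return the (frame, activations) entry whose frame is closest to
              the new frame by mean absolute pixel difference, or None."""
              if not self.pool:
                  return None
              return min(self.pool,
                         key=lambda e: np.abs(frame.astype(np.float32)
                                              - e[0].astype(np.float32)).mean())
      ```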
    • Funnily enough, this paper could become relevant to a project in one of my other courses. We are building an application that performs deep pose detection for performing artists on mobile devices. As the evaluation shows, DeepCache works exceptionally well when the picture is mostly static (the background, in our case). This fact, combined with the availability of their source code, could make on-device inference a viable option for the project.