Abstract: Deep Learning has brought breakthroughs to computer vision, speech recognition, natural language processing, and other applications. These breakthroughs are achieved by feeding huge amounts of data into large-scale Deep Neural Networks (DNNs). The large scale brings new challenges to both DNN inference and DNN training. For inference, the computational complexity is unaffordable on edge devices (such as drones, mobile devices, and robots), so compact/efficient DNNs are required for real-time inference on the edge. State-of-the-art model compression methods can reduce the storage sizes of DNNs; however, they converge to irregular structures that are unfriendly to hardware execution. In this talk, I will introduce our general learning algorithms [1][2][3] that dynamically remove regular structures (including neurons, filters, channels, layers, matrix dimensions, ranks, hidden states/cells, etc.). I will show that, with our approaches, the final compact DNNs have regular structures just as traditional DNNs do and can be deployed directly without any software or hardware modification. The second part of the talk will cover our recent progress on scalable distributed Deep Learning. In DNN training, distributed systems are commonly used to boost computing power; however, communication becomes the new speed bottleneck because of gradient synchronization. I will introduce our SGD variant, TernGrad, to overcome this bottleneck. In TernGrad [4], 32-bit floating-point gradients are stochastically quantized to only three levels (i.e., ternary gradients), so each gradient needs fewer than 2 bits to encode, significantly reducing communication.
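As an illustration of the first part, below is a minimal PyTorch sketch of the structured-sparsity idea behind [1], assuming the group-Lasso formulation used there; the names filter_group_lasso, lam, criterion, and model are hypothetical, not the authors' code.

import torch
import torch.nn as nn

def filter_group_lasso(conv: nn.Conv2d) -> torch.Tensor:
    # Group-Lasso penalty over whole filters of a convolutional layer:
    # each output filter forms one group and is penalized by its L2 norm,
    # which drives entire filters to exactly zero so they can be removed,
    # leaving a smaller but still regular (dense) layer.
    # conv.weight has shape (out_channels, in_channels, kH, kW).
    return conv.weight.flatten(1).norm(p=2, dim=1).sum()

# Hypothetical usage: add the penalty to the task loss during training.
# loss = criterion(model(x), y) + lam * sum(
#     filter_group_lasso(m) for m in model.modules() if isinstance(m, nn.Conv2d))

For the second part, the sketch below illustrates the stochastic ternary quantization described for TernGrad [4]: with s = max|g|, each gradient component keeps its sign with probability |g_i|/s and is zeroed otherwise, giving an unbiased estimate of g that can be encoded with fewer than 2 bits per component. Details from the paper, such as gradient clipping and layer-wise scaling, are omitted, and the function name ternarize is an assumption rather than the official implementation.

import torch

def ternarize(grad: torch.Tensor) -> torch.Tensor:
    # Stochastically quantize a gradient tensor to the three levels {-s, 0, +s}.
    s = grad.abs().max()
    if s == 0:
        return torch.zeros_like(grad)
    prob = grad.abs() / s              # keep-probability per element
    mask = torch.bernoulli(prob)       # 1 with probability |g_i|/s, else 0
    return s * grad.sign() * mask      # unbiased: E[output] = grad

# Each worker would then communicate only the sign/zero pattern plus the
# scalar s, i.e. fewer than 2 bits per gradient component instead of 32.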

[1] Wei Wen, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Li, “Learning Structured Sparsity in Deep Neural Networks”, the 30th Annual Conference on Neural Information Processing Systems (NIPS), 2016.

[2] Wei Wen, Yuxiong He, Samyam Rajbhandari, Minjia Zhang, Wenhan Wang, Fang Liu, Bin Hu, Yiran Chen, Hai Li, “Learning Intrinsic Sparse Structures within Long Short-Term Memory”, the 6th International Conference on Learning Representations (ICLR), 2018.

[3] Wei Wen, Cong Xu, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Li, “Coordinating Filters for Faster Deep Neural Networks”, Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017.

[4] Wei Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Li, “TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning”, the 31st Annual Conference on Neural Information Processing Systems (NIPS), 2017. (Oral, acceptance rate 40/3240 = 1.2%. Available in official PyTorch/Caffe2.)

Bio: Wei Wen is a fifth-year Ph.D. student at Duke University, supervised by Dr. Hai Helen Li and Dr. Yiran Chen. His research is in Machine Learning and its applications in Computer Vision and Natural Language Processing. More specifically, he focuses on efficient deep learning on the edge, optimization algorithms for distributed machine learning, and the understanding of learning algorithms.

He has worked closely with Facebook Research, Microsoft Research, Intel Labs, and HP Labs, where he landed his research in industrial AI products, including Facebook Applied Machine Learning, Microsoft Bing, Intel Nervana & SkimCaffe, etc. He is a contributor to PyTorch/Caffe2. Homepage: http://www.pittnuts.com/