|
Optional readings are NOT required for the class, but interested students are always encouraged to check them out for an in-depth understanding of the topics.
Book chapter readings refer to:
- Efficient Processing of Deep Neural Networks, by Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, and Joel S. Emer
The book and its online/e-book version are available through MIT Libraries.

L02 - DNN Components
Book Chapter: Ch 1 & 2

L03 - Popular Models
Book Chapter: Ch 2 & 9
Papers / Other Resources:
- Works cited in lecture (increase accuracy):
  - LeNet: LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proc. IEEE 1998.
  - AlexNet: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." NeurIPS 2012.
  - VGGNet: Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." ICLR 2015.
  - Network in Network: Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
  - GoogLeNet: Szegedy, Christian, et al. "Going deeper with convolutions." CVPR 2015.
  - ResNet: He, Kaiming, et al. "Deep residual learning for image recognition." CVPR 2016.
  - DenseNet: Huang, Gao, et al. "Densely connected convolutional networks." CVPR 2017.
  - Wide ResNet: Zagoruyko, Sergey, and Nikos Komodakis. "Wide residual networks." BMVC 2016.
  - ResNeXt: Xie, Saining, et al. "Aggregated residual transformations for deep neural networks." CVPR 2017.
  - SENets: Hu, Jie, et al. "Squeeze-and-Excitation Networks." CVPR 2018.
  - NFNet: Brock, Andrew, et al. "High-Performance Large-Scale Image Recognition Without Normalization." arXiv 2021.
- Works cited in lecture (increase efficiency; a MAC-count sketch of the depthwise-separable idea follows this entry):
  - InceptionV3: Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." CVPR 2016.
  - SqueezeNet: Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level Accuracy with 50x Fewer Parameters and <0.5 MB Model Size." ICLR 2017.
  - Xception: Chollet, François. "Xception: Deep Learning with Depthwise Separable Convolutions." CVPR 2017.
  - MobileNet: Howard, Andrew G., et al. "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications." arXiv 2017.
  - MobileNetV2: Sandler, Mark, et al. "MobileNetV2: Inverted Residuals and Linear Bottlenecks." CVPR 2018.
  - MobileNetV3: Howard, Andrew, et al. "Searching for MobileNetV3." ICCV 2019.
  - ShuffleNet: Zhang, Xiangyu, et al. "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices." CVPR 2018.
  - Learning Network Architecture: Zoph, Barret, et al. "Learning Transferable Architectures for Scalable Image Recognition." CVPR 2018.
- Works cited in lecture (increase accuracy and efficiency):
  - EfficientNet: Tan, Mingxing, et al. "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks." ICML 2019.

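Several of the efficiency-oriented works above (Xception, the MobileNet family, ShuffleNet) factor a standard convolution into a per-channel depthwise step plus a 1x1 pointwise step. Below is a minimal MAC-count sketch of that factorization; the layer shape is an arbitrary assumed example, not a configuration from any cited paper.

```python
# Back-of-the-envelope multiply-accumulate (MAC) counts contrasting a standard
# convolution layer with a depthwise-separable factorization. The layer
# dimensions below are made-up examples, not values from the cited papers.

def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution producing an h x w x c_out output."""
    return h * w * c_out * c_in * k * k

def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs for a k x k depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1x1 convolution mixes channels
    return depthwise + pointwise

if __name__ == "__main__":
    h, w, c_in, c_out, k = 56, 56, 128, 128, 3   # assumed example layer shape
    std = standard_conv_macs(h, w, c_in, c_out, k)
    sep = depthwise_separable_macs(h, w, c_in, c_out, k)
    print(f"standard: {std:,} MACs, separable: {sep:,} MACs, ratio: {std / sep:.1f}x")
```
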
L04 - Evaluation and Training
Book Chapter: Ch 2 & 3

L05 - Kernel Computation - CPU
Papers / Other Resources:
- J. L. Hennessy and D. A. Patterson. "Chapter 3 & Appendix C," Computer Architecture: A Quantitative Approach.

L06 - Kernel Computation - Vectorization
Papers / Other Resources:
- J. L. Hennessy and D. A. Patterson. "Chapter 4 & Appendix G," Computer Architecture: A Quantitative Approach.

L07 - Kernel Computation - Memory
Papers / Other Resources:
- J. L. Hennessy and D. A. Patterson. "Chapter 2," Computer Architecture: A Quantitative Approach.
- M. Horowitz, "1.1 Computing's energy problem (and what we can do about it)," IEEE International Solid-State Circuits Conference 2014.

L08 - Storage Technology and Transforms
Book Chapter: Ch 4
Papers / Other Resources:
- A. Lavin and S. Gray. "Fast Algorithms for Convolutional Neural Networks." arXiv 2015. (A 1-D Winograd sketch follows this entry.)

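Lavin and Gray's fast algorithms replace part of the convolution arithmetic with Winograd minimal-filtering transforms. Below is a minimal 1-D sketch of F(2,3) using the standard transform matrices from that family; the input and filter values are arbitrary test data.

```python
# Winograd F(2,3) in 1-D: two outputs of a 3-tap correlation are computed with
# 4 elementwise multiplies instead of 6. Transform matrices are the standard
# F(2,3) matrices; the data below is arbitrary.
import numpy as np

BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

d = np.array([1.0, 2.0, 3.0, 4.0])   # 4-element input tile (arbitrary)
g = np.array([0.5, 1.0, -1.0])       # 3-tap filter (arbitrary)

winograd = AT @ ((G @ g) * (BT @ d))                  # 4 elementwise multiplies
direct = np.array([d[i:i + 3] @ g for i in range(2)])  # 6 multiplies
assert np.allclose(winograd, direct)
print(winograd)
```
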
L09 - GPUs

L10 - Accelerator Architecture
Book Chapter: Ch 5

L11 - Dataflows 1
Book Chapter: Ch 5

L12 - Dataflows 2
Book Chapter: Ch 5 & 6
Papers / Other Resources:
- N. P. Jouppi et al., "In-datacenter performance analysis of a tensor processing unit," ISCA 2017.
- Y.-H. Chen, J. Emer, V. Sze, "Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks," ISCA 2016.
- Y.-H. Chen, T. Krishna, J. Emer, V. Sze, "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks," JSSC 2017.

L13 - Convolutional Mappings
Book Chapter: Ch 5 & 6
Papers / Other Resources:
- M. Pellauer, Y. S. Shao, J. Clemons, N. Crago, K. Hegde, R. Venkatesan, S. W. Keckler, C. W. Fletcher, and J. Emer. "Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration." ASPLOS 2019.

L14 - Numeric Precision
Book Chapter: Ch 7
Papers / Other Resources:
- V. Camus et al. "Review and Benchmarking of Precision-Scalable Multiply-Accumulate Unit Architectures for Embedded Neural-Network Processing." IEEE Journal on Emerging and Selected Topics in Circuits and Systems, October 2019. (A behavioral int8 MAC sketch follows this entry.)

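The Camus et al. survey benchmarks multiply-accumulate (MAC) units that operate at reduced precision. As a purely behavioral sketch, not a model of any architecture in that survey, an 8-bit MAC with a wide accumulator can be expressed as:

```python
# Behavioral 8-bit multiply-accumulate with a 32-bit accumulator: products of
# int8 operands are exact in int32, so only the accumulation can saturate.
# Illustrative only; not a model of any unit benchmarked by Camus et al.
import numpy as np

def mac_int8(acc: np.int32, a: np.int8, b: np.int8) -> np.int32:
    product = np.int32(a) * np.int32(b)   # widen before multiplying
    total = np.int64(acc) + np.int64(product)
    return np.int32(np.clip(total, np.iinfo(np.int32).min, np.iinfo(np.int32).max))

acc = np.int32(0)
for a, b in [(-128, 127), (100, 100), (-50, 3)]:   # arbitrary int8 operand pairs
    acc = mac_int8(acc, np.int8(a), np.int8(b))
print(acc)   # -16256 + 10000 - 150 = -6406
```
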
L15 - Advanced Technology
Book Chapter: Ch 10
Papers / Other Resources:
- Y. Chen et al., "DaDianNao: A Machine-Learning Supercomputer," MICRO 2014.
- D. Kim, J. Kung, S. Chai, S. Yalamanchili and S. Mukhopadhyay, "Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory," ISCA 2016.
- M. Gao, J. Pu, X. Yang, M. Horowitz, C. Kozyrakis, "TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory," ASPLOS 2017.

L16 - Sparsity
Book Chapter: Ch 8.1
Papers / Other Resources:
- D. Blalock, J. J. Gonzalez-Ortiz, J. Frankle, J. Guttag. "What is the State of Neural Network Pruning?" MLSys 2020. (A magnitude-pruning sketch follows this entry.)

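Unstructured magnitude pruning is the common baseline across the work surveyed by Blalock et al. Below is a minimal sketch of that baseline; the weight tensor is random toy data, and the 75% sparsity target is an arbitrary choice.

```python
# Unstructured magnitude pruning: zero out the fraction of weights with the
# smallest absolute value. Toy data only; sparsity target chosen arbitrarily.
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a copy of `weights` with the smallest-|w| fraction set to zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8))
pruned = magnitude_prune(w, sparsity=0.75)
print((pruned == 0).mean())   # ~0.75 of entries are now zero
```
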
L17 - Sparse Architectures - 1
Book Chapter: Ch 8.2 & 8.3
Papers / Other Resources:
- A. Parashar et al., "SCNN: An accelerator for compressed-sparse convolutional neural networks," ISCA 2017.
- Y.-H. Chen, T.-J. Yang, J. Emer, V. Sze, "Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices," JETCAS 2019.

L18 - Sparse Architectures - 2
Book Chapter: Ch 8.2 & 8.3
Papers / Other Resources:
- J. Albericio, P. Judd, T. Hetherington, T. Aamodt, N. E. Jerger and A. Moshovos, "Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing," ISCA 2016.