6.5930/1 Hardware Architecture for Deep Learning - Spring 2024


6.5930/1 Spring 2024 Reading List
Optional readings are NOT required for the class, but interested students are encouraged to check them out for a more in-depth understanding of the topics.

Readings refer to:

  • the book Efficient Processing of Deep Neural Networks, by Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, and Joel S. Emer
The book and its online/e-book version are available through MIT Libraries.


Topic Book Chapter Papers / Other Resources
L02 - DNN Components Ch 1 & 2
L03 - Popular Models Ch 2 & 9
  • Works cited in lecture (increase accuracy):
    • LeNet: LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proc. IEEE 1998.
    • AlexNet: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." NeurIPS 2012.
    • VGGNet: Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." ICLR 2015.
    • Network in Network: Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in Network." ICLR 2014.
    • GoogLeNet: Szegedy, Christian, et al. "Going deeper with convolutions." CVPR 2015.
    • ResNet: He, Kaiming, et al. "Deep residual learning for image recognition." CVPR 2016.
    • DenseNet: Huang, Gao, et al. "Densely connected convolutional networks." CVPR 2017.
    • Wide ResNet: Zagoruyko, Sergey, and Nikos Komodakis. "Wide residual networks." BMVC 2016.
    • ResNeXt: Xie, Saining, et al. "Aggregated residual transformations for deep neural networks." CVPR 2017.
    • SENets: Hu, Jie, et al. "Squeeze-and-Excitation Networks." CVPR 2018.
    • NFNet: Brock, Andrew, et al. "High-Performance Large-Scale Image Recognition Without Normalization." arXiv 2021.
  • Works cited in lecture (increase efficiency):
    • InceptionV3: Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." CVPR 2016.
    • SqueezeNet: Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level Accuracy with 50x Fewer Parameters and < 0.5 MB Model Size." ICLR 2017.
    • Xception: Chollet, François. "Xception: Deep Learning with Depthwise Separable Convolutions." CVPR 2017.
    • MobileNet: Howard, Andrew G., et al. "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications." arXiv 2017.
    • MobileNetv2: Sandler, Mark et al. "MobileNetV2: Inverted Residuals and Linear Bottlenecks." CVPR 2018.
    • MobileNetv3: Howard, Andrew et al. "Searching for MobileNetV3." ICCV 2019.
    • ShuffleNet: Zhang, Xiangyu, et al. "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices." CVPR 2018.
    • Learning Network Architecture: Zoph, Barret, et al. "Learning Transferable Architectures for Scalable Image Recognition." CVPR 2018.
  • Works cited in lecture (increase accuracy and efficiency):
    • EfficientNet: Tan, Mingxing, and Quoc V. Le. "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks." ICML 2019.
L04 - Evaluation and Training Ch 2 & 3
L05 - Kernel Computation - CPU
  • J. L. Hennessy and D. A. Patterson. "Chapter 3 & Appendix C," Computer Architecture: A Quantitative Approach.
L06 - Kernel Computation - Vectorization
  • J. L. Hennessy and D. A. Patterson. "Chapter 4 & Appendix G," Computer Architecture: A Quantitative Approach.
L07 - Kernel Computation - Memory
  • J. L. Hennessy and D. A. Patterson. "Chapter 2," Computer Architecture: A Quantitative Approach.
  • M. Horowitz, "1.1 Computing's energy problem (and what we can do about it)," IEEE International Solid-State Circuits Conference 2014.
L08 - Storage Technology and Transforms Ch 4
  • A. Lavin and S. Gray. "Fast Algorithms for Convolutional Neural Networks." arXiv 2015.
L09 - GPUs
L10 - Accelerator Architecture Ch 5
L11 - Dataflows 1 Ch 5
L12 - Dataflows 2 Ch 5 & 6
  • N. P. Jouppi et al., "In-datacenter performance analysis of a tensor processing unit," ISCA 2017.
  • Y.-H. Chen, J. Emer, V. Sze, "Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks," ISCA 2016.
  • Y.-H. Chen, T. Krishna, J. Emer, V. Sze, "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks," JSSC 2017.
L13 - Convolutional Mappings Ch 5 & 6
  • M. Pellauer, Y.S. Shao, J. Clemons, N. Crago, K. Hegde, R. Venkatesan, S.W. Keckler, C.W. Fletcher, and J. Emer. "Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration." ASPLOS 2019.
L14 - Numeric Precision Ch 7
  • V. Camus et al. "Review and Benchmarking of Precision-Scalable Multiply-Accumulate Unit Architectures for Embedded Neural-Network Processing." IEEE Journal on Emerging and Selected Topics in Circuits and Systems. October 2019.
L15 - Advanced Technology Ch 10
  • Y. Chen et al., "DaDianNao: A Machine-Learning Supercomputer," MICRO 2014.
  • D. Kim, J. Kung, S. Chai, S. Yalamanchili and S. Mukhopadhyay, "Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory," ISCA 2016.
  • M. Gao, J. Pu, X. Yang, M. Horowitz, C. Kozyrakis, "TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory," ASPLOS 2017.
L16 - Sparsity Ch 8.1
  • D. Blalock, J. J. Gonzalez Ortiz, J. Frankle, J. Guttag. "What is the State of Neural Network Pruning?" MLSys 2020.
L17 - Sparse Architectures - 1 Ch 8.2 & 8.3
  • A. Parashar et al., "SCNN: An accelerator for compressed-sparse convolutional neural networks," ISCA 2017.
  • Y.-H. Chen, T.-J Yang, J. Emer, V. Sze, "Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices," JETCAS 2019.
L18 - Sparse Architectures - 2 Ch 8.2 & 8.3
  • J. Albericio, P. Judd, T. Hetherington, T. Aamodt, N. E. Jerger and A. Moshovos, "Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing," ISCA 2016.