|
Optional readings are NOT required for the class, but interested students are always encouraged to check them out for an in-depth understanding of the topics.
Book chapter readings refer to:
- Efficient Processing of Deep Neural Networks, by Vivienne Sze, Yu-Hsin Chen, Tien-Ju Yang, and Joel S. Emer
The book and its online/e-book version are available through MIT Libraries.

L02 - DNN Components
Book Chapter: Ch 1 & 2

L03 - Popular Models
Book Chapter: Ch 2 & 9
Papers / Other Resources:
- Works cited in lecture (increase accuracy):
  - LeNet: LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proc. IEEE 1998.
  - AlexNet: Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." NeurIPS 2012.
  - VGGNet: Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." ICLR 2015.
  - Network in Network: Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
  - GoogLeNet: Szegedy, Christian, et al. "Going deeper with convolutions." CVPR 2015.
  - ResNet: He, Kaiming, et al. "Deep residual learning for image recognition." CVPR 2016.
  - DenseNet: Huang, Gao, et al. "Densely connected convolutional networks." CVPR 2017.
  - Wide ResNet: Zagoruyko, Sergey, and Nikos Komodakis. "Wide residual networks." BMVC 2016.
  - ResNeXt: Xie, Saining, et al. "Aggregated residual transformations for deep neural networks." CVPR 2017.
  - SENets: Hu, Jie, et al. "Squeeze-and-Excitation Networks." CVPR 2018.
  - NFNet: Brock, Andrew, et al. "High-Performance Large-Scale Image Recognition Without Normalization." arXiv 2021.
- Works cited in lecture (increase efficiency; a MAC-count sketch of the depthwise-separable idea follows this entry):
  - InceptionV3: Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." CVPR 2016.
  - SqueezeNet: Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level Accuracy with 50x Fewer Parameters and <0.5 MB Model Size." ICLR 2017.
  - Xception: Chollet, François. "Xception: Deep Learning with Depthwise Separable Convolutions." CVPR 2017.
  - MobileNet: Howard, Andrew G., et al. "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications." arXiv 2017.
  - MobileNetV2: Sandler, Mark, et al. "MobileNetV2: Inverted Residuals and Linear Bottlenecks." CVPR 2018.
  - MobileNetV3: Howard, Andrew, et al. "Searching for MobileNetV3." ICCV 2019.
  - ShuffleNet: Zhang, Xiangyu, et al. "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices." CVPR 2018.
  - Learning Network Architecture: Zoph, Barret, et al. "Learning Transferable Architectures for Scalable Image Recognition." CVPR 2018.
- Works cited in lecture (increase accuracy and efficiency):
  - EfficientNet: Tan, Mingxing, et al. "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks." ICML 2019.

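Several of the efficiency-oriented works above (Xception, the MobileNet family, ShuffleNet) factor a standard convolution into a per-channel depthwise step plus a 1x1 pointwise step. Below is a minimal MAC-count sketch of that factorization; the layer shape is an arbitrary assumed example, not a configuration from any cited paper.

```python
# Back-of-the-envelope multiply-accumulate (MAC) counts contrasting a standard
# convolution layer with a depthwise-separable factorization. The layer
# dimensions below are made-up examples, not values from the cited papers.

def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution producing an h x w x c_out output."""
    return h * w * c_out * c_in * k * k

def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs for a k x k depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1x1 convolution mixes channels
    return depthwise + pointwise

if __name__ == "__main__":
    h, w, c_in, c_out, k = 56, 56, 128, 128, 3   # assumed example layer shape
    std = standard_conv_macs(h, w, c_in, c_out, k)
    sep = depthwise_separable_macs(h, w, c_in, c_out, k)
    print(f"standard: {std:,} MACs, separable: {sep:,} MACs, ratio: {std / sep:.1f}x")
```
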
L04 - Evaluation and Training
Book Chapter: Ch 2 & 3

L05 - Kernel Computation - CPU
Papers / Other Resources:
- J. L. Hennessy and D. A. Patterson. "Chapter 3 & Appendix C," Computer Architecture: A Quantitative Approach.

L06 - Kernel Computation - Vectorization
Papers / Other Resources:
- J. L. Hennessy and D. A. Patterson. "Chapter 4 & Appendix G," Computer Architecture: A Quantitative Approach.

L07 - Kernel Computation - Memory
Papers / Other Resources:
- J. L. Hennessy and D. A. Patterson. "Chapter 2," Computer Architecture: A Quantitative Approach.
- M. Horowitz, "1.1 Computing's energy problem (and what we can do about it)," IEEE International Solid-State Circuits Conference 2014.

L08 - Storage Technology and Transforms
Book Chapter: Ch 4
Papers / Other Resources:
- A. Lavin and S. Gray. "Fast Algorithms for Convolutional Neural Networks." arXiv 2015. (A 1-D Winograd sketch follows this entry.)

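Lavin and Gray's fast algorithms replace part of the convolution arithmetic with Winograd minimal-filtering transforms. Below is a minimal 1-D sketch of F(2,3) using the standard transform matrices from that family; the input and filter values are arbitrary test data.

```python
# Winograd F(2,3) in 1-D: two outputs of a 3-tap correlation are computed with
# 4 elementwise multiplies instead of 6. Transform matrices are the standard
# F(2,3) matrices; the data below is arbitrary.
import numpy as np

BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

d = np.array([1.0, 2.0, 3.0, 4.0])   # 4-element input tile (arbitrary)
g = np.array([0.5, 1.0, -1.0])       # 3-tap filter (arbitrary)

winograd = AT @ ((G @ g) * (BT @ d))                  # 4 elementwise multiplies
direct = np.array([d[i:i + 3] @ g for i in range(2)])  # 6 multiplies
assert np.allclose(winograd, direct)
print(winograd)
```
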
L09 - GPUs

L10 - Accelerator Architecture
Book Chapter: Ch 5

L11 - Dataflows 1
Book Chapter: Ch 5

L12 - Dataflows 2
Book Chapter: Ch 5 & 6
Papers / Other Resources:
- N. P. Jouppi et al., "In-datacenter performance analysis of a tensor processing unit," ISCA 2017.
- Y.-H. Chen, J. Emer, V. Sze, "Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks," ISCA 2016.
- Y.-H. Chen, T. Krishna, J. Emer, V. Sze, "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks," JSSC 2017.

L13 - Convolutional Mappings
Book Chapter: Ch 5 & 6
Papers / Other Resources:
- M. Pellauer, Y. S. Shao, J. Clemons, N. Crago, K. Hegde, R. Venkatesan, S. W. Keckler, C. W. Fletcher, and J. Emer. "Buffets: An Efficient and Composable Storage Idiom for Explicit Decoupled Data Orchestration." ASPLOS 2019.

L14 - Numeric Precision
Book Chapter: Ch 7
Papers / Other Resources:
- V. Camus et al. "Review and Benchmarking of Precision-Scalable Multiply-Accumulate Unit Architectures for Embedded Neural-Network Processing." IEEE Journal on Emerging and Selected Topics in Circuits and Systems, October 2019. (A behavioral int8 MAC sketch follows this entry.)

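The Camus et al. survey benchmarks multiply-accumulate (MAC) units that operate at reduced precision. As a purely behavioral sketch, not a model of any architecture in that survey, an 8-bit MAC with a wide accumulator can be expressed as:

```python
# Behavioral 8-bit multiply-accumulate with a 32-bit accumulator: products of
# int8 operands are exact in int32, so only the accumulation can saturate.
# Illustrative only; not a model of any unit benchmarked by Camus et al.
import numpy as np

def mac_int8(acc: np.int32, a: np.int8, b: np.int8) -> np.int32:
    product = np.int32(a) * np.int32(b)   # widen before multiplying
    total = np.int64(acc) + np.int64(product)
    return np.int32(np.clip(total, np.iinfo(np.int32).min, np.iinfo(np.int32).max))

acc = np.int32(0)
for a, b in [(-128, 127), (100, 100), (-50, 3)]:   # arbitrary int8 operand pairs
    acc = mac_int8(acc, np.int8(a), np.int8(b))
print(acc)   # -16256 + 10000 - 150 = -6406
```
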
L15 - Advanced Technology
Book Chapter: Ch 10
Papers / Other Resources:
- Y. Chen et al., "DaDianNao: A Machine-Learning Supercomputer," MICRO 2014.
- D. Kim, J. Kung, S. Chai, S. Yalamanchili and S. Mukhopadhyay, "Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory," ISCA 2016.
- M. Gao, J. Pu, X. Yang, M. Horowitz, C. Kozyrakis, "TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory," ASPLOS 2017.

L16 - Sparsity
Book Chapter: Ch 8.1
Papers / Other Resources:
- D. Blalock, J. J. Gonzalez-Ortiz, J. Frankle, J. Guttag. "What is the State of Neural Network Pruning?" MLSys 2020. (A magnitude-pruning sketch follows this entry.)

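Unstructured magnitude pruning is the common baseline across the work surveyed by Blalock et al. Below is a minimal sketch of that baseline; the weight tensor is random toy data, and the 75% sparsity target is an arbitrary choice.

```python
# Unstructured magnitude pruning: zero out the fraction of weights with the
# smallest absolute value. Toy data only; sparsity target chosen arbitrarily.
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a copy of `weights` with the smallest-|w| fraction set to zero."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8))
pruned = magnitude_prune(w, sparsity=0.75)
print((pruned == 0).mean())   # ~0.75 of entries are now zero
```
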
L17 - Sparse Architectures - 1
Book Chapter: Ch 8.2 & 8.3
Papers / Other Resources:
- A. Parashar et al., "SCNN: An accelerator for compressed-sparse convolutional neural networks," ISCA 2017.
- Y.-H. Chen, T.-J. Yang, J. Emer, V. Sze, "Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices," JETCAS 2019.

L18 - Sparse Architectures - 2
Book Chapter: Ch 8.2 & 8.3
Papers / Other Resources:
- J. Albericio, P. Judd, T. Hetherington, T. Aamodt, N. E. Jerger and A. Moshovos, "Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing," ISCA 2016.