- Stanford's CS231n is a recommended resource for deep learning in computer vision
- Introduction to Computer Vision course on Udacity, taught as part of Georgia Tech's Master's program; assignments are in Octave/MATLAB
- Pattern Recognition and Machine Learning [$], Christopher Bishop, 2006, Springer, 27k citations
- Computer Vision: Algorithms and Applications, Richard Szeliski, 2011, Springer, 3k citations
- Learning OpenCV [$], Gary Bradski, Adrian Kaehler, 2008, O'Reilly
Based on various sources, including Awesome Deep Learning and Adit Deshpande's "The 9 Deep Learning Papers You Need To Know About":
- AlexNet: ImageNet Classification with Deep Convolutional Neural Networks, A. Krizhevsky et al., 2012
- ZFNet: Visualizing and Understanding Convolutional Networks, Matthew D. Zeiler and Rob Fergus, 2013
- VGGNet: Very Deep Convolutional Networks for Large-Scale Image Recognition, Karen Simonyan and Andrew Zisserman, 2015
- GoogLeNet: Going Deeper with Convolutions, Christian Szegedy et al., 2015
- ResNet: Deep Residual Learning for Image Recognition, K. He et al., 2016
- OverFeat: Integrated recognition, localization and detection using convolutional networks, P. Sermanet et al., 2013, 1700+ citations
- Return of the devil in the details: delving deep into convolutional nets, K. Chatfield et al., 2014, 1200+ citations
- Network in Network (popularized 1x1 convolutions), M. Lin et al., 2013, 1000+ citations
- Rich feature hierarchies for accurate object detection and semantic segmentation, R. Girshick et al., 2014
- Fully convolutional networks for semantic segmentation, J. Long et al., 2015
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, S. Ren et al., 2015
- Fast R-CNN, R. Girshick, 2015
- Learning hierarchical features for scene labeling, C. Farabet et al., 2013
- Semantic image segmentation with deep convolutional nets and fully connected CRFs, L. Chen et al., 2015
Other Interesting Papers
- [Spatial pyramid pooling in deep convolutional networks for visual recognition](http://arxiv.org/pdf/1406.4729), K. He et al., 2014
- [You only look once: Unified, real-time object detection](http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf), J. Redmon et al., 2016
- DeepFace: Closing the gap to human-level performance in face verification, Y. Taigman et al., 2014
- Large-scale video classification with convolutional neural networks, A. Karpathy et al., 2014
- Show and tell: A neural image caption generator, O. Vinyals et al., 2015
- Show, attend and tell: Neural image caption generation with visual attention, K. Xu et al., 2015
- Deep visual-semantic alignments for generating image descriptions, A. Karpathy and L. Fei-Fei, 2015
- Long-term recurrent convolutional networks for visual recognition and description, J. Donahue et al., 2015
- 3D convolutional neural networks for human action recognition, S. Ji et al., 2013
- Two-stream convolutional networks for action recognition in videos, K. Simonyan and A. Zisserman, 2014