Discover a range of important and new studies in the areas of Computer Vision and Machine Learning. This page features a mix of classic and recent papers, regularly updated with fresh insights.

This collection is a resource for practitioners seeking in-depth knowledge. For those who want to apply these ideas in practice, our API simplifies the process: by integrating it, you can implement Machine Learning models in your Computer Vision projects. Whether you’re delving into the theoretical underpinnings or eager to apply these concepts in real-world applications, our resources and tools are designed to support your journey in Computer Vision.

Best papers in Computer Vision

Must-know papers in Computer Vision

| Paper | Date | Key concept |
| --- | --- | --- |
| LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P., 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), pp.2278-2324. | December 1998 | Applied convolutional neural networks trained with backpropagation to document recognition (LeNet-5) |
| Krizhevsky, A., Sutskever, I. and Hinton, G.E., 2012. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25. | September 2012 | Popularized ReLU activations and dropout in deep CNNs (AlexNet) |
| Simonyan, K. and Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. | September 2014 | Used deep stacks of small 3×3 filters to learn complex features (VGG) |
| Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V. and Rabinovich, A., 2015. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1-9). | September 2014 | Introduced Inception modules: multiple parallel convolutional layers designed to recognize features at multiple scales |
| Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. and Wojna, Z., 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2818-2826). | December 2015 | Design optimizations of the Inception modules that improved performance and accuracy |
| He, K., Zhang, X., Ren, S. and Sun, J., 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778). | December 2015 | Introduced residual connections, which are shortcuts that bypass one or more layers in the network |
| Szegedy, C., Ioffe, S., Vanhoucke, V. and Alemi, A., 2017, February. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the AAAI conference on artificial intelligence (Vol. 31, No. 1). | February 2016 | Hybrid approach combining Inception and ResNet |
| Huang, G., Liu, Z., Van Der Maaten, L. and Weinberger, K.Q., 2017. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4700-4708). | August 2016 | Each layer receives input from all preceding layers, creating dense connections that allow the network to learn more diverse features |
| Chollet, F., 2017. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1251-1258). | October 2016 | Based on InceptionV3 but uses depthwise separable convolutions instead of Inception modules |
| Xie, S., Girshick, R., Dollár, P., Tu, Z. and He, K., 2017. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1492-1500). | November 2016 | Built on ResNet; its blocks use grouped convolutions, where the filters in a convolutional layer are divided into multiple groups |
| Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M. and Adam, H., 2017. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861. | April 2017 | Uses depthwise separable convolutions to reduce the number of parameters and computation required |
| Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. and Chen, L.C., 2018. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4510-4520). | January 2018 | Built upon the MobileNetV1 architecture; uses inverted residuals and linear bottlenecks |
| Howard, A., Sandler, M., Chu, G., Chen, L.C., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., Vasudevan, V. and Le, Q.V., 2019. Searching for mobilenetv3. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 1314-1324). | May 2019 | Uses AutoML to search for the best neural network architecture for a given problem |
| Tan, M. and Le, Q., 2019, May. Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning (pp. 6105-6114). PMLR. | May 2019 | Uses a compound scaling method to scale the network’s depth, width, and resolution together, achieving high accuracy at relatively low computational cost |
| Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S. and Uszkoreit, J., 2020. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929. | October 2020 | Images are split into patches, which are treated as tokens; a sequence of linear embeddings of these patches is input to a Transformer |
| Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S. and Guo, B., 2021. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 10012-10022). | March 2021 | A hierarchical vision transformer that uses shifted windows to address the challenges of adapting the Transformer model to Computer Vision |
| Mehta, S. and Rastegari, M., 2021. Mobilevit: Light-weight, general-purpose, and mobile-friendly vision transformer. arXiv preprint arXiv:2110.02178. | October 2021 | A lightweight vision transformer designed for mobile devices, combining the strengths of CNNs and ViTs |
| Trockman, A. and Kolter, J.Z., 2022. Patches are all you need?. arXiv preprint arXiv:2201.09792. | January 2022 | Processes image patches using standard convolutions for mixing spatial and channel dimensions |
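
Several entries above (Xception, ResNeXt, MobileNets) rely on grouped or depthwise separable convolutions to cut parameter counts. The following is a minimal sketch of the arithmetic behind that claim, not code from any of the papers. The function names and the 128-channel, 3×3 layer are illustrative assumptions, and bias terms are ignored.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k convolution with the given number of groups (no bias)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output filter only sees c_in / groups input channels.
    return (c_in // groups) * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = conv_params(c_in, c_in, k, groups=c_in)  # one k x k filter per channel
    pointwise = conv_params(c_in, c_out, 1)              # mixes channels only
    return depthwise + pointwise


# Hypothetical layer: 128 input channels, 128 output channels, 3 x 3 kernel.
standard = conv_params(128, 128, 3)
grouped = conv_params(128, 128, 3, groups=32)
separable = depthwise_separable_params(128, 128, 3)

print(standard, grouped, separable)  # 147456 4608 17536
```

For this layer, the depthwise separable version needs roughly 8 times fewer weights than the standard convolution, and the 32-group version needs 32 times fewer.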
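
The Vision Transformer entry above describes splitting an image into patches that are treated as tokens. Here is a small pure-Python sketch of that patch-splitting step only. The helper name `image_to_patches` and the toy 32×32 image are assumptions made for illustration; the learned linear projection and position embeddings that follow in the actual model are omitted.

```python
def image_to_patches(image, patch):
    """Split an H x W x C nested-list image into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            token = []
            for row in image[top:top + patch]:
                for pixel in row[left:left + patch]:
                    token.extend(pixel)  # flatten the channel values of each pixel
            tokens.append(token)
    return tokens


# Toy 32 x 32 "RGB" image whose pixel values record their own (row, column).
image = [[[r, c, 0] for c in range(32)] for r in range(32)]
tokens = image_to_patches(image, 16)

print(len(tokens), len(tokens[0]))  # 4 768
```

With the 16×16 patch size from the paper's title, a 224×224 RGB image would give (224 / 16)² = 196 tokens of 16 × 16 × 3 = 768 values each.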
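
The EfficientNet entry above mentions compound scaling of depth, width, and resolution. As a rough sketch, the rule scales all three with a single coefficient φ: depth by α^φ, width by β^φ, and resolution by γ^φ. The constants below are the values reported in the paper (α = 1.2, β = 1.1, γ = 1.15). The function names and the base values are illustrative assumptions, and published model variants may use adjusted values rather than these idealized ones.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases from the paper


def compound_scale(phi, base_depth=1.0, base_width=1.0, base_resolution=224):
    """Return (depth, width, resolution) multipliers/values for coefficient phi."""
    depth = base_depth * ALPHA ** phi
    width = base_width * BETA ** phi
    resolution = base_resolution * GAMMA ** phi
    return depth, width, resolution


def flops_multiplier(phi):
    """Approximate FLOPs growth: linear in depth, quadratic in width and resolution."""
    return (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi


for phi in range(4):
    depth, width, resolution = compound_scale(phi)
    print(phi, round(depth, 3), round(width, 3), round(resolution, 1),
          round(flops_multiplier(phi), 2))
```

The constants are chosen so that α · β² · γ² is close to 2, which means each increment of φ roughly doubles the compute budget.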
