Discover a range of important and new studies in the areas of Computer Vision and Machine Learning. This page features a mix of classic and recent papers, regularly updated with fresh insights.
This collection is a valuable resource for practitioners seeking in-depth knowledge. However, for those looking to practically apply these insights, our API simplifies the process. By integrating our user-friendly API, you can effortlessly implement Machine Learning models in your Computer Vision projects. Whether you’re delving into the theoretical underpinnings or eager to apply these concepts in real-world applications, our resources and tools are designed to support your journey in Computer Vision.
Must-know papers in Computer Vision
Paper | Date | Key concept |
---|---|---|
LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P., 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), pp.2278-2324. | December 1998 | Introduced Convolutions |
Krizhevsky, A., Sutskever, I. and Hinton, G.E., 2012. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25. | September 2012 | Introduced ReLU activation and Dropout to CNNs |
Simonyan, K. and Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. | September 2014 | Used large number of filters of small size in each layer to learn complex features |
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V. and Rabinovich, A., 2015. Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1-9). | September 2014 | Introduced Inception Modules consisting of multiple parallel convolutional layers, designed to recognize different features at multiple scales |
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. and Wojna, Z., 2016. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2818-2826). | December 2015 | Design Optimizations of the Inception Modules which improved performance and accuracy |
He, K., Zhang, X., Ren, S. and Sun, J., 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778). | December 2015 | Introduced residual connections, which are shortcuts that bypass one or more layers in the network |
Szegedy, C., Ioffe, S., Vanhoucke, V. and Alemi, A., 2017, February. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the AAAI conference on artificial intelligence (Vol. 31, No. 1). | February 2016 | Hybrid approach combining Inception Net and ResNet |
Huang, G., Liu, Z., Van Der Maaten, L. and Weinberger, K.Q., 2017. Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4700-4708). | August 2016 | Each layer receives input from all the previous layers, creating a dense network of connections between the layers, allowing the network to learn more diverse features |
Chollet, F., 2017. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1251-1258). | October 2016 | Based on InceptionV3 but uses depthwise separable convolutions instead of Inception modules |
Xie, S., Girshick, R., Dollár, P., Tu, Z. and He, K., 2017. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1492-1500). | November 2016 | Built over ResNet, introduces the concept of grouped convolutions, where the filters in a convolutional layer are divided into multiple groups |
Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M. and Adam, H., 2017. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861. | April 2017 | Uses depthwise separable convolutions to reduce the number of parameters and computation required |
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. and Chen, L.C., 2018. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4510-4520). | January 2018 | Built upon the MobileNetv1 architecture, uses inverted residuals and linear bottlenecks |
Howard, A., Sandler, M., Chu, G., Chen, L.C., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., Vasudevan, V. and Le, Q.V., 2019. Searching for mobilenetv3. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 1314-1324). | May 2019 | Uses AutoML to find the best possible neural network architecture for a given problem |
Tan, M. and Le, Q., 2019, May. Efficientnet: Rethinking model scaling for convolutional neural networks. In International conference on machine learning (pp. 6105-6114). PMLR. | May 2019 | Uses a compound scaling method to scale the network’s depth, width, and resolution to achieve a high accuracy with a relatively low computational cost |
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S. and Uszkoreit, J., 2020. An image is worth 16×16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929. | October 2020 | Images are segmented into patches, which are treated as tokens and a sequence of linear embeddings of these patches are input to a Transformer |
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S. and Guo, B., 2021. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 10012-10022). | March 2021 | A hierarchical vision transformer that uses shifted windows to address the challenges of adapting the transformer model to Computer Vision |
Mehta, S. and Rastegari, M., 2021. Mobilevit: Light-weight, general-purpose, and mobile-friendly vision transformer. arXiv preprint arXiv:2110.02178. | October 2021 | A lightweight vision transformer designed for mobile devices, effectively combining the strengths of CNNs and ViTs |
Trockman, A. and Kolter, J.Z., 2022. Patches are all you need?. arXiv preprint arXiv:2201.09792. | January 2022 | Processes image patches using standard convolutions for mixing spatial and channel dimensions |
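The residual connections introduced by He et al. (ResNet, row above) boil down to adding a layer's input back to its output, so the layer only has to learn a residual. A minimal sketch in plain Python, using a hypothetical element-wise `double` transform as a stand-in for a real conv-BN-ReLU stack:

```python
def residual_block(x, transform):
    """Apply a transform and add the input back (identity shortcut).

    The shortcut lets gradients flow directly through the addition,
    which is what makes very deep networks trainable in practice.
    """
    return [xi + ti for xi, ti in zip(x, transform(x))]

# Toy transform standing in for the learned layers of a real block.
def double(x):
    return [2 * xi for xi in x]

print(residual_block([1.0, 2.0, 3.0], double))  # [3.0, 6.0, 9.0]
```

If `transform` learned to output all zeros, the block would reduce to the identity function, which is exactly the fallback behavior the ResNet paper relies on.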
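The parameter savings behind the depthwise separable convolutions used by Xception and MobileNets (rows above) can be shown with simple arithmetic. A standard convolution learns one k×k×C_in filter per output channel, while the separable version factorizes this into a per-channel k×k depthwise step plus a 1×1 pointwise step. A sketch, with channel counts chosen purely for illustration:

```python
def standard_conv_params(k, c_in, c_out):
    # One k x k x c_in filter per output channel.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # Depthwise step: one k x k filter per input channel,
    # then a 1x1 pointwise convolution to mix channels.
    return k * k * c_in + c_in * c_out

# Example: 3x3 convolution mapping 256 channels to 256 channels.
std = standard_conv_params(3, 256, 256)        # 589824
sep = depthwise_separable_params(3, 256, 256)  # 67840
print(std, sep, round(std / sep, 1))           # roughly 8.7x fewer parameters
```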
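EfficientNet's compound scaling (row above) ties depth, width, and resolution to a single coefficient φ via base factors α, β, γ found by grid search, constrained so that α·β²·γ² ≈ 2 (FLOPs roughly double per increment of φ). A sketch using the base coefficients reported in the paper for EfficientNet-B0:

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Return (depth, width, resolution) multipliers for coefficient phi.

    alpha, beta, gamma are the EfficientNet-B0 base factors; the
    constraint alpha * beta**2 * gamma**2 ~= 2 keeps FLOPs roughly
    doubling with each unit increase of phi.
    """
    return alpha ** phi, beta ** phi, gamma ** phi

depth, width, res = compound_scale(2)
print(round(depth, 2), round(width, 2), round(res, 2))  # 1.44 1.21 1.32
```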
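The Vision Transformer's "an image is worth 16×16 words" idea (Dosovitskiy et al., row above) starts by splitting the image into non-overlapping patches, each flattened into one token. A minimal sketch on a nested-list "image" (a real ViT then projects each token with a learned linear embedding, which is omitted here):

```python
def image_to_patch_tokens(image, patch):
    """Split an H x W image (nested list) into non-overlapping
    patch x patch tiles, each flattened into one token vector."""
    h, w = len(image), len(image[0])
    tokens = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            tokens.append([image[i + di][j + dj]
                           for di in range(patch)
                           for dj in range(patch)])
    return tokens

# A 224x224 image with 16x16 patches yields (224/16)**2 = 196 tokens,
# each of length 16*16 = 256 (per channel).
img = [[0] * 224 for _ in range(224)]
tokens = image_to_patch_tokens(img, 16)
print(len(tokens), len(tokens[0]))  # 196 256
```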