Uğur Güdükbay's Publications

Learning Visual Similarity for Image Retrieval with Global Descriptors and Capsule Networks

Duygu Durmuş, Uğur Güdükbay, and Özgür Ulusoy. Learning Visual Similarity for Image Retrieval with Global Descriptors and Capsule Networks. Multimedia Tools and Applications, 83(7):20243–20263, February 2024.

Abstract

Finding matching images across large and unstructured datasets is vital in many computer vision applications. With the emergence of deep learning-based solutions, various visual tasks, such as image retrieval, have been successfully addressed. Learning visual similarity is crucial for image matching and retrieval tasks. Capsule Networks enable learning richer information that describes the object without losing the essential spatial relationship between the object and its parts. Besides, global descriptors are widely used for representing images. We propose a framework that combines the power of global descriptors and Capsule Networks by benefiting from the information of multiple views of images to enhance the image retrieval performance. The Spatial Grouping Enhance strategy, which enhances sub-features parallelly, and self-attention layers, which explore global dependencies within internal representations of images, are utilized to empower the image representations. The approach captures resemblances between similar images and differences between non-similar images using triplet loss and cost-sensitive regularized cross-entropy loss. The results are superior to the state-of-the-art approaches for the Stanford Online Products Database with Recall@K of 85.0, 94.4, 97.8, and 99.3, where K is 1, 10, 100, and 1000, respectively.

BibTeX

@Article{DurmusUO24,
  author    = {Duygu Durmu{\c s} and
               U{\u g}ur G{\"u}d{\"u}kbay and
               {\"O}zg{\"u}r Ulusoy},
  title     = {{Learning Visual Similarity for Image Retrieval with Global Descriptors and Capsule Networks}},
  journal   = {Multimedia Tools and Applications},
  volume    = {83},
  number    = {7},
  year      = {2024},
  month     = {February},
  pages     = {20243--20263},
  keywords  = {Deep learning, Neural networks, Capsule networks, Global descriptors, Image retrieval, 
               Triplet loss, Cost-sensitive regularized cross-entropy loss},
  abstract  = {Finding matching images across large and unstructured datasets is vital in many 
               computer vision applications. With the emergence of deep learning-based solutions,
               various visual tasks, such as image retrieval, have been successfully addressed. 
               Learning visual similarity is crucial for image matching and retrieval tasks. 
               Capsule Networks enable learning richer information that describes the object 
               without losing the essential spatial relationship between the object and its parts. 
               Besides, global descriptors are widely used for representing images. We propose 
               a framework that combines the power of global descriptors and Capsule Networks 
               by benefiting from the information of multiple views of images to enhance the 
               image retrieval performance. The Spatial Grouping Enhance strategy, which enhances
               sub-features parallelly, and self-attention layers, which explore global dependencies
               within internal representations of images, are utilized to empower the image representations. 
               The approach captures resemblances between similar images and differences between 
               non-similar images using triplet loss and cost-sensitive regularized cross-entropy loss. 
               The results are superior to the state-of-the-art approaches for the Stanford Online 
               Products Database with Recall@K of 85.0, 94.4, 97.8, and 99.3, where K is 1, 10, 100, 
               and 1000, respectively.},
  ee        = {https://link.springer.com/article/10.1007/s11042-023-16164-5},
  bib2html_dl_pdf = "http://www.cs.bilkent.edu.tr/~gudukbay/publications/papers/journal_articles/Durmus_Et_Al_MTAP_2024.pdf",
  bib2html_pubtype = {Refereed Journal Articles},
  bib2html_rescat = {Multimedia Databases},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}

Generated by bib2html.pl (written by Patrick Riley) on Tue Jun 10, 2025 11:27:24