
short-paper

Multimodality in Media Retrieval

Author:

Maria Eirini Pegia

ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval

Pages 1219 - 1223

https://doi.org/10.1145/3652583.3657583

Published: 07 June 2024

Abstract

Retrieving relevant media for a given query is a well-studied problem with many applications. Modern publicly available media collections provide multiple modalities of the same objects, which can be exploited to improve search. Our research investigates how to improve media retrieval by effectively representing and querying multimodal data. For the ranking stage of the retrieval methods, we study efficiency through techniques such as approximate nearest neighbor (ANN) indexing and high-performance computing (HPC). Our proposed method, MuseHash, targets single media object retrieval; applied to images and 3D objects, it outperforms existing methods on diverse datasets. Moreover, ANN indexing and HPC significantly reduce its execution time. Future work will extend this multimodal approach to the video retrieval domain.
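To make the contrast between exhaustive ranking and ANN indexing concrete, here is a minimal, hypothetical sketch of hash-based retrieval over fused multimodal features. It is not MuseHash, which learns its hash functions with supervision; this sketch swaps in unsupervised random-hyperplane LSH as a stand-in, and it assumes simple early fusion by concatenating one feature vector per modality. All names (`fuse`, `RandomHyperplaneHasher`, `HashTableIndex`, `exhaustive_search`) are illustrative assumptions, not an API from the paper.

```python
import random
from collections import defaultdict
from itertools import combinations


def fuse(modalities):
    """Early fusion (an assumption): concatenate one feature vector per modality."""
    fused = []
    for vec in modalities:
        fused.extend(vec)
    return fused


class RandomHyperplaneHasher:
    """Unsupervised sign-random-projection LSH; a stand-in for a learned hash function."""

    def __init__(self, dim, n_bits, seed=0):
        rng = random.Random(seed)
        # One random Gaussian hyperplane per output bit.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def encode(self, x):
        if len(x) != len(self.planes[0]):
            raise ValueError("feature dimension does not match the hasher")
        code = 0
        for bit, plane in enumerate(self.planes):
            # Set the bit when x lies on the non-negative side of the hyperplane.
            if sum(p * v for p, v in zip(plane, x)) >= 0:
                code |= 1 << bit
        return code


def hamming(a, b):
    """Number of differing bits between two integer-encoded binary codes."""
    return bin(a ^ b).count("1")


def exhaustive_search(query_code, db_codes, k):
    """Exact ranking: Hamming distance to every item, ties broken by item index."""
    order = sorted(range(len(db_codes)), key=lambda i: (hamming(query_code, db_codes[i]), i))
    return order[:k]


class HashTableIndex:
    """Approximate search: only items in buckets within a Hamming radius are ranked."""

    def __init__(self, codes, n_bits):
        self.codes = codes
        self.n_bits = n_bits
        self.buckets = defaultdict(list)
        for i, code in enumerate(codes):
            self.buckets[code].append(i)

    def search(self, query_code, k, radius=1):
        candidates = []
        for r in range(radius + 1):
            # Each subset of r bit positions yields a distinct probe code at distance r.
            for bits in combinations(range(self.n_bits), r):
                probe = query_code
                for b in bits:
                    probe ^= 1 << b
                candidates.extend(self.buckets.get(probe, []))
        candidates.sort(key=lambda i: (hamming(query_code, self.codes[i]), i))
        return candidates[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical toy collection: a 4-d "image" vector and a 3-d "text" vector per item.
    items = [
        fuse([[rng.gauss(0, 1) for _ in range(4)], [rng.gauss(0, 1) for _ in range(3)]])
        for _ in range(200)
    ]
    hasher = RandomHyperplaneHasher(dim=7, n_bits=16, seed=1)
    codes = [hasher.encode(x) for x in items]
    index = HashTableIndex(codes, n_bits=16)
    print("exact :", exhaustive_search(codes[0], codes, k=5))
    print("approx:", index.search(codes[0], k=5, radius=1))
```

With `radius=1` the index ranks only items whose codes differ from the query in at most one bit, so it can miss neighbours that `exhaustive_search` finds; raising `radius` trades query time for recall, and `radius=n_bits` reduces to exhaustive ranking.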



Recommendations

Context-aware media retrieval
CIVR'06: Proceedings of the 5th international conference on Image and Video Retrieval

In this paper we propose a representation framework for dynamic multi-sensory knowledge and user context, and its application in media retrieval. We provide a definition of context, the relationship between context and knowledge and the importance of ...

What fresh media are you looking for?: retrieving media items from multiple social networks
SAM '12: Proceedings of the 2012 international workshop on Socially-aware multimedia

Social networks play an increasingly important role for sharing media items related to daily life moments or for the live coverage of events. One of the problems is that media are spread over multiple social networks. In this paper, we propose a social ...

Speech recognition tools in a media retrieval system
AIEMPro '11: Proceedings of the 2011 ACM international workshop on Automated media analysis and production for novel TV services

Broadcast video retrieval is a key issue for media researchers looking for suitable media material in archives. Current media retrieval applications in use at VRT have proven to be a suboptimal solution. In this paper, we explain a novel search ...


Published In

ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval

May 2024

1379 pages

ISBN: 9798400706196

DOI: 10.1145/3652583

General Chairs:
Cathal Gurrin, Dublin City University, Ireland
Rachada Kongkachandra, Thammasat University, Thailand
Klaus Schoeffmann, Klagenfurt University, Austria

Program Chairs:
Duc-Tien Dang-Nguyen, University of Bergen, Norway
Luca Rossetto, University of Zurich, Switzerland
Shin'ichi Satoh, National Institute of Informatics, Japan
Liting Zhou, Dublin City University, Ireland

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States




Funding Sources

European Commission

Conference

ICMR '24: International Conference on Multimedia Retrieval

June 10-14, 2024

Phuket, Thailand

Acceptance Rates

Overall Acceptance Rate 254 of 830 submissions, 31%


Article Metrics

Total Citations: 0
Total Downloads: 25
Downloads (Last 12 months): 25
Downloads (Last 6 weeks): 9

Reflects downloads up to 16 Aug 2024
