DOI: 10.1145/3652583.3657583
Short paper

Multimodality in Media Retrieval

Published: 07 June 2024
  Abstract

    Retrieving media relevant to a given query is a well-studied problem with many applications. Modern, publicly available media collections provide diverse modalities of the same objects, which can improve search. Our research aims to improve media retrieval by representing and querying multimodal data effectively. For the ranking step of the retrieval methods, we examine efficiency using techniques such as approximate nearest neighbor (ANN) indexing and high-performance computing (HPC). Our proposed method, MuseHash, targets single-media-object retrieval; applied to images and 3D objects, it outperforms existing methods on diverse datasets. Combined with ANN indexing and HPC, it also significantly reduces execution time. Future work will consider multimodality in the video retrieval domain.
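    The abstract does not spell out how ranking works. As a rough illustration only, the sketch below assumes that every media object, whatever its modalities, has already been encoded as a fixed-length binary hash code, as hashing-based methods such as MuseHash are designed to produce. It contrasts exhaustive ranking by Hamming distance with a simple hash-table lookup that probes only codes within a small Hamming radius of the query. The function names (`hamming`, `exhaustive_rank`, `build_table`, `hamming_ball_rank`), the toy 4-bit codes, and the radius-based lookup are hypothetical choices for this example; this is not the paper's ANN index or HPC pipeline.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def exhaustive_rank(query: int, codes: list[int], k: int) -> list[int]:
    """Exact search: sort every database item by Hamming distance to the query.

    Ties are broken by item id so the ordering is deterministic.
    """
    order = sorted(range(len(codes)), key=lambda i: (hamming(query, codes[i]), i))
    return order[:k]


def build_table(codes: list[int]) -> dict[int, list[int]]:
    """Group item ids by identical code so each bucket can be fetched in O(1)."""
    table: dict[int, list[int]] = defaultdict(list)
    for i, code in enumerate(codes):
        table[code].append(i)
    return table


def hamming_ball_rank(
    query: int, table: dict[int, list[int]], n_bits: int, radius: int, k: int
) -> list[int]:
    """Bounded search (hypothetical example): probe only buckets whose code lies
    within `radius` bit flips of the query, then rank the hits by distance.

    Items farther than `radius` are never examined, so fewer than k results
    may come back; the radius trades recall for speed.
    """
    hits = []
    for r in range(radius + 1):
        # Every distinct set of r flipped bits yields a code at distance exactly r.
        for flipped_bits in combinations(range(n_bits), r):
            probe = query
            for bit in flipped_bits:
                probe ^= 1 << bit
            for item_id in table.get(probe, []):
                hits.append((r, item_id))
    hits.sort()  # order by (distance, id), matching exhaustive_rank's tie-break
    return [item_id for _, item_id in hits[:k]]


# Toy 4-bit codes for five database items (illustrative values only).
codes = [0b1010, 0b1011, 0b0101, 0b1110, 0b0000]
query = 0b1010
print(exhaustive_rank(query, codes, k=3))  # → [0, 1, 3]
print(hamming_ball_rank(query, build_table(codes), n_bits=4, radius=1, k=3))  # → [0, 1, 3]
```

    Exhaustive ranking costs one XOR and bit count per database item, so its cost grows with the collection size. The Hamming-ball lookup instead touches only the buckets within the chosen radius (the sum of C(n_bits, r) for r up to the radius), which is cheap for small radii but can miss neighbors that lie farther away. This is the kind of time/quality tradeoff that ANN indexing makes in general.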



    Published in

    ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval
    May 2024, 1379 pages
    ISBN: 9798400706196
    DOI: 10.1145/3652583
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


    Publisher: Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. media retrieval
    2. multimodality
    3. optimization


    Conference: ICMR '24

    Acceptance Rates

    Overall Acceptance Rate 254 of 830 submissions, 31%


    Article Metrics

    • Total citations: 0
    • Total downloads: 25
    • Downloads (last 12 months): 25
    • Downloads (last 6 weeks): 9
    Reflects downloads up to 16 Aug 2024.
