short-paper

A Preference Judgment Tool for Authoritative Assessment

Authors: Mahsa Seifikar, Linh Nhi Phan Minh, Negar Arabzadeh, Charles L. A. Clarke, and Mark D. Smucker

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2023

Pages 3100 - 3104

https://doi.org/10.1145/3539618.3591801

Published: 18 July 2023

Abstract

Preference judgments have been established as an effective method for offline evaluation of information retrieval systems, with advantages over graded or binary relevance judgments. Graded judgments assign each document a pre-defined grade level, while preference judgments involve assessing a pair of items presented side by side and indicating which is better. However, preference judgments may require a larger number of judgments, and the evaluation measures that can be applied to them are limited. In this study, we present a new preference judgment tool called JUDGO, designed for expert assessors and researchers. The tool is supported by a new heap-like preference judgment algorithm that assumes transitivity and allows for ties. An earlier version of the tool was employed by NIST to identify up to the 10 best items for each of the 38 topics of the TREC 2022 Health Misinformation track, with over 2,200 judgments collected. The current version has been applied in a separate research study to collect almost 10,000 judgments, with multiple assessors completing each topic. The code and resources are available at https://judgo-system.github.io.
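
The abstract does not spell out the heap-like algorithm. As a rough illustration only, the sketch below shows one way pairwise preference judgments with ties could drive a heap-based top-k selection under a transitivity assumption. It uses Python's standard `heapq` module rather than JUDGO's own data structure, and the function name `top_k_by_preference`, the `judge` callback, and the toy scores are hypothetical, not taken from the tool.

```python
import heapq
from typing import Callable, Dict, Hashable, List, Tuple


def top_k_by_preference(
    items: List[Hashable],
    judge: Callable[[Hashable, Hashable], int],
    k: int,
) -> Tuple[List[Hashable], int]:
    """Return up to k most-preferred items and the number of pairs judged.

    judge(a, b) is assumed to return a positive number if a is preferred,
    a negative number if b is preferred, and 0 for a tie. Preferences are
    assumed to be transitive, so a heap ordering is meaningful.
    """
    cache: Dict[Tuple[Hashable, Hashable], int] = {}

    def compare(a: Hashable, b: Hashable) -> int:
        # Ask the assessor only once per pair; reuse the answer afterwards.
        if (a, b) not in cache:
            result = judge(a, b)
            cache[(a, b)] = result
            cache[(b, a)] = -result
        return cache[(a, b)]

    class Entry:
        """Wrapper so heapq's '<' comparison is answered by a judgment."""

        __slots__ = ("item",)

        def __init__(self, item: Hashable) -> None:
            self.item = item

        def __lt__(self, other: "Entry") -> bool:
            # "Smaller" means "preferred", so the heap root is the best item.
            # A tie returns False in both directions.
            return compare(self.item, other.item) > 0

    heap = [Entry(x) for x in items]
    heapq.heapify(heap)
    best = [heapq.heappop(heap).item for _ in range(min(k, len(heap)))]
    return best, len(cache) // 2


# Toy stand-in for a human assessor: hidden scores, with d1 and d3 tied.
hidden_score = {"d1": 3, "d2": 1, "d3": 3, "d4": 2, "d5": 0}


def simulated_judge(a: str, b: str) -> int:
    return (hidden_score[a] > hidden_score[b]) - (hidden_score[a] < hidden_score[b])


best, judgments = top_k_by_preference(list(hidden_score), simulated_judge, k=3)
print(best, judgments)
```

Because each pair is cached, the assessor is never asked about the same pair twice, and selecting only the top k avoids judging every possible pair when the pool is large.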

Supplementary Material

MP4 File (SIGIR23-dep3093.mp4)

This video introduces Judgo, an open-source preference judgment tool designed for expert assessors. Preference judgments offer a valuable approach for evaluating information retrieval systems, as they involve comparing pairs of items to determine preferences, rather than assigning fixed grades like traditional relevance judgments. In the video, we demonstrate the distinctions between various types of relevance assessments. We provide a comprehensive demo of the user interface, showcasing the different features and functionalities of Judgo. Additionally, we elaborate on the algorithm powering the tool, which employs a tournament-style approach based on a heap-like data structure. By the end of the video, viewers will have a clear understanding of Judgo's purpose, how preference judgments differ from other assessment methods, and the capabilities and advantages of the tool's user interface and underlying algorithm.


