Efficient Rank Elicitation from Pairwise Comparisons

 
Abstract

Rank aggregation is a widely applicable task in various domains, including voting, gaming, and recommendation systems. It involves combining pairwise or listwise comparisons to generate a unified ranking.

This topic has a long history and is still relevant today: it dates back to democratic voting systems in Ancient Greece; it was modeled on psychology experiments in the last century, and it is the cornerstone of reinforcement learning with human feedback (RLHF) in large language model (LLM) finetuning to produce high quality output aligned with human values. In the era of big data, with an abundance of available data, there is a growing need for efficient and accurate analysis to uncover hidden knowledge.

In the context of the usage and acquisition of data from multiple sources with different levels of quality during rank aggregation, this proposal aims to address the challenges of efficiently and accurately dealing with heterogeneous data sources. Specifically, we focus on two interconnected topics: active ranking and bandit problems.

Active ranking techniques aim to minimize the number of samples needed to generate an aggregated ranking by strategically selecting data based on existing information and rankings. Previous studies have explored methods under different transitivity assumptions, such as Strong Stochastic Transitivity (SST) or Weak Stochastic Transitivity (WST). In this proposal, our main focus is on improving methods based on these two assumptions to incorporate source accuracy to increase data efficiency and estimation accuracy.

In addition, if the objective of the algorithm is both to rank items and to collect rewards, the problem can be formulated as a dueling bandit problem. Previous research has primarily focused on achieving optimal regret when rewards are based on linear functions. However, the behavioral model suggests that the reward function is non-linear, leading to generalized linear models. Unfortunately, existing methods for this class of problems only approximate the confidence interval using a pessimistic estimate that is derived when dealing with linear models. In this study, we specifically investigate a special case within generalized linear models and propose adjusting the size of the confidence interval based on available information. We focus on the logistic link function, which is a subset of the generalized linear model. For this specific model, we show a method that can improve the efficiency and accuracy of determining whether the result falls within the function.

Committee:  

  • Cong Shen, Committee Chair (ECE/SEAS/UVA)
  • Farzad Farnoud, Advisor (ECE, CS/SEAS/UVA)
  • Chenyu Wei (CS/SEAS/UVA)
  • Shangtong Zhang (CS/SEAS/UVA)
  • Yen-Ling Kuo (CS/SEAS/UVA)
  • Quanquan Gu (CS/ENG/UCLA)
OSZAR »