THUIR

Open-Source Projects and Data

Follow us on GitHub

Toolkits

ReChorus2.0: Top-K Recommendation with Implicit Feedback

ReChorus2.0 is a modular and task-flexible PyTorch library for recommendation, intended primarily for research. It aims to provide researchers with a flexible framework to implement various recommendation tasks, compare different algorithms, and adapt to diverse and highly customized data inputs.

ReChorus: Top-K Recommendation with Implicit Feedback

ReChorus is a general PyTorch framework for Top-K recommendation with implicit feedback, intended primarily for research. It aims to provide a fair benchmark for comparing state-of-the-art algorithms. We hope this can partially alleviate the problem that different papers adopt non-comparable experimental settings, and so form a “Chorus” of recommendation algorithms.

ULTRA: Unbiased Learning to Rank Algorithm

ULTRA is an Unbiased Learning To Rank Algorithms toolbox that provides a codebase for experiments and research on learning to rank with human-annotated or noisy labels. With a unified data processing pipeline, ULTRA supports multiple unbiased learning-to-rank algorithms, online learning-to-rank algorithms, and neural learning-to-rank models, as well as different methods to use and simulate noisy labels (e.g., clicks) to train and test different algorithms and ranking models.

Datasets

Description: We provide the Chinese-centric TianGong-CRL dataset to support research on epidemic-related Information Retrieval (IR) tasks and on the information needs of Chinese people in the context of COVID-19. Refined from an 82-day search log from Sogou, the second largest search engine in China, the dataset consists of two parts. The first part provides a collection of 1,492 COVID-19-related queries and the submission frequency of these queries in each province of China over the 82-day period. The second part provides a sample of COVID-19-related search logs from the same period; to protect user privacy, only session-level data are provided. We also sample a subset of 1,700 sessions from TianGong-CRL and manually label each session with five intent labels.
Description: On Annotation Methodologies for Image Search Evaluation
Description: The influence of image search intents on user behavior and satisfaction
Description: Understanding Reading Attention Distribution during Relevance Judgement.
Description: The Sogou-SRR (Search Result Relevance) dataset was constructed to support research on search engine relevance estimation and ranking tasks. The dataset consists of 6,338 queries and the corresponding top 10 search results. For each search result, the screenshot, title, snippet, HTML source code, parse tree, and URL are provided, as well as a 4-grade relevance score (1-4) and the result type. The queries are sampled from the search logs of Sogou.com. Sampled queries with frequencies between 100 and 10,000 are usually regarded as torso queries, which are usually the most important concern in ranking algorithm design.
Description: The Sogou-QCL dataset was created to support research on information retrieval and related human language technologies. The dataset consists of 537,366 queries, more than 9 million Chinese web pages, and five kinds of relevance labels assessed by click models. In addition, a 2,000-query dataset with 4-level human-assessed relevance labels is available to the public for research.
Description: The Tiangong-ULTR (Unbiased Learning To Rank) dataset was constructed to support studies on unbiased learning to rank. It provides real click data sampled from the search logs of Sogou.com for training unbiased learning-to-rank algorithms, as well as a separate set of human-annotated data for evaluating their performance.
Description: This dataset was created to support research on search evaluation in exploratory search. We conducted a user study comprising 166 search sessions in three domains. Users’ interactions and explicit feedback were collected during the search process. The clicked documents collected in the user study were annotated by external assessors.
Description: The ZhihuRec dataset was collected from the knowledge-sharing platform Zhihu. It comprises around 100M interactions collected within 10 days, 798K users, 165K questions, 554K answers, 240K authors, 70K topics, and more than 501K user query logs. It also includes anonymized descriptions of users, answers, questions, authors, and topics. To the best of our knowledge, this is the largest real-world interaction dataset for personalized recommendation.
Description: T2Ranking is a large-scale Chinese benchmark for passage ranking, including passage retrieval and re-ranking. T2Ranking comprises more than 300K queries and over 2M unique passages from real-world search engines. Specifically, we sample question-based search queries from user logs of the Sogou search engine, a popular search system in China. For each query, we extract the content of the corresponding documents from different search engines. After model-based passage segmentation and clustering-based passage de-duplication, a large-scale passage corpus is obtained. For a given query and its corresponding passages, we hire expert annotators to provide 4-level relevance judgments for each query-passage pair.
Description: EEG-SVRec is the first EEG dataset with user multidimensional affective engagement labels in short video recommendation. It can be used for a deeper exploration of affective experience and cognitive activity behind user behaviors in recommender systems.
Description: STARD is a Chinese dataset that compiles 1,543 query cases from real legal consultations and 55,348 candidate statutory articles, aimed at addressing the neglect of non-professional public queries in existing statute retrieval benchmarks, thereby more comprehensively capturing the complexity and diversity of real queries from the public.
Description: HELM is a hallucination detection benchmark dataset for large language models (LLMs), offering texts generated by six different LLMs along with their hallucination annotations. It also includes contextual embeddings, self-attentions, and hidden-layer activations for each token during the inference process of each LLM, providing a detailed snapshot for studying the internal state changes of these models.
Description: The URS (User Reported Scenario) dataset comprises 1,846 real-world conversations with 15 LLM services, contributed by 712 users from 23 countries across 6 continents. Each scenario is classified into six user intent categories. This dataset, characterized by its user-centric, multi-intent, and multi-cultural nature, provides a valuable resource for advancing user-centric evaluations of LLMs.
Description: GNN4EEG is a benchmark and toolkit for Electroencephalography (EEG) classification tasks with Graph Neural Networks (GNNs), aiming to facilitate research in this direction. Researchers can freely choose their preferred GNN models, hyper-parameters, and experimental protocols. Training and evaluation data can be drawn from any self-built dataset.
Description: LeKUBE serves as a benchmark for knowledge updating methods in the legal domain, which is distinct from general domain knowledge updating. The legal domain presents unique challenges such as legal reasoning, application of law, and the length of legal regulations. LeKUBE concentrates on these challenges, providing a comprehensive evaluation of knowledge updating methods in the legal domain across five dimensions (accuracy, generalizability, locality, retainability, and scalability).

Special thanks to Shuqi Zhu for the initial construction of this page.