Almeida, T. S., Abonizio, H., and Nogueira, R. (2024). Sabiá-2: A New Generation of Portuguese Large Language Models. arXiv preprint arXiv:2403.09887.
Blanco, R., Halpin, H., Herzig, D. M., Mika, P., Pound, J., Thompson, H. S., and Tran Duc, T. (2011). Repeatable and reliable search system evaluation using crowdsourcing. In ACM SIGIR conference on Research and development in Information Retrieval, pages 923–932.
Bueno, M., de Oliveira, E. S., Nogueira, R., Lotufo, R. A., and Pereira, J. A. (2024). Quati: A Brazilian Portuguese information retrieval dataset from native speakers. arXiv preprint arXiv:2404.06976.
Cleverdon, C. W. (1960). The Aslib Cranfield research project on the comparative efficiency of indexing systems. In Aslib Proceedings, volume 12, pages 421–431.
de Jesus, G. and Nunes, S. (2024). Exploring large language models for relevance judgments in Tetun. arXiv preprint arXiv:2406.07299.
Faggioli, G., Dietz, L., Clarke, C. L., Demartini, G., Hagen, M., Hauff, C., Kando, N., Kanoulas, E., Potthast, M., Stein, B., et al. (2023). Perspectives on large language models for relevance judgment. In ACM SIGIR International Conference on Theory of Information Retrieval, pages 39–50.
Lima de Oliveira, L., Romeu, R. K., and Moreira, V. P. (2021). REGIS: A Test Collection for Geoscientific Documents in Portuguese. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2363–2368.
Piau, M., Lotufo, R., and Nogueira, R. (2024). ptt5-v2: A closer look at continued pretraining of T5 models for the Portuguese language. arXiv preprint arXiv:2406.10806.
Rahmani, H. A., Craswell, N., Yilmaz, E., Mitra, B., and Campos, D. (2024). Synthetic test collections for retrieval evaluation. arXiv preprint arXiv:2405.07767.
Resnick, A. and Savage, T. R. (1964). The consistency of human judgments of relevance. American Documentation, 15(2):93–95.
Santos, D. and Rocha, P. (2004). The key to the first CLEF with Portuguese: Topics, questions and answers in CHAVE. In Workshop of the Cross-Language Evaluation Forum for European Languages, pages 821–832. Springer.
Soviero, B., Kuhn, D., Salle, A., and Moreira, V. P. (2024). ChatGPT goes shopping: LLMs can predict relevance in ecommerce search. In European Conference on Information Retrieval, pages 3–11.
Spärck Jones, K. and van Rijsbergen, C. J. (1975). Report on the need for and provision of an "ideal" information retrieval test collection. Computer Laboratory, University of Cambridge.
Spärck Jones, K., Walker, S., and Robertson, S. E. (2000). A probabilistic model of information retrieval: development and comparative experiments. Information Processing & Management, 36(6):809–840.
Theodosiou, Z., Georgiou, O., and Tsapatsoulis, N. (2011). Evaluating annotators consistency with the aid of an innovative database schema. In International Workshop on Semantic Media Adaptation and Personalization, pages 74–78.
Thomas, P., Spielman, S., Craswell, N., and Mitra, B. (2023). Large language models can accurately predict searcher preferences. arXiv preprint arXiv:2309.10621.
Voorhees, E. M. (2000). Variations in relevance judgments and the measurement of retrieval effectiveness. Information Processing & Management, 36(5):697–716.
Wang, L., Yang, N., Huang, X., Yang, L., Majumder, R., and Wei, F. (2024). Multilingual e5 text embeddings: A technical report. arXiv preprint arXiv:2402.05672.
Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E., et al. (2024). Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems, 36.
Zhu, E., Sheng, Q., Yang, H., Liu, Y., Cai, T., and Li, J. (2023). A unified framework of medical information annotation and extraction for Chinese clinical text. Artificial Intelligence in Medicine, 142:102573.