Arabzadeh, N. et al. (2021). Predicting efficiency/effectiveness trade-offs for dense vs. sparse retrieval strategy selection. In CIKM, pages 2862–2866.

Askari, A. et al. (2023). Injecting the BM25 score as text improves BERT-based re-rankers. In ECIR, pages 66–83.

Bassani, E. (2023). ranxhub: An online repository for information retrieval runs. In SIGIR, pages 3210–3214.

Bruch, S. et al. (2023). An approximate algorithm for maximum inner product search over streaming sparse vectors. TOIS, 42(2):1–43.

Chen, Y. et al. (2024). PRompt Optimization in Multi-Step Tasks (PROMST): Integrating human feedback and heuristic-based sampling. In EMNLP, pages 3859–3920.

Cunha, W. et al. (2023). A comparative survey of instance selection methods applied to non-neural and transformer-based text classification. ACM Computing Surveys.

Cunha, W., Moreo, A., Esuli, A., Sebastiani, F., Rocha, L., and Gonçalves, M. A. (2024). A noise-oriented and redundancy-aware instance selection framework. TOIS.

Cunha, W., Rocha, L., and Gonçalves, M. A. (2025). A thorough benchmark of automatic text classification: From traditional approaches to large language models. arXiv.

de Andrade, C., Cunha, W., Reis, D., Pagano, A. S., Rocha, L., and Gonçalves, M. A. (2024). A strategy to combine 1stGen transformers and open LLMs for automatic text classification. arXiv.

de Andrade, C. M. et al. (2023). On the class separability of contextual embeddings representations – or “the classifier does not matter when the (text) representation is so good!”. Information Processing & Management, 60(4).

Dettmers, T. et al. (2023). QLoRA: Efficient finetuning of quantized LLMs. In Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., and Levine, S., editors, Advances in Neural Information Processing Systems, volume 36, pages 10088–10115.

Devlin, J. et al. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Burstein, J. et al., editors, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4171–4186.

Dubey, A. et al. (2024). The Llama 3 herd of models. arXiv.

França, C. et al. (2025). Optimizing tail-head trade-off for extreme multi-label text classification (XMTC) with RAG-labels and a dynamic two-stage retrieval and fusion pipeline. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval.

França, C., Rabbi, G., Salles, T., Cunha, W., Rocha, L., and Gonçalves, M. A. (2025). Ranking-based fusion algorithms for extreme multi-label text classification (XMTC).

Guo, D., Yang, D., Zhang, H., Song, J., Zhang, R., Xu, R., Zhu, Q., Ma, S., Wang, P., Bi, X., et al. (2025). DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948.

Jiang, F. (2024). Identifying and mitigating vulnerabilities in LLM-integrated applications. Master’s thesis, University of Washington.

Jiang, T. et al. (2021). LightXML: Transformer with dynamic negative sampling for high-performance extreme multi-label text classification. In AAAI, volume 35, pages 7987–7994.

Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., and Zettlemoyer, L. (2020a). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th ACL, pages 7871–7880, Online.

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., Riedel, S., and Kiela, D. (2020b). Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems, pages 9459–9474.

Lin, S.-C. et al. (2023). Aggretriever: A simple approach to aggregate textual representations for robust dense passage retrieval. TACL, 11:436–452.

Liu, J. et al. (2023). A contrastive learning framework for safety information extraction in construction. Advanced Engineering Informatics, 58:102194.

Llordes, M. et al. (2023). Explain like I am BM25: Interpreting a dense model’s ranked-list with a sparse approximation. In SIGIR, pages 1976–1980.

Muennighoff, N., Wang, T., Sutawika, L., Roberts, A., Biderman, S., Scao, T. L., Bari, M. S., Shen, S., Yong, Z.-X., Schoelkopf, H., et al. (2022). Crosslingual generalization through multitask finetuning. arXiv preprint arXiv:2211.01786.

Penha, G. and Hauff, C. (2023). Do the findings of document and passage retrieval generalize to the retrieval of responses for dialogues? In ECIR, pages 132–147.

Sikosana, M., Ajao, O., and Maudsley-Barton, S. (2024). A comparative study of hybrid models in health misinformation text classification. In OASIS ’24, pages 18–25.

Sun, A., Lim, E.-P., and Liu, Y. (2009). On strategies for imbalanced text classification using SVM: A comparative study. Decision Support Systems, 48(1):191–201.

Sy, C. Y., Maceda, L. L., Canon, M. J. P., and Flores, N. M. (2024). Beyond BERT: Exploring the efficacy of RoBERTa and ALBERT in supervised multiclass text classification. International Journal of Advanced Computer Science & Applications, 15(3).

Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., et al. (2023). LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.

Wang, J., Chen, Z., Qin, Y., He, D., and Lin, F. (2023). Multi-aspect co-attentional collaborative filtering for extreme multi-label text classification. Knowledge-Based Systems, 260(2):1–11.

Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R. R., and Le, Q. V. (2019). XLNet: Generalized autoregressive pretraining for language understanding. Advances in Neural Information Processing Systems, 32.

Ye, H., Sunderraman, R., and Ji, S. (2024). MatchXML: An efficient text-label matching framework for extreme multi-label text classification. IEEE TKDE, 36(9):4781–4793.

You, R. et al. (2019). AttentionXML: Label tree-based attention-aware deep model for high-performance extreme multi-label text classification. In Wallach, H. et al., editors, NeurIPS, volume 32, pages 1–11.

Zhang, J. et al. (2021). Fast multi-resolution transformer fine-tuning for extreme multi-label text classification. In NeurIPS, volume 34, pages 7267–7280.

Zhou, Q., Zhou, H., and Li, T. (2016). Cost-sensitive feature selection using random forest: Selecting low-cost subsets of informative features. Knowledge-Based Systems, 95:1–11.