Arabzadeh, N. et al. (2021). Predicting efficiency/effectiveness trade-offs for dense vs. sparse retrieval strategy selection. In CIKM, pages 2862–2866.

Askari, A. et al. (2023). Injecting the BM25 score as text improves BERT-based re-rankers. In ECIR, pages 66–83.

Bassani, E. (2023). ranxhub: An online repository for information retrieval runs. In SIGIR, pages 3210–3214.

Bruch, S. et al. (2023). An approximate algorithm for maximum inner product search over streaming sparse vectors. TOIS, 42(2):1–43.

Chen, Y. et al. (2024). PRompt Optimization in Multi-Step Tasks (PROMST): Integrating human feedback and heuristic-based sampling. In EMNLP, pages 3859–3920.

Cunha, W. et al. (2023). A comparative survey of instance selection methods applied to non-neural and transformer-based text classification. ACM Computing Surveys.

Cunha, W., Moreo, A., Esuli, A., Sebastiani, F., Rocha, L., and Gonçalves, M. A. (2024). A noise-oriented and redundancy-aware instance selection framework. TOIS.

Cunha, W., Rocha, L., and Gonçalves, M. A. (2025). A thorough benchmark of automatic text classification: From traditional approaches to large language models. arXiv.

de Andrade, C., Cunha, W., Reis, D., Pagano, A. S., Rocha, L., and Gonçalves, M. A. (2024). A strategy to combine 1stGen transformers and open LLMs for automatic text classification. arXiv.

de Andrade, C. M. et al. (2023). On the class separability of contextual embeddings representations – or “the classifier does not matter when the (text) representation is so good!”. Information Processing & Management, 60(4).

Dettmers, T. et al. (2023). QLoRA: Efficient finetuning of quantized LLMs. In Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., and Levine, S., editors, Advances in Neural Information Processing Systems, volume 36, pages 10088–10115.

Devlin, J. et al. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Burstein, J. et al., editors, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4171–4186.

Dubey, A. et al. (2024). The Llama 3 herd of models. arXiv.

França, C. et al. (2025). Optimizing tail-head trade-off for extreme multi-label text classification (XMTC) with RAG-labels and a dynamic two-stage retrieval and fusion pipeline. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval.

França, C., Rabbi, G., Salles, T., Cunha, W., Rocha, L., and Gonçalves, M. A. (2025). Ranking-based fusion algorithms for extreme multi-label text classification (XMTC).

Guo, D., Yang, D., Zhang, H., Song, J., Zhang, R., Xu, R., Zhu, Q., Ma, S., Wang, P., Bi, X., et al. (2025). DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. arXiv preprint arXiv:2501.12948.

Jiang, F. (2024). Identifying and mitigating vulnerabilities in LLM-integrated applications. Master’s thesis, University of Washington.

Jiang, T. et al. (2021). LightXML: Transformer with dynamic negative sampling for high-performance extreme multi-label text classification. In AAAI, volume 35, pages 7987–7994.

Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., Stoyanov, V., and Zettlemoyer, L. (2020a). BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th ACL, pages 7871–7880, Online.

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., Riedel, S., and Kiela, D. (2020b). Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems, pages 9459–9474.

Lin, S.-C. et al. (2023). Aggretriever: A simple approach to aggregate textual representations for robust dense passage retrieval. TACL, 11:436–452.

Liu, J. et al. (2023). A contrastive learning framework for safety information extraction in construction. Advanced Engineering Informatics, 58:102194.

Llordes, M. et al. (2023). Explain like I am BM25: Interpreting a dense model’s ranked-list with a sparse approximation. In SIGIR, pages 1976–1980.

Muennighoff, N., Wang, T., Sutawika, L., Roberts, A., Biderman, S., Scao, T. L., Bari, M. S., Shen, S., Yong, Z.-X., Schoelkopf, H., et al. (2022). Crosslingual generalization through multitask finetuning. arXiv preprint arXiv:2211.01786.

Penha, G. and Hauff, C. (2023). Do the findings of document and passage retrieval generalize to the retrieval of responses for dialogues? In ECIR, pages 132–147.

Sikosana, M., Ajao, O., and Maudsley-Barton, S. (2024). A comparative study of hybrid models in health misinformation text classification. In OASIS ’24, pages 18–25.

Sun, A., Lim, E.-P., and Liu, Y. (2009). On strategies for imbalanced text classification using SVM: A comparative study. Decision Support Systems, 48(1):191–201.

Sy, C. Y., Maceda, L. L., Canon, M. J. P., and Flores, N. M. (2024). Beyond BERT: Exploring the efficacy of RoBERTa and ALBERT in supervised multiclass text classification. International Journal of Advanced Computer Science & Applications, 15(3).

Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., et al. (2023). LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.

Wang, J., Chen, Z., Qin, Y., He, D., and Lin, F. (2023). Multi-aspect co-attentional collaborative filtering for extreme multi-label text classification. Knowledge-Based Systems, 260(2):1–11.

Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R. R., and Le, Q. V. (2019). XLNet: Generalized autoregressive pretraining for language understanding. Advances in Neural Information Processing Systems, 32.

Ye, H., Sunderraman, R., and Ji, S. (2024). MatchXML: An efficient text-label matching framework for extreme multi-label text classification. IEEE TKDE, 36(9):4781–4793.

You, R. et al. (2019). AttentionXML: Label tree-based attention-aware deep model for high-performance extreme multi-label text classification. In Wallach, H. et al., editors, NeurIPS, volume 32, pages 1–11.

Zhang, J. et al. (2021). Fast multi-resolution transformer fine-tuning for extreme multi-label text classification. In NeurIPS, volume 34, pages 7267–7280.

Zhou, Q., Zhou, H., and Li, T. (2016). Cost-sensitive feature selection using random forest: Selecting low-cost subsets of informative features. Knowledge-Based Systems, 95:1–11.