SBBD

Paper Registration



Authors
# Name
1 Salvador Ludovico Paranhos (salvadorludovico@egresso.ufg.br)
2 Jonatas Tomazini (tomazini@discente.ufg.br)
3 Celso Camilo (celso@inf.ufg.br)
4 Sávio de Oliveira (savioteles@ufg.br)


Reference
# Reference
1 Antematter team. Optimizing retrieval-augmented generation with advanced chunking techniques: A comparative study, 2024. Accessed: 2025-03-31.
2 Emily M Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of FAccT, 2021.
3 Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, and Ion Androutsopoulos. LEGAL-BERT: The muppets straight out of law school. In Trevor Cohn, Yulan He, and Yang Liu, editors, Findings of the Association for Computational Linguistics: EMNLP 2020, pages 2898–2904, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.findings-emnlp.261. URL https://aclanthology.org/2020.findings-emnlp.261/.
4 Shahul Es, Jithin James, Luis Espinosa-Anke, and Steven Schockaert. Ragas: Automated evaluation of retrieval augmented generation, 2023. URL https://doi.org/10.48550/arXiv.2309.15217.
5 Naman Gupta. BGE-M3 vs OpenAI embeddings: A comparative study. https://naman1011.medium.com/bge-m3-model-vs-openai-embeddings-e6d6cda27d0c, 2024.
6 Gautier Izacard and Edouard Grave. Leveraging passage retrieval with generative models for open domain question answering. In Proceedings of EACL, 2021.
7 Jungwoo Kang, Jinhyuk Lee, and Jaewoo Kang. Knowledge graph-augmented language models for knowledge-grounded dialogue generation. arXiv preprint arXiv:2305.18846, 2023.
8 Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. In Proceedings of EMNLP, 2020.
9 Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, and Mike Lewis. Generalization through memorization: Nearest neighbor language models. arXiv preprint arXiv:1911.00172, 2020.
10 Joon Lee, Hyoungho Yoon, and Hyeoun-Ae Park. Explainable ai in healthcare: From black box to interpretable models. Healthcare Informatics Research, 27(1):1–9, 2021.
11 Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Angela Fan, Vishrav Chaudhary, Tim Rocktäschel, and Sebastian Riedel. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems, 2020a.
12 Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks, 2020b. URL https://doi.org/10.48550/arXiv.2005.11401.
13 Xiao Liu, Zihan Zhou, Tianyu Zhou, Maosong Sun, and Tianyu Wang. BGE-M3: A multi-function embedding model for dense, sparse and multi-vector retrieval. arXiv preprint arXiv:2402.03216, 2024. URL https://arxiv.org/abs/2402.03216.
14 Zuhong Liu, Charles-Elie Simon, and Fabien Caspani. Passage segmentation of documents for extractive question answering, 2025. URL https://doi.org/10.48550/arXiv.2501.09940.
15 Yi Luan, Kaitao Tang, Mandar Joshi Gupta, and Luke Zettlemoyer. Sparse retrieval for question answering. In Proceedings of ACL, 2021.
16 Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. Model cards for model reporting. Proceedings of the Conference on Fairness, Accountability, and Transparency, 2019.
17 Niklas Muennighoff, Nizar Tazi, et al. MTEB: Massive text embedding benchmark. https://huggingface.co/spaces/mteb/leaderboard, 2023.
18 Taichi Nishikawa, Soichiro Hidaka, Sho Yokoi, and Hideki Nakayama. Towards entity-enhanced RAG: Augmenting retrieval augmented generation with entity annotation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022.
19 Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton-Brown, and Yoav Shoham. In-context retrieval-augmented language models. Transactions on Machine Learning Research, 2023.
20 Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence embeddings using siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, 2019.
21 Krystian Safjan. From fixed-size to NLP chunking - a deep dive into text chunking techniques, 2023. Accessed: 2025-03-31.
22 Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, and Nanyun Peng. The woman worked as a babysitter: On biases in language generation. In Proceedings of EMNLP, 2019.
23 Nandan Thakur, Nils Reimers, Andreas Rücklé, Abhishek Srivastava, and Iryna Gurevych. BEIR: A heterogeneous benchmark for zero-shot evaluation of information retrieval models. In Proceedings of the International Conference on Neural Information Processing Systems, 2022.
24 Rui Wang and Lili Zhao. Ai in education and policy-making: A review of recent advances. Educational Technology Research and Development, 71(2):135–152, 2023.
25 Xiang Wang, Xiangyu Dong, Fuzheng Zhang, Liwei Wang, and Xing Xie. KEPLER: A unified model for knowledge embedding and pre-trained language representation, 2021. URL https://arxiv.org/abs/1911.06136.
26 Andrew Yates, Sebastian Hofstätter, and Guido Zuccon. Pretrained transformers for text ranking: Bert and beyond. arXiv preprint arXiv:2104.08663, 2021.
27 Zijie Zhong, Hanwen Liu, Xiaoya Cui, Xiaofan Zhang, and Zengchang Qin. Mix-of-granularity: Optimize the chunking granularity for retrieval-augmented generation, 2025. URL https://doi.org/10.48550/arXiv.2406.00456.