Aquino, I., dos Santos, M. M., Dorneles, C., and Carvalho, J. T. (2024). Extracting information from Brazilian legal documents with retrieval augmented generation. In Anais Estendidos do XXXIX Simpósio Brasileiro de Bancos de Dados, pages 280–287, Porto Alegre, RS, Brasil. SBC.

Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., Truitt, S., Metropolitansky, D., Ness, R. O., and Larson, J. (2025). From local to global: A graph RAG approach to query-focused summarization.

Jiang, C., Gao, L., Zarch, H. E., and Annavaram, M. (2024). Efficient LLM inference with I/O-aware partial KV cache recomputation.

Jimenez Gutierrez, B., Shu, Y., Gu, Y., Yasunaga, M., and Su, Y. (2024). HippoRAG: Neurobiologically inspired long-term memory for large language models. Advances in Neural Information Processing Systems, 37:59532–59569.

Kociský, T., Schwarz, J., Blunsom, P., Dyer, C., Hermann, K. M., Melis, G., and Grefenstette, E. (2018). The NarrativeQA reading comprehension challenge. Transactions of the Association for Computational Linguistics, 6:317–328.

Lavie, A. and Agarwal, A. (2007). METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgments. In Proceedings of the Second Workshop on Statistical Machine Translation, StatMT '07, pages 228–231, USA. Association for Computational Linguistics.

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., Küttler, H., Lewis, M., Yih, W.-t., Rocktäschel, T., Riedel, S., and Kiela, D. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS '20, Red Hook, NY, USA. Curran Associates Inc.

Li, B., Jiang, Y., Gadepally, V., and Tiwari, D. (2024). LLM inference serving: Survey of recent advances and opportunities. In 2024 IEEE High Performance Extreme Computing Conference (HPEC), pages 1–8.

Li, H., Li, Y., Tian, A., Tang, T., Xu, Z., Chen, X., Hu, N., Dong, W., Li, Q., and Chen, L. (2025). A survey on large language model acceleration based on KV cache management.

Lin, C.-Y. (2004). ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain. Association for Computational Linguistics.

Liu, A., Liu, J., Pan, Z., He, Y., Haffari, G., and Zhuang, B. (2024a). MiniCache: KV cache compression in depth dimension for large language models. In Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J., and Zhang, C., editors, Advances in Neural Information Processing Systems, volume 37, pages 139997–140031. Curran Associates, Inc.

Liu, Y., Li, H., Cheng, Y., Ray, S., Huang, Y., Zhang, Q., Du, K., Yao, J., Lu, S., Ananthanarayanan, G., Maire, M., Hoffmann, H., Holtzman, A., and Jiang, J. (2024b). CacheGen: KV cache compression and streaming for fast large language model serving. In Proceedings of the ACM SIGCOMM 2024 Conference, ACM SIGCOMM '24, pages 38–56, New York, NY, USA. Association for Computing Machinery.

Nolet, C. J., Lafargue, V., Raff, E., Nanditale, T., Oates, T., Zedlewski, J., and Patterson, J. (2021). Bringing UMAP closer to the speed of light with GPU acceleration.

NVIDIA Corporation (2024). Benchmarking metrics for large language models. https://docs.nvidia.com/nim/benchmarking/llm/latest/metrics.html. Accessed: 2025-04-17.

Oliveira, V. P. L. (2024). MemoryGraph: uma proposta de memória para agentes conversacionais utilizando grafo de conhecimento. PhD thesis (Computer Science), Universidade Federal de Goiás, Goiânia.

Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pages 311–318, USA. Association for Computational Linguistics.

Paschoal, A. F. A., Pirozelli, P., Freire, V., Delgado, K. V., Peres, S. M., José, M. M., Nakasato, F., Oliveira, A. S., Brandão, A. A. F., Costa, A. H. R., and Cozman, F. G. (2021). Pirá: A bilingual Portuguese-English dataset for question-answering about the ocean. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, CIKM '21, pages 4544–4553, New York, NY, USA. Association for Computing Machinery.

Reimers, N. and Gurevych, I. (2019). Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Inui, K., Jiang, J., Ng, V., and Wan, X., editors, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, Hong Kong, China. Association for Computational Linguistics.

RunPod (2025). RunPod – cloud compute for AI, ML, and more. Accessed: April 28, 2025.

Sarthi, P., Abdullah, S., Tuli, A., Khanna, S., Goldie, A., and Manning, C. D. (2024). RAPTOR: Recursive abstractive processing for tree-organized retrieval. In The Twelfth International Conference on Learning Representations.

Souza, F., Nogueira, R., and Lotufo, R. (2020). BERTimbau: pretrained BERT models for Brazilian Portuguese. In 9th Brazilian Conference on Intelligent Systems, BRACIS, Rio Grande do Sul, Brazil, October 20-23.

Taschetto, L. and Fileto, R. (2024). Using retrieval-augmented generation to improve performance of large language models on the Brazilian university admission exam. In Anais do XXXIX Simpósio Brasileiro de Bancos de Dados, pages 799–805, Porto Alegre, RS, Brasil. SBC.

Yao, J., Li, H., Liu, Y., Ray, S., Cheng, Y., Zhang, Q., Du, K., Lu, S., and Jiang, J. (2025). CacheBlend: Fast large language model serving for RAG with cached knowledge fusion. In Proceedings of the Twentieth European Conference on Computer Systems, EuroSys '25, pages 94–109, New York, NY, USA. Association for Computing Machinery.

Yu, H., Gan, A., Zhang, K., Tong, S., Liu, Q., and Liu, Z. (2024). Evaluation of retrieval-augmented generation: A survey.