1 |
Chen, Tong, Hongwei Wang, Sihao Chen, Wenhao Yu, Kaixin Ma, Xinran Zhao,
Hongming Zhang, and Dong Yu (Nov. 2024). “Dense X Retrieval: What Retrieval
Granularity Should We Use?” In: Proceedings of the 2024 Conference on
Empirical Methods in Natural Language Processing. Ed. by Yaser Al-Onaizan, Mohit
Bansal, and Yun-Nung Chen. Miami, Florida, USA: Association for
Computational Linguistics, pp. 15159–15177. DOI: 10.18653/v1/2024.emnlp-main.845.
URL: https://aclanthology.org/2024.emnlp-main.845/.
|
|
2 |
Chen, Zhiyu, Shiyang Li, Charese Smiley, Zhiqiang Ma, Sameena Shah, and William
Yang Wang (2022). “ConvFinQA: Exploring the Chain of Numerical Reasoning
in Conversational Finance Question Answering”. In: Proceedings of the 2022
Conference on Empirical Methods in Natural Language Processing.
Association for Computational Linguistics, pp. 6279–6292.
|
|
3 |
DeepSeek-AI et al. (2025). DeepSeek-R1: Incentivizing Reasoning Capability in LLMs
via Reinforcement Learning. arXiv: 2501.12948 [cs.CL].
URL: https://arxiv.org/abs/2501.12948.
|
|
4 |
Ding, Ning, Yulin Chen, Bokai Xu, Yujia Qin, Shengding Hu, Zhiyuan Liu, Maosong Sun,
and Bowen Zhou (2023). “Enhancing Chat Language Models by Scaling High-
quality Instructional Conversations”. In: Proceedings of the 2023 Conference on
Empirical Methods in Natural Language Processing, pp. 3029–3051.
|
|
5 |
Duarte, André, João Marques, Miguel Graça, Miguel Freire, Lei Li, and Arlindo
Oliveira (2024). “LumberChunker: Long-Form Narrative Document Segmentation”.
In: Findings of the Association for Computational Linguistics: EMNLP
2024, pp. 6473–6486.
|
|
6 |
Gharehchopogh, Farhad Soleimanian and Zeinab Abbasi Khalifelu (2011). “Analysis and
evaluation of unstructured data: text mining versus natural language processing”.
In: 2011 5th International Conference on Application of Information and
Communication Technologies (AICT). IEEE, pp. 1–4.
|
|
7 |
Hilbert, Martin and Priscila López (2011). “The world’s technological capacity to store,
communicate, and compute information”. In: Science 332.6025, pp. 60–65.
|
|
8 |
Izacard, Gautier and Édouard Grave (2021). “Leveraging Passage Retrieval with
Generative Models for Open Domain Question Answering”. In: Proceedings of the
16th Conference of the European Chapter of the Association for Computational
Linguistics: Main Volume, pp. 874–880.
|
|
9 |
Kamradt, Greg (2024). Semantic Chunking.
https://github.com/FullStackRetrieval-com/RetrievalTutorials/tree/main/tutorials/LevelsOfTextSplitting.
|
|
10 |
Karpukhin, Vladimir, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey
Edunov, Danqi Chen, and Wen-tau Yih (2020). “Dense passage retrieval for
open-domain question answering”. In: 2020 Conference on Empirical Methods in
Natural Language Processing, EMNLP 2020. Association for Computational
Linguistics (ACL), pp. 6769–6781.
|
|
11 |
Koshorek, Omri, Adir Cohen, Noam Mor, Michael Rotman, and Jonathan Berant (2018).
“Text Segmentation as a Supervised Learning Task”. In: Proceedings of the 2018
Conference of the North American Chapter of the Association for Computational
Linguistics: Human Language Technologies, Volume 2 (Short Papers), pp. 469–473.
|
|
12 |
Lee, Jinhyuk, Alexander Wettig, and Danqi Chen (2021). “Phrase Retrieval Learns
Passage Retrieval, Too”. In: Proceedings of the 2021 Conference on Empirical
Methods in Natural Language Processing, pp. 3661–3672.
|
|
13 |
Lewis, Patrick, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin,
Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al.
(2020). “Retrieval-augmented generation for knowledge-intensive NLP tasks”. In:
Advances in Neural Information Processing Systems 33, pp. 9459–9474.
|
|
14 |
Merity, Stephen, Caiming Xiong, James Bradbury, and Richard Socher (2022). “Pointer
Sentinel Mixture Models”. In: International Conference on Learning
Representations.
|
|
15 |
National Library of Medicine (2023). PMC Open Access Subset. Dataset retrieved from
Hugging Face Datasets (PubMed Central Open Access dataset, Version 2023-06-17).
Available: https://huggingface.co/datasets/pmc/open_access, cited 2024-05-08.
|
|
16 |
Porzel, Robert and Iryna Gurevych (2003). “Contextual coherence in natural language
processing”. In: International and Interdisciplinary Conference on Modeling and
Using Context. Springer, pp. 272–285.
|
|
17 |
Reimers, Nils and Iryna Gurevych (Nov. 2019). “Sentence-BERT: Sentence Embeddings
using Siamese BERT-Networks”. In: Proceedings of the 2019 Conference on
Empirical Methods in Natural Language Processing and the 9th International Joint
Conference on Natural Language Processing (EMNLP-IJCNLP). Ed. by Kentaro
Inui, Jing Jiang, Vincent Ng, and Xiaojun Wan. Hong Kong, China: Association
for Computational Linguistics, pp. 3982–3992. DOI: 10.18653/v1/D19-1410.
URL: https://aclanthology.org/D19-1410/.
|
|
18 |
Smith, Brandon and Anton Troynikov (July 2024). Evaluating Chunking Strategies
for Retrieval. Tech. rep. Chroma.
URL: https://research.trychroma.com/evaluating-chunking.
|
|
19 |
The White House (2024). State of the Union 2024. Accessed: 2024-05-02.
|
|
20 |
Vaswani, A (2017). “Attention is all you need”. In: Advances in Neural Information
Processing Systems.
|
|
21 |
Yang, Wei, Yuqing Xie, Aileen Lin, Xingyu Li, Luchen Tan, Kun Xiong, Ming Li, and
Jimmy Lin (2019). “End-to-End Open-Domain Question Answering with
BERTserini”. In: Proceedings of the 2019 Conference of the North American Chapter of
the Association for Computational Linguistics (Demonstrations), pp. 72–77.
|
|