1 |
Guilherme Torresan Bazzo, Gustavo Acauan Lorentz, Danny Suarez Vargas, and Viviane P. Moreira. Assessing the impact of OCR errors in information retrieval. In Advances in Information Retrieval, pages 102–109, 2020.
2 |
Steven M. Beitzel, Eric C. Jensen, and David A. Grossman. A survey of retrieval strategies for OCR text collections. In Symposium on Document Image Understanding Technologies, 2003.
3 |
G. Chiron, A. Doucet, M. Coustaty, and J. Moreux. ICDAR 2017 Competition on Post-OCR Text Correction. In Intl. Conf. on Document Analysis and Recognition, volume 01, pages 1423–1428, 2017.
4 |
W. Bruce Croft, Stephen Harding, Kazem Taghva, and Julie Borsack. An evaluation of information retrieval accuracy with simulated OCR output. In Symposium on Document Analysis and Information Retrieval, 1994.
5 |
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
6 |
M. Droettboom. Correcting broken characters in the recognition of historical printed documents. In Joint Conference on Digital Libraries, pages 364–366, May 2003.
7 |
John Evershed and Kent Fitch. Correcting noisy OCR: Context beats confusion. In Intl. Conference on Digital Access to Textual Cultural Heritage, DATeCH ’14, pages 45–51, 2014.
8 |
Paul B. Kantor and Ellen M. Voorhees. The TREC-5 confusion track: Comparing retrieval methods for scanned text. Information Retrieval, 2(2):165–176, May 2000.
9 |
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26, pages 3111–3119, 2013.
10 |
T. Nguyen, A. Jatowt, M. Coustaty, N. Nguyen, and A. Doucet. Deep statistical analysis of OCR errors for effective post-OCR processing. In Joint Conference on Digital Libraries (JCDL), pages 29–38, June 2019.
11 |
Thi Tuyet Hai Nguyen, Adam Jatowt, Mickael Coustaty, and Antoine Doucet. Survey of post-OCR processing approaches. ACM Computing Surveys (CSUR), 54(6):1–37, 2021.
12 |
Javier Parapar, Ana Freire, and Alvaro Barreiro. Revisiting n-gram based models for retrieval in degraded large collections. In Advances in Information Retrieval, pages 680–684, 2009.
13 |
C. Rigaud, A. Doucet, M. Coustaty, and J. Moreux. ICDAR 2019 Competition on Post-OCR Text Correction. In Intl. Conf. on Document Analysis and Recognition, pages 1588–1593, 2019.
14 |
Diana Santos and Paulo Rocha. The key to the first CLEF with Portuguese: Topics, questions and answers in Chave. In Workshop of the Cross-Language Evaluation Forum for European Languages, pages 821–832, 2004.
15 |
Kazem Taghva, Julie Borsack, and Allen Condit. Evaluation of model-based retrieval effectiveness with OCR text. ACM Trans. Inf. Syst., 14(1):64–93, January 1996.