Bazzo, G. T., Lorentz, G. A., Vargas, D. S., and Moreira, V. P. (2020). Assessing the impact of OCR errors in information retrieval. In European Conference on Information Retrieval, pages 102–109.
Croft, W. B., Harding, S., Taghva, K., and Borsack, J. (1994). An evaluation of information retrieval accuracy with simulated OCR output. In Symposium on Document Analysis and Information Retrieval, pages 115–126.
Ghosh, K., Chakraborty, A., Parui, S. K., and Majumder, P. (2016). Improving information retrieval performance on OCRed text in the absence of clean text ground truth. Information Processing & Management, 52(5):873–884.
Hegghammer, T. (2021). OCR with Tesseract, Amazon Textract, and Google Document AI: A benchmarking experiment. Journal of Computational Social Science, pages 1–22.
Kantor, P. B. and Voorhees, E. M. (2000). The TREC-5 confusion track: Comparing retrieval methods for scanned text. Information Retrieval, 2(2):165–176.
Mittendorf, E. and Schäuble, P. (2000). Information retrieval can cope with many errors. Information Retrieval, 3(3):189–216.
Oliveira, L. L. d., Romeu, R. K., and Moreira, V. P. (2021). REGIS: A test collection for geoscientific documents in Portuguese. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2363–2368.
Oliveira, L. L. d., Vargas, D. S., Alexandre, A. M. A., Cordeiro, F. C., Gomes, D. d. S. M., Rodrigues, M. d. C., Romeu, R. K., and Moreira, V. P. (2023). Evaluating and mitigating the impact of OCR errors on information retrieval. International Journal on Digital Libraries, 24(1):45–62.
Sanderson, M. (2010). Test collection based evaluation of information retrieval systems. Foundations and Trends® in Information Retrieval, 4(4):247–375.
Santos, D. and Rocha, P. (2004). The key to the first CLEF with Portuguese: Topics, questions and answers in CHAVE. In Workshop of the Cross-Language Evaluation Forum for European Languages, pages 821–832.
Taghva, K., Borsack, J., and Condit, A. (1996a). Effects of OCR errors on ranking and feedback using the vector space model. Information Processing & Management, 32(3):317–327.
Taghva, K., Borsack, J., and Condit, A. (1996b). Evaluation of model-based retrieval effectiveness with OCR text. ACM Transactions on Information Systems (TOIS), 14(1):64–93.
Vargas, D. S., de Oliveira, L. L., Moreira, V. P., Bazzo, G. T., and Lorentz, G. A. (2021). sOCRates - a post-OCR text correction method. In Anais do XXXVI Simpósio Brasileiro de Bancos de Dados, pages 61–72.