1 |
Almeida, T. S., Abonizio, H., Nogueira, R., and Pires, R. (2024). Sabiá-2: A new generation of Portuguese large language models. ArXiv, abs/2403.09887.
|
|
2 |
Aumiller, D., Almasian, S., Lackner, S., and Gertz, M. (2021). Structural text segmentation of legal documents. In Proceedings of the Eighteenth International Conference on Artificial Intelligence and Law, ICAIL ’21. ACM.
|
|
3 |
Borkar, V., Deshmukh, K., and Sarawagi, S. (2001). Automatic segmentation of text into structured records. SIGMOD Rec., 30(2):175–186.
|
|
4 |
Boukhers, Z., Ambhore, S., and Staab, S. (2019). An end-to-end approach for extracting and segmenting high-variance references from PDF documents. In 2019 ACM/IEEE JCDL, pages 186–195.
|
|
5 |
Chang, Y., Wang, X., Wang, J., Wu, Y., Yang, L., Zhu, K., Chen, H., Yi, X., Wang, C., Wang, Y., Ye, W., Zhang, Y., Chang, Y., Yu, P. S., Yang, Q., and Xie, X. (2023). A survey on evaluation of large language models.
|
|
6 |
Chen, X., Marazopoulou, K., Lee, W., Agarwal, C., Sukumaran, J., and Hofleitner, A.
(2023). Binary classifier evaluation on unlabeled segments using inverse distance
weighting with distance learning. Proceedings of the 29th ACM SIGKDD Conference
on Knowledge Discovery and Data Mining.
|
|
7 |
Chen, Z., Meng, W., and Dragut, E. C. (2022). Web record extraction with invariants.
Proc. VLDB Endow., 16:959–972.
|
|
8 |
Cruz, P., Vanneschi, L., Painho, M., and Rita, P. (2021). Automatic identification of
addresses: A systematic literature review. ISPRS Int. J. Geo Inf., 11:11.
|
|
9 |
Dorneles, C. F., Gonçalves, R., and dos Santos Mello, R. (2011). Approximate data
instance matching: a survey. Knowledge and Information Systems, 27(1):1–21.
|
|
10 |
Géron, A. (2019). Hands-On Machine Learning with Scikit-Learn and TensorFlow:
Concepts, Tools, and Techniques to Build Intelligent Systems. O’Reilly.
|
|
11 |
Haider, W. and Yeşilada, Y. (2022). Classification of layout vs. relational tables on the
web: Machine learning with rendered pages. ACM Transac. on the Web, 17:1–23.
|
|
12 |
Hoffart, J., Seufert, S., Nguyen, D. B., Theobald, M., and Weikum, G. (2012). KORE:
keyphrase overlap relatedness for entity disambiguation. In Proceedings of the 21st
CIKM, pages 545–554, New York, NY, USA. Association for Computing Machinery.
|
|
13 |
Kayed, M., Dakrory, S., and Ali, A. A. (2021). Postal address extraction from the web: a
comprehensive survey. Artificial Intelligence Review, 55:1085–1120.
|
|
14 |
Lerman, K., Getoor, L., Minton, S., and Knoblock, C. (2004). Using the structure of web
sites for automatic segmentation of tables. In Proceedings of the 2004 ACM SIGMOD,
pages 119–130, New York, NY, USA. Association for Computing Machinery.
|
|
15 |
Misra, H., Yvon, F., Cappé, O., and Jose, J. (2011). Text segmentation: A topic modeling
perspective. Information Processing & Management, 47(4):528–544.
|
|
16 |
Peng, F. and McCallum, A. (2006). Information extraction from research papers using
conditional random fields. Information Processing & Management, 42(4):963–979.
|
|
17 |
Rea, L. and Parker, R. (2012). Designing and Conducting Survey Research: A
Comprehensive Guide. Wiley.
|
|
18 |
Simon, K. and Lausen, G. (2005). Viper: augmenting automatic information extraction
with visual perceptions. In International Conference on Information and Knowledge
Management.
|
|
19 |
Uppalapati, V. K. and Nag, D. S. (2024). A comparative analysis of AI models in complex
medical decision-making scenarios: Evaluating ChatGPT, Claude AI, Bard, and
Perplexity. Cureus, 16.
|
|
20 |
Varma, M., Orr, L., Wu, S., Leszczynski, M., Ling, X., and Ré, C. (2021). Cross-domain
data integration for entity disambiguation in biomedical text. In EMNLP.
|
|
21 |
Yao, Y., Duan, J., Xu, K., Cai, Y., Sun, Z., and Zhang, Y. (2024). A survey on large
language model (LLM) security and privacy: The good, the bad, and the ugly. 4(2):100211.
|
|
22 |
Yoon, J., Gupta, A., and Anumanchipalli, G. K. (2024). Is bigger edit batch size always
better? – an empirical study on model editing with Llama-3. ArXiv.
|
|
23 |
Zhang, P., Shao, N., Liu, Z., Xiao, S., Qian, H., Ye, Q., and Dou, Z. (2024). Extending
Llama-3’s context ten-fold overnight. ArXiv.
|
|
24 |
Zhang, X., Zou, J., Le, D., and Thoma, G. (2011). A structural SVM approach for reference
parsing. BMC Bioinformatics, 12 Suppl 3:S7.
|
|