Bayardo, R. J., Ma, Y., and Srikant, R. (2007). Scaling up All Pairs Similarity Search. In Proceedings of the WWW Conference, pages 131–140.
Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426.
Chaudhuri, S., Ganti, V., and Kaushik, R. (2006). A Primitive Operator for Similarity Joins in Data Cleaning. In Proceedings of the ICDE Conference, page 5.
Chu, X., Ilyas, I. F., Krishnan, S., and Wang, J. (2016). Data Cleaning: Overview and Emerging Challenges. In Proceedings of the SIGMOD Conference, pages 2201–2206.
CrowdFlower (2016). 2016 Data Science Report. https://visit.figure-eight.com/data-science-report.html.
do Carmo Oliveira, D. J., Borges, F. F., and Ribeiro, L. A. (2017). Uma abordagem para processamento distribuído de junção por similaridade sobre múltiplos atributos [An approach for distributed processing of similarity joins over multiple attributes]. In Proceedings of the Brazilian Symposium on Databases, pages 300–305.
Kaggle (2017). The State of Data Science & Machine Learning. https://www.kaggle.com/kaggle/kaggle-survey-2017.
Li, G., He, J., Deng, D., and Li, J. (2015). Efficient Similarity Join and Search on Multi-Attribute Data. In Proceedings of the SIGMOD Conference, pages 1137–1151.
Mann, W., Augsten, N., and Bouros, P. (2016). An Empirical Evaluation of Set Similarity Join Techniques. Proceedings of the VLDB Endowment, 9(9):636–647.
Oliveira, D. J. C., Borges, F. F., Ribeiro, L. A., and Cuzzocrea, A. (2018). Set Similarity Joins with Complex Expressions on Distributed Platforms. In Proceedings of the Symposium on Advances in Databases and Information Systems, pages 216–230.
Ribeiro, L. A. and Härder, T. (2011). Generalizing Prefix Filtering to Improve Set Similarity Joins. Information Systems, 36(1):62–78.
Ribeiro, L. A., Schneider, N. C., de Souza Inácio, A., Wagner, H. M., and von Wangenheim, A. (2016). Bridging Database Applications and Declarative Similarity Matching. Journal of Information and Data Management, 7(3):217–232.
Ribeiro-Júnior, S., Quirino, R. D., Ribeiro, L. A., and Martins, W. S. (2017). Fast Parallel Set Similarity Joins on Many-core Architectures. Journal of Information and Data Management, 8(3):255–270.
Wang, X., Qin, L., Lin, X., Zhang, Y., and Chang, L. (2017). Leveraging Set Relations in Exact Set Similarity Join. Proceedings of the VLDB Endowment, 10(9):925–936.
Xiao, C., Wang, W., Lin, X., Yu, J. X., and Wang, G. (2011). Efficient Similarity Joins for Near-Duplicate Detection. ACM Transactions on Database Systems, 36(3):15:1–15:41.