1 |
Baevski, A., Zhou, Y., Mohamed, A., and Auli, M. (2020). wav2vec 2.0: A framework for self-supervised learning of speech representations. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, pages 12449–12460.
|
|
2 |
Banerjee, S. and Lavie, A. (2005). Meteor: An automatic metric for mt evaluation with improved correlation with human judgments. In Proceedings of the acl workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization, pages 65–72.
|
|
3 |
Chan, W., Jaitly, N., Le, Q., and Vinyals, O. (2016). Listen, attend and spell: A neural network for large vocabulary conversational speech recognition.
|
|
4 |
Chiu, C.-C., Sainath, T. N., Wu, Y., Prabhavalkar, R., Nguyen, P., Chen, Z., Kannan, A., Weiss, R. J., Rao, K., Gonina, E., et al. (2018). State-of-the-art speech recognition with sequence-to-sequence models. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4774–4778. IEEE.
|
|
5 |
de Lima, T. A. and Da Costa-Abreu, M. (2020). A survey on automatic speech recognition systems for portuguese language and its variations. Computer Speech & Language, 62:101055.
|
|
6 |
Filippidou, F. and Moussiades, L. (2020). A benchmarking of ibm, google and wit automatic speech recognition systems. In IFIP International Conference on Artificial Intelligence Applications and Innovations, pages 73–82. Springer.
|
|
7 |
Graves, A., Mohamed, A.-r., and Hinton, G. (2013). Speech recognition with deep recurrent neural networks. In 2013 IEEE international conference on acoustics, speech and signal processing, pages 6645–6649. Ieee.
|
|
8 |
Hartmann, N., Fonseca, E., Shulby, C., Treviso, M., Rodrigues, J., and Aluisio, S. (2017). Portuguese word embeddings: Evaluating on word analogies and natural language tasks. arXiv preprint arXiv:1708.06025.
|
|
9 |
Kepuska, V. and Bohouta, G. (2017). Comparing speech recognition systems (microsoft api, google api and cmu sphinx). Int. J. Eng. Res. Appl, 7(03):20–24.
|
|
10 |
Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2001). Bleu: a method for automatic evaluation of machine translation. In ACL.
|
|
11 |
Reddy, D. R. (1976). Speech recognition by machine: A review. Proceedings of the IEEE, 64(4):501–531.
|
|
12 |
Schneider, S., Baevski, A., Collobert, R., and Auli, M. (2019). wav2vec: Unsupervised pre-training for speech recognition. In Interspeech 2019, pages 3465–3469.
|
|
13 |
Xiong, W., Droppo, J., Huang, X., Seide, F., Seltzer, M., Stolcke, A., Yu, D., and Zweig, G. (2016). Achieving human parity in conversational speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing.
|
|
14 |
Xiong, W., Wu, L., Alleva, F., Droppo, J., Huang, X., and Stolcke, A. (2018). The microsoft 2017 conversational speech recognition system. In 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP), pages 5934–5938. IEEE.
|
|