1 |
Baevski, A., Zhou, Y., Mohamed, A., and Auli, M. (2020). wav2vec 2.0: A framework for self-supervised learning of speech representations. In NeurIPS, pages 12449–12460.
|
|
2 |
Chiu, C.-C., Tripathi, A., Chou, K., Co, C., Jaitly, N., Jaunzeikare, D., Kannan, A., Nguyen, P., Sak, H., Sankar, A., et al. (2017). Speech recognition for medical conversations. arXiv preprint arXiv:1711.07274.
|
|
3 |
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
|
|
4 |
Hsu, W.-N., Bolte, B., Tsai, Y.-H. H., Lakhotia, K., Salakhutdinov, R., and Mohamed, A. (2021). HuBERT: Self-supervised speech representation learning by masked prediction of hidden units. IEEE/ACM TASLP, 29:3451–3460.
|
|
5 |
Jiang, A. Q., Sablayrolles, A., Mensch, A., Bamford, C., Chaplot, D. S., Casas, D. d. l., Bressand, F., Lengyel, G., Lample, G., Saulnier, L., et al. (2023). Mistral 7B. arXiv preprint arXiv:2310.06825.
|
|
6 |
Kar, S., Mishra, P., Lin, J., Woo, M.-J., Deas, N., Linduff, C., Niu, S., Yang, Y., McClendon, J., Smith, D. H., et al. (2021). Systematic evaluation and enhancement of speech recognition in operational medical environments. In IJCNN, pages 1–8.
|
|
7 |
Lee, T.-Y., Li, C.-C., Chou, K.-R., Chung, M.-H., Hsiao, S.-T., Guo, S.-L., Hung, L.-Y., and Wu, H.-T. (2023). Machine learning-based speech recognition system for nursing documentation – a pilot study. IJMI, 178:105213.
|
|
8 |
Li, B., Zhou, H., He, J., Wang, M., Yang, Y., and Li, L. (2020). On the sentence embeddings from pre-trained language models. In Webber, B., Cohn, T., He, Y., and Liu, Y., editors, Proceedings of the EMNLP, pages 9119–9130.
|
|
9 |
Li, J., Lavrukhin, V., Ginsburg, B., Leary, R., Kuchaiev, O., Cohen, J. M., Nguyen, H., and Gadde, R. T. (2019). Jasper: An end-to-end convolutional neural acoustic model. In Interspeech 2019, pages 71–75. ISCA.
|
|
10 |
Paats, A., Alumäe, T., Meister, E., and Fridolin, I. (2018). Retrospective analysis of clinical performance of an Estonian speech recognition system for radiology: effects of different acoustic and language models. JDI, 31(5):615–621.
|
|
11 |
Papineni, K., Roukos, S., Ward, T., and Zhu, W.-J. (2002). BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th ACL, pages 311–318, USA. Association for Computational Linguistics.
|
|
12 |
Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., and Sutskever, I. (2023). Robust speech recognition via large-scale weak supervision. In ICML, pages 28492–28518.
|
|
13 |
Reddy, D. R. (1976). Speech recognition by machine: A review. Proceedings of the IEEE, 64(4):501–531.
|
|
14 |
Rubenstein, P. K., Asawaroengchai, C., Nguyen, D. D., Bapna, A., Borsos, Z., Quitry, F. d. C., Chen, P., Badawy, D. E., Han, W., Kharitonov, E., et al. (2023). AudioPaLM: A large language model that can speak and listen. arXiv preprint arXiv:2306.12925.
|
|
15 |
Schneider, S., Baevski, A., Collobert, R., and Auli, M. (2019). wav2vec: Unsupervised pre-training for speech recognition. In Interspeech 2019, pages 3465–3469.
|
|
16 |
Sullivan, P., Shibano, T., and Abdul-Mageed, M. (2022). Improving automatic speech recognition for non-native English with transfer learning and language model decoding. In AANLSP, pages 21–44.
|
|
17 |
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Roziere, B., Goyal, N., Hambro, E., Azhar, F., et al. (2023). LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.
|
|
18 |
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. In NIPS, pages 6000–6010.
|
|
19 |
Wilcoxon, F. (1992). Individual comparisons by ranking methods. In Kotz, S. and Johnson, N. L., editors, Breakthroughs in Statistics: Methodology and Distribution, pages 196–202. Springer New York, New York, NY.
|
|