Bau, A., Belinkov, Y., Sajjad, H., Durrani, N., Dalvi, F., and Glass, J. (2018). Identifying and controlling important neurons in neural machine translation.
Bengio, Y., Ducharme, R., and Vincent, P. (2000). A neural probabilistic language model. In Leen, T., Dietterich, T., and Tresp, V., editors, Advances in Neural Information Processing Systems, volume 13. MIT Press.
Costa, L., Figênio, M., Santanchè, A., and Gomes-Jr, L. (2024). LLM-MRI Python module: a brain scanner for LLMs. In Anais Estendidos do XXXIX Simpósio Brasileiro de Bancos de Dados, pages 125–130, Porto Alegre, RS, Brasil. SBC.
Cunningham, H., Ewart, A., Riggs, L., Huben, R., and Sharkey, L. (2023). Sparse autoencoders find highly interpretable features in language models.
Dalvi, F., Durrani, N., Sajjad, H., Belinkov, Y., Bau, A., and Glass, J. (2019). What is one grain of sand in the desert? Analyzing individual neurons in deep NLP models. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01):6309–6317.
DeRose, J. F., Wang, J., and Berger, M. (2020). Attention flows: Analyzing and comparing attention mechanisms in language models.
Figênio, M., Santanchè, A., and Gomes-Jr, L. (2024). The impact of activation patterns in the explainability of large language models – a survey of recent advances. In Anais da XIX Escola Regional de Banco de Dados, pages 141–149, Porto Alegre, RS, Brasil. SBC.
Figênio, M. R. and Gomes-Jr, L. (2023). Ética na era dos modelos de linguagem massivos (LLMs): um estudo de caso do ChatGPT. In Anais da XVIII Escola Regional de Banco de Dados (ERBD 2023), page 100, Brasil.
Hiter, S. (2024). Top 20 generative AI tools and applications in 2024. Available at: https://www.eweek.com/artificial-intelligence/generative-ai-apps-tools/.
Hoover, B., Strobelt, H., and Gehrmann, S. (2019). exBERT: A visual analysis tool to explore learned representations in transformer models.
Horta, V. A., Tiddi, I., Little, S., and Mileo, A. (2021). Extracting knowledge from deep neural networks through graph analysis. Future Generation Computer Systems, 120:109–118.
Costa, L. da F., Rodrigues, F. A., Travieso, G., and Villas Boas, P. R. (2007). Characterization of complex networks: A survey of measurements. Advances in Physics, 56(1):167–242.
Lieberum, T., Rajamanoharan, S., Conmy, A., Smith, L., Sonnerat, N., Varma, V., Kramár, J., Dragan, A., Shah, R., and Nanda, N. (2024). Gemma Scope: Open sparse autoencoders everywhere all at once on Gemma 2.
Naveed, H., Khan, A. U., Qiu, S., Saqib, M., Anwar, S., Usman, M., Akhtar, N., Barnes, N., and Mian, A. (2024). A comprehensive overview of large language models.
Samek, W., Wiegand, T., and Müller, K.-R. (2017). Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models.
Schmidt, H. G. and Rikers, R. M. J. P. (2007). How expertise develops in medicine: knowledge encapsulation and illness script formation. Medical Education, 41(12):1133–1139.
Tunstall, L., von Werra, L., and Wolf, T. (2022). Natural language processing with transformers. O'Reilly Media, Inc.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems.
Zhang, B., He, Z., and Lin, H. (2024). A comprehensive review of deep neural network interpretation using topological data analysis. Neurocomputing, 609:128513.
Zhao, H., Chen, H., Yang, F., Liu, N., Deng, H., Cai, H., Wang, S., Yin, D., and Du, M. (2024a). Explainability for large language models: A survey. ACM Transactions on Intelligent Systems and Technology. Just Accepted.
Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., Du, Y., Yang, C., Chen, Y., Chen, Z., Jiang, J., Ren, R., Li, Y., Tang, X., Liu, Z., Liu, P., Nie, J.-Y., and Wen, J.-R. (2024b). A survey of large language models.