Bau, A., Belinkov, Y., Sajjad, H., Durrani, N., Dalvi, F., and Glass, J. (2018). Identifying and controlling important neurons in neural machine translation.
Bengio, Y., Ducharme, R., and Vincent, P. (2000). A neural probabilistic language model. In Leen, T., Dietterich, T., and Tresp, V., editors, Advances in Neural Information Processing Systems, volume 13. MIT Press.
Costa, L., Figênio, M., Santanchè, A., and Gomes-Jr, L. (2024). LLM-MRI Python module: a brain scanner for LLMs. In Anais Estendidos do XXXIX Simpósio Brasileiro de Bancos de Dados, pages 125–130, Porto Alegre, RS, Brasil. SBC.
Cunningham, H., Ewart, A., Riggs, L., Huben, R., and Sharkey, L. (2023). Sparse autoencoders find highly interpretable features in language models.
Dalvi, F., Durrani, N., Sajjad, H., Belinkov, Y., Bau, A., and Glass, J. (2019). What is one grain of sand in the desert? Analyzing individual neurons in deep NLP models. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01):6309–6317.
DeRose, J. F., Wang, J., and Berger, M. (2020). Attention flows: Analyzing and comparing attention mechanisms in language models.
Figênio, M., Santanchè, A., and Gomes-Jr, L. (2024). The impact of activation patterns in the explainability of large language models – a survey of recent advances. In Anais da XIX Escola Regional de Banco de Dados, pages 141–149, Porto Alegre, RS, Brasil. SBC.
Figênio, M. R. and Gomes-Jr, L. (2023). Ética na era dos modelos de linguagem massivos (LLMs): um estudo de caso do ChatGPT. In Anais da XVIII Escola Regional de Banco de Dados (ERBD 2023), page 100, Brasil.
Hiter, S. (2024). Top 20 generative AI tools and applications in 2024. Available at: https://www.eweek.com/artificial-intelligence/generative-ai-apps-tools/.
Hoover, B., Strobelt, H., and Gehrmann, S. (2019). exBERT: A visual analysis tool to explore learned representations in transformer models.
Horta, V. A., Tiddi, I., Little, S., and Mileo, A. (2021). Extracting knowledge from deep neural networks through graph analysis. Future Generation Computer Systems, 120:109–118.
Costa, L. da F., Rodrigues, F. A., Travieso, G., and Villas Boas, P. R. (2007). Characterization of complex networks: A survey of measurements. Advances in Physics, 56(1):167–242.
Lieberum, T., Rajamanoharan, S., Conmy, A., Smith, L., Sonnerat, N., Varma, V., Kramár, J., Dragan, A., Shah, R., and Nanda, N. (2024). Gemma Scope: Open sparse autoencoders everywhere all at once on Gemma 2.
Naveed, H., Khan, A. U., Qiu, S., Saqib, M., Anwar, S., Usman, M., Akhtar, N., Barnes, N., and Mian, A. (2024). A comprehensive overview of large language models.
Samek, W., Wiegand, T., and Müller, K.-R. (2017). Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models.
Schmidt, H. G. and Rikers, R. M. J. P. (2007). How expertise develops in medicine: knowledge encapsulation and illness script formation. Medical Education, 41(12):1133–1139.
Tunstall, L., von Werra, L., and Wolf, T. (2022). Natural language processing with transformers. O'Reilly Media, Inc.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems.
Zhang, B., He, Z., and Lin, H. (2024). A comprehensive review of deep neural network interpretation using topological data analysis. Neurocomputing, 609:128513.
Zhao, H., Chen, H., Yang, F., Liu, N., Deng, H., Cai, H., Wang, S., Yin, D., and Du, M. (2024a). Explainability for large language models: A survey. ACM Transactions on Intelligent Systems and Technology. Just Accepted.
Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., Du, Y., Yang, C., Chen, Y., Chen, Z., Jiang, J., Ren, R., Li, Y., Tang, X., Liu, Z., Liu, P., Nie, J.-Y., and Wen, J.-R. (2024b). A survey of large language models.