1 |
Comet. https://www.comet.com/. Accessed: 2025-06-03.
|
|
2 |
Dlprov provenance data model. ProvenanceDataModel. https://github.com/dbpina/dlprov/. Accessed: 2025-06-03.
|
|
3 |
Mlflow. https://mlflow.org/. Accessed: 2025-06-03.
|
|
4 |
Neo4j. https://neo4j.com. Accessed: 2025-06-03.
|
|
5 |
Prov. https://pypi.org/project/prov/. Accessed: 2025-06-03.
|
|
6 |
Prov database connector. https://github.com/DLR-SC/prov-db-connector. Accessed: 2025-06-03.
|
|
7 |
prov2neo. https://github.com/DLR-SC/prov2neo. Accessed: 2025-06-03.
|
|
8 |
Weights and biases. https://wandb.ai/site/. Accessed: 2025-06-03.
|
|
9 |
Saleema Amershi, Andrew Begel, Christian Bird, Robert DeLine, Harald Gall, Ece Kamar, Nachiappan Nagappan, Besmira Nushi, and Thomas Zimmermann. Software engineering for machine learning:
A case study. In 2019 IEEE/ACM 41st International Conference on Software Engineering: Software
Engineering in Practice (ICSE-SEIP), pages 291–300, 2019.
|
|
10 |
Matthias Boehm, Arun Kumar, and Jun Yang. Data management in machine learning systems. Synthesis
Lectures on Data Management, 11(1):1–173, 2019.
|
|
11 |
Adriane Chapman, Paolo Missier, Giulia Simonelli, and Riccardo Torlone. Capturing and querying fine-grained provenance of preprocessing pipelines in data science. Proceedings of the VLDB Endowment,
14(4):507–520, 2020.
|
|
12 |
Andrew Chen, Andy Chow, Aaron Davidson, Arjun DCunha, Ali Ghodsi, Sue Ann Hong, Andy
Konwinski, Clemens Mewald, Siddharth Murching, Tomas Nykodym, Paul Ogilvie, Mani Parkhe, Avesh
Singh, Fen Xie, Matei Zaharia, Richard Zang, Juntai Zheng, and Corey Zumar. Developments in mlflow:
A system to accelerate the machine learning lifecycle. DEEM’20. ACM, 2020.
|
|
13 |
Lyncoln de Oliveira, Rômulo Silva, Liliane Kunstmann, Débora Pina, Daniel de Oliveira, Alvaro Coutinho, and Marta Mattoso. Dados de proveniência para redes neurais guiadas pela física: o caso da equação
eikonal. In Anais do XXXVII Simpósio Brasileiro de Bancos de Dados, pages 373–378, Porto Alegre, RS,
Brasil, 2022. SBC.
|
|
14 |
Lyncoln S. de Oliveira, Liliane Kunstmann, Débora Pina, Daniel de Oliveira, and Marta Mattoso. Pinnprov: Provenance for physics-informed neural networks. In 2023 International Symposium on Computer
Architecture and High Performance Computing Workshops (SBAC-PADW), pages 16–23, 2023.
|
|
15 |
Rafael Ferreira da Silva, Deborah Bard, Kyle Chard, Shaun DeWitt, Ian T. Foster, Tom Gibbs, Carole
Goble, William Godoy, Johan Gustafsson, Utz-Uwe Haus, Stephen Hudson, Shantenu Jha, Laila Los,
Drew Paine, Frederic Suter, Logan Ward, Sean Wilkinson, Marcos Amaris, Yadu Babuji, Jonathan Bader,
Riccardo Balin, Daniel Balouek, Sarah Beecroft, Khalid Belhajjame, Rajat Bhattarai, Wes Brewer, Paul
Brunk, Silvina Caino-Lores, Henri Casanova, Daniela Cassol, Jared Coleman, Taina Coleman, Iacopo
Colonnelli, Anderson Andrei Da Silva, Daniel de Oliveira, Pascal Elahi, Nour Elfaramawy, Wael Elwasif,
Brian Etz, Thomas Fahringer, Wesley Ferreira, Rosa Filgueira, Jacob Fosso Tande, Luiz Gadelha,
Andy Gallo, Daniel Garijo, Yiannis Georgiou, Philipp Gritsch, Patricia Grubel, Amal Gueroudji, Quentin
Guilloteau, Carlo Hamalainen, Rolando Hong Enriquez, Lauren Huet, Kevin Hunter Kesling, Paula
Iborra, Shiva Jahangiri, Jan Janssen, Joe Jordan, Sehrish Kanwal, Liliane Kunstmann, Fabian Lehmann,
Ulf Leser, Chen Li, Peini Liu, Jakob Luettgau, Richard Lupat, Jose M. Fernandez, Ketan Maheshwari,
Tanu Malik, Jack Marquez, Motohiko Matsuda, Doriana Medic, Somayeh Mohammadi, Alberto Mulone,
John-Luke Navarro, Kin Wai Ng, Klaus Noelp, Bruno P. Kinoshita, Ryan Prout, Michael R. Crusoe,
Sashko Ristov, Stefan Robila, Daniel Rosendo, Billy Rowell, Jedrzej Rybicki, Hector Sanchez, Nishant
Saurabh, Sumit Kumar Saurav, Tom Scogland, Dinindu Senanayake, Woong Shin, Raul Sirvent, Tyler
Skluzacek, Barry Sly-Delgado, Stian Soiland-Reyes, Abel Souza, Renan Souza, Domenico Talia, Nathan
Tallent, Lauritz Thamsen, Mikhail Titov, Ben Tovar, Karan Vahi, Eric Vardar-Irrgang, Edite Vartina, Yuandou Wang, Merridee Wouters, Qi Yu, Ziad Al Bkhetan, and Mahnoor Zulfiqar. Workflows community
summit 2024: Future trends and challenges in scientific workflows. (ORNL/TM-2024/3573), 2024.
|
|
16 |
Juliana Freire, David Koop, Emanuele Santos, and Cláudio T. Silva. Provenance for computational tasks:
A survey. Computing in Science & Engineering, 10(3):11–21, 2008.
|
|
17 |
Gharib Gharibi, Vijay Walunj, Raju Nekadi, Raj Marri, and Yugyung Lee. Automated end-to-end management of the modeling lifecycle in deep learning. Empirical Software Engineering, 26:1–33, 2021.
|
|
18 |
Melanie Herschel, Ralf Diestelkämper, and Houssem Ben Lahmar. A survey on provenance: What for?
what form? what from? The VLDB Journal, 26:881–906, 2017.
|
|
19 |
Samuel Idowu, Osman Osman, Daniel Strüber, and Thorsten Berger. Machine learning experiment management tools: a mixed-methods empirical study. Empirical Software Engineering, 29(4):1–35, 2024.
|
|
20 |
Liliane Kunstmann, Débora Pina, Filipe Silva, Aline Paes, Patrick Valduriez, Daniel de Oliveira, and
Marta Mattoso. Online deep learning hyperparameter tuning based on provenance analysis. Journal of
Information and Data Management, 12(5), Nov. 2021.
|
|
21 |
Simone Leo, Michael R. Crusoe, Laura Rodríguez-Navas, Raül Sirvent, Alexander Kanitz, Paul De Geest, Rudolf Wittner, Luca Pireddu, Daniel Garijo, José M. Fernández, Iacopo Colonnelli, Matej Gallo,
Tazro Ohta, Hirotaka Suetake, Salvador Capella-Gutierrez, Renske de Wit, Bruno P. Kinoshita, and Stian
Soiland-Reyes. Recording provenance of workflow runs with ro-crate. PLOS ONE, 19(9):1–35, Sep. 2024.
|
|
22 |
Hui Miao, Ang Li, Larry S Davis, and Amol Deshpande. Modelhub: Towards unified data and lifecycle
management for deep learning. arXiv preprint arXiv:1611.06224, 2016.
|
|
23 |
Hui Miao, Ang Li, Larry S. Davis, and Amol Deshpande. Modelhub: Deep learning lifecycle management. In 2017 IEEE 33rd International Conference on Data Engineering (ICDE), pages 1393–1394,
2017.
|
|
24 |
Marçal Mora-Cantallops, Salvador Sánchez-Alonso, Elena García-Barriocanal, and Miguel-Angel Sicilia. Traceability for trustworthy ai: A review of models and tools. Big Data and Cognitive Computing,
5(2):20, 2021.
|
|
25 |
Luc Moreau and Paul Groth. Provenance: an introduction to prov. Synthesis Lectures on the Semantic Web: Theory and Technology, 3(4):1–129, 2013.
|
|
26 |
Mohammad Hossein Namaki, Avrilia Floratou, Fotis Psallidas, Subru Krishnan, Ashvin Agrawal,
Yinghui Wu, Yiwen Zhu, and Markus Weimer. Vamsa: Automated provenance tracking in data science
scripts. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, KDD ’20, pages 1542–1551, New York, NY, USA, 2020. Association for Computing
Machinery.
|
|
27 |
David Nigenda, Zohar Karnin, Muhammad Bilal Zafar, Raghu Ramesha, Alan Tan, Michele Donini,
and Krishnaram Kenthapadi. Amazon sagemaker model monitor: A system for real-time insights into
deployed machine learning models. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge
Discovery and Data Mining, pages 3671–3681, 2022.
|
|
28 |
Débora Pina, Adriane Chapman, Daniel De Oliveira, and Marta Mattoso. Deep learning provenance data
integration: a practical approach. WWW ’23 Companion, pages 1542–1550. ACM, 2023.
|
|
29 |
Débora Pina, Adriane Chapman, Liliane Kunstmann, Daniel de Oliveira, and Marta Mattoso. Dlprov: A
data-centric support for deep learning workflow analyses. DEEM ’24, pages 77–85. ACM, 2024.
|
|
30 |
Débora Pina, Liliane Kunstmann, Adriane Chapman, Daniel de Oliveira, and Marta Mattoso. Dlprov: a suite of provenance services for deep learning workflow analyses. PeerJ Comp. Sci., 11:e2985, 2025.
|
|
31 |
Débora Pina, Liliane Kunstmann, Felipe Bevilaqua, Isabela Siqueira, Alan Lyra, Daniel de Oliveira, and
Marta Mattoso. Capturing provenance from deep learning applications using keras-prov and colab: a
practical approach. Journal of Information and Data Management, 13(5), Dec. 2022.
|
|
32 |
Débora Pina, Liliane Kunstmann, Daniel de Oliveira, and Marta Mattoso. Breadcrumbs for your deep
learning model: Following provenance traces with dlprov. Software Impacts, 23:100730, 2025.
|
|
33 |
Débora Pina, Liliane Neves, Daniel de Oliveira, and Marta Mattoso. Captura automática de dados de
proveniência de experimentos de aprendizado de máquina com keras-prov. In Anais Estendidos do XXXVI
Simpósio Brasileiro de Bancos de Dados, pages 69–74, Porto Alegre, RS, Brasil, 2021. SBC.
|
|
34 |
Jim Pruyne, Justin M. Wozniak, and Ian Foster. Tracking dubious data: Protecting scientific workflows
from invalidated experiments. In 2022 IEEE 18th International Conference on e-Science (e-Science),
pages 456–461, 2022.
|
|
35 |
Sebastian Schelter, Joos-Hendrik Böse, Johannes Kirschnick, Thoralf Klein, and Stephan Seufert. Automatically tracking metadata and provenance of machine learning experiments. In Machine Learning
Systems workshop at the conference on Neural Information Processing Systems (NIPS), 2017.
|
|
36 |
Marius Schlegel and Kai-Uwe Sattler. Management of machine learning lifecycle artifacts: A survey.
SIGMOD Rec., 51(4):18–35, Jan. 2023.
|
|
37 |
Marius Schlegel and Kai-Uwe Sattler. Mlflow2prov: Extracting provenance from machine learning experiments. In Proceedings of the Seventh Workshop on Data Management for End-to-End Machine Learning, DEEM ’23, New York, NY, USA, 2023. Association for Computing Machinery.
|
|
38 |
Shreya Shankar and Aditya G. Parameswaran. Towards observability for production machine learning
pipelines. Proc. VLDB Endow., 15(13):4015–4022, Sep. 2022.
|
|
39 |
Filipe Silva, Débora Pina, Liliane Kunstmann, and Marta Mattoso. Painel de proveniência: análise
durante o treinamento de redes neurais profundas. In Anais Estendidos do XXXVI Simpósio Brasileiro de
Bancos de Dados, pages 22–28, Porto Alegre, RS, Brasil, 2021. SBC.
|
|
40 |
Rômulo Silva, Débora Pina, Liliane Kunstmann, Daniel de Oliveira, Patrick Valduriez, Alvaro Coutinho,
and Marta Mattoso. Capturing provenance to improve the model training of pinns: first hand-on experiences with grid5000. In 42nd Ibero-Latin-American Congress on Computational Methods in Engineering
and 3rd Pan American Congress on Computational Mechanics, pages 1–7, 2021.
|
|
41 |
Vítor Silva, Daniel de Oliveira, Patrick Valduriez, and Marta Mattoso. Dfanalyzer: runtime dataflow
analysis of scientific applications using provenance. Proc. VLDB Endow., 11(12):2082–2085, 2018.
|
|
42 |
Renan Souza, Leonardo G. Azevedo, Vítor Lourenço, Elton Soares, Raphael Thiago, Rafael Brandão,
Daniel Civitarese, Emilio Vital Brazil, Marcio Moreno, Patrick Valduriez, Marta Mattoso, Renato Cerqueira, and Marco A. S. Netto. Workflow provenance in the lifecycle of scientific machine learning.
Concurrency and Computation: Practice and Experience, 34(14):e6544, 2022.
|
|
43 |
Renan Souza, Silvina Caino-Lores, Mark Coletti, Tyler J. Skluzacek, Alexandru Costan, Frédéric Suter,
Marta Mattoso, and Rafael Ferreira Da Silva. Workflow provenance in the computing continuum for
responsible, trustworthy, and energy-efficient ai. In 2024 IEEE 20th International Conference on
e-Science (e-Science), pages 1–7, 2024.
|
|
44 |
Jason Tsay, Todd Mummert, Norman Bobroff, Alan Braz, Peter Westerink, and Martin Hirzel. Runway:
machine learning model experiment management tool. In Conference on systems and machine learning
(sysML), 2018.
|
|
45 |
Manasi Vartak and Samuel Madden. Modeldb: Opportunities and challenges in managing machine learning models. IEEE Data Eng. Bull., 41(4):16–25, 2018.
|
|
46 |
Manasi Vartak, Harihar Subramanyam, Wei-En Lee, Srinidhi Viswanathan, Saadiyah Husnoo, Samuel
Madden, and Matei Zaharia. Modeldb: a system for machine learning model management. In Proceedings of the Workshop on Human-In-the-Loop Data Analytics, pages 1–3, 2016.
|
|
47 |
Justin M. Wozniak, Zhengchun Liu, Rafael Vescovi, Ryan Chard, Bogdan Nicolae, and Ian Foster. Braid-db: Toward ai-driven science with machine learning provenance. In Jeffrey Nichols, Arthur ‘Barney’
Maccabe, James Nutaro, Swaroop Pophale, Pravallika Devineni, Theresa Ahearn, and Becky Verastegui,
editors, Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data,
and Modeling and Simulation, pages 247–261, Cham, 2022. Springer International Publishing.
|
|
48 |
Matei Zaharia, Andrew Chen, Aaron Davidson, Ali Ghodsi, Sue Ann Hong, Andy Konwinski, Siddharth
Murching, Tomas Nykodym, Paul Ogilvie, Mani Parkhe, Fen Xie, and Corey Zumar. Accelerating the machine learning lifecycle with mlflow. IEEE Data Eng. Bull., 41(4):39–45, 2018.
|
|