[Airflow 2022] Apache Airflow (2022). Apache Airflow. https://airflow.apache.org/.

[Armbrust 2020] Armbrust, M. (2020). Delta Lake: High-performance ACID table storage over cloud object stores. In Databricks. Databricks.

[Azkaban 2022] Azkaban (2022). Azkaban. https://azkaban.github.io/.

[de Oliveira et al. 2019] de Oliveira, D. C. M., Liu, J., and Pacitti, E. (2019). Data-Intensive Workflow Management: For Clouds and Data-Intensive and Scalable Computing Environments. Synthesis Lectures on Data Management. Morgan & Claypool Publishers.

[Gottin et al. 2018] Gottin, V., Pacheco, E., Dias, J., Ciarlini, A., Costa, B., Vieira, W., Souto, Y., Pires, P., Porto, F., and Rittmeyer, J. (2018). Automatic caching decision for scientific dataflow execution in Apache Spark. In Proc. of the BeyondMR, pages 1–10.

[Heidsieck et al. 2020] Heidsieck, G., de Oliveira, D., Pacitti, E., Pradal, C., Tardieu, F., and Valduriez, P. (2020). Distributed caching of scientific workflows in multisite cloud. In Database and Expert Systems Applications, volume 12392, pages 51–65. Springer.

[Hopkins 2022] Johns Hopkins University (2022). COVID-19 data repository by the Center for Systems Science and Engineering at Johns Hopkins University. https://github.com/cssegisanddata/covid-19.

[Jain 2017] Jain, A. (2017). Mastering Apache Storm: Real-Time Big Data Streaming Using Kafka, HBase and Redis. Packt Publishing.

[Lakshman and Malik 2009] Lakshman, A. and Malik, P. (2009). Cassandra - a decentralized structured storage system. In Cs Cornell. Cs Cornell.

[Prefect 2022] Prefect (2022). Prefect - the new standard in dataflow automation. https://www.prefect.io/.

[Zaharia et al. 2016] Zaharia, M., Xin, R. S., Wendell, P., Das, T., Armbrust, M., Dave, A., Meng, X., Rosen, J., Venkataraman, S., Franklin, M. J., Ghodsi, A., Gonzalez, J., Shenker, S., and Stoica, I. (2016). Apache Spark: A unified engine for big data processing. Commun. ACM, 59(11):56–65.