ISG Talks are sponsored by Couchbase.
Sadeem Alsudais: Drove: Tracking Execution Results of Workflows on Large Data
October 7, 2022 @ 12:30 pm - 2:00 pm
Data analytics using workflows is an iterative process in which an analyst makes many rounds of changes, such as adding, deleting, and altering operators and their links. In many cases, the analyst wants to compare these workflow versions and their execution results to help decide the next round of changes. To this end, we introduce Drove, a framework that manages the end-to-end lifecycle of constructing, refining, and executing workflows on large data sets and provides a dashboard for monitoring the execution results. Often, the result of an execution is equivalent to that of a prior one. Identifying such equivalence between the execution results of different workflow versions is important for finding reuse opportunities. In Drove, we reason about the semantic equivalence of workflow versions by leveraging existing Equivalence Verifiers (EVs), so that previously stored results can be reused. In this talk, I will discuss a novel technique called a “covering window,” which covers the edits between workflow versions in order to reason about their effect on the results. This technique can be applied to find reuse opportunities not only for final results but also for intermediate ones. Finally, I will demonstrate a prototype of Drove’s dashboard in Texera.
Sadeem Alsudais is a Ph.D. student in the Computer Science department at UC Irvine. She received her M.Sc. in Software Engineering from USC and her B.Sc. in Information Technology from King Saud University. Her research interests lie in the fields of Big Data processing and visualization. She is a recipient of the 2018 KSU scholarship award.