ISG Talks are sponsored by Couchbase.

Jiadong Bai: Supporting Data Science Education Using Texera with a Cloud Infrastructure
February 14 @ 1:00 pm - 2:00 pm
Abstract
In this talk, we will first present our paper at the DSE-K12 conference, titled “DS4ALL: Teaching High-School Students Data Science and AI/ML Using the Texera Workflow Platform as a Service.” Traditional data science education often requires students to have programming experience and to install software locally. It also makes collaboration inefficient and slows down the feedback loop when students need help from TAs. To address these challenges, we developed a new teaching paradigm using Texera and successfully hosted DS4ALL summer programs in 2023 and 2024. As a result, students with no prior coding experience were able to perform data analysis using AI/ML techniques on the platform.
The second part of the talk will focus on how we are scaling Texera to a broader audience, building on its success in DS4ALL. Building such a cloud infrastructure poses several challenges, such as handling varying service traffic, supporting concurrent isolated workflow executions, and managing diverse data storage needs. To address these challenges, we run Texera on Kubernetes to achieve scalable service management. For workflow execution, we run each workflow in an isolated Kubernetes pod to ensure performance and security. On the data storage side, we design a storage layer that integrates LakeFS for managing versioned datasets and Apache Iceberg for handling versioned tables with support for concurrent read/write operations. These solutions allow Texera to be deployed as a cloud service that provides a scalable, secure, and efficient environment for data science workflows.
Bio
Jiadong Bai is a second-year Ph.D. student in the Computer Science Department at UC Irvine, with research interests in data systems, data science, and big data analysis. He is supervised by Prof. Chen Li.
Shengquan Ni is a sixth-year Ph.D. student in the Department of Computer Science advised by Professor Chen Li. His research interests include big data processing, distributed systems, data analytics, and data science.