BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Information Systems Group - ECPv6.4.0.1//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Information Systems Group
X-ORIGINAL-URL:https://isg.ics.uci.edu
X-WR-CALDESC:Events for Information Systems Group
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20240310T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20241103T090000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20250309T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20251102T090000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240927T110000
DTEND;TZID=America/Los_Angeles:20240927T120000
DTSTAMP:20260604T185834
CREATED:20241007T164605Z
LAST-MODIFIED:20260417T191550Z
UID:2100-1727434800-1727438400@isg.ics.uci.edu
SUMMARY:Aditya Parameswaran (Berkeley): Enhance\, Don't Replace: A Recipe for Success in Data Tooling
DESCRIPTION:Enhance\, Don’t Replace: A Recipe for Success in Data Tooling\nAbstract: Most data analysis and data science is performed in human-centered tools\, such as spreadsheets\, visual analytics tools\, and data science libraries. However\, these tools often pose challenges for end-users\, especially those without extensive programming expertise\, in terms of scalability\, interactivity\, and usability. \nRather than forcing such users to switch tools\, over the past decade we’ve instead taken the approach of enhancing existing tools to address these challenges. To do so\, we draw on techniques from data management and human-computer interaction. In my talk\, I’ll describe a couple of successful examples of tools that have been widely adopted by end-users. Finally\, we will reflect on how our recipe (enhancing existing tools as opposed to replacing them) may need revisiting in the exciting arena of LLM-powered data work\, which is the focus of our new EPIC Data lab at Berkeley. \nBio: Aditya Parameswaran is an Associate Professor in EECS at UC Berkeley. He works in the broad area of human-centered data science\, developing usable\, robust\, scalable\, and intuitive data science tools. His open-source tools have received thousands of GitHub stars and have been downloaded millions of times across a wide spectrum of industries. Ponder\, a company that Aditya cofounded with his students in 2021 based on open-source tooling developed as part of his research\, was acquired in 2023 by Snowflake\, the leading cloud data warehouse vendor. Aditya has received the Alfred P. Sloan Research Fellowship\, the VLDB Early Career Research Contributions Award\, the ARO Young Investigator Program Award\, the NSF CAREER Award\, the TCDE Rising Star Award\, and a number of best paper awards\, along with other recognitions. \nWebsite: http://adityagp.net
URL:https://isg.ics.uci.edu/event/aditya-parameswaran-berkeley-enhance-dont-replace-a-recipe-for-success-in-data-tooling/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20241011T130000
DTEND;TZID=America/Los_Angeles:20241011T140000
DTSTAMP:20260604T185834
CREATED:20241007T164906Z
LAST-MODIFIED:20250211T004315Z
UID:2102-1728651600-1728655200@isg.ics.uci.edu
SUMMARY:Arnab Nandi (OSU): Data Exploration in a Camera-first World: Query and Result Challenges
DESCRIPTION:Prof. Arnab Nandi \n \nAssociate Professor\, Computer Science and Engineering \nThe Ohio State University \nFriday\, October 11\, 2024\nat 11 a.m.\nDonald Bren Hall 6011 \nTitle: “Data Exploration in a Camera-first World: Query and Result Challenges” \nAbstract: The pervasive availability of cameras in smartphones\, vehicles\, drones\, and more has triggered a new “camera-first” data revolution across industries. When combined with rapid advances in computer vision and machine learning\, this video data deluge presents several data exploration challenges. Here\, we will talk about two complementary challenges for end-users: query specification and result consumption. \nWhen searching large video collections\, the first challenge is that the user is often unaware of the contents of the video\, its structure\, and the exact terminology to use in the query\, leaving them at a loss for where to begin. Here\, we present methods that guide the user through the query construction process by building on vision language models and search query interfaces. \nOnce users have executed a search\, they face a new challenge: result consumption. Presenting query results as a list of links poses an impedance mismatch\, since links are cumbersome to skim through and are in a different modality from the source data. However\, processing large video collections within interactive response times has performance implications. We present V2V\, a system that efficiently synthesizes video results for video queries. V2V returns a fully edited video\, allowing the user to consume results in the same modality as the source videos and resulting in a fluid\, user-centric video exploration experience. \nBio: Arnab’s work focuses on bridging data infrastructure with human interaction\, spanning the areas of database systems\, human factors\, and next-generation interfaces. 
Arnab is a recipient of the US National Science Foundation’s CAREER Award\, IEEE’s TCDE Early Career Award for his contributions towards user-focused data interaction\, The Ohio State University’s Alumni Award for Distinguished Teaching\, and the University’s Early Career Innovator of the Year Award. \nOver the years\, Arnab has served as a Program Committee member and an Associate Editor for several database systems journals and conferences\, including SIGMOD\, VLDB\, ICDE\, and HILDA. Most recently\, Arnab served as Vice President of Data Science at Azuga Inc. (a Bridgestone company) after the acquisition of his connected-vehicle analytics startup\, Mobikit. \nhttps://arnab.org/
URL:https://isg.ics.uci.edu/event/arnab-nandi-osu-data-exploration-in-a-camera-first-world-query-and-result-challenges/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20241017T150000
DTEND;TZID=America/Los_Angeles:20241017T160000
DTSTAMP:20260604T185834
CREATED:20241011T010505Z
LAST-MODIFIED:20250211T004309Z
UID:2128-1729177200-1729180800@isg.ics.uci.edu
SUMMARY:Nika Mansouri Ghiasi (ETH): Storage-Centric Computing for Genomics and Metagenomics
DESCRIPTION:Title: Storage-Centric Computing for Genomics and Metagenomics \nAbstract \nGenomics and metagenomics applications have enabled significant advancements in many critical areas. However\, the exponential growth of genomic data poses unprecedented challenges for these applications\, which suffer from significant data movement overheads from the storage system. To fundamentally address these overheads\, we make a case for storage-centric computing. \nFirst\, we propose GenStore\, the first in-storage processing system designed for genome sequence analysis\, which greatly reduces both the data movement and the computational overheads of genome sequence analysis by exploiting low-cost and accurate in-storage filters. We address the challenges of in-storage processing\, supporting reads with 1) different read lengths and error rates\, and 2) different degrees of genetic variation. Through rigorous analysis of read mapping processes\, we design low-cost hardware accelerators and data/computation flows inside a NAND flash-based SSD. Our evaluation using a wide range of real genomic datasets shows that GenStore significantly improves the read mapping performance of state-of-the-art software (hardware) baselines by 2.07-6.05× (1.52-3.32×) for read sets with high similarity to the reference genome and 1.45-33.63× (2.70-19.2×) for read sets with low similarity to the reference genome. \nSecond\, we propose MegIS\, the first in-storage processing system designed to significantly reduce the data movement overhead of the end-to-end metagenomic analysis pipeline. MegIS is enabled by our lightweight design that effectively leverages and orchestrates processing inside and outside the storage system. 
Through our detailed analysis of the end-to-end metagenomic analysis pipeline and careful hardware/software co-design\, we address in-storage processing challenges for metagenomics via specialized and efficient 1) task partitioning\, 2) data/computation flow coordination\, 3) storage technology-aware algorithmic optimizations\, 4) data mapping\, and 5) lightweight in-storage accelerators. MegIS’s design is flexible\, capable of supporting different types of metagenomic input datasets\, and can be integrated into various metagenomic analysis pipelines. Our evaluation shows that MegIS outperforms the state-of-the-art performance- and accuracy-optimized software metagenomic tools by 2.7×–37.2× and 6.9×–100.2×\, respectively\, while matching the accuracy of the accuracy-optimized tool. MegIS achieves a 1.5×–5.1× speedup compared to the state-of-the-art hardware-accelerated (processing-in-memory) metagenomic tool\, while achieving significantly higher accuracy. \nBio \nNika Mansouri Ghiasi is a Ph.D. candidate in the SAFARI Research Group at ETH Zürich\, working with Professor Onur Mutlu. Her current research interests are in computer architecture and bioinformatics\, focusing on 1) large-scale bioinformatics applications\, storage systems\, and their interactions\, and 2) emerging technologies such as ultra-dense 3D integrated systems. Nika has co-authored several works on these topics in major computer architecture venues such as ISCA\, ASPLOS\, and MICRO\, as well as major bioinformatics venues such as ISMB\, Bioinformatics\, and Nature Reviews.
URL:https://isg.ics.uci.edu/event/nika-mansouri-ghiasi-eth-storage-centric-computing-for-genomics-and-metagenomics/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20241018T110000
DTEND;TZID=America/Los_Angeles:20241018T120000
DTSTAMP:20260604T185834
CREATED:20241008T004750Z
LAST-MODIFIED:20241009T163638Z
UID:2109-1729249200-1729252800@isg.ics.uci.edu
SUMMARY:Yannis Papakonstantinou (Google): Vector Search and Databases
DESCRIPTION:Yannis Papakonstantinou\nDistinguished Engineer\, Query Processing and GenAI at Google Cloud Databases\nAbstract:\nSemantic search capability\, via embeddings (vectors) and vector indexing\, has been added to Google Cloud Platform (GCP) databases in order to enable GenAI applications. The inclusion of vectors in databases confers many of the traditional benefits of databases: developers can now build GenAI applications on their familiar and trusted databases. Furthermore\, developers can be sure that the vectors are up-to-date and transactionally consistent. The rapid adoption of the PostgreSQL pgvector extension is evidence that the database developer community appreciates these benefits.\nThe inclusion of vectors in databases raises three R&D questions\, which we will discuss in this talk.\nFirst\, can databases with vector capabilities perform as well as purpose-built vector databases in pure vector search? What does it take to achieve this?\nSecond\, what opportunities and R&D challenges emerge at the intersection of SQL data and vectors?\nFinally\, what does it take to facilitate and align the experience of SQL developers with the world of vector management and vector indexing? \nBio:\nYannis Papakonstantinou is a Distinguished Engineer\, working on Query Processing and GenAI\, at Google Cloud. He is also an Adjunct Professor of Computer Science and Engineering at the University of California\, San Diego\, following many years as a regular UCSD faculty member. Previously he was an architect in query processing & ETL at Databricks. Earlier\, he was a Senior Principal Scientist at Amazon Web Services from 2018 to 2021 and had been a consultant for AWS since 2016. He was the CEO and Chief Scientist of Enosys Software\, which built and commercialized an early Enterprise Information Integration platform for structured and semistructured data. 
Enosys Software was OEM’d and sold under the BEA Liquid Data and BEA AquaLogic brand names and was eventually acquired by BEA Systems in 2003.\nHis R&D work has been mostly on query processing\, with a focus on querying semistructured data. He has published more than 120 research articles that have received over 21\,000 citations. Yannis holds a Diploma in Electrical Engineering from the National Technical University of Athens and an MS and a Ph.D. in Computer Science from Stanford University (1997).
URL:https://isg.ics.uci.edu/event/yannis-papakonstantinou-google-vector-search-and-database/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20241101T130000
DTEND;TZID=America/Los_Angeles:20241101T140000
DTSTAMP:20260604T185834
CREATED:20241017T163643Z
LAST-MODIFIED:20241017T163750Z
UID:2133-1730466000-1730469600@isg.ics.uci.edu
SUMMARY:Michael Jungmair (TU Munich): A Compiler-Centric Query Engine Design for Mixed Workloads and Modern Hardware
DESCRIPTION:A Compiler-Centric Query Engine Design for Mixed Workloads and Modern Hardware \n11/1/2024\, 1:00 PM to 2:00 PM\, DBH 3011 \nMichael Jungmair\, Technical University of Munich\, Germany \nAbstract: Relational query engines are increasingly expected to handle more than just relational queries and to run on modern hardware that is ever more parallel and distributed. However\, it is not clear how existing system designs can deal with these two challenges effectively.\nWe propose a holistic\, compiler-centric design for data processing systems that enables tightly integrated optimization and execution of relational queries\, non-relational workloads\, and user-defined functions on modern hardware. \nBio: Michael Jungmair is a third-year PhD student at the Technical University of Munich. Supervised by Jana Giceva\, he conducts research at the intersection of database engines and compiler technology. So far\, this research has culminated in the design and implementation of LingoDB (lingo-db.com)\, a novel query engine based on the MLIR compiler framework.
URL:https://isg.ics.uci.edu/event/michael-jungmair-tu-munich-a-compiler-centric-query-engine-design-for-mixed-workloads-and-modern-hardware/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20241115T130000
DTEND;TZID=America/Los_Angeles:20241115T140000
DTSTAMP:20260604T185834
CREATED:20241008T012443Z
LAST-MODIFIED:20250211T004236Z
UID:2116-1731675600-1731679200@isg.ics.uci.edu
SUMMARY:Kunwoo Park: CloudMapper: A Pay-as-you-go Solution for Accelerating Genomics Sequence Alignment Using Public Clouds
DESCRIPTION:CloudMapper: A Pay-as-you-go Solution for Accelerating Genomics Sequence Alignment Using Public Clouds \nAbstract: Single-cell RNA sequencing (scRNA-seq) alignment remains a computational bottleneck in bioinformatics data analysis. As datasets grow in size and complexity\, traditional alignment tools\, such as CellRanger\, face significant limitations\, often requiring hours or even days. Furthermore\, setting up the necessary infrastructure frequently demands familiarity with complex tools like Slurm\, creating a barrier for researchers without cluster management expertise. To address these challenges\, we introduce CloudMapper\, a pay-as-you-go solution that simplifies and accelerates scRNA-seq alignment through scalable public cloud resources. Built on the Texera platform\, CloudMapper allows researchers to launch and manage clusters on cloud providers like AWS via an intuitive web interface\, enabling parallel processing of large scRNA-seq datasets. By automating infrastructure setup and providing streamlined resource options\, CloudMapper offers bioinformaticians flexible tools to balance cost and performance\, significantly reducing alignment time and technical overhead. In this talk\, we’ll explore CloudMapper’s problem-centered design\, architectural framework\, and user experience tailored for bioinformaticians. We’ll also discuss ongoing research challenges\, such as minimizing manual configuration for non-expert users\, optimizing resource pre-provisioning to reduce cluster launch times\, and future plans to expand CloudMapper’s capabilities beyond RNA alignment to broader bioinformatics and data-processing tasks. \nBio: Kunwoo Park is a second-year Ph.D. student in the Computer Science Department at UC Irvine\, with research interests in data systems and big data a
URL:https://isg.ics.uci.edu/event/kunwoo-park-talk/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20241122T110000
DTEND;TZID=America/Los_Angeles:20241122T120000
DTSTAMP:20260604T185834
CREATED:20241008T012638Z
LAST-MODIFIED:20260401T210003Z
UID:2118-1732273200-1732276800@isg.ics.uci.edu
SUMMARY:Sainyam Galhotra (Cornell): Context-aware Responsible Data Science
DESCRIPTION:Abstract: Data-based systems are increasingly used in applications that have far-reaching consequences and long-lasting societal impact. However\, the development process remains highly specialized\, tedious\, and unscalable. The result is a rigid\, manually fine-tuned solution that works only for one specific problem in one specific context. Such a system fails to adapt to the changing world and severely limits the full utilization of valuable data. \nSo\, how can you avert this fate for your systems? \nIn this talk\, I present my vision of context-aware systems that enable even non-expert users to develop correct\, explainable\, and equitable data-science pipelines. To achieve this\, I will focus on i) rethinking the design of data science pipelines\, and ii) the importance of causal inference for trustworthy data analysis. I will present a data discovery framework that automatically identifies useful data on behalf of end-users for various tasks. Lastly\, I will discuss my proposal of leveraging counterfactual reasoning and causal inference to quantify the impact of an input on the outcome. These topics are pieces of a puzzle that come together to create the data scientist’s holy grail: an easily deployable\, scalable\, and robust system that you can trust even as everything around it evolves. \nBio: Sainyam Galhotra is an Assistant Professor in Computer Science at Cornell University and a field member for Computer Science\, Statistics\, and Data Science. Previously\, he was a Computing Innovation Fellow pursuing postdoctoral research at the University of Chicago. He received his Ph.D. from the University of Massachusetts Amherst under the supervision of Prof. Barna Saha (currently at UC San Diego). The goal of his research is to lay the foundation of responsible data science\, enabling efficient development and deployment of trustworthy data analytics applications. 
His research has combined techniques from Data Management\, Probabilistic Methods\, Causal Inference\, Machine Learning\, and Software Engineering. His research has been published in top-tier Data Management (SIGMOD\, VLDB\, PODS\, & ICDE)\, AI (NeurIPS\, AAAI & AIES)\, and Software Engineering (FSE) conferences. He is a recipient of the Best Paper Award at FSE 2017\, the Most Reproducible Paper Award at both SIGMOD 2017 and SIGMOD 2018\, and the Best Artifact Paper Honorable Mention Award at SIGMOD 2023. He was recognized as a Data Science rising star\, a DAAD AInet Fellow\, and as the first recipient of the Krithi Ramamritham Award at UMass for contributions to database research. \nhttps://sainyamgalhotra.com/
URL:https://isg.ics.uci.edu/event/sainyam-galhotra-cornell/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20241206T130000
DTEND;TZID=America/Los_Angeles:20241206T140000
DTSTAMP:20260604T185834
CREATED:20241008T012801Z
LAST-MODIFIED:20250211T004551Z
UID:2120-1733490000-1733493600@isg.ics.uci.edu
SUMMARY:Binbin Gu: PoneglyphDB: Efficient Non-interactive Zero-Knowledge Proofs for Arbitrary SQL Queries Verification
DESCRIPTION:Abstract:\nIn database applications involving sensitive data\, the dual imperatives of data confidentiality and provable (verifiable) query processing are important. This paper introduces PoneglyphDB\, a database system that leverages non-interactive zero-knowledge proofs (ZKP) to support both confidentiality and provability. Unlike traditional databases\, PoneglyphDB enhances confidentiality by ensuring that raw data remains exclusively with the host\, while also enabling clients to verify the correctness of query responses through the proofs provided to them. \nThe main innovation in this paper is a set of efficient ZKP designs (called circuits) for basic operations in SQL query processing. These basic operation circuits are then combined to form ZKP circuits for larger\, more complex queries. PoneglyphDB’s circuits are carefully designed to be efficient by utilizing advances in cryptography\, such as PLONKish-based circuits and recursive proof composition techniques\, and by designing with low-order polynomial constraints. We demonstrate the performance of PoneglyphDB with the standard TPC-H benchmark. Our experimental results show that PoneglyphDB can efficiently achieve both confidentiality and provability\, outperforming existing state-of-the-art ZKP methods.
URL:https://isg.ics.uci.edu/event/binbin-gu-tbd/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250110T130000
DTEND;TZID=America/Los_Angeles:20250110T140000
DTSTAMP:20260604T185834
CREATED:20250211T004808Z
LAST-MODIFIED:20250211T004808Z
UID:2160-1736514000-1736517600@isg.ics.uci.edu
SUMMARY:Shengquan Ni: IcedTea: Efficient and Responsive Time-Travel Debugging in Dataflow Systems
DESCRIPTION:Abstract: As data analytics grow in popularity\, the increasing volume of data and complexity of jobs require users to wait longer to see results\, hindering productivity and causing frustration. To address this\, we developed an actor-based data processing engine optimized for pipelined execution\, featuring a flexible interface for defining control messages. This enables users to seamlessly customize and manage interactions during execution.\nWhile interactive systems help users identify incorrect behavior earlier\, the pipelined and distributed nature of execution often leads to non-deterministic behavior\, making it difficult to pinpoint the root cause of bugs. To tackle this\, we created IcedTea\, a time-travel debugger with tuple-based step semantics. IcedTea enables lightweight recording of execution\, allowing users to roll back to previous states and step forward to investigate issues effectively.\nAs data jobs increasingly operate in cloud environments\, adapting the system to the cloud is crucial. Challenges such as resource isolation and fault tolerance must be addressed to ensure security and reliability in distributed systems.\nIn this talk\, I will focus on IcedTea\, a time-travel debugger that allows users to record a pipelined distributed workflow execution and jump back to specific points to replay the execution step-by-step. Our evaluation demonstrates that IcedTea effectively helps identify state-related bugs with minimal overhead added to the original execution.
URL:https://isg.ics.uci.edu/event/shengquan-ni-icedtea-efficient-and-responsive-time-travel-debugging-in-dataflow-systems/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250117T130000
DTEND;TZID=America/Los_Angeles:20250117T140000
DTSTAMP:20260604T185834
CREATED:20241008T012400Z
LAST-MODIFIED:20250211T005019Z
UID:2114-1737118800-1737122400@isg.ics.uci.edu
SUMMARY:Abhishek Singh: LogPoseDB: Transaction Handoff and Agreement in Edge-Cloud Systems
DESCRIPTION:Abstract: Emerging IoT and edge applications demand fast response times that cannot be achieved by faraway cloud datacenters. This motivates building edge-cloud systems where nodes on the edge can participate in the processing and storage of data. However\, building an edge-cloud transaction processing system faces three main challenges: (1) inefficient transaction execution due to low concurrency arising from long round-trip times between the edge and the cloud\, (2) the absence of a dedicated edge-node infrastructure\, and (3) edge nodes that may be untrusted.\nWe propose LogPoseDB\, an edge-cloud database that spans both edge and cloud nodes and aims to overcome the challenges above. LogPoseDB proposes dynamic state detachment\, where state storage and processing are treated as a disjoint resource between the cloud and the edge. LogPoseDB’s transaction processing protocol ensures fast response by avoiding wide-area coordination with the cloud or other faraway edge nodes. This is done by leveraging the data locality of detached state and by methods that build on transaction chopping and commutativity.\nLogPoseDB does not require any dedicated edge infrastructure. Rather\, clients may use their edge nodes\, if desired\, to process and store their data while they need it. (Other clients can still process their data on cloud nodes.) To address the trust challenge\, we propose a Byzantine fault-tolerant (BFT) protocol that targets edge nodes. LogPoseDB’s BFT replication protocol introduces the principle of remote lazy trust\, which enables efficient BFT edge coordination by utilizing a remote trusted node asynchronously. \nBio: Abhishek is a PhD candidate supervised by Prof. Faisal Nawab. His research includes building data management and transaction processing systems for the emerging edge-cloud infrastructure.
URL:https://isg.ics.uci.edu/event/abhishek-singh-talk/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250124T110000
DTEND;TZID=America/Los_Angeles:20250124T120000
DTSTAMP:20260604T185834
CREATED:20250211T005142Z
LAST-MODIFIED:20250211T005142Z
UID:2163-1737716400-1737720000@isg.ics.uci.edu
SUMMARY:Xiaodong Zhang (The Ohio State University): Data Management: Interactions with Computer Architecture and Systems
DESCRIPTION:Abstract: We have entered a data-centric computing era\, characterized by the coexistence of diverse parallel and specialized hardware accelerators along with general-purpose processors. In this ecosystem\, minimizing data movement has become a critical priority for the design of both systems and applications. Over the years\, the CPU-centric ecosystem has evolved into a one-size-fits-all environment\, supporting a wide variety of applications. However\, its efficiency in performance\, computational power\, and energy consumption has steadily declined\, making the general-purpose computing model increasingly unsustainable for the rapidly growing demands of data analytics and machine learning applications. In this presentation\, I will explore the constraints and obstacles inherent in our current computing ecosystem. I will also present case studies supporting the evolution of computer hardware and software for high-performance data processing\, featuring advanced hardware components such as GPUs\, RDMA\, and other relevant technologies. All associated algorithms and software implementations are open source\, and some have been integrated into production systems. The transition of system infrastructure toward data-centric workloads also challenges how we teach computer science in college. In this context\, I will briefly introduce my new textbook\, which shares its title with this presentation and was published by Cambridge University Press last year. \nBio: Xiaodong Zhang is a University Distinguished Scholar and the Robert M. Critchfield Professor in Engineering at the Ohio State University. His research interests focus on data management in computer and distributed systems. Driven by a commitment to translate his academic research solutions into cutting-edge technology\, he has made continuous efforts in advancing the design and implementation of several major production systems. 
He was recognized by the 2020 ACM Microarchitecture Test of Time Award for his contributions to memory architecture design and the 2024 VLDB Test of Time Award for the initial development of open-source spatial data processing systems on large-scale clusters. He received his Ph.D. in Computer Science from the University of Colorado at Boulder\, where he was honored with a Distinguished Engineering Alumni Award in 2011. He received the Education Leadership Award from the Lutron Foundation for chairing the Department of Computer Science and Engineering at Ohio State from 2006 to 2018. He is a Fellow of the ACM and a Fellow of the IEEE.
URL:https://isg.ics.uci.edu/event/xiaodong-zhang-the-ohio-state-university-data-management-interactions-with-computer-architecture-and-systems/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250131T130000
DTEND;TZID=America/Los_Angeles:20250131T140000
DTSTAMP:20260604T185834
CREATED:20250211T005308Z
LAST-MODIFIED:20250211T005308Z
UID:2165-1738328400-1738332000@isg.ics.uci.edu
SUMMARY:Yicong Huang: Building Data Systems to Broaden the Access of Data Science\, AI\, and ML
DESCRIPTION:Abstract \nIn an era where data-driven decision-making shapes industries\, governments\, and everyday life\, the ability to leverage data science has become an essential skill. Modern data science tools\, which encompass data collection\, analysis\, and advanced techniques such as artificial intelligence (AI)\, machine learning (ML)\, and large language models (LLMs)\, play a critical role across diverse fields. However\, many of these tools rely heavily on programming expertise\, which limits their accessibility to a broader audience. In this talk\, I will discuss my work on Texera\, an open-source system designed to make data science\, AI\, and ML accessible to everyone. Texera features a low-code and even no-code workflow interface\, enabling users of varying technical backgrounds to engage in data science. It emphasizes cloud-based collaboration for data science\, enabling multiple users to seamlessly work on the same shared execution\, much like the collaborative experiences offered by Google Docs and Overleaf. I will discuss the design choices behind our actor-based parallel engine for executing data science workflows. I will also highlight my work on the system’s innovative features for interacting with data workflow executions\, focusing on debugging capabilities that improve transparency and enhance usability. To conclude\, I will outline future research directions aimed at developing a comprehensive ecosystem that integrates advanced interfaces and intelligent systems\, enhancing accessibility\, efficiency\, and user empowerment in data science. \nBio \nYicong Huang is a final-year Ph.D. candidate in the Information Systems Group (ISG)\, Computer Science Department\, University of California\, Irvine. Under the guidance of Dr. Chen Li\, his research focuses on big data management\, data-processing systems\, and machine learning systems. Yicong has made significant contributions to the Texera project. 
He has published in top-tier database venues such as VLDB\, SIGMOD\, and ICDE. His interdisciplinary reach spans venues like TOCHI\, PNAS Nexus\, JAMIA\, AMIA\, and PLOS ONE. Yicong completed research internships at ByteDance\, VISA\, and Observe\, and contributed to patents and papers. His research earned a Best Demo Runner-Up Award at SIGMOD 2024. He received honors such as the 2024 Graduate Dean’s Dissertation Fellowship and the 2023 Public Impact Fellowship from UCI. For more information about his work\, please visit https://yicong-huang.github.io.
URL:https://isg.ics.uci.edu/event/yicong-huang-building-data-systems-to-broaden-the-access-of-data-science-ai-and-ml/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250207T110000
DTEND;TZID=America/Los_Angeles:20250207T120000
DTSTAMP:20260604T185834
CREATED:20250112T010756Z
LAST-MODIFIED:20250225T040958Z
UID:2153-1738926000-1738929600@isg.ics.uci.edu
SUMMARY:Amr El Abbadi (UCSB): Practical Approaches for Private and Scalable Information Data Management Systems
DESCRIPTION:Practical Approaches for Private and Scalable Information Data Management Systems\n  \nAmr El Abbadi \nProfessor of Computer Science \nUniversity of California at Santa Barbara \nAbstract \nIncreasingly\, countries and regions have strict laws and regulations to protect the privacy of personal data. For example\, the member states of the European Union (EU) enforce the General Data Protection Regulation (GDPR) to protect the personal data of individuals living in the EU. Much research has focused on preserving the privacy of data using various advanced cryptographic techniques. However\, irrespective of the privacy of the data itself\, the very queries requesting the data raise severe privacy concerns\, owing to numerous attacks and data breaches that exploit access patterns. Our goal in this talk is to demonstrate how private access to data\, using sophisticated and expensive but secure cryptographic methods\, can become a practical reality in the near future. Our focus is on supporting oblivious queries\, thus hiding any associated access patterns on both private and public data. For private data\, ORAM (Oblivious RAM) is one of the most popular approaches for supporting oblivious access to encrypted data. However\, most existing ORAM datastores are not fault-tolerant\, and hence an application may lose all of its data when failures occur. To achieve fault tolerance\, we propose QuORAM\, the first datastore to provide oblivious access and fault-tolerant data storage using a quorum-based replication protocol. For public data\, PIR (Private Information Retrieval) is the main mechanism proposed in recent years. However\, PIR requires the server to treat data as an array of elements\, and clients retrieve data using an index into the array. This requirement limits the use of PIR in many practical settings\, especially for key-value stores\, where the client may be interested in a particular key but does not know the exact location of the data at the server. 
In this talk\, we will discuss recent efforts to overcome these limitations using Fully Homomorphic Encryption (FHE) to improve the performance\, scalability\, and expressiveness of privacy-preserving queries on public data. \nBiography \nAmr El Abbadi is a Professor of Computer Science at UC Santa Barbara. He received his B.Eng. from Alexandria University\, Egypt\, and his Ph.D. from Cornell University. His research interests are in the fields of fault-tolerant distributed systems and databases\, focusing recently on cloud data management\, blockchain-based systems\, and privacy concerns. Prof. El Abbadi is an ACM Fellow\, AAAS Fellow\, and IEEE Fellow. He was Chair of the Computer Science Department at UCSB from 2007 to 2011. He served as Associate Graduate Dean at the University of California\, Santa Barbara\, from 2021 to 2023. He served as a journal editor for several journals\, including The VLDB Journal\, IEEE Transactions on Computers\, and The Computer Journal. He was Program Chair for multiple database and distributed systems conferences\, most recently SIGMOD 2022. He served on the executive committee of the IEEE Technical Committee on Data Engineering (TCDE) and was a board member of the VLDB Endowment from 2002 to 2008. In 2007\, Prof. El Abbadi received the UCSB Senate Outstanding Mentorship Award for his excellence in mentoring graduate students. In 2013\, his student Sudipto Das received the SIGMOD Jim Gray Doctoral Dissertation Award. Prof. El Abbadi is also a co-recipient of the Test of Time Award at EDBT/ICDT 2015. Recently\, papers he co-authored received an Outstanding Paper Award at NSDI (Networked Systems Design and Implementation) 2024 and the Test of Time Award from MDM (Mobile Data Management) 2024. He has published over 350 articles in databases and distributed systems and has supervised over 40 PhD students. \nHost: Faisal Nawab
URL:https://isg.ics.uci.edu/event/amr-el-abbadi-ucsb-practical-approaches-for-private-and-scalable-information-data-management-systems/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250214T130000
DTEND;TZID=America/Los_Angeles:20250214T140000
DTSTAMP:20260604T185834
CREATED:20250211T005519Z
LAST-MODIFIED:20250225T032621Z
UID:2167-1739538000-1739541600@isg.ics.uci.edu
SUMMARY:Jiadong Bai: Supporting Data Science Education Using Texera with a Cloud Infrastructure
DESCRIPTION:Abstract\nIn this talk\, we will first present our paper at the DSE-K12 conference titled “DS4ALL: Teaching High-School Students Data Science and AI/ML Using the Texera Workflow Platform as a Service.” Traditional data science education often requires students to have programming experience and to install software locally. It also makes collaboration inefficient and slows down the feedback loop when students need help from TAs. To address these challenges\, we developed a new teaching paradigm using Texera and successfully hosted DS4ALL summer programs in 2023 and 2024. As a result\, students with no prior coding experience were able to perform data analysis using AI/ML techniques on the platform.\nThe second part of the talk will focus on how\, building on the success of Texera in DS4ALL\, we are scaling the system to a broader audience. Building such a cloud infrastructure poses several challenges\, such as handling varying service traffic\, supporting concurrent isolated workflow executions\, and managing diverse data storage needs. To address these challenges\, we made Texera run on Kubernetes to achieve scalable service management. For workflow execution\, we run each workflow in an isolated Kubernetes pod to ensure performance and security. On the data storage side\, we designed a storage layer that integrates lakeFS for managing versioned datasets and Apache Iceberg for handling versioned tables with support for concurrent read/write operations. These solutions allow Texera to be deployed as a cloud service that provides a scalable\, secure\, and efficient environment for data science workflows.\nBio\nJiadong Bai is a second-year Ph.D. student in the Computer Science Department at UC Irvine\, with research interests in data systems\, data science\, and big data analysis. He is advised by Prof. Chen Li.\nShengquan Ni is a sixth-year Ph.D. student in the Department of Computer Science\, advised by Professor Chen Li. 
His research interests include big data processing\, distributed systems\, data analytics\, and data science.
URL:https://isg.ics.uci.edu/event/jiadong-bai-tbd/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250221T130000
DTEND;TZID=America/Los_Angeles:20250221T140000
DTSTAMP:20260604T185834
CREATED:20250211T005602Z
LAST-MODIFIED:20250225T040933Z
UID:2169-1740142800-1740146400@isg.ics.uci.edu
SUMMARY:Ketan C Maheshwari (Oak Ridge National Laboratory): Enacting Distributed HPC Workflows: Opportunities and Challenges
DESCRIPTION:Abstract: The Department of Energy (DOE) complex comprises many science facilities that can be classified as data-producing (e.g.\, the Advanced Photon Source at Argonne National Laboratory) or data-consuming (e.g.\, the Leadership Class Computing Facilities at the Oak Ridge National Laboratory) facilities. Modern science campaigns often require extensive use of more than one such facility\, and these facilities may be located far from each other and administered separately. This presents opportunities for scientific computational workflows to aid in the process. At the same time\, there are equally daunting challenges in accomplishing these workflows successfully and smoothly. Our talk will examine these opportunities and challenges and discuss proposed solutions and a path forward. \nBio: Dr. Ketan Maheshwari is a Senior Linux Systems Engineer within the NCCS Division at the Oak Ridge National Laboratory. He has over 15 years of experience working with HPC systems\, including over 10 years with the leadership-class systems at ALCF and OLCF. He is interested in porting science applications to large-scale computing infrastructures and has hands-on expertise in workflows\, parallelization\, and HPC. He has given several talks on technical topics at local as well as international venues\, most notably on GNU Parallel (eScience’23\, CUG’24\, PEARC’24\, SC’24)\, Linux Terminal Tools (USENIX/LISA’19\, LOPSA’18)\, and Swift Workflows (CCGrid’13). Ketan received his PhD in the area of Scientific Workflows from the University of Nice and a Master’s in Grid Computing from the University of Amsterdam. \n  \nHost: Chen Li
URL:https://isg.ics.uci.edu/event/ketan-c-maheshwari-oak-ridge-national-laboratory/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250228T110000
DTEND;TZID=America/Los_Angeles:20250228T120000
DTSTAMP:20260604T185834
CREATED:20250211T005638Z
LAST-MODIFIED:20250225T032827Z
UID:2171-1740740400-1740744000@isg.ics.uci.edu
SUMMARY:Sainyam Galhotra (Cornell): Context-aware Responsible Data Science
DESCRIPTION:ABSTRACT \nData-based systems are increasingly used in applications that have far-reaching consequences and long-lasting societal impact. However\, the development process remains highly specialized\, tedious\, and unscalable. This produces a manually fine-tuned\, rigid solution that works only for one specific problem in one specific context. Such a system fails to adapt to a changing world and severely limits the full utilization of valuable data. \nSo\, how can you avert this fate for your systems? \nIn this talk\, I will present my vision of context-aware systems that enable even non-expert users to develop correct\, explainable\, and equitable data-science pipelines. To achieve this\, I will focus on i) re-thinking the design of data science pipelines\, and ii) the importance of causal inference for trustworthy data analysis. I will present a data discovery framework that helps users identify useful data for various tasks like hypothesis generation\, fact-checking\, and causal inference. Lastly\, I will discuss my proposal for leveraging causal reasoning to quantify the impact of an input on the outcome. These topics are the pieces of the puzzle that come together to create the data scientists’ holy grail – an easily deployable\, scalable\, and robust system that you can trust even as everything around it evolves. \n  \nBIO \nSainyam Galhotra is an Assistant Professor in Computer Science at Cornell University and a field member for Computer Science\, Statistics\, and Data Science. Previously\, he was a Computing Innovation Fellow pursuing postdoctoral research at the University of Chicago. He received his Ph.D. from the University of Massachusetts Amherst under the supervision of Prof. Barna Saha (currently at UC San Diego). The goal of his research is to lay the foundations of responsible data science that enable the efficient development and deployment of trustworthy data analytics applications. 
His research combines techniques from Data Management\, Probabilistic Methods\, Causal Inference\, Machine Learning\, and Software Engineering. His work has been published in top-tier Data Management (SIGMOD\, VLDB\, PODS\, and ICDE)\, AI (NeurIPS\, AAAI\, and AIES)\, and Software Engineering (FSE) conferences. He is a recipient of the Best Paper Award at FSE 2017\, the Most Reproducible Paper Award at both SIGMOD 2017 and 2018\, and the Best Artifact Paper Honorable Mention Award at SIGMOD 2023. He was recognized as a Data Science Rising Star\, a DAAD AInet Fellow\, and the first recipient of the Krithi Ramamritham Award at UMass for contributions to database research. \nhttps://sainyamgalhotra.com/
URL:https://isg.ics.uci.edu/event/sainyam-galhotra-cornell-tbd/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250307T130000
DTEND;TZID=America/Los_Angeles:20250307T170000
DTSTAMP:20260604T185834
CREATED:20250401T171401Z
LAST-MODIFIED:20250401T171401Z
UID:2186-1741352400-1741366800@isg.ics.uci.edu
SUMMARY:Lukas Lokowski: Knowledge Graphs and AI: Bridging Enterprise Data and Knowledge Graphs to Leverage AI Applications
DESCRIPTION:Knowledge Graphs and AI: Bridging Enterprise Data and Knowledge Graphs to Leverage AI Applications\nAbstract:\nIn today’s data-driven world\, enterprises face significant challenges in integrating\, structuring\, and utilizing vast amounts of heterogeneous data. Knowledge graphs offer a powerful solution by organizing and connecting enterprise data\, thus enabling advanced AI applications through structured\, semantically enriched information. In this seminar\, Lukas Lokowski will explore the role of knowledge graphs and ontology learning systems within enterprise environments. His talk will cover research on automating ontology extraction from relational databases\, benchmarking current ontology learning approaches\, and enhancing AI applications through high-quality data integration. \nBio:\nLukas Lokowski is a PhD student at the Hasso Plattner Institute in Potsdam\, under the guidance of Prof. Dr. Felix Naumann\, head of the Information Systems Chair. His research focuses on ontology learning from enterprise data and leveraging knowledge graphs to enhance AI applications. Additionally\, he addresses data cleaning challenges\, with a particular emphasis on entity resolution in numerical data. Passionate about bridging structured data with AI\, Lukas is committed to developing innovative approaches to improve data integration and interoperability.
URL:https://isg.ics.uci.edu/event/lukas-lokowski-knowledge-graphs-and-ai-bridging-enterprise-data-and-knowledge-graphs-to-leverage-ai-applications/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250411T110000
DTEND;TZID=America/Los_Angeles:20250411T120000
DTSTAMP:20260604T185834
CREATED:20250401T171717Z
LAST-MODIFIED:20260401T210231Z
UID:2188-1744369200-1744372800@isg.ics.uci.edu
SUMMARY:Jiawei Han (distinguished lecture): A Retrieval-and-Structuring Approach for LLM-Enhanced\, Theme-Focused Scientific Exploration
DESCRIPTION:“A Retrieval-and-Structuring Approach for LLM-Enhanced\, Theme-Focused Scientific Exploration” \nAbstract: Large Language Models (LLMs) may bring unprecedented power to scientific exploration. However\, current LLMs may still face major challenges in effective scientific exploration due to their lack of in-depth\, theme-focused data and knowledge. Retrieval-augmented generation (RAG) has recently become an interesting approach for augmenting LLMs with grounded\, theme-specific datasets. We discuss the challenges of RAG and propose a retrieval-and-structuring (RAS) approach\, which enhances RAG by improving retrieval quality and mining structures (e.g.\, extracting entities and relations and building knowledge graphs) to ensure the effective integration of theme-specific data with LLMs. We show the promise of this approach in augmenting LLMs and discuss its potential for LLM-enabled scientific exploration. \n\nBio: Jiawei Han is the Michael Aiken Chair Professor in the Siebel School of Computing and Data Science\, University of Illinois Urbana-Champaign. He received the ACM SIGKDD Innovation Award (2004)\, the IEEE Computer Society Technical Achievement Award (2005)\, the IEEE Computer Society W. Wallace McDowell Award (2009)\, and Japan’s Funai Achievement Award (2018)\, and he was elevated to Fellow of the Royal Society of Canada (2022). He is a Fellow of ACM and IEEE. He served as the Director of the Information Network Academic Research Center (INARC) (2009-2016)\, supported by the Network Science-Collaborative Technology Alliance (NS-CTA) program of the U.S. Army Research Lab\, and as co-Director of KnowEnG\, a Center of Excellence in Big Data Computing (2014-2019)\, funded by the NIH Big Data to Knowledge (BD2K) Initiative. 
Currently\, he serves on the executive committees of two NSF-funded research centers: MMLI (Molecule Maker Lab Institute)\, one of the NSF-funded national AI institutes\, since 2020\, and I-GUIDE\, the National Science Foundation (NSF) Institute for Geospatial Understanding through an Integrative Discovery Environment\, since 2021.
URL:https://isg.ics.uci.edu/event/jiawei-han-distinguished-lecture/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250418T130000
DTEND;TZID=America/Los_Angeles:20250418T140000
DTSTAMP:20260604T185834
CREATED:20250401T171754Z
LAST-MODIFIED:20250423T180252Z
UID:2190-1744981200-1744984800@isg.ics.uci.edu
SUMMARY:Abhishek Singh: Transaction Processing in Hybrid Edge Data Management Systems
DESCRIPTION:Title: Transaction Processing in Hybrid Edge Data Management Systems \nAbstract:\nAdvances in computing and storage systems have enabled end users to run complex workloads on relatively cheap machines. These advancements have given rise to a novel infrastructure in data management: Edge-Cloud. Edge-Cloud data management systems allow data to be stored at the edge and managed by the cloud. The availability of Edge-Cloud systems has opened interesting research directions in data management. \nIn this talk\, we address the problem of building data management systems that use resources from both the edge and the cloud. We use the ‘Cloud’ and the ‘Edge’ as abstractions of trusted and untrusted systems\, respectively. The ‘Cloud’ in this thesis is treated as a trusted resource. This abstraction is motivated by the fact that applications deployed on the cloud (such as Gmail and Facebook) are trusted by end users. Applications on the cloud are deployed and managed by large corporations that have a financial incentive to ensure that the data and applications they manage are secure. In contrast to the Cloud\, Edge data management systems use cheaper components\, and no trust assumptions can be made about them. \nTo execute transactions on an integrated edge-cloud system\, we propose LogPoseDB\, an edge-cloud database that spans both edge and cloud nodes. LogPoseDB aims to overcome the two challenges above. LogPoseDB does not require any dedicated edge infrastructure. Rather\, clients may utilize their edge nodes – if desired – to process and store their data while they need it. (Other clients can still process their data on cloud nodes.) To enable this type of processing\, LogPoseDB proposes state disentanglement\, where the state (storage and processing) is treated as a shared resource between the cloud and the edge. 
\nLogPoseDB’s transaction processing protocol ensures fast responses by avoiding wide-area coordination with the cloud or other faraway edge nodes. This is done by leveraging the data locality of detached state and by methods that build on transaction chopping and commutativity. To address the trust challenges\, we propose a Byzantine fault-tolerant (BFT) protocol that targets edge nodes. LogPoseDB’s BFT replication protocol introduces the principle of remote lazy trust\, which enables efficient BFT edge coordination by utilizing a remote trusted node asynchronously. \nBio:\nAbhishek Alfred Singh is a PhD candidate working with Professor Faisal Nawab. His research interests are in transaction processing in emerging edge-cloud data management systems. His work deals with transaction processing in loosely coupled distributed systems.
URL:https://isg.ics.uci.edu/event/abhishek-singh/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250425T130000
DTEND;TZID=America/Los_Angeles:20250425T140000
DTSTAMP:20260604T185834
CREATED:20250401T171824Z
LAST-MODIFIED:20250521T184951Z
UID:2192-1745586000-1745589600@isg.ics.uci.edu
SUMMARY:Yiming Lin (Berkeley): Toward Building Efficient Document Analytics Systems from the Lens of Document Structure
DESCRIPTION:Abstract:\nThe vast majority—over 80%—of data today exists in unstructured formats\, and querying and extracting value from unstructured document collections remains a considerable challenge. While Large Language Models (LLMs) have made remarkable progress in document understanding\, they fail to provide high-accuracy results for analytical queries on documents and incur high costs. \nIn this talk\, we demonstrate that document collections often have hidden structure\, and that discovering it can effectively facilitate multiple downstream data analytics tasks on documents. At one extreme\, we explore documents sharing a similar high-level template that imparts a common semantic structure\, such as scientific papers from the same venue. We introduce ZenDB\, a document analytics system that leverages this semantic structure\, coupled with LLMs\, to answer ad-hoc SQL queries on document collections. At the other extreme\, we explore form-like documents\, such as invoices and order bills\, which contain structured data like tables or key-value pairs and are programmatically generated by populating fields in a visual blueprint. We present TWIX\, a document analytics tool that first infers the common blueprint and then efficiently extracts structured data from documents. For both extremes\, we provide theoretical guarantees on the correctness of structure extraction\, present empirical results demonstrating the systems’ potential for document analytics\, and show their early impact on our collaborators\, including Big Local News at Stanford and California Police Data Applications. \nBio: \nYiming Lin is a postdoctoral researcher at UC Berkeley\, and he received his PhD in Computer Science from UC Irvine. His research interests span document analytics\, query processing and optimization\, and data cleaning\, with a current focus on developing data management systems for document analytics. 
Yiming has closely collaborated with and interned at industrial pioneers in data analytics\, including Microsoft Research and Amazon. His work has been published in several flagship conferences\, including VLDB\, SIGMOD\, and ICDE.
URL:https://isg.ics.uci.edu/event/yiming-lin-berkeley/
END:VEVENT
BEGIN:VEVENT
DTSTART;VALUE=DATE:20250502
DTEND;VALUE=DATE:20250503
DTSTAMP:20260604T185834
CREATED:20250401T171841Z
LAST-MODIFIED:20250521T185049Z
UID:2194-1746144000-1746230399@isg.ics.uci.edu
SUMMARY:Yicong Huang: From 100 Applications to 9 Offers: Lessons from the Academic and Industry Job Market
DESCRIPTION:Abstract:\nIn this talk\, I’ll share my personal journey navigating the academic and industry job markets during the 2024–2025 academic year. I’ll reflect on key decisions\, behind-the-scenes experiences\, and the lessons I learned through 100+ applications\, 20+ interviews\, and 9 offers across both domains. Along the way\, I’ll highlight the differences in timelines\, expectations\, and evaluation criteria between academia and industry. This session is intended to be informal\, candid\, and interactive—an honest look at what the job market feels like from the inside. Whether you’re an early-stage Ph.D. student beginning to explore career options\, a researcher actively preparing for the job market\, or someone still deciding between academia and industry\, I hope this talk offers useful insights and encouragement. There’s no single “right” path\, and I’ll share both successes and challenges to help demystify the process for anyone thinking about their next chapter. \nBio:\nYicong Huang is a final-year Ph.D. candidate in the Information Systems Group (ISG)\, Computer Science Department\, University of California\, Irvine. Under the guidance of Dr. Chen Li\, his research focuses on big data management\, data-processing systems\, and machine learning systems. Yicong has made significant contributions to the Texera project. He has published in top-tier database venues such as VLDB\, SIGMOD\, and ICDE. His interdisciplinary reach spans venues like TOCHI\, PNAS Nexus\, JAMIA\, AMIA\, and PLOS ONE. Yicong completed research internships at ByteDance\, VISA\, and Observe\, and contributed to patents and papers. His research earned a Best Demo Runner-Up Award at SIGMOD 2024. He received honors such as the 2025 Joseph & Dorothy Fischer Memorial Endowed Fellowship\, the 2025 Beall Family Foundation Graduate Student Entrepreneur Award in Computer Science\, the 2024 Graduate Dean’s Dissertation Fellowship\, and the 2023 Public Impact Fellowship from UCI. 
In 2025\, Yicong will join Databricks\, where he will work on the Apache Spark Runtime team. In Fall 2027\, he will begin a faculty appointment as a Tenure-Track Assistant Professor at the Manning College of Information and Computer Sciences (CICS) at UMass Amherst. For more information\, please visit yicong-huang.github.io.
URL:https://isg.ics.uci.edu/event/pat-helland/
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250523T130000
DTEND;TZID=America/Los_Angeles:20250523T140000
DTSTAMP:20260604T185834
CREATED:20250401T172015Z
LAST-MODIFIED:20250521T185546Z
UID:2198-1748005200-1748008800@isg.ics.uci.edu
SUMMARY:Binbin Gu: PoneglyphDB: Efficient Non-interactive Zero-Knowledge Proofs for Arbitrary SQL-Query Verification
DESCRIPTION:Abstract:\nAs databases increasingly serve as backbones for sensitive applications\, ensuring both data confidentiality and computational integrity becomes critical—especially when the data owner and querier do not fully trust each other. In this talk\, I will introduce PoneglyphDB\, a novel database system that generates non-interactive zero-knowledge proofs (ZKPs) for arbitrary SQL queries\, allowing users to verify query results without seeing the underlying data. PoneglyphDB tackles key limitations of prior interactive ZKP systems by compiling SQL operators into efficient PLONKish arithmetic circuits\, optimizing them with low-degree polynomial constraints\, recursive proof composition\, and oblivious execution. The system supports a rich set of SQL operations including joins\, aggregations\, group-by\, and even string predicates\, while achieving significant performance improvements over state-of-the-art systems like ZKSQL and Libra. I will detail the system architecture\, circuit compiler\, and experimental results on the TPC-H benchmark\, showing how PoneglyphDB bridges theory and practice for verifiable\, privacy-preserving data processing. \nBio:\nBinbin Gu is a final-year Ph.D. candidate in Computer Science at the University of California\, Irvine\, advised by Faisal Nawab. His research lies at the intersection of trustworthy AI\, database systems\, and cryptographic verification.
URL:https://isg.ics.uci.edu/event/janki-bhimani-fiu/
LOCATION:DBH 4011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250530T130000
DTEND;TZID=America/Los_Angeles:20250530T140000
DTSTAMP:20260604T185834
CREATED:20250401T172052Z
LAST-MODIFIED:20250521T185630Z
UID:2200-1748610000-1748613600@isg.ics.uci.edu
SUMMARY:Xiaozhen Liu
DESCRIPTION:
URL:https://isg.ics.uci.edu/event/xiaozhen-liu/
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250606T130000
DTEND;TZID=America/Los_Angeles:20250606T140000
DTSTAMP:20260604T185834
CREATED:20250521T185445Z
LAST-MODIFIED:20250521T185445Z
UID:2217-1749214800-1749218400@isg.ics.uci.edu
SUMMARY:Fangqi Liu (Vanderbilt University)
DESCRIPTION:
URL:https://isg.ics.uci.edu/event/fangqi-liu-vanderbilt-university/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250613T010000
DTEND;TZID=America/Los_Angeles:20250613T140000
DTSTAMP:20260604T185834
CREATED:20250614T144455Z
LAST-MODIFIED:20250614T162547Z
UID:2225-1749776400-1749823200@isg.ics.uci.edu
SUMMARY:ISG End-of-year Celebration
DESCRIPTION:We celebrate the achievements of the ISG members in the past academic year. Here’s the presentation deck.
URL:https://isg.ics.uci.edu/event/isg-eoy-celebration-2025/
LOCATION:DBH 4011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250716T110000
DTEND;TZID=America/Los_Angeles:20250716T120000
DTSTAMP:20260604T185834
CREATED:20250717T035358Z
LAST-MODIFIED:20250717T040900Z
UID:2245-1752663600-1752667200@isg.ics.uci.edu
SUMMARY:Michal Shmueli-Scheuer (IBM Research): GenAI Benchmarking and Evaluation
DESCRIPTION:Title: GenAI Benchmarking and Evaluation\nWednesday (July 16)\, 11 am – noon\, DBH 6011\, UC Irvine \nAbstract:\nIn this seminar\, I will present a practical perspective on benchmarking in the era of generative AI. We will begin with an overview of goals and motivations\, followed by key definitions such as benchmarks and leaderboards. I’ll review prominent existing benchmarks and highlight several core challenges that limit their effectiveness in evaluating today’s models. In response\, I’ll introduce a new methodology for benchmarking that focuses on flexibility\, reuse\, and deeper insights. This includes a closer look at ToRR\, a new table reasoning benchmark\, and Unitxt\, an open-source tool designed to simplify the creation and management of benchmarks—accompanied by a brief demo. Finally\, I will outline emerging directions in agentic evaluation\, pointing to what’s next in assessing increasingly complex AI systems. \nBio:\nMichal Shmueli-Scheuer is a Distinguished Engineer for AI Benchmarking and Evaluation at IBM Research. Her expertise spans Natural Language Generation (NLG) and Natural Language Processing (NLP)\, with a focus on evaluation\, data-to-text generation\, conversational bots\, scientific document summarization\, and affective computing. \nShe has authored over 40 academic papers in top NLP and AI conferences and journals. Michal is a co-organizer of the GEM-squared workshop on evaluation (ACL) and recently delivered the “Navigating the Modern Evaluation Landscape” tutorial at LREC-COLING. She will also deliver a tutorial on “Evaluating LLM-based Agents: Foundations\, Best Practices\, and Open Challenges” at IJCAI. \nActively involved in the NLP community\, she has organized multiple workshops and shared tasks. Michal earned her Ph.D. in Computer Science from the University of California\, Irvine\, in 2009.
URL:https://isg.ics.uci.edu/event/genai-benchmarking-and-evaluation/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20251003T130000
DTEND;TZID=America/Los_Angeles:20251003T140000
DTSTAMP:20260604T185834
CREATED:20250928T025441Z
LAST-MODIFIED:20251005T052555Z
UID:2259-1759496400-1759500000@isg.ics.uci.edu
SUMMARY:Dr. Tainã Coleman: Research Without Borders: High Performance Computing for Discovery Across Domains
DESCRIPTION:Research Without Borders: High Performance Computing for Discovery Across Domains\nDr. Tainã Coleman\, San Diego Supercomputer Center (SDSC)\n\nOct. 3\, Friday\, 1 – 2 pm\, DBH 3011\, UCI\n\nAbstract: High-performance computing (HPC) drives discovery\, but access often requires deep technical expertise. This talk explores how to broaden access and empower researchers across disciplines. I’ll begin with WfCommons (wfcommons.org)\, which generates realistic workflows and benchmarks to lower barriers to building and evaluating workflow applications. Next\, I’ll cover the National Data Platform (nationaldataplatform.org)\, a federated ecosystem that connects distributed data resources. My work there involves developing tools that make large-scale data and computing resources usable beyond computer science\, thereby democratizing access across various fields. Finally\, I’ll share emerging directions that extend HPC into the humanities and beyond\, including collaborations on Indigenous datasets that highlight the need for culturally respectful approaches. Together\, these efforts envision HPC without borders\, where workflows\, data\, and cultural awareness converge to enable discovery across all domains.\n\nBio: Tainã Coleman is a Schmidt AI in Science Postdoctoral Fellow at the San Diego Supercomputer Center. Her research centers on developing solutions and tools that expand access to high-performance computing (HPC) for researchers across both the sciences and the humanities. A key focus of her work is scientific workflows\, primarily examining how workflow structure impacts execution in HPC environments and designing algorithms\, benchmarks\, and data-driven methods to enhance efficiency and usability. More recently\, she has integrated artificial intelligence into her research\, exploring its applications across diverse domains. She earned her B.S. in Computer Engineering from Universidade Federal de Itajubá (2016)\, her M.S. 
in Computer Science from California State University\, Long Beach (2020)\, and her Ph.D. in Computer Science from the University of Southern California (2023).\n\nSlides: Research Without Borders: High Performance Computing for Discovery Across Domains
URL:https://isg.ics.uci.edu/event/research-without-borders-high-performance-computing-for-discovery-across-domains/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20251010T130000
DTEND;TZID=America/Los_Angeles:20251010T140000
DTSTAMP:20260604T185834
CREATED:20251010T193828Z
LAST-MODIFIED:20251010T193828Z
UID:2276-1760101200-1760104800@isg.ics.uci.edu
SUMMARY:Farzad Habibi: Brook-2PL: Tolerating High Contention Workloads with A Deadlock-Free Two-Phase Locking Protocol
DESCRIPTION:For this week’s ISG seminar\, Farzad will present his research.\n\nTime & Location:\nFriday\, Oct 10\, 2025\, 1:00 PM – 2:00 PM\nDonald Bren Hall 3011\, ICS\, UC Irvine\nLunch will be provided.\n\nTitle:\nBrook-2PL: Tolerating High Contention Workloads with A Deadlock-Free Two-Phase Locking Protocol\n\nAbstract:\nHotspots remain a critical challenge for concurrency control (CC) protocols under high-contention workloads. Traditional CC approaches struggle under high contention\, suffering excessive transaction aborts and deadlocks. In this paper\, we propose Brook-2PL\, a novel two-phase locking (2PL) protocol that (1) introduces the SLW-Graph for deadlock-free transaction execution\, and (2) proposes partial transaction chopping for early lock release. In previous methods\, transaction aborts waste work and\, through their cascading effects\, can further burden the system. Brook-2PL addresses this limitation by statically analyzing the SLW-Graph\, a new graph-based dependency structure\, which makes two-phase locking deadlock-free by fixing the order of lock acquisition in advance. Brook-2PL also reduces contention by releasing locks early through partial transaction chopping and static transaction analysis. This chopping method is more flexible than traditional transaction chopping and overcomes its inherent limitations. Evaluation on both our synthetic online game store workload and the TPC-C benchmark shows that Brook-2PL significantly outperforms state-of-the-art CC protocols: on TPC-C\, it achieves an average speed-up of 2.86x while reducing tail latency (p95) by 48%.\n\nBio:\nFarzad Habibi is a fourth-year PhD candidate in Computer Science at UC Irvine\, with a background in Computer Engineering from the University of Tehran. His research focuses on distributed data management\, spanning blockchain resilience\, database availability under metastable failures\, geo-distributed transactional databases\, and concurrency control under high contention.\n\nVolunteer:\nJuncheng Fang
URL:https://isg.ics.uci.edu/event/farzad-habibi-brook-2pl-tolerating-high-contention-workloads-with-a-deadlock-free-two-phase-locking-protocol/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20251022T093000
DTEND;TZID=America/Los_Angeles:20251022T110000
DTSTAMP:20260604T185834
CREATED:20251017T005844Z
LAST-MODIFIED:20251017T005844Z
UID:2279-1761125400-1761130800@isg.ics.uci.edu
SUMMARY:Dr. Doug Terry: Consistency in the Cloud
DESCRIPTION:Dr. Doug Terry will give a guest lecture in the CS 224P (“Big Data”) class. The talk should be of great interest to ISG members. Doug is a fantastic speaker!\n\nTitle: Consistency in the Cloud\nLecturer: Doug Terry\n\nOct. 22\, Wednesday\, 9:30 am – 10:50 am\nDBH 6011\, UC Irvine\n\nAbstract: Data is routinely replicated in the cloud for fault tolerance\, disaster recovery\, scalable throughput\, and low-latency access. Designing a replication scheme involves fundamental trade-offs between consistency\, availability\, and performance. This lecture explores the consistency choices offered by cloud providers\, using baseball as an illustrative application.\n\nBio: Doug Terry recently joined LinkedIn as a Distinguished Software Engineer. Before that\, he led research projects at Xerox PARC\, Microsoft\, and Samsung\, and he pioneered innovative cloud services at Amazon and Microsoft. He has also taught Distributed Systems at UC Berkeley and Stanford University.
URL:https://isg.ics.uci.edu/event/dr-doug-terry-consistency-in-the-cloud/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20251031T130000
DTEND;TZID=America/Los_Angeles:20251031T140000
DTSTAMP:20260604T185834
CREATED:20251027T210415Z
LAST-MODIFIED:20251027T210415Z
UID:2282-1761915600-1761919200@isg.ics.uci.edu
SUMMARY:Pratyoy Das: SmartRabbit: An Interactive Query Processor
DESCRIPTION:For this week’s ISG seminar\, Pratyoy will present his research.\n\nTime & Location:\nFriday\, Oct 31\, 2025\, 1:00 PM – 2:00 PM\nDonald Bren Hall 3011\, ICS\, UC Irvine\nLunch will be provided.\n\nTitle:\nSmartRabbit: An Interactive Query Processor\n\nAbstract:\nTraditional relational database systems optimize analytical queries to minimize their end-to-end latency. The resulting optimal plans are usually blocking\, forcing users to wait until the query completes before seeing any results. This execution model precludes interactivity: users cannot observe partial results or gain early insights from long-running queries. Query optimizers rarely choose plans that promote interactivity\, since such plans either incur prohibitively high latencies or involve operators that lack interactive alternatives. We introduce SmartRabbit\, a novel interactive query processor that returns answers interactively while matching the end-to-end latency of blocking execution plans. We achieve this by first designing a plan optimized for interactivity for a given query and then executing it simultaneously alongside a traditional blocking plan. The two executions are carefully synchronized to maintain the correct order of answers and prevent duplicates. We implement SmartRabbit in AsterixDB and show that it consistently delivers early and continuous results across various analytical queries\, data scales\, and parallel (multi-node\, multi-partition) system instances\, while matching the latencies of standalone blocking executions.\n\nBio:\nPratyoy is a fourth-year PhD student advised by Professor Sharad Mehrotra. His research focuses on query optimization and query execution\, with specific interests in adaptive\, interactive\, and progressive query optimization. Pratyoy previously interned with the Amazon Redshift query optimization team and was a Software Engineer at Microsoft before joining UC Irvine.\n\nVolunteer:\nXiaozhen Liu
URL:https://isg.ics.uci.edu/event/pratyoy-das-smartrabbit-an-interactive-query-processor/
LOCATION:DBH 3011
END:VEVENT
END:VCALENDAR