BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Information Systems Group - ECPv6.4.0.1//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Information Systems Group
X-ORIGINAL-URL:https://isg.ics.uci.edu
X-WR-CALDESC:Events for Information Systems Group
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20250309T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20251102T090000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20260308T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20261101T090000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20260501T130000
DTEND;TZID=America/Los_Angeles:20260501T140000
DTSTAMP:20260508T090626
CREATED:20260429T054607Z
LAST-MODIFIED:20260429T054607Z
UID:2426-1777640400-1777644000@isg.ics.uci.edu
SUMMARY:Yinan & Juncheng ICDE Practice Talk
DESCRIPTION:We’ll have Yinan & Juncheng present their ICDE works for this week’s ISG Seminar. This will be a shared session. \nTime & Location: \nFriday May 1\, 2026\, 1:00 PM – 2:00 PM\nDonald Bren Hall 3011\, ICS\, UC Irvine \nZoom: \nhttps://uci.zoom.us/j/95509222811?pwd=2V8Hnx71iP6dyfNsEPoo97NUfCFWTo.1\n\nLunch will be provided. \n———————————————————————————————————————— \nSpeaker \nYinan Zhou \nTitle \nSpendableStore: A UTXO-based Decentralized Data Store \nAbstract \nThe literature on blockchain-based databases is divided into permissioned blockchains and permissionless account based blockchains. However\, the former is not fully decentralized\, and the latter suffers from challenges in scalability and practicality. We propose SpendableStore\, a hybrid on/off-chain database that operates on top of permissionless UTXO-based blockchains as a novel approach to the problem of data decentralization. Our design integrates atomic data units into individual UTXOs to create a new blockchain concept called Spendable Data Objects that perform traditional CRUD operations. The integrity\, immutability\, and ownership of these Spendable Data Objects are safeguarded directly by the blockchain peer nodes\, thus constraining the power of database administrators to achieve true data decentralization. We further support database transactions and propose an isolation mechanism called Future Now Snapshot Isolation to reason about transactional correctness in SpendableStore. We performed experiments on a major blockchain’s Mainnet and observed up to 16x better throughput compared to a state-of-the-art blockchain-based database. \n———————————————————————————————————————— \nSpeaker \nJuncheng Fang \nTitle \nImmortalChopper: Real-Time and Resilient Distributed Transactions in the Edge-Cloud \nAbstract \nEmerging applications in the areas of real-time Internet of Things (IoT) and edge technologies require fast processing and response times. 
This motivates the utilization of edge nodes for storing and processing data close to the user. In settings with a vast number of edge nodes\, the state of the data is distributed across a large number of edge nodes. This makes it expensive to perform distributed transactions as these transactions would span edge nodes that are connected via less reliable and relatively slow network infrastructure. It is prohibitive to use existing protocols like 2PC that require many rounds of communication across participants.\nIn this talk\, we discuss ImmortalChopper\, a distributed transaction processing protocol designed for the edge-cloud environment. The goal of ImmortalChopper is to provide One-Node Response (1n-Response)\, a guarantee of transaction commitment by contacting only one node without waiting for coordination with the other nodes. To achieve this\, we build on and extend the literature of transaction chopping and lazy replication. However\, combining transaction chopping and lazy replication without special care can lead to transactions operating on a stale state and potentially violating serializability. We present a new transaction chopping theory called ChopperGraph that integrates the notion of lazy replication and speculative execution. It ensures 1n-Response while preserving serializability. \nBio \nJuncheng Fang is a 5th-year Ph.D. candidate in the Computer Science Department at UC Irvine\, supervised by Prof. Faisal Nawab. His current research focuses on distributed transaction processing\, specifically improving the concurrency by exploiting the semantics of transactions. \n————————————————————————————————————————
URL:https://isg.ics.uci.edu/event/yinan-juncheng-icde-practice-talk/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20260424T130000
DTEND;TZID=America/Los_Angeles:20260424T140000
DTSTAMP:20260508T090626
CREATED:20260421T003909Z
LAST-MODIFIED:20260421T060244Z
UID:2422-1777035600-1777039200@isg.ics.uci.edu
SUMMARY:Prof. Eduard Dragut (Temple University): Toward Scalable Knowledge Extraction with Weak Supervision and Large Language Models
DESCRIPTION:Friday April 24\, 2026\, 1:00 PM – 2:00 PM\nDonald Bren Hall 3011\, ICS\, UC Irvine \nZoom:\nhttps://uci.zoom.us/j/95509222811?pwd=2V8Hnx71iP6dyfNsEPoo97NUfCFWTo.1 \nLunch will be provided. \nTitle: Toward Scalable Knowledge Extraction with Weak Supervision and Large Language Models \nAbstract: Information extraction is a foundational capability for transforming unstructured text into structured knowledge\, enabling downstream applications such as knowledge graph construction\, semantic search\, question answering\, and scientific discovery. However\, building high-quality extraction systems traditionally depends on large manually annotated datasets\, which are costly to create and often impractical in specialized domains. In this talk\, I will present recent advances toward scalable information extraction under limited supervision. I will discuss methods for improving the quality of weakly supervised training data through automatic label cleaning\, show how richer benchmarks over full scientific documents expose new challenges for scientific information extraction beyond simplified abstract-level settings\, and demonstrate how large language models can be leveraged in many-shot in-context learning regimes to perform competitive named entity recognition and generate high-quality annotations for low-resource domains. Together\, these results suggest a promising path toward scalable knowledge extraction pipelines that reduce reliance on expensive manual annotation while improving the robustness and adaptability of systems used to build next-generation knowledge graphs and AI applications. \nBio: Eduard Dragut is a Professor in the Department of Computer and Information Sciences at Temple University. He is a senior member of the IEEE. He received his Ph.D. in Computer Science from the University of Illinois at Chicago. 
His research focuses on data management\, information retrieval\, and applied artificial intelligence\, with an emphasis on building scalable systems for extracting and integrating knowledge from large and heterogeneous data sources. He also pursues interdisciplinary AI projects for social good\, including work on assistive technologies such as augmentative and alternative communication (AAC) and AI-driven tools for knowledge discovery. He has published widely in leading venues in databases\, natural language processing\, and data mining.
URL:https://isg.ics.uci.edu/event/prof-eduard-dragut-temple-university-toward-scalable-knowledge-extraction-with-weak-supervision-and-large-language-models/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20260417T130000
DTEND;TZID=America/Los_Angeles:20260417T140000
DTSTAMP:20260508T090626
CREATED:20260414T044342Z
LAST-MODIFIED:20260414T044342Z
UID:2397-1776430800-1776434400@isg.ics.uci.edu
SUMMARY:Sarah Asad: Teaching Data Science and AI/ML to Diverse Learners Using Apache Texera: An Experience Report
DESCRIPTION:We’ll have Sarah present her work for this week’s ISG seminar. \nTime & Location: \nFriday April 17\, 2026\, 1:00 PM – 2:00 PM\nDonald Bren Hall 3011\, ICS\, UC Irvine \nZoom: \nhttps://uci.zoom.us/j/95509222811?pwd=2V8Hnx71iP6dyfNsEPoo97NUfCFWTo.1\n\nLunch will be provided. \nTitle \nTeaching Data Science and AI/ML to Diverse Learners Using Apache Texera: An Experience Report \nAbstract \nThis talk reports on our experiences teaching data science and AI/ML through a series of hands-on programs to learners ranging from high school to graduate students and non-STEM faculty. The programs are taught using Texera\, an open-source system for collaborative data science and AI/ML using GUI-based workflows. A distinctive feature of these programs is that they did not require participants to have prior coding skills. We describe the program-preparation process\, curriculum structure\, classroom experience\, and feedback collected from participants. We summarize our insights regarding student engagement\, the effectiveness of interactive and collaborative learning environments\, and practical considerations for designing accessible data science programs for learners with diverse backgrounds. \nBio \nSarah Asad is a second-year PhD student in the Computer Science Department at UC Irvine\, with research interests in data systems\, data science\, and big data analysis. She is supervised by Prof. Chen Li.
URL:https://isg.ics.uci.edu/event/sarah-asad-teaching-data-science-and-ai-ml-to-diverse-learners-using-apache-texera-an-experience-report/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20260306T130000
DTEND;TZID=America/Los_Angeles:20260306T140000
DTSTAMP:20260508T090626
CREATED:20260219T064700Z
LAST-MODIFIED:20260219T065044Z
UID:2311-1772802000-1772805600@isg.ics.uci.edu
SUMMARY:Zheng LUO (UCLA): Generating Join Trees for Yannakakis Algorithm
DESCRIPTION:ISG seminar on March 6\, Friday\, 1 – 2 pm\, DBH 3011 \nTitle: Generating Join Trees for Yannakakis Algorithm \nAbstract:\nMost research on query optimization has centered on binary join algorithms like hash join and sort-merge join. However\, recent years have seen growing interest in theoretically optimal algorithms\, notably the Yannakakis algorithm. These algorithms require new optimization techniques\, as they rely on join trees where each node represents a relation\, which are very different from the operator trees for binary joins.\nOur recent theoretical work proposes three approaches to constructing join trees for Alpha-acyclic queries:\n(1) an algorithm to enumerate all join trees\, which forms the basis of a cost-based optimizer;\n(2) a 1-shot approach to construct a unique shallowest join tree for any Berge-acyclic query\, thus enabling parallel execution of large join queries;\n(3) a simple algorithm that converts any connected left-deep linear plan of a Gamma-acyclic query into a join tree\, allowing reuse of existing optimizers developed for binary joins.\nIn this talk\, we will also discuss how the theoretical results can turbocharge query processing in modern database systems. \nBio:\nZheng LUO is a Ph.D. student at the University of California\, Los Angeles (UCLA)\, advised by Prof. Remy WANG.\nHis research interests are twofold\, spanning from theory to systems.\n(1) His current work centers on the theoretical aspects of query optimization in relational databases by examining the algorithms and data structures that improve the efficiency of query processing;\n(2) He is also exploring ways to put theory into practice by implementing theoretical results and integrating them into systems.
URL:https://isg.ics.uci.edu/event/yannakakis/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20260227T130000
DTEND;TZID=America/Los_Angeles:20260227T140000
DTSTAMP:20260508T090626
CREATED:20260224T215802Z
LAST-MODIFIED:20260405T015004Z
UID:2318-1772197200-1772200800@isg.ics.uci.edu
SUMMARY:Zexin Li (UCR): Unified Full-stack Co-design for On-device Machine Learning
DESCRIPTION:For this week’s ISG seminar\, we’ll have an invited speaker\, Zexin Li from the University of California\, Riverside\, give a talk. \nTime & Location:\nFriday Feb 27\, 2026\, 1:00 PM – 2:00 PM\nDonald Bren Hall 3011\, ICS\, UC Irvine\n(Zoom link will be shared by request) \nLunch will be provided. \nTitle:\nUnified Full-stack Co-design for On-device Machine Learning \nAbstract:\nThe integration of advanced artificial intelligence into Cyber-Physical Systems (CPS)\, such as multirotor UAVs and wheeled mobile robots\, promises a future of edge intelligence. However\, deploying complex machine learning models directly onto real-time embedded systems presents significant challenges\, primarily due to strict timing constraints\, limited memory\, and dynamically changing environments. This talk presents a unified full-stack co-design approach to manage these complex\, multidimensional trade-offs. \nBio:\nZexin Li is a Ph.D. student at the University of California\, Riverside\, advised by Cong Liu. His research interests lie in the interdisciplinary fields of real-time embedded systems and on-device machine learning. \nVolunteer:\nKeming Li
URL:https://isg.ics.uci.edu/event/zexin-li-ucr-unified-full-stack-co-design-for-on-device-machine-learning/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20260220T130000
DTEND;TZID=America/Los_Angeles:20260220T140000
DTSTAMP:20260508T090626
CREATED:20260217T234405Z
LAST-MODIFIED:20260217T234405Z
UID:2308-1771592400-1771596000@isg.ics.uci.edu
SUMMARY:Ashwin Gerard Colaco (UCI): Bringing Simulators Inside the Database: A Vision for Interactive Scientific Exploration
DESCRIPTION:Time & Location:\nFriday Feb 20\, 2026\, 1:00 PM – 2:00 PM\nDonald Bren Hall 3011\, ICS\, UC Irvine \nLunch will be provided. \n\nTitle:\nBringing Simulators Inside the Database: A Vision for Interactive Scientific Exploration\n\nAbstract:\nPhysics-based simulators are essential for scientific discovery and risk assessment\, powering what-if analyses for events like wildfires and hurricanes. Yet today’s workflow is fundamentally disconnected: analysts manually run simulations\, export results\, and load them into a database before any analysis can begin. This linear pipeline is slow\, brittle\, and especially limiting when the analysis itself reveals the need for new or refined simulation data.\n\nIn this talk\, I present a vision for GenIE\, a new database paradigm that makes the database simulation-aware by seamlessly integrating multiple simulators as first-class components. Designed as an extension to PostgreSQL\, GenIE dynamically invokes simulators based on the user’s query\, avoids generating irrelevant data\, reuses prior results\, and supports iterative refinement at interactive speeds. I illustrate GenIE’s potential through two use cases: wildfire smoke dispersion analysis using WRF-SFIRE and HYSPLIT\, and hurricane hazard assessment combining wind\, surge\, and flood models. Our preliminary results show how GenIE can transform these traditionally slow\, static analyses into responsive explorations by intelligently managing the trade-off between simulation accuracy and runtime. I conclude by outlining the data engineering challenges and future research directions in realizing the full potential of simulation-aware databases for next-generation scientific data management.\n\nBio:\nAshwin Gerard Colaco is a PhD candidate in Computer Science at UC Irvine\, advised by Sharad Mehrotra. 
His research focuses on the intersection of database systems and machine learning\, with an emphasis on cost-efficient inference and simulation-driven data exploration. He is supported by the Hasso Plattner Institute Fellowship for scalable databases for ML research.
URL:https://isg.ics.uci.edu/event/ashwin-gerard-colaco-uci-bringing-simulators-inside-the-database-a-vision-for-interactive-scientific-exploration/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20260213T130000
DTEND;TZID=America/Los_Angeles:20260213T140000
DTSTAMP:20260508T090626
CREATED:20260119T063942Z
LAST-MODIFIED:20260119T063942Z
UID:2300-1770987600-1770991200@isg.ics.uci.edu
SUMMARY:Santosh Hegde (Couchbase\, Inc.): From Transactions to Intelligence: Evolution of Couchbase for AI
DESCRIPTION:The Department of Computer Science\, Information Systems Group\, UC Irvine \nWELCOMES \nSantosh Hegde \nCouchbase\, Inc. \n \nFrom Transactions to Intelligence: Evolution of Couchbase for AI \n \n02/13/2026\, Friday\, 1:00 – 2:00 pm \nPlace: DBH 3011 \n \nAbstract: When Couchbase started\, it was built to solve a very specific problem\, namely the handling of high-performance\, low-latency transactional workloads at massive scale. Over time\, however\, customer needs have evolved from simply storing and retrieving data reliably at scale\, to analyzing it in real time\, and now to reasoning over it using AI. In this talk\, we will explore the new capabilities in the Couchbase platform as it transforms from a pure NoSQL transactional database into a unified platform that supports transactions\, analytics\, and AI workloads all on the same data foundation. \n \nBio: Santosh Hegde is an engineering leader with over 18 years of experience in the design and development of large-scale distributed database systems. He currently serves as Vice President of Engineering at Couchbase\, where his work focuses on the intersection of database systems\, analytics\, and artificial intelligence. Previously\, he held senior engineering leadership roles at IBM Software Labs and Visa Inc. His technical interests include query processing\, database runtimes\, columnar data systems\, large-scale data ingestion\, and database-as-a-service architectures. Santosh has contributed to the development of multiple enterprise data platforms\, including Informix\, DB2 (LUW)\, and Trino. More recently\, his work has focused on the evolution of data platforms to support emerging AI-driven application patterns.
URL:https://isg.ics.uci.edu/event/santosh-hegde-couchbase-inc-from-transactions-to-intelligence-evolution-of-couchbase-for-ai/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20260206T130000
DTEND;TZID=America/Los_Angeles:20260206T140000
DTSTAMP:20260508T090626
CREATED:20260203T045046Z
LAST-MODIFIED:20260203T045046Z
UID:2306-1770382800-1770386400@isg.ics.uci.edu
SUMMARY:ISG seminar: "How grants and funding work in academia"
DESCRIPTION:[ISG] ISG seminar this Friday (Feb. 6): “How grants and funding work in academia” \nIn the ISG seminar this Friday (1 – 2 pm\, DBH 3011)\, the ISG faculty will give a talk on “How grants and funding work in academia.”  We will discuss how faculty write proposals\, receive awards\, and spend the funds to support students and research activities.  We will also emphasize the importance of PhD students participating in proposal writing.
URL:https://isg.ics.uci.edu/event/isg-seminar-how-grants-and-funding-work-in-academia/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20260130T130000
DTEND;TZID=America/Los_Angeles:20260130T140000
DTSTAMP:20260508T090626
CREATED:20260127T031809Z
LAST-MODIFIED:20260324T060539Z
UID:2302-1769778000-1769781600@isg.ics.uci.edu
SUMMARY:Andrew Chio: Building Resilient Systems for Critical Infrastructures: A Model-Driven and Data-Driven Approach
DESCRIPTION:Time & Location:\nFriday Jan 30\, 2026\, 1:00 PM – 2:00 PM\nDonald Bren Hall 3011\, ICS\, UC Irvine \nLunch will be provided. \n\nTitle:\nBuilding Resilient Systems for Critical Infrastructures: A Model-Driven and Data-Driven Approach\n\nAbstract:\nCritical infrastructures such as water\, power\, and buildings are large-scale distributed systems that serve as essential lifelines for communities worldwide. Today\, they face unprecedented resilience challenges that affect millions of people and cause billions in damage. In this talk\, I will present my research addressing fundamental computational challenges in these cyber-physical systems by combining model-driven approaches that encode physics and network constraints with data-driven techniques that learn from real-world patterns. I will demonstrate this across three critical infrastructure domains\, addressing key resilience challenges. First\, I present STEP\, a framework that solves the NP-hard sensor placement problem for detecting transient contamination events in stormwater networks. Second\, I introduce SEQUIN\, which leverages network science principles and physics-based optimization to identify sequential attack patterns. Third\, I showcase SmartSPEC\, an event-driven simulation framework that generates realistic synthetic human behavioral data by exploiting environmental semantics. Together\, these systems demonstrate how integrating models and data can address diverse resilience challenges for societal-scale systems.\n\nBio:\nAndrew is a final-year PhD candidate in the Distributed Systems Middleware (DSM) Group at the University of California\, Irvine\, advised by Prof. Nalini Venkatasubramanian. He is also affiliated with the Los Alamos National Laboratory\, working with Dr. Russell Bent in the T-5 Theoretical Division Applied Mathematics and Plasma Physics Group. His research interests lie at the intersection of cyber-physical systems\, optimization\, middleware\, and artificial intelligence. 
His current work focuses on building systems that enhance the resilience of societal-scale cyber-physical systems such as electric power grids\, stormwater networks\, and smart buildings. His work has been published in top venues such as ACM/IEEE ICCPS\, IEEE PerCom\, ACM BuildSys\, and VLDB. He is the recipient of the NSF CPS Rising Stars Award in 2025\, the CPS-Week PhD Forum Best Poster Award in 2025\, the UC National Lab In-Residence Graduate Fellowship in 2022\, the ARCS Foundation Scholarship in 2022\, as well as the Best Paper Award in IEEE PerCom 2022.
URL:https://isg.ics.uci.edu/event/andrew-chio-building-resilient-systems-for-critical-infrastructures-a-model-driven-and-data-driven-approach/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20260123T130000
DTEND;TZID=America/Los_Angeles:20260123T140000
DTSTAMP:20260508T090626
CREATED:20251121T025541Z
LAST-MODIFIED:20251121T025541Z
UID:2289-1769173200-1769176800@isg.ics.uci.edu
SUMMARY:Prof. Jianguo Wang (Purdue): Databases for AI: The Case for Vector Databases
DESCRIPTION:Title: Databases for AI: The Case for Vector Databases\n1/23/2026\, Friday\nDBH 3011\, UCI \n\nAbstract: Vector databases have recently emerged as a hot topic due to the widespread interest in LLMs\, where they provide relevant context that enables LLMs to generate more accurate responses. Current vector databases can be broadly categorized into two types: specialized and integrated. Specialized vector databases are explicitly designed for managing vector data\, while integrated vector databases support vector search within existing database systems (mostly relational databases). While specialized vector databases are interesting\, there is a significant customer base interested in integrated vector databases for various reasons\, such as reluctance to move data out\, the desire to link vector embeddings with their source data\, and the need for advanced vector search capabilities. However\, integrated vector databases face challenges in performance and interoperability. In this talk\, I will share our recent experience building integrated vector databases within two relational databases: SingleStore (VLDB’24) and PostgreSQL (CIDR’26). I will show how we address performance and interoperability challenges\, resulting in more powerful vector databases that support advanced RAGs. I will also present additional challenges in vector databases and our ongoing research to address them. Finally\, I will discuss the broader role of database systems in the era of LLMs and how to build future data infrastructure that extends beyond vector databases to better support AI.\n\nBio: Jianguo Wang is an Assistant Professor of Computer Science at Purdue University. He received his Ph.D. from the University of California\, San Diego. His research focuses on database systems for the Cloud and LLMs\, with a particular focus on Disaggregated Databases and Vector Databases. 
He has worked or interned at Zilliz\, Amazon AWS\, Microsoft Research\, Oracle\, and Samsung\, contributing to the development of various database systems. He regularly publishes and serves on program committees for premier database conferences\, including SIGMOD\, VLDB\, and ICDE. He also moderated the VLDB 2024 panel on vector databases and was invited to the Dagstuhl Seminar on vector databases. His research has impacted multiple industrial-strength database systems\, including Amazon Aurora\, Zilliz Milvus\, SingleStore\, and TigerGraph. His research has been recognized with multiple awards\, including the NSF CAREER Award\, the ACM SIGMOD Research Highlight Award\, the Google ML and Systems Junior Faculty Award\, and the IEEE TCDE Rising Star Award.
URL:https://isg.ics.uci.edu/event/prof-jianguo-wang-purdue-databases-for-ai-the-case-for-vector-databases/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20260116T130000
DTEND;TZID=America/Los_Angeles:20260116T140000
DTSTAMP:20260508T090626
CREATED:20260113T215534Z
LAST-MODIFIED:20260113T215534Z
UID:2297-1768568400-1768572000@isg.ics.uci.edu
SUMMARY:Yiming Lin (UC Berkeley): AI-Powered Data Systems for Multimodal Analytics
DESCRIPTION:Time & Location:\nFriday Jan 16\, 2026\, 1:00 PM – 2:00 PM\nDonald Bren Hall 3011\, ICS\, UC Irvine \nLunch will be provided. \nTitle:\nAI-Powered Data Systems for Multimodal Analytics\n\nAbstract: \nWe live in a world overflowing with data\, and the emergence of AI\, such as Large Language Models (LLMs)\, is revolutionizing data analytics. However\, directly using AI to process massive and complex data is neither effective nor scalable. \nIn this talk\, I introduce my work on building database systems powered by AI to analyze and process multimodal data at scale\, focusing on tables and documents. On one hand\, when analyzing tables\, AI is often used to prepare data\, such as cleaning\, enriching\, or synthesizing data prior to query processing. This becomes prohibitively expensive when the data scale is large. To support scalable analysis over expensive data ingestion\, my work leverages the fact that not all data are needed to answer a query and explores a set of techniques to reduce AI operations unnecessary to analytics by optimizing the query engine in the database. On the other hand\, when analyzing documents\, current systems treat them as plain text and ignore underlying structures\, leading to limited accuracy and performance. In this regard\, we exhaustively identified three document structures that encompass most real-world documents we have encountered\, and we designed tools and systems to extract their structures and leverage them for accurate and efficient document analytics. Finally\, I’ll share my vision for building data systems for multimodal analytics\, including aspects of trustworthy systems\, interaction with hardware\, and co-optimization among different data modalities. \n\nBio: \nYiming Lin is a postdoctoral researcher at UC Berkeley\, and he received his Ph.D. from UC Irvine. 
His research interests span document analytics\, query processing and optimization\, and data cleaning\, with a current focus on building databases for multimodal analytics powered by AI. His work has had real-world impact: document analytics help public defenders\, journalists\, and the California police department process over 30\,000 pages\, while his efforts as part of TippersDB deliver high-quality IoT services to nursing homes\, industries\, and universities across five sites over six years. He has a number of publications and serves on the program committees of VLDB\, SIGMOD\, and ICDE. \n\nVolunteer: \nGuangxue Zhang
URL:https://isg.ics.uci.edu/event/yiming-lin-uc-berkeley-ai-powered-data-systems-for-multimodal-analytics/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20251121T130000
DTEND;TZID=America/Los_Angeles:20251121T140000
DTSTAMP:20260508T090626
CREATED:20251004T082112Z
LAST-MODIFIED:20251028T011209Z
UID:2271-1763730000-1763733600@isg.ics.uci.edu
SUMMARY:Dr. Matteo Interlandi (Microsoft): Query Processing on Tensor Computation Runtimes
DESCRIPTION:The Department of Computer Science\, Information Systems Group\, UC Irvine \nWELCOMES \nDr. Matteo Interlandi \nGray Systems Lab (GSL) at Microsoft \nQuery Processing on Tensor Computation Runtimes \nNovember 21\, 2025\, Friday\, 1 – 2 pm\, DBH 3011\, UC Irvine \nThe huge demand for computation in artificial intelligence (AI) is driving unparalleled investments in new hardware and software systems for AI. This leads to an explosion in the number of specialized hardware devices\, which are now part of the offerings of major cloud providers. Meanwhile\, by hiding the low-level complexity through a tensor-based interface\, AI frameworks such as PyTorch allow data scientists to efficiently exploit the exciting capabilities offered by the new hardware. In this talk\, we will present how databases can ride the wave of innovation happening in the AI space thanks to Tensor Query Processor (TQP). TQP is the first AI-native SQL query processor leveraging AI frameworks for: (1) efficiently running SQL queries on GPUs; (2) scaling out query execution on clusters of GPU nodes; and (3) bringing new multi-modal capabilities into SQL. \nMatteo Interlandi is a Principal Scientist Manager in the Gray Systems Lab (GSL) at Microsoft. Before Microsoft\, he was a Postdoc at the University of California\, Los Angeles. Matteo received his Ph.D. from the University of Modena and Reggio Emilia. Matteo’s work has received a best demo award at VLDB 2022\, and honorable mentions at SIGMOD 2021 and VLDB 2023/24.
URL:https://isg.ics.uci.edu/event/dr-matteo-interlandi-microsoft-query-processing-on-tensor-computation-runtimes/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20251114T130000
DTEND;TZID=America/Los_Angeles:20251114T140000
DTSTAMP:20260508T090626
CREATED:20251114T210711Z
LAST-MODIFIED:20251114T210711Z
UID:2287-1763125200-1763128800@isg.ics.uci.edu
SUMMARY:Prof. Ruben Vescovo (Tohoku University): Modeling Japan’s DMAT Framework: an agent-based model for disaster medical mobility at scale
DESCRIPTION:For this week’s ISG seminar\, we’ll have Prof. Ruben Vescovo\, a visiting collaborator from Tohoku University\, present his work.\n\nTime & Location:\nFriday Nov 14\, 2025\, 1:00 PM – 2:00 PM\nDonald Bren Hall 3011\, ICS\, UC Irvine \nLunch will be provided. \nTitle:\nModeling Japan’s DMAT Framework: an agent-based model for disaster medical mobility at scale \nAbstract: \nDisasters provide us with a unique set of conditions: they are both incredibly destructive and very infrequent. Statistically speaking\, when we consider disasters\, we are thinking about distribution outliers with significant magnitude\, which are often not in distribution of one another. Hence\, studying mobility for a disaster means studying a non-deterministic\, causal system that is highly coupled with the co-disaster or post-disaster condition.\n\nDisaster management frameworks that deal with the co-disaster and post-disaster logistics are often devised in the mitigation and preparedness stages of the disaster cycle\, but remain unproven until the next disaster\, which inevitably prompts a review of the framework resulting from unaccounted-for disaster-driven circumstances.\nOne such framework is the Japanese Disaster Medical Assistance Team (DMAT)\, a government taskforce comprised of medical professionals (doctors\, nurses\, logistics personnel) which is prescribed to operate in the post-disaster window (0 to 72 hours) after a disaster event. DMAT is a centrally managed\, strictly regulated\, command & control organization that is tasked with assisting hospital operations by facilitating patient processing\, transfers\, and transport.\n\nTo better understand the limitations of the DMAT system in context\, IRIDeS is collaborating with the Japanese Government to reproduce the DMAT framework as an Agent-Based “Digital Shadow” to test various post-disaster conditions\, infrastructure arrangements\, and DMAT supply & support flows. 
Due to the scale of operations\, the\nmodel is being developed as a high-performance parallel system designed to operate\non vector processing units and wide CPU throughput configurations.\n\nBio: \n\n\n\nRuben is Assistant Professor at the International Research Institute of Disaster Science\n(IRIDeS) at Tohoku University. Currently his research focus is Agent Based Model applications to disaster management and disaster science\, with an emphasis on decision making agents under uncertain conditions. Ruben’s previous work centered on statistical and probabilistic machine learning applications to uncertainty-aware predictive hazard-to-vulnerability models for\ninfrastructure in disasters.
URL:https://isg.ics.uci.edu/event/prof-ruben-vescovo-tohoku-university-modeling-japans-dmat-framework-an-agent-based-model-for-disaster-medical-mobility-at-scale/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20251031T130000
DTEND;TZID=America/Los_Angeles:20251031T140000
DTSTAMP:20260508T090626
CREATED:20251027T210415Z
LAST-MODIFIED:20251027T210415Z
UID:2282-1761915600-1761919200@isg.ics.uci.edu
SUMMARY:Pratyoy Das: SmartRabbit: An Interactive Query Processor
DESCRIPTION:For this week’s ISG seminar\, Pratyoy will be presenting his research work.\n\nTime & Location:\n\n\nFriday Oct 31\, 2025\, 1:00 PM – 2:00 PM\nDonald Bren Hall 3011\, ICS\, UC Irvine \nLunch will be provided. \nTitle:\nSmartRabbit: An Interactive Query Processor \nAbstract: \nTraditional relational database systems optimize analytical queries to minimize their end-to-end latency. The resulting optimal plans are usually blocking\, forcing users to wait until full query completion before seeing any results. This execution model precludes interactivity\, i.e.\, users cannot observe partial results or gain early insights for long-running queries. Query optimizers rarely choose plans that promote interactivity\, since such plans either incur prohibitively large latencies or involve operators for which interactive alternatives are unavailable. We introduce a novel interactive query processor SmartRabbit that promotes interactivity of answers while matching the end-to-end latency of blocking execution plans. We achieve this by first designing a plan optimized for interactivity for a given query\, and then simultaneously executing this plan alongside a traditional blocking plan. The two executions are carefully synchronized to maintain the correct order of answers and prevent duplicates. We implement SmartRabbit in AsterixDB and show that SmartRabbit consistently delivers early and continuous results across various analytical queries\, data scales\, and parallel (multi-node\, multi-partition) system instances\, while matching the latencies of the standalone blocking executions.\n\nBio: \n\n\n\nPratyoy is a 4th year PhD student under Professor Sharad Mehrotra. His research focuses on query optimization and query execution with specific interests in adaptive\, interactive and progressive query optimization. 
Pratyoy had previously interned in the query optimization team of Amazon Redshift and was a Software Engineer at Microsoft before joining UC Irvine.\n\n\n\n\n\nVolunteer:\nXiaozhen Liu
URL:https://isg.ics.uci.edu/event/pratyoy-das-smartrabbit-an-interactive-query-processor/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20251022T093000
DTEND;TZID=America/Los_Angeles:20251022T110000
DTSTAMP:20260508T090626
CREATED:20251017T005844Z
LAST-MODIFIED:20251017T005844Z
UID:2279-1761125400-1761130800@isg.ics.uci.edu
SUMMARY:Dr. Doug Terry: Consistency in the Cloud
DESCRIPTION:Dr. Doug Terry will give a guest lecture in the CS 224P (“Big Data”) class. The talk should be very interesting to ISG members. Doug is a fantastic speaker!\n  \nTitle: Consistency in the Cloud\nLecturer: Doug Terry\n\nOct. 22\, Wednesday\, 9:30 am – 10:50 am\nDBH 6011\, UC Irvine \nAbstract: Data is routinely replicated in the cloud for fault-tolerance\, disaster recovery\, scalable throughput\, and low-latency access. The design of a replication scheme involves fundamental trade-offs between consistency\, availability\, and performance. This lecture explores the consistency choices that are offered by cloud providers using baseball as an illustrative application. \nBio: Doug Terry recently joined LinkedIn as a Distinguished Software Engineer. Prior to that\, he led research projects at Xerox PARC\, Microsoft\, and Samsung\, and he pioneered innovative cloud services at Amazon and Microsoft. He also taught Distributed Systems at U.C. Berkeley and Stanford University.
URL:https://isg.ics.uci.edu/event/dr-doug-terry-consistency-in-the-cloud/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20251010T130000
DTEND;TZID=America/Los_Angeles:20251010T140000
DTSTAMP:20260508T090626
CREATED:20251010T193828Z
LAST-MODIFIED:20251010T193828Z
UID:2276-1760101200-1760104800@isg.ics.uci.edu
SUMMARY:Farzad Habibi: Brook-2PL: Tolerating High Contention Workloads with A Deadlock-Free Two-Phase Locking Protocol
DESCRIPTION:For this week’s ISG seminar\, Farzad will be presenting his research work.\n\nTime & Location:\n\n\nFriday Oct 10\, 2025\, 1:00 PM – 2:00 PM\nDonald Bren Hall 3011\, ICS\, UC Irvine \nLunch will be provided. \nTitle:\nBrook-2PL: Tolerating High Contention Workloads with A Deadlock-Free Two-Phase Locking Protocol \nAbstract: \nThe problem of hotspots remains a critical challenge in high-contention workloads for concurrency control (CC) protocols. Traditional concurrency control approaches encounter significant difficulties under high contention\, resulting in excessive transaction aborts and deadlocks. In this paper\, we propose Brook-2PL\, a novel two-phase locking (2PL) protocol that (1) introduces SLW-Graph for deadlock-free transaction execution\, and (2) proposes partial transaction chopping for early lock release. Previous methods suffer from transaction aborts that lead to wasted work and can further burden the system due to their cascading effects. Brook-2PL addresses this limitation by statically analyzing a new graph-based dependency structure called SLW-Graph\, enabling deadlock-free two-phase locking through predetermined lock acquisition. Brook-2PL also reduces contention by enabling early lock release using partial transaction chopping and static transaction analysis. We overcome the inherent limitations of traditional transaction chopping by providing a more flexible chopping method. Evaluation using both our synthetic online game store workload and the TPC-C benchmark shows that Brook-2PL significantly outperforms state-of-the-art CC protocols. Brook-2PL achieves an average speed-up of 2.86x while reducing tail latency (p95) by 48% in the TPC-C benchmark.\n\nBio: \n\n\n\nFarzad Habibi is a fourth-year PhD candidate in Computer Science at UC Irvine\, with a background in Computer Engineering from the University of Tehran. 
His research focuses on distributed data management\, spanning blockchain resilience\, database availability to metastable failures\, geo-distributed transactional databases\, and concurrency control under high contention.\n\n\n\n\nVolunteer:\nJuncheng Fang
URL:https://isg.ics.uci.edu/event/farzad-habibi-brook-2pl-tolerating-high-contention-workloads-with-a-deadlock-free-two-phase-locking-protocol/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20251003T130000
DTEND;TZID=America/Los_Angeles:20251003T140000
DTSTAMP:20260508T090626
CREATED:20250928T025441Z
LAST-MODIFIED:20251005T052555Z
UID:2259-1759496400-1759500000@isg.ics.uci.edu
SUMMARY:Dr. Taina Coleman: Research Without Borders: High Performance Computing for Discovery Across Domains
DESCRIPTION:Research Without Borders: High Performance Computing for Discovery Across Domains\nDr. Tainã Coleman\, San Diego Supercomputer Center (SDSC)\n\nOct. 3\, Friday\, 1 – 2 pm\, DBH 3011\, UCI\n\n\n\nAbstract: High-performance computing (HPC) drives discovery\, but access often requires deep technical expertise. This talk explores how to broaden access and empower researchers across disciplines. I’ll begin with WfCommons (wfcommons.org)\, which generates realistic workflows and benchmarks to lower barriers for building and evaluating workflow applications. Next\, I’ll cover the National Data Platform (nationaldataplatform.org)\, a federated ecosystem that connects distributed data resources. My work there involves developing tools that make large-scale data and computing resources usable beyond computer science\, thereby democratizing access across various fields. Finally\, I’ll share emerging directions that extend HPC into the humanities and beyond\, including collaborations on Indigenous datasets that highlight the need for culturally respectful approaches. Together\, these efforts envision HPC without borders\, where workflows\, data\, and cultural awareness converge to enable discovery across all domains.\n\n\nBio: Tainã Coleman is a Schmidt AI in Science Postdoctoral Fellow at the San Diego Supercomputer Center. Her research centers on developing solutions and tools that expand access to high-performance computing (HPC) for researchers across both the sciences and the humanities. A key focus of her work is on scientific workflows\, primarily examining how workflow structure impacts execution in HPC environments and designing algorithms\, benchmarks\, and data-driven methods to enhance efficiency and usability. More recently\, she has integrated artificial intelligence into her research\, exploring its applications across diverse domains. She earned her B.S. in Computer Engineering from Universidade Federal de Itajubá (2016)\, her M.S. 
in Computer Science from California State University\, Long Beach (2020)\, and her Ph.D. in Computer Science from the University of Southern California (2023).\n\nSlides: Research Without Borders_ High Performance Computing for Discovery Across Domains
URL:https://isg.ics.uci.edu/event/research-without-borders-high-performance-computing-for-discovery-across-domains/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250716T110000
DTEND;TZID=America/Los_Angeles:20250716T120000
DTSTAMP:20260508T090626
CREATED:20250717T035358Z
LAST-MODIFIED:20250717T040900Z
UID:2245-1752663600-1752667200@isg.ics.uci.edu
SUMMARY:Michal Shmueli-Scheuer (IBM Research): GenAI Benchmarking and Evaluation
DESCRIPTION:Title: GenAI Benchmarking and Evaluation\nWednesday (July 16)\, 11 am – noon\, DBH 6011\, UC Irvine \nAbstract:\nIn this seminar\, I will present a practical perspective on benchmarking in the era of generative AI. We will begin with an overview of goals and motivations\, followed by key definitions such as benchmarks and leaderboards. I’ll review prominent existing benchmarks and highlight several core challenges that limit their effectiveness in evaluating today’s models. In response\, I’ll introduce a new methodology for benchmarking that focuses on flexibility\, reuse\, and deeper insights. This includes a closer look at ToRR\, a new table reasoning benchmark\, and Unitxt\, an open-source tool designed to simplify the creation and management of benchmarks—accompanied by a brief demo. Finally\, I will outline emerging directions in agentic evaluation\, pointing to what’s next in assessing increasingly complex AI systems. \nBio:\nMichal Shmueli-Scheuer is a Distinguished Engineer for AI Benchmarking and Evaluation at IBM Research. Her expertise spans Natural Language Generation (NLG) and Natural Language Processing (NLP)\, with a focus on evaluation\, data-to-text generation\, conversational bots\, scientific document summarization\, and affective computing. \nShe has authored over 40 academic papers in top NLP and AI conferences and journals. Michal is a co-organizer of the GEM-squared workshop on evaluation (ACL) and recently delivered the “Navigating the Modern Evaluation Landscape” tutorial at LREC-COLING. She will also deliver a tutorial on “Evaluating LLM-based Agents: Foundations\, Best Practices\, and Open Challenges” at IJCAI. \nActively involved in the NLP community\, she has organized multiple workshops and shared tasks. Michal earned her Ph.D. in Computer Science from the University of California\, Irvine\, in 2009.
URL:https://isg.ics.uci.edu/event/genai-benchmarking-and-evaluation/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250613T010000
DTEND;TZID=America/Los_Angeles:20250613T140000
DTSTAMP:20260508T090626
CREATED:20250614T144455Z
LAST-MODIFIED:20250614T162547Z
UID:2225-1749776400-1749823200@isg.ics.uci.edu
SUMMARY:ISG End-of-year Celebration
DESCRIPTION:We celebrate the achievements of the ISG members in the past academic year. Here’s the presentation deck.
URL:https://isg.ics.uci.edu/event/isg-eoy-celebration-2025/
LOCATION:DBH 4011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250606T130000
DTEND;TZID=America/Los_Angeles:20250606T140000
DTSTAMP:20260508T090626
CREATED:20250521T185445Z
LAST-MODIFIED:20250521T185445Z
UID:2217-1749214800-1749218400@isg.ics.uci.edu
SUMMARY:Fangqi Liu (Vanderbilt University)
DESCRIPTION:
URL:https://isg.ics.uci.edu/event/fangqi-liu-vanderbilt-university/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250530T130000
DTEND;TZID=America/Los_Angeles:20250530T140000
DTSTAMP:20260508T090626
CREATED:20250401T172052Z
LAST-MODIFIED:20250521T185630Z
UID:2200-1748610000-1748613600@isg.ics.uci.edu
SUMMARY:Xiaozhen Liu
DESCRIPTION:
URL:https://isg.ics.uci.edu/event/xiaozhen-liu/
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250523T130000
DTEND;TZID=America/Los_Angeles:20250523T140000
DTSTAMP:20260508T090626
CREATED:20250401T172015Z
LAST-MODIFIED:20250521T185546Z
UID:2198-1748005200-1748008800@isg.ics.uci.edu
SUMMARY:Binbin Gu: PoneglyphDB: Efficient Non-interactive Zero-Knowledge Proofs for Arbitrary SQL-Query Verification
DESCRIPTION:Abstract:\nAs databases increasingly serve as backbones for sensitive applications\, ensuring both data confidentiality and computational integrity becomes critical—especially when the data owner and querier do not fully trust each other. In this talk\, I will introduce PoneglyphDB\, a novel database system that generates non-interactive zero-knowledge proofs (ZKPs) for arbitrary SQL queries\, allowing users to verify query results without seeing the underlying data. PoneglyphDB tackles key limitations of prior interactive ZKP systems by compiling SQL operators into efficient PLONKish arithmetic circuits\, optimizing them with low-degree polynomial constraints\, recursive proof composition\, and oblivious execution. The system supports a rich set of SQL operations including joins\, aggregations\, group-by\, and even string predicates\, while achieving significant performance improvements over state-of-the-art systems like ZKSQL and Libra. I will detail the system architecture\, circuit compiler\, and experimental results on the TPC-H benchmark\, showing how PoneglyphDB bridges theory and practice for verifiable\, privacy-preserving data processing. \nBio:\nBinbin Gu is a final-year Ph.D. candidate in Computer Science at the University of California\, Irvine\, advised by Faisal Nawab. His research lies at the intersection of trustworthy AI\, database systems\, and cryptographic verification.
URL:https://isg.ics.uci.edu/event/janki-bhimani-fiu/
LOCATION:DBH 4011
END:VEVENT
BEGIN:VEVENT
DTSTART;VALUE=DATE:20250502
DTEND;VALUE=DATE:20250503
DTSTAMP:20260508T090626
CREATED:20250401T171841Z
LAST-MODIFIED:20250521T185049Z
UID:2194-1746144000-1746230399@isg.ics.uci.edu
SUMMARY:Yicong Huang: From 100 Applications to 9 Offers: Lessons from the Academic and Industry Job Market
DESCRIPTION:Abstract:\nIn this talk\, I’ll share my personal journey navigating the academic and industry job markets during the 2024–2025 academic year. I’ll reflect on key decisions\, behind-the-scenes experiences\, and the lessons I learned through 100+ applications\, 20+ interviews\, and 9 offers across both domains. Along the way\, I’ll highlight the differences in timelines\, expectations\, and evaluation criteria between academia and industry. This session is intended to be informal\, candid\, and interactive—an honest look at what the job market feels like from the inside. Whether you’re an early-stage Ph.D. student beginning to explore career options\, a researcher actively preparing for the job market\, or someone still deciding between academia and industry\, I hope this talk offers useful insights and encouragement. There’s no single “right” path\, and I’ll share both successes and challenges to help demystify the process for anyone thinking about their next chapter. \nBio:\nYicong Huang is a final-year Ph.D. candidate in the Information Systems Group (ISG)\, Computer Science Department\, University of California\, Irvine. Under the guidance of Dr. Chen Li\, his research focuses on big data management\, data-processing systems\, and machine learning systems. Yicong has made significant contributions to the Texera project. He has published in top-tier database venues such as VLDB\, SIGMOD\, and ICDE. His interdisciplinary reach spans venues like TOCHI\, PNAS Nexus\, JAMIA\, AMIA\, and PLOS ONE. Yicong completed research internships at ByteDance\, VISA\, and Observe\, and contributed to patents and papers. His research earned a Best Demo Runner-Up Award at SIGMOD 2024. He received honors such as the 2025 Joseph & Dorothy Fischer Memorial Endowed Fellowship\, the 2025 Beall Family Foundation Graduate Student Entrepreneur Award in Computer Science\, the 2024 Graduate Dean’s Dissertation Fellowship\, and the 2023 Public Impact Fellowship from UCI. 
In 2025\, Yicong will join Databricks\, where he will work in the Apache Spark Runtime team. In Fall 2027\, he will begin a faculty appointment as a Tenure-Track Assistant Professor at the Manning College of Information and Computer Sciences (CICS) at UMass Amherst. For more information\, please visit yicong-huang.github.io.
URL:https://isg.ics.uci.edu/event/pat-helland/
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250425T130000
DTEND;TZID=America/Los_Angeles:20250425T140000
DTSTAMP:20260508T090626
CREATED:20250401T171824Z
LAST-MODIFIED:20250521T184951Z
UID:2192-1745586000-1745589600@isg.ics.uci.edu
SUMMARY:Yiming Lin (Berkeley): Toward Building Efficient Document Analytics Systems from the Lens of Document Structure
DESCRIPTION:Abstract:\nThe vast majority\, over 80%\, of data today exists in unstructured formats\, and querying and extracting value from unstructured document collections remains a considerable challenge. While Large Language Models (LLMs) have made remarkable progress in document understanding\, they fail to provide high-accuracy results for analytical queries on documents and incur high costs. \nIn this talk\, we demonstrate that document collections often have hidden structure\, and that discovering it can effectively facilitate multiple downstream data analytics tasks on documents. At one extreme\, we explore documents sharing a similar high-level template that imparts a common semantic structure\, such as scientific papers from the same venue. We introduce ZenDB\, a document analytics system that leverages this semantic structure\, coupled with LLMs\, to answer ad-hoc SQL queries on document collections. At another extreme\, we explore documents that are form-like\, such as invoices and order bills\, containing structured data like tables or key-value pairs\, which are programmatically generated by populating fields in a visual blueprint. We present TWIX\, a document analytics tool that first infers the common blueprint and then extracts structured data from documents efficiently. For both extremes explored\, we provide theoretical guarantees on the correctness of structure extraction\, present empirical results demonstrating their potential for document analytics\, and show their early impact on our collaborators\, including Big Local News at Stanford and California Police Data Applications. \nBio: \nYiming Lin is a postdoctoral researcher at UC Berkeley\, and he received his PhD in Computer Science from UC Irvine. His research interests span document analytics\, query processing and optimization\, and data cleaning\, with a current focus on developing data management systems for document analytics. 
Yiming has closely collaborated with and interned at industrial pioneers in data analytics\, including Microsoft Research and Amazon. His work has been published in several flagship conferences\, including VLDB\, SIGMOD\, and ICDE.
URL:https://isg.ics.uci.edu/event/yiming-lin-berkeley/
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250418T130000
DTEND;TZID=America/Los_Angeles:20250418T140000
DTSTAMP:20260508T090626
CREATED:20250401T171754Z
LAST-MODIFIED:20250423T180252Z
UID:2190-1744981200-1744984800@isg.ics.uci.edu
SUMMARY:Abhishek Singh: Transaction Processing in Hybrid Edge Data Management Systems
DESCRIPTION:Title: Transaction Processing in Hybrid Edge Data Management Systems \nAbstract:\nAdvances in computing and storage systems have enabled end users to run complex workloads on relatively cheap machines. These advancements have given rise to a novel infrastructure in data management: Edge-Cloud. Edge-Cloud data management systems allow data to be stored at the edge and managed by the cloud. The availability of Edge-Cloud systems has given rise to interesting research directions in data management. \nIn this talk\, we deal with the problem of building data management systems that use resources from Edge and Cloud. We use the idea of the ‘Cloud’ and ‘Edge’ as abstractions of Trusted and Untrusted systems respectively. The ‘Cloud’ in this thesis is treated as a trusted resource. This abstraction is motivated by the fact that applications deployed on the cloud (such as Gmail\, Facebook\, etc.) are trusted by end users. Applications on the cloud are deployed and managed by large corporations that have a financial incentive to ensure that the data and applications they manage are secure. In contrast to the Cloud\, Edge data management systems use cheaper components and no assumption can be made about them. \nIn order to execute transactions on an integrated edge-cloud system we propose LogPoseDB\, an edge-cloud database that spans both edge and cloud nodes. LogPoseDB aims to overcome the two challenges above. LogPoseDB does not require any dedicated edge infrastructure. Rather\, clients may utilize their edge nodes – if desired – to perform the processing and storage of their data while they need it. (Other clients can still process their data on cloud nodes.) To enable this type of processing\, LogPoseDB proposes state disentanglement\, where the state (storage and processing) is treated as a shared resource between the cloud and the edge. 
\nLogPoseDB’s transaction processing protocol ensures fast response by avoiding wide-area coordination with the cloud or other faraway edge nodes. This is done by leveraging data locality of detached state and by methods that build on the areas of transaction chopping and commutativity. To address the trust challenges\, we propose a byzantine fault-tolerant (BFT) protocol that targets edge nodes. LogPoseDB’s BFT replication protocol proposes the principle of remote lazy trust that enables efficient BFT edge coordination by utilizing a remote trusted node asynchronously. \nBio:\nAbhishek Alfred Singh is a PhD Candidate working with Professor Faisal Nawab. His research interests are in transaction processing in emerging edge-cloud data management systems. His work deals with transaction processing in loosely coupled distributed systems.
URL:https://isg.ics.uci.edu/event/abhishek-singh/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250411T110000
DTEND;TZID=America/Los_Angeles:20250411T120000
DTSTAMP:20260508T090626
CREATED:20250401T171717Z
LAST-MODIFIED:20260401T210231Z
UID:2188-1744369200-1744372800@isg.ics.uci.edu
SUMMARY:Jiawei Han (distinguished lecture): A Retrieval-and-Structuring Approach for LLM-Enhanced\, Theme-Focused Scientific Exploration
DESCRIPTION:“A Retrieval-and-Structuring Approach for LLM-Enhanced\, Theme-Focused Scientific Exploration” \nAbstract:  Large Language Models (LLMs) may bring unprecedented power for scientific exploration.  However\, current LLMs may still encounter major challenges for effective scientific exploration due to their lack of in-depth\, theme-focused data and knowledge.  Retrieval augmented generation (RAG) has recently become an interesting approach for augmenting LLMs with grounded\, theme-specific datasets.  We discuss the challenges of RAG and propose a retrieval and structuring (RAS) approach\, which enhances RAG by improving retrieval quality and mining structures (e.g.\, extracting entities and relations and building knowledge graphs) to ensure its effective integration of theme-specific data with LLMs.  We show the promise of this approach at augmenting LLMs and discuss its potential power for LLM-enabled science exploration.  \n\n\n\n\nBio: Jiawei Han is Michael Aiken Chair Professor in the Siebel School of Computing and Data Science\, University of Illinois Urbana-Champaign.  He received the ACM SIGKDD Innovation Award (2004)\, the IEEE Computer Society Technical Achievement Award (2005)\, the IEEE Computer Society W. Wallace McDowell Award (2009)\, and Japan’s Funai Achievement Award (2018)\, and was elected a Fellow of the Royal Society of Canada (2022).  He is a Fellow of ACM and of IEEE and served as the Director of Information Network Academic Research Center (INARC) (2009-2016) supported by the Network Science-Collaborative Technology Alliance (NS-CTA) program of U.S. Army Research Lab and co-Director of KnowEnG\, a Center of Excellence in Big Data Computing (2014-2019)\, funded by NIH Big Data to Knowledge (BD2K) Initiative.  
Currently\, he is serving on the executive committees of two NSF-funded research centers: MMLI (Molecule Maker Lab Institute)\, one of the NSF-funded national AI institutes\, since 2020\, and I-GUIDE (the NSF Institute for Geospatial Understanding through an Integrative Discovery Environment)\, since 2021.
URL:https://isg.ics.uci.edu/event/jiawei-han-distinguished-lecture/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250307T130000
DTEND;TZID=America/Los_Angeles:20250307T170000
DTSTAMP:20260508T090626
CREATED:20250401T171401Z
LAST-MODIFIED:20250401T171401Z
UID:2186-1741352400-1741366800@isg.ics.uci.edu
SUMMARY:Lukas Lokowski: Knowledge Graphs and AI: Bridging Enterprise Data and Knowledge Graphs to Leverage AI Applications
DESCRIPTION:Knowledge Graphs and AI: Bridging Enterprise Data and Knowledge Graphs to Leverage AI Applications\nAbstract:\nIn today’s data-driven world\, enterprises face significant challenges in integrating\, structuring\, and utilizing vast amounts of heterogeneous data. Knowledge graphs offer a powerful solution by organizing and connecting enterprise data\, thus enabling advanced AI applications through structured\, semantically enriched information. In this seminar\, Lukas Lokowski will explore the role of knowledge graphs and ontology learning systems within enterprise environments. His talk will cover research on automating ontology extraction from relational databases\, benchmarking current ontology learning approaches\, and enhancing AI applications through high-quality data integration. \nBio:\nLukas Lokowski is a PhD student at the Hasso Plattner Institute in Potsdam\, under the guidance of Prof. Dr. Felix Naumann\, head of the Information Systems Chair. His research focuses on ontology learning from enterprise data and leveraging knowledge graphs to enhance AI applications. Additionally\, he addresses data cleaning challenges\, with a particular emphasis on entity resolution in numerical data. Passionate about bridging structured data with AI\, Lukas is committed to developing innovative approaches to improve data integration and interoperability.
URL:https://isg.ics.uci.edu/event/lukas-lokowski-knowledge-graphs-and-ai-bridging-enterprise-data-and-knowledge-graphs-to-leverage-ai-applications/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250228T110000
DTEND;TZID=America/Los_Angeles:20250228T120000
DTSTAMP:20260508T090626
CREATED:20250211T005638Z
LAST-MODIFIED:20250225T032827Z
UID:2171-1740740400-1740744000@isg.ics.uci.edu
SUMMARY:Sainyam Galhotra (Cornell): Context-aware Responsible Data Science
DESCRIPTION:ABSTRACT Data-based systems are increasingly used in applications that have far-reaching consequences and long-lasting societal impact. However\, the development process remains highly specialized\, tedious\, and unscalable. This produces a manually fine-tuned rigid solution that works only for one specific problem in one specific context. The system fails to adapt to the changing world and severely limits the full utilization of valuable data. \nSo\, how can you avert this fate for your systems? \nIn this talk\, I present my vision of context-aware systems that enable even non-expert users to develop correct\, explainable\, and equitable data-science pipelines. To achieve this\, I will focus on i) re-thinking the design of data science pipelines\, and ii) the importance of causal inference for trustworthy data analysis. I will present a data discovery framework that helps users identify useful data for various tasks like hypothesis generation\, fact checking and causal inference. Lastly\, I will discuss my proposal of leveraging causal reasoning to quantify the impact of an input on the outcome. These topics are the pieces of the puzzle that come together to create the Data Scientists’ holy grail – an easily deployable\, scalable\, and robust system that you can trust even as everything around it evolves. \n  \nBIO Sainyam Galhotra is an Assistant Professor in Computer Science at Cornell University and a field member for Computer Science\, Statistics and Data Science. Previously\, he was a Computing Innovation Fellow pursuing postdoctoral research at the University of Chicago. He received his Ph.D. from the University of Massachusetts Amherst under the supervision of Prof. Barna Saha (currently at UC San Diego). The goal of his research is to lay the foundation of responsible data science that enables efficient development and deployment of trustworthy data analytics applications. 
His research has combined techniques from Data Management\, Probabilistic Methods\, Causal Inference\, Machine Learning\, and Software Engineering. His research has been published in top-tier Data Management (SIGMOD\, VLDB\, PODS\, & ICDE)\, AI (NeurIPS\, AAAI & AIES) and Software Engineering (FSE) conferences. He is a recipient of the Best Paper Award in FSE 2017 and Most Reproducible Paper Award in both SIGMOD 2017 and 2018\, and Best Artifact Paper Honorable Mention Award in SIGMOD 2023. He was recognized as a Data Science rising star\, a DAAD AInet Fellow\, and as the first recipient of the Krithi Ramamritham Award at UMass for contribution to database research. \nhttps://sainyamgalhotra.com/
URL:https://isg.ics.uci.edu/event/sainyam-galhotra-cornell-tbd/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250221T130000
DTEND;TZID=America/Los_Angeles:20250221T140000
DTSTAMP:20260508T090626
CREATED:20250211T005602Z
LAST-MODIFIED:20250225T040933Z
UID:2169-1740142800-1740146400@isg.ics.uci.edu
SUMMARY:Ketan C Maheshwari (Oak Ridge National Laboratory): Enacting Distributed HPC Workflows: Opportunities and Challenges
DESCRIPTION:Abstract: The Department of Energy (DOE) complex comprises many science facilities that can be classified as data-producing (e.g.\, the Advanced Photon Source at Argonne National Laboratory) or data-consuming (e.g.\, the Leadership Class Computing Facilities at the Oak Ridge National Laboratory). Modern science campaigns often require extensive use of more than one such facility\, and these facilities may be located far from each other and administered separately. This presents opportunities for scientific computational workflows to aid in the process. At the same time\, there are equally daunting challenges to accomplishing these workflows successfully and smoothly. Our talk will dive into these opportunities and challenges\, along with proposed solutions and the path forward. \nBio: Dr. Ketan Maheshwari is a Senior Linux Systems Engineer within the NCCS Division at the Oak Ridge National Laboratory. He has over 15 years of experience working with HPC systems\, including over 10 years with the leadership class systems at ALCF and OLCF. He is interested in porting science applications to large-scale computing infrastructures and has hands-on expertise in workflows\, parallelization\, and HPC. He has given several talks on technical topics at local as well as international venues\, most notably on GNU Parallel (eScience’23\, CUG’24\, PEARC’24\, SC’24)\, Linux Terminal Tools (USENIX/LISA’19\, LOPSA’18) and Swift Workflows (CCGrid’13). Ketan received his PhD in the area of Scientific Workflows from University of Nice and a Masters in Grid Computing from University of Amsterdam. \n  \nHost: Chen Li
URL:https://isg.ics.uci.edu/event/ketan-c-maheshwari-oak-ridge-national-laboratory/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250214T130000
DTEND;TZID=America/Los_Angeles:20250214T140000
DTSTAMP:20260508T090626
CREATED:20250211T005519Z
LAST-MODIFIED:20250225T032621Z
UID:2167-1739538000-1739541600@isg.ics.uci.edu
SUMMARY:Jiadong Bai: Supporting Data Science Education Using Texera with a Cloud Infrastructure
DESCRIPTION:Abstract\nIn this talk\, we will first present our paper at the DSE-K12 conference\, titled “DS4ALL: Teaching High-School Students Data Science and AI/ML Using the Texera Workflow Platform as a Service.” Traditional data science education often requires students to have programming experience and install local software. It also makes collaboration inefficient and slows down the feedback loop when students need help from TAs. To address these challenges\, we developed a new teaching paradigm using Texera and successfully hosted DS4ALL summer programs in 2023 and 2024. As a result\, students with no prior coding experience were able to perform data analysis using AI/ML techniques on the platform.\nThe second part of the talk will focus on how we are scaling to a broader audience\, building on the success of the Texera system used in DS4ALL. Building such a cloud infrastructure poses several challenges\, such as handling varying service traffic\, supporting concurrent isolated workflow executions\, and managing diverse data storage needs. To address these challenges\, we run Texera on Kubernetes to achieve scalable service management. For workflow execution\, we run each workflow in an isolated Kubernetes pod to ensure performance and security. On the data storage side\, we design a storage layer that integrates LakeFS for managing versioned datasets and Apache Iceberg for handling versioned tables with support for concurrent read/write operations. These solutions allow Texera to be deployed as a cloud service that provides a scalable\, secure\, and efficient environment for data science workflows.\nBio\nJiadong Bai is a second-year Ph.D. student in the Computer Science Department at UC Irvine\, with research interests in data systems\, data science\, and big data analysis. He is supervised by Prof. Chen Li.\nShengquan Ni is a sixth-year Ph.D. student in the Department of Computer Science\, advised by Professor Chen Li. His research interests include big data processing\, distributed systems\, data analytics\, and data science.
URL:https://isg.ics.uci.edu/event/jiadong-bai-tbd/
LOCATION:DBH 3011
END:VEVENT
END:VCALENDAR