BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Information Systems Group - ECPv6.4.0.1//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Information Systems Group
X-ORIGINAL-URL:https://isg.ics.uci.edu
X-WR-CALDESC:Events for Information Systems Group
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20240310T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20241103T090000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20250309T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20251102T090000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240119T110000
DTEND;TZID=America/Los_Angeles:20240119T120000
DTSTAMP:20260606T114503
CREATED:20240122T190712Z
LAST-MODIFIED:20240122T190712Z
UID:1663-1705662000-1705665600@isg.ics.uci.edu
SUMMARY:Henry F. Korth (Lehigh University): Blockchain: Computer Science Foundations\, Positive Social and Business Impact\, and Research Opportunities
DESCRIPTION:The Computer Science Department and Information Systems Group (ISG) \nat UC Irvine welcomes \nHenry F. Korth \nLehigh University \nBlockchain: Computer Science Foundations\, Positive Social and Business Impact\, and Research Opportunities \nJanuary 19 at 11:00AM \nDBH 6011 \n  \nAbstract: To start\, basic concepts of blockchain systems will be introduced assuming only a basic background in computing. The key properties that distinguish blockchains from traditional databases will be explained along with an introduction to the mathematical concepts that make them possible.   \nWe next briefly survey current and emerging blockchain applications and their benefits to business and to society as a whole.   \nThe latter and main part of the talk presents research currently underway at Lehigh: \n\nAcceleration of the generation of zero-knowledge proofs on modern parallel architectures\,\nA consensus mechanism for the implementation of global-scale central-bank digital currencies in a manner that allows privacy along with transparency and regulatability\, \nA framework for benchmarking blockchain systems\, including current work extending the existing framework to layer-2 blockchain systems\n\nWe conclude with a brief survey of other development projects in Lehigh Blockchain with industry partners. \n  \nBio: Korth’s current research is focused on blockchain systems\, with an emphasis on blockchain system internals and applications in financial systems. For full details on his blockchain work\, spanning research and education\, see blockchain.cse.lehigh.edu. More broadly\, Korth’s research focuses on database systems\, information systems and distributed systems. He is co-author of Database System Concepts\, now in its seventh edition. Previously\, Korth was Director of Database Principles Research at Bell Laboratories\, Vice President of Panasonic Technologies\, and Associate Professor at the University of Texas at Austin. Korth is a Fellow of the ACM and the IEEE. At the 1995 International Conference on Very Large Data Bases\, his paper “A Model of CAD Transactions” was chosen as “Most Influential Paper from the Proceedings of Ten Years Ago.”  At Lehigh\, he has served as department chair and directed the Lehigh-unique Computer Science and Business degree.  He currently holds a concurrent courtesy appointment in the Decision and Technology Analytics Department in Lehigh’s College of Business and is director of the Blockchain Lab in Lehigh’s Center for Financial Services. \n 
URL:https://isg.ics.uci.edu/event/henry-f-korth-lehigh-university-blockchain-computer-science-foundations-positive-social-and-business-impact-and-research-opportunities/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240126T130000
DTEND;TZID=America/Los_Angeles:20240126T140000
DTSTAMP:20260606T114503
CREATED:20240122T191108Z
LAST-MODIFIED:20240122T191108Z
UID:1666-1706274000-1706277600@isg.ics.uci.edu
SUMMARY:Volker Markl (TU Berlin): Mosaics of Big Data: Database Systems and Information Management – Trends and a Vision
DESCRIPTION:Prof. Dr. Volker Markl \nChair of the Database Systems and Information Management (DIMA) Group at TU Berlin \nDirector of the Berlin Institute for the Foundations of Learning and Data (BIFOLD) \nChief Scientist and Head of the Intelligent Analytics for Massive Data Research Group at the German Research Center for Artificial Intelligence (DFKI) \n \nMosaics of Big Data: Database Systems and Information Management – Trends and a Vision \n\nAbstract: The global database research community has greatly impacted the functionality and performance of data storage and processing systems along the dimensions that define “big data”\, i.e.\, volume\, velocity\, variety\, and veracity. Locally\, over the past five years\, we have also been working on varying fronts. Among our contributions are: (1) establishing a vision for a database-inspired big data analytics system\, which unifies the best of database and distributed systems technologies\, and augments it with concepts drawn from compilers (e.g.\, iterations) and data stream processing\, as well as (2) forming a community of researchers and institutions to create the Stratosphere platform to realize our vision. One major result from these activities was Apache Flink\, an open-source big data analytics platform and its thriving global community of developers and production users. Although much progress has been made\, when looking at the overall big data stack\, a major challenge for the database research community remains. That is\, how to maintain ease of use despite the increasing heterogeneity and complexity of data analytics\, involving specialized engines for various aspects of an end-to-end data analytics pipeline\, including\, among others\, graph-based\, linear algebra-based\, and relational-based algorithms\, and the underlying\, increasingly heterogeneous hardware and computing infrastructure. At TU Berlin\, DFKI\, and the Berlin Institute for the Foundations of Learning and Data (BIFOLD) we currently aim to advance research in this field via the NebulaStream and Agora projects. Our goal is to remedy some of the heterogeneity challenges that hamper developer productivity and limit the use of data science technologies to just the privileged few\, who are coveted experts. In this talk\, we will outline how state-of-the-art SPEs have to change to exploit the new capabilities of the IoT and showcase how we tackle IoT challenges in our own system\, NebulaStream. We will also present our vision for Agora\, an asset ecosystem that provides the technical infrastructure for offering and using data and algorithms\, as well as physical infrastructure components.  \n  \nBio: Volker Markl is a German Professor of Computer Science. He leads the Chair of Database Systems and Information Management at TU Berlin and the Intelligent Analytics for Massive Data Research Department at DFKI. In addition\, he is Director of the Berlin Institute for the Foundations of Learning and Data (BIFOLD). He is a database systems researcher\, conducting research at the intersection of distributed systems\, scalable data processing\, and machine learning. Volker led the Stratosphere project\, which resulted in the creation of Apache Flink. Volker has received numerous honors and prestigious awards\, including best paper awards at ACM SIGMOD\, VLDB\, and ICDE as well as the ACM SIGMOD Systems Award. In 2014\, he was elected one of Germany’s leading “Digital Minds” (Digitale Köpfe) by the German Informatics Society and is a member of the Berlin-Brandenburg Academy of Sciences. He was elected an ACM Fellow for his contributions to query optimization\, scalable data processing\, and data programmability. He served as President of the VLDB Endowment\, and serves as advisor to academic institutions\, governmental organizations\, and technology companies. Volker holds eighteen patents and has been co-founder and mentor to several startups. \n 
URL:https://isg.ics.uci.edu/event/volker-markl-tu-berlin-mosaics-of-big-data-database-systems-and-information-management-trends-and-a-vision/
LOCATION:DBH 4011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240202T130000
DTEND;TZID=America/Los_Angeles:20240202T140000
DTSTAMP:20260606T114503
CREATED:20240131T055243Z
LAST-MODIFIED:20240131T055243Z
UID:1674-1706878800-1706882400@isg.ics.uci.edu
SUMMARY:Shengquan Ni: Supporting time-travel debugging in Texera
DESCRIPTION:Title: Supporting time-travel debugging in Texera \nSpeaker: Shengquan Ni \nAbstract: Dataflow systems\, traditionally used for relational analysis\, now support a variety of tasks including complex user-defined functions. As dataflow jobs become more diverse and complex\, there is an increasing need for better debugging support to understand their runtime behaviors and identify issues in either the data or the analysis. To achieve this goal in the Texera system\, we develop techniques to support “time-travel debugging.” In particular\, the system allows users to interact with an execution during runtime to retrieve an execution state\, which is a consistent snapshot of the engine. The user has the ability to “travel back to the past” to access the execution state of a previous interaction\, and thus retrospectively explore and analyze a previous execution state. We will show a demo of this powerful feature\, and give an overview of the underlying techniques. \nBio: Shengquan Ni is a Ph.D. student in the Department of Computer Science advised by Professor Chen Li. His current research interests include big data processing\, distributed systems\, data analytics and data science. He was a summer intern at Google.
URL:https://isg.ics.uci.edu/event/shengquan-ni-supporting-time-travel-debugging-in-texera/
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240209T110000
DTEND;TZID=America/Los_Angeles:20240209T120000
DTSTAMP:20260606T114504
CREATED:20240122T191410Z
LAST-MODIFIED:20240122T191430Z
UID:1669-1707476400-1707480000@isg.ics.uci.edu
SUMMARY:Joseph Hellerstein (UC Berkeley): Hydro: A Compiler Stack for Distributed Programs
DESCRIPTION:The Computer Science Department and Information Systems Group (ISG) \nat UC Irvine welcomes \n \nJoseph Hellerstein \nUC Berkeley and Sutter Hill Ventures \nTITLE: Hydro: A Compiler Stack for Distributed Programs \nABSTRACT:   \nNearly all programs of interest today are distributed. Unfortunately\, the traditional languages and compilers in common use today offer little assistance in ensuring the correctness of distributed programs. This state of affairs makes infrastructure development and tuning unduly expensive\, and hampers the ability of less-technical but highly creative individuals to invent new applications that take advantage of the ubiquity of cloud and mobile computing. \n  \nThe Hydro project at Berkeley is an effort to build a compiler stack to address these issues\, taking lessons from the success of scaling data management software. The foundation of the Hydro stack is Hydroflow\, a Rust-based dataflow runtime with an IR based on algebraic dataflow. Hydroflow enables a compiler to make correct program transformations that are natural in the context of distributed systems. Transformations include: \n– Refactoring: Given an arbitrary block of code\, refactor it into smaller blocks that can be launched on independent machines \n– Replication: Given an arbitrary block of code\, determine whether it can be safely replicated in deployment \n– Partitioning: Given an arbitrary block of code\, determine how its inputs can be safely partitioned (“sharded”) to multiple machines in deployment \n  \nThese transformations in turn allow distributed programs to be optimized for various goals\, including parallelism (both pipelines and partitioning)\, memory scaling\, performance isolation\, geoproximity and physical security. 
 \n  \nAlthough the Hydro project is still in early stages\, I will present case studies showing correctness\, latency and scaling results when optimizing programs ranging from infrastructure like key-value stores\, applications like shopping carts and messaging systems\, and tricky consensus protocols. \n  \nJoint work with colleagues at UC Berkeley and Sutter Hill Ventures. \n  \nBIO: Joseph M. Hellerstein is the Jim Gray Professor of Computer Science at UC Berkeley\, and a Faculty Fellow at Sutter Hill Ventures. His academic recognition includes the ACM SIGMOD Codd Innovations Award\, ACM Fellow and Sloan Research Fellow awards\, and six “Test of Time” awards for his papers. Hellerstein is a longtime participant in the computing industry\, co-founding startups\, advising companies and venture funds\, and directing industry research. He also enjoys playing music\, and has performed live with legendary musicians including Joe Henderson\, Joshua Redman and Michael J. Carey. \n 
URL:https://isg.ics.uci.edu/event/joseph-hellerstein-uc-berkeley-hydro-a-compiler-stack-for-distributed-programs/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240212T130000
DTEND;TZID=America/Los_Angeles:20240212T140000
DTSTAMP:20260606T114504
CREATED:20240206T000852Z
LAST-MODIFIED:20260405T013527Z
UID:1676-1707742800-1707746400@isg.ics.uci.edu
SUMMARY:Raul Castro Fernandez (U. Chicago): On Data Ecology\, Data Markets\, the Value of Data\, and Dataflow Governance
DESCRIPTION:Abstract: \nData shapes our social\, economic\, cultural\, and technological environments. Data is valuable\, so people seek it\, inducing data to flow. The resulting dataflows distribute data and thus value. For example\, large Internet companies profit from accessing data from their users\, and engineers of large language models seek large and diverse data sources to train powerful models. It is possible to judge the impact of data in an environment by analyzing how the dataflows in that environment impact the participating agents. My research hypothesizes that it is also possible to design (better) data environments by controlling what dataflows materialize; not only can we analyze environments but also synthesize them. In this talk\, I present the research agenda on “data ecology\,” which seeks to build the principles\, theory\, algorithms\, and systems to design beneficial data environments. I will also present examples of data environments my group has designed\, including data markets for machine learning\, data-sharing\, and data integration. I will conclude by discussing the impact of dataflows in data governance and how the ideas are interwoven with the concepts of trust\, privacy\, and the elusive notion of “data value.” As part of the technical discussion\, I will complement the data market designs with the design of a data escrow system that permits controlling dataflows. \nBio (Raul Castro Fernandez): \nIn my research\, I ask what is the value of data and explore the potential of data markets to unlock that value. My group collaborates with economists\, legal scholars\, statisticians\, and domain scientists. We build systems to share\, discover\, prepare\, integrate\, and process data. I have traditionally worked on distributed query processing systems and continue to do so. I have received a SIGMOD’23 Test-of-Time Award. I am an assistant professor in the Department of Computer Science and on the Committee of Data Science at The University of Chicago. Before UChicago\, I did a postdoc at MIT with Sam Madden and Mike Stonebraker. And before that\, I completed a PhD at Imperial College London with Peter Pietzuch.
URL:https://isg.ics.uci.edu/event/raul-castro-fernandez-u-chicago-on-data-ecology-data-markets-the-value-of-data-and-dataflow-governance/
LOCATION:DBH 4011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240301T130000
DTEND;TZID=America/Los_Angeles:20240301T140000
DTSTAMP:20260606T114504
CREATED:20240228T200048Z
LAST-MODIFIED:20240228T200048Z
UID:1685-1709298000-1709301600@isg.ics.uci.edu
SUMMARY:Yunyan Ding: Efficient Mouse Brain Image Processing Using Collaborative Data Workflows on Texera
DESCRIPTION:Abstract:\nIn the field of neuroscience\, accurately mapping the complex three-dimensional (3D) neural circuitry and architecture of the brain is crucial for advancing our understanding of brain functions and disorders. In this study\, we introduce a distributed computational pipeline designed for processing high-resolution mouse brain tile images captured by TissueCyte. This pipeline efficiently and accurately stitches these tiles and stacks 2D sections to construct detailed whole mouse brain models in 3D. Our high-quality 3D volumes can facilitate secondary analyses such as whole-brain 3D registration and segmentation\, cell counting\, and high-resolution volumetric visualization. By utilizing specialized optimization methods within the Texera platform to distribute computational resources\, our pipeline achieves a reduction of over 80% in processing time\, significantly improving efficiency. Additionally\, this pipeline is designed for scalability and flexibility\, enabling it to process large volumes of high-resolution neuroimaging data across various computational environments. Our work is developed through a collaborative effort among neuroscience\, computer vision\, and data processing teams\, exemplifying the power of creating tools for interdisciplinary collaboration in addressing complex research challenges.\n\nBio:\nYunyan Ding is a second-year Ph.D. student in the Computer Science Department at UC Irvine. Her research interests include data processing systems and big data analytics.
URL:https://isg.ics.uci.edu/event/yunyan-ding-efficient-mouse-brain-image-processing-using-collaborative-data-workflows-on-texera/
LOCATION:DBH 4011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240304T110000
DTEND;TZID=America/Los_Angeles:20240304T120000
DTSTAMP:20260606T114504
CREATED:20240304T175318Z
LAST-MODIFIED:20250211T004006Z
UID:1689-1709550000-1709553600@isg.ics.uci.edu
SUMMARY:Bratin Saha (AWS Amazon): Scaling Generative AI in the Enterprise
DESCRIPTION:Abstract: Machine learning (ML) and generative artificial intelligence (AI) are among the most transformational technologies\, opening up new opportunities for innovation in every domain\, including software\, finance\, health care\, manufacturing\, media\, and entertainment. This talk will discuss the key trends that are driving AI/ML innovation\, how enterprises are using AI/ML today to innovate how they run their businesses\, the key technology challenges in scaling out ML and generative AI across the enterprise\, some of the key innovations from Amazon\, and how this field is likely to evolve in the future. \nBio: Dr. Bratin Saha is the Vice President of Machine Learning and AI services at AWS where he leads all the ML and AI services and helped build one of the fastest growing businesses in AWS history. In 2022 Harvard Business School wrote three case studies on how he built the machine learning business at AWS. He is an alumnus of Harvard Business School (General Management Program)\, Yale University (PhD Computer Science)\, and Indian Institute of Technology (BS Computer Science). He has more than 70 patents granted (with another 50+ pending) and more than 30 papers in conferences/journals. Prior to Amazon he worked at Nvidia and Intel leading different product groups spanning imaging\, analytics\, media processing\, high performance computing\, machine learning\, and software infrastructure. Bratin received the Distinguished Alumnus Award from the Indian Institute of Technology and is an Executive Fellow at the Harvard Business School. 
URL:https://isg.ics.uci.edu/event/bratin-saha-aws-amazon-scaling-generative-ai-in-the-enterprise/
LOCATION:DBH 4011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240315T130000
DTEND;TZID=America/Los_Angeles:20240315T140000
DTSTAMP:20260606T114504
CREATED:20240311T193926Z
LAST-MODIFIED:20240311T193926Z
UID:1693-1710507600-1710511200@isg.ics.uci.edu
SUMMARY:Yinan Zhou: SpendableDB: A UTxO-based decentralized Database
DESCRIPTION:Abstract: \nBlockchain technology has attracted a significant amount of attention ever since the Bitcoin blockchain’s success. Currently\, most of the research and engineering efforts have been centered around monetary transactions such as token exchange protocols. The potential of building databases on top of blockchains is largely overlooked and remains an open problem. The literature on blockchain databases is divided into permissioned blockchains and permissionless account-based blockchains. However\, the former is not fully decentralized\, and the latter suffers from challenges in performance and cost. We propose SpendableDB\, a permissionless UTxO-based blockchain database as a novel approach to the problem of data decentralization. Our design integrates data into individual UTxOs to achieve true decentralization of data ownership that can be securely transferred and traded\, similar to how the regular monetary UTxOs are protected by the underlying blockchain’s decentralization protocol. Additionally\, SpendableDB provides cryptographically secured data integrity and immutable data lineage that can be easily verified. Our implementation and experiments show that our design is economically practical as it incurs a small amount of blockchain transaction fees. \nBio: \nYinan Zhou is a second-year Ph.D. student in the Computer Science Department at UC Irvine. His primary research focus is on blockchain infrastructure and application developments.
URL:https://isg.ics.uci.edu/event/yinan-zhou-spendabledb-a-utxo-based-decentralized-database/
LOCATION:DBH 4011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240405T130000
DTEND;TZID=America/Los_Angeles:20240405T140000
DTSTAMP:20260606T114504
CREATED:20240401T183244Z
LAST-MODIFIED:20240401T183244Z
UID:1696-1712322000-1712325600@isg.ics.uci.edu
SUMMARY:Lukasz Golab (University of Waterloo): Understanding models and the data they learn from
DESCRIPTION:Lukasz Golab (U. Waterloo) \nUnderstanding models and the data they learn from \nAbstract: The modern world is powered by data. However\, as the capabilities of data-intensive systems grow\, so does their complexity\, making them hard to understand and troubleshoot. I will discuss my lab’s efforts towards understanding models and the data they learn from\, including local and global model explanations as well as model diagnostics for fairness and bias avoidance. \n  \nBio: Lukasz Golab is a Professor and Canada Research Chair at the University of Waterloo. From 2006 to 2011\, he was a Senior Member of Research Staff at AT&T Labs. He obtained a BSc in Computer Science from the University of Toronto (with High Distinction) and a PhD in Computer Science from the University of Waterloo (with Alumni Gold Medal). His long-term research agenda of Data for Good calls for building data-intensive systems with societal impact. His recent projects focus on systems for managing high-speed data events such as data stream engines and blockchains\, understanding complex models and the data they learn from\, and applications including online safety\, education\, and sustainability.
URL:https://isg.ics.uci.edu/event/lukasz-golab-university-of-waterloo-understanding-models-and-the-data-they-learn-from/
LOCATION:DBH 4011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240412T130000
DTEND;TZID=America/Los_Angeles:20240412T140000
DTSTAMP:20260606T114504
CREATED:20240409T184916Z
LAST-MODIFIED:20240409T185018Z
UID:1703-1712926800-1712930400@isg.ics.uci.edu
SUMMARY:Juncheng Fang: ImmortalChopper: Real-Time and Resilient Distributed Transactions in the Edge-Cloud
DESCRIPTION:Abstract:\n\nEmerging applications in the areas of real-time Internet of Things (IoT) and edge technologies (such as wearables\, and mobile headsets) require fast processing and response times. This motivates the utilization of edge nodes for both processing and storage of data. In settings with a vast number of edge nodes—such as the case of smart cities and spaces—the state of the data is distributed across a large number of edge nodes. This makes it expensive to perform distributed transactions as these transactions would span edge nodes that are connected via less reliable and relatively slow network infrastructure. This makes it prohibitive to use existing protocols like 2PC that require rounds of communication across participants. \nIn this paper\, we propose ImmortalChopper\, a distributed transaction processing protocol that is designed for the edge-cloud environment. The goal of ImmortalChopper is to allow fast commitment of transactions on the edge without having to wait for distributed coordination. To achieve this\, we build on the literature of Transaction Chopping. Transaction Chopping allows breaking a transaction into smaller hops. If the first hop commits\, then\, the rest of the transaction is guaranteed to commit. We utilize this feature to allow a transaction to commit from the closest edge node without having to wait for the rest of the processing of the other participating edge nodes. However\, the direct use of Transaction Chopping is not suitable for the edge-cloud. This is because of the sporadic availability of edge nodes that leads to either blocking behavior during failures or the necessity to replicate each step which defies the purpose of using Chopping in our case. The innovation in ImmortalChopper is the introduction of the concept of ChopperGraph which utilizes lazy replication between edge and cloud nodes. This enables resilience to failures without the added synchronous overhead. 
 \nBio:\nJuncheng Fang is a 3rd-year Ph.D. student in the Computer Science Department at UC Irvine\, supervised by Prof. Faisal Nawab. His current research focuses on blockchain\, distributed systems\, and edge cloud.
URL:https://isg.ics.uci.edu/event/juncheng-fang-immortalchopper-real-time-and-resilient-distributed-transactions-in-the-edge-cloud/
LOCATION:DBH 4011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240419T130000
DTEND;TZID=America/Los_Angeles:20240419T140000
DTSTAMP:20260606T114504
CREATED:20240407T214847Z
LAST-MODIFIED:20240407T214847Z
UID:1698-1713531600-1713535200@isg.ics.uci.edu
SUMMARY:Mohammed Al-Kateb (Amazon Redshift): The Evolution of Amazon Redshift
DESCRIPTION:Abstract:\nIn this talk\, we will discuss the evolution of Amazon Redshift over the past 10 years. We’ll discuss the Amazon Redshift architecture. We’ll dive deep into the lifecycle of executing a query in Amazon Redshift. And we’ll examine how Amazon Redshift continues to maintain leading price/performance in the market. \nBio:\nMohammed Alkateb leads the Query Optimizer team of Redshift – The Amazon AWS Distributed Cloud Data Warehouse that tens of thousands of customers rely on to gain the insight they need from their most critical data. Prior to joining Amazon\, Mohammed spent over a decade with the Teradata Optimizer team as an individual contributor and engineering manager. Mohammed is also an adjunct professor at Worcester Polytechnic Institute (WPI) and at California State University\, Northridge (CSUN). Mohammed has 16 U.S. patents. And he has publications in research and industrial tracks of premier database conferences including EDBT\, ICDE\, SIGMOD and VLDB. Mohammed holds a Ph.D. degree in Computer Science from The University of Vermont\, and M.Sc. & B.Sc. degrees in Information Systems from Cairo University.
URL:https://isg.ics.uci.edu/event/mohammed-al-kateb-amazon-redshift-the-evolution-of-amazon-redshift/
LOCATION:DBH 4011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240426T130000
DTEND;TZID=America/Los_Angeles:20240426T140000
DTSTAMP:20260606T114504
CREATED:20240424T002712Z
LAST-MODIFIED:20240424T002712Z
UID:1757-1714136400-1714140000@isg.ics.uci.edu
SUMMARY:Xinyuan Lin: Data Science Tasks Implemented with Scripts versus GUI-Based Workflows: The Good\, the Bad\, and the Ugly.
DESCRIPTION:Abstract: As leveraging large-scale data analytics becomes the norm for many applications\, platforms for developing these capabilities have become increasingly important. This work compares the benefits and drawbacks of implementing two commonly used data science platform paradigms: code-based scripts and GUI-based workflows. We implement tasks in both paradigms that provide examples of phases in the typical life cycle of a data science project\, including data wrangling\, machine learning (ML) model training\, and inference. In this talk\, we will examine the relative performance of the implementations under each paradigm in various experimental settings. We will discuss the benefits and drawbacks of each platform implementation and provide a foundation for future work in comparing data science platform paradigms. \nBio: Xinyuan Lin is a third-year Ph.D. student in the Computer Science Department at UC Irvine. His research interests include data processing systems and big data analytics.
URL:https://isg.ics.uci.edu/event/xinyuan-lin-data-science-tasks-implemented-with-scripts-versus-gui-based-workflows-the-good-the-bad-and-the-ugly/
LOCATION:DBH 4011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240510T130000
DTEND;TZID=America/Los_Angeles:20240510T140000
DTSTAMP:20260606T114504
CREATED:20240422T224849Z
LAST-MODIFIED:20240422T224849Z
UID:1744-1715346000-1715349600@isg.ics.uci.edu
SUMMARY:Mike Heddes: Efficient Cardinality Estimation of Multi-Join Queries using Count Sketches
DESCRIPTION:Abstract: \nCardinality estimates are a primary input to query optimizers to determine an appropriate join order. The seminal AMS sketch can estimate the cardinality of an equi-join between two relations using little space. Since then\, two important advancements are the Count sketch\, a method which significantly improves upon the sketching time\, and secondly\, an extension of the AMS sketch to accommodate multi-join queries. However\, combining the strengths of these methods to maintain sketches for multi-join queries while ensuring fast update times is a non-trivial task\, and has remained an open problem for decades as highlighted in the existing literature. This talk will address this problem by introducing a novel sketching method which has fast updates\, even for sketches capable of accurately estimating the cardinality of complex multi-join queries. Experimental results confirm the significant improvement in update time complexity\, resulting in orders of magnitude faster estimates\, with equal or better estimation accuracy.\n\n\nBio:\nMike Heddes is a 4th-year PhD candidate at the University of California\, Irvine under supervision of Alex Nicolau and Tony Givargis. His research focusses on efficient algorithms for big data applications in machine learning and data mining. He has publications in prestigious venues such as SIGMOD\, KDD\, and JMLR. Mike has interned at Intel Labs as well as with the Advanced Concepts Team of the European Space Agency.
URL:https://isg.ics.uci.edu/event/mike-heddes-efficient-cardinality-estimation-of-multi-join-queries-using-count-sketches/
LOCATION:DBH 4011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240517T130000
DTEND;TZID=America/Los_Angeles:20240517T140000
DTSTAMP:20260606T114504
CREATED:20240507T221711Z
LAST-MODIFIED:20240513T192421Z
UID:1825-1715950800-1715954400@isg.ics.uci.edu
SUMMARY:Pat Helland (Salesforce): Scalable OLTP in the Cloud:  What's the BIG DEAL?
DESCRIPTION:Abstract: \nThe pursuit of scalable OLTP systems has been the holy grail of my career. Because OLTP systems are typically split into applications and databases\, the isolation semantics provided by the DB and used by the app have a major impact on the scalability of the OLTP system as a whole. The isolation semantics are a BIG DEAL! \nThis thought experiment explores the asymptotic limits to scale for OLTP systems. An OLTP (OnLine Transaction Processing) system is a domain-specific application using an RCSI (READ COMMITTED SNAPSHOT ISOLATION) SQL database to provide transactions across many concurrent users. This interface provides the contractual BIG DEAL between OLTP databases and OLTP applications. \nFocusing on the BIG DEAL shows that today’s popular databases unnecessarily limit scale. Similarly\, we identify common app patterns that inhibit scale. We can reimagine the way we build both databases and applications to empower scale. All while complying with the established SQL and RCSI interface (i.e.\, the BIG DEAL). \nPerhaps\, this can provoke discussions within the database community leading to new opportunities for OLTP systems. To me\, that would be a big deal! \n\nBio: \nPat Helland has been building distributed systems\, database systems\, high-performance messaging systems\, and multiprocessors since 1978\, shortly after dropping out of UC Irvine without a bachelor’s degree. That hasn’t stopped him from having a passion for academics and publication. From 1982 to 1990\, Pat was the chief architect for TMF (Transaction Monitoring Facility)\, the transaction logging and recovery systems for NonStop SQL\, a message-based fault-tolerant system providing high-availability solutions for business critical solutions. In 1991\, he moved to HaL Computers where he was chief architect for the Mercury Interconnect Architecture\, a cache-coherent non-uniform memory architecture multiprocessor. In 1994\, Pat moved to Microsoft to help the company develop a business providing enterprise software solutions. He was chief architect for MTS (Microsoft Transaction Server) and DTC (Distributed Transaction Coordinator). Starting in 2000\, Pat began the SQL Service Broker project\, a high-performance transactional exactly-once in-order message processing and app execution engine built deeply into Microsoft SQL Server 2005. From 2005 to 2007\, he worked at Amazon on scalable enterprise solutions\, scale-out user facing services\, integrating product catalog feeds from millions of sellers\, and highly-available eventually consistent storage. From 2007 to 2011\, Pat was back at Microsoft working on a number of projects including Structured Streams in Cosmos. Structured streams kept metadata within the “big data” streams that were typically 10s of terabytes in size. This metadata allowed affinitized placement within the cluster as well as efficient joins across multiple streams. On launch\, this doubled the work performed within the 250PB store. Pat also did the initial design for Baja\, the distributed transaction support for a distributed event-processing engine implemented as an LSM atop structured streams providing transactional updates targeting the ingestion of “the entire web in one table” with changes visible in seconds. Starting in 2012\, Pat has worked at Salesforce on database technology running within cloud environments. His current interests include latency bounding of online enterprise-grade transaction systems in the face of jitter\, the management of metastability in complex environments\, and zero-downtime upgrades to databases and stateful applications. In his spare time\, Pat regularly writes for ACM Queue\, Communications of the ACM\, and various conferences. He has been deeply involved in the organization of the HPTS (High Performance Transaction Systems – www.hpts.ws) workshop since 1985. His blog is at pathelland.substack.com and he parsimoniously tweets with the handle @pathelland.
URL:https://isg.ics.uci.edu/event/pat-helland-salesforce-scalable-oltp-in-the-cloud-whats-the-big-deal/
LOCATION:DBH 4011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240531T110000
DTEND;TZID=America/Los_Angeles:20240531T120000
DTSTAMP:20260606T114504
CREATED:20240521T194943Z
LAST-MODIFIED:20240522T054705Z
UID:2030-1717153200-1717156800@isg.ics.uci.edu
SUMMARY:Mohammad Sadoghi (UC Davis): The Journey of Building Global-Scale Sustainable Blockchain Fabric 
DESCRIPTION:Abstract \nThe inception of Bitcoin and blockchain has renewed the vision of a democratic and decentralized computational paradigm\, that is\, to ingrain integrity\, transparency\, and accountability into the very fabric of the computational model. These fundamental concepts and the technologies behind them–a generic ledger-based data model\, cryptographically ensured data integrity and transparent and accountable consensus-based replication–prove to be a powerful and inspiring combination. Arguably\, the resilient consensus protocol is at the heart of this paradigm shift. To this end\, we share the story behind our (resilient) journey in building a consensus-based blockchain called Apache ResilientDB (Incubating). In this presentation\, we aim to provide an insightful overview of the core structure of the consensus protocols. We will further offer the intuitions behind our ongoing work\, including the speculative consensus model\, concurrent consensus with a wait-free property\, geo-scale meta-consensus\, consensus with weaker consistency models and isolation semantics\, as well as a variety of sharding and cross-chain protocols through our novel reliable communication primitives. \n  \nBio \nMohammad Sadoghi is an Associate Professor in the Computer Science Department at the University of California\, Davis. Formerly\, he was an Assistant Professor at Purdue University and a Research Staff Member at IBM T.J. Watson Research Center. He received his Ph.D. from the University of Toronto in 2013. He leads the ExpoLab research group with the mission to pioneer a resilient data platform at scale under our flagship project called Apache ResilientDB (Incubating)\, a distributed ledger centered around a democratic and decentralized computational model that further aims to unify secure transactional and real-time analytical processing (L-Store). 
He envisions ResilientDB to serve as a platform to foster “creativity.” He co-founded Moka Blox LLC\, the ResilientDB blockchain spinoff. He has over 100 publications in leading database conferences/journals and 36 filed U.S. patents. His ACM Middleware’18 paper entitled “QueCC: A Queue-oriented\, Control-free Concurrency Architecture” won the Best Paper Award; his paper “Dissecting BFT Consensus: In Trusted Components we Trust!” won the Best Paper Award at EuroSys’23; and his paper “The Bedrock of Byzantine Fault Tolerance: A Unified Platform for BFT Protocols Analysis\, Implementation\, and Experimentation” won the Outstanding Paper Award at NSDI’24. He has co-authored several books: “Transaction Processing on Modern Hardware” and “Fault-tolerant Distributed Transactions on Blockchain\,” both published by Morgan & Claypool Synthesis Lectures on Data Management\, and a book published by Foundations and Trends® in Databases\, entitled “Consensus in Data Management: From Distributed Commit to Blockchain.”
URL:https://isg.ics.uci.edu/event/mohammad-sadoghi-uc-davis-the-journey-of-building-global-scale-sustainable-blockchain-fabric/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20240927T110000
DTEND;TZID=America/Los_Angeles:20240927T120000
DTSTAMP:20260606T114504
CREATED:20241007T164605Z
LAST-MODIFIED:20260417T191550Z
UID:2100-1727434800-1727438400@isg.ics.uci.edu
SUMMARY:Aditya Parameswaran (Berkeley): Enhance\, Don't Replace: A Recipe for Success in Data Tooling
DESCRIPTION:Enhance\, Don’t Replace: A Recipe for Success in Data Tooling\nAbstract: Most data analysis and data science is performed in human-centered tools\, such as spreadsheets\, visual analytics tools\, and data science libraries. However\, these tools often pose challenges for end-users\, especially those without extensive programming expertise\, in terms of scalability\, interactivity\, and usability. \nRather than forcing such users to switch tools\, over the past decade\, we’ve instead taken the approach of enhancing existing tools with an eye towards addressing these challenges. To do so\, we draw on techniques from data management and human-computer interaction. In my talk\, I’ll describe a couple of successful examples of tools that have been adopted widely by end-users. Finally\, we also reflect on how our recipe — of enhancing existing tools as opposed to replacing them — may need revisiting in the exciting arena of LLM-powered data work\, which forms the focus of our new EPIC Data lab at Berkeley. \nBio:  Aditya Parameswaran is an Associate Professor in EECS at UC Berkeley. He works in the broad area of human-centered data science\, developing usable\, robust\, scalable\, and intuitive data science tools. His open-source tools have received thousands of GitHub stars\, and have been downloaded millions of times across a wide spectrum of industries. Ponder\, a company that Aditya cofounded with his students in 2021 based on open-source tooling developed as part of research\, was acquired by Snowflake\, the leading cloud data warehouse vendor in 2023. Aditya has received the Alfred P. Sloan Research Fellowship\, VLDB Early Career Research Contributions Award\, the ARO Young Investigator Program Award\, the NSF CAREER Award\, the TCDE Rising Star Award\, a number of best paper awards\, along with other recognitions. \nWebsite: http://adityagp.net
URL:https://isg.ics.uci.edu/event/aditya-parameswaran-berkeley-enhance-dont-replace-a-recipe-for-success-in-data-tooling/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20241011T130000
DTEND;TZID=America/Los_Angeles:20241011T140000
DTSTAMP:20260606T114504
CREATED:20241007T164906Z
LAST-MODIFIED:20250211T004315Z
UID:2102-1728651600-1728655200@isg.ics.uci.edu
SUMMARY:Arnab Nandi (OSU): Data Exploration in a Camera-first World: Query and Result Challenges
DESCRIPTION:Prof. Arnab Nandi \n \nAssociate Professor\, Computer Science and Engineering \nThe Ohio State University \nFriday\, October 11\, 2024\nat 11 a.m.\nDonald Bren Hall 6011 \nTitle: “Data Exploration in a Camera-first World: Query and Result Challenges” \nAbstract: The pervasive availability of cameras in smartphones\, vehicles\, drones and more has triggered a new “camera-first” data revolution across industries. When combined with rapid advances in computer vision and machine learning\, this video data deluge presents several data exploration challenges. Here\, we will talk about two complementary challenges for end-users: query specification and result consumption. \nWhen searching large video collections\, the first challenge is that the user is often unaware of the contents of the video\, its structure\, and the exact terminology to use in the user query\, putting them at a loss for where to begin specifying the query. Here\, we present methods to guide the user through the query construction process by building on vision language models and search query interfaces. \nOnce users have executed a search\, they are faced with a new challenge of result consumption. Presenting query results as a list of links poses an impedance mismatch: they are cumbersome to skim through and are in a different modality compared to the source data. However\, processing large video collections within interactive response times has performance implications. We present V2V\, a system to efficiently synthesize video results for video queries. V2V returns a fully-edited video\, allowing the user to consume results in the same modality as the source videos\, resulting in a fluid\, user-centric video exploration experience. \nBio:  Arnab’s work focuses on bridging data infrastructure with human interaction\, spanning areas of database systems\, human factors\, and next-generation interfaces. 
Arnab is a recipient of the US National Science Foundation’s CAREER Award\, IEEE’s TCDE Early Career Award for his contributions towards user-focused data interaction\, The Ohio State University’s Alumni Award for Distinguished Teaching\, and the University’s Early Career Innovator of the Year Award. \nOver the years\, Arnab has served as Program Committee member and Associate Editor for several database systems journals and conferences including SIGMOD\, VLDB\, ICDE\, and HILDA. Most recently\, Arnab served as Vice President of Data Science at Azuga Inc. (a Bridgestone company) after the acquisition of his connected vehicles analytics startup\, Mobikit. https://arnab.org/ \n 
URL:https://isg.ics.uci.edu/event/arnab-nandi-osu-data-exploration-in-a-camera-first-world-query-and-result-challenges/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20241017T150000
DTEND;TZID=America/Los_Angeles:20241017T160000
DTSTAMP:20260606T114504
CREATED:20241011T010505Z
LAST-MODIFIED:20250211T004309Z
UID:2128-1729177200-1729180800@isg.ics.uci.edu
SUMMARY:Nika Mansouri Ghiasi (ETH): Storage-Centric Computing for Genomics and Metagenomics
DESCRIPTION:Title: Storage-Centric Computing for Genomics and Metagenomics \nAbstract \nGenomics and metagenomics applications have enabled significant advancements in many critical areas. The exponential growth of genomic data poses unprecedented challenges in genomics and metagenomic applications. These applications suffer from significant data movement overheads from the storage system. To fundamentally address these overheads\, we make a case for storage-centric computing. \nFirst\, we propose GenStore\, the first in-storage processing system designed for genome sequence analysis that greatly reduces both data movement and computational overheads of genome sequence analysis by exploiting low-cost and accurate in-storage filters. We address the challenges of in-storage processing\, supporting reads with 1) different read lengths and error rates\, and 2) different degrees of genetic variation. Through rigorous analysis of read mapping processes\, we design low-cost hardware accelerators and data/computation flows inside a NAND flash-based SSD. Our evaluation using a wide range of real genomic datasets shows that GenStore significantly improves the read mapping performance of state-of-the-art software (hardware) baselines by 2.07-6.05× (1.52-3.32×) for read sets with high similarity to the reference genome and 1.45-33.63× (2.70-19.2×) for read sets with low similarity to the reference genome. \nSecond\, we propose MegIS\, the first in-storage processing system designed to significantly reduce the data movement overhead of the end-to-end metagenomic analysis pipeline. MegIS is enabled by our lightweight design that effectively leverages and orchestrates processing inside and outside the storage system. 
Through our detailed analysis of the end-to-end metagenomic analysis pipeline and careful hardware/software co-design\, we address in-storage processing challenges for metagenomics via specialized and efficient 1) task partitioning\, 2) data/computation flow coordination\, 3) storage technology-aware algorithmic optimizations\, 4) data mapping\, and 5) lightweight in-storage accelerators. MegIS’s design is flexible\, capable of supporting different types of metagenomic input datasets\, and can be integrated into various metagenomic analysis pipelines. Our evaluation shows that MegIS outperforms the state-of-the-art performance- and accuracy-optimized software metagenomic tools by 2.7×–37.2× and 6.9×–100.2×\, respectively\, while matching the accuracy of the accuracy-optimized tool. MegIS achieves 1.5×–5.1× speedup compared to the state-of-the-art metagenomic hardware-accelerated (using processing-in-memory) tool\, while achieving significantly higher accuracy. \n Bio \nNika Mansouri Ghiasi is a Ph.D. candidate in the SAFARI Research Group at ETH Zürich\, working with Professor Onur Mutlu. Her current research interests are in computer architecture and bioinformatics\, focusing on 1) large-scale bioinformatics applications\, storage systems\, and their interactions\, and 2) emerging technologies such as ultra-dense 3D integrated systems. Nika has co-authored several works on these topics in major computer architecture venues such as ISCA\, ASPLOS\, and MICRO\, as well as major bioinformatics venues such as ISMB\, Bioinformatics\, and Nature Reviews. \n 
URL:https://isg.ics.uci.edu/event/nika-mansouri-ghiasi-eth-storage-centric-computing-for-genomics-and-metagenomics/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20241018T110000
DTEND;TZID=America/Los_Angeles:20241018T120000
DTSTAMP:20260606T114504
CREATED:20241008T004750Z
LAST-MODIFIED:20241009T163638Z
UID:2109-1729249200-1729252800@isg.ics.uci.edu
SUMMARY:Yannis Papakonstantinou (Google): Vector Search and Databases
DESCRIPTION:Yannis Papakonstantinou\nDistinguished Engineer\, Query Processing and GenAI at Google Cloud Databases\nAbstract:\nSemantic search ability\, via embedding (vectors) and vector indexing\, has been added to Google Cloud Platform (GCP) databases in order to enable GenAI applications. The inclusion of vectors in databases confers many of the traditional benefits of databases: Developers can now develop GenAI applications on their familiar and trusted databases. Furthermore\, developers can be sure that the vectors are also up-to-date and transactionally consistent. The rapid adoption of the postgres pgvector extension is evidence of the appreciation of these benefits by the database developer community.\nThe inclusion of vectors in databases raises three R&D questions\, which we will discuss in this talk.\nFirst\, can databases with vector abilities perform as well as purpose-built vector databases in pure vector search? What does it take to achieve this?\nSecond\, what are the opportunities and respective R&D challenges that emerge at the intersection of SQL data and vectors?\nFinally\, what does it take to facilitate and align the experience of SQL developers with the world of vector management and vector indexing? \nBio:\nYannis Papakonstantinou is a Distinguished Engineer\, working on Query Processing and GenAI\, at Google Cloud. He is also an Adjunct Professor of Computer Science and Engineering at the University of California\, San Diego\, following many years of having been a UCSD regular faculty member. Previously he was an architect in query processing & ETL at Databricks. Earlier\, he was a Senior Principal Scientist at Amazon Web Services from 2018-2021 and was a consultant for AWS since 2016. He was the CEO and Chief Scientist of Enosys Software\, which built and commercialized an early Enterprise Information Integration platform for structured and semistructured data. 
The Enosys software was OEM’d and sold under the BEA Liquid Data and BEA Aqualogic brand names\, and was eventually acquired in 2003 by BEA Systems.\nHis R&D work has been mostly on query processing with a focus on querying semistructured data. He has published over one hundred twenty research articles that have received over 21\,000 citations. Yannis holds a Diploma of Electrical Engineering from the National Technical University of Athens and an MS and Ph.D. in Computer Science from Stanford University (1997). \n 
URL:https://isg.ics.uci.edu/event/yannis-papakonstantinou-google-vector-search-and-database/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20241101T130000
DTEND;TZID=America/Los_Angeles:20241101T140000
DTSTAMP:20260606T114504
CREATED:20241017T163643Z
LAST-MODIFIED:20241017T163750Z
UID:2133-1730466000-1730469600@isg.ics.uci.edu
SUMMARY:Michael Jungmair (TU Munich): A Compiler-Centric Query Engine Design for Mixed Workloads and Modern Hardware
DESCRIPTION:A Compiler-Centric Query Engine Design for Mixed Workloads and Modern Hardware \n11/1/2024\, 1:00 PM to 2:00 PM\, DBH 3011 \nMichael Jungmair\, Technical University of Munich\, Germany \nAbstract: Relational query engines are increasingly expected to handle more than just relational queries and also run on modern hardware that is increasingly parallel and distributed. However\, it is not clear how existing system designs can deal with these two challenges effectively.\nWe propose a holistic\, compiler-centric design for data processing systems that is designed for tightly integrated optimization and execution of relational queries\, non-relational workloads and user-defined functions on modern hardware. \nBio: Michael Jungmair is a third-year PhD student at the Technical University of Munich. Supervised by Jana Giceva\, he is performing research at the intersection of database engines and compiler technology. So far\, this research has culminated in the design and implementation of LingoDB (lingo-db.com)\, a novel query engine based on the MLIR compiler framework.
URL:https://isg.ics.uci.edu/event/michael-jungmair-tu-munich-a-compiler-centric-query-engine-design-for-mixed-workloads-and-modern-hardware/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20241115T130000
DTEND;TZID=America/Los_Angeles:20241115T140000
DTSTAMP:20260606T114504
CREATED:20241008T012443Z
LAST-MODIFIED:20250211T004236Z
UID:2116-1731675600-1731679200@isg.ics.uci.edu
SUMMARY:Kunwoo Park: CloudMapper: A Pay-as-you-go Solution for Accelerating Genomics Sequence Alignment Using Public Clouds
DESCRIPTION: CloudMapper: A Pay-as-you-go Solution for Accelerating Genomics Sequence Alignment Using Public Clouds \nAbstract: Single-cell RNA sequencing (scRNA-seq) alignment remains a computational bottleneck in bioinformatics data analysis. As datasets grow in size and complexity\, traditional alignment tools\, such as CellRanger\, face significant limitations\, often requiring hours or even days. Furthermore\, setting up the necessary infrastructure frequently demands familiarity with complex tools like Slurm\, creating a barrier for researchers without cluster management expertise. To address these challenges\, we introduce CloudMapper\, a pay-as-you-go solution that simplifies and accelerates scRNA-seq alignment through scalable public cloud resources. Built on the Texera platform\, CloudMapper allows researchers to launch and manage clusters on cloud providers like AWS via an intuitive web interface\, enabling parallel processing of large scRNA-seq datasets. By automating infrastructure setup and providing streamlined resource options\, CloudMapper offers bioinformaticians flexible tools to balance cost and performance\, significantly reducing alignment time and technical overhead. In this talk\, we’ll explore CloudMapper’s problem-centered design\, architectural framework\, and user experience tailored for bioinformaticians. We’ll also discuss ongoing research challenges\, such as minimizing manual configuration for non-expert users\, optimizing resource pre-provisioning to reduce cluster launch times\, and future plans to expand CloudMapper’s capabilities beyond RNA alignment to broader bioinformatics and data-processing tasks. \nBio: Kunwoo Park is a second-year Ph.D. student in the Computer Science Department at UC Irvine\, with research interests in data systems and big data a
URL:https://isg.ics.uci.edu/event/kunwoo-park-talk/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20241122T110000
DTEND;TZID=America/Los_Angeles:20241122T120000
DTSTAMP:20260606T114504
CREATED:20241008T012638Z
LAST-MODIFIED:20260401T210003Z
UID:2118-1732273200-1732276800@isg.ics.uci.edu
SUMMARY:Sainyam Galhotra (Cornell): Context-aware Responsible Data Science
DESCRIPTION:Abstract: Data-based systems are increasingly used in applications that have far-reaching consequences and long-lasting societal impact. However\, the development process remains highly specialized\, tedious\, and unscalable. This produces a manually fine-tuned\, rigid solution that works only for one specific problem in one specific context. The system fails to adapt to the changing world and severely limits the full utilization of valuable data. \nSo\, how can you avert this fate for your systems? \nIn this talk\, I present my vision of context-aware systems that enable even non-expert users to develop correct\, explainable\, and equitable data-science pipelines. To achieve this\, I will focus on i) re-thinking the design of data science pipelines\, and ii) the importance of causal inference for trustworthy data analysis. I will present a data discovery framework that automatically identifies useful data on behalf of end-users for various tasks. Lastly\, I will discuss my proposal of leveraging counterfactual reasoning and causal inference to quantify the impact of an input on the outcome. These topics are the pieces of the puzzle that come together to create the Data Scientists’ holy grail – an easily deployable\, scalable\, and robust system that you can trust even as everything around it evolves. \nBio: Sainyam Galhotra is an Assistant Professor in Computer Science at Cornell University and a field member for Computer Science\, Statistics and Data Science. Previously\, he was a Computing Innovation Fellow pursuing postdoctoral research at the University of Chicago. He received his Ph.D. from the University of Massachusetts Amherst under the supervision of Prof. Barna Saha (currently at UC San Diego). The goal of his research is to lay the foundation of responsible data science\, enabling the efficient development and deployment of trustworthy data analytics applications. 
His research has combined techniques from Data Management\, Probabilistic Methods\, Causal Inference\, Machine Learning\, and Software Engineering. His research has been published in top-tier Data Management (SIGMOD\, VLDB\, PODS\, & ICDE)\, AI (NeurIPS\, AAAI & AIES) and Software Engineering (FSE) conferences. He is a recipient of the Best Paper Award in FSE 2017 and Most Reproducible Paper Award in both SIGMOD 2017 and 2018\, and Best Artifact Paper Honorable Mention Award in SIGMOD 2023. He was recognized as a Data Science rising star\, a DAAD AInet Fellow\, and as the first recipient of the Krithi Ramamritham Award at UMass for contribution to database research. \nhttps://sainyamgalhotra.com/
URL:https://isg.ics.uci.edu/event/sainyam-galhotra-cornell/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20241206T130000
DTEND;TZID=America/Los_Angeles:20241206T140000
DTSTAMP:20260606T114504
CREATED:20241008T012801Z
LAST-MODIFIED:20250211T004551Z
UID:2120-1733490000-1733493600@isg.ics.uci.edu
SUMMARY:Binbin Gu: PoneglyphDB: Efficient Non-interactive Zero-Knowledge Proofs for Arbitrary SQL Queries Verification
DESCRIPTION:Abstract:\nIn database applications involving sensitive data\, the dual imperatives of data confidentiality and provable (verifiable) query processing are important. This paper introduces PoneglyphDB\, a database system that leverages non-interactive zero-knowledge proofs (ZKP) to support both confidentiality and provability. Unlike traditional databases\, PoneglyphDB enhances confidentiality by ensuring that raw data remains exclusively with the host\, while also providing proofs that let clients verify the correctness of query responses. \nThe main innovation in this paper is a set of efficient ZKP designs (called circuits) for basic operations in SQL query processing. These basic operation circuits are then combined to form ZKP circuits for larger\, more complex queries. PoneglyphDB’s circuits are carefully designed to be efficient by utilizing advances in cryptography such as PLONKish-based circuits\, recursive proof composition techniques\, and designing with low-order polynomial constraints. We demonstrate the performance of PoneglyphDB with the standard TPC-H benchmark. Our experimental results show that PoneglyphDB can efficiently achieve both confidentiality and provability\, outperforming existing state-of-the-art ZKP methods.
URL:https://isg.ics.uci.edu/event/binbin-gu-tbd/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250110T130000
DTEND;TZID=America/Los_Angeles:20250110T140000
DTSTAMP:20260606T114504
CREATED:20250211T004808Z
LAST-MODIFIED:20250211T004808Z
UID:2160-1736514000-1736517600@isg.ics.uci.edu
SUMMARY:Shengquan Ni: IcedTea: Efficient and Responsive Time-Travel Debugging in Dataflow Systems
DESCRIPTION:Abstract: As data analytics grow in popularity\, the increasing volume of data and complexity of jobs require users to wait longer to see results\, hindering productivity and causing frustration. To address this\, we developed an actor-based data processing engine optimized for pipelined execution\, featuring a flexible interface for defining control messages. This enables users to seamlessly customize and manage interactions during execution.\nWhile interactive systems help users identify incorrect behavior earlier\, the pipelined and distributed nature of execution often leads to non-deterministic behavior\, making it difficult to pinpoint the root cause of bugs. To tackle this\, we created IcedTea\, a time-travel debugger with tuple-based step semantics. IcedTea enables lightweight recording of execution\, allowing users to roll back to previous states and step forward to investigate issues effectively.\nAs data jobs increasingly operate in cloud environments\, adapting the system to the cloud is crucial. Challenges such as resource isolation and fault tolerance must be addressed to ensure security and reliability in distributed systems.\nIn this talk\, I will focus on IcedTea\, a time-travel debugger that allows users to record a pipelined distributed workflow execution and jump back to specific points to replay the execution step-by-step. Our evaluation demonstrates that IcedTea effectively helps identify state-related bugs with minimal overhead added to the original execution.
URL:https://isg.ics.uci.edu/event/shengquan-ni-icedtea-efficient-and-responsive-time-travel-debugging-in-dataflow-systems/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250117T130000
DTEND;TZID=America/Los_Angeles:20250117T140000
DTSTAMP:20260606T114504
CREATED:20241008T012400Z
LAST-MODIFIED:20250211T005019Z
UID:2114-1737118800-1737122400@isg.ics.uci.edu
SUMMARY:Abhishek Singh: LogPoseDB: Transaction Handoff and Agreement in Edge-Cloud Systems
DESCRIPTION:Abstract: Emerging IoT and edge applications demand fast response times that cannot be achieved by faraway cloud datacenters. This motivates building edge-cloud systems where nodes on the edge can participate in the processing and storage of data. However\, building an edge-cloud transaction processing system faces three main challenges: (1) inefficient transaction execution due to low concurrency arising from long round-trip times between edge and cloud\, (2) the absence of a dedicated edge-node infrastructure\, and (3) edge nodes may be untrusted.\nWe propose LogPoseDB\, an edge-cloud database that spans both edge and cloud nodes. LogPoseDB aims to overcome the challenges above. LogPoseDB proposes dynamic state detachment\, where the state storage and processing is treated as a disjoint resource between the cloud and the edge. LogPoseDB’s transaction processing protocol ensures fast response by avoiding wide-area coordination with the cloud or other faraway edge nodes. This is done by leveraging data locality of detached state and by methods that build on the areas of transaction chopping and commutativity.\nLogPoseDB does not require any dedicated edge infrastructure. Rather\, clients may utilize their edge nodes—if desired—to perform the processing and storage of their data while they need it. (Other clients can still process their data on cloud nodes.) To address the trust challenges\, we propose a Byzantine fault-tolerant (BFT) protocol that targets edge nodes. LogPoseDB’s BFT replication protocol proposes the principle of remote lazy trust that enables efficient BFT edge coordination by utilizing a remote trusted node asynchronously. \nBio: Abhishek is a PhD Candidate supervised by Prof. Faisal Nawab. His research includes building data management and transaction processing systems for the emerging Edge-Cloud infrastructure.
URL:https://isg.ics.uci.edu/event/abhishek-singh-talk/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250124T110000
DTEND;TZID=America/Los_Angeles:20250124T120000
DTSTAMP:20260606T114504
CREATED:20250211T005142Z
LAST-MODIFIED:20250211T005142Z
UID:2163-1737716400-1737720000@isg.ics.uci.edu
SUMMARY:Xiaodong Zhang (The Ohio State University): Data Management: Interactions with Computer Architecture and Systems
DESCRIPTION:Abstract:  We have entered a data-centric computing era\, characterized by the coexistence of diverse parallel and specialized hardware accelerators along with general-purpose processors. In this ecosystem\, minimizing data movement has become a critical priority for the design of both systems and applications. Over the years\, the CPU-centric ecosystem has evolved into a one-size-fits-all environment\, supporting a wide variety of applications. However\, its efficiency in performance\, computational power\, and energy consumption has steadily declined\, making the general-purpose computing model increasingly unsustainable for the rapidly growing demand of data analytics and machine learning applications. In this presentation\, I will explore the constraints and obstacles inherent in our current computing ecosystem. I will also provide case studies to support the evolution of computer hardware and software for high-performance data processing\, featuring advanced hardware components such as GPUs\, RDMA\, and other relevant technologies. All associated algorithms and software implementations are open source\, with some having been integrated into production systems. The system infrastructure transition for data-centric workloads also challenges our college computer science education. In this context\, I will briefly introduce a new textbook of mine\, which shares the same title of this presentation\, and was published by Cambridge University Press last year. \nBio: Xiaodong Zhang is a University Distinguished Scholar and the Robert M. Critchfield Professor in Engineering at the Ohio State University. His research interests focus on data management in computer and distributed systems. Driven by a commitment to translate his academic research solutions into cutting-edge technology\, he has made continuous efforts in advancing the design and implementation of several major production systems. 
He was recognized with the 2020 ACM Microarchitecture Test of Time Award for his contributions to memory architecture design and the 2024 VLDB Test of Time Award for an initial development of open-source spatial data processing systems on large-scale clusters. He received his Ph.D. in Computer Science from the University of Colorado at Boulder\, where he was honored with a Distinguished Engineering Alumni Award in 2011. He received the Education Leadership Award from the Lutron Foundation for chairing the Department of Computer Science and Engineering at Ohio State from 2006 to 2018. He is a Fellow of the ACM and a Fellow of the IEEE.
URL:https://isg.ics.uci.edu/event/xiaodong-zhang-the-ohio-state-university-data-management-interactions-with-computer-architecture-and-systems/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250131T130000
DTEND;TZID=America/Los_Angeles:20250131T140000
DTSTAMP:20260606T114504
CREATED:20250211T005308Z
LAST-MODIFIED:20250211T005308Z
UID:2165-1738328400-1738332000@isg.ics.uci.edu
SUMMARY:Yicong Huang: Building Data Systems to Broaden the Access of Data Science\, AI\, and ML
DESCRIPTION:Abstract \nIn an era where data-driven decision-making shapes industries\, governments\, and everyday life\, the ability to leverage data science has become an essential skill. Modern data science tools—encompassing data collection\, analysis\, and advanced techniques such as artificial intelligence (AI)\, machine learning (ML)\, and large language models (LLMs)—play a critical role across diverse fields. However\, many of these tools rely heavily on programming expertise\, which limits their accessibility to a broader audience. In this talk\, I will discuss my work on Texera\, an open-source system designed to make data science\, AI\, and ML accessible to everyone. Texera features a low-code and even no-code workflow interface\, enabling users of varying technical backgrounds to engage in data science. It emphasizes cloud-based collaboration for data science\, enabling multiple users to seamlessly work on the same shared execution\, much like the collaborative experiences offered by Google Docs and Overleaf. I will discuss the design choices behind our actor-based parallel engine for executing data science workflows. I will also highlight my work on the system’s innovative features for interacting with data workflow executions\, focusing on debugging capabilities that improve transparency and enhance usability. To conclude\, I will outline future research directions aimed at developing a comprehensive ecosystem that integrates advanced interfaces and intelligent systems\, enhancing accessibility\, efficiency\, and user empowerment in data science. \nBio \nYicong Huang is a final-year Ph.D. candidate in the Information Systems Group (ISG)\, Computer Science Department\, University of California\, Irvine. Under the guidance of Dr. Chen Li\, his research focuses on big data management\, data-processing systems\, and machine learning systems. Yicong has made significant contributions to the Texera project. 
He has published in top-tier database venues such as VLDB\, SIGMOD\, and ICDE. His interdisciplinary reach spans venues like TOCHI\, PNAS Nexus\, JAMIA\, AMIA\, and PLOS ONE. Yicong completed research internships at Bytedance\, VISA\, and Observe\, and contributed to patents and papers. His research earned a Best Demo Runner-Up Award at SIGMOD 2024. He received honors such as the 2024 Graduate Dean’s Dissertation Fellowship and the 2023 Public Impact Fellowship from UCI. For more information about his work\, please visit https://yicong-huang.github.io.
URL:https://isg.ics.uci.edu/event/yicong-huang-building-data-systems-to-broaden-the-access-of-data-science-ai-and-ml/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250207T110000
DTEND;TZID=America/Los_Angeles:20250207T120000
DTSTAMP:20260606T114504
CREATED:20250112T010756Z
LAST-MODIFIED:20250225T040958Z
UID:2153-1738926000-1738929600@isg.ics.uci.edu
SUMMARY:Amr El Abbadi (UCSB): Practical Approaches for Private and Scalable Information Data Management Systems
DESCRIPTION:Practical Approaches for Private and Scalable Information Data Management Systems\n  \nAmr El Abbadi \nProfessor of Computer Science \nUniversity of California at Santa Barbara  \nAbstract.  \nIncreasingly\, countries and regions have strict laws and regulations to protect the privacy of personal data. For example\, the states of the European Union (EU) enforce the General Data Protection Regulation (GDPR) to protect the personal data of individuals living in the EU. Much research has focused on preserving the privacy of data using various advanced cryptographic techniques. However\, irrespective of the privacy of the data itself\, the queries requesting the data alone raise severe privacy concerns\, owing to numerous attacks and data breaches that exploit access patterns. Our goal in this talk is to demonstrate how private access to data\, using sophisticated\, expensive but secure cryptographic methods\, can become a practical reality in the near future. Our focus is on supporting oblivious queries\, thus hiding any associated access patterns on both private and public data. For private data\, ORAM (Oblivious RAM) is one of the most popular approaches for supporting oblivious access to encrypted data. However\, most existing ORAM datastores are not fault tolerant\, and hence an application may lose all of its data when failures occur. To achieve fault tolerance\, we propose QuORAM\, the first datastore to provide oblivious access and fault-tolerant data storage using a quorum-based replication protocol. For public data\, PIR (Private Information Retrieval) is the main mechanism proposed in recent years. However\, PIR requires the server to treat data as an array of elements\, and clients retrieve data using an index into the array. This requirement limits the use of PIR in many practical settings\, especially for key-value stores\, where the client may be interested in a particular key but does not know the exact location of the data at the server. 
In this talk\, we will discuss recent efforts to overcome these limitations\, using Fully Homomorphic Encryption (FHE)\, to improve the performance\, scalability\, and expressiveness of privacy-preserving queries of public data.  \nBiography \nAmr El Abbadi is a Professor of Computer Science. He received his B. Eng. from Alexandria University\, Egypt\, and his Ph.D. from Cornell University. His research interests are in the fields of fault-tolerant distributed systems and databases\, focusing recently on cloud data management\, blockchain-based systems\, and privacy concerns. Prof. El Abbadi is an ACM Fellow\, AAAS Fellow\, and IEEE Fellow. He was Chair of the Computer Science Department at UCSB from 2007 to 2011. He served as Associate Graduate Dean at the University of California\, Santa Barbara from 2021 to 2023. He served as a journal editor for several database journals\, including The VLDB Journal\, IEEE Transactions on Computers\, and The Computer Journal. He was Program Chair for multiple database and distributed systems conferences\, including most recently SIGMOD 2022. He served on the executive committee of the IEEE Technical Committee on Data Engineering (TCDE) and was a board member of the VLDB Endowment from 2002 to 2008. In 2007\, Prof. El Abbadi received the UCSB Senate Outstanding Mentorship Award for his excellence in mentoring graduate students. In 2013\, his student\, Sudipto Das\, received the SIGMOD Jim Gray Doctoral Dissertation Award. Prof. El Abbadi is also a co-recipient of the Test of Time Award at EDBT/ICDT 2015. Recently\, papers he co-authored received an Outstanding Paper Award at NSDI (Networked Systems Design and Implementation) 2024 and the Test of Time Award from MDM (Mobile Data Management) 2024. He has published over 350 articles in databases and distributed systems and has supervised over 40 PhD students. \nHost: Faisal Nawab
URL:https://isg.ics.uci.edu/event/amr-el-abbadi-ucsb-practical-approaches-for-private-and-scalable-information-data-management-systems/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250214T130000
DTEND;TZID=America/Los_Angeles:20250214T140000
DTSTAMP:20260606T114504
CREATED:20250211T005519Z
LAST-MODIFIED:20250225T032621Z
UID:2167-1739538000-1739541600@isg.ics.uci.edu
SUMMARY:Jiadong Bai: Supporting Data Science Education Using Texera with a Cloud Infrastructure
DESCRIPTION:Abstract\nIn this talk\, we will first present our paper at the DSE-K12 conference with the title “DS4ALL: Teaching High-School Students Data Science and AI/ML Using the Texera Workflow Platform as a Service.” Traditional data science education often requires students to have programming experience and install local software. It also makes collaboration inefficient and slows down the feedback loop when students need help from TAs. To address these challenges\, we developed a new teaching paradigm using Texera and successfully hosted DS4ALL summer programs in 2023 and 2024. As a result\, students with no prior coding experience were able to perform data analysis using AI/ML techniques on the platform. \nThe second part of the talk will focus on how we are scaling to a broader audience based on the success of the Texera system used in DS4ALL. There are several challenges in building such a cloud infrastructure\, such as handling versatile service traffic\, supporting concurrent isolated workflow executions\, and managing diverse data storage needs. To address these challenges\, we enable Texera to run on Kubernetes for scalable service management. For workflow execution\, we run each workflow in an isolated Kubernetes pod to ensure performance and security. On the data storage side\, we design a storage layer that integrates LakeFS for managing versioned datasets and Apache Iceberg for handling versioned tables with support for concurrent read/write operations. These solutions allow Texera to be deployed as a cloud service to provide a scalable\, secure\, and efficient environment for data science workflows. \nBio\nJiadong Bai is a second-year Ph.D. student in the Computer Science Department at UC Irvine\, with research interests in data systems\, data science\, and big data analysis. He is supervised by Prof. Chen Li. \nShengquan Ni is a sixth-year Ph.D. student in the Department of Computer Science advised by Professor Chen Li. 
His research interests include big data processing\, distributed systems\, data analytics\, and data science.
URL:https://isg.ics.uci.edu/event/jiadong-bai-tbd/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20250221T130000
DTEND;TZID=America/Los_Angeles:20250221T140000
DTSTAMP:20260606T114504
CREATED:20250211T005602Z
LAST-MODIFIED:20250225T040933Z
UID:2169-1740142800-1740146400@isg.ics.uci.edu
SUMMARY:Ketan C Maheshwari (Oak Ridge National Laboratory): Enacting Distributed HPC Workflows: Opportunities and Challenges
DESCRIPTION:Abstract: The Department of Energy (DOE) complex comprises many science facilities that can be classified as data-producing (e.g.\, the Advanced Photon Source at Argonne National Laboratory) and data-consuming (e.g.\, the Leadership Class Computing Facilities at the Oak Ridge National Laboratory). Modern science campaigns often require extensive use of more than one such facility\, and these facilities may be located far from each other and administered separately. This presents opportunities for scientific computational workflows to aid in the process. At the same time\, there are equally daunting challenges to accomplishing these workflows successfully and smoothly. Our talk will dive into these opportunities and challenges and explore proposed solutions and a path forward. \nBio: Dr. Ketan Maheshwari is a Senior Linux Systems Engineer within the NCCS Division at the Oak Ridge National Laboratory. He has over 15 years of experience working with HPC systems\, including over 10 years with the leadership-class systems at ALCF and OLCF. He is interested in porting science applications to large-scale computing infrastructures and has hands-on expertise in workflows\, parallelization\, and HPC. He has given several talks on technical topics at local as well as international venues\, most notably on GNU Parallel (eScience’23\, CUG’24\, PEARC’24\, SC’24)\, Linux Terminal Tools (USENIX/LISA’19\, LOPSA’18)\, and Swift Workflows (CCGrid’13). Ketan received his PhD in the area of Scientific Workflows from the University of Nice and a Masters in Grid Computing from the University of Amsterdam. \n  \nHost: Chen Li
URL:https://isg.ics.uci.edu/event/ketan-c-maheshwari-oak-ridge-national-laboratory/
LOCATION:DBH 3011
END:VEVENT
END:VCALENDAR