ISG Talks are sponsored by Couchbase.

Sushant Jain: Large scale and low latency data distribution from database to servers

DBH 6011

Many applications at Google are structured with data stored in a transactional database (the source of truth), with the same data required by servers distributed worldwide. For efficient and fast computation, servers store this data in memory. Further, the database changes continuously, and the in-memory view on this large number of servers must be updated in real time. For example, in the Google Search Ads application, advertiser configuration is stored in a database, and this data is loaded into the memory of various servers to compute ads in a scalable and fast way. In this talk, we describe our solution to this data distribution problem and the challenges that we encountered in providing a highly reliable and low-latency service.

Dr. Andrey Balmin and Mayank Pradhan (Workday): Workday Prism Analytics: Unifying Interactive and Batch Data Processing Using Apache Spark

DBH 3011

Abstract: Workday Prism Analytics enables data discovery and interactive Business Intelligence analysis for Workday customers. To prepare data for analysis, business users can set up data transformation pipelines in an interactive, self-service, modern data prep environment. Thus, Workday Prism Analytics needs to run three types of scalable data processing applications: "always on" query engine and data […]

Vinayak Borkar (FireEye Inc.): The X15 Machine Data Management Platform

DBH 4011

ABSTRACT: Machine Data (aka Log Data) is continuously produced by applications and devices as a result of human-computer and computer-computer interactions. Although most of this data was initially generated for ad-hoc human consumption to aid with debugging and troubleshooting systems and deployments, their systematic treatment using well-known data processing techniques can unlock valuable insight about operations […]

David Lomet (Microsoft Research): How Data Caching Systems Succeed

DBH 4011

Data in traditional "caching" data systems resides on secondary storage, and is read into main memory only when operated on. This limits system performance. Main memory data stores with data always in main memory are much faster. But this performance comes at a cost. In this paper, we analyze the costs of both in-memory operations and secondary storage operations where data is not "in cache". We study the performance impact of cache misses on caching system performance. The analysis considers both execution and storage costs. Based on our analysis, we derive cost/performance results for a data caching system and a main memory system to understand where each demonstrates the best cost per operation, what is driving the cost differences, and the scale of the differences. This analysis (1) provides insight into why data caching systems continue to dominate the market; (2) points to higher performance that does not rely on simply increasing main memory cache size; and (3) suggests a path to lower costs and hence better cost/performance.

Prof. Jeff Ullman: Data Science: Is it Real?

DBH 6011

ABSTRACT: We shall discuss the various ways in which data science is approached by different communities, including the Statistics, Machine-Learning, and Database communities. Each presents a different viewpoint and values different outcomes. Some consequences of these approaches will be discussed. As an example of why data science is not machine learning, we shall sketch two […]

Prof. Sang-Woo Jun: Lowering the cost of large-scale data analytics via efficient use of flash storage

DBH 3011

In this talk, I present the storage systems aspect of the ongoing work on using relatively cheap solid-state secondary storage to replace expensive DRAM for analytics on large amounts of data, using as examples graph analytics and the bioinformatics application somatic mutation finding. Both applications are inherently random access intensive, which is a bad fit […]

Xiangyao Yu: Transaction Processing at Scale

DBH 3011

Abstract: Online transaction processing (OLTP) is critical for applications including finance, e-commerce, social networks, and healthcare. The increasing performance demands of these applications require OLTP to scale massively. Concurrency control is a major scalability bottleneck in such systems. This talk presents three projects that identify and help resolve scalability challenges. First, I present a scalability […]

Fatemeh Nargesian: Data Enrichment for Data Science

DBH 3011

Fatemeh Nargesian, University of Toronto. Thursday, March 21, 2019, 2:00 - 3:00 pm, DBH 3011. Refreshments start at 1:30 pm. Data Science is built on the power of data processing and data preparation. In this talk, I discuss the challenges of data preparation for end-to-end data science. Particularly, I talk […]

Pat Helland: There’s No Substitute for Interchangeability

DBH 3011

Speaker: Pat Helland (Salesforce.com) Title: There's No Substitute for Interchangeability Time: 3-4 PM Place: 3011 DBH Abstract: Distributed systems have many challenges including loosely coupled systems, long running work, and distributed workflow.  In addition,  replication with out-of-order reconciliation is quite difficult, especially when composed with the other challenges.  In this talk, we propose data-centric REST-style […]

Michal Shmueli-Scheuer: Conversational bots for customer support

DBH 4011

Michal Shmueli-Scheuer, IBM Research - Haifa. Friday, August 9, 2019, 3:00 pm - 4:00 pm, DBH 4011. Abstract: In this talk, I'll cover various aspects of conversational bots, focusing on the domain of customer support. Often, human conversations with bots mimic the way humans interact with each other. Moreover, even […]

Gift Sinthong: AsterixDB Meets Data Science

DBH 4011

Abstract: In the last few years, Data Science has become an increasingly important use case for data platforms. To support the full Big Data analysis lifecycle, we have examined one of the most popular exploratory data analytics tools, Pandas, which has a serious problem: scalability. Exploratory tools such as Pandas only work well against locally […]

Multistage Adaptive Load Balancing in Big Active Data Publish Subscribe Systems

DBH 3011

Speaker: Hang. Time: 12:30pm. Room: 3011. We address issues in the design and operation of Big Active Data Publish Subscribe (BAD Pub/Sub) systems to enable the next generation of enriched notification systems that can scale to societal levels. The proposed BAD Pub/Sub systems aim to ingest massive amounts of data from heterogeneous publishers and […]

Texera: Supporting Big Data Analytics for Domain Experts through GUI-based workflows

DBH 3011

Speakers:  Avinash Kumar,  Shengquan Ni, Zuozhi Wang   Abstract: Big data analytics is a daunting task for domain experts such as doctors and teachers. Their non-IT background makes it challenging for them to write analytics code and maintain computing infrastructures to efficiently process large amounts of data. Existing data analytics frameworks that offer GUI-based alternatives […]

AquaEIS: Middleware Support for Event Identification in Community Water Infrastructures

DBH 3011

Speaker: Quing Han   Abstract: Real-time event identification is critical in complex distributed infrastructures, e.g., water systems, where failures are difficult to isolate. We present AquaEIS, an event-based middleware tailored to the problem of locating sources of failure (e.g., contamination) in community water infrastructures. The inherent complexity of underground hydraulic systems combined with aging infrastructure presents […]

Scalable transaction and polystore data management in LeanXcale

DBH 5011

Speakers: Ricardo Jimenez-Péris (LeanXcale, Spain), Patrick Valduriez (Inria, France) Abstract: Hybrid Transaction Analytical Processing (HTAP) is poised to revolutionize data management. By providing online analytics over operational data, HTAP systems open up new opportunities in many application domains where real-time decision is critical. Important use cases are proximity marketing, real-time pricing, risk monitoring, real-time fraud […]

Scalable Programming: Progress, Prospects and Challenges (CS/NetSys Seminar)

DBH 6011

Speaker: Prof. Gul Agha (University of Illinois at Urbana-Champaign) Abstract: Mobile cloud computing, social media, cyberphysical systems, and the internet of things are examples of increasingly important applications requiring scalable concurrency. The Actor model facilitates programming large-scale concurrent applications. Not surprisingly, Actor languages and frameworks have been widely adopted in industry to address scalability. Although […]

LSM-based storage techniques: a tutorial

Speaker: Chen Luo Abstract: Recently, the log-structured merge-tree (LSM-tree) has been widely adopted for use in the storage layer of modern NoSQL systems. Because of this, there have been a large number of research efforts, from both the database community and the operating systems community, that try to improve various aspects of LSM-trees. In this […]

A Theoretical View of Distributed Systems (CS Distinguished Seminar Series)

DBH 6011

Speaker: Prof. Nancy Lynch (Massachusetts Institute of Technology) Abstract: For several decades, my collaborators, students, and I have worked on theory for distributed systems, in order to understand their capabilities and limitations in a rigorous, mathematical way. This work has produced many different kinds of results, including: Abstract models for problems that are solved by […]

Building Personal Chronicle of Life Events (Final Defense)

Speaker: Jordan Oh Abstract: Human beings have always been interested in understanding themselves and their surroundings. Learning about the relationship between the two can reveal facts of the present and help predict the future, a critical part of living a better life. With the proliferation of IoT sensor devices, it is now possible to collect […]

Event Detection with Temporal Predicates

DBH 3011

Speaker: Fabio Persia (Free University of Bozen-Bolzano, Italy) Abstract: Human perception tends to group individual values into larger structures; this is also the case for time series data. This tendency inspired us to define an event-detection language based on time intervals, which combines timepoint-based events into larger structures. Complex events can then be defined on […]

Effective Filters and Linear Time Verification for Tree Similarity Joins

DBH 3011

Speaker: Thomas Hütter (University of Salzburg) Abstract: The tree similarity join computes all similar pairs in a collection of trees. Two trees are similar if their edit distance falls within a user-defined threshold. Previous algorithms, which are based on a filter-verify approach, suffer from the following two issues. First, ineffective filters produce a large number […]

Systems and ML at RISELab (CS Distinguished Seminar Series)

DBH 6011

Speaker: Prof. Ion Stoica (University of California at Berkeley) Abstract: In this talk, I will present several of the projects we are developing at RISELab, a two-year-old lab at UC Berkeley that focuses on building platforms and algorithms for real-time intelligent decisions, decisions that are secure and explainable. These projects include both systems to […]

Dr. Alfred Spector (Two Sigma): Opportunities and Perils of Data Science: A Roadmap (ICS Distinguished Lecture)

DBH 6011

Speaker: Dr. Alfred Spector Abstract: Data-driven approaches have led to powerful prediction, optimization and automation techniques. Powered by large-scale, networked computer systems and machine learning algorithms, these have been very impactful to date and hold great promise in many disciplines, even in the humanities and social sciences. However, no new technology arrives without complications, and we […]

Pat Helland (Salesforce.com): Depending on Appending

DBH 3011

Speaker: Pat Helland (Salesforce.com) Abstract: Increasingly, we see "Gray Failures" in the datacenter and public cloud. This happens when a server, router, or other device just plain goes slow. This may result in severe problems in the user-perceived performance as the slowness cascades, sometimes not slow enough to cause the exclusion of the bad […]

Babak Salimi: Causal Inference for Responsible Data Science

https://uci.zoom.us/j/232157494

ABSTRACT: Scaling and democratizing access to big data promises to provide meaningful, actionable information that supports decision-making. Today, data-driven decisions profoundly affect the course of our lives, such as whether to admit applicants to a particular school, offer them a job, or grant them a mortgage. Unfair, inconsistent, or faulty decision-making raises serious concerns about […]

David Lomet: Better Database Cost/Performance via Programmable SSD Batched I/O

DBH 3011

Abstract: A database storage manager should place data at the most cost/performance-effective tier in the storage hierarchy.  While performance and cost both decrease with distance from the CPU, the cost/performance trade-off depends on how efficiently a storage manager can move data across tiers.  Log structuring (LS) is designed to improve the cost/performance of secondary storage […]

Redesigning Storage Systems for Future Workloads, Hardware, and Performance Requirements (CS Faculty Candidate Seminar)

DBH 3011

Speaker: Oana Balmau (University of Sydney) Abstract: Cloud storage stacks are being challenged by new workloads, new hardware and new performance requirements. First, workloads evolved from following a read-heavy pattern (e.g., a static web-page) to a write-heavy profile where the read:write ratio is closer to 1:1 (e.g., as in the Internet of Things). Second, the […]

Lei Cao: Toward an End-to-end Anomaly Discovery Paradigm

ABSTRACT: Anomaly detection is critical in enterprises, with applications ranging from preventing financial fraud and defending against network intrusions to detecting imminent device failures. Although previously developed research offers a plethora of stand-alone methods for detecting particular types of anomalies, there is no end-to-end solution for data scientists to effectively discover anomalies over large volumes of […]

Aaron J. Elmore: CrocodileDB – Resource Efficient Database Execution

https://uci.zoom.us/j/92895672890

Abstract: The coming end of Moore’s law requires that data systems be more judicious with computation and resources as the growth in data outpaces the availability of computational resources. Current database systems are eager and aggressively consume resources to immediately and quickly complete the task at hand. Intelligently deferring a task to a later point […]

Karthik Ranganathan: YugabyteDB – Bringing Together the Best of Amazon Aurora and Google Spanner

Speaker: Karthik Ranganathan Abstract: PostgreSQL, a single-node open-source RDBMS, is widely adopted for its powerful set of features. However, PostgreSQL is not built to be used as a cloud-native database, and therefore cannot inherently survive failures, scale horizontally or support geo-distributed deployments. While Amazon Aurora has modified the subsystem of PostgreSQL that writes to disk […]

Yiming Lin (UCI): LOCATER – Cleaning WiFi Connectivity Datasets for Semantic Localization

https://uci.zoom.us/j/95066121155

Speaker: Yiming Lin, UCI Abstract: Sensor data is abundant in our lives but is often too dirty to support high-quality services. This talk explores the data cleaning challenges that arise in using WiFi connectivity data to locate users at semantic indoor locations such as buildings, regions, and rooms. WiFi connectivity data consists of sporadic connections between […]

Phil Bernstein (MSR): Adding Data Management to Orleans – A Journey

Zoom

Speaker: Phil Bernstein Microsoft Research Abstract: I spent eight years adding database features to the Orleans object-oriented programming framework: replication, geo-distribution, transactions, and indexing. The challenge is how to do it when storage is a plug-in service that you don’t control. In this talk, I’ll describe the journey, summarizing the main technical ideas and recounting […]

Jerry Power (I3 Systems): Managing Digital Flows in a Data Driven World

Zoom

Speaker: Jerry Power Bio: Jerry Power is the CEO of I3 Systems and a founder of the I3 Consortium. I3 Systems creates real-time data networks that span organizational and geographical constraints. Prior to the formation of I3 Systems, Jerry was the Executive Director of The Institute for Communication Technology Management (CTM) at USC. Jerry has […]

Yannis Chronis (University of Wisconsin-Madison): Analytic Query Processing using Associative Computing

DBH 4011

Speaker: Yannis Chronis (University of Wisconsin-Madison) Title: Analytic Query Processing using Associative Computing Abstract: We are in the midst of a "Cambrian" hardware evolution in which a variety of architectures are being invented with a flurry that we haven't seen in a long time. The associative computing paradigm enables designs that utilize memories in new […]

Nandit Soparkar (Ubiquiti): Data-driven AI technologies for a Consumer webapp

Zoom

Title: Data-driven AI technologies for a Consumer webapp Abstract: We discuss the challenges, and the opportunity, in providing a consumer-facing data-driven AI webapp. Our presentation will include a demo, give the audience access to the webapp, and cover the technical as well as the relevant business challenges being addressed. Our webapp is the new CarBeast.com (about 2 months […]

Tim Kraska (MIT): Towards instance-optimized data systems

DBH 6011

Location:  DBH 6011 https://uci.zoom.us/j/94559511434 (for UCI users only) Speaker: Tim Kraska, MIT Abstract: Recently, there has been a lot of excitement around ML-enhanced (or learned) algorithms and data structures. For example, there has been work on applying machine learning to improve query optimization, indexing, storage layouts, scheduling, log-structured merge trees, sorting, compression, sketches, among many other […]

Matt Ingenthron (Couchbase): Couchbase and Distributed Computing Backends for big data processing

DBH 3011

Location: DBH 3011 Speaker: Matt Ingenthron, Engineering Director, Couchbase Biography: Matt is a Couchbase co-founder and Engineering Director who leads SDK and Connector development at Couchbase. He has a deep software development background with extensive experience scaling Java, Ruby on Rails, and AMP web applications. He […]

Jayant Haritsa (IISc Bangalore): Shedding Light on Opaque Database Queries

DBH 3011

Location: Donald Bren Hall 3011 Speaker: Jayant Haritsa, Database Systems Lab, Indian Institute of Science, Bangalore Abstract: We have recently defined a new query reverse-engineering problem of unmasking SQL queries hidden within opaque database applications. […]

Anand Deshpande (Persistent Technologies): How to build your own Business

Hybrid: DBH3011 & Zoom

Location: Donald Bren Hall 3011 Zoom info: The meeting will be hybrid and will also be available on Zoom: https://uci.zoom.us/j/96160303043 Skype for Business: https://uci.zoom.us/skype/96160303043 Speaker: Anand Deshpande, Founder, Chairman and Managing Director, Persistent Technologies Host: Prof. Sharad Mehrotra Abstract: In this talk Dr. Deshpande will provide insight into […]

Sadeem Alsudais: Drove: Tracking Execution Results of Workflows on Large Data

DBH 4011

Abstract: Data analytics using workflows is an iterative process, in which an analyst makes many iterations of changes, such as additions, deletions, and alterations of operators and their links. In many cases, the analyst wants to compare these workflow versions and their execution results to help decide the next iteration of changes. To this end, […]

Qiushi Bai: QueryBooster: Improving SQL Performance Using Middleware Services for Human-Centered Query Rewriting + Demo

DBH 4011

Title: QueryBooster: Improving SQL Performance Using Middleware Services for Human-Centered Query Rewriting Abstract: Query latency is critical in many database-backed applications where users need answers quickly to gain timely insights and make mission-critical decisions.  "Query rewriting" is one of the query optimization techniques which transforms SQL queries to more efficient formats based on pre-defined rewriting […]

Xiaozhen Liu: Demonstration of Collaborative and Interactive Workflow-based Data Analytics in Texera

DBH 4011

Abstract: Collaborative data analytics is becoming increasingly important due to the higher complexity of data science, more diverse skills from different disciplines, more common asynchronous schedules of team members, and the global trend of working remotely. In this demo we will show how Texera supports this emerging computing paradigm to achieve high productivity among collaborators […]

Abhishek Singh: WedgeBlock – An Off-Chain Secure Logging Platform for Blockchain Applications

DBH 4011

Abstract: In recent years, there has been a growing interest in building blockchain-based decentralized applications (DApps). DApps typically consist of two components: an on-chain component that implements the logic of the application and runs on blockchain as a smart contract, and an off-chain component that runs on a regular server to receive and process user […]

Juncheng Fang: PeloPartition: Improving Blockchain Resilience to Partitioning by Sharding

DBH 4011

Abstract: Blockchain has gained considerable traction over the last few years and plays a critical role in realizing decentralized and cryptocurrency applications. A challenge that has been overlooked in prior blockchain algorithms is that they do not consider large-scale network outages and rely on the assumption of reliable global network connectivity. In the event of […]

Peeyush Gupta: A Demonstration of TippersDB

DBH 4011

Abstract: In the talk, I'll present TippersDB, a middleware system designed to build sensor-based smart space analytical applications. TippersDB supports a powerful data model that decouples semantic data about the application domain from the sensor data from which that semantic data is derived. By supporting mechanisms to map/translate data, concepts, and queries between the two levels, TippersDB […]

Glenn Galvizo: Navigational Pattern Matching w/ Graphix

DBH 4011

Abstract: Users aiming to perform scalable graph analytics on large datasets are stuck between a rock and a hard place. On one side, a user works with an intuitive data model and query language chained to a system that cannot gracefully scale across multiple machines (i.e. the rock). On the other side, a user works […]

Andrew Chio: SmartSPEC: Customizable Smart Space Datasets via Event-Driven Simulations

DBH 4011

Bio - Andrew is a 4th year Ph.D. student in the Distributed Systems Middleware (DSM) group under the supervision of Professor Nalini Venkatasubramanian. His general research interests revolve around middleware, data mining and analytics, optimization, and machine learning. Abstract - In this talk, we present SmartSPEC, an approach to generate customizable smart space datasets using […]

Tung-Chun Chang: SmartParcels: Cross-Layer IoT Planning for Smart Communities

DBH 4011

Abstract: The emergence of IoT-aided smart communities has created the need for a new set of urban planning tools. The extra design process includes instrumenting infrastructures (sensing, networking, and computing devices) in smartspaces to generate information units (from data analytics) to realize a range of required services. We propose SmartParcels, a framework that generates a […]

Aaron Elmore: Adventures in Database Compression

TBD

Prof. Aaron Elmore, University of Chicago Abstract: Columnar databases enable effective compression by improving entropy through attribute locality and provide opportunities for fast query execution directly on compressed data. In this talk I will briefly overview how compressed query execution works in columnar systems and discuss techniques developed by our group over the past several […]

Aaron Elmore: CrocodileDB: Resource Efficient Database Execution (CS Seminar)

DBH 6011

Prof. Aaron Elmore University of Chicago Abstract: The coming end of Moore’s law requires that data systems be more judicious with computation and resources as the growth in data outpaces the availability of computational resources. Current database systems are eager and aggressively consume resources to immediately and quickly complete the task at hand. Intelligently deferring a […]

Yiming Lin: QUIP: Query-driven Missing Value Imputation

DBH 4011

This paper develops a query-time missing value imputation framework, entitled QUIP, that minimizes the joint costs of imputation and query execution. QUIP achieves this by modifying how relational operators are processed. It adds a cost-based decision function in each operator that checks whether the operator should invoke imputation […]

Shanshan Han: Veil: Storage and Communication Efficient Volume Hiding Algorithms

DBH 4011

Friday, February 17, 2023, 1:00 PM - 2:00 PM, Donald Bren Hall 4011, ICS, UC Irvine Zoom: https://uci.zoom.us/j/92445274511 (UCI only) Abstract: Volume leakage is a major threat to searchable encryption and data outsourcing, where an adversary can obtain the number of values in response to a query and deduce additional information about the data, such as the […]

Babak Salimi (UCSD): Certifying the Fairness of Predictive Models in the Face of Selection Bias

DBH 4011

The Department of Computer Science, UC Irvine WELCOMES Prof. Babak Salimi UCSD Hosts: Prof. Chen Li Certifying the Fairness of Predictive Models in the Face of Selection Bias   Abstract: The widespread use of data-driven algorithmic decision making in crucial areas such as hiring, loan assessments, medical diagnoses, and pretrial release has raised questions about […]

Alex Behm (Databricks): Photon: How to think vectorized

DBH 4011

The Department of Computer Science, Information Systems Group, UC Irvine WELCOMES Dr. Alex Behm Databricks Photon: How to think vectorized 3/3/2023, Friday, 1:00 - 2 pm Place DBH 4011 I'm presenting Photon, a new vectorized execution engine powering Databricks written from scratch in C++. I will introduce you to its basic building blocks by walking […]

Fangqi Liu: DOME: Drone-assisted Monitoring of Emergent Events For Wildland Fire Resilience

DBH 4011

Abstract: By serving as "eyes in the sky," data obtained from a carefully coordinated set of drones equipped with sensors have the potential to enable continuous monitoring of mission-critical events. We develop a Drone-assisted Monitoring system, DOME, that gathers real-time data for situational awareness in emergent and evolving events. The driving use case for this […]

C. Mohan: A Survey of Cloud Database Systems

DBH 3011

C. Mohan Distinguished Visiting Professor, Tsinghua University, China & Member, Board of Governors (Digital University Kerala, India) & Retired IBM Fellow (IBM Research, USA) "A Survey of Cloud Database Systems" ABSTRACT:  In this talk, I will first introduce traditional (non-cloud) parallel and distributed database systems. Concepts like SQL and NoSQL systems, data replication, distributed and parallel query […]

Zuozhi Wang: Texera: A System for Collaborative and Interactive Data Analytics Using Workflows (PhD Final Defense)

Abstract: In the world of data analytics, domain experts, such as public health scientists and medical researchers, play a crucial role as their domain knowledge can unlock valuable insights from data. However, they face several challenges in the current landscape of data analytics tools. They often lack the technical skills necessary to analyze large datasets, […]

Qiushi Bai: Maliva: Using Machine Learning to Rewrite Visualization Queries Under Time Constraints

DBH 4011

Abstract: As a powerful way for people to gain insights from data quickly and intuitively, visualization is becoming increasingly important in the Big Data era. Consider data-visualization systems in which a middleware layer translates a frontend request into a SQL query to a backend database to compute visual results. In this talk, we study the problem of […]

Farzad Habibi: Metastable Failures in Consensus Algorithms

DBH 4011

Abstract: Metastable failure is a recent abstraction of a pattern of failures in distributed systems. A metastable failure is characterized as "permanent overload with an ultra-low goodput." Prior research has proposed a framework for understanding metastable failure and has observed various cases of such failures in real-world settings. In this talk, we discuss the challenge […]

CS Seminar: Prof. Arun Kumar: The New DBfication of ML/AI

DBH 6011

The Department of Computer Science, UC Irvine WELCOMES Prof. Arun Kumar UCSD 5/12/2023, Friday, 11:00 am - noon Place DBH 6011 Abstract: The recent boom in ML/AI applications has brought into sharp focus the pressing need for tackling the concerns of scalability, usability, and manageability across the entire lifecycle of ML/AI applications. The ML/AI world […]

Yiming Lin: Auto-BI: Automatically Build BI-Models Leveraging Local Join Prediction and Global Schema Graph

DBH 4011

Abstract: Business Intelligence (BI) is crucial in modern enterprises and is a billion-dollar business. Traditionally, technical experts like database administrators would manually prepare BI-models (e.g., in star or snowflake schemas) that join tables in data warehouses, before less-technical business users can run analytics using end-user dashboarding tools. However, the popularity of self-service BI (e.g., Tableau and Power-BI) […]

Qiushi Bai: Improving SQL Performance Using Middleware-Based Query Rewriting

DBH 4011

Abstract: Query performance is critical in database-supported applications where users need answers quickly to make timely decisions. Traditional databases rely on rewriting queries to improve SQL performance. With the emergence of business intelligence and interactive visualization applications, databases often miss opportunities to rewrite their queries, due to reasons such as failure to adopt high-accuracy time […]

Saeed Kargar: Hamming Tree: The case for Energy-Aware Indexing for NVMs

DBH 4011

Zoom Link: https://uci.zoom.us/j/8045933305 Abstract: NVM technologies play a crucial role in data storage solutions as well as in battery-powered mobile and IoT devices. However, the challenges of wear-out and energy efficiency need to be addressed for the widespread adoption of NVM. In this presentation, I will discuss our research endeavors aimed at enhancing various aspects of […]

Hari Kishore Chaparala: When (Apache) AsterixDB Hit An (Apache) Iceberg

DBH 4011

Abstract: Apache Iceberg is an open-source table format with rich data management capabilities, including schema evolution, time travel, and efficient data pruning. It offers a reliable foundation for storing and organizing data in a data lake environment. The Iceberg specification allows multiple query engines to safely operate on the same data simultaneously. In this talk, we […]

Glenn Galvizo: Removing the ‘A’ in DAG: Navigational Queries in Hyracks

DBH 4011

Abstract: The need to “view” existing data under different models (e.g., JSON to graph) is a requirement seen in many modern applications. A naive solution involves utilizing narrow-purposed systems to handle each model; however, this multi-DBMS architecture significantly increases the cost of owning one’s data. For Apache AsterixDB users, we offer Graphix as a way […]

Suyash Gupta (UC Berkeley): Dissecting BFT Consensus: In Trusted Components we Trust!

DBH 4011

The Information Systems Group (ISG) at UC Irvine welcomes Suyash Gupta, UC Berkeley. Dissecting BFT Consensus: In Trusted Components we Trust!

ABSTRACT: The growing interest in reliable multi-party applications has fostered widespread adoption of Byzantine Fault-Tolerant (BFT) consensus protocols. Existing BFT protocols need f more replicas than Paxos-style protocols to prevent equivocation attacks. Trust-BFT protocols seek to minimize this cost by making use of trusted components at replicas. This paper makes two contributions. First, we analyze the design of existing trust-BFT protocols and uncover three fundamental limitations that preclude most practical deployments. Some of these limitations are fundamental, while others are linked to the state of trusted components today. Second, we introduce a novel suite of consensus protocols, FlexiTrust, that attempts to sidestep these issues. We show that our FlexiTrust protocols achieve up to 185% more throughput than their trust-BFT counterparts.

BIO: Suyash Gupta is a postdoctoral researcher at the SkyLab, University of California, Berkeley. He is also the Lead Architect of ResilientDB fabric. Prior to joining Berkeley, he received his Ph.D. degree from the University of California, Davis. He also holds two Master of Science degrees: one from Purdue University and another from Indian Institute of Technology Madras. His current research focuses on attaining safe, efficient, and fault-tolerant distributed consensus and communication. He has also co-authored a book on fault-tolerant distributed transaction processing at Morgan & Claypool. He has been awarded the Best Graduate Researcher Award for 2021 by UC Davis and the Best Paper Award at EuroSys'23. In his free time, Suyash likes to code, and his team won the Best Hacker Award at BostonHacks, HackIllinois, and HackPrinceton, among others.
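
The replica-count comparison in the abstract above can be made concrete with a small sketch. It assumes the standard textbook bounds, which the abstract itself does not spell out: a Paxos-style (crash-fault) protocol needs at least 2f + 1 replicas to tolerate f faults, while a classical BFT protocol needs at least 3f + 1. The function names are hypothetical and chosen only for illustration.

```python
def min_replicas_crash(f: int) -> int:
    """Minimum replicas for a Paxos-style protocol tolerating f crash faults.

    Assumes the standard 2f + 1 bound (an assumption, not stated in the abstract).
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    return 2 * f + 1


def min_replicas_byzantine(f: int) -> int:
    """Minimum replicas for a classical BFT protocol tolerating f Byzantine faults.

    Assumes the standard 3f + 1 bound (an assumption, not stated in the abstract).
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def extra_replicas_for_bft(f: int) -> int:
    """Additional replicas a BFT protocol needs over a Paxos-style one for the same f."""
    return min_replicas_byzantine(f) - min_replicas_crash(f)


if __name__ == "__main__":
    # Print the two bounds and their difference for a few fault budgets.
    for f in range(1, 4):
        print(f, min_replicas_crash(f), min_replicas_byzantine(f), extra_replicas_for_bft(f))
```

Under these assumed bounds the difference is exactly f for every fault budget, which is consistent with the "f more replicas" statement in the abstract.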

Boon Thau Loo (UPenn): Towards Full-Stack Adaptivity in Permissioned Blockchain Systems

DBH 6011

The Computer Science Department and Information Systems Group (ISG) at UC Irvine welcomes Boon Thau Loo (University of Pennsylvania): Towards Full-Stack Adaptivity in Permissioned Blockchain Systems. October 20, 2023 at 11:00 AM, DBH 6011. ABSTRACT: Permissioned blockchain systems are an emerging instance of untrustworthy distributed databases. As novel smart contracts, modern hardware, and new […]

Ken Birman (Cornell): Cascade: A Platform for Fast Edge Intelligence

DBH 6011

The Computer Science Department and Information Systems Group (ISG) at UC Irvine welcomes Ken Birman (Cornell University): Cascade: A Platform for Fast Edge Intelligence. October 27, 2023 at 11:00 AM, DBH 6011. ABSTRACT: There is a growing need to apply machine intelligence and learning at the edge of the cloud. Doing so would reduce delays […]

Nada Lahjouji: ProBE: Proportioning Privacy Budget for Complex Exploratory Decision Support

DBH 4011

ProBE: Proportioning Privacy Budget for Complex Exploratory Decision Support. Nada Lahjouji, PhD Student, UC Irvine. Abstract: Decision support (DS) applications play a crucial role in analyzing large volumes of data to produce valuable insights that facilitate informed decision-making. Such data can, however, contain sensitive information about individuals that requires privacy-preserving mechanisms to prevent data leaks, […]

Vishal Chakraborty: Much Ado About Data-Undo: Semantically Meaningful Data Erasure

DBH 4011

Title: Much Ado About Data-Undo: Semantically Meaningful Data Erasure. Abstract: Data regulations, such as GDPR and CCPA, are increasingly being adopted globally to protect against unsafe data management practices. Such regulations are often ambiguous (with multiple valid interpretations) when it comes to defining the expected dynamic behaviour of data processing systems. We will argue and show […]

Shahram Ghandeharizadeh (USC): Intelligent 3D Multimedia Displays using Flying Light Specks

DBH 6011

The Computer Science Department and Information Systems Group (ISG) at UC Irvine welcomes Shahram Ghandeharizadeh (University of Southern California): Intelligent 3D Multimedia Displays using Flying Light Specks. January 12 at 11:00 AM, DBH 6011. Abstract: A Flying Light Speck (FLS) is a miniature-sized drone equipped with one or more light sources to generate different […]

Henry F. Korth (Lehigh University): Blockchain: Computer Science Foundations, Positive Social and Business Impact, and Research Opportunities

DBH 6011

The Computer Science Department and Information Systems Group (ISG) at UC Irvine welcomes Henry F. Korth (Lehigh University): Blockchain: Computer Science Foundations, Positive Social and Business Impact, and Research Opportunities. January 19 at 11:00 AM, DBH 6011. Abstract: To start, basic concepts of blockchain systems will be introduced assuming only a basic background in computing. […]

Volker Markl (TU Berlin): Mosaics of Big Data: Database Systems and Information Management – Trends and a Vision

DBH 4011

Prof. Dr. Volker Markl: Chair of the Database Systems and Information Management (DIMA) Group at TU Berlin; Director of the Berlin Institute for the Foundations of Learning and Data (BIFOLD); Chief Scientist and Head of the Intelligent Analytics for Massive Data Research Group at the German Research Center for Artificial Intelligence (DFKI). Mosaics of Big Data […]

Shengquan Ni: Supporting time-travel debugging in Texera

Title: Supporting time-travel debugging in Texera. Speaker: Shengquan Ni. Abstract: Dataflow systems, traditionally used for relational analysis, now support a variety of tasks including complex user-defined functions. As dataflow jobs become more diverse and complex, there is an increasing need for better debugging support to understand their runtime behaviors and identify issues either in data […]

Joseph Hellerstein (UC Berkeley): Hydro: A Compiler Stack for Distributed Programs

DBH 6011

The Computer Science Department and Information Systems Group (ISG) at UC Irvine welcomes Joseph Hellerstein (UC Berkeley and Sutter Hill Ventures). TITLE: Hydro: A Compiler Stack for Distributed Programs. ABSTRACT: Nearly all programs of interest today are distributed. Unfortunately, the traditional languages and compilers in common use today offer little assistance in ensuring the correctness […]

Raul Castro Fernandez (U. Chicago): On Data Ecology, Data Markets, the Value of Data, and Dataflow Governance

DBH 4011

Abstract: Data shapes our social, economic, cultural, and technological environments. Data is valuable, so people seek it, inducing data to flow. The resulting dataflows distribute data and thus value. For example, large Internet companies profit from accessing data from their users, and engineers of large language models seek large and diverse data sources to train […]

Yunyan Ding: Efficient Mouse Brain Image Processing Using Collaborative Data Workflows on Texera

DBH 4011

Abstract: In the field of neuroscience, accurately mapping the complex three-dimensional (3D) neural circuitry and architecture of the brain is crucial for advancing our understanding of brain functions and disorders. In this study, we introduce a distributed computational pipeline designed for processing high-resolution mouse brain tile images captured by TissueCyte. This pipeline efficiently and accurately […]

Bratin Saha (AWS Amazon): Scaling Generative AI in the Enterprise

DBH 4011

Abstract: Machine learning (ML) and generative artificial intelligence (AI) are among the most transformational technologies, opening up new opportunities for innovation in every domain across software, finance, health care, manufacturing, media, entertainment, and others. This talk will discuss the key trends that are driving AI/ML innovation, how enterprises are using AI/ML today […]

Yinan Zhou: SpendableDB: A UTxO-based decentralized Database

DBH 4011

Abstract: Blockchain technology has attracted a significant amount of attention ever since the Bitcoin blockchain's success. Currently, most of the research and engineering efforts have been centered around monetary transactions such as token exchange protocols. The potential of building databases on top of blockchains is largely overlooked and remains an open problem. The literature on blockchain databases is divided into permissioned blockchains and permissionless account-based blockchains. However, the former is not fully decentralized, and the latter suffers from challenges in performance and cost. We propose SpendableDB, a permissionless UTxO-based blockchain database, as a novel approach to the problem of data decentralization. Our design integrates data into individual UTxOs to achieve true decentralization of data ownership that can be securely transferred and traded, similar to how the regular monetary UTxOs are protected by the underlying blockchain's decentralization protocol. Additionally, SpendableDB provides cryptographically secured data integrity and immutable data lineage that can be easily verified. Our implementation and experiments show that our design is economically practical as it incurs a small amount of blockchain transaction fees.

Bio: Yinan Zhou is a second-year Ph.D. student in the Computer Science Department at UC Irvine. His primary research focus is on blockchain infrastructure and application developments.

Lukasz Golab (University of Waterloo): Understanding models and the data they learn from

DBH 4011

Lukasz Golab (U. Waterloo): Understanding models and the data they learn from. Abstract: The modern world is powered by data. However, as the capabilities of data-intensive systems grow, so does their complexity, making them hard to understand and troubleshoot. I will discuss my lab's efforts towards understanding models and the data they learn from, including […]

Mohammed Al-Kateb (Amazon Redshift): The Evolution of Amazon Redshift

DBH 4011

Abstract: In this talk, we will discuss the evolution of Amazon Redshift over the past 10 years. We’ll discuss the Amazon Redshift architecture. We’ll dive deep into the lifecycle of executing a query in Amazon Redshift. And we’ll examine how Amazon Redshift continues to maintain leading price/performance in the market. Bio: Mohammed Al-Kateb leads […]

Xinyuan Lin: Data Science Tasks Implemented with Scripts versus GUI-Based Workflows: The Good, the Bad, and the Ugly.

DBH 4011

Abstract: As leveraging large-scale data analytics becomes the norm for many applications, platforms for developing these capabilities have become increasingly important. This work compares the benefits and drawbacks of implementing two commonly used data science platform paradigms: code-based scripts and GUI-based workflows. We implement tasks in both paradigms that provide examples of phases in the […]

Mike Heddes: Efficient Cardinality Estimation of Multi-Join Queries using Count Sketches

DBH 4011

Abstract: Cardinality estimates are a primary input to query optimizers to determine an appropriate join order. The seminal AMS sketch can estimate the cardinality of an equi-join between two relations using little space. Since then, two important advancements are the Count sketch, a method which significantly improves upon the sketching time, and secondly, an extension […]

Pat Helland (Salesforce): Scalable OLTP in the Cloud: What’s the BIG DEAL?

DBH 4011

Abstract: The pursuit of scalable OLTP systems has been the holy grail of my career. Because OLTP systems are typically split into applications and databases, the isolation semantics provided by the DB and used by the app have a major impact on the scalability of the OLTP system as a whole. The isolation semantics are […]

Mohammad Sadoghi (UC Davis): The Journey of Building Global-Scale Sustainable Blockchain Fabric 

DBH 6011

Abstract: The inception of Bitcoin and blockchain has renewed the vision of a democratic and decentralized computational paradigm, that is, to ingrain integrity, transparency, and accountability into the very fabric of the computational model. These fundamental concepts and the technologies behind them--a generic ledger-based data model, cryptographically ensured data integrity and transparent and accountable consensus-based […]

Aditya Parameswaran (Berkeley): Enhance, Don’t Replace: A Recipe for Success in Data Tooling

DBH 6011

Enhance, Don't Replace: A Recipe for Success in Data Tooling. Abstract: Most data analysis and data science is performed in human-centered tools, such as spreadsheets, visual analytics tools, and data science libraries. However, these tools often pose challenges for end-users, especially those without extensive programming expertise, in terms of scalability, interactivity, and usability. Rather than forcing […]

Arnab Nandi (OSU): Data Exploration in a Camera-first World: Query and Result Challenges

DBH 4011

Prof. Arnab Nandi, Associate Professor, Computer Science and Engineering, The Ohio State University. Friday, October 11, 2024 at 11 a.m., Donald Bren Hall 6011. Title: "Data Exploration in a Camera-first World: Query and Result Challenges". Abstract: The pervasive availability of cameras in smartphones, vehicles, drones and more has triggered a new "camera-first" data revolution across […]

Nika Mansouri Ghiasi (ETH): Storage-Centric Computing for Genomics and Metagenomics

DBH 4011

Title: Storage-Centric Computing for Genomics and Metagenomics. Abstract: Genomics and metagenomics applications have enabled significant advancements in many critical areas. The exponential growth of genomic data poses unprecedented challenges in genomics and metagenomics applications. These applications suffer from significant data movement overheads from the storage system. To fundamentally address these overheads, we make a case […]

Yannis Papakonstantinou (Google): Vector Search and Databases

DBH 6011

Yannis Papakonstantinou, Distinguished Engineer, Query Processing and GenAI at Google Cloud Databases. Abstract: Semantic search ability, via embedding (vectors) and vector indexing, has been added to Google Cloud Platform (GCP) databases in order to enable GenAI applications. The inclusion of vectors in databases confers many of the traditional benefits of databases: Developers can now develop […]

Michael Jungmair (TU Munich): A Compiler-Centric Query Engine Design for Mixed Workloads and Modern Hardware

DBH 3011

A Compiler-Centric Query Engine Design for Mixed Workloads and Modern Hardware. 11/1/2024, 1:00 PM to 2:00 PM, DBH 3011. Michael Jungmair, Technical University of Munich, Germany.

Abstract: Relational query engines are increasingly expected to handle more than just relational queries and also run on modern hardware that is increasingly parallel and distributed. However, it is not clear how existing system designs can deal with these two challenges effectively. We propose a holistic, compiler-centric design for data processing systems that is designed for tightly integrated optimization and execution of relational queries, non-relational workloads, and user-defined functions on modern hardware.

Bio: Michael Jungmair is a third-year PhD student at the Technical University of Munich. Supervised by Jana Giceva, he is performing research at the intersection of database engines and compiler technology. So far, this research has culminated in the design and implementation of LingoDB (lingo-db.com), a novel query engine based on the MLIR compiler framework.