ISG Talks are sponsored by Couchbase.


Sushant Jain: Large scale and low latency data distribution from database to servers

DBH 6011

Many applications at Google are structured with data stored in a transactional database (the source of truth), with the same data required by servers distributed worldwide. For efficient and fast computation, servers store this data in memory. Further, the database changes continuously, and the in-memory view on this large number of servers must be updated in real time. For example, in the Google Search Ads application, advertiser configuration is stored in a database, and this data is loaded into the memory of various servers to compute ads in a scalable and fast way. In this talk, we describe our solution to this data distribution problem and the challenges we encountered in providing a highly reliable, low-latency service.

David Lomet (Microsoft Research): How Data Caching Systems Succeed

DBH 4011

Data in traditional "caching" data systems resides on secondary storage, and is read into main memory only when operated on. This limits system performance. Main memory data stores with data always in main memory are much faster. But this performance comes at a cost. In this paper, we analyze the costs of both in-memory operations and secondary storage operations where data is not "in cache". We study the performance impact of cache misses on caching system performance. The analysis considers both execution and storage costs. Based on our analysis, we derive cost/performance results for a data caching system and a main memory system to understand where each demonstrates the best cost per operation, what is driving the cost differences, and the scale of the differences. This analysis (1) provides insight into why data caching systems continue to dominate the market; (2) points to higher performance that does not rely on simply increasing main memory cache size; and (3) suggests a path to lower costs and hence better cost/performance.

Nandit Soparkar (Ubiquiti): Data-driven AI technologies for a Consumer webapp

Zoom

Abstract: We discuss the challenges and the opportunity in providing a consumer-facing, data-driven AI webapp. Our presentation will include a demo, give the audience access to the webapp, and cover the technical as well as the relevant business challenges being addressed. Our webapp is the new CarBeast.com (about 2 months […]

Tim Kraska (MIT): Towards instance-optimized data systems

DBH 6011

Location:  DBH 6011 https://uci.zoom.us/j/94559511434 (for UCI users only) Speaker: Tim Kraska, MIT Abstract: Recently, there has been a lot of excitement around ML-enhanced (or learned) algorithms and data structures. For example, there has been work on applying machine learning to improve query optimization, indexing, storage layouts, scheduling, log-structured merge trees, sorting, compression, sketches, among many other […]

Matt Ingenthron (Couchbase): Couchbase and Distributed Computing Backends for big data processing

DBH 3011

Location:  DBH 3011  Couchbase and Distributed Computing Backends for big data processing Speaker:   Matt Ingenthron,  Engineering Director, Couchbase Biography:  Matt is a Couchbase co-founder and Engineering Director who leads SDK and Connector development at Couchbase. He has a deep software development background with extensive experience scaling Java, Ruby on Rails, and AMP web applications. He […]

Jayant Haritsa (IISc Bangalore): Shedding Light on Opaque Database Queries

DBH 3011

Shedding Light on Opaque Database Queries location: Donald Bren Hall 3011 Speaker: Jayant Haritsa, Database Systems Lab, Indian Institute of Science, Bangalore Abstract: We have recently defined a new query reverse-engineering problem of unmasking SQL queries hidden within opaque database applications. […]

Anand Deshpande (Persistent Technologies): How to build your own Business

Hybrid: DBH3011 & Zoom

How to build your own Business location: Donald Bren Hall 3011 Zoom info: the meeting will be hybrid and will also be available on zoom https://uci.zoom.us/j/96160303043 Skype for Business https://uci.zoom.us/skype/96160303043   Speaker: Anand Deshpande Founder, Chairman and Managing Director, Persistent Technologies Host: Prof. Sharad Mehrotra Abstract: In this talk Dr. Deshpande will provide insight into […]

Sadeem Alsudais: Drove: Tracking Execution Results of Workflows on Large Data

DBH 4011

Abstract: Data analytics using workflows is an iterative process, in which an analyst makes many iterations of changes, such as additions, deletions, and alterations of operators and their links. In many cases, the analyst wants to compare these workflow versions and their execution results to help decide the next iteration of changes. To this end, […]

Qiushi Bai: QueryBooster: Improving SQL Performance Using Middleware Services for Human-Centered Query Rewriting + Demo

DBH 4011

Title: QueryBooster: Improving SQL Performance Using Middleware Services for Human-Centered Query Rewriting Abstract: Query latency is critical in many database-backed applications where users need answers quickly to gain timely insights and make mission-critical decisions.  "Query rewriting" is one of the query optimization techniques which transforms SQL queries to more efficient formats based on pre-defined rewriting […]

Xiaozhen Liu: Demonstration of Collaborative and Interactive Workflow-based Data Analytics in Texera

DBH 4011

Abstract: Collaborative data analytics is becoming increasingly important due to the higher complexity of data science, more diverse skills from different disciplines, more common asynchronous schedules of team members, and the global trend of working remotely. In this demo we will show how Texera supports this emerging computing paradigm to achieve high productivity among collaborators […]

Abhishek Singh: WedgeBlock – An Off-Chain Secure Logging Platform for Blockchain Applications

DBH 4011

Abstract In recent years, there has been a growing interest in building blockchain-based decentralized applications (DApps). DApps typically consist of two components: an on-chain component that implements the logic of the application and runs on blockchain as a smart contract, and an off-chain component that runs on a regular server to receive and process user […]

Juncheng Fang: PeloPartition: Improving Blockchain Resilience to Partitioning by Sharding

DBH 4011

Abstract: Blockchain has gained considerable traction over the last few years and plays a critical role in realizing decentralized and cryptocurrency applications. A challenge that has been overlooked in prior blockchain algorithms is that they do not consider large-scale network outages and rely on the assumption of reliable global network connectivity. In the event of […]

Peeyush Gupta: A Demonstration of TippersDB

DBH 4011

Abstract: In the talk, I'll present TippersDB, a middleware system designed to build sensor-based smart space analytical applications. TippersDB supports a powerful data model that decouples semantic data about the application domain from the sensor data from which the semantic data is derived. By supporting mechanisms to map/translate data, concepts, and queries between the two levels, TippersDB […]

Glenn Galvizo: Navigational Pattern Matching w/ Graphix

DBH 4011

Abstract: Users aiming to perform scalable graph analytics on large datasets are stuck between a rock and a hard place. On one side, a user works with an intuitive data model and query language chained to a system that cannot gracefully scale across multiple machines (i.e. the rock). On the other side, a user works […]

Andrew Chio: SmartSPEC: Customizable Smart Space Datasets via Event-Driven Simulations

DBH 4011

Bio - Andrew is a 4th year Ph.D. student in the Distributed Systems Middleware (DSM) group under the supervision of Professor Nalini Venkatasubramanian. His general research interests revolve around middleware, data mining and analytics, optimization, and machine learning. Abstract - In this talk, we present SmartSPEC, an approach to generate customizable smart space datasets using […]

Tung-Chun Chang: SmartParcels: Cross-Layer IoT Planning for Smart Communities

DBH 4011

Abstract: The emergence of IoT-aided smart communities has created the need for a new set of urban planning tools. The extra design process includes instrumenting infrastructures (sensing, networking, and computing devices) in smartspaces to generate information units (from data analytics) to realize a range of required services. We propose SmartParcels, a framework that generates a […]

Aaron Elmore: Adventures in Database Compression

TBD

Prof. Aaron Elmore University of Chicago Abstract: Columnar databases enable effective compression by improving entropy through attribute locality and provide opportunities for fast query execution directly on compressed data. In this talk I will briefly overview how compressed query execution works in columnar systems and discuss techniques developed by our group over the past several […]

Aaron Elmore: CrocodileDB: Resource Efficient Database Execution (CS Seminar)

DBH 6011

Prof. Aaron Elmore University of Chicago Abstract: The coming end of Moore’s law requires that data systems be more judicious with computation and resources as the growth in data outpaces the availability of computational resources. Current database systems are eager and aggressively consume resources to immediately and quickly complete the task at hand. Intelligently deferring a […]

Yiming Lin: QUIP: Query-driven Missing Value Imputation

DBH 4011

QUIP: Query-driven Missing Value Imputation This paper develops a query-time missing value imputation framework, entitled QUIP, that minimizes the joint costs of imputation and query execution. QUIP achieves this by modifying how relational operators are processed. It adds a cost-based decision function in each operator that checks whether the operator should invoke imputation […]

Shanshan Han: Veil: Storage and Communication Efficient Volume Hiding Algorithms

DBH 4011

February 17, 2023, Friday, 1:00 PM - 2 PM Donald Bren Hall 4011, ICS, UC Irvine Zoom: https://uci.zoom.us/j/92445274511 (UCI only) Abstract Volume leakage is a major threat to searchable encryption and data outsourcing, where an adversary can obtain the number of values in response to a query and deduce additional information about the data, such as the […]

Babak Salimi (UCSD): Certifying the Fairness of Predictive Models in the Face of Selection Bias

DBH 4011

The Department of Computer Science, UC Irvine WELCOMES Prof. Babak Salimi UCSD Hosts: Prof. Chen Li Certifying the Fairness of Predictive Models in the Face of Selection Bias   Abstract: The widespread use of data-driven algorithmic decision making in crucial areas such as hiring, loan assessments, medical diagnoses, and pretrial release has raised questions about […]

Alex Behm (Databricks): Photon: How to think vectorized

DBH 4011

The Department of Computer Science, Information Systems Group, UC Irvine WELCOMES Dr. Alex Behm Databricks Photon: How to think vectorized 3/3/2023, Friday, 1:00 - 2 pm Place DBH 4011 I'm presenting Photon, a new vectorized execution engine powering Databricks written from scratch in C++. I will introduce you to its basic building blocks by walking […]

Fangqi Liu: DOME: Drone-assisted Monitoring of Emergent Events For Wildland Fire Resilience

DBH 4011

Abstract: By serving as "eyes in the sky," data obtained from a carefully coordinated set of drones equipped with sensors have the potential to enable continuous monitoring of mission-critical events. We develop a Drone-assisted Monitoring system, DOME, that gathers real-time data for situational awareness in emergent and evolving events. The driving use case for this […]

C. Mohan: A Survey of Cloud Database Systems

DBH 3011

C. Mohan Distinguished Visiting Professor, Tsinghua University, China & Member, Board of Governors (Digital University Kerala, India) & Retired IBM Fellow (IBM Research, USA) "A Survey of Cloud Database Systems" ABSTRACT:  In this talk, I will first introduce traditional (non-cloud) parallel and distributed database systems. Concepts like SQL and NoSQL systems, data replication, distributed and parallel query […]

Zuozhi Wang: Texera: A System for Collaborative and Interactive Data Analytics Using Workflows (PhD Final Defense)

Abstract In the world of data analytics, domain experts, such as public health scientists and medical researchers, play a crucial role as their domain knowledge can unlock valuable insights from data. However, they face several challenges in the current landscape of data analytics tools. They often lack the technical skills necessary to analyze large datasets, […]

Qiushi Bai: Maliva: Using Machine Learning to Rewrite Visualization Queries Under Time Constraints

DBH 4011

Abstract: As a powerful way for people to gain insights from data quickly and intuitively, visualization is becoming increasingly important in the Big Data era. Consider data-visualization systems where a middleware layer translates a frontend request into a SQL query to a backend database to compute visual results. In this talk, we study the problem of […]

Farzad Habibi: Metastable Failures in Consensus Algorithms

DBH 4011

Abstract Metastable failure is a recent abstraction of a pattern of failures in distributed systems. A metastable failure is characterized as "permanent overload with an ultra-low goodput." Prior research has proposed a framework for understanding metastable failure and has observed various cases of such failures in real-world settings. In this talk, we discuss the challenge […]

CS Seminar: Prof. Arun Kumar: The New DBfication of ML/AI

DBH 6011

The Department of Computer Science, UC Irvine WELCOMES Prof. Arun Kumar UCSD 5/12/2023, Friday, 11:00 am - noon Place DBH 6011 Abstract: The recent boom in ML/AI applications has brought into sharp focus the pressing need for tackling the concerns of scalability, usability, and manageability across the entire lifecycle of ML/AI applications. The ML/AI world […]

Yiming Lin: Auto-BI: Automatically Build BI-Models Leveraging Local Join Prediction and Global Schema Graph

DBH 4011

Abstract: Business Intelligence (BI) is crucial in modern enterprises and is a billion-dollar business. Traditionally, technical experts like database administrators would manually prepare BI-models (e.g., in star or snowflake schemas) that join tables in data warehouses, before less-technical business users can run analytics using end-user dashboarding tools. However, the popularity of self-service BI (e.g., Tableau and Power-BI) […]

Qiushi Bai: Improving SQL Performance Using Middleware-Based Query Rewriting

DBH 4011

Abstract: Query performance is critical in database-supported applications where users need answers quickly to make timely decisions. Traditional databases rely on rewriting queries to improve SQL performance. With the emergence of business intelligence and interactive visualization applications, databases often miss opportunities to rewrite their queries, due to reasons such as failure to adopt high-accuracy time […]

Saeed Kargar: Hamming Tree: The case for Energy-Aware Indexing for NVMs

DBH 4011

Zoom Link: https://uci.zoom.us/j/8045933305 Abstract NVM technologies play a crucial role in data storage solutions as well as in battery-powered mobile and IoT devices. However, the challenges of wear-out and energy efficiency need to be addressed for the widespread adoption of NVM. In this presentation, I will discuss our research endeavors aimed at enhancing various aspects of […]

Suyash Gupta (UC Berkeley): Dissecting BFT Consensus: In Trusted Components we Trust!

DBH 4011

The Information Systems Group (ISG) at UC Irvine welcomes Suyash Gupta, UC Berkeley. Dissecting BFT Consensus: In Trusted Components we Trust!

Abstract: The growing interest in reliable multi-party applications has fostered widespread adoption of Byzantine Fault-Tolerant (BFT) consensus protocols. Existing BFT protocols need f more replicas than Paxos-style protocols to prevent equivocation attacks. Trust-BFT protocols seek to minimize this cost by making use of trusted components at replicas. This paper makes two contributions. First, we analyze the design of existing Trust-BFT protocols and uncover three fundamental limitations that preclude most practical deployments. Some of these limitations are fundamental, while others are linked to the state of trusted components today. Second, we introduce a novel suite of consensus protocols, FlexiTrust, that attempts to sidestep these issues. We show that our FlexiTrust protocols achieve up to 185% more throughput than their Trust-BFT counterparts.

Bio: Suyash Gupta is a postdoctoral researcher at the SkyLab, University of California, Berkeley. He is also the Lead Architect of ResilientDB fabric. Prior to joining Berkeley, he received his Ph.D. degree from University of California, Davis. He also holds two Master of Science degrees: one from Purdue University and another from Indian Institute of Technology Madras. His current research focuses on attaining safe and efficient, fault-tolerant distributed consensus and communication. He has also co-authored a book on fault-tolerant distributed transaction processing at Morgan & Claypool. He has been awarded the Best Graduate Researcher Award for 2021 by UC Davis and Best Paper Award at EuroSys'23. In his free time, Suyash likes to code and his team won Best Hacker Award at BostonHacks, HackIllinois, and HackPrinceton, among others.
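The "f more replicas" gap mentioned in the abstract can be checked with a little arithmetic. The sketch below assumes the textbook bounds, which the abstract itself does not state: 2f + 1 replicas for Paxos-style crash fault tolerance and 3f + 1 for classic BFT protocols. The function names are made up for this illustration.

```python
def crash_tolerant_replicas(f: int) -> int:
    """Replicas a Paxos-style protocol needs to tolerate f crash faults (assumed bound: 2f + 1)."""
    return 2 * f + 1


def bft_replicas(f: int) -> int:
    """Replicas a classic BFT protocol needs to tolerate f Byzantine faults (assumed bound: 3f + 1)."""
    return 3 * f + 1


# Compare the two bounds for a few fault budgets.
for f in (1, 2, 3):
    extra = bft_replicas(f) - crash_tolerant_replicas(f)
    print(f"f={f}: Paxos-style={crash_tolerant_replicas(f)}, BFT={bft_replicas(f)}, extra={extra}")
```

Under these assumed bounds the difference is exactly f for every fault budget, which is the cost that trusted components are meant to reduce.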

Boon Thau Loo (UPenn): Towards Full-Stack Adaptivity in Permissioned Blockchain Systems

DBH 6011

  The Computer Science Department and Information Systems Group (ISG) at UC Irvine welcomes Boon Thau Loo University of Pennsylvania Towards Full-Stack Adaptivity in Permissioned Blockchain Systems October 20, 2023 at 11:00AM DBH 6011   ABSTRACT Permissioned blockchain systems are an emerging instance of untrustworthy distributed databases. As novel smart contracts, modern hardware, and new […]

Ken Birman (Cornell): Cascade:  A Platform for Fast Edge Intelligence

DBH 6011

The Computer Science Department and Information Systems Group (ISG) at UC Irvine welcomes Ken Birman Cornell University Cascade:  A Platform for Fast Edge Intelligence October 27, 2023 at 11:00AM DBH 6011   ABSTRACT There is a growing need to apply machine intelligence and learning at the edge of the cloud.  Doing so would reduce delays […]

Nada Lahjouji: ProBE: Proportioning Privacy Budget for Complex Exploratory Decision Support

DBH 4011

ProBE: Proportioning Privacy Budget for Complex Exploratory Decision Support Nada Lahjouji PhD Student, UC, Irvine Abstract Decision support (DS) applications play a crucial role in analyzing large volumes of data to produce valuable insights that facilitate informed decision-making. Such data can, however, contain sensitive information about individuals that requires privacy-preserving mechanisms to prevent data leaks, […]

Vishal Chakraborty: Much Ado About Data-Undo: Semantically Meaningful Data Erasure

DBH 4011

Title: Much Ado About Data-Undo: Semantically Meaningful Data Erasure Abstract: Data regulations, such as GDPR and CCPA, are increasingly being adopted globally to protect against unsafe data management practices. Such regulations are often ambiguous (with multiple valid interpretations) when it comes to defining the expected dynamic behaviour of data processing systems. We will argue and show […]

Shahram Ghandeharizadeh (USC): Intelligent 3D Multimedia Displays using Flying Light Specks

DBH 6011

The Computer Science Department and Information Systems Group (ISG) at UC Irvine welcomes Shahram Ghandeharizadeh University of Southern California   Intelligent 3D Multimedia Displays using Flying Light Specks January 12 at 11:00AM DBH 6011 Abstract: A Flying Light Speck, FLS, is a miniature sized drone equipped with one or more light sources to generate different […]

Henry F. Korth (Lehigh University): Blockchain: Computer Science Foundations, Positive Social and Business Impact, and Research Opportunities

DBH 6011

The Computer Science Department and Information Systems Group (ISG) at UC Irvine welcomes Henry F. Korth Lehigh University Blockchain: Computer Science Foundations, Positive Social and Business Impact, and Research Opportunities January 19 at 11:00AM DBH 6011   Abstract: To start, basic concepts of blockchain systems will be introduced assuming only a basic background in computing. […]

Volker Markl (TU Berlin): Mosaics of Big Data: Database Systems and Information Management – Trends and a Vision

DBH 4011

Prof. Dr. Volker Markl Chair of the Database Systems and Information Management (DIMA) Group at TU Berlin Director of the Berlin Institute for the Foundations of Learning and Data (BIFOLD) Chief Scientist and Head of the Intelligent Analytics for Massive Data Research Group at German Research Center for Artificial Intelligence (DFKI) Mosaics of Big Data  […]

Shengquan Ni: Supporting time-travel debugging in Texera

Title: Supporting time-travel debugging in Texera Speaker: Shengquan Ni Abstract: Dataflow systems, traditionally used for relational analysis, now support a variety of tasks including complex user-defined functions. As dataflow jobs become more diverse and complex, there is an increasing need for better debugging support to understand their runtime behaviors and identify issues either in data […]

Raul Castro Fernandez (U. Chicago): On Data Ecology, Data Markets, the Value of Data, and Dataflow Governance

DBH 4011

Abstract: Data shapes our social, economic, cultural, and technological environments. Data is valuable, so people seek it, inducing data to flow. The resulting dataflows distribute data and thus value. For example, large Internet companies profit from accessing data from their users, and engineers of large language models seek large and diverse data sources to train […]

Yunyan Ding: Efficient Mouse Brain Image Processing Using Collaborative Data Workflows on Texera

DBH 4011

Abstract: In the field of neuroscience, accurately mapping the complex three-dimensional (3D) neural circuitry and architecture of the brain is crucial for advancing our understanding of brain functions and disorders. In this study, we introduce a distributed computational pipeline designed for processing high-resolution mouse brain tile images captured by TissueCyte. This pipeline efficiently and accurately […]

Yinan Zhou: SpendableDB: A UTxO-based decentralized Database

DBH 4011

Abstract: Blockchain technology has attracted a significant amount of attention ever since the Bitcoin blockchain's success. Currently, most of the research and engineering efforts have been centered around monetary transactions such as token exchange protocols. The potential of building databases on top of blockchains is largely overlooked and remains an open problem. The literature on blockchain databases is divided into permissioned blockchains and permissionless account-based blockchains. However, the former is not fully decentralized, and the latter suffers from challenges in performance and cost. We propose SpendableDB, a permissionless UTxO-based blockchain database as a novel approach to the problem of data decentralization. Our design integrates data into individual UTxOs to achieve true decentralization of data ownership that can be securely transferred and traded, similar to how the regular monetary UTxOs are protected by the underlying blockchain's decentralization protocol. Additionally, SpendableDB provides cryptographically secured data integrity and immutable data lineage that can be easily verified. Our implementation and experiments show that our design is economically practical as it incurs a small amount of blockchain transaction fees. Bio: Yinan Zhou is a second-year Ph.D. student in the Computer Science Department at UC Irvine. His primary research focus is on blockchain infrastructure and application developments.
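To make the idea of data living inside individual UTxOs more concrete, here is a minimal in-memory sketch. It is not SpendableDB's implementation: the `DataUtxo` and `Ledger` classes, the field names, and the owner check (a plain string comparison standing in for signature verification) are all assumptions made for illustration. It shows only the general UTxO pattern the abstract describes: an output carries data, can be spent once to transfer ownership, and links back to its parent so lineage can be verified.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class DataUtxo:
    owner: str                 # hypothetical owner label; a real chain would use a public key
    payload: bytes             # the data carried by this output
    parent_id: Optional[str]   # id of the output this one spent, or None if freshly minted

    @property
    def utxo_id(self) -> str:
        # The id commits to owner, payload, and parent, so history cannot change silently.
        parts = [self.owner.encode(), self.payload, (self.parent_id or "").encode()]
        return hashlib.sha256(b"\x00".join(parts)).hexdigest()


class Ledger:
    def __init__(self) -> None:
        self.history: Dict[str, DataUtxo] = {}   # every output ever created
        self.unspent: Set[str] = set()           # ids that can still be spent

    def _record(self, utxo: DataUtxo) -> str:
        self.history[utxo.utxo_id] = utxo
        self.unspent.add(utxo.utxo_id)
        return utxo.utxo_id

    def mint(self, owner: str, payload: bytes) -> str:
        return self._record(DataUtxo(owner, payload, None))

    def spend(self, utxo_id: str, signer: str, new_owner: str,
              new_payload: Optional[bytes] = None) -> str:
        if utxo_id not in self.unspent:
            raise ValueError("output is unknown or already spent")
        old = self.history[utxo_id]
        if signer != old.owner:   # stand-in for real signature verification
            raise PermissionError("only the current owner can spend this output")
        self.unspent.remove(utxo_id)
        payload = old.payload if new_payload is None else new_payload
        return self._record(DataUtxo(new_owner, payload, utxo_id))

    def lineage(self, utxo_id: str) -> List[str]:
        """Return the owners from the first mint to the given output, re-checking each hash."""
        owners: List[str] = []
        current: Optional[str] = utxo_id
        while current is not None:
            utxo = self.history[current]
            if utxo.utxo_id != current:
                raise ValueError("stored output does not match its id")
            owners.append(utxo.owner)
            current = utxo.parent_id
        return owners[::-1]


ledger = Ledger()
first = ledger.mint("alice", b"sensor-reading-42")
second = ledger.spend(first, signer="alice", new_owner="bob")
owners = ledger.lineage(second)
print(owners)
```

Spending consumes the old output and creates a new one, so a second attempt to spend `first` fails, and only the current owner of `second` can move it again.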

Lukasz Golab (University of Waterloo): Understanding models and the data they learn from

DBH 4011

Lukasz Golab (U. Waterloo) Understanding models and the data they learn from Abstract: The modern world is powered by data. However, as the capabilities of data-intensive systems grow, so does their complexity, making them hard to understand and troubleshoot. I will discuss my lab's efforts towards understanding models and the data they learn from, including […]

Mohammed Al-Kateb (Amazon Redshift): The Evolution of Amazon Redshift

DBH 4011

Abstract: In this talk, we will discuss the evolution of Amazon Redshift over the past 10 years. We’ll discuss the Amazon Redshift architecture. We’ll dive deep into the lifecycle of executing a query in Amazon Redshift. And we’ll examine how Amazon Redshift continues to maintain leading price/performance in the market. Bio: Mohammed Al-Kateb leads […]

Xinyuan Lin: Data Science Tasks Implemented with Scripts versus GUI-Based Workflows: The Good, the Bad, and the Ugly

DBH 4011

Abstract: As leveraging large-scale data analytics becomes the norm for many applications, platforms for developing these capabilities have become increasingly important. This work compares the benefits and drawbacks of implementing two commonly used data science platform paradigms: code-based scripts and GUI-based workflows. We implement tasks in both paradigms that provide examples of phases in the […]

Mike Heddes: Efficient Cardinality Estimation of Multi-Join Queries using Count Sketches

DBH 4011

Abstract:  Cardinality estimates are a primary input to query optimizers to determine an appropriate join order. The seminal AMS sketch can estimate the cardinality of an equi-join between two relations using little space. Since then, two important advancements are the Count sketch, a method which significantly improves upon the sketching time, and secondly, an extension […]
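As background for the AMS sketch mentioned in the abstract, the following is a minimal sketch of the basic idea for estimating an equi-join size. It is not the method presented in the talk. Each relation keeps counters of the form "sum of ±1 signs over its join keys", and the product of matching counters estimates the join size; averaging many counters reduces the variance. The `sign` helper uses a BLAKE2b hash as a convenient stand-in for the 4-wise independent hash family the analysis assumes, and all class and function names are hypothetical.

```python
import hashlib
from collections import Counter


def sign(seed: int, value) -> int:
    """Pseudo-random +1/-1 for (seed, value); a stand-in for a 4-wise independent hash."""
    digest = hashlib.blake2b(f"{seed}:{value!r}".encode(), digest_size=1).digest()
    return 1 if digest[0] & 1 else -1


class AmsSketch:
    def __init__(self, num_counters: int = 1024) -> None:
        self.counters = [0] * num_counters

    def add(self, value, count: int = 1) -> None:
        # Counter i uses seed i, so two sketches of the same size share their sign functions.
        for seed in range(len(self.counters)):
            self.counters[seed] += count * sign(seed, value)


def build_sketch(column, num_counters: int = 1024) -> AmsSketch:
    sketch = AmsSketch(num_counters)
    for value, count in Counter(column).items():
        sketch.add(value, count)
    return sketch


def estimate_join_size(left: AmsSketch, right: AmsSketch) -> float:
    # Each product is an unbiased estimate of the join size; average them to cut variance.
    products = [a * b for a, b in zip(left.counters, right.counters)]
    return sum(products) / len(products)


# Toy join-key columns: 10 distinct keys, 100 copies each in R and 50 copies each in S.
r_keys = [k % 10 for k in range(1000)]
s_keys = [k % 10 for k in range(500)]
r_freq, s_freq = Counter(r_keys), Counter(s_keys)
true_size = sum(r_freq[v] * s_freq[v] for v in r_freq)

estimate = estimate_join_size(build_sketch(r_keys), build_sketch(s_keys))
print(f"true join size: {true_size}, estimate: {estimate:.0f}")
```

The sketch size (1024 counters here) trades space for accuracy: a single counter pair gives a very noisy estimate, and the error shrinks roughly with the square root of the number of counters.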

Mohammad Sadoghi (UC Davis): The Journey of Building Global-Scale Sustainable Blockchain Fabric 

DBH 6011

Abstract The inception of Bitcoin and blockchain has renewed the vision of a democratic and decentralized computational paradigm, that is, to ingrain integrity, transparency, and accountability into the very fabric of the computational model. These fundamental concepts and the technologies behind them--a generic ledger-based data model, cryptographically ensured data integrity and transparent and accountable consensus-based […]

Aditya Parameswaran (Berkeley): Enhance, Don’t Replace: A Recipe for Success in Data Tooling

DBH 6011

Enhance, Don't Replace: A Recipe for Success in Data Tooling Abstract: Most data analysis and data science is performed in human-centered tools, such as spreadsheets, visual analytics tools, and data science libraries. However, these tools often pose challenges for end-users, especially those without extensive programming expertise, in terms of scalability, interactivity, and usability. Rather than forcing […]