ISG Talks are sponsored by Couchbase.
Sushant Jain: Large scale and low latency data distribution from database to servers
DBH 6011 Many applications at Google are structured with data stored in a transactional database (the source of truth) and the same data being required by servers distributed worldwide. For efficient and fast computation, servers store this data in memory. Further, the database is changing continuously, and we need to update the in-memory view of this large number of servers in real time. For example, in the Google Search Ads application, advertiser configuration is stored in a database, and this data is loaded into the memory of various servers to compute ads in a scalable and fast way. In this talk, we describe our solution to this data distribution problem and the challenges that we encountered in providing a highly reliable and low-latency service.
Dr. Andrey Balmin and Mayank Pradhan (Workday): Workday Prism Analytics: Unifying Interactive and Batch Data Processing Using Apache Spark
DBH 3011 Abstract: Workday Prism Analytics enables data discovery and interactive Business Intelligence analysis for Workday customers. To prepare data for analysis, business users can set up data transformation pipelines in an interactive, self-service, modern data prep environment. Thus, Workday Prism Analytics needs to run three types of scalable data processing applications: "always on" query engine and data […]
Vinayak Borkar (FireEye Inc.): The X15 Machine Data Management Platform
DBH 4011 ABSTRACT: Machine Data (aka Log Data) is continuously produced by applications and devices as a result of human-computer and computer-computer interactions. Although most of this data was initially generated for ad-hoc human consumption to aid with debugging and troubleshooting systems and deployments, their systematic treatment using well-known data processing techniques can unlock valuable insight about operations […]
David Lomet (Microsoft Research): How Data Caching Systems Succeed
DBH 4011 Data in traditional "caching" data systems resides on secondary storage, and is read into main memory only when operated on. This limits system performance. Main memory data stores with data always in main memory are much faster. But this performance comes at a cost. In this paper, we analyze the costs of both in-memory operations and secondary storage operations where data is not "in cache". We study the performance impact of cache misses on caching system performance. The analysis considers both execution and storage costs. Based on our analysis, we derive cost/performance results for a data caching system and a main memory system to understand where each demonstrates the best cost per operation, what is driving the cost differences, and the scale of the differences. This analysis (1) provides insight into why data caching systems continue to dominate the market; (2) points to higher performance that does not rely on simply increasing main memory cache size; and (3) suggests a path to lower costs and hence better cost/performance.
Prof. Jeff Ullman: Data Science: Is it Real?
DBH 6011 ABSTRACT: We shall discuss the various ways in which data science is approached by different communities, including the Statistics, Machine-Learning, and Database communities. Each presents a different viewpoint and values different outcomes. Some consequences of these approaches will be discussed. As an example of why data science is not machine learning, we shall sketch two […]
Prof. Sang-Woo Jun: Lowering the cost of large-scale data analytics via efficient use of flash storage
DBH 3011 In this talk, I present the storage systems aspect of the ongoing work on using relatively cheap solid-state secondary storage to replace expensive DRAM for analytics on large amounts of data, using as examples graph analytics and the bioinformatics application somatic mutation finding. Both applications are inherently random access intensive, which is a bad fit […]
Xiangyao Yu: Transaction Processing at Scale
DBH 3011 Abstract: Online transaction processing (OLTP) is critical for applications including finance, e-commerce, social networks, and healthcare. The increasing performance demands of these applications require OLTP to scale massively. Concurrency control is a major scalability bottleneck in such systems. This talk presents three projects that identify and help resolve scalability challenges. First, I present a scalability […]
Fatemeh Nargesian: Data Enrichment for Data Science
DBH 3011 Data Enrichment for Data Science Fatemeh Nargesian, University of Toronto March 21, Thursday, 2019 2:00 - 3 pm, DBH 3011 Refreshments start at 1:30 pm Data Science is built on the power of data processing and data preparation. In this talk, I discuss the challenges of data preparation for end-to-end data science. Particularly, I talk […]
Pat Helland: There’s No Substitute for Interchangeability
DBH 3011 Speaker: Pat Helland (Salesforce.com) Title: There's No Substitute for Interchangeability Time: 3-4 PM Place: 3011 DBH Abstract: Distributed systems have many challenges including loosely coupled systems, long running work, and distributed workflow. In addition, replication with out-of-order reconciliation is quite difficult, especially when composed with the other challenges. In this talk, we propose data-centric REST-style […]
Michal Shmueli-Scheuer: Conversational bots for customer support
DBH 4011 Conversational bots for customer support Michal Shmueli-Scheuer, IBM Research - Haifa August 9, 2019, Friday, 3:00 pm - 4:00 pm, DBH 4011 Abstract: In this talk, I'll cover various aspects of conversational bots, focusing on the domain of customer support. Often, human conversations with bots mimic the way humans interact with each other. Moreover, even […]
Gift Sinthong: AsterixDB Meets Data Science
DBH 4011 Abstract: In the last few years, Data Science has become an increasingly important use case for data platforms. To support the full Big Data analysis lifecycle, we have examined one of the most popular exploratory data analytics tools, Pandas, which has a serious problem: scalability. Exploratory tools such as Pandas only work well against locally […]
Multistage Adaptive Load Balancing in Big Active Data Publish Subscribe Systems
DBH 3011 Speaker: Hang Time: 12:30pm Room: 3011 We address issues in the design and operation of Big Active Data Publish Subscribe (BAD Pub/Sub) systems to enable the next generation of enriched notification systems that can scale to societal levels. The proposed BAD Pub/Sub systems aim to ingest massive amounts of data from heterogeneous publishers and […]
Texera: Supporting Big Data Analytics for Domain Experts through GUI-based workflows
DBH 3011 Speakers: Avinash Kumar, Shengquan Ni, Zuozhi Wang Abstract: Big data analytics is a daunting task for domain experts such as doctors and teachers. Their non-IT background makes it challenging for them to write analytics code and maintain computing infrastructures to efficiently process large amounts of data. Existing data analytics frameworks that offer GUI-based alternatives […]
AquaEIS: Middleware Support for Event Identification in Community Water Infrastructures
DBH 3011 Speaker: Quing Han Abstract: Real-time event identification is critical in complex distributed infrastructures, e.g., water systems, where failures are difficult to isolate. We present AquaEIS, an event-based middleware tailored to the problem of locating sources of failure (e.g., contamination) in community water infrastructures. The inherent complexity of underground hydraulic systems combined with aging infrastructure presents […]
Scalable transaction and polystore data management in LeanXcale
DBH 5011 Speaker: Ricardo Jimenez-Péris (LeanXcale, Spain), Patrick Valduriez (Inria, France) Abstract: Hybrid Transaction Analytical Processing (HTAP) is poised to revolutionize data management. By providing online analytics over operational data, HTAP systems open up new opportunities in many application domains where real-time decision is critical. Important use cases are proximity marketing, real-time pricing, risk monitoring, real-time fraud […]
Scalable Programming: Progress, Prospects and Challenges (CS/NetSys Seminar)
DBH 6011 Speaker: Prof. Gul Agha (University of Illinois at Urbana-Champaign) Abstract: Mobile cloud computing, social media, cyberphysical systems, and the internet of things are examples of increasingly important applications requiring scalable concurrency. The Actor model facilitates programming large-scale concurrent applications. Not surprisingly, Actor languages and frameworks have been widely adopted in industry to address scalability. Although […]
LSM-based storage techniques: a tutorial
Speaker: Chen Luo Abstract: Recently, the log-structured merge-tree (LSM-tree) has been widely adopted for use in the storage layer of modern NoSQL systems. Because of this, there have been a large number of research efforts, from both the database community and the operating systems community, that try to improve various aspects of LSM-trees. In this […]
A Theoretical View of Distributed Systems (CS Distinguished Seminar Series)
DBH 6011 Speaker: Prof. Nancy Lynch (Massachusetts Institute of Technology) Abstract: For several decades, my collaborators, students, and I have worked on theory for distributed systems, in order to understand their capabilities and limitations in a rigorous, mathematical way. This work has produced many different kinds of results, including: Abstract models for problems that are solved by […]
Building Personal Chronicle of Life Events (Final Defense)
Speaker: Jordan Oh Abstract: Human beings have always been interested in understanding themselves and their surroundings. Learning about the relationship between the two can reveal facts of the present and help predict the future, a critical part of living a better life. With the proliferation of IoT sensor devices, it is now possible to collect […]
Event Detection with Temporal Predicates
DBH 3011 Speaker: Fabio Persia (Free University of Bozen-Bolzano, Italy) Abstract: Human perception tends to group individual values into larger structures; this is also the case for time series data. This tendency inspired us to define an event-detection language based on time intervals, which combines timepoint-based events into larger structures. Complex events can then be defined on […]
Effective Filters and Linear Time Verification for Tree Similarity Joins
DBH 3011 Speaker: Thomas Hütter (University of Salzburg) Abstract: The tree similarity join computes all similar pairs in a collection of trees. Two trees are similar if their edit distance falls within a user-defined threshold. Previous algorithms, which are based on a filter-verify approach, suffer from the following two issues. First, ineffective filters produce a large number […]
Systems and ML at RISELab (CS Distinguished Seminar Series)
DBH 6011 Speaker: Prof. Ion Stoica (University of California at Berkeley) Abstract: In this talk, I will present several of the projects we are developing at RISELab, a two-year-old lab at UC Berkeley that focuses on building platforms and algorithms for real-time intelligent decisions, decisions that are secure and explainable. These projects include both systems to […]
Dr. Alfred Spector (Two Sigma): Opportunities and Perils of Data Science: A Roadmap (ICS Distinguished Lecture)
DBH 6011 Speaker: Dr. Alfred Spector Abstract: Data-driven approaches have led to powerful prediction, optimization and automation techniques. Powered by large-scale, networked computer systems and machine learning algorithms, these have been very impactful to date and hold great promise in many disciplines, even in the humanities and social sciences. However, no new technology arrives without complications, and we […]
Pat Helland (Salesforce.com): Depending on Appending
DBH 3011 Speaker: Pat Helland (Salesforce.com) Abstract: Increasingly, we see "Gray Failures" in the datacenter and public cloud. This happens when a server, router, or other device just plain goes slow. This may result in severe problems in the user perceived performance as the slowness cascades, sometimes not slow enough to cause the exclusion of the bad […]
Babak Salimi: Causal Inference for Responsible Data Science
https://uci.zoom.us/j/232157494 ABSTRACT: Scaling and democratizing access to big data promises to provide meaningful, actionable information that supports decision-making. Today, data-driven decisions profoundly affect the course of our lives, such as whether to admit applicants to a particular school, offer them a job, or grant them a mortgage. Unfair, inconsistent, or faulty decision-making raises serious concerns about […]
David Lomet: Better Database Cost/Performance via Programmable SSD Batched I/O
DBH 3011 Abstract: A database storage manager should place data at the most cost/performance-effective tier in the storage hierarchy. While performance and cost both decrease with distance from the CPU, the cost/performance trade-off depends on how efficiently a storage manager can move data across tiers. Log structuring (LS) is designed to improve the cost/performance of secondary storage […]
Redesigning Storage Systems for Future Workloads, Hardware, and Performance Requirements (CS Faculty Candidate Seminar)
DBH 3011 Speaker: Oana Balmau (University of Sydney) Abstract: Cloud storage stacks are being challenged by new workloads, new hardware and new performance requirements. First, workloads evolved from following a read-heavy pattern (e.g., a static web-page) to a write-heavy profile where the read:write ratio is closer to 1:1 (e.g., as in the Internet of Things). Second, the […]
Lei Cao: Toward an End-to-end Anomaly Discovery Paradigm
ABSTRACT: Anomaly detection is critical in enterprises, with applications ranging from preventing financial fraud, and defending network intrusions, to detecting imminent device failures. Although previously developed research offers a plethora of stand-alone methods for detecting particular types of anomalies, there is no end-to-end solution for data scientists to effectively discover anomalies over large volumes of […]
Aaron J. Elmore: CrocodileDB – Resource Efficient Database Execution
https://uci.zoom.us/j/92895672890 Abstract: The coming end of Moore’s law requires that data systems be more judicious with computation and resources as the growth in data outpaces the availability of computational resources. Current database systems are eager and aggressively consume resources to immediately and quickly complete the task at hand. Intelligently deferring a task to a later point […]
Karthik Ranganathan: YugabyteDB – Bringing Together the Best of Amazon Aurora and Google Spanner
Speaker: Karthik Ranganathan Abstract: PostgreSQL, a single-node open-source RDBMS, is widely adopted for its powerful set of features. However, PostgreSQL is not built to be used as a cloud-native database, and therefore cannot inherently survive failures, scale horizontally or support geo-distributed deployments. While Amazon Aurora has modified the subsystem of PostgreSQL that writes to disk […]
Yiming Lin (UCI): LOCATER – Cleaning WiFi Connectivity Datasets for Semantic Localization
https://uci.zoom.us/j/95066121155 Speaker: Yiming Lin, UCI Abstract: Sensor data is abundant in our lives but is often too dirty to support high-quality services. This talk explores the data cleaning challenges that arise in using WiFi connectivity data to locate users at semantic indoor locations such as buildings, regions, and rooms. WiFi connectivity data consists of sporadic connections between […]
Phil Bernstein (MSR): Adding Data Management to Orleans – A Journey
Zoom Speaker: Phil Bernstein, Microsoft Research Abstract: I spent eight years adding database features to the Orleans object-oriented programming framework: replication, geo-distribution, transactions, and indexing. The challenge is how to do it when storage is a plug-in service that you don’t control. In this talk, I’ll describe the journey, summarizing the main technical ideas and recounting […]
Jerry Power (I3 Systems): Managing Digital Flows in a Data Driven World
Zoom Speaker: Jerry Power Bio: Jerry Power is the CEO of I3 Systems and a founder of the I3 Consortium. I3 Systems creates real-time data networks that span organizational and geographical constraints. Prior to the formation of I3 Systems, Jerry was the Executive Director of The Institute for Communication Technology Management (CTM) at USC. Jerry has […]
Yannis Chronis (University of Wisconsin-Madison): Analytic Query Processing using Associative Computing
DBH 4011 Speaker: Yannis Chronis (University of Wisconsin-Madison) Title: Analytic Query Processing using Associative Computing Abstract: We are in the midst of a "Cambrian" hardware evolution in which a variety of architectures are being invented with a flurry that we haven't seen in a long time. The associative computing paradigm enables designs that utilize memories in new […]
Nandit Soparkar (Ubiquiti): Data-driven AI technologies for a Consumer webapp
Zoom Title: Data-driven AI technologies for a Consumer webapp Abstract: We discuss the challenges, and the opportunity, in providing a consumer-facing data-driven AI webapp. Our presentation will include a demo, make access available to the audience, and cover the technical as well as the relevant business challenges being addressed. Our webapp is the new CarBeast.com (about 2 months […]
Tim Kraska (MIT): Towards instance-optimized data systems
Location: DBH 6011 https://uci.zoom.us/j/94559511434 (for UCI users only) Speaker: Tim Kraska, MIT Abstract: Recently, there has been a lot of excitement around ML-enhanced (or learned) algorithms and data structures. For example, there has been work on applying machine learning to improve query optimization, indexing, storage layouts, scheduling, log-structured merge trees, sorting, compression, sketches, among many other […]
Matt Ingenthron (Couchbase): Couchbase and Distributed Computing Backends for big data processing
Location: DBH 3011 Couchbase and Distributed Computing Backends for big data processing Speaker: Matt Ingenthron, Engineering Director, Couchbase Biography: Matt is a Couchbase co-founder and Engineering Director who leads SDK and Connector development at Couchbase. He has a deep software development background with extensive experience scaling Java, Ruby on Rails, and AMP web applications. He […]
Jayant Haritsa (IISc Bangalore): Shedding Light on Opaque Database Queries
DBH 3011 Shedding Light on Opaque Database Queries location: Donald Bren Hall 3011 Speaker: Jayant Haritsa Database Systems Lab Indian Institute of Science, Bangalore Abstract: We have recently defined a new query reverse-engineering problem of unmasking SQL queries hidden within opaque database applications. […]
Anand Deshpande (Persistent Technologies): How to build your own Business
Hybrid: DBH 3011 & Zoom How to build your own Business location: Donald Bren Hall 3011 Zoom info: the meeting will be hybrid and will also be available on zoom https://uci.zoom.us/j/96160303043 Skype for Business https://uci.zoom.us/skype/96160303043 Speaker: Anand Deshpande Founder, Chairman and Managing Director, Persistent Technologies Host: Prof. Sharad Mehrotra Abstract: In this talk Dr. Deshpande will provide insight into […]
ISG talks: Welcome Back
DBH 4011
Sadeem Alsudais: Drove: Tracking Execution Results of Workflows on Large Data
DBH 4011 Abstract: Data analytics using workflows is an iterative process, in which an analyst makes many iterations of changes, such as additions, deletions, and alterations of operators and their links. In many cases, the analyst wants to compare these workflow versions and their execution results to help decide the next iteration of changes. To this end, […]
Qiushi Bai: QueryBooster: Improving SQL Performance Using Middleware Services for Human-Centered Query Rewriting + Demo
DBH 4011 Title: QueryBooster: Improving SQL Performance Using Middleware Services for Human-Centered Query Rewriting Abstract: Query latency is critical in many database-backed applications where users need answers quickly to gain timely insights and make mission-critical decisions. "Query rewriting" is one of the query optimization techniques which transforms SQL queries to more efficient formats based on pre-defined rewriting […]
Xiaozhen Liu: Demonstration of Collaborative and Interactive Workflow-based Data Analytics in Texera
DBH 4011 Abstract: Collaborative data analytics is becoming increasingly important due to the higher complexity of data science, more diverse skills from different disciplines, more common asynchronous schedules of team members, and the global trend of working remotely. In this demo we will show how Texera supports this emerging computing paradigm to achieve high productivity among collaborators […]
Abhishek Singh: WedgeBlock – An Off-Chain Secure Logging Platform for Blockchain Applications
DBH 4011 Abstract: In recent years, there has been a growing interest in building blockchain-based decentralized applications (DApps). DApps typically consist of two components: an on-chain component that implements the logic of the application and runs on blockchain as a smart contract, and an off-chain component that runs on a regular server to receive and process user […]
Juncheng Fang: PeloPartition: Improving Blockchain Resilience to Partitioning by Sharding
DBH 4011 Abstract: Blockchain has gained considerable traction over the last few years and plays a critical role in realizing decentralized and cryptocurrency applications. A challenge that has been overlooked in prior blockchain algorithms is that they do not consider large-scale network outages and rely on the assumption of reliable global network connectivity. In the event of […]
Peeyush Gupta: A Demonstration of TippersDB
DBH 4011 Abstract: In the talk, I'll present TippersDB, a middleware system designed to build sensor-based smart space analytical applications. TippersDB supports a powerful data model that decouples semantic data about the application domain from the sensor data from which the semantic data is derived. By supporting mechanisms to map/translate data, concepts, and queries between the two levels, TippersDB […]
Glenn Galvizo: Navigational Pattern Matching w/ Graphix
DBH 4011 Abstract: Users aiming to perform scalable graph analytics on large datasets are stuck between a rock and a hard place. On one side, a user works with an intuitive data model and query language chained to a system that cannot gracefully scale across multiple machines (i.e. the rock). On the other side, a user works […]
Andrew Chio: SmartSPEC: Customizable Smart Space Datasets via Event-Driven Simulations
DBH 4011 Bio - Andrew is a 4th year Ph.D. student in the Distributed Systems Middleware (DSM) group under the supervision of Professor Nalini Venkatasubramanian. His general research interests revolve around middleware, data mining and analytics, optimization, and machine learning. Abstract - In this talk, we present SmartSPEC, an approach to generate customizable smart space datasets using […]
Tung-Chun Chang: SmartParcels: Cross-Layer IoT Planning for Smart Communities
DBH 4011 Abstract: The emergence of IoT-aided smart communities has created the need for a new set of urban planning tools. The extra design process includes instrumenting infrastructures (sensing, networking, and computing devices) in smartspaces to generate information units (from data analytics) to realize a range of required services. We propose SmartParcels, a framework that generates a […]
Aaron Elmore: Adventures in Database Compression
TBD Prof. Aaron Elmore, University of Chicago Abstract: Columnar databases enable effective compression by improving entropy through attribute locality and provide opportunities for fast query execution directly on compressed data. In this talk I will briefly overview how compressed query execution works in columnar systems and discuss techniques developed by our group over the past several […]
Aaron Elmore: CrocodileDB: Resource Efficient Database Execution (CS Seminar)
DBH 6011 Prof. Aaron Elmore, University of Chicago Abstract: The coming end of Moore’s law requires that data systems be more judicious with computation and resources as the growth in data outpaces the availability of computational resources. Current database systems are eager and aggressively consume resources to immediately and quickly complete the task at hand. Intelligently deferring a […]
Yiming Lin: QUIP: Query-driven Missing Value Imputation
DBH 4011 QUIP: Query-driven Missing Value Imputation This paper develops a query-time missing value imputation framework, entitled QUIP, that minimizes the joint costs of imputation and query execution. QUIP achieves this by modifying how relational operators are processed. It adds a cost-based decision function in each operator that checks whether the operator should invoke imputation […]
Shanshan Han: Veil: Storage and Communication Efficient Volume Hiding Algorithms
DBH 4011 February 17, 2023, Friday, 1:00 PM - 2 PM Donald Bren Hall 4011, ICS, UC Irvine Zoom: https://uci.zoom.us/j/92445274511 (UCI only) Abstract: Volume leakage is a major threat to searchable encryption and data outsourcing, where an adversary can obtain the number of values in response to a query and deduce additional information about the data, such as the […]
Babak Salimi (UCSD): Certifying the Fairness of Predictive Models in the Face of Selection Bias
DBH 4011 The Department of Computer Science, UC Irvine WELCOMES Prof. Babak Salimi UCSD Hosts: Prof. Chen Li Certifying the Fairness of Predictive Models in the Face of Selection Bias Abstract: The widespread use of data-driven algorithmic decision making in crucial areas such as hiring, loan assessments, medical diagnoses, and pretrial release has raised questions about […]
Alex Behm (Databricks): Photon: How to think vectorized
DBH 4011 The Department of Computer Science, Information Systems Group, UC Irvine WELCOMES Dr. Alex Behm Databricks Photon: How to think vectorized 3/3/2023, Friday, 1:00 - 2 pm Place DBH 4011 I'm presenting Photon, a new vectorized execution engine powering Databricks written from scratch in C++. I will introduce you to its basic building blocks by walking […]
Fangqi Liu: DOME: Drone-assisted Monitoring of Emergent Events For Wildland Fire Resilience
DBH 4011 Abstract: By serving as "eyes in the sky," data obtained from a carefully coordinated set of drones equipped with sensors have the potential to enable continuous monitoring of mission-critical events. We develop a Drone-assisted Monitoring system, DOME, that gathers real-time data for situational awareness in emergent and evolving events. The driving use case for this […]
C. Mohan: A Survey of Cloud Database Systems
DBH 3011 C. Mohan Distinguished Visiting Professor, Tsinghua University, China & Member, Board of Governors (Digital University Kerala, India) & Retired IBM Fellow (IBM Research, USA) "A Survey of Cloud Database Systems" ABSTRACT: In this talk, I will first introduce traditional (non-cloud) parallel and distributed database systems. Concepts like SQL and NoSQL systems, data replication, distributed and parallel query […]
Zuozhi Wang: Texera: A System for Collaborative and Interactive Data Analytics Using Workflows (PhD Final Defense)
Abstract: In the world of data analytics, domain experts, such as public health scientists and medical researchers, play a crucial role as their domain knowledge can unlock valuable insights from data. However, they face several challenges in the current landscape of data analytics tools. They often lack the technical skills necessary to analyze large datasets, […]
Qiushi Bai: Maliva: Using Machine Learning to Rewrite Visualization Queries Under Time Constraints
DBH 4011 Abstract: As a powerful way for people to gain insights from data quickly and intuitively, visualization is becoming increasingly important in the Big Data era. Consider data-visualization systems in which a middleware layer translates a frontend request into a SQL query sent to a backend database to compute visual results. In this talk, we study the problem of […]
Farzad Habibi: Metastable Failures in Consensus Algorithms
DBH 4011 Abstract: Metastable failure is a recent abstraction of a pattern of failures in distributed systems. A metastable failure is characterized as "permanent overload with an ultra-low goodput." Prior research has proposed a framework for understanding metastable failure and has observed various cases of such failures in real-world settings. In this talk, we discuss the challenge […]
CS Seminar: Prof. Arun Kumar: The New DBfication of ML/AI
DBH 6011 The Department of Computer Science, UC Irvine WELCOMES Prof. Arun Kumar UCSD 5/12/2023, Friday, 11:00 am - noon Place DBH 6011 Abstract: The recent boom in ML/AI applications has brought into sharp focus the pressing need for tackling the concerns of scalability, usability, and manageability across the entire lifecycle of ML/AI applications. The ML/AI world […]
Yiming Lin: Auto-BI: Automatically Build BI-Models Leveraging Local Join Prediction and Global Schema Graph
DBH 4011 Abstract: Business Intelligence (BI) is crucial in modern enterprises and is a billion-dollar business. Traditionally, technical experts like database administrators would manually prepare BI-models (e.g., in star or snowflake schemas) that join tables in data warehouses, before less-technical business users can run analytics using end-user dashboarding tools. However, the popularity of self-service BI (e.g., Tableau and Power BI) […]
Qiushi Bai: Improving SQL Performance Using Middleware-Based Query Rewriting
DBH 4011 Abstract: Query performance is critical in database-supported applications where users need answers quickly to make timely decisions. Traditional databases rely on rewriting queries to improve SQL performance. With the emergence of business intelligence and interactive visualization applications, databases often miss opportunities to rewrite their queries, due to reasons such as failure to adopt high-accuracy time […]
Saeed Kargar: Hamming Tree: The case for Energy-Aware Indexing for NVMs
DBH 4011 Zoom Link: https://uci.zoom.us/j/8045933305 Abstract: NVM technologies play a crucial role in data storage solutions as well as in battery-powered mobile and IoT devices. However, the challenges of wear-out and energy efficiency need to be addressed for the widespread adoption of NVM. In this presentation, I will discuss our research endeavors aimed at enhancing various aspects of […]
Hari Kishore Chaparala: When (Apache) AsterixDB Hit An (Apache) Iceberg
DBH 4011 Abstract: Apache Iceberg is an open-source table format with rich data management capabilities, including schema evolution, time travel, and efficient data pruning. It offers a reliable foundation for storing and organizing data in a data lake environment. The Iceberg specification allows multiple query engines to safely operate on the same data simultaneously. In this talk, we […]
Glenn Galvizo: Removing the ‘A’ in DAG: Navigational Queries in Hyracks
DBH 4011Abstract The need to “view” existing data under different models (e.g. JSON to graph) is a requirement seen in many modern applications. A naive solution involves utilizing narrow-purposed systems to handle each model, however, this multi-DBMS architecture significantly increases the cost of owning one’s data. For Apache AsterixDB users, we offer Graphix as a way […]
Suyash Gupta(UC Berkeley): Dissecting BFT Consensus: In Trusted Components we Trust!
DBH 4011The Information Systems Group (ISG) at UC Irvine welcomes Suyash Gupta UC Berkeley Dissecting BFT Consensus: In Trusted Components we Trust! ABSTRACT The growing interest in reliable multi-party applications has fostered widespread adoption of Byzantine Fault-Tolerant (bft) consensus protocols. Existing bft protocols need f more replicas than Paxos-style protocols to prevent equivocation attacks. trust-bft protocols seek to minimize this cost by making use of trusted components at replicas. This paper makes two contributions. First, we analyze the design of existing trust-bft protocols and uncover three fundamental limitations that preclude most practical deployments. Some of these limitations are fundamental, while others are linked to the state of trusted components today. Second, we introduce a novel suite of consensus protocols, FlexiTrust, that attempts to sidestep these issues. We show that our FlexiTrust protocols achieve up to 185% more throughput than their trust-bft counterparts. BIO Suyash Gupta is a postdoctoral researcher at the SkyLab, University of California, Berkeley. He is also the Lead Architect of ResilientDB fabric. Prior to joining Berkeley, he received his Ph.D. degree from University of California, Davis. He also holds two Master of Science degrees; one from Purdue University and another from Indian Institute of Technology Madras. His current research focuses on attaining safe and efficient, fault tolerant distributed consensus and communication. He has also co-authored a book on fault-tolerant distributed transaction processing at Morgan & Claypool. He has been awarded the Best Graduate Researcher Award for 2021 by UC Davis and Best Paper Award at EuroSys'23. In his free time, Suyash likes to code and his team won Best Hacker Award at BostonHacks, HackIllinois, and HackPrinceton, among others.
Boon Thau Loo (UPenn): Towards Full-Stack Adaptivity in Permissioned Blockchain Systems
DBH 6011
The Computer Science Department and Information Systems Group (ISG) at UC Irvine welcomes
Boon Thau Loo, University of Pennsylvania
Towards Full-Stack Adaptivity in Permissioned Blockchain Systems
October 20, 2023 at 11:00AM, DBH 6011
ABSTRACT: Permissioned blockchain systems are an emerging instance of untrustworthy distributed databases. As novel smart contracts, modern hardware, and new […]
Ken Birman (Cornell): Cascade: A Platform for Fast Edge Intelligence
DBH 6011
The Computer Science Department and Information Systems Group (ISG) at UC Irvine welcomes
Ken Birman, Cornell University
Cascade: A Platform for Fast Edge Intelligence
October 27, 2023 at 11:00AM, DBH 6011
ABSTRACT: There is a growing need to apply machine intelligence and learning at the edge of the cloud. Doing so would reduce delays […]
Nada Lahjouji: ProBE: Proportioning Privacy Budget for Complex Exploratory Decision Support
DBH 4011
ProBE: Proportioning Privacy Budget for Complex Exploratory Decision Support
Nada Lahjouji, PhD Student, UC Irvine
Abstract: Decision support (DS) applications play a crucial role in analyzing large volumes of data to produce valuable insights that facilitate informed decision-making. Such data can, however, contain sensitive information about individuals that requires privacy-preserving mechanisms to prevent data leaks, […]
Vishal Chakraborty: Much Ado About Data-Undo: Semantically Meaningful Data Erasure
DBH 4011
Title: Much Ado About Data-Undo: Semantically Meaningful Data Erasure
Abstract: Data regulations, such as GDPR and CCPA, are increasingly being adopted globally to protect against unsafe data management practices. Such regulations are often ambiguous (with multiple valid interpretations) when it comes to defining the expected dynamic behaviour of data processing systems. We will argue and show […]
Shahram Ghandeharizadeh (USC): Intelligent 3D Multimedia Displays using Flying Light Specks
DBH 6011
The Computer Science Department and Information Systems Group (ISG) at UC Irvine welcomes
Shahram Ghandeharizadeh, University of Southern California
Intelligent 3D Multimedia Displays using Flying Light Specks
January 12 at 11:00AM, DBH 6011
Abstract: A Flying Light Speck, FLS, is a miniature sized drone equipped with one or more light sources to generate different […]
Henry F. Korth (Lehigh University): Blockchain: Computer Science Foundations, Positive Social and Business Impact, and Research Opportunities
DBH 6011
The Computer Science Department and Information Systems Group (ISG) at UC Irvine welcomes
Henry F. Korth, Lehigh University
Blockchain: Computer Science Foundations, Positive Social and Business Impact, and Research Opportunities
January 19 at 11:00AM, DBH 6011
Abstract: To start, basic concepts of blockchain systems will be introduced assuming only a basic background in computing. […]
Volker Markl (TU Berlin): Mosaics of Big Data: Database Systems and Information Management – Trends and a Vision
DBH 4011
Prof. Dr. Volker Markl
Chair of the Database Systems and Information Management (DIMA) Group at TU Berlin
Director of the Berlin Institute for the Foundations of Learning and Data (BIFOLD)
Chief Scientist and Head of the Intelligent Analytics for Massive Data Research Group at German Research Center for Artificial Intelligence (DFKI)
Mosaics of Big Data […]
Shengquan Ni: Supporting time-travel debugging in Texera
Title: Supporting time-travel debugging in Texera
Speaker: Shengquan Ni
Abstract: Dataflow systems, traditionally used for relational analysis, now support a variety of tasks including complex user-defined functions. As dataflow jobs become more diverse and complex, there is an increasing need for better debugging support to understand their runtime behaviors and identify issues either in data […]
Joseph Hellerstein (UC Berkeley): Hydro: A Compiler Stack for Distributed Programs
DBH 6011
The Computer Science Department and Information Systems Group (ISG) at UC Irvine welcomes
Joseph Hellerstein, UC Berkeley and Sutter Hill Ventures
TITLE: Hydro: A Compiler Stack for Distributed Programs
ABSTRACT: Nearly all programs of interest today are distributed. Unfortunately, the traditional languages and compilers in common use today offer little assistance in ensuring the correctness […]
Raul Castro Fernandez (U. Chicago): On Data Ecology, Data Markets, the Value of Data, and Dataflow Governance
DBH 4011
Abstract: Data shapes our social, economic, cultural, and technological environments. Data is valuable, so people seek it, inducing data to flow. The resulting dataflows distribute data and thus value. For example, large Internet companies profit from accessing data from their users, and engineers of large language models seek large and diverse data sources to train […]
Yunyan Ding: Efficient Mouse Brain Image Processing Using Collaborative Data Workflows on Texera
DBH 4011
Abstract: In the field of neuroscience, accurately mapping the complex three-dimensional (3D) neural circuitry and architecture of the brain is crucial for advancing our understanding of brain functions and disorders. In this study, we introduce a distributed computational pipeline designed for processing high-resolution mouse brain tile images captured by TissueCyte. This pipeline efficiently and accurately […]
Bratin Saha (AWS Amazon): Scaling Generative AI in the Enterprise
DBH 4011
Abstract: Machine learning (ML) and generative artificial intelligence (AI) are among the most transformational technologies, opening up new opportunities for innovation in every domain across software, finance, health care, manufacturing, media, entertainment and others. This talk will discuss the key trends that are driving AI/ML innovation, how enterprises are using AI/ML today […]
Yinan Zhou: SpendableDB: A UTxO-based decentralized Database
DBH 4011
Abstract: Blockchain technology has attracted a significant amount of attention ever since the Bitcoin blockchain's success. Currently, most of the research and engineering efforts have been centered around monetary transactions such as token exchange protocols. The potential of building databases on top of blockchains is largely overlooked and remains an open problem. The literature on blockchain databases is divided into permissioned blockchains and permissionless account-based blockchains. However, the former is not fully decentralized, and the latter suffers from challenges in performance and cost. We propose SpendableDB, a permissionless UTxO-based blockchain database as a novel approach to the problem of data decentralization. Our design integrates data into individual UTxOs to achieve true decentralization of data ownership that can be securely transferred and traded, similar to how the regular monetary UTxOs are protected by the underlying blockchain's decentralization protocol. Additionally, SpendableDB provides cryptographically secured data integrity and immutable data lineage that can be easily verified. Our implementation and experiments show that our design is economically practical as it incurs a small amount of blockchain transaction fees.
Bio: Yinan Zhou is a second-year Ph.D. student in the Computer Science Department at UC Irvine. His primary research focus is on blockchain infrastructure and application development.
Lukasz Golab (University of Waterloo): Understanding models and the data they learn from
DBH 4011
Lukasz Golab (U. Waterloo)
Understanding models and the data they learn from
Abstract: The modern world is powered by data. However, as the capabilities of data-intensive systems grow, so does their complexity, making them hard to understand and troubleshoot. I will discuss my lab's efforts towards understanding models and the data they learn from, including […]
Juncheng Fang: ImmortalChopper: Real-Time and Resilient Distributed Transactions in the Edge-Cloud
DBH 4011
Abstract: Emerging applications in the areas of real-time Internet of Things (IoT) and edge technologies (such as wearables and mobile headsets) require fast processing and response times. This motivates the utilization of edge nodes for both processing and storage of data. In settings with a vast number of edge nodes---such as the case of smart […]
Mohammed Al-Kateb (Amazon Redshift): The Evolution of Amazon Redshift
DBH 4011
Abstract: In this talk, we will discuss the evolution of Amazon Redshift over the past 10 years. We’ll discuss the Amazon Redshift architecture. We’ll dive deep into the lifecycle of executing a query in Amazon Redshift. And we’ll examine how Amazon Redshift continues to maintain a leading price/performance in the market.
Bio: Mohammed Al-Kateb leads […]
Xinyuan Lin: Data Science Tasks Implemented with Scripts versus GUI-Based Workflows: The Good, the Bad, and the Ugly.
DBH 4011
Abstract: As leveraging large-scale data analytics becomes the norm for many applications, platforms for developing these capabilities have become increasingly important. This work compares the benefits and drawbacks of implementing two commonly used data science platform paradigms: code-based scripts and GUI-based workflows. We implement tasks in both paradigms that provide examples of phases in the […]
Mike Heddes: Efficient Cardinality Estimation of Multi-Join Queries using Count Sketches
DBH 4011
Abstract: Cardinality estimates are a primary input to query optimizers to determine an appropriate join order. The seminal AMS sketch can estimate the cardinality of an equi-join between two relations using little space. Since then, two important advancements are the Count sketch, a method which significantly improves upon the sketching time, and secondly, an extension […]
Pat Helland (Salesforce): Scalable OLTP in the Cloud: What’s the BIG DEAL?
DBH 4011
Abstract: The pursuit of scalable OLTP systems has been the holy grail of my career. Because OLTP systems are typically split into applications and databases, the isolation semantics provided by the DB and used by the app have a major impact on the scalability of the OLTP system as a whole. The isolation semantics are […]
Mohammad Sadoghi (UC Davis): The Journey of Building Global-Scale Sustainable Blockchain Fabric
DBH 6011
Abstract: The inception of Bitcoin and blockchain has renewed the vision of a democratic and decentralized computational paradigm, that is, to ingrain integrity, transparency, and accountability into the very fabric of the computational model. These fundamental concepts and the technologies behind them--a generic ledger-based data model, cryptographically ensured data integrity and transparent and accountable consensus-based […]
Aditya Parameswaran (Berkeley): Enhance, Don’t Replace: A Recipe for Success in Data Tooling
DBH 6011
Enhance, Don't Replace: A Recipe for Success in Data Tooling
Abstract: Most data analysis and data science is performed in human-centered tools, such as spreadsheets, visual analytics tools, and data science libraries. However, these tools often pose challenges for end-users, especially those without extensive programming expertise, in terms of scalability, interactivity, and usability. Rather than forcing […]
Arnab Nandi (OSU): Data Exploration in a Camera-first World: Query and Result Challenges
DBH 4011
Prof. Arnab Nandi
Associate Professor, Computer Science and Engineering, The Ohio State University
Friday, October 11, 2024 at 11 a.m., Donald Bren Hall 6011
Title: "Data Exploration in a Camera-first World: Query and Result Challenges"
Abstract: The pervasive availability of cameras in smartphones, vehicles, drones and more has triggered a new "camera-first" data revolution across […]
Nika Mansouri Ghiasi (ETH): Storage-Centric Computing for Genomics and Metagenomics
DBH 4011
Title: Storage-Centric Computing for Genomics and Metagenomics
Abstract: Genomics and metagenomics applications have enabled significant advancements in many critical areas. The exponential growth of genomic data poses unprecedented challenges in genomics and metagenomic applications. These applications suffer from significant data movement overheads from the storage system. To fundamentally address these overheads, we make a case […]
Yannis Papakonstantinou (Google): Vector Search and Databases
DBH 6011
Yannis Papakonstantinou
Distinguished Engineer, Query Processing and GenAI at Google Cloud Databases
Abstract: Semantic search ability, via embedding (vectors) and vector indexing, has been added to Google Cloud Platform (GCP) databases in order to enable GenAI applications. The inclusion of vectors in databases confers many of the traditional benefits of databases: Developers can now develop […]
Michael Jungmair (TU Munich): A Compiler-Centric Query Engine Design for Mixed Workloads and Modern Hardware
DBH 3011
A Compiler-Centric Query Engine Design for Mixed Workloads and Modern Hardware
11/1/2024, 1:00 PM to 2:00 PM, DBH 3011
Michael Jungmair, Technical University of Munich, Germany
Abstract: Relational query engines are increasingly expected to handle more than just relational queries and also run on modern hardware that is increasingly parallel and distributed. However, it is not clear how existing system designs can deal with these two challenges effectively. We propose a holistic, compiler-centric design for data processing systems, built for tightly integrated optimization and execution of relational queries, non-relational workloads and user-defined functions on modern hardware.
Bio: Michael Jungmair is a third-year PhD student at the Technical University of Munich. Supervised by Jana Giceva, he is performing research at the intersection of database engines and compiler technology. So far, this research has culminated in the design and implementation of LingoDB (lingo-db.com), a novel query engine based on the MLIR compiler framework.