BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Information Systems Group - ECPv6.4.0.1//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Information Systems Group
X-ORIGINAL-URL:https://isg.ics.uci.edu
X-WR-CALDESC:Events for Information Systems Group
REFRESH-INTERVAL;VALUE=DURATION:PT1H
X-Robots-Tag:noindex
X-PUBLISHED-TTL:PT1H
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20180311T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20181104T090000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20190310T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20191103T090000
END:STANDARD
BEGIN:DAYLIGHT
TZOFFSETFROM:-0800
TZOFFSETTO:-0700
TZNAME:PDT
DTSTART:20200308T100000
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0700
TZOFFSETTO:-0800
TZNAME:PST
DTSTART:20201101T090000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20201120T150000
DTEND;TZID=America/Los_Angeles:20201120T160000
DTSTAMP:20260521T000718
CREATED:20201120T091507Z
LAST-MODIFIED:20210409T181135Z
UID:1340-1605884400-1605888000@isg.ics.uci.edu
SUMMARY:Yiming Lin (UCI): LOCATER - Cleaning WiFi Connectivity Datasets for Semantic Localization
DESCRIPTION:Speaker:\nYiming Lin\, UCI \nAbstract:\nSensor data is abundant in our lives but is often too dirty to support high-quality services. This talk explores the data cleaning challenges that arise in using WiFi connectivity data to locate users to semantic indoor locations such as buildings\, regions\, and rooms. WiFi connectivity data consists of sporadic connections between devices and nearby WiFi access points (APs)\, each of which may cover a relatively large area within a building. Our system\, entitled semantic LOCATion cleanER (LOCATER)\, postulates semantic localization as a series of data cleaning tasks. First\, it treats the problem of determining the AP to which a device is connected between any two of its connection events as a missing value detection and repair problem. It then associates the device with the semantic subregion (e.g.\, a conference room in the region) by postulating it as a location disambiguation problem. LOCATER uses a bootstrapping semi-supervised learning method for coarse localization and a probabilistic method to achieve finer localization. Evaluation on both real and synthetic datasets shows that LOCATER can achieve high accuracy at both the coarse and fine levels. \nBio: \nYiming Lin is a Ph.D. student in the Department of Computer Science at the University of California\, Irvine. He is currently doing research on data cleaning and data stream analysis. He has been a recipient of the Hasso Plattner Institute Fellowship since 2020. \nZoom Link:\nhttps://uci.zoom.us/j/95066121155
URL:https://isg.ics.uci.edu/event/locater-cleaning-wifi-connectivity-datasets-for-semantic-localization/
LOCATION:https://uci.zoom.us/j/95066121155
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20201023T150000
DTEND;TZID=America/Los_Angeles:20201023T160000
DTSTAMP:20260521T000718
CREATED:20201017T155305Z
LAST-MODIFIED:20210409T181153Z
UID:1337-1603465200-1603468800@isg.ics.uci.edu
SUMMARY:Karthik Ranganathan: YugabyteDB - Bringing Together the Best of Amazon Aurora and Google Spanner
DESCRIPTION:Speaker: \nKarthik Ranganathan \nAbstract: \nPostgreSQL\, a single-node open-source RDBMS\, is widely adopted for its powerful set of features. However\, PostgreSQL is not built to be used as a cloud-native database\, and therefore cannot inherently survive failures\, scale horizontally or support geo-distributed deployments. While Amazon Aurora has modified the subsystem of PostgreSQL that writes to disk along with simplifying async replication to make the database resilient to failures\, it does not address horizontal scalability or geo-distribution. Google Spanner is a distributed SQL database that addresses these features as well\, but it does not offer many of the powerful features of PostgreSQL. YugabyteDB\, a fully open-source distributed SQL database\, aims to combine the best of these in an attempt to build a very compelling\, cloud-native database. \nIn this talk\, we will look at the architecture of YugabyteDB that enables it to support all PostgreSQL features along with distributed transactions\, resilience\, scalability\, and geo-distribution of data. \n\nBio:\nKarthik was one of the original database engineers at Facebook responsible for building distributed databases including Cassandra and HBase. He is an Apache HBase committer and was an early contributor to Cassandra before it was open-sourced by Facebook. He is currently the co-founder and CTO of the company behind YugabyteDB\, a fully open-source distributed SQL database for building cloud-native and geo-distributed applications. \n\nZoom link: \nhttps://uci.zoom.us/j/97063664357
URL:https://isg.ics.uci.edu/event/yugabytedb-bringing-together-the-best-of-amazon-aurora-and-google-spanner/
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20201016T140000
DTEND;TZID=America/Los_Angeles:20201016T150000
DTSTAMP:20260521T000718
CREATED:20200929T234710Z
LAST-MODIFIED:20210409T181216Z
UID:1333-1602856800-1602860400@isg.ics.uci.edu
SUMMARY:Aaron J. Elmore: CrocodileDB - Resource Efficient Database Execution
DESCRIPTION:Abstract:\nThe coming end of Moore’s law requires that data systems be more judicious with computation and resources as the growth in data outpaces the availability of computational resources. Current database systems are eager and aggressively consume resources to immediately and quickly complete the task at hand. Intelligently deferring a task to a later point in time can increase result reuse\, reduce work that might later be invalidated\, or avoid unnecessary work altogether. In this talk I will introduce CrocodileDB\, a resource-efficient database system that automatically optimizes deferment based on user-specification and workload prediction. CrocodileDB integrates new ways of specifying timing information\, new query execution policies\, new task schedulers\, and new data loading schemes. In particular\, this talk will highlight two new query execution paradigms\, Intermittent Query Processing and Incremental-Aware Query Execution. \nZoom link:\nhttps://uci.zoom.us/j/92895672890 \nAuthor bio:\nAaron J. Elmore is an Assistant Professor in the Department of Computer Science\, and the College of the University of Chicago. Aaron was previously a Postdoctoral Associate at MIT. Aaron’s thesis on Elasticity Primitives for Database-as-a-Service was completed at the University of California\, Santa Barbara. His recent research interests focus on building data systems that address the growing data deluge. This includes building a resource-efficient database system (CrocodileDB)\, new compression techniques for databases (DenseStore and EdgeTSD)\, bringing git-like functionality to Databases (Datahub\, Decibel\, and OrpheusDB)\, and data discovery (DataSwamp and Relic). His previous projects included polystores (BigDAWG)\, elastic databases (E-Store)\, and database migration (Squall and Zephyr). 
 He is currently an associate editor for SIGMOD Record\, and has served as co-chair of the SIGMOD demonstration track and the inaugural SIGMOD student research competition\, and as VLDB proceedings editor.
URL:https://isg.ics.uci.edu/event/crocodiledb-resource-efficient-database-execution/
LOCATION:https://uci.zoom.us/j/92895672890
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20200427T110000
DTEND;TZID=America/Los_Angeles:20200427T120000
DTSTAMP:20260521T000718
CREATED:20200424T192059Z
LAST-MODIFIED:20200424T192059Z
UID:1291-1587985200-1587988800@isg.ics.uci.edu
SUMMARY:Lei Cao: Toward an End-to-end Anomaly Discovery Paradigm
DESCRIPTION:ABSTRACT: \nAnomaly detection is critical in enterprises\, with applications ranging from preventing financial fraud and defending against network intrusions to detecting imminent device failures. \nAlthough previously developed research offers a plethora of stand-alone methods for detecting particular types of anomalies\, there is no end-to-end solution for data scientists to effectively discover anomalies over large volumes of varied data. To build such a system\, several critical challenges have to be solved: How to determine which among many alternative anomaly detection algorithms is the best for a given task and to find the proper parameter settings? How to leverage a small amount of end-user feedback to improve the anomaly extraction process? How to best present the anomaly detection results such that users do not have to evaluate the potentially large number of anomaly candidates one by one? \nThis talk will present our ADP solution that solves all of the above problems. ADP supports all stages of anomaly discovery by seamlessly integrating anomaly-related services within one integrated platform. It enables tuning-free anomaly detection and provides the sense-making power afforded by anomaly summarization and explanation services\, while allowing the users to easily steer the discovery process with human ingenuity. \nBio: \nDr. Lei Cao is a Postdoc Associate at MIT CSAIL\, working with Prof. Samuel Madden and Prof. Michael Stonebraker. Before that he worked at IBM T.J. Watson Research Center as a Research Staff Member. He received his Ph.D. in Computer Science from Worcester Polytechnic Institute\, supervised by Prof. Elke Rundensteiner. His recent research is focused on developing end-to-end tools for data scientists to effectively make sense of data. \n\nZoom Link: https://uci.zoom.us/j/808173003
URL:https://isg.ics.uci.edu/event/lei-cao-toward-an-end-to-end-anomaly-discovery-paradigm/
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20200420T110000
DTEND;TZID=America/Los_Angeles:20200420T120000
DTSTAMP:20260521T000718
CREATED:20200222T010700Z
LAST-MODIFIED:20200222T010700Z
UID:1285-1587380400-1587384000@isg.ics.uci.edu
SUMMARY:Redesigning Storage Systems for Future Workloads\, Hardware\, and Performance Requirements (CS Faculty Candidate Seminar)
DESCRIPTION:Speaker: Oana Balmau (University of Sydney) \nAbstract: Cloud storage stacks are being challenged by new workloads\, new hardware and new performance requirements. First\, workloads evolved from following a read-heavy pattern (e.g.\, a static web-page) to a write-heavy profile where the read:write ratio is closer to 1:1 (e.g.\, as in the Internet of Things). Second\, the hardware is undergoing rapid changes.  The divide between fine-grained volatile memory and slow block-level storage is rapidly being bridged by the emerging byte-addressable non-volatile memory devices and the fast block-addressable NVMe SSDs (e.g.\, Intel Optane NVMe SSDs). Third\, performance requirements in storage systems now emphasize low tail latency\, in addition to high throughput.  \nIn this talk I will argue that existing storage systems have fundamental limitations that do not allow them to fully meet these challenges\, and that therefore the storage stack needs to undergo radical change. In particular\, using state-of-the-art key-value stores I will demonstrate that with modern workloads and hardware the bottleneck shifts from I/O to CPU\, invalidating an assumption that has underpinned all past storage system design. In line with this observation I will then present a new design paradigm for key-value stores that departs from the conventional wisdom of optimizing disk usage and instead optimizes CPU usage. To do so\, we keep data unsorted on disk\, reduce contention for shared data structures\, and do away with expensive maintenance operations. \nThis design has been implemented in the KVell key-value store. KVell outperforms state-of-the-art key-value stores such as RocksDB in both read- and write-heavy workloads\, running on modern NVMe SSDs.  Thanks to its novel design\, KVell achieves up to 5x better throughput\, and up to two orders of magnitude lower tail latency. 
 \nBio: Oana Balmau is a PhD candidate in the School of Computer Science at the University of Sydney\, advised by Prof. Willy Zwaenepoel. She earned her Bachelor’s and Master’s degrees in Computer Science from EPFL\, Switzerland. Her research interests are computer systems and storage technologies. Currently\, she is working towards redesigning storage systems\, aimed at future workloads and hardware. (Homepage: oanabalmau.com)
URL:https://isg.ics.uci.edu/event/redesigning-storage-systems-for-future-workloads-hardware-and-performance-requirements-cs-faculty-candidate-seminar/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20200417T123000
DTEND;TZID=America/Los_Angeles:20200417T140000
DTSTAMP:20260521T000718
CREATED:20191118T211239Z
LAST-MODIFIED:20200117T182717Z
UID:1228-1587126600-1587132000@isg.ics.uci.edu
SUMMARY:David Lomet: Better Database Cost/Performance via Programmable SSD Batched I/O
DESCRIPTION:Abstract: A database storage manager should place data at the most cost/performance-effective tier in the storage hierarchy.  While performance and cost both decrease with distance from the CPU\, the cost/performance trade-off depends on how efficiently a storage manager can move data across tiers.  Log structuring (LS) is designed to improve the cost/performance of secondary storage by writing batches of pages from main memory to secondary storage when using a conventional block-at-a-time I/O interface.  The advent of programmable SSDs changes the game.  A storage manager no longer has to work within the limitations of the block-at-a-time interface.  In this paper\, we re-program an SSD controller’s flash translation layer into one that enables a storage manager to batch pages without itself needing to implement LS.  This removes the CPU computational and I/O load associated with LS recovery and garbage collection.  We compare the Bw-tree key-value store with its LLAMA host-based LS to the same key-value software stack but executing on our batch-interface SSD.  Our experimental results show the benefits of eliminating redundancies\, minimizing interactions across storage layers\, and avoiding the CPU cost of providing LS. \nShort Bio: \nDavid Lomet founded the Database Group at Microsoft Research Redmond in 1995 and managed it for 20 years. His research career began at IBM where\, while on a 1975-76 sabbatical at the University of Newcastle-on-Tyne\, he invented atomic actions (a form of transactions).  He later worked at Wang Institute as a faculty member\, and at Digital Equipment Corporation as a software architect and research staff member. He received a Ph.D. in computer science from the University of Pennsylvania. \nLomet’s primary focus has been the engineering of database systems\, with a focus on database system kernels.  
 His work on concurrency control and recovery contributed to making DEC’s Rdb and Microsoft’s SQL Server database management systems leaders in cost/performance. His Deuteronomy research project’s latch-free Bw-tree index and log-structured store are key elements in Microsoft’s Hekaton main memory database and Azure Cosmos DB cloud data service.  Deuteronomy won the Microsoft Research Redmond “2017 Best Research Project” Award.  Lomet is an author of over 120 papers and over 60 patents. \nLomet has won IEEE awards as well as the ACM SIGMOD Contributions Award and the Computer Society TCDE Service Award for his 25-year tenure as EIC of the IEEE Data Engineering Bulletin.  He has also served as editor of ACM TODS\, VLDB Journal and others\, and has been a member of the VLDB Board.  He has been a PC co-chair for ICDE and VLDB.  He is a member of the IEEE Computer Society Board of Governors and society Secretary\, and has been First Vice President and Treasurer.  He is a fellow of IEEE\, ACM\, and AAAS\, and a member of the National Academy of Engineering. \n  \nLunch provided.
URL:https://isg.ics.uci.edu/event/david-lomet-tbd/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20200414T110000
DTEND;TZID=America/Los_Angeles:20200414T120000
DTSTAMP:20260521T000718
CREATED:20200425T025505Z
LAST-MODIFIED:20200425T025505Z
UID:1294-1586862000-1586865600@isg.ics.uci.edu
SUMMARY:Babak Salimi: Causal Inference for Responsible Data Science
DESCRIPTION:ABSTRACT: \nScaling and democratizing access to big data promises to provide meaningful\, actionable information that supports decision-making. Today\, data-driven decisions profoundly affect the course of our lives\, such as whether to admit applicants to a particular school\, offer them a job\, or grant them a mortgage. Unfair\, inconsistent\, or faulty decision-making raises serious concerns about ethics and responsibility. For example\, we may know that our training data is biased\, but how do we avoid propagating discrimination when we use this data?  How do we avoid incorrect\, spurious and non-reproducible findings? How can we curate and expose existing data to make it “safe” for informed decision making? \nIn this talk\, I describe how we can combine techniques from causal inference and data management to develop systems and algorithms that help answer some of these questions.  Many existing popular notions of fairness in ML fail to distinguish between discriminatory\, non-discriminatory and spurious correlations between sensitive attributes and outcomes of learning algorithms. I present a new notion of fairness that subsumes and improves upon previous definitions and correctly distinguishes between fairness violations and non-violations. Further\, I describe an approach to removing discrimination by repairing training data in order to remove the effects of any inappropriate and/or discriminatory causal relationships between a protected attribute and classifier predictions.  Analytical SQL queries supported by mainstream business intelligence and analytics environments can lead to perplexing observations and incorrect business decisions. I describe a system that automatically rewrites analytical SQL queries into complex causal queries that support decision-making. \nBio \nBabak Salimi is a postdoctoral research associate in Computer Science & Engineering at the University of Washington\, Seattle\, where he works with Prof. 
 Dan Suciu and the Database Group. He received his Ph.D. from the School of Computer Science at Carleton University in Ottawa\, Canada and his M.Sc. in Computation Theory (2009) and B.Sc. in Computer Engineering (2006) from Sharif University of Technology and Azad University of Mashhad\, respectively. Salimi’s research interests span data management\, causal inference\, decision-making systems\, algorithmic fairness and responsible data science. \n\nZoom Link: https://uci.zoom.us/j/232157494
URL:https://isg.ics.uci.edu/event/babak-salimi-causal-inference-for-responsible-data-science/
LOCATION:https://uci.zoom.us/j/232157494
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20200228T150000
DTEND;TZID=America/Los_Angeles:20200228T160000
DTSTAMP:20260521T000718
CREATED:20200219T174631Z
LAST-MODIFIED:20210409T181320Z
UID:1277-1582902000-1582905600@isg.ics.uci.edu
SUMMARY:Pat Helland (Salesforce.com): Depending on Appending
DESCRIPTION:Speaker: Pat Helland (Salesforce.com) \nAbstract: \n    Increasingly\, we see “Gray Failures” in the datacenter and public cloud.  This happens when a server\, router\, or other device just plain goes slow.  This may result in severe problems in the user-perceived performance as the slowness cascades\, sometimes not slow enough to cause the exclusion of the bad devices. \n    In this talk\, we briefly examine Gray Failures and consider the use of “append” to support our work.  How has append been used in distributed systems?  What are some novel ways that new cloud-centric databases use forms of append to partially address these issues?  We conclude that it seems likely we can get relief by “Depending on Appending”. \nBio: \n    Pat Helland has been building databases\, transaction systems\, and distributed systems since 1978\, a couple of years after dropping out of UC Irvine to feed his new family. \n    He is keenly interested in strong consistency\, eventual consistency\, scale\, and making complex systems easy for application developers.  Pat writes regularly for ACM Queue and Communications of the ACM.
URL:https://isg.ics.uci.edu/event/depending-on-appending/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20200225T110000
DTEND;TZID=America/Los_Angeles:20200225T120000
DTSTAMP:20260521T000718
CREATED:20200219T174827Z
LAST-MODIFIED:20210409T181309Z
UID:1279-1582628400-1582632000@isg.ics.uci.edu
SUMMARY:Dr. Alfred Spector (Two Sigma): Opportunities and Perils of Data Science: A Roadmap (ICS Distinguished Lecture)
DESCRIPTION:Speaker: Dr. Alfred Spector \nAbstract: Data-driven approaches have led to powerful prediction\, optimization and automation techniques.  Powered by large-scale\, networked computer systems and machine learning algorithms\, these have been very impactful to-date and hold great promise in many disciplines\, even in the humanities and social sciences. However\, no new technology arrives without complications\, and we have recently seen the press and various political circles illustrating real\, potential\, and fictional implications of Big Data. \nThis presentation aims to balance the opportunities provided by Big Data and its associated artificial intelligence techniques with a discussion of the various challenges that have ensued. I review eleven types of challenges\, including those which are technical (resilience and complexity)\, societal (difficulties in setting objective functions or understanding causation)\, and humanist (issues relating to free will or privacy). I provide example problems and suggest ways to address some of the unanticipated consequences of Big Data. \nBio: Dr. Alfred Spector is Chief Technology Officer at Two Sigma\, a firm dedicated to using information to undertake many forms of economic optimization. His career has led him from innovation in large scale\, networked computing systems (as a professor at CMU and founder of his company\, Transarc) to broad research leadership: five years leading IBM Software Research and eight years leading Google Research. Recently\, Spector has lectured widely on the growing importance of computer science across all disciplines (CS+X) and on the Societal Implications of Data Science. He  received an AB in Applied Mathematics from Harvard and a Ph.D. in Computer Science from Stanford\, where he was a Hertz Fellow. He is a Fellow of the ACM and IEEE\, and a member of the National Academy of Engineering and the American Academy of Arts and Sciences. Dr. 
 Spector won the 2001 IEEE Kanai Award for Distributed Computing\, was co-awarded the 2016 ACM Software Systems Award\, and was a 2018-19 Phi Beta Kappa Scholar.
URL:https://isg.ics.uci.edu/event/opportunities-and-perils-of-data-science-a-roadmap-ics-distinguished-lecture/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20200221T110000
DTEND;TZID=America/Los_Angeles:20200221T120000
DTSTAMP:20260521T000718
CREATED:20200219T175125Z
LAST-MODIFIED:20200219T175125Z
UID:1281-1582282800-1582286400@isg.ics.uci.edu
SUMMARY:Systems and ML at RISELab (CS Distinguished Seminar Series)
DESCRIPTION:Speaker: Prof. Ion Stoica (University of California at Berkeley) \nAbstract: In this talk\, I will present several of the projects we are developing at RISELab\, a two-year-old lab at UC Berkeley that focuses on building platforms and algorithms for real-time intelligent decisions\, decisions that are secure and explainable. These projects include both systems to better support machine learning (ML) workloads\, and leveraging ML to build better systems. In the first category\, I will present Ray\, a general-purpose distributed system which provides both task-parallel and actor abstractions. Ray is highly scalable\, employing an in-memory storage system and a distributed scheduler. Ray already supports several popular libraries\, including a reinforcement learning library (RLlib) and a hyperparameter search library (Tune)\, and it is deployed in production at tens of organizations. In the second category\, I will present Autopandas\, a system that synthesizes snippets of API calls from input-output examples for Pandas\, the most popular data science library today\, and NeuroCuts\, a tool to generate decision trees that implement network packet classifiers. \nBio: Ion Stoica is a Professor in the EECS Department at the University of California at Berkeley\, and the Director of RISELab (https://rise.cs.berkeley.edu/). He is currently doing research on cloud computing and AI systems. Past work includes Apache Spark\, Apache Mesos\, Tachyon\, Chord DHT\, and Dynamic Packet State (DPS). He is an ACM Fellow and has received numerous awards\, including the Mark Weiser Award (2019)\, SIGOPS Hall of Fame Award (2015)\, the SIGCOMM Test of Time Award (2011)\, and the ACM doctoral dissertation award (2001). He also co-founded three companies\, Anyscale (2019)\, Databricks (2013) and Conviva (2006).
URL:https://isg.ics.uci.edu/event/systems-and-ml-at-riselab-cs-distinguished-seminar-series/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20200214T150000
DTEND;TZID=America/Los_Angeles:20200214T160000
DTSTAMP:20260521T000718
CREATED:20200201T005425Z
LAST-MODIFIED:20200214T225332Z
UID:1274-1581692400-1581696000@isg.ics.uci.edu
SUMMARY:Effective Filters and Linear Time Verification for Tree Similarity Joins
DESCRIPTION:Speaker:\nThomas Hütter (University of Salzburg) \nAbstract:\nThe tree similarity join computes all similar pairs in a collection of trees. Two trees are similar if their edit distance falls within a user-defined threshold. Previous algorithms\, which are based on a filter-verify approach\, suffer from the following two issues. First\, ineffective filters produce a large number of candidates that must be further verified. Second\, the candidates are verified by computing the tree edit distance\, which is cubic in the number of tree nodes. Thus\, these techniques fail to scale to large tree collections and are infeasible even for small collections when the trees are large. In this talk\, a scalable solution for the tree similarity join\, called TJoin\, is presented that is based on (1) an effective indexing technique that leverages both the labels and the structure of trees to reduce the number of candidates\, (2) an efficient upper bound filter that moves many of the candidates directly to the join result without additional verification\, and (3) a linear time verification technique for the remaining candidates that avoids the expensive tree edit distance computation. Unlike previous solutions\, TJoin scales to collections with millions of large trees and improves the overall join time by up to two orders of magnitude w.r.t. the state of the art. \nShort-Bio:\nThomas Hütter received his bachelor’s and master’s degrees in Computer Sciences from the University of Salzburg\, Austria\, in 2014 and 2017\, respectively. During his master’s degree\, he spent a semester abroad as an exchange student at École Polytechnique Fédérale de Lausanne (EPFL)\, Switzerland. His research career started in 2013 when he became a member of the Computational Systems Group supervised by Prof. Christoph Kirsch at the University of Salzburg. 
 His work focused on memory management optimizations\, which resulted in two projects in collaboration with Google Munich.\nCurrently\, he is pursuing his Ph.D. degree in the Database Research Group supervised by Prof. Nikolaus Augsten in Salzburg. His research interests include efficient algorithms for similarity queries on complex data structures. In 2019\, Thomas Hütter won the “Young Investigators Award” for his work on tree similarity joins (published at ICDE 2019) and was awarded the “Austrian Marshallplan Scholarship”.
URL:https://isg.ics.uci.edu/event/effective-filters-and-linear-time-verification-for-tree-similarity-joins/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20200207T150000
DTEND;TZID=America/Los_Angeles:20200207T160000
DTSTAMP:20260521T000718
CREATED:20200130T224250Z
LAST-MODIFIED:20200130T224250Z
UID:1272-1581087600-1581091200@isg.ics.uci.edu
SUMMARY:Event Detection with Temporal Predicates
DESCRIPTION:Speaker: Fabio Persia (Free University of Bozen-Bolzano\, Italy) \nAbstract: Human perception tends to group individual values into larger structures\, and this is also the case for time series data. This tendency inspired us to define an event-detection language based on time intervals\, which combines timepoint-based events into larger structures. Complex events can then be defined on a more abstract level by specifying temporal relationships between different time intervals. As a result\, we propose a system based on an extension of relational algebra (ISEQL\, Interval-based Surveillance Event Query Language)\, enriched with powerful temporal operators and additional constraints\, which is able to: (i) extend Allen’s interval relationships; (ii) support overlap percentage and cardinality constraints over the interval relationships; (iii) provide robustness with respect to small variations in the intervals; (iv) be more efficient than state-of-the-art approaches. With the help of intuitive and interactive graphical user interfaces\, a user can also have a look at the different layers of our system to gain insights into the inner workings of the system\, as well as create new events on the fly and track their processing through the system. \nBio: Fabio Persia is currently (since 2015) a Researcher/Assistant Professor (RTD/A) at the Faculty of Computer Science of the Free University of Bozen-Bolzano\, Italy. He received the master’s and PhD degrees in Computer Engineering from the University of Naples “Federico II”\, Italy\, in 2009 and 2013\, respectively\, under the supervision of Professors Antonio Picariello and Vincenzo Moscato. During his PhD\, he was a visiting scholar at the University of Maryland\, College Park (USA) for six months\, under the supervision of Prof. V.S. Subrahmanian. 
 He was also a Postdoctoral Research Fellow at the Department of Electrical Engineering and Information Technology of the University of Naples “Federico II” from 2013 to 2015. His research interests include the fields of event detection and analysis in large scale databases applied to multimedia\, semantic analysis and security\, as demonstrated by several papers published in top journals (such as TKDE and TOIT) and conferences (IJCAI\, EDBT\, CIKM).
URL:https://isg.ics.uci.edu/event/event-detection-with-temporal-predicates/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20200131T150000
DTEND;TZID=America/Los_Angeles:20200131T160000
DTSTAMP:20260521T000718
CREATED:20200113T213640Z
LAST-MODIFIED:20200121T095512Z
UID:1260-1580482800-1580486400@isg.ics.uci.edu
SUMMARY:Building Personal Chronicle of Life Events (Final Defense)
DESCRIPTION:Speaker: Jordan Oh \nAbstract:\nHuman beings have always been interested in understanding themselves and their surroundings. Learning about the relationship between the two can reveal facts of the present and help predict the future\, a critical part to live a better life. With the proliferation of IoT sensor devices\, it is now possible to collect quality data for each individual and utilize this data for building personal models that can help to understand the self and environment. However\, since this sensor data have different granularities and semantics\, the semantic gap becomes even more formidable. Thus\, there are challenges in aggregating\, integrating\, and synchronizing this heterogeneous data to a form such that it effectively describes the life experiences of each individual. In this dissertation\, we design a personal chronicle\, which contributes a solution to the aforementioned challenges\, called Personicle\, in which all kinds of personal data streams can be correlated with one another to form a model of a person.  \nTo implement the Personicle\, we first attempt to bridge the semantic gap between the low-level multimedia logs and high-level semantics by developing a common daily event model through the data unobtrusively obtained from smart devices. To do this\, we define an atomic interval\, which brings together the scattered sources of heterogeneous data to partition the data into manageable pieces. This atomic interval lets us segment a day into sequences of similar patterns and use the segments for daily event recognition.  \nSecondly\, we design an event-triggered Ecological Momentary Assessment (EMA) to maximize the chance of aggregating the semantic data from the users. 
Unlike the traditional EMA process\, which mainly depends on user initiative and intervention\, we contribute to overcoming the problems endemic to persistent data collection\, such as missing a logging moment or early abandonment\, by initiating the EMA process from the system side at the right moment.  \nLastly\, we propose a fully automated approach to obtain latent semantic information from all the integrated data\, aiming to maximize the opportunity of both qualitatively and quantitatively capturing one’s life experiences. To show a concrete example of this enrichment\, we perform an experiment with “Eating” and “Working”\, two complex events central to human experience. These enhanced daily events can then be used to create a personal model that could capture how a person reacts to different stimuli under specific conditions.   \nBio:\nHyungik Oh is a Ph.D. Candidate at UC Irvine. His research interests are lifelogging\, activity/event recognition\, and applied machine learning. He is currently working on pioneering research called Personicle\, in which he builds a personal chronicle of life events by aggregating\, integrating\, and synchronizing heterogeneous sensor data streams.
URL:https://isg.ics.uci.edu/event/final-defense-jordan-oh/
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20200130T140000
DTEND;TZID=America/Los_Angeles:20200130T180000
DTSTAMP:20260521T000718
CREATED:20200130T181101Z
LAST-MODIFIED:20200130T181101Z
UID:1270-1580392800-1580407200@isg.ics.uci.edu
SUMMARY:A Theoretical View of Distributed Systems (CS Distinguished Seminar Series)
DESCRIPTION:Speaker: Prof. Nancy Lynch (Massachusetts Institute of Technology) \nAbstract: For several decades\, my collaborators\, students\, and I have worked on theory for distributed systems\, in order to understand their capabilities and limitations in a rigorous\, mathematical way. This work has produced many different kinds of results\, including: \n    Abstract models for problems that are solved by distributed systems\, and for the algorithms used to\n    solve them\,\n    Rigorous proofs of algorithm correctness and performance properties (also some error discoveries)\,\n    Impossibility results and lower bounds\, expressing inherent limitations of distributed systems\,\n    Some new algorithms\, and\n    General mathematical foundations for modeling and analyzing distributed systems. \nThese various results have spanned many different kinds of systems\, ranging from distributed data-management systems\, to communication systems\, to biological systems such as insect colonies and brains. \nIn this talk\, I will overview some highlights of our work over many years on theory for distributed systems. I will break this down in terms of three intertwined “research threads”: algorithms for traditional distributed systems\, impossibility results\, and mathematical foundations. At the end\, I will say something about our recent work on algorithms for new kinds of distributed systems. \nBio: Nancy Lynch is the NEC Professor of Software Science and Engineering in MIT’s EECS department. She heads the Theory of Distributed Systems research group in the Computer Science and AI Laboratory. She received her PhD from MIT and her BS from Brooklyn College\, both in Mathematics. \nLynch has (co-)written many research articles about distributed algorithms and impossibility results\, and about formal modeling and verification of distributed systems. 
Her best-known contributions are the “FLP” impossibility result for reaching consensus in asynchronous distributed systems in the presence of failures\, with Fischer and Paterson\, and a paper with Dwork and Stockmeyer on algorithms for reaching consensus under restricted failure assumptions. Other contributions include the I/O automata system modeling frameworks\, with Tuttle\, Kaynar\, Segala\, and Vaandrager. \nLynch is the author of the textbook “Distributed Algorithms” and co-author of “The Theory of Timed I/O Automata” and “Atomic Transactions”. She is an ACM Fellow\, a member of the National Academy of Engineering and the National Academy of Sciences\, and a Fellow of the American Academy of Arts and Sciences. She has been awarded the Dijkstra Prize (twice)\, the van Wijngaarden Award\, the Knuth Prize\, the Piore Award\, the Athena Award\, and the IEEE Technical Committee on Distributed Processing Outstanding Achievement award. She has supervised approximately 100 PhD students\, Masters students\, and postdocs. \nLynch is interested in all aspects of distributed computing theory\, including modeling\, algorithm design and analysis\, lower bounds\, and applications. She is especially interested in algorithms for “difficult” platforms\, which are subject to noise\, failures\, and changes. Recently\, her work has focused on wireless network algorithms\, biological distributed algorithms\, and the connections between them.
URL:https://isg.ics.uci.edu/event/a-theoretical-view-of-distributed-systems-cs-distinguished-seminar-series/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20200124T150000
DTEND;TZID=America/Los_Angeles:20200124T160000
DTSTAMP:20260521T000718
CREATED:20200121T095342Z
LAST-MODIFIED:20200124T182021Z
UID:1265-1579878000-1579881600@isg.ics.uci.edu
SUMMARY:LSM-based storage techniques: a tutorial
DESCRIPTION:Speaker: Chen Luo \nAbstract: Recently\, the log-structured merge-tree (LSM-tree) has been widely adopted for use in the storage layer of modern NoSQL systems. Because of this\, there have been a large number of research efforts\, from both the database community and the operating systems community\, that try to improve various aspects of LSM-trees. In this tutorial\, I will describe the basics of LSM-tree storage techniques and explore their design space and performance trade-offs. If time permits\, I will also survey some representative work on optimizing LSM-trees.
URL:https://isg.ics.uci.edu/event/lsm-based-storage-techniques-a-tutorial/
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20200110T110000
DTEND;TZID=America/Los_Angeles:20200110T120000
DTSTAMP:20260521T000718
CREATED:20191122T030338Z
LAST-MODIFIED:20200108T194010Z
UID:1232-1578654000-1578657600@isg.ics.uci.edu
SUMMARY:Scalable Programming: Progress\, Prospects and Challenges (CS/NetSys Seminar)
DESCRIPTION:Speaker:  Prof. Gul Agha (University of Illinois at Urbana-Champaign) \nAbstract:\nMobile cloud computing\, social media\, cyberphysical systems\, and the internet of things are examples of increasingly important applications requiring scalable concurrency.  The Actor model facilitates programming large-scale concurrent applications.  Not surprisingly\, Actor languages and frameworks have been widely adopted in industry to address scalability.  Although this has significantly reduced programming errors\, developing complex concurrent systems and reasoning about their properties can nevertheless be challenging and error-prone.  A key source of complexity is the interactions between actors.  I will describe our work in programming languages defining new constructs to address this complexity.  Specifically\, we have developed ways of expressing multiparty session types and synchronization constraints which capture interactions.  I will then briefly discuss three promising techniques to formally reason about Actor systems.  First\, inferring the concurrency structure of an actor program can facilitate targeted test generation.  Second\, predictive runtime verification can flag safety violations in future potential executions.  Finally\, statistical methods can improve confidence that a system obeys certain probabilistic properties.  I will conclude by discussing open problems and promising research directions. \nBio:\nDr. Gul Agha is Professor Emeritus of Computer Science at the University of Illinois at Urbana-Champaign and Interim CEO of Embedor Technologies.  Dr. Agha is a Fellow of the ACM and of the IEEE\, and a recipient of the 2019 ACM SIGSOFT Impact Paper Award.  He has served as Editor-in-Chief of IEEE Concurrency: Parallel\, Distributed and Mobile Computing\, and of ACM Computing Surveys.  Dr. 
Agha is best known for his formalization of the Actor model\, used in languages and frameworks such as Erlang\, Scala/Akka\, and Orleans\, and to develop scalable applications such as Twitter\, LinkedIn\, and Facebook Chat.  Dr. Agha’s other contributions include development of Statistical Model Checking\, which has applications in biological systems and cyberphysical systems among others; Concolic Testing for programs with dynamic memory and concurrency\, a method incorporated in industrial software testing tools such as KLEE\, Microsoft SAGE\, and S2E; Euclidean model checking for reasoning about the evolution of probability distributions and for synthesizing controllers; the application of computational learning to program verification; logical methods for automated decentralized\, predictive runtime verification; and distributed algorithms for wireless sensor networks (WSNs).  Dr. Agha co-founded Embedor Technologies\, which is applying WSNs to continually monitor the structural health of bridges\, buildings\, and large machinery.  For example\, Embedor’s technology was used to monitor the world’s largest Ferris wheel during construction\, and will be used to continuously monitor it during operation.
URL:https://isg.ics.uci.edu/event/personalization-of-pervasive-autonomy-applications-and-system-support/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20191213T150000
DTEND;TZID=America/Los_Angeles:20191213T160000
DTSTAMP:20260521T000718
CREATED:20191202T202712Z
LAST-MODIFIED:20191204T182807Z
UID:1242-1576249200-1576252800@isg.ics.uci.edu
SUMMARY:Scalable transaction and polystore data management in LeanXcale
DESCRIPTION:Speaker: Ricardo Jimenez-Peris (LeanXcale\, Spain)\, Patrick Valduriez (Inria\, France) \nAbstract:\nHybrid Transaction Analytical Processing (HTAP) is poised to revolutionize data management. By providing online analytics over operational data\, HTAP systems open up new opportunities in many application domains where real-time decision making is critical. Important use cases are proximity marketing\, real-time pricing\, risk monitoring\, real-time fraud detection\, etc. HTAP also simplifies data management\, by removing the traditional separation between operational database and data warehouse/data lake (no more ETLs!). However\, a hard problem is scaling out transactions in mixed operational and analytical workloads over big data\, possibly coming from different data stores (HDFS\, SQL\, NoSQL\, …). Today\, only a few NewSQL systems have solved this problem. In this talk\, we introduce the solution for scalable transaction and polystore data management in LeanXcale\, a new HTAP DBMS. \nBio:\nDr. Ricardo Jimenez-Peris was a professor at TU Madrid (UPM) for over 25 years. Four years ago he abandoned his scientific career to incorporate LeanXcale and became an entrepreneur. As a researcher he co-authored over 100 scientific articles and papers. He is co-author of the book “Replicated databases”. He is also co-inventor of two patents granted in the US and Europe that are being exploited by LeanXcale. While still at the university\, he transferred technology to industry: an in-memory database to Ericsson\, a federated database to Telefónica\, a replication system for J2EE to Bull\, and an orchestration engine for web services and its visual composer to Bull. He has been an invited speaker presenting LeanXcale technology at top tech companies such as Facebook\, Twitter\, Salesforce\, Heroku\, Cloudera\, Hortonworks\, MapR\, MongoDB\, Greenplum\, Microsoft\, IBM\, HP\, etc. \nPatrick Valduriez is a senior researcher at Inria\, working on distributed data management. 
He has authored over 300 technical papers and several textbooks\, among which one is “Principles of Distributed Database Systems”. He has been associate editor of major journals such as VLDBJ and DAPD and has served as PC chair or general chair of major conferences such as EDBT\, SIGMOD and VLDB. He received the best paper award at VLDB 2000. He was the recipient of the 1993 IBM scientific prize in Computer Science in France and the 2014 Innovation Award from Inria – French Academy of Science. He is an ACM Fellow.
URL:https://isg.ics.uci.edu/event/scalable-transaction-and-polystore-data-management-in-leanxcale/
LOCATION:DBH 5011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20191206T123000
DTEND;TZID=America/Los_Angeles:20191206T140000
DTSTAMP:20260521T000718
CREATED:20191113T025449Z
LAST-MODIFIED:20191122T162630Z
UID:1217-1575635400-1575640800@isg.ics.uci.edu
SUMMARY:AquaEIS: Middleware Support for Event Identification in Community Water Infrastructures
DESCRIPTION:Speaker: Quing Han \n  \nAbstract: Real-time event identification is critical in complex distributed infrastructures\, e.g.\, water systems\, where failures are difficult to isolate. We present AquaEIS\, an event-based middleware tailored to the problem of locating sources of failure (e.g.\, contamination) in community water infrastructures. The inherent complexity of underground hydraulic systems combined with aging infrastructure presents unique challenges. AquaEIS combines online learning techniques\, model-driven simulators and data from limited sensing networks to intelligently guide human participants (e.g.\, staff) in identifying contaminant sources. The framework integrates the necessary abstractions with event processing methods into a workflow that iteratively selects and refines the set of potential failure points for human-driven grab sampling. The integrated platform utilizes Hidden Markov Model (HMM) based representations along with field reports for event inference; reinforcement learning (RL) methods have also shown promise for further refining event locations and reducing the cost of human engagement. Our approach is evaluated in real-world water systems under a range of distinct events. The results show that AquaEIS can significantly reduce the number of sampling cycles while ensuring localization accuracy (detected 100% of the failure events as compared to a baseline that can only identify 38% of the events).
URL:https://isg.ics.uci.edu/event/aquaeis-middleware-support-for-event-identification-in-communitywater-infrastructures/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20191122T130000
DTEND;TZID=America/Los_Angeles:20191122T140000
DTSTAMP:20260521T000718
CREATED:20191115T192950Z
LAST-MODIFIED:20191122T203428Z
UID:1223-1574427600-1574431200@isg.ics.uci.edu
SUMMARY:Texera: Supporting Big Data Analytics for Domain Experts through GUI-based workflows
DESCRIPTION:Speakers:  Avinash Kumar\,  Shengquan Ni\, Zuozhi Wang \n  \nAbstract: Big data analytics is a daunting task for domain experts such as doctors and teachers. Their non-IT background makes it challenging for them to write analytics code and maintain computing infrastructures to efficiently process large amounts of data. Existing data analytics frameworks that offer GUI-based alternatives are mostly limited to a single machine. In light of these facts\, we are developing Texera\, a scalable data-processing system that supports interaction and debugging and is easy to use by non-IT people. One of the challenges of the system is supporting real-time interactions without compromising on scalability. We discuss how we solve the problem in Amber\, the backend engine of Texera. Amber\, built using the actor model\, is a cluster-based distributed engine that can process workflows while supporting interactions. We will show a demonstration of using Texera to analyze social media data.
URL:https://isg.ics.uci.edu/event/amber-a-debuggable-dataflow-system-based-on-theactor-model/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20191115T123000
DTEND;TZID=America/Los_Angeles:20191115T140000
DTSTAMP:20260521T000718
CREATED:20191115T192124Z
LAST-MODIFIED:20191115T192124Z
UID:1219-1573821000-1573826400@isg.ics.uci.edu
SUMMARY:Multistage Adaptive Load Balancing in Big Active Data Publish Subscribe Systems
DESCRIPTION:Speaker: Hang\nTime: 12:30pm\nRoom: 3011 \nWe address issues in the design and operation of a Big Active Data Publish Subscribe (BAD Pub/Sub) system to enable the next generation of enriched notification systems that can scale to societal levels. The proposed BAD Pub/Sub systems aim to ingest massive amounts of data from heterogeneous publishers and sources and deliver customized\, enriched notifications to end users that express interests in these data items via parameterized channels. To support scalability\, we employ a hierarchical architecture that combines a back-end big data cluster (to receive publications and data feeds\, store data\, and process subscriptions) with a client-facing distributed broker network that manages user subscriptions and scales the delivery process. The skewed distribution of subscribers and their interests\, together with the dynamic nature of societal-scale publications\, creates load imbalance in the distributed broker network. We propose a staged approach for broker load balancing that executes in multiple stages: initial placement of brokers to subscribers\, dynamic subscriber migration during operation to handle transient and instantaneous loads\, and occasional shuffles to re-stabilize the system.
URL:https://isg.ics.uci.edu/event/multistage-adaptive-load-balancing-in-big-active-data-publish-subscribe-systems/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20191108T123000
DTEND;TZID=America/Los_Angeles:20191108T140000
DTSTAMP:20260521T000718
CREATED:20191103T212621Z
LAST-MODIFIED:20191103T212621Z
UID:1215-1573216200-1573221600@isg.ics.uci.edu
SUMMARY:Gift Sinthong: AsterixDB Meets Data Science
DESCRIPTION:Abstract:\nIn the last few years\, Data Science has become an increasingly important use case for data platforms. To support the full Big Data analysis lifecycle\, we have examined one of the most popular exploratory data analytics tools\, Pandas\, which has a serious problem: scalability. Exploratory tools such as Pandas only work well against locally stored data that fits in the memory of a single machine. Our plan is to integrate a Pandas-like user experience with AsterixDB to provide analysts with a familiar working environment while scaling out the evaluation of the analytical operations over a large data cluster to enable Big Data analysis. The two main components that we use to enable such a workflow are the AsterixDB UDF framework and our new Python data analytics library (“AFrame”) that operates against AsterixDB. AFrame allows users to interact with a very large volume of semi-structured data in the same way that Pandas DataFrames work against locally stored tabular data. Influenced by Spark SQL and Spark DataFrames\, our AFrame prototype leverages lazy evaluation and only performs operations once an action is invoked. AFrame operations are incrementally translated into AsterixDB SQL++ queries that are executed only when final results are called for.  In this talk\, we will demonstrate our approach using a restaurant review analytics use case.
URL:https://isg.ics.uci.edu/event/gift-sinthong-asterixdb-meets-data-science/
LOCATION:DBH 4011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20190809T150000
DTEND;TZID=America/Los_Angeles:20190809T160000
DTSTAMP:20260521T000718
CREATED:20190624T221120Z
LAST-MODIFIED:20190809T003159Z
UID:1187-1565362800-1565366400@isg.ics.uci.edu
SUMMARY:Michal Shmueli-Scheuer : Conversational bots for customer support
DESCRIPTION:Conversational bots for customer support\nMichal Shmueli-Scheuer\, IBM Research – Haifa\n\nAugust 9\, 2019\, Friday\, 3:00 pm – 4:00 pm\, DBH 4011 \n\nAbstract:\nIn this talk\, I’ll cover various aspects of conversational bots\, focusing on the domain of customer support. Often\, human conversations with bots mimic the way humans interact with each other. Moreover\, even when customers know that they are interacting with virtual agents (bots)\, they still expect them to behave like humans. One way to improve interactions with bots is by giving them some human characteristics\, such as emotion and personality. I’ll show how a model of neural response generation can be used to generate bot responses according to a target personality. I’ll then cover a methodology for detecting egregious conversations in a setting using conversational bots by examining behavioral cues from the customer\, patterns in the agents’ responses\, and customer-agent interactions. \nBio:\nMichal Shmueli-Scheuer is a senior researcher in the Language and Retrieval research group (AI Language department) in IBM Research – Haifa\, with over 13 years of industry experience. She holds a Ph.D. degree (2009) in Information and Computer Science from the University of California\, Irvine\, USA. Her areas of expertise include conversational bots\, affective computing\, user modeling\, large-scale analytics\, databases\, and information systems\, focusing on user behavior analytics and information management on the web. Her current work focuses on computational modeling for users and conversational agents. She has published more than 30 academic papers in leading conferences and journals\, as well as book chapters. She has served as a PC member and a reviewer of numerous leading conferences and journals.
URL:https://isg.ics.uci.edu/event/michal-shmueli-scheuer-conversational-bots-for-customer-support/
LOCATION:DBH 4011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20190531T150000
DTEND;TZID=America/Los_Angeles:20190531T160000
DTSTAMP:20260521T000718
CREATED:20190415T193502Z
LAST-MODIFIED:20190530T145140Z
UID:1174-1559314800-1559318400@isg.ics.uci.edu
SUMMARY:Pat Helland: There's No Substitute for Interchangeability
DESCRIPTION:Speaker: Pat Helland (Salesforce.com) \nTitle: There’s No Substitute for Interchangeability\nTime: 3-4 PM \nPlace: 3011 DBH \n\nAbstract:\n\nDistributed systems have many challenges including loosely coupled systems\, long running work\, and distributed workflow.  In addition\, replication with out-of-order reconciliation is quite difficult\, especially when composed with the other challenges.  In this talk\, we propose data-centric REST-style connectors that allow work to be decoupled in trust\, space\, and time.  These replication connectors support replicated services with eventual consistency.\n\nAs work comes together\, domain specific interchangeability provides the glue and lubrication that empowers work that’s decoupled across boundaries.  We see this in real life with commodities in food and interchangeable parts in manufacturing.  Work domains for the application typically support interchangeable resources like seats on an airplane or king-sized non-smoking rooms.  It is the interchangeability defined by an application domain that allows work to proceed across boundaries.  These boundaries include applications and their trust\, space and time across workflows\, and replicas of any and all of the services comprising the app.\n\nThe ideas presented in this talk are the author’s and are not tied to his work at Salesforce.\n\n--------------------\n\nBio:\n\nPat Helland has been implementing databases\, transaction systems\, multiprocessors\, messaging systems\, app platforms\, and storage systems since 1978.  He studied computer science at UC Irvine from 1973 to 1976.  He currently works at Salesforce.  For recreation\, Pat likes to write papers about random stuff related to distributed computing and data for ACM Queue and Communications of the ACM.
URL:https://isg.ics.uci.edu/event/pat-helland-theres-no-substitute-for-interchangeability/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20190321T140000
DTEND;TZID=America/Los_Angeles:20190321T150000
DTSTAMP:20260521T000718
CREATED:20190311T060555Z
LAST-MODIFIED:20190311T060555Z
UID:1144-1553176800-1553180400@isg.ics.uci.edu
SUMMARY:Fatemeh Nargesian: Data Enrichment for Data Science
DESCRIPTION:Data Enrichment for Data Science\nFatemeh Nargesian\, University of Toronto\n\nMarch 21\, Thursday\, 2019\n\n2:00 – 3:00 pm\, DBH 3011\nRefreshments start at 1:30 pm\n\nData Science is built on the power of data processing and data preparation. In this talk\, I discuss the challenges of data preparation for end-to-end data science. Particularly\, I talk about data enrichment via discovery\, where the goal is to discover and integrate the right data to solve a given data science problem. I introduce two paradigms of data discovery. In the first paradigm\, the query is a dataset and a data scientist is interested in interactively finding datasets that can be integrated (e.g.\, unioned) with the query. I introduce a probabilistic framework for searching for top-k unionable tables and discuss the need for distribution-aware techniques for data discovery. In the second paradigm\, search does not start with a query. Instead\, it is data-driven. I will talk about the data lake organization problem\, where the goal is to build a directory structure that enables users to most efficiently navigate data lakes. I will present a navigation model of how users interact with a directory structure and introduce a scalable local search algorithm for optimizing data lake organizations.\n\n--------------------------------------\n\nBio:\nFatemeh Nargesian is a PhD candidate in the Data Curation Group of the Department of Computer Science at University of Toronto. Her primary research interests are in the data management challenges of end-to-end data science. A paper she co-authored on data discovery was accorded the Best Demonstration Award at VLDB 2017. While at University of Toronto\, Fatemeh was a joint Research intern at IBM Research-NY. Prior to University of Toronto\, she worked on clinical data management at the Clinical Informatics Research Group at McGill University\, and received M.Sc. 
degrees in Computer Science at University of Ottawa and Artificial Intelligence at Sharif University of Technology.
URL:https://isg.ics.uci.edu/event/fatemeh-nargesian-data-enrichment-for-data-science/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20190318T110000
DTEND;TZID=America/Los_Angeles:20190318T120000
DTSTAMP:20260521T000718
CREATED:20190311T174005Z
LAST-MODIFIED:20190311T174005Z
UID:1146-1552906800-1552910400@isg.ics.uci.edu
SUMMARY:Xiangyao Yu: Transaction Processing at Scale
DESCRIPTION:Abstract: \nOnline transaction processing (OLTP) is critical for applications including finance\, e-commerce\, social networks\, and healthcare. The increasing performance demands of these applications require OLTP to scale massively. Concurrency control is a major scalability bottleneck in such systems. \nThis talk presents three projects that identify and help resolve scalability challenges. First\, I present a scalability study of concurrency control on a simulated 1000-core processor and show the bottlenecks that constrain the scaling of classic algorithms. Then\, I present a new protocol called TicToc that removes the bottleneck of central timestamp allocation on multicore processors. The key technique is data-driven timestamp management that dynamically calculates each transaction’s timestamp based on its data access pattern. Finally\, I present Sundial\, a distributed concurrency control scheme that mitigates the bottleneck of long network latency through a lightweight caching protocol. The talk ends with a vision of transaction processing in the era of cloud computing and internet of things. \nBio: \nXiangyao Yu is a postdoctoral associate at Computer Science and Artificial Intelligence Lab (CSAIL) at Massachusetts Institute of Technology (MIT) under the supervision of Prof. Michael Stonebraker. He acquired his Ph.D. at MIT in 2017 and B.S. at Tsinghua University in 2012. His research interest centers on databases with additional expertise in computer architecture. He won the best Ph.D. thesis award in EECS at MIT and has three best paper awards or nominations. \nhttp://people.csail.mit.edu/yxy \nFor directions/parking information\, please visit: \nhttps://uci.edu/visit/maps.php and http://www.ics.uci.edu/about/visit/index.php
URL:https://isg.ics.uci.edu/event/xiangyao-yu-transaction-processing-at-scale/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20190208T150000
DTEND;TZID=America/Los_Angeles:20190208T160000
DTSTAMP:20260521T000718
CREATED:20190125T030748Z
LAST-MODIFIED:20190125T210119Z
UID:1054-1549638000-1549641600@isg.ics.uci.edu
SUMMARY:Prof. Sang-Woo Jun: Lowering the cost of large-scale data analytics via efficient use of flash storage
DESCRIPTION:In this talk\, I present the storage systems aspect of the ongoing work on using relatively cheap solid-state secondary storage to replace expensive DRAM for analytics on large amounts of data\, using as examples graph analytics and the bioinformatics application somatic mutation finding.\nBoth applications are inherently random access intensive\, which is a bad fit with the coarse access granularity of secondary storage.\nIn both cases\, the performance impact of random access is addressed in two ways: reorganizing the computation and access pattern to remove random access\, and modifying the internals of the storage device controller to optimize for random accesses.\nIn both cases\, a desktop-class system with secondary storage was able to match the performance of a system with an order of magnitude higher cost.\n\nBio:\n\nSang-Woo Jun is a faculty member at the Donald Bren School of Information and Computer Sciences at the University of California\, Irvine. His research focus is on computer architecture and system design for lowering the cost of handling large amounts of data. Before joining UCI\, Sang-Woo did his Ph.D. with Professor Arvind at the Massachusetts Institute of Technology\, and his B.S. at Seoul National University in Korea. His industry experience includes an internship with the Big Data Discovery team at Oracle and work as a server/client developer at Nexon Inc.
URL:https://isg.ics.uci.edu/event/prof-sang-woo-jun-lowering-the-cost-of-large-scale-data-analytics-via-efficient-use-of-flash-storage/
LOCATION:DBH 3011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20190201T110000
DTEND;TZID=America/Los_Angeles:20190201T120000
DTSTAMP:20260521T000718
CREATED:20190125T020959Z
LAST-MODIFIED:20190125T211117Z
UID:1045-1549018800-1549022400@isg.ics.uci.edu
SUMMARY:Prof. Jeff Ullman: Data Science:  Is it Real?
DESCRIPTION:ABSTRACT: \nWe shall discuss the various ways in which data science is approached by different communities\, including the Statistics\, Machine-Learning\, and Database communities.  Each presents a different viewpoint and values different outcomes. Some consequences of these approaches will be discussed. As an example of why data science is not machine learning\, we shall sketch two important algorithms not in this class: locality-sensitive hashing and approximate counting. \nBIOGRAPHY: \nJeffrey David Ullman is the Stanford W. Ascherman Professor of Engineering (Emeritus) in the Department of Computer Science at Stanford and CEO of Gradiance Corp.  He received the B.S. degree in Engineering Mathematics from Columbia in 1963 and the PhD in Electrical Engineering from Princeton in 1966. Prior to his appointment at Stanford in 1979\, he was a member of the technical staff of Bell Laboratories from 1966-1969\, and on the faculty of Princeton University between 1969 and 1979.  From 1990-1994\, he was chair of the Stanford Computer Science Department. Prof. Ullman was elected to the US National Academy of Engineering in 1989 and the American Academy of Arts and Sciences in 2012\, and has held Guggenheim and Einstein Fellowships. He has received the SIGMOD Contributions Award (1996)\, the ACM Karl V. Karlstrom Outstanding Educator Award (1998)\, the Knuth Prize (2000)\, the SIGMOD E. F. Codd Innovations award (2006)\, the IEEE von Neumann medal (2010)\, and the NEC C&C Foundation Prize (2017).  He is the author of 16 books\, including books on database systems\, data mining\, compilers\, automata theory\, and algorithms. \nURL:  i.stanford.edu/~ullman
URL:https://isg.ics.uci.edu/event/prof-jeff-ullman-visit/
LOCATION:DBH 6011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20190125T150000
DTEND;TZID=America/Los_Angeles:20190125T160000
DTSTAMP:20260521T000718
CREATED:20181221T022339Z
LAST-MODIFIED:20190131T004734Z
UID:827-1548428400-1548432000@isg.ics.uci.edu
SUMMARY:David Lomet (Microsoft Research): How Data Caching Systems Succeed
DESCRIPTION:Data in traditional “caching” data systems resides on secondary storage\, and is read into main memory only when operated on. This limits system performance. Main memory data stores with data always in main memory are much faster. But this performance comes at a cost. In this paper\, we analyze the costs of both in-memory operations and secondary storage operations where data is not “in cache”. We study the performance impact of cache misses on caching system performance. The analysis considers both execution and storage costs. Based on our analysis\, we derive cost/performance results for a data caching system [Deuteronomy and its Bw-tree] and a main memory system [MassTree] to understand where each demonstrates the best cost per operation\, what is driving the cost differences\, and the scale of the differences. This analysis (1) provides insight into why data caching systems continue to dominate the market; (2) points to higher performance that does not rely on simply increasing main memory cache size; and (3) suggests a path to lower costs and hence better cost/performance. \n  \nSpeaker Bio: \nDavid Lomet founded the Database Group at Microsoft Research Redmond in 1995 and managed it for 20 years. His research career began at IBM where\, while on a 1975-76 sabbatical at the University of Newcastle-on-Tyne\, he invented atomic actions (a form of transactions). He later worked at Wang Institute as a faculty member\, and at Digital Equipment Corporation as a software architect and research staff member. He received a Ph.D. in computer science from the University of Pennsylvania. Lomet’s primary focus has been the engineering of database systems\, with a focus on database system kernels. His work on concurrency control and recovery contributed to making DEC’s Rdb and Microsoft’s SQL Server database management systems leaders in cost/performance. 
 His Deuteronomy research project’s latch-free Bw-tree index and log-structured store are key elements in Microsoft’s Hekaton main memory database and Azure Cosmos DB cloud data service. Deuteronomy won the Microsoft Research Redmond “2017 Best Research Project” Award. Lomet is an author of over 120 papers and over 60 patents. Lomet has won IEEE awards as well as the ACM SIGMOD Contributions Award for his 25-year tenure as EIC of the IEEE Data Engineering Bulletin. He has also served as editor of ACM TODS\, VLDB Journal\, and others\, and has been a member of the VLDB Board. He has been a PC co-chair for ICDE and VLDB. He is a member of the IEEE Computer Society Board of Governors and society Secretary\, and has been First Vice President and Treasurer. He is a fellow of IEEE\, ACM\, and AAAS\, and a member of the National Academy of Engineering.
URL:https://isg.ics.uci.edu/event/speaker-david-lomet-microsoft-research-cost-performance-in-modern-data-stores-how-data-caching-systems-succeed/
LOCATION:DBH 4011
ATTACH;FMTTYPE=image/png:https://isg.ics.uci.edu/wp-content/uploads/2018/10/Microsoft-logo_rgb_wht.png
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20181207T150000
DTEND;TZID=America/Los_Angeles:20181207T160000
DTSTAMP:20260521T000718
CREATED:20181222T021251Z
LAST-MODIFIED:20260401T210254Z
UID:838-1544194800-1544198400@isg.ics.uci.edu
SUMMARY:Vinayak Borkar (FireEye Inc.): The X15 Machine Data Management Platform
DESCRIPTION:ABSTRACT: Machine Data (aka Log Data) is continuously produced by applications and devices as a result of human-computer and computer-computer interactions. Although most of this data was initially generated for ad-hoc human consumption to aid with debugging and troubleshooting systems and deployments\, its systematic treatment using well-known data processing techniques can unlock valuable insight about operations within organizations. Log Data may sometimes be the only feasible source of some of this insight. Processing Log Data poses new challenges as compared to traditional data management. Firstly\, Log Data is mostly textual\, requiring ad-hoc parsing to surface its meaning. Secondly\, Log Data tends to be extremely voluminous and is generated with high velocity\, posing a unique set of challenges in its acquisition and storage in a cost-effective manner. Lastly\, Log Data requires new query languages and techniques to unlock its full potential. X15 is a scalable purpose-built platform to acquire\, store\, and manage Machine Data. In this talk\, we will look at X15’s architecture\, some of its design decisions\, and use cases. \nBIO: Vinayak Borkar is Vice President of Engineering at FireEye\, Inc. Prior to FireEye’s acquisition of X15\, he was a co-founder and the CTO of X15 Software. Prior to X15\, Vinayak was the CTO and head of engineering at DVO Software and the inventor and lead contributor of Hyracks\, an open-source parallel data processing platform built at UC Irvine. He began his career at BEA Systems\, where he led the development of the first XQuery-based data integration platform. Vinayak was the recipient of the Facebook Fellowship award for his contribution in the area of large-scale data management and holds a BS in Computer Science and Engineering from the University of Mumbai\, an MS in Computer Science and Engineering from the Indian Institute of Technology Bombay\, and a PhD in Information and Computer Science from the University of California at Irvine.
URL:https://isg.ics.uci.edu/event/vinayak-borkar-fireeye-inc-the-x15-machine-data-management-platform/
LOCATION:DBH 4011
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=America/Los_Angeles:20181116T150000
DTEND;TZID=America/Los_Angeles:20181116T160000
DTSTAMP:20260521T000718
CREATED:20181222T021505Z
LAST-MODIFIED:20181222T021505Z
UID:842-1542380400-1542384000@isg.ics.uci.edu
SUMMARY:Dr. Andrey Balmin and Mayank Pradhan (Workday): Workday Prism Analytics: Unifying Interactive and Batch Data Processing Using Apache Spark
DESCRIPTION:Abstract: Workday Prism Analytics enables data discovery and interactive Business Intelligence analysis for Workday customers. To prepare data for analysis\, business users can set up data transformation pipelines in an interactive\, self-service\, modern data prep environment. Thus\, Workday Prism Analytics needs to run three types of scalable data processing applications: “always on” query engine and data prep applications\, and on-demand batch execution of transformation pipelines. We standardized on Apache Spark and Spark SQL for all three applications because of their scalability and the flexibility and extensibility of Spark’s Catalyst compiler. All applications share much of the compilation and execution code\, except for sampling\, caching\, and result extraction. \n  \nIn this talk\, we will first introduce Workday and then Prism Analytics. We will then zoom into the Spark-based interactive and batch data processing components of Prism Analytics. We will then describe the data prep transformations and their compilation into Spark DataFrames\, through Spark SQL Catalyst plans\, in both interactive and batch modes. We will focus on some challenges we encountered while compiling and executing complex pipelines and queries. For example\, Spark SQL compilation times exceeded execution times for some low-latency queries\, and compiled plans grew dangerously large for data prep pipelines with multiple self-joins and self-unions. We will describe caching\, sampling\, and query compilation techniques that allow us to support an interactive user experience. This includes a join co-sampling component that improves system usability when joining large datasets. Finally\, we will conclude with an overview of the open challenges that we plan to tackle in the future. \n  \nBios: \nDr. Andrey Balmin is a Sr. Principal Engineer at Workday\, where he is building the self-service Prism Analytics platform\, continuing the work he began at Platfora (which was acquired by Workday in 2016). Prior to this\, he was a Research Staff Member at IBM Almaden Research Center\, where he focused on search and query processing of semi-structured and graph-structured data in Data Warehousing and\, later\, Big Data platforms. He holds a Ph.D. degree in Computer Science from UC San Diego. \n  \nMayank Pradhan is a Senior Engineering Manager at Workday focused on building the analytics platform for Workday’s customers. His team develops the backend engines in the cloud that run a variety of workloads\, including data ingestion\, interactive data preparation\, and OLAP cubes for large-scale\, high-volume interactive querying. Prior to Workday\, Mayank worked on industry-shifting data processing products such as Platfora\, ParAccel (the columnar database technology behind Redshift)\, and IBM DB2. He has 18 years of industry experience building distributed databases. He received his M.S. in Computer Science from Santa Clara University and his B.S. in Computer Science from Pune University\, India.
URL:https://isg.ics.uci.edu/event/dr-andrey-balmin-and-mayank-pradhan-workday-workday-prism-analytics-unifying-interactive-and-batch-data-processing-using-apache-spark/
LOCATION:DBH 3011
END:VEVENT
END:VCALENDAR