ISG Talks are sponsored by Couchbase.

Past Events

November 2018

Sushant Jain: Large scale and low latency data distribution from database to servers

November 2, 2018 @ 11:00 am - 12:00 pm
DBH 6011

Many applications at Google store their data in a transactional database (the source of truth), while servers distributed worldwide require the same data. For efficient and fast computation, these servers keep the data in memory. Further, the database changes continuously, and we need to update the in-memory view on this large number of servers in real time. For example, in the Google Search Ads application, advertiser configuration is stored in a database, and this data is loaded into the memory of various servers to compute ads in a scalable and fast way. In this talk, we describe our solution to this data distribution problem and the challenges that we encountered in providing a highly reliable and low-latency service.

Dr. Andrey Balmin and Mayank Pradhan (Workday): Workday Prism Analytics: Unifying Interactive and Batch Data Processing Using Apache Spark

November 16, 2018 @ 3:00 pm - 4:00 pm
DBH 3011

Abstract: Workday Prism Analytics enables data discovery and interactive Business Intelligence analysis for Workday customers. To prepare data for analysis, business users can set up data transformation pipelines in an interactive, self-service, modern data prep environment. Thus, Workday Prism Analytics needs to run three types of scalable data processing applications: "always on" query engine and data prep applications, and on-demand batch execution of transformation pipelines. We standardized on Apache Spark and Spark SQL for all three applications, due to its scalability,…

December 2018

Vinayak Borkar (FireEye Inc.): The X15 Machine Data Management Platform

December 7, 2018 @ 3:00 pm - 4:00 pm
DBH 4011

ABSTRACT: Machine Data (aka Log Data) is continuously produced by applications and devices as a result of human-computer and computer-computer interactions. Although most of this data was initially generated for ad-hoc human consumption to aid with debugging and troubleshooting systems and deployments, its systematic treatment using well-known data processing techniques can unlock valuable insight about operations within organizations. Log Data may sometimes be the only feasible source of some of this insight. Processing Log Data poses new challenges as compared to…

January 2019

David Lomet (Microsoft Research): How Data Caching Systems Succeed

January 25, 2019 @ 3:00 pm - 4:00 pm
DBH 4011

Data in traditional "caching" data systems resides on secondary storage, and is read into main memory only when operated on. This limits system performance. Main memory data stores with data always in main memory are much faster. But this performance comes at a cost. In this paper, we analyze the costs of both in-memory operations and secondary storage operations where data is not "in cache". We study the performance impact of cache misses on caching system performance. The analysis considers both execution and storage costs. Based on our analysis, we derive cost/performance results for a data caching system and a main memory system to understand where each demonstrates the best cost per operation, what is driving the cost differences, and the scale of the differences. This analysis (1) provides insight into why data caching systems continue to dominate the market; (2) points to higher performance that does not rely on simply increasing main memory cache size; and (3) suggests a path to lower costs and hence better cost/performance.

February 2019

Prof. Jeff Ullman: Data Science: Is it Real?

February 1, 2019 @ 11:00 am - 12:00 pm
DBH 6011

ABSTRACT: We shall discuss the various ways in which data science is approached by different communities, including the Statistics, Machine-Learning, and Database communities. Each presents a different viewpoint and values different outcomes. Some consequences of these approaches will be discussed. As an example of why data science is not machine learning, we shall sketch two important algorithms not in this class: locality-sensitive hashing and approximate counting. BIOGRAPHY: Jeffrey David Ullman is the Stanford W. Ascherman Professor of Engineering (Emeritus) in…

Prof. Sang-Woo Jun: Lowering the cost of large-scale data analytics via efficient use of flash storage

February 8, 2019 @ 3:00 pm - 4:00 pm
DBH 3011

In this talk, I present the storage systems aspect of the ongoing work on using relatively cheap solid-state secondary storage to replace expensive DRAM for analytics on large amounts of data, using as examples graph analytics and the bioinformatics application of somatic mutation finding. Both applications are inherently random-access intensive, which is a bad fit with the coarse access granularity of secondary storage. In both cases, the performance impact of random access is addressed in two ways: reorganizing the computation and…

March 2019

Xiangyao Yu: Transaction Processing at Scale

March 18, 2019 @ 11:00 am - 12:00 pm
DBH 3011

Abstract: Online transaction processing (OLTP) is critical for applications including finance, e-commerce, social networks, and healthcare. The increasing performance demands of these applications require OLTP to scale massively. Concurrency control is a major scalability bottleneck in such systems. This talk presents three projects that identify and help resolve scalability challenges. First, I present a scalability study of concurrency control on a simulated 1000-core processor and show the bottlenecks that constrain the scaling of classic algorithms. Then, I present a new…

Fatemeh Nargesian: Data Enrichment for Data Science

March 21, 2019 @ 2:00 pm - 3:00 pm
DBH 3011

Speaker: Fatemeh Nargesian (University of Toronto). Refreshments start at 1:30 pm. Abstract: Data Science is built on the power of data processing and data preparation. In this talk, I discuss the challenges of data preparation for end-to-end data science. Particularly, I talk about data enrichment via discovery where the goal is to discover and integrate the right data to solve a given data science problem. I introduce…

May 2019

Pat Helland: There’s No Substitute for Interchangeability

May 31, 2019 @ 3:00 pm - 4:00 pm
DBH 3011

Speaker: Pat Helland (Salesforce.com). Abstract: Distributed systems have many challenges, including loosely coupled systems, long-running work, and distributed workflow. In addition, replication with out-of-order reconciliation is quite difficult, especially when composed with the other challenges. In this talk, we propose data-centric REST-style connectors that allow work to be decoupled in trust, space, and time. These replication connectors support replicated services with eventual consistency. As work comes together,…
