ISG Talks are sponsored by Couchbase.
Fatemeh Nargesian: Data Enrichment for Data Science
March 21, 2019 @ 2:00 pm - 3:00 pm
Fatemeh Nargesian, University of Toronto
Thursday, March 21, 2019
2:00 – 3:00 pm, DBH 3011
Refreshments start at 1:30 pm
Data science is built on the power of data processing and data preparation. In this talk, I discuss the challenges of data preparation for end-to-end data science. In particular, I talk about data enrichment via discovery, where the goal is to discover and integrate the right data to solve a given data science problem. I introduce two paradigms of data discovery. In the first paradigm, the query is a dataset, and a data scientist is interested in interactively finding datasets that can be integrated (e.g., unioned) with the query. I introduce a probabilistic framework for searching for the top-k unionable tables and discuss the need for distribution-aware techniques for data discovery. In the second paradigm, search does not start with a query; instead, it is data-driven. I will talk about the data lake organization problem, where the goal is to build a directory structure that enables users to navigate data lakes most efficiently. I will present a navigation model of how users interact with a directory structure and introduce a scalable local search algorithm for optimizing data lake organizations.
Bio:
Fatemeh Nargesian is a PhD candidate in the Data Curation Group of the Department of Computer Science at the University of Toronto. Her primary research interests are in the data management challenges of end-to-end data science. A paper she co-authored on data discovery received the Best Demonstration Award at VLDB 2017. While at the University of Toronto, Fatemeh was a joint research intern at IBM Research-NY. Before joining the University of Toronto, she worked on clinical data management at the Clinical Informatics Research Group at McGill University, and received M.Sc. degrees in Computer Science from the University of Ottawa and in Artificial Intelligence from Sharif University of Technology.