Nov 20, 2009 | SPEAKER: Daniel Massaguer TITLE: SATware: a Semantic Approach for Sentient Spaces -- PhD defense practice talk |
| Details |
| Date and Time | Nov 20, 2009 3:30pm | | Location | DBH 3011 |
| Speaker | Daniel Massaguer | | Title | SATware: a Semantic Approach for Sentient Spaces -- PhD defense practice talk | | Abstract | Large and pervasive sensing, communications, and computing
infrastructures are enabling the realization of sentient
spaces. Enabling such spaces, however, encompasses a set of
challenges. First, programming each application such that
it connects to each sensor and it interprets the data being sensed
requires a concentration of expertise that is rarely
available. Second, achieving a wise and fair usage of the
infrastructures is impossible with current approaches due to their
lack of awareness of domain and application semantics. This talk
summarizes a PhD dissertation that focuses on designing and
implementing a middleware that addresses these challenges and
overcomes the limitations of previous approaches by featuring a
distributed streaming architecture and by being aware of the semantics
of the space and applications. Namely, we focus on (i) the design and
implementation of the overall system architecture and its underlying
programming and execution model, (ii) a set of mechanisms to provide
the right level of abstraction to applications, and (iii) a set of
mechanisms that are able to protect privacy due to the inclusion of
semantics in the middleware. |
|
Nov 20, 2009 | SPEAKER: Jingren Zhou, Microsoft TITLE: SCOPE: Parallel Data Processing of Massive Data Sets |
| Details |
| Date and Time | Nov 20, 2009 2pm | | Location | DBH 4011 |
| Speaker | Jingren Zhou, Microsoft | | Title | SCOPE: Parallel Data Processing of Massive Data Sets | | Abstract | Companies providing cloud-scale services have an increasing need to store and analyze massive data sets such as search logs and click streams. For cost and performance reasons, processing is typically done on large clusters of shared-nothing commodity machines. It is imperative to develop a programming model that hides the complexity of the underlying system but provides flexibility by allowing users to extend functionality to meet a variety of requirements.
In this talk, we present a new declarative and extensible scripting language, SCOPE (Structured Computations Optimized for Parallel Execution), targeted for this type of massive data analysis at Microsoft. The language is designed for ease of use with no explicit parallelism, while being amenable to efficient parallel execution on large clusters. SCOPE borrows several features from SQL. Data is modeled as sets of rows composed of typed columns. The select statement is retained with inner joins, outer joins, and aggregation allowed. Users can easily define their own functions and implement their own versions of operators: extractors (parsing and constructing rows from a file), processors (row-wise processing), reducers (group-wise processing), and combiners (combining rows from two inputs). SCOPE supports nesting of expressions but also allows a computation to be specified as a series of steps, in a manner often preferred by programmers. We also describe how scripts are compiled and optimized into efficient, parallel execution plans and executed on large clusters. |
|
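As a rough illustration of the four user-defined operator roles named in the SCOPE abstract above (extractors, processors, reducers, combiners), here is a minimal Python sketch. It is not SCOPE syntax or Microsoft's implementation; the function names, the tab-separated log format, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict

def extractor(lines):
    """Extractor: parse raw text lines into rows with typed columns."""
    for line in lines:
        query, latency = line.split("\t")
        yield {"query": query, "latency_ms": int(latency)}

def processor(rows):
    """Processor: row-wise processing; here, normalize the query text."""
    for row in rows:
        yield {**row, "query": row["query"].strip().lower()}

def reducer(rows, key):
    """Reducer: group-wise processing; here, count and average per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["latency_ms"])
    for k, values in sorted(groups.items()):
        yield {key: k, "count": len(values),
               "avg_latency_ms": sum(values) / len(values)}

def combiner(left, right, key):
    """Combiner: combine rows from two inputs; here, an inner join on `key`."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    for l in left:
        for r in index[l[key]]:
            yield {**l, **r}

# Hypothetical inputs: a tiny search log and a lookup table.
log = ["Hadoop\t120", "hadoop \t80", "SCOPE\t50"]
categories = [{"query": "hadoop", "category": "systems"}]

stats = list(reducer(processor(extractor(log)), "query"))
joined = list(combiner(stats, categories, "query"))
```

Each stage consumes and produces rows, so the stages can be chained as a series of steps, which is the style the abstract says programmers often prefer.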
Nov 18, 2009 | SPEAKER: Sihem Amer-Yahia,
Senior Research Scientist,
Yahoo!
TITLE: I’ll Have What She’s Having:
Recommendations in Social
Content Sites
|
| Details |
| Date and Time | Nov 18, 2009 12:30pm | | Location | DBH 6011 |
| Speaker | Sihem Amer-Yahia,
Senior Research Scientist,
Yahoo!
| | Title | I’ll Have What She’s Having:
Recommendations in Social
Content Sites
| | Abstract | In this talk Sihem will examine the challenges behind recommendations in
social content sites. She will use collaborative tagging sites (think
del.icio.us, YouTube and Yahoo! Travel) as her application and report on
her experiments in harvesting the collective tagging behavior to serve
relevant content (think URLs, videos, and travel destinations) to users.
She will also address well-known and lesser-known problems in
recommender systems such as over-specialization and data management
for the masses. In conclusion Sihem will take open questions like
recommendation timeliness and results presentation.
| | Speaker Bio | Sihem Amer-Yahia has been a research scientist at Yahoo! Labs since June 2006.
Before that, she was a member of the technical staff at AT&T Labs for seven years.
Her research is on data management and query processing in search,
recommendations and advertising. Sihem was co-editor of the W3C
Recommendation on XML Full-Text search.
|
|
Nov 16, 2009 | SPEAKER: Yanlei Diao TITLE: Addressing New Challenges in Data Stream Processing |
| Details |
| Date and Time | Nov 16, 2009 11am | | Location | DBH 3011 |
| Speaker | Yanlei Diao | | Title | Addressing New Challenges in Data Stream Processing | | Abstract | Data stream processing has found application in many areas including object tracking and monitoring, network monitoring, and business analytics. While the foundation for data stream processing has been developed in prior work, recent real-world deployments are raising a host of new challenges.
The first challenge that we address regards the need to extend the data model for stream processing from set-based to sequence-based. The resulting paradigm for stream processing, which we call complex event processing, supports the filtering, correlation, and aggregation of events for complex pattern matching. In the main part of my talk, I present the SASE+ system that supports a wide variety of pattern matching features, with theoretical results on the expressive power of our language, an automata-based implementation, and a suite of optimization techniques. Other challenges to data stream systems include the need to handle data uncertainty and the need to archive and index data streams to answer continuous queries. In the rest of the talk, I survey two other projects that address these challenges.
| | Speaker Bio | Yanlei Diao is an Assistant Professor at the Department of Computer Science, University of Massachusetts Amherst. Her research interests are in information architectures and data management systems, with a focus on data streams, uncertain data management, flash memory databases, and XML query processing. She received her PhD in Computer Science from the University of California, Berkeley in 2005, her M.S. in Computer Science from the Hong Kong University of Science and Technology in 2000, and her B.S. in Computer Science from Fudan University in China in 1998.
Yanlei Diao is a recipient of the NSF Career Award and finalist for the Microsoft Research New Faculty Fellowship. She spoke at the Distinguished Faculty Lecture Series at the University of Texas at Austin in December 2005. Her PhD dissertation “Query Processing for Large-Scale XML Message Brokering” won the 2006 ACM-SIGMOD Dissertation Award Honorable Mention. She has served on the program committees for many international conferences and the organization committees for SIGMOD and DMSN. She is a main contributor to YFilter 1.0 (http://yfilter.cs.umass.edu/code_release.htm), a high-performance filtering system over XML message streams.
|
|
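The sequence-based pattern matching described in the stream processing abstract above can be sketched with a small automaton-style matcher. This is a generic illustration, not the SASE+ language or its implementation; the `Event` fields, the `match_sequence` function, and the sample RFID-like stream are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    type: str   # event type, e.g. "SHELF"
    tag: str    # correlation attribute, e.g. an item tag id
    ts: int     # timestamp

def match_sequence(stream, pattern, window):
    """Return event sequences whose types follow `pattern` in order,
    share the same tag, and start at most `window` time units before
    the current event. Each partial match is one run of the automaton."""
    partial, matches = [], []
    for ev in stream:
        # Drop runs that can no longer complete inside the window.
        partial = [run for run in partial if ev.ts - run[0].ts <= window]
        extended = []
        # The empty run lets any event start a new match.
        for run in partial + [[]]:
            same_tag = not run or ev.tag == run[0].tag
            if ev.type == pattern[len(run)] and same_tag:
                new_run = run + [ev]
                if len(new_run) == len(pattern):
                    matches.append(new_run)
                else:
                    extended.append(new_run)
        # Old runs are kept too, so later events may also extend them.
        partial.extend(extended)
    return matches

stream = [
    Event("SHELF", "t1", 1), Event("SHELF", "t2", 2),
    Event("COUNTER", "t1", 3), Event("EXIT", "t2", 4),
    Event("EXIT", "t1", 5), Event("COUNTER", "t2", 20),
    Event("EXIT", "t2", 21),
]
matches = match_sequence(stream, ["SHELF", "COUNTER", "EXIT"], window=10)
```

Only tag `t1` completes the pattern within the window; the events for `t2` arrive too late to extend its partial match.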
Nov 13, 2009 | SPEAKER: Srinivas Narayanan, Software Engineer, Facebook TITLE: Scalable Data Management at Facebook |
| Details |
| Date and Time | Nov 13, 2009 2pm | | Location | DBH 4011 |
| Speaker | Srinivas Narayanan, Software Engineer, Facebook | | Title | Scalable Data Management at Facebook | | Abstract | Facebook is one of the most trafficked websites with over 300 million active users generating hundreds of billions of page views a month. In this talk, I’ll present an overview of the data infrastructure that powers Facebook. I’ll cover aspects of the live MySQL + memcached based infrastructure as well as the offline Hive/Hadoop based infrastructure. I’ll also talk about lessons learnt in running the site at this large scale, and discuss some interesting open problems, challenges and opportunities. |
|
Nov 10, 2009 | SPEAKER: Yueting Zhuang TITLE: Digital Libraries and its potential in-depth applications |
| Details |
| Date and Time | Nov 10, 2009 11am | | Location | DBH 3011 |
| Speaker | Yueting Zhuang | | Title | Digital Libraries and its potential in-depth applications | | Abstract | Digital libraries have gained great popularity in many countries during the past decade. In 2002, the Chinese government sponsored the project “China America Digital Academic Library (CADAL)” to digitize one million books. In building the CADAL digital library, a one-stop solution has been formed that covers book digitization, storage, OCR and dissemination. Recently, we developed a novel correlation learning method, representation scheme and ranking mechanism for cross-media retrieval of heterogeneous multimedia data. Furthermore, we developed systems for handwriting visualization, style-consistent Chinese calligraphic character synthesis and style-specific calligraphy rendering for in-depth use of the Chinese calligraphy content in CADAL. In this talk we will also present our latest research on recommendation engines and frameworks for actively pushing content to users. Finally, we will present our ongoing research on topics such as the Chinese literature chronicle, traditional Chinese medicine, the CADAL cloud computing platform, and a 3D visual library.
| | Speaker Bio | Yueting Zhuang (庄越挺) received his B.Sc., M.Sc. and Ph.D. degrees in computer science from Zhejiang University, China, in 1986, 1989 and 1998 respectively. From February 1997 to August 1998, he was a visiting scholar in Prof. Thomas Huang's group at the University of Illinois at Urbana-Champaign.
Currently, he is a full professor and Dean of the College of Computer Science, Zhejiang University. His research interests mainly include artificial intelligence, multimedia retrieval, computer animation and digital libraries. In the past three years, he has published more than 10 papers in leading international journals and conferences such as IEEE Transactions on Multimedia, Pattern Recognition, ACM Multimedia, ICCV and CVPR. He has won various honors, including the Award of the National Science Fund for Distinguished Young Scholars of China (2005) and the Chang Jiang Professorship from the Ministry of Education of China (2008).
He is on the editorial boards of Science in China Series F (Information Sciences) and the Journal of Zhejiang University Science. He was a Program Co-Chair of the 2006 Pacific-Rim Conference on Multimedia (PCM 2006). He is the director of the MOE-Microsoft Key Laboratory of Visual Perception, Zhejiang University.
|
|
Nov 6, 2009 | SPEAKER: Raghu Ramakrishnan, Chief Scientist for Audience & Cloud Computing, Yahoo! TITLE: Cloud Data Serving: Key-Value Stores to DBMSs [Video] |
| Details |
| Date and Time | Nov 6, 2009 11am (not regular ISG seminar time!) | | Location | DBH 6011 (not regular ISG seminar location!) |
| Speaker | Raghu Ramakrishnan, Chief Scientist for Audience & Cloud Computing, Yahoo! | | Title | Cloud Data Serving: Key-Value Stores to DBMSs [Video] | | Abstract | Data-backed web applications have stringent availability, performance and partition tolerance requirements that are difficult, sometimes even impossible, to meet using conventional database management systems. On the other hand, they typically are able to trade off consistency to achieve their goals. This has led to the development of specialized key-value stores, which are now used widely in virtually every large-scale web service. On the other hand, most web services also require capabilities such as indexing provided by a DBMS. We are witnessing an evolution of data serving as systems builders seek to balance these trade-offs. In this talk, I will survey some of the solutions that have been developed, including Amazon's S3 and SimpleDB and Yahoo!'s PNUTS, and discuss the challenges in building such systems as "cloud services", providing elastic data serving capacity to developers, along with appropriately balanced consistency, availability, performance and partition tolerance. | | Speaker Bio | Raghu Ramakrishnan is Chief Scientist for Audience and Cloud Computing at Yahoo!, and is a Research Fellow, heading the Web Information Management group. His work has influenced query optimization in commercial database systems and the design of window functions in SQL:1999. His paper on the Birch clustering algorithm received the SIGMOD 10-Year Test-of-Time award, and he has written the widely-used text "Database Management Systems" (with Johannes Gehrke). Ramakrishnan is a Fellow of the ACM and IEEE, and has received several awards, including the ACM SIGKDD Innovations Award, the ACM SIGMOD Contributions Award, a Distinguished Alumnus Award from IIT Madras, a Packard Foundation Fellowship in Science and Engineering, and an NSF Presidential Young Investigator Award. 
He is Chair of ACM SIGMOD, on the Board of Directors of ACM SIGKDD and the Board of Trustees of the VLDB Endowment. Ramakrishnan was Professor of Computer Sciences at the University of Wisconsin-Madison, and founder and CTO of QUIQ, a company that pioneered question-answering communities, powering Ask Jeeves' AnswerPoint as well as customer-support for companies such as Compaq. |
|
Nov 5-6, 2009 | SPEAKER: TITLE: [DHS workshop] |
| Details |
| Date and Time | Nov 5-6, 2009 | | Location | |
| Speaker | | | Title | [DHS workshop] |
|
Oct 23, 2009 | SPEAKER: Goetz Graefe, HP Fellow, Hewlett-Packard Laboratories TITLE: Database scalability and indexes [Video] |
| Details |
| Date and Time | Oct 23, 2009 2pm | | Location | DBH 4011 |
| Speaker | Goetz Graefe, HP Fellow, Hewlett-Packard Laboratories | | Title | Database scalability and indexes [Video] | | Abstract | Scalability can be achieved by brute-force parallelism or by smart data structures and algorithms. This presentation and discussion will focus on data structures and algorithms that enable scalability and energy efficiency on systems of any size. |
|
Oct 16, 2009 | SPEAKER: Hojjat Jafarpour TITLE: CCD: Efficient Customized Content Dissemination in Distributed Publish/Subscribe |
| Details |
| Date and Time | Oct 16, 2009 3:30pm | | Location | DBH 3011 |
| Speaker | Hojjat Jafarpour | | Title | CCD: Efficient Customized Content Dissemination in Distributed Publish/Subscribe | | Abstract |
In this paper, we propose a new content-based publish/subscribe (pub/sub) framework that delivers matching content to subscribers in their desired format. Such a framework enables the pub/sub system to accommodate richer content formats
including multimedia publications with image and video content. In our proposed framework, users (consumers), in addition to specifying their information needs (subscription queries), also specify a profile that describes
their receiving context, including characteristics of the device used to receive the content (e.g., the resolution of a PDA used by a consumer). Besides matching and routing the published content, the pub/sub system also becomes
responsible for converting the content into a suitable format for each user. Content conversion is achieved through a set of content adaptation operators (e.g., image transcoder, document translator). We study algorithms for placing
such operators in the pub/sub broker overlay so as to minimize communication and computation resource consumption. Our experimental results show that careful placement of operators in the pub/sub overlay network results in significant cost reduction.
|
|
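To make the operator placement question in the CCD abstract above concrete, the following Python sketch compares two placements of a single content adaptation operator in a small broker tree. It is a toy cost model, not the algorithm from the talk: the overlay, the sizes, the conversion cost, the assumption that all subscribers want the same adapted format, and the choice to add bytes and conversion cost into one number are all made up for the example.

```python
def dissemination_cost(links, convert_at, sizes, convert_cost):
    """Total cost of pushing one publication down a broker tree.

    links:        (parent, child) broker edges, listed root-first
    convert_at:   brokers that run the adaptation operator
    sizes:        bytes of the "original" and "adapted" content
    convert_cost: cost charged for each conversion that is run
    """
    root = links[0][0]
    adapted = {root: root in convert_at}   # does this broker hold adapted content?
    cost = convert_cost if adapted[root] else 0
    for parent, child in links:
        # Communication cost depends on what the parent forwards.
        cost += sizes["adapted"] if adapted[parent] else sizes["original"]
        if adapted[parent]:
            adapted[child] = True
        elif child in convert_at:
            cost += convert_cost           # convert on arrival at the child
            adapted[child] = True
        else:
            adapted[child] = False
    return cost

# Hypothetical overlay: root -> mid -> two subscriber-facing brokers A and B.
links = [("root", "mid"), ("mid", "A"), ("mid", "B")]
sizes = {"original": 1000, "adapted": 100}

at_root = dissemination_cost(links, {"root"}, sizes, convert_cost=50)
at_leaves = dissemination_cost(links, {"A", "B"}, sizes, convert_cost=50)
```

In this example, converting once at the root costs 350 units, while converting at each leaf costs 3100, because the large original travels over every link and the conversion runs twice.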
Oct 16, 2009 | SPEAKER: Utkarsh Srivastava, Senior Research Scientist, Yahoo! TITLE: Pig: Building High-Level Dataflows over Map-Reduce [Video] |
| Details |
| Date and Time | Oct 16, 2009 2pm | | Location | DBH 4011 |
| Speaker | Utkarsh Srivastava, Senior Research Scientist, Yahoo! | | Title | Pig: Building High-Level Dataflows over Map-Reduce [Video] | | Abstract | Internet companies routinely capture and analyze large data sets, using the obtained insights to guide product design. For such large data analysis, programmers have flocked to the Map-Reduce programming model that is simple, scalable, and at the same time, extremely versatile. The explicit dataflow programming style of Map-Reduce is preferred by many over the high-level declarative style of SQL. However, the map-reduce paradigm is too low-level and rigid, leaving the users to write a lot of custom code for common operations such as filter, join etc, and for dealing with multi-step and branching dataflows. Such custom code is hard to reuse and maintain, and also impedes optimizations.
I will talk about the design and implementation of Pig: a high-level dataflow system that aims at a sweet-spot between the declarative style of SQL and the procedural style of Map-Reduce. Pig offers SQL-style high-level data manipulation constructs, which can be assembled in an explicit dataflow and interleaved with custom Map- and Reduce-style functions or executables. Pig programs are compiled into sequences of Map-Reduce jobs, and executed in the Hadoop Map-Reduce environment. Both Pig and Hadoop are open-source projects administered by the Apache Software Foundation.
|
|
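The idea in the Pig abstract above, that a high-level dataflow is compiled into Map-Reduce jobs, can be illustrated with a toy example. The sketch below is plain Python, not Pig Latin or Hadoop code; `run_map_reduce`, the sample `visits` data, and the hand-written mapping from dataflow steps to map and reduce functions are assumptions made for the example.

```python
from collections import defaultdict

def run_map_reduce(records, map_fn, reduce_fn):
    """Toy single-process map-reduce: map, shuffle by key, then reduce."""
    shuffled = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            shuffled[key].append(value)
    return {key: reduce_fn(key, values)
            for key, values in sorted(shuffled.items())}

# Dataflow to run: LOAD visits -> FILTER seconds > 10 -> GROUP BY url -> COUNT
visits = [("a.com", 30), ("b.com", 5), ("a.com", 12), ("c.com", 40)]

def map_fn(visit):
    url, seconds = visit
    if seconds > 10:      # the FILTER step runs on the map side
        yield url, 1      # the GROUP key becomes the shuffle key

def reduce_fn(url, ones):
    return sum(ones)      # the COUNT step runs on the reduce side

counts = run_map_reduce(visits, map_fn, reduce_fn)
```

Here three dataflow steps fit into a single job. A longer or branching dataflow would need a sequence of such jobs, which is the custom code the abstract says users otherwise write by hand.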
Oct 12, 2009 | SPEAKER: Dr. David Lomet, Microsoft Research TITLE: Project Deuteronomy: Re-architecting a DBMS Kernel |
| Details |
| Date and Time | Oct 12, 2009 11am | | Location | DBH 3011 |
| Speaker | Dr. David Lomet, Microsoft Research | | Title | Project Deuteronomy: Re-architecting a DBMS Kernel | | Abstract |
The traditional architecture for a DBMS has the recovery, concurrency control
and access method code tightly bound together in a storage engine for records.
We propose a different approach, where the DBMS kernel is factored into two layers
(each of which might have multiple heterogeneous instances).
A Transactional Component (TC) works at a logical level only: it knows
about transactions and their logical concurrency control and undo/redo recovery,
but it does not know about page layout, B-trees etc.
A Data Component (DC) knows about the physical storage structure.
It supports a record oriented interface that provides atomic operations,
but it does not know about transactions.
This refactoring of the system architecture could allow easier deployment for new platforms
and easier extensions for new data types supported by new indexing. We describe the principles necessary for this re-architected kernel, and discuss implementation issues.
| | Speaker Bio | David Lomet has been a principal researcher managing the Microsoft Research Database Group at Microsoft Research since 1995. Earlier, he spent seven and a half years at Digital Equipment Corporation. He has been at IBM Research in Yorktown and a Professor at Wang Institute. Dr. Lomet spent a sabbatical at University of Newcastle-upon-Tyne working with Brian Randell. He has a Computer Science Ph.D from the University of Pennsylvania.
Dr. Lomet has done research and product development in architecture, programming languages, and distributed systems. His primary interest is database systems, focusing on access methods, concurrency control, and recovery. He is one of the inventors of the transaction concept and is an author of over 90 papers and 40 patents. Two papers won SIGMOD "best paper" awards.
Dr. Lomet has served on program committees, including SIGMOD, PODS, VLDB, and ICDE. He was ICDE'2000 PC co-chair and VLDB 2006 PC core chair. He is a member of the ICDE Steering Committee and VLDB Board. He has been editor-in-chief of the Data Engineering Bulletin since 1992 and is a past editor of ACM TODS and the VLDB Journal. Dr. Lomet is an IEEE Golden Core Member and received IEEE Outstanding Contribution and Meritorious Service Awards. Dr. Lomet is both an ACM Fellow and an IEEE Fellow.
|
|
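The two-layer split described in the Project Deuteronomy abstract above can be sketched as two small Python classes. This is only an illustration of the separation of concerns, not the Deuteronomy design: the class and method names are hypothetical, concurrency control and redo logging are omitted, and a dictionary stands in for pages and B-trees.

```python
class DataComponent:
    """Owns physical storage and exposes atomic record operations.
    It knows nothing about transactions."""

    def __init__(self):
        self._records = {}   # stand-in for page layout, B-trees, etc.

    def read(self, key):
        return self._records.get(key)

    def write(self, key, value):
        # Assumption for this sketch: None means "record absent".
        if value is None:
            self._records.pop(key, None)
        else:
            self._records[key] = value


class TransactionalComponent:
    """Works at the logical level: transactions and logical undo.
    It knows nothing about how records are stored."""

    def __init__(self, dc):
        self._dc = dc
        self._undo = {}      # txn id -> list of (key, previous value)
        self._next_txn = 1

    def begin(self):
        txn = self._next_txn
        self._next_txn += 1
        self._undo[txn] = []
        return txn

    def update(self, txn, key, value):
        # Log the old value logically, then ask the DC to apply the change.
        self._undo[txn].append((key, self._dc.read(key)))
        self._dc.write(key, value)

    def commit(self, txn):
        del self._undo[txn]

    def abort(self, txn):
        # Undo in reverse order using only the record-level interface.
        for key, old in reversed(self._undo.pop(txn)):
            self._dc.write(key, old)


dc = DataComponent()
tc = TransactionalComponent(dc)

t1 = tc.begin()
tc.update(t1, "x", 1)
tc.commit(t1)

t2 = tc.begin()
tc.update(t2, "x", 2)
tc.update(t2, "y", 3)
tc.abort(t2)             # restores x to 1 and removes y
```

Because the transactional layer only calls `read` and `write`, the storage layer could in principle be swapped for a different one without touching the transaction logic, which is the deployment flexibility the abstract points to.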
Oct 9, 2009 | SPEAKER: Dr Min-Yen Kan TITLE: ForeCite: towards a more integrated scholarly digital library |
| Details |
| Date and Time | Oct 9, 2009 3:30pm | | Location | DBH 3011 |
| Speaker | Dr Min-Yen Kan | | Title | ForeCite: towards a more integrated scholarly digital library | | Abstract | Scholarly digital libraries (DLs) have managed to scale up
to handle millions of documents and now feature tools to track
citations and references between articles. However, users of digital
libraries typically access the DL merely to check references or to
download the PDF of the document. What features will the
next-generation DL need to inspire scholars to use digital libraries for
more than accessing the document? In ForeCite, our digital library
project at NUS, we believe part of the answer lies in integrating
common end-user concerns (annotation, sharing, and offline and online usage)
and in focusing on intra-document processing. I will describe and
demonstrate some of the preliminary components of the ForeCite system,
including its web-based front end, ParsCit (a backend open-source
citation segmentation system), ForeCiteNote (a TiddlyWiki-based
research notetaking system), ForeCiteReader (a Google Books-like
interface for annotation and collaboration on notetaking), and FireCite
(a browser extension for recognizing citations on webpages).
| | Speaker Bio | Min-Yen Kan (BS;MS;PhD Columbia Univ.) is an associate professor at
the National University of Singapore. His research interests include
digital libraries and applied natural language processing. Specific
projects include work in the areas of scientific discourse analysis,
multiword expression extraction and understanding, machine translation
and applied text summarization. Currently, he is an associate editor
for "Information Retrieval" and is the Editor for the ACL Anthology,
the computational linguistics community's largest archive of published
research. More information about him and his group can be found at the
WING homepage:
http://wing.comp.nus.edu.sg/
|
|
Oct 9, 2009 | SPEAKER: Chris Smith, Xavier Stevens, John Carnahan, Fox Audience Network TITLE: Hadoop: Origins and Applications [Video] |
| Details |
| Date and Time | Oct 9, 2009 2pm | | Location | DBH 4011 |
| Speaker | Chris Smith, Xavier Stevens, John Carnahan, Fox Audience Network | | Title | Hadoop: Origins and Applications [Video] | | Abstract |
Hadoop has become a powerful tool for facile, cost-effective parallel
processing of massive quantities of data. FAN (Fox Audience Network) employs
Hadoop to rapidly build and execute innovative statistical models and perform
extensive analyses on massive quantities of data. This talk will first outline
the history of Hadoop and of the Map/Reduce framework which inspired
it. It will then describe the Hadoop execution model and distributed
filesystem. Finally, it will discuss practical applications of
Hadoop, illustrated by specific examples from FAN. |
|
Oct 2, 2009 | SPEAKER: Todd Walter, Teradata TITLE: Teradata Past, Present and Future [Video] |
| Details |
| Date and Time | Oct 2, 2009 2pm | | Location | DBH 4011 |
| Speaker | Todd Walter, Teradata | | Title | Teradata Past, Present and Future [Video] | | Abstract |
In the early 1980's, Teradata Corporation introduced the world's
first commercial shared-nothing parallel database management system.
Today, Teradata manages many of the world's largest data warehouses.
A look at Teradata's past will show how parallelism was applied over a wide variety of generations of hardware and business challenges. Then we will look at the key differentiators of the product as they exist today and where the product may go in the future. Customer examples will be used throughout. | | Speaker Bio | Todd Walter joined Teradata in 1987. He designed and implemented features of Teradata, managed engineering teams and researched advanced database topics. Todd has worked directly with many customers at the leading edge of adopting data warehouse technology, guiding the technology to solve real business problems. He holds several Teradata patents and in 1998 was named Teradata Fellow, the highest technical award granted by the company. In his current role as Chief Technical Officer of Teradata R&D, Todd is responsible for vision, strategy and technical leadership, taking Teradata into the future.
|
|
Sep 11, 2009 | SPEAKER: Volker Markl (Technische Universität Berlin, Germany) TITLE: Learning in Query Optimization |
| Details |
| Date and Time | Sep 11, 2009 TBD | | Location | TBD |
| Speaker | Volker Markl (Technische Universität Berlin, Germany) | | Title | Learning in Query Optimization | | Abstract | Database Systems let users specify queries in a declarative language
like SQL. Most modern DBMS optimizers rely upon a cost model to choose
the best query execution plan (QEP) for any given query. Cost estimates
are heavily dependent upon the optimizer's estimates for the number
of rows that will result at each step of the QEP for complex
queries involving many predicates and/or operations. These estimates,
in turn, rely upon statistics on the database and modeling assumptions
that may or may not be true for a given database. In my talk, I will present
an overview of the research on learning in query optimization
that I have conducted while working at IBM Almaden Research Center. I
will introduce LEO, DB2's LEarning Optimizer, as a comprehensive way to
repair incorrect statistics and cardinality estimates of a query
execution plan. By monitoring executed queries, LEO compares the
optimizer's estimates with actuals at each step in a QEP, and computes
adjustments to cost estimates and statistics that may be used during the
current and future query optimizations. LEO introduces a feedback loop
to query optimization that enhances the available information on the
| | Speaker Bio | http://www.dima.tu-berlin.de/menue/mitarbeiter/volker_markl/parameter/en/ |
|
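The feedback loop described in the LEO abstract above (compare the optimizer's estimates with actuals, then compute adjustments for future optimizations) can be sketched in a few lines of Python. This is a simplified illustration and not DB2's LEO: the `FeedbackEstimator` class, the per-predicate correction factor, and the sample numbers are assumptions made for the example.

```python
class FeedbackEstimator:
    """Cardinality estimator that learns a correction factor per predicate."""

    def __init__(self, table_rows):
        self.table_rows = table_rows
        self.adjustment = {}   # predicate text -> learned correction factor

    def estimate(self, predicate, assumed_selectivity):
        # Model-based estimate, scaled by any adjustment learned so far.
        base = self.table_rows * assumed_selectivity
        return base * self.adjustment.get(predicate, 1.0)

    def observe(self, predicate, assumed_selectivity, actual_rows):
        # After execution: compare the unadjusted estimate with the actual
        # row count and remember the ratio for future optimizations.
        base = self.table_rows * assumed_selectivity
        if base > 0:
            self.adjustment[predicate] = actual_rows / base


est = FeedbackEstimator(table_rows=10_000)
pred = "make = 'Honda'"                      # hypothetical predicate

before = est.estimate(pred, 0.01)            # the model assumes 1% selectivity
est.observe(pred, 0.01, actual_rows=2_500)   # execution saw 2,500 rows
after = est.estimate(pred, 0.01)             # the next estimate is corrected
```

With these numbers the first estimate is about 100 rows and the corrected one is about 2,500, so a later plan choice would be based on the observed cardinality instead of the modeling assumption.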
Jun 5, 2009 | SPEAKER: Prasan Roy (Distinguished MTS, Aster Data Systems) TITLE: Integrating SQL and MapReduce in Aster nCluster |
| Details |
| Date and Time | Jun 5, 2009 3pm | | Location | DBH 3011 |
| Speaker | Prasan Roy (Distinguished MTS, Aster Data Systems) | | Title | Integrating SQL and MapReduce in Aster nCluster | | Abstract | A user-defined function (UDF) is a powerful feature that allows users
to customize database functionality. The support for UDFs in existing
database systems is cumbersome, and severely limited in functionality
-- especially when the UDFs analyze and return sets of rows. For
instance, there is no straightforward way to define an aggregate UDF
on an ordered input, making it virtually impossible to implement UDFs
for clickstream analytics, order statistics, etc. Moreover, such UDFs
are hard to develop, monitor and debug.
Aster nCluster is a parallel shared-nothing database system that runs
on commodity hardware. In this talk, I will give an overview of
nCluster's ability to specify custom relational "operators" and
incorporate them in the query execution pipeline. These user-defined
operators are a generalization of UDFs as supported in existing
database systems. They can be developed, incorporated and debugged
easily, are highly extensible, allow computation on ordered data, and
are parallelizable by design. Such user-defined operators also
generalize the "MapReduce" paradigm, and enable a MapReduce workflow
to be specified, optimized and evaluated as a part of a parallel query
execution pipeline.
| | Speaker Bio | Prasan Roy is a Distinguished Member of Technical Staff at Aster Data
Systems, where he leads the design and development of query processing
and optimization capabilities in Aster nCluster. He holds a B.Tech.
from IIT-Delhi and an M.Tech. and a Ph.D. from IIT-Bombay, all in
computer science. Prior to joining Aster Data Systems, he held
faculty/research positions at IIT-Bombay, IBM Research and Bell
Laboratories. |
|
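The Aster nCluster abstract above mentions clickstream analytics as a case where an operator must see rows in order and return a set of rows. The Python sketch below shows one such operator, sessionization, under stated assumptions: it is not nCluster's API or SQL/MapReduce syntax, and the `sessionize` function, the row layout, and the timeout value are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def sessionize(rows, timeout):
    """Row-set operator: partition clicks by user, order them by time,
    and start a new session when the gap between clicks exceeds `timeout`."""
    out = []
    # Equivalent in spirit to PARTITION BY user ORDER BY ts.
    ordered = sorted(rows, key=itemgetter("user", "ts"))
    for _user, clicks in groupby(ordered, key=itemgetter("user")):
        session, last_ts = 0, None
        for click in clicks:
            if last_ts is not None and click["ts"] - last_ts > timeout:
                session += 1
            out.append({**click, "session": session})
            last_ts = click["ts"]
    return out

clicks = [
    {"user": "u1", "ts": 0},
    {"user": "u2", "ts": 5},
    {"user": "u1", "ts": 10},
    {"user": "u1", "ts": 100},
]
result = sessionize(clicks, timeout=60)
```

Each user's partition is processed independently, so partitions could be handled on different nodes, which is what makes operators of this shape parallelizable by design.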
May 29, 2009 | SPEAKER: Dr. Vuk Ercegovac (IBM Almaden) TITLE: Jaql: Pipes for Analytics in the Cloud |
| Details |
| Date and Time | May 29, 2009 3pm | | Location | DBH 4011 |
| Speaker | Dr. Vuk Ercegovac (IBM Almaden) | | Title | Jaql: Pipes for Analytics in the Cloud | | Abstract | We introduce Jaql, a query language for the JSON data model. JSON (JavaScript Object Notation) is a popular data format for many Web-based applications because of its simplicity and modeling flexibility. JSON easily models a wide spectrum of data, ranging from homogeneous flat data to heterogeneous nested data, and does this in a language-independent format that easily integrates with existing programming languages. We believe that these characteristics make JSON an ideal data format for many Hadoop applications and databases in general. This talk will describe the key features of Jaql and show how it can be used to process JSON data in parallel using Hadoop's map/reduce framework. In addition, we will present several use-cases from the enterprise setting and discuss our research efforts focused on large scale data analytics. |
|
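To give a feel for pipe-style processing of heterogeneous JSON as described in the Jaql abstract above, here is a short Python sketch. It does not use Jaql syntax or Hadoop; the `pipe` helper, the sample records, and the field names are assumptions made for the example.

```python
import json

# Hypothetical input: records are heterogeneous (one has no "langs" field).
raw = """[
  {"name": "Ann", "dept": "db",  "langs": ["sql", "java"]},
  {"name": "Bob", "dept": "web"},
  {"name": "Cy",  "dept": "db",  "langs": []}
]"""

def pipe(data, *stages):
    """Feed `data` through each stage in order, like a shell pipeline."""
    for stage in stages:
        data = stage(data)
    return data

records = json.loads(raw)
result = pipe(
    records,
    # filter: keep only records from the "db" department
    lambda rs: (r for r in rs if r["dept"] == "db"),
    # transform: project to a new shape, tolerating the missing field
    lambda rs: ({"name": r["name"], "n_langs": len(r.get("langs", []))}
                for r in rs),
    list,
)
```

The filter and transform stages work record by record, so they are the kind of step that could run as a map phase over partitions of the input.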
May 15, 2009 | SPEAKER: Dr. Daniela Florescu (Oracle Corporation) TITLE: The Magic is in the Glue: XQuery + Cloud |
| Details |
| Date and Time | May 15, 2009 3pm | | Location | DBH 3011 |
| Speaker | Dr. Daniela Florescu (Oracle Corporation) | | Title | The Magic is in the Glue: XQuery + Cloud | | Abstract | Ten years have passed since the W3C initiated its effort to design a query language for what, in 1999, was a new and controversial semi-structured data format, namely XML. A decade (and a lot of effort) later, the (now programming) language and its implementations are finally reaching industrial strength and are being taken up by customers as a solid alternative for building complex applications. Meanwhile, independently of the development of XQuery, and completely orthogonal to any programming language or application development infrastructure, a new buzzword is becoming more and more visible in the IT arena: the "Cloud".
In this talk I will describe the poor state of current application development, which has serious limitations and inconveniences, and I will explain why, today, innovation in this area is unavoidable. The applications bubble is about to burst: existing software components, architectures, programming languages,
database models, and communication protocols are under significant pressure to change. I will argue that a combination of those two important technologies, "XQuery + Cloud", might provide a breakthrough in the area of application development infrastructure. I will show why and how those two technologies can, when combined, solve (or at least significantly improve) the existing limitations.
This talk will go into technical detail on how XQuery and Cloud technologies can be combined and what kinds of technical challenges people face in trying to integrate them. Also, as one might expect, both technologies are just "babies" when measured in IT years. As a result, around each one of them, and, more interestingly, in the combination of the two, lie a number of significant research problems; I will describe the problems that I personally consider to be the most interesting. | | Comments | (Slides) |
|
May 15, 2009 | SPEAKER: Ronald Eguchi TITLE: Earthquakes, Hurricanes and other Disasters: A View from Space |
| Details |
| Date and Time | May 15, 2009 11am | | Location | DBH 6011 |
| Speaker | Ronald Eguchi | | Title | Earthquakes, Hurricanes and other Disasters: A View from Space | | Abstract | In many instances, disasters act as catalysts in the adoption of new and emerging technologies. Spawned by the need to rapidly collect vital information for disaster management, technology innovations have often helped emergency responders to assess the impact of large disasters more efficiently and rapidly, and to track and monitor progress in critical response and recovery operations. Some examples of where technology implementation has been driven by the occurrence of a major disaster include Hurricane Andrew in 1992, where the lack of rapid damage or situation assessment tools hindered the deployment of federal resources and thus identified the need for near real-time loss estimation methodologies; the 1994 Northridge Earthquake where GIS took center stage during the initial response and recovery periods by providing important visual and spatial information on critical operations; the World Trade Center attacks which demonstrated the potential use of remote sensing technologies for damage assessment and recovery; and Hurricane Charley in 2004 where the deployment of GPS-based, field survey technologies helped to freeze in time the damage and destruction of this disaster so that researchers could study the effects of significant wind hazards in a more comprehensive and complete manner. All of these events underscore the opportunities that emerge when time-critical information can be delivered more efficiently to users making critical decisions during the disaster.
One technology which has had an enormous impact on disaster management has been remote sensing. In the past decade, this technology has been used extensively to explain the extent of impacts caused by earthquakes, tsunamis, hurricanes, floods, wildfires and terrorist attacks. Through high-resolution optical imagery and active sensors (e.g., synthetic aperture radar, more commonly known as SAR, and light detection and ranging, or LIDAR), remote sensing technologies have demonstrated significant efficacies in quantifying post-disaster damage, monitoring recovery and reconstruction progress after significant disasters, and more recently, in developing information on our urban infrastructure. One main reason for this rapid progress has been the introduction of high-resolution, commercially available satellite imagery. Where these technologies used to be available mainly to government agencies (mostly military), they have now become readily accessible to the public. The impact of this development has been most noticeable, in my opinion, in the disaster management area.
This presentation will focus on the integration of remote sensing technologies in all aspects of disaster management, i.e., disaster preparedness, mitigation, response and recovery. In order to demonstrate their efficacy in these four areas, case histories and examples from recent disasters, including the Bam, Iran earthquake, the Indian Ocean earthquake and tsunami, Hurricanes Katrina, Rita and Wilma (all occurring in 2005), and the World Trade Center attacks, will be presented.
Finally, the presentation will end with a view towards the future: what new developments can be expected in technology development and implementation, what future challenges must be overcome to realize broader application of these technologies in future disasters, and what role our younger researchers will play in institutionalizing these technologies as essential tools in disaster management.
| | Speaker Bio | Ronald T. Eguchi, president and CEO of ImageCat, Inc., headquartered in Long Beach, California, is EERI’s 2008 Distinguished Lecturer. During EERI’s 60th Annual Meeting in February 2008 in New Orleans, he presented his lecture, entitled “Earthquakes, Hurricanes and other Disasters: A View from Space,” for the first time.
An EERI member since 1976, Eguchi has over 30 years of experience in risk analysis and risk management studies. In 2000, with Charles K. Huyck, he formed ImageCat, Inc., a risk management company specializing in the development and use of advanced technologies for risk assessment and reduction and which has contributed substantially to research on and implementation of remote sensing technologies for earthquakes and other natural hazards, lifeline earthquake engineering, and earthquake risk assessment. He has authored over 200 publications, many of them dealing with applications of advanced technologies to remote sensing and the seismic risk of utility lifeline systems. Eguchi has directed major research and application studies in these areas for government agencies and private industry. Over the last decade, Eguchi has focused on remote sensing for reconnaissance after earthquakes, tsunamis, hurricanes, and other natural disasters and human threats. This work has revolutionized the way that reconnaissance for extreme events is performed. Ron’s work is recognized worldwide for its innovative and exceptionally useful application of advanced technology for infrastructure inventory, damage assessment, emergency response, and disaster recovery. He is a past member of the Scientific Earthquake Studies Advisory Committee of the U.S. Geological Survey and Editorial Boards of Earthquake Spectra and Natural Hazards Review, and is currently a member of the National Academies Disaster Roundtable.
In 1997, Eguchi was awarded the ASCE C. Martin Duke Award for his contributions to the area of lifeline earthquake engineering. He remains active in the ASCE Technical Council on Lifeline Earthquake Engineering, serving on several committees and having chaired the Council’s Executive Committee in 1991. In 1992, Mr. Eguchi chaired a panel, established jointly by the Federal Emergency Management Agency and the National Institute of Standards and Technology, to develop a plan for assembling and adopting seismic design standards for public and private lifelines in the U.S. This effort led to the formation of the American Lifelines Alliance, currently managed by the National Institute of Building Sciences. In 2006, he accepted an ATC Award of Excellence on behalf of the ATC-61 project team for work on An Independent Study to Assess Future Savings from Mitigation Activities that showed that a dollar spent on hazard mitigation saves the nation about $4 in future benefits.
|
|
May 8, 2009 | SPEAKER: Leila Jalali TITLE: Time Synchronization in next generation simulation platforms |
| Details |
| Date and Time | May 8, 2009 3pm | | Location | DBH 3011 |
| Speaker | Leila Jalali | | Title | Time Synchronization in next generation simulation platforms | | Abstract | Today’s economic and organizational constraints, along with increasingly complex requirements, make it infeasible to build complex simulations entirely from scratch. During the last few decades a variety of Commercial Off-The-Shelf (COTS) simulation tools have been developed which often take too large an effort to build and are generally too complex. In this presentation we discuss simulation integration that allows bringing together simulators from various modeling domains. In particular, we would like to be able to consider how we can exploit simulation integration correctly and meaningfully to understand, predict, and plan the flow of real-world activities. We focus on the time synchronization problem in simulation integration. While most available time synchronization methods need the participants to agree on a common interpretation of time and a common time advancement method for a specific application and interoperability requirements, we want to develop a time synchronization mechanism in which existing simulators are allowed to use their own internal time advancement mechanisms and the time synchronizer combines all time management mechanisms and synchronizes individual simulators in a single simulation. |
|
Apr 24, 2009 | SPEAKER: Prof. Christoph Koch (Cornell University) TITLE: MayBMS -- A Probabilistic Database Management System |
| Details |
| Date and Time | Apr 24, 2009 3pm | | Location | DBH 4011 |
| Speaker | Prof. Christoph Koch (Cornell University) | | Title | MayBMS -- A Probabilistic Database Management System | | Abstract | Probabilistic databases arise naturally in many data management scenarios,
such as Web information extraction, data cleaning, data integration, sensor
data management, and scientific databases. In this talk I present MayBMS, a
state-of-the-art database management system for efficiently managing and
processing large collections of probabilistic data. MayBMS is based on a
clean yet expressive query language that captures many important use cases of
probabilistic databases, including hypothetical query processing, managing
evidence, and Bayesian inference. MayBMS employs a succinct representation
system for probabilistic databases called U-relations, which unifies various
approaches to representing uncertain data, such as c-tables and vertical
decomposition. MayBMS follows a principled approach to leveraging the
strengths of previous database research for achieving scalability. The talk
covers the design of the query language, storage structures, as well as query
evaluation and optimization techniques. |
|
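To make the possible-worlds reading of the MayBMS abstract concrete, here is a minimal Python sketch, not MayBMS code. It assumes a toy U-relation-like encoding in which each tuple carries a condition over independent random variables, and it computes the confidence of a query by enumerating all worlds. The names `world_table`, `u_relation`, and `confidence` are hypothetical, and brute-force enumeration stands in for the system's actual evaluation techniques.

```python
from itertools import product

# Hypothetical world table: independent variables with value -> probability.
world_table = {"x": {1: 0.3, 2: 0.7}, "y": {1: 0.5, 2: 0.5}}

# Hypothetical U-relation-like encoding: (condition, tuple). A tuple exists
# only in worlds whose variable assignment satisfies its condition.
u_relation = [
    ({"x": 1}, ("alice", "flu")),
    ({"x": 2}, ("alice", "cold")),
    ({"y": 1}, ("bob", "flu")),
]

def confidence(u_relation, world_table, predicate):
    """Probability that at least one tuple satisfying `predicate` exists."""
    variables = sorted(world_table)
    total = 0.0
    # Enumerate every combination of variable values (every possible world).
    for values in product(*(world_table[v] for v in variables)):
        world = dict(zip(variables, values))
        prob = 1.0
        for v in variables:
            prob *= world_table[v][world[v]]
        present = (
            tup for cond, tup in u_relation
            if all(world[var] == val for var, val in cond.items())
        )
        if any(predicate(tup) for tup in present):
            total += prob
    return total
```

Under these assumed probabilities, asking whether anyone has the flu, `confidence(u_relation, world_table, lambda t: t[1] == "flu")`, sums the worlds where `x = 1` or `y = 1`. Enumeration is exponential in the number of variables, so it is suitable only for illustration.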
Apr 17, 2009 | SPEAKER: Shengyue Ji TITLE: Efficient Interactive Fuzzy Keyword Search |
| Details |
| Date and Time | Apr 17, 2009 3pm | | Location | DBH 3011 |
| Speaker | Shengyue Ji | | Title | Efficient Interactive Fuzzy Keyword Search | | Abstract | Traditional information systems return answers after a user
submits a complete query. Users often feel “left in the dark”
when they have limited knowledge about the underlying
data, and have to use a try-and-see approach for finding
information. A recent trend of supporting autocomplete in
these systems is a first step towards solving this problem.
In this paper, we study a new information-access paradigm,
called “interactive, fuzzy search,” in which the system searches
the underlying data “on the fly” as the user types in query
keywords. It extends autocomplete interfaces by (1) allowing
keywords to appear in multiple attributes (in an arbitrary
order) of the underlying data; and (2) finding relevant
records that have keywords matching query keywords approximately.
This framework allows users to explore data as
they type, even in the presence of minor errors. We study
research challenges in this framework for large amounts of
data. Since each keystroke of the user could invoke a query
on the backend, we need efficient algorithms to process each
query within milliseconds. We develop various incremental-search
algorithms using previously computed and cached results
in order to achieve an interactive speed. We have deployed
several real prototypes using these techniques. One
of them has been deployed to support interactive search on
the UC Irvine people directory, which has been used regularly
and well received by users due to its friendly interface
and high efficiency. |
|
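The idea of reusing cached results across keystrokes can be illustrated with a small Python sketch. It is an assumption-laden toy, not the speaker's algorithm: it keeps one edit-distance row per candidate word, extends each row when a character is typed, and drops words whose best prefix match exceeds the error budget. The class name `FuzzySession` and the parameter `max_errors` are hypothetical, and a real system would use an index rather than a flat word list.

```python
class FuzzySession:
    """Tracks which words have a prefix within `max_errors` edits of the query."""

    def __init__(self, words, max_errors=1):
        self.max_errors = max_errors
        # rows[word][j] = edit distance between the query typed so far
        # and the first j characters of word (query starts empty).
        self.rows = {w: list(range(len(w) + 1)) for w in words}

    def type_char(self, ch):
        """Extend the query by one character, reusing the cached rows."""
        survivors = {}
        for word, old in self.rows.items():
            new = [old[0] + 1]
            for j in range(1, len(old)):
                cost = 0 if word[j - 1] == ch else 1
                new.append(min(old[j] + 1,          # delete ch from the query
                               new[j - 1] + 1,      # insert word[j-1]
                               old[j - 1] + cost))  # match or substitute
            # The row minimum never decreases as the query grows,
            # so a word that exceeds the budget can be dropped for good.
            if min(new) <= self.max_errors:
                survivors[word] = new
        self.rows = survivors
        return sorted(survivors)
```

Each keystroke costs time proportional to the total length of the surviving words, and the surviving set shrinks as the user types, which is the intuition behind answering each keystroke from cached state.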
Apr 10, 2009 | SPEAKER: Bo Xing TITLE: RADcast: Enabling Reliability Guarantees for Content Dissemination in Ad Hoc Networks / An Experimental Study on Wi-Fi Ad-Hoc Mode for Mobile Device-to-Device Video Delivery |
| Details |
| Date and Time | Apr 10, 2009 3pm | | Location | DBH 3011 |
| Speaker | Bo Xing | | Title | RADcast: Enabling Reliability Guarantees for Content Dissemination in Ad Hoc Networks / An Experimental Study on Wi-Fi Ad-Hoc Mode for Mobile Device-to-Device Video Delivery | | Abstract | This paper deals with the problem of reliable and fast broadcast of
mission-critical data with rich content over ad hoc networks.
Existing approaches to dissemination reliability often assume
network size knowledge, or that receivers know about the
dissemination in advance. Without making similar assumptions, we
propose a distinct approach which accommodates the varying
reliability needs of applications. We develop the RADcast (Reliable
Application Data broadcast) protocol as an integration of two
components: (a) Peddler, which ensures that receivers obtain the
dissemination metadata, and (b) Pryer, which delivers the actual
data to dissemination-aware receivers. We indicate how reliability
guarantees/performance tradeoffs can be achieved by a careful
instantiation of Peddler and Pryer. We implement RADcast on mobile
devices inside a middleware and determine its feasibility.
Furthermore, through extensive simulations, we show that RADcast
achieves desired reliability in all cases, and performs consistently
under varying network conditions and device mobilities. As compared
to existing approaches, RADcast either incurs significantly lower
latency/message overhead, or reduces latency by 50% with a tradeoff
in message overhead.
/
The demand for video content is continuously increasing as video
sharing on the Internet has recently become enormously popular.
This demand, with its high bandwidth requirements, has a
considerable impact on the load of the network infrastructure. As
more users access videos from their mobile devices, the load on the
current wireless infrastructure (which has limited capacity) will be
even more significant. Based on observations from many local video
sharing scenarios, in this paper, we study the tradeoffs of using
Wi-Fi ad-hoc mode versus infrastructure mode for video streaming
between adjacent devices. We thus show the potential of direct
device-to-device communication as a way to reduce the load on the
wireless infrastructure and to improve user experiences. Setting up
experiments for Wi-Fi devices connected in ad-hoc mode, we collect
measurements for various video streaming scenarios and compare them
to the case where the devices are connected through access points.
The results show the improvements in latency, jitter and loss rate.
More importantly, the results show that the performance in direct
device-to-device streaming is much more stable in contrast to the
access point case, where different factors affect the performance,
causing highly unpredictable quality. |
|
Apr 3, 2009 | SPEAKER: Dr. Michael Franklin (UC Berkeley and Truviso, Inc.) (Host: Michael Carey) TITLE: Continuous Analytics: Data Stream Query Processing in Practice |
| Details |
| Date and Time | Apr 3, 2009 11am | | Location | DBH 6011 | | Host | Michael Carey |
| Speaker | Dr. Michael Franklin (UC Berkeley and Truviso, Inc.) | | Title | Continuous Analytics: Data Stream Query Processing in Practice | | Abstract | Stream query processing has been one of the hotter topics in database research so far this century. The basic idea is to provide database-style query processing over data on-the-fly as they arrive at the system, in contrast to the store-first, query-later approach followed by traditional database systems. Work in this area was originally motivated by "real-time" data-intensive scenarios such as sensor networks, financial trading applications, and network security. Stream query processing caught the imagination of the research community due to the new applications it could enable as well as the large number of traditional database assumptions that needed to be rethought and the new opportunities for optimization this mode of execution provided. Lately, stream processing has been moving from the research lab into the real world through efforts at start-up companies, traditional database vendors, and open source projects. Not surprisingly, the practical uses and advantages of the technology are turning out to be different than many had originally expected. In this talk, I'll survey the state of the art in stream query processing and related technologies such as event processing, discuss some of the implications for data-intensive system architectures, and provide my views on the future role of this technology from both a research and a commercial perspective. In particular, I'll describe the notion of Continuous Analytics, which leverages Stream Query Processing techniques to solve some of the inherent bottlenecks that have existed in database systems since their inception. | | Speaker Bio | Michael Franklin is a Professor of Computer Science at the University of California, Berkeley and the Co-Founder and CTO of Truviso, Inc., a leading provider of next-generation data analytics. 
At Berkeley, where he has been on the faculty since 1999, his research focuses on the architecture and performance of distributed data management and information systems. His recent projects cover the areas of wireless sensor networks, pay-as-you-go data integration, data stream processing, cloud computing, and data management for the digital home. Earlier in his career, he worked as a database systems developer on compressed data structures and highly-parallel database systems. Dr. Franklin serves on the Board of Trustees of the VLDB Endowment, and has served on Technical Advisory Boards for technology companies including: Appstream (SYMC), Business Signatures (ENTU), DATAllegro (MSFT), and WiseNut (LOOK). He is a Fellow of the Association for Computing Machinery, and a recipient of the National Science Foundation Career Award and the ACM SIGMOD "Test of Time" award. He earned a Ph.D. in Computer Science from the University of Wisconsin, Madison in 1993, an M.S.E. from the Wang Institute of Graduate Studies in 1986, and a B.S. in Computer and Information Science from the University of Massachusetts Amherst in 1983. He is a 2009 recipient of the Outstanding Alumni Achievement award from the UMass Computer Science Department. |
|
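The contrast between processing data as they arrive and storing first, querying later can be sketched in a few lines of Python. This is a generic illustration of a time-based sliding-window aggregate, not Truviso's engine or any system described in the talk; `sliding_average` and `window_seconds` are hypothetical names, and readings are assumed to arrive in timestamp order.

```python
from collections import deque

def sliding_average(stream, window_seconds):
    """Yield (timestamp, average over the last window_seconds) per arrival."""
    window = deque()      # readings currently inside the window
    running_sum = 0.0
    for ts, value in stream:
        window.append((ts, value))
        running_sum += value
        # Evict readings that have fallen out of the window.
        while window[0][0] <= ts - window_seconds:
            _, old = window.popleft()
            running_sum -= old
        yield ts, running_sum / len(window)
```

For example, `list(sliding_average([(0, 10.0), (1, 20.0), (2, 30.0), (5, 40.0)], 3))` produces one result per arriving reading. Only the window contents are kept in memory, so the full stream never needs to be stored before a result is available.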
Mar 6, 2009 | SPEAKER: Alexander Behm TITLE: Space-Constrained Gram-Based Indexing for Efficient Approximate String Search |
| Details |
| Date and Time | Mar 6, 2009 1pm | | Location | DBH 3011 |
| Speaker | Alexander Behm | | Title | Space-Constrained Gram-Based Indexing for Efficient Approximate String Search | | Abstract | Answering approximate queries on string collections is important in applications such as data cleaning, query relaxation, and spell checking, where inconsistencies and errors exist in user queries as well as data. Many existing algorithms use gram-based inverted-list indexing structures to answer approximate string queries. These indexing structures are "notoriously" large compared to the size of their original string collection. In this paper, we study how to reduce the size of such an indexing structure to a given amount of space, while retaining efficient query processing. We first study how to adopt existing inverted-list compression techniques to solve our problem. Then, we propose two novel approaches for achieving the goal: one is based on discarding gram lists, and one is based on combining correlated lists. They are both orthogonal to existing compression techniques, exploit a unique property of our setting, and offer new opportunities for improving query performance. For each approach we analyze its effect on query performance and develop algorithms for wisely choosing lists to discard or combine. Our extensive experiments on real data sets show that our approaches provide applications the flexibility in deciding the tradeoff between query performance and indexing size, and can outperform existing compression techniques. An interesting and surprising finding is that while we can reduce the index size significantly (up to 60% reduction) with tolerable performance penalties, for 20-40% reductions we can even improve query performance compared to original indexes. |
|
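For readers unfamiliar with gram-based inverted lists, the following Python sketch shows the uncompressed baseline that such work starts from. It does not implement the list-discarding or list-combining techniques of the talk. It assumes the standard count filter: two strings within edit distance k share at least (|query| - q + 1) - k*q of their q-grams, counted as multisets. All function names are hypothetical.

```python
from collections import Counter, defaultdict

def grams(s, q=2):
    """All overlapping q-grams of s."""
    return [s[i:i + q] for i in range(len(s) - q + 1)]

def build_index(strings, q=2):
    """Inverted lists: gram -> {string id: occurrences of the gram}."""
    index = defaultdict(dict)
    for sid, s in enumerate(strings):
        for gram, count in Counter(grams(s, q)).items():
            index[gram][sid] = count
    return index

def edit_distance(a, b):
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def search(strings, index, query, k, q=2):
    """Strings within edit distance k of query: count filter, then verify."""
    threshold = (len(query) - q + 1) - k * q
    if threshold <= 0:
        candidates = range(len(strings))   # filter has no pruning power
    else:
        shared = Counter()
        for gram, q_count in Counter(grams(query, q)).items():
            for sid, s_count in index.get(gram, {}).items():
                shared[sid] += min(q_count, s_count)
        candidates = sorted(sid for sid, n in shared.items() if n >= threshold)
    return [strings[sid] for sid in candidates
            if edit_distance(strings[sid], query) <= k]
```

The index stores one entry per distinct gram per string, which is why these structures grow large relative to the string collection and why reducing their size is worthwhile.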
Feb 27, 2009 | SPEAKER: Rabia Nuray-Turan TITLE: Towards Breaking the Quality Curse. A Web-Querying Approach to Web People Search |
| Details |
| Date and Time | Feb 27, 2009 3pm | | Location | DBH 3011 |
| Speaker | Rabia Nuray-Turan | | Title | Towards Breaking the Quality Curse. A Web-Querying Approach to Web People Search | | Abstract | Searching for people on the Web is one of the most common query types submitted to web search engines today. However, when a person name is queried, the returned webpages often contain documents related to several distinct namesakes who have the queried name. The task of disambiguating and finding the webpages related to the specific person of interest is left to the user. Many Web People Search (WePS) approaches have been developed recently that attempt to automate this disambiguation process. Nevertheless, the disambiguation quality of these techniques leaves considerable room for improvement.
This paper presents a new server-side WePS approach. It is based on collecting co-occurrence information from the Web and thus it uses the Web as an external data source. A skyline-based classification technique is developed for classifying the collected co-occurrence information in order to make clustering decisions. The clustering technique is specifically designed to (a) handle the dominance that exists in data and (b) adapt to a given clustering quality measure. These properties allow the framework to gain a major advantage in terms of result quality over all the latest WePS techniques we are aware of, including all the 18 methods covered in the recent WePS competition (SemEval-2007).
|
|
Feb 13, 2009 | SPEAKER: Nicola Onose (Host: Michael Carey) TITLE: Uncovering the Full Potential of Data Services |
| Details |
| Date and Time | Feb 13, 2009 1pm | | Location | DBH 3011 | | Host | Michael Carey |
| Speaker | Nicola Onose | | Title | Uncovering the Full Potential of Data Services | | Abstract | Making use of available Web data sources when building software applications is a major challenge for today's developers. Making data sources available as services is a related challenge, as more and more advanced features are required.
We address the first challenge by using a declarative interface for data-centric Web services (aka data services), which are published as queries over a source schema. Programmers simply write queries over the source schema and rely on the system to automatically translate them to calls to existing data services. Thus, programmers can focus on extracting the data they need, without having to understand the definition or the implementation of each individual service. In this talk we discuss the main underlying technical problem, that of deciding whether a query can be translated into service calls. We consider two settings: when the system cannot do any post-processing and hence can issue only one service call (we call that expressibility) and when it is able to issue several calls and combine the results (we call it support). Expressibility and support are studied both for services that are listed individually and for sources that publish a very large, possibly infinite number of services.
To address the second challenge, we extended the standard service infrastructure with new features, several of which were added to the Distributed XQuery (DXQ) framework. DXQ is an XML query and scripting language with support for side effects, distribution, and parallelism, which we also used as an implementation platform for workflow languages.
| | Speaker Bio | Nicola Onose is a PhD candidate in the Database group at the University of California, San Diego, where he has been advised by Alin Deutsch. Prior to that, he obtained an engineering degree from Ecole Polytechnique and ENSIMAG (France). His main research interests are XML data management and Web and data services. |
|
Feb 6, 2009 | SPEAKER: Anish Das Sarma (Host: Michael Carey) TITLE: Managing Uncertain Data |
| Details |
| Date and Time | Feb 6, 2009 1pm | | Location | DBH 3011 | | Host | Michael Carey |
| Speaker | Anish Das Sarma | | Title | Managing Uncertain Data | | Abstract | The recent ubiquity of uncertain data in modern-day applications (such as data integration, sensor networks, and scientific experiments) has resulted in a growing need for principled techniques in dealing with uncertainty. At Stanford, we have been developing Trio: a system for managing data, uncertainty, and lineage. In the first part of the talk, I will describe some challenges in managing uncertain data in Trio, and how lineage helped us overcome them. Lineage allows for a simple yet expressive model for representing uncertain data, and it enables efficient query processing.
The next part of the talk covers uncertain data management in the context of automated data integration. I will describe a self-configuring data integration system that produces high-quality answers with no human intervention. The system is based on probabilistic mediated-schemas and schema-mappings. I will present results on the relative expressive power of probabilistic mediated schemas and mappings, give algorithms for creating them, and describe extensive experimental results.
Finally, I will describe briefly some of my other topics and results, and my plans for future research. |
|
Jan 30, 2009 | SPEAKER: Divesh Srivastava (Host: Chen Li) TITLE: Useful Anonymization of Associations |
| Details |
| Date and Time | Jan 30, 2009 11am | | Location | DBH 6011 | | Host | Chen Li |
| Speaker | Divesh Srivastava | | Title | Useful Anonymization of Associations | | Abstract | Private data often comes in the form of associations between entities, such as patients and their diseases, or customers and products bought from a pharmacy. These are naturally represented in the form of a large, sparse bipartite graph. Such data has many sensitive details about individuals so anonymization is required prior to attempts to make the data more widely available for scientific research.
While existing tabular and graph anonymization techniques can be applied to such bipartite graphs, they do not always give useful results; small changes to the edge structure can radically change aggregate graph properties.
We introduce a new family of anonymizations for bipartite graph data, based on grouping the entities into classes, and masking the mapping between entities and the nodes that represent them in the anonymized graph. We identify a critical safety condition of the grouping into classes to prevent inference of interactions, and show how to find such safe groupings.
Our technique permits queries over the anonymized data to be evaluated with high accuracy while guaranteeing resilience to certain types of attack. Our experiments on real data demonstrate that our anonymization technique offers strong tradeoffs between privacy and utility. |
|
Jan 23, 2009 | SPEAKER: Yun Huang TITLE: MAPGrid: Context Aware Grid Middleware for Mobile Applications |
| Details |
| Date and Time | Jan 23, 2009 3pm | | Location | DBH 3011 |
| Speaker | Yun Huang | | Title | MAPGrid: Context Aware Grid Middleware for Mobile Applications | | Abstract | In this talk, I will show how to leverage heterogeneous and intermittently available grid resources as proxies to support mobile applications; specifically, I focus on our proposed resource discovery algorithms and data placement strategies. I will present two main categories of solutions: (1) on-demand techniques for an individual request and (2) aggregated approaches for large numbers of mobile requests. In the first category, knowledge of an individual client's mobility pattern, device energy profiles and grid resource availability is exploited; techniques from graph theory, neural nets, etc. are applied to select optimal localized computational and storage resources within the grid to cache requested mobile data and to provide quality-aware mobile applications (e.g. streaming multimedia). In the second category, I show the importance of applying aggregated mobile data access information in making request scheduling and data placement decisions. I will present a novel construct, Mobile Data Overlay (MDO), that captures aggregated mobile data access patterns and proxy availability information. I show how to reconfigure the MDO to determine data replication and schedule large numbers of mobile requests. I will talk about how this MDO-based approach effectively balances the tradeoff between reducing replication cost and reducing data access cost in making mobile data placement and mobile request scheduling decisions on grid proxies. | | Comments | More information about our work can be found at the MAPGrid project page: http://mapgrid.ics.uci.edu
Purpose: get your feedback about our work and explore future research directions
Estimated talk duration: 1 hour (including Q/A session) |
|
Jan 23, 2009 | SPEAKER: Johannes Gehrke (Host: Chen Li and Sharad Mehrotra) TITLE: Declarative Processing for Computer Games |
| Details |
| Date and Time | Jan 23, 2009 11am | | Location | DBH 1600 | | Host | Chen Li and Sharad Mehrotra |
| Speaker | Johannes Gehrke | | Title | Declarative Processing for Computer Games | | Abstract | Databases are often thought of as nothing more than solutions for persistence. However, the area of database research has developed a wide set of ideas and techniques with applicability much beyond persistence. In this talk, I will show how the idea of declarative processing from databases can be applied to computer games. I will describe our journey from declarative to imperative scripting languages for computer games, and I will introduce the state-effect pattern, a design pattern that enables game developers to design games that can be programmed imperatively, but processed declaratively. Scalable Games Language (SGL), our scripting language for games, supports this design pattern, and I will describe how database techniques can be used to process SGL resulting in performance improvements by an order of magnitude or more compared to standard scripting languages. | | Comments | I will finish with a discussion of ongoing work, future challenges for SGL, and applications of SGL to cloud computing. | | Speaker Bio | (condensed from here: http://www.cs.cornell.edu/johannes/biography.htm)
Johannes Gehrke is an Associate Professor in the Department of Computer Science at Cornell University. Johannes' research interests are in the areas of data mining, database systems, and data privacy. Johannes has received a National Science Foundation Career Award and an Alfred P. Sloan Fellowship. He co-authored the undergraduate textbook Database Management Systems (McGraw-Hill (2002), currently in its third edition), used at universities all over the world. Johannes is also an Adjunct Associate Professor at the University of Tromsø in Norway.
Johannes was Program co-Chair of the 2004 ACM International Conference on Knowledge Discovery and Data Mining (KDD 2004), and Program Chair of the 33rd International Conference on Very Large Data Bases (VLDB 2007). From 2007 to 2008, he was Chief Scientist at FAST, A Microsoft Subsidiary. |
|
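As a rough illustration of a state-effect style of game tick, here is a Python sketch. It is not SGL and rests on an assumed reading of the pattern: during a tick every unit reads the same frozen state and only emits effects, effects on the same target are combined (here by summation), and a separate update phase applies them. `Unit`, `tick`, `attack_range`, and `damage` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    name: str
    x: int        # position on a line, for simplicity
    health: int

def tick(units, attack_range=1, damage=3):
    """One game tick: query phase emits effects, update phase applies them."""
    # Query phase: state is read-only, so the order of iteration is irrelevant.
    effects = {u.name: 0 for u in units}
    for attacker in units:
        for target in units:
            in_range = abs(attacker.x - target.x) <= attack_range
            if attacker is not target and in_range:
                effects[target.name] += damage   # effects combine by summation
    # Update phase: the new state is computed from old state plus effects.
    return [Unit(u.name, u.x, u.health - effects[u.name]) for u in units]
```

Because no unit observes another unit's writes within a tick, the nested loop is effectively a join between attackers and targets followed by an aggregate, which is the kind of computation a query processor could evaluate with set-at-a-time techniques.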
Jan 16, 2009 | SPEAKER: Ronen Vaisenberg TITLE: Exploiting Semantics for Scheduling Data Collection from Sensors on Real-Time to Maximize Event Detection |
| Details |
| Date and Time | Jan 16, 2009 3pm | | Location | DBH 3011 |
| Speaker | Ronen Vaisenberg | | Title | Exploiting Semantics for Scheduling Data Collection from Sensors on Real-Time to Maximize Event Detection | | Abstract | A distributed camera network allows for many compelling applications such as large-scale tracking or event detection. In most practical systems, resources are constrained. Although one would like to probe every camera at every time instant and store every frame, this is simply not feasible. Constraints arise from network bandwidth restrictions, I/O and disk usage from writing images, and CPU usage needed to extract features from the images. Assume that, due to resource constraints, only a subset of sensors can be probed at any given time unit. This paper examines the problem of selecting the "best" subset of sensors to probe under some user-specified objective - e.g., detecting as much motion as possible. With this objective, we would like to probe a camera when we expect motion, but would not like to waste resources on a non-active camera. The main idea behind our approach is the use of sensor semantics to guide the scheduling of resources. We learn a dynamic probabilistic model of motion correlations between cameras, and use the model to guide resource allocation for our sensor network. Although previous work has leveraged probabilistic models for sensor-scheduling, our work is distinct in its focus on real-time building-monitoring using a camera network. We validate our approach on a sensor network of a dozen cameras spread throughout a university building, recording measurements of unscripted human activity over a two-week period. We automatically learn a semantic model of typical behaviors, and show that one can significantly improve efficiency of resource allocation by exploiting this model. | | Comments | Paper can be downloaded here (to appear in MMCN'09, San Jose, CA): http://www.ics.uci.edu/~ronen/Site/Research_files/semanticTracking.pdf
Purpose: practice talk, get your feedback about the presentation and subject.
Estimated talk duration: 30 min (will stick to that, as this is the limit imposed at the conference) |
|
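The scheduling idea in this abstract, probing the cameras most likely to show motion given what was just observed, can be sketched as follows. This is a simplified stand-in, not the paper's model: it assumes a hypothetical table `correlation[a][b]` giving the probability of motion at camera `b` in the next time unit given motion at `a` now, combines multiple active sources with a noisy-OR (an assumption made here), and greedily picks the top `budget` cameras.

```python
def schedule(last_motion, correlation, budget):
    """Pick the `budget` cameras with the highest predicted motion probability.

    last_motion: camera -> whether motion was seen in the previous time unit.
    correlation: correlation[a][b] = assumed P(motion at b next | motion at a).
    """
    scores = {}
    for cam in correlation:
        p_no_motion = 1.0
        for src, active in last_motion.items():
            if active:
                # Noisy-OR: each active source independently fails to
                # trigger motion at cam with probability 1 - correlation.
                p_no_motion *= 1.0 - correlation[src].get(cam, 0.0)
        scores[cam] = 1.0 - p_no_motion
    # Highest score first; ties broken by camera name for determinism.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return ranked[:budget]
```

A learned model would replace the fixed table, and a practical scheduler would also need to probe low-scoring cameras occasionally so that its estimates stay current.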
Oct 31, 2008 | SPEAKER: Umeshwar Dayal (Host: Chen Li) TITLE: Automating Workload Management for Enterprise-Scale Business Thumbnail |
| Details |
| Date and Time | Oct 31, 2008 11am | | Location | DBH 6011 | | Host | Chen Li |
| Speaker | Umeshwar Dayal | | Title | Automating Workload Management for Enterprise-Scale Business Thumbnail | | Abstract | Enterprises rely on business intelligence technologies (data integration, data warehousing, data mining, and analytics) to gain an understanding of how their business is performing. As the industry evolves towards the ideal of the "real-time enterprise", the traditional off-line data warehousing, analysis, and reporting that characterize BI today are no longer adequate: users now demand up-to-date, high quality information to enable more timely decision making. Real-time BI workloads pose stringent performance requirements against enterprise data warehouses, and are notoriously difficult to manage, particularly since BI queries exhibit a huge variance in response times, ranging from fractions of a second to several hours. It is not well understood how effective existing database workload management policies are in the face of such complex workloads. Factors such as inaccurate cardinality estimates, data skew, and resource contention all make it difficult to predict how queries will behave. Experience has shown that a few "problem" queries can have drastic effects on system performance. Our goal is to automate the adaptive tuning and management of complex, mixed workloads on enterprise-scale data warehouses. There are many challenges in doing this. The first challenge is to estimate accurately how long a query will take, and what resources it will consume. Second, we must have effective strategies for admission control (which queries should be allowed to run), scheduling (which queries to run and when), and execution control (what to do when a problem query is detected). Third, we must be able to evaluate the performance of these strategies under different conditions, so we can design a system that automatically and dynamically selects the best policies to use. 
This talk will describe the approaches we are taking at HP Labs to address these challenges, and promising results we have obtained. We use machine learning techniques to predict query execution times and resource usage; and we are developing an experimental framework to understand the impact of existing and emerging workload management policies under different conditions. | | Comments | Slides: http://www.ics.uci.edu/computerscience/bin/pdf/seminarseries2k8/DayalUmesh.pdf |
|