
Areas of Research
I am interested in information retrieval (IR) and distributed databases. Specifically, my research focuses on algorithms for distributed databases, top-k query processing, and data management in unstructured topologies. Below is a short list of the topics I am currently researching, with a brief description of each.
Approximate Query Processing
It has
been observed that in most typical data analysis and data
mining applications, timeliness and interactivity are more
important considerations than accuracy, so data
analysts are often willing to overlook small inaccuracies
in the answer, provided the answer can be obtained fast
enough. This observation has been the primary driving
force behind the recent development of approximate query
processing (AQP) techniques for aggregation queries in
traditional databases and decision support systems. Numerous AQP techniques have
been developed, the most popular ones based on random
sampling, where a small random sample of the rows of
the database is drawn, the query is executed on this small
sample, and the results are extrapolated to the whole
database. Besides being simple to implement,
random sampling has the compelling advantage that,
along with an estimate of the aggregate, one can also
provide confidence intervals that bound the error with high
probability. Broadly, two types of sampling-based
approaches have been investigated: (a) Pre-computed
samples - where a random sample is pre-computed by
scanning the database, and the same sample is reused for
several queries, and (b) Online samples - where the
sample is drawn "on the fly" upon encountering a query.
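
The sampling idea described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the method of any particular AQP system: the table, its `region` and `amount` columns, and the helper `estimate_sum` are all hypothetical, the sample is a uniform random sample without replacement, and the confidence interval is the usual normal (central limit theorem) approximation without a finite-population correction.

```python
import math
import random


def estimate_sum(sample, predicate, value, total_rows, z=1.96):
    """Estimate SUM(value) over the rows matching `predicate`.

    `sample` is assumed to be a uniform random sample of a table with
    `total_rows` rows. Returns (estimate, half_width), where
    estimate +/- half_width is an approximate confidence interval
    (z=1.96 corresponds to roughly 95% under the normal approximation).
    """
    n = len(sample)
    # Rows that fail the predicate contribute 0 to the aggregate.
    contributions = [value(r) if predicate(r) else 0.0 for r in sample]
    mean = sum(contributions) / n
    variance = sum((c - mean) ** 2 for c in contributions) / (n - 1)
    # Extrapolate the per-row mean to the whole table.
    estimate = total_rows * mean
    half_width = z * total_rows * math.sqrt(variance / n)
    return estimate, half_width


# Hypothetical table: 100,000 rows with a region and an amount.
rng = random.Random(0)
table = [
    {"region": rng.choice("ABCD"), "amount": rng.uniform(0, 100)}
    for _ in range(100_000)
]


def in_region_a(row):
    return row["region"] == "A"


def in_region_b(row):
    return row["region"] == "B"


def amount(row):
    return row["amount"]


# (a) Pre-computed sample: drawn once, then reused for several queries.
precomputed = rng.sample(table, 2_000)
est_a, hw_a = estimate_sum(precomputed, in_region_a, amount, len(table))
est_b, hw_b = estimate_sum(precomputed, in_region_b, amount, len(table))

# (b) Online sample: drawn "on the fly" when the query arrives.
online = rng.sample(table, 2_000)
est_online, hw_online = estimate_sum(online, in_region_a, amount, len(table))

print(f"pre-computed, region A: {est_a:.0f} +/- {hw_a:.0f}")
print(f"pre-computed, region B: {est_b:.0f} +/- {hw_b:.0f}")
print(f"online,       region A: {est_online:.0f} +/- {hw_online:.0f}")
```

The pre-computed variant pays the sampling cost once and answers both queries from the same 2,000 rows, while the online variant draws a fresh sample per query. In both cases the half-width shrinks roughly in proportion to the inverse square root of the sample size.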