Benjamin Arai, Graduate Student



Areas of Research

I am interested in information retrieval (IR) and distributed databases. Specifically, my research focuses on algorithms for distributed databases, top-k query processing, and managing data in unstructured topologies. Below is a short list of the topics I am currently researching, with a brief description of each.


Approximate Query Processing

It has been observed that in most typical data analysis and data mining applications, timeliness and interactivity matter more than exact accuracy: analysts are often willing to overlook small inaccuracies in an answer, provided the answer arrives fast enough. This observation has been the primary driving force behind the recent development of approximate query processing (AQP) techniques for aggregation queries in traditional databases and decision support systems. Numerous AQP techniques have been developed. The most popular are based on random sampling: a small random sample of the rows of the database is drawn, the query is executed on this sample, and the results are extrapolated to the whole database. Besides being simple to implement, random sampling has the compelling advantage that, along with an estimate of the aggregate, one can provide confidence intervals that bound the error with high probability. Broadly, two types of sampling-based approaches have been investigated: (a) pre-computed samples, where a random sample is computed in advance by scanning the database and the same sample is reused for several queries, and (b) online samples, where the sample is drawn "on the fly" when a query arrives.
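
As a rough illustration of the sampling idea described above, the following Python sketch estimates a SUM aggregate from a uniform random sample and attaches a confidence interval. It is a minimal sketch under stated assumptions, not a description of any particular AQP system: the function name `approximate_sum` and its parameters are hypothetical, the table is assumed to be an in-memory list of numbers, and the interval relies on a normal (central limit theorem) approximation, which may be poor for very small samples or heavily skewed data.

```python
import math
import random
from statistics import NormalDist


def approximate_sum(rows, sample_size, confidence=0.95, seed=None):
    """Estimate sum(rows) from a uniform random sample.

    Hypothetical helper for illustration only. Returns a pair
    (estimate, half_width); the approximate confidence interval
    is estimate +/- half_width.
    """
    n = len(rows)
    if not 2 <= sample_size <= n:
        raise ValueError("sample_size must be between 2 and len(rows)")

    # Draw the sample without replacement.
    rng = random.Random(seed)
    sample = rng.sample(rows, sample_size)

    # Sample mean and unbiased sample variance.
    mean = sum(sample) / sample_size
    variance = sum((x - mean) ** 2 for x in sample) / (sample_size - 1)

    # Extrapolate the sample mean to the whole table.
    estimate = n * mean

    # Standard error of the extrapolated sum, with the finite
    # population correction for sampling without replacement.
    fpc = 1 - sample_size / n
    std_err = n * math.sqrt(variance / sample_size * fpc)

    # Normal-approximation critical value for a two-sided interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return estimate, z * std_err


if __name__ == "__main__":
    table = list(range(1, 10001))  # stand-in for one numeric column
    est, half = approximate_sum(table, sample_size=500, seed=1)
    print(f"estimated SUM = {est:.0f} +/- {half:.0f} (exact: {sum(table)})")
```

In terms of the two approaches above, calling `rng.sample` once and reusing the result across many queries would correspond to a pre-computed sample, while drawing a fresh sample inside each call, as done here, corresponds to online sampling.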