Benjamin Arai Graduate Student

Benjamin T. Arai, Ph.D.

I am a Ph.D. graduate from the Marlan and Rosemary Bourns College of Engineering at the University of California, Riverside. I currently work for Microsoft in Washington. My research interests include top-k retrieval methods, peer-to-peer networks, sensor networks, stability/temporal clustering, and approximate query processing. This site contains a brief history of my academic (research topics, publications, projects, courses) and extracurricular activities (pictures, hobbies, community involvement) during my undergraduate and graduate career at UCR. If you have any questions, please feel free to contact me.


Pub (PVLDB 2010): An Access Cost Aware Approach for Object Retrieval over Multiple Sources

February 27th, 2010
Source and object selection and retrieval from large multi-source data sets are fundamental operations in many applications. In this paper, we initiate research on efficient source (e.g., database) and object selection algorithms on large multi-source data sets. Specifically, in order to acquire a specified number of satisfying objects with minimum cost over multiple databases, the query engine needs to determine the access overhead...
Read on... | [Download PDF]

Pub (SIGKDD 2009): On Burstiness-Aware Search for Document Sequences

December 31st, 2009
As the number and size of large timestamped collections (e.g. sequences of digitized newspapers, periodicals, blogs) increase, the problem of efficiently indexing and searching such data becomes more important. Term burstiness has been extensively researched as a mechanism to address event detection in the context of...
Read on... | [Download PDF]

Pub (VLDBJ 2009): Anytime measures for top-k algorithms on exact and fuzzy data sets

May 10th, 2009
Top-k queries on large multi-attribute data sets are fundamental operations in information retrieval and ranking applications. In this article, we initiate research on the anytime behavior of top-k algorithms on exact and fuzzy data. In particular, given specific top-k algorithms (TA and TA-Sorted) we are interested in studying their progress toward identification of the correct ...
Read on... | [Download full paper from SpringerLink]

New UCR Dissertation/Thesis LaTeX Template

June 8th, 2008
I have updated the nearly decade-old UCR (University of California, Riverside) thesis/dissertation LaTeX template to work with the 2008 guidelines (these template updates are only the formatting changes I was required to make for acceptance by the graduate division). Read on...

I am going to Microsoft!

June 8th, 2008
As of July 7, 2008, I will be working for Microsoft in Bellevue, Washington. I will be doing data mining stuff - i.e., knowledge discovery on very large datasets. I will be working in the Avanta Building, I think, which is just east of downtown Bellevue. We'll see how this cold weather thing works out...

I have completed my PhD!

June 7th, 2008
As of June 6, 2008, I have completed my final defense and I am now officially Dr. Benjamin Arai! After four long years, it is finally over... Actually, it was kind of fun. All the interesting problems kept me adequately busy. It's a little weird not having to go to school anymore. Maybe I will get an MBA?

Scalable Ingestion/Presentation for the METS/ALTO Object Model

February 13th, 2008
Craig Boucher and I have developed a highly scalable ingestion and online access system for the METS/ALTO object model with logical structure. The key to our system is a highly distributed paradigm that utilizes a minimal lock contention approach for maximum system utilization - i.e., you can have users accessing the system while simultaneously ingesting large amounts of data from several independent machines without skipping a beat. Our system is built on C#/Mono and Ruby-on-Rails. Feel free to contact me for a copy of our recent publication submission. You can view the demo of our system at [Website].

Materialized Views in PostgreSQL

January 15th, 2008
Materialized views are certainly possible in PostgreSQL. Because of PostgreSQL's powerful PL/pgSQL language, and the functional trigger system, materialized views are somewhat easy to implement. I will examine several methods of implementing materialized views in PostgreSQL.
Read on...
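A minimal sketch of the trigger-based idea, under loud assumptions: it uses Python's built-in sqlite3 module instead of PostgreSQL and PL/pgSQL so that it runs without a server, and the tables (orders, order_totals), the triggers, and the helper names are all hypothetical. It shows an eagerly maintained summary table, which is one common way to emulate a materialized view; it is not necessarily one of the methods covered in the full article.

```python
import sqlite3


def build_db():
    """Create a base table plus a trigger-maintained summary table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (
            id       INTEGER PRIMARY KEY,
            customer TEXT    NOT NULL,
            amount   INTEGER NOT NULL
        );

        -- The "materialized view": one precomputed total per customer.
        CREATE TABLE order_totals (
            customer TEXT    PRIMARY KEY,
            total    INTEGER NOT NULL
        );

        -- Keep the summary in step with every insert on the base table.
        CREATE TRIGGER orders_after_insert AFTER INSERT ON orders
        BEGIN
            INSERT OR IGNORE INTO order_totals (customer, total)
                VALUES (NEW.customer, 0);
            UPDATE order_totals
                SET total = total + NEW.amount
                WHERE customer = NEW.customer;
        END;

        -- ...and with every delete.
        CREATE TRIGGER orders_after_delete AFTER DELETE ON orders
        BEGIN
            UPDATE order_totals
                SET total = total - OLD.amount
                WHERE customer = OLD.customer;
        END;
        """
    )
    return conn


def totals(conn):
    """Read the precomputed totals without touching the base table."""
    return dict(conn.execute("SELECT customer, total FROM order_totals"))


conn = build_db()
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 10), ("bob", 5), ("alice", 7)],
)
conn.execute("DELETE FROM orders WHERE customer = 'bob'")
print(totals(conn))
```

In PostgreSQL the same shape would presumably be a PL/pgSQL trigger function attached to the base table; the trade-off is that reads become cheap while every write pays a little extra to keep the summary current.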

Pub (ICDE 2008): Region Sampling: Continuous Adaptive Sampling on Sensor Networks

October 16th, 2007
Satisfying energy constraints while meeting performance requirements is a primary concern when a sensor network is being deployed. Many recently proposed techniques offer error bounding solutions for aggregate approximation but cannot guarantee energy spending. Conversely, our goal is to bound the energy consumption while minimizing the approximation error.
Webpage | PDF | Postscript

EuroTrip 2007 (i.e., Hostel ... first 15 minutes)

September 27th, 2007
Craig Boucher and I went to Hamburg, Germany for business with CCS (Content Conversion Specialists) and to Amsterdam for a little fun ;). We travelled by train, car, and airplane in search of our preconceived notions of Europe. Regretfully, I never found the picturesque view of a European castle and Craig did not find the 7-foot German woman he was in search of. We took tons of pictures cataloging our excursion; feel free to check them out.

Pub (ICDM 2007): Efficient Data Sampling in Heterogeneous Peer-to-Peer Networks

July 31st, 2007
Performing data-mining tasks such as clustering, classification, and prediction on large datasets is an arduous task and, many times, it is an infeasible task given current hardware limitations. The distributed nature of peer-to-peer databases further complicates this issue by introducing an access overhead cost in addition to the cost of sending singular tuples over the network...
Webpage | PDF | Postscript

Pub (VLDB 2007): Anytime Measures for Top-k Algorithms

July 10th, 2007
Top-k queries on large multi-attribute data sets are fundamental operations in information retrieval and ranking applications. In this paper, we initiate research on the anytime behavior of top-k algorithms. In particular, given specific top-k algorithms (TA and TA-Sorted) we are interested in studying their progress toward identification of the correct result at any point during the algorithms' execution...
Webpage | PDF | Postscript
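For readers unfamiliar with TA, here is a rough sketch of the classic Threshold Algorithm (Fagin et al.), written as a generator so that the current top-k can be inspected after every round. That is only meant to illustrate the "anytime" setting; it does not implement the measures proposed in the paper. The function name, the sum aggregation, and the in-memory dictionary standing in for random access are all assumptions made for this example.

```python
import heapq


def threshold_algorithm(scores, k):
    """Yield (rounds, threshold, current_top_k) after each round of TA.

    scores maps an object id to a tuple of attribute scores, where higher
    is better. The aggregate score is assumed to be the plain sum.
    """
    m = len(next(iter(scores.values())))
    # One ranked list per attribute, best first (this is "sorted access").
    lists = [
        sorted(scores, key=lambda obj, i=i: scores[obj][i], reverse=True)
        for i in range(m)
    ]
    seen = {}
    for depth in range(len(scores)):
        last_seen = []
        for i in range(m):
            obj = lists[i][depth]
            last_seen.append(scores[obj][i])
            if obj not in seen:
                # "Random access": fetch every attribute of the new object.
                seen[obj] = sum(scores[obj])
        # No unseen object can have an aggregate score above this value.
        threshold = sum(last_seen)
        top = heapq.nlargest(k, seen.items(), key=lambda item: item[1])
        yield depth + 1, threshold, top
        # Stop once k seen objects score at least the threshold.
        if len(top) == k and top[-1][1] >= threshold:
            return


data = {"a": (9, 8), "b": (8, 9), "c": (1, 2), "d": (3, 1)}
for rounds, threshold, top in threshold_algorithm(data, k=2):
    print(rounds, threshold, top)
```

Because each round yields a snapshot, a caller can stop consuming the generator early and keep the best answer found so far, at the cost of not yet knowing whether that answer is the exact top-k.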

Microsoft Research (MSR) Internship, Summer 2007

June 26th, 2007
I am interning with Microsoft Research (MSR) this summer, researching algorithms and techniques for reducing the amount of unwanted software (malware) on Microsoft Windows Defender enabled systems.

Pub (SSDBM 2007): Reliable Hierarchical Data Storage in Sensor Networks

March 23rd, 2007
When deploying a sensor network, a major concern is the ability to provide reliable in-network storage while balancing the energy consumption of individual sensors. The main concern with data-centric storage in sensor networks is the ability to provide reliable, load-balanced storage. Energy and wireless range constraints make centralized approaches for storage impractical, and in-network, data-centric solutions can...
Webpage | PDF | Postscript | (Presentation Slides: Powerpoint | PDF)

CBSR Newspaper Search/Warehousing Project!

February 12th, 2007
My new pet project, CBSR (I have been working on it since before spring 2006), has offered some of the most challenging data warehousing problems I have had a chance to deal with. The CBSR newspaper dataset is proving to be one of the most challenging datasets to deal with due to its dirty nature (OCR'd data & hand-modified XML files) and an ever-growing size (12+ TB). With that said, the CBSR project is turning into one of the most exciting projects that I have been a part of. Try the demo application!

I am the teaching assistant for CS166, Winter 2007

January 17th, 2007
I am the TA for CS166 for the Winter 2007 quarter. The Professor is Vassilis J. Tsotras and my lab is from 11:00am to 2:00pm on Mondays. I can be reached by email or you can schedule an appointment. My office is in the Database Lab building EBU2, room 363.

I am the teaching assistant for CS166, Fall 2006

October 1st, 2006
I am the TA for CS166 for the Fall 2006 quarter. The Professor is Vassilis J. Tsotras and my lab is from 6:00pm to 9:00pm on Mondays. I can be reached by email or you can schedule an appointment. My office is in the Database Lab building EBU2, room 363.

Upgraded InstantRails to PHP 5.1.4

June 11th, 2006
The current version of InstantRails only supports PHP version 4. I have packaged up an upgraded version that supports the current version of PHP (5.1.4). The package should work the same as advertised on the InstantRails website. You can download the updated package here.

Solaris Version of SCREEN

June 7th, 2006
I have been working on Solaris 10 for the past couple of weeks, and I noticed that the most recent version of SCREEN does not compile on Solaris 10, so I made a couple of tweaks to get the latest version to compile. You can download it here.

I am the TA for CS183, Spring 2006

May 4th, 2006
I am the TA for CS183 this quarter. The labs are going to focus on team-based projects. My hope is to have each team develop something by the end of the quarter that benefits either the department or the Systems community. To see some of the past projects, just search for "Benjamin Arai CS 183" and a couple of them should show up.

ICDE 2006 Research Session Presentation Slides

April 5th, 2006
I have uploaded the slides presented at ICDE 2006 in Atlanta, Georgia. The presentation is available in PowerPoint, PDF, and PDF (6 slides per page) formats. The corresponding paper is available in the publications section of my website.
PowerPoint | PDF | PDF (6 Slides Per Page)

Technical Report: DWT Design Exploration via ROCCC

March 28th, 2006
In this paper we evaluate the speedup that may be achieved by profiling an existing software implementation of the JPEG2000 still image compression algorithm and simulating replacements of terminal code in high frequency use source code execution paths with FPGA microprocessor implementations generated by ROCCC (Riverside Optimizing Compiler for Configurable Computing).
Webpage | Portable Document Format | Postscript

Assignment #2 has been posted

March 8th, 2006
In this assignment you will implement an incremental algorithm for computing the convex hull. You may use the code from the previous assignment as a base.

News items that I have dugg on

March 6th, 2006
I have added a webpage that lists the recent news items that I have dugg. Digg is a technology news website that combines social bookmarking, blogging, RSS, and non-hierarchical editorial control.

Syllabus & Assignment #1

February 16th, 2006
A syllabus and the first assignment have been added for CS133. If you have any questions, please send an email to the mailing list.

New web page: Bookmarks!

February 12th, 2006
The Bookmarks page contains a list of links that I use personally and that I hope somebody else finds useful. I was generating the page from an external RSS feed, but that feed doesn't have any options for how many items to send, so I created my own online system using PostgreSQL.

I am the TA for CS133: Computational Geometry, Winter 2006

January 9th, 2006
I am the TA for CS133 for the Winter 2006 quarter. The Professor is Dimitrios Gunopulos. I can be reached by email or you can schedule an appointment. My office is in the DBLab building EBU2, room 363.

New Database Lab Website

December 18th, 2005
I created a new website using Ruby on Rails for the UCR Database Lab. Rails is definitely a lot easier for webpage development. The only issue with the framework is that I could not find any information on some of the more complicated issues.

Pub (ICDE 2006): Approximating Aggregations in Peer-to-Peer Databases

November 12th, 2005
Peer-to-peer databases are becoming prevalent on the Internet for distribution and sharing of documents, applications, and other digital media. The problem of answering large scale, ad-hoc analysis queries (e.g., aggregation queries) on these databases poses unique challenges...
Webpage | Portable Document Format | Postscript | (ICDE 2006) Accepted Papers

I am the TA for CS179G: Database Design Project, Fall 2005

October 3rd, 2005
I am the TA for CS179G for the Fall 2005 quarter. The Professor is Dimitrios Gunopulos and my lab is from 6:00pm to 9:00pm on Thursdays. I can be reached by email or you can schedule an appointment. My office is in the DBLab building EBU2, room 363.

Link Reorganization and A New "Publications" Section

October 3rd, 2005
Some of the links have been reorganized as follows: Research has been renamed to Projects. Research now contains a list of my areas of interest and current research topics. In addition, a Publications section has been added.

Dreamhost Promotions ($97.00 off)

August 30th, 2005
Below, you will find generous coupons that will put a significant amount of money back in your pocket when you choose Dreamhost to host your domain. Depending on your mood, you might like to give me a few bucks for sending them your way. Choose your money-saving method below. Promotion code "PROMOBA001".
Promotion Page

CS231 Final Project: Modeling the Human Spine

June 12th, 2005
Our new approach to human spine articulation defines both high- and low-level constraints based on human spine dynamics congruent with known human movement, region dissipation, and transfer rates.
Presentation Page | Portable Document Format | Power Point Presentation

CS302 Education Paper, Spring 2005

May 28th, 2005
In this report, we will discuss various strengths and weaknesses of the CS department and present suggestions for possible improvements. We present real world experiments to test and validate our conclusions.
Presentation Page | Microsoft Document | Portable Document Format

New presentation, CS231 Animation

May 3rd, 2005
Artists study anatomy to understand the relationship between exterior form and the structures responsible for creating it. In this paper they follow a similar approach in developing anatomy-based models of muscles.
Presentation Page

New UCR CS Website

March 31st, 2005
Conley and I just released the new version of the University of California, Riverside department of computer science website.

I am a TA for UNIX System Administration, Spring 2005

March 26th, 2005
I am a TA for CS183 for the Spring 2005 quarter. The lecturer is Victor Hill and my lab is from 6:00pm to 9:00pm on Thursdays. Go check TA Classes for updates.

Spring Break 2005, New Mexico

March 25th, 2005
Conley Read and I went to New Mexico over spring break, visiting Chris Baron at Los Alamos National Laboratory (LANL) as well as various national monuments and landmarks. We covered almost 1500 miles circling the entire state. Pictures of the trip can be viewed at my gallery.

Transition from Proprietary to Open Source

March 4th, 2005
Software development is a fast and volatile industry focusing on innovation and change. In the past decade software development has forked into two main paradigms. The first is the traditional path of the standard proprietary development methods grown out of many mature industries. The second is the open source development paradigm. This method of development has been growing rapidly in the past decade. Both methods of development have managed to achieve great accomplishments through hard work and innovation. These two paradigms, though different, have achieved many of the same goals.
Microsoft Document | Portable Document Format | Powerpoint Presentation

Analysis of No Child Left Behind

March 4th, 2005
Since being signed into law by President Bush in early 2002, No Child Left Behind (NCLB) has been a source of controversy and debate. NCLB's strict testing standards are viewed by some as a refreshing step in the right direction. Others, however, view the standards and harsh penalties as too strict. In this paper we briefly outline the "four pillars" which make up NCLB. In addition, we look at common arguments made for and against NCLB.
Microsoft Document | Powerpoint Presentation

My cat and my family's Israel trip

February 13th, 2005
I have added the pictures from my family's latest trip to Israel and I added a section for my cat Bosco.

Website moved and converted to PostgreSQL

January 26th, 2005
Converted the entire site from a file-based to a database-driven website and moved the website to a new server. Some of the links have changed, but they all work within the website.

Added a changelogs section.

August 30th, 2004
I have added a changelog section to my website that shows the changes to the various projects I am working on. These are not research project changelogs but only projects that are currently being used or are going to be used in a production environment.

Special office hours for finals week.

June 2nd, 2004
Since there is going to be a second project for CS183, I am going to have additional office hours after my regular office hours tomorrow and next week. For tomorrow I will have office hours from 6-7pm and on Monday from 5-6pm. So, if you need help with the project or have any questions during these times, then look for me in the TA room.

Increasing Smarty Cached Page Performance

May 29th, 2004
I have been having a problem getting Smarty pages to generate quickly for semi-static content such as news tickers. I have found that flat-file databases work very well for this, because pulling large amounts of serialized data from them is faster than rendering/parsing all of the content from a flat PHP or other content file(s). I have found that the best implementations of this are SQLite and DBA. SQLite offers a DBMS utility using flat files and SQL-like queries. This offers a transparent solution for database-driven sites and scalable programming for smaller or growing web applications.
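A small sketch of the idea, with the caveat that it is written in Python with the built-in sqlite3 and json modules rather than PHP and Smarty, and that the FragmentCache class, its table layout, and the ttl and now parameters are all made up for illustration. It stores a serialized fragment in an SQLite file and only rebuilds it once the entry has expired.

```python
import json
import sqlite3
import time


class FragmentCache:
    """Cache serialized page fragments in an SQLite database file."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            " key TEXT PRIMARY KEY,"
            " payload TEXT NOT NULL,"
            " expires REAL NOT NULL)"
        )

    def get(self, key, build, ttl=60.0, now=None):
        """Return the cached value for key, calling build() on a miss."""
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT payload FROM cache WHERE key = ? AND expires > ?",
            (key, now),
        ).fetchone()
        if row is not None:
            # Hit: deserialize the stored fragment instead of rebuilding it.
            return json.loads(row[0])
        # Miss or expired: rebuild, serialize, and store with a new expiry.
        value = build()
        self.db.execute(
            "INSERT OR REPLACE INTO cache (key, payload, expires)"
            " VALUES (?, ?, ?)",
            (key, json.dumps(value), now + ttl),
        )
        self.db.commit()
        return value


def build_ticker():
    # Stand-in for the expensive rendering step.
    return ["headline 1", "headline 2"]


cache = FragmentCache()
print(cache.get("news_ticker", build_ticker, ttl=300))
```

Passing a file path instead of ":memory:" keeps the cache across requests, which is the case that matters for a news ticker that changes only every few minutes.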

Lab 7 has been posted.

May 20th, 2004
Lab 7 is Apache with some other cool stuff to make it more useful and faster. I have also included a section for installing the IonCube accelerator.
Portable Document Format | Postscript

I have posted the PostgreSQL lab.

May 13th, 2004
I have posted the PostgreSQL lab on the CS183 webpage. I created the lab instructions in Microsoft Word document format only. CS183 Website

I added the notes I have for PostgreSQL "Whitepapers".

May 11th, 2004
I have uploaded my notes from class on PostgreSQL. They are currently only in Microsoft Word format but I hope to get other formats added shortly. Whitepapers

I added a new section called "Software".

May 10th, 2004
I decided to start posting some of the software I write. Software

I have some pictures for the Sierra Opening Day 2004

April 26th, 2004
I have added most of the pictures I took during opening day. Most of the pictures are from around Mammoth. Gallery

Crap, I accidentally deleted the current version of my site.

April 26th, 2004
Use "rm -rf" wisely. The site is missing almost all of the data from the past month. I will add back what I can but I am going to let most of it go for the time being.

Office Hours Notes for (April 3, 2004)

April 4th, 2004
I posted most of the stuff talked about in office hours. (shmux, SSH keys, SCREEN) CS183 Website

Finished Lab1 for CS183, Notes...

March 31st, 2004
I finished lab one (Fedora Core 1 Installation). No problems, I posted my notes from my installation of Fedora Core 1 using Victor's instructions. CS183 Website

About CS183 lab for Thursdays 11am-2pm

March 31st, 2004
I am going to announce any changes or hints for the lab on my webpage, so check the site frequently (so I get more hits on the site). I am also going to try to add any additional material I have for the labs to my website, so there will be more than one reference to the material for the upcoming labs.

My new site is up and officially running!

March 31st, 2004
Well, this is it. I hope people can navigate it easily. If you have any questions or comments, feel free to email me at

Internship with Google, Summer 2006

It has been set in stone: I will be interning with Google in Santa Monica this summer (June 20th, 2006 to October 20th, 2006). It looks like I will be working with Dan Kegel. I will be working on the open-source project Wine (an Open Source implementation of the Windows API on top of X and Unix). I have also created a webpage dedicated to my experiences at .

