Projects

Anserini – Enabling Lucene for Academic IR Research

Lucene has a long history of industrial adoption, but it has seldom been used by the academic community, largely due to a lack of documentation and code examples. We built Anserini on top of Lucene to enable: (1) scalable, multi-threaded inverted indexing to handle modern web-scale collections, (2) streamlined IR evaluation for ad hoc retrieval on standard test collections, and (3) an extensible architecture for multi-stage ranking. Anserini ships with support for many TREC test collections, providing a convenient way to replicate competitive baselines out of the box.
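Ad hoc evaluation on TREC collections typically reports metrics such as mean average precision (MAP), computed from a ranked run and the collection's relevance judgments (qrels). The following is a minimal Python sketch of that metric, assuming binary relevance. It is not Anserini's evaluation code (Anserini itself is a Java toolkit), and the function names and toy run/qrels data are hypothetical.

```python
from typing import Dict, List, Set

# Illustrative sketch; function names and toy data are hypothetical.


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision for one query (binary relevance)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at the rank of this relevant hit
    # Relevant documents that were never retrieved contribute zero precision.
    return precision_sum / len(relevant)


def mean_average_precision(run: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> float:
    """MAP over all judged queries; a judged query missing from the run scores 0."""
    if not qrels:
        return 0.0
    total = sum(average_precision(run.get(q, []), rel) for q, rel in qrels.items())
    return total / len(qrels)


if __name__ == "__main__":
    run = {"q1": ["d3", "d1", "d7", "d2"], "q2": ["d5", "d9"]}
    qrels = {"q1": {"d1", "d2"}, "q2": {"d5"}}
    print(round(mean_average_precision(run, qrels), 4))  # prints 0.75
```

Here q1 has AP (1/2 + 2/4) / 2 = 0.5 and q2 has AP 1.0, so MAP is 0.75.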

Skills: Lucene

Interactive Search over Academic IR Datasets with Automated Evaluation

Unlike existing command-line IR toolkits, RISE/VIRLab provides an interactive tool that enables easy implementation of retrieval functions with only a few lines of code, a simplified evaluation process over multiple data sets and parameter settings, and a straightforward result-analysis interface through operational search engines and pairwise comparisons.

Skills: LAMP, Docker

Bag-of-Terms IR Ranking Model Performance Bound Analysis

Most traditional retrieval models are based on bag-of-terms representations and model relevance using various collection statistics. Despite extensive efforts, the performance of bag-of-terms retrieval functions appears to have reached a plateau, and further improvements have become increasingly difficult. An important research question is therefore whether we can provide a theoretical justification for the empirical performance bound of basic retrieval functions.
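Okapi BM25 is one well-known example of a bag-of-terms retrieval function built from collection statistics (document frequency, collection size, average document length). The sketch below shows that dependence concretely. It is only an illustration of the model family, not the analysis performed in this project; the helper names are hypothetical, and k1 = 1.2, b = 0.75 are common default values assumed here.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Illustrative sketch; helper names and toy documents are hypothetical.


def collection_stats(docs: List[List[str]]) -> Tuple[Dict[str, int], int, float]:
    """Document frequencies, collection size, and average document length."""
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))  # a document counts at most once per term
    return dict(df), len(docs), sum(len(d) for d in docs) / len(docs)


def bm25_score(query: List[str], doc: List[str], df: Dict[str, int],
               num_docs: int, avg_doc_len: float,
               k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi BM25 score of a tokenized document for a tokenized query."""
    tf = Counter(doc)  # bag of terms: word order is discarded here
    doc_len = len(doc)
    score = 0.0
    for term in query:  # a repeated query term contributes once per occurrence
        if tf[term] == 0 or term not in df:
            continue  # unmatched terms contribute nothing
        n = df[term]
        idf = math.log(1 + (num_docs - n + 0.5) / (n + 0.5))
        length_norm = 1 - b + b * doc_len / avg_doc_len
        score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
    return score


if __name__ == "__main__":
    docs = [["ir", "model", "bound"], ["lucene", "index"], ["ir", "ir", "evaluation"]]
    df, n, avg_len = collection_stats(docs)
    for doc in docs:
        print(doc, round(bm25_score(["ir", "bound"], doc, df, n, avg_len), 3))
```

Because the score depends only on term counts and collection statistics, shuffling a document's tokens never changes its score, which is exactly the representational limit the performance-bound question is concerned with.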

Axiomatic Ranking Models for Web Information Retrieval

The axiomatic approach searches for retrieval functions that satisfy a set of reasonable retrieval constraints. It can also be combined with semantic term matching, which performs query expansion by selecting semantically related terms. We tested the effectiveness of combining the two methods and applied the combination to web search, with promising results.
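To make "retrieval constraints" concrete, the sketch below checks one term-frequency constraint from the axiomatic IR literature (commonly labeled TFC1): for a one-term query and two documents of equal length, the document with more occurrences of the term should score strictly higher. The checker, the two toy scoring functions, and the synthetic documents are hypothetical illustrations, not the models used in this project; passing on synthetic pairs is evidence, not a proof.

```python
import math
from typing import Callable, List

# Illustrative sketch; all function names here are hypothetical.
Score = Callable[[List[str], List[str]], float]


def satisfies_tfc1(score: Score, term: str = "w", filler: str = "x",
                   doc_len: int = 10, max_tf: int = 5) -> bool:
    """Empirically test TFC1 on equal-length synthetic document pairs."""
    if max_tf > doc_len:
        raise ValueError("max_tf must not exceed doc_len")
    query = [term]
    for low in range(max_tf):
        high = low + 1
        d_low = [term] * low + [filler] * (doc_len - low)
        d_high = [term] * high + [filler] * (doc_len - high)
        if not score(query, d_high) > score(query, d_low):
            return False  # more occurrences did not yield a strictly higher score
    return True


def log_tf_score(query: List[str], doc: List[str]) -> float:
    # Sublinear TF: 1 + ln(tf) for each matched query term, 0 otherwise.
    return sum(1 + math.log(doc.count(t)) for t in query if doc.count(t) > 0)


def binary_score(query: List[str], doc: List[str]) -> float:
    # Counts matched query terms only and ignores how often they occur.
    return float(sum(1 for t in query if t in doc))


if __name__ == "__main__":
    print(satisfies_tfc1(log_tf_score), satisfies_tfc1(binary_score))
```

The log-TF function passes, while the binary-match function fails because one occurrence and two occurrences receive the same score. Checks of this kind are how candidate functions are filtered in an axiomatic search.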

Achievements:

  • Top 1 performance in the Web track of the Text REtrieval Conference (TREC), 2014
  • Top 2 performance in the Web track of the Text REtrieval Conference (TREC), 2013

Skills: Indri, Python, ClueWeb

Contextual Suggestion

User profiling is essential in contextual suggestion. However, because most users' observed behaviors in an IR system are sparse and their preferences are latent, constructing accurate user profiles is generally difficult. We focus on location-based contextual suggestion and propose leveraging users' opinions to construct the profiles, which significantly improves the system over category- or description-based user profile modeling approaches.
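One way to picture an opinion-based profile is as a term vector in which review text of places the user liked adds weight and review text of places the user disliked subtracts it; candidate places are then ranked by similarity to that vector. The sketch below is a hypothetical simplification under exactly those assumptions (ratings of +1 or -1, cosine similarity over raw term counts); it is not the model used in the TREC submissions, and all names and toy data are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical sketch of an opinion-based profile; not the project's model.


def build_profile(rated: List[Tuple[List[str], int]]) -> Counter:
    """`rated` holds (review_tokens, rating) pairs with rating in {+1, -1}."""
    profile: Counter = Counter()
    for tokens, rating in rated:
        for term, count in Counter(tokens).items():
            profile[term] += rating * count  # disliked places push weights negative
    return profile


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(profile: Dict[str, float],
                    candidates: Dict[str, List[str]]) -> List[str]:
    """Order candidate places by similarity of their review text to the profile."""
    scores = {name: cosine(profile, Counter(tokens)) for name, tokens in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    rated = [(["quiet", "cozy", "coffee"], 1), (["loud", "crowded", "bar"], -1)]
    candidates = {"cafe": ["cozy", "coffee", "quiet"], "club": ["loud", "crowded", "music"]}
    print(rank_candidates(build_profile(rated), candidates))  # prints ['cafe', 'club']
```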

Achievements:

  • Top 3 performance in the Contextual Suggestion track of the Text REtrieval Conference (TREC), 2015
  • Top 1 performance in the Contextual Suggestion track of the Text REtrieval Conference (TREC), 2014
  • Top 1 performance in the Contextual Suggestion track of the Text REtrieval Conference (TREC), 2013
  • Top 3 performance in the Contextual Suggestion track of the Text REtrieval Conference (TREC), 2012

Skills: LAMP Stack, Stanford NLP Packages, scikit-learn

Enabling Rich Text Search over Complex Relationships

MongoDB and Solr are integrated to deliver an infrastructure that supports real-time inserts, updates, and deletes as well as rich search features.

Skills: MEAN web service stack, Mongo-Solr Sync

Tracking Wellness via Android Phones

An Android application was developed and used to collect users' data, including their indoor and outdoor movement, phone movement, de-identified messages and emails, and phone calls. All data is encrypted and stored on a secure server. A deep learning algorithm is applied to the data to estimate each user's depression level from their historical data. If a depression risk is detected, feedback and suggestions about the user's behavior are sent to the user.

Achievements: four phases of pilot studies were conducted among a wide range of students at the University of Delaware. User feedback indicated that the application was very easy to use and its functionality very useful.

Skills: Android (Passive Event Listener), Postgres, PHP

Process Mining

We analyzed a real-world process management log and applied process mining algorithms to discover the actual workflow patterns. Numerous exceptions and bottlenecks were found at the monthly, team, and process levels. We further analyzed their possible causes and proposed improvements, which were confirmed by an enterprise data specialist.
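Many process discovery algorithms start from the directly-follows relation: within each case, which activity immediately follows which. Attaching the elapsed time to each transition gives a simple bottleneck indicator. The plain-Python sketch below illustrates that idea; it does not use ProM, and the event-log layout (case id, activity, timestamp), the function name, and the toy log are hypothetical assumptions.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical sketch; the event layout and names are assumptions.
Event = Tuple[str, str, datetime]  # (case_id, activity, timestamp)


def directly_follows(log: List[Event]) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """Map each (a, b) transition to (count, mean elapsed hours)."""
    traces: Dict[str, List[Tuple[datetime, str]]] = defaultdict(list)
    for case_id, activity, ts in log:
        traces[case_id].append((ts, activity))
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    total_hours: Dict[Tuple[str, str], float] = defaultdict(float)
    for events in traces.values():
        events.sort()  # order each case chronologically
        for (t1, a), (t2, b) in zip(events, events[1:]):
            counts[(a, b)] += 1
            total_hours[(a, b)] += (t2 - t1).total_seconds() / 3600
    return {pair: (n, total_hours[pair] / n) for pair, n in counts.items()}


if __name__ == "__main__":
    log = [
        ("c1", "submit", datetime(2024, 1, 1, 9)),
        ("c1", "approve", datetime(2024, 1, 1, 10)),
        ("c1", "pay", datetime(2024, 1, 3, 10)),
        ("c2", "submit", datetime(2024, 1, 2, 9)),
        ("c2", "approve", datetime(2024, 1, 2, 13)),
        ("c2", "pay", datetime(2024, 1, 2, 15)),
    ]
    dfg = directly_follows(log)
    for pair, (n, hours) in sorted(dfg.items(), key=lambda kv: kv[1][1], reverse=True):
        print(pair, n, round(hours, 1))
```

In this toy log, approve to pay averages 25 hours versus 2.5 hours for submit to approve, so it would be flagged first. Grouping the log by month or team before calling the function gives the per-month and per-team views described above.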

Skills: ProM, Python