Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Pages

Posts

Future Blog Post

less than 1 minute read

Published:

This post will show up by default. To disable scheduling of future posts, edit config.yml and set future: false.

Blog Post number 4

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

misc

Hiking

Does anyone enjoy hiking?

projects

Process Mining

Analyzed real-world process management logs and applied process mining algorithms to detect the actual patterns of the workflow. Many exceptions and bottlenecks were found on a monthly, per-team, and per-process basis. Further analysis of the possible causes, together with suggested improvements, was proposed and confirmed by an enterprise data specialist.

Skills: ProM, Python

Tracking Wellness via Android Phones

An Android application was developed and used to collect user data, including indoor and outdoor movement, phone movement, de-identified messages and emails, and phone calls. All data is encrypted and stored on a secure server. A deep learning algorithm is applied to the data, and the user’s depression level is estimated from the user’s historical data. Feedback and suggestions on the user’s behavior are sent to the user if a depression risk is detected.

Achievements: four phases of pilot studies were conducted among a wide range of students at the University of Delaware. User feedback shows that the application was very easy to use and that its functionality is very useful.

Skills: Android (Passive Event Listener), Postgres, PHP

Enable Rich Text Search over Complex Relationships

MongoDB and Solr are integrated to deliver an infrastructure that enables real-time insert, update, and delete operations as well as rich search features.

Skills: MEAN web service stack, Mongo-Solr Sync

Contextual Suggestion

User profiling is essential in contextual suggestion. However, given that most users’ observed behaviors are sparse and their preferences are latent in an IR system, constructing accurate user profiles is generally difficult. We focus on location-based contextual suggestion and propose to leverage users’ opinions to construct the profiles, which significantly improves the system over category- or description-based user profile modeling approaches.

Achievements:

  • Top 3 Performance on Contextual Suggestion track of the Text REtrieval Conference (TREC), 2015
  • Top 1 Performance on Contextual Suggestion track of the Text REtrieval Conference (TREC), 2014
  • Top 1 Performance on Contextual Suggestion track of the Text REtrieval Conference (TREC), 2013
  • Top 3 Performance on Contextual Suggestion track of the Text REtrieval Conference (TREC), 2012

Skills: LAMP Stack, Stanford NLP Packages, scikit-learn
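
The idea of building a profile from opinions can be illustrated with a minimal Python sketch. It assumes, for illustration only, that opinions are available as short review texts, that a profile is a bag of words, and that a candidate is scored by cosine similarity to the liked profile minus similarity to the disliked profile. All function names and the toy data are hypothetical and are not taken from the project code.

```python
import math
from collections import Counter

def build_profile(review_texts):
    """Turn a list of review strings into a bag-of-words term count (assumed representation)."""
    terms = Counter()
    for text in review_texts:
        terms.update(text.lower().split())
    return terms

def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def score_candidate(candidate_reviews, liked_profile, disliked_profile):
    """Higher when the candidate's opinions resemble liked places more than disliked ones."""
    candidate = build_profile(candidate_reviews)
    return cosine(candidate, liked_profile) - cosine(candidate, disliked_profile)

# Toy data (hypothetical): opinions about places the user rated high and low.
liked = build_profile(["quiet cozy cafe great coffee"])
disliked = build_profile(["loud crowded bar"])

score_a = score_candidate(["cozy cafe with great espresso"], liked, disliked)
score_b = score_candidate(["crowded loud club"], liked, disliked)
print(score_a > score_b)  # True
```

A real system would need proper tokenization, term weighting, and far more data; the sketch only shows how opinion text, rather than a category label, can drive the ranking.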

Axiomatic Ranking Models for Web Information Retrieval

The axiomatic approach searches for retrieval functions that satisfy a set of reasonable retrieval constraints. It can also be combined with semantic term matching, which performs query expansion by choosing semantically related terms. We tested the effectiveness of combining the two methods and applied the combination to web search. The results turned out to be promising.

Achievements:

  • Top 1 Performance on Web track of the Text REtrieval Conference (TREC), 2014
  • Top 2 Performance on Web track of the Text REtrieval Conference (TREC), 2013

Skills: Indri, Python, ClueWeb
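
To make "satisfying a retrieval constraint" concrete, here is a minimal Python sketch that empirically checks one constraint commonly discussed in the axiomatic literature, often called TFC1: for documents of equal length, more occurrences of a query term should yield a higher score. The scoring functions and helper names below are hypothetical toy examples, not the functions used in the project, and the check covers only a finite range of term frequencies.

```python
def toy_score(tf, doc_len, avg_len=100.0, idf=1.0, k=1.0):
    """Hypothetical scorer: saturating term frequency with length normalization."""
    if tf == 0:
        return 0.0
    return idf * tf / (tf + k * doc_len / avg_len)

def binary_score(tf, doc_len):
    """Counter-example: ignores how often the term occurs."""
    return 1.0 if tf > 0 else 0.0

def satisfies_tfc1(score_fn, doc_len=100, max_tf=20):
    """Check that the score strictly increases with tf at a fixed document length."""
    scores = [score_fn(tf, doc_len) for tf in range(max_tf + 1)]
    return all(a < b for a, b in zip(scores, scores[1:]))

print(satisfies_tfc1(toy_score))     # True
print(satisfies_tfc1(binary_score))  # False
```

An analytical proof over all inputs is what the axiomatic approach ultimately calls for; a numeric check like this is only a quick way to rule out candidate functions.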

Bag-of-Terms IR Ranking Model Performance Bound Analysis

Most traditional retrieval models are based on bag-of-terms representations, and they model relevance based on various collection statistics. Despite these efforts, it seems that the performance of bag-of-terms based retrieval functions has reached a plateau, and it is becoming increasingly difficult to further improve retrieval performance. Thus, one important research question is whether we can provide any theoretical justification for the empirical performance bound of basic retrieval functions.
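
As a concrete instance of a bag-of-terms function built from collection statistics, the following minimal Python sketch scores documents with Okapi BM25. BM25 is used here only as a familiar example; the description above does not name a specific function. The toy corpus, the function name, and the parameter defaults (k1 = 1.2, b = 0.75) are assumptions for illustration.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a query with Okapi BM25."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        if tf[term] == 0:
            continue  # a term absent from the document contributes nothing
        # Collection statistic: number of documents containing the term.
        df = sum(1 for d in corpus if term in d)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Document statistic: term frequency, normalized by document length.
        norm = tf[term] + k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score

# Toy corpus of tokenized documents (hypothetical).
corpus = [
    ["lucene", "search", "engine"],
    ["process", "mining", "log"],
    ["search", "ranking", "model", "search"],
]
query = ["search", "ranking"]

ranked = sorted(
    range(len(corpus)),
    key=lambda i: bm25_score(query, corpus[i], corpus),
    reverse=True,
)
print(ranked)  # [2, 0, 1]
```

The score depends only on term counts, document length, and document frequency; word order and term proximity are ignored, which is what the bag-of-terms assumption means in practice.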

Interactive Search over Academic IR Datasets with Automated Evaluation

Unlike existing command-line IR toolkits, RISE/VIRLab provides a more interactive tool that enables easy implementation of retrieval functions with only a few lines of code, a simplified evaluation process over multiple data sets and parameter settings, and a straightforward result analysis interface through operational search engines and pair-wise comparisons.

Skills: LAMP, Docker

Anserini – Enabling Lucene for Academic IR Research

Lucene has a long history of industrial adoption, but it was seldom used by the academic community, largely due to the lack of documentation and code examples. We build Anserini on top of Lucene to enable: (1) scalable, multi-threaded inverted indexing to handle modern web-scale collections, (2) streamlined IR evaluation for ad hoc retrieval on standard test collections, and (3) an extensible architecture for multi-stage ranking. Anserini ships with support for many TREC test collections, providing a convenient way to replicate competitive baselines right out of the box.

Skills: Lucene

publications

  • Taifei Zhao, Xizheng Ke and Peilin Yang. Local-map-based candidate node-encircling pre-configuration cycles construction in survivable mesh networks. In First International Conference on Future Information Networks (JICFIN'2009), 2009. (PDF )
  • Peilin Yang, Xizheng Ke and Taifei Zhao. Study of Ultraviolet Mobile Ad Hoc Network. In Symposium on Photonics and Optoelectronics (SOPO'2009), 2009. (PDF )
  • Taifei Zhao, Xizheng Ke and Peilin Yang. Position and Velocity Aided Routing Protocol in Mobile Ad Hoc Networks. International Journal of Digital Content Technology and its Applications (JDCTA'2010), 2010. (PDF )
  • Peilin Yang and Hui Fang. An exploration of ranking-based strategy for contextual suggestion. In Proceedings of the 21st Text REtrieval Conference (TREC'2012), 2013. (PDF Slides )
  • Peilin Yang and Hui Fang. Opinion-based User Profile Modeling for Contextual Suggestions. In Proceedings of the 2013 Conference on the Theory of Information Retrieval (ICTIR'2013). ACM, New York, NY, USA, Pages 18, 4 pages. 2014. (PDF Slides )
  • Peilin Yang and Hui Fang. Opinion-based User Profile Modeling for Contextual Suggestions. In Proceedings of the 22nd Text REtrieval Conference (TREC'2013), 2013. (PDF Slides )
  • Peilin Yang and Hui Fang. Evaluating the Effectiveness of Axiomatic Approaches in Web Track. In Proceedings of the 22nd Text REtrieval Conference (TREC'2013), 2013. (PDF Slides )
  • Xitong Liu, Peilin Yang and Hui Fang. EntEXPO: An Interactive Search System for Entity-Bearing Queries. In Proceedings of the 36th European Conference on IR Research on Advances in Information Retrieval - Volume 8416 (ECIR'2014), Vol. 8416. Springer-Verlag New York, Inc., New York, NY, USA, 784-788. (PDF )
  • Hui Fang, Hao Wu, Peilin Yang, Chengxiang Zhai. VIRLab: A Web-based Virtual Lab for Learning and Studying Information Retrieval Models. In Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval (SIGIR'2014). ACM, New York, NY, USA, 1249-1250. (PDF Slides Demo )
  • Peilin Yang and Hui Fang. Exploration of Opinion-aware Approach to Contextual Suggestion. In Proceedings of the 23rd Text REtrieval Conference (TREC'2014), 2014. (PDF Slides )
  • Xitong Liu, Peilin Yang and Hui Fang. Entity Came to Rescue - Leveraging Entities to Minimize Risks in Web Search. In Proceedings of the 23rd Text REtrieval Conference (TREC'2014), 2014. (PDF )
  • Peilin Yang and Hui Fang. Combining Opinion Profile Modeling with Complex Context Filtering for Contextual Suggestion. In Proceedings of the 24th Text REtrieval Conference (TREC'2015), 2015. (PDF Slides )
  • Peilin Yang, Hongning Wang, Hui Fang, and Deng Cai. Opinions matter: a general approach to user profile modeling for contextual suggestion. Information Retrieval 18, 6 (December 2015), 586-610. (PDF )
  • Peilin Yang and Hui Fang. A Reproducibility Study of Information Retrieval Models. In Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval (ICTIR'2016). ACM, New York, NY, USA, 77-86. (PDF Slides Demo )
  • Peilin Yang and Hui Fang. Estimating Retrieval Performance Bound for Single Term Queries. In Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval (ICTIR'2016). ACM, New York, NY, USA, 237-240. (PDF Slides Demo )
  • Peilin Yang, Hui Fang and Jimmy Lin. Anserini: Enabling the Use of Lucene for Information Retrieval Research. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'2017). ACM, New York, NY, USA, 1253-1256. (PDF Demo Code )
  • Leif Azzopardi, Matt Crane, Hui Fang, Grant Ingersoll, Jimmy Lin, Yashar Moshfeghi, Harrisen Scells, Peilin Yang, and Guido Zuccon. The Lucene for Information Access and Retrieval Research (LIARR) Workshop at SIGIR 2017. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'2017). ACM, New York, NY, USA, 1429-1430. (PDF )
  • Peilin Yang and Hui Fang. Can Short Queries Be Even Shorter? In Proceedings of the 2017 ACM on International Conference on the Theory of Information Retrieval (ICTIR'2017). ACM, New York, NY, USA, 43-50. (PDF Slides )
  • Yue Wang, Peilin Yang, Hui Fang. Evaluating Axiomatic Retrieval Models in the Core Track. In Proceedings of the 26th Text REtrieval Conference (TREC'2017), 2017. (PDF )
  • Kuang Lu, Peilin Yang, Hui Fang. Silent Day Detection in Real-Time Summarization Track. In Proceedings of the 26th Text REtrieval Conference (TREC'2017), 2017. (PDF )
  • Peilin Yang and Hui Fang. Towards Privacy-Preserving Evaluation for Information Retrieval Models Over Industry Data Sets. In Proceedings of the 13th Asia Information Retrieval Societies Conference (AIRS'2017). Springer International Publishing, Jeju Island, South Korea, 210-221. (PDF Slides )
  • Peilin Yang, Srikanth Thiagarajan, and Jimmy Lin. Robust, Scalable, Real-Time Event Time Series Aggregation at Twitter. In Proceedings of the 2018 ACM SIGMOD International Conference on Management of Data (SIGMOD'2018), June 2018, Houston, Texas. (PDF Slides )
  • Peilin Yang, Hui Fang, and Jimmy Lin. 2018. Anserini: Reproducible Ranking Baselines Using Lucene. J. Data and Information Quality 10, 4, Article 16 (October 2018), 20 pages. (PDF )
  • Peilin Yang and Jimmy Lin. Anserini at TREC 2018: CENTRE, Common Core, and News Tracks. In Proceedings of the 27th Text REtrieval Conference (TREC'2018), 2018. (PDF )
  • Peilin Yang and Jimmy Lin. Reproducing and Generalizing Semantic Term Matching in Axiomatic Information Retrieval. In Proceedings of the 41st European Conference on Information Retrieval, Part I (ECIR'2019), pages 369-381, April 2019, Cologne, Germany. (PDF )
  • Wei Yang, Kuang Lu, Peilin Yang, and Jimmy Lin. Critically Examining the "Neural Hype": Weak Baselines and the Additivity of Effectiveness Gains from Neural Ranking Models. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'2019). July 2019, Paris, France. (PDF )
  • Jimmy Lin and Peilin Yang. The Impact of Score Ties on Repeatability in Document Ranking. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'2019). July 2019, Paris, France. (PDF )