Release 0.10

The Mercurial changelog and the list of closed tickets provide more information on what has happened, including bugs that have been fixed.

Evaluation

  • Rewrote evaluator configuration to use Groovy rather than XML+JavaScript. The result is much more readable & easier to extend. See the manual entry for documentation of the evaluator. This also allows crossfolding to be done and written to disk independent of actual train-test evaluation.
  • Enhance train-test predict metrics to support per-user results (written to a separate table), and to have knowledge of starting and stopping the evaluation (to support additional accumulation of results that doesn't fit in the per-user or per-run paradigm).
  • Allow demo & smoke-test to use a local copy of the ML-100K data set.
  • Evaluator inputs & outputs transparently support GZip compression if the file names end in ".gz".
  • Reworked evaluation command line arguments for new Groovy-based support. Evaluation now takes a single file with the -f option (defaults to eval.groovy), and the remaining command line arguments are task names.
  • Metrics for the train-test code now take test users, in the form of TestUser, rather than ratings & predictions. This allows them to measure recommendations, access training history, and gives us flexibility to let them do even more in the future.

Algorithms

  • Made item-item CF use ItemVector rather than just ImmutableSparseVector for representing item vectors and querying for their similarities.
  • Defined the direction similarities are used in the item-item recommender and made it consistent and correct in the face of asymmetric similarity functions such as conditional probability (#131). The similarity functions are now better-documented.
  • Support unlimited neighborhood sizes (but not yet model sizes) in item-item recommender.
  • Made ItemItemModel an interface, so alternative sources of neighborhoods with similarity scores can be used. The default implementation uses a similarity matrix as before.
  • Added global recommenders and scorers that compute for items with respect to other items but independent of particular users, useful for creating “more like this” or “related items” views (#125).
    • General API — see the GlobalItemScorer and GlobalItemRecommender classes.
    • Implementation of global recommenders for item-item CF.

Cleanup and Refactoring

  • Moved predict evaluators to o.g.l.eval.metrics.predict and renamed the base interface to PredictEvalMetric.
  • Refactored cursor implementations:
    • Renamed AbstractRatingCursor to AbstractEventCursor and made it handle any event.
    • Made AbstractPollingCursor support fast polling, and make AbstractEventCursor extend it.
    • Incompatible change: renamed ScannerRatingCursor to DelimitedTextRatingCursor, and made it use buffered readers (via DelimitedTextCursor) rather than scanners. Only affects code that uses the scanner rating cursor directly.
    • Incompatible change: removed support for URL-backed streams from SimpleFileRatingDAO. If support for generic streams are needed, we can re-add this with InputSuppliers from Guava.
  • Replaced TaskTimer by commons-lang3 StopWatch. Any code using TaskTimer will need to be updated.
  • Removed now-unnecessary data tree code.