Release 0.10
The Mercurial changelog and the list of closed tickets provide more information on what has happened, including bugs that have been fixed.
Evaluation
- Rewrote evaluator configuration to use Groovy rather than XML+JavaScript. The result is much more readable & easier to extend. See the manual entry for documentation of the evaluator. This also allows crossfolding to be done and written to disk independent of actual train-test evaluation.
- Enhanced train-test predict metrics to support per-user results (written to a separate table) and to be aware of when the evaluation starts and stops (to support accumulating results that don't fit the per-user or per-run paradigm).
- Allowed the demo & smoke-test to use a local copy of the ML-100K data set.
- Evaluator inputs & outputs transparently support GZip compression if the file names end in ".gz".
- Reworked the evaluation command line arguments for the new Groovy-based configuration. Evaluation now takes a single file with the -f option (defaults to eval.groovy), and the remaining command line arguments are task names.
- Metrics for the train-test code now take test users, in the form of TestUser, rather than ratings & predictions. This lets them measure recommendations and access training history, and gives us the flexibility to let them do even more in the future.
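The GZip handling mentioned above can be pictured with a small sketch. It assumes the decision is made purely on the ".gz" file name suffix, as the note states. The class GzipAwareFiles and its methods are hypothetical names used for illustration, not the evaluator's actual code, and UTF-8 is an assumed encoding.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only; not part of the LensKit API.
final class GzipAwareFiles {
    private GzipAwareFiles() {}

    // The only signal used is the file name suffix.
    static boolean isGzipName(File file) {
        return file.getName().endsWith(".gz");
    }

    // Opens a reader, decompressing on the fly when the name ends in ".gz".
    static BufferedReader openReader(File file) throws IOException {
        InputStream in = new FileInputStream(file);
        try {
            if (isGzipName(file)) {
                in = new GZIPInputStream(in);
            }
        } catch (IOException e) {
            in.close(); // avoid leaking the raw stream if the GZip header is bad
            throw e;
        }
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    // Opens a writer, compressing on the fly when the name ends in ".gz".
    static BufferedWriter openWriter(File file) throws IOException {
        OutputStream out = new FileOutputStream(file);
        try {
            if (isGzipName(file)) {
                out = new GZIPOutputStream(out);
            }
        } catch (IOException e) {
            out.close(); // avoid leaking the raw stream if the header write fails
            throw e;
        }
        return new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
    }
}
```

With a helper of this shape, calling code reads and writes plain text in both cases; only the file name decides whether the bytes on disk are compressed.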
Algorithms
- Made item-item CF use ItemVector rather than just ImmutableSparseVector for representing item vectors and querying for their similarities.
- Defined the direction in which similarities are used in the item-item recommender and made it consistent and correct in the face of asymmetric similarity functions such as conditional probability (#131). The similarity functions are now better documented.
- Added support for unlimited neighborhood sizes (but not yet model sizes) in the item-item recommender.
- Made ItemItemModel an interface, so alternative sources of neighborhoods with similarity scores can be used. The default implementation uses a similarity matrix as before.
- Added global recommenders and scorers that score items with respect to other items, independent of any particular user. These are useful for creating “more like this” or “related items” views (#125).
  - General API: see the GlobalItemScorer and GlobalItemRecommender classes.
  - Implementation of global recommenders for item-item CF.
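To see why the direction of a similarity matters for an asymmetric function, here is a minimal sketch of conditional probability computed over sets of user IDs. It is an illustration only: the class and method names are hypothetical, the set-based definition is an assumption rather than LensKit's implementation, and the argument order does not indicate which direction the item-item recommender actually uses.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of an asymmetric similarity; not LensKit code.
final class ConditionalProbabilitySketch {
    private ConditionalProbabilitySketch() {}

    // Estimates P(target | given): the fraction of users who rated the
    // "given" item that also rated the "target" item.
    static double conditionalProbability(Set<Long> given, Set<Long> target) {
        if (given.isEmpty()) {
            return 0.0; // no evidence, so treat the items as unrelated
        }
        Set<Long> both = new HashSet<>(given);
        both.retainAll(target); // users who rated both items
        return (double) both.size() / given.size();
    }

    public static void main(String[] args) {
        Set<Long> usersOfA = new HashSet<>(Arrays.asList(1L, 2L, 3L, 4L));
        Set<Long> usersOfB = new HashSet<>(Arrays.asList(3L, 4L));
        // Swapping the arguments changes the answer.
        System.out.println(conditionalProbability(usersOfA, usersOfB));
        System.out.println(conditionalProbability(usersOfB, usersOfA));
    }
}
```

In this example half of A's users also rated B, but every user of B rated A, so the two calls give 0.5 and 1.0. A recommender that mixes up the two directions would therefore rank neighbors differently, which is why the direction has to be fixed and documented.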
Cleanup and Refactoring
- Moved predict evaluators to o.g.l.eval.metrics.predict and renamed the base interface to PredictEvalMetric.
- Refactored cursor implementations:
  - Renamed AbstractRatingCursor to AbstractEventCursor and made it handle any event.
  - Made AbstractPollingCursor support fast polling, and made AbstractEventCursor extend it.
- Incompatible change: renamed ScannerRatingCursor to DelimitedTextRatingCursor, and made it use buffered readers (via DelimitedTextCursor) rather than scanners. Only affects code that uses the scanner rating cursor directly.
- Incompatible change: removed support for URL-backed streams from SimpleFileRatingDAO. If support for generic streams is needed, we can re-add this with InputSuppliers from Guava.
- Replaced TaskTimer with commons-lang3 StopWatch. Any code using TaskTimer will need to be updated.
- Removed now-unnecessary data tree code.
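For readers updating code that relied on the scanner-based cursor, the following sketch shows the general idea of parsing delimited rating lines with a buffered reader instead of a Scanner. Everything in it is hypothetical: the class names, the user/item/rating field order, and the eager read into a list are assumptions made for illustration, whereas the real DelimitedTextRatingCursor is a cursor and may behave differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of buffered, line-based parsing; not LensKit code.
final class DelimitedRatingReaderSketch {
    private DelimitedRatingReaderSketch() {}

    // Minimal value type for one parsed line (assumed field order).
    static final class Rating {
        final long userId;
        final long itemId;
        final double value;

        Rating(long userId, long itemId, double value) {
            this.userId = userId;
            this.itemId = itemId;
            this.value = value;
        }
    }

    // Reads every non-blank line, splitting on a literal delimiter.
    static List<Rating> readAll(Reader source, String delimiter) throws IOException {
        // Quote the delimiter so characters such as "|" are not treated as regex syntax.
        Pattern splitter = Pattern.compile(Pattern.quote(delimiter));
        List<Rating> ratings = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String[] fields = splitter.split(line);
                if (fields.length < 3) {
                    throw new IOException("expected at least 3 fields: " + line);
                }
                ratings.add(new Rating(
                        Long.parseLong(fields[0].trim()),
                        Long.parseLong(fields[1].trim()),
                        Double.parseDouble(fields[2].trim())));
            }
        }
        return ratings;
    }
}
```

Extra trailing fields, such as a timestamp column, are ignored by this sketch.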