


Page history last edited by mike@mbowles.com 11 years, 1 month ago


PDF from the CRAN R package "recommenderlab". It gives an excellent overview of basic collaborative filtering and has good references for more detail.



Here's a file with the code examples from the recommenderlab package:



Here are slides for the first two lectures:




From Jure Leskovec's Stanford CS246, Mining Massive Data Sets:



Recommender Systems, continued: SVD & Netflix Challenge







From the text: Mining of Massive Datasets by Anand Rajaraman & Jeffrey Ullman


Chapter 9: SVD




Microsoft Technical report MSR-TR-98-12 (comparison of various recommender system approaches):



The Microsoft dataset (used in the paper cited just above) can be found in the following folder: 


     In that folder you have several choices, depending on how you like your data. The raw data directly from the Microsoft site are included in the files RawTest and RawData. Information on what's included in the data set is in the file titled info. I wrote a short Python program, __init__.py, that takes the RawTest data set and converts it to a familiar matrix form in which each of the 5000 rows corresponds to a user and each column corresponds to a particular web page on the Microsoft site. You'll have to make the obvious changes to pathnames to suit your environment.


The resulting matrix is MSWebMat. This matrix has 5000 rows corresponding to different website visitors and 294 columns corresponding to different pages on the MS site. The matrix has a 1 in position (i, j) if the ith user visited the jth page and a 0 otherwise. The file pageList gives the Microsoft four-digit page numbers (think of these as the column headings). This will allow you to compare results with the original data. The user order is preserved from the original test file.
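The conversion described above can be sketched roughly as follows. This is not the __init__.py from the folder; it is a minimal illustration that assumes the raw file uses the comma-separated record layout of the UCI "Anonymous Microsoft Web Data" set, where lines starting with A declare a page, lines starting with C start a new user, and lines starting with V record a visit by the most recent user. The function name build_visit_matrix and the SAMPLE text are hypothetical, made up for this example.

```python
import csv

def build_visit_matrix(lines):
    """Build a binary user-by-page matrix from raw records.

    Assumed record layout (an assumption, check the 'info' file):
      A,<page id>,...     declares a page (becomes a column)
      C,"<case>",<user>   starts a new user (becomes a row)
      V,<page id>,...     the current user visited that page
    """
    page_ids = []   # column headings
    user_ids = []   # row labels, in file order
    visits = []     # one set of visited page ids per user
    for row in csv.reader(lines):
        if not row:
            continue
        kind = row[0]
        if kind == "A":
            page_ids.append(int(row[1]))
        elif kind == "C":
            user_ids.append(int(row[2]))
            visits.append(set())
        elif kind == "V" and visits:
            visits[-1].add(int(row[1]))

    page_ids.sort()
    column = {page: j for j, page in enumerate(page_ids)}

    # One row per user, one column per page; 1 means "visited".
    matrix = [[0] * len(page_ids) for _ in visits]
    for i, pages in enumerate(visits):
        for page in pages:
            if page in column:  # ignore visits to undeclared pages
                matrix[i][column[page]] = 1
    return matrix, page_ids, user_ids

# Hypothetical three-page, two-user sample in the assumed layout.
SAMPLE = '''A,1000,1,"regwiz","/regwiz"
A,1001,1,"Support Desktop","/support"
A,1002,1,"End User Produced View","/athome"
C,"10001",10001
V,1000,1
V,1002,1
C,"10002",10002
V,1001,1
'''

matrix, page_ids, user_ids = build_visit_matrix(SAMPLE.splitlines())
for user, row in zip(user_ids, matrix):
    print(user, row)
```

To run it on the real data, pass an open file handle in place of SAMPLE.splitlines(); page_ids then plays the role of pageList, and the row order follows the order of users in the file.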


Here's the R code that we showed in class. It reads in the Microsoft web hit data and computes the SVD: mswebsvd.R
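For readers who prefer Python, the same idea can be sketched with NumPy. This is not a translation of mswebsvd.R; it is a small illustration, under the assumption that a truncated SVD of the 0/1 visit matrix is used to score pages a user has not visited yet. The toy matrix and the helper names low_rank_scores and recommend are made up for the example.

```python
import numpy as np

# Toy stand-in for the visit matrix: rows are users, columns are pages.
# The values are invented for illustration, not taken from the MS data.
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
], dtype=float)

def low_rank_scores(A, k):
    """Return the rank-k approximation of A built from its top k singular triples."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def recommend(A, scores, user, n=1):
    """Return up to n unvisited page indices for a user, highest score first."""
    unseen = np.flatnonzero(A[user] == 0)
    ranked = unseen[np.argsort(-scores[user, unseen])]
    return ranked[:n].tolist()

scores = low_rank_scores(A, k=2)
print(np.round(scores, 2))
print(recommend(A, scores, user=0, n=1))
```

The choice of k trades off smoothing against fidelity: keeping every singular value reproduces the original matrix exactly, while a small k fills the zero entries with scores driven by the dominant visiting patterns.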


Recorded Lectures:

Lecture 1:


Part 1: https://datamining.webex.com/datamining/ldr.php?AT=pb&SP=MC&rID=94590527&rKey=c6dfe1d1d966e88d


Part 2: https://datamining.webex.com/datamining/ldr.php?AT=pb&SP=MC&rID=94590537&rKey=56cc3304b0dfcd9f


Lecture 2:





