


Page history last edited by mike@mbowles.com 11 years, 6 months ago


PDF from the CRAN R package "recommenderlab". It gives an excellent overview of basic collaborative filtering and has good references for more details.



Here's a file with the code examples from the recommenderlab package:



Here are slides for the first two lectures:




From Jure Leskovec's Stanford CS246, Mining Massive Data Sets



Recommender Systems, cont.: SVD & Netflix Challenge







From the text: Mining of Massive Datasets by Anand Rajaraman & Jeffrey Ullman


Chapter 9   SVD




Microsoft Technical report MSR-TR-98-12 (comparison of various recommender system approaches):



The Microsoft dataset (used in the paper cited just above) can be found in the following folder: 


     That folder offers several choices, depending on how you like your data. The raw data, taken directly from the Microsoft site, are in the files RawTest and RawData. Information on what's included in the data set is in the file titled info. I wrote a short Python program, __init__.py, that takes the RawTest data set and converts it to a familiar matrix form in which each of the 5000 rows corresponds to a user and each column corresponds to a particular web page on the Microsoft site. You'll have to make the obvious changes to pathnames to suit your environment.


The resulting matrix is MSWebMat. This matrix has 5000 rows corresponding to different website visitors and 294 columns corresponding to different pages on the MS site. The matrix has a 1 in the (i, j)th position if the ith user visited the jth page and a 0 otherwise. The file pageList gives the Microsoft 4-digit page numbers (think of these as the column headings). This will allow you to compare results with the original data. The user order is preserved from the original test file.
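For readers who want to rebuild the matrix themselves, here is a minimal sketch of the conversion. It is not the __init__.py program mentioned above. It assumes the raw file uses the comma-separated record layout of the publicly distributed anonymous Microsoft web data: "A" lines define a page id, "C" lines start a new user, and "V" lines record a visit by the most recent user. The function name build_visit_matrix, the sorted column order, and the tiny sample input are all hypothetical and exist only for illustration.

```python
import csv


def build_visit_matrix(lines):
    """Build a 0/1 user-by-page matrix from raw records.

    Assumed record layout (check it against the info file):
      A,<page id>,...   defines a page (one column)
      C,...             starts a new user (one row)
      V,<page id>,...   a visit by the most recently started user
    """
    page_ids = []   # every page id declared by an A line
    users = []      # one set of visited page ids per user, in file order

    for row in csv.reader(lines):
        if not row:
            continue
        kind = row[0]
        if kind == "A":
            page_ids.append(int(row[1]))
        elif kind == "C":
            users.append(set())
        elif kind == "V" and users:
            users[-1].add(int(row[1]))

    # Assumption: columns are ordered by ascending page id.
    page_ids = sorted(page_ids)
    column = {pid: j for j, pid in enumerate(page_ids)}

    matrix = [[0] * len(page_ids) for _ in users]
    for i, visited in enumerate(users):
        for pid in visited:
            if pid in column:      # skip visits to undeclared pages
                matrix[i][column[pid]] = 1
    return matrix, page_ids


# Made-up sample in the assumed layout: three pages, two users.
sample = [
    'A,1000,1,"regwiz","/regwiz"',
    'A,1001,1,"Support Desktop","/support"',
    'A,1002,1,"End User Produced View","/athome"',
    'C,"10001",10001',
    "V,1000,1",
    "V,1002,1",
    'C,"10002",10002',
    "V,1001,1",
]
matrix, pages = build_visit_matrix(sample)
print(pages)
for row in matrix:
    print(row)
```

To run it on the real data, pass an open file handle for the raw test file instead of the sample list. The returned page id list plays the role of the column headings.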


Here's the R code that we showed in class. It reads in the Microsoft web hit data and calculates the SVD: mswebsvd.R
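As a rough illustration of what the SVD step produces, the sketch below approximates only the largest singular value and its singular vectors, using power iteration on a small 0/1 matrix. This is a different, simpler technique than a full SVD routine and is not a translation of mswebsvd.R. The function name top_singular_triple, the iteration count, and the toy matrix are assumptions made for the example.

```python
import math


def top_singular_triple(A, iters=50):
    """Approximate (sigma, u, v) for the largest singular value of A.

    Uses power iteration on A^T A. The all-ones starting vector suits a
    nonnegative 0/1 matrix such as a visit matrix, because the leading
    right singular vector can then be taken to have no negative entries.
    """
    m, n = len(A), len(A[0])
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        # u = A v
        u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        # w = A^T u, which equals (A^T A) v
        w = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]

    # With v (approximately) converged, sigma = |A v| and u = A v / sigma.
    u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma > 0.0:
        u = [x / sigma for x in u]
    return sigma, u, v


# Toy visit matrix: users 0 and 2 visited pages 0 and 2, user 1 visited page 1.
toy = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
sigma, u, v = top_singular_triple(toy)
print(sigma)
print(v)
```

In the toy matrix, the leading right singular vector puts its weight on pages 0 and 2, the pair that is visited together. Pure-Python loops like these are slow on the full 5000 by 294 matrix, where a library routine such as numpy.linalg.svd is the practical choice.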


Recorded Lectures:

Lecture 1:


Part 1: https://datamining.webex.com/datamining/ldr.php?AT=pb&SP=MC&rID=94590527&rKey=c6dfe1d1d966e88d


Part 2: https://datamining.webex.com/datamining/ldr.php?AT=pb&SP=MC&rID=94590537&rKey=56cc3304b0dfcd9f


Lecture 2:





