**Lecture:**

PDF from the CRAN R package "recommenderlab". It gives an excellent overview of basic collaborative filtering and has good references for more details.

recommenderlab.pdf

From Jure Leskovec's Stanford CS246, Mining Massive Data Sets:

http://www.stanford.edu/class/cs246/cs246-11-mmds/handouts.html

06-dim_redSVD&CUR.pdf

Recommender Systems cont SVD & Netflix Challenge

**Homework:**

ML202Homework01

**References:**

From text: Mining of Massive Datasets by Anand Rajaraman & Jeffrey Ullman

Chapter 9 SVD

http://www.netlib.org/lapack/lug/node19.html

Microsoft technical report MSR-TR-98-12 (a comparison of various recommender system approaches):

tr-98-12.pdf

The Microsoft dataset (used in the paper cited just above) can be found in the following folder:

MSWebHitData

In that folder you have several choices, depending on how you like your data. The raw data, taken directly from the Microsoft site, are in the files RawTest and RawData . A description of what the data set contains is in the file titled info . I wrote a short Python program, __init__.py , that takes the RawTest data set and converts it to a familiar matrix form in which each of the 5000 rows corresponds to a user and each column corresponds to a particular web page on the Microsoft site. You'll have to make the obvious changes to pathnames to suit your environment.

The resulting matrix is MSWebMat . It has 5000 rows, one per website visitor, and 294 columns, one per page on the MS site. The entry in position (i, j) is 1 if the ith user visited the jth page and 0 otherwise. The file pageList gives the Microsoft four-digit page numbers (think of these as the column headings), which lets you compare results with the original data. The user order is preserved from the original test file.
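To make the matrix layout concrete, here is a minimal Python sketch of how a list of visits can be turned into a 0/1 user-by-page matrix. It is not the __init__.py program mentioned above and it does not parse the raw file format. The function name `build_visit_matrix`, the `(user_index, page_id)` pair representation, and the toy page numbers are all assumptions made for illustration.

```python
import numpy as np

def build_visit_matrix(visits, page_ids, n_users):
    """Return an n_users x len(page_ids) matrix of 0/1 visit indicators.

    visits   -- iterable of (user_index, page_id) pairs (hypothetical format)
    page_ids -- list of page numbers, in column order (like a pageList)
    """
    # Map each page number to its column index.
    col = {pid: j for j, pid in enumerate(page_ids)}
    mat = np.zeros((n_users, len(page_ids)), dtype=np.int8)
    for user, pid in visits:
        mat[user, col[pid]] = 1  # repeated visits still give a single 1
    return mat

# Toy data, made up for illustration: 3 users, 4 pages.
page_ids = [1000, 1001, 1002, 1003]
visits = [(0, 1000), (0, 1002), (1, 1001), (2, 1000), (2, 1003), (2, 1000)]
mat = build_visit_matrix(visits, page_ids, n_users=3)
print(mat)
```

With the real data the same idea would give a 5000 x 294 matrix, with the page numbers from pageList supplying the column order.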

Here's the R code that we showed in class. It reads in the Microsoft web hit data and calculates the SVD. mswebsvd.R
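For anyone working in Python instead, the following is a rough sketch of the same kind of calculation using NumPy. It is not a translation of mswebsvd.R. It runs on a small random 0/1 matrix rather than the real MSWebMat file, and the matrix size, the random seed, and the choice of rank `k = 2` are assumptions made purely for illustration.

```python
import numpy as np

# Toy stand-in for a user-by-page visit matrix (random, not the real data).
rng = np.random.default_rng(0)
A = (rng.random((20, 6)) < 0.3).astype(float)

# Thin SVD: A = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k approximation built from the k largest singular values.
k = 2
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

# The Frobenius error of the rank-k approximation equals the root of the
# sum of the squared discarded singular values.
err = np.linalg.norm(A - A_k)
print("singular values:", np.round(s, 3))
print("rank-%d reconstruction error: %.3f" % (k, err))
```

The rows of `U[:, :k] * s[:k]` give each user's coordinates in the reduced space, and the rows of `Vt[:k, :]` show how each page loads on those k directions.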
