RecommenderLinks

Saved by mike@mbowles.com
on August 24, 2011 at 8:14:48 am
 

Lecture:  

PDF from the CRAN R package "recommenderlab".  It gives an excellent overview of basic collaborative filtering and has good references for more details.

recommenderlab.pdf

 

 

From Jure Leskovec's Stanford CS246, Mining Massive Data Sets:

http://www.stanford.edu/class/cs246/cs246-11-mmds/handouts.html

06-dim_redSVD&CUR.pdf

Recommender Systems, cont.: SVD & Netflix Challenge

 

Homework: 

ML202Homework01

 

References: 

 

From Text:  Mining of Massive Datasets by Anand Rajaraman & Jeffrey Ullman

 

Chapter 9   SVD

 

http://www.netlib.org/lapack/lug/node19.html

 

Microsoft Technical report MSR-TR-98-12 (comparison of various recommender system approaches):

     tr-98-12.pdf

 

The Microsoft dataset (used in the paper cited just above) can be found in the following folder: 

     MSWebHitData

     In that folder you have several choices, depending on how you like your data.  The raw data, taken directly from the Microsoft site, are in the files RawTest and RawData .  Information on what's included in the data set is in the file titled info .  I wrote a short Python program, __init__.py , that takes the RawTest data set and converts it to a familiar matrix form, where each of the 5000 rows corresponds to a user and each column corresponds to a particular web page on the Microsoft site.  You'll have to make the obvious changes to pathnames to suit your environment. 

 

The resulting matrix is MSWebMat .  This matrix has 5000 rows corresponding to different website visitors and 294 columns corresponding to different pages on the MS site.  The matrix has a 1 in the (i, j)th position if the ith user visited the jth page and a 0 otherwise.  The file pageList gives the Microsoft 4-digit page numbers (think of these as the column headings).  This will allow you to compare results with the original data.  The user order is preserved from the original test file. 
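
For readers who want to see the shape of that conversion, here is a minimal Python sketch. It is not the __init__.py from the folder. It assumes the raw file uses comma-separated records where "A" lines declare a page, "C" lines start a new user, and "V" lines record a visit by the current user; check the info file before relying on that. The sample text, the function name build_visit_matrix, and the choice to sort columns by page number are all illustrative assumptions.

```python
import numpy as np

# Hypothetical sample in the ASSUMED raw layout (verify against the info file):
#   A,<page id>,...   declares a page
#   C,...             starts a new user
#   V,<page id>,...   the current user visited that page
SAMPLE_RAW = '''\
A,1000,1,"regwiz","/regwiz"
A,1001,1,"Support Desktop","/support"
A,1002,1,"End User Produced View","/athome"
C,"10001",10001
V,1000,1
V,1002,1
C,"10002",10002
V,1001,1
'''


def build_visit_matrix(lines):
    """Return (matrix, page_ids): one row per user, one column per page."""
    page_ids = []   # page numbers, used as column headings
    visits = []     # one list of visited page numbers per user, in file order
    for line in lines:
        fields = line.strip().split(",")
        if fields[0] == "A":
            page_ids.append(int(fields[1]))
        elif fields[0] == "C":
            visits.append([])
        elif fields[0] == "V" and visits:
            visits[-1].append(int(fields[1]))

    page_ids.sort()  # assumption: columns ordered by page number
    column_of = {page: j for j, page in enumerate(page_ids)}

    matrix = np.zeros((len(visits), len(page_ids)), dtype=np.int8)
    for i, pages in enumerate(visits):
        for page in pages:
            if page in column_of:  # ignore visits to undeclared pages
                matrix[i, column_of[page]] = 1
    return matrix, page_ids


matrix, page_list = build_visit_matrix(SAMPLE_RAW.splitlines())
print(matrix)
print(page_list)
```

To run it on a real file, pass an open file handle instead of SAMPLE_RAW.splitlines(); rows come out in the same order as the users appear in the file.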

 

Here's the R code that we showed in class.  It reads in the Microsoft web hit data and computes the SVD.  mswebsvd.R
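
If you'd rather work in Python, the same calculation can be sketched with NumPy. This is not a translation of mswebsvd.R; it is a minimal example on a small made-up 0/1 matrix standing in for MSWebMat, and the variable names and the choice of k = 2 are assumptions for illustration only.

```python
import numpy as np

# Toy stand-in for MSWebMat (hypothetical data):
# rows are users, columns are pages, 1 means "visited".
visits = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
], dtype=float)

# Thin SVD: visits = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(visits, full_matrices=False)

# Keep the k largest singular values to get a rank-k approximation.
k = 2
approx = (U[:, :k] * s[:k]) @ Vt[:k, :]

print(np.round(s, 3))       # singular values
print(np.round(approx, 2))  # rank-k reconstruction of the visit matrix
```

Entries of approx that are large where visits has a 0 are the usual starting point for SVD-based recommendations: they mark pages that similar users visited but this user did not.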
