+ | |||
+ | <wrap hide> | ||
< | < | ||
.slide { | .slide { | ||
Line 6: | Line 8: | ||
<!-- $theme: default --> | <!-- $theme: default --> | ||
+ | </ | ||
+ | |||
====== Review on BPR: Bayesian Personalized Ranking from Implicit Feedback, Rendle et al. ======
---
==== Recommendation of item ====
### Explicit Feedback : user's preference of the item itself
> e.g. '1-5 score' of movies, '
### Implicit Feedback
- non-observed user-item pairs = real negative feedback + missing values
---
==== BPR : Bayesian Personalized Ranking from Implicit Feedback ====
---
==== Formalization ====
* $U$ : set of all users, $I$ : set of all items
* $S \subseteq U \times I$ : implicit feedback
* $>_u \subset I^2$ : personalized total ranking of all items
> properties of total order:
> (totality) : $\text{if } i \neq j \text{, then } i >_u j \text{ or } j >_u i$
> (antisymmetry) : $\text{if } i>_uj \text{ and } j>_ui \text{, then } i = j$
> (transitivity) : $\text{if } i>_uj \text{ and } j>_uk \text{, then } i>_u k$
* $I_u^+:=\{i \in I : (u,i) \in S\}$, $U_i^+:=\{u \in U : (u,i) \in S\}$
---
==== Approach ====
* use item pairs as training data (see the R sketch below)
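As a concrete illustration, a minimal R sketch of drawing one training triple $(u,i,j)$ from $D_S := \{(u,i,j) : i \in I_u^+ \text{ and } j \in I \setminus I_u^+\}$ (the paper's notation); the toy matrix ''S_mat'' and the helper name ''sample_triple'' are assumptions for this example, not from the paper:
<code rsplus>
# toy implicit feedback matrix: rows = users, columns = items, 1 = observed feedback
set.seed(1)
S_mat <- matrix(rbinom(5 * 8, 1, 0.3), nrow = 5, ncol = 8)

# draw one training triple (u, i, j) with i from I_u^+ and j from I \ I_u^+
sample_triple <- function(S) {
  repeat {
    u   <- sample(nrow(S), 1)
    pos <- which(S[u, ] == 1)          # I_u^+
    neg <- which(S[u, ] == 0)          # I \ I_u^+
    if (length(pos) > 0 && length(neg) > 0) break
  }
  i <- if (length(pos) == 1) pos else sample(pos, 1)
  j <- if (length(neg) == 1) neg else sample(neg, 1)
  c(u = u, i = i, j = j)
}

sample_triple(S_mat)
</code>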
---
==== BPR optimization criterion (1/3) ====
Bayesian formulation : maximize posterior probability
> $$ p(\Theta|>_u) \propto p(>_u|\Theta)\,p(\Theta) $$
---
==== BPR optimization criterion (2/3) ====
set $X_{uij} = I(i>_uj)$,
then $X_{uij} \sim Bernoulli(p_{uij})$ where $p_{uij} = p(i>_uj|\Theta)$
---
==== BPR optimization criterion (3/3) ====
with setting $p(i>_uj|\Theta) := \sigma(\hat{x}_{uij}(\Theta))$ where $\sigma$ is the logistic sigmoid,
and prior $p(\Theta)$ as normal distribution $N(0, \Sigma_\Theta)$ with $\Sigma_\Theta = \lambda_\Theta I$,
we get the BPR optimization criterion
> $$ \text{BPR-OPT} := \ln p(\Theta|>_u) = \sum_{(u,i,j)\in D_S} \ln \sigma(\hat{x}_{uij}) - \lambda_{\Theta}\lVert\Theta\rVert^2 $$
> where $\lambda_{\Theta}$ are model-specific regularization parameters.
※ form of MLE with $l_2$-regularization
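For intuition, a small R sketch of evaluating this criterion; the vector ''x_uij'' of pairwise score differences and the single shared ''lambda'' are simplifying assumptions for the example:
<code rsplus>
# BPR-OPT value: sum of log-sigmoids of the pairwise scores minus an l2 penalty on the parameters
bpr_opt <- function(x_uij, theta, lambda) {
  sum(log(plogis(x_uij))) - lambda * sum(theta^2)
}

# three pairwise score differences and a small parameter vector
bpr_opt(x_uij = c(0.8, -0.1, 1.5), theta = c(0.2, -0.3, 0.1), lambda = 0.0025)
</code>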
---
==== Relation to AUC ====
> $$ \begin{matrix}
AUC &=& {1 \over|U|}\sum_{u\in U} AUC(u) \\
AUC(u) &=& {1 \over |I_u^+|\,|I \setminus I_u^+|}\sum_{i\in I_u^+}\sum_{j\in I \setminus I_u^+} \delta(\hat{x}_{uij}>0)
\end{matrix} $$
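The analogy to BPR-OPT: AUC counts the non-differentiable indicator $\delta(\hat{x}_{uij}>0)$, while BPR-OPT replaces it with the smooth $\ln\sigma(\hat{x}_{uij})$. A small illustrative R snippet (variable names are ad hoc):
<code rsplus>
x <- seq(-4, 4, by = 1)
heaviside   <- as.numeric(x > 0)   # quantity counted by AUC
log_sigmoid <- log(plogis(x))      # differentiable surrogate maximized by BPR-OPT
round(cbind(x, heaviside, log_sigmoid), 3)
</code>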
---
==== Optimization ====
### gradient descent
> $$ {\partial \text{BPR-OPT} \over \partial \Theta} = \sum_{(u,i,j)\in D_S} {\partial \over \partial\Theta}\ln\sigma(\hat{x}_{uij}) - \lambda_{\Theta}{\partial \over \partial\Theta}\lVert\Theta\rVert^2 $$
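In practice the paper optimizes BPR-OPT with stochastic gradient ascent on bootstrap-sampled triples (LearnBPR). A minimal sketch of a single generic parameter update, assuming the learning rate ''alpha'' and the derivative ''dx_dtheta'' of $\hat{x}_{uij}$ are supplied by the model (the function name is mine):
<code rsplus>
# one LearnBPR-style stochastic update for a generic parameter theta:
# theta <- theta + alpha * ( d/dtheta ln sigma(x_uij) - lambda * theta )
sgd_step <- function(theta, x_uij, dx_dtheta, alpha = 0.05, lambda = 0.0025) {
  theta + alpha * (plogis(-x_uij) * dx_dtheta - lambda * theta)
}

# example: a scalar parameter with pairwise score 0.3 and derivative 1
sgd_step(theta = 0.1, x_uij = 0.3, dx_dtheta = 1)
</code>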
---
==== Model ====
Many models predict a real number $\hat{x}_{ul}$ per user-item pair $(u,l)$
> $$ \hat{x}_{uij} := \hat{x}_{ui}-\hat{x}_{uj} $$
※ use the same model, but optimize it against another criterion
(the difference of two predictions instead of a single number)
---
==== Model : Matrix Factorization ====
$X = [x_{ui}]$ is approximated by two low-rank matrices $W:|U| \times k$ and $H:|I| \times k$
> $$ \hat{X} := WH^t $$
> $$ {\partial \hat{x}_{uij} \over \partial\theta} = \begin{cases}
(h_{if}-h_{jf}) & \text{if } \theta = w_{uf} \\
w_{uf} & \text{if } \theta = h_{if} \\
-w_{uf} & \text{if } \theta = h_{jf} \\
0 & \text{else}
\end{cases}$$
$\lambda_W$, $\lambda_{H^+}$, $\lambda_{H^-}$ : regularization for each group of $\Theta$
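Putting the pieces together, a minimal sketch of one BPR update for the MF parametrization; a single shared ''lambda'' is used instead of the separate $\lambda_W$, $\lambda_{H^+}$, $\lambda_{H^-}$, and the function name is mine:
<code rsplus>
# one BPR update on a triple (u, i, j) for factor matrices W (|U| x k) and H (|I| x k)
bpr_mf_update <- function(W, H, u, i, j, alpha = 0.05, lambda = 0.0025) {
  w_u <- W[u, ]; h_i <- H[i, ]; h_j <- H[j, ]
  x_uij <- sum(w_u * (h_i - h_j))      # \hat{x}_{ui} - \hat{x}_{uj}
  g <- plogis(-x_uij)                  # d/dx ln sigma(x) evaluated at x_uij
  W[u, ] <- w_u + alpha * (g * (h_i - h_j) - lambda * w_u)
  H[i, ] <- h_i + alpha * (g *  w_u        - lambda * h_i)
  H[j, ] <- h_j + alpha * (g * -w_u        - lambda * h_j)
  list(W = W, H = H)
}

# example with k = 2 latent dimensions
W <- matrix(rnorm(5 * 2, sd = 0.1), 5, 2); H <- matrix(rnorm(8 * 2, sd = 0.1), 8, 2)
upd <- bpr_mf_update(W, H, u = 1, i = 2, j = 3)
</code>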
---
==== Model : Adaptive k-Nearest-Neighbor ====
> $$ \hat{x}_{ui} = \sum_{l \in I_u^+\text{ and }l \neq i} c_{il} $$
> where $C:I \times I$ is the symmetric item-correlation/item-similarity matrix
> $$ {\partial \hat{x}_{uij} \over \partial\theta} = \begin{cases}
+1 & \text{if } \theta \in \{c_{il}, c_{li}\} \text{ and } l \in I_u^+ \text{ and } l \neq i \\
-1 & \text{if } \theta \in \{c_{jl}, c_{lj}\} \text{ and } l \in I_u^+ \text{ and } l \neq j \\
0 & \text{else}
\end{cases}$$
$\lambda_{+}$, $\lambda_{-}$ : regularization for the positive and negative updates of $C$
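For the kNN model, a small sketch of computing $\hat{x}_{ui}$ from an item-correlation matrix; the toy matrices ''C_mat'' and ''S_mat'' are assumptions for the example:
<code rsplus>
# \hat{x}_{ui}: sum of correlations between item i and the other items in I_u^+
predict_knn <- function(C, S, u, i) {
  l <- setdiff(which(S[u, ] == 1), i)   # l in I_u^+ and l != i
  sum(C[i, l])
}

# toy data: 4 items, symmetric correlation matrix, one user's implicit feedback
C_mat <- matrix(runif(16), 4, 4); C_mat <- (C_mat + t(C_mat)) / 2; diag(C_mat) <- 0
S_mat <- matrix(c(1, 0, 1, 1), nrow = 1)
predict_knn(C_mat, S_mat, u = 1, i = 2)
</code>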
---
==== Relation to other methods ====
### Weighted Regularized MF (WR-MF)
implicit feedback
> $$ \sum_{u\in U} \sum_{i\in I} c_{ui}(\langle w_u, h_i \rangle - 1)^2 + \lambda\lVert W \rVert _f^2 + \lambda\lVert H \rVert _f^2 $$
1. this optimization is on instance level (one item) instead of pair level (two items) as in BPR
2. this optimization is a least-squares one
  - but item prediction is actually not a regression (quantitative) task, it is a classification (qualitative) one,
  - so the logistic optimization is more appropriate
---
### Maximum Margin Matrix Factorization (MMMF)
explicit feedback
> $$ \sum_{(u,i,j)\in D_S} \max(0,\ 1-\langle w_u, h_i - h_j \rangle) + \lambda_w\lVert W \rVert _f^2 + \lambda_h\lVert H \rVert _f^2 $$
1. the error functions differ
2. BPR-Opt criterion is generic and can be applied to several models, whereas their method is specific for MF
3. MMMF is designed to work with sparse explicit data,
   i.e. they assume that there are many missing values and thus have far fewer pairs than in an implicit setting
---
==== Evaluation ====
leave one out (remove one entry from $I_u^+$ per user $u$)
a higher value of AUC indicates better quality

---
###### remove rating scores, repeated 5 times, $\lambda = 0.0025$, $\alpha = 0.05$
---

# END



==== Code ====
<code rsplus>
################################################
## leave-one-out split and AUC evaluation helpers (base R)

# leave one out: pick one observed item per user as the held-out test entry
# (assumed: missing entries are NA when na=TRUE, 0 otherwise)
test_index <- function (data, na=TRUE) {
  index_test <- rep(NA, dim(data)[1])
  for (i in 1:dim(data)[1]) {
    if(na) index_i <- which(!is.na(data[i,]))
    else index_i <- which(data[i,] != 0)

    index_test[i] <- if (length(index_i) == 1) index_i else sample(index_i, 1)
  }
  test_index <- index_test
}


# training matrix: the held-out entry of each user is blanked out, remaining NAs set to 0
train <- function (data, index_test, na=TRUE) {

  train_data <- data
  for (i in 1:dim(data)[1]) {
    train_data[i, index_test[i]] <- NA
  }
  train_data[is.na(train_data)] <- 0
  train <- train_data
}



# average per-user AUC: fraction of non-observed items ranked below the held-out item
# (argument names and the ranking comparison are assumed, following the AUC definition above)
calc_AUC <- function(data, pred, index_test, na=TRUE) {
  AUC_u <- 0
  for (i in 1:dim(data)[1]) {
    if(na) index_j <- which(is.na(data[i,]))
    else index_j <- which(data[i,] == 0)

    AUC_u <- AUC_u + 1/length(index_j) * sum(pred[i, index_test[i]] > pred[i, index_j])
  }

  calc_AUC <- AUC_u / dim(data)[1]
}

############################################
############################################
############################################
############################################
############################################


# check data: users / items without any observed rating, number of observed entries
# (the apply() margins are assumed)
which(apply(is.na(data2), 1, all))

which(apply(is.na(data2), 2, all))
sum(!is.na(data2))




############################################
############################################

# remove rating scores and build the leave-one-out split
# (right-hand sides assumed: keep a copy of the original ratings, binarize, split)
ml100k_ori <- ml100k
ml100k[ml100k > 0] <- 1

index_test <- test_index(data2)
data2_train <- train(data2, index_test)
############################################
############################################

############################################
############################################
#### BPR
# the BPR implementation is loaded from an external script
library("
setwd("
source("


showDeltaError()
#

ratings<



# outer loop is assumed to vary the number of latent dimensions,
# inner loop to repeat the split 5 times; AUC_total holds the averaged AUC
AUC_total <- rep(NA, 10)
for (i in 1:10) {
  AUC <- 0
  for( j in 1:5){
    model_bpr<
    pred_bpr<

    AUC<
  }
  AUC_total[i] <- AUC / 5
}
AUC_total_bpr <- AUC_total


setStoppingCriteria(autoConverge = FALSE,
                    deltaErrorThreshold = 1e-05, nrLoops = NULL, minNrLoops = 10)
showStoppingCriteria()



############################################
############################################
#### WRMF

#
#

install.packages("
#### R CMD INSTALL rsparse-master/
devtools::
library(rsparse)


data("movielens100k")

library(Matrix)
Matrix(movielens100k)

ml100k<

# same protocol: outer loop assumed to vary the rank, inner loop to repeat the split 5 times
AUC_total <- rep(NA, 10)
for (i in 1:10) {
  AUC <- 0
  for( j in 1:5){
    fb<
    #
    model = WRMF$new(rank = i*10, lambda = 0.0025,

    x<
    #
    n_users_rank<
    rank_n_items<

    pred_mmmf<

    AUC<
  }
  AUC_total[i] <- AUC / 5
}

AUC_total_wrmf <- AUC_total






############################################
############################################
#### SVD-MF
#
library("recommenderlab")   # package inferred from the Recommender()/funkSVD() calls below


ml <- as(data2_train, "realRatingMatrix")
r_svd <- Recommender(ml,
p.svd <- predict(r_svd,


pred_svd<

AUC<




r <- Recommender(data = ml, method = "
                 n_iterations=10,
r
recom_ratings <- predict(r, newdata = ml, type = "
as(recom_ratings,

getModel(r)




# funk-SVD baseline: outer loop assumed to vary the rank, inner loop to repeat the split twice
AUC_total <- rep(NA, 10)
for (i in 1:10) {
  AUC <- 0
  for( j in 1:2){
    fsvd<

    pred_svdf <- tcrossprod(fsvd$U, fsvd$V)

    AUC<

  }
  AUC_total[i] <- AUC / 2
}

AUC_total_svdf <- AUC_total
</code>
