+ | |||
+ | <wrap hide> | ||
< | < | ||
.slide { | .slide { | ||
Line 6: | Line 8: | ||
<!-- $theme: default --> | <!-- $theme: default --> | ||
+ | </ | ||
+ | |||
====== Review on BPR: Bayesian Personalized Ranking from Implicit Feedback, Rendle et al. ======
---
==== Recommendation of item ====
### Explicit Feedback : user's preference of the item itself
> e.g. '1-5 score' of movies, '
### Implicit Feedback
- non-observed user-item pairs = real negative feedback + missing values
---
==== BPR : Bayesian Personalized Ranking from Implicit Feedback ====
---
==== Formalization ====
* $U$ : set of all users, $I$ : set of all items
* $S \subseteq U \times I$ : implicit feedback
* $>_u \subset I^2$ : personalized total ranking of all items
> properties of total order:
> (totality) : $\text{if } i \neq j \text{, then } i >_u j \text{ or } j >_u i$
> (antisymmetry) : $\text{if } i>_uj \text{ and } j>_ui \text{, then } i = j$
> (transitivity) : $\text{if } i>_uj \text{ and } j>_uk \text{, then } i>_u k$
* $I_u^+:=\{i \in I : (u,i) \in S\}$, $U_i^+:=\{u \in U : (u,i) \in S\}$
---
==== Approach ====
* use item pairs as training data (see the R sketch below)
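As a concrete illustration, a minimal R sketch of drawing one training triple $(u,i,j)$ from $D_S := \{(u,i,j) : i \in I_u^+ \text{ and } j \in I \setminus I_u^+\}$ (the paper's notation); the toy matrix ''S_mat'' and the helper name ''sample_triple'' are assumptions for this example, not from the paper:
<code rsplus>
# toy implicit feedback matrix: rows = users, columns = items, 1 = observed feedback
set.seed(1)
S_mat <- matrix(rbinom(5 * 8, 1, 0.3), nrow = 5, ncol = 8)

# draw one training triple (u, i, j) with i from I_u^+ and j from I \ I_u^+
sample_triple <- function(S) {
  repeat {
    u   <- sample(nrow(S), 1)
    pos <- which(S[u, ] == 1)          # I_u^+
    neg <- which(S[u, ] == 0)          # I \ I_u^+
    if (length(pos) > 0 && length(neg) > 0) break
  }
  i <- if (length(pos) == 1) pos else sample(pos, 1)
  j <- if (length(neg) == 1) neg else sample(neg, 1)
  c(u = u, i = i, j = j)
}

sample_triple(S_mat)
</code>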
---
==== BPR optimization criterion (1/3) ====
Bayesian formulation : maximize posterior probability
> $$ p(\Theta|>_u) \propto p(>_u|\Theta)\,p(\Theta) $$
---
==== BPR optimization criterion (2/3) ====
set $X_{uij} = I(i>_uj)$,
then $X_{uij} \sim Bernoulli(p_{uij})$ where $p_{uij} = p(i>_uj|\Theta)$
---
==== BPR optimization criterion (3/3) ====
with setting $p(i>_uj|\Theta) := \sigma(\hat{x}_{uij}(\Theta))$ where $\sigma$ is the logistic sigmoid,
and prior $p(\Theta)$ as normal distribution $N(0, \Sigma_\Theta)$ with $\Sigma_\Theta = \lambda_\Theta I$,
we get the BPR optimization criterion
> $$ \text{BPR-OPT} := \ln p(\Theta|>_u) = \sum_{(u,i,j)\in D_S} \ln \sigma(\hat{x}_{uij}) - \lambda_{\Theta}\lVert\Theta\rVert^2 $$
> where $\lambda_{\Theta}$ are model-specific regularization parameters.
※ form of MLE with $l_2$-regularization
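For intuition, a small R sketch of evaluating this criterion; the vector ''x_uij'' of pairwise score differences and the single shared ''lambda'' are simplifying assumptions for the example:
<code rsplus>
# BPR-OPT value: sum of log-sigmoids of the pairwise scores minus an l2 penalty on the parameters
bpr_opt <- function(x_uij, theta, lambda) {
  sum(log(plogis(x_uij))) - lambda * sum(theta^2)
}

# three pairwise score differences and a small parameter vector
bpr_opt(x_uij = c(0.8, -0.1, 1.5), theta = c(0.2, -0.3, 0.1), lambda = 0.0025)
</code>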
---
==== Relation to AUC ====
> $$ \begin{matrix}
AUC &=& {1 \over|U|}\sum_{u\in U} AUC(u) \\
AUC(u) &=& {1 \over |I_u^+|\,|I \setminus I_u^+|}\sum_{i\in I_u^+}\sum_{j\in I \setminus I_u^+} \delta(\hat{x}_{uij}>0)
\end{matrix} $$
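The analogy to BPR-OPT: AUC counts the non-differentiable indicator $\delta(\hat{x}_{uij}>0)$, while BPR-OPT replaces it with the smooth $\ln\sigma(\hat{x}_{uij})$. A small illustrative R snippet (variable names are ad hoc):
<code rsplus>
x <- seq(-4, 4, by = 1)
heaviside   <- as.numeric(x > 0)   # quantity counted by AUC
log_sigmoid <- log(plogis(x))      # differentiable surrogate maximized by BPR-OPT
round(cbind(x, heaviside, log_sigmoid), 3)
</code>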
---
==== Optimization ====
### gradient descent
> $$ {\partial \text{BPR-OPT} \over \partial \Theta} = \sum_{(u,i,j)\in D_S} {\partial \over \partial\Theta}\ln\sigma(\hat{x}_{uij}) - \lambda_{\Theta}{\partial \over \partial\Theta}\lVert\Theta\rVert^2 $$
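In practice the paper optimizes BPR-OPT with stochastic gradient ascent on bootstrap-sampled triples (LearnBPR). A minimal sketch of a single generic parameter update, assuming the learning rate ''alpha'' and the derivative ''dx_dtheta'' of $\hat{x}_{uij}$ are supplied by the model (the function name is mine):
<code rsplus>
# one LearnBPR-style stochastic update for a generic parameter theta:
# theta <- theta + alpha * ( d/dtheta ln sigma(x_uij) - lambda * theta )
sgd_step <- function(theta, x_uij, dx_dtheta, alpha = 0.05, lambda = 0.0025) {
  theta + alpha * (plogis(-x_uij) * dx_dtheta - lambda * theta)
}

# example: a scalar parameter with pairwise score 0.3 and derivative 1
sgd_step(theta = 0.1, x_uij = 0.3, dx_dtheta = 1)
</code>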
---
==== Model ====
Many models predict a real number $\hat{x}_{ul}$ per user-item pair $(u,l)$
> $$ \hat{x}_{uij} := \hat{x}_{ui}-\hat{x}_{uj} $$
※ use the same model, but optimize it against another criterion
(the difference of two predictions instead of a single number)
---
==== Model : Matrix Factorization ====
$X = [x_{ui}]$ is approximated by two low-rank matrices $W:|U| \times k$ and $H:|I| \times k$
> $$ \hat{X} := WH^t $$
> $$ {\partial \hat{x}_{uij} \over \partial\theta} = \begin{cases}
(h_{if}-h_{jf}) & \text{if } \theta = w_{uf} \\
w_{uf} & \text{if } \theta = h_{if} \\
-w_{uf} & \text{if } \theta = h_{jf} \\
0 & \text{else}
\end{cases}$$
$\lambda_W$, $\lambda_{H^+}$, $\lambda_{H^-}$ : regularization for each group of $\Theta$
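Putting the pieces together, a minimal sketch of one BPR update for the MF parametrization; a single shared ''lambda'' is used instead of the separate $\lambda_W$, $\lambda_{H^+}$, $\lambda_{H^-}$, and the function name is mine:
<code rsplus>
# one BPR update on a triple (u, i, j) for factor matrices W (|U| x k) and H (|I| x k)
bpr_mf_update <- function(W, H, u, i, j, alpha = 0.05, lambda = 0.0025) {
  w_u <- W[u, ]; h_i <- H[i, ]; h_j <- H[j, ]
  x_uij <- sum(w_u * (h_i - h_j))      # \hat{x}_{ui} - \hat{x}_{uj}
  g <- plogis(-x_uij)                  # d/dx ln sigma(x) evaluated at x_uij
  W[u, ] <- w_u + alpha * (g * (h_i - h_j) - lambda * w_u)
  H[i, ] <- h_i + alpha * (g *  w_u        - lambda * h_i)
  H[j, ] <- h_j + alpha * (g * -w_u        - lambda * h_j)
  list(W = W, H = H)
}

# example with k = 2 latent dimensions
W <- matrix(rnorm(5 * 2, sd = 0.1), 5, 2); H <- matrix(rnorm(8 * 2, sd = 0.1), 8, 2)
upd <- bpr_mf_update(W, H, u = 1, i = 2, j = 3)
</code>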
---
==== Model : Adaptive k-Nearest-Neighbor ====
> $$ \hat{x}_{ui} = \sum_{l \in I_u^+\text{ and }l \neq i} c_{il} $$
> where $C:I \times I$ is the symmetric item-correlation/item-similarity matrix
> $$ {\partial \hat{x}_{uij} \over \partial\theta} = \begin{cases}
+1 & \text{if } \theta \in \{c_{il}, c_{li}\} \text{ and } l \in I_u^+ \text{ and } l \neq i \\
-1 & \text{if } \theta \in \{c_{jl}, c_{lj}\} \text{ and } l \in I_u^+ \text{ and } l \neq j \\
0 & \text{else}
\end{cases}$$
$\lambda_{+}$, $\lambda_{-}$ : regularization for the positive and negative updates of $C$
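For the kNN model, a small sketch of computing $\hat{x}_{ui}$ from an item-correlation matrix; the toy matrices ''C_mat'' and ''S_mat'' are assumptions for the example:
<code rsplus>
# \hat{x}_{ui}: sum of correlations between item i and the other items in I_u^+
predict_knn <- function(C, S, u, i) {
  l <- setdiff(which(S[u, ] == 1), i)   # l in I_u^+ and l != i
  sum(C[i, l])
}

# toy data: 4 items, symmetric correlation matrix, one user's implicit feedback
C_mat <- matrix(runif(16), 4, 4); C_mat <- (C_mat + t(C_mat)) / 2; diag(C_mat) <- 0
S_mat <- matrix(c(1, 0, 1, 1), nrow = 1)
predict_knn(C_mat, S_mat, u = 1, i = 2)
</code>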
---
==== Relation to other methods ====
### Weighted Regularized MF (WR-MF)
implicit feedback
> $$ \sum_{u\in U} \sum_{i\in I} c_{ui}(\langle w_u, h_i \rangle - 1)^2 + \lambda\lVert W \rVert _f^2 + \lambda\lVert H \rVert _f^2 $$
1. this optimization is on instance level (one item) instead of pair level (two items) as in BPR
2. this optimization is a least-squares one
  - but item prediction is actually not a regression (quantitative) task, it is a classification (qualitative) one,
  - so the logistic optimization is more appropriate
---
### Maximum Margin Matrix Factorization (MMMF)
explicit feedback
> $$ \sum_{(u,i,j)\in D_S} \max(0,\ 1-\langle w_u, h_i - h_j \rangle) + \lambda_w\lVert W \rVert _f^2 + \lambda_h\lVert H \rVert _f^2 $$
1. the error functions differ
2. BPR-Opt criterion is generic and can be applied to several models, whereas their method is specific for MF
3. MMMF is designed to work with sparse explicit data,
   i.e. they assume that there are many missing values and thus have far fewer pairs than in an implicit setting
---
==== Evaluation ====
leave one out (remove one entry from $I_u^+$ per user $u$)
a higher value of AUC indicates better quality

---
###### remove rating scores, repeated 5 times, $\lambda = 0.0025$, $\alpha = 0.05$
---

# END



==== Code ====
<code rsplus>
################################################
## leave-one-out split and AUC evaluation helpers (base R)

# leave one out: pick one observed item per user as the held-out test entry
# (assumed: missing entries are NA when na=TRUE, 0 otherwise)
test_index <- function (data, na=TRUE) {
  index_test <- rep(NA, dim(data)[1])
  for (i in 1:dim(data)[1]) {
    if(na) index_i <- which(!is.na(data[i,]))
    else index_i <- which(data[i,] != 0)

    index_test[i] <- if (length(index_i) == 1) index_i else sample(index_i, 1)
  }
  test_index <- index_test
}


# training matrix: the held-out entry of each user is blanked out, remaining NAs set to 0
train <- function (data, index_test, na=TRUE) {

  train_data <- data
  for (i in 1:dim(data)[1]) {
    train_data[i, index_test[i]] <- NA
  }
  train_data[is.na(train_data)] <- 0
  train <- train_data
}



# average per-user AUC: fraction of non-observed items ranked below the held-out item
# (argument names and the ranking comparison are assumed, following the AUC definition above)
calc_AUC <- function(data, pred, index_test, na=TRUE) {
  AUC_u <- 0
  for (i in 1:dim(data)[1]) {
    if(na) index_j <- which(is.na(data[i,]))
    else index_j <- which(data[i,] == 0)

    AUC_u <- AUC_u + 1/length(index_j) * sum(pred[i, index_test[i]] > pred[i, index_j])
  }

  calc_AUC <- AUC_u / dim(data)[1]
}

############################################
############################################
############################################
############################################
############################################


# check data: users / items without any observed rating, number of observed entries
# (the apply() margins are assumed)
which(apply(is.na(data2), 1, all))

which(apply(is.na(data2), 2, all))
sum(!is.na(data2))




############################################
############################################

# remove rating scores and build the leave-one-out split
# (right-hand sides assumed: keep a copy of the original ratings, binarize, split)
ml100k_ori <- ml100k
ml100k[ml100k > 0] <- 1

index_test <- test_index(data2)
data2_train <- train(data2, index_test)
############################################
############################################

############################################
############################################
#### BPR
# the BPR implementation is loaded from an external script
library("
setwd("
source("


showDeltaError()
#

ratings<



# outer loop is assumed to vary the number of latent dimensions,
# inner loop to repeat the split 5 times; AUC_total holds the averaged AUC
AUC_total <- rep(NA, 10)
for (i in 1:10) {
  AUC <- 0
  for( j in 1:5){
    model_bpr<
    pred_bpr<

    AUC<
  }
  AUC_total[i] <- AUC / 5
}
AUC_total_bpr <- AUC_total


setStoppingCriteria(autoConverge = FALSE,
                    deltaErrorThreshold = 1e-05, nrLoops = NULL, minNrLoops = 10)
showStoppingCriteria()



############################################
############################################
#### WRMF

#
#

install.packages("
#### R CMD INSTALL rsparse-master/
devtools::
library(rsparse)


data("movielens100k")

library(Matrix)
Matrix(movielens100k)

ml100k<

# same protocol: outer loop assumed to vary the rank, inner loop to repeat the split 5 times
AUC_total <- rep(NA, 10)
for (i in 1:10) {
  AUC <- 0
  for( j in 1:5){
    fb<
    #
    model = WRMF$new(rank = i*10, lambda = 0.0025,

    x<
    #
    n_users_rank<
    rank_n_items<

    pred_mmmf<

    AUC<
  }
  AUC_total[i] <- AUC / 5
}

AUC_total_wrmf <- AUC_total






############################################
############################################
#### SVD-MF
#
library("recommenderlab")   # package inferred from the Recommender()/funkSVD() calls below


ml <- as(data2_train, "realRatingMatrix")
r_svd <- Recommender(ml,
p.svd <- predict(r_svd,


pred_svd<

AUC<




r <- Recommender(data = ml, method = "
                 n_iterations=10,
r
recom_ratings <- predict(r, newdata = ml, type = "
as(recom_ratings,

getModel(r)




# funk-SVD baseline: outer loop assumed to vary the rank, inner loop to repeat the split twice
AUC_total <- rep(NA, 10)
for (i in 1:10) {
  AUC <- 0
  for( j in 1:2){
    fsvd<

    pred_svdf <- tcrossprod(fsvd$U, fsvd$V)

    AUC<

  }
  AUC_total[i] <- AUC / 2
}

AUC_total_svdf <- AUC_total
</code>
