<!--
$theme: default
$size: 4:3
page_number: true
footer: this is footer
-->
# A Review on
# BPR: Bayesian Personalized Ranking from Implicit Feedback, Rendle et al.
### Hee-seok Lee
### Algorithm = model + _loss function_ + _optimization_
- Loss function = BPR-OPT ( → approximates AUC)
- Optimization = stochastic gradient descent
1. choose a rank metric
2. identify parts that are not smooth
3. replace with “close enough” approximations
- the logistic function is often used here (see the sketch after this list)
4. learn with gradient descent
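
A minimal sketch of steps 2 and 3, assuming AUC is the chosen rank metric: the non-smooth indicator $I(s_i > s_j)$ is swapped for the logistic function so the surrogate becomes differentiable. The function and variable names below are illustrative, not from the paper.

```python
import numpy as np

def sigmoid(x):
    # smooth, differentiable surrogate for the step indicator I(x > 0)
    return 1.0 / (1.0 + np.exp(-x))

def auc_exact(pos_scores, neg_scores):
    # step 2: the indicator is piecewise constant, so its gradient is zero almost everywhere
    return np.mean([float(si > sj) for si in pos_scores for sj in neg_scores])

def auc_smooth(pos_scores, neg_scores):
    # step 3: "close enough" approximation that gradient descent can optimize
    return np.mean([sigmoid(si - sj) for si in pos_scores for sj in neg_scores])
```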
- Explicit : both positive and negative feedback are observed (e.g. ratings)
- Implicit : only positive observations are available (e.g. clicks, purchases)
$$ \text{non-observed data} = \text{real negative feedback} + \text{missing value} $$
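
As a sketch of how the pairwise training data is built under this reading: an observed item is preferred over every non-observed item, and nothing is inferred between two observed or two non-observed items. The binary observation matrix `S` and the function name are assumptions for illustration.

```python
import numpy as np

def training_triples(S):
    """Yield (u, i, j) with S[u, i] = 1 (observed) and S[u, j] = 0 (non-observed)."""
    n_users, _ = S.shape
    for u in range(n_users):
        pos = np.flatnonzero(S[u] == 1)   # items with positive (observed) feedback
        neg = np.flatnonzero(S[u] == 0)   # non-observed: real negative or missing
        for i in pos:
            for j in neg:
                yield u, i, j
```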
### properties of a total order
- totality : $\forall i,j \in I: i \neq j \Rightarrow i >_u j \ \text{or} \ j >_u i$
- antisymmetry : $\forall i,j \in I: i >_u j \ \text{and} \ j >_u i \Rightarrow i = j$
- transitivity : $\forall i,j,k \in I: i >_u j \ \text{and} \ j >_u k \Rightarrow i >_u k$
## BPR: Bayesian formulation
$$\text{maximize} \ \ \ p(\Theta|>_u) \propto p(>_u|\Theta)p(\Theta) $$
$$ X_{uij} := I(i >_u j) \sim \text{Bern}(p_{uij}) \ \ \text{s.t.} \ \ p_{uij} = P(i >_u j) $$
with the assumption that each $(u,i,j) \in D_S$ is independent,
$$ \prod_{u \in U} p( >_u | \Theta) = \prod_{(u,i,j) \in D_S} p(i >_u j|\Theta)$$
$$p(i>_uj|\Theta) := \sigma( \hat{x}_{uij}(\Theta) )$$
$$\sigma(x) := {1\over 1+ \exp(-x) } $$
$$ \begin{matrix} \text{BPR-OPT} &:=& \ln p(\Theta|>_u) \\ &=& \ln p(>_u|\Theta)\,p(\Theta) \\ &=& \ln \prod_{(u,i,j) \in D_S} \sigma (\hat{x}_{uij})\,p(\Theta) \\ &=& \sum_{(u,i,j) \in D_S} \ln\sigma (\hat{x}_{uij}) + \ln p(\Theta) \\ &=& \sum_{(u,i,j) \in D_S} \ln\sigma (\hat{x}_{uij}) - \lambda_{\Theta} \lVert\Theta\rVert^2 \end{matrix} $$
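
A minimal sketch of evaluating BPR-OPT for a matrix-factorization model with $\hat{x}_{uij} = \langle w_u, h_i \rangle - \langle w_u, h_j \rangle$; the factor matrices `W`, `H` and the single regularization constant are illustrative assumptions.

```python
import numpy as np

def bpr_opt(W, H, triples, lam):
    """BPR-OPT = sum over D_S of ln sigma(x_uij) - lam * ||Theta||^2 (MF model)."""
    total = 0.0
    for u, i, j in triples:
        x_uij = W[u] @ (H[i] - H[j])          # x_ui - x_uj under matrix factorization
        total += -np.log1p(np.exp(-x_uij))    # ln sigma(x_uij), numerically stable
    return total - lam * (np.sum(W**2) + np.sum(H**2))
```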
$$ \begin{matrix} {\partial\, \text{BPR-OPT} \over \partial \Theta} &=& \sum_{(u,i,j) \in D_S} {\partial \over \partial\Theta }\ln\sigma (\hat{x}_{uij}) - \lambda_{\Theta} {\partial \over \partial\Theta }\lVert\Theta\rVert^2 \\ &\propto& \sum_{(u,i,j) \in D_S} {-\exp(-\hat{x}_{uij})\over{1+\exp(-\hat{x}_{uij})}} \centerdot {\partial \over \partial\Theta }\hat{x}_{uij} - \lambda_{\Theta} \Theta \end{matrix} $$
## full gradient descent
- $\downarrow$ $O(\left|S\right|\left|I\right|)$ training triples, so computing the full gradient in each step is infeasible
- $\downarrow$ skewness in the training pairs leads to poor convergence
- $\downarrow$ regularization is difficult
## stochastic gradient descent
- $\uparrow$ good approach for 'our' skew problem
- $\downarrow$ traversing the training pairs in the typical item-wise or user-wise order converges poorly, because it produces many consecutive updates on the same user-item combination
## $\therefore$ choose triples randomly, bootstrap sampling with replacement
$$ \Theta \leftarrow \Theta + \alpha\left( {\exp(-\hat{x}_{uij})\over{1+\exp(-\hat{x}_{uij})}} \centerdot {\partial \over \partial\Theta }\hat{x}_{uij} + \lambda_{\Theta} \Theta \right) $$
## code
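
A hedged sketch of the LearnBPR loop for BPR-MF: triples are drawn by bootstrap sampling with replacement and each draw triggers one stochastic ascent step. The factor size, learning rate, and update count are arbitrary illustrative defaults, and the regularization term is applied as weight decay ($-\lambda_\Theta\Theta$), consistent with maximizing $\sum \ln\sigma(\hat{x}_{uij}) - \lambda_\Theta\lVert\Theta\rVert^2$.

```python
import numpy as np

def learn_bpr_mf(S, k=16, alpha=0.05, lam=0.01, n_updates=100_000, seed=0):
    """LearnBPR sketch for BPR-MF: x_uij = <w_u, h_i - h_j>.

    S is a binary users x items implicit-feedback matrix.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = S.shape
    W = 0.1 * rng.standard_normal((n_users, k))   # user factors
    H = 0.1 * rng.standard_normal((n_items, k))   # item factors

    pos = [np.flatnonzero(S[u] == 1) for u in range(n_users)]
    neg = [np.flatnonzero(S[u] == 0) for u in range(n_users)]
    usable = [u for u in range(n_users) if len(pos[u]) and len(neg[u])]

    for _ in range(n_updates):
        # bootstrap sampling: draw a triple (u, i, j) uniformly, with replacement
        u = usable[rng.integers(len(usable))]
        i = pos[u][rng.integers(len(pos[u]))]
        j = neg[u][rng.integers(len(neg[u]))]

        wu, hi, hj = W[u].copy(), H[i].copy(), H[j].copy()
        x_uij = wu @ (hi - hj)
        g = np.exp(-x_uij) / (1.0 + np.exp(-x_uij))   # sigma(-x_uij)

        # stochastic gradient-ascent step; regularization applied as weight decay
        W[u] += alpha * (g * (hi - hj) - lam * wu)
        H[i] += alpha * (g * wu - lam * hi)
        H[j] += alpha * (-g * wu - lam * hj)

    return W, H
```

After training, items for a user `u` can be ranked by the scores `W[u] @ H.T`.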