Deep Interest Network for Click-Through Rate Prediction


1 Deep Interest Network for Click-Through Rate Prediction
5/26/2019 Advisor: Jia-Ling Koh Presenter: You-Xiang Chen Source: KDD 2018 Date: 2019/03/25

2 Outline Introduction Method Experiment Conclusion

3 Introduction The click-through rate (CTR) of an advertisement is the number of times the ad is clicked, divided by the number of times the ad is served.
$\mathrm{CTR} = \dfrac{\text{Number of click-throughs}}{\text{Number of impressions}} \times 100\%$
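As a small worked example of the CTR definition, the sketch below computes the percentage directly. The function name and the numbers are hypothetical and only illustrate the formula.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """Return CTR as a percentage: clicks / impressions * 100."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    if clicks < 0:
        raise ValueError("clicks must be non-negative")
    # Multiply before dividing to keep the arithmetic exact for small integers.
    return 100.0 * clicks / impressions


# Hypothetical numbers: 25 clicks out of 1,000 impressions.
print(click_through_rate(25, 1000))  # 2.5
```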

4 Motivation Deep learning based methods have been proposed for the CTR prediction task, but they have difficulty capturing users' diverse interests effectively from rich historical behaviors. Training industrial deep networks with large-scale sparse features is a great challenge.

5 Goal Propose a novel model, Deep Interest Network (DIN), which uses a local activation unit to adaptively learn the representation of user interests from historical behaviors with respect to a certain ad. To make the computation acceptable, we develop a novel mini-batch aware regularization.

6 Outline Introduction Method Experiment Conclusion

7 Base Model (Embedding & MLP)

8 Feature sets in Alibaba

9 Feature Representation
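Encoding vector of feature group $i$: $t_i \in \mathbb{R}^{K_i}$, where $K_i$ denotes the dimensionality of feature group $i$.
$t_i[j] \in \{0, 1\}$, $\sum_{j=1}^{K_i} t_i[j] = k$ ($k = 1$: one-hot encoding; $k > 1$: multi-hot encoding).
$x = [t_1^T, t_2^T, \ldots, t_M^T]$, $\sum_{i=1}^{M} K_i = K$, where $K$ is the dimensionality of the entire feature space.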
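The group-wise encoding can be sketched in a few lines of Python. The group names, group sizes, and active positions below are hypothetical; indices are 0-based here, whereas the slide counts from 1.

```python
from typing import Iterable, List


def encode_group(active_ids: Iterable[int], group_size: int) -> List[int]:
    """Build the binary vector t_i of length K_i with ones at the active ids."""
    t = [0] * group_size
    for j in active_ids:
        if not 0 <= j < group_size:
            raise IndexError(f"id {j} is outside a group of size {group_size}")
        t[j] = 1
    return t


# Hypothetical feature groups: a one-hot weekday and a multi-hot category set.
weekday = encode_group([4], 7)                 # k = 1 (one-hot)
visited_cate_ids = encode_group([0, 2], 5)     # k = 2 (multi-hot)

# x concatenates all group vectors, so its length is K = 7 + 5 = 12.
x = weekday + visited_cate_ids
print(x)
```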

10 Embedding layer Embedding dictionary
$W^i = [w_1^i, w_2^i, \ldots, w_j^i, \ldots, w_{K_i}^i] \in \mathbb{R}^{D \times K_i}$
Embedding vector:
$e_i = w_j^i$, if $t_i$ is one-hot with $t_i[j] = 1$
$\{e_{i_1}, e_{i_2}, \ldots, e_{i_k}\} = \{w_{i_1}^i, w_{i_2}^i, \ldots, w_{i_k}^i\}$, if $t_i$ is multi-hot with $t_i[j] = 1$ for $j \in \{i_1, i_2, \ldots, i_k\}$
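The lookup amounts to selecting the embedding vectors at the active positions of the encoding vector. The following is a minimal sketch under stated assumptions: the dictionary is stored as one row per id (the transpose of the D x K_i layout on the slide), the values are random placeholders rather than learned weights, and both function names are hypothetical.

```python
import random
from typing import List


def make_embedding_dict(num_ids: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Create a placeholder embedding dictionary with one D-dimensional row per id."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_ids)]


def lookup(t: List[int], W: List[List[float]]) -> List[List[float]]:
    """Return the embedding vectors for every position j where t[j] == 1."""
    if len(t) != len(W):
        raise ValueError("encoding vector and dictionary must cover the same ids")
    return [W[j] for j, bit in enumerate(t) if bit == 1]


W = make_embedding_dict(num_ids=4, dim=3)
one_hot = lookup([0, 0, 1, 0], W)      # a list with a single vector
multi_hot = lookup([0, 1, 0, 1], W)    # a list with k = 2 vectors
print(len(one_hot), len(multi_hot))
```

A one-hot group yields exactly one vector, while a multi-hot group yields a variable-length list, which is why a pooling step is needed before the MLP.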

11 Pooling & Concat layer $e_i = \mathrm{pooling}(e_{i_1}, e_{i_2}, \ldots, e_{i_k})$
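Pooling turns a variable-length list of embedding vectors into one fixed-length vector, and the concat layer joins the per-group vectors. Sum and average pooling are shown below as two common choices; the helper names are hypothetical.

```python
from typing import List, Sequence


def sum_pooling(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise sum of a non-empty list of equal-length vectors."""
    return [sum(column) for column in zip(*vectors)]


def avg_pooling(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [total / len(vectors) for total in sum_pooling(vectors)]


def concat(groups: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the fixed-length vector of each feature group."""
    return [value for group in groups for value in group]


behaviors = [[1.0, 2.0], [3.0, 4.0]]          # k = 2 embedding vectors
pooled = sum_pooling(behaviors)               # [4.0, 6.0]
dense_input = concat([pooled, [0.5, 0.5]])    # second group is a single vector
print(pooled, dense_input)
```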

12 Objective function Negative log-likelihood function
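For reference, the negative log-likelihood over a labeled set averages the binary cross-entropy between each label y and the predicted click probability p(x). A minimal sketch follows; the clipping constant `eps` is an assumption added for numerical safety and is not part of the slide.

```python
import math
from typing import Sequence


def negative_log_likelihood(labels: Sequence[int], probs: Sequence[float],
                            eps: float = 1e-12) -> float:
    """Mean of -(y * log p + (1 - y) * log(1 - p)) over all samples."""
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)


# Hypothetical predictions for one clicked and one non-clicked impression.
print(negative_log_likelihood([1, 0], [0.9, 0.2]))
```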

13 Deep Interest Network

14 Activation Unit
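The idea of the activation unit is that each historical behavior embedding is weighted by its relevance to the candidate ad before the behaviors are summed, so the user representation changes from ad to ad. The sketch below illustrates only that weighted sum pooling. The dot-product score is a stand-in assumption: in DIN the weight comes from a small learned feed-forward network, which is not reproduced here. All names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Stand-in relevance score (assumption; DIN learns this with a small network)."""
    return sum(x * y for x, y in zip(a, b))


def weighted_sum_pooling(behaviors: Sequence[Vector], ad: Vector,
                         score: Callable[[Vector, Vector], float] = dot) -> List[float]:
    """Sum behavior embeddings, each scaled by its score against the candidate ad."""
    weights = [score(e, ad) for e in behaviors]
    return [sum(w * e[d] for w, e in zip(weights, behaviors))
            for d in range(len(ad))]


behaviors = [[1.0, 0.0], [0.0, 1.0]]
# The same behaviors give different user vectors for different candidate ads.
print(weighted_sum_pooling(behaviors, [1.0, 0.0]))
print(weighted_sum_pooling(behaviors, [0.0, 2.0]))
```

The weights are deliberately not normalized with a softmax here, which keeps the intensity of the user's interest in the output.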

15 L2 Regularization
$y = a + bx + cx^2 + dx^3$
$y = a + bx + 0x^2 + 0x^3$
$J(\theta) = [y_\theta(x) - y]^2 + [\theta_1^2 + \theta_2^2 + \ldots]$
$J(\theta) = [y_\theta(x) - y]^2 + \lambda \sum_i \theta_i^2$

16 Mini-batch Aware Regularization
Expand the L2 regularization on W over samples, then transform it into a mini-batch aware form.
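In the mini-batch aware form, only the embedding vectors of ids that actually appear in the current mini-batch are penalized, and each is scaled by 1/n_j, where n_j is the number of occurrences of id j in the whole training set. The sketch below follows that description; the data layout (each sample as a set of active ids, `W` and `counts` as dictionaries) and all names are assumptions, and the factor 2 from differentiating the squared norm is absorbed into `lam`.

```python
from typing import Dict, List, Set


def batch_active_ids(batch: List[Set[int]]) -> Set[int]:
    """Ids j that occur in at least one sample of the mini-batch."""
    active: Set[int] = set()
    for sample in batch:
        active |= sample
    return active


def mba_penalty(batch: List[Set[int]], W: Dict[int, List[float]],
                counts: Dict[int, int], lam: float) -> float:
    """lam * sum over active ids of ||w_j||^2 / n_j for one mini-batch."""
    return lam * sum(sum(v * v for v in W[j]) / counts[j]
                     for j in batch_active_ids(batch))


def mba_sgd_step(batch: List[Set[int]], W: Dict[int, List[float]],
                 counts: Dict[int, int], data_grads: Dict[int, List[float]],
                 lr: float, lam: float) -> None:
    """Update in place only the embeddings of ids present in the mini-batch.

    data_grads[j] is the batch-averaged gradient of the data loss w.r.t. w_j.
    """
    for j in batch_active_ids(batch):
        W[j] = [w - lr * (g + lam * w / counts[j])
                for w, g in zip(W[j], data_grads[j])]


# Hypothetical toy setup: three ids, of which only ids 0 and 1 occur in the batch.
W = {0: [1.0, 1.0], 1: [2.0, 0.0], 2: [3.0, 3.0]}
counts = {0: 2, 1: 4, 2: 1}
batch = [{0}, {0, 1}]
penalty = mba_penalty(batch, W, counts, lam=1.0)
mba_sgd_step(batch, W, counts, {0: [0.0, 0.0], 1: [0.0, 0.0]}, lr=0.5, lam=1.0)
print(penalty, W)
```

Id 2 never appears in the batch, so its embedding is neither read nor updated. This is what keeps the cost per step proportional to the batch rather than to the full embedding dictionary.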

17 Data Adaptive Activation Function
PReLU activation function; Dice
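PReLU can be written as f(s) = p(s) * s + (1 - p(s)) * alpha * s with a hard indicator p(s) = 1 for s > 0. Dice keeps the same form but replaces the indicator with a sigmoid of the input standardized by the mini-batch mean and variance, so the rectification point adapts to the data. The following is a minimal sketch, assuming a fixed `alpha` (it is a learned parameter in practice), population variance, and `eps = 1e-8`.

```python
import math
from typing import List, Sequence


def prelu(s: float, alpha: float) -> float:
    """PReLU: identity for positive inputs, slope alpha otherwise."""
    return s if s > 0 else alpha * s


def dice(batch: Sequence[float], alpha: float, eps: float = 1e-8) -> List[float]:
    """Dice applied to one activation across a mini-batch of inputs."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((s - mean) ** 2 for s in batch) / n  # population variance (assumption)
    out = []
    for s in batch:
        # Smooth gate in (0, 1), centered on the batch mean.
        p = 1.0 / (1.0 + math.exp(-(s - mean) / math.sqrt(var + eps)))
        out.append(p * s + (1.0 - p) * alpha * s)
    return out


print(prelu(-2.0, 0.5))            # -1.0
print(dice([-1.0, 1.0], alpha=0.0))
```

When the batch mean is 0 and the variance is 1, the gate is a plain sigmoid of s, which shows Dice as a smoothed version of PReLU.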

18 Outline Introduction Method Experiment Conclusion

19 Datasets

20 Train & Test Loss Dashed line: overfitting. Dark green: no regularization.
Training with good_ids is better than training without them (weighting)

21 Test AUC

22 Performance

23 Best AUCs of Base Model with different regularizations (compared with the first row)
With or without good_ids and regularization
Dropout: randomly drop 50% of the features
Filter: remove very frequent features; here the top 20 million most frequent features are removed and regularization is applied to the rest
DiFacto: a regularization method proposed in another paper; the idea is that more frequent features are regularized less
MBA: the proposed method, mini-batch aware regularization

24 Model Comparison on Alibaba Dataset with full feature sets

25 Illustration of adaptive activation in DIN

26 Outline Introduction Method Experiment Conclusion

27 Conclusion A novel approach named DIN is designed to activate related user behaviors and obtain an adaptive representation vector for user interests which varies over different ads. Two novel techniques are introduced to help training industrial deep networks and further improve the performance of DIN.

