Multi-modal Data-Based Recommender System (Multi-modal Recommender System)

  • 🥇 Grand Prize - Winning Solution for a Competition

A recommender system is an AI technique that analyzes user information to recommend products suited to each user. It can make services more convenient and products easier to reach, which in turn is expected to increase a company's revenue.

Recommender systems mainly rely on users' preference information for items, but because such data is hard to collect, they run into data sparseness and cold-start problems. To compensate, much recent research builds multi-modal recommender systems that combine user log data with image or review information.

We expect that developing a high-performance recommendation algorithm based on multi-modal data will overcome these limitations and give users an optimized, personalized recommendation experience.

Competition information

  • Host: Artificial Intelligence Convergence Research Center; BK Industry-Convergence Next-Generation AI Innovation Talent Education and Research Group

  • Operator: DACON

  • Competition: link

  • Period: 2023.07.04 ~ 2023.08.07

  • Evaluation metric: NDCG@50

    $DCG_u = \sum\limits_{i=1}^{50}\frac{relevance_i}{\log_2(i+1)}$

    $IDCG_u = \sum\limits_{i=1}^{50}\frac{relevance_i^{opt}}{\log_2(i+1)}$

    $NDCG_u = \frac{DCG_u}{IDCG_u}$

    $relevance_i$ is binarized before scoring: 1 if the rating is 3 or higher, 0 otherwise.
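The metric above can be sketched in a few lines of Python. This is a minimal illustration for a single user, not the competition's official scorer: the function name `ndcg_at_k`, its arguments, and the choice to return 0 when a user has no relevant items are all assumptions made here. It also assumes the recommended list contains no duplicate items.

```python
import math


def ndcg_at_k(recommended, ratings, k=50, threshold=3):
    """NDCG@k for one user (hypothetical helper, not from the repository).

    recommended: ranked list of item ids, best first, without duplicates.
    ratings: dict mapping item id -> held-out rating for this user.
    """
    # Binarize relevance: 1 if rating >= threshold, else 0.
    relevant = {item for item, rating in ratings.items() if rating >= threshold}
    if not relevant:
        # Assumption: a user with no relevant items scores 0.
        return 0.0

    # DCG: each relevant item at rank i contributes 1 / log2(i + 1).
    dcg = sum(
        1.0 / math.log2(i + 1)
        for i, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )

    # IDCG: the ideal ranking places every relevant item at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))

    return dcg / idcg


if __name__ == "__main__":
    held_out = {"a": 5, "c": 4, "d": 1}
    print(ndcg_at_k(["a", "b", "c"], held_out))
```

In the example call, items `a` and `c` are relevant and land at ranks 1 and 3, so the score is below 1; ranking them at positions 1 and 2 would give exactly 1.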

Data

| name | count |
| --- | --- |
| user_id | 192403 |
| item_id | 62989 |
| interaction | 1254441 |

image_feat and text_feat are provided for each item_id.

For more : Raw data
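To put the counts above in perspective, the short sketch below works out how sparse the user-item matrix is. It assumes every interaction is a distinct (user, item) pair, which the source does not state explicitly; the variable names are illustrative only.

```python
# Counts taken from the Data table.
n_users = 192_403
n_items = 62_989
n_interactions = 1_254_441

# Assumption: each interaction is a unique (user, item) pair.
density = n_interactions / (n_users * n_items)
sparsity = 1.0 - density
per_user = n_interactions / n_users
per_item = n_interactions / n_items

print(f"density:               {density:.6%}")
print(f"sparsity:              {sparsity:.6%}")
print(f"interactions per user: {per_user:.2f}")
print(f"interactions per item: {per_item:.2f}")
```

Under that assumption, only about one cell in ten thousand is observed, and the average user has fewer than seven interactions.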

Model

  • BM3 : paper

  • Hyperparameter table

    • metric & inference_time : 5-fold average
    • Device : GeForce RTX 3080 Ti 12GB
    • Sorted by ndcg@50 in descending order

    | n_layers | embedding_size | feat_embed_dim | ndcg@50 | precision@50 | recall@50 | map@50 | training_time_avg | inference_time_avg |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | 4 | 256 | 128 | 0.036900 | 0.002700 | 0.093460 | 0.019720 | 4h 18m 0.60s | 25.08s |
    | 3 | 256 | 128 | 0.036800 | 0.002680 | 0.092940 | 0.019720 | 3h 55m 50.60s | 20.56s |
    | 4 | 128 | 128 | 0.036620 | 0.002740 | 0.094680 | 0.019100 | 3h 38m 19.20s | 14.43s |
    | 4 | 128 | 256 | 0.036600 | 0.002760 | 0.095020 | 0.019020 | 3h 20m 55.40s | 14.37s |
    | 4 | 128 | 64 | 0.036480 | 0.002740 | 0.094560 | 0.018980 | 3h 53m 47.20s | 14.38s |
    | 5 | 256 | 128 | 0.036380 | 0.002700 | 0.093560 | 0.019180 | 6h 33m 47.60s | 29.33s |
    | 3 | 128 | 128 | 0.036300 | 0.002700 | 0.093700 | 0.018980 | 4h 21m 45.40s | 12.52s |
    | 3 | 128 | 256 | 0.036300 | 0.002700 | 0.093660 | 0.019000 | 3h 17m 46.40s | 12.46s |
    | 3 | 128 | 64 | 0.036240 | 0.002700 | 0.093280 | 0.019020 | 3h 48m 49.60s | 12.46s |
    | 5 | 128 | 128 | 0.036140 | 0.002740 | 0.094780 | 0.018640 | 5h 33m 18.80s | 16.31s |
    | 6 | 128 | 128 | 0.036140 | 0.002740 | 0.094720 | 0.018580 | 4h 56m 59.20s | 18.29s |

  • Drop_out: fixed at 0.5

    Table Visualization

Result

  • best5 parameter model ensemble: 25 csv files (5 models × 5 folds)
  • Hard_voting: for each user, count how many models predicted each item and recommend items in descending order of that count
  • weighted_voting: same as Hard_voting, except that an item appearing at rank $i$ adds a weight of $\frac{1}{\log_2(i+1)}$ instead of 1; items are recommended in descending order of the summed weight

| Type | Public(30%) | Private |
| --- | --- | --- |
| weighted_voting | 0.0428 | 0.0442 |
| Hard_voting | 0.0386 | 0.0399 |
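The two voting schemes described above can be sketched as follows. This is an illustration for a single user and is not the code in `src/ensemble.py`: the function name `vote`, its arguments, and the tie-breaking rule (items seen earlier win ties) are assumptions made here.

```python
import math
from collections import defaultdict


def vote(rankings, k=50, weighted=True):
    """Merge several ranked lists for one user into one top-k list.

    rankings: list of ranked item-id lists, one per model/fold output.
    weighted=False -> hard voting: every appearance counts as 1.
    weighted=True  -> weighted voting: rank i counts as 1 / log2(i + 1).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for i, item in enumerate(ranking, start=1):
            if weighted:
                scores[item] += 1.0 / math.log2(i + 1)
            else:
                scores[item] += 1.0

    # sorted() is stable and dicts keep insertion order, so ties are
    # broken by which item was seen first (an assumption of this sketch).
    ranked = sorted(scores, key=lambda item: -scores[item])
    return ranked[:k]


if __name__ == "__main__":
    model_outputs = [["a", "b", "c"], ["b", "a", "d"], ["b", "c", "a"]]
    print(vote(model_outputs, weighted=False))
    print(vote(model_outputs, weighted=True))
```

In the example, `a` and `b` each appear three times, so hard voting cannot separate them, while weighted voting ranks `b` first because it appears at higher positions. Using rank information in this way is what distinguishes weighted_voting from Hard_voting in the table above.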

Code reproduction

# Set up the model training environment
# The Docker setup is written for CUDA Version 11.2. Please check the Dockerfile.
sh docker.sh
# Data Preprocess
python preprocessing/preprocess.py

# Model Train
python src/main.py -m BM3

# Model Inference
python src/submission.py

# Create the submission
cd ..
python src/ensemble.py -t weighted_voting -f BM3
# Ensemble output path: /workspace/root/Challenge-Multi-modal-Recommender-System/submission/best.csv
docker cp [container ID]:[ensemble output path] [host file path]