koojy1211/EPOCH-movies-ml-python

🚀 EPOCH 1st Datathon: Mini Project of TEAMO

[Project Name] A budget-planning program for movie producers
[Team] EPOCH, an intercollegiate data science club: 구재영 (1st cohort), 신성현 (2nd cohort)
[Data Used]

  • The Movies Dataset from Kaggle
    • movies_metadata.csv
    • credits.csv
  • The Numbers budget data

[Project Overview]

  • Project goal: build a regression model that predicts a movie's budget from its features, then use it in a system where a producer enters information about the movie they want to make, receives a budget estimate, and is then shown which producer type they belong to
  • Machine learning techniques used: crawling, natural language processing, regression, clustering

[Project Details]

  • Data preprocessing
    • num_?: the cast, crew, and production_companies lists converted to numeric features
    • genres: all unique values extracted, the 20 resulting genres dummy-encoded, then collapsed into a single new column based on average budget
    • belongs_to_collection: dummy-encoded
  • Natural language processing: the overview (plot summary) text is processed with a Sentence Transformer
  • Model training and evaluation
    • Random Forest Regressor: excluded because its coefficient of determination (R²) came out negative
    • XGBoost, LightGBM, CatBoost
    • Ensemble model: built by averaging the predictions of the three models above, combining the strengths of each model to improve predictive performance
    • The ensemble of the three models had the highest R² and the lowest MSE, so the ensemble approach was chosen as the final model
  • Clustering
    • Silhouette score check: confirmed that 8 clusters is an appropriate choice
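
The preprocessing bullets above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the Kaggle columns are stringified lists of dicts with a `name` key, that the `num_?` features are simple list lengths, and that the single genre column is the mean of the per-genre average budgets. The helper names (`parse_names`, `genre_mean_budgets`, `encode_genres`) and the sample rows are hypothetical.

```python
import ast
from collections import defaultdict


def parse_names(raw):
    """Return the 'name' values from a stringified list of dicts."""
    try:
        items = ast.literal_eval(raw) if raw else []
    except (ValueError, SyntaxError, TypeError):
        return []  # malformed or missing cell
    if not isinstance(items, list):
        return []
    return [d["name"] for d in items if isinstance(d, dict) and "name" in d]


def genre_mean_budgets(rows):
    """Average budget of all movies tagged with each genre."""
    totals, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        for genre in parse_names(row["genres"]):
            totals[genre] += row["budget"]
            counts[genre] += 1
    return {g: totals[g] / counts[g] for g in totals}


def encode_genres(raw, means):
    """Assumed encoding: mean of the average budgets of a movie's genres."""
    known = [means[g] for g in parse_names(raw) if g in means]
    return sum(known) / len(known) if known else 0.0


# Hypothetical rows in the shape of movies_metadata.csv joined with credits.csv.
rows = [
    {"genres": "[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}]",
     "cast": "[{'name': 'A'}, {'name': 'B'}]", "budget": 100.0},
    {"genres": "[{'id': 28, 'name': 'Action'}]",
     "cast": "[{'name': 'C'}]", "budget": 50.0},
    {"genres": "[{'id': 18, 'name': 'Drama'}]",
     "cast": "[]", "budget": 10.0},
]

means = genre_mean_budgets(rows)
features = [
    {"num_cast": len(parse_names(r["cast"])),
     "genre_budget": encode_genres(r["genres"], means)}
    for r in rows
]
```

In practice the per-genre averages would need to be computed on the training split only, so that test-set budgets do not leak into the feature.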
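
As an illustration of the model evaluation step, the sketch below averages the predictions of three models and compares R² and MSE. It uses plain Python rather than the project's libraries, and the prediction lists are made-up numbers standing in for XGBoost, LightGBM, and CatBoost outputs. It also shows why R² can be negative: that happens whenever a model's squared error exceeds that of always predicting the mean.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def average_ensemble(*prediction_lists):
    """Element-wise mean of several models' predictions."""
    return [sum(vals) / len(vals) for vals in zip(*prediction_lists)]


# Hypothetical budgets and per-model predictions (not project results).
y_true = [10.0, 20.0, 30.0, 40.0]
pred_a = [12.0, 18.0, 33.0, 37.0]
pred_b = [9.0, 23.0, 28.0, 44.0]
pred_c = [11.0, 20.0, 31.0, 38.0]

pred_ens = average_ensemble(pred_a, pred_b, pred_c)

for name, pred in [("a", pred_a), ("b", pred_b), ("c", pred_c), ("ensemble", pred_ens)]:
    print(f"{name}: R2={r2_score(y_true, pred):.3f}  MSE={mse(y_true, pred):.3f}")

# A model that does worse than predicting the mean has a negative R2.
bad_r2 = r2_score(y_true, [40.0, 30.0, 20.0, 10.0])
```

Averaging helps here because the three sets of errors partly cancel; it is not guaranteed to beat the best single model on every dataset, which is why the comparison on held-out data matters.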
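
For the clustering step, the silhouette score can be sketched in plain Python as below. This is only an illustration of the metric: the points and label assignments are hypothetical, and the project's actual clustering algorithm and features are not stated here. To pick a cluster count such as 8, one would compute this score for the labelings produced at each candidate count and prefer the highest.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from the point to itself is 0, so divide by n - 1).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Two hypothetical, well-separated groups of points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

good = silhouette_score(points, good_labels)
bad = silhouette_score(points, bad_labels)
```

Scores close to 1 indicate compact, well-separated clusters, while scores near 0 or below indicate overlapping or misassigned points.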

[Expected Impact]
A system aimed only at movie producers might see limited use.
This system, however, lets users imagine that they have become a movie producer: they enter the rating, popularity, and running time they want, the number of cast and crew members they would work with, and so on, and the system automatically reports the money required (the budget) and their producer type.
It is therefore expected to be a system that the general public can enjoy casually.

About

Developing a budget prediction model based on "The Movies Dataset" on Kaggle
