Weird results when training Dixon model #7

Open
monokizsolt opened this issue Jan 11, 2023 · 1 comment
Labels
enhancement New feature or request

Comments


monokizsolt commented Jan 11, 2023

Hi,

I have noticed a dramatic difference in prediction results when training the Dixon-Coles model on almost the same amount of data.
Training with the first 99 rows outputs this:
Home Win: 0.4901944888036056
Draw: 0.4236429709276788
Away Win: 0.08616254025982717

But training with the first 100 rows outputs this (note the negative probability):
Home Win: 0.37407906289002624
Draw: 0.6979058936975158
Away Win: -0.07198495669064632

I have prepared a small script to demonstrate this:
```python
import penaltyblog as pb

fb = pb.scrapers.FootballData("GRC Super League", "2022-2023")

# Train with 99
df = fb.get_fixtures().iloc[:99]
print(df)
weight = pb.models.dixon_coles_weights(df["date"], 0.001)
clf = pb.models.DixonColesGoalModel(
    df["goals_home"], df["goals_away"], df["team_home"], df["team_away"], weight
)
clf.fit()

print(clf)
print(clf.predict("Olympiakos", "Asteras Tripolis"))

# Train with 100
df = fb.get_fixtures().iloc[:100]
print(df)
weight = pb.models.dixon_coles_weights(df["date"], 0.001)
clf = pb.models.DixonColesGoalModel(
    df["goals_home"], df["goals_away"], df["team_home"], df["team_away"], weight
)
clf.fit()

print(clf)
print(clf.predict("Olympiakos", "Asteras Tripolis"))
```

I could not work out why this happens. Could you take a look?
Thanks, Zsolt

@martineastwood (Owner)

Thanks Zsolt. It looks like the optimiser is arriving at a value for rho that breaks Dixon and Coles' adjustment factor. I suspect that's because you're training on quite a small amount of data, so the model does not converge well and the optimiser's output is volatile.
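To illustrate how an out-of-range rho can produce a negative probability, here is a minimal standalone sketch of the low-score adjustment from the Dixon and Coles paper applied to independent Poisson scores. It is not penaltyblog's implementation; the function names and the rates `lam` (home) and `mu` (away) are assumptions chosen for illustration, and the rho of -1.5 is a made-up value, not the one fitted in this issue.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def tau(x, y, lam, mu, rho):
    """Dixon-Coles adjustment for the four low-scoring results.

    x is home goals, y is away goals, lam and mu are the expected
    home and away goals. Every other scoreline is left unadjusted.
    """
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def outcome_probs(lam, mu, rho, max_goals=15):
    """Sum the adjusted score grid into (home win, draw, away win)."""
    home = draw = away = 0.0
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            p = tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    return home, draw, away


# Hypothetical rates for a strong home side against a weak away side.
print(outcome_probs(2.0, 0.3, rho=-0.1))  # rho inside the valid range
print(outcome_probs(2.0, 0.3, rho=-1.5))  # rho far outside the valid range
```

With the second call the adjustment for a 0-1 scoreline, `1 + lam * rho`, is negative, so the away-win total drops below zero while the three outcomes still sum to one. That matches the pattern in the output reported above, where the draw probability is inflated and the away-win probability is negative.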

Adding in the previous season's data as well helps the model converge better.

```python
import pandas as pd
import penaltyblog as pb

# Two seasons of data, dropping the last two fixtures
df = pd.concat(
    [
        pb.scrapers.FootballData("GRC Super League", "2021-2022").get_fixtures(),
        pb.scrapers.FootballData("GRC Super League", "2022-2023").get_fixtures(),
    ]
)[:-2]

weight = pb.models.dixon_coles_weights(df["date"], 0.001)
clf = pb.models.DixonColesGoalModel(
    df["goals_home"], df["goals_away"], df["team_home"], df["team_away"], weight
)
clf.fit()

print(clf)
print(clf.predict("Olympiakos", "Asteras Tripolis"))
```

```python
# Same two seasons, dropping only the last fixture
df = pd.concat(
    [
        pb.scrapers.FootballData("GRC Super League", "2021-2022").get_fixtures(),
        pb.scrapers.FootballData("GRC Super League", "2022-2023").get_fixtures(),
    ]
)[:-1]

weight = pb.models.dixon_coles_weights(df["date"], 0.001)
clf = pb.models.DixonColesGoalModel(
    df["goals_home"], df["goals_away"], df["team_home"], df["team_away"], weight
)
clf.fit()

print(clf)
print(clf.predict("Olympiakos", "Asteras Tripolis"))
```

I'll look into adding constraints on the values rho is allowed to take, to help minimise this in the future.
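As a rough sketch of what such a constraint could look like, the Dixon and Coles paper gives a range for rho that keeps every adjusted probability non-negative: max(-1/lam, -1/mu) <= rho <= min(1/(lam*mu), 1). The helpers below are hypothetical and are not part of penaltyblog; they only show the arithmetic, assuming `lam` and `mu` are the expected home and away goals for a fixture.

```python
def rho_bounds(lam, mu):
    """Range of rho that keeps all four adjustment factors non-negative."""
    lower = max(-1.0 / lam, -1.0 / mu)
    upper = min(1.0 / (lam * mu), 1.0)
    return lower, upper


def rho_bounds_for_fixtures(rate_pairs):
    """Intersect the valid range over several (lam, mu) pairs.

    A single rho is shared by every fixture, so it has to satisfy
    the tightest lower and upper bound among them.
    """
    bounds = [rho_bounds(lam, mu) for lam, mu in rate_pairs]
    lower = max(b[0] for b in bounds)
    upper = min(b[1] for b in bounds)
    return lower, upper


def clamp_rho(rho, lam, mu):
    """Pull rho back into the valid range for one fixture."""
    lower, upper = rho_bounds(lam, mu)
    return min(max(rho, lower), upper)


# Hypothetical rates: a one-sided fixture and a more even one.
print(rho_bounds(2.0, 0.3))
print(rho_bounds_for_fixtures([(2.0, 0.3), (1.0, 1.5)]))
print(clamp_rho(-1.5, 2.0, 0.3))
```

Because the range depends on the fitted rates, a fixed bound passed to the optimiser would need to be conservative, or the constraint would need re-evaluating as the attack and defence parameters change. Which approach suits the library is a design decision this sketch does not settle.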

@martineastwood martineastwood added the enhancement New feature or request label Jul 17, 2023