Weird results when training Dixon model #7

monokizsolt · 2023-01-11T09:54:37Z

Hi,

I have noticed a dramatic difference in the prediction results when training the Dixon-Coles model on almost the same amount of data.
Training with the first 99 rows outputs this:
Home Win: 0.4901944888036056
Draw: 0.4236429709276788
Away Win: 0.08616254025982717

But training with the first 100 rows outputs this (note that it even contains a negative probability):
Home Win: 0.37407906289002624
Draw: 0.6979058936975158
Away Win: -0.07198495669064632

I have prepared a small script to demonstrate this:
`
import penaltyblog as pb

fb = pb.scrapers.FootballData("GRC Super League", "2022-2023")

# Train with the first 99 rows
df = fb.get_fixtures().iloc[:99]
print(df)
weight = pb.models.dixon_coles_weights(df["date"], 0.001)
clf = pb.models.DixonColesGoalModel(
    df["goals_home"], df["goals_away"], df["team_home"], df["team_away"], weight
)
clf.fit()

print(clf)
print(clf.predict("Olympiakos", "Asteras Tripolis"))

# Train with the first 100 rows
df = fb.get_fixtures().iloc[:100]
print(df)
weight = pb.models.dixon_coles_weights(df["date"], 0.001)
clf = pb.models.DixonColesGoalModel(
    df["goals_home"], df["goals_away"], df["team_home"], df["team_away"], weight
)
clf.fit()

print(clf)
print(clf.predict("Olympiakos", "Asteras Tripolis"))
`

I could not work out why this happens. Could you take a look?
Thanks, Zsolt

martineastwood · 2023-01-12T08:08:04Z

Thanks Zsolt - it looks like the optimiser is settling on a value for rho that breaks Dixon and Coles' adjustment factor. I suspect this is because you're using quite a small amount of data, so the model is not converging well and the optimiser's output is quite volatile.
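To illustrate how an out-of-range rho can produce a negative probability, here is a minimal standalone sketch of the low-score adjustment from the Dixon and Coles paper applied to independent Poisson scorelines. It is not penaltyblog's implementation: the function names and the example rates (`lam` for home expected goals, `mu` for away expected goals) are made up for illustration.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def tau(x, y, lam, mu, rho):
    """Dixon-Coles adjustment for the scorelines 0-0, 0-1, 1-0 and 1-1.

    x is home goals, y is away goals; every other scoreline is unchanged.
    """
    if x == 0 and y == 0:
        return 1.0 - lam * mu * rho
    if x == 0 and y == 1:
        return 1.0 + lam * rho
    if x == 1 and y == 0:
        return 1.0 + mu * rho
    if x == 1 and y == 1:
        return 1.0 - rho
    return 1.0


def score_matrix(lam, mu, rho, max_goals=10):
    """Adjusted scoreline probabilities, indexed as matrix[home][away]."""
    return [
        [
            tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)
            for y in range(max_goals + 1)
        ]
        for x in range(max_goals + 1)
    ]


def outcome_probs(matrix):
    """Collapse the scoreline matrix into (home win, draw, away win)."""
    home = draw = away = 0.0
    for x, row in enumerate(matrix):
        for y, p in enumerate(row):
            if x > y:
                home += p
            elif x == y:
                draw += p
            else:
                away += p
    return home, draw, away


# Hypothetical rates for a strong home side; rho = -1.5 makes tau(0, 1) negative.
for rho in (-0.1, -1.5):
    print(rho, outcome_probs(score_matrix(2.0, 0.5, rho)))
```

With these made-up rates, the larger negative rho pushes the 0-1 cell below zero while inflating 0-0 and 1-1. The three outcomes still sum to roughly 1 because the four adjustments cancel each other out, which matches the pattern in the output above (inflated draw, negative away win), although I have not confirmed that this is exactly what the fitted model did.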

Adding in the previous season's data as well helps the model converge better.

import pandas as pd
import penaltyblog as pb

# Two seasons of fixtures, minus the last 2 rows
df = pd.concat(
    [
        pb.scrapers.FootballData("GRC Super League", "2021-2022").get_fixtures(),
        pb.scrapers.FootballData("GRC Super League", "2022-2023").get_fixtures(),
    ]
).iloc[:-2]

weight = pb.models.dixon_coles_weights(df["date"], 0.001)
clf = pb.models.DixonColesGoalModel(
    df["goals_home"], df["goals_away"], df["team_home"], df["team_away"], weight
)
clf.fit()

print(clf)
print(clf.predict("Olympiakos", "Asteras Tripolis"))

# Same again with one more row (only the last row dropped)
df = pd.concat(
    [
        pb.scrapers.FootballData("GRC Super League", "2021-2022").get_fixtures(),
        pb.scrapers.FootballData("GRC Super League", "2022-2023").get_fixtures(),
    ]
).iloc[:-1]

weight = pb.models.dixon_coles_weights(df["date"], 0.001)
clf = pb.models.DixonColesGoalModel(
    df["goals_home"], df["goals_away"], df["team_home"], df["team_away"], weight
)
clf.fit()

print(clf)
print(clf.predict("Olympiakos", "Asteras Tripolis"))

I'll look into adding constraints on the values rho is allowed to take, to help minimise this in the future.
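As a rough idea of what such a constraint could look like, the Dixon and Coles paper requires max(-1/lambda, -1/mu) <= rho <= min(1/(lambda*mu), 1) for each match, where lambda and mu are the home and away expected goals. The sketch below computes the interval that satisfies this for a set of matches and clamps a candidate rho into it. The helper names and example rates are hypothetical, and this is not how penaltyblog does (or will) handle it.

```python
def rho_interval(rates):
    """Feasible rho interval for a collection of (lam, mu) expected-goal pairs.

    Inside this interval every Dixon-Coles adjustment factor is non-negative.
    """
    rates = list(rates)
    lower = max(max(-1.0 / lam, -1.0 / mu) for lam, mu in rates)
    upper = min(min(1.0 / (lam * mu), 1.0) for lam, mu in rates)
    return lower, upper


def clamp_rho(rho, rates):
    """Clip a candidate rho into the feasible interval for these rates."""
    lower, upper = rho_interval(rates)
    return min(max(rho, lower), upper)


# Hypothetical (home, away) expected goals for three fixtures.
rates = [(2.0, 0.5), (1.2, 1.1), (0.8, 1.6)]
print(rho_interval(rates))
print(clamp_rho(-1.5, rates))
```

One caveat with this approach: the expected goals depend on the attack and defence parameters being fitted at the same time as rho, so the interval moves during optimisation. A fixed, conservative bound on rho passed to the optimiser may be simpler in practice.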

martineastwood added the "enhancement" (New feature or request) label · Jul 17, 2023
