A comprehensive judging platform for hackathons
- Created with Django in a Docker container
- Will hopefully be deployable on Heroku soon
To develop locally:
- Clone the repo:
git clone <>
- Spin up the containers:
docker-compose up
- Open at
localhost:5082
with un:admin & pw:HackathonsRule!
- Otherwise, create a superuser:
python manage.py createsuperuser
- Judging is a hard problem
- One of the main problems is that judge scores are unreliable
- One solution was relative scoring with Gavel
- However, given our constraints, this was not possible at HooHacks
- Our solution was normalized averages (a minimal sketch of the idea appears below)
- Normalization reduces the variability between judges (it does not reduce the variability within each judge's own scores)
- Assumption 1. Some judges simply grade things more harshly than other judges
- Assumption 2. The distribution of scores for each team is normally distributed
- Assumption 3. The variance for each team's score distribution is the same
- Observation. The moving average of judge scores steadily increases over the course of the judging period.
- We collect this information by running a normalization session where every judge evaluates a couple "fake" projects before real judging begins
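To make this concrete, here is a minimal sketch of what normalized averaging could look like. The data layout, the function name, and the z-score formulation (centering and scaling each judge's real scores by that judge's own anchor-session mean and spread) are illustrative assumptions, not necessarily the exact code in this repo.

```python
from statistics import mean, stdev

def normalize_scores(raw_scores, anchor_scores):
    """Normalize each judge's scores against their own anchor-session stats,
    then average per team.

    raw_scores:    {judge: {team: score}}  -- scores from real judging
    anchor_scores: {judge: [score, ...]}   -- scores from the "fake" anchor demos

    Illustrative z-score sketch (assumes at least two anchor demos per judge).
    """
    normalized = {}
    for judge, team_scores in raw_scores.items():
        anchors = anchor_scores[judge]
        mu = mean(anchors)
        sigma = stdev(anchors) or 1.0  # guard against a judge with zero spread
        for team, score in team_scores.items():
            # Express each score relative to how harshly this judge graded the
            # anchors, so harsh and lenient judges land on a comparable scale.
            normalized.setdefault(team, []).append((score - mu) / sigma)
    # A team's final score is the average of its normalized judge scores.
    return {team: mean(scores) for team, scores in normalized.items()}
```

Dividing by each judge's own anchor spread is what targets assumption 1: a harsh judge's 6 and a lenient judge's 9 can end up at the same normalized value.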
- The empirical evidence is alright
- For each criterion (e.g. functionality, impact, creativity), if judge A's score was >= judge B's score in the normalization session, then the same held true for any given project roughly 65-85% of the time (a way to estimate this is sketched below).
- Note: the fact that judges did not agree on what was more creative or more impactful (though impact was the most agreed-upon of our criteria) indicates that the criteria we used were subjective. And that's acceptable. Reducing what makes a project "good" to a reasonable number of objective criteria is an exercise in futility.
- The distribution of scores for the normalization "anchor" demos was (surprisingly) close to normal, so assumption 2 seems to hold.
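The 65-85% consistency figure can be estimated with a simple pairwise count along these lines. The data layout here (one anchor score per judge per criterion, e.g. an average over the anchor demos) is an assumption for illustration, not the repo's schema.

```python
from itertools import combinations

def pairwise_consistency(anchor_scores, project_scores):
    """Fraction of (judge pair, project) comparisons where the anchor-session
    ordering between two judges also held on a real project, for one criterion.

    anchor_scores:  {judge: anchor score}        -- e.g. mean over anchor demos
    project_scores: {judge: {project: score}}    -- real judging, same criterion
    """
    agree = total = 0
    for a, b in combinations(anchor_scores, 2):
        # Orient the pair so judge `a` scored the anchors at least as high.
        if anchor_scores[a] < anchor_scores[b]:
            a, b = b, a
        for project in set(project_scores[a]) & set(project_scores[b]):
            total += 1
            agree += project_scores[a][project] >= project_scores[b][project]
    return agree / total if total else 0.0
```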
- In conclusion, this method isn't perfect, but it's a reasonable step up from just averages
- Further, there are some interesting judging ideas that run orthogonal to score normalization. For example, imagine if every team's score were actually represented as a confidence interval. Every time a judge evaluates a team, the team's confidence interval shrinks toward its "true" score. Once a team is confidently out of the race, judges can stop evaluating it and start judging teams closer to the top. At the end of the day, the accuracy of the top rankings matters more than that of the bottom rankings. This is effectively a linear bandit problem, so we could probably apply some state-of-the-art reinforcement learning solutions.
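As a rough illustration of that confidence-interval idea, the sketch below keeps a simple normal interval per team and drops a team from further judging once its upper bound falls below the lower bounds of the current top teams (a successive-elimination flavor of the bandit framing). The interval construction, the z value, and the keep_top parameter are illustrative assumptions, not anything implemented here.

```python
import math
from statistics import mean, stdev

def active_teams(scores_by_team, z=1.96, keep_top=3):
    """Return the teams that could still plausibly land in the top `keep_top`.

    scores_by_team: {team: [judge scores collected so far]}
    Uses a naive normal confidence interval per team; teams outside the
    returned list are "confidently out of the race" and need no more judges.
    """
    if not scores_by_team:
        return []
    intervals = {}
    for team, scores in scores_by_team.items():
        if len(scores) < 2:
            # Too little data: keep the team in the judging pool.
            intervals[team] = (float("-inf"), float("inf"))
            continue
        m = mean(scores)
        half = z * stdev(scores) / math.sqrt(len(scores))
        intervals[team] = (m - half, m + half)
    # A team stays active if its upper bound clears the keep_top-th best lower bound.
    cutoff = sorted((lo for lo, _ in intervals.values()), reverse=True)[:keep_top][-1]
    return [team for team, (lo, hi) in intervals.items() if hi >= cutoff]
```

A judge-assignment loop would re-run something like this after each new score and only route judges to teams still in the active pool.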