
Conditional SWAP agent confusion matrices (P("LENS"|LENSED_QUASAR) etc) #162

drphilmarshall opened this issue Apr 28, 2015 · 11 comments

@drphilmarshall
Owner

The STRIDES group were discussing how to combine their "expert grades" for ~100 lensed quasar candidates this afternoon. I suggested that we had solved this problem with SWAP (although we did not apply it to our own expert grades!). On the STRIDES dime, then, I could:

  • Adapt SWAP to read in a csv file of expert grades, as output by a spreadsheet, and reformat it into a classification "database"
  • Figure out how to translate each expert (0/1/2/3) grade into a LENS or NOT fractional vote
  • Run, comparing offline/online, supervised/unsupervised etc

In the first instance, I guess offline and unsupervised will work best (if only because we don't have any classifications of training images!). In future, though, the team seemed amenable to having training images mixed in with the candidates, which I thought was interesting.

@cpadavis: it occurs to me that the above could potentially make a nice introductory example to the eSWAP analysis. Comments welcome!

@drphilmarshall drphilmarshall self-assigned this Apr 28, 2015
@drphilmarshall drphilmarshall added this to the Someday milestone Apr 28, 2015
@anupreeta27
Collaborator

@drphilmarshall
I believe you are referring to the existing classifications of the QSO candidates by some of the STRIDES members.
If you want to use SWAP, then you will have to use a training sample for calibrating the STRIDES team's classifications; this would mean asking everyone to redo all classifications on the test plus a training sample.
And if you don't plan to use a training sample to calibrate everyone's PL and PD, then why not simply take the average of their grades? How can SWAP provide a better solution than a simple average?

@drphilmarshall
Owner Author

I'm talking about existing and future candidate grading, and we did talk about putting sims and duds into the grading exercise.

SWAP can now operate without a training set (in "unsupervised" mode, since Taiwan last year); Chris is testing it for the eSWAP paper. It ends up capturing consensus between agents, which still have independent confusion matrices, so that it provides something like (but not literally) a weighted average. Plus, you could imagine assigning different initial PD and PL to each agent (Paul Schechter seems to be afforded significantly higher values, for example!).
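To illustrate that last point, a minimal sketch of what per-agent initialization might look like (the agent names and numbers are placeholders, not SWAP's actual configuration interface):

```python
# Hypothetical per-agent initialization of the binary confusion matrix
# elements PL = P("LENS"|LENS) and PD = P("NOT"|NOT). Unknown agents
# start at an uninformative prior; trusted experts start higher.
DEFAULT_PRIOR = {'PL': 0.5, 'PD': 0.5}
AGENT_PRIORS = {
    'pschechter': {'PL': 0.9, 'PD': 0.9},  # placeholder values
}

def initial_confusion(agent_id):
    return AGENT_PRIORS.get(agent_id, DEFAULT_PRIOR)
```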


@cpadavis
Collaborator

cpadavis commented May 4, 2015

I have written a very basic 'expertdb' package that takes in user classifications from a csv file (reading columns 'SubjectID', 'AgentID', and 'Classification') and has methods find and digest that can be run with SWAP.py (i.e. all that needs to be added is some flag in SWAP.py to tell SWAP that it's looking at an ExpertDB instead of a MongoDB). Currently it just takes any classification > 0 to be a LENS.
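For orientation, a minimal sketch of what such a reader might look like. The column names and the "> 0 means LENS" rule are from the comment above; the find/digest signatures and everything else are assumptions about the interface, not the actual expertdb code.

```python
import csv

class ExpertDB(object):
    """Hypothetical stand-in for the Mongo-backed classification
    database, reading expert grades from a csv file instead."""

    def __init__(self, csvfile):
        # Expect columns 'SubjectID', 'AgentID', 'Classification'.
        with open(csvfile) as f:
            self.classifications = list(csv.DictReader(f))

    def find(self, subject_id=None):
        # Return all classifications, or only those for one subject.
        if subject_id is None:
            return list(self.classifications)
        return [c for c in self.classifications
                if c['SubjectID'] == subject_id]

    def digest(self, classification):
        # Reduce a raw row to (agent, subject, LENS-or-NOT), treating
        # any grade > 0 as a LENS vote, per the comment above.
        kind = 'LENS' if int(classification['Classification']) > 0 else 'NOT'
        return (classification['AgentID'], classification['SubjectID'], kind)
```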

As for translating expert grades, I imagine that it becomes one more thing you need to calibrate. You expand the classification types from "LENS" and "NOT" to "0", "1", etc., and then you do Space Warps with an expanded asymmetric confusion matrix. I am working out the updates to the formalism and will probably push an updated extended LaTeX document with it to the repo tomorrow. I think the updated formalism will be relatively straightforward. An upshot of this is that you should also be able to extend the formalism in the opposite direction: instead of P("1" | LENS)-like terms, we can look at P("LENS" | Lensed Quasar)-like terms.
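For concreteness, a sketch of how that expanded update could look, in my own notation (this is an assumption about the form, not a transcription of what will land in sw-extended.tex):

```latex
% Agent k's expanded confusion matrix: one row per true type T, one
% column per response value g ("0", "1", "2", "3"), with each row
% normalized:
\sum_{g} P_k(\text{``}g\text{''} \mid T) = 1 \quad \text{for every } T,
% and the subject update on receiving grade g from agent k is the
% same Bayes rule as in the binary LENS/NOT case:
\Pr(T \mid \text{``}g\text{''}, d) =
  \frac{P_k(\text{``}g\text{''} \mid T)\, \Pr(T \mid d)}
       {\sum_{T'} P_k(\text{``}g\text{''} \mid T')\, \Pr(T' \mid d)} .
```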

@drphilmarshall
Owner Author

I thought this would be fun to code! :-) My plan for the grades was to try to interpret them as fixed fractions, e.g. grade 2 might map to 0.67, grade 3 to 0.95, etc. But extending the model the way you suggest is perhaps more interesting, especially if there are volunteers out there who find quasars easy but arcs hard, or vice versa...
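Something like this, say (the 0.67 and 0.95 are the values just mentioned; the grade 0 and 1 entries are placeholders):

```python
# Hypothetical mapping from STRIDES expert grade to fractional LENS vote.
GRADE_TO_VOTE = {0: 0.05, 1: 0.33, 2: 0.67, 3: 0.95}

def fractional_vote(grade):
    return GRADE_TO_VOTE[int(grade)]
```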

@cpadavis
Collaborator

cpadavis commented May 6, 2015

OK, a basic multinomial model for generic numbers of classifications and label types is written down in sw-extended.tex and pushed; see Section 3.4. I didn't spell out how to do this online, but I think the steps are straightforward, and I could be explicit about that if we wanted.
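For readers without sw-extended.tex to hand, a plausible shape for those online steps, generalizing the binary SWAP rule (the prior count gamma is my assumption; the document may use a different regularization):

```latex
% Running estimate of agent k's confusion matrix from the training
% subjects seen so far: N_{gT} counts responses ``g'' to subjects of
% true type T, and \gamma is a small prior count to avoid zeros.
\hat{P}_k(\text{``}g\text{''} \mid T)
  = \frac{N_{gT} + \gamma}{\sum_{g'} \left( N_{g'T} + \gamma \right)} ,
% applied in the subject posterior update after each incoming
% classification ``g'':
\Pr(T \mid \text{``}g\text{''}) \propto
  \hat{P}_k(\text{``}g\text{''} \mid T)\, \Pr(T) .
```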

I realized while writing this down that you can also account for multiple classifications (either of the same type or otherwise) -- in other words, a way to account for multiple markers from users. I don't think that aspect is fully fleshed out, and I'm not even sure it's really worth pursuing, but I thought it was neat to mention.

@drphilmarshall
Owner Author

Nice! I have a couple of questions/comments:

  • I think we do want to give the online steps as well, both for completeness and to help anyone coding this into a web app (hint to Zooniverse dev team). This will also help make clear what the online analysis uses when estimating the confusion matrix elements...
  • ... which is to say, for my benefit (and perhaps others!) could you spell out what you mean by a "multinomial mixture model" please?
  • Even before getting to section 1, I think you need to spell out that you are departing from the current standard of only updating agents with training images and not test images. It'd be good to show the benefits of various choices here before moving on to generalizing the model.
  • It'd be nice to show how the system sensitivity to a particular lens type changes when the types are included in the generalized model. Maybe in this document you could put in placeholders for the tests you want to do - perhaps in the form of figures you want to make?

@cpadavis
Collaborator

cpadavis commented May 7, 2015

on the multinomial mixture model -- sorry for the jargon! It's very similar to a gaussian mixture model. (more jargon!) Multinomial distributions are a generalization of binomial distributions, where instead of drawing from (0,1) N times, you draw from (0, 1, 2, ... M) N times. You might notice that membership in a gaussian mixture model or a multinomial mixture model is itself drawn from a multinomial distribution (e.g. draw from (LENS, NOT) 1 time for each point). They are particularly common in document classification algorithms, like email classification (spam, work, purchases, etc), where the idea is that each class of documents has a different multinomial distribution for describing the probability of a given word appearing.

I think the easiest way to describe the multinomial mixture model is to say how it generates classifications.

For every point, you first generate a group membership. So you say that point 1 is a part of model 2. You can do this by saying that each model i has some probability p_i of being drawn. So in the binary model, you drew membership to group LENS with probability p^0 and to group NOT with probability (1 - p^0). You can imagine that your classifications are now NOT, LENSED QUASAR, and LENSED GALAXY (for simplicity), each with p_N, p_Q, and p_G of being drawn. Incidentally, your membership draws and probabilities can be described by a multinomial distribution.

OK, so now you have drawn a model for your point. Having drawn that, you draw what kind of response you would receive. That is, you draw a classification from the distribution P("classification" | NOT) or P("classification" | LENSED QUASAR) or whatever your membership was. So each type of classification has some associated probability of being drawn, e.g. p_0N and p_1N for P("0"|NOT) and P("1"|NOT) respectively. This is also a multinomial distribution. If this were a gaussian mixture model, you would instead draw a point from the gaussian distribution Norm(mu_N, Sigma_N) if your point's membership were NOT. You can also draw more than one point -- so a user could be required to place N markers, in which case you draw N classifications from that same P("classification"|NOT) distribution.

One final note: apparently (according to Wikipedia) "multinomial distributions" and "categorical distributions" are often conflated in machine learning. It looks to me that the real difference between the two is that the multinomial coefficient (the n!/(k_1! k_2! ...) factor) appears in one and not the other. For our purposes it doesn't really matter which we are talking about.
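To make the generative story concrete, here is the same two-step draw written out in code; all of the probabilities are made-up placeholders:

```python
import numpy as np

rng = np.random.default_rng(42)

# Step-1 distribution: each subject's true type and the prior
# probability of drawing it (placeholder numbers).
types = ['NOT', 'LENSED_QUASAR', 'LENSED_GALAXY']
p_type = [0.90, 0.05, 0.05]

# Step-2 distributions: for each true type, the probability of each
# response "0".."3" (placeholder numbers; each row sums to one).
p_response = {
    'NOT':           [0.70, 0.20, 0.08, 0.02],
    'LENSED_QUASAR': [0.10, 0.20, 0.30, 0.40],
    'LENSED_GALAXY': [0.15, 0.25, 0.35, 0.25],
}

def generate_classification(n_markers=1):
    # Step 1: draw the subject's group membership.
    t = str(rng.choice(types, p=p_type))
    # Step 2: draw n_markers responses from that group's distribution.
    responses = rng.choice(4, size=n_markers, p=p_response[t])
    return t, list(responses)

print(generate_classification())   # e.g. ('NOT', [0])
print(generate_classification(3))  # a subject with three markers
```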

@drphilmarshall
Owner Author

OK, good. Thanks!

Does this mean that we are talking about going from a 2x2 agent confusion matrix with 2 independent elements in the simple LENS/NOT case, to an NxM agent confusion matrix with Nx(M-1) independent elements? (For example, N=3 true types and M=4 possible grades gives 3x3 = 9 independent elements, since each row must sum to one.) One practical issue is that it will take longer to train such an agent, because they will need to see M times as many sims to become skillful in each category. It's going to be interesting to see whether this is outweighed by the model allowing more information to be captured in the long run! I guess it will increase the dominance of the contribution by high effort/experience volunteers...

@aprajita and I were talking about this extended model yesterday, and wondered how one could implement a different dimension: labels that contain information about the object in question that has been extracted from the image data. A test subject in a targeted search could carry with it an estimate of lens or arc brightness, and/or arc radius, etc. Can you see how the agents could take this information into account, to allow a higher probability of being right about "easy" systems than "hard" ones? This could be important if we make the sims more difficult in the next project...


@cpadavis
Collaborator

Online equations added to sw-extended. I also found a typo in my summary of the online system, which is better than finding a typo in the online system's code!

As for the idea of differentiating 'harder' and 'easier' systems, the most naive thing is to assume they are separate groups: in your confusion matrix, LENS and DUD become EASY_LENS, HARD_LENS, EASY_DUD, and HARD_DUD, but you still want to find LENS and DUD. You could then say your confusion matrix (derived from training) gives P("LENS" | EASY_LENS) etc., for which you then need some additional matrix P(EASY_LENS | LENS) etc., and then

P("LENS" | LENS) = P("LENS" | EASY_LENS) P(EASY_LENS | LENS) + P("LENS" | HARD_LENS) P(HARD_LENS | LENS)

(with P(EASY_DUD | LENS) = 0, and so on) translates to the label of interest. The problem then becomes additionally estimating how many lenses in the wild are 'hard' vs 'easy' (or you can give a reasonable estimate and keep them fixed). ('Hard' and 'easy' are placeholder names -- it could be things like 'lens with arc radius < 10' or 'arc within two FWHM of the galaxy', too.) Is that what you were thinking?
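Spelled out in code, that marginalization looks like this (all the numbers are placeholders):

```python
# Confusion matrix elements from training, per subclass (placeholders):
p_say_lens = {'EASY_LENS': 0.95, 'HARD_LENS': 0.40,
              'EASY_DUD': 0.02, 'HARD_DUD': 0.10}

# Assumed (or estimated) subclass fractions within the true LENS class;
# P(EASY_DUD | LENS) = P(HARD_DUD | LENS) = 0 as in the comment above.
p_subclass_given_lens = {'EASY_LENS': 0.3, 'HARD_LENS': 0.7,
                         'EASY_DUD': 0.0, 'HARD_DUD': 0.0}

# P("LENS" | LENS) = sum over subclasses s of P("LENS" | s) P(s | LENS)
p_say_lens_given_lens = sum(p_say_lens[s] * p_subclass_given_lens[s]
                            for s in p_say_lens)
print(p_say_lens_given_lens)  # 0.95*0.3 + 0.40*0.7 = 0.565
```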

@cpadavis
Collaborator

By the way, this should be testable, since we already have lensed quasar etc. flavors.

@drphilmarshall
Owner Author

OK: @cpadavis has included a section in the eSWAP draft about how to do all this, but we're postponing any tests for now while we finish the earlier parts of the paper (including focusing on a best-effort re-analysis of Stage 1, as mentioned in #155). The most likely part of the above discussion to make it in is the flavor-aware agents, hence the renaming of this issue!

@drphilmarshall drphilmarshall changed the title SWAP analysis of "expert" grades Conditional SWAP agent confusion matrices (P("LENS"|LENSED_QUASAR) etc) Jun 10, 2015