coord_ra and coord_dec values are all nan's in icSrc catalogs #425

Closed

jchiang87 opened this issue Dec 8, 2016 · 12 comments

@jchiang87
Contributor

While working on comparing the inferred input magnitudes from the instance catalogs to the fluxes measured by processEimage.py, I delved into one of the icSrc catalogs:

[cori05] pwd -P
/global/project/projectdirs/lsst/phosim_deep/feasibility_study/single_raft/output_repo/icSrc/v1414156-fr/R22
In [2]: import astropy.io.fits as fits

In [3]: foo = fits.open('S11.fits')

In [4]: foo[1].data.field('coord_ra')
Out[4]: 
array([ nan,  nan,  nan, ...,  nan,  nan,  nan])

In [5]: foo[1].data.field('coord_dec')
Out[5]: 
array([ nan,  nan,  nan, ...,  nan,  nan,  nan])

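A quick programmatic check (a sketch added here for illustration, reusing the already-opened HDU from the session above) confirms that every entry in both columns is NaN:

import numpy as np

# Every value in both coordinate columns of this icSrc catalog is NaN.
print(np.isnan(foo[1].data['coord_ra']).all())    # True
print(np.isnan(foo[1].data['coord_dec']).all())   # True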

This is using a phosim deep precursor, but I see the same thing for Run1.1 and Run3 datasets at SLAC. The recent Run3 Level 2 results seem to be here:

/global/cscratch1/sd/desc/twinkles/work/5/output

but I can't read those files since I'm not in the desc group.

What do we need to do to get the Stack to produce useful coordinates in these catalogs?

@SimonKrughoff
Contributor

We should be reading in the catalogs with the stack persistence mechanisms, not astropy.io.fits. I don't know if it would fix this, but there is more to the persistence framework than just opening the FITS file. In this case you would do either of the following (both untested):

>>> import lsst.afw.table as afwTable
>>> src_cat = afwTable.SourceCatalog.readFits('S11.fits')

or better yet

>>> import lsst.daf.persistence as dafPersist
>>> butler = dafPersist.Butler('/global/project/projectdirs/lsst/phosim_deep/feasibility_study/single_raft/output_repo/')
>>> icSrc = butler.get('icSrc', dataId={'visit': 1414156, 'filter': 'r', 'raft': '2,2', 'sensor': '1,1'})
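
For what it's worth, a rough sketch (untested, assuming the afw.table column-access API of that era) of inspecting the coordinate columns once the catalog is in hand by either route:

import numpy as np

# icSrc is the afwTable.SourceCatalog returned by butler.get() above
# (or src_cat from the readFits() variant); string keys give column views in radians.
ra = icSrc['coord_ra']
dec = icSrc['coord_dec']
print(np.isnan(ra).all(), np.isnan(dec).all())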

@jchiang87
Contributor Author

Thanks for the untested code. Is there a way of making the butler serve up data faster?

@SimonKrughoff
Contributor

I guess I'm not sure what you mean. I assume it is I/O bound.

@jchiang87
Contributor Author

The butler's inherent slowness was my motivation for using astropy in this case. In other cases, it just falls down with complaints like "OperationalError: no such column: tract".

@SimonKrughoff
Contributor

If you try both the methods mentioned above, does one or the other perform faster?

I'd need to see the specific cases where it gives you that error, to help debug.

@SimonKrughoff
Contributor

O.K. I am seeing the same thing you did, i.e., the coords in icSrc are nan. I think that's actually expected, because the icSrc catalog is produced before the astrometric calibration. If you access the src dataset instead, you'll see that the Coord objects are populated.
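
A short sketch (untested, reusing the butler and the hypothetical dataId from above) of checking the src dataset instead:

import numpy as np

# src is written after astrometric calibration, so coord_ra/coord_dec should be finite.
src = butler.get('src', dataId={'visit': 1414156, 'filter': 'r', 'raft': '2,2', 'sensor': '1,1'})
print(np.isfinite(src['coord_ra']).all())   # expect True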

Regarding the speed, I don't notice it being particularly slow. Is it possible that, when you say the butler loading is slow, you are making a butler instance for every dataset you access? Instantiation of the Butler object is far slower than it should be, but you should only have to do that once.
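
To illustrate that pattern, a sketch (untested; the sensor list is hypothetical) of paying the construction cost once and reusing the instance:

import lsst.daf.persistence as dafPersist

# Construct the Butler once (the slow step)...
butler = dafPersist.Butler('/global/project/projectdirs/lsst/phosim_deep/feasibility_study/single_raft/output_repo/')

# ...then reuse the same instance for every dataId.
for sensor in ('0,0', '1,1', '2,2'):
    src = butler.get('src', dataId={'visit': 1414156, 'filter': 'r', 'raft': '2,2', 'sensor': sensor})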

@jchiang87
Contributor Author

I am only doing it once in a given script, but I often have scripts that use the butler, run from the bash command line, so that startup time is always hitting me (in addition to importing the Stack modules) and slows things down a lot... difficult to deal with when debugging stuff.

@SimonKrughoff
Contributor

So can you use the src dataset instead of the icSrc dataset? I.e. does that solve your problem?

I would really like to use the stack persistence framework if we can. I will help do that if I can.

@jchiang87
Contributor Author

I think I make a good-faith effort to use it first and as much as possible in production code (i.e., not necessarily in tossed-off examples that help make it clear where I think a problem lies, e.g., this issue), but if I hit a roadblock, like that tract thing, I'd really rather get my own work done than chase my tail trying to get it to work (which I have spent many keystrokes doing in the past). The main issue I have with the butler is that it doesn't offer any obvious level of introspection into what sort of data are available, hence the need to post headers of catalog FITS files to know what the column names are.

@SimonKrughoff
Contributor

O.K. Two things.

I appreciate your trying to use the stack persistence mechanisms, and if you are chasing your tail, we should genuinely fix those things. The problem with using ad hoc solutions is that other people will cargo-cult them, and we'll be in real trouble if the underlying persistence mechanisms change (which they will). It may not be as efficient in the short run, but it might be worth it in the long run to spend a little extra time trying to get DM to fix the things that make it unusable for you. Now that we are on Slack, you can @-mention me or Nate Pease at any point to get butler help. Nate is very responsive (and is at SLAC).

"The main issue I have with the butler is that it doesn't offer any obvious level of introspection into what sort of data are available, hence the need to post headers of catalog FITS files to know what the column names are."

This particular complaint I understand in general, but in this case it isn't quite fair. If you read the data into an afwTable.SourceCatalog object, there are mechanisms for introspecting the schema of the catalog. This is not really a butler problem.
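
For instance, a sketch (untested, assuming the afw.table schema API) of listing the column names directly from the catalog:

import lsst.afw.table as afwTable

src_cat = afwTable.SourceCatalog.readFits('S11.fits')

# The schema carries the column names, types, units, and documentation strings.
for name in sorted(src_cat.schema.getNames()):
    print(name)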

@SimonKrughoff
Contributor

@jchiang87 this being closed makes me think that using the src dataset fixes your problem. Is that accurate?

@jchiang87
Contributor Author

"Is that accurate?"

Yes.
