Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update_table_columns makes invalid assumptions about column ordering #2432

Open
weaverba137 opened this issue Jan 21, 2025 · 2 comments
Open
Assignees
Labels

Comments

@weaverba137
Copy link
Member

desispec.zcatalog.update_table_columns() has been failing when assembling zall-tilecumulative-daily(db).fits files. When the various individual ztile files are concatenated, the columns have this ordering:

['TARGETID', 'SURVEY', 'PROGRAM', 'FIRSTNIGHT', 'LASTNIGHT',
 'SPGRPVAL', 'CHI2', 'COEFF', 'Z', 'ZERR', 'ZWARN', 'NPIXELS',
 'SPECTYPE', 'SUBTYPE', 'NCOEFF', 'DELTACHI2', 'PETAL_LOC',
 'DEVICE_LOC', 'LOCATION', 'FIBER', 'COADD_FIBERSTATUS',
 'TARGET_RA', 'FITMETHOD', 'DESINAME', 'TARGET_DEC',
 'PMRA', 'PMDEC', 'REF_EPOCH', 'LAMBDA_REF', 'FA_TARGET',
 'FA_TYPE', 'OBJTYPE', 'FIBERASSIGN_X', 'FIBERASSIGN_Y',
 'PRIORITY', 'SUBPRIORITY', 'OBSCONDITIONS', 'MORPHTYPE',
 'FLUX_G', 'FLUX_R', 'FLUX_Z',
 'FLUX_IVAR_G', 'FLUX_IVAR_R', 'FLUX_IVAR_Z', 'REF_ID',
 'REF_CAT', 'GAIA_PHOT_G_MEAN_MAG', 'GAIA_PHOT_BP_MEAN_MAG',
 'GAIA_PHOT_RP_MEAN_MAG', 'PARALLAX', 'EBV', 'FLUX_W1', 'FLUX_W2',
 'FIBERFLUX_G', 'FIBERFLUX_R', 'FIBERFLUX_Z',
 'FIBERTOTFLUX_G', 'FIBERTOTFLUX_R', 'FIBERTOTFLUX_Z', 'MASKBITS',
 'SERSIC', 'SHAPE_R', 'SHAPE_E1', 'SHAPE_E2', 'PHOTSYS',
 'CMX_TARGET', 'PRIORITY_INIT', 'NUMOBS_INIT', 'RELEASE',
 'BRICKID', 'BRICKNAME', 'BRICK_OBJID', 'DESI_TARGET',
 'BGS_TARGET', 'PLATE_DEC', 'PLATE_RA', 'MWS_TARGET', 'TILEID',
 'COADD_NUMEXP', 'COADD_EXPTIME', 'COADD_NUMNIGHT', 'COADD_NUMTILE',
 'MEAN_DELTA_X', 'RMS_DELTA_X', 'MEAN_DELTA_Y', 'RMS_DELTA_Y',
 'MEAN_FIBER_X', 'MEAN_FIBER_Y', 'MEAN_FIBER_RA', 'STD_FIBER_RA',
 'MEAN_FIBER_DEC', 'STD_FIBER_DEC', 'MIN_MJD', 'MAX_MJD',
 'MEAN_PSF_TO_FIBER_SPECFLUX', 'MEAN_MJD',
 'TSNR2_ELG_B', 'TSNR2_GPBBACKUP_B', 'TSNR2_GPBBRIGHT_B',
 'TSNR2_GPBDARK_B', 'TSNR2_LYA_B', 'TSNR2_BGS_B', 
 'TSNR2_QSO_B', 'TSNR2_LRG_B', 'TSNR2_ELG_R',
 'TSNR2_GPBBACKUP_R', 'TSNR2_GPBBRIGHT_R', 'TSNR2_GPBDARK_R',
 'TSNR2_LYA_R', 'TSNR2_BGS_R', 'TSNR2_QSO_R', 'TSNR2_LRG_R',
 'TSNR2_ELG_Z', 'TSNR2_GPBBACKUP_Z', 'TSNR2_GPBBRIGHT_Z',
 'TSNR2_GPBDARK_Z', 'TSNR2_LYA_Z', 'TSNR2_BGS_Z', 'TSNR2_QSO_Z',
 'TSNR2_LRG_Z', 'TSNR2_ELG', 'TSNR2_GPBBACKUP', 'TSNR2_GPBBRIGHT',
 'TSNR2_GPBDARK', 'TSNR2_LYA', 'TSNR2_BGS', 'TSNR2_QSO',
 'TSNR2_LRG', 'FLUX_IVAR_W1', 'FLUX_IVAR_W2',
 'ZCAT_NSPEC', 'ZCAT_PRIMARY', 'SCND_TARGET',
 'SV1_DESI_TARGET', 'SV1_BGS_TARGET', 'SV1_MWS_TARGET', 'SV1_SCND_TARGET',
 'SV2_DESI_TARGET', 'SV2_BGS_TARGET', 'SV2_MWS_TARGET', 'SV2_SCND_TARGET',
 'SV3_DESI_TARGET', 'SV3_BGS_TARGET', 'SV3_MWS_TARGET', 'SV3_SCND_TARGET',
 'SV_NSPEC', 'SV_PRIMARY', 'MAIN_NSPEC', 'MAIN_PRIMARY']

which is apparently different from the ordering when assembling a uniformly-processed specprod. Then when the rearrangement (with all_columns=True) is performed, the indices are invalid, and as a result the final list of columns contains some duplicate column names.

Some specific examples in the list above; it is not true that:

  • TARGET columns sit between NUMOBS_INIT and PLATE_RA columns
  • Last column in [sic] TSNR2_LRG in all the redshift catalogs

If there is an option that forces a particular ordering when desispec.scripts.zcatalog.main() is run, we should probably use that. If not, how, programmatically, can one reconstruct the column ordering for a uniformly-processed specprod?

@weaverba137 weaverba137 self-assigned this Jan 21, 2025
@weaverba137
Copy link
Member Author

PS, since this is ultimately for the database, what is the advantage of rearranging the columns at all? The database loading software does not care about column ordering.

@sbailey
Copy link
Contributor

sbailey commented Jan 22, 2025

Related: PR #2368 which provided a minimal fix for another time that the fragile column order assumptions broke the code. Thanks for opening a ticket about the issue.

I agree that we should fix this, especially since it has now bitten us twice. I think the original motivation for reordering the columns was for human friendliness when inspecting the files/tables, e.g. inspecting FITS headers or Table.colnames, or even the datamodel which is likely to follow the order of the columns in the files --- it is helpful to group columns that conceptually go together. i.e. this code and zall files is not just for the database.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants