Inconsistency in crossmatch column data types #273

camposandro · 2024-04-11T14:59:45Z

Bug report

If we decide to keep the non-matches, the crossmatch dataframe can contain NaN values. Every point in the left partitions gets a row with the left point's information and the information of its match on the right; when there is no match, the right-hand values are set to NaN.

When a row with NaN values is assigned to a dataframe, Pandas seems to automatically cast the whole column to "float". Columns such as Norder_{}_xmatch, Dir_{}_xmatch and Npix_{}_xmatch therefore end up with an incorrect type.
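A minimal sketch of the upcast in plain pandas, not the actual crossmatch code. The two frames and the column names (`id`, `Norder_right`) are made up for illustration, and a left merge stands in for keeping the non-matches:

```python
import pandas as pd

# Hypothetical stand-ins for one left and one right partition.
left = pd.DataFrame({"id": [1, 2, 3]})
right = pd.DataFrame({"id": [1, 2], "Norder_right": [5, 6]})

# A left join keeps id=3 even though it has no match on the right,
# so its Norder_right value has to be filled with NaN.
result = left.merge(right, on="id", how="left")

print(right["Norder_right"].dtype)   # integer dtype before the join
print(result["Norder_right"].dtype)  # float64 after the join, because of the NaN
```

NumPy integer columns cannot hold NaN, so pandas falls back to float64 for the whole column as soon as a single value is missing.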

We should create an end-to-end test to verify that the column data types of the original catalogs remain unchanged.
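One possible shape for such a check, as a rough sketch: the helper name `changed_dtypes`, the `suffix` parameter and the toy frames are all hypothetical, and a real test would compare the actual input catalogs against the computed crossmatch result.

```python
import pandas as pd


def changed_dtypes(original: pd.DataFrame, result: pd.DataFrame, suffix: str = "") -> dict:
    """Map each result column whose dtype differs from the original to (old, new)."""
    changed = {}
    for col, dtype in original.dtypes.items():
        out_col = f"{col}{suffix}"
        if out_col in result.columns and result[out_col].dtype != dtype:
            changed[out_col] = (dtype, result[out_col].dtype)
    return changed


# Toy data: the integer column was upcast to float by a NaN, the float column was not.
original = pd.DataFrame({"Norder": [5, 6], "ra": [1.0, 2.0]})
result = pd.DataFrame(
    {"Norder_right": [5.0, float("nan")], "ra_right": [1.0, float("nan")]}
)

mismatches = changed_dtypes(original, result, suffix="_right")
print(mismatches)
```

An end-to-end test could then assert that the returned dictionary is empty for both the left and the right catalog.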

Before submitting
Please check the following:

I have described the situation in which the bug arose, including what code was executed, information about my environment, and any applicable data others will need to reproduce the problem.
I have included available evidence of the unexpected behavior (including error messages, screenshots, and/or plots) as well as a description of what I expected instead.
If I have a solution in mind, I have provided an explanation and/or pseudocode and/or task list.


delucchi-cmu · 2024-05-16T23:36:58Z

Has this been addressed (or a little bit improved) by the pyarrow dtype changes?

camposandro · 2024-05-17T17:56:34Z

@delucchi-cmu yes, supporting None values by default using pyarrow should fix the column types. We're holding off on merging #271 this week, but in the meantime I might try to build some end-to-end tests to make sure the output column types of the crossmatch indeed remain the same!

delucchi-cmu · 2024-08-13T17:15:05Z

This has been addressed by the recent switch to pyarrow types and by holding on to the pyarrow schema throughout operations.

camposandro · 2024-08-13T17:16:38Z

We should make sure the Dask DataFrame meta and the pyarrow schema are consistent whenever we address #390.

camposandro added the bug label Apr 11, 2024

camposandro mentioned this issue Apr 11, 2024

Keep left non-matches for nearest neighbors crossmatch #271

Open

delucchi-cmu assigned camposandro May 16, 2024

delucchi-cmu closed this as completed Aug 13, 2024
