Column types seem to change after inner-join between two DataFrames #3442

Open
KevinG1002 opened this issue May 23, 2024 · 1 comment

@KevinG1002

Hello,

Thank you for putting this package together. It has helped a lot.

I am working with time series dataframes, and I've noticed that when I perform join operations on them, the types of some columns seem to change.

Here's an example:

I have two dataframes, each with two columns. The first is a "date" column whose entries are Date values; the second is a value column of type Float64 that holds the values of the time series. In my example, I want to perform an inner join between quarterly GDP and quarterly metal usage, joining on the "date" column.

The inner-join statement I use is: `X_df = innerjoin(metal_usage_df, global_gdp_df, on = :date)`

The GDP dataframe looks like:

| date (type Date) | GDP (type Float64) |
| --- | --- |
| 2020-10-01 | 22024.5 |
| 2021-01-01 | 22600.2 |
| 2021-04-01 | 23292.4 |

and it is inner-joined with the metal-usage DF, which looks like:

| date (type Date) | metal-usage (type Float64) |
| --- | --- |
| 2020-10-01 | 222.6 |
| 2021-01-01 | 212.1 |
| 2021-04-01 | 239.5 |

However, when printing out the inner-join df that I get, the GDP column now has a different type:

| date (type Date) | metal-usage (type Float64) | GDP (type Any) |
| --- | --- | --- |
| 2020-10-01 | 222.6 | 22024.5 |
| 2021-01-01 | 212.1 | 22600.2 |
| 2021-04-01 | 239.5 | 23292.4 |

This causes downstream issues for me. What is the root cause, and is there a way to enforce column types during or before the inner-join operation?

Any help would be much appreciated!

@bkamins
Member

bkamins commented May 24, 2024

Could you share code that reproduces the problem? When I run your example on sample data, there are no such issues:

julia> metal_usage_df = DataFrame(date=1:3, metal=[1.5, 2.5, 3.5])
3×2 DataFrame
 Row │ date   metal
     │ Int64  Float64
─────┼────────────────
   1 │     1      1.5
   2 │     2      2.5
   3 │     3      3.5

julia> global_gdp_df = DataFrame(date=1:3, GDP=[21.5, 22.5, 23.5])
3×2 DataFrame
 Row │ date   GDP
     │ Int64  Float64
─────┼────────────────
   1 │     1     21.5
   2 │     2     22.5
   3 │     3     23.5

julia> X_df = innerjoin(metal_usage_df, global_gdp_df, on = :date)
3×3 DataFrame
 Row │ date   metal    GDP
     │ Int64  Float64  Float64
─────┼─────────────────────────
   1 │     1      1.5     21.5
   2 │     2      2.5     22.5
   3 │     3      3.5     23.5
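
Since the join itself keeps `Float64` here, one possible explanation (an assumption, as the thread does not confirm it) is that the GDP column was already stored with element type `Any` before the join, for instance because of how the data was loaded. The sketch below uses the dates and values from the question, deliberately builds the GDP column as `Any[...]` to mimic that situation, and then narrows the column with `convert` before joining. The column name `metal_usage` is made up for illustration.

```julia
using DataFrames, Dates

# Quarterly dates taken from the example in the question.
dates = [Date(2020, 10, 1), Date(2021, 1, 1), Date(2021, 4, 1)]

metal_usage_df = DataFrame(date = dates, metal_usage = [222.6, 212.1, 239.5])

# ASSUMPTION: the GDP column holds Float64 values but is stored as Vector{Any}.
global_gdp_df = DataFrame(date = dates, GDP = Any[22024.5, 22600.2, 23292.4])

# Inspect the element type of every column before joining.
gdp_types_before = eltype.(eachcol(global_gdp_df))   # Date and Any

# The join carries the input element type through unchanged.
X_any = innerjoin(metal_usage_df, global_gdp_df, on = :date)
type_after_plain_join = eltype(X_any.GDP)

# Narrow the column before joining; this throws if a value is not convertible.
global_gdp_df.GDP = convert(Vector{Float64}, global_gdp_df.GDP)
X_df = innerjoin(metal_usage_df, global_gdp_df, on = :date)
type_after_fixed_join = eltype(X_df.GDP)
```

If the column turns out to contain `missing` or non-numeric entries, the `convert` call will fail, which would itself point to where the `Any` type comes from.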
