How to get only unique pairs of `column1`, `column2`, and `column3` of a dataframe? #1831

Eisbrenner · 2022-01-14T11:41:59Z

Hi, I am looking for a way to get one row for each unique combination of id, x, and y. I assume this is related to a question I asked a while back, #1448, but it's not quite the same.

So, adapting the example from that issue to this problem:

The setup

import vaex

id = [1, 2, 1, 2, 2]
x = [1, 3, 1, 3, 1]
y = [2, 4, 2, 4, 2]
t = [0, 0, 1, 1, 1]
df = vaex.from_arrays(**{"id": id, "x": x, "y": y, "t": t})
df

Getting uniques

This is the part where I wonder whether it can be done through vaex itself.

prime = [6643838879, 8589935681]
df["uniques"] = (df["id"] * prime[0] + df["x"]) * prime[1] + df["y"]
df

#	id	x	y	t	uniques
0	1	1	2	0	1729916432998422434
1	2	3	4	0	3459832874586780549
2	1	1	2	1	1729916432998422434
3	2	3	4	1	3459832874586780549
4	2	1	2	1	3459832857406909185
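
A side note on the composite key: the exact value of `(id * prime[0] + x) * prime[1] + y` does not fit in 64 bits for these inputs, so the `uniques` values shown above are wrapped results. The sketch below reproduces that in plain Python, assuming the column arithmetic is carried out in signed 64-bit integers (the table values suggest this, but it is an assumption). `wrap_int64` and `composite_key` are hypothetical helper names, not part of vaex. Because of the wrap-around, such a key is not guaranteed to be collision-free in general.

```python
# Primes taken from the question.
PRIME = [6643838879, 8589935681]


def wrap_int64(value):
    """Reduce a Python int to the signed 64-bit range (two's-complement wrap)."""
    return (value + 2**63) % 2**64 - 2**63


def composite_key(id_, x, y):
    """Return the exact key and its 64-bit wrapped counterpart."""
    exact = (id_ * PRIME[0] + x) * PRIME[1] + y
    return exact, wrap_int64(exact)


exact, wrapped = composite_key(1, 1, 2)
print(exact)    # arbitrary-precision result, larger than 2**63
print(wrapped)  # 1729916432998422434, the value shown for row 0
```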

df_uniques = df.groupby(
    by="uniques",
    agg={
        "id": vaex.agg.first("id", "id"),  # works because `id`, `x`, `y` are by definition constant within a group
        "x": vaex.agg.first("x", "id"),  # same as with `id`
        "y": vaex.agg.first("y", "id"),  # same as with `id`
        "t": vaex.agg.mean("t"),  # any aggregation can be chosen for the other columns
    },
).drop("uniques", inplace=True)
df_uniques

#	id	x	y	t
0	1	1	2	0.5
1	2	1	2	1
2	2	3	4	0.5

So the final dataframe has one row for each unique combination of id, x, and y. This is also where it differs from #1448: the `first` aggregations only work because id, x, and y are constant within each group, which was not the case in the other issue.

Is there a vaex-native way of doing this?

Something like

df_uniques = df.groupby(
    by=df.unique(["id","x","y"]),
    agg={
        "id": vaex.agg.first("id", "id"),
        "x": vaex.agg.first("x", "id"),
        "y": vaex.agg.first("y", "id"),
        "t": vaex.agg.mean("t"),
    },
)

Above I use df.unique in a way that doesn't work, so that's where it breaks.

Answered by maartenbreddels

Jan 14, 2022

maartenbreddels · 2022-01-14T12:58:30Z
Maintainer

Hi,

what about:

df.groupby(['id', 'x', 'y'], agg={'t': vaex.agg.mean('t')})

1 reply

Eisbrenner Jan 14, 2022
Author

Yes, thank you! I didn't realize that groupby can take a list of columns...
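
For readers who want to see what grouping by a list of columns computes, here is a minimal plain-Python sketch using the data from the setup. It is not vaex code and says nothing about how vaex implements `groupby`; it only illustrates the idea of using the `(id, x, y)` tuple as the group key and averaging `t` within each group. The names `ids`, `groups`, and `result` are chosen for illustration.

```python
from collections import defaultdict
from statistics import mean

# Columns from the setup in the question.
ids = [1, 2, 1, 2, 2]
x = [1, 3, 1, 3, 1]
y = [2, 4, 2, 4, 2]
t = [0, 0, 1, 1, 1]

# Collect the t values that belong to each (id, x, y) combination.
groups = defaultdict(list)
for key, value in zip(zip(ids, x, y), t):
    groups[key].append(value)

# One entry per distinct combination, holding the mean of t.
result = {key: mean(values) for key, values in sorted(groups.items())}
print(result)
# {(1, 1, 2): 0.5, (2, 1, 2): 1, (2, 3, 4): 0.5}
```

The three entries correspond to the three rows of the expected result in the question, with no hand-built composite key needed.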
