How to get only unique pairs of column1, column2, and column3 of a dataframe? #1831
Hi, I am looking for a way to get all rows which have a unique pair of column1, column2, and column3.
So, adapting the example from that issue to this problem:
The setup
import vaex
id = [1, 2, 1, 2, 2]
x = [1, 3, 1, 3, 1]
y = [2, 4, 2, 4, 2]
t = [0, 0, 1, 1, 1]
df = vaex.from_arrays(**{"id": id, "x": x, "y": y, "t": t})
df
Getting uniques
This is the part where I wonder if this is possible through vaex itself.
prime = [6643838879, 8589935681]
df["uniques"] = (df["id"] * prime[0] + df["x"]) * prime[1] + df["y"]
df
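As a side note on the combined key above, here is a minimal plain-Python sketch (no vaex involved) of the same arithmetic on the sample rows. The helper name `combine_key` is hypothetical and only used for illustration. Python integers have arbitrary precision, so this shows the exact values; whether a 64-bit integer column would wrap around for values this large is an assumption about how the column is stored and is not verified here.

```python
# Exact-arithmetic sketch of the combined key from the post.
# `combine_key` is a hypothetical helper, not part of vaex.
prime = [6643838879, 8589935681]

def combine_key(id_, x_, y_):
    """Fold three integer columns into one key, as in the post."""
    return (id_ * prime[0] + x_) * prime[1] + y_

# Same sample data as in the setup.
ids = [1, 2, 1, 2, 2]
x = [1, 3, 1, 3, 1]
y = [2, 4, 2, 4, 2]

keys = [combine_key(a, b, c) for a, b, c in zip(ids, x, y)]

# Rows sharing (id, x, y) share a key; different combinations differ here.
distinct_keys = set(keys)

# The exact values are larger than the biggest signed 64-bit integer.
INT64_MAX = 2**63 - 1
exceeds_int64 = [k > INT64_MAX for k in keys]
```

For this sample the three distinct combinations map to three distinct keys, but the exact values already exceed the signed 64-bit range, which is one reason to prefer grouping on the columns themselves if that is available.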
df_unqies = df.groupby(
    by="uniques",
    agg={
        "id": vaex.agg.first("id", "id"),  # this works since `id`, `x`, `y` are by definition constant across a group
        "x": vaex.agg.first("x", "id"),  # same as with `id`
        "y": vaex.agg.first("y", "id"),  # same as with `id`
        "t": vaex.agg.mean("t"),  # some choice about other columns can be made
    },
).drop("uniques", inplace=True)
df_unqies
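To make the intended result concrete, here is a minimal plain-Python sketch (it does not use vaex) that groups the sample rows by the tuple `(id, x, y)` and averages `t` per group. The names `ids`, `groups`, and `unique_rows` are assumptions made for this illustration, and the sketch only describes the expected output, not how vaex would compute it.

```python
from collections import defaultdict

# Same sample data as in the setup.
ids = [1, 2, 1, 2, 2]
x = [1, 3, 1, 3, 1]
y = [2, 4, 2, 4, 2]
t = [0, 0, 1, 1, 1]

# Collect the t values of every row under its (id, x, y) combination.
groups = defaultdict(list)
for key, value in zip(zip(ids, x, y), t):
    groups[key].append(value)

# One output row per unique combination, with t averaged over the group.
unique_rows = {key: sum(values) / len(values) for key, values in groups.items()}

for (id_, x_, y_), t_mean in unique_rows.items():
    print(id_, x_, y_, t_mean)
```

Using the tuple itself as the group key avoids building a combined numeric key, so no choice of primes is needed.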
So the final dataframe is just the rows which have a unique pair of `id`, `x`, and `y`.
Is there a vaex-native way of doing this?
Something like
df_unqies = df.groupby(
    by=df.unique(["id", "x", "y"]),
    agg={
        "id": vaex.agg.first("id", "id"),
        "x": vaex.agg.first("x", "id"),
        "y": vaex.agg.first("y", "id"),
        "t": vaex.agg.mean("t"),
    },
)
Above I use
Replies: 1 comment 1 reply
Hi, what about: