make invert-categorical-map more strict on unknown reverse mapping values #395

Open
behrica opened this issue Feb 4, 2024 · 5 comments

@behrica
Contributor

behrica commented Feb 4, 2024

To make the categorical-mapping code less brittle,
I think we should check and fail in more situations. Here is one of them:

(require '[tech.v3.dataset.categorical :as ds-cat]
          '[tech.v3.dataset.modelling :as ds-mod]
          '[tech.v3.dataset :as ds])

(def cat-map
  (->
   (ds/->dataset {:a [:x :y]})
   (ds-cat/fit-categorical-map :a)))


(ds-cat/invert-categorical-map (ds/->dataset {:a [0.342 1.6657]})
                               {:src-column :a
                                :lookup-table (:lookup-table cat-map)})

The initial mapping was derived as x -> 1 and y -> 0, but the current code happily maps 0.342 back to a category.
In my view this should fail, in the same way that
other numbers like 3 and 4 already fail, with a message such as "Unable to find src value for numeric value 0.342".

@behrica behrica changed the title make invert-categorical-map more strict on unknown reversr mapping make invert-categorical-map more strict on unknown reverse mapping values Feb 4, 2024
@cnuernber
Collaborator

Not really sure what to do here. If you had chosen values that do not round to 0 and 1, you would have gotten an exception. Perhaps we should use Math/round as opposed to a pure long cast.
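
For illustration, here is how the two conversions treat the values from the example above. This is a plain-Clojure sketch that does not call into tech.v3.dataset; it assumes the cast inside the library behaves like `long`.

```clojure
;; Truncating cast: the fractional part is dropped.
(long 0.342)         ;; => 0
(long 1.6657)        ;; => 1

;; Math/round: rounds to the nearest long.
(Math/round 0.342)   ;; => 0
(Math/round 1.6657)  ;; => 2
```

Under that assumption, rounding would turn 1.6657 into 2, which is not in the lookup table, while 0.342 would still map back to 0.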

@behrica
Contributor Author

behrica commented Feb 26, 2024

This looks error-prone to me, but I am not sure what to fix either.
The mapping back shown below works only because of the long cast:

(-> (ds/->dataset {:x [:a :b]})
    (ds/categorical->number [:x])
    :x
    meta
    :categorical-map
    :lookup-table)

;; => {:a 0, :b 1}

|  :x |
|----:|
| 0.0 |
| 1.0 |

@behrica
Contributor Author

behrica commented Feb 26, 2024

I would expect the above to produce a lookup map
{:a 0.0, :b 1.0}, and all values except 0.0 and 1.0 to fail when mapping back.
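
A minimal sketch of that stricter behaviour, in plain Clojure. The function `strict-invert` is a hypothetical helper for illustration only and is not part of tech.v3.dataset. It inverts the lookup table with double keys and throws for any value that does not exactly equal a table entry.

```clojure
(defn strict-invert
  "Hypothetical helper, not part of tech.v3.dataset: maps a numeric value
  back to its category and throws unless it exactly equals a table entry."
  [lookup-table v]
  (let [;; Invert {:a 0, :b 1} into {0.0 :a, 1.0 :b}.
        inverse  (into {} (map (fn [[k n]] [(double n) k])) lookup-table)
        category (get inverse (double v) ::missing)]
    (if (= category ::missing)
      (throw (ex-info (str "Unable to find src value for numeric value " v)
                      {:value v}))
      category)))

(strict-invert {:a 0, :b 1} 1.0)   ;; => :b

;; A value such as 0.342 is rejected instead of being truncated to 0.
(try
  (strict-invert {:a 0, :b 1} 0.342)
  (catch clojure.lang.ExceptionInfo e
    (.getMessage e)))
;; => "Unable to find src value for numeric value 0.342"
```

This relies on exact equality of doubles, which is only safe if the stored values are exactly the integers from the table.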

@cnuernber
Collaborator

The issue there is floating-point comparison.
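
As a generic illustration of that concern (not code from the library), exact equality on doubles can fail after ordinary arithmetic, so a strict check would probably need a tolerance. The `approx=` helper and the tolerance value below are assumptions made for this sketch.

```clojure
;; Exact equality on doubles is sensitive to rounding error.
(== 0.3 (+ 0.1 0.2))             ;; => false

(defn approx=
  "Hypothetical helper: true when a and b differ by at most eps."
  [a b eps]
  (<= (Math/abs (- (double a) (double b))) eps))

(approx= 0.3 (+ 0.1 0.2) 1e-9)   ;; => true
(approx= 0.342 0.0 1e-9)         ;; => false
```

With a small tolerance, values that are numerically 0.0 or 1.0 would still map back, while 0.342 would be rejected.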

@behrica
Contributor Author

behrica commented Sep 14, 2024
