Plotting categorical values as colors #351

Closed
juliohm opened this issue May 7, 2021 · 26 comments

Comments

@juliohm
Contributor

juliohm commented May 7, 2021

I have plot recipes that try to plot the categorical values as colors in a geographic map. For example, the crop type in this plot: https://juliaearth.github.io/GeoStats.jl/stable/workflow.html#Plotting-solutions

I was doing some manual pre-processing by depending on CategoricalArrays.jl and converting to a vector of level codes manually. Given that the latest CategoricalArrays.jl already supports plot recipes, I wanted to stop doing this manual fix. What is the appropriate way to pass categorical arrays so that they are interpreted as colors, with an appropriate legend containing the levels?

I tried to pass the categorical array as marker_z -> array but it didn't work. I would appreciate any help, as this is the last issue I need to solve before releasing a new version of the project.

@juliohm
Contributor Author

juliohm commented May 7, 2021

cc: @daschw

@juliohm
Contributor Author

juliohm commented May 7, 2021

Very practically, my question is the following:

How can I plot a scatter of points where colors are levels without depending on CategoricalArrays.jl?

using Plots
using CategoricalArrays

c = categorical([1,2,3])

scatter([1,2,3], [1,2,3], marker_z=c)

@juliohm
Contributor Author

juliohm commented May 11, 2021

Appreciate any help regarding this issue. Maybe it should be moved to Plots.jl?

@bkamins
Member

bkamins commented May 11, 2021

I think you need to wait for @nalimilan to have time to look at the issue, as he implemented the recipe AFAICT.

@nalimilan
Member

I have no idea, I just copied the definition provided by @daschw. Maybe @mkborregaard could help too?

@juliohm
Contributor Author

juliohm commented May 12, 2021

Any help would be great. The issue is specific to the marker_z option. I can use categorical arrays in heatmaps, for example.

@juliohm
Contributor Author

juliohm commented May 14, 2021

I guess I will have to revert the changes in downstream projects in order to release? Who is leading Plots.jl nowadays? Should the Julia community pay a software engineer to maintain and fix these issues? It is really hard to progress otherwise.

I will start a thread on Discourse to see what people think about starting a group to split the payment of a salary to a freelancer.

@daschw

daschw commented May 14, 2021

Unfortunately, Recipes only apply to input data and not to attributes like marker_z. This would require major changes and additions in RecipesBase.jl and RecipesPipeline.jl, which I don't have the capacity to tackle right now. However PRs are very welcome.

If I try to run your example I get the following error:

julia> scatter([1,2,3], [1,2,3], marker_z=c)
Error showing value of type Plots.Plot{Plots.GRBackend}:
ERROR: MethodError: no method matching get(::ColorSchemes.ColorScheme{Vector{RGBA{Float64}}, String, String}, ::CategoricalValue{Int64, UInt32}, ::Tuple{Float64, Float64})
Closest candidates are:
  get(::CategoricalPool, ::Any, ::Any) at /home/dani/.julia/packages/CategoricalArrays/rDwMt/src/pool.jl:55
  get(::DataStructures.RobinDict{K, V}, ::Any, ::Any) where {K, V} at /home/dani/.julia/packages/DataStructures/ixwFs/src/robin_dict.jl:384
  get(::Test.GenericDict, ::Any, ::Any) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Test/src/Test.jl:1663
  ...
Stacktrace:
  [1] get(::PlotUtils.ContinuousColorGradient, ::CategoricalValue{Int64, UInt32}, ::Tuple{Float64, Float64})
    @ PlotUtils ~/.julia/packages/PlotUtils/es5pb/src/colorschemes.jl:18

So it seems like there is no method for get(colorscheme, categoricalvalue, range). Having a look at https://github.com/JuliaGraphics/ColorSchemes.jl/blob/a50d23f9ba76bc4811fbebd94dd309d6750abe6d/src/ColorSchemes.jl#L234 I see that they only implement methods for AllowedInput, which apparently is Union{Real, AbstractArray{<:Real}}. So it seems that CategoricalValue{Int} is not a subtype of Real.

I tried to overcome this by loosening the type restrictions for get in ColorSchemes locally. However, I then run into the following error:

julia> scatter([1,2,3], [1,2,3], marker_z=c)
Error showing value of type Plots.Plot{Plots.GRBackend}:
ERROR: ArgumentError: cannot compare a `CategoricalValue` to value `v` of type `CategoricalValue{Int64, UInt32}`: wrap `v` using `CategoricalValue(v, catvalue)` or `CategoricalValue(v, catarray)` first
Stacktrace:
  [1] <(x::CategoricalValue{Int64, UInt32}, y::Float64)
    @ CategoricalArrays ~/.julia/packages/CategoricalArrays/rDwMt/src/value.jl:176
  [2] <(y::Float64, x::CategoricalValue{Int64, UInt32})
    @ CategoricalArrays ~/.julia/packages/CategoricalArrays/rDwMt/src/value.jl:180
  [3] >(x::CategoricalValue{Int64, UInt32}, y::Float64)
    @ Base ./operators.jl:305
  [4] clamp(x::CategoricalValue{Int64, UInt32}, lo::Float64, hi::Float64)
    @ Base.Math ./math.jl:65
  [5] _broadcast_getindex_evalf
    @ ./broadcast.jl:648 [inlined]
  [6] _broadcast_getindex
    @ ./broadcast.jl:621 [inlined]
  [7] getindex
    @ ./broadcast.jl:575 [inlined]
  [8] copy
    @ ./broadcast.jl:898 [inlined]
  [9] materialize
    @ ./broadcast.jl:883 [inlined]
 [10] get(cscheme::ColorSchemes.ColorScheme{Vector{RGBA{Float64}}, String, String}, x::CategoricalValue{Int64, UInt32}, rangescale::Tuple{Float64, Float64})
    @ ColorSchemes ~/.julia/dev/ColorSchemes/src/ColorSchemes.jl:240

I'm not really sure what we in Plots can do here without depending on CategoricalArrays.

@juliohm
Contributor Author

juliohm commented May 14, 2021

Would you reconsider the dependency on CategoricalArrays.jl? Without an explicit treatment of categorical variables we won't be able to generate correct legend elements for nominal/ordered variables for example.

@bkamins
Member

bkamins commented May 14, 2021

> Without an explicit treatment of categorical variables we won't be able to generate correct legend elements for nominal/ordered variables for example.

Agreed. Also given that CategoricalArrays.jl is now compiler friendly and Plots.jl is compiler-heavy anyway I would vote to add this integration. An alternative would be to add appropriate methods to DataAPI.jl and use the interface.

@nalimilan
Member

The methods that error really look like they need a Real value, so it's not surprising that they fail for CategoricalValue and I don't see how they could work with them. Maybe what's needed is rather a fallback method for non-Real types that would attribute a real value to each unique value?

@juliohm How do you expect categorical values to be translated to colors? Should that be equivalent to passing the result of levelcode(v)?
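
A minimal sketch of the fallback idea above, assuming a plain lookup over distinct values is acceptable. The helper name `value_codes` is hypothetical and is not part of Plots.jl, ColorSchemes.jl or CategoricalArrays.jl. It numbers values in order of first appearance, which may differ from the level order that levelcode uses.

```julia
# Hypothetical helper (not an existing API): assign an integer code to each
# distinct value, numbered in order of first appearance.
function value_codes(values)
    lookup = Dict{eltype(values), Int}()
    codes = Vector{Int}(undef, length(values))
    for (i, v) in enumerate(values)
        # Reuse the code of a value seen before, otherwise hand out the next one.
        codes[i] = get!(lookup, v, length(lookup) + 1)
    end
    return codes, lookup
end

codes, lookup = value_codes(["wheat", "corn", "wheat", "soy"])
```

The resulting integer codes are ordinary Real values, so they could in principle be handed to an attribute that expects numbers, while `lookup` keeps the original labels available for building legend entries.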

@bkamins
Member

bkamins commented May 14, 2021

I was referring to the general approach. In this case (the mapping @juliohm wants) - I do not know enough about the problem to informatively comment.

@juliohm
Contributor Author

juliohm commented May 14, 2021

> @juliohm How do you expect categorical values to be translated to colors? Should that be equivalent to passing the result of levelcode(v)?

Or a method that maximizes the visual discrepancy between the categories. When the notion of order is important, it could produce a sequential colormap, for example. Think of choropleth maps like this one: https://en.wikipedia.org/wiki/Choropleth_map#/media/File:Countries_by_mean_wealth_per_adult_in_2018.png

@nalimilan
Member

The approach I mentioned would definitely work if you ignore the order and just choose a qualitative palette. To take into account order, the implementation would have to find a reliable way of checking whether the input values are ordered or not (see JuliaData/DataAPI.jl#26).

I'm not sure whether it should be Plots or ColorSchemes' job to do this, but probably worth filing an issue in one of these packages?

@daschw

daschw commented May 14, 2021

The marker_z attribute in Plots is a mapping from numerical values to colors in a ColorGradient or ColorScheme, so it only makes sense for <: Real values. Right now I would not know how to handle this generally for CategoricalValues if they are not numeric.
We could take the CategoricalArrays dependency in Plots, as @bkamins suggested, if it is compiler-friendly now, and fall back to levelcode. However, automatic matching with the legend entries in Plots is probably more involved than adding this fallback.
@juliohm why do you want to remove the CategoricalArrays dependency from your recipes?

@daschw

daschw commented May 14, 2021

Actually, I think markercolor would be the appropriate attribute here. This accepts vectors of any input that specifies colors or vectors of Int. In the latter case elements from the current ColorPalette palette are chosen via getindex. This also fails for CategoricalArrays right now, but I think it better translates to the idea behind CategoricalValues.
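
A minimal sketch of how a categorical array could be turned into the Int vector described above, assuming that levelcode from CategoricalArrays.jl is an acceptable mapping. Passing the result to markercolor is shown only as a comment and is an assumption, not something verified here.

```julia
using CategoricalArrays

c = categorical(["B", "A", "C", "A"])

# levelcode gives the position of each value in levels(c), here ["A", "B", "C"];
# Int.(...) normalizes the integer type of the codes.
codes = Int.(levelcode.(c))

# Assumed usage with Plots.jl (not run here):
# scatter(1:4, 1:4, markercolor = codes)
```

This keeps a single series, so it would still produce a single legend entry rather than one entry per level.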

@juliohm
Contributor Author

juliohm commented May 14, 2021

> @juliohm why do you want to remove the CategoricalArrays dependency from your recipes?

I assumed that the plot recipes provided in CategoricalArrays.jl + Plots.jl would be the official way moving forward. So any package interested in plotting categorical variables would just assume it works out of the box and would forward a CategoricalArray to the Plots.jl pipeline.

@daschw

daschw commented May 14, 2021

If you use marker_z you pick colors from a "continuous" color scale. In that case you get a colorbar and a single legend entry, because you are plotting a single series:

scatter([1,2,3], [1,2,3], marker_z=[1, 2, 3])

[Image: marker_z]

If you use markercolor you pick colors from a "discrete/categorical" color scale. You still get a single legend entry because you plot a single series:

scatter([1,2,3], [1,2,3], markercolor=[1, 2, 3])

[Image: markercolor]

I suppose (I might be wrong here) you want different legend entries for different categorical values. For this you have to group your input into multiple series. This already works for CategoricalArrays with group:

scatter([1,2,3], [1,2,3], group=categorical(["A", "B", "C"]))

[Image: group]

Anyway, I think this is not at all a CategoricalArrays issue and can be closed here. @juliohm if you want we can continue the discussion in a Plots issue.

@juliohm
Contributor Author

juliohm commented May 14, 2021

From what I understand, the problem now is about figuring out automatically which attribute to set depending on the vector type. If the vector of values is a vector of Number then we should use marker_z, otherwise we should use group for categorical arrays. Is that a correct statement @daschw ?

The problem remains because in order to differentiate between the two cases we need access to the categorical array type. So if Plots.jl could take CategoricalArrays.jl as a dependency, we could have a single attribute type for "color of markers" that would do the correct thing internally. Does it make sense?

I like that the issue is discussed here because then core maintainers of CategoricalArrays.jl can share their perspectives on a good design. Right now even with the group option (which I am gonna try soon), we need to be able to differentiate normal arrays from categorical arrays in user code.

@daschw

daschw commented May 14, 2021

> From what I understand, the problem now is about figuring out automatically which attribute to set depending on the vector type. If the vector of values is a vector of Number then we should use marker_z, otherwise we should use group for categorical arrays. Is that a correct statement @daschw ?

Actually, I'd argue we (Plots) should not automatically figure out which attribute to set, but use the attribute that is provided by the user. I think the combination of marker_z and CategoricalArray does not make sense, because marker_z only works for numerical values and, the way I see it, CategoricalValues are never really numerical - even if they have numerical values. This is also how the current recipe for CategoricalArrays works. Consider the following example, where two "numerical" CategoricalArrays are plotted. The axes have numerical labels - the values of the categorical arrays - however, they are not really numerical, as you can see, if you consider the distances between the ticks.

plot(categorical([1, 3, 7]), categorical([19, -4, 100]))

[Image: categoricalaxes]

@juliohm
Contributor Author

juliohm commented May 14, 2021

Perhaps I didn't explain myself clearly. I am talking about automatic detection between non-categorical arrays and categorical arrays. End users will want to pass colors no matter if their variables are continuous or categorical. Currently they have to figure out by themselves that marker_z only works for continuous and group only works for categorical. There is no such thing as color = vector of colors or markercolor = vector of colors that works for both cases.

Package writers like myself could add a dependency on CategoricalArrays.jl to implement this basic choice for the user, but I think this would be much more useful in Plots.jl already (specifically plot recipes). Anyone wanting to plot categorical or continuous values could just pass a vector and internally Plots.jl would use the correct attribute.

@nalimilan
Member

Why not check whether the value is a Real, and if not consider that it's categorical? That would also make sense e.g. for strings. Then you don't need to depend on CategoricalArrays.
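
A minimal sketch of that check, assuming dispatch on the element type is sufficient. The function name `color_attribute` is hypothetical and is not part of Plots.jl or RecipesBase.jl. Vectors whose element type allows missing values would fall into the non-Real branch here.

```julia
# Hypothetical helper: pick the attribute name from the element type alone,
# without referring to CategoricalArrays.jl.
color_attribute(values::AbstractVector{<:Real}) = :marker_z  # continuous color scale
color_attribute(values::AbstractVector) = :group             # anything else is treated as categorical

numeric_choice = color_attribute([0.1, 0.5, 0.9])
string_choice = color_attribute(["A", "B", "C"])
```

In user code the result could then be passed as a keyword, for example `scatter(x, y; color_attribute(v) => v)`, where `x`, `y` and `v` are placeholder names. Whether the same choice works inside a plot recipe is a separate question.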

@juliohm
Contributor Author

juliohm commented May 15, 2021

Unfortunately the group option doesn't work within plot recipes:

using RecipesBase

struct Foo end

@recipe function f(foo::Foo, data)
  seriestype --> :scatter
  if eltype(data) <: Number
    marker_z --> data
    colorbar --> true
  else
    group --> data
  end
  [Tuple(rand(2)) for i in 1:length(data)]
end

using Plots
using CategoricalArrays

plot(Foo(), categorical([1,2,3]))

[Image]

@juliohm
Contributor Author

juliohm commented May 15, 2021

I will go ahead and submit a release with this bug because of pressing deadlines, but it would be nice to see a workaround.

@juliohm
Contributor Author

juliohm commented May 23, 2021

@daschw do you have a solution for the plot recipe situation above?

@juliohm
Contributor Author

juliohm commented Apr 2, 2024

That is no longer needed, and can be handled in downstream recipes with post-processing.

juliohm closed this as completed on Apr 2, 2024