-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More generic data #24
Comments
The package right now is heavily focused towards computing things on a set of geometric points but as you say, you can compute distances between other objects. And yes, the correct input should just be a list of things where immutable Point{N, T}
coordinates::NTuple{N, T}
end One can then do: julia> p = rand(3, 5)
3x5 Array{Float64,2}:
0.310207 0.783898 0.780439 0.607608 0.0795108
0.221908 0.53741 0.553835 0.373283 0.522569
0.291331 0.210073 0.0570539 0.809761 0.397538
julia> reinterpret(Point{3, Float64}, p, (5,))
5-element Array{Point{3,Float64},1}:
Point{3,Float64}((0.31020720319673756,0.22190808492334102,0.2913307988933631))
Point{3,Float64}((0.783898246452825,0.5374103691511933,0.21007290525123268))
Point{3,Float64}((0.7804389420189257,0.5538345595214069,0.0570539417966307))
Point{3,Float64}((0.6076080127056103,0.3732832928705745,0.8097605299871604))
Point{3,Float64}((0.07951078253622623,0.5225687936560375,0.3975381159479192)) To quickly convert between a matrix to a vector of points. This is not type stable but the cost is only paid once by making sure to use a function barrier. |
Thanks for all these answers. Would you be willing to pay a performance penalty for those cases where the data could in fact be presented in an |
I wouldn't like keeping both the interfaces, would be too much code duplication and awkward to test everything. I would just convert a This issue is actually a bit related: #17 |
Ok. I would be a little worried that a performance penalty would show up beyond the initial reinterpret, conceptually whenever |
As an example: immutable Point{N, T} <: AbstractVector{T}
coordinates::NTuple{N, T}
end
Base.length{N}(::Point{N}) = N
Base.size{N}(::Point{N}) = (N,)
Base.linearindexing(::Type{Point}) = Base.LinearFast()
function Base.getindex(p::Point, i::Int)
@inbounds v = p.coordinates[i]
return v
end
function compare{N, T}(A1::Matrix{T}, A2::Matrix{T},
V1::Vector{Point{N, T}}, V2::Vector{Point{N, T}})
m = Euclidean()
@time begin
for i in 1:length(V1)
evaluate(m, V1[i], V2[i])
end
end
@time begin
for i in 1:size(A1, 2)
evaluate(m, slice(A1, :, i), slice(A2, :, i))
end
end
end
A1 = rand(3, 10^6)
A2 = rand(3, 10^6)
V1 = reinterpret(Point{3, Float64}, A1, (10^6,))
V2 = reinterpret(Point{3, Float64}, A2, (10^6,))
compare(A1, A2, V1, V2) gives: julia> compare(A1, A2, V1, V2)
0.012314 seconds
0.070496 seconds (4.00 M allocations: 152.588 MB, 26.75% gc time) so using |
Very cool & illuminating! Thanks. |
Hi Kristoffer, |
Ball trees should work with arbitrary objects which has a defined metric. However, I am no expert no these stuff and I don't know how to compute the bounding hyper sphere nor how to split the points into sub spheres. Right now I am splitting them in the same way as for the KD trees where you sort the objects along a certain dimension but for general objects you don't even have he concept of a dimension. |
I see, thanks. I guess a refactor of the package towards more generic data would not be super useful without an abstract tree algorithm for metric spaces. Although, in order to separate the workload, I think the interface for generic data could be implemented with a fallback on BruteTree for abstract metric spaces with minimal work, as a start. Assuming abstract metric space tree algorithms would suffer from a performance penalty (probable), would you be willing to pay that price for BallTrees, or would you prefer two implementations living side by side? |
I believe we could use multiple dispatch to overload things in such a way that we don't need two completely separate implementations. The |
A couple of things I found out after digging for a little time. Unfortunately it looks like what we want for abstract metric spaces are "VP trees" and that the algorithm for traversing VP trees is not the same as the one for ball trees. Here is a description: As a much quicker hack, I think dispatching a |
This is better now and you can have a Vector of |
Thanks Kristoffer, this looks great as always. Implementing all the changes we mentioned in this thread proved a bit overwhelming for me, especially since I don't have your command of efficient generic programming. I ended up putting together a basic VP Tree algorithm for my own use. I just pushed some comments to it here. The data is given as |
(In my understanding) the package currently assumes the
n
sample points constituting the data are given as an(d,n)
array, where each sample point is ad
-dimensional vector of some sort. It would be great if we could have access to a more generic data presentation (n
widgets ofWidgetType
types). This would work particularly well with #23 and would have applications in tons of contexts. Eg you have sentences and you define distance between sentences, US Senators and you define ideological distances between senators, pictures of cats, Julia packages, etc.The text was updated successfully, but these errors were encountered: