Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert SnpArray into matrix{Int} doesn't obey conversion convention #79

Open
biona001 opened this issue Oct 14, 2020 · 5 comments
Open

Comments

@biona001
Copy link
Member

According to documentation, 0x01 encodes missing (NaN), 0x02 encode 1, 0x03 encodes 2. But if we convert a SnpArray into matrix of subtype Int, 0x01 will become 1, 0x02 = 2, and 0x03 = 3.

using SnpArrays
const mouse = SnpArray(SnpArrays.datadir("mouse.bed"))
julia> convert(Matrix{Float64}, mouse)[1:10]
10-element Array{Float64,1}:
 1.0
 1.0
 2.0
 1.0
 2.0
 1.0
 1.0
 1.0
 1.0
 2.0
julia> convert(Matrix{Int64}, mouse)[1:10]
10-element Array{Int64,1}:
 2
 2
 3
 2
 3
 2
 2
 2
 2
 3

Is this by design or a bug?

@biona001
Copy link
Member Author

biona001 commented Oct 14, 2020

The encoding also messes up if I try to convert a snparray into a Matrix{Union{Float64, Missing}}

using SnpArrays
const mouse = SnpArray(SnpArrays.datadir("mouse.bed"))
convert(Matrix{Union{Float64, Missing}}, mouse)[1:10]
10-element Array{Union{Missing, Float64},1}:
 2.0
 2.0
 3.0
 2.0
 3.0
 2.0
 2.0
 2.0
 2.0
 3.0

@kose-y
Copy link
Member

kose-y commented Nov 6, 2020

The function Base.convert(::Type{T}, x::UInt8, ::Union{Val{1}, Val{2}, Val{3}}) and Base.copyto!( v::AbstractVecOrMat{T}, s::AbstractSnpArray; model::Union{Val{1}, Val{2}, Val{3}} = ADDITIVE_MODEL, center::Bool = false, scale::Bool = false, impute::Bool = false ) are only defined for T <: AbstractFloat.

from https://github.com/OpenMendel/SnpArrays.jl/blob/master/src/snparray.jl#L290:

@inline function convert(::Type{T}, x::UInt8, ::Val{1}) where T <: AbstractFloat
    iszero(x) ? zero(T) : isone(x) ? T(NaN) : T(x - 1)
end

@inline function convert(::Type{T}, x::UInt8, ::Val{2}) where T <: AbstractFloat
    iszero(x) ? zero(T) : isone(x) ? T(NaN) : one(T)
end

@inline function convert(::Type{T}, x::UInt8, ::Val{3}) where T <: AbstractFloat
    (iszero(x) || x == 2) ? zero(T) : isone(x) ? T(NaN) : one(T)
end

If T is not <:AbstractFloat, the values from s.data are directly copied to the newly allocated array by the indexing rule of SnpArray and Base implementation of Base.copyto!().

I think it would be straightforward to define convert(::Type{T}, x::UInt8, ::Union{Val{1}, Val{2}, Val{3}}) for T <: Union{<:AbstractFloat, Missing}.

However, I'm not sure if converting SnpArray to an integer array in the way you suggested is a good idea because the result would be different from the indexing rule of SnpArrays, and that might cause another type of confusion. Is there a specific reason to convert an SnpArray to Matrix{Int64}?

@Hua-Zhou What do you think?

+) Also, there is no NaN for Integers.

@biona001
Copy link
Member Author

biona001 commented Nov 6, 2020

I had to convert a SnpArray to a Matrix{Union{Float64, Missing}} at some point, but it was handled manually. No use case for Matrix{Int64} so far.

Probably the easiest thing to do is make SnpArrays throw errors for these case instead of converting incorrectly

@kose-y
Copy link
Member

kose-y commented Nov 7, 2020

Changing

Array{T,N}(s::AbstractSnpArray; kwargs...) where {T,N} = 
    copyto!(Array{T,N}(undef, size(s)), s; kwargs...)

on https://github.com/OpenMendel/SnpArrays.jl/blob/master/src/snparray.jl#L370
to something like

Array{T,N}(s::AbstractSnpArray; kwargs...) where {T,N} = 
    throw(ArgumentError("Cannot convert $(typeof(s)) to Array{$T,$N}")) 
Array{T,N}(s::AbstractSnpArray; kwargs...) where {T <: AbstractFloat,N} = 
    copyto!(Array{T,N}(undef, size(s)), s; kwargs...)

sounds reasonable to me.

@kose-y
Copy link
Member

kose-y commented Mar 12, 2021

In Plink 2's fixed-width format, 0x00 encodes 0, 0x01 encodes 1, 0x02 encodes 2, and 0x03 encodes missing. This might be a good Int-representation for a SnpArray.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants