Error using detect command #3

Open
dy-lin opened this issue Jul 5, 2022 · 4 comments


dy-lin commented Jul 5, 2022

I get a ValueError when using the detect command.

python3 main.py detect -i infile.npz -r reference.250kb.npz -o output

Where the reference was built using 19 NK samples (logfile for reference building in #1)

Traceback (most recent call last):
  File "/projects/karsanlab/jbridgers_dev/PEGASUS/KARSANBIO-2955_Testing_WisecondorFF/WisecondorFF/src/main.py", line 1203, in <module>
    main()
  File "/projects/karsanlab/jbridgers_dev/PEGASUS/KARSANBIO-2955_Testing_WisecondorFF/WisecondorFF/src/main.py", line 41, in wrap
    output = f(*args, **kwargs)
  File "/projects/karsanlab/jbridgers_dev/PEGASUS/KARSANBIO-2955_Testing_WisecondorFF/WisecondorFF/src/main.py", line 1199, in main
    args.func(args)
  File "/projects/karsanlab/jbridgers_dev/PEGASUS/KARSANBIO-2955_Testing_WisecondorFF/WisecondorFF/src/main.py", line 41, in wrap
    output = f(*args, **kwargs)
  File "/projects/karsanlab/jbridgers_dev/PEGASUS/KARSANBIO-2955_Testing_WisecondorFF/WisecondorFF/src/main.py", line 900, in wcr_detect
    _rc_results["results_nr"] + (_fs_results["results_nr"])
ValueError: operands could not be broadcast together with shapes (761,19) (761,) 

I'm not sure what the second shape is or where the 761 comes from, but the 19 probably corresponds to the 19 samples used to build the reference. WisecondorX recommends at least 50 reference samples. Is the low sample count also a problem for WisecondorFF?
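
For reference, the ValueError itself can be reproduced in isolation with NumPy. This is a minimal sketch that assumes the two operands are plain NumPy arrays with the shapes reported in the traceback; the variable names `rc` and `fs` are placeholders and are not taken from the WisecondorFF source.

```python
import numpy as np

# Placeholder arrays with the shapes from the traceback (assumed, not real data).
rc = np.zeros((761, 19))  # two-dimensional operand
fs = np.zeros(761)        # one-dimensional operand

# NumPy aligns shapes from the trailing axis: 19 vs 761 cannot be broadcast.
try:
    rc + fs
except ValueError as exc:
    message = str(exc)
    print(message)

# Giving the 1-D array a trailing axis makes the shapes compatible.
# This only illustrates the broadcasting rule; it is not a fix for the bug,
# because the second operand was presumably meant to be two-dimensional too.
combined = rc + fs[:, np.newaxis]
print(combined.shape)
```

The shapes fail because broadcasting compares the last axis of each operand first, so a `(761,)` array is treated as one row of 761 values rather than one value per row.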


dy-lin commented Jul 5, 2022

Is there a public NIPT test dataset I could use to check whether the problem lies with my installation rather than with this particular dataset?

@tomokveld
Owner

The 761 is the number of regions selected across the genome that are in the reference set. So there is a 761 x 19 (region x sample) matrix for the read count, and there should be a matrix of equal dimensionality for the fragment size.
However, the error (merging the null ratios) happens after combining the z-scores, r-scores, and weights, which does not raise an error (so the matrix dimensionalities match there). So something must have gone wrong while determining the ratios during reference construction (lines 444-452)...
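
One way to look for the mismatch is to list the shape of every array stored in the reference file. The sketch below is generic NumPy and makes no assumption about which keys WisecondorFF writes; `summarize_npz` is a hypothetical helper, and the demo keys `rc_null` and `fs_null` are invented purely so the example runs on its own.

```python
import os
import tempfile

import numpy as np


def summarize_npz(path):
    """Return {key: shape} for every array stored in an .npz archive."""
    shapes = {}
    # allow_pickle=True is needed if the archive holds object arrays;
    # only use it on files you trust.
    with np.load(path, allow_pickle=True) as archive:
        for key in archive.files:
            shapes[key] = np.asarray(archive[key]).shape
    return shapes


# Demo on a throwaway file with made-up keys (not the real reference layout).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_reference.npz")
    np.savez(demo_path, rc_null=np.zeros((761, 19)), fs_null=np.zeros(761))
    demo_shapes = summarize_npz(demo_path)

for key, shape in demo_shapes.items():
    print(key, shape)
```

Running `summarize_npz("reference.250kb.npz")` on the actual reference would show whether any stored array is one-dimensional where a region x sample matrix is expected. Entries that hold pickled dictionaries will report an empty shape and need to be unpacked separately.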

About the number of samples: in theory, fewer than 50 samples will work, but the effect of variability between samples grows as the number decreases, so calls may not be as reliable as you would want. Still, that should not be a problem when you're just testing things out.
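
The effect of reference size on stability can be illustrated with a small simulation. This is a generic statistical sketch, not the WisecondorFF procedure: it assumes normally distributed values and simply measures how much an estimated standard deviation fluctuates when it is computed from 19 versus 50 samples.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable


def spread_of_std_estimate(n_samples, n_trials=2000):
    """Fluctuation of the sample standard deviation across repeated draws."""
    # Each row is one simulated reference set of n_samples values.
    draws = rng.normal(loc=0.0, scale=1.0, size=(n_trials, n_samples))
    estimates = draws.std(axis=1, ddof=1)
    return estimates.std()


spread_19 = spread_of_std_estimate(19)
spread_50 = spread_of_std_estimate(50)
print(f"spread with 19 samples: {spread_19:.3f}")
print(f"spread with 50 samples: {spread_50:.3f}")
```

The estimate from 19 samples fluctuates more than the one from 50, which is the sense in which a z-score built on a small reference is noisier.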

At the moment I am in the final months of my PhD, finalizing other projects, hence it is difficult for me to give support to this (documentation is extremely lacking right now) and other projects.


dy-lin commented Jul 6, 2022

> At the moment I am in the final months of my PhD, finalizing other projects, hence it is difficult for me to give support to this (documentation is extremely lacking right now) and other projects.

Thanks for letting me know! I'll look into the reference construction code myself and see if there are any unexpected values.


dy-lin commented Nov 24, 2022

I'd really like to test the performance of WisecondorFF against WisecondorX. Is there any more bandwidth for fixing this bug, given it's been a few months?
