Description
I ran into a problem while trying to reproduce the GIANT paper code. I used my own text-attributed graph dataset and followed the data processing instructions from GIANT.
Strangely, the crash happens while training level 1, whereas level 0 trains without any problem.
I tried to track the issue down, and the only lead I can find is that it may occur in the sparse_matmul() call inside matcher._predict().
Steps to reproduce
The command is:
CUDA_VISIBLE_DEVICES=1 python3 -m pecos.xmc.xtransformer.train -t X.trn.txt -x X.trn.tfidf.npz -y Y.trn.npz -m xrt_models --batch-gen-workers 0
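In case it helps to narrow this down, below is a minimal sanity check of the two matrices passed to the command above. This is only a hypothetical diagnostic, not part of the PECOS pipeline, but NaN/Inf values, empty rows or columns, or a row-count mismatch are the kind of degenerate input that could plausibly surface later as a floating point exception inside a sparse matrix multiply:

import numpy as np
import scipy.sparse as smat

# The feature and label matrices from the training command above.
X = smat.load_npz("X.trn.tfidf.npz").tocsr()  # TF-IDF features
Y = smat.load_npz("Y.trn.npz").tocsr()        # instance-to-label matrix

print("X shape:", X.shape, "Y shape:", Y.shape)
assert X.shape[0] == Y.shape[0], "feature/label row counts disagree"

# Non-finite feature values.
assert np.isfinite(X.data).all(), "X contains NaN/Inf entries"

# Instances with no features or no labels.
print("empty feature rows:", int((X.getnnz(axis=1) == 0).sum()))
print("empty label rows:", int((Y.getnnz(axis=1) == 0).sum()))

# Labels that never appear (these can make a clustering level degenerate).
print("empty label columns:", int((Y.getnnz(axis=0) == 0).sum()))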
Error message or code output
12/29/2023 13:02:58 - INFO - pecos.xmc.xtransformer.matcher - | [ 5/ 5][ 7150/ 7220] | 1373/1444 batches | ms/batch 451.6586 | train_loss 7.300417e-01 | lr 9.695291e-07
12/29/2023 13:03:24 - INFO - pecos.xmc.xtransformer.matcher - | [ 5/ 5][ 7200/ 7220] | 1423/1444 batches | ms/batch 451.0563 | train_loss 7.260027e-01 | lr 2.770083e-07
12/29/2023 13:03:24 - INFO - pecos.xmc.xtransformer.matcher - | **** saving model (avg_prec=0) to /tmp/tmpo8wg3j8h at global_step 7200 ****
12/29/2023 13:03:26 - INFO - pecos.xmc.xtransformer.matcher - -----------------------------------------------------------------------------------------
12/29/2023 13:03:36 - INFO - pecos.xmc.xtransformer.matcher - Reload the best checkpoint from /tmp/tmpo8wg3j8h
Floating point exception (core dumped)
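If useful, the same command can be rerun with Python's faulthandler enabled (-X faulthandler), which dumps the Python traceback of all threads when the process receives SIGFPE; that should confirm whether the crash really originates from matcher._predict():

CUDA_VISIBLE_DEVICES=1 python3 -X faulthandler -m pecos.xmc.xtransformer.train -t X.trn.txt -x X.trn.tfidf.npz -y Y.trn.npz -m xrt_models --batch-gen-workers 0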
Environment
- Operating system: Ubuntu-22.04.1 (X86)
- Python version: 3.9.18
- PECOS version: 1.2.2
- torch: 1.13.1
- numpy: 1.26.2
- scipy: 1.11.4
- transformers: 4.36.2