
Some program problems when using my own dataset #1

Open
GoooDte opened this issue Jul 19, 2022 · 13 comments
Labels: question (Further information is requested)

GoooDte commented Jul 19, 2022

Hello! This is really excellent work, and thank you for releasing the Hugging Face transformers based version. Recently I have been running experiments on some other datasets, but I ran into some problems when running the code:

  1. When clustering the datastore, kmeans.train() at line 41 of kmeans.py reports the following error:
    [screenshot of the kmeans.train() error]
    My cudatoolkit version is 11.0 and my faiss-gpu version is 1.7.2.
  2. When getting the kNNs, the knn search process reports the following error:
    [screenshot of the knn search error]

I'm not very familiar with the faiss-gpu package, so could you please help me figure out how to solve the above problems? Thanks a lot!

GoooDte added the question (Further information is requested) label on Jul 19, 2022
urialon (Collaborator) commented Jul 20, 2022

Hi @GoooDte,
Thank you for your interest in our work!

Thank you for reporting these problems.

  1. I am not sure. Can you share your keys and the exact command line, and I will try running it myself?

  2. I am guessing that you are referring to our knn-transformers version, right? I just fixed the --k flag to be of type int rather than float; a sketch of this kind of change is below. Can you please git pull and try again? I believe that will solve this problem. By the way, how are you using the code: with our example scripts, or did you modify them?
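Roughly, the kind of change involved is sketched here; this is an illustrative snippet, not the actual knn-transformers code, and the toy index and sizes are placeholders. The point is that faiss's search call expects an integer k, so --k has to be parsed as an int:

```python
import argparse

import faiss
import numpy as np

parser = argparse.ArgumentParser()
# The fix: parse --k as an int rather than a float, so the value can be
# passed directly to faiss's search call.
parser.add_argument('--k', type=int, default=1024)
args = parser.parse_args()

d = 1024                                     # hypothetical key dimension
index = faiss.IndexFlatL2(d)                 # toy index standing in for the datastore
index.add(np.random.rand(10000, d).astype('float32'))

queries = np.random.rand(8, d).astype('float32')
# faiss expects an integer k here; passing a float (e.g. 1024.0) fails.
dists, knns = index.search(queries, args.k)
```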

Best,
Uri

GoooDte (Author) commented Jul 20, 2022

Thanks very much, Uri!

Your suggestion worked, and my second problem is solved. I mainly use the code on some other datasets, so I only modified the preprocessing a little.

I have searched online for my first error. The most likely cause is a version mismatch among faiss, faiss-gpu, cudatoolkit and CUDA. So could you please tell me your exact versions of these packages?

Thanks a lot!

urialon (Collaborator) commented Jul 20, 2022

I'm using:

  • CUDA 11.2
  • faiss-gpu 1.7.2
  • python 3.9
  • not sure about cudatoolkit; I run it on a shared server and can't find the cudatoolkit version.

Questions for you:

  1. What is your python version?
  2. What is your operating system?
  3. Do you have both faiss and faiss-gpu installed? You should install only one of them. Did you install it using pip or conda? I found that on Linux it works best if you install it with pip, and on a Mac it works better if you install it with conda.

Best,
Uri

GoooDte (Author) commented Jul 20, 2022

I'm using Python 3.6 on a Linux system.

I uninstalled faiss and used pip to reinstall faiss-gpu 1.7.2, but it still reports the error below:
Faiss assertion 'err == CUBLAS_STATUS_SUCCESS' failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<float, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<IndexType, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with AT = float; BT = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at /project/faiss/faiss/gpu/utils/MatrixMult-inl.cuh:265; details: cublas failed (13): (512, 512) x (13, 512)' = (512, 13) gemm params m 13 n 512 k 512 trA T trB N lda 512 ldb 512 ldc 13

urialon (Collaborator) commented Jul 20, 2022

Can you try:

  1. Set gpu=False here (roughly as in the sketch after this list): https://github.com/neulab/retomaton/blob/main/kmeans.py#L39
  2. I vaguely remember that faiss-gpu requires a higher Python version. Can you try creating a virtual environment or a conda environment with Python 3.7 or 3.9, and then reinstalling faiss-gpu in the new environment?
  3. If you are working with small sets, you can use faiss-cpu instead of faiss-gpu.
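A minimal sketch of suggestion 1, assuming the clustering in kmeans.py boils down to training a faiss.Kmeans object on the datastore keys (the array and sizes below are placeholders, not the repository's actual variables):

```python
import faiss
import numpy as np

d = 1024                                           # placeholder key dimension
ncentroids = 13                                    # placeholder number of clusters
keys = np.random.rand(50000, d).astype('float32')  # stands in for the datastore keys

# Suggestion 1: pass gpu=False so k-means training runs on the CPU and
# bypasses the failing cuBLAS matrix multiplication.
kmeans = faiss.Kmeans(d, ncentroids, niter=20, verbose=True, gpu=False)
kmeans.train(keys)

# Assign each key to its nearest learned centroid.
_, assignments = kmeans.index.search(keys, 1)
```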

Please let me know how it goes.
Uri

GoooDte (Author) commented Jul 25, 2022

Sorry for the long delay.

I tried your first suggestion on a small part of the WikiText dataset. It works, but I still need to comment out lines 54-58 of kmeans.py.

Due to the size of WikiText, it is hard to cluster on the CPU. I installed CUDA 11.2 and built a new virtual environment with Python 3.9 and faiss-gpu 1.7.2, but it still reports the same error. So could you please provide detailed environment information (such as a requirements.txt file)?

Many thanks!

urialon (Collaborator) commented Jul 25, 2022

What kind of GPU do you have?

I have read a bit online about the error you're getting, and some posts suggested that there isn't enough GPU memory.

GoooDte (Author) commented Jul 25, 2022

My GPU is an NVIDIA GeForce RTX 3090.

urialon (Collaborator) commented Jul 25, 2022

Can you take a look at the list of flags, and verify that all of them are correct?

There might be default values that I set which do not match your settings, such as the dimensions, the size of the datastore, etc.

Another question: if you set the --sample flag to a much smaller value like 1000 - does anything change?
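For reference, the effect of a smaller --sample value is roughly the following: k-means is trained on a random subset of the keys, which also bounds the memory the clustering step needs. The snippet below is an illustrative sketch with placeholder sizes, not the exact flag handling in the repository:

```python
import faiss
import numpy as np

d, ncentroids, sample = 1024, 13, 1000              # placeholder sizes
keys = np.random.rand(50000, d).astype('float32')   # stands in for the datastore keys

# Presumed effect of --sample: draw a random subset of the keys and train on it.
rng = np.random.default_rng(0)
subset = keys[rng.choice(len(keys), size=min(sample, len(keys)), replace=False)]

kmeans = faiss.Kmeans(d, ncentroids, niter=20, gpu=True)
kmeans.train(subset)
```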

GoooDte (Author) commented Jul 25, 2022

My flag settings are below:
[screenshot of flag settings]

I tried setting --sample to 100 and it prints this WARNING:
WARNING clustering 100 points to 13 centroids: please provide at least 507 training points

urialon (Collaborator) commented Jul 25, 2022

Yeah, it just means that clustering 100 examples into 13 clusters is likely to result in "bad" clusters.

But does it work without errors? At what sample size does it crash?

I suspect that maybe the GPU memory is the limitation.

What is your overall datastore size? Only 1341?

GoooDte (Author) commented Jul 31, 2022

Sorry for the late reply again.

I have solved my problem by switching to another Linux server, so the problem was indeed caused by the environment, though I still don't know exactly which environment works 100% of the time. My new environment is CUDA 10.2, faiss-gpu 1.7.2, and Python 3.7.

Thank you for following this issue for so long!

urialon (Collaborator) commented Jul 31, 2022

Great, I'm glad to hear!
Let me know if you have any more questions.
