Slurm self supervised #1

Open
wants to merge 38 commits into
base: master

@rrichajalota rrichajalota commented Jan 12, 2023

Optimised the code to use FAISS indexing for nearest neighbour search

  • also debugged the previous implementation
  • added support for Mac M1 (to be tested)
  • current implementation has been tested on 1 GPU
  • implemented other FAISS indexes for approximate nearest neighbour search
  • added logging of the time it takes to finish indexing
  • fixed the fairseq-generate pipeline
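
As a rough illustration of the indexing-time logging mentioned above, here is a minimal sketch. The names `log_duration` and `build_flat_index` are hypothetical and are not taken from this PR; the FAISS calls used (`IndexFlatL2`, `add`) are the standard ones, but how the PR actually wires them into training is an assumption.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("indexing")  # hypothetical logger name


@contextmanager
def log_duration(label):
    """Log how long the wrapped block took, in seconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        logger.info("%s finished in %.3f s", label, elapsed)


def build_flat_index(embeddings):
    """Build an exact L2 index over an (n, d) array of sentence embeddings."""
    # Imported lazily so the timer above can be used without FAISS installed.
    import faiss
    import numpy as np

    # FAISS expects contiguous float32 input.
    vectors = np.ascontiguousarray(embeddings, dtype=np.float32)
    index = faiss.IndexFlatL2(vectors.shape[1])
    with log_duration(f"indexing {vectors.shape[0]} vectors"):
        index.add(vectors)
    return index
```

With an index built this way, `distances, ids = index.search(queries, k)` would return the `k` nearest neighbours for each float32 query row. The same context manager could also wrap a single training iteration.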

Differences:

  • Unlike the previous implementation, which required splitting a large dataset into smaller files in order to compute the dot product between matrices (an M×N multiplication), the code now works on the entire dataset of 50k+ sentences.
  • Reduced the number of nested for-loops; each one removed saves O(mn) time.
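
To illustrate the kind of saving described above, the sketch below contrasts an explicit nested loop over all source/target pairs with a single matrix product. The function names and the `src`/`tgt` arrays are made up for illustration; the PR's actual loops are not shown here, so this is only an assumption about their general shape.

```python
import numpy as np


def similarities_loops(src, tgt):
    """Pairwise dot products with nested loops: m * n Python-level iterations."""
    m, n = src.shape[0], tgt.shape[0]
    out = np.zeros((m, n), dtype=np.float32)
    for i in range(m):
        for j in range(n):
            out[i, j] = np.dot(src[i], tgt[j])
    return out


def similarities_matmul(src, tgt):
    """The same (m, n) score matrix from one matrix product."""
    return src @ tgt.T


# Small random stand-ins for sentence embeddings (hypothetical sizes).
rng = np.random.default_rng(0)
src = rng.standard_normal((20, 8)).astype(np.float32)
tgt = rng.standard_normal((30, 8)).astype(np.float32)
scores = similarities_matmul(src, tgt)
```

Both functions compute the same scores; the second hands the m × n iteration to optimized library code instead of the Python interpreter. A FAISS index goes one step further by returning only the top-k neighbours, so the full score matrix never has to be held in memory.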

Observations:

  • In terms of speed, from slowest to fastest: IndexFlatL2 < IndexIVFPQ < IndexIVFFlat, making IndexIVFFlat the best choice of the three
  • epoch time roughly halves (a ~2x speed-up) going from IndexFlatL2 (~7.2 min per epoch) to IndexIVFFlat (~3.6 min per epoch)
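
For intuition on why an inverted-file (IVF) index can beat a flat index on speed, here is a toy NumPy sketch of the idea. It is not FAISS's implementation and not the code in this PR: `ToyIVF`, `brute_force_search`, and the crude k-means are hypothetical stand-ins. The point is only that an IVF search scans the vectors in `nprobe` of the `nlist` clusters rather than the whole database.

```python
import numpy as np


def brute_force_search(xb, xq, k):
    """Exact k-NN by squared L2 distance; scans every database vector."""
    d2 = ((xq[:, None, :] - xb[None, :, :]) ** 2).sum(axis=2)
    return np.argsort(d2, axis=1)[:, :k]


class ToyIVF:
    """Toy inverted-file index: cluster vectors, then search only nearby clusters."""

    def __init__(self, nlist, nprobe, seed=0):
        self.nlist, self.nprobe = nlist, nprobe
        self.rng = np.random.default_rng(seed)

    def _nearest_centroids(self, x, n):
        d2 = ((x[:, None, :] - self.centroids[None, :, :]) ** 2).sum(axis=2)
        return np.argsort(d2, axis=1)[:, :n]

    def train_and_add(self, xb, iters=5):
        # Crude k-means: start from random database points, refine a few times.
        self.xb = xb
        start = self.rng.choice(len(xb), self.nlist, replace=False)
        self.centroids = xb[start].copy()
        for _ in range(iters):
            assign = self._nearest_centroids(xb, 1)[:, 0]
            for c in range(self.nlist):
                members = xb[assign == c]
                if len(members):
                    self.centroids[c] = members.mean(axis=0)
        assign = self._nearest_centroids(xb, 1)[:, 0]
        # Inverted lists: for each centroid, the ids of the vectors assigned to it.
        self.lists = [np.flatnonzero(assign == c) for c in range(self.nlist)]

    def search(self, xq, k):
        probes = self._nearest_centroids(xq, self.nprobe)
        results = []
        for q, cells in zip(xq, probes):
            # Only vectors in the probed cells are compared against the query.
            ids = np.concatenate([self.lists[c] for c in cells])
            d2 = ((self.xb[ids] - q) ** 2).sum(axis=1)
            results.append(ids[np.argsort(d2)[:k]])
        return results
```

With `nprobe == nlist` every cell is scanned and the result matches brute force; smaller `nprobe` values trade some recall for less work per query. In FAISS itself the analogous index is built as `faiss.IndexIVFFlat(quantizer, d, nlist)` with a quantizer such as `faiss.IndexFlatL2(d)`; it must be trained with `train` before `add`, and its `nprobe` attribute controls how many cells are visited.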

TODOs:

  • Optimize further:
    • test on multiple GPUs / distributed training
    • test on Mac M1
  • Log the time it takes to complete one iteration
