Commit 78fde6f

share idea
lucidrains committed Oct 2, 2023
1 parent fd2cc0f commit 78fde6f
Showing 2 changed files with 3 additions and 2 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -18,6 +18,7 @@ Also have a few ideas of my own that I will try and share in this repository, if
 - [x] complete prophet net with hierarchical transformer training
 - [ ] complete the spec decoding algorithm using trained prophet net transformer
 
+- [ ] for early exit strategy, try randomly summing last cached embedding back to the same model (a la alphafold2 recycling), randomly cropped along sequence length, and train early exit loss this way. see if one can improve the gamma this way
 - [ ] dedicate a morning to microoptimizations
 
 ## Citations
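The early exit item added above is concrete enough to sketch. Below is a rough, hypothetical illustration of the idea in PyTorch: run the model once, randomly crop the cached final embedding along the sequence length, sum it (detached and normalized) back into the token embeddings on a second pass, and train an early exit loss on that pass. None of the names here (`RecyclingEarlyExitDecoder`, `train_step`, `recycle_prob`) come from this repository, and the toy `nn.TransformerEncoderLayer` stack only stands in for the repository's own `Decoder`.

```python
import torch
import torch.nn.functional as F
from torch import nn

class RecyclingEarlyExitDecoder(nn.Module):
    # toy causal decoder with an early exit head and an alphafold2-style recycling input (hypothetical, for illustration only)
    def __init__(self, num_tokens = 256, dim = 512, depth = 4, early_exit_layer = 2):
        super().__init__()
        self.token_emb = nn.Embedding(num_tokens, dim)
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(dim, nhead = 8, batch_first = True)
            for _ in range(depth)
        ])
        self.early_exit_layer = early_exit_layer
        self.recycle_norm = nn.LayerNorm(dim)   # normalize the recycled embedding before summing it back in
        self.to_logits = nn.Linear(dim, num_tokens)
        self.to_early_exit_logits = nn.Linear(dim, num_tokens)

    def forward(self, ids, recycled = None):
        x = self.token_emb(ids)

        # sum the cached embedding from a previous pass back into the input,
        # detached so no gradients flow through the first pass (a la alphafold2 recycling)
        if recycled is not None:
            x = x + self.recycle_norm(recycled.detach())

        causal_mask = nn.Transformer.generate_square_subsequent_mask(ids.shape[1])

        early_exit_logits = None
        for i, layer in enumerate(self.layers):
            x = layer(x, src_mask = causal_mask)
            if (i + 1) == self.early_exit_layer:
                early_exit_logits = self.to_early_exit_logits(x)

        # final hidden states double as the cache to be recycled on the next pass
        return self.to_logits(x), early_exit_logits, x

def train_step(model, ids, labels, recycle_prob = 0.5, early_exit_loss_weight = 0.5):
    logits, early_exit_logits, cache = model(ids)

    # with some probability, recycle a randomly cropped prefix of the cached embedding
    # and rerun the same model, so the early exit head also trains on recycled states
    if torch.rand(()).item() < recycle_prob:
        seq_len = ids.shape[1]
        crop_len = torch.randint(1, seq_len + 1, ()).item()
        recycled = torch.zeros_like(cache)
        recycled[:, :crop_len] = cache[:, :crop_len]
        logits, early_exit_logits, _ = model(ids, recycled = recycled)

    loss = F.cross_entropy(logits.transpose(1, 2), labels)
    early_exit_loss = F.cross_entropy(early_exit_logits.transpose(1, 2), labels)
    return loss + early_exit_loss_weight * early_exit_loss

# usage on random data
model = RecyclingEarlyExitDecoder()
ids = torch.randint(0, 256, (2, 64))
labels = torch.randint(0, 256, (2, 64))
train_step(model, ids, labels).backward()
```

The question the experiment asks is whether training the early exit head against recycled states makes its drafts reliable enough to raise the gamma mentioned in the item, i.e. how many speculative tokens can be drafted before the full model has to verify them.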
4 changes: 2 additions & 2 deletions train_prophet.py
@@ -75,8 +75,8 @@ def inner(*args, **kwargs):
 
 prophet = Decoder(
     num_tokens = 256,
-    dim = 128,
-    depth = 4
+    dim = 512,
+    depth = 2
 )
 
 model_and_prophet = ModelWithProphetWrapper(
