Guided generation to classify Python projects #379
lmmx started this conversation in Show and tell
Wanted to share some examples of how I tried to use Outlines: what did and didn't work so well, and what didn't seem possible.
My use case was based around PyPI trove classifiers, the tags you put on a project. I wanted to experiment with whether I could select some of them automatically, building towards automated tag suggestion based on a project's description/README etc.
So first off I wrote an example program that picks an "Intended Audience" (program 0), and when that failed I tried another that picks a Development Status, thinking that'd be an easier task (program 1). I concluded that TinyLlama was too small a model, as I didn't manage to get good results with it.
I attach the source for these below, followed by their output. (I only noticed in the 4th script that I wasn't using the GPU 😅 I had to look up how the device gets passed through to the transformers code, but this was ultimately fine, and none of the bugs were this library's fault.)
Environment setup note
Note that for installation I added outlines as a dependency of the project this demo relies on, classipypi (specifically here). By using the newly added PDM per-dependency URL feature I was able to install PyTorch entirely via pip (I usually use conda). This means that to reproduce the following results you can run `pdm install` and the lockfile will give you the same environment. (By default it'll make a `.venv`, but I use a conda env, which I activate and then run `which python > .pdm-python`; that lets PDM pick up the env without me even needing to activate it.)
Code
`0_pick_audience_demo.py`
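The script itself is collapsed above, but the core pattern looked roughly like this; a minimal sketch assuming the `outlines.generate.choice` API, an assumed TinyLlama checkpoint name, and an illustrative subset of the "Intended Audience" classifiers (the real script may differ in details):

```python
import outlines

# Illustrative subset of the "Intended Audience" trove classifiers
AUDIENCES = [
    "Intended Audience :: Developers",
    "Intended Audience :: Information Technology",
    "Intended Audience :: Science/Research",
]

# TinyLlama chat checkpoint (assumed name; any small chat model would do)
model = outlines.models.transformers("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

prompt = (
    "Project description: A library for working with PyPI package metadata.\n"
    "Which audience tag fits best?"
)

# Constrain decoding so the model can only emit one of the listed tags
generator = outlines.generate.choice(model, AUDIENCES)
print(generator(prompt))
```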
Result: it always chooses "Information Technology" (or otherwise fails).
Next I tried...
`1b_pick_devel_status_demo_tinyllama_1b_fixed_prompt_formatting.py`
The results here were essentially random, as if the model was just guessing rather than following the prompt.
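The "fixed prompt formatting" in that filename presumably refers to rendering the prompt in the model's expected chat format; a sketch of that kind of fix using the `transformers` chat-template API (the exact checkpoint and message content are assumptions):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")

messages = [
    {"role": "user", "content": "Pick the Development Status tag for this project: ..."},
]

# Render the conversation with the model's own chat template, so the prompt
# matches the format the chat model was fine-tuned on
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```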
Next I did some more reading around, recalled the Zephyr model (a Mistral finetune by Hugging Face), and got that to run, switching to the beta version.
`3_pick_devel_status_demo_zephyr_beta.py`
Result: it suddenly began to work really well, and to emphasise where it was/wasn't stable I added 3 attempts per prompt (so repeated answers indicate confidence). This suggested to me that the Zephyr model was the way to go and that I could do something with this library!
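Concretely, the repeated-attempts check can be done with a loop like the following; a sketch reusing the same hypothetical choice setup, but with the real `HuggingFaceH4/zephyr-7b-beta` checkpoint:

```python
import outlines

STATUSES = [
    "Development Status :: 3 - Alpha",
    "Development Status :: 4 - Beta",
    "Development Status :: 5 - Production/Stable",
]

model = outlines.models.transformers("HuggingFaceH4/zephyr-7b-beta", device="cuda")
generator = outlines.generate.choice(model, STATUSES)

prompt = "Project README: ...\nWhich development status fits best?"

# 3 attempts per prompt: identical answers across samples suggest the
# model is confident rather than guessing
answers = [generator(prompt) for _ in range(3)]
print(answers)
```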
Next I tried the audience one again, since now I had confidence it might be able to do the harder task... I also stopped chopping the full multi-part trove classifier tags apart, as I realised it didn't change the effectiveness.
`4_pick_audience_demo_zephyr_beta.py`
The results here were even stronger, essentially getting 100% accuracy.
Lastly I realised I didn't need to include the list of generation choices in the prompt (but I did need to include ICL demos in it). When I removed the examples from the prompt the performance dropped, but when I took the generation choices out there was no change. This is hardly new; I've read about these techniques before, and it's intuitive that guided generation doesn't need to be made aware of the options in advance (in the middle of the prompt).
`5_pick_audience_demo_zephyr_beta_no_tag_list_in_prompt.py`
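In other words, the few-shot (ICL) demos stay in the prompt while the candidate tags move entirely into the generation constraint. A sketch of the pattern (the demo texts here are made up):

```python
import outlines

AUDIENCES = [
    "Intended Audience :: Developers",
    "Intended Audience :: Science/Research",
    "Intended Audience :: System Administrators",
]

model = outlines.models.transformers("HuggingFaceH4/zephyr-7b-beta", device="cuda")

# The ICL demos stay in the prompt...
prompt = """\
Description: A pytest plugin for snapshot testing.
Audience: Intended Audience :: Developers

Description: A toolkit for analysing astronomical survey data.
Audience: Intended Audience :: Science/Research

Description: A CLI tool for batch-managing cron jobs on remote hosts.
Audience:"""

# ...but the candidate tags appear only here, as the constraint on decoding;
# the model never sees the list of options in the prompt itself
generator = outlines.generate.choice(model, AUDIENCES)
print(generator(prompt))
```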
Generation failure/slowdown when given many choices
I also made an attempt to pass in all of the trove classifier tags, and when I did this it just 'gummed up' (it froze; maybe it was going to complete, but I cancelled out). I then began to think about how to break the problem down into nested Pydantic models, or to do each category of trove classifier separately (see the sketch below).
(I don't have the code to hand for this, but it was essentially as above with all of the trove classifiers, i.e. running `list_tags` without `include` filters in the `ListingConfig`: the 839 trove classifiers you get on the command line from running `classipypi ls`.)
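A sketch of the per-category breakdown, using the real `trove-classifiers` package to group the full tag set by its top-level segment (the prompt wording is a placeholder):

```python
from collections import defaultdict

import outlines
from trove_classifiers import classifiers  # the full set of trove classifier strings

# Group tags by their top-level segment: "Development Status",
# "Intended Audience", "Topic", and so on
by_category: dict[str, list[str]] = defaultdict(list)
for tag in sorted(classifiers):
    by_category[tag.split(" :: ")[0]].append(tag)

model = outlines.models.transformers("HuggingFaceH4/zephyr-7b-beta", device="cuda")
readme = "..."  # the project description/README text goes here

chosen = {}
for category, tags in by_category.items():
    # One small choice-constrained generation per category, rather than a
    # single generation constrained over all 839 tags at once
    generator = outlines.generate.choice(model, tags)
    chosen[category] = generator(f"{readme}\n\nPick the best '{category}' tag:")
print(chosen)
```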
This was something of a weekend hack and I didn't get back to look at it again until @rlouf nudged me to send this user report. I hope it's helpful, and needless to say: bravo! 🙂
Replies: 1 comment

Thanks, that's really helpful. If only every user would send us detailed reports like this 🙏