Skip to content

Conversation

@jmhb0
Copy link
Contributor

@jmhb0 jmhb0 commented Sep 18, 2025

Summary

  • Replaces multi‑MB inline payloads (e.g., base64 images) with small hash strings during adapter formatting (few‑shot demos and history), restoring the original data right before sending to the LM.
  • This is much faster for requests with many images (~95% faster in this experiment, but it depends on size and number of images). It also uses less memory.
  • If the maintainers agree with the PR, then I can extend it to the other larger dspy Types like Audio and Document

Problem

Building prompts in Adapter is slow when using large data types like dspy.Image. This gist is a realistic reproducible example: a (1000,1000) image, with 15 few-shot images. It profiles running a prediction for 100 images after warming up the cache:

Total processing time: 53.33 seconds
Examples processed: 100
Average time per example: 0.5333 seconds
Throughput: 1.88 examples/second

Peak RSS memory usage: 6.466 GB
Average RSS memory usage: 3.564 GB
Memory samples collected: 18

Function-level profiling results:
--------------------------------------------------------------------------------
Top 20 functions by TOTAL TIME (self time - actual CPU work):
         1282039 function calls (1208276 primitive calls) in 53.330 seconds

   Ordered by: internal time
   List reduced from 625 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100   32.564    0.326   36.929    0.369 /Users/jamesburgess/dspy/dspy/adapters/types/base_type.py:104(split_message_content_for_custom_types)
     1600   12.177    0.008   12.177    0.008 /Users/jamesburgess/miniconda3/envs/dspy-dev/lib/python3.11/json/encoder.py:205(iterencode)
     1700    4.261    0.003    4.262    0.003 /Users/jamesburgess/miniconda3/envs/dspy-dev/lib/python3.11/json/decoder.py:343(raw_decode)
      100    2.300    0.023    2.300    0.023 {built-in method _hashlib.openssl_sha256}
      100    0.699    0.007    0.699    0.007 {orjson.dumps}
     6100    0.388    0.000    0.388    0.000 {method 'strip' of 'str' objects}
        4    0.154    0.038    0.154    0.038 {built-in method gc.collect}

Building 100 prompts takes 53 seconds, which is very slow for just making a prompt. This is especially noticeable when trying to rerun a cached prediction with a few hundred samples. E.g. in my workflow, I do this as part of dataset preprocessing.

There's also a memory issue: in the Gist, if you look at the example_RSS_increase.log file, you'll also see that RSS memory also grows steadily, from 1.192GB to 6.5GB over 100 samples. For a different (real) experiment, this issue eventually led to MemoryError (however the behaviour here did depend on the system: memory kept increasing on one server, but RSS was reclaimed on my laptop, so I'm less sure what's going on here).

Cause

In adapters.base.Adapter, the format function builds a prompt as one big string. So it stringifies large data (dspy types like Image, Audio) and sticks it to the rest of the prompt. Since special data types are sent to the LM as a separate item in the messages list, the method searches the string to extract those data - that's the split_message_content_for_custom_types function that's taking all the runtime. This is slow because the strings are so big.

Also, there is high memory demand because each large Image string is copied and concatted together to make the large prompt string.

Solution

Temporarily convert large data to a hash before doing the formatting. Then after the messages is build, replace the hash with the real data.

Key Changes

  • dspy/adapters/utils.py
    - Introduce a LargePayloadHashManager manager to be used in Adapter.format(). It has one func for replacing large objects inside inputs and demos with hashes; and one func for restoring the final messages with the full data.
  • dspy/adapters/base.py
    - Use the payload manager in format
  • dspy/adapters/types/image.py
    - ensure that encode_image() accepts hash identifiers and returns them unchanged. This is required to avoid misclassifying hash tokens as invalid inputs.

results: faster performance

Top 20 functions by TOTAL TIME (self time - actual CPU work):
         1478335 function calls (1369172 primitive calls) in 3.349 seconds

   Ordered by: internal time
   List reduced from 642 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      100    2.258    0.023    2.258    0.023 {built-in method _hashlib.openssl_sha256}
      100    0.606    0.006    0.606    0.006 {orjson.dumps}
        4    0.120    0.030    0.120    0.030 {built-in method gc.collect}
72400/1800    0.031    0.000    0.062    0.000 /Users/jamesburgess/miniconda3/envs/dspy-dev/lib/python3.11/copy.py:128(deepcopy)
      100    0.020    0.000    2.885    0.029 /Users/jamesburgess/dspy/dspy/clients/cache.py:65(cache_key)
     2103    0.016    0.000    0.017    0.000 {built-in method builtins.next}

Runtime reduced from 53.3 seconds to 3.4 seconds, so 94% faster, though this will change as the number and size of images change.

(BTW, the major remaining bottleneck is now from cache.py, which runs dumps() on the whole request and hashes it)

Extensions

  • If maintainers agree with the PR, I can also extend this to other large dspy Types like Document and Audio.
  • Probably optimizations could be done for cache key optimizations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant