[MLOB-1954] feat(langchain): generically patch embeddings to enable tracing all embeddings calls #4970

Draft
wants to merge 5 commits into
base: master
from

Conversation

sabrenner
Collaborator

What does this PR do?

Generically patches LangChain embeddings. We hadn't done this before because there was no way to patch the prototype of the Embeddings class: it is an abstract class transpiled from TypeScript, so the abstract functions we wanted to patch, embedQuery and embedDocuments, do not end up on the transpiled class.
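To illustrate the problem, here is a minimal sketch of roughly what an abstract class looks like once TypeScript has emitted JavaScript. The class bodies and the ProviderEmbeddings name are hypothetical stand-ins, not the actual @langchain/core source; the point is only that abstract method declarations are erased on emit, so there is nothing on the base prototype to wrap.

```javascript
// Hypothetical stand-in for the transpiled abstract base class. The
// `abstract embedQuery(...)` and `abstract embedDocuments(...)` declarations
// from the TypeScript source leave no trace in the emitted JavaScript.
class Embeddings {
  constructor (params = {}) {
    this.params = params
  }
}

// Hypothetical concrete provider: the methods only exist at this level.
class ProviderEmbeddings extends Embeddings {
  async embedQuery (text) { return [text.length] }
  async embedDocuments (texts) { return texts.map(text => [text.length]) }
}

const baseHasMethod = typeof Embeddings.prototype.embedQuery === 'function'
const providerHasMethod = typeof ProviderEmbeddings.prototype.embedQuery === 'function'

console.log(baseHasMethod, providerHasMethod) // prints: false true
```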

Instead, we patch the embeddings exports. The Embeddings constructor now tries to shim the embedQuery and embedDocuments functions, since every implementer of Embeddings should implement these methods. While this isn't ideal, it frees us from adding a provider-specific patch for each embeddings implementation (previously, we specifically patched @langchain/openai). Any implementer of Embeddings from @langchain/core/embeddings will be patched.
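The constructor-level approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: coreExports, shimMethod, patchEmbeddingsExport, FakeEmbeddings, and the traced array are all hypothetical names, and the real instrumentation would start spans rather than push to an array.

```javascript
// Hypothetical stand-in for the exports of @langchain/core/embeddings.
// As in the transpiled output, the base class defines neither embedQuery
// nor embedDocuments.
const coreExports = {
  Embeddings: class Embeddings {
    constructor (params = {}) {
      this.params = params
    }
  }
}

// Stand-in for span creation: records which method ran on which provider.
const traced = []

function shimMethod (instance, name) {
  const original = instance[name] // resolved through the provider's prototype
  if (typeof original !== 'function') return // provider does not implement it
  instance[name] = function (...args) {
    traced.push({ method: name, provider: this.constructor.name })
    return original.apply(this, args)
  }
}

// Swap the exported class for a subclass whose constructor shims the instance.
function patchEmbeddingsExport (moduleExports) {
  const Original = moduleExports.Embeddings
  moduleExports.Embeddings = class Embeddings extends Original {
    constructor (...args) {
      super(...args)
      shimMethod(this, 'embedQuery')
      shimMethod(this, 'embedDocuments')
    }
  }
  return moduleExports
}

patchEmbeddingsExport(coreExports)

// Hypothetical provider, defined after patching, the way a provider package
// would load the already-patched core module.
class FakeEmbeddings extends coreExports.Embeddings {
  async embedQuery (text) { return [text.length] }
  async embedDocuments (texts) { return texts.map(text => [text.length]) }
}

const embeddings = new FakeEmbeddings()
embeddings.embedDocuments(['a', 'bc']).then(vectors => {
  console.log(vectors, traced)
})
```

One limitation of a sketch like this: it only sees methods that are reachable when the base constructor runs. A provider that assigns embedQuery as a class field would overwrite the shim after `super()` returns.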

To go along with this, I added a small test for another embeddings provider (Google Gemini), which revealed some mistakes in the logic that grabs the API key to truncate and tag. One downside of this approach is that we may run into more cases like this, since Embeddings implementations do not have a strictly typed config or config properties.
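Because the config is not strictly typed, API key lookup has to be defensive. A minimal sketch of that idea follows, with everything in it assumed for illustration: the candidate field names, the helper names, and the truncation format are hypothetical and may not match what the integration actually does.

```javascript
// Assumed field names that a provider instance might store its key under.
// These are guesses for illustration; real providers may use other names.
const CANDIDATE_KEY_FIELDS = ['apiKey', 'openAIApiKey']

// Return the first non-empty string found under a candidate field, if any.
function findApiKey (instance) {
  for (const field of CANDIDATE_KEY_FIELDS) {
    const value = instance?.[field]
    if (typeof value === 'string' && value.length > 0) return value
  }
  return undefined
}

// Hypothetical truncation format: first 3 and last 4 characters.
// Short keys are fully masked so that nothing meaningful leaks into a tag.
function truncateApiKey (apiKey) {
  if (typeof apiKey !== 'string' || apiKey.length === 0) return undefined
  if (apiKey.length <= 8) return '...'
  return `${apiKey.slice(0, 3)}...${apiKey.slice(-4)}`
}

const tagValue = truncateApiKey(findApiKey({ apiKey: 'sk-abcdefghijklmnop' }))
console.log(tagValue) // prints: sk-...mnop
```

Returning undefined instead of throwing lets the caller simply skip the tag when a provider stores its key somewhere unexpected.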

Motivation

Reduce noise and line count in the langchain patching: rather than adding a patch for each future Embeddings provider, support all instances of Embeddings.

Additional Notes

I will try and rework langchain version matrixing for these tests... it's a bit ugly atm.


github-actions bot commented Dec 5, 2024

Overall package size

Self size: 8.22 MB
Deduped: 94.73 MB
No deduping: 95.29 MB

Dependency sizes
| name | version | self size | total size |
|------|---------|-----------|------------|
| @datadog/libdatadog | 0.2.2 | 29.27 MB | 29.27 MB |
| @datadog/native-appsec | 8.3.0 | 19.37 MB | 19.38 MB |
| @datadog/native-iast-taint-tracking | 3.2.0 | 13.9 MB | 13.91 MB |
| @datadog/pprof | 5.4.1 | 9.76 MB | 10.13 MB |
| protobufjs | 7.2.5 | 2.77 MB | 5.16 MB |
| @datadog/native-iast-rewriter | 2.5.0 | 2.51 MB | 2.65 MB |
| @opentelemetry/core | 1.14.0 | 872.87 kB | 1.47 MB |
| @datadog/native-metrics | 3.0.1 | 1.06 MB | 1.46 MB |
| @opentelemetry/api | 1.8.0 | 1.21 MB | 1.21 MB |
| import-in-the-middle | 1.11.2 | 112.74 kB | 826.22 kB |
| msgpack-lite | 0.1.26 | 201.16 kB | 281.59 kB |
| source-map | 0.7.4 | 226 kB | 226 kB |
| opentracing | 0.14.7 | 194.81 kB | 194.81 kB |
| lru-cache | 7.18.3 | 133.92 kB | 133.92 kB |
| pprof-format | 2.1.0 | 111.69 kB | 111.69 kB |
| @datadog/sketches-js | 2.1.0 | 109.9 kB | 109.9 kB |
| semver | 7.6.3 | 95.82 kB | 95.82 kB |
| lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB |
| ignore | 5.3.1 | 51.46 kB | 51.46 kB |
| int64-buffer | 0.1.10 | 49.18 kB | 49.18 kB |
| shell-quote | 1.8.1 | 44.96 kB | 44.96 kB |
| istanbul-lib-coverage | 3.2.0 | 29.34 kB | 29.34 kB |
| rfdc | 1.3.1 | 25.21 kB | 25.21 kB |
| @isaacs/ttlcache | 1.4.1 | 25.2 kB | 25.2 kB |
| tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB |
| limiter | 1.1.5 | 23.17 kB | 23.17 kB |
| dc-polyfill | 0.1.4 | 23.1 kB | 23.1 kB |
| retry | 0.13.1 | 18.85 kB | 18.85 kB |
| jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB |
| crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB |
| koalas | 1.0.2 | 6.47 kB | 6.47 kB |
| path-to-regexp | 0.1.10 | 6.38 kB | 6.38 kB |
| module-details-from-path | 1.0.3 | 4.47 kB | 4.47 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe


pr-commenter bot commented Dec 5, 2024

Benchmarks

Benchmark execution time: 2024-12-05 16:33:00

Comparing candidate commit bcab958 in PR branch sabrenner/langchain-generic-embeddings-patching with baseline commit 865654c in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 263 metrics, 3 unstable metrics.

sabrenner changed the title from "[MLOB-1954] feat(langchain) generically patch embeddings to enable tracing all embeddings calls" to "[MLOB-1954] feat(langchain): generically patch embeddings to enable tracing all embeddings calls" on Dec 10, 2024