| Metadata | |
| --- | --- |
| Point of contact | @yaahc |
| Team(s) | compiler, infra |
| Goal document | 2025h1/metrics-initiative |
| Tracking Issue | rust-lang/rust#128914 |
Summary
Build out support for metrics within the Rust compiler, starting with a proof-of-concept dashboard for viewing unstable feature usage statistics over time.
Tasks and status
Activity
nikomatsakis commented on Feb 18, 2025
This issue is intended for status updates only.
For general questions or comments, please contact the owner(s) directly.
yaahc commented on Feb 25, 2025
I'm very excited to see that this got accepted as a project goal 🥰 🎉
Let me go ahead and start by giving an initial status update of where I'm at right now.
I've added a new tool, `rust-lang/rust/src/tools/features-status-dump`, which dumps the status information for all unstable, stable, and removed features as a JSON file.

To gather the data I build with `RUSTFLAGS_NOT_BOOTSTRAP="-Zmetrics-dir=$HOME/tmp/metrics" ./x build --stage 2` and run `./x run src/tools/features-status-dump/`, save the output to the filesystem, and convert the output to the line protocol with the aforementioned program.

From `unstable_feature_usage_metrics-rustc_hir-3bc1eef297abaa83.json`:

Snippet of unstable feature usage metrics post conversion to line protocol

Snippet of feature status metrics post conversion to line protocol

Run with `influxdb3 query --database=unstable-feature-metrics --file query.sql`.
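As a rough illustration of that conversion step (not the actual converter; the measurement, tag, and field names below are assumptions, not the real schema), a minimal Rust sketch of emitting one InfluxDB line-protocol record per feature usage count might look like this:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Format one InfluxDB line-protocol record:
///   measurement,tag=value field=value timestamp
/// The measurement, tag, and field names here are hypothetical.
fn to_line_protocol(feature: &str, crate_name: &str, count: i64, ts_ns: u128) -> String {
    format!("unstable_feature_usage,feature={feature},crate={crate_name} count={count}i {ts_ns}")
}

fn main() {
    let now_ns = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before unix epoch")
        .as_nanos();

    // In practice these records would be parsed out of the JSON dumps;
    // they are hard-coded here purely for illustration.
    let records = [("exact_div", "rustc_hir", 3_i64)];
    for (feature, krate, count) in records {
        println!("{}", to_line_protocol(feature, krate, count, now_ns));
    }
}
```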
yaahc commented on Feb 25, 2025
My next step is to revisit the output format, which is currently a direct JSON serialization of the data as it is represented internally within the compiler. This has already proven inadequate, both from personal experience, given the need for additional ad-hoc conversion into another format with faked timestamp data that wasn't present in the original dump, and from a conversation with @badboy (Jan-Erik), who recommended we explicitly avoid ad-hoc definitions of telemetry schemas, which can lead to difficult-to-manage chaos.
I'm currently evaluating what options are available to me, such as a custom system built around InfluxDB's line format, or OpenTelemetry's metrics API.
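To make the trade-off concrete, here is a hedged sketch of what a non-ad-hoc, explicitly versioned dump format could look like; the struct and field names, and the choice of `serde`/JSON, are assumptions for illustration rather than a settled design:

```rust
use serde::Serialize;

/// A versioned envelope so downstream consumers can detect schema changes.
/// All names here are hypothetical; they only illustrate the idea of an
/// explicit schema rather than an ad-hoc serialization of internal state.
#[derive(Serialize)]
struct MetricsDump {
    schema_version: u32,
    compiler_version: String,
    events: Vec<FeatureUsage>,
}

#[derive(Serialize)]
struct FeatureUsage {
    feature: String,
    /// Unix timestamp in seconds. The current dump carries no timestamp,
    /// which is what forced the faked timestamps mentioned above.
    timestamp: u64,
}

fn main() -> serde_json::Result<()> {
    let dump = MetricsDump {
        schema_version: 1,
        compiler_version: "rustc 1.87.0-nightly".to_string(),
        events: vec![FeatureUsage {
            feature: "exact_div".to_string(),
            timestamp: 1_740_000_000,
        }],
    };
    println!("{}", serde_json::to_string_pretty(&dump)?);
    Ok(())
}
```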
Either way, I want to use Firefox's telemetry system as inspiration and a basis for requirements when evaluating the output format options.
Relevant notes from my conversation w/ Jan-Erik
yaahc commented on Mar 3, 2025
After further review I've decided to limit scope initially and not get ahead of myself, so I can make sure the schemas I'm working with can support the kind of queries and charts we're eventually going to want in the final version of the unstable feature usage metric. I'm hoping that by limiting scope I can finish most of the items currently outlined in this project goal ahead of schedule, then move on to building the proper foundations based on the proof of concept and start designing more permanent components. As such I've opted for the following:

For the second item above I need to have more detailed conversations with both @rust-lang/libs-api and @rust-lang/lang.
yaahc commented on Apr 16, 2025
Small progress update:
Following the plan mentioned above plus some extra bits, I've implemented the following changes:

- Disambiguated the metrics file names with the same hash used for artifacts in the `.cargo` or `build` directories, which in the compiler is known as `extra_filename` and is configured by cargo, but it turns out this doesn't guarantee uniqueness.

Next Steps:

- `exact_div` (rust#139911 tracking the specific functions used)

yaahc commented on Apr 22, 2025
Posting this here so I can link to it in other places: I've set up the basic usage-over-time chart using some synthesized data that emulates quadratically increasing feature usage for my given feature over the course of a week (the generated data starts at 0 usages per day and ends at 1000 usages per day). The chart counts the usage over each day-long period and charts those counts over a week. The dip at the end is the difference between when I generated the data, after which there is zero usage data, and when I queried it.
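As a rough illustration of the shape of that synthesized data (not necessarily the actual script used), a minimal Rust sketch ramping daily usage counts quadratically from 0 to 1000 over one week:

```rust
/// Generate a week of synthetic daily usage counts that ramp quadratically
/// from 0 usages/day up to 1000 usages/day, mirroring the test data
/// described above. Purely illustrative; the real data was synthesized
/// separately and uploaded to InfluxDB.
fn main() {
    let days = 7u32;
    let max_per_day = 1000.0_f64;
    for day in 0..=days {
        // Quadratic ramp: count(t) = max * (t / days)^2
        let t = f64::from(day) / f64::from(days);
        let usages = (max_per_day * t * t).round() as u64;
        println!("day {day}: {usages} usages");
    }
}
```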
With this I should be ready to upload the data once we've gathered it from docs.rs. All I need to do is polish and export the dashboards I've made in Grafana to the rust-lang Grafana instance, connect that instance to the rust-lang InfluxDB instance, and upload the data to InfluxDB once it's gathered.
yaahc commented on May 26, 2025
Quick update: data is currently being gathered on docs.rs (and has been for almost 2 weeks now), and I should have it uploaded and accessible on the PoC dashboard within the next week or two (depending on how long I want to let the data gather).
yaahc commented on Jun 3, 2025
Bigger update:
I've done the initial integration with the data gathered so far since RustWeek. I have the data uploaded to the InfluxDB Cloud instance managed by the infra team, I've connected the infra team's Grafana instance to that InfluxDB server, and I've imported my dashboards, so we now have fancy graphs with real data on infra-managed servers 🎉
I'm now working with the infra team to see how we can open up access to the Grafana dashboard so that anyone can go and poke around and look at the data.
Another issue that came up is that the InfluxDB Cloud Serverless free instance we're currently using has a mandatory maximum 30-day retention policy, so either I have to figure out a way to get that disabled on our instance, or our data will get steadily deleted and will only be useful as a PoC demo dashboard for a short window of time.