
Policy requiring disclosure of standard development tools is impractical #2434

Open
djaglowski opened this issue Nov 11, 2024 · 6 comments

@djaglowski
Member

The new Generative AI policy states:

If you are using LLMs to help you write code, that is fine; You should be clear about this in pull requests and reviews. If you are using LLMs to understand code so that you can participate in issues or reviews, that is also fine -- but you should be clear about this as well.

We should expect that most contributors are using tools based on generative AI in their pull requests and reviews. Therefore, as written, this policy requires that most PRs and most reviews within the project contain a disclosure of the obvious. Developer tooling is evolving quickly, so we should expect this requirement to become increasingly onerous over time. I suggest we remove the disclosure language and instead assume that contributors are using modern tools.

@svrnm
Member

svrnm commented Nov 11, 2024

Would it be enough to remove the two instances of "but you should be clear about this"? Here is how it would look without them. I think it still reads well and clarifies the intent (it's fine to use AI, but only if you also use your own intelligence):

Q: How do I know the difference between allowed and disallowed usages of LLMs?

"If you have to ask, you already know the answer." This policy is not a broad ban of LLMs, it is a request that you -- as an individual -- use them in a way that adds value to the project and respects the time of other contributors and maintainers. If you are using LLMs to help you write code, that is fine; If you are using LLMs to understand code so that you can participate in issues or reviews, that is also fine. What is not fine is copying and pasting a GitHub issue into an LLM prompt and asking it to write the PR for you, then blindly submitting that response. You must be an active and willing participant in the process of contributing to OpenTelemetry.

I think the purpose of the "you should be clear about this" was, in the context of the question, to differentiate between acceptable and unacceptable usage of LLMs? If we want to keep that point, maybe we can add a sentence like:

If you are unsure whether your usage of LLMs is within this policy, we recommend you disclose the tools you have used. This allows maintainers to provide you with guidance on the conditions under which they are willing to accept your contribution.

@yurishkuro
Member

I closed #2435 as a duplicate of this. Copying my comment here:

My suggestion is to rewrite #2417 without referencing LLMs. The overall guidance boils down to "don't create garbage/useless PRs", and "if you do we will block you". If a PR is high quality then it's irrelevant whether it was created with GenAI help or not. We can expect most PRs to be done with AI assistance in the near future, so the guidance to "disclose that fact" serves no purpose.

@svrnm
Member

svrnm commented Nov 12, 2024

The overall guidance boils down to "don't create garbage/useless PRs", and "if you do we will block you"

Nowhere in this document do we state that we block or ban contributors who create "garbage/useless PRs". The "if you do" is that maintainers can close/hide that individual PR (or issue, or other kind of contribution).

If a PR is high quality then it's irrelevant whether it was created with GenAI help or not.

I agree with that, and I think the policy also acknowledges it multiple times; in particular, it states that "this policy does not prohibit the use of LLMs to assist".

so the guidance to "disclose that fact" serves no purpose.

There is no general guidance to disclose the fact that one is using such a tool. The guidance is contextual, since it is given as an answer to the question "how do I know the difference between allowed and disallowed usage of LLMs?". To make this clearer, I proposed the change above: remove the "but you should be clear about this" language and add a sentence that suggests calling out the usage of LLMs if a contributor is unsure. This allows a maintainer and a contributor to have a transparent conversation, where the maintainer can either let the contributor know that what they are doing is not helpful (and close the contribution on that basis), or acknowledge that the contributor is using the tool properly and that they have no issue with it.

The overall guidance boils down to "don't create garbage/useless PRs", and "if you do we will block you".

I disagree with that. The guidance is "don't use LLMs to create contributions that mimic higher quality than you are able to produce yourself, because it is impolite towards other contributors", and it gives maintainers a document they can point to when they close a contribution they assume to be such a case.

The difference between a non-AI-assisted garbage/useless PR and an AI-assisted one is that the latter can be harder to recognize. The initial PR may look acceptable, so a maintainer engages in the review, and only during that process do they figure out that the submitter of the PR feeds their questions into an LLM and sends the maintainer back the answers. In that case, a maintainer can explain in their own words why they feel disrespected, or they can point the contributor to the GenAI policy document and allow them to educate themselves.

@yurishkuro
Member

  • Most people are not going to read "nuanced" guidance - it's unnecessary contribution overhead with close to zero ROI
  • Someone who used an LLM "in bad faith" (as you perceive it) is not going to disclose that fact, and it cannot be proven otherwise
  • If the LLM-produced PR is so high-quality that a maintainer cannot tell right away and needs an external signal about it, then the maintainer has to review the PR on merit anyway
  • The maintainer can always ask whether the PR was produced with an LLM if it comes to that; it does not need to be contributor guidance

@svrnm
Member

svrnm commented Nov 14, 2024

Most people are not going to read "nuanced" guidance - it's unnecessary contribution overhead with close to zero ROI

They will not read it upfront, but a maintainer can point them to it if needed.

Also, we have inexperienced maintainers looking for guidance on how to handle GenAI PRs; they can lean on such guidance.

I regularly use the guidelines we have to point things out, instead of explaining them at length in a comment.

That's the ROI for me.

Someone who used an LLM "in bad faith" (as you perceive it) is not going to disclose that fact, and it cannot be proven otherwise

The contributions that triggered this guideline were mostly from inexperienced contributors who tried to create a "quick win". So it is less about bad faith and more about inexperience, where some guidance can be helpful.

If the LLM-produced PR is so high-quality that a maintainer cannot tell right away and needs an external signal about it, then the maintainer has to review the PR on merit anyway

That's the point. If a maintainer reviews the PR and only during that process figures out that the contributor is not able to fix the issues on it, the maintainer has wasted a lot of time.

The maintainer can always ask whether the PR was produced with an LLM if it comes to that; it does not need to be contributor guidance

Sure, but it helps. The maintainer can write long comments on why and how LLMs should be used, or they can point to a guideline.

Note that this is how I think about it, how I will be using it, and why I defend it.

You suggested rewriting it in a way that works without referencing LLMs/GenAI. If that is possible, I am OK with it as well, so I am happy to review your PR on that matter!

@danielgblanco
Contributor

I personally think the guidance should not be about "don't create garbage/useless PRs" but rather "you should be able to engage in constructive conversation, justify your design decisions, and apply feedback given on a PR". This is how I understood the "but you should be clear about this" part of the proposed guidance: not as a disclosure of using LLMs on every PR raised (which I think is unreasonable), but as a need to be clear about usage of LLMs/GenAI if asked about it.

I do believe the mention of LLMs/GenAI is important, though, perhaps as an example of one of the cases in which a PR is raised without sufficient knowledge of the proposed change. The same would apply if someone opens a PR using someone else's code and then is not able to reason about it.
