Unlawful use of my code #17

Open
adryzz opened this issue Mar 22, 2024 · 8 comments

Comments

@adryzz

adryzz commented Mar 22, 2024

The README of this repo says the following:

StarCoder2 is a family of code generation models (3B, 7B, and 15B), trained on 600+ programming languages from The Stack v2 [...]

The dataset linked contains my code, without following its license (or lack thereof).

Consent is not opt-out. You trained an LLM on code you are not allowed to use.

@yamiteru

Stop being a little cry baby

@adryzz
Author

adryzz commented Jun 30, 2024

womp womp you cant use my code without following its license

@Snowman-25

Quote from https://policies.stackoverflow.co/company/trademark-guidance/:

"We decided early on that all user-generated content in the Stack Exchange Network would be given back to the community under a Creative Commons license."

Furthermore, see Point 6 of the ToS Section "Subscriber Content":

You agree that any and all content, including without limitation any and all text, graphics, logos, tools, photographs, images, illustrations, software or source code, audio and video, animations, and product feedback (collectively, “Content”) that you provide to the public Network (collectively, “Subscriber Content”), is perpetually and irrevocably licensed to Stack Overflow on a worldwide, royalty-free, non-exclusive basis pursuant to Creative Commons licensing terms (CC BY-SA 4.0), and you grant Stack Overflow the perpetual and irrevocable right and license to access, use, process, copy, distribute, export, display and to commercially exploit such Subscriber Content, even if such Subscriber Content has been contributed and subsequently removed by you as reasonably necessary [...]
[...]
This means that you cannot revoke permission for Stack Overflow to publish, distribute, store and use such content and to allow others to have derivative rights to publish, distribute, store and use such content. The CC BY-SA 4.0 license terms are explained in further detail by Creative Commons, and the license terms applicable to content are explained in further detail here. You should be aware that all Public Content you contribute is available for public copy and redistribution, and all such Public Content must have appropriate attribution.

@adryzz
Author

adryzz commented Sep 24, 2024

this isnt stack overflow lmfao i dont care about their tos

@Snowman-25

Your code was taken from Stack Overflow, where it was available under the license mentioned above. By posting it on Stack Overflow, you forfeited all rights to the content you posted. Thus, inclusion in starcoder2 is lawful.

@adryzz
Author

adryzz commented Sep 26, 2024

i have never posted it on stack overflow, show me where exactly i have done so

@Snowman-25

Where is the code you're referring to and why do you think it's in the dataset?

@adryzz
Author

adryzz commented Sep 27, 2024

[two images attached]

this issue was created before the opt-out was even a thing, meaning the model was trained on code it wasn't allowed to use (and even then, consent is not opt-out). you can't "untrain" stuff.

bigcode-project/opt-out-v2#1814

as this reply says:

Your opt-out request has been processed and your data was removed in version v2.1 of The Stack and all future versions.

this means that my code is in all previous versions of the dataset, and this repository was created before the opt-out (which wouldn't hold water anyway because consent is not opt-out) was even a thing. stop making this about stack overflow when it wasn't ever mentioned.
