Refactoring: Weird steps in the pipeline (cleaning at various steps that could be streamlined) #61

Closed
MinaAlmasi opened this issue Aug 6, 2024 · 1 comment
@MinaAlmasi
Collaborator

The current pipeline is displayed in the image below.

Some steps that may need to be reconsidered

  • When extracting metrics (step 3) for both human and AI text, the AI text is lowercased and cleaned first, inside that same step. This could instead be done in a separate step, with the cleaned text saved. The reason I haven't done this is that it would make the repo somewhat larger.
  • When using the metrics for classification (step 4B), I only then remove the few faulty generations that fall below the minimum length. Ideally, they should be removed before steps 4A and 4B to avoid mistakes (such as accidentally including them in other analysis work).
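A minimal sketch of what the two points above could look like as a single cleaning step that runs before metric extraction. Everything named here is hypothetical and not taken from the repo: `clean_text`, `clean_generations`, and `MIN_LENGTH` are illustrative names, the threshold value is a placeholder, and measuring length in whitespace-separated tokens is an assumption. Lowercasing comes from the description above; collapsing whitespace is only a stand-in for whatever other cleaning the pipeline does.

```python
from typing import Iterable, List

# Placeholder threshold, counted in whitespace-separated tokens (an assumption).
MIN_LENGTH = 5


def clean_text(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace into single spaces."""
    return " ".join(text.lower().split())


def clean_generations(texts: Iterable[str], min_length: int = MIN_LENGTH) -> List[str]:
    """Clean every generation, then drop those shorter than min_length tokens.

    Running this once, before metric extraction, means later steps
    (such as 4A and 4B) never see the faulty generations.
    """
    cleaned = (clean_text(t) for t in texts)
    return [t for t in cleaned if len(t.split()) >= min_length]


if __name__ == "__main__":
    raw = ["  Hello   World this IS a Test ", "Too short"]
    print(clean_generations(raw, min_length=3))
```

The output of such a step could be saved once and reused by every later stage, at the cost of the larger repo size mentioned in the first point.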
[Image: pipeline diagram, Screenshot 2024-08-06 at 11 02 02]
@MinaAlmasi
Collaborator Author

Fixed this in #73 (AI data is now cleaned in a separate script before extracting metrics). Might want to redraw the pipeline digitally for the README.
