Dealing with forward- and reverse-strand in datasets #19

Open
yangtcai opened this issue Jun 10, 2022 · 2 comments
Comments

@yangtcai
Collaborator

Hi @williamstark01, a question that came up while implementing label normalization. Each label is a tuple (left, right), and the original sequence range (seqstart:seqend) should be mapped to the range 0 to 1. So the normalized tuple should be ((left - seqstart) / (seqend - seqstart), (right - seqstart) / (seqend - seqstart)). Here we need to take the two strands into account: on the forward strand seqstart < seqend, and on the reverse strand it is the opposite. If both types of strand are fed to the model, it has to learn to identify the strand type and produce the matching output, which is an unnecessary burden. We should guarantee that the model only ever sees one type of strand as input. One workaround is to convert every reverse strand to a forward strand, which solves the problem. Also, to quickly prove our concept of DETR in biology, we can temporarily ignore the reverse strand. What do you think? 😃
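
For illustration, the forward-strand normalization described above could look roughly like the sketch below. It divides both offsets from seqstart by the window length (seqend - seqstart) so that the result lands in the range 0 to 1. The function name `normalize_label` and its argument names are hypothetical and not taken from the project code, and rejecting reverse-strand windows is only one possible way to "temporarily ignore" them.

```python
from typing import Tuple


def normalize_label(
    left: int, right: int, seq_start: int, seq_end: int
) -> Tuple[float, float]:
    """Map a (left, right) label to fractions of a forward-strand window.

    Hypothetical helper: assumes seq_start < seq_end (forward strand) and
    that the label lies inside the window.
    """
    if seq_start >= seq_end:
        # Reverse-strand (or empty) windows are not handled in this sketch.
        raise ValueError("expected a forward-strand window (seq_start < seq_end)")
    length = seq_end - seq_start
    return (left - seq_start) / length, (right - seq_start) / length


# A label spanning 150..175 inside the window 100..200
print(normalize_label(150, 175, 100, 200))
```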

@williamstark01
Collaborator

Hey Yantong, I think you are right. Whether a sequence is on the forward or the reverse strand is a property the model doesn't need to know about. For prototyping, the solution you propose sounds good: we can use only the forward strand initially. (Later on we can create a helper function that converts reverse strands and their coordinates as if they were forward strands, so we can include them in the training dataset and run inference on them. This can be an issue added to the TODO list (just renamed it) on the Kanban board.)
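
A rough sketch of what such a helper might look like, under a few assumptions that are not confirmed anywhere in this thread: reverse-strand windows are given with seqstart > seqend, label coordinates are genomic positions inside the window, and the stored sequence may need to be reverse-complemented to be read in the strand's own direction. The names `to_forward_frame` and `reverse_complement` are hypothetical.

```python
from typing import Tuple

# Complement table for DNA bases (N stays N); assumes plain ACGTN sequences.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence: str) -> str:
    """Return the sequence as read along the opposite strand."""
    return sequence.translate(_COMPLEMENT)[::-1]


def to_forward_frame(
    left: int, right: int, seq_start: int, seq_end: int
) -> Tuple[int, int, int]:
    """Express a label as offsets from seq_start in reading direction.

    Returns (left_offset, right_offset, window_length), so both strands
    end up in the same 0..window_length frame.
    """
    if seq_start < seq_end:
        # Forward strand: offsets grow with the genomic coordinate.
        return left - seq_start, right - seq_start, seq_end - seq_start
    # Reverse strand (assumed seq_start > seq_end): offsets grow as the
    # genomic coordinate decreases.
    return seq_start - left, seq_start - right, seq_start - seq_end


print(to_forward_frame(180, 150, 200, 100))
print(reverse_complement("AACG"))
```

The offsets returned for a reverse-strand window can then be normalized exactly like forward-strand ones, by dividing by the window length.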

@williamstark01
Collaborator

Actually, this issue can serve as that item, adding it now.

@williamstark01 changed the title from "About forward- and reverse-strand in datasets" to "Dealing with forward- and reverse-strand in datasets" on Jun 10, 2022