The prompt generator in DTI adds unneeded whitespace - this should fix that (#23)

Co-authored-by: MATAN NINIO <[email protected]>
matanninio and MATAN NINIO authored Dec 4, 2024
1 parent 459978e commit b6c491b
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions mammal/examples/dti_bindingdb_kd/task.py
@@ -126,10 +126,10 @@ def data_preprocessing(
         ground_truth_value = sample_dict.get(ground_truth_key, None)

         sample_dict[ENCODER_INPUTS_STR] = (
-            f"<@TOKENIZER-TYPE=AA><MASK> \
-            <@TOKENIZER-TYPE=AA@MAX-LEN={target_max_seq_length}><MOLECULAR_ENTITY><MOLECULAR_ENTITY_GENERAL_PROTEIN><SEQUENCE_NATURAL_START>{target_sequence}<SEQUENCE_NATURAL_END> \
-            <@TOKENIZER-TYPE=SMILES@MAX-LEN={drug_max_seq_length}><MOLECULAR_ENTITY><MOLECULAR_ENTITY_SMALL_MOLECULE><SEQUENCE_NATURAL_START>{drug_sequence}<SEQUENCE_NATURAL_END> \
-            <EOS>"
+            "<@TOKENIZER-TYPE=AA><MASK>"
+            f"<@TOKENIZER-TYPE=AA@MAX-LEN={target_max_seq_length}><MOLECULAR_ENTITY><MOLECULAR_ENTITY_GENERAL_PROTEIN><SEQUENCE_NATURAL_START>{target_sequence}<SEQUENCE_NATURAL_END>"
+            f"<@TOKENIZER-TYPE=SMILES@MAX-LEN={drug_max_seq_length}><MOLECULAR_ENTITY><MOLECULAR_ENTITY_SMALL_MOLECULE><SEQUENCE_NATURAL_START>{drug_sequence}<SEQUENCE_NATURAL_END>"
+            "<EOS>"
         )
         tokenizer_op(
             sample_dict,
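
Note on why this change removes the whitespace: inside a single string literal, a backslash at the end of a line joins it with the next line, but the space before the backslash and the next line's leading indentation stay in the string. Adjacent string literals inside parentheses are concatenated with nothing between them. The sketch below is only an illustration of that Python behavior; the tags are abbreviated, and the helper names and sample values (`MKV`, `CCO`) are hypothetical and not part of task.py.

```python
def build_with_backslash(target_sequence: str, drug_sequence: str) -> str:
    # Old style (illustrative): one f-string continued with backslashes.
    # The space before each backslash and the indentation of the following
    # line both end up inside the resulting string.
    return f"<@TOKENIZER-TYPE=AA><MASK> \
        <SEQUENCE_NATURAL_START>{target_sequence}<SEQUENCE_NATURAL_END> \
        <SEQUENCE_NATURAL_START>{drug_sequence}<SEQUENCE_NATURAL_END> \
        <EOS>"


def build_with_adjacent_literals(target_sequence: str, drug_sequence: str) -> str:
    # New style (illustrative): adjacent literals are joined by the parser
    # with no characters inserted between them.
    return (
        "<@TOKENIZER-TYPE=AA><MASK>"
        f"<SEQUENCE_NATURAL_START>{target_sequence}<SEQUENCE_NATURAL_END>"
        f"<SEQUENCE_NATURAL_START>{drug_sequence}<SEQUENCE_NATURAL_END>"
        "<EOS>"
    )


# Hypothetical sample values, chosen only for the demonstration.
old_prompt = build_with_backslash("MKV", "CCO")
new_prompt = build_with_adjacent_literals("MKV", "CCO")

print(repr(old_prompt))  # contains runs of spaces between the tags
print(repr(new_prompt))  # contains no spaces at all
```

Because the stray spaces sit between tags, they would reach the tokenizer as part of the prompt, which is presumably the behavior this commit is meant to eliminate.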