Add TFLOP pipeline walkthrough documentation #21

Open: wants to merge 1 commit into base: main

Conversation

ankitaggnitt

This commit adds a comprehensive Markdown document that provides a step-by-step walkthrough of the TFLOP pipeline, from data preprocessing to model architecture and training/evaluation. It bridges the concepts from the research paper with their concrete implementations in the Python codebase, citing relevant files and code snippets.

Key areas covered:

  • Data Preprocessing: Raw inputs, HTML to OTSL conversion, text region processing (bounding box normalization, layout embedding), pointer target creation, and final batch assembly.
  • Model Architecture: Overall TFLOP class structure, Image Encoder (Swin), Layout Encoder components, Logical Structure Decoder (MBART), Layout Pointer mechanism (pointer head, dot-product similarity, pointer loss, empty cell handling), and Span-aware Contrastive Supervision (loss module, positive/negative set construction, span coefficients, final contrastive loss).
  • Training/Evaluation: Main training script, key arguments, optimizer/scheduler, inference process (autoregressive generation, pointer usage, HTML construction), and TEDS evaluation details (use of the `apted` library).
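
For reviewers who have not read the paper, the dot-product pointer idea listed under Model Architecture can be illustrated with a small, self-contained sketch. This is not code from the repository: the function names (`pointer_probs`, `pointer_loss`, `assign_cells`), the toy 2-dimensional vectors, and the choice to have each text region score every cell tag are all assumptions made for illustration only.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pointer_probs(box_vecs, tag_vecs):
    """For each text-region vector, a distribution over cell-tag vectors.

    Scores are plain dot products (hypothetical layout, not the repo's).
    """
    return [softmax([dot(b, t) for t in tag_vecs]) for b in box_vecs]


def pointer_loss(probs, targets):
    """Mean negative log-likelihood of the target cell index per region."""
    return -sum(math.log(p[t]) for p, t in zip(probs, targets)) / len(targets)


def assign_cells(probs):
    """Pick the highest-probability cell index for each text region."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


# Toy data: three text regions, two table cells (illustrative values only).
box_vecs = [[2.0, 0.0], [0.0, 2.0], [1.9, 0.1]]
tag_vecs = [[1.0, 0.0], [0.0, 1.0]]

probs = pointer_probs(box_vecs, tag_vecs)
assignments = assign_cells(probs)
loss = pointer_loss(probs, [0, 1, 0])
```

In this toy setup the first and third regions land in cell 0 and the second in cell 1. How the actual model handles empty cells and batching is described in the added document, not here.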

@ankitaggnitt (Author) left a comment

Let me know if it's any good.
