Custom Classification model behavior #38981

Open
galvangoh opened this issue Dec 24, 2024 · 1 comment
Assignees
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Document Intelligence needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that

Comments

@galvangoh

  • Package Name: azure-ai-documentintelligence
  • Package Version: 1.0.0b2
  • Operating System: Windows
  • Python Version: 3.10.9

Describe the bug
I trained the custom model via Studio on two differently titled documents that share a very similar template. The label returned by the custom model is later used to decide the logic flow of my application. Sometimes document A gets classified as document B, and vice versa.
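
To make the impact concrete, label-driven routing of this kind can be sketched as below. This is a minimal sketch under stated assumptions, not the application's actual code: `route_document`, the handler functions, the label names `doc_a` and `doc_b`, and the 0.8 threshold are all hypothetical, and the `doc_type` and `confidence` arguments are assumed to correspond to the document type and confidence score returned with each classification result.

```python
from typing import Callable, Dict


# Hypothetical downstream flows, one per classifier label.
def handle_doc_a() -> str:
    return "flow_a"


def handle_doc_b() -> str:
    return "flow_b"


# Hypothetical mapping from classifier label to application flow.
HANDLERS: Dict[str, Callable[[], str]] = {
    "doc_a": handle_doc_a,
    "doc_b": handle_doc_b,
}


def route_document(doc_type: str, confidence: float, threshold: float = 0.8) -> str:
    """Pick a flow from the classified label, falling back to manual review.

    Unknown labels and low-confidence predictions are not routed
    automatically, so a misclassification between two look-alike
    templates is less likely to trigger the wrong flow silently.
    """
    handler = HANDLERS.get(doc_type)
    if handler is None or confidence < threshold:
        return "manual_review"
    return handler()
```

A threshold like this does not make the classifier more accurate; it only limits the damage when documents A and B are confused with low confidence.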

The documentation states: "Custom classification models are deep-learning-model types that combine layout and language features to accurately detect and identify documents...". I don't think "layout" here refers to the layout model, because the layout model extracts blocks of text, which custom classification does not do. Unless there is a way to compose prebuilt and custom models, how can I classify my documents more reliably? I'm happy to stay on the current version of the API and will only upgrade if there are improvements to the classification capability of the base neural model itself.

The screenshot below shows the two documents, each of which I want to classify under its own label (see top right).

Questions:

  1. As the documents are titled differently, does the custom model pick up the title as a feature to distinguish them?
  2. Handwriting and stamps can appear at random locations in the document. Does the custom model pick these up as features during training?
  3. What kind of training data, and how much of it, is sufficient for both documents to be classified correctly?

Expected behavior
Accurate classification of differently titled documents even though they share the same template.

Screenshots
Image

@github-actions github-actions bot added customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Dec 24, 2024
@xiangyan99 xiangyan99 added Client This issue points to a problem in the data-plane of the library. Document Intelligence and removed needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. labels Dec 30, 2024
@xiangyan99
Member

Thanks for the feedback. We’ll investigate ASAP.

@github-actions github-actions bot added the needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team label Dec 30, 2024

3 participants