
📜 [NLLP 2022] "Efficient Deep Learning-based Sentence Boundary Detection in Legal Text", Reshma Sheik and Gokul T. Adethya and Dr. S. Jaya Nirmala


Efficient Deep Learning-based Sentence Boundary Detection in Legal Text

Accepted at NLLP 2022

Paper: https://aclanthology.org/2022.nllp-1.18


Introduction

Sentence Boundary Detection (SBD) is a key component of the Natural Language Processing (NLP) pipeline, and errors in SBD can propagate to later processing steps and reduce their performance. In well-defined corpora, a few criteria based on punctuation and capitalization are enough to identify sentence boundaries. Legal text is harder: its complex structure introduces several grammatical ambiguities that make SBD difficult. In this paper, we train a neural network framework to identify the end of a sentence in legal text. We evaluated several state-of-the-art deep learning models, analyzed their performance, and found that a Convolutional Neural Network (CNN) outperformed the other deep learning frameworks. We also compared the results with rule-based, statistical, and transformer-based frameworks. The best neural network model outscored the popular rule-based framework by 8% in F1 score. Although domain-specific statistical models perform slightly better, the trained CNN is 80 times faster at run time and does not require much feature engineering. Furthermore, even after extensive pretraining, the transformer models fall short of the best deep learning model in overall performance.
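To make the ambiguity concrete, here is a minimal sketch of a punctuation-and-capitalization rule and a boundary-level F1 score. The rule, the function names, and the example sentences are assumptions chosen for illustration. This is not the code in this repository, nor the rule-based framework evaluated in the paper.

```python
import re

# Hypothetical baseline rule: a sentence ends at '.', '?' or '!' when the
# next non-space character is an uppercase letter.
BOUNDARY = re.compile(r"(?<=[.?!])\s+(?=[A-Z])")


def naive_split(text: str) -> list[str]:
    """Split text into sentences using only the rule above."""
    return [s for s in BOUNDARY.split(text.strip()) if s]


def boundary_offsets(sentences: list[str], text: str) -> set[int]:
    """Character offset in text at which each sentence ends."""
    offsets, pos = set(), 0
    for sentence in sentences:
        pos = text.index(sentence, pos) + len(sentence)
        offsets.add(pos)
    return offsets


def f1(pred: set[int], gold: set[int]) -> float:
    """F1 score over predicted versus gold sentence-end offsets."""
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Illustrative inputs (not taken from the paper's dataset).
plain = "The court adjourned. The jury left."
legal = "See Roe v. Wade, 410 U.S. 113 (1973). The appeal was denied."
gold_legal = [
    "See Roe v. Wade, 410 U.S. 113 (1973).",
    "The appeal was denied.",
]

print(naive_split(plain))  # two sentences, as intended
print(naive_split(legal))  # three pieces: the rule wrongly splits after "v."
print(f1(boundary_offsets(naive_split(legal), legal),
         boundary_offsets(gold_legal, legal)))
```

On the plain example the rule finds both sentences. On the legal citation it adds a false boundary after "v.", which lowers precision and therefore F1. Abbreviations followed by a capitalized word are one example of why simple rules struggle on legal text.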

Results


Citation

If you find our code implementation helpful for your own research or work, please cite our paper.

@inproceedings{sheik-etal-2022-efficient,
    title = "Efficient Deep Learning-based Sentence Boundary Detection in Legal Text",
    author = "Sheik, Reshma  and
      T, Gokul  and
      Nirmala, S",
    booktitle = "Proceedings of the Natural Legal Language Processing Workshop 2022",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates (Hybrid)",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.nllp-1.18",
    pages = "208--217",
    abstract = "A key component of the Natural Language Processing (NLP) pipeline is Sentence Boundary Detection (SBD). Erroneous SBD could affect other processing steps and reduce performance. A few criteria based on punctuation and capitalization are necessary to identify sentence borders in well-defined corpora. However, due to several grammatical ambiguities, the complex structure of legal data poses difficulties for SBD. In this paper, we have trained a neural network framework for identifying the end of the sentence in legal text. We used several state-of-the-art deep learning models, analyzed their performance, and identified that Convolutional Neural Network(CNN) outperformed other deep learning frameworks. We compared the results with rule-based, statistical, and transformer-based frameworks. The best neural network model outscored the popular rule-based framework with an improvement of 8{\%} in the F1 score. Although domain-specific statistical models have slightly improved performance, the trained CNN is 80 times faster in run-time and doesn{'}t require much feature engineering. Furthermore, after extensive pretraining, the transformer models fall short in overall performance compared to the best deep learning model.",
}
