From b41f6de87295ed18a9654d281558b6e4e8dc287f Mon Sep 17 00:00:00 2001 From: Suraj Baloni Date: Wed, 22 May 2024 09:46:50 +0530 Subject: [PATCH] Update supported doccano version --- .../labeling-text-using-doccano.ipynb | 161 +++++++++++++++++- 1 file changed, 160 insertions(+), 1 deletion(-) diff --git a/guide/14-deep-learning/labeling-text-using-doccano.ipynb b/guide/14-deep-learning/labeling-text-using-doccano.ipynb index 7d236c049e..1b820a374c 100644 --- a/guide/14-deep-learning/labeling-text-using-doccano.ipynb +++ b/guide/14-deep-learning/labeling-text-using-doccano.ipynb @@ -1 +1,160 @@ -{"cells": [{"cell_type": "markdown", "metadata": {}, "source": ["# Labeling text using Doccano\n", "\n", "[Doccano](https://github.com/doccano/doccano) is an open source text annotation tool. It can be used to create labeled datasets for:\n", "- Text classification\n", "- Entity extraction \n", "- Sequence to sequence translation \n", "\n", "Doccano can be used to create labeled data for training the `EntityRecongnizer` model in `arcgis.learn`.\n", "\n", "This software is created by: Hiroki Nakayama and\n", " Takahiro Kubo and\n", " Junya Kamura and\n", " Yasufumi Taniguchi and\n", " Xu Liang"]}, {"cell_type": "markdown", "metadata": {"toc": true}, "source": ["

Table of Contents

\n", "
"]}, {"cell_type": "markdown", "metadata": {}, "source": ["# Deploying doccano for data labeling"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## For Windows"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Method 1 (Using docker desktop requires Microsoft Windows 10 Professional or Enterprise 64-bit):\n", "\n", "1. Install [docker for desktop](https://hub.docker.com/editions/community/docker-ce-desktop-windows).\n", "2. Launch _Command Prompt(cmd.exe)_ as _Administrator_ and run the below commands: \n", " - `docker pull doccano/doccano:1.2.4`\n", " - `docker container create --name doccano -e \"ADMIN_USERNAME=admin\" -e \"ADMIN_EMAIL=admin@example.com\" -e \"ADMIN_PASSWORD=password\" -p 8000:8000 doccano/doccano:1.2.4`\n", " - `docker start doccano`\n", "3. You can now access Doccano at _http://localhost:8000_"]}, {"cell_type": "markdown", "metadata": {}, "source": ["### Method 2:\n", "1. Download or clone the [arcgis-python-api](https://github.com/Esri/arcgis-python-api) githup repo.\n", "2. Navigate to `misc/tools/doccano_deployment` folder.\n", "3. Run _**install.bat**_ as administrator.\n", "4. On the command prompt, you will be asked to create your username and password for accessing Doccano.\n", "5. Once the install script completes, you should have Doccano running on your local system.\n", "6. Open your browser and go to _http://localhost:8000/_"]}, {"cell_type": "markdown", "metadata": {}, "source": ["## For Linux\n", "\n", "1. Install [Docker Engine (Community)](https://hub.docker.com/search?q=&type=edition&offering=community&operating_system=linux) for your linux distribution. \n", "2. 
Launch a _terminal_ and run the below commands:\n", " - `sudo docker pull doccano/doccano:1.2.4`\n", " - `sudo docker container create --name doccano -e \"ADMIN_USERNAME=admin\" -e \"ADMIN_EMAIL=admin@example.com\" -e \"ADMIN_PASSWORD=password\" -p 80:8000 doccano/doccano:1.2.4`\n", " - `docker start doccano`\n", " - You can modify the ADMIN_USERNAME, ADMIN_EMAIL and ADMIN_PASSWORD values.\n", "3. You can now access Doccano at _http://localhost:8000_ \n", " "]}, {"cell_type": "markdown", "metadata": {}, "source": ["# How to label training data for _named entity recognition_ with _doccano_.\n", "\n", "1.\tAfter Doccano has been deployed to the local machine, go to [Doccano hompage](http://localhost:8000) and login with your credentials.\n", "2.\tCreate new project with project type *_'Sequence labeling'_*: \n", "\n", "\n", "\n", "3. To import data for annotation, go to _Dataset_ from the left panel then click on _Actions_ > _Import dataset_. \n", "\n", "\n", "\n", "4. Select 'JSONL' and then click on 'Select file(s)' and point it to the reports file (docanno_deployment\\reports_label.jsonl). Alternatively, text documents can also be uploaded using the \u2018Plain text\u2019 option.\n", "\n", "5. After the file has been imported, you will see the documents loaded on the screen. \n", "6. Click on 'Start annotation' from the top menu bar.\n", "\n", "\n", "7. All the documents are pre-labeled, just 3 (document number 2,3 & 4) are intentionally left unlabeled for you to try labeling. Analyze the first labeled document and then move on to second document (use the bottom navigation bar for sifting through the docs). Mark sequences with your mouse and select the relevant title.\n", "\n", "\n", "8.\tNew labels can also be created by navigating to \u2018Labels\u2019 from the ;eft panel.\n", "9.\tOnce all the documents have been labeled, go to 'Dataset' > 'Actions' > 'Export dataset'.\n", "10.\tSelect JSONL(Text-Labels).\n", "11. 
Set an export file name.\n", "11.\tClick Export.\n", "\n", "\n", "\n", "The downloaded file can be used to train an `EntityRecognizer` model from `arcgis.learn`. You can find a sample notebook [here](https://developers.arcgis.com/python/sample-notebooks/information-extraction-from-madison-city-crime-incident-reports-using-deep-learning/#Publishing-the-results-as-feature-layer) "]}], "metadata": {"kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}, "language_info": {"codemirror_mode": {"name": "ipython", "version": 3}, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.2"}, "toc": {"base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": {}, "toc_section_display": true, "toc_window_display": true}}, "nbformat": 4, "nbformat_minor": 4} \ No newline at end of file +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Labeling text using Doccano\n", + "\n", + "[Doccano](https://github.com/doccano/doccano) is an open-source text annotation tool. It can be used to create labeled datasets for:\n", + "- Text classification\n", + "- Entity extraction \n", + "- Sequence to sequence translation \n", + "\n", + "Doccano can be used to create labeled data for training the `EntityRecognizer` model in `arcgis.learn`.\n", + "\n", + "This software was created by: Hiroki Nakayama and\n", + " Takahiro Kubo and\n", + " Junya Kamura and\n", + " Yasufumi Taniguchi and\n", + " Xu Liang" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "toc": true + }, + "source": [ + "

Table of Contents

\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Deploying doccano for data labeling" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## For Windows" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Method 1 (using Docker Desktop; requires Microsoft Windows 10 Professional or Enterprise 64-bit):\n", + "\n", + "1. Install [Docker Desktop](https://hub.docker.com/editions/community/docker-ce-desktop-windows).\n", + "2. Launch _Command Prompt (cmd.exe)_ as _Administrator_ and run the commands below: \n", + " - `docker pull doccano/doccano:1.2.0`\n", + " - `docker container create --name doccano -e \"ADMIN_USERNAME=admin\" -e \"ADMIN_EMAIL=admin@example.com\" -e \"ADMIN_PASSWORD=password\" -p 8000:8000 doccano/doccano:1.2.0`\n", + " - `docker start doccano`\n", + "3. You can now access Doccano at _http://localhost:8000_" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Method 2:\n", + "1. Download or clone the [arcgis-python-api](https://github.com/Esri/arcgis-python-api) GitHub repo.\n", + "2. Navigate to the `misc/tools/doccano_deployment` folder.\n", + "3. Run _**install.bat**_ as administrator.\n", + "4. At the command prompt, you will be asked to create a username and password for accessing Doccano.\n", + "5. Once the install script completes, you should have Doccano running on your local system.\n", + "6. Open your browser and go to _http://localhost:8000/_" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## For Linux\n", + "\n", + "1. Install [Docker Engine (Community)](https://hub.docker.com/search?q=&type=edition&offering=community&operating_system=linux) for your Linux distribution. \n", + "2. 
Launch a _terminal_ and run the commands below:\n", + " - `sudo docker pull doccano/doccano:1.2.4`\n", + " - `sudo docker container create --name doccano -e \"ADMIN_USERNAME=admin\" -e \"ADMIN_EMAIL=admin@example.com\" -e \"ADMIN_PASSWORD=password\" -p 8000:8000 doccano/doccano:1.2.4`\n", + " - `sudo docker start doccano`\n", + " - You can modify the ADMIN_USERNAME, ADMIN_EMAIL and ADMIN_PASSWORD values.\n", + "3. You can now access Doccano at _http://localhost:8000_ \n", + " " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# How to label training data for _named entity recognition_ with _doccano_.\n", + "\n", + "1.\tAfter Doccano has been deployed to the local machine, go to the [Doccano homepage](http://localhost:8000) and log in with your credentials.\n", + "2.\tCreate a new project with project type *_'Sequence labeling'_*: \n", + "\n", + "\n", + "\n", + "3. To import data for annotation, go to _Dataset_ in the left panel, then click _Actions_ > _Import dataset_. \n", + "\n", + "\n", + "\n", + "4. Select 'JSONL', then click 'Select file(s)' and point it to the reports file (doccano_deployment\reports_label.jsonl). Alternatively, text documents can be uploaded using the ‘Plain text’ option.\n", + "\n", + "5. After the file has been imported, you will see the documents loaded on the screen. \n", + "6. Click on 'Start annotation' from the top menu bar.\n", + "\n", + "\n", + "7. All the documents are pre-labeled except three (documents 2, 3, and 4), which are intentionally left unlabeled for you to try labeling. Analyze the first labeled document and then move on to the second document (use the bottom navigation bar to move through the documents). 
Mark sequences with your mouse and select the relevant title.\n", + "\n", + "\n", + "8.\tNew labels can also be created by navigating to ‘Labels’ from the left panel.\n", + "9.\tOnce all the documents have been labeled, go to 'Dataset' > 'Actions' > 'Export dataset'.\n", + "10.\tSelect JSONL(Text-Labels).\n", + "11. Set an export file name.\n", + "12.\tClick Export.\n", + "\n", + "\n", + "\n", + "The downloaded file can be used to train an `EntityRecognizer` model from `arcgis.learn`. You can find a sample notebook [here](https://developers.arcgis.com/python/sample-notebooks/information-extraction-from-madison-city-crime-incident-reports-using-deep-learning/#Publishing-the-results-as-feature-layer) " + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.8" + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": true, + "sideBar": true, + "skip_h1_title": false, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": true, + "toc_position": {}, + "toc_section_display": true, + "toc_window_display": true + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}