Natural language processing

Project Setup

git clone https://github.com/varvar/nlp.git
cd nlp
npm install
npm update

Run

npm start

Once the server has started, it is accessible at http://localhost:3000/

API Reference

Perform tokenization process

Runs the tokenizer on the .txt file at the provided URL. Processing a 5 MB file takes about 10 seconds on average, depending on download speed.
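
As a rough sanity check on that timing, the sample response further down reports a 4.25 MB file, a download speed of 601.7 kB/sec, and a total processing time of 7 sec. The sketch below redoes that arithmetic. It assumes decimal units (1 MB = 1000 kB), which this README does not confirm, and `estimateDownloadSeconds` is a hypothetical helper, not part of the project.

```javascript
// Hypothetical helper: estimates download time from file size and speed.
// Assumption: decimal units, i.e. 1 MB = 1000 kB.
function estimateDownloadSeconds(fileSizeMB, speedKBPerSec) {
  if (speedKBPerSec <= 0) {
    throw new Error('Download speed must be positive');
  }
  return (fileSizeMB * 1000) / speedKBPerSec;
}

// Values taken from the sample success response of /process.
const seconds = estimateDownloadSeconds(4.25, 601.7);
console.log(seconds.toFixed(1)); // prints "7.1"
```

If the unit assumption holds, the download alone would account for roughly the whole reported total, which fits the remark that processing time depends on download speed.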

  • URL

    http://localhost:3000/process

  • Method:

    POST

  • Data Params

      {"file":"http://www.gutenberg.org/cache/epub/10/pg10.txt"}
    
    

    Note: the file property is required.

  • Success Response:

    • Code: 200
      Content:
      {
          "processStatus": "Done",
          "fileName": "pg10.txt",
          "chunksProcessed": 148,
          "state": {
              "totalProcessingTime": "7 sec",
              "fileSize": "4.25 MB",
              "downloadSpeed": "601.7 kB/sec"
          }
      }
      
  • Error Response:

    • Code: 500 SERVER ERROR
      Content: { errorObj }

    OR

    • Code: 400 BAD REQUEST
      Content: { "message": "File value can not be empty!" }
  • Notes:

    The "fileName" property of the response object is required for the next API call, which retrieves the words list, because it identifies the relevant JSON data. The remaining properties are informational only.

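A minimal client sketch for the request described above. It assumes Node 18 or later (global `fetch`) and that the server accepts the data params as a JSON body with a `Content-Type: application/json` header, which this README does not state explicitly. `buildProcessRequest` and `processFile` are hypothetical names used for illustration only.

```javascript
// Hypothetical helper: builds the URL and fetch options for POST /process.
// Assumption: the server reads the "file" property from a JSON request body.
function buildProcessRequest(fileUrl, baseUrl = 'http://localhost:3000') {
  if (!fileUrl) {
    // Mirrors the documented 400 message so the mistake is caught client-side.
    throw new Error('File value can not be empty!');
  }
  return {
    url: `${baseUrl}/process`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ file: fileUrl }),
    },
  };
}

// Sends the request and returns the "fileName" identifier from the response.
// fetchImpl can be swapped out, for example in tests.
async function processFile(fileUrl, fetchImpl = fetch) {
  const { url, options } = buildProcessRequest(fileUrl);
  const res = await fetchImpl(url, options);
  const body = await res.json();
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}: ${JSON.stringify(body)}`);
  }
  return body.fileName;
}
```

Usage, with the server running locally: `processFile('http://www.gutenberg.org/cache/epub/10/pg10.txt').then(console.log)` should log the file name to pass to the words endpoints.
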
Get words list and repetitions with sorting options

Returns JSON data containing the words list and repetition counts for the provided file name.

  • URL

    http://localhost:3000/words/{fileName}/{sort}/{order}

    For example: http://localhost:3000/words/pg10.txt/repetitions/desc

  • Method:

    GET

  • URL Params

    Required:

    fileName=[string]

    Optional:

    sort=[repetitions/word]
    order=[asc/desc]
    
  • Success Response:

    • Code: 200
      Content:
      [
        {
          "word": "project",
          "repetitions": 5
        },
        {
          "word": "gutenberg",
          "repetitions": 4
        },
        {
          "word": "ebook",
          "repetitions": 8
        },
        ....
      ]
      
  • Error Response:

    • Code: 500 SERVER ERROR
      Content:
      {
        "message": {
            "errno": -2,
            "code": "ENOENT",
            "syscall": "open",
            "path": "/nlp/app/controllers/../../files/pgs10.txt.json"
        }
      }
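
The URL scheme above can be sketched as a small helper that validates the optional parameters before the request is made. This is an illustration, not the project's code: `buildWordsUrl` and `getWords` are hypothetical names, global `fetch` assumes Node 18 or later, and the rule that `order` can only be given together with `sort` is an assumption based on the parameters being positional path segments.

```javascript
const SORT_FIELDS = ['repetitions', 'word'];
const ORDERS = ['asc', 'desc'];

// Hypothetical helper: builds /words/{fileName}/{sort}/{order}.
function buildWordsUrl(fileName, sort, order, baseUrl = 'http://localhost:3000') {
  if (!fileName) {
    throw new Error('fileName is required');
  }
  if (sort !== undefined && !SORT_FIELDS.includes(sort)) {
    throw new Error(`sort must be one of: ${SORT_FIELDS.join(', ')}`);
  }
  if (order !== undefined && !ORDERS.includes(order)) {
    throw new Error(`order must be one of: ${ORDERS.join(', ')}`);
  }
  // Assumption: order is the third path segment, so it needs sort before it.
  if (order !== undefined && sort === undefined) {
    throw new Error('order requires sort');
  }
  const segments = [fileName, sort, order]
    .filter((segment) => segment !== undefined)
    .map((segment) => encodeURIComponent(segment));
  return `${baseUrl}/words/${segments.join('/')}`;
}

// Fetches the words list; rejects with the server's error body on failure.
async function getWords(fileName, sort, order, fetchImpl = fetch) {
  const res = await fetchImpl(buildWordsUrl(fileName, sort, order));
  const body = await res.json();
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}: ${JSON.stringify(body)}`);
  }
  return body;
}
```

Here `fileName` is the value returned by the /process call, for example `getWords('pg10.txt', 'repetitions', 'desc')`.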
      
      

Pipe words list to client

Serves the JSON data for the provided file name through a readable stream. Streaming the response can improve response time and performance.

  • URL

    http://localhost:3000/pipe/{fileName}

    For example: http://localhost:3000/pipe/pg10.txt

  • Method:

    GET

  • URL Params

    Required:

    fileName=[string]

  • Success Response:

    • Code: 200
      Content:
      [
        {
          "word": "project",
          "repetitions": 5
        },
        {
          "word": "gutenberg",
          "repetitions": 4
        },
        {
          "word": "ebook",
          "repetitions": 8
        },
        ....
      ]
      
  • Error Response:

    • Code: 500 SERVER ERROR
      Content:
      {
        "message": {
            "errno": -2,
            "code": "ENOENT",
            "syscall": "open",
            "path": "/nlp/app/controllers/../../files/pgs10.txt.json"
        }
      }
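
One way a client might consume the streamed response, shown as a sketch under stated assumptions: Node 18 or later, where global `fetch` exists and the response body can be read with `for await`. `parseChunks` and `pipeWords` are hypothetical names. The sketch collects the chunks before parsing because `JSON.parse` needs the complete document, so it demonstrates reading the stream rather than incremental processing.

```javascript
// Decodes an ordered list of byte chunks and parses the result as JSON.
// The { stream: true } option keeps multi-byte characters intact when
// they are split across two chunks.
function parseChunks(chunks) {
  const decoder = new TextDecoder('utf-8');
  let text = '';
  for (const chunk of chunks) {
    text += decoder.decode(chunk, { stream: true });
  }
  text += decoder.decode(); // flush any buffered bytes
  return JSON.parse(text);
}

// Hypothetical helper: reads GET /pipe/{fileName} chunk by chunk.
async function pipeWords(fileName, baseUrl = 'http://localhost:3000', fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/pipe/${encodeURIComponent(fileName)}`);
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  const chunks = [];
  for await (const chunk of res.body) {
    chunks.push(chunk);
  }
  return parseChunks(chunks);
}
```

Usage, with the server running locally: `pipeWords('pg10.txt').then((words) => console.log(words.length))`.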
      
      

Tests

Run "npm test" inside the project folder.
