module dsconcept missing #5

bluetyson · 2020-08-13T10:30:31Z

Sorry, I should have put it here 👎

~/concept-tagging-api/tests$ python context.py
Traceback (most recent call last):
  File "context.py", line 8, in <module>
    import app
  File "/home/ubuntu/concept-tagging-api/service/app.py", line 6, in <module>
    import dsconcept.get_metrics as gm
ModuleNotFoundError: No module named 'dsconcept'

And regarding the other repo:

fatal: unable to access 'https://developer.nasa.gov/DataSquad/classifier_scripts.git/': Could not resolve host: developer.nasa.gov
ERROR: Command errored out with exit status 128: git clone -q https://developer.nasa.gov/DataSquad/classifier_scripts.git

abuonomo · 2020-08-13T13:33:55Z

Did you try installing the library with this command?

pip install git+https://github.com/nasa/[email protected]_source_release#egg=dsconcept

What was the exact command you ran that produced the "fatal: unable to access" error?
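One quick way to confirm that an install has resolved the ModuleNotFoundError is to check whether Python can locate the required modules before starting the service. The sketch below is an illustrative helper. The name find_missing_modules is hypothetical and not part of concept-tagging-api. It uses the standard-library importlib.util.find_spec, which locates a top-level module without executing it. The module list is an assumption: dsconcept (imported by service/app.py in the traceback) and en_core_web_sm (the spaCy model mentioned later in the thread).

```python
import importlib.util
from typing import Iterable, List


def find_missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that Python cannot locate, in input order."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # For a dotted name whose parent package is absent, find_spec
            # raises ModuleNotFoundError (a subclass of ImportError).
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical list of modules the service needs at startup.
    required = ["dsconcept", "en_core_web_sm"]
    missing = find_missing_modules(required)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required modules found.")
```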

bluetyson · 2020-08-13T22:16:19Z

I was trying to run the app. No, I have not tried that one; I will give it a shot later today, thanks.

bluetyson · 2020-08-17T05:25:24Z

That appears to have worked, thank you Anthony. Now I just have to wait to be given access so I can try it.

abuonomo · 2020-08-17T15:47:04Z

Glad to hear it. Sorry about those misleading links. Would love to hear how it ends up working.

bluetyson · 2020-08-24T07:38:23Z

Anthony, I have it running now. I need to test it seriously soon; I probably have a project full of documents coming too.

bluetyson · 2020-08-27T03:01:11Z

Speaking of which, do you have a recommended size limit for firing a batch at it?

bluetyson · 2020-08-27T23:39:31Z

For context: this generally works on short abstracts. If I start throwing long reports at it to see what comes out, what is a reasonable length for the model to handle? For example, would a 10 MB text dump of a report be like 10K 200-word abstracts?

bluetyson · 2020-08-28T06:51:19Z

At the moment, approximately 4 MB seems to be the limit for a single text payload (at least at this machine's capacity). The 7,000-odd documents I am trying range from 1 KB up to about 7 MB.

This is one example: https://mer-env.s3-ap-southeast-2.amazonaws.com/ENV01111.pdf-textract-text.txt
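To find which of the roughly 7,000 text dumps exceed the ~4 MB single-payload limit observed here, a directory scan by file size is enough. This is a sketch under stated assumptions: the files are plain .txt files in one directory, "4 MB" is taken as 4 × 1024 × 1024 bytes, and the helper name oversized_files is hypothetical.

```python
from pathlib import Path
from typing import List, Tuple

# Assumed reading of the ~4 MB limit observed in this thread (binary MB).
FOUR_MB = 4 * 1024 * 1024


def oversized_files(directory: str, limit: int = FOUR_MB,
                    pattern: str = "*.txt") -> List[Tuple[str, int]]:
    """Return (file name, size in bytes) for files matching pattern whose
    size exceeds limit, largest first."""
    hits = []
    for path in Path(directory).glob(pattern):
        if path.is_file():
            size = path.stat().st_size
            if size > limit:
                hits.append((path.name, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Files flagged this way could then be summarized or split before they are sent.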

abuonomo · 2020-08-30T16:52:27Z

To optimize speed, my tests (using abstracts) indicated that a batch size of about 700 documents is ideal. More than that gave diminishing returns on speed. If you are working with longer documents, the ideal batch size may be lower. However, I would recommend automatically summarizing the text before sending it to the API, as other tests indicated that this is best for getting the most accurate tags.
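Below is a minimal sketch of client-side batching based on these numbers. It caps each batch at about 700 documents and adds a byte cap for longer reports. The function name batch_documents and the 1,000,000-byte default are illustrative assumptions. The byte cap follows the "megabyte or so" estimate later in this thread and is not a documented API limit. How each batch is then posted to the service is not covered here.

```python
from typing import Iterator, List


def batch_documents(docs: List[str], max_docs: int = 700,
                    max_bytes: int = 1_000_000) -> Iterator[List[str]]:
    """Yield batches of at most max_docs documents and, where possible,
    at most max_bytes of UTF-8 text.

    A single document larger than max_bytes is yielded as its own batch;
    such documents should be split or summarized first.
    """
    if max_docs < 1 or max_bytes < 1:
        raise ValueError("max_docs and max_bytes must be positive")
    batch: List[str] = []
    batch_bytes = 0
    for doc in docs:
        size = len(doc.encode("utf-8"))
        # Close the current batch if adding this document would exceed a cap.
        if batch and (len(batch) >= max_docs or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch
```

For 1,500 short abstracts this yields batches of 700, 700 and 100 documents. The byte cap only matters once individual texts become large.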

bluetyson · 2020-08-31T10:31:44Z

Thanks Anthony, so maybe a megabyte or so per batch in that case?

I am going to do the summarizer as well for the same set and compare.
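The thread recommends summarizing long texts before tagging but does not name a summarizer. As a stand-in, here is a simple frequency-based extractive summarizer. This is a swapped-in technique, not the one used in the original tests. It scores each sentence by the average in-document frequency of its non-stop-words and keeps the top max_sentences sentences in their original order. The stop-word list, the naive sentence splitter, and the function names are all illustrative assumptions.

```python
import re
from collections import Counter
from typing import List

# Deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was",
    "were", "with",
}


def split_sentences(text: str) -> List[str]:
    # Naive split on '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text: str) -> List[str]:
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(text: str, max_sentences: int = 5) -> str:
    """Keep the max_sentences highest-scoring sentences, in original order."""
    sentences = split_sentences(text)
    if len(sentences) <= max_sentences:
        return " ".join(sentences)
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        tokens = content_words(sentence)
        # Average frequency, so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```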

abuonomo · 2020-09-04T15:39:18Z

I'm not totally sure, but that sounds about right.
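The size estimates in this exchange can be checked with back-of-the-envelope arithmetic. The sketch assumes about 6 bytes of plain ASCII text per English word (roughly five letters plus a space). That figure is an assumption, not a measurement from the thread, and the function names are illustrative.

```python
def estimated_bytes(num_docs: int, words_per_doc: int = 200,
                    bytes_per_word: float = 6.0) -> float:
    """Rough plain-text size of num_docs documents of words_per_doc words."""
    return num_docs * words_per_doc * bytes_per_word


def equivalent_docs(total_bytes: float, words_per_doc: int = 200,
                    bytes_per_word: float = 6.0) -> float:
    """How many words_per_doc-word documents fit in total_bytes."""
    return total_bytes / (words_per_doc * bytes_per_word)


if __name__ == "__main__":
    # A 700-abstract batch at 200 words per abstract.
    print(f"700 abstracts ~ {estimated_bytes(700) / 1e6:.2f} MB")
    # A 10 MB report expressed as 200-word abstracts.
    print(f"10 MB ~ {equivalent_docs(10e6):,.0f} abstracts")
```

Under these assumptions, a 700-abstract batch comes to about 0.84 MB, consistent with "a megabyte or so". A 10 MB report corresponds to roughly 8,300 200-word abstracts, the same order of magnitude as the 10K guess.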

bluetyson · 2020-09-06T02:22:41Z

It seems that documents of around 3 MB (presumably with lots of words, etc.) can use up 32 GB of memory and core dump the machine.
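One possible mitigation, not something proposed by the thread's participants, is to split very large documents into pieces below a byte cap before sending them. The sketch breaks on blank lines (paragraphs) where possible and falls back to cutting on character boundaries, so multi-byte UTF-8 characters are never split. The 1,000,000-byte default and the name split_text are assumptions. Merging the tags returned for each piece back into one result per document is not covered here.

```python
from typing import List


def split_text(text: str, max_bytes: int = 1_000_000) -> List[str]:
    """Split text into pieces of at most max_bytes UTF-8 bytes, preferring
    paragraph boundaries ("\\n\\n") and cutting by characters otherwise."""
    if max_bytes < 4:
        raise ValueError("max_bytes must fit at least one UTF-8 character")
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = para if not current else current + "\n\n" + para
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        # A single paragraph may itself be too large: cut it by characters.
        while len(para.encode("utf-8")) > max_bytes:
            prefix = para.encode("utf-8")[:max_bytes]
            # Drop a trailing partial character; count whole characters kept.
            cut = len(prefix.decode("utf-8", errors="ignore"))
            chunks.append(para[:cut])
            para = para[cut:]
        current = para
    if current:
        chunks.append(current)
    return chunks
```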

RichardScottOZ · 2023-10-10T07:49:40Z

Notes on installing this repo: change scm_version to False to get it to install.
You may also need system packages, for example cargo (the Rust package manager):
sudo apt-get install cargo

RichardScottOZ · 2023-10-10T07:54:48Z

And if running locally: python3 -m spacy download en_core_web_sm

JustinGOSSES added the wontfix label (This will not be worked on) on Jul 6, 2021.
