Norway? #6

Open
bluetyson opened this issue Aug 14, 2020 · 23 comments
Labels
wontfix This will not be worked on

Comments

@bluetyson

Anecdotally, 'norway' seems to be a popular keyword, even, for example, when looking at SA exploration reports. Or here:
This was from a proposal, by a Canadian company, to do predictive analytics on a bunch of geoscience stuff with some dubious analogies.

"sti_keywords": [
[
{
"keyword": "education",
"probability": 0.9999927878379822,
"unstemmed": "EDUCATION"
},
{
"keyword": "prediction",
"probability": 0.9999526143074036,
"unstemmed": "PREDICTIONS"
},
{
"keyword": "stability",
"probability": 0.9999324679374695,
"unstemmed": "STABILITY"
},
{
"keyword": "frequency",
"probability": 0.9998124837875366,
"unstemmed": "FREQUENCIES"
},
{
"keyword": "simulation",
"probability": 0.9997246265411377,
"unstemmed": "SIMULATION"
},
{
"keyword": "detection",
"probability": 0.9996479153633118,
"unstemmed": "DETECTION"
},
{
"keyword": "organization",
"probability": 0.999604344367981,
"unstemmed": "ORGANIZATIONS"
},
{
"keyword": "information system",
"probability": 0.9994807243347168,
"unstemmed": "INFORMATION SYSTEMS"
},
{
"keyword": "test",
"probability": 0.9952819347381592,
"unstemmed": "TESTS"
},
{
"keyword": "management method",
"probability": 0.9947834014892578,
"unstemmed": "MANAGEMENT METHODS"
},
{
"keyword": "document",
"probability": 0.9927297234535217,
"unstemmed": "DOCUMENTS"
},
{
"keyword": "sensitivity analysi",
"probability": 0.9906492829322815,
"unstemmed": "SENSITIVITY ANALYSIS"
},
{
"keyword": "survey",
"probability": 0.9885730743408203,
"unstemmed": "SURVEYS"
},
{
"keyword": "measurement",
"probability": 0.9878258109092712,
"unstemmed": "MEASUREMENT"
},
{
"keyword": "on-line system",
"probability": 0.9819966554641724,
"unstemmed": "ON-LINE SYSTEMS"
},
{
"keyword": "probability theory",
"probability": 0.9808198809623718,
"unstemmed": "PROBABILITY THEORY"
},
{
"keyword": "decision making",
"probability": 0.9807706475257874,
"unstemmed": "DECISION MAKING"
},
{
"keyword": "statistical analysi",
"probability": 0.9802524447441101,
"unstemmed": "STATISTICAL ANALYSIS"
},
{
"keyword": "commerce",
"probability": 0.9727779626846313,
"unstemmed": "COMMERCE"
},
{
"keyword": "wind (meteorology)",
"probability": 0.9680668115615845,
"unstemmed": "WIND (METEOROLOGY)"
},
{
"keyword": "ranking",
"probability": 0.966843843460083,
"unstemmed": "RANKING"
},
{
"keyword": "norway",
"probability": 0.9654883742332458,
"unstemmed": "NORWAY"
},
{
"keyword": "turbulence",
"probability": 0.9554965496063232,
"unstemmed": "TURBULENCE"
},
{
"keyword": "environment",
"probability": 0.9523953795433044,
"unstemmed": "ENVIRONMENTS"
},
{
"keyword": "artificial intelligence",
"probability": 0.9461550712585449,
"unstemmed": "ARTIFICIAL INTELLIGENCE"
},
{
"keyword": "spectrum",
"probability": 0.9431440830230713,
"unstemmed": "SPECTRA"
},
{
"keyword": "supplying",
"probability": 0.9292123317718506,
"unstemmed": "SUPPLYING"
},
{
"keyword": "thesis",
"probability": 0.9177210330963135,
"unstemmed": "THESES"
},
{
"keyword": "industry",
"probability": 0.8984866738319397,
"unstemmed": "INDUSTRIES"
},
{
"keyword": "research management",
"probability": 0.8902236223220825,
"unstemmed": "RESEARCH MANAGEMENT"
},
{
"keyword": "logistic",
"probability": 0.8653072118759155,
"unstemmed": "LOGISTICS"
},
{
"keyword": "marketing",
"probability": 0.8555718660354614,
"unstemmed": "MARKETING"
},
{
"keyword": "material",
"probability": 0.8554256558418274,
"unstemmed": "MATERIALS"
},
{
"keyword": "exploration",
"probability": 0.8326903581619263,
"unstemmed": "EXPLORATION"
},
{
"keyword": "finance",
"probability": 0.7999643683433533,
"unstemmed": "FINANCE"
},
{
"keyword": "procedure",
"probability": 0.7985092997550964,
"unstemmed": "PROCEDURES"
},
{
"keyword": "plasmas (physics)",
"probability": 0.7585477828979492,
"unstemmed": "PLASMAS (PHYSICS)"
},
{
"keyword": "management system",
"probability": 0.7525011897087097,
"unstemmed": "MANAGEMENT SYSTEMS"
},
{
"keyword": "personnel",
"probability": 0.7381218075752258,
"unstemmed": "PERSONNEL"
},
{
"keyword": "quality control",
"probability": 0.6919220685958862,
"unstemmed": "QUALITY CONTROL"
},
{
"keyword": "binders (materials)",
"probability": 0.682461678981781,
"unstemmed": "BINDERS (MATERIALS)"
},
{
"keyword": "statistic",
"probability": 0.6598255634307861,
"unstemmed": "STATISTICS"
},
{
"keyword": "technology transfer",
"probability": 0.6495755910873413,
"unstemmed": "TECHNOLOGY TRANSFER"
},
{
"keyword": "economic",
"probability": 0.6489657163619995,
"unstemmed": "ECONOMICS"
},
{
"keyword": "environment management",
"probability": 0.6489007472991943,
"unstemmed": "ENVIRONMENT MANAGEMENT"
},
{
"keyword": "reliability",
"probability": 0.6394709348678589,
"unstemmed": "RELIABILITY"
},
{
"keyword": "methodology",
"probability": 0.5842968225479126,
"unstemmed": "METHODOLOGY"
},
{
"keyword": "life (durability)",
"probability": 0.560967743396759,
"unstemmed": "LIFE (DURABILITY)"
},
{
"keyword": "task",
"probability": 0.5498309135437012,
"unstemmed": "TASKS"
},
{
"keyword": "feedback",
"probability": 0.5493220686912537,
"unstemmed": "FEEDBACK"
},
{
"keyword": "embedding",
"probability": 0.5298750996589661,
"unstemmed": "EMBEDDING"
},
{
"keyword": "model",
"probability": 0.5171043276786804,
"unstemmed": "MODELS"
},
{
"keyword": "image",
"probability": 0.5064711570739746,
"unstemmed": "IMAGES"
},
{
"keyword": "market research",
"probability": 0.5063830018043518,
"unstemmed": "MARKET RESEARCH"
}
]
],
"topic_probabilities": [
[
{
"keyword": "social and information science",
"probability": 0.9999999967051496,
"unstemmed": "social and information sciences"
},
{
"keyword": "life science",
"probability": 0.9997857836273429,
"unstemmed": "life sciences"
},
{
"keyword": "mathematical and computer science",
"probability": 0.9995673040968835,
"unstemmed": "mathematical and computer sciences"
},
{
"keyword": "geoscience",
"probability": 0.6036738988592714,
"unstemmed": "geosciences"
},
{
"keyword": "aeronautic",
"probability": 0.006360614090616023,
"unstemmed": "aeronautics"
},
{
"keyword": "astronautic",
"probability": 0.0010483669327786213,
"unstemmed": "astronautics"
},
{
"keyword": "engineering",
"probability": 0.0006828873398622611,
"unstemmed": "engineering"
},
{
"keyword": "chemistry and material",
"probability": 0.0006251699807542186,
"unstemmed": "chemistry and materials"
},
{
"keyword": "space science",
"probability": 0.0004059099588450474,
"unstemmed": "space sciences"
},
{
"keyword": "physic",
"probability": 0.000008485140273070631,
"unstemmed": "physics"
},
{
"keyword": "general",
"probability": 4.569450859931134e-8,
"unstemmed": "general"
}
]
],
"topic_threshold": 1

@abuonomo
Contributor

That is strange. Not sure why Norway shows up. There is no mention of Norway in the report? Also, how long is the report? One thing to note is these models were trained on abstracts. The performance tends to degrade a bit with very long texts. We actually found that the performance nearly matched that of an abstract when we automatically summarized longer texts down to abstract length (for example, using sumy).
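
To make the "summarize down to abstract length" step concrete, here is a minimal sketch of extractive summarization using only the Python standard library. It is a stand-in for the idea, not sumy's API or algorithm: the function name `summarize`, the word-frequency scoring, and the default of five sentences are all assumptions made for illustration.

```python
import re
from collections import Counter


def summarize(text: str, max_sentences: int = 5) -> str:
    """Naive frequency-based extractive summary (a stand-in, not sumy)."""
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) <= max_sentences:
        return " ".join(sentences)

    # Count words of four or more letters across the whole document.
    freq = Counter(re.findall(r"[a-z]{4,}", text.lower()))

    def score(sentence: str) -> float:
        # Average document frequency of the sentence's words.
        words = re.findall(r"[a-z]{4,}", sentence.lower())
        return sum(freq[w] for w in words) / (len(words) or 1)

    # Pick the highest-scoring sentences, then restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

The shortened text would then be sent to the keyword service in place of the full report.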

@bluetyson
Author

No, and I have seen it in quite a few others... so it is relating Norway to something. So summarise, and then keyword the summary as well?

@bluetyson
Author

Sumy link says not found btw.

@JustinGOSSES
Contributor

I think Anthony was suggesting to use 'sumy' to summarize large documents down to abstract length.

It is weird that "Norway" is one of the predicted tags. Are there place names, proper names, or acronyms in the text that could relate to Norway?

@bluetyson
Author

Didn't look like it, and I am somewhat familiar with Norwegian and its place names. It came up sometimes in some purely South Australian things too. The above had a Canadian address in it. Might have to run it on some chunks to narrow it down.

@bluetyson
Author

That library does look handy, thanks.

@bluetyson
Author

Checked a few of them... maybe 20% of docs had it come up in this sample of 600-odd. Checked a few just then for the literal word 'norway' in the text: it isn't there, so it must be something related.

@bluetyson
Author

Took the example above and kept shrinking it to find the 'Norwayish' bit. As I did so, the Norway probability eventually declined to 0.018. I shrunk it further and then got Sweden at 0.025. The only proper nouns were two company names and SOW as an acronym. So, interesting.

@abuonomo
Contributor

I'm not sure what the particular reason is for Norway appearing. I think we would have to manually inspect the model for that individual word and see how it is weighting term importances. Could be a case of overfitting. When you say shrink, do you mean you are using automated summarization?

@JustinGOSSES
Contributor

JustinGOSSES commented Aug 17, 2020

Just saw this tool, might be useful for debugging this type of problem https://github.com/pair-code/lit

@bluetyson
Author

Anthony, no, I was taking sections of the text to try to see which ones indicated Norway. Justin, I saw that one yesterday and will have to investigate. Maybe a higher proportion of the abstracts mentioning Norway are exploration-geoscience related than others, so a longer document like that picks up more similarities.
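
The section-by-section approach can be sketched roughly as below. This is only an illustration: `keyword_probability` is a hypothetical callable that you would write yourself to send a chunk to the keyword service and return the probability it reports for one keyword (for example 'norway'); the chunk size of 200 words is likewise an arbitrary assumption.

```python
from typing import Callable, List, Tuple


def chunk_words(text: str, size: int = 200) -> List[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def score_chunks(
    text: str,
    keyword_probability: Callable[[str], float],  # hypothetical wrapper around the service
    size: int = 200,
) -> List[Tuple[int, float]]:
    """Return (chunk_index, probability) pairs, highest probability first."""
    scored = [
        (index, keyword_probability(chunk))
        for index, chunk in enumerate(chunk_words(text, size))
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Chunks near the top of the returned list are the ones to inspect by hand, or to split again with a smaller `size`.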

@bluetyson
Author

Tomorrow should be able to do a 'Norway' count on a fair set of documents.

@bluetyson
Author

7000 odd reports (some from mangled OCR).

Norway keyword percentage: 82.1
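
A count like this could be computed along the following lines. This is a sketch, not the script actually used: it assumes each response is a dict that directly contains the `sti_keywords` list-of-lists shown earlier in the thread, and the helper names and the 0.5 default threshold are made up for illustration.

```python
from typing import Iterable


def has_keyword(response: dict, keyword: str, threshold: float = 0.5) -> bool:
    """True if `keyword` appears in sti_keywords with probability >= threshold."""
    for doc_keywords in response.get("sti_keywords", []):
        for entry in doc_keywords:
            if entry["keyword"] == keyword and entry["probability"] >= threshold:
                return True
    return False


def keyword_percentage(responses: Iterable[dict], keyword: str, threshold: float = 0.5) -> float:
    """Percentage of responses in which `keyword` clears the threshold."""
    responses = list(responses)
    if not responses:
        return 0.0
    hits = sum(has_keyword(r, keyword, threshold) for r in responses)
    return 100.0 * hits / len(responses)
```

Running the same function for other keywords, such as 'stability', gives a baseline to compare the 'norway' figure against.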

@abuonomo
Contributor

abuonomo commented Sep 4, 2020

Interesting. There must be a really strong correlation between the type of content you're feeding in and Norway in the training data. I'll try to take a closer look at the Norway model at some point and see if it leads to any insights.

@bluetyson
Author

Thanks! Haven't had time yet to try any of the techniques above, or get to the summarisation comparison. Hopefully this week.

@bluetyson
Author

Stability is more common than norway, too, although that at least seems more likely.

@bluetyson
Author

A randomly picked report summary:

Apart from fair dealing for the purposes of study, research, criticism or review as permitted under the Copyright Act, no part may be reproduced without written permission of the Chief Executive of Primary Industries and Resources South Australia, GPO Box 1671, Adelaide, SA 5001.
A combined seismic reflection and gravity survey was completed across the southeastern portion of Yorke Peninsula for Beach Petroleum Exploration was carried out over a geological province known by drilling to be underlain by )Upper Proterozoic, Cambrian, Permain and Tertiary sediments, the Cambrian and Permian sections of which had been The seismic survey has satisfactorily demonstrated Palaeozoic sedimentary thicknesses to exceed 3500 feet in more southerly areas, and that stratigraphic thickening occurs generally to the south and east.
ments occupy broad glacial valleys, and the sequences tend to be tillitic in the approaches to basal contacts Ludbrook (1961) has recorded thin marine or paralic developments near the base of the section in the Minlaton Comparable developments are believed to have been recorded by A stratigraphic well put down by Beach Petroleum N. L. within the Gulf St. Vincent environs (on Troubridge Island) revealed Permian grey shales and intervening thin sands from 850 feet to the limit of drilling at Undoubted glacial facies appear to be little in evidence.
In the north, submarine topography suggests structural bifurcation with an east segment trending Cambrian limestones outcropping on the coast just south of Ardrossan are similarly monoclinally down-folded to gulfwards (Howchin, 1918).
Primitive estimations place the depth to basement north of Edithburgh approximately 1500 to 2000 feet shallower than in the Yorketown area, The possibility of thrusting of a wedge of base- ment over younger sediments in this coastal zone cannot be ignored entirely.
A complementary south-pitching syn- Local Iclosure'l along the crest of the foregoing anticline is outlined immediately west of Stansbury to the extent of This is herein termed the The 'C' or Deep Horizon map is poorer in reflection quality than the Values are only obtained in one region, representing a limited proportion of the full survey area.
037 - 2 - Appendix II 1 only Mayhew 1000 drilling rig mounted on a 1964 (4 X 4) Bedford truck fitted with 18:00 tyres; short mast, square kelly; Gardner Denver 41" X 6" mud pump; WCQ 427 c.f.m.
The present extension was designed to provide more control for contouring to the north of Stansbury, and also to define the To the east of Currumulka an anomalous NNE-SSW "minimum" trend in both gravity and aeromagnetic corresponds to thickening of Cambrian sediments.
and DARLING, F.W., 1931: "Tables of Theoretical Gravity according to the International Formula" Bulletin SOUTH AUSTRALIAN DEPARTMENT OF MINES, 1955: "Yorke Peninsula - Aeromagnetic Map of Total Intensity" Sheet No.55 - 164, Scale - 13 -
et al, 1966 "Stansbury West No.1 Well Completion Report" by Cundill, Meyers and Associates for Beach Petroleum "The Petrology and Hydrocarbon Potential of some of the Cambrian Carbonates of Yorke Peninsula, South Australia" Unpubl.

@bluetyson
Author

"probability_threshold": 0.5,
"request_id": "0",
"sti_keywords": [
[
{
"keyword": "sediment",
"probability": 0.9928513765335083,
"unstemmed": "SEDIMENTS"
},
{
"keyword": "topography",
"probability": 0.9896014928817749,
"unstemmed": "TOPOGRAPHY"
},
{
"keyword": "coast",
"probability": 0.9625886082649231,
"unstemmed": "COASTS"
},
{
"keyword": "drilling",
"probability": 0.9405832886695862,
"unstemmed": "DRILLING"
},
{
"keyword": "australia",
"probability": 0.9212014079093933,
"unstemmed": "AUSTRALIA"
},
{
"keyword": "sedimentary rock",
"probability": 0.8415853381156921,
"unstemmed": "SEDIMENTARY ROCKS"
},
{
"keyword": "ocean bottom",
"probability": 0.8099287152290344,
"unstemmed": "OCEAN BOTTOM"
},
{
"keyword": "survey",
"probability": 0.7431656718254089,
"unstemmed": "SURVEYS"
},
{
"keyword": "limestone",
"probability": 0.731167733669281,
"unstemmed": "LIMESTONE"
},
{
"keyword": "region",
"probability": 0.7080652713775635,
"unstemmed": "REGIONS"
},
{
"keyword": "submarine",
"probability": 0.6398620009422302,
"unstemmed": "SUBMARINES"
},
{
"keyword": "stratigraphy",
"probability": 0.6206201910972595,
"unstemmed": "STRATIGRAPHY"
},
{
"keyword": "thicknes",
"probability": 0.6151227355003357,
"unstemmed": "THICKNESS"
},
{
"keyword": "exploration",
"probability": 0.5438369512557983,
"unstemmed": "EXPLORATION"
},
{
"keyword": "geomorphology",
"probability": 0.5113862156867981,
"unstemmed": "GEOMORPHOLOGY"
}
]
],
"topic_probabilities": [
[
{
"keyword": "geoscience",
"probability": 0.9981602672086761,
"unstemmed": "geosciences"
},
{
"keyword": "life science",
"probability": 0.8026356958046824,
"unstemmed": "life sciences"
},
{
"keyword": "chemistry and material",
"probability": 0.06012665383649461,
"unstemmed": "chemistry and materials"
},
{
"keyword": "engineering",
"probability": 0.054703640877981734,
"unstemmed": "engineering"
},
{
"keyword": "physic",
"probability": 0.0320413217119646,
"unstemmed": "physics"
},
{
"keyword": "mathematical and computer science",
"probability": 0.019253640010804155,
"unstemmed": "mathematical and computer sciences"
},
{
"keyword": "astronautic",
"probability": 0.01115925991857776,
"unstemmed": "astronautics"
},
{
"keyword": "space science",
"probability": 0.010097497008795819,
"unstemmed": "space sciences"
},
{
"keyword": "social and information science",
"probability": 0.0024410125782841226,
"unstemmed": "social and information sciences"
},
{
"keyword": "aeronautic",
"probability": 0.002183980775956557,
"unstemmed": "aeronautics"
},
{
"keyword": "general",
"probability": 0.0007967235959187748,
"unstemmed": "general"
}
]
],
"topic_threshold": 1

@bluetyson
Author

And the keywords from the whole report:

,"sti_keywords":[[{"keyword":"detection","probability":1.0,"unstemmed":"DETECTION"},{"keyword":"frequency","probability":1.0,"unstemmed":"FREQUENCIES"},{"keyword":"norway","probability":1.0,"unstemmed":"NORWAY"},{"keyword":"simulation","probability":1.0,"unstemmed":"SIMULATION"},{"keyword":"stability","probability":1.0,"unstemmed":"STABILITY"},{"keyword":"velocity","probability":1.0,"unstemmed":"VELOCITY"},{"keyword":"wind (meteorology)","probability":1.0,"unstemmed":"WIND (METEOROLOGY)"},{"keyword":"drainage","probability":0.9999985098838806,"unstemmed":"DRAINAGE"},{"keyword":"boundary","probability":0.9999971389770508,"unstemmed":"BOUNDARIES"},{"keyword":"binders (materials)","probability":0.9999895691871643,"unstemmed":"BINDERS (MATERIALS)"},{"keyword":"spectrum","probability":0.9999833106994629,"unstemmed":"SPECTRA"},{"keyword":"topography","probability":0.9999693632125854,"unstemmed":"TOPOGRAPHY"},{"keyword":"turbulence","probability":0.9999633431434631,"unstemmed":"TURBULENCE"},{"keyword":"gas","probability":0.9999365210533142,"unstemmed":"GASES"},{"keyword":"gravitation","probability":0.9999074935913086,"unstemmed":"GRAVITATION"},{"keyword":"scattering","probability":0.9998400807380676,"unstemmed":"SCATTERING"},{"keyword":"coast","probability":0.9998008012771606,"unstemmed":"COASTS"},{"keyword":"geology","probability":0.9997913837432861,"unstemmed":"GEOLOGY"},{"keyword":"circulation","probability":0.9997692704200745,"unstemmed":"CIRCULATION"},{"keyword":"horizontal orientation","probability":0.9996856451034546,"unstemmed":"HORIZONTAL ORIENTATION"},{"keyword":"survey","probability":0.9996093511581421,"unstemmed":"SURVEYS"},{"keyword":"deposition","probability":0.9995365738868713,"unstemmed":"DEPOSITION"},{"keyword":"sediment","probability":0.9993331432342529,"unstemmed":"SEDIMENTS"},{"keyword":"crystal","probability":0.9993000030517578,"unstemmed":"CRYSTALS"},{"keyword":"plasmas (physics)","probability":0.9992559552192688,"unstemmed":"PLASMAS 
(PHYSICS)"},{"keyword":"adult","probability":0.9988338947296143,"unstemmed":"ADULTS"},{"keyword":"rock","probability":0.9987765550613403,"unstemmed":"ROCKS"},{"keyword":"prediction","probability":0.9986652731895447,"unstemmed":"PREDICTIONS"},{"keyword":"error","probability":0.99862140417099,"unstemmed":"ERRORS"},{"keyword":"canada","probability":0.9981276392936707,"unstemmed":"CANADA"},{"keyword":"position (location)","probability":0.9980089068412781,"unstemmed":"POSITION (LOCATION)"},{"keyword":"perturbation","probability":0.9977721571922302,"unstemmed":"PERTURBATION"},{"keyword":"geophysic","probability":0.9969445466995239,"unstemmed":"GEOPHYSICS"},{"keyword":"image","probability":0.9956791996955872,"unstemmed":"IMAGES"},{"keyword":"ion","probability":0.9956188201904297,"unstemmed":"IONS"},{"keyword":"loss","probability":0.9947844743728638,"unstemmed":"LOSSES"},{"keyword":"wake","probability":0.9905879497528076,"unstemmed":"WAKES"},{"keyword":"projectile","probability":0.9886454939842224,"unstemmed":"PROJECTILES"},{"keyword":"resolution","probability":0.987578272819519,"unstemmed":"RESOLUTION"},{"keyword":"x ray","probability":0.9852748513221741,"unstemmed":"X RAYS"},{"keyword":"water","probability":0.9840564727783203,"unstemmed":"WATER"},{"keyword":"longitude","probability":0.9819511771202087,"unstemmed":"LONGITUDE"},{"keyword":"injection","probability":0.9818744659423828,"unstemmed":"INJECTION"},{"keyword":"test","probability":0.9723952412605286,"unstemmed":"TESTS"},{"keyword":"mollusk","probability":0.963683009147644,"unstemmed":"MOLLUSKS"},{"keyword":"attitude (inclination)","probability":0.956942081451416,"unstemmed":"ATTITUDE (INCLINATION)"},{"keyword":"deformation","probability":0.9378564357757568,"unstemmed":"DEFORMATION"},{"keyword":"united kingdom","probability":0.9181330800056458,"unstemmed":"UNITED KINGDOM"},{"keyword":"sensitivity analysi","probability":0.9107515811920166,"unstemmed":"SENSITIVITY 
ANALYSIS"},{"keyword":"australia","probability":0.9103879928588867,"unstemmed":"AUSTRALIA"},{"keyword":"crystallinity","probability":0.8907729387283325,"unstemmed":"CRYSTALLINITY"},{"keyword":"mathematic","probability":0.870129406452179,"unstemmed":"MATHEMATICS"},{"keyword":"excitation","probability":0.8667937517166138,"unstemmed":"EXCITATION"},{"keyword":"thicknes","probability":0.8121394515037537,"unstemmed":"THICKNESS"},{"keyword":"measurement","probability":0.8025984764099121,"unstemmed":"MEASUREMENT"},{"keyword":"reliability","probability":0.7956236600875854,"unstemmed":"RELIABILITY"},{"keyword":"drying","probability":0.7842564582824707,"unstemmed":"DRYING"},{"keyword":"angular resolution","probability":0.7671957612037659,"unstemmed":"ANGULAR RESOLUTION"},{"keyword":"male","probability":0.6826151609420776,"unstemmed":"MALES"},{"keyword":"wall","probability":0.6692981719970703,"unstemmed":"WALLS"},{"keyword":"oscillation","probability":0.6411301493644714,"unstemmed":"OSCILLATIONS"},{"keyword":"female","probability":0.6219513416290283,"unstemmed":"FEMALES"},{"keyword":"shape","probability":0.49701258540153503,"unstemmed":"SHAPES"},{"keyword":"biodynamic","probability":0.48482751846313477,"unstemmed":"BIODYNAMICS"},{"keyword":"target","probability":0.4754076600074768,"unstemmed":"TARGETS"},{"keyword":"heating","probability":0.46355319023132324,"unstemmed":"HEATING"},{"keyword":"europe","probability":0.4513147473335266,"unstemmed":"EUROPE"},{"keyword":"infrared radiation","probability":0.36240354180336,"unstemmed":"INFRARED RADIATION"},{"keyword":"analysis (mathematics)","probability":0.357046902179718,"unstemmed":"ANALYSIS (MATHEMATICS)"}]],"topic_probabilities":[[{"keyword":"geoscience","probability":0.9999999999995617,"unstemmed":"geosciences"},{"keyword":"life science","probability":0.9999999998044267,"unstemmed":"life sciences"},{"keyword":"space science","probability":5.952435167211817e-05,"unstemmed":"space 
sciences"},{"keyword":"physic","probability":4.797571174203055e-07,"unstemmed":"physics"},{"keyword":"aeronautic","probability":3.452614189217303e-07,"unstemmed":"aeronautics"},{"keyword":"social and information science","probability":2.775078370610512e-07,"unstemmed":"social and information sciences"},{"keyword":"engineering","probability":2.1643729127246794e-07,"unstemmed":"engineering"},{"keyword":"astronautic","probability":3.961110196099098e-08,"unstemmed":"astronautics"},{"keyword":"chemistry and material","probability":6.182643119277685e-10,"unstemmed":"chemistry and materials"},{"keyword":"mathematical and computer science","probability":4.1405143328909e-10,"unstemmed":"mathematical and computer sciences"},{"keyword":"general","probability":2.2434030242077953e-19,"unstemmed":"general"}]],"topic_threshold":1.0},"service_version":"unspecified","status":"okay"}

@bluetyson
Author

I made a graph out of what I have done so far; here are the edge counts for 'norway':

[('stability', {'weight': 12740}),
('simulation', {'weight': 12502}),
('detection', {'weight': 12152}),
('spectrum', {'weight': 12022}),
('drainage', {'weight': 11570}),
('survey', {'weight': 11534}),
('wind (meteorology)', {'weight': 11514}),
('frequency', {'weight': 11514}),
('boundary', {'weight': 11266}),
('scattering', {'weight': 11106}),
('binders (materials)', {'weight': 10732}),
('turbulence', {'weight': 10722}),
('canada', {'weight': 10514}),
('geology', {'weight': 10392}),
('plasmas (physics)', {'weight': 10370}),
('prediction', {'weight': 10004}),
('deposition', {'weight': 9838}),
('rock', {'weight': 9748}),
('circulation', {'weight': 9592}),
('water', {'weight': 9564}),
('sediment', {'weight': 9484}),
('velocity', {'weight': 8930}),
('geophysic', {'weight': 8788}),
('excitation', {'weight': 8514}),
('position (location)', {'weight': 8494}),
('crystal', {'weight': 8336}),
('injection', {'weight': 8102}),

@bluetyson
Author

And for 'coal':
[('stability', {'weight': 5478}),
('simulation', {'weight': 5344}),
('norway', {'weight': 5254}),
('detection', {'weight': 5168}),
('spectrum', {'weight': 5116}),
('drainage', {'weight': 5074}),
('wind (meteorology)', {'weight': 4916}),
('frequency', {'weight': 4900}),
('binders (materials)', {'weight': 4766}),
('deposition', {'weight': 4732}),
('geology', {'weight': 4700}),
('scattering', {'weight': 4686}),
('survey', {'weight': 4684}),
('boundary', {'weight': 4648}),
('rock', {'weight': 4558}),
('turbulence', {'weight': 4546}),
('sediment', {'weight': 4512}),
('water', {'weight': 4492}),
('canada', {'weight': 4420}),
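
Edge counts of this shape can be produced from per-document keyword lists. How the graph above was actually built is not stated, so the following is only a sketch under one assumption: an edge's weight is the number of documents in which both keywords were predicted. The function names are illustrative, and only the standard library is used.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def cooccurrence(docs: Iterable[Iterable[str]]) -> Counter:
    """Count, for each unordered keyword pair, the documents containing both."""
    edges: Counter = Counter()
    for keywords in docs:
        # Deduplicate and sort so each pair has one canonical key per document.
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges


def neighbours(edges: Counter, keyword: str) -> List[Tuple[str, Dict[str, int]]]:
    """List (other_keyword, {'weight': n}) for `keyword`, heaviest edge first."""
    out = []
    for (a, b), weight in edges.items():
        if keyword == a:
            out.append((b, {"weight": weight}))
        elif keyword == b:
            out.append((a, {"weight": weight}))
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(out, key=lambda item: (-item[1]["weight"], item[0]))
```

A keyword that sits near the top of almost every other keyword's neighbour list, as 'norway' and 'stability' do here, is probably being predicted nearly everywhere rather than reflecting the content.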

@bluetyson
Author

One area where Northern Hemisphere-dominated keywords might be a problem: the difference between petroleum and gas, if they call it all 'gas'?

@JustinGOSSES JustinGOSSES added the wontfix This will not be worked on label Jul 6, 2021