Norwegian language support #18
smak 2 | ||
begrensning -2 | ||
grenser -1 | ||
prosedyre, rettstvist -1 |
There is a comma here. Maybe you should remove "prosedyre" and keep just "rettstvist" at -1.
Thank you for your quick response! I will go through these review comments and, following your recommendations, submit a new PR.
Yes, you are right, this should just be "rettstvist".
perfeksjonert 2 | ||
fullkommenhet 3 | ||
perfeksjonerer 2 | ||
peril -2 |
Is this a Norwegian word?
The list was translated with Google Translate, and the list I submitted is compiled from four lists:
- words translated to Norwegian
- English words that could not be translated
- new words I added manually
- a "stoplist" of words to be removed from the final list
I have left the untranslatable English words in the list, as I thought they couldn't do any harm and English is sometimes mixed in with Norwegian anyway.
I can resubmit the list without the English words if you prefer.
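As a rough illustration of the compilation step described above, here is a minimal sketch. The function name `compile_word_list`, the `keep_untranslated` flag, and the merge order are assumptions made for this example; this is not the actual pipeline.

```python
def compile_word_list(translated, untranslated, manual, stoplist,
                      keep_untranslated=True):
    """Merge the source lists into one {word: score} dict.

    Assumption: later sources override earlier ones, so manually
    added words win over machine-translated ones.
    """
    sources = [translated]
    if keep_untranslated:
        # Untranslatable English words are optional.
        sources.append(untranslated)
    sources.append(manual)

    merged = {}
    for source in sources:
        merged.update(source)

    # Drop every stoplisted word from the final list.
    for word in stoplist:
        merged.pop(word, None)
    return merged


translated = {"positiv": 2, "eiendomspronomen": -2}
untranslated = {"peril": -2}
manual = {"brakseier": 2}
stoplist = {"eiendomspronomen"}

with_english = compile_word_list(translated, untranslated, manual, stoplist)
without_english = compile_word_list(translated, untranslated, manual, stoplist,
                                    keep_untranslated=False)
```

With a flag like this, resubmitting the list without the English words would only require re-running the export.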
popularitet 3 | ||
positiv 2 | ||
positivt 2 | ||
eiendomspronomen -2 |
Does this make sense as a word with sentiment?
I'll remove 'eiendomspronomen'.
positiv 2 | ||
positivt 2 | ||
eiendomspronomen -2 | ||
post traumatisk -2 |
This seems to be written as one word in Norwegian.
Three variations are commonly used:
- post traumatisk
- post-traumatisk
- posttraumatisk
When classifying text I replace "-" with a space, so the keyword "post traumatisk" catches the first two variations.
I can add 'posttraumatisk' to catch the third variation.
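The hyphen handling described above could look roughly like the sketch below. The helper names `normalize` and `contains_keyword` are hypothetical, and lowercasing plus word-boundary matching are assumptions added for the example.

```python
import re


def normalize(text):
    """Lowercase, replace hyphens with spaces, collapse whitespace."""
    text = text.lower().replace("-", " ")
    return re.sub(r"\s+", " ", text).strip()


def contains_keyword(text, keyword):
    """Check whether the keyword occurs as whole word(s) in the normalized text."""
    pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
    return re.search(pattern, normalize(text)) is not None


spaced = contains_keyword("Han har post traumatisk stress", "post traumatisk")
hyphenated = contains_keyword("Han har post-traumatisk stress", "post traumatisk")
joined = contains_keyword("Han har posttraumatisk stress", "post traumatisk")
joined_own_entry = contains_keyword("Han har posttraumatisk stress", "posttraumatisk")
```

The one-word form is not matched by the two-word keyword, which is why a separate 'posttraumatisk' entry is needed.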
Fine
love 1 | ||
lovet 1 | ||
løfter 1 | ||
reklamere 1 |
This can mean "reklamasjon" in Norwegian and might have a negative sentiment. I suppose it is a doubtful translation?
Yes, this probably has a more negative sentiment. I can remove it for now.
forfremmet 1 | ||
fremmer 1 | ||
fremme 1 | ||
omgå 1 |
I think this should have a negative sentiment.
yes
begeistrede 4 | ||
utslett -2 | ||
ratifisert 2 | ||
å nå 1 |
Could you check this? Is "å" necessary here?
I think this one should be ok actually.
"nå" means just "now" and is too broad.
"å nå" means "to reach". I've double-checked in news and Twitter search, and it is generally about reaching targets/milestones, reaching Mount Everest, or reaching the youth.
konsekvenser -2 | ||
reprimande -2 | ||
irettesettelser -2 | ||
repulse -1 |
Is this a Norwegian word?
salutter 2 | ||
frelse 2 | ||
sarkastisk -2 | ||
lagre 2 |
Should this word and the next have sentiment?
I'll remove these for now.
skammelig -2 | ||
dele 1 | ||
delt 1 | ||
aksjer 1 |
I suppose that in its normal sense this word usually has little sentiment value.
I'll remove this one
stjal -2 | ||
stjålet -2 | ||
stoppe -1 | ||
stoppe -1 |
This word appears twice.
Duplicates were supposed to have been removed. I'll double-check the pipeline for the next list export.
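A check along these lines could be run before each export. This is only a sketch: the helper names are hypothetical, and it assumes each entry is a phrase followed by whitespace and an integer score, which may not match the real file format exactly. It also flags comma-separated entries such as "prosedyre, rettstvist", which came up elsewhere in this review.

```python
from collections import Counter


def parse_entries(lines):
    """Split each 'phrase<whitespace>score' line into (phrase, score)."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last run of whitespace so multi-word phrases survive.
        phrase, score = line.rsplit(None, 1)
        entries.append((phrase, int(score)))
    return entries


def find_duplicates(entries):
    """Return phrases that occur more than once, sorted."""
    counts = Counter(phrase for phrase, _ in entries)
    return sorted(phrase for phrase, n in counts.items() if n > 1)


def find_comma_entries(entries):
    """Return phrases that contain a comma (likely two words in one entry)."""
    return [phrase for phrase, _ in entries if "," in phrase]


lines = ["stjal\t-2", "stoppe\t-1", "stoppe\t-1", "prosedyre, rettstvist\t-1"]
entries = parse_entries(lines)
duplicates = find_duplicates(entries)
comma_entries = find_comma_entries(entries)
```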
stopp -1 | ||
traust 2 | ||
rett 1 | ||
rar -1 |
This word seems to have the wrong sentiment.
"rar" may mean something a bit different in Norwegian than in Danish.
In Norwegian "rar" means odd/strange, so I think the sentiment is correct.
tard -2 | ||
anløpe -2 | ||
anløpet -2 | ||
oksiderer -2 |
This word should probably have no sentiment.
added to stoplist
undergraver -2 | ||
tapte terreng -2 | ||
dårlige -2 | ||
dårlige resultater -2 |
This seems superfluous given the entry above.
I'll remove this.
uverdig -2 | ||
oppløftende 2 | ||
besvær -2 | ||
som haster -1 |
Why "som"?
removed
krenket -2 | ||
bryter med -2 | ||
vold -3 | ||
vold relatert -3 |
Why "vold relatert" when "vold" is there?
Should it be one word?
This is very rarely used, so I'll remove it.
veletablerte 2 | ||
godt fokusert 2 | ||
velstelt 2 | ||
vel proporsjonert 2 |
I think this should be one word?
yes
vinne 4 | ||
vinner 4 | ||
visdom 1 | ||
skulle ønske 1 |
Shouldn't "skulle" be removed?
yes
fanatiker -2 | ||
fanatikere -2 | ||
nidkjær 2 | ||
abhorred -3 |
There is a considerable number of English words below here.
reddes 2 | ||
brakseier 2 | ||
drømmetur 2 | ||
årets 1 |
Why does "årets" have a sentiment value?
"årets" in itself is not positive, but I've found it used more often in a positive context than a negative one.
I was doing sentiment analysis on news headlines, and "årets" is often used for a significant event, either the first of the year or an annual event such as a festival or a "person of the year" award.
For these headlines "årets" is really the only word that indicates that the sentiment of the sentence is positive:
"Christian er årets trønder 2017"
"Salah er årets spiller i Premier League"
Other typical uses:
News:
Alexander Rybak vant årets Melodi Grand Prix
Northug slo landslagskollega og tok årets første seier
Lothepus ble kåret til «årets kjendis»
Årets viktigste bok om vår tids viktigste tema
Årets hederspris til Mari Boine
Terje fra Snåsa ble årets Ildsjel
Hvem av disse fortjener tittelen årets bygdeprofil?
Har store forventninger til årets «Hver gang vi møtes»
Hva er årets tv-øyeblikk? Stem på din Gullruten-favoritt her!
Twitter:
"Følg årets viktigste konferanse"
"Årets viktigaste budskap. "
"årets nyhet"
"Det skal i alle fall ikke stå på været når vi drar igang årets littfest"
It can also be used in a negative context:
Drilltropp gikk for sakte - får ikke gå i årets 17. mai-tog
Nedjustert anslag for årets oljeinvesteringer
AaFK gikk på årets første serietap
kåret 2 | ||
drømmejobb 2 | ||
tjener 1 | ||
kommer til oslo 2 |
Does it make sense to give this phrase and the 3 below a sentiment value?
I can remove these, as they are maybe only relevant to my use case.
I deal mainly with news, and when a headline says that someone is coming to the city, it is generally someone famous, so the sentence has a positive sentiment.
sjekk resultatet 1 | ||
trosser 2 | ||
trosse 2 | ||
her er resultatet 1 |
Should this have a sentiment?
I'll remove "sjekk resultatet" and "her er resultatet".
Again, these were used with news headlines.
hetere -2 | ||
steinkasterne -2 | ||
ikke invitert -3 | ||
aldri opplevd maken -3 |
This phrase does not always seem to be negative?
Yes, I see now that it is also used in a positive context.
I think it is mainly used in a negative context, so I can maybe reduce the weight to -1.
ekstraskatt -2 | ||
uforståelig -2 | ||
skodrama -1 | ||
derfor skal ikke -1 |
Should this be included?
removed
fortviler -2 | ||
avvikling -1 | ||
ryktene -1 | ||
hardporno -2 |
Should this be negative?
Removed for now. As I deal mainly with news, if this is mentioned it is usually in a negative context.
oppdiktet -1 | ||
storslått 3 | ||
mishandling -2 | ||
påbudt, bindende -1 |
There are two words here.
I'll fix this.
apati -2 | ||
apokalyptisk -2 | ||
be om unnskyldning -1 | ||
apologized -1 |
English?
be om unnskyldning -1 | ||
apologized -1 | ||
beklager -3 | ||
apologizing -1 |
English?
overse -1 | ||
ignorert -2 | ||
ignorerer -1 | ||
jeg vil -2 |
Should this be included?
removed.
påfører -2 | ||
innflytelsesrik 2 | ||
overtredelse -2 | ||
gjøre rasende -2 |
Why not just "rasende"?
I'll fix this.
I have made some comments on the individual entries. I hope you might want to go over them?
Thank you for taking the time to look through this! I'm going to make some changes to the pipeline and build process and compile a new list soon. Do you know of any tools to calculate optimized sentiment weights for the keywords? I have different labelled training sets I could test this on.
If you have a word embedding, it is possible to compute another scoring; see the way I have used AFINN in this article: http://www2.compute.dtu.dk/pubdb/views/edoc_download.php/7029/pdf/imm7029.pdf
@olavski I am wondering if you had time to look at the possible changes?
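Independently of the embedding-based scoring in the article, a labelled set can be used to compare two versions of the word list. The sketch below is only an assumption about how such a test might be set up: it sums single-word scores (multi-word phrases are not handled), maps the sum to -1/0/1, and reports accuracy. The names `score_text` and `accuracy` are made up for this example.

```python
import re


def score_text(text, lexicon):
    """Sum the scores of all single-word tokens found in the lexicon."""
    tokens = re.findall(r"\w+", text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)


def accuracy(labelled, lexicon):
    """Fraction of (text, label) pairs whose score sign matches the label.

    Labels are assumed to be -1, 0 or 1; `labelled` must be non-empty.
    """
    correct = 0
    for text, label in labelled:
        score = score_text(text, lexicon)
        predicted = (score > 0) - (score < 0)  # sign of the score
        correct += predicted == label
    return correct / len(labelled)


lexicon = {"vinner": 4, "vold": -3}
labelled = [
    ("Hun vinner igjen", 1),
    ("Mer vold i byen", -1),
    ("Møte i dag", 0),
]
result = accuracy(labelled, lexicon)
```

Running the same labelled set against the old and new list gives a quick check on whether a change in weights helped.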
Hi @fnielsen |
@olavski Fine! Thanks for the information.
Hi!
I've translated the English list to Norwegian and have added about 400 new words or variations of existing words.
I hope you can add this list to the project.