This repository has been archived by the owner on May 8, 2024. It is now read-only.

Remove strange wikidata punctuation on location specifiers #220

Closed
MansMeg opened this issue Jan 19, 2023 · 11 comments


@MansMeg
Collaborator

MansMeg commented Jan 19, 2023

See for example:
Q5885293,"Kråkered," (now fixed)

We should not include location specifiers with punctuation if the ordinary name exists (like Kråkered in this case).

see:
https://github.com/welfare-state-analytics/riksdagen-corpus/blob/main/corpus/metadata/location_specifier.csv
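A minimal sketch of the rule above, assuming each row of the file is a (Wikidata ID, location) pair as in the `Q5885293,"Kråkered,"` example. The function name `drop_punctuated_duplicates` is hypothetical and not part of the corpus scripts; it only drops a punctuated entry when the plain form already exists for the same ID.

```python
def drop_punctuated_duplicates(rows):
    """Drop rows whose location ends in stray punctuation when the same
    Wikidata ID already has the clean form of that location.

    rows: list of (wiki_id, location) tuples (assumed layout).
    """
    existing = set(rows)
    kept = []
    for wiki_id, location in rows:
        # Strip trailing commas, semicolons and spaces only.
        cleaned = location.rstrip(" ,;")
        if cleaned != location and (wiki_id, cleaned) in existing:
            continue  # the ordinary name exists, so skip the punctuated variant
        kept.append((wiki_id, location))
    return kept


rows = [("Q5885293", "Kråkered"), ("Q5885293", "Kråkered,")]
print(drop_punctuated_duplicates(rows))
```

A punctuated entry with no clean counterpart is kept as-is, so nothing is lost silently.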

@ninpnin
Collaborator

ninpnin commented Jan 26, 2023

This is a problem upstream https://www.wikidata.org/wiki/Q5885293

@MansMeg
Collaborator Author

MansMeg commented Jan 27, 2023

Yes. The question is how to solve this. I guess we would like to remove things from our corpus that people might want to keep in Wikidata, so there will not be a perfect alignment with Wikidata. Maybe add a CSV of entries we exclude from Wikidata, and have the script that updates from Wikidata apply it? Or do you have another solution?
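Something like the following could work for the exclusion idea. It is only a sketch: the exclusion file, its two-column layout (Wikidata ID, location), and the function names `load_exclusions` and `apply_exclusions` are all assumptions, not part of the existing update script.

```python
import csv
import io


def load_exclusions(fileobj):
    """Read (wiki_id, location) pairs that should never enter the corpus."""
    return {(row[0], row[1]) for row in csv.reader(fileobj) if len(row) >= 2}


def apply_exclusions(rows, exclusions):
    """Filter freshly fetched Wikidata rows against the exclusion set."""
    return [row for row in rows if (row[0], row[1]) not in exclusions]


# In practice the exclusions would live in a CSV file in the repository;
# an in-memory file is used here to keep the sketch self-contained.
exclusions = load_exclusions(io.StringIO('Q5885293,"Kråkered,"\n'))
fetched = [("Q5885293", "Kråkered"), ("Q5885293", "Kråkered,")]
print(apply_exclusions(fetched, exclusions))
```

Keeping the exclusions in a separate file would let the corpus diverge from Wikidata deliberately, without anyone having to edit Wikidata itself.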

@ninpnin
Collaborator

ninpnin commented Jan 27, 2023

I mean, those misspellings could just be fixed on Wikidata?

EDIT: AFAIK those additional commas don't introduce any errors to our corpus

@MansMeg
Collaborator Author

MansMeg commented Jan 27, 2023

No. I know. My point is that sooner or later we might end up with differences. But maybe not in the next couple of months. Then fixing this in Wikidata is probably easiest.

@BobBorges
Collaborator

BobBorges commented Sep 22, 2023

They need to be edited on wikidata:

  • Q116687501,"Gärde, Malcolm"
  • Q117011372,"Falla, Ernst"
  • Q117039047,"Gärestad,Hans Alfred Petersson"
  • Q117280842,"Mora-Noret, Nils J, 5:79"
  • Q117288109,"Stockholm, Per Oskar Samuelson"
  • Q117288989,"Göteborg, Gustav Harald Svensson"
  • Q15956417,"Stockholm, Gustaf, 1:152"
  • Q3373681,"Äppelviken, Axel"
  • Q459226,"Stockholm, Anita I, 1:91"
  • Q4730705,"Alversjö, Allan F A, 2:160"
  • Q4795536,"Lekåsa, J Aron, 4:321"
  • Q4955783,"Sundbyberg, Margo"
  • Q5712545,"Stockholm,"
  • Q5715803,"Myckelgård, Gustaf"
  • Q5721610,"Stockholm, Knut G"
  • Q5723194,"Multrå, Johan, 5:219"
  • Q5724038,"Jönköping, Carl B N"
  • Q5747977,"Stockholm, Thorvald, 1:90"
  • Q5768473,"Harads, Johan Erik"
  • Q5768483,"Gränna, Johan Gustaf"
  • Q5770994,"Skövde, J M Alfred"
  • Q5777495,"Sjögesta senare Örebro, Anders P, 4:486"
  • Q5780366,"Stockholm,"
  • Q5785596,"Härnösand,"
  • Q5786130,"Blomberg,"
  • Q5789110,"Gårda, Gustav W"
  • Q5795578,"Riseberga,"
  • Q5795659,"Stjärnebo, F A Hugo, 2:73"
  • Q5854947,"Visby, C Suno H"
  • Q5885438,"Kalmar, J August"
  • Q5928617,"Kyrkdal senare Sollefteå, E Harald, 5:235"
  • Q5942265,"Ystad,"
  • Q5951819,"Göteborg,"
  • Q5961317,"Tjörn, Axel V, 4:118"
  • Q6001576,"Stockholm, Carl Göran D"
  • Q6011317,"Stävie, Nils"
  • Q6026693,"Rögle,"
  • Q6026862,"Kullenbergstorp, Gillis O T C , 3:252"
  • Q6027237,"Kvarnbrodda, Jöns"
  • Q6031148,"Anderstorp, C E Holge, 2:174"
  • Q6044908,"Hasselstad, August, 3:71"
  • Q6045550,"Ugglekull, Peter"
  • Q6062302,"Övedskloster, Otto A P, 3:265"
  • Q6139775,"Öckerö,"
  • Q6157386,"Hammerdal, Johan"
  • Q6161169,"Hofors, H Hjalmar, 5:167"
  • Q6195438,"Växjö, S A Gustaf, 2:238"
  • Q6199292,"Stockholm, David C, 1:177"
  • Q6199894,"Örebro, G Ruben"
  • Q6255608,"Stångby, Jöns, 3:293"
  • Q6298643,"Gäre, Carl"

@MansMeg
Collaborator Author

MansMeg commented Sep 22, 2023

Ping @salgo60. Is this something you could take a pass on?

@salgo60
Contributor

salgo60 commented Sep 22, 2023

@MansMeg what problem did you find with Q117288109?

image

@MansMeg
Collaborator Author

MansMeg commented Sep 22, 2023

I think that one is actually a problem with us grabbing the data. Here we use the alias that is incorrect. @BobBorges , right?

@salgo60
Contributor

salgo60 commented Sep 22, 2023

All checked; not all changed, as I didn't see a problem...


Off topic: I mentioned your project today as a pattern for how other organizations should work with their metadata.

image

@BobBorges
Collaborator

Should be fixed now. If we find this as an issue again, we could write a unit test. Caused by trailing commas (removed on wikidata) and alias/i-ort in the format surname-iort, firstname. Fixed on wikidata.
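If it does come back, the unit test could be as small as the sketch below. It assumes that any comma in a location specifier is suspicious, which covers both the trailing commas and the "surname-iort, firstname" aliases listed earlier. The helper `find_comma_entries` and the sample rows are illustrative only; a real test would load the rows from location_specifier.csv.

```python
def find_comma_entries(rows):
    """Return the (wiki_id, location) pairs whose location contains a comma."""
    return [(wiki_id, location) for wiki_id, location in rows if "," in location]


# Illustrative sample rows, not taken from the actual file.
SAMPLE_ROWS = [("Q5885293", "Kråkered"), ("Q5712545", "Stockholm")]


def test_no_commas_in_location_specifiers():
    # An empty list means no trailing commas and no "place, name" aliases.
    assert find_comma_entries(SAMPLE_ROWS) == []
```

If a legitimate location name ever contains a comma, the check would need an allowlist rather than a blanket rule.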
