Rank / Classify Sources #34

Open
salgo60 opened this issue Oct 30, 2024 · 5 comments

salgo60 commented Oct 30, 2024

See Wikidata_talk:WikiProject_Reference_Verification.

I stated in 2019 that we need to rank sources, see T222142. Wikidata has now been used a lot by the research project "Riksdagens Corpus" (@BobBorges), and we agree that sources like Svenskt Biografiskt Lexikon-ID (P3217) / Svenskt biografiskt lexikon (Q379406) / Tvåkammar-riksdagen 1867–1970 (Q110346241) are very good sources. However, they are just text strings, so using them in Wikidata requires some manual work, see issue #78.

My suggestion: add a ranking value for sources so more people can agree on and understand that e.g. Svenskt Biografiskt Lexikon-ID (P3217) is high quality and has a quality process. I think there was some measurement for prizes, i.e. that receiving the Nobelpriset (Q7191) is ranked higher than receiving prize xxx; see my thoughts from 2019 that prizes could be a way of evaluating research in different countries: "T216409 Nobelprize as part of evaluating research in different countries".

Maybe we can have dashboards showing how different research projects support PROV and use quality sources, to motivate research to move faster in the right direction...

salgo60 changed the title from "Rank (Classify Sources" to "Rank / Classify Sources" on Oct 30, 2024

salgo60 commented Oct 30, 2024

Denny Vrandečić talking about his vision of sources

@BobBorges

It would be really good to rank sources if objective criteria could be applied to the ranking.


salgo60 commented Nov 1, 2024

@BobBorges listen to Denny above; he explains that en:Wikipedia ranks sources. I guess it would be better if the ranking were done by your project and SBL…

I use the Wikidata rank feature and mark wrong facts, e.g. bad precision or "not stated in the birth record"… → in the long run we get a rather good quality measurement. I like the way your project tests your data against external "sources" like Wikidata, but I miss that I don't see SBL in a metadata round-trip ecosystem…

Using Wikidata for handling contradicting sources

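A minimal sketch of how the rank feature mentioned above can be used programmatically (not part of this repository; the property P569, date of birth, and the LIMIT are arbitrary illustration choices): it asks the public Wikidata Query Service for statements marked with deprecated rank together with their "reason for deprecated rank" (P2241) qualifier.

```python
# Minimal sketch: list deprecated date-of-birth statements and the editors'
# stated reason for deprecating them. Requires network access to the public
# Wikidata Query Service; property and LIMIT choices are arbitrary examples.
import requests

QUERY = """
SELECT ?personLabel ?dob ?reasonLabel WHERE {
  ?person p:P569 ?stmt .
  ?stmt ps:P569 ?dob ;
        wikibase:rank wikibase:DeprecatedRank ;
        pq:P2241 ?reason .                      # reason for deprecated rank
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "rank-sources-sketch/0.1 (example)"},
)
resp.raise_for_status()

for row in resp.json()["results"]["bindings"]:
    print(row["personLabel"]["value"], row["dob"]["value"], row["reasonLabel"]["value"])
```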

@albertmeronyo (Member)

Thanks @salgo60 and @BobBorges for the insightful discussion; the quality of references is something we deeply care about.

Let me first just say that ProVe is based on research [1] that takes the quality of sources into account, by comparing the degree to which the textual content of external references supports the verbalisation of Wikidata triples. We only take that as a basis to build a tool (the one in this repo) that could be of use to Wikidata editors. The output classifies sources into several types/boxes/colours, which goes exactly in the direction Denny is pointing at.
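To make that comparison step concrete, here is a hypothetical sketch (not ProVe's actual code or model; the claim, the reference text, and the off-the-shelf MNLI model facebook/bart-large-mnli are illustrative stand-ins): verbalise a triple as a sentence and ask a natural-language-inference model whether the reference text supports it.

```python
# Minimal sketch of reference verification via textual entailment.
# NOT the ProVe pipeline: the model, claim and reference text below are
# illustrative placeholders only.
from transformers import pipeline

# A Wikidata triple already verbalised into natural language (hypothesis).
claim = "Carl Linnaeus was born on 23 May 1707."

# Text retrieved from the external reference cited on the statement (premise).
reference_text = (
    "Linnaeus, the Swedish naturalist, was born at Råshult on 23 May 1707."
)

# Off-the-shelf NLI model used here as a stand-in for a trained verifier.
nli = pipeline("text-classification", model="facebook/bart-large-mnli")

# Sentence-pair input: premise first, hypothesis second.
result = nli({"text": reference_text, "text_pair": claim})
print(result)  # e.g. a label like "entailment" -> the reference supports the triple
```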

That said, I tend to agree with @BobBorges that objective criteria here are a challenging issue. We would be really keen on compiling different 'feelings' and approaches to quality of sources under various perspectives, perhaps by building a dataset that we can use to improve the model behind ProVe.

[1] Amaral, G., Rodrigues, O. and Simperl, E., 2022. ProVe: A pipeline for automated provenance verification of knowledge graphs against textual sources. Semantic Web (Preprint), pp. 1–34.


salgo60 commented Nov 29, 2024

Thanks @albertmeronyo

I recommend delving into the "architecture" behind Wikidata and Denny Vrandečić's vision, particularly the types of research projects that can be undertaken regarding sources; see the video where I pointed out that we need facts with sources, and also metadata about whether we can trust a source.

——

I tend to agree with @BobBorges that objective criteria here are a challenging issue.

I believe one key takeaway from @BobBorges’ project is that:

  1. Each project defines its own trust criteria.
    1. This aligns with the original vision of Linked Data as envisioned by Tim Berners-Lee.
    2. Wikidata appears to lack flexibility in allowing users to explicitly define their own trusted sources, relying instead on a more general approach. All users contribute to editing all objects, requiring collaboration and consensus on what is deemed most trustworthy, while also supporting over 200 languages. This seems like an unsolvable equation; however, the lesson learned is that this approach does bring some value, even though it is far from perfect. Wikidata also remains vulnerable to vandalism and to poorly executed edits made with good intentions, which highlights its fragility in maintaining data integrity; for research data it should be treated as a proof of concept only...
      1. Wikidata's support for handling contradictory sources is something we need to see in research datasets.
  2. Over time, you develop a deeper understanding of the quality of the sources you rely on, which naturally shapes your level of trust in them.
  3. Research projects, however, often lack a "generic" data model that incorporates PROV, which I see as a sign of immaturity in producing reliable, trusted data. This also reflects a missed opportunity to adopt a data-driven approach with the goal of generating high-quality data that can be effectively reused by other research projects.
    1. I also see a lack of clear understanding of the importance of 5-star data and its role in ensuring high-quality, reusable information. I hope lessons learned from using data from Wikidata might inspire a similar approach: leveraging references for facts, effectively managing contradictory information, and assigning persistent identifiers (PIDs) to all sources. These practices could showcase their benefits, and the ability to easily retrieve data using SPARQL could become a best practice for future research projects, enhancing both transparency and reusability (see the sketch after this list).
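A minimal sketch of the practice described in 3.1, assuming nothing beyond the public Wikidata Query Service (the item Q937, Albert Einstein, and the property P569 are arbitrary examples, not project code): each fact can be fetched together with its rank and the references attached to it (stated in, reference URL, retrieval date).

```python
# Minimal sketch: fetch a fact together with its rank and its references,
# showing how provenance travels with the data. Item (Q937) and property
# (P569) are arbitrary examples; requires network access to the WDQS endpoint.
import requests

QUERY = """
SELECT ?dob ?rank ?statedInLabel ?refURL ?retrieved WHERE {
  wd:Q937 p:P569 ?stmt .                      # Q937 = Albert Einstein (example)
  ?stmt ps:P569 ?dob ;
        wikibase:rank ?rank .
  OPTIONAL {
    ?stmt prov:wasDerivedFrom ?ref .
    OPTIONAL { ?ref pr:P248 ?statedIn . }     # stated in
    OPTIONAL { ?ref pr:P854 ?refURL . }       # reference URL
    OPTIONAL { ?ref pr:P813 ?retrieved . }    # retrieved (date)
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "rank-sources-sketch/0.1 (example)"},
)
resp.raise_for_status()

for row in resp.json()["results"]["bindings"]:
    print({k: v["value"] for k, v in row.items()})
```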

Looking ahead, as data-driven research becomes more prominent and metadata round-tripping improves, it will become increasingly important to explicitly define the trustworthiness and quality of datasets.

Example of a research project using Wikibase

