Blocked records #1159
Replies: 2 comments
-
If the answer is that every record should get at least one block, then I think something is cutting the fingerprinting short. I'm currently looking into that. Maybe the full_data set is causing something to crash when using that generator.
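One way to check whether fingerprinting stops early is to compare the record IDs that appear in the blocking table against the full set of source IDs. The following is only a rough sketch: it assumes the blocking output can be read as (block_key, record_id) pairs, and the function name `block_coverage` and the toy data are hypothetical, not part of any library.

```python
from collections import Counter


def block_coverage(source_ids, block_pairs):
    """Count blocks per record and list source records that got no block.

    source_ids: iterable of every record ID in the source data.
    block_pairs: iterable of (block_key, record_id) pairs (assumed shape).
    """
    # How many block rows each record ID received.
    counts = Counter(record_id for _block_key, record_id in block_pairs)
    # Source IDs that never appear in any block row.
    uncovered = sorted(set(source_ids) - set(counts))
    return counts, uncovered


# Hypothetical toy data: record 4 never receives a block key.
source_ids = [1, 2, 3, 4]
block_pairs = [
    ("abc:0", 1),
    ("abc:0", 2),
    ("xyz:1", 1),
    ("xyz:1", 3),
]

counts, uncovered = block_coverage(source_ids, block_pairs)
print(uncovered)  # [4]
```

If the uncovered IDs all sit at the tail of the source data, that would point to the generator being exhausted or failing partway through. If they are scattered throughout, it is more likely that the learned predicates simply produce no key for those records.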
-
My latest understanding (which may be incorrect) is that recall needs to be 1 for every record to be included in a block. Is that correct? I was getting about 70k of the 90k records showing up in the entity map. If I understand the mapping, a record that matches no other record won't show up in the entity map at all, and neither will any record that was left out of the matching because recall was below 1. I'm likely oversimplifying, but I would love to know whether I'm wrong on these points.
-
Likely a simple question, but I'm not clear on whether I should expect every source record to have at least one block record for every field variable I define.
In other words, should the blocking table have at least one row for every record in the source? There would normally be many blocking rows for each source record because of the various predicates and variables, but I had thought I should expect AT LEAST one row per source record.
This isn't holding true, so I assume I'm missing something. Many source records do not appear in my blocking table at all, and so, I assume, are never considered for matches. How can I ensure that every record is included in the blocks? Or am I off, and records that are not in any block still get considered for matches?
Thanks!