-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhancement of the intialization of the SKOS index #15
base: master
Are you sure you want to change the base?
Conversation
…ed, in order to make the initialization of the plugin more faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like your idea but you need to address some issues before I can merge this.
@@ -114,6 +137,7 @@ public boolean needsScores() { | |||
private static final String FIELD_BROADER_TRANSITIVE = "broaderTransitive"; | |||
private static final String FIELD_NARROWER_TRANSITIVE = "narrowerTransitive"; | |||
private static final String FIELD_RELATED = "related"; | |||
private static final String PRINT_EXT = ".hash"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think PRINT_EXT should reflect the hash used. i.e.: .sha1sum
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
import java.io.BufferedReader; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are many differences here triggered by some kind of formatter.
MessageDigest crypt = MessageDigest.getInstance("SHA-1"); | ||
crypt.reset(); | ||
while (reader.read(cbuf) > 0) | ||
crypt.update(new String(cbuf).getBytes(StandardCharsets.ISO_8859_1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a valid reason to work with strings instead of bytes here. I don't think you need to mess with charsets. Just calculate the hash of the whole file by reading in bytes. i.e.: the sha1sum tool should get the same hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, you're right. The is no valid reason. I wanted to make a patch as fast as possible for my users who are blocked with the issue. So, I develop it very quicly. I think the code could be improve. Il will think about it and I will propose you improvement suggestions.
@@ -227,24 +277,51 @@ public SKOSEngineImpl(InputStream inputStream, String format, List<String> langu | |||
searcher = new IndexSearcher(DirectoryReader.open(indexDir)); | |||
} | |||
|
|||
private static String getHash(Reader reader) throws IOException, NoSuchAlgorithmException |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest using an InputStream as param for this function. Then you can pass that InputStream and the MessageDigest object into DigestInputStream to calculate the hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I understand your remark. But, it is usefull for a private method?
+ "{ ?subject skos:altLabel ?text } UNION\n" | ||
+ "{ ?subject skos:hiddenLabel ?text }\n" | ||
+ "}"; | ||
String sparqlQuery = "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n" + "PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n" + "INSERT { ?subject rdf:type skos:Concept }\n" + "WHERE {\n" + "{ ?subject skos:prefLabel ?text } UNION\n" + "{ ?subject skos:altLabel ?text } UNION\n" + "{ ?subject skos:hiddenLabel ?text }\n" + "}"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These whitespace/formatting changes need to be gone before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will see If I can fix it.
The SKOS Lucene index is rebuilt only if the input thesaurus file has changed in order to make the initialization of the plugin more faster.