Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhancement of the intialization of the SKOS index #15

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

gquaire
Copy link
Contributor

@gquaire gquaire commented Dec 13, 2016

The SKOS Lucene index is rebuilt only if the input thesaurus file has changed in order to make the initialization of the plugin more faster.

…ed, in order to make the initialization of the plugin more faster.
Copy link
Collaborator

@flaviomartins flaviomartins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your idea but you need to address some issues before I can merge this.

@@ -114,6 +137,7 @@ public boolean needsScores() {
private static final String FIELD_BROADER_TRANSITIVE = "broaderTransitive";
private static final String FIELD_NARROWER_TRANSITIVE = "narrowerTransitive";
private static final String FIELD_RELATED = "related";
private static final String PRINT_EXT = ".hash";
Copy link
Collaborator

@flaviomartins flaviomartins Dec 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think PRINT_EXT should reflect the hash used. i.e.: .sha1sum

* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.io.BufferedReader;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many differences here triggered by some kind of formatter.

MessageDigest crypt = MessageDigest.getInstance("SHA-1");
crypt.reset();
while (reader.read(cbuf) > 0)
crypt.update(new String(cbuf).getBytes(StandardCharsets.ISO_8859_1));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a valid reason to work with strings instead of bytes here. I don't think you need to mess with charsets. Just calculate the hash of the whole file by reading in bytes. i.e.: the sha1sum tool should get the same hash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you're right. The is no valid reason. I wanted to make a patch as fast as possible for my users who are blocked with the issue. So, I develop it very quicly. I think the code could be improve. Il will think about it and I will propose you improvement suggestions.

@@ -227,24 +277,51 @@ public SKOSEngineImpl(InputStream inputStream, String format, List<String> langu
searcher = new IndexSearcher(DirectoryReader.open(indexDir));
}

private static String getHash(Reader reader) throws IOException, NoSuchAlgorithmException
Copy link
Collaborator

@flaviomartins flaviomartins Dec 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest using an InputStream as param for this function. Then you can pass that InputStream and the MessageDigest object into DigestInputStream to calculate the hash.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I understand your remark. But, it is usefull for a private method?

+ "{ ?subject skos:altLabel ?text } UNION\n"
+ "{ ?subject skos:hiddenLabel ?text }\n"
+ "}";
String sparqlQuery = "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n" + "PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n" + "INSERT { ?subject rdf:type skos:Concept }\n" + "WHERE {\n" + "{ ?subject skos:prefLabel ?text } UNION\n" + "{ ?subject skos:altLabel ?text } UNION\n" + "{ ?subject skos:hiddenLabel ?text }\n" + "}";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These whitespace/formatting changes need to be gone before merging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will see If I can fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants