-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Migrate fetchers to Search.g4 ANTLR parser. #13691
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
-Changed inherited fetcher classes to use ANTLR generated classes instead of lucene libraries. - Changed ACMPortalFetcher.java logic for transforming the parsed syntax to URL
jablib/src/main/java/org/jabref/logic/importer/fetcher/ACMPortalFetcher.java
Outdated
Show resolved
Hide resolved
Great start, to me it looks like your on the right track. |
- Changed ACMPortalFetcherTest unit test code to use Search.g4 generated classes instead of Lucene - Removed trivial comment from ACMPortalFetcher
- Changed AbstractQueryTransformer methods to obey Search.g4 parser rules - Modified ACMPortalFetcher to use the changed transformer logic
…d the search based fetcher classes to use it
…de while still being compatible with Search.g4 parser
queryNode = parser.parse(searchQuery, NO_EXPLICIT_FIELD); | ||
} catch (QueryNodeParseException e) { | ||
queryNode = visitor.visitStart(searchQueryObject.getContext()); | ||
} catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Catching a generic Exception is too broad and can mask specific issues that should be handled differently. More specific exception types should be caught.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. Be specific with your Exception handling. There might always be other exceptions caused by whatever reason that might not be FetcherExceptions in the end.
|
||
// There is only a single paper found by searching that contains the exact sequence "Taxonomy of Distributed" in the title. | ||
assertEquals(List.of(expected), resultWithPhraseSearch); | ||
// The first result should be the expected paper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is trivial and can be derived from the code. It doesn't provide additional information about the reasoning or purpose of the test.
// Check the actual operator token | ||
if (ctx.bin_op.getType() == SearchParser.AND) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment provides no additional information beyond what is clearly visible in the code. The comment simply restates what the code does and should be removed.
|| operator == SearchParser.NCEEQUAL | ||
|| operator == SearchParser.NREQUAL | ||
|| operator == SearchParser.NCREEQUAL) { | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Public method should return Optional instead of null to better handle absence of value. This follows modern Java practices and prevents potential NullPointerExceptions.
@@ -53,7 +51,7 @@ class BvbFetcherTest { | |||
|
|||
@Test | |||
void performTest() throws FetcherException { | |||
String searchquery = "effective java author:bloch"; | |||
String searchquery = "effective java author=bloch"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable name contains a typo ('searchquery' instead of 'searchQuery'). Variable names should be correctly spelled to maintain code quality and readability.
@@ -71,7 +71,8 @@ public static Stream<Arguments> performSearch() { | |||
.withField(StandardField.LANGUAGE, "English") | |||
.withField(StandardField.KEYWORDS, "Religion, thema EDItEUR::Q Philosophy and Religion::QR Religion and beliefs::QRM Christianity::QRMF Christianity: sacred texts and revered writings::QRMF1 Bibles::QRMF13 New Testaments") | |||
.withField(StandardField.PUBLISHER, "Brill"), | |||
"Four Kingdom Motifs before and beyond the Book of Daniel" | |||
"\"Four Kingdom Motifs before and beyond the Book of Daniel\"" | |||
// The title needs to be in quotes in order for and to be parsed correctly here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment merely explains what is visible in the code (the quotes around the title) without providing additional information about why this is necessary for the parser implementation.
@@ -184,9 +184,9 @@ public Optional<HelpFile> getHelpPage() { | |||
} | |||
|
|||
@Override | |||
public URL getURLForQuery(QueryNode luceneQuery) throws URISyntaxException, MalformedURLException { | |||
public URL getURLForQuery(BaseQueryNode queryNode) throws URISyntaxException, MalformedURLException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Method lacks null check for queryNode parameter. According to modern Java practices and JabRef's standards, parameters should be validated using Objects.requireNonNull.
… new parser and node logic
…the new Search.g4 logic. Fixed how AbstractQueryTransformer handles fields.
@@ -46,14 +46,14 @@ public class LOBIDFetcher implements PagedSearchBasedParserFetcher, IdBasedParse | |||
/** | |||
* Gets the query URL | |||
* | |||
* @param luceneQuery the search query | |||
* @param queryNode the list that contains the parsed nodes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JavaDoc parameter description is inaccurate and misleading. It describes queryNode as a list when it's actually a BaseQueryNode object representing a search query structure.
@@ -96,9 +96,9 @@ public URL getURLForEntry(BibEntry entry) throws URISyntaxException, MalformedUR | |||
} | |||
|
|||
@Override | |||
public URL getURLForQuery(QueryNode luceneQuery) throws URISyntaxException, MalformedURLException { | |||
public URL getURLForQuery(BaseQueryNode queryNode) throws URISyntaxException, MalformedURLException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Method lacks null check for queryNode parameter. According to modern Java practices and JSpecify guidelines, parameters should be validated or annotated appropriately.
return this.performSearchPaged(parser.parse(searchQuery, NO_EXPLICIT_FIELD), pageNumber); | ||
} catch (QueryNodeParseException e) { | ||
return this.performSearchPaged(visitor.visitStart(searchQueryObject.getContext()), pageNumber); | ||
} catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to avoid general exceptions, be specific.
|
||
/** | ||
* Looks for hits which are matched by the given free-text query. | ||
* | ||
* @param searchQuery query string that can be parsed into a lucene query | ||
* @param searchQuery query string that can be parsed into a search.g4 query |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'g4' is just the default file extension for antlr4 files. just write search query.
queryNode = parser.parse(searchQuery, NO_EXPLICIT_FIELD); | ||
} catch (QueryNodeParseException e) { | ||
queryNode = visitor.visitStart(searchQueryObject.getContext()); | ||
} catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. Be specific with your Exception handling. There might always be other exceptions caused by whatever reason that might not be FetcherExceptions in the end.
throw new FetcherException("An error occurred during parsing of the query."); | ||
return this.performSearchPaged(visitor.visitStart(searchQueryObject.getContext()), pageNumber); | ||
} catch (ParseCancellationException e) { | ||
throw new FetcherException("A syntax error occurred during parsing of the query"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exception message does not end with a period, which violates the text formatting guidelines for consistent message formatting across the codebase.
URIBuilder uriBuilder = new URIBuilder("https://zbmath.org/bibtexoutput/"); | ||
uriBuilder.addParameter("q", new ZbMathQueryTransformer().transformLuceneQuery(luceneQuery).orElse("")); // search all fields | ||
uriBuilder.addParameter("q", new ZbMathQueryTransformer().transformSearchQuery(queryNode).orElse("")); // search all fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is trivial and does not provide additional information beyond what is obvious from the code. The comment should either be removed or enhanced with meaningful explanation.
JUnit tests of You can then run these tests in IntelliJ to reproduce the failing tests locally. We offer a quick test running howto in the section Final build system checks in our setup guide. |
Closes #13607
TO DO:
Steps to test
Using the Search.g4 syntax for searching on the web with different options.
Mandatory checks
CHANGELOG.md
described in a way that is understandable for the average user (if change is visible to the user)