-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider standardizing on pre-conversion to lower case strings where possible #29
Comments
This has blossomed into a bigger issue, and I have lots more thoughts about it. Firstly, apart from the side-note about quirks mode in browsers, there is another consideration: In XHTML, the element tag names are case-sensitive, but not in HTML. This is causing many failures in our test-suite. Secondly, I have a better idea than storing differently-cased copies of strings. I have tried a quick POC and it gives a huge performance boost, but it will require a big change in API. The idea is to enrich the w3c interfaces to add more methods that are relevant to jStyleParser. Imagine an interface like this: interface EnrichedElement extends org.w3c.dom.Element {
public boolean matchTagName(String name);
} Throughout the jStyleParser code, the The same concept would be applied to attributes, nodes, documents, etc. If this is too big a change, then a workaround is to use the standard |
I have been thinking about this a bit and it seems to me that we need different matching strategies depending on the used DOM implementation and the document being processed. I would like to preserve the possibility of processing a DOM coming from any source (parser). The idea of Currently, most of the related code is concentrated in the static ElementUtil class. So would it make sense to make this class configurable? E.g. implement something like This would allow to provide an optimized implementation of the matching routines according to the used DOM source. We could provide a standard (bulletproof) DOM support (both case-sensitive and case-insensitive) as well as a simplified version assuming lowercase names and even the version with the What's your opinion? |
Yes, this sounds like a good solution. 👍 The drawback is that it would require typecasts in the enriched matcher's implementation, but the typecasts don't need an instanceof check and the cost for typecasts is much less than Let's go for it. |
I have pushed a first implementation to the matching branch. I have also implemented a few different matchers (case sensitive and insensitive). It's prepared for further performance experiments. |
@radkovo Thanks; it looks awesome. I am trying to benchmark the branch right now. Some quick thoughts:
|
One more thought:
|
Ok, I couldn't find any reason why
I agree. However, this requires to pass the matcher to the selectors when they are being matched. I have added the same mechanism that is used for Actually the
You're right. On the other hand, since there are typically a few classes (up to 4) specified for the element, I mean the performance gain wouldn't be even measurable. And I find the |
The method
String.toLowerCase
is used at multiple places in the code, and gets called a lot of times.Sometimes, these calls are avoidable. For example, when a selector stores a class name, it can convert the class name to lower-case when the string is stored, rather than when the string is being matched.
Caveats
We have to be careful in this conversion, because the compiler can't check whether we are doing the right thing. In a way, the current implementation is more robust because it always calls
.toLowerCase()
and the logic is easy to understand.One way out of this could be to follow a naming convention. Whenever a string is stored as lower-case, the field name can indicate that. For example, instead of defining a field called
className
we can define it asclassNameLC
. (LC == Lower Case)Side-note
The attributes in DOM specs are case-sensitive, while the selector names in CSS spec are insensitive in quirks mode. In fully-standard compliant mode, the selectors in CSS are supposed to be case-sensitive.
It might be possible to take a stand and make the entire library fully-standards compliant, and thus avoid the conversions to lower case entirely.
Side-note 2
There is a new selector in Selector Level 4 Spec, which explicitly defines a case-sensitive selector (even if the User agent is in quirks mode).
The text was updated successfully, but these errors were encountered: