Dynamically Bootstrap Named Analysed Fields for Searching and Boosting #69
Hi @ahagenbruch, this sounds really interesting. Many thanks for such a detailed proposal. I need to read it again and then investigate what impact it would have on the existing code. In the meantime, a question: suppose we changed the schema in this way. What kind of queries would you issue to SolRDF? I think that with plain SPARQL you won't get any benefit from such a schema. Do you want to use Solr's built-in query parsers and get the results back as SPARQL results? Thanks again. BTW: I created a user list on Google. Feel free to join us if you want; we could also discuss this with the other (few, at the moment) users.
@ahagenbruch I'm moving the discussion back here, as these are concrete implementation details. I have two doubts.

Field name

You said, in your proposal:
What about the prefix? In your schema example we have skos:notation, and that is fine: skos is a widely used, standard namespace. But what about custom namespaces? It doesn't sound good to index something like:
because "pippo" might be known only at index time. At query time you might not know which prefixes were used during indexing, or you might use the same namespace mapped to a different prefix (e.g. pluto:mynote at query time and pippo:mynote at index time, where pippo and pluto point to the same namespace URI).

Multivalued fields

You said:
Why? Each triple (i.e. each document) will have exactly one value for the object field, regardless of the schema we use. Am I missing something about your proposal?
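The prefix concern can be made concrete with a small sketch. It assumes, purely for illustration, two prefix maps (`index_prefixes`, `query_prefixes`) and a made-up namespace `http://example.org/ns#`; none of these names come from SolRDF. Keying on the expanded namespace URI instead of the prefix is one possible way to keep index-time and query-time names aligned:

```python
# Hypothetical prefix maps: "pippo" and "pluto" point to the same namespace URI.
index_prefixes = {"pippo": "http://example.org/ns#"}
query_prefixes = {"pluto": "http://example.org/ns#"}


def expand(qname, prefixes):
    """Expand a QName such as 'pippo:mynote' into a full URI."""
    prefix, local = qname.split(":", 1)
    return prefixes[prefix] + local


# Field names built from the prefix would differ between index and query time,
# while the expanded URIs are identical.
indexed = expand("pippo:mynote", index_prefixes)
queried = expand("pluto:mynote", query_prefixes)
print(indexed == queried)
```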
On 18.05.15 at 15:11, Andrea Gazzarini wrote:
Hi Andrea,
I see your point, but I had these two use cases in mind when I wrote the
By document I mean the subject URI as the document ID, the predicates as
<thsys/72180> Cheers, Andre |
Hi @agazzarini,
the current schema in SolRDF is mostly focused on the use case of a SPARQL endpoint, i.e. its object literals are indexed into unanalysed string fields. To accommodate a more common use case, where we also want analysed field searching and per-field boosting, we could write object literals into named fields derived from the QNames. As Solr provides the mechanism of dynamic fields, we propose the following enhancement:
Transform the QName and optional datatype and language information into a field name of the following structure:
Use abstract heuristics to provide a basic search schema. This can be adapted to the actual requirements of the dataset. We make the general assumption that all fields can have multiple values:
Map untyped literals without a language tag to text_general:
<dynamicField name="*_xsd_string" type="text_general" indexed="true" stored="true" multiValued="true"/>
Map literals with language information to corresponding language text fields:
<dynamicField name="*_xsd_string_de" type="text_de" indexed="true" stored="true" multiValued="true"/>
...
Map typed literals with datatypes to corresponding fields:
xsd:integer =>
<dynamicField name="*_xsd_integer" type="tint" indexed="true" stored="true" multiValued="true"/>
xsd:nonPositiveInteger =>
<dynamicField name="*_xsd_nonPositiveInteger" type="tint" indexed="true" stored="true" multiValued="true"/>
xsd:negativeInteger =>
<dynamicField name="*_xsd_negativeInteger" type="tint" indexed="true" stored="true" multiValued="true"/>
xsd:long =>
<dynamicField name="*_xsd_long" type="tlong" indexed="true" stored="true" multiValued="true"/>
xsd:unsignedLong =>
<dynamicField name="*_xsd_unsignedLong" type="tlong" indexed="true" stored="true" multiValued="true"/>
xsd:int =>
<dynamicField name="*_xsd_int" type="tint" indexed="true" stored="true" multiValued="true"/>
xsd:unsignedInt =>
<dynamicField name="*_xsd_unsignedInt" type="tint" indexed="true" stored="true" multiValued="true"/>
xsd:short =>
<dynamicField name="*_xsd_short" type="tint" indexed="true" stored="true" multiValued="true"/>
xsd:unsignedShort =>
<dynamicField name="*_xsd_unsignedShort" type="tint" indexed="true" stored="true" multiValued="true"/>
xsd:byte =>
<dynamicField name="*_xsd_byte" type="tint" indexed="true" stored="true" multiValued="true"/>
xsd:unsignedByte =>
<dynamicField name="*_xsd_unsignedByte" type="tint" indexed="true" stored="true" multiValued="true"/>
xsd:nonNegativeInteger =>
<dynamicField name="*_xsd_nonNegativeInteger" type="tint" indexed="true" stored="true" multiValued="true"/>
xsd:positiveInteger =>
<dynamicField name="*_xsd_positiveInteger" type="tint" indexed="true" stored="true" multiValued="true"/>
xsd:float =>
<dynamicField name="*_xsd_float" type="tfloat" indexed="true" stored="true" multiValued="true"/>
xsd:decimal =>
<dynamicField name="*_xsd_decimal" type="tfloat" indexed="true" stored="true" multiValued="true"/>
xsd:double =>
<dynamicField name="*_xsd_double" type="tdouble" indexed="true" stored="true" multiValued="true"/>
xsd:boolean =>
<dynamicField name="*_xsd_boolean" type="boolean" indexed="true" stored="true" multiValued="true"/>
xsd:string =>
<dynamicField name="*_xsd_string" type="text_general" indexed="true" stored="true" multiValued="true"/>
xsd:hexBinary =>
<dynamicField name="*_xsd_hexBinary" type="string" indexed="true" stored="true" multiValued="true"/>
xsd:base64Binary =>
<dynamicField name="*_xsd_base64Binary" type="binary" indexed="true" stored="true" multiValued="true"/>
xsd:anyURI =>
<dynamicField name="*_xsd_anyURI" type="string" indexed="true" stored="true" multiValued="true"/>
xsd:QName =>
<dynamicField name="*_xsd_QName" type="string" indexed="true" stored="true" multiValued="true"/>
xsd:NOTATION =>
<dynamicField name="*_xsd_NOTATION" type="string" indexed="true" stored="true" multiValued="true"/>
xsd:normalizedString =>
<dynamicField name="*_xsd_normalizedString" type="text_general" indexed="true" stored="true" multiValued="true"/>
xsd:token =>
<dynamicField name="*_xsd_token" type="text_general" indexed="true" stored="true" multiValued="true"/>
xsd:language =>
<dynamicField name="*_xsd_language" type="string" indexed="true" stored="true" multiValued="true"/>
xsd:IDREFS =>
<dynamicField name="*_xsd_IDREFS" type="string" indexed="true" stored="true" multiValued="true"/>
xsd:IDREF =>
<dynamicField name="*_xsd_IDREF" type="string" indexed="true" stored="true" multiValued="true"/>
xsd:ENTITIES =>
<dynamicField name="*_xsd_ENTITIES" type="string" indexed="true" stored="true" multiValued="true"/>
xsd:ENTITY =>
<dynamicField name="*_xsd_ENTITY" type="string" indexed="true" stored="true" multiValued="true"/>
xsd:NMTOKENS =>
<dynamicField name="*_xsd_NMTOKENS" type="string" indexed="true" stored="true" multiValued="true"/>
xsd:Name =>
<dynamicField name="*_xsd_Name" type="string" indexed="true" stored="true" multiValued="true"/>
xsd:NCName =>
<dynamicField name="*_xsd_NCName" type="string" indexed="true" stored="true" multiValued="true"/>
xsd:ID =>
<dynamicField name="*_xsd_ID" type="string" indexed="true" stored="true" multiValued="true"/>
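The exact field-name structure is not shown in this thread, so the following is only a sketch. It assumes a layout of `<prefix>_<localName>_xsd_<datatype>[_<lang>]`, inferred from the `*_xsd_string` and `*_xsd_string_de` patterns above; the function name `field_name` and the example predicates are hypothetical:

```python
def field_name(qname, datatype="string", lang=None):
    """Build a Solr field name from a predicate QName, an XSD datatype
    local name and an optional language tag.

    Assumed layout: <prefix>_<local>_xsd_<datatype>[_<lang>]
    """
    prefix, local = qname.split(":", 1)
    name = f"{prefix}_{local}_xsd_{datatype}"
    if lang:
        # A language suffix selects the language-specific dynamic field.
        name += f"_{lang}"
    return name


print(field_name("skos:notation"))
print(field_name("skos:prefLabel", lang="de"))
print(field_name("dcterms:extent", datatype="integer"))
```

Under these assumptions, an untyped literal falls back to the `*_xsd_string` pattern, and a German literal ends in `_xsd_string_de`, so it would match the `text_de` dynamic field.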
Map date and dateTime types to a date field and supplement the missing values (e.g. "2015" => "2015-01-01T00:00:00Z"):
xsd:date =>
<dynamicField name="*_xsd_date" type="tdate" indexed="true" stored="true" multiValued="true"/>
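Supplementing the missing values could look roughly like the sketch below. It handles only year, year-month and full-date literals without time zones, and the helper name `pad_to_solr_date` is an assumption made for illustration, not part of SolRDF:

```python
import re


def pad_to_solr_date(value):
    """Fill in missing month, day and time so the value fits a Solr date field."""
    match = re.fullmatch(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?", value)
    if match is None:
        raise ValueError(f"unsupported date literal: {value!r}")
    year, month, day = match.groups()
    # Missing parts default to the first month, first day and midnight UTC.
    return f"{year}-{month or '01'}-{day or '01'}T00:00:00Z"


print(pad_to_solr_date("2015"))
print(pad_to_solr_date("2015-05"))
print(pad_to_solr_date("2015-05-18"))
```

A real implementation would also need to decide how to treat time zone offsets and full dateTime values, which this sketch rejects.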
Map duration to a string field:
xsd:duration =>
<dynamicField name="*_xsd_duration" type="string" indexed="true" stored="true" multiValued="true"/>
Map Gregorian date fields to a string field:
xsd:gYearMonth =>
<dynamicField name="*_xsd_gYearMonth" type="string" indexed="true" stored="true" multiValued="true"/>
xsd:gYear =>
<dynamicField name="*_xsd_gYear" type="string" indexed="true" stored="true" multiValued="true"/>
xsd:gMonthDay =>
<dynamicField name="*_xsd_gMonthDay" type="string" indexed="true" stored="true" multiValued="true"/>
xsd:gDay =>
<dynamicField name="*_xsd_gDay" type="string" indexed="true" stored="true" multiValued="true"/>
xsd:gMonth =>
<dynamicField name="*_xsd_gMonth" type="string" indexed="true" stored="true" multiValued="true"/>
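To illustrate what such named fields would enable, here is a hedged sketch of per-field boosting with Solr's Extended DisMax parser. `defType`, `q` and `qf` are standard Solr request parameters; the field names and the search term are hypothetical and assume the naming pattern of the dynamic fields above:

```python
from urllib.parse import urlencode

# Hypothetical field names following the *_xsd_string_de pattern.
params = {
    "q": "Wirtschaft",
    "defType": "edismax",
    # Matches in the preferred label count three times as much as alternative labels.
    "qf": "skos_prefLabel_xsd_string_de^3 skos_altLabel_xsd_string_de",
}

# URL-encoded query string to append to a Solr select handler URL.
query_string = urlencode(params)
print(query_string)
```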