feat: search bar #190

kunfang98927 · 2025-04-12T12:59:38Z

Implement the search bar using Django ORM search.

Resolves: #186

ref: #186

Copilot

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

dchiller · 2025-04-12T20:22:24Z

Do we not want to use solr for this?

fujinaga · 2025-04-13T01:23:03Z

Yes, I would like to use Solr eventually.

ref: #186

ahankinson · 2025-04-14T07:11:05Z

I hope it's ok if I comment here....

The way you're indexing data by concatenating it into a single field with a pipe delimiter, and then doing a regex search on that string, is really not the way you're supposed to do it with Solr.

Solr has multi-valued fields. So if you have a list of strings, you should pass it in as a Python list. I note that you're not using a Solr client -- pysolr is an easy-to-use one. See the "solr.add" example in the docs for Pysolr: https://pypi.org/project/pysolr/
For actually querying you should define a default text field, and, using the copyField directives in Solr, automatically copy the values from your string fields into a single text field. This text field (text fields are different from string fields) lets you definte multiple processing steps for the text, and lets you copy values from different fields into a single field, meaning your keyword queries are run against this default field.

You already seem to have this in place: https://github.com/DDMAL/VIM/blob/develop/solr/cores/conf/schema.xml#L58C5-L68C17

However, you're missing a text field, even though your query handlers are looking for one in your df (default field -- the field that is used by default to run a query against). So I'm not entirely sure how your Solr installation is working.

To fix this, you need to define a text field in your schema, make it a text_general field type, then use a copyField to copy any values from your string fields into this single field. Then your q queries will work.

There's no requirement for this, but the query parameter for a text query is, by convention, q... so where you have "example.org/search?search_query=blahblah", it could be shortened to "example.org/search?q=blahblah".
Your query process uses Solr to look up a list of Instrument IDs from Solr, and then does a lookup in the database. This is absolutely not scalable. What you should do is store enough in Solr to do the querying and to show a brief search result in a list of results. Then, when the user clicks the result you can render the page from a database lookup. You can also process and store any data as a solr document, so you don't need to do things like strip off the instrument- string to get the ID -- just store the ID directly.
Once you get that up and running, I strongly encourage you to figure out the ICU field types for tokenization, since you're going to be dealing with all sorts of non-ASCII text, and these processors make it possible to do queries against this text. For example, if your text contains a "U" with an umlaut ("Ü"), then without "character folding", users querying for words like "über" and "uber" won't match. The I18N field processors do things like character folding, which is that letters lik "ü" get automatically converted to "u" at index and query time, making matches possible. It also handles ligatures like œ and Æ.

kunfang98927 · 2025-04-14T07:19:19Z

I hope it's ok if I comment here....

The way you're indexing data by concatenating it into a single field with a pipe delimiter, and then doing a regex search on that string, is really not the way you're supposed to do it with Solr.

Solr has multi-valued fields. So if you have a list of strings, you should pass it in as a Python list. I note that you're not using a Solr client -- pysolr is an easy-to-use one. See the "solr.add" example in the docs for Pysolr: https://pypi.org/project/pysolr/

For actually querying you should define a default text field, and, using the copyField directives in Solr, automatically copy the values from your string fields into a single text field. This text field (text fields are different from string fields) lets you definte multiple processing steps for the text, and lets you copy values from different fields into a single field, meaning your keyword queries are run against this default field.

You already seem to have this in place: https://github.com/DDMAL/VIM/blob/develop/solr/cores/conf/schema.xml#L58C5-L68C17

However, you're missing a text field, even though your query handlers are looking for one in your df (default field -- the field that is used by default to run a query against). So I'm not entirely sure how your Solr installation is working.

To fix this, you need to define a text field in your schema, make it a text_general field type, then use a copyField to copy any values from your string fields into this single field. Then your q queries will work.

There's no requirement for this, but the query parameter for a text query is, by convention, q... so where you have "example.org/search?search_query=blahblah", it could be shortened to "example.org/search?q=blahblah".

Your query process uses Solr to look up a list of Instrument IDs from Solr, and then does a lookup in the database. This is absolutely not scalable. What you should do is store enough in Solr to do the querying and to show a brief search result in a list of results. Then, when the user clicks the result you can render the page from a database lookup. You can also process and store any data as a solr document, so you don't need to do things like strip off the instrument- string to get the ID -- just store the ID directly.

Once you get that up and running, I strongly encourage you to figure out the ICU field types for tokenization, since you're going to be dealing with all sorts of non-ASCII text, and these processors make it possible to do queries against this text. For example, if your text contains a "U" with an umlaut ("Ü"), then without "character folding", users querying for words like "über" and "uber" won't match. The I18N field processors do things like character folding, which is that letters lik "ü" get automatically converted to "u" at index and query time, making matches possible. It also handles ligatures like œ and Æ.

Thank you very much for the comments! I’ll make the changes based on your comments.

feat: search bar

1574614

ref: #186

kunfang98927 requested review from dchiller and Copilot April 12, 2025 12:59

Copilot AI reviewed Apr 12, 2025

View reviewed changes

feat: solr query for instruments

94d521f

ref: #186

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: search bar #190

feat: search bar #190

kunfang98927 commented Apr 12, 2025

Copilot AI left a comment

dchiller commented Apr 12, 2025

fujinaga commented Apr 13, 2025

ahankinson commented Apr 14, 2025

kunfang98927 commented Apr 14, 2025

feat: search bar #190

Are you sure you want to change the base?

feat: search bar #190

Conversation

kunfang98927 commented Apr 12, 2025

Copilot AI left a comment

Choose a reason for hiding this comment

dchiller commented Apr 12, 2025

fujinaga commented Apr 13, 2025

ahankinson commented Apr 14, 2025

kunfang98927 commented Apr 14, 2025