Add a search box to the Sample Problems page and a script to build a search file #2733


Closed
pstaabp wants to merge 5 commits from the search-docs branch

Conversation

@pstaabp (Member) commented May 27, 2025

This PR has two related features:

  1. A script in bin/dev_scripts that parses all of the sample problems and the POD to build a "database" (a JSON file) of non-common words for each problem/POD page. The JSON file contains the information that is searched within the browser using the MiniSearch package.
  2. Search boxes on the sample-problems/POD pages that return the pages matching the search criteria. The searching is done with MiniSearch.

A file htdocs/DATA/search.json is provided and is used while searching. It is not complete, but it should give some sense of how this works (see the sketch below). The script rebuilds this file as needed.

Note: a subsequent PR will be posted to update the POD in the macros so that the search tools get the terms they need.
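For context, here is a minimal sketch of how the browser side can consume the search.json file with MiniSearch. The fetch URL, field names, and element id below are assumptions for illustration, not the actual implementation in this PR:

```javascript
// Minimal sketch: build a MiniSearch index from the generated JSON file and
// wire it to a search box. The URL, field names, and element id are assumptions.
import MiniSearch from 'minisearch';

async function initSearch() {
    // Each record is assumed to carry an `id` field (MiniSearch requires one).
    const documents = await (await fetch('/webwork2_files/DATA/search.json')).json();

    const miniSearch = new MiniSearch({
        fields: ['title', 'terms'],   // fields to index (assumed names)
        storeFields: ['title', 'url'] // fields returned with each search result
    });
    miniSearch.addAll(documents);

    document.getElementById('sample-problem-search').addEventListener('input', (event) => {
        // Prefix matching so partially typed words already return results.
        const results = miniSearch.search(event.target.value, { prefix: true });
        console.log(results.slice(0, 10)); // rendering of the result list omitted
    });
}

initSearch();
```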

@pstaabp (Member Author) commented Jun 10, 2025

Updated the script to parse the POD in macro files to

  1. parse the =head1 NAME section to extract the filename and short description of the macro. (Note: "Restructure the POD of macros" pg#1244 updates all of the macros to fix their formatting so that this script can parse them better.)
  2. parse the =head2 blocks in which method and function names have been included. This allows searching for the documentation of a particular function name (an illustrative record shape is sketched below).
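To make that concrete, a record extracted this way might look like the object below. The field names and values are hypothetical, shown only to illustrate what the NAME section and the =head2 blocks contribute; they are not the script's actual schema:

```javascript
// Hypothetical shape of one index record built from a macro's POD.
// Field names and values are illustrative, not the script's actual schema.
const exampleRecord = {
    id: 'macros/parsers/parserPopUp.pl',   // unique id, e.g. the macro's path
    name: 'parserPopUp.pl',                // filename taken from the =head1 NAME section
    description: 'pop-up and drop-down answer menus', // short description from NAME (paraphrased)
    functions: ['PopUp', 'DropDown'],      // method/function names taken from =head2 blocks
    url: '/pod/macros/parsers/parserPopUp.html' // page to open from a search result (assumed)
};

// Records like this would be indexed on fields such as ['name', 'description', 'functions'],
// so searching for a function name returns the macro's documentation page.
```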

@dlglin (Member) commented Jun 17, 2025

When I start typing I get suggestions, but if I type more after that then the suggestions disappear. Then if I type even more they reappear.
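In MiniSearch, whether a longer query keeps matching is mostly governed by the prefix and fuzzy search options, so those are a natural place to look. A sketch of passing them (the option values are illustrative only):

```javascript
// Illustrative only: with prefix and fuzzy matching enabled, a partially typed
// or slightly misspelled word keeps matching indexed terms, which is usually
// what prevents suggestions from disappearing as the query grows.
function runSearch(miniSearch, query) {
    return miniSearch.search(query, {
        prefix: true, // each query term also matches as a prefix ("deriv" matches "derivative")
        fuzzy: 0.2    // allow an edit distance of up to 20% of the term length
    });
}
```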

@pstaabp force-pushed the search-docs branch 3 times, most recently from ccd33bf to 76d22dd on June 21, 2025 16:57
@pstaabp (Member Author) commented Jun 21, 2025

Added an update-ww.pl script that will

  1. run OPL-update without the interactive loadOPLstats part of the script, and
  2. run build-search-json.pl to generate the search JSON for the POD/sample problems.

Note: this will need to be added to the upgrade instructions.

To test, run update-ww.pl (use the -s flag to skip OPL-update), then try the search box on the sample problems page.

@pstaabp marked this pull request as ready for review June 21, 2025 16:57
@Alex-Jordan (Contributor)

I'm getting a console error: `Uncaught ReferenceError: MiniSearch is not defined`.

@pstaabp (Member Author) commented Jun 25, 2025

Did you run `npm ci`?

@Alex-Jordan (Contributor)

Yes, I've run npm ci and restarted webwork2. Looking at the files changed here, I don't see anything that would have affected the npm packages though. The only references to MiniSearch are where it is used.

@drgrice1 (Member)

I am also getting the `MiniSearch is not defined` error.

@Alex-Jordan (Contributor)

There was a change to package-lock.json here before Danny did testing. But subsequent force pushes don't have anything changed for that file.

I'm testing by checking out a copy of the WeBWorK-2.20 branch, and then merging this branch into that one. I think that is the most accurate way to test things. Maybe the behavior would be different if I just literally checked out this branch...

@drgrice1 (Member) left a comment

It seems that you need to run npm install minisearch, and then add the changed package.json and package-lock.json files to this pull request.
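For example, after the package is installed, the page script needs to pull the symbol in; whether webwork2 does this with an ES module import or by loading the UMD bundle via a script tag is not shown here, so treat this as a sketch:

```javascript
// Sketch: make MiniSearch available to the page script after `npm install minisearch`
// (or `npm ci` once package.json/package-lock.json include it). Whether webwork2
// uses an ES module import like this or a <script> tag for the UMD bundle is an
// assumption of this example.
import MiniSearch from 'minisearch';

const miniSearch = new MiniSearch({ fields: ['title', 'terms'] }); // field names assumed
```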

I haven't delved into this in much depth yet. I can see that this pull request clearly needs a lot of work before it is really ready for review.

Default value is tutorial/samples-problems
-b|--build One of (all, macros, samples) to determine if the macros, sample
problems or both should be scraped for data.
-v|--verbose Setting this flag provides details as the script runs.
Member

This script needs help. I mean that literally. Any script with options should have a -h|--help flag that shows how to use the script.

bin/update-ww.pl Outdated

=head1 NAME

update-ww.pl -- run all needed scripts when webwork is updated.
Member

Don't add this script. The OPL-update script is already a script that just runs other scripts. Now you are going to add another one that runs that script and one more? Find another way to do it.

In addition, the options for this script are not well explained, and there is no -h help flag.

Member Author

@drgrice1 You had suggested that I add this. I thought the idea was that this script would run anything we needed when WeBWorK was upgraded. Are you suggesting that OPL-update should include building the search JSON?

Member

Yes, that is what I actually meant with the suggestion: that OPL-update include running this script, not the creation of a new script. It is a bit of a stretch that this falls under "updating the OPL", but the point is that everything be handled by a single script so that the installation/upgrade instructions don't need to change.

I have also thought about removing the need for this new script entirely. Have you actually benchmarked it? How bad would performance be if the files were combed dynamically at runtime and the search done live? If that isn't bad, or could be made efficient with some tweaking, then that is the ideal way to go.

Member

Another possibility is to effectively run the script the first time the server starts if the JSON file does not already exist. Perhaps with a job queue task that handles this.

Member Author

Running it seems to take 5-10 seconds, which would be pretty bad on page load.

I'll try to see if we can run it at server start and hand it off to the job queue. That sounds like a good idea.

Member

Have you considered that the reason the script takes 5-10 seconds to run is that your code is highly inefficient? And that improvements could be made to make it run in a fraction of the time? I know this to be true in fact, and I made a small change that makes it so that running the script takes around 4 tenths of a second. I am certain better can be done with more effort.

Have you also actually looked at the result of running the script and the words that are being listed in the JSON file? That clearly needs more thought. I am pretty sure that we don't want the word "exppi", which is listed (for example, from the ProvingTrigIdentities.pg sample problem) because "exp(pi" appears in a file and the script removes the parenthesis.
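The same idea can be seen in MiniSearch's tokenize hook: split on punctuation rather than deleting it, so "exp(pi)" yields "exp" and "pi" instead of "exppi". The actual word extraction happens in the Perl build script, so this is only a sketch of the approach:

```javascript
// Sketch: tokenize by splitting on anything that is not a word character, so
// "exp(pi)" becomes ["exp", "pi"] rather than the glued-together "exppi".
// The real extraction is done by the Perl build script; this only shows the idea
// using MiniSearch's own tokenize option.
import MiniSearch from 'minisearch';

const miniSearch = new MiniSearch({
    fields: ['terms'], // field name assumed
    tokenize: (text) => text.split(/[^A-Za-z0-9_]+/).filter(Boolean)
});
```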

@drgrice1 (Member)

@pstaabp: I have been working on this, and I have quite a bit more to do. I wanted to let you know, so that you don't work on this and conflict with what I am doing.

pstaabp added 5 commits June 27, 2025 07:43
This PR has two related features
1. A script in `dev_scripts` to parse all of the sample problems and the POD to build up a "database" of non-common words for a given problem/POD page.
2. Add search boxes to the sample-problems/POD pages to return pages that match the search criteria.
Also add ww-update script to call OPL-update and build-search-json.pl
to its previous state and now run the parse-search-json.pl script upon startup of webwork2.

Restore the minisearch package.
@pstaabp (Member Author) commented Jun 27, 2025

I just noticed your comment and just pushed. I can revert to the previous commit if needed.

@pstaabp (Member Author) commented Jun 27, 2025

If you put in a PR to this branch, I'll deal with the conflicts. I made it run in the webwork2 script, which is probably unnecessary, so I can remove that.

@drgrice1 (Member)

Most likely I am just going to take this over and put in a completely new pull request. I am sorry, but there is a lot wrong with this pull request, and almost everything needs to be completely rewritten. There aren't going to be any scripts. Everything is going to be done efficiently and dynamically directly from the code.

@pstaabp (Member Author) commented Jun 27, 2025

That's fine @drgrice1

drgrice1 added a commit to drgrice1/webwork2 that referenced this pull request Jun 27, 2025
This is a rework of openwebwork#2733.

This does not add any scripts that need to be run when webwork is
installed or upgraded.  Instead the search data is dynamically
generated.  The first time that the sample problems home page is loaded
all PG macros and sample problems are parsed and the search data
extracted.  Then the data is saved to the
`DATA/sample-problem-search-data.json` file. This takes less than half a
second on my computer, but that will probably vary some depending on the
server's processing capabilities. The last modified time of each file is
also saved in that search data.  On subsequent requests the last
modified time is checked, and a file is only parsed again if it has been
modified since the last time the page was loaded.  If a file is deleted
from PG, the data is removed from the file.  So even in development the
data will always reflect the current state of the PG repository on the
server.  Subsequent requests when no changes to PG occur usually
complete in around 1/25th of a second. A change of a single file
increases that time only slightly.

Note that the search data is accessed by JavaScript via the new
`/sampleproblems/search_data` route.

The layout of the sample problems home page is changed considerably
from openwebwork#2733.  The layout now works on narrow screens and some
accessibility and html validation issues were addressed.

There is also quite a bit of improvement in the search data that is
saved.  There is no attempt to single out methods or functions for
macros from the POD headers.  Unfortunately, that was never really going
to work.  Instead the words from all headers (that are not stop words)
are used.  So the methods or functions that are in a POD header will be
in the search data, but other things will also be there.  There are some
custom stop words added that are not desirable in the search data, like
"description", "synopsis", "podlink", and "problink".

Also, macros that don't have POD in them are not indexed at all since
there isn't anything to show for these files.
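The custom stop words described above are filtered out server side while the data is generated. For a compact picture of the same filtering, MiniSearch's processTerm hook can drop terms at index time; the stop-word list below is the one named in the commit message, and everything else is a sketch:

```javascript
// Sketch of the stop-word filtering using MiniSearch's processTerm hook.
// (The rework actually removes these words server side while generating the data.)
import MiniSearch from 'minisearch';

const STOP_WORDS = new Set(['description', 'synopsis', 'podlink', 'problink']);

const miniSearch = new MiniSearch({
    fields: ['terms'], // field name assumed
    // Returning a falsy value tells MiniSearch to discard the term.
    processTerm: (term) => {
        const lower = term.toLowerCase();
        return STOP_WORDS.has(lower) ? null : lower;
    }
});
```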
drgrice1 added a commit to drgrice1/webwork2 that referenced this pull request Jun 27, 2025 (same commit message as above)
drgrice1 added a commit to drgrice1/webwork2 that referenced this pull request Jun 27, 2025 (same commit message as above)
@pstaabp (Member Author) commented Jul 1, 2025

This is closed in favor of #2759

@pstaabp closed this Jul 1, 2025
drgrice1 added a commit to drgrice1/webwork2 that referenced this pull request Jul 2, 2025
(Same commit message as above.)
drgrice1 added a commit to drgrice1/webwork2 that referenced this pull request Jul 2, 2025
(Same commit message as above.)
drgrice1 added a commit that referenced this pull request Jul 3, 2025
(Same commit message as above.)