Add a search box to the Sample Problems page and a script to build a search file #2733


Closed
pstaabp wants to merge 5 commits from the search-docs branch

Conversation

@pstaabp (Member) commented May 27, 2025

This PR has two related features:

  1. A script in bin/dev_scripts that parses all of the sample problems and the POD to build a "database" (a JSON file) of non-common words for each problem/POD page. The JSON file contains the information that is searched within the browser using the MiniSearch package.
  2. Search boxes on the sample-problems/POD pages that return the pages matching the search criteria. The searching is done with MiniSearch.

A file htdocs/DATA/search.json is provided and is used while searching. It is not complete, but it should give some sense of how this works (see the sketch below). The script rebuilds this file as needed.

Note: a subsequent PR will be posted to update the POD in the macros so that the search tools get the terms they need.
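For context, here is a minimal sketch of how the browser side can consume the search.json file with MiniSearch. The fetch URL, field names, and element id below are assumptions for illustration, not the actual implementation in this PR:

```javascript
// Minimal sketch: build a MiniSearch index from the generated JSON file and
// wire it to a search box. The URL, field names, and element id are assumptions.
import MiniSearch from 'minisearch';

async function initSearch() {
    // Each record is assumed to carry an `id` field (MiniSearch requires one).
    const documents = await (await fetch('/webwork2_files/DATA/search.json')).json();

    const miniSearch = new MiniSearch({
        fields: ['title', 'terms'],   // fields to index (assumed names)
        storeFields: ['title', 'url'] // fields returned with each search result
    });
    miniSearch.addAll(documents);

    document.getElementById('sample-problem-search').addEventListener('input', (event) => {
        // Prefix matching so partially typed words already return results.
        const results = miniSearch.search(event.target.value, { prefix: true });
        console.log(results.slice(0, 10)); // rendering of the result list omitted
    });
}

initSearch();
```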

@pstaabp (Member Author) commented Jun 10, 2025

Updated the script to parse the POD in macro files to

  1. parse the =head1 NAME section to extract the filename and short description of the macro. (Note: "Restructure the POD of macros" pg#1244 updates all of the macros to fix their formatting so that this script can parse them better.)
  2. parse the =head2 blocks in which method and function names have been included. This allows searching for the documentation of a particular function name (an illustrative record shape is sketched below).
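To make that concrete, a record extracted this way might look like the object below. The field names and values are hypothetical, shown only to illustrate what the NAME section and the =head2 blocks contribute; they are not the script's actual schema:

```javascript
// Hypothetical shape of one index record built from a macro's POD.
// Field names and values are illustrative, not the script's actual schema.
const exampleRecord = {
    id: 'macros/parsers/parserPopUp.pl',   // unique id, e.g. the macro's path
    name: 'parserPopUp.pl',                // filename taken from the =head1 NAME section
    description: 'pop-up and drop-down answer menus', // short description from NAME (paraphrased)
    functions: ['PopUp', 'DropDown'],      // method/function names taken from =head2 blocks
    url: '/pod/macros/parsers/parserPopUp.html' // page to open from a search result (assumed)
};

// Records like this would be indexed on fields such as ['name', 'description', 'functions'],
// so searching for a function name returns the macro's documentation page.
```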

@dlglin (Member) commented Jun 17, 2025

When I start typing I get suggestions, but if I type more after that then the suggestions disappear. Then if I type even more they reappear.
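In MiniSearch, whether a longer query keeps matching is mostly governed by the prefix and fuzzy search options, so those are a natural place to look. A sketch of passing them (the option values are illustrative only):

```javascript
// Illustrative only: with prefix and fuzzy matching enabled, a partially typed
// or slightly misspelled word keeps matching indexed terms, which is usually
// what prevents suggestions from disappearing as the query grows.
function runSearch(miniSearch, query) {
    return miniSearch.search(query, {
        prefix: true, // each query term also matches as a prefix ("deriv" matches "derivative")
        fuzzy: 0.2    // allow an edit distance of up to 20% of the term length
    });
}
```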

@pstaabp force-pushed the search-docs branch 3 times, most recently from ccd33bf to 76d22dd on June 21, 2025 16:57
@pstaabp (Member Author) commented Jun 21, 2025

Added an update-ww.pl script that will

  1. run OPL-update without the interactive loadOPLstats part of the script, and
  2. run build-search-json.pl to generate the search JSON for the POD/sample problems.

Note: this will need to be added to the upgrade instructions.

To test, run update-ww.pl (use the -s flag to skip OPL-update), then try the search box on the sample problems page.

@pstaabp marked this pull request as ready for review June 21, 2025 16:57
@Alex-Jordan (Contributor)

I'm getting a console error: `Uncaught ReferenceError: MiniSearch is not defined`.

@pstaabp (Member Author) commented Jun 25, 2025

Did you run `npm ci`?

@Alex-Jordan (Contributor)

Yes, I've run npm ci and restarted webwork2. Looking at the files changed here, I don't see anything that would have affected the npm packages though. The only references to MiniSearch are where it is used.

@drgrice1 (Member)

I am also getting the `MiniSearch is not defined` error.

@Alex-Jordan (Contributor)

There was a change to package-lock.json here before Danny did testing. But subsequent force pushes don't have anything changed for that file.

I'm testing by checking out a copy of the WeBWorK-2.20 branch, and then merging this branch into that one. I think that is the most accurate way to test things. Maybe the behavior would be different if I just literally checked out this branch...

@drgrice1 (Member) left a comment

It seems that you need to run npm install minisearch, and then add the changed package.json and package-lock.json files to this pull request.
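For example, after the package is installed, the page script needs to pull the symbol in; whether webwork2 does this with an ES module import or by loading the UMD bundle via a script tag is not shown here, so treat this as a sketch:

```javascript
// Sketch: make MiniSearch available to the page script after `npm install minisearch`
// (or `npm ci` once package.json/package-lock.json include it). Whether webwork2
// uses an ES module import like this or a <script> tag for the UMD bundle is an
// assumption of this example.
import MiniSearch from 'minisearch';

const miniSearch = new MiniSearch({ fields: ['title', 'terms'] }); // field names assumed
```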

I haven't delved into this in much depth yet. I can see that this pull request clearly needs a lot of work before it is really ready for review.

Default value is tutorial/samples-problems
-b|--build One of (all, macros, samples) to determine if the macros, sample
problems or both should be scraped for data.
-v|--verbose Setting this flag provides details as the script runs.
Member

This script needs help. I mean that literally. Any script with options should have a -h|--help flag that shows how to use the script.

bin/update-ww.pl Outdated

=head1 NAME

update-ww.pl -- run all needed scripts when webwork is updated.
Member

Don't add this script. The OPL-update script is already a script that just runs other scripts. Now you are going to add another one that runs that script and one more? Find another way to do it.

In addition, the options for this script are not well explained, and there is no -h help flag.

Member Author

@drgrice1 You had suggested that I add this. I thought the idea was that this script would run anything we needed when WeBWorK was upgraded. Are you suggesting that OPL-update should include building the search JSON?

Member

Yes, that is what I actually meant with the suggestion: that OPL-update include running this script, not the creation of a new script. It is a bit of a stretch that this falls under "updating the OPL", but the point is that everything be handled by a single script so that the installation/upgrade instructions don't need to change.

I have also thought about removing the need for this new script entirely. Have you actually benchmarked it? How bad would performance be if the files were combed dynamically at runtime and the search done live? If that isn't bad, or could be made efficient with some tweaking, then that is the ideal way to go.

Member

Another possibility is to effectively run the script the first time the server starts if the JSON file does not already exist. Perhaps with a job queue task that handles this.

Member Author

Running it seems to take 5-10 seconds, which would be pretty bad on page load.

I'll try to see if we can run it at server start and hand it off to the job queue. That sounds like a good idea.

Member

Have you considered that the reason the script takes 5-10 seconds to run is that your code is highly inefficient? And that improvements could be made to make it run in a fraction of the time? I know this to be true in fact, and I made a small change that makes it so that running the script takes around 4 tenths of a second. I am certain better can be done with more effort.

Have you also actually looked at the result of running the script and the words that are being listed in the JSON file? That clearly needs more thought. I am pretty sure that we don't want the word "exppi", which is listed (for example, from the ProvingTrigIdentities.pg sample problem) because "exp(pi" appears in a file and the script removes the parenthesis.
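The same idea can be seen in MiniSearch's tokenize hook: split on punctuation rather than deleting it, so "exp(pi)" yields "exp" and "pi" instead of "exppi". The actual word extraction happens in the Perl build script, so this is only a sketch of the approach:

```javascript
// Sketch: tokenize by splitting on anything that is not a word character, so
// "exp(pi)" becomes ["exp", "pi"] rather than the glued-together "exppi".
// The real extraction is done by the Perl build script; this only shows the idea
// using MiniSearch's own tokenize option.
import MiniSearch from 'minisearch';

const miniSearch = new MiniSearch({
    fields: ['terms'], // field name assumed
    tokenize: (text) => text.split(/[^A-Za-z0-9_]+/).filter(Boolean)
});
```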

@drgrice1 (Member)

@pstaabp: I have been working on this, and I have quite a bit more to do. I wanted to let you know, so that you don't work on this and conflict with what I am doing.

pstaabp added 5 commits June 27, 2025 07:43
This PR has two related features
1. A script in `dev_scripts` to parse all of the sample problems and the POD to build up a "database" of non-common words for a given problem/POD page.
2. Add search boxes to the sample-problems/POD pages to return pages that match the search criteria.
Also add ww-update script to call OPL-update and build-search-json.pl
to its previous state and now run the parse-search-json.pl script upon startup of webwork2.

Restore the minisearch package.
@pstaabp (Member Author) commented Jun 27, 2025

I just noticed your comment and just pushed. I can revert to the previous commit if needed.

@pstaabp (Member Author) commented Jun 27, 2025

If you put in a PR to this branch, I'll deal with the conflicts. I made it run in the webwork2 script, which is probably unnecessary, so I can remove that.

@drgrice1 (Member)

Most likely I am just going to take this over and put in a completely new pull request. I am sorry, but there is a lot wrong with this pull request, and almost everything needs to be completely rewritten. There aren't going to be any scripts. Everything is going to be done efficiently and dynamically directly from the code.

@pstaabp (Member Author) commented Jun 27, 2025

That's fine @drgrice1

drgrice1 added a commit to drgrice1/webwork2 that referenced this pull request Jun 27, 2025
This is a rework of openwebwork#2733.

This does not add any scripts that need to be run when webwork is
installed or upgraded.  Instead the search data is dynamically
generated.  The first time that the sample problems home page is loaded
all PG macros and sample problems are parsed and the search data
extracted.  Then the data is saved to the
`DATA/sample-problem-search-data.json` file. This takes less than half a
second on my computer, but that will probably vary some depending on the
server's processing capabilities. The last modified time of each file is
also saved in that search data.  On subsequent requests the last
modified time is checked, and a file is only parsed again if it has been
modified since the last time the page was loaded.  If a file is deleted
from PG, the data is removed from the file.  So even in development the
data will always reflect the current state of the PG repository on the
server.  Subsequent requests when no changes to PG occur usually
complete in around 1/25th of a second. A change of a single file
increases that time only slightly.

Note that the search data is accessed by JavaScript via the new
`/sampleproblems/search_data` route.

The layout of the sample problems home page is changed considerably
from openwebwork#2733.  The layout now works on narrow screens and some
accessibility and html validation issues were addressed.

There is also quite a bit of improvement in the search data that is
saved.  There is no attempt to single out methods or functions for
macros from the POD headers.  Unfortunately, that was never really going
to work.  Instead the words from all headers (that are not stop words)
are used.  So the methods or functions that are in a POD header will be
in the search data, but other things will also be there.  There are some
custom stop words added that are not desirable in the search data, like
"description", "synopsis", "podlink", and "problink".

Also, macros that don't have POD in them are not indexed at all since
there isn't anything to show for these files.
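The custom stop words described above are filtered out server side while the data is generated. For a compact picture of the same filtering, MiniSearch's processTerm hook can drop terms at index time; the stop-word list below is the one named in the commit message, and everything else is a sketch:

```javascript
// Sketch of the stop-word filtering using MiniSearch's processTerm hook.
// (The rework actually removes these words server side while generating the data.)
import MiniSearch from 'minisearch';

const STOP_WORDS = new Set(['description', 'synopsis', 'podlink', 'problink']);

const miniSearch = new MiniSearch({
    fields: ['terms'], // field name assumed
    // Returning a falsy value tells MiniSearch to discard the term.
    processTerm: (term) => {
        const lower = term.toLowerCase();
        return STOP_WORDS.has(lower) ? null : lower;
    }
});
```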
drgrice1 added a commit to drgrice1/webwork2 that referenced this pull request Jun 27, 2025 (same commit message as above)
drgrice1 added a commit to drgrice1/webwork2 that referenced this pull request Jun 27, 2025 (same commit message as above)
@pstaabp (Member Author) commented Jul 1, 2025

This is closed in favor of #2759

@pstaabp closed this Jul 1, 2025
drgrice1 added a commit to drgrice1/webwork2 that referenced this pull request Jul 2, 2025
(Same commit message as above.)
drgrice1 added a commit to drgrice1/webwork2 that referenced this pull request Jul 2, 2025
(Same commit message as above.)
drgrice1 added a commit that referenced this pull request Jul 3, 2025
(Same commit message as above.)