Add a search box to the Sample Problems page and a script to build a search file #2733
Updated the script to parse the POD in macro files to
When I start typing I get suggestions, but if I type more after that then the suggestions disappear. Then if I type even more they reappear.
ccd33bf to 76d22dd
Added an
Note: this will need to be added to the upgrade instructions. So now to test, run
I'm getting a console error
did you
Yes, I've run
I am also getting the
There was a change to I'm testing by checking out a copy of the
It seems that you need to run `npm install minisearch`, and then add the changed `package.json` and `package-lock.json` files to this pull request.
I haven't delved into this in much depth yet. I can see that this pull request clearly needs a lot of work before it is really ready for review.
Default value is tutorial/samples-problems
-b|--build One of (all, macros, samples) to determine if the macros, sample
problems or both should be scraped for data.
-v|--verbose Setting this flag provides details as the script runs.
This script needs help. I mean that literally. Any script with options should have a `-h|--help` flag that shows how to use the script.
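To illustrate the point about a help flag, here is a minimal sketch in Python rather than the Perl used by the script under review. The program name `build-search-data` and the default value `all` are assumptions made for the example; only the `-b|--build` and `-v|--verbose` options come from the usage text quoted above. Python's `argparse` adds `-h`/`--help` automatically, which is the behavior being asked for.

```python
import argparse


def build_parser():
    """Build a command-line parser that documents every option."""
    parser = argparse.ArgumentParser(
        prog="build-search-data",  # hypothetical name, not the real script
        description="Scrape macros and sample problems for search data.",
    )
    parser.add_argument(
        "-b", "--build",
        choices=["all", "macros", "samples"],
        default="all",  # assumed default, not stated in the review
        help="which sources to scrape for data",
    )
    parser.add_argument(
        "-v", "--verbose",
        action="store_true",
        help="print details as the script runs",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.verbose:
        print(f"Building search data for: {args.build}")
```

Running the sketch with `-h` prints the usage text and exits, so the options are discoverable without reading the source.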
bin/update-ww.pl
Outdated
=head1 NAME

update-ww.pl -- run all needed scripts when webwork is updated.
Don't add this script. The `OPL-update` script is already a script that just runs other scripts. Now you are going to add another one that runs that script and one more? Find another way to do it.
In addition the options for this script are not well explained, and there is no `-h` help flag.
@drgrice1 You had suggested that I add this. I thought the idea was that this script would run anything that we needed when WeBWorK was upgraded. Are you suggesting that `OPL-update` should include the building of the search JSON?
Yes, that is what I actually meant with the suggestion: that `OPL-update` includes running this script, not the creation of a new script. It is a bit of a stretch that this goes under the description of "updating the OPL", but the point is that everything be handled by a single script. However, that is minor, and the point is to not need to change the instructions for installation/upgrade.
Although, I have also thought about just removing the need for this new script entirely. Have you actually benchmarked the script? How bad would performance be if the files were dynamically combed at runtime, and the search actually done live? If it isn't bad, or with some tweaking could be done efficiently at runtime, then that is the ideal way to go.
Another possibility is to effectively run the script the first time the server starts if the JSON file does not already exist. Perhaps with a job queue task that handles this.
Running it seems to take 5-10 seconds which would be pretty bad to run on the page load.
I'll try to see if we can run it at server start and hand off to the job queue. That sounds like a good idea.
Have you considered that the reason the script takes 5-10 seconds to run is that your code is highly inefficient? And that improvements could be made to make it run in a fraction of the time? I know this to be true in fact, and I made a small change that makes it so that running the script takes around 4 tenths of a second. I am certain better can be done with more effort.
Have you also actually looked at the result of running the script and the words that are being listed in the JSON file? That clearly needs more thought. I am pretty sure that we don't want the word "exppi" which is listed (for example from the ProvingTrigIdentities.pg sample problem) because "exp(pi" appears in a file and the script removes the parenthesis.
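The "exppi" problem can be reproduced in a few lines. The following is a sketch in Python, not the script under review: both function names are hypothetical, and the second one shows just one possible alternative (splitting on non-letter characters instead of deleting them).

```python
import re


def words_by_deleting_punctuation(text):
    """Delete punctuation, then split on whitespace.

    Neighbouring tokens separated only by punctuation get glued together.
    """
    return re.sub(r"[^\w\s]", "", text).lower().split()


def words_by_splitting_on_non_letters(text):
    """Treat every run of non-letter characters as a separator."""
    return re.findall(r"[a-z]+", text.lower())


sample = "exp(pi"
print(words_by_deleting_punctuation(sample))      # ['exppi']
print(words_by_splitting_on_non_letters(sample))  # ['exp', 'pi']
```

Whether "exp" and "pi" are useful search terms is a separate question; the sketch only shows where a nonsense word like "exppi" comes from.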
@pstaabp: I have been working on this, and I have quite a bit more to do. I wanted to let you know, so that you don't work on this and conflict with what I am doing.
This PR has two related features
1. A script in `dev_scripts` to parse all of the sample problems and the POD to build up a "database" of non-common words for a given problem/POD page.
2. Add search boxes to the sample-problems/POD pages to return pages that match the search criteria.
Also add ww-update script to call OPL-update and build-search-json.pl
to its previous state and now run the parse-search-json.pl script upon startup of webwork2. Restore the minisearch package.
I just noticed your comment and just pushed. I can revert to the previous commit if needed.
If you put in a PR to this branch, I'll deal with the conflicts. I made it run in the webwork2 script, which is probably unnecessary, so I can then remove that.
Most likely I am just going to take this over and put in a completely new pull request. I am sorry, but there is a lot wrong with this pull request, and almost everything needs to be completely rewritten. There aren't going to be any scripts. Everything is going to be done efficiently and dynamically directly from the code.
That's fine @drgrice1
This is a rework of openwebwork#2733. This does not add any scripts that need to be run when webwork is installed or upgraded. Instead the search data is dynamically generated.

The first time that the sample problems home page is loaded, all PG macros and sample problems are parsed and the search data extracted. Then the data is saved to the `DATA/sample-problem-search-data.json` file. This takes less than half a second on my computer, but that will probably vary some depending on the server's processing capabilities. The last modified time of each file is also saved in that search data. On subsequent requests the last modified time is checked, and a file is only parsed again if it has been modified since the last time the page was loaded. If a file is deleted from PG, the data is removed from the file. So even in development the data will always reflect the current state of the PG repository on the server. Subsequent requests when no changes to PG occur usually complete in around 1/25th of a second. A change of a single file increases that time only slightly.

Note that the search data is accessed by JavaScript via the new `/sampleproblems/search_data` route.

The layout of the sample problems home page is changed considerably from openwebwork#2733. The layout now works on narrow screens and some accessibility and HTML validation issues were addressed.

There is also quite a bit of improvement in the search data that is saved. There is no attempt to single out methods or functions for macros from the POD headers. Unfortunately, that was never really going to work. Instead the words from all headers (that are not stop words) are used. So the methods or functions that are in a POD header will be in the search data, but other things will also be there. There are some custom stop words added that are not desirable in the search data, like "description", "synopsis", "podlink", and "problink".

Also, macros that don't have POD in them are not indexed at all since there isn't anything to show for these files.
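The caching scheme described in this commit message (parse everything once, store each file's last modified time, reparse only changed files, drop deleted ones) can be sketched as follows. This is an illustration in Python, not the code from the pull request: the function names, the JSON layout, the `*.pg` file pattern, and the abbreviated stop word list are all assumptions. The sketch reparses when the stored time differs from the current one, which is slightly broader than "modified since".

```python
import json
import re
from pathlib import Path

# Abbreviated, assumed stop word list; the custom words come from the text above.
STOP_WORDS = {"the", "a", "of", "description", "synopsis", "podlink", "problink"}


def extract_terms(text):
    """Hypothetical extractor: unique lowercase words that are not stop words."""
    return sorted(set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS)


def refresh_search_data(source_dir, cache_file):
    """Update the cached search data; return (data, list of reparsed files)."""
    cache_path = Path(cache_file)
    data = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    current = {str(p): p for p in Path(source_dir).rglob("*.pg")}

    # Remove entries for files that have been deleted.
    for name in list(data):
        if name not in current:
            del data[name]

    # Parse new files and files whose modification time has changed.
    reparsed = []
    for name, path in current.items():
        mtime = path.stat().st_mtime_ns
        entry = data.get(name)
        if entry is None or entry["mtime"] != mtime:
            data[name] = {"mtime": mtime, "terms": extract_terms(path.read_text())}
            reparsed.append(name)

    cache_path.write_text(json.dumps(data))
    return data, reparsed
```

On the first call every file is parsed; later calls only touch files that changed, which is why the steady-state request is much cheaper than the initial one.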
This is closed in favor of #2759
This PR has two related features

A script in `bin/dev_scripts` to parse all of the sample problems and the POD to build up a "database" (JSON file) of non-common words for a given problem/POD page. The JSON file contains information to be searched on within the browser using the package MiniSearch.

There is a file `htdocs/DATA/search.json` that is provided that will be used while searching. This is not complete but currently should give some sense about how this works. The script will rebuild this file as needed.

Note: a subsequent PR will be posted to update the POD in the macros to give the search tools needed terms.