No such file or directory: 'sample_logs/counter_2018-05-01.log' #7
Comments
Hi Philip, sorry for the problems.

By default it processes everything it thinks a monthly report needs that hasn't been processed yet (up until yesterday). It also keeps state in state/statefile.json to track which daily logs have already been processed for the month. It will not re-process days that have already gone into the database (based on the last_processed_day value). There is a fair amount of automation in there for updating the logs up until yesterday or the last day of the month, whichever is earlier.

One option is to set the state in the JSON file so that it thinks it has already processed up to day 7 for that month and year. Having to edit state/statefile.json each time is pretty ugly, though. Your solution of just creating blank files is probably better.

Another option, I think, is to explicitly change the log_name_pattern to process just one file, or to process files in a more manual way. It replaces the "(yyyy-mm-dd)" string with actual dates, and if that string doesn't exist in the filename pattern it doesn't substitute a date. Something like this might do the job since it doesn't have a string like (yyyy-mm-dd) to indicate a filename replacement.
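For readers following along, the placeholder behaviour described in this comment can be illustrated with a small Python sketch. This is not the Counter Processor code: the function name `expand_log_names` and the constant `DATE_TOKEN` are hypothetical, and the only thing taken from the thread is that a "(yyyy-mm-dd)" string in log_name_pattern is replaced with actual dates, while a pattern without it is used as-is.

```python
from datetime import date, timedelta

# Placeholder string mentioned in the thread; the constant name is an assumption.
DATE_TOKEN = "(yyyy-mm-dd)"


def expand_log_names(pattern, start, end):
    """Return the log filenames implied by `pattern` for start..end inclusive.

    If the pattern has no date placeholder, it names exactly one file,
    so no per-day expansion happens.
    """
    if DATE_TOKEN not in pattern:
        return [pattern]
    day_count = (end - start).days + 1
    return [
        pattern.replace(DATE_TOKEN, (start + timedelta(days=i)).isoformat())
        for i in range(day_count)
    ]


# Daily pattern: one filename per day in the range.
daily = expand_log_names(
    "sample_logs/counter_(yyyy-mm-dd).log", date(2018, 5, 1), date(2018, 5, 3)
)

# Fixed pattern: only the single named file is returned.
single = expand_log_names(
    "sample_logs/counter_2018-05-08.log", date(2018, 5, 1), date(2018, 5, 31)
)
```

With a fixed filename there is nothing to expand, which is why a pattern without the placeholder would sidestep the missing-file error for earlier days.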
I'm looking at the code at https://github.com/CDLUC3/counter-processor/blob/6f9459f25cb1c7c01660edbf744a4c049ce8abed/config/config.py

The other thing: I'm not sure what dates it would think it had already processed for that month. Clearly, if someone is going to feed different files in manually, they'll need to track things themselves and not re-process data that has already gone into the database. Most likely it would eliminate the duplicate lines as "double-clicks" anyway, but it would be a waste of time to re-read duplicate log files.

Another option might be to let people explicitly set the "last_processed_day" for the month manually (via an environment variable or similar), so it would only process log files with names after the last_processed_day until the end date.

I believe the SIMULATE_DATE option just tells it to pull in files up until the day before that date. If it is not set, the cutoff for the end date is the end of the day before the current day or the end of the month.

In the long term it would probably make sense to have more manual ways to specify some of these things for people who want to manage processing more manually. Splitting the option for pulling log lines into the database from calculating stats would also be nice.

I can look and test a little more in the morning to see if there is an easy way to skip some daily log files earlier in a month.
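The date-window logic described in this comment can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's implementation: the function names are hypothetical, `today` stands for either the real date or a simulated one, and `last_processed_day` is treated as a day-of-month integer (0 meaning nothing processed yet).

```python
import calendar
from datetime import date, timedelta


def last_day_to_process(year, month, today):
    """Cutoff: the day before `today` or the end of the month, whichever is earlier."""
    month_end = date(year, month, calendar.monthrange(year, month)[1])
    return min(today - timedelta(days=1), month_end)


def days_to_process(year, month, today, last_processed_day=0):
    """List the days after `last_processed_day` up to and including the cutoff."""
    first = date(year, month, 1) + timedelta(days=last_processed_day)
    end = last_day_to_process(year, month, today)
    day_count = max((end - first).days + 1, 0)
    return [first + timedelta(days=i) for i in range(day_count)]


# Mid-month run on 2018-05-10 with days 1 to 7 already recorded as processed.
pending = days_to_process(2018, 5, date(2018, 5, 10), last_processed_day=7)

# Run after the month has ended: the cutoff falls back to the last day of May.
cutoff = last_day_to_process(2018, 5, date(2018, 7, 1))
```

Under these assumptions, setting `last_processed_day` to 7 means only the logs for day 8 onward are expected, which is the effect the manual state edit is meant to achieve.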
Thanks for the feedback, Phillip. I've simplified the processing model a little for those who don't want to do full, in-order, daily log processing of past data for a month. It's in the branch https://github.com/CDLUC3/counter-processor/tree/non-daily-logs . It's a quick fix and there are ways it could be clearer, but I think it works. Give it a try, and if things look OK to you I'll merge it into our master branch and create another release. Here is an example like yours with the slightly revised code; I tested it and it worked for me.
We will also be doing something like this soon for Dryad, who are producing full monthly files (rather than daily) for back-processing their old usage data. I assume we'll be using this too, probably something like this for each month, working through their historical data one month at a time rather than with daily logs.
PS. I believe "simulate date" is unnecessary when doing an old full-month report, since the default end is the end of the month so long as the clock time is after the end of the month.
@sfisher thanks for the detailed brain dump and the proposed code change at 92f7d8f. It looks like it will give us more flexibility in choosing between

@matthew-a-dunlap and I discussed all of this yesterday and I think we're going to stick to our

Thanks again, especially simply for confirming that some kind of workaround or code change is necessary. This was on our list to ask you about! (Please stay tuned for at least one more unrelated question. 😄 ) Thanks!
I'm finding that when I run Counter Processor for the first time I have to touch or create files for all the days leading up to the days that I actually have logs. For example, if counter_2018-05-08.log is my only log file, I'll touch the files for 01 through 07 like this:
In practice, I just touch all the possible dates like this:
Is this a bug or am I simply confused or doing something wrong?
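The commands from the original post are not preserved above. As a rough illustration of the workaround it describes (creating empty placeholder logs for the days before the first real log), here is a minimal Python sketch. The helper name `touch_missing_logs` is hypothetical; the `counter_YYYY-MM-DD.log` naming and the `sample_logs` directory are taken from the filenames quoted in this issue.

```python
from datetime import date
from pathlib import Path


def touch_missing_logs(log_dir, year, month, first_real_day):
    """Create empty counter_YYYY-MM-DD.log files for days before `first_real_day`.

    Existing files are left untouched. Returns the names of the files created.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for day in range(1, first_real_day):
        path = log_dir / f"counter_{date(year, month, day).isoformat()}.log"
        if not path.exists():
            path.touch()  # same effect as the shell `touch` command
            created.append(path.name)
    return created


# Example call (creates counter_2018-05-01.log through counter_2018-05-07.log):
# touch_missing_logs("sample_logs", 2018, 5, first_real_day=8)
```

Skipping files that already exist keeps the helper safe to re-run without clobbering real logs.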