From b8bf2a7784023c1c1e2f977355d14c9b2e01c398 Mon Sep 17 00:00:00 2001
From: sfisher
Date: Tue, 19 Mar 2024 13:52:16 -0700
Subject: [PATCH] Giving things a final update and information prior to archiving this github repo. It is no longer used by Dryad who was one of the primary users of this afaik.

---
 .python-version  |  2 +-
 README.md        | 41 +++++++++++++++++++++++++++-----
 requirements.txt | 62 ++++++++++++++++++++++++------------------------
 3 files changed, 67 insertions(+), 38 deletions(-)

diff --git a/.python-version b/.python-version
index c77a7de..9919bf8 100644
--- a/.python-version
+++ b/.python-version
@@ -1 +1 @@
-3.7.9
+3.10.13
diff --git a/README.md b/README.md
index 0529656..023427c 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 
 ## Introduction
 
-The Counter Processor is a Python 3 (3.7+ required) script for processing dataset access statistics from logs
+The Counter Processor is a Python 3 (3.10) script for processing dataset access statistics from logs
 using the COUNTER Code of Practice for Research Data.
 The software assumes you area already logging your COUNTER dataset *investigations* and *requests* to a log file using a format somewhat similar to extended log format.
 The COUNTER Code of Practice requires that descriptive metadata be submitted along with statistics--these items are included in logs to ease later processing.
@@ -49,10 +49,20 @@ with some examples.
 You will need to run the script on a computer where the log files you're trying to process are available on the file system for the script to access.
 
 ## Download the free IP to geolocation database
-This product includes GeoLite2 data created by MaxMind, available from
-http://www.maxmind.com.
+The geo-ip uses GeoLite2 data created by MaxMind and is available from
+
+Internet Archive
+(you only need the country database in binary database format).
 
-GeoLite2 is a free IP geolocation database that must be installed in the product. You can download the database from [https://dev.maxmind.com/geoip/geoip2/geolite2/](https://dev.maxmind.com/geoip/geoip2/geolite2/). Choose the GeoLite2 Country database (binary, gzipped) and extract it to the maxmind_geoip directory inside the application to use with default configuration, or put it elsewhere and configure the path as mentioned below.
+GeoLite2 is a free IP geolocation database that must be installed. You can download the
+database above. Choose the GeoLite2 Country database (binary, gzipped) and extract it to
+the maxmind_geoip directory inside the application to use with default configuration,
+or put it elsewhere and configure the path as mentioned below.
+
+Newer versions of the database cannot be used with the current version of the script since
+additional licensing terms are required such as registering for accounts, having an auto-update
+functionality and ensuring it runs regularly. The script has not been updated to take these
+additional requirements into account.
 
 ## Set up the configuration file
 The script takes a number of different configuration parameters in order to run correctly. See **config/config.yaml** for an example. To change the configuration you may edit it at config/config.yaml or you can put it at a different location and then specify it with an environment variable when starting the script like the example below.
@@ -165,13 +175,32 @@ Some possible submission problems:
 
 ## Examples/notes
 
-An example of processing only one day to test functioning (for January 1st, 2019 an using a log with a name pattern for that day)
+An example of processing only one day to test functioning, using the sample log in this repository.
 ```
-YEAR_MONTH=2019-01 LOG_NAME_PATTERN="log/counter_(yyyy-mm-dd).log" UPLOAD_TO_HUB=False SIMULATE_DATE=2019-01-02 ./main.py
+YEAR_MONTH=2018-05 SIMULATE_DATE=2018-05-02 LOG_NAME_PATTERN=sample_logs/counter_2018-05-01.log UPLOAD_TO_HUB=False ./main.py
 ```
 
 An example of processing an entire month at a time.
 There is no literal string of "(yyyy-mm-dd)" in the filename so it will not be used to process daily logs and will take the filename completely literally.
 ```
 YEAR_MONTH=2019-01 LOG_NAME_PATTERN="/path/to/my/log/counter_2019-01.log" UPLOAD_TO_HUB=False ./main.py
 ```
+
+## Updated for Python 3.10 (2024)
+
+I've updated dependencies to try and address older libraries and update them where
+possible.
+
+I installed version 3.10.13 of Python using the `pyenv` tool and was able to run
+`pip install -r requirements.txt` to install the newer dependencies.
+
+I was able to process the sample log files using the first example above after
+downloading the GeoLite2-Country.mmdb file from the Internet Archive (see link above)
+and placing it in the maxmind_geoip directory.
+
+Some of the geolocation libraries were not updated due to licensing changes and
+remain at older versions since updating would require additional work to comply
+with the new licensing terms.
+
+This is no longer a script in use by Dryad and they have moved on to using the
+DataCite web tracker for the statistics they are tracking.
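The README note about the literal "(yyyy-mm-dd)" placeholder can be illustrated with a short Python sketch. The function `expand_log_pattern` and its `stop_before` parameter are hypothetical and are not taken from the Counter Processor source. The sketch also assumes that daily processing runs from the first of YEAR_MONTH up to, but not including, SIMULATE_DATE. That assumption is inferred from the removed one-day example, which used SIMULATE_DATE=2019-01-02 to process January 1st, 2019.

```python
from datetime import date, timedelta

# Literal placeholder described in the README's LOG_NAME_PATTERN examples.
DATE_TOKEN = "(yyyy-mm-dd)"


def expand_log_pattern(pattern, year_month, stop_before=None):
    """Hypothetical helper: list the log files a pattern would refer to.

    Assumption (not from the script's source): daily files run from the
    first of `year_month` up to, but not including, `stop_before` (playing
    the role of SIMULATE_DATE), or through the end of the month when
    `stop_before` is None.
    """
    if DATE_TOKEN not in pattern:
        # No placeholder: the pattern is taken as one literal filename.
        return [pattern]

    year, month = (int(part) for part in year_month.split("-"))
    day = date(year, month, 1)
    # First day of the following month (December rolls over to January).
    next_month = date(year + month // 12, month % 12 + 1, 1)
    end = min(stop_before, next_month) if stop_before is not None else next_month

    files = []
    while day < end:
        files.append(pattern.replace(DATE_TOKEN, day.isoformat()))
        day += timedelta(days=1)
    return files
```

For instance, `expand_log_pattern("log/counter_(yyyy-mm-dd).log", "2019-01", date(2019, 1, 2))` yields only `log/counter_2019-01-01.log`. The monthly example's literal path comes back unchanged as a single-item list.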
diff --git a/requirements.txt b/requirements.txt
index 15cc375..eee6c4d 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,32 +1,32 @@
-aiohttp==3.7.2
-appnope==0.1.0
-async-timeout==3.0.1
-attrs==20.3.0
-backcall==0.2.0
-certifi==2020.11.8
-chardet==3.0.4
-decorator==4.4.2
-geoip2==4.1.0
-idna==2.10
-ipdb==0.13.4
-ipython==7.19.0
-ipython-genutils==0.2.0
-jedi==0.17.2
+aiohttp~=3.9.3
+appnope~=0.1.4
+async-timeout~=4.0.3
+attrs~=23.2.0
+backcall~=0.2.0
+certifi~=2024.2.2
+chardet~=3.0.4
+decorator~=5.1.1
+geoip2~=4.1.0
+idna~=3.6
+ipdb~=0.13.13
+ipython~=8.22.2
+ipython-genutils~=0.2.0
+jedi~=0.19.1
 maxminddb==2.0.3
-multidict==5.0.0
-parso==0.7.1
-peewee==3.14.0
-pexpect==4.8.0
-pickleshare==0.7.5
-prompt-toolkit==3.0.8
-ptyprocess==0.6.0
-Pygments==2.7.2
-python-dateutil==2.8.1
-PyYAML==5.3.1
-requests==2.24.0
-six==1.15.0
-traitlets==5.0.5
-typing-extensions==3.7.4.3
-urllib3==1.25.11
-wcwidth==0.2.5
-yarl==1.6.2
+multidict~=6.0.5
+parso~=0.8.3
+peewee~=3.17.1
+pexpect~=4.9.0
+pickleshare~=0.7.5
+prompt-toolkit~=3.0.43
+ptyprocess~=0.7.0
+Pygments~=2.17.2
+python-dateutil~=2.8.1
+PyYAML~=6.0.1
+requests~=2.31.0
+six~=1.15.0
+traitlets~=5.14.2
+typing-extensions~=4.10.0
+urllib3~=1.26.18
+wcwidth~=0.2.13
+yarl~=1.9.4
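The updated requirements switch most pins from `==` to the PEP 440 compatible-release operator `~=`. The Python sketch below illustrates what that operator allows by reimplementing the rule for plain numeric versions only. It is a simplification that does not handle pre-releases, epochs or local versions. The function names are hypothetical, and in practice the `packaging` library's `SpecifierSet` is the complete implementation.

```python
def parse_release(version):
    """Split a plain numeric version such as '3.9.3' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(candidate, pinned):
    """Simplified PEP 440 '~=' check for plain numeric versions.

    '~=X.Y.Z' means '>= X.Y.Z' and '== X.Y.*': the last component of the
    pin may grow, while every component before it stays fixed.
    Pre-releases, epochs and local versions are deliberately not handled.
    """
    cand = parse_release(candidate)
    pin = parse_release(pinned)
    if len(pin) < 2:
        raise ValueError("'~=' needs at least two release components")
    prefix = pin[:-1]
    # Pad the candidate with zeros so '3.9' compares like '3.9.0'.
    padded = cand + (0,) * max(0, len(pin) - len(cand))
    return padded >= pin and padded[: len(prefix)] == prefix
```

Under this rule, `aiohttp~=3.9.3` admits later 3.9.x releases but not 3.10.0, and `geoip2~=4.1.0` stays on the 4.1.x series. `maxminddb==2.0.3` remains an exact pin.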