This repository has been archived by the owner on Mar 19, 2024. It is now read-only.

Commit

Merge pull request #30 from CDLUC3/fix-securitiy
Trying to update security dependencies, where possible
sfisher authored Mar 19, 2024
2 parents 1b44b18 + a94fc8f commit dfd0e7b
Showing 3 changed files with 67 additions and 38 deletions.
2 changes: 1 addition & 1 deletion .python-version
@@ -1 +1 @@
3.7.9
3.10.13
41 changes: 35 additions & 6 deletions README.md
@@ -2,7 +2,7 @@

## Introduction

The Counter Processor is a Python 3 (3.7+ required) script for processing dataset access statistics from logs
The Counter Processor is a Python 3 (3.10) script for processing dataset access statistics from logs
using the COUNTER Code of Practice for Research Data.

The software assumes you are already logging your COUNTER dataset *investigations* and *requests* to a log file using a format somewhat similar to extended log format. The COUNTER Code of Practice requires that descriptive metadata be submitted along with statistics--these items are included in logs to ease later processing.
@@ -49,10 +49,20 @@ with some examples.
You will need to run the script on a computer where the log files you're trying to process are available on the file system for the script to access.

## Download the free IP to geolocation database
This product includes GeoLite2 data created by MaxMind, available from
<a href="http://www.maxmind.com">http://www.maxmind.com</a>.
The geo-IP lookup uses GeoLite2 data created by MaxMind, which is available from the
<a href="https://web.archive.org/web/20191222130401/https://dev.maxmind.com/geoip/geoip2/geolite2/" target="_blank">
Internet Archive</a>
(you only need the country database in binary database format).

GeoLite2 is a free IP geolocation database that must be installed in the product. You can download the database from [https://dev.maxmind.com/geoip/geoip2/geolite2/](https://dev.maxmind.com/geoip/geoip2/geolite2/). Choose the GeoLite2 Country database (binary, gzipped) and extract it to the maxmind_geoip directory inside the application to use with default configuration, or put it elsewhere and configure the path as mentioned below.
GeoLite2 is a free IP geolocation database that must be installed. You can download the
database from the Internet Archive link above. Choose the GeoLite2 Country database (binary, gzipped) and extract it to
the maxmind_geoip directory inside the application to use the default configuration,
or put it elsewhere and configure the path as mentioned below.
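
The extraction step might be sketched as follows. This is only an illustration and not part of the Counter Processor: it assumes the download is a gzipped tar archive containing a single `.mmdb` file, and the helper name `extract_country_db` and the archive filename in the usage note are hypothetical.

```python
import shutil
import tarfile
from pathlib import Path


def extract_country_db(archive_path, target_dir="maxmind_geoip"):
    """Copy the first .mmdb file found in a .tar.gz archive into target_dir.

    Hypothetical helper: assumes the GeoLite2 download is a gzipped tarball.
    Returns the path of the extracted database file.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(".mmdb"):
                # Drop the dated directory prefix and keep only the file name.
                destination = target / Path(member.name).name
                source = archive.extractfile(member)
                with source, open(destination, "wb") as out:
                    shutil.copyfileobj(source, out)
                return destination
    raise FileNotFoundError(f"no .mmdb file found in {archive_path}")
```

For example, `extract_country_db("GeoLite2-Country.tar.gz")` would place the database file in `maxmind_geoip/`, assuming the archive has that name and layout.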

Newer versions of the database cannot be used with the current version of the script because
the newer licensing terms impose additional requirements, such as registering for an account,
providing auto-update functionality, and ensuring that it runs regularly. The script has not been
updated to take these additional requirements into account.

## Set up the configuration file
The script takes a number of different configuration parameters in order to run correctly. See **config/config.yaml** for an example. To change the configuration you may edit it at config/config.yaml or you can put it at a different location and then specify it with an environment variable when starting the script like the example below.
@@ -165,13 +175,32 @@ Some possible submission problems:

## Examples/notes

An example of processing only one day to test functioning (for January 1st, 2019 and using a log with a name pattern for that day)
An example of processing only one day to test functioning, using the sample log in this repository.

```
YEAR_MONTH=2019-01 LOG_NAME_PATTERN="log/counter_(yyyy-mm-dd).log" UPLOAD_TO_HUB=False SIMULATE_DATE=2019-01-02 ./main.py
YEAR_MONTH=2018-05 SIMULATE_DATE=2018-05-02 LOG_NAME_PATTERN=sample_logs/counter_2018-05-01.log UPLOAD_TO_HUB=False ./main.py
```

An example of processing an entire month at a time. Because the filename contains no literal "(yyyy-mm-dd)" string, the script does not treat it as a pattern for daily logs and takes the filename completely literally.
```
YEAR_MONTH=2019-01 LOG_NAME_PATTERN="/path/to/my/log/counter_2019-01.log" UPLOAD_TO_HUB=False ./main.py
```
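
The difference between the two `LOG_NAME_PATTERN` forms above can be illustrated with a small sketch. This is not the script's actual implementation: the helper `expand_log_names` is hypothetical, and it simply assumes that a "(yyyy-mm-dd)" placeholder stands for one log file per day of the month given in `YEAR_MONTH`.

```python
import calendar

PLACEHOLDER = "(yyyy-mm-dd)"


def expand_log_names(pattern: str, year_month: str) -> list[str]:
    """Return the log file names a pattern could refer to (hypothetical helper).

    With the placeholder present, one name per day of the month is produced;
    without it, the pattern is returned unchanged as a single literal name.
    """
    if PLACEHOLDER not in pattern:
        return [pattern]
    year, month = (int(part) for part in year_month.split("-"))
    days_in_month = calendar.monthrange(year, month)[1]
    return [
        pattern.replace(PLACEHOLDER, f"{year:04d}-{month:02d}-{day:02d}")
        for day in range(1, days_in_month + 1)
    ]
```

Under these assumptions, `expand_log_names("log/counter_(yyyy-mm-dd).log", "2019-01")` yields 31 daily names starting with `log/counter_2019-01-01.log`, while a pattern without the placeholder comes back as a one-element list.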

## Updated for Python 3.10 (2024)

I've updated the dependencies, where possible, to replace older library versions.

I installed version 3.10.13 of Python using the `pyenv` tool and was able to run
`pip install -r requirements.txt` to install the newer dependencies.

I was able to process the sample log files using the first example above after
downloading the GeoLite2-Country.mmdb file from the Internet Archive (see link above)
and placing it in the maxmind_geoip directory.

Some of the geolocation libraries were not updated due to licensing changes and
remain at older versions since updating would require additional work to comply
with the new licensing terms.

Dryad no longer uses this script; it has moved on to the DataCite web tracker
for the statistics it tracks.
62 changes: 31 additions & 31 deletions requirements.txt
@@ -1,32 +1,32 @@
aiohttp==3.7.2
appnope==0.1.0
async-timeout==3.0.1
attrs==20.3.0
backcall==0.2.0
certifi==2020.11.8
chardet==3.0.4
decorator==4.4.2
geoip2==4.1.0
idna==2.10
ipdb==0.13.4
ipython==7.19.0
ipython-genutils==0.2.0
jedi==0.17.2
aiohttp~=3.9.3
appnope~=0.1.4
async-timeout~=4.0.3
attrs~=23.2.0
backcall~=0.2.0
certifi~=2024.2.2
chardet~=3.0.4
decorator~=5.1.1
geoip2~=4.1.0
idna~=3.6
ipdb~=0.13.13
ipython~=8.22.2
ipython-genutils~=0.2.0
jedi~=0.19.1
maxminddb==2.0.3
multidict==5.0.0
parso==0.7.1
peewee==3.14.0
pexpect==4.8.0
pickleshare==0.7.5
prompt-toolkit==3.0.8
ptyprocess==0.6.0
Pygments==2.7.2
python-dateutil==2.8.1
PyYAML==5.4.0
requests==2.24.0
six==1.15.0
traitlets==5.0.5
typing-extensions==3.7.4.3
urllib3==1.25.11
wcwidth==0.2.5
yarl==1.6.2
multidict~=6.0.5
parso~=0.8.3
peewee~=3.17.1
pexpect~=4.9.0
pickleshare~=0.7.5
prompt-toolkit~=3.0.43
ptyprocess~=0.7.0
Pygments~=2.17.2
python-dateutil~=2.8.1
PyYAML~=6.0.1
requests~=2.31.0
six~=1.15.0
traitlets~=5.14.2
typing-extensions~=4.10.0
urllib3~=1.26.18
wcwidth~=0.2.13
yarl~=1.9.4
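
Most pins in this file move from exact versions (`==`) to compatible-release specifiers (`~=`). As a rough illustration of what `~=` accepts, here is a simplified sketch that handles only plain numeric versions; the helper name `satisfies_compatible_release` is hypothetical, and pip's real matching covers more cases (pre-releases, epochs, and so on).

```python
def _parse(version: str) -> tuple[int, ...]:
    """Split a purely numeric version such as '3.9.3' into integers."""
    return tuple(int(part) for part in version.split("."))


def satisfies_compatible_release(candidate: str, pinned: str) -> bool:
    """Simplified check of `candidate` against `~=pinned` (numeric versions only).

    `~=3.9.3` is treated as `>=3.9.3, ==3.9.*`; `~=3.6` as `>=3.6, ==3.*`.
    """
    cand, pin = _parse(candidate), _parse(pinned)
    if len(pin) < 2:
        raise ValueError("~= needs at least two release segments")
    # Pad with zeros so that e.g. 3.6 and 3.6.0 compare as equal.
    width = max(len(cand), len(pin))
    cand_padded = cand + (0,) * (width - len(cand))
    pin_padded = pin + (0,) * (width - len(pin))
    prefix = pin[:-1]  # every segment except the last must match exactly
    return cand_padded >= pin_padded and cand_padded[: len(prefix)] == prefix
```

With this sketch, `aiohttp~=3.9.3` would accept 3.9.5 but not 3.10.0, and `idna~=3.6` would accept 3.7 but not 4.0.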
