Commit

fixing report, README updating WIP

rizac committed Jun 25, 2023
1 parent dfc52b3 commit c42c98d
Showing 7 changed files with 153 additions and 314 deletions.
98 changes: 49 additions & 49 deletions README.md
@@ -4,19 +4,16 @@
and migration from a private repository, please DO NOT CLONE or USE. In case of info, contact
me or open an issue**

Program to compute energy Magnitude (Me) from downloaded seismic waveforms:

- It downloads data (waveform segments) and metadata from a FDSN event
web service using [stream2segment](https://github.com/rizac/stream2segment) (available
with this package)
- It computes the energy Magnitude (Me) for each downloaded segment, producing
tabular data (one row per segment) stored in HDF format
(exploiting [stream2segment](https://github.com/rizac/stream2segment) processing tools)
- It produces event-based HTML reports from each HDF table and the corresponding QuakeML:
the report makes it easy to visualize the content of the HDF in the user's browser; the
QuakeML(s) (one per report event) are the event QuakeML downloaded from the event web
service, with the computed Energy Magnitude included
Program to compute energy Magnitude (Me) from downloaded seismic events. The download
must be performed via [stream2segment](https://github.com/rizac/stream2segment)
(shipped with this package) into a custom SQLite or Postgres database (in the latter
case, the database has to be set up beforehand).

Once downloaded, events and their data are fetched to compute each event's Me (Me = mean
of all station energy magnitudes within the 5-95 percentiles). The computed Me values are
available in several formats: **CSV** (parametric table summarizing all events in rows),
**HTML** (report to visualize all events and their Me on a map) and
**QuakeML** (one file per event, updated with the computed Me).
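
For illustration, a minimal Python sketch of this event-level aggregation (the function
name and the exact percentile handling are assumptions; the package's implementation
may differ):

```python
import numpy as np

def event_me(station_me):
    """Sketch: event Me as the mean of the station energy magnitudes
    lying within the 5-95 percentile range (values outside are dropped)."""
    values = np.asarray(station_me, dtype=float)
    values = values[np.isfinite(values)]  # ignore missing station Me
    lo, hi = np.percentile(values, [5, 95])
    return values[(values >= lo) & (values <= hi)].mean()

print(event_me([4.1, 4.3, 4.2, 4.0, 6.5]))  # outliers do not skew the mean
```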


## Installation:
@@ -67,63 +64,65 @@ where the waveforms and metadata downloaded from the event (parameter `events_url`)
and dataselect (`data_url`) FDSN web services, and all other parameters, if needed.
### Download:
### Events and data Download:
The download routine downloads data and metadata from the configured FDSN
event and dataselect web services into the database. The command is simply an
alias to the [stream2segment](https://github.com/rizac/stream2segment) `download`
command with the configured `download.yaml`. Within the me-compute repository:
event and dataselect web services into the database (SQLite or Postgres, using
[stream2segment](https://github.com/rizac/stream2segment); with Postgres,
the db has to be set up beforehand). Open `download.yaml`
(or a copy of it) and configure `dburl` (ideally, you might also want to set
`start`, `end`, `events_url` and `data_url`):
```commandline
me-compute download
s2s download -d download.yaml
```
(the `-c` option allows specifying a different config file; type
`me-compute download --help` for details)
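
As a reference, a small Python sketch of how the configured `dburl` is read back from
`download.yaml` (this mirrors the `yaml.safe_load` call in `mecompute/run.py` below;
the file path here is illustrative):

```python
import yaml

# Sketch: read the database URL from the stream2segment download config,
# as mecompute/run.py does ("download.yaml" path is illustrative):
with open("download.yaml") as stream:
    dburl = yaml.safe_load(stream)["dburl"]
print(dburl)  # e.g. "sqlite:///./db.sqlite" or a postgresql:// URL
```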
### Process
The process routine computes the station magnitude for a temporal selection of
waveforms saved on the database, producing an HDF file where each row is
a waveform, and columns are the waveform properties, among which
"station_energy_magnitude":
### Me computation
To compute the energy magnitude of events within a certain time range from the
data downloaded in the database:
```bash
me-compute process -s [START] -e [END] -d [download.yaml] [ROOT_DIR]
me-compute -s [START] -e [END] -d [download.yaml] [OUTPUT_DIR]
```
(type `me-compute process --help` for details)
> Note: Because by default we download only one channel per station, a
waveform always corresponds to a station. See 'channel' in download.yaml
An excerpt of the program usage is available below (type `me-compute --help` for more
details):
The produced output is a **directory** inside [ROOT_DIR], containing several
files (a log file for inspecting the processing) and the HDF file mentioned above:
OUTPUT_DIR: the destination root directory. You can use the special characters %S%
and %E% that will be replaced with the start and end time in ISO format, computed
from the given parameters. The output directory and its parents will be created if
they do not exist
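
A minimal sketch of the `%S%`/`%E%` substitution described above, mirroring the string
replacement done in `mecompute/run.py` (times and directory name are illustrative):

```python
# Sketch: %S% and %E% are plain placeholders replaced with the ISO time bounds
start, end = "2016-01-02T00:00:00", "2016-01-03T00:00:00"
output_dir = "mecomputed/me-compute_%S%_%E%"
dest_dir = output_dir.replace("%S%", start).replace("%E%", end)
print(dest_dir)  # mecomputed/me-compute_2016-01-02T00:00:00_2016-01-03T00:00:00
```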
- me-compute_[START]_[END]:
- me-compute_[START]_[END].hdf
- me-compute_[START]_[END].log
In the output directory, the following files will be saved:
### Report
- station-energy-magnitude.hdf A tabular file where each row represents a
station/waveform and each column the station's computed data and metadata,
including the station energy magnitude.
Note that the program assumes that a single channel (the vertical) is
downloaded per station, so that 1 waveform <=> 1 station
This final command sums up the routine chain, computing the final energy
magnitude at event level: it takes as input one or more HDF files produced with
the `process` command, and for each HDF file computes the energy magnitude for
each event:
- energy-magnitude.csv A tabular file (one row per event) aggregating the result
of the previous file into the final event energy magnitude. The final event Me
is the mean of all station energy magnitudes within the 5-95 percentiles
```bash
me-compute report [HDF_FILE_PATH] ...
```
- energy-magnitude.html A report that can be opened in the user browser to
visualize the computed energy magnitudes on maps and HTML tables
The command saves, alongside the HDF file, at least three files:
- [eventid1].xml, ..., [eventidN].xml All processed events saved in QuakeML
format, updated with the information on their energy magnitude
- me-compute_[START]_[END].csv: a CSV file where each row is an event, and
columns are the event properties among which "Me" is the energy magnitude
- me-compute_[START]_[END].html: an interactive HTML file where the CSV data
can be more easily visualized
- [event_id].xml: The **event QuakeML file with the energy magnitude field
appended**. The number of xml files depends on the distinct events present in the
input processing file (HDF)
- energy-magnitude.log the log file where the info, errors and warnings
of the routine are stored. The core energy magnitude computation at station
level (performed via stream2segment utilities) has a separate and more
detailed log file (see below)
- station-energy-magnitude.log the log file where the info, errors and warnings
of the station energy magnitude computation have been stored
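
As a usage hint, the station HDF can be loaded back with pandas; the key below matches
the one written by this commit, while the file path and column usage are illustrative
assumptions:

```python
import pandas as pd

# Sketch: load the station-level table produced by me-compute (requires
# the "tables" package; path and column name are illustrative):
df = pd.read_hdf("station-energy-magnitude.hdf", key="station_energy_magnitudes")
print(df["station_energy_magnitude"].describe())
```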
<!--
### Cron job (schedule downloads+process+report regularly)
Assuming your Python virtualenv is at `[VENV_PATH]`
@@ -164,6 +163,7 @@ a currently working example on a remote server
0 4 * * * [VENV_PATH]/bin/python [VENV_PATH]/bin/me-compute process -d [DOWNLOAD_YAML] [START] [END]
30 7 * * * [VENV_PATH]/bin/python [VENV_PATH]/bin/me-compute report /home/me/mecompute/mecomputed/
```
-->
<!--
## Misc
115 changes: 72 additions & 43 deletions mecompute/run.py
@@ -7,7 +7,8 @@
import csv
from datetime import datetime, date, timedelta
from http.client import HTTPException
from os.path import join, dirname, isdir, basename, splitext, isfile, abspath
from os.path import join, dirname, isdir, basename, splitext, isfile, abspath, isabs, \
relpath
from urllib.error import URLError, HTTPError

import click
@@ -38,27 +39,25 @@
@click.command(context_settings=dict(max_content_width=89),)
@click.option('d_config', '-d', type=click.Path(exists=True),
help=f"The path of the download configuration file used to download "
f"the data. Used get the URL of the database where events and "
f"waveforms will be fetched (all other properties will be ignored). "
f"If the output directory already exists and force_overwrite is "
f"False, this parameter will be ignored")
f"the data. Used to get the URL of the database where events and "
f"waveforms will be fetched (all other properties will be ignored)")
@click.option('start', '-s', type=click.DateTime(), default=None,
help="if the database data has to be used, set the start time of the "
help="the start time of the "
"db events to fetch (UTC ISO-formatted string). "
"If missing, it is set as `end` minus `duration` days")
@click.option('end', '-e', type=click.DateTime(), default=None,
help="if the database data has to be used, set end start time of the "
help="the end time of the "
"db events to fetch (UTC ISO-formatted string). "
"If missing, it is set as `start` plus `duration`. If `start` is "
"also missing, it defaults as today at midnight")
@click.option('time_window', '-t', type=int, default=None,
help="if the database data has to be used, set the start time of the "
"db events to fetch, set the time window, in days of teh events to "
help="the time window, in days of teh events to "
"fetch. If missing, it defaults to 1. If both time bounds (start, "
"end) are provided, it is ignored")
@click.option('-f', '--force-overwrite', is_flag=True,
help='Force overwrite all files if it already exist. Default is false '
'(use existing files - if found - and do not overwrite them)')
@click.option('force_overwrite', '-f', is_flag=True,
help='Force overwrite existing files. Default is false, which will try '
'to preserve existing files (outdated files, if found, will be '
'overwritten anyway)')
@click.option('p_config', '-pc', type=click.Path(exists=True),
default=None,
help=f"The path of the configuration file used for processing the data. "
@@ -76,15 +75,61 @@
@click.argument('output_dir', required=True)
def cli(d_config, start, end, time_window, force_overwrite, p_config, h_template,
output_dir):
"""
Computes the energy magnitude (Me) from a selection of events and waveforms
previously downloaded with stream2segment and saved on a SQLite or Postgres database.
OUTPUT_DIR: the destination root directory. You can use the special characters %S%
and %E% that will be replaced with the start and end time in ISO format, computed
from the given parameters. The output directory and its parents will be created if
they do not exist
In the output directory, the following files will be saved:
- station-energy-magnitude.hdf A tabular file where each row represents a
station/waveform and each column the station's computed data and metadata,
including the station energy magnitude.
Note that the program assumes that a single channel (the vertical) is
downloaded per station, so that 1 waveform <=> 1 station
- energy-magnitude.csv A tabular file (one row per event) aggregating the result
of the previous file into the final event energy magnitude. The final event Me
is the mean of all station energy magnitudes within the 5-95 percentiles
- energy-magnitude.html A report that can be opened in the user browser to
visualize the computed energy magnitudes on maps and HTML tables
- [eventid1].xml, ..., [eventidN].xml All processed events saved in QuakeML
format, updated with the information on their energy magnitude
- energy-magnitude.log the log file where the info, errors and warnings
of the routine are stored. The core energy magnitude computation at station
level (performed via stream2segment utilities) has a separate and more
detailed log file (see below)
- station-energy-magnitude.log the log file where the info, errors and warnings
of the station energy magnitude computation have been stored
Examples. In order to process all segments of the events that occurred ...
... yesterday:
me-compute OUT_DIR
... in the last 2 days:
me-compute -t 2 OUT_DIR
... on January the 2nd and January the 3rd, 2016:
me-compute -s 2016-01-02 -t 2 OUT_DIR
"""
# create output directory within destdir and assign new name:
start, end = _get_timebounds(start, end, time_window)
dest_dir = output_dir.replace("%S%", start).replace("%E%", end)
file_handler = logging.FileHandler(filename='energy-magnitude.log')
file_handler = logging.FileHandler(mode='w+',
filename=join(dest_dir,
'energy-magnitude.log'))
file_handler.setLevel(logging.INFO)
logger.addHandler(file_handler)

# if output_dir is None and not all(_ is None for _ )
try:
process(d_config, start, end, time_window, dest_dir,
process(d_config, start, end, dest_dir,
force_overwrite=force_overwrite, p_config=p_config,
html_template=h_template)
except MeRoutineError as merr:
@@ -102,32 +147,11 @@ class MeRoutineError(Exception):
pass


def process(dconfig, start, end, duration, dest_dir,
def process(dconfig, start, end, dest_dir,
force_overwrite=False,
p_config=None, html_template=None):
"""
process downloaded events computing their energy magnitude (Me).
ROOT_OUTPUT_DIR: the destination root directory. NOTE: The output of this command
is a **directory** that will be created inside ROOT_OUTPUT_DIR: the directory
will contain several files, including a .HDF file with all waveforms processed (one
row per waveform) and several columns
Examples. In order to process all segments of the events that occurred ...
... yesterday:
process ROOT_OUT_DIR
"""process downloaded events computing their energy magnitude (Me)"""

... in the last 2 days:
process ROOT_OUT_DIR -d 2
... on January the 2nd and January the 3rd, 2016:
process -s 2016-01-02 -d 2 ROOT_OUT_DIR
"""
start, end = _get_timebounds(start, end, duration)

# # in case we want to query the db (e.g., min event, legacy code not used anymore):
# from stream2segment.process import get_session
@@ -148,6 +172,11 @@ def process(dconfig, start, end, duration, dest_dir,
try:
with open(dconfig) as _:
dburl = yaml.safe_load(_)['dburl']
sqlite = "sqlite:///"
if dburl.lower().startswith(sqlite):
dburl_ = dburl[len(sqlite):]
if not isabs(dburl_):
dburl = "sqlite:///" + abspath(join(dirname(dconfig), dburl_))
except (FileNotFoundError, yaml.YAMLError, KeyError) as exc:
raise MeRoutineError(f'Unable to read "dburl" from {dconfig}. '
f'Check that file exists and is a well-formed '
@@ -201,7 +230,8 @@ def process(dconfig, start, end, duration, dest_dir,
ev_catalog_id = ev_catalog_url.split('eventid=')[-1]
# write QuakeML:
try:
_write_quekeml(dest_dir, ev_catalog_url, ev_catalog_id,
quakeml_file = join(dest_dir, ev_catalog_id + '.xml')
_write_quekeml(quakeml_file, ev_catalog_url,
evt['Me'], evt['Me_stddev'], evt['Me_waveforms_used'],
author_uri, force_overwrite)
except (OSError, HTTPError, HTTPException, URLError) as exc:
@@ -266,7 +296,7 @@ def _compute_station_me(outfile, dburl, segments_selection, p_config=None):
'location': {},
'channel': {},
'event_magnitude_type': {},
'event_catalog_url': {}
'event_url': {}
}

if p_config is None:
@@ -298,14 +328,13 @@ def _compute_station_me(outfile, dburl, segments_selection, p_config=None):
if dtype == 'str':
min_itemsize[col] = max(len(v) for v in mapping.values())

dataframe.to_hdf(outfile, format='table', key='me_computed_waveforms_table',
dataframe.to_hdf(outfile, format='table', key='station_energy_magnitudes',
min_itemsize=min_itemsize or None)
return outfile
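
A self-contained sketch of the table-format HDF write above: `min_itemsize` reserves a
maximum byte width for string columns (data, sizes and the output path are illustrative):

```python
import pandas as pd

# Sketch: table-format HDF with explicit string column widths (needs "tables"):
df = pd.DataFrame({"network": ["GE", "IV"], "me": [4.1, 4.5]})
df.to_hdf("out.hdf", key="station_energy_magnitudes", format="table",
          min_itemsize={"network": 8})
```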


def _write_quekeml(dest_dir, event_url, event_id, me, me_u=None, me_stations=None,
def _write_quekeml(dest_file, event_url, me, me_u=None, me_stations=None,
author="", force_overwrite=False):
dest_file = join(dest_dir, event_id + '.xml')
if isfile(dest_file) and not force_overwrite:
return dest_file

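A hedged sketch of the call pattern after this commit: the caller now derives the event
id from the catalog URL and passes the destination file (no longer a directory plus
event id) to `_write_quekeml`; all values below are illustrative:

```python
from os.path import join

# Sketch of the refactored _write_quekeml call (values are illustrative):
ev_catalog_url = "https://webservice.example.org/fdsnws/event/1/query?eventid=12345"
ev_catalog_id = ev_catalog_url.split("eventid=")[-1]  # -> "12345"
quakeml_file = join("output_dir", ev_catalog_id + ".xml")
# _write_quekeml(quakeml_file, ev_catalog_url, me, me_u, me_stations,
#                author, force_overwrite)
```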
2 changes: 1 addition & 1 deletion mecompute/stats.py
@@ -59,7 +59,7 @@ def get_report_rows(hdf_path_or_df):
# see process.py:main for a list of columns:
dfr = hdf_path_or_df
if not isinstance(hdf_path_or_df, pd.DataFrame):
dfr: pd.DataFrame = pd.read_hdf(hdf_path) # noqa
dfr: pd.DataFrame = pd.read_hdf(hdf_path_or_df) # noqa

for ev_db_id, df_ in dfr.groupby('event_db_id'):
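
To illustrate the fixed code path, a small sketch of how `get_report_rows` then iterates
the table, grouping station rows by event (column names follow the code above; the data
is fabricated purely for illustration):

```python
import pandas as pd

# Sketch: group station rows by their event, as get_report_rows does:
dfr = pd.DataFrame({"event_db_id": [1, 1, 2],
                    "station_energy_magnitude": [4.1, 4.3, 5.0]})
for ev_db_id, df_ in dfr.groupby("event_db_id"):
    print(ev_db_id, df_["station_energy_magnitude"].mean())
```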


This file was deleted.

Binary file not shown.
2 changes: 1 addition & 1 deletion test/data/download.yaml
@@ -10,7 +10,7 @@
# suggest sqlite for small to medium data or enough system RAM (as a rule of thumb:
# less than a million segments, and/or more than 8GB of RAM) and postgres otherwise.
# For info see: http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
dburl: sqlite:////./db.sqlite
dburl: sqlite:///./db.sqlite

# Limit to events / data centers / station / channels on or after the specified start
# time. Specify an ISO-formatted date or date-time string, or an integer >=0 to denote
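
For context on the one-character fix above, a sketch of SQLAlchemy's SQLite URL
semantics: three slashes mean a relative path (which `mecompute/run.py` in this commit
additionally resolves against the config file's directory), four slashes an absolute
path from the filesystem root:

```python
from sqlalchemy import create_engine

# Sketch: sqlite:/// + relative path vs sqlite://// + absolute path
relative = create_engine("sqlite:///./db.sqlite")     # file ./db.sqlite
absolute = create_engine("sqlite:////tmp/db.sqlite")  # file /tmp/db.sqlite
print(relative.url.database, absolute.url.database)
```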
