The Alpine-Himalayan belt provides a unique opportunity to investigate plate dynamics and large-scale crustal deformation and earthquake hazard given the complex tectonic interactions involving multiple plates and tectonic regimes. The expansion of regional GNSS networks and the availability of published velocities have facilitated a broader understanding of active tectonics in this region. However, despite these advancements, few attempts have been made to integrate the available GNSS velocities, and consensus on the optimal methods for filtering and harmonizing geodetic datasets remains elusive. The primary objective of this project is to filter, combine and rotate GNSS velocity fields from column-formatted text files. These data are obtained from different geodetic studies of lithospheric deformation, with each input file comprising 13 distinct columns, including 'Lon', 'Lat', 'E.vel', 'N.vel', 'E.adj', 'N.adj', 'E.sig', 'N.sig', 'Corr', 'U.vel', 'U.adj', 'U.sig', 'Stat'. The goal is to combine all the published GNSS velocity fields along the Alpine-Himalayan belt, rotate them into different reference frames, filter outliers, and manage repeated stations that might differ in names or have coordinates varying by
The methodology implemented in the code combines some of the approaches by previous research on aggregated GNSS velocity fields, including Piña-Valdez., et al., (2022) and Zeng et al., (2022).
Note: This code currently relies on the installation of GAMIT/GLOBK, as it utilizes the Fortran codes VELROT and CVFRAME included with GAMIT/GLOBK to align and rotate velocity fields. In upcoming releases, I plan to provide Python scripts that will eliminate the need for GAMIT/GLOBK. Until then, please ensure that GAMIT/GLOBK is installed before running this code.
This software comprises a main Jupyter notebook named FICORO_GNSS.ipynb
and an input folder titled raw_input
. Within the raw_input
folder, you'll find the input velocity fields stored as column-formatted text files with the .raw
extension. To facilitate data processing, there's a scripts
folder containing additional Python scripts designed for filtering, rotating and combining GNSS velocity fields. If you need to manually remove outliers from the data, you can utilize the manual_filter
folder, which houses a CSV file that enables you to define specific geographic coordinates (latitude and longitude) and corresponding radii (in kilometers) for the removal of outliers from the combined velocity fields in different reference frames. To provide clarity, the folder structure is organized as follows:
📦FICORO_GNSS ┣ 📜FICORO_GNSS.ipynb ┣ 📂raw_input ┃ ┣ 📜alchalbi_2013.raw ┃ ┣ 📜bahrouni_2020.raw ┃ ┣ 📜billi_2023.raw ┃ ┣ 📜bougrine_2019.raw ┃ ┣ 📜briole_2021.raw ┃ ┣ 📜castro_2021.raw ┃ ┣ 📜england_2016.raw ┃ ┣ 📜ergintav_2023.raw ┃ ┣ 📜euref_all.raw ┃ ┣ 📜euref_ch8.raw ┃ ┣ 📜floyd_2023.raw ┃ ┣ 📜gomez_2020.raw ┃ ┣ 📜graham_2021.raw ┃ ┣ 📜hamiel_2021.raw ┃ ┣ 📜khorrami_2019.raw ┃ ┣ 📜kurt_2023.raw ┃ ┣ 📜liang_2013.raw ┃ ┣ 📜mcclusky_2010.raw ┃ ┣ 📜nocquet_2012.raw ┃ ┣ 📜ozdemir_2019.raw ┃ ┣ 📜ozkan_2022.raw ┃ ┣ 📜pinaValdes_2022.raw ┃ ┣ 📜reilinger_2006.raw ┃ ┣ 📜saleh_2015.raw ┃ ┣ 📜serpelloni_2022.raw ┃ ┣ 📜sokhadze_2018.raw ┃ ┣ 📜stamps.raw ┃ ┣ 📜viltres_2020.raw ┃ ┣ 📜viltres_2022.raw ┃ ┣ 📜wang_barbot_2023.raw ┃ ┣ 📜wang_shen_2020.raw ┃ ┗ 📜wedmore_2021.raw ┣ 📂scripts ┃ ┣ 📜coherence_filter.py ┃ ┣ 📜combine_vel.py ┃ ┣ 📜lognorm_filter.py ┃ ┣ 📜plot_maps_filtering.py ┃ ┗ 📜plot_rotated_vels.py ┣ 📂manual_filter ┃ ┗ 📜filter_criteria.csv ┣ 📂Readme_figures ┃ ┣ 📜filtering.jpg ┃ ┣ 📜gps_map.jpg ┃ ┗ 📜num_estimates.jpg ┗ 📜README.md
Key Steps:
-
Data importing: Load raw data from an input directory.
-
Filtering by uncertainty distribution: Analyse the lognormal distribution of velocity uncertainties, excluding stations that fall outside the 99 percentile in the East and North velocity uncertainty components.
-
Z-score Filtering: Remove stations if velocity magnitudes in the East and North velocity components diverge over 2 sigma from the mean, considering a radius of 20 km. Additionally, the code allows applying geographic-based stringency levels (n-sigma), allowing for a customizable approach to data filtering.
-
Velocity field alignment to a common reference frame: Implement a least squares approach to align all the data sets to a reference velocity field using a 6-parameter Helmert transformation (3 translations and 3 rotations), leveraging on repeated stations in both the input and reference data sets.
-
Velocity field rotation: Rotate velocity fields to different reference frames using published Euler vectors.
-
Velocity field combination: Combine individual velocity fields into a single comprehensive velocity field.
-
Station Proximity Analysis: Cluster stations within a
$0.01^\circ$ (1.11 km) range of one another (they are probably the same station reported by different studies. Although, in some cases, collocated stations might be included too). For such proximate stations:-
Implements the Interquartile Range (IQR) method to detect outliers, omitting solutions displaying disparities in magnitude, azimuthal direction, or both. The thresholds for outlier detection are set as:
- Lower threshold =
$Q1 -1.5 * IQR$ - Upper threshold =
$Q3 + 1.5 * IQR$
- Lower threshold =
-
Calculate median velocities for the East, North, and vertical components. Handle vertical velocities when some of the input solutions do not include verticals.
-
Derive uncertainties for each velocity component, taking the median value of stations within a
$0.01^\circ$ (1.11 km) radius. -
Save the combined velocity field data and plot velocity maps.
-
-
-
Manual filtering: Remove outliers based on geographical coordinates and radii, generating cleaned data files and logs of removed stations. The script uses parallel processing to handle multiple input velocity fields in different reference frames efficiently.
Here is a detailed breakdown of what the master Jupyter Notebook and ancillary scripts do:
This script operates as the orchestrator, calling various specialised scripts and organising the flow of data through the filtering, combining, and rotating processes, ultimately aimed at preparing GNSS velocity fields for further geospatial analysis.
This master Jupyter Notebook follows a streamlined process composed of 8 steps:
- Step 1: Define input & output parameters
- Step 2: Load input velocity fields
- Step 3: Identify and remove outliers
- Step 4: Plot filtered GNSS velocities and outliers
- Step 5: Align velocity fields to the International Terrestrial Reference Frame (ITRF14)
- Step 6: Rotate velocity fields using Euler poles
- Step 7: Combine velocity fields using a least squares approach
- Step 8: Manual filtering of outliers
- Step 9: Plot the combined velocity field in different reference frames
-
Library Imports and Path/Parameter Definitions:
- Import required libraries for handling file systems (
os
), running subprocesses (subprocess
), working with dates and times (datetime
), moving files (shutil
), data processing (pandas
), scientific computing (numpy
) and plotting geophysical data (pygmt
). - Define paths for input files, output destinations, scripts, and results. These paths are essential for organizing files and directories for the processing that follows. Also, specific parameters related to file types are set.
- Import required libraries for handling file systems (
-
Input Parameters:
- Define directory and file structures, including paths for raw inputs, output formatting, scripts, and various stages of processing results.
-
Script Path Definitions:
- Paths to ancillary Python scripts are declared. These scripts perform log-normal and coherence filtering, velocity combination, and plotting GNSS velocity maps.
This step handles the initial processing of the input files, ensuring they are available, accessible, and formatted correctly for the subsequent stages of the pipeline. This process is essential for preparing the raw velocity data for further filtering, combination, and analysis.
-
Initial status print:
- The code prints to the console, indicating it is in the process of checking the input files.
-
Input path validation:
- The code first checks whether the input directory exists. If the directory doesn't exist, the script prints an error message and terminates with an exit code of 1, indicating an abnormal termination.
- It then checks if the input directory is empty (i.e., no input files). Again, if this check fails, it's considered a fatal error, leading to the script's termination.
-
Output path handling:
- The script checks for the existence of the output directory (containing formatted input velocity fields).
- If the directory exists, it cleans it by removing all files within. This step ensures that previous run outputs don't mix with the current run.
- If the directory doesn't exist, it's created. This setup ensures the script is self-sufficient in managing its required resources.
-
File loading:
- The script lists all files in the input directory and begins processing each one that matches the expected file extension (defined in step 1 as
.raw
). - For each valid input file, the code:
- Extracts the base name of the file (name without extension).
- Prints an ongoing status message indicating the translation is being performed. This serves as a log point, helping track each file's processing.
- Reads the content of the file, splitting it into individual lines and further tokenizing each line by spaces, thus forming a list of lists (each representing a row of space-separated values from the file).
- Prepares a header line indicating the 13 columns of the expected format.
- Joins the tokenized lines back into a single string per line and prefixes all the lines with the header, creating a fully formatted text block.
- Writes the formatted content into a new file in the output directory (default: raw_input_column_formatted) with an updated extension (default:
.vel
). This action transforms the file into the '.vel' format expected for subsequent operations.
- The script lists all files in the input directory and begins processing each one that matches the expected file extension (defined in step 1 as
In summary, the second step is dedicated to validating and preparing the input data, ensuring it's in the right format and the right path for further stages of the pipeline. It avoids involuntary data mixing and provides console logs for monitoring the process.
This step deals with the cleaning and filtering of the input GNSS velocity fields. The operations use ancillary Python scripts, specifically designed for these tasks. Here are the detailed steps:
-
Initial status print:
- The script provides a console output indicating the initiation of the GNSS velocity fields' cleaning process.
-
Lognorm Filtering:
- The code checks whether the 'lognorm' filter script exists at the specified path. If it does not, an error message is printed, and the program terminates. This check prevents the program from crashing and provides a clear indication of the failure.
- If the 'lognorm' filter script exists, the master script prepares to execute it. It defines paths where the input data is stored and where to store the outputs after the 'lognorm' script processes the data. This process includes:
- Input data folder (
input_lognorm_path
). - Output folder for filtered data (
output_lognorm_path
). - A directory for data excluded during filtering (
excluded_lognorm_path
). - A folder for figures (
figure_folder_path
).
- Input data folder (
- The code then calls
subprocess.run()
, executing the 'lognorm' filter script with the necessary input and output arguments.
-
Coherence Filtering:
- Similar to the 'lognorm' filter script, the master script checks the existence of the 'coherence' filter script. If the script is missing, the program prints an error message and exits, ensuring that the failure point is known.
- If the script is available, it executes the 'coherence' script using
subprocess.run()
, but this time, it passes the output path from the 'lognorm' filtering as an input. The 'coherence' filtering is a subsequent step after 'lognorm' filtering, and it processes the already cleaned data further.
Step 4 in the workflow pertains to the visual representation of the GNSS velocity fields. This step is crucial for several reasons, as it transitions from data cleaning and manipulation to visual interpretation and analysis. Here's what happens during this step:
-
Initial status print:
- The code begins with printing lines to the console, indicating that the plotting of velocity fields has been initiated.
-
Plotting script execution check:
- The code checks whether the required plotting script (
plot_maps_filter_path
) exists in the specified path. This is a preventative measure ensuring that the workflow doesn't proceed with a missing file, which would lead to errors. - If the script is found, it is executed using
subprocess.run()
, which is a Python standard library method for invoking subprocesses. In this context, it's used to run a Python script that handles the map plotting using the pygmt module. - If the script is not found, an error message prints, and the program terminates by calling
exit(1)
, indicating that an error has occurred. - For each input velocity field, the plotting script takes the cleaned data and outliers from the previous velocity filtering steps and creates a map showing vectors indicating the velocity at each GNSS station. Different colours are used to denote whether a station passed the filtering step or was classified as an outlier.
- The code checks whether the required plotting script (
This step allows users to visualise both, cleaned velocities and outliers with different colours on the same map for each of the input velocity fields. Below I show the results for one of the velocity fields by Billi., et al., (2023) covering northern Africa. The left panel shows velocities and outliers based on the two criteria discussed in Step 3, the centre and right panels show the the lognormal fit to the velocity uncertainty distribution in East and North velocity components expressed in mm/yr. Stations with velocity uncertainties above the 99th lognormal percentile (black dashed line) in East or North velocity components are classified as outliers.
Step 5 is a comprehensive data preparation and processing stage ensuring that GNSS data is correctly formatted for use with the external Fortran script VELROT, which is distributed with the GAMIT/GLOBK software package. VELROT performs a least squares inversion leveraging common sites between a reference velocity field and an input velocity field, performing a 6-parameter Helmert transformation (3 rotations and 3 translations without scale). The master velocity field is expressed in a no-net-rotation reference frame (ITRF2014), and each input file is aligned with the master velocity field (Serpelloni et al., 2022).
- Column Formatting for velrot Compatibility
-
Folder Checks:
- Checks
coherence_results_path
for data; terminates script if empty. - Ensures
input_files_rot_path
exists, creates it if necessary.- For each
.vel
file ininput_files_rot_path
, the script dynamically creates a new folder inITRF14_vel_path
. - Each folder is named after the base name of the corresponding velocity file, ensuring a clear link between the input file and its processing directory.
- For each
- Checks
-
Data Formatting and file/folder management:
- Processes
.csv
files incoherence_results_path
, converting them to.vel
format. - Writes formatted data to
input_files_rot_path
for alignment. - The script copies three essential types of files into these newly created folders:
- The input velocity file (
*.vel
). - The reference velocity file (
reference_vel
). - The
VELROT
link file (lnk_file
).
- Processes
This setup guarantees that each folder contains all necessary files for velrot
alignment, isolated from other datasets.
- Aligning Velocity Fields to ITRF14 with Log Creation
-
Pre-Alignment Setup:
- Confirms
input_files_rot_path
is populated. - Manages
ITRF14_vel_path
for output data. - Creates
lnk_file
if absent. - Verifies existence of
reference_vel
file.
- Confirms
-
Conditional Processing to Avoid Redundant Rotation:
- Identifies if a
.vel
file is thereference_vel
(serpelloni et al., 2022 velocity field). - Instead of running
velrot
, the script renames the reference velocity file, appending _igb14.vel to its base name. - Proceeds with rotation for other files.
- Identifies if a
-
Rotation and Log File Creation:
- Executes the
velrot
command for all the other.vel
files, aligning them to Serpelloni's velocity field expressed in the IGB14 realisation of ITRF14. - Captures output statistics from the alignment process in log files (
*_align.log
).
- Executes the
-
Post-Rotation Processing:
- Cleans and extracts data from
velrot
output. - Produces aligned
.vel
files with_igb14.vel
suffix.
- Cleans and extracts data from
-
Organizing Aligned Data:
- Copies aligned velocity files into
igb_nocomb_path/igb_nocomb_subpath
.
- Copies aligned velocity files into
Step 5 involves several directories and files, each serving a specific purpose in the data processing workflow. I'll detail each path and its role within the script:
-
coherence_results_path
:- This directory contains the input CSV files that the script processes. These files correspond to the output from the coherence filtering step. The script checks whether the directory is empty before proceeding.
-
input_files_rot_path
:- The code uses this directory to store '.vel' files, converted from the original CSVs. If the directory doesn't exist, the script creates it; if it does, the script clears any existing files to prevent data confusion.
-
ITRF14_vel_path
:- After converting and processing the '.vel' files, the script uses this directory to handle the outputs that have been aligned with the ITRF14 reference frame. Similar to before, the script prepares this directory by creating or cleaning it as necessary.
-
lnk_file_path
andlnk_folder_path
:- These paths relate to the VELROT 'link' file required for the alignment process with the Fortran code VELROT, distributed with GAMIT/GLOBK. The script ensures the link file exists, creating it if it doesn't.
-
reference_vel_path
:- This is a critical file acting as a reference for velocity fields. The script checks explicitly that this file exists because it's necessary for the alignment of velocity fields to the ITRF14 standard. In the code, the reference velocity field to which all the other input velocity fields are aligned is that from Serpelloni et al., (2022), which is expressed in ITRF14 and represents one of the most reliable and extensive velocity fields in the data set. A large number of stations in the reference velocity field is crucial because the least squares alignment and estimation of Helmert parameters is done relying on common stations between the input velocity fields and the master velocity field.
-
results_path
,igb_nocomb_path
, andigb_nocomb_subpath
:- These directories are involved in the final stages of the process, where the script stores the final outputs. Specifically, 'igb14_no_comb' is a structured directory where the script organizes the processed data aligned to the 'igb14' reference frame. The code handles the creation of these directories if they don't exist to properly archive the final products.
This step focuses on rotating GNSS velocity fields into different tectonic reference frames using Euler poles.
-
Setting Up Destination Folders:
-
Destination folders for each reference frame (
arab
,eura
,nubi
,sina
,anat
) andigb14
are set up withinresults_path/igb_nocomb_path
. -
Folder Paths: The script sets up distinct destination folders for each tectonic plate (
arab
,eura
,nubi
,sina
,anat
) and for theigb14
reference frame.- Variables:
arab_dest_folder
: Destination folder for velocity fields expressed in Arabia-fixed reference frameeura_dest_folder
: Destination folder for velocity fields expressed in Eurasia-fixed reference framenubi_dest_folder
: Destination folder for velocity fields expressed in Nubia-fixed reference framesina_dest_folder
: Destination folder for velocity fields expressed in Southern Sinai-fixed reference frameanat_dest_folder
: Destination folder for velocity fields expressed in East Anatolia-fixed reference frameigb14_dest_folder
: Destination folder for for velocity fields expressed in IGB14/ITRF14 reference frame
-
Folder Creation: Each destination folder is created using
os.makedirs
, ensuring the script can proceed without interruption due to missing directories.
-
-
Defining Euler Poles:
- Euler pole parameters for Arabia, Eurasia, Nubia, Sinai, and Anatolia are defined based on recent studies.
-
Processing Velocity Files:
- Iterates through each
.vel
file inigb14_dest_folder
. - Removes
_igb14
suffix from the base name. - Applies Euler pole rotation for each tectonic plate using the
cvframe
script distributted with GAMIT/GLOBK. - Names rotated files as
"{base}_{ref_frame}.vel"
.
- Iterates through each
-
File Management and Rotation:
- Moves each rotated file to its respective destination folder based on the reference frame.
-
Completion Message:
- Prints a completion message post-rotation.
Step 7 involves the combination of GNSS velocity fields for each reference frame, using a parallel processing approach.
- Combining GNSS Velocities
- Function
combine_velocities(ref_frame)
:- A function designed to combine GPS velocities for a given reference frame.
- It utilizes the
combination_script_path
to process data in the specifiedref_frame_folder
.
- Process:
- For each reference frame, the function is called to combine GNSS velocity data in that specific frame.
- Parallel Execution
- ThreadPoolExecutor:
- The script uses
concurrent.futures.ThreadPoolExecutor
for parallel execution. - This approach significantly speeds up the process by handling multiple reference frames simultaneously.
- The script uses
- Total Workers:
- The number of workers is set to the number of reference frames plus one (for IGB14).
- Execution:
- The
executor.map
function is used to applycombine_velocities
to each reference frame. - Additionally, the IGB14/ITRF14 reference frame combination is handled separately.
- The
- Error Handling
- The script checks for the existence of the
combination_script_path
. - If the script is not found, an error message is displayed, and the process is terminated.
In this step, multiple velocity fields in different reference frames are filtered in parellel, applying user-defined filtering based on coordinates and radii. The user can edit the filtering criteria file ./manual_filter/filter_criteria.csv
, which is a space-separated CSV with columns center_lon (in degrees), center_lat (in degrees), radius (in kilometers). Stations located within the specified radius around the coordinate provided will be excluded from the final combined velocity field. Clean data and log files listing removed stations are saved in the folder ./results/combined_velocities/manual_filter/
.
In this step, the code executes the plotting script plot_rotated_script_path
, corresponds to scripts/plot_rotated_vels.py
.
This script plots the horizontal GNSS velocity fields in different reference frames given an input folder containing the velocity fields as CSV files.
- Function
plot_gps_velocity_fields(folder_path)
:
- Purpose: Display maps of GPS velocity fields from CSV files.
- Process:
- Find CSV Files: The function searches for all CSV files in the specified input
folder_path
. - Figure Creation: For each CSV file, a PyGMT figure is created.
- Sets region, projection, and frame for the map.
- Adds shaded topography, coastlines, and custom color palettes.
- Data Reading: Reads CSV file data into a pandas DataFrame, setting appropriate column names.
- Data Validation: Skips plotting if the DataFrame is empty.
- Velocity Vector Creation:
- Extracts coordinates, velocity components, and uncertainties (which are not used nor plotted in the current version of the code).
- Calculates the velocity magnitude and normalizes it.
- Creates a list of vectors for plotting.
- Plotting:
- Plots GPS velocity vectors with specified style and color.
- Adds a scale bar to the map.
- Displaying Figures: Shows the plotted figure and prints a status message for each reference frame.
- Find CSV Files: The function searches for all CSV files in the specified input
- Check expected number of rows and columns in output files
- Compare combined velocity field with published velocities from the International Terrestrial Reference Frame 2014 (check magnitude of velocity residuals at common GNSS sites). Plot a map showing residual velocities.