Skip to content

caisikai/wrds

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

21 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

QLIB_WRDS

1. Schedule

  1. TODO List
  • write scripts/dump_single/check_dump_single.py, which is used to check the result of dump_single.py
  • write data/wrds_handler.py, which can handler one dataset
  • dump document
  • dump check document
  • place .bin file on /storage/qlib/qlib_data/wrds/
  • write main.ipynb, show results of each dataset's result of dataset.prepare,if it is too long to load data, select the longest observed period instreumetns as a market,(save test100.txt in instruments subdir of each dataset)
  1. Dataset
dataset dump check main.ipynb
CRSP monthly stocks
Compustat fundamentals
Compustat price data
Compustat Global fundamentals
Compustat Global price data ⬜️ ⬜️ ⬜️
Compustat Global FX rates

2. Quick Start in 204 Server

2.1. Shared Enviroment

Add the following code in your ~/.bashrc then restart your shell

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/opt/miniconda3/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/opt/miniconda3/etc/profile.d/conda.sh" ]; then
        . "/opt/miniconda3/etc/profile.d/conda.sh"
    else
        export PATH="/opt/miniconda3/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
# >>> acitvate wrds enviroment>>>
conda activate wrds
# <<< conda initialize <<<

2.2. WRDS_QLIB

  1. clone the repository

    git clone https://github.com/caisikai/wrds
  2. Dump the .parquet file into .bin file

    #go into repository
    cd wrds
    python scripts/dump_single/dump_single.py dump_all --csv_path /storage/wrds/crsp/sasdata/a_stock/msf.parquet --qlib_dir /storage/qlib/qlib_data/crsp/a_stock/msf_test --symbol_field_name permno --date_field_name date

    See more samples in here

  3. Check Dump

     #go into repository
     cd wrds
     python check_dump_single.py check_single --qlib_dir /storage/qlib/qlib_data/wrds/comp/d_global/currency/g_exrt_mth/ --check_symbol_num -1 --parquet_path /storage/wrds/comp/sasdata/d_global/currency/g_exrt_mth.parquet

    see more samples in here

  4. dataset demo

    data/wrsd_handler.py has a demo handler class that can provide customized dataset

    main.ipynb provides multiple dataset demo that can prepare some pd.dataframe which can be used for downstream task

3. Next step

  • Current version code can not support for Compustat Global price data, where there are about 0.2 billion row, which consume too many memory.
  • Fuse the price data and fundamental data

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published