MM-TSFlib is an open-source library for multimodal time-series forecasting built on the Time-MMD dataset. It supports both short-term and long-term multimodal time series forecasting by integrating time series models with language models. Our framework is illustrated in the figure below.
🚩News (2024.08) We have greatly extended the text modeling approaches: we now support open-source models (LLAMA2, LLAMA3, GPT2, BERT, GPT2M, GPT2L, GPT2XL), any closed-source model, as well as small models trained from scratch (e.g., Doc2Vec). You can specify the model using the `--llm_model` option.
🚩News (2024.08) We now support enhancing inter-modal interactions with an attention mechanism by setting `--pool_type` to `attention`.
🚩News (2024.08) We have uploaded additional preprocessed datasets, which significantly accelerate the training process.
🚩News (2024.08) We have significantly cleaned up the code and written detailed documentation for usage.
🚩News (2024.06) Preprocessing functions and preprocessed data to speed up the training process will be released soon.
- Install the environment by executing the following command:

```bash
pip install -r environment.txt
```
- Prepare the data. We use the Time-MMD dataset. We provide preprocessed data in the `./data` folder to accelerate training, in particular by simplifying the text matching process.
- Prepare for closed-source LLMs. Our framework can already integrate closed-source LLMs. To save costs, you should first use a closed-source LLM, such as GPT-3.5, to generate text-based predictions. We provide the specific preprocessing methods in the [document/file], along with preprocessed data that can be used directly in `./data/`. You can replace it with any other closed-source LLM.
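  As a rough illustration, here is a minimal sketch of generating such text-based predictions with the OpenAI API; the prompt, the `generate_text_prediction` helper, and the output handling are illustrative assumptions, not MM-TSFlib's exact preprocessing.

```python
# Hypothetical sketch: generating text-based predictions with a closed-source LLM.
# The prompt format and helper name are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_text_prediction(report_text: str) -> str:
    """Ask the LLM to turn a textual report into a forecast statement."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a forecasting assistant."},
            {"role": "user", "content": f"Based on this report, predict the upcoming trend:\n{report_text}"},
        ],
    )
    return response.choices[0].message.content
```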
- Prepare for open-source LLMs. Our framework currently supports models such as LLAMA2, LLAMA3, GPT2, BERT, GPT2M, GPT2L, and GPT2XL, all available on Hugging Face. Please ensure you have your own Hugging Face token ready.
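  If you have not configured your token yet, one common way to do so (an assumption of this sketch, not an MM-TSFlib requirement) is to log in via `huggingface_hub` so that gated model downloads are authenticated:

```python
# Authenticate with Hugging Face so gated models (e.g., LLAMA2/LLAMA3) can be downloaded.
from huggingface_hub import login

login(token="hf_...")  # replace with your own Hugging Face token
```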
- Train and evaluate models. We provide example experiment scripts under the `./scripts/` folder. You can reproduce the experiment results as in the following example:

```bash
# Conduct experiments on the health dataset using GPU 0, with models 0 through 1.
bash ./scripts/week_health.sh 0 1 0
```
- You can set lists of model names, prediction lengths, and random seeds in the script for batch experiments. We recommend specifying `--save_name` to better organize and save the results. `--llm_model` can be set to LLAMA2, LLAMA3, GPT2, BERT, GPT2M, GPT2L, GPT2XL, Doc2Vec, or ClosedLLM; when using ClosedLLM, complete Step 3 first. `--pool_type` can be set to `avg`, `min`, `max`, or `attention` for different ways of pooling tokens. When `--pool_type` is set to `attention`, we use the output of the time series model to compute attention scores for each token in the LLM output and perform weighted aggregation, as sketched below.
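For intuition, here is a minimal PyTorch sketch of this kind of attention pooling; the tensor shapes, scaling, and function name are assumptions for illustration and do not mirror MM-TSFlib's internals exactly.

```python
# Illustrative attention pooling: the time series representation attends over LLM token outputs.
# Shapes and the scaled dot-product formulation are assumptions of this sketch.
import torch
import torch.nn.functional as F

def attention_pool(ts_repr: torch.Tensor, token_embs: torch.Tensor) -> torch.Tensor:
    """
    ts_repr:    (batch, d)           -- output of the time series model (query)
    token_embs: (batch, seq_len, d)  -- LLM token outputs (keys/values)
    returns:    (batch, d)           -- weighted aggregation over tokens
    """
    # Score each token by its dot product with the time series representation.
    scores = torch.einsum("bd,btd->bt", ts_repr, token_embs) / token_embs.size(-1) ** 0.5
    weights = F.softmax(scores, dim=-1)  # (batch, seq_len)
    # Weighted sum of token embeddings.
    return torch.einsum("bt,btd->bd", weights, token_embs)

# Usage: pooled = attention_pool(ts_out, llm_tokens)  # then fuse with the forecasting head
```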
If you find this repo useful, please cite our paper:
```bibtex
@misc{liu2024timemmd,
      title={Time-MMD: A New Multi-Domain Multimodal Dataset for Time Series Analysis},
      author={Haoxin Liu and Shangqing Xu and Zhiyuan Zhao and Lingkai Kong and Harshavardhan Kamarthi and Aditya B. Sasanur and Megha Sharma and Jiaming Cui and Qingsong Wen and Chao Zhang and B. Aditya Prakash},
      year={2024},
      eprint={2406.08627},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```
If you have any questions or suggestions, feel free to contact: [email protected]
This library is built upon the following repos: