How to preprocess the Time-MMD dataset? #2

Ironieser · 2024-10-09T05:59:25Z

Thank you for your great work.

Currently, I am confused about how to preprocess the Time-MMD dataset.

In your provided data, data/Public_Health/US_FLURATIO_Week.csv, I do not know how to obtain six of the columns: 'prior_history_avg', 'prior_history_std', 'Final_Search_2', 'Final_Search_4', 'Final_Search_6', and 'Final_Output'.

Using data/DataPre_ClosedSourceLLM/Prepare.ipynb, we can obtain Final_Output. However, how can we get the other five columns?

Any suggestion will help me a lot, thank you!


ranlychan · 2024-11-26T12:11:03Z

I'm also wondering what prior_history_avg means.

ranlychan · 2024-11-28T09:54:37Z

After reading the paper Time-MMD: Multi-Domain Multimodal Dataset for Time Series Analysis and analyzing the original data from https://gis.cdc.gov/grasp/fluview/fluportaldashboard.html, I believe that prior_history_avg in many of the datasets is computed as a seasonal grouped average. Specifically, for data/Public_Health/US_FLURATIO_Week.csv, the author appears to use a seasonal period $p$ of 51 weeks and a group window size $n$ of 1, so by the following formula prior_history_avg at time step $t$ is simply $x_{t-51}$:

$$ \text{prior\_history\_avg}(t) = \frac{1}{n}\sum_{i=1}^{n} x_{t - i p} $$

where $x_t$ is the %UNWEIGHTED ILI value in US_FLURATIO_Week at time step $t$.

Accordingly, I wrote the following code to compute prior_history_avg:

```python
import pandas as pd

def seasonal_group_average(df: pd.DataFrame, seasonal_period: int = 51,
                           group_window_size: int = 1,
                           target: str = '%UNWEIGHTED ILI') -> pd.DataFrame:
    """
    Compute `prior_history_avg` as a seasonal grouped average.

    Args:
        df (pd.DataFrame): DataFrame containing the time series data.
        seasonal_period (int): Seasonal period p, e.g., 51 weeks.
        group_window_size (int): Number of past seasons n to average.
        target (str): Column name of the target time series.

    Returns:
        pd.DataFrame: The same DataFrame with a new column `prior_history_avg`.
    """
    if target in df.columns:
        df['prior_history_avg'] = [
            (
                # Average x[t - i*p] for i = 1..n; indices that fall before
                # the start of the series are clamped to row 0.
                sum(
                    df[target].iloc[max(0, t - i * seasonal_period)]
                    for i in range(1, group_window_size + 1)
                ) / group_window_size
                # Fewer than p prior steps: no full season yet, fill with 0.0.
                if t >= seasonal_period else 0.0
            )
            for t in range(len(df))
        ]
    return df

# header=1: use the second line of ILINet.csv as the column names
ili_data_df = pd.read_csv('ILINet.csv', header=1)
seasonal_grouped_df = seasonal_group_average(df=ili_data_df)
seasonal_grouped_df.to_csv('test.csv')
seasonal_grouped_df  # displays the result when run in a notebook
```
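The same formula can also be written in vectorized form with pandas' `Series.shift`, since $x_{t-ip}$ is just the target column shifted down by $i \cdot p$ rows. This is a minimal sketch, not the author's code: the name `prior_history_avg_vectorized` is hypothetical, and it deliberately leaves rows without $n$ full seasons of history as NaN instead of filling them with 0.0 or clamping to row 0 as the loop does, so those early rows will differ from the loop's output.

```python
import pandas as pd

def prior_history_avg_vectorized(x: pd.Series, p: int, n: int) -> pd.Series:
    """Hypothetical helper: mean of x[t - i*p] for i = 1..n.

    Rows that lack n full seasons of history become NaN, because
    Series.shift introduces NaN and NaN propagates through the sum.
    """
    # One lagged copy of the series per past season i = 1..n.
    lagged = [x.shift(i * p) for i in range(1, n + 1)]
    # Element-wise sum of the lagged copies, divided by n.
    return sum(lagged) / n
```

For example, `prior_history_avg_vectorized(ili_data_df['%UNWEIGHTED ILI'], p=51, n=1)` would correspond to the US_FLURATIO_Week setting described above.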

However, for the author's US_FLURATIO_Week.csv, the output of my code does not match the author's prior_history_avg in some places, where the values appear to have been changed or shifted. The reason for these apparent manual adjustments remains unclear to me.
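One way to narrow down where the two versions diverge is to compare the computed column with the author's column row by row, and to check whether the mismatches look like a constant lag. This sketch assumes both series are already aligned row-for-row on the same weeks (for example, after restricting the ILINet data to the date range of US_FLURATIO_Week.csv); the helpers `find_mismatches` and `match_rate_by_lag` are hypothetical and not part of the Time-MMD repository.

```python
import pandas as pd

def find_mismatches(ours: pd.Series, theirs: pd.Series,
                    atol: float = 1e-6) -> pd.DataFrame:
    """Rows where two position-aligned series differ by more than atol.

    Rows where either value is NaN are not reported.
    """
    # Compare by position, not by index label.
    ours = ours.reset_index(drop=True)
    theirs = theirs.reset_index(drop=True)
    diff = (ours - theirs).abs()
    mask = diff > atol  # NaN > atol evaluates to False
    return pd.DataFrame({'ours': ours[mask], 'theirs': theirs[mask],
                         'abs_diff': diff[mask]})

def match_rate_by_lag(ours: pd.Series, theirs: pd.Series,
                      max_lag: int = 3, atol: float = 1e-6) -> dict:
    """For each lag k, fraction of rows where theirs[t] == ours[t - k]."""
    ours = ours.reset_index(drop=True)
    theirs = theirs.reset_index(drop=True)
    rates = {}
    for lag in range(-max_lag, max_lag + 1):
        shifted = ours.shift(lag)  # shifted[t] == ours[t - lag]
        valid = shifted.notna() & theirs.notna()
        if valid.any():
            close = (shifted[valid] - theirs[valid]).abs() <= atol
            rates[lag] = float(close.mean())
        else:
            rates[lag] = float('nan')
    return rates
```

If one nonzero lag clearly dominates in `match_rate_by_lag`, the author's column is probably offset by that many rows; scattered entries in `find_mismatches` with no dominant lag would instead point to individual edited values.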

ranlychan · 2024-11-29T05:09:19Z

In US_VMT_Month.csv, $p=12$ and $n=2$, so prior_history_avg at time step $t$ is $\frac{1}{2}(x_{t-12}+x_{t-24})$. Running my code with these parameters produces no difference from the author's values.
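As a quick sanity check of the $p=12$, $n=2$ case, the per-step rule used by the loop-based code can be evaluated on a synthetic series. The series and the helper `seasonal_avg_at` are hypothetical. For $t \ge 24$ the value is exactly $\frac{1}{2}(x_{t-12}+x_{t-24})$, while for $12 \le t < 24$ the `max(0, t - i * p)` clamp substitutes $x_0$ for the missing $x_{t-24}$, so those first-year rows are worth checking separately against the author's file.

```python
def seasonal_avg_at(x, t, p, n):
    """Hypothetical helper: same rule as the loop in seasonal_group_average, for one t."""
    if t < p:
        return 0.0  # no full season of history yet
    # Lags that reach before the start of the series are clamped to index 0.
    return sum(x[max(0, t - i * p)] for i in range(1, n + 1)) / n

x = list(range(100, 136))  # 36 synthetic monthly values: x[i] == 100 + i

print(seasonal_avg_at(x, 30, p=12, n=2))  # (x[18] + x[6]) / 2 -> 112.0
print(seasonal_avg_at(x, 15, p=12, n=2))  # (x[3] + x[0]) / 2, clamped -> 101.5
print(seasonal_avg_at(x, 5, p=12, n=2))   # t < p -> 0.0
```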
