TTB: Timestamped Tabular Benchmarks

Author: shreyashankar

This repository contains helper functions to read timestamped tabular data. This is a work in progress.

Data Criteria

To be included as a benchmark, the data source must:

Have a corresponding tractable ML task
Be time series & tabular in nature

Current Data Sources

NYC Taxicab Yellow Data, ingested into a timestamp-indexed table hosted on AWS RDS
Intel Lab Data, ingested into a timestamp-indexed table hosted on AWS RDS

How To Use

Install

This package is currently not hosted on PyPI. To install, git clone this repo and run pip install -e . in the project root directory. You will also need keys to pull from the DB containing the data sources, so email me if you are interested in having the keys.

DB Config

You can run all the following functions locally, but you will need to access the DB containing the task data. To do this, create a file named .env in the root directory with the following contents (which you will receive after emailing me for credentials):

HOSTNAME=...
USERNAME=...
PORT=...
SECRET=...
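
As a rough sketch of how these values might be read at runtime, the snippet below parses KEY=VALUE lines using only the standard library. The function name parse_env is hypothetical and not part of this package, and the values in the example string are made-up placeholders. The package itself may load the file differently.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    config: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


# Placeholder contents for illustration only; real values come with the credentials.
example = "HOSTNAME=db.example.com\nUSERNAME=reader\nPORT=5432\nSECRET=changeme\n"
settings = parse_env(example)
print(sorted(settings))  # print key names only, never the secret itself
```

In practice you would pass the contents of the .env file in the root directory to a function like this instead of an inline string.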

API

The core Dataset abstraction accepts a name (currently only "taxi_data"), a cutoff date, and a backend (currently only pandas). The cutoff date exists to prevent accidental data leakage. The class exposes two methods:

load(start_date, end_date): Takes in string or datetime objects representing dates. Returns a dataframe containing data from [start_date, end_date).
load_recent(delta): Takes a timedelta specifying how far back in time to load. Returns a dataframe containing that recent data.

The code is as follows:

import typing
from datetime import datetime, timedelta

import pandas as pd


class Dataset:
    def __init__(
        self,
        name: str,
        cutoff_date: typing.Union[str, datetime],  # %Y-%m-%d
        cache_dir: typing.Optional[str] = None,
        backend: str = "pandas",
    ):
        ...

    def load(
        self,
        start_date: typing.Union[str, datetime], # %Y-%m-%d
        end_date: typing.Union[str, datetime], # %Y-%m-%d
    ) -> pd.DataFrame:
        """Method to load data for the dataset.

        Args:
            start_date (typing.Union[str, datetime]): Start date of the data (inclusive).
            end_date (typing.Union[str, datetime]): End date of the data (exclusive).

        Raises:
            ValueError: When end date is before start date.
            ValueError: When end date is after cutoff date.

        Returns:
            pd.DataFrame: Loaded data for the dataset.
        """
        ...

    def load_recent(self, delta: timedelta) -> pd.DataFrame:
        """Method to load recent data for the dataset.

        Args:
            delta (timedelta): How far back in time to load data.

        Returns:
            pd.DataFrame: Loaded data for the dataset.
        """

        ...
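
The date checks described in the load docstring can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: to_datetime, validate_range, and in_range are hypothetical helpers, and the dates in the usage lines are arbitrary.

```python
from datetime import datetime
from typing import Tuple, Union

DateLike = Union[str, datetime]


def to_datetime(value: DateLike) -> datetime:
    """Accept a datetime or a %Y-%m-%d string, mirroring the type hints above."""
    if isinstance(value, datetime):
        return value
    return datetime.strptime(value, "%Y-%m-%d")


def validate_range(
    start_date: DateLike, end_date: DateLike, cutoff_date: DateLike
) -> Tuple[datetime, datetime]:
    """Return (start, end) as datetimes, raising ValueError as the docstring describes."""
    start, end, cutoff = (to_datetime(d) for d in (start_date, end_date, cutoff_date))
    if end < start:
        raise ValueError("end_date is before start_date")
    if end > cutoff:
        raise ValueError("end_date is after cutoff_date")
    return start, end


def in_range(timestamp: datetime, start: datetime, end: datetime) -> bool:
    """Half-open membership test: start is inclusive, end is exclusive."""
    return start <= timestamp < end


start, end = validate_range("2020-01-01", "2020-02-01", cutoff_date="2020-06-01")
print(in_range(datetime(2020, 1, 1), start, end))  # True: start is inclusive
print(in_range(datetime(2020, 2, 1), start, end))  # False: end is exclusive
```

The half-open interval means consecutive calls such as [January, February) and [February, March) never return the same row twice.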

Ongoing Work

If you are interested in working on any of the following, please create / comment on an issue or email me!

Adding more data sources in this format: email me if you have ideas and would like to contribute. This involves creating a migration from the source to the RDS instance.
Supporting backends other than Pandas: We plan to support TF & PyTorch data objects.
Caching data once it has been loaded locally: To prevent multiple unnecessary DB reads, we can add a cache layer to this project.
