AUTO_HIDUU

Overview

AUTO_HIDUU is a Python script that automates the process of uploading files to the HealtheIntent platform using the hi-data-upload-utility (HIDUU). The script validates CSV/TXT files against predefined schemas and uploads them to the correct dataset on the HealtheIntent system.

Features

Validates CSV/TXT files against defined schemas:
- Column presence and naming
- Data type validation (maps to Vertica types)
- Text length validation
- Required field (non-null) validation
- Numeric precision validation
Matches files using pattern matching with ? wildcards
Automatically uploads valid files to HealtheIntent using HIDUU
Provides detailed validation feedback and upload summaries

Prerequisites

Python 3.x
pandas library
HIDUU installed and accessible

Setup

Clone the repository:

git clone https://github.com/EddieDavison92/AUTO_HIDUU.git

Create a virtual environment and install dependencies:

python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt

Create your environment configuration:

cp .env.example .env

Edit the .env file with your specific configuration:

# HealtheIntent Authentication
SAID=your_system_account_id
SAS=your_system_account_secret
SID=your_source_id

# File Paths
INPUT_FOLDER_PATH=/path/to/input/files
HIDUU_DIRECTORY=/path/to/hiduu/installation

Define your datasets in config/dataset_config.py:

from .schema_types import (
    Dataset, Column,
    VarcharType, CharType, DateType, TimestampType,
    IntegerType, FloatType, NumericType, BooleanType
)

my_dataset = Dataset(
    name="My Dataset",
    filename_pattern="MY_DATASET_????????.csv",  # ? matches any character
    min_rows=100,
    target_hei_dataset="TARGET_ID",
    columns=[
        Column("id", CharType(10), nullable=False),
        Column("name", VarcharType(50)),
        Column("date", DateType("%Y-%m-%d")),
        Column("timestamp", TimestampType()),  # Accepts any valid timestamp
        Column("count", IntegerType(precision=3)),  # Up to 999
        Column("amount", NumericType(precision=5, scale=2)),  # Up to 999.99
        Column("active", BooleanType(), nullable=False),
    ]
)

Available Column Types

Text:
- VarcharType(max_length) - Variable length text
- CharType(length) - Fixed length text
Date/Time:
- DateType(format=None) - Date values (e.g. "%Y-%m-%d")
- TimestampType(format=None) - Timestamp values, with or without timezone
Numbers:
- IntegerType(precision=None) - Whole numbers
- FloatType(precision=None) - Decimal numbers
- NumericType(precision=None, scale=None) - Exact decimal numbers
Boolean:
- BooleanType() - True/False values (accepts 1/0)

All types default to nullable=True. Add nullable=False to make a column required.

File Pattern Matching

The filename_pattern in Dataset configuration supports:

Question mark (?) to match any single character
Exact filenames for fixed files

Examples:

# Match files with any 8 characters before .csv
filename_pattern="DATA_????????.csv"  # DATA_20240315.csv, DATA_ABCD1234.csv

# Match exact filename
filename_pattern="REFERENCE.csv"  # Only REFERENCE.csv

Usage

Place your CSV/TXT files in the configured input directory
Run the script:

python main.py

The script will:

Find all CSV/TXT files in the input directory
Match files against dataset patterns
Validate file contents against schema
Upload valid files to HealtheIntent
Move successful uploads to a processed directory
Show a summary of results

Author

Eddie Davison | NHS NCL ICB

Name		Name	Last commit message	Last commit date
Latest commit History 53 Commits
config		config
helpers		helpers
src		src
.env.example		.env.example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
generated_datasets.txt		generated_datasets.txt
main.py		main.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

AUTO_HIDUU

Overview

Features

Prerequisites

Setup

Available Column Types

File Pattern Matching

Usage

Author

About

Releases

Packages

Contributors 2

Languages

License

ncl-icb-analytics/AUTO_HIDUU

Folders and files

Latest commit

History

Repository files navigation

AUTO_HIDUU

Overview

Features

Prerequisites

Setup

Available Column Types

File Pattern Matching

Usage

Author

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages