UniUsage is a command-line tool for analyzing user behavior from log files. It provides three core functionalities:
- ✅ Estimate unique user counts per operation using HyperLogLog (for large datasets)
- 🔍 Brute-force count of unique users per operation (for small datasets)
- 🔢 Count total number of log entries with multithreading
- Requirement: Java 17 or higher
- No need to run gradlew build before running the CLI; the script will do it for you
On Windows:
.\uniusage.bat [command] [options]
On Linux/macOS:
./uniusage.sh [command] [options]
Estimate top K operations by unique user count using HyperLogLog.
Usage:
On Windows:
.\uniusage.bat tophll --logfile <path_to_log_file> [--k <top_k>] [--precision <hll_precision>]
On Linux/macOS:
./uniusage.sh tophll --logfile <path_to_log_file> [--k <top_k>] [--precision <hll_precision>]
Options:
- --logfile (required): Path to the log file
- --k, --top-k: Number of top operations to display (default: 2)
- --precision, --hll-precision: HLL precision (default: 18, ~0.2% standard error)
Note on Precision: The higher the precision, the more accurate the estimation, but the slower the computation and the higher the memory usage.
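To make the trade-off concrete, here is a small sketch of the textbook HyperLogLog sizing rules: a precision p gives 2^p registers, and the relative standard error is roughly 1.04 / sqrt(2^p). The class and method names are hypothetical, and this assumes UniUsage follows the standard formulation, which is consistent with the documented default (precision 18, ~0.2%).

```java
// Sketch of textbook HyperLogLog sizing; UniUsage's internals may differ.
final class HllSizing {
    /** Number of registers for a given precision p: m = 2^p. */
    static long registers(int precision) {
        return 1L << precision;
    }

    /** Approximate relative standard error: 1.04 / sqrt(m). */
    static double standardError(int precision) {
        return 1.04 / Math.sqrt((double) registers(precision));
    }

    public static void main(String[] args) {
        // Each extra bit of precision doubles the register count.
        for (int p : new int[] {12, 14, 16, 18}) {
            System.out.printf("p=%d registers=%d stderr=%.2f%%%n",
                    p, registers(p), standardError(p) * 100);
        }
    }
}
```

At precision 18 this gives 262,144 registers and a standard error of about 0.20%. Raising the precision by 2 quadruples the register count but only halves the error.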
Brute-force method to find top K operations by unique users.
Best for small datasets.
Usage:
On Windows:
.\uniusage.bat top --logfile <path_to_log_file> [--k <top_k>]
On Linux/macOS:
./uniusage.sh top --logfile <path_to_log_file> [--k <top_k>]
Options:
- --logfile (required): Path to the log file
- --k, --top-k: Number of top operations to display (default: 2)
Multithreaded log processor to count total number of entries in the log file.
Usage:
On Windows:
.\uniusage.bat count --logfile <path_to_log_file>
On Linux/macOS:
./uniusage.sh count --logfile <path_to_log_file>
Options:
- --logfile (required): Path to the log file
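As a rough illustration of how a multithreaded entry count can work, the sketch below splits already-loaded lines into chunks and counts each chunk on a worker thread. It is not UniUsage's actual code: the class name, the chunking strategy, and the assumption that one non-blank line equals one log entry are all illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: count entries in parallel, one task per chunk of lines.
final class ParallelEntryCounter {
    /** Counts non-blank lines (assumed: one entry per non-blank line). */
    static long count(List<String> lines, int threads) {
        if (threads < 1) throw new IllegalArgumentException("threads must be >= 1");
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Ceiling division so every line lands in exactly one chunk.
            int chunk = Math.max(1, (lines.size() + threads - 1) / threads);
            List<Future<Long>> parts = new ArrayList<>();
            for (int start = 0; start < lines.size(); start += chunk) {
                List<String> slice = lines.subList(start, Math.min(start + chunk, lines.size()));
                Callable<Long> task = () -> slice.stream().filter(l -> !l.isBlank()).count();
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) total += part.get(); // wait for each chunk
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while counting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a counting task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

For files too large to hold in memory, the same idea applies to byte ranges of the file instead of an in-memory list.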
We are given a log file containing user actions and asked to:
- Identify the top K most used operations.
- Determine the percentage of users who used each operation.
- Ensure each user is counted once per operation.
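The three requirements above can be sketched in a few lines of Java. This is a minimal brute-force illustration, not the tool's actual implementation: the Entry and Usage records are hypothetical (the real log format is not shown here), and the percentage is assumed to be taken over all distinct users in the log.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class BruteForceTopK {
    /** Hypothetical parsed log entry: who did what. */
    record Entry(String userId, String operation) {}

    /** One result row: operation, distinct users, share of all distinct users. */
    record Usage(String operation, int users, double percent) {}

    static List<Usage> topK(List<Entry> entries, int k) {
        Set<String> allUsers = new HashSet<>();
        Map<String, Set<String>> usersByOp = new HashMap<>();
        for (Entry e : entries) {
            allUsers.add(e.userId());
            // A Set ignores repeats, so each user counts once per operation.
            usersByOp.computeIfAbsent(e.operation(), op -> new HashSet<>()).add(e.userId());
        }
        int total = allUsers.size();
        return usersByOp.entrySet().stream()
                .map(en -> new Usage(en.getKey(), en.getValue().size(),
                        total == 0 ? 0.0 : 100.0 * en.getValue().size() / total))
                // Most users first; ties broken alphabetically for stable output.
                .sorted(Comparator.comparingInt(Usage::users).reversed()
                        .thenComparing(Usage::operation))
                .limit(k)
                .collect(Collectors.toList());
    }
}
```

Memory grows with the number of distinct (user, operation) pairs, which is why this approach suits small logs and an estimator such as HyperLogLog suits large ones.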
🧑💼 "Can you tell me the most used two operations and the % of users who use them?"
✅ Your response: "Consider it done. Each user will be counted only once per operation."
.\uniusage.bat tophll --logfile log_file.log
Sample Output:
Top 2 operations by unique users with standard error of 0.20%:
Operation "connect" is used by 81.23% of our users (2251 users).
Operation "filter-changed" is used by 80.01% of our users (2217 users).
HLL provides high performance and low memory usage with a small error margin (about 0.2% standard error at the default precision).
.\uniusage.bat top --logfile log_file.log
Sample Output:
Operation "connect" is used by 81.23% of our users (2251 users).
Operation "filter-changed" is used by 80.04% of our users (2218 users).
Ideal for debugging, development, and small-scale datasets.
🧑💼 "How do you know your results are correct?"
- ✅ I created smaller sample logs like sm_log_file_x.log
- 🔎 Manually analyzed them to find expected top operations and percentages
- 🧪 Ran both brute-force and HLL implementations on the samples
- ✅ Verified the outputs matched the expected results
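Because HLL is an estimator, "matched" for the HLL run reasonably means "within the expected error" rather than "identical". A minimal sketch of such a check follows; the class name, method names, and the three-sigma tolerance are assumptions for illustration, not part of UniUsage.

```java
// Hypothetical helper for comparing an HLL estimate with an exact count.
final class EstimateCheck {
    /** Relative error of the estimate with respect to the exact count. */
    static double relativeError(long exact, long estimate) {
        if (exact <= 0) throw new IllegalArgumentException("exact must be positive");
        return Math.abs(estimate - exact) / (double) exact;
    }

    /** True if the estimate lies within `sigmas` standard errors of the exact count. */
    static boolean withinSigmas(long exact, long estimate, double standardError, double sigmas) {
        return relativeError(exact, estimate) <= sigmas * standardError;
    }

    public static void main(String[] args) {
        // Counts taken from the sample outputs: 2218 (brute force) vs 2217 (HLL).
        System.out.printf("relative error: %.4f%%%n", relativeError(2218, 2217) * 100);
        System.out.println(withinSigmas(2218, 2217, 0.002, 3.0));
    }
}
```

For the "filter-changed" sample, the two runs differ by one user out of 2218, well inside the 0.2% standard error.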
🧑💼 "Can I find the top 3 operations instead of just 2?"
Absolutely. Use the --top-k flag:
.\uniusage.bat top --logfile log_file.log --top-k 3
Sample Output:
Operation "connect" is used by 81.23% of our users (2251 users).
Operation "filter-changed" is used by 80.04% of our users (2218 users).
Operation "open-detailed-quotes" is used by 45.11% of our users (1250 users).
The CLI is flexible: change --top-k to any number you need.
🧑💼 "Why is it slow? Can you make it faster?"
Yes, performance was part of the design. Here's how it’s optimized:
- 🔍 Brute force for accuracy in small logs
- 🧵 Multithreading is used in the HLL implementation
- 🌀 HyperLogLog for large logs with minimal memory overhead
- ⚙️ Easy to switch between them using CLI commands (top or tophll)
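To show why HyperLogLog keeps memory small, here is a compact sketch of the textbook algorithm with a small-range (linear counting) correction. It is not UniUsage's implementation: the TinyHll class, its hash function (FNV-1a followed by the MurmurHash3 finalizer), and the merge method are assumptions for illustration. Merging by register-wise maximum is one common way to combine sketches built on separate threads.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical minimal HyperLogLog; omits the bias corrections of production libraries.
final class TinyHll {
    private final int p;             // precision: number of index bits
    private final int m;             // number of registers, 2^p
    private final byte[] registers;  // one small counter per register

    TinyHll(int precision) {
        if (precision < 7 || precision > 18) {
            throw new IllegalArgumentException("precision must be in [7, 18]");
        }
        this.p = precision;
        this.m = 1 << precision;
        this.registers = new byte[m];
    }

    void add(String value) {
        long h = hash64(value);
        int index = (int) (h >>> (64 - p));  // top p bits pick a register
        long rest = h << p;                   // remaining 64 - p bits, left-aligned
        // Rank = 1-based position of the first 1 bit among the remaining bits.
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - p) + 1;
        if (rank > registers[index]) registers[index] = (byte) rank;
    }

    /** Combines another sketch of the same precision (e.g. built on another thread). */
    void merge(TinyHll other) {
        if (other.p != p) throw new IllegalArgumentException("precision mismatch");
        for (int i = 0; i < m; i++) {
            if (other.registers[i] > registers[i]) registers[i] = other.registers[i];
        }
    }

    long estimate() {
        double sum = 0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.scalb(1.0, -r);  // 2^(-r)
            if (r == 0) zeros++;
        }
        double alpha = 0.7213 / (1 + 1.079 / m);  // bias constant, valid for m >= 128
        double raw = alpha * m * (double) m / sum;
        if (raw <= 2.5 * m && zeros > 0) {
            // Small-range correction: linear counting on empty registers.
            return Math.round(m * Math.log((double) m / zeros));
        }
        return Math.round(raw);
    }

    private static long hash64(String s) {
        long h = 0xcbf29ce484222325L;  // FNV-1a 64-bit offset basis
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;       // FNV-1a 64-bit prime
        }
        // MurmurHash3 fmix64 finalizer to spread entropy into the high bits.
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }
}
```

In this sketch, memory is fixed at 2^p bytes per tracked operation no matter how many users are added, and adding the same user twice leaves the registers unchanged, so each user is counted once per operation.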