Commit a1897aa

docs: add concise README for the text sentiment example
1 parent 286f581 commit a1897aa

1 file changed: +70 -0 lines changed
examples/README.MD

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
# Text Sentiment (SVM) with Class Rebalancing: imbalanced-learn Example

**What:** A small, runnable demo for 3-class sentiment (negative / neutral / positive) using:

```
TF-IDF → RandomUnderSampler → LinearSVC
```

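**Why:** Text features are sparse (TF-IDF). Oversampling methods like SMOTE synthesize new points by interpolating between neighbors, an approach designed for dense, continuous data; random **under-sampling** only drops existing rows, so it works out of the box on sparse text features.
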
---

## Files

* `examples/text_sentiment_svm_with_resampling.py`: example script (CLI)
* `imblearn/tests/test_text_sentiment_example.py` and `imblearn/tests/test_text_sentiment_example_cli.py`: fast smoke and unit tests

---

## Setup

```bash
# in a virtual env
pip install -e .   # install this repo
pip install datasets matplotlib pytest
# optional: keep dataset cache local
export HF_DATASETS_CACHE="$PWD/.hf_cache"
```

## Run

```bash
python examples/text_sentiment_svm_with_resampling.py --plot --max-samples 6000
```

**Outputs**

* Prints **balanced accuracy** + **classification report**
* Saves `confmat_svm_imblearn.png` when `--plot` is used

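The printed metrics listed above can be produced with standard scikit-learn calls. The snippet below is a sketch with made-up labels and predictions, not output from the example script.

```python
from sklearn.metrics import balanced_accuracy_score, classification_report

# Made-up ground truth and predictions, for illustration only.
y_true = ["negative", "neutral", "positive", "positive", "positive", "neutral"]
y_pred = ["negative", "positive", "positive", "positive", "positive", "neutral"]

# Balanced accuracy is the mean of per-class recall.
bal_acc = balanced_accuracy_score(y_true, y_pred)
report = classification_report(y_true, y_pred, zero_division=0)

print(f"Balanced accuracy: {bal_acc:.3f}")
print(report)
```

Here the per-class recalls are 1/1, 1/2, and 3/3, so the balanced accuracy is their mean, about 0.833.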
**CLI options**

```
--max-samples INT   Limit training size (None = full). Default: 6000
--plot              Save confusion matrix image
--output PATH       Image path (default: confmat_svm_imblearn.png)
```

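A parser for the options listed above might look like the following. This is a sketch under assumptions: `build_parser` is a hypothetical helper name, and how the real script accepts "None = full" for `--max-samples` is not shown in this README, so it is not modeled here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Text sentiment example (sketch)")
    parser.add_argument(
        "--max-samples", type=int, default=6000,
        help="Limit training size. Default: 6000",
    )
    parser.add_argument(
        "--plot", action="store_true",
        help="Save confusion matrix image",
    )
    parser.add_argument(
        "--output", default="confmat_svm_imblearn.png",
        help="Image path (default: confmat_svm_imblearn.png)",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--plot", "--max-samples", "1000"])
print(args.max_samples, args.plot, args.output)
# prints: 1000 True confmat_svm_imblearn.png
```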
---

## Tests

```bash
pytest -q imblearn/tests/test_text_sentiment_example.py
pytest -q imblearn/tests/test_text_sentiment_example_cli.py
```

Tests are quick, deterministic, and skipped if `datasets` isn’t installed.

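One common way to skip a test when an optional dependency is missing is a `skipif` marker, sketched below. The test name and body are hypothetical and are not taken from the repository's test files; `pytest.importorskip("datasets")` is an alternative with the same effect.

```python
import importlib.util

import pytest

# True when the optional Hugging Face `datasets` package can be imported.
HAS_DATASETS = importlib.util.find_spec("datasets") is not None


@pytest.mark.skipif(not HAS_DATASETS, reason="`datasets` is not installed")
def test_datasets_is_importable():
    """Hypothetical smoke test that only runs when `datasets` is available."""
    import datasets

    assert hasattr(datasets, "load_dataset")
```

With this pattern pytest reports the test as skipped, rather than failed, on machines without `datasets`.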
---

## Notes

* Metric focus: **balanced accuracy** and **macro-F1**, which weight every class equally and are therefore more informative than plain accuracy on imbalanced data
* Reproducible: fixed `random_state`, controllable `--max-samples`
* Troubleshooting: if disk space is low, install with `pip install --no-cache-dir`, clear caches, and keep only one env active

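A small worked example shows why the metric choice matters. The labels are made up, and the classifier always predicts the majority class; the helper functions are written out by hand purely for illustration.

```python
def per_class_recall(y_true, y_pred):
    """Recall for each class that appears in y_true."""
    recalls = {}
    for cls in sorted(set(y_true)):
        support = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls[cls] = hits / support
    return recalls


def per_class_f1(y_true, y_pred):
    """F1 for each class in y_true; defined as 0.0 when there is nothing to count."""
    scores = {}
    for cls in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


# Made-up imbalanced labels and a majority-class "classifier".
y_true = ["positive"] * 8 + ["neutral"] + ["negative"]
y_pred = ["positive"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recalls = per_class_recall(y_true, y_pred)
balanced_accuracy = sum(recalls.values()) / len(recalls)
f1_scores = per_class_f1(y_true, y_pred)
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

print(f"accuracy={accuracy:.3f} balanced_accuracy={balanced_accuracy:.3f} macro_f1={macro_f1:.3f}")
# prints: accuracy=0.800 balanced_accuracy=0.333 macro_f1=0.296
```

Plain accuracy looks respectable at 0.8 even though two of the three classes are never predicted, while the class-averaged metrics expose the failure.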
---