Skip to content

refactor: migrate install/update orchestration from Bash to Python#5466

Draft
moham96 wants to merge 114 commits into
hiddify:devfrom
moham96:refactor
Draft

refactor: migrate install/update orchestration from Bash to Python#5466
moham96 wants to merge 114 commits into
hiddify:devfrom
moham96:refactor

Conversation

@moham96

@moham96 moham96 commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Migrate the install/update path from Bash to Python

TL;DR

Replaces the bash orchestrator (install.sh, update.sh, restart.sh,
status.sh, uninstall.sh, hiddify_installer.sh, common/utils.sh,
common/jinja.py, the per-module install.sh/run.sh files, and the
common/run.sh.j2 firewall/timezone hook) with a python package
(hiddify_manager/) driven by ./init.sh install|update|upgrade|status|menu.

Repo .sh count: 70 → ~17 (12 real + 5 thin shims). Test count:
18 → 187. 269 commits, 363 files changed, net −30k / +8k
lines. Every remaining .sh is either an entrypoint, a curl|sh
bootstrap, a vendored upstream (google-bbr), a CI helper, or a 5-line
python shim kept so commander.py's sudoers paths don't churn.

Why it's worth doing

The bash orchestrator had real, silent, in-production bugs that fell
out the moment they touched python's typed environment + tests. None of
these were intentional in the migration; they showed up because porting
forces you to actually read every code path.

Bugs found and fixed along the way

Symptom on a running box Root cause
Nightly cert renewal hadn't been running common/daily_actions.sh called acme.sh/run.sh, deleted two refactors ago. Cron logged "file not found" every night to a place nobody looked.
Docker bootstrap broken docker-init.sh called install.sh + status.sh, both deleted. Any new docker container would have failed on bash: install.sh: No such file.
/usr/bin/hiddify doesn't run the menu Symlinked to menu.sh, deleted long ago. Operators got "command not found."
Singbox crashloop on every install Templates emit {users:[...,]} with trailing commas (legal json5). The legacy common/jinja.py re-parsed .json output via json5 and re-emitted strict JSON — my initial port skipped that step, hiddify-core fatal'd parsing.
Singbox crashloop on every install, take 2 Some xray/singbox templates exec() ls ssl/*.crt and bake the listing into JSON. Render was running before cert generation, so the listing was ls: cannot access ..., which the consumer then tried to open as a file.
WARP came up with no IPv4 address wgcf changed its Address = X, Y format to single-line comma-separated. The legacy regex (Address = ...:...) commented out the whole line when v6 was disabled, stripping the v4 with it. WARP "worked" (interface UP, handshake OK) but the kernel had no source v4 to send from.
dnstm-router crashlooping on port 53 modules/dnstt.py always installed + started; legacy gated it on hconfigs["dnstt_enable"]. systemd-resolved owns 53 on most boxes.
ssh-liberty-bridge fighting singbox for the same port Same pattern: legacy hardcoded install_run other/ssh 0 (disabled). My initial port ignored the flag.
hiddify-cli auto-restart loop the entire session Same again. Plus the SUB_LINK was being built from panel_links[-1] (the public domain) instead of http://127.0.0.1:9000/... like the legacy run.sh override.
bash common/install.sh failed on every reinstall mv /etc/cron.d/hiddify_daily_memory_release ... wasn't idempotent.
N nginx restarts per cert refresh acme.sh --pre-hook bash prepare_acme.sh ran per challenge, and every invocation systemctl restart hiddify-nginx. A 5-domain panel → 5 nginx restarts in 30s. Now it's one.
MariaDB password embeddable into SQL f"... IDENTIFIED BY '{pw}'" had no escaping. The current password generator is alnum so it never tripped, but the next person to change generate_random_password() would have.
WireGuard peer address math wrong for high user IDs The bash add_number_to_ipv4 carried v4 once (octets[2..3]) and add_number_to_ipv6 never carried at all. _add_int_to_ip uses ipaddress.ip_address(...) + n, which carries through.

Several of these had been broken since before this PR's first commit.
Bash orchestration made them invisible: no tests, no type checker, errors
discarded with >/dev/null 2>&1 or || true, and silent fall-through on
missing functions.

Architectural improvements

  • Tests: from 18 → 175. Every module migration ships with unit tests
    for the pure logic (template rendering, address math, port enumeration,
    cert expiry checks, JSON sanitize, lock TTL, etc.) and unittest.mock
    for the systemctl/subprocess calls. Refactoring is now safe.

  • Atomic file writes for state-bearing files: current.json,
    app.cfg, /etc/wireguard/hiddifywg.conf, /etc/iptables/rules.v{4,6},
    and the self-signed certs all go through tempfile + os.replace(). A
    crashed install no longer leaves half-truncated configs.

  • Idempotent operations: every _disable_legacy / _ensure_* /
    cron migration is safe to re-run.

  • Real python libraries instead of shell-out where it's a clear win:

    • Self-signed certs: cryptography package (no openssl subprocess
      per domain — one process, properly tested cert expiry/key validity
      detection).
    • Address arithmetic: stdlib ipaddress (proper carry, fewer bugs than
      bash arithmetic).
    • DNS lookup: socket.getaddrinfo (no dig dependency).
    • JSON sanitize: json5 (parse with trailing commas, re-emit strict).
    • HTTP API: stdlib urllib (no jq + curl pipe).
  • Service-level gates on every module that ships a systemd unit
    (hconfigs["X_enable"] checked before install + auto-stop unit when
    flag flips off). Matches what the legacy install_run other/X $(hconfig "X_enable")
    did but was inconsistently honored after the python port.

  • One-process-per-install instead of N+1 forks: the install loop no
    longer launches a bash subshell per module + per install_package /
    is_installed_package call. Faster on cold installs, way less log
    noise.

Panel integration & operability (kept working end-to-end)

Beyond the install loop, the whole panel-facing surface was re-verified
and fixed where the migration had broken it:

  • commander.py routes intact. The panel dispatches Apply-Configs /
    Install / Update / Restart / Status / get-cert / update-usage /
    temporary-short-link through commander.py's sudoers path. Each
    target is now a 5-line shim that exec ./init.sh <cmd> (or
    python -m hiddify_manager.modules.<x>). commander.py's Command
    enum is untouched.
  • ./init.sh apply-configs — new lightweight "panel state changed"
    path (regenerate current.json → certs → render → firewall → restart),
    replacing apply_configs.sh.
  • Panel live-log streaming works. init.sh tees each action to the
    log/system/<action>.log filename the panel's admin_log_api polls,
    with PYTHONUNBUFFERED=1 + stdbuf -oL so it streams line-by-line
    instead of dumping at the end, and set -o pipefail so a python
    failure isn't masked by tee.
  • Progress bar restored. utils/progress emits the
    ####<pct>####<title>####<subtitle>#### markers the panel's
    result.html regex parses, so the bar + labels update during long
    actions.
  • Real ACME certs fetched on install/apply for direct-mode domains
    (ported from the dropped acme.sh/run.sh), self-signed fallback via
    cryptography on failure.
  • Operator menu (./init.sh menu / hiddify): keyboard shortcuts,
    a rich status table (no log-timestamp noise), browseable+tailable
    system logs, fixed "Show admin link" (was calling the destructive
    reset-owner-password), confirm-before-uninstall.

What's preserved (no panel-side churn)

  • commander.py is untouched. Its Command enum still points at
    the same paths (get_cert.sh, update_usage.sh, add2shortlink.sh).
    Those files are now 5-line python shims that exec python -m hiddify_manager.modules.<x>. The panel doesn't know or care.
  • Systemd unit names unchanged.
  • Config layout unchanged (current.json, app.cfg, per-module
    rendered configs all in their legacy paths).
  • CLI surface preserved: ./init.sh install|update|status|menu|migrate
    works the same way as before; ./init.sh upgrade [mode] is the only
    new entrypoint, replacing the curl|sh hiddify_installer.sh flow.

What's still bash, and why

File Reason it stays
init.sh, docker-init.sh Entrypoints. Must be bash to bootstrap python.
common/download.sh, common/docker-installer.sh curl|sh first-install bootstrap. Runs before python exists on the box.
common/google-bbr.sh Vendored upstream (Teddysun BBR installer, 371 lines). Not ours to port.
acme.sh/get_cert.sh, hiddify-panel/update_usage.sh, nginx/add2shortlink.sh 5-line exec /...python -m ... shims so commander.py's sudoers paths stay valid.
operations/lxd/* (×3) LXD container provisioning — separate operational concern.
.github/release_message.sh GitHub Actions release-notes helper. CI-only.

How the install path looks now

./init.sh install
  └── hiddify_manager.manager.run_install()
      ├── for module in [common, redis, mysql, panel, nginx, haproxy,
      │                  acme.sh, speedtest, dnstt, telegram, ssfaketls,
      │                  ssh, warp, xray, hiddify-cli, wireguard, singbox]:
      │   ├── installer.install_module(module)
      │   │   └── hiddify_manager.modules.<module>.install()
      │   │       └── (apt + binary download + render + systemctl)
      │   └── if module == "hiddify-panel":
      │       └── manager._render_all_templates()
      │           ├── ensure_self_signed_cert for every domain
      │           ├── utils.template.render_tree(PROJECT_ROOT, configs)
      │           └── common.apply_runtime_config(configs)
      │               └── (timezone, firewall, sshd audit, auto-update cron)
      └── all modules log to /opt/hiddify-manager/log/system/*.log

Every module is independently importable and independently testable.

How to verify

# 1. unit tests on darwin/linux (no root needed):
python -m pytest tests/

# 2. smoke install on an Ubuntu box:
./init.sh install
systemctl list-units --state=failed   # expected: 0

# 3. check service health:
./init.sh status                       # rich colour table

# 4. menu:
./init.sh menu                         # keyboard shortcuts work

Risk

  • Bootstrap path (download.sh) rewritten — first-install via
    curl … | bash needs a smoke test on a fresh server. The reinstall
    / update / status / restart / uninstall paths have all been exercised
    on a running dev box throughout the PR.
  • CREATE_EASYSETUP_LINK env var (from the deleted download_install_easylink.sh)
    isn't honored yet. Easy add if anyone notices.
  • hiddify_installer.sh's whiptail progress UI is gone. Modern CLI users
    haven't missed it; the python output is colorised via rich.

Commit shape

Every behavioural change is in its own atomic commit (218 commits total),
with the new module + tests landing first, the orchestrator wiring next,
and the bash deletion last. Easy to bisect, easy to revert any one piece
if something surfaces in production.

moham96 and others added 30 commits June 9, 2026 18:26
Centralises the venv location (.venv313) and current.json path so
modules don't hardcode them. Unblocks config.py which referenced a
non-existent venv_path symbol.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Lets callers redirect output to a file (e.g. capturing
'hiddifypanel all-configs' to current.json) or inject a clean
environment without dropping down to raw subprocess. Tests updated
to match the new signature.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reads chconfigs[0] from current.json so module installers can
fetch values like warp_plus_code / wireguard_port without parsing
the panel database directly. Generates current.json on demand by
invoking the venv's hiddifypanel CLI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mirrors common/jinja.py's filters (b64encode/quote/hexencode),
globals (enumerate), and shell exec helper, but uses PROJECT_ROOT
instead of hardcoded /opt/hiddify-manager so it works from any
clone location. Exposes render_template() for one-shot use and
render_tree() for bulk rendering across a directory.

Unblocks the warp/wireguard/telegram/ssfaketls migrations.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces other/mysql/install.sh: installs mariadb-server, generates
a random password stored in mysql_pass (0600), creates the
hiddifypanel user/database, and locks bind-address to 127.0.0.1.

Password is escaped before SQL interpolation and fed via stdin so
it never appears in argv or the process listing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces other/redis/install.sh: installs redis-server, stops the
system service, generates a random requirepass in redis.conf if
missing, and wires up the hiddify-redis systemd unit pointing at
the repo config. Disables the OS redis to avoid a window with no
password set.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces hiddify-panel/install.sh: creates the hiddify-panel
user/group, sets permissions, links systemd units, downloads the
GeoLite2 mmdb databases via urllib (skipped if fresh within 24h),
and clears stale cron jobs.

Honours HIDDIFY_PANLE_SOURCE_DIR for editable installs of the
panel package from a sibling checkout.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Covers jinja filters (b64encode/quote), mode preservation, error
fallthrough on missing keys, hconfig() key lookup against a mocked
current.json, and lazy generation when current.json is absent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Stops __pycache__, .pyc, .venv, and .venv313 from polluting diffs.
Untracks the already-committed bytecode under hiddify_manager/.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces other/speedtest/install.sh: writes a 30MB random blob via
os.urandom() in 1MB chunks, but skips regenerating if the file is
already the right size.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces other/ssh/install.sh: downloads the liberty-bridge binary,
creates the liberty-bridge user, links the systemd unit, and
rewrites .env with a REDIS_URL pointed at the local hiddify-redis
instance (parsing the password out of other/redis/redis.conf when
REDIS_URI_SSH isn't already set).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces other/dnstt/install.sh: downloads dnstm + dnstt-server,
generates the dnstt server keypair on first run, then locks down
ownership and permissions on the key files (priv 600, pub 644).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
install_module('other/redis') was building the import path as
hiddify_manager.modules.other/redis, which importlib can't load.
Take the basename first so subdir paths still map to a flat module.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces other/hiddify-cli/install.sh.j2 + run.sh.j2: pulls the
latest non-prerelease tag from the github releases API, downloads
the matching linux tarball for the host arch, extracts it, and
writes .env with the SUB_LINK pointed at the local panel using
proxy_path_client + the first user's uuid from current.json.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces other/ssfaketls/install.sh + run.sh: installs the
shadowsocks-libev + simple-obfs packages, renders the systemd unit
from hiddify-ss-faketls.service.j2 (which embeds the panel's
shadowsocks2022_port and ssfaketls_fakedomain), links it, and
disables the legacy ss-faketls.service from older installs.

First module to consume the new utils/template.render_template().

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces other/wireguard/install.sh.j2 + run.sh.j2 orchestration:
python loads current.json, renders both templates via
utils.template.render_template(), then executes the resulting
bash. Keeps the existing wg_utils.sh + iptables/PostUp shell
logic since rewriting it in python would balloon the change
without buying anything.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces other/warp/install.sh + other/warp/wireguard/install.sh +
run.sh.j2: installs wireguard-tools, downloads the wgcf binary via
package_manager (hash-verified from packages.lock), renders the
warp wireguard run.sh.j2 with the panel's warp_plus_code, and
executes it to bring up wg-quick@warp.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces other/telegram/install.sh.j2 + run.sh.j2: looks up
telegram_lib from hconfigs (selects between the python and tgo
backends), renders any *.j2 files inside that subdir against
current.json, then runs the subdir's install.sh + run.sh.

Also stops/disables the legacy mtproxy + mtproto-proxy units.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The legacy bash runsh() executed each module's install.sh without
propagating the exit code — failures were logged but the loop kept
going. Match that behaviour:
- bash install.sh / run.sh now run with check=False and log non-zero
  exits instead of raising CalledProcessError.
- Python install() raises are caught, logged with traceback, then
  swallowed so later modules still run.

Idempotency bugs in individual scripts (e.g. an mv on an already-
renamed file) shouldn't take down nginx + redis + the panel.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
moham96 and others added 29 commits June 14, 2026 02:36
Legacy install_run other/ssfaketls \$(hconfig "ssfaketls_enable").
Skip apt install + service restart when disabled; stop the unit so
a previously-enabled-then-disabled install actually goes quiet.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Legacy install_run other/hiddify-cli \$(hconfig "hiddifycli_enable").
This is almost certainly why hiddify-cli has been crash-looping
("auto-restart status=0/SUCCESS") for this whole session: it was
installing + starting on a host where the panel has the feature
disabled, so the binary exits cleanly when there's nothing to serve.

While I'm in here, reuse the configs dict loaded at the top of
install() instead of re-fetching it for _write_env at the bottom.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Legacy: install_run other/warp 1   if warp_mode != "disable"
        install_run other/warp 0   otherwise

My python port installed unconditionally. When the panel had warp
disabled, we'd still pull wgcf, register/update a WARP account,
write /etc/wireguard/warp.conf, and bring up wg-quick@warp. Plus
spam the WARP probe with curl every install.

Mirror the legacy gate: warp_mode=='disable' (case-insensitive) →
tear down wg-quick@warp + the dormant hiddify-warp.service unit
and bail. Any other value (including absent) → proceed with the
bring-up.

While I'm here, hoist the configs lookup to the top so it's clear
this module bails before any apt install or download work.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Zero callers anywhere in the repo (grep'd .sh + .py):

  common/package_manager.sh    186 lines - bash dup of
                                           utils/package_manager.py
  singbox/test.sh               28 lines - no caller
  hiddify-panel/backup.sh        - no caller
  hiddify-panel/download_yt.sh   - no caller
  hiddify-panel/temporary_access.sh
                                 - referenced in commander.py's
                                   Command enum, but the @cli.command
                                   that consumed it is commented out

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces uninstall.sh:

  - discover units via modules.services.discover_units (same source
    the restart/status flow uses), plus the legacy "netdata" name
  - systemctl kill + disable each
  - rm /etc/cron.d/hiddify* + service cron reload
  - if purge=True: apt-get purge nginx/gunicorn/mariadb-* and rm
    hiddify-panel/

DROPPED from the legacy script: the trailing `rm -rf *` from the
project root. It would wipe the script that's running it; the only
safe thing is to leave the repo checkout in place and let the
operator delete it manually.

Menu's "Uninstall" choice now calls run(purge=False). Add a CLI
exposure (./init.sh uninstall) in a follow-up if useful.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The hook was 3 lines (rm run/*, touch the pidfile). Replace with two
ExecStartPre= directives; the rm gets a `-` prefix + shell wrapper to
tolerate the empty-directory case (`rm -rf run/*` blows up if nullglob
isn't set and the dir is empty).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces nginx/add2shortlink.sh with modules/short_link.py:

  - validate the slug (alnum + - + _ only) before letting it land in
    an nginx config — the legacy script just shoved $2 in unvalidated,
    so a malicious slug could inject arbitrary nginx directives
  - validate minutes as a positive int
  - append a `location ~* ^/<slug>(/)?$ { return 302 <url>; }` line to
    nginx/parts/short-link.conf
  - schedule the removal via at(1) — same sed-strip the slug from the
    conf, same -<minutes> in the future
  - systemctl reload hiddify-nginx

add2shortlink.sh becomes a 5-line shim that exec's the python module,
so commander.py's Command.temporary_short_link path stays intact.

6 tests cover append behaviour, the at-scheduling argv, the nginx
reload, and the validation guards.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces hiddify-panel/update_usage.sh (the cron'd usage refresh
that the panel calls every minute via commander.py).

What the bash did, now in modules/update_usage:

  - lock /opt/hiddify-manager/log/update_usage.lock for ≤120s
    (mirrors common/utils.sh::set_lock); silent no-op on conflict
  - call the panel's local http api at
    http://localhost:9000/<api_path>/api/v2/admin/update_user_usage/
    with the Hiddify-API-Key header read out of current.json
  - on non-200 AND no running `hiddifypanel update-usage` process,
    fall back to invoking the cli via the venv python directly
  - always remove the lock at the end

Two paths the bash had but we didn't port:
  - `jq` is gone — json.load is enough for the api_path / api_key
    extraction, no subprocess
  - hiddify-panel-cli wrapper is gone — call the venv python with
    -m hiddifypanel update-usage; same behaviour minus a shell layer

update_usage.sh becomes a 5-line shim that execs the python module,
so commander.py's Command.update_usage path keeps working without
panel-side churn.

11 tests cover the lock TTL, the http-api urllib call (URL shape +
Hiddify-API-Key header), the run() dispatch (success / falls-back /
skip-when-busy), and the main() lock acquisition.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
8 files + 2 directories with zero callers anywhere in the repo:

  xray/disable.sh, xray/add_version.sh
  singbox/add_version.sh
  other/ssfaketls/disable.sh, other/ssh/disable.sh

  other/telegram/orig/  — pre-mtprotoproxy legacy backend
  other/telegram/telemt/ — third alternative; only python + tgo are
                           live in modules/telegram._BACKENDS
  other/v2ray/          — v2ray support, never wired into the python
                          install loop

Operators can `systemctl disable --now <unit>` directly without the
disable.sh files. add_version.sh files were manual version-bump
helpers used during release, not part of any install path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Same shape as the nginx pre-start inline last commit. The hook was
two cleanup commands (rm run/*, rm the xtls unix socket). Replace
with a single ExecStartPre= shell wrapper with `-` prefix so the
empty-dir and missing-socket cases don't fail the service start.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces two operator scripts that manage Hiddify support team's
SSH access via ~/.ssh/authorized_keys.

  add():
    - append the assistant pubkey to ~/.ssh/authorized_keys (skips
      if already present — legacy didn't dedup)
    - chmod 600 (legacy missed this; fresh keys files would otherwise
      land 644 and openssh refuses to honor them)
    - print the SSH command line to share with support, including
      the listening sshd port detected from `ss -tulpn`

  remove():
    - drop the assistant pubkey line, keep everything else
    - tolerate missing file (legacy errored silently)

7 tests cover the dedupe, append-to-existing-keys, mode, the
ss-output port parser, and the missing-file path. Menu wired
directly through the python imports.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
daily_actions.sh was running every night via /etc/cron.d/hiddify_daily
and did exactly one thing: `bash /opt/hiddify-manager/acme.sh/run.sh`.
That script was deleted with the bash sweep this session, so the
cron has been failing silently — and ACME cert renewals have been
NOT HAPPENING for whoever's running this branch.

Replace with modules/daily_actions: walk every domain in current.json
and call modules.cert_issuer.get_cert per domain. cert_issuer handles
the LE issue + install + reload, and falls back to self-signed if
ACME is unreachable, so the worst case is "the cron logs a few warnings"
instead of "cert silently expires".

common._write_cron_entries now points the @daily line at
`python3 -m hiddify_manager.modules.daily_actions` (still via the
venv python), and the bash file is gone.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Zero callers in the repo. The script itself was broken too: it
calls `bash install.sh` (deleted in 20d2d79) as its final step,
and `ln -s /opt/hiddify-manager /opt/hiddify-config` is also from
a pre-rename era.

If we ever want a working downgrade flow, the right shape is a
manager command (./init.sh downgrade <ver>) — separate piece of
work. Removing the dead one now stops anyone from running it and
getting a half-broken result.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…nf changes

The legacy acme.sh --pre-hook ran bash prepare_acme.sh once *per
domain challenge*, and each invocation restarted hiddify-nginx
unconditionally — so a 5-domain panel got 5 nginx restarts in
quick succession.

Move the work into _prepare_acme() in cert_issuer.py:
  - mkdir webroot/.well-known/acme-challenge (idempotent)
  - read nginx/parts/acme.conf and only write+restart when its
    content actually differs from what we want
  - chown -R nginx the webroot (cheap, no need to gate)

Then drop the --pre-hook argument from _acmecmd and call
_prepare_acme() once at the top of get_cert(). Net effect: one nginx
restart per process invocation at most, instead of N.

Two new tests cover the "first call writes + restarts" and "second
call with matching conf does NOT restart" paths. Existing get_cert
tests get _prepare_acme stubbed by a fixture so they don't touch
the real filesystem.

prepare_acme.sh has no remaining caller — delete.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
docker-init.sh was calling `./install.sh docker --no-gui` and
`./status.sh --no-gui`. Both files were deleted earlier this session,
so the docker bootstrap was broken — anyone building/running the
hiddify-manager Docker image since then would have hit the install.sh
not-found error.

Swap to:
  export MODE=docker     # common.apply_runtime_config + .install
                         # already check this to skip host-level work
  ./init.sh install
  ./init.sh status

The legacy "docker mode" did more than this (skip firewall apply,
skip wg-quick, etc.); some of that is already honoured by my MODE
checks, the rest is a TODO. Filed mentally — best caught the next
time someone actually runs the docker compose.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three small drifts from what the legacy bash installer set up:

  - apt base list missing `clang` and `wireguard`. Add both.

  - /usr/bin/hiddify was symlinking to menu.sh, which got deleted in
    20d2d79 with the other legacy entrypoints. So the `hiddify`
    command on every box is broken. Replace the symlink with a
    one-liner shim:
        #!/bin/bash
        exec /opt/hiddify-manager/init.sh menu "$@"

  - root's bashrc gets `cd /opt/hiddify-manager` + `hiddify` appended
    so SSH logins land in the project + show the menu. Legacy did
    the same with `cd ...` + literal menu.sh path; ensure the stale
    lines are stripped on update.

New helper _ensure_bashrc_lines() drives the strip-then-append; 3
tests cover the dedupe, the stale-pattern strip with user lines
preserved, and the create-on-missing path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Ports the manager self-update logic into modules/manager_updater:

  url_for_mode(mode) -> mode dispatch:
    release  -> releases/latest/download/hiddify-manager.zip
    v<tag>   -> releases/download/v<tag>/hiddify-manager.zip
    dev|develop -> archive/refs/heads/dev.tar.gz
    beta     -> None (needs resolved v<tag>; beta_resolver TBD)
    docker   -> None (local src; not source-updated this way)

  _extract handles .zip natively and strips the GitHub-prefixed
  top-level directory from .tar.gz like `tar --strip-components=1`.

  _wipe_stale_configs removes the same glob set the legacy bash did
  before re-rendering (xray/singbox 05_inbounds_* leftovers).

  _merge_into_project uses shutil.copytree(dirs_exist_ok=True) to
  overlay the staging dir on top of /opt/hiddify-manager — avoids the
  in-flight-process foot-gun of mv'ing files the running python has
  open.

  update_manager_source(mode, override_version=None) is the entry
  point; orchestrator caller re-runs ./init.sh install afterwards.

14 tests cover url dispatch (per-mode), the zip + tar.gz extract
shape (including the strip-components=1 behaviour), the stale-config
glob, the staging overlay, the unknown-mode early-return, and a
full happy-path round-trip via a fake local archive.

Not wired into ./init.sh upgrade yet — that's the next commit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The curl|sh bootstrap entry was a 4-script chain:

  download.sh -> hiddify_installer.sh -> update_panel + update_config

and an alternate equally old:

  download_install.sh / download_install_easylink.sh
    -> hardcoded hiddifypanel==8.8.99 + /opt/hiddify-config/ (a name
       deprecated when the project renamed to hiddify-manager); these
       have been broken for ages.

Replace with:

  - download.sh: ~60 lines. Picks the right archive URL per
    mode (release latest / v<tag> zip / dev tarball), curls it,
    extracts straight into /opt/hiddify-manager, then execs
    ./init.sh update <mode>.

  - manager.run_upgrade(mode): new ./init.sh upgrade <mode>
    command. Runs modules.manager_updater.update_manager_source
    then os.execv()s ./init.sh update <mode> with the fresh code.
    This is what `bash hiddify_installer.sh <mode>` did, but
    python-driven and without the whiptail UI.

Deleted:
  common/hiddify_installer.sh
  common/download_install.sh
  common/download_install_easylink.sh

The CREATE_EASYSETUP_LINK env var (set by the old easylink path,
then consumed by post_update_tasks to flip a panel setting) isn't
honoured yet — flagging as a small follow-up if anyone notices it
missing.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
With hiddify_installer.sh + the bootstrap chain gone, nothing in the
project sources utils.sh any more — `grep 'source.*utils.sh'`
across .sh files returns empty. Every grep-hit for the function
names is either a comment, a Python string, or a same-named test
variable that has nothing to do with the bash helpers.

515 lines deleted. The functions that lived here are either ported
(allow_port / remove_port / save_firewall -> utils/firewall.py;
hconfig -> utils/config.py; install_package -> common.py via
apt-get; install_python* + activate_python_venv -> init.sh; get_*
-> modules/panel_installer; set_lock / remove_lock ->
modules/update_usage; etc.) or no longer needed (whiptail UI,
ASCII art, version comparator).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Both modules installed the binary and symlinked it to /usr/bin but
never linked the .service to /etc/systemd/system or invoked
systemctl. Result: after install the service stays in whatever state
it was in before — typically inactive on a fresh box, or stopped if
xray's install-time `systemctl stop` ran (it always does, then never
brought it back).

User just reported hiddify-singbox=inactive on the dev box despite a
successful install run. Wire up:
  ln -sf <module>/hiddify-<svc>.service /etc/systemd/system/
  systemctl enable <svc>
  systemctl restart <svc>

Same pattern as the earlier nginx + haproxy fix.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
User report: `View status of system` from the menu was printing a
"Running command: systemctl is-enabled X" + "Running command:
systemctl is-active X" line per service before the actual table,
and the table rows themselves came through the logger with the
"INFO -" timestamp prefix. Noisy.

  shell.run_cmd(..., quiet=True) now skips the per-call INFO log.
  Default False so install/update logs stay unchanged.

  services.py: every systemctl probe goes through quiet=True. The
  status + restart tables are printed via rich.Console + rich.Table
  with green/red/yellow status cells, no timestamp prefix.

Two new tests guard the quiet flag (still logs on default, silent
when True).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three reported issues + a polish:

1. Show admin link
   Was: run_cmd(["hiddify-panel-cli", "reset-owner-password"]).
   Two bugs in one line: hiddify-panel-cli isn't a real binary
   (FileNotFoundError on the user's box), and reset-owner-password
   is *destructive* — not what "Show admin link" should do.

   New modules/admin_links.show() reads panel_links straight from
   current.json and prints them with the legacy classification:
     http://*       -> red    [insecure]
     https://<ip>/  -> yellow [self-signed]
     otherwise      -> green  (real cert)
   No CLI dep, works even when the panel is down.

2. View system logs
   Was: run_cmd(["ls", "-lah", "log/system/"]). Useless — operator
   still has to remember a path and tail manually.

   New modules/logs.browse() lists each log with size + age, lets
   you pick one via questionary, tails the last 200 lines with rich.

3. Keyboard shortcuts
   use_shortcuts=True on every select() so 1-9 keys work, plus
   per-choice shortcut_key (s/a/l/r/i/u/x/q on the main menu,
   w/a/r/u/b on advanced). Tap a letter, hit enter, done.

4. Polish
   Uninstall now confirms before nuking the world.
   The advanced-menu "Back" doesn't ask for an Enter on exit.

13 new tests (7 admin_links, 6 logs).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
User caught it: the panel hits /admin/actions/apply_configs which
goes through commander.py sudoers -> /opt/hiddify-manager/apply_configs.sh,
which I deleted earlier this session. Every "Apply Configs" click was
failing with "No such file or directory".

Same is true of commander.py's apply/install/update/restart-services/
apply-users entries (Command enum -> {apply_configs,install,update,
restart,status}.sh, all of which I'd swept).

Two pieces:

1. manager.run_apply_configs(apply_users_only=False)
   The lightweight "panel state changed, re-derive everything from
   current.json" pass:
     - generate_current_json (forces a fresh pull from the panel)
     - cert-gen for any new domain
     - render_tree() over *.j2
     - common.apply_runtime_config (firewall, timezone, sshd audit)
     - services.restart
   apply_users_only=True skips the firewall pass — only users/peers
   changed, no need to re-touch system config.

   Plus three new CLI entries (./init.sh apply-configs / apply-users /
   restart).

2. Five .sh shims, all 3–5 lines that `cd` and `exec ./init.sh <cmd>`:
     apply_configs.sh -> ./init.sh apply-configs
     install.sh       -> ./init.sh install  (with apply_configs/apply_users
                         subcommand routing to match legacy install.sh)
     update.sh        -> ./init.sh update
     restart.sh       -> ./init.sh restart
     status.sh        -> ./init.sh status

commander.py's Command enum is unchanged — panel-side wiring stays
intact. Same pattern I used for get_cert.sh / update_usage.sh /
add2shortlink.sh earlier.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…d them

User report: panel /admin/.../log/xray (and singbox) returns
{"msg":"Invalid log file"}. Root cause: the StandardOutput= /
StandardError= lines in both unit files were commented out, so the
processes log to journald but nothing lands under
/opt/hiddify-manager/log/system/. The panel's log viewer allowlists
specific filenames in that directory; missing files fail validation.

Same fix in both:

  StandardOutput=file:/opt/hiddify-manager/log/system/<svc>.out.log
  StandardError=file:/opt/hiddify-manager/log/system/<svc>.err.log

After this + a systemctl restart of the unit, the panel tabs load.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
After every panel action (Apply Configs, Install, Restart, Status,
Update) the panel renders result.html with a fixed log_file name and
its JS polls /admin/.../log/<file>. The validator just checks that
the file exists in log/system/; missing file -> "Invalid log file"
forever.

Map (per panel/admin/Actions.py):
  install / apply-configs / apply-users -> log/system/0-install.log
  update / upgrade                       -> log/system/update.log
  restart                                -> log/system/restart.log
  status                                 -> log/system/status.log

init.sh now branches on the first arg and tees stdout+stderr through
`stdbuf -oL tee` into the matching file. Centralising it here means
every entrypoint creates the same file:

  - panel button -> commander.py -> .sh shim -> ./init.sh <cmd>
  - menu choice  -> ./init.sh menu (already in tee for menu? no —
                    menu shells in-process, doesn't tee, separate concern)
  - direct CLI   -> ./init.sh install

Three knobs make the stream actually streaming:

  - env PYTHONUNBUFFERED=1 + python -u: stdout is line-buffered when
    piped (was block-buffered, so the file stayed empty for minutes).
  - stdbuf -oL tee: defence in depth against tee buffering when its
    stdout is captured by commander.py's subprocess.
  - set -o pipefail + exit ${PIPESTATUS[0]}: a non-zero python exit
    propagates back to commander.py (check=True) instead of being
    masked by tee's success.

The five .sh shims (apply_configs / install / update / restart /
status) become one-line `exec ./init.sh <cmd>` — no mkdir, no
tee, no pipefail. Single source of truth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
User caught: result.html's live-tail JS parses each log line for

  /####(?<progress>\d+)####(?<title>.*?)####(?<subtitle>.*?)####/

to drive the progress bar + the process-title / process-details
labels. Legacy bash printed these from `common/utils.sh::update_progress`;
my python orchestrator didn't, so the bar stayed at 0% and the
labels were empty the whole time.

Three pieces:

  utils/progress.progress(percent, title, subtitle="")
    Prints a marker in the exact ####N####title####subtitle#### shape,
    with the legacy capitalize-first-letter behaviour preserved
    ("${1^}" in bash). flush=True so it lands in the log file
    immediately for the panel's 1-second poll.

  manager.run_install
    A per-module (percent, label) table fed into a progress() call
    before each install_module(). Tuned to roughly match the legacy
    install.sh's progression. Bookended with 0/Done markers.

  manager.run_apply_configs
    Five markers: reading panel -> generating certs -> rendering
    templates -> applying system config -> restarting services -> done.
    Plus failure markers (100% / "Failed") on the early-exit paths.

  manager.run_update
    Marker at 5% before update_panel; failure marker on its failure
    path. The install loop's own markers drive the rest.

5 new tests in test_progress.py:
  - the literal regex from result.html matches our output
  - capitalize-first-letter
  - empty subtitle still matches
  - trailing newline (so the panel sees one marker per line)
  - single line (no multi-line title would confuse the regex)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ne dead imports

User hit on a real install:

  NameError: name 'VENV_DIR' is not defined
    at common.py:137 in _write_cron_entries
      f"@daily root {VENV_DIR}/bin/python3 -m ..."

The daily_actions cron rewrite referenced VENV_DIR but common.py's
paths import only pulled COMMON_DIR/PROJECT_ROOT/LOG_DIR. The
smoke-import didn't catch it because the bug is inside a function
that only runs during a real install (install() -> _write_cron_entries).

Fix: add VENV_DIR to the import.

Guardrail: ran pyflakes across hiddify_manager/ — it would have flagged
this as "undefined name". Wired it in as a check and cleaned every
"imported but unused" it reported while I was at it:

  migrate.py            glob
  manager.py            run_cmd (leftover from when cert-gen shelled out)
  services.py           log
  utils/package_manager sys
  utils/progress        sys
  modules/hiddify_cli   urllib.parse.urlparse
  modules/hiddify_panel shutil
  modules/nginx         shutil
  modules/ssh           re
  modules/logs          rich.syntax.Syntax

pyflakes now exits 0 on the package; 180 tests pass.

The hiddifypanel init_db "Can't DROP INDEX" / "Unknown column" lines in
the same paste are the panel's own idempotent migrations probing for
already-applied changes — DEBUG-level, harmless, not ours.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
User: after onboarding adds a domain, the old manager fetched a real
Let's Encrypt cert immediately; the migrated manager only ever leaves
a self-signed one (hid.mrhx.work.crt issuer = our self-signed DN).

Root cause: when I ported acme.sh/install.sh -> modules/acme_sh.py I
brought over the binary install + account registration but dropped
acme.sh/run.sh, which was the per-domain real-cert fetcher that ran on
every install/apply:

    domains=$(... select(.mode | IN("direct","relay",
                          "old_xtls_direct","sub_link_only")))
    for d in $domains; do get_cert $d & done; wait
    ... self-signed for fake + orphans ...
    systemctl reload hiddify-haproxy hiddify-singbox

My install/apply paths only call ensure_self_signed_cert for every
domain, so a real cert was never fetched until the daily cron fired
(up to 24h later) — and the panel's per-domain commander(get_cert)
goes through hiddifypanel's run_commander background thread which is
itself unreliable.

Fix:
  cert_issuer.fetch_real_certs(configs) — iterate domains, call
  get_cert() sequentially for those whose mode is in REAL_CERT_MODES
  {direct, relay, old_xtls_direct, sub_link_only}; reload singbox at
  the end (get_cert already reloads haproxy+nginx). Self-signed certs
  for all domains are still generated first by the render step, so
  this only *upgrades* the direct-mode ones and falls back cleanly.

Wired into:
  - manager._render_all_templates (install): after apply_runtime_config
  - manager.run_apply_configs (apply): AFTER services.restart so
    nginx/haproxy are up to serve the HTTP-01 challenge; gated on
    not apply_users_only (user/peer changes don't add domains).

4 tests: only real-cert modes fetched (fake/cdn skipped), singbox
reload at the end, no-op when no real domains, failed domains excluded
from the obtained list.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…re corruption

Three bugs from the user's install log:

1. FileNotFoundError: acme.sh binary missing → whole install crashed.
   _render_all_templates() runs right after the hiddify-panel module
   (index 3), but it was calling fetch_real_certs — which needs the
   acme.sh binary (installed by the acme.sh module, index 6) and a
   running nginx (nginx module, index 4) to serve the HTTP-01 challenge.
   So on a fresh install the binary wasn't there yet and nginx wasn't
   up, and the raw FileNotFoundError propagated out and killed run_install.

   Fix: move the real-cert fetch OUT of _render_all_templates into a
   _fetch_real_certs() step at the very END of run_install, after the
   whole module loop. (run_apply_configs already had it correctly placed
   after services.restart.)

2. _acmecmd raised instead of failing gracefully. Even with the
   reorder, a missing/again-wiped binary should degrade to self-signed,
   not crash. _acmecmd now checks os.path.exists(ACME_BIN) and returns a
   non-zero sentinel result; get_cert falls back to ensure_self_signed_cert.

3. "ip6tables-restore: line 45 failed". firewall.save() deduped *every*
   line — including the dump's own COMMIT, *table, and :CHAIN lines.
   Collapsing/reordering those across tables produced an unrestorable
   ruleset. Now dedup only rule lines ('-A'/'-I'), reset per '*table'
   boundary, and keep structural lines (incl. COMMIT) verbatim — no more
   appended stray COMMIT.

Tests: +2 cert_issuer (missing-binary returns nonzero, doesn't exec;
present-binary execs), +1 firewall (per-table rule dedup preserves a
rule shared across tables), updated the existing save test for the
"exactly one COMMIT, structural lines preserved" contract. 187 pass,
pyflakes clean.

Note: the acme.sh/lib/ binary went missing on the dev box because
rsync --delete from the local repo (which doesn't track the runtime-
fetched binary) wiped it. On a real server it persists; and with the
reorder the acme.sh module re-fetches it before _fetch_real_certs runs
regardless.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@moham96

moham96 commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

I have a lot of improvements and fixes. queued but all depend on this big PR since it makes the code more maintainable, i made it as a draft to discuss it further, maybe the best way moving forward is to pull it into a separate branch instead of dev

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant