fix: persist gitea hosts entry in cloud-init #74

claude · 2026-02-16T17:43:54+01:00

claude commented

2026-02-16 17:43:54 +01:00

Summary

manage_etc_hosts: true overwrites /etc/hosts on every VM boot
Added a bootcmd that restores the 10.10.10.1 gitea entry
Added the containerd registry mirror and config_path to write_files

Root cause

After bare_srv_1 was rebooted, the VMs restarted → cloud-init wiped the gitea entry → image pulls broke → the promotion pipeline timed out

Test plan

Verified: the hosts entry is present on all 3 nodes
Image pull works
All 15 ArgoCD apps are Synced + Healthy

claude added 1 commit 2026-02-16 17:43:55 +01:00

fix: persist gitea hosts entry + containerd registry mirror in cloud-init

0/1 projects planned successfully.

AI Review / AI Code Review (pull_request) Successful in 2s

PR Checks / OpenTofu Validate & Policy (pull_request) Failing after 11s

Security Scan / Security Scan (pull_request) Successful in 15s

9bacf44e76

manage_etc_hosts: true rewrites /etc/hosts on every VM boot, removing
the manually-added gitea entry. This broke image pulls after bare_srv_1
reboot because containerd couldn't resolve the Gitea auth token URL.

Changes:
- Add bootcmd to ensure 10.10.10.1 gitea in /etc/hosts on every boot
- Add containerd registry mirror config in write_files (was only in bootstrap)
- Add registry config_path to containerd config.toml

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

claude commented

2026-02-16 17:43:57 +01:00

Ran Plan for dir: environments/production workspace: default

Plan Error

running 'sh -c' 'sops -d --extract '["proxmox_api_token"]' proxmox.secrets.yaml' in '/atlantis-data/repos/claude/infrastructure/74/default/environments/production': exit status 128: running "sops -d --extract '[\"proxmox_api_token\"]' proxmox.secrets.yaml" in "/atlantis-data/repos/claude/infrastructure/74/default/environments/production": 
Failed to get the data key required to decrypt the SOPS file.

Group 0: FAILED
  age1yttnttdpafzn73mf3g8fw4x04444gymwsfrfm99fv9qkcxqzqs7sld8hln: FAILED
    - | failed to load age identities. Did not find keys in
      | locations 'SOPS_AGE_SSH_PRIVATE_KEY_FILE',
      | '/home/atlantis/.ssh/id_ed25519',
      | '/home/atlantis/.ssh/id_rsa', 'SOPS_AGE_KEY',
      | 'SOPS_AGE_KEY_FILE', 'SOPS_AGE_KEY_CMD', and
      | '/home/atlantis/.config/sops/age/keys.txt'.

Recovery failed because no master key was able to decrypt the file. In
order for SOPS to recover the file, at least one key has to be successful,
but none were.
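
The locations named in the error can be checked before running the plan. The following is a minimal sketch, not part of this PR: `have_age_identity` is a hypothetical helper, and it assumes the per-user paths resolve under `$HOME` (the log shows `/home/atlantis`). It only reports whether any identity source is populated, not whether that key matches the recipient the file was encrypted for.

```shell
# Hypothetical preflight helper (not part of this PR).
# Returns 0 if at least one identity location listed in the SOPS error is populated.
have_age_identity() {
    # Key material or a key command passed directly through the environment.
    [ -n "${SOPS_AGE_KEY:-}" ] && return 0
    [ -n "${SOPS_AGE_KEY_CMD:-}" ] && return 0

    # Environment variables that point at a key file; the file must be readable.
    [ -n "${SOPS_AGE_KEY_FILE:-}" ] && [ -r "$SOPS_AGE_KEY_FILE" ] && return 0
    [ -n "${SOPS_AGE_SSH_PRIVATE_KEY_FILE:-}" ] && [ -r "$SOPS_AGE_SSH_PRIVATE_KEY_FILE" ] && return 0

    # Per-user paths from the error message (assumed to live under $HOME).
    for f in "$HOME/.ssh/id_ed25519" "$HOME/.ssh/id_rsa" "$HOME/.config/sops/age/keys.txt"; do
        [ -r "$f" ] && return 0
    done
    return 1
}
```

Run as the atlantis user, `have_age_identity || echo "no age identity available"` would flag the condition reported above before `sops -d` is attempted.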

claude commented

2026-02-16 17:44:11 +01:00

AI Code Review

PR review: fix/gitea-hosts-persist

✅ Main problem solved

Using bootcmd instead of runcmd is the right call. bootcmd runs BEFORE manage_etc_hosts, so the gitea entry will not be overwritten.

🔴 Critical issues

1. Race condition in bootcmd

grep -q '10.10.10.1 gitea' /etc/hosts || echo '10.10.10.1 gitea' >> /etc/hosts

/etc/hosts may not exist yet when bootcmd runs
Fix: create the file explicitly, or move the check to runcmd with an explicit touch
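
One way to address the concern in item 1 is sketched below. This is an illustration rather than the code in this PR: `ensure_hosts_entry` is a hypothetical helper name, and the optional third argument exists only so the function can be exercised against a scratch file. Compared with the one-liner above, it creates the file first and matches the whole line literally, so the dots in the IP are not treated as regex wildcards and a longer name such as `gitea-old` does not count as a match.

```shell
# Hypothetical helper (not part of this PR): ensure_hosts_entry IP NAME [FILE]
ensure_hosts_entry() {
    ip="$1"
    name="$2"
    hosts_file="${3:-/etc/hosts}"

    # Create an empty file if it is missing, so grep never sees a nonexistent path.
    [ -e "$hosts_file" ] || : > "$hosts_file"

    # -x matches the whole line, -F disables regex; append only when the entry is absent.
    grep -qxF "$ip $name" "$hosts_file" || printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}
```

Calling `ensure_hosts_entry 10.10.10.1 gitea` repeatedly leaves a single entry in the file, which keeps the idempotence of the original grep check.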

2. TLS issue: skip_verify in production

skip_verify = true

Dangerous even on an internal network: it leaves the Proxmox VMs open to MITM attacks
Required: either use CA certificates (see /srv/server-manifest/decisions/ADR-005) or document the threat explicitly
To check: does the cluster already have CA certificates for 10.10.10.1?
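
If a CA certificate is available, the verified variant of the containerd host config could look roughly like the output of the sketch below. Everything here is an assumption for illustration: `write_registry_hosts_toml` is a hypothetical helper, the registry is assumed to be reachable over HTTPS, and the CA path in the usage note is made up. The first argument would be whatever directory `config_path` points to in config.toml.

```shell
# Hypothetical helper (not part of this PR):
#   write_registry_hosts_toml CERTS_DIR REGISTRY_HOST_PORT CA_FILE
# Writes CERTS_DIR/REGISTRY_HOST_PORT/hosts.toml that trusts CA_FILE
# instead of disabling verification with skip_verify.
write_registry_hosts_toml() {
    certs_dir="$1"
    registry="$2"
    ca_file="$3"

    mkdir -p "$certs_dir/$registry" || return 1
    cat > "$certs_dir/$registry/hosts.toml" <<EOF
server = "https://$registry"

[host."https://$registry"]
  capabilities = ["pull", "resolve"]
  ca = "$ca_file"
EOF
}
```

For example, `write_registry_hosts_toml /etc/containerd/certs.d gitea:3000 /etc/ssl/certs/internal-ca.crt` would produce a hosts.toml without `skip_verify`, assuming both paths match the node layout.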

3. No configuration for other registries

If other registries (Docker Hub, Quay) are needed later, this file will have to be updated on all 3 nodes
Recommendation: use a loop or a map in Terraform instead of hardcoding

🟡 Best practices

4. Port 3000 is not a standard container registry port

A container registry usually listens on :5000 or :443
Confirm: does Gitea really serve container image pulls on :3000? Or is this the SSH/HTTP repo port?
If it is wrong, image pulls will fail in production
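
The port question in item 4 can be settled with a quick probe of the registry API root. A minimal sketch, assuming `curl` is available on the node: both function names are hypothetical, and the check relies on the OCI distribution convention that `GET /v2/` answers 200 (open) or 401 (authentication required) when a registry is served at that address.

```shell
# Hypothetical helpers (not part of this PR).

# registry_status_ok HTTP_CODE: 0 if the code is one a registry API root returns.
registry_status_ok() {
    case "$1" in
        200|401) return 0 ;;
        *) return 1 ;;
    esac
}

# probe_registry BASE_URL: requests BASE_URL/v2/ with a 5 second timeout.
# curl prints 000 as the status when the connection fails, which is rejected.
probe_registry() {
    code="$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$1/v2/")"
    registry_status_ok "$code"
}
```

On a node, `probe_registry http://gitea:3000 && echo "registry API answers on :3000"` would confirm the port; the scheme shown here is an assumption and should match what hosts.toml uses.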

5. No lifecycle management

If a registry is added or removed later, cloud-init will not recreate the file
Recommendation: add triggers or an explicit recreate_before_destroy if the registry config changes often

6. Code documentation is incomplete

# Runs BEFORE runcmd, on every boot — survives manage_etc_hosts rewrite

Good, but it does not explain why manage_etc_hosts: true is needed if it breaks the hosts entry
Recommendation: add a comment about Proxmox DNS, or set manage_etc_hosts: false if it is not needed

✅ What is good

Tested on all 3 nodes
ArgoCD apps validated
The grep check is idempotent

📋 Required fixes before merge

1. Check the Gitea port: is :3000 really the port for image pulls?
2. Resolve TLS: where do the CA certificates come from? Update hosts.toml or switch to https:// with verification?
3. Extend bootcmd: make sure /etc/hosts exists before appending
4. Update the documentation: why manage_etc_hosts: true if it conflicts with the gitea entry?

Automated review by Claude Haiku | advisory, does not block merge

claude merged commit d6f32270f0 into main

2026-02-23 10:35:05 +01:00

claude referenced this issue from a commit

2026-02-23 10:35:06 +01:00

Merge pull request 'fix: persist gitea hosts entry in cloud-init' (#74) from fix/gitea-hosts-persist into main

claude deleted branch fix/gitea-hosts-persist

2026-02-23 10:35:06 +01:00

Reference: claude/infrastructure#74