Multica Docs

Troubleshooting

Common issues when self-hosting Multica — symptoms, causes, how to diagnose, how to fix.

Look up issues by symptom. If your situation isn't listed, open an issue on GitHub.

Daemon can't connect to the server

Symptom: multica daemon status shows offline or connection refused; the server logs show no /api/daemon/register or /api/daemon/heartbeat requests. For how the daemon mechanism works, see Daemon and runtimes.

Likely causes:

  1. MULTICA_SERVER_URL points at the wrong address — the default is ws://localhost:8080/ws; a self-hosted setup must change it to your server's address
  2. Network / firewall blocking — the daemon and server aren't on the same network, or outbound traffic is blocked
  3. Token expired or invalid — you never ran multica login, or the PAT was revoked
  4. Server rejected registration — the account you signed in with isn't in the target workspace (register returns 403)
  5. DNS resolution failure — the hostname doesn't resolve on the daemon machine

How to diagnose:

multica daemon logs --lines 100    # look for daemon-side errors
echo $MULTICA_SERVER_URL          # confirm the address is set
curl -i http://<server-host>:8080/health   # hit the server directly
curl -i http://<server-host>:8080/readyz  # include DB + migration readiness
cat ~/.multica/config.json        # verify api_token exists
multica workspace list            # confirm you're a member of the target workspace

How to fix: address whichever cause above applies. The two most common fixes are (1) correcting MULTICA_SERVER_URL and restarting the daemon (multica daemon restart), and (2) signing in again (multica logout && multica login).
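The curl checks above need an HTTP URL, while MULTICA_SERVER_URL is a WebSocket URL. A minimal sketch of deriving one from the other follows. The helper name health_url_from_ws is hypothetical, and it assumes /health is served on the same host and port as the WebSocket endpoint, which may not hold behind a reverse proxy.

```shell
# Hypothetical helper: map a ws:// or wss:// server URL to the matching
# http(s) /health URL on the same host and port.
health_url_from_ws() {
  url=$1
  case "$url" in
    wss://*) scheme=https; rest=${url#wss://} ;;
    ws://*)  scheme=http;  rest=${url#ws://} ;;
    *) echo "unsupported URL: $url" >&2; return 1 ;;
  esac
  hostport=${rest%%/*}   # drop the path, keep host[:port]
  printf '%s://%s/health\n' "$scheme" "$hostport"
}

# Usage (not run here):
#   curl -i "$(health_url_from_ws "$MULTICA_SERVER_URL")"
```

If the helper prints an unsupported-URL error, MULTICA_SERVER_URL is probably unset or missing its ws:// or wss:// scheme, which is worth fixing before looking at the network.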

Tasks stuck in queued

Symptom: after assigning an issue to an agent, the issue status flips to in_progress immediately, but a long time passes with no sign of agent execution on the page; multica daemon status shows the daemon online.

Likely causes (ordered by frequency):

  1. Agent concurrency limit reached — this agent's max_concurrent_tasks (default 6) is fully occupied by other running tasks
  2. Another task from the same agent is still running on the same issue — same agent × same issue is forced to run sequentially (prevents duplicate execution)
  3. Agent has been archived — after archival, new tasks still enqueue but can't be claimed, and they time out after 5 minutes (code-issue G-01)
  4. Daemon hasn't registered this runtime in the current workspace — restart the daemon or reselect the runtime in the UI
  5. Daemon disconnected — no heartbeat in the last 45 seconds. If multica daemon status still reports online, the disconnect may simply be too recent to show

How to diagnose:

multica daemon status --output json       # runtime list + last_seen_at
multica agent list                         # check agent archived state
multica issue show <issue-id>             # inspect task history

On the server side (self-host), grep for "no_tasks" / "no_capacity" to see the claim outcome.
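The grep above can be turned into a rough tally of claim outcomes. This is only a sketch: count_claim_outcomes is a hypothetical helper, and it assumes the strings no_tasks and no_capacity appear verbatim in the server log lines, as the step above suggests.

```shell
# Hypothetical helper: read server logs on stdin and count how many lines
# mention each claim outcome.
count_claim_outcomes() {
  awk '
    /no_tasks/    { tasks++ }
    /no_capacity/ { capacity++ }
    END { printf "no_tasks=%d no_capacity=%d\n", tasks, capacity }
  '
}

# Usage (not run here):
#   docker logs <container> 2>&1 | count_claim_outcomes
```

A tally dominated by no_capacity would point toward the concurrency limit, while mostly no_tasks suggests the daemon is polling but nothing is claimable for its runtime.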

How to fix:

  • Concurrency full → wait for running tasks to finish, or multica agent update <id> --max-concurrent-tasks 10 to raise the ceiling
  • Same-issue serialization → wait for the previous task to finish, or reassign to a different agent
  • Agent archived → multica agent restore <id>
  • Runtime not registered → multica daemon restart, and the daemon will re-register

WebSocket can't connect

Symptom: the browser console logs WebSocket is closed; the page doesn't show real-time updates (task progress, comments, inbox), and a refresh is needed to see them; backend tasks still execute.

Likely causes:

  1. Origin check failure — your frontend domain isn't in the server's CORS allowlist. The default allowlist only includes localhost:3000/5173/5174; self-hosting on the public internet requires setting FRONTEND_ORIGIN
  2. Protocol mismatch — a frontend served over https:// needs wss://; one served over http:// uses ws://
  3. Reverse proxy doesn't enable WebSocket upgrade — Nginx / Envoy / HAProxy don't forward the Upgrade header by default
  4. JWT cookie expired or missing — no re-sign-in after the 30-day expiry

How to diagnose:

  • Browser DevTools → Network → filter by "WS" and check connection state and status code
  • Grep server logs for "rejected origin" / "websocket" — an origin issue spells itself out
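  • curl -i http://<server-host>:8080/ws, sent with WebSocket upgrade request headers (Connection: Upgrade, Upgrade: websocket, Sec-WebSocket-Version: 13, and a Sec-WebSocket-Key), should return 101 Switching Protocols; a plain curl without those headers will not get a 101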
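A server only answers 101 when the request carries the WebSocket upgrade headers, so the curl check needs them spelled out. The sketch below is one way to do that. The function names ws_probe and http_status are hypothetical, the key is the sample nonce from RFC 6455, and the optional second argument sends an Origin header so the origin check can be exercised as well.

```shell
# Hypothetical helper: send a WebSocket opening handshake with curl and
# print the raw response headers. $1 = URL, $2 = optional Origin value.
ws_probe() {
  url=$1
  origin=${2:-}
  set -- -sS -i --http1.1 --max-time 5 \
    -H 'Connection: Upgrade' -H 'Upgrade: websocket' \
    -H 'Sec-WebSocket-Version: 13' \
    -H 'Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ=='
  if [ -n "$origin" ]; then
    set -- "$@" -H "Origin: $origin"
  fi
  curl "$@" "$url"
}

# Hypothetical helper: print the status code from the first response line.
http_status() {
  awk 'NR == 1 { print $2; exit }'
}

# Usage (not run here):
#   ws_probe http://<server-host>:8080/ws https://multica.yourdomain.com | http_status
```

After a successful upgrade the connection stays open, so curl may report a timeout once --max-time elapses; the status line is what matters here.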

How to fix:

  • Wrong origin → set FRONTEND_ORIGIN=https://multica.yourdomain.com in the server's .env (or comma-separated CORS_ALLOWED_ORIGINS) and restart the server
  • Protocol mismatch → make sure FRONTEND_ORIGIN's protocol matches the frontend's
  • Reverse proxy → in Nginx, add proxy_http_version 1.1; proxy_set_header Upgrade $http_upgrade; proxy_set_header Connection "upgrade";
  • Cookie expired → refresh the page and sign in again

Emails not received

Symptom: after submitting an email during sign-in or invite acceptance, neither the inbox nor the spam folder has the verification code.

First, confirm which provider the server thinks is active. At startup the backend prints one of:

  • EmailService: SMTP relay <host>:<port> from=<addr> — using SMTP (SMTP_HOST non-empty wins over Resend)
  • EmailService: Resend API from=<addr> — using Resend
  • EmailService: DEV mode — codes printed to stdout … — no provider configured

docker compose -f docker-compose.selfhost.yml logs backend | grep "EmailService:"

If the line you expected isn't there, the environment didn't reach the process — check .env and docker compose -f docker-compose.selfhost.yml exec backend env | grep -E 'RESEND_|SMTP_'. Credentials are never logged on this startup line.
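As a rough sketch, the three startup lines can be reduced to a single word, which is handy in scripts or health checks. The helper name email_provider is hypothetical; it matches only the prefixes quoted above and reports the most recent match, on the assumption that the last startup line reflects the running process.

```shell
# Hypothetical helper: read backend logs on stdin and print the provider
# named by the last "EmailService:" startup line (smtp, resend, dev),
# or "unknown" if no such line is present.
email_provider() {
  awk '
    /EmailService: SMTP relay/ { p = "smtp" }
    /EmailService: Resend API/ { p = "resend" }
    /EmailService: DEV mode/   { p = "dev" }
    END { print (p == "" ? "unknown" : p) }
  '
}

# Usage (not run here):
#   docker compose -f docker-compose.selfhost.yml logs backend | email_provider
```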

When Resend is the active provider

Likely causes:

  1. RESEND_API_KEY not set — the server silently falls back and writes the code to its own stdout without error. Easy to trip over in production
  2. Resend API key invalid / out of quota — server logs show "failed to send verification code"
  3. RESEND_FROM_EMAIL's domain not verified in Resend — Resend refuses to send
  4. Email was sent but flagged as spam by the recipient's ISP — check the Resend dashboard and the spam folder

How to diagnose:

  • Grep server logs for "[DEV] Verification code for" — if present, Resend isn't configured and the code was written to stdout
  • Resend dashboard → Emails for send history
  • Confirm RESEND_FROM_EMAIL's domain appears in the Resend console's "Verified Domains" list

How to fix:

  • Missing API key → follow Sign-in and signup configuration → How email works to configure and restart the server
  • Domain not verified → run the DNS verification flow in the Resend console (add SPF / DKIM records)
  • In an emergency (internal testing) → copy the code printed under [DEV] from the server logs

When SMTP is the active provider

The SMTP path wraps every failure with the stage it failed at, so the server logs already tell you where the relay rejected the session. Grep for "failed to send verification email" / "failed to send invitation email" and check the wrapped error:

| Logged error | What it means | How to fix |
| --- | --- | --- |
| smtp dial <host>:<port>: dial tcp …: connect: connection refused / i/o timeout | The backend container can't reach the relay — wrong host, wrong port, firewall, or the relay isn't listening | Verify SMTP_HOST / SMTP_PORT resolve from inside the container (docker compose -f docker-compose.selfhost.yml exec backend nslookup <host> and nc -vz <host> <port>); open the firewall from the host running Multica to the relay |
| smtp starttls: x509: certificate signed by unknown authority (or certificate is not valid for any names) | The relay uses a private CA / self-signed cert and the container's trust store rejects it | Either install the CA into the container, or set SMTP_TLS_INSECURE=true only after confirming the relay is reachable on a trusted segment |
| smtp auth: 535 5.7.8 Authentication credentials invalid (or 534/530) | SMTP_USERNAME / SMTP_PASSWORD are wrong, or the relay requires a different auth mechanism than PLAIN | Re-confirm the service-account credentials with your mail admin; for Exchange anonymous internal relay leave both empty (SMTP_USERNAME=, SMTP_PASSWORD=) |
| smtp MAIL FROM: 550 5.7.1 Client does not have permissions to send as this sender | The relay won't accept RESEND_FROM_EMAIL as the envelope sender — typical Exchange "anonymous users not allowed" or DMARC alignment issue | Set RESEND_FROM_EMAIL to a domain the relay accepts; on Exchange, grant the source IP ms-Exch-SMTP-Accept-Any-Sender on the receive connector |
| smtp RCPT TO <addr>: 550 5.7.1 Unable to relay | The relay's receive connector doesn't allow your subnet to relay to external recipients (most common for anonymous internal relays talking to outside domains) | Either restrict invites to internal recipients, or add the Multica host's subnet to the Exchange "Anonymous Users → Relay" permission list |
| smtp DATA / smtp write body / smtp end data | Session was accepted but the relay dropped the body — usually message-size limits, content filtering, or a connection reset mid-stream | Check the relay's logs for the same Message-ID (logged as <unixnano>@<host>); raise the message size limit if needed |

MAIL FROM, RCPT TO, and DATA errors are always logged with the relay's response code so you can match them against Exchange / Postfix logs on the other side. Verification codes and invite tokens are never included in the wrapped error.
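Because each failure is wrapped with its stage, a log line can be mapped to a first thing to check. The following is a sketch only: smtp_hint is a hypothetical helper, the hints are condensed from the stages listed above, and it relies on simple substring matching rather than on any guaranteed log format.

```shell
# Hypothetical helper: map one logged SMTP error line to a short hint,
# based on the stage prefix the backend wraps the error with.
smtp_hint() {
  case "$1" in
    *"smtp dial"*)      echo "connectivity: check SMTP_HOST, SMTP_PORT, firewall" ;;
    *"smtp starttls"*)  echo "tls: install the relay CA in the container" ;;
    *"smtp auth"*)      echo "auth: check SMTP_USERNAME and SMTP_PASSWORD" ;;
    *"smtp MAIL FROM"*) echo "sender: relay rejects RESEND_FROM_EMAIL" ;;
    *"smtp RCPT TO"*)   echo "relay: host not permitted to relay to recipient" ;;
    *"smtp DATA"*|*"smtp write body"*|*"smtp end data"*)
                        echo "body: check size limits and content filtering" ;;
    *)                  echo "unrecognized: read the full wrapped error" ;;
  esac
}

# Usage (not run here): feed it lines matched by the grep described above,
# for example: smtp_hint "$(grep 'failed to send' backend.log | tail -n 1)"
```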

How to diagnose:

  • Grep "EmailService: SMTP relay" once at startup, then "failed to send" for runtime failures
  • From inside the backend container, sanity-check connectivity: docker compose -f docker-compose.selfhost.yml exec backend sh -c 'nc -vz $SMTP_HOST $SMTP_PORT'
  • Confirm the env reached the process: docker compose -f docker-compose.selfhost.yml exec backend env | grep SMTP_ (password will be in the output — only run on a trusted shell)

How to fix:

  • Wrong host / port → adjust SMTP_HOST / SMTP_PORT and restart the backend; for the supported relay modes see Auth setup → Option B: SMTP relay
  • Cert mismatch → install the relay's CA into the container, or temporarily SMTP_TLS_INSECURE=true on a trusted segment
  • Auth failure → re-check credentials; for anonymous internal relay leave SMTP_USERNAME and SMTP_PASSWORD empty
  • Unable to relay → either restrict to internal recipients or grant the Multica host's IP relay permission on the Exchange receive connector

Fixed local test code doesn't work

Symptom: on a self-hosted instance, you try to sign in with a fixed local test code such as 888888 and it's rejected with invalid or expired code.

Likely causes (mutually exclusive):

  1. MULTICA_DEV_VERIFICATION_CODE is empty — fixed codes are disabled by default
  2. APP_ENV=production — this is the correct production configuration; fixed local test codes are ignored in production
  3. The configured code is not 6 digits — the shortcut only accepts a 6-digit value

How to diagnose:

grep -E 'APP_ENV|MULTICA_DEV_VERIFICATION_CODE' .env
docker exec <container> env | grep -E 'APP_ENV|MULTICA_DEV_VERIFICATION_CODE'

Check your inbox (including spam) for the real verification code.

How to fix:

  • In production, leave MULTICA_DEV_VERIFICATION_CODE empty — configure Resend or an SMTP relay and use real codes
  • For local development or internal testing, either copy the generated code from server logs or set APP_ENV=development plus MULTICA_DEV_VERIFICATION_CODE=888888 — never enable a fixed code on a public instance (see Sign-in and signup configuration → Fixed local testing codes)
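The three causes above amount to a small decision table, sketched here as a shell function. This mirrors the rules as documented on this page, not the server's actual code; fixed_code_status is a hypothetical name, and how the server treats APP_ENV values other than production is an assumption.

```shell
# Hypothetical helper: given APP_ENV ($1) and MULTICA_DEV_VERIFICATION_CODE
# ($2), report whether the fixed test code would be honored according to
# the rules documented above.
fixed_code_status() {
  app_env=$1
  code=$2
  if [ -z "$code" ]; then
    echo "disabled: MULTICA_DEV_VERIFICATION_CODE is empty"
  elif [ "$app_env" = "production" ]; then
    echo "ignored: APP_ENV=production"
  elif ! printf '%s' "$code" | grep -Eq '^[0-9]{6}$'; then
    echo "rejected: code is not 6 digits"
  else
    echo "active"
  fi
}

# Usage (not run here):
#   fixed_code_status "$APP_ENV" "$MULTICA_DEV_VERIFICATION_CODE"
```

Run it with the values reported from inside the container rather than from .env, since the container environment is what the server actually sees.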

Usage dashboard stays at zero

Symptom: agents complete tasks, raw token usage is written to the database, but Settings → Usage and Settings → Runtime show 0 input / output / cost across the board. This is silent — there is no error in the backend logs.

Likely causes:

  1. rollup_task_usage_hourly() is never being claimed — the Usage / Runtime dashboards read from the derived task_usage_hourly table, populated by that function. Since MUL-2957 the backend runs the rollup in-process via the DB-backed scheduler (sys_cron_executions); a stale build, a missing migration 113, or a sustained backend outage with no replicas left running can leave the table without a recent SUCCESS row.
  2. pg_cron is configured for compatibility but pointing at the wrong database — pg_cron's cron.database_name setting defaults to postgres; if your Multica database has a different name, the scheduled job never sees rollup_task_usage_hourly(). The in-process scheduler does not depend on this, but if you removed the in-process scheduler and rely on pg_cron, the DB name must match.
  3. The handler is being claimed but silently erroring — e.g. the SQL function is missing because migrations were partially applied, or DB role / search_path is misconfigured. Check the FAILED audit rows in sys_cron_executions.

How to diagnose:

-- Confirm raw events exist but the hourly table is empty.
SELECT count(*) AS raw_rows FROM task_usage;
SELECT count(*) AS hourly_rows FROM task_usage_hourly;

-- Inspect the in-process scheduler's audit log.
SELECT plan_time, status, attempt, runner_id,
       error_code, error_msg, started_at, finished_at
  FROM sys_cron_executions
 WHERE job_name = 'rollup_task_usage_hourly'
 ORDER BY plan_time DESC
 LIMIT 20;

-- Watermark — if this is 1970-01-01, the rollup has never run.
SELECT watermark_at FROM task_usage_hourly_rollup_state;

-- Compatibility path: if you previously registered pg_cron, confirm
-- it is (or isn't) available and pointing at the right database.
SELECT * FROM pg_available_extensions WHERE name = 'pg_cron';
SHOW shared_preload_libraries;
SELECT jobname, schedule, database, active FROM cron.job;

How to fix:

  • Confirm the scheduler is actually running on at least one backend replica — every 30 seconds it should add a SUCCESS row to sys_cron_executions for rollup_task_usage_hourly.
  • Call the rollup once by hand to verify the SQL path: SELECT rollup_task_usage_hourly(); — refresh the dashboard; if numbers appear, the SQL function is fine and the issue is on the scheduler claim path.
  • If migration 113_sys_cron_executions has not applied yet, restart the backend so migrations run, or invoke migrate up manually.
  • If you have legacy pg_cron history that pre-dates the in-process scheduler, the SQL function still holds advisory lock 4246 internally and the two paths cannot double-write — see Self-host quickstart → Usage rollup for the optional cron.unschedule cleanup.
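The watermark query from the diagnostics can be scripted for a quick check from the host. This is a sketch under assumptions: the function names are hypothetical, the connection string is whatever reaches your Multica database, and PGTZ=UTC is set so that an epoch watermark prints as 1970-01-01 regardless of the server's time zone.

```shell
# Hypothetical helper: fetch the rollup watermark as plain text.
# $1 = a libpq connection string or database name.
rollup_watermark() {
  PGTZ=UTC psql -At \
    -c 'SELECT watermark_at FROM task_usage_hourly_rollup_state;' "$1"
}

# Hypothetical helper: interpret a watermark value.
watermark_verdict() {
  case "$1" in
    "")          echo "no watermark row" ;;
    1970-01-01*) echo "never ran" ;;
    *)           echo "ran; last watermark $1" ;;
  esac
}

# Usage (not run here):
#   watermark_verdict "$(rollup_watermark "$DATABASE_URL")"
```

DATABASE_URL in the usage line is a placeholder for your own connection string, not a variable this page defines.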

Migration 103 fails with refusing to drop legacy daily rollups

Symptom: upgrading from v0.3.4 to v0.3.5+, the backend container fails to start (or migrate up aborts) with:

ERROR: refusing to drop legacy daily rollups:
  task_usage_hourly_rollup_state.watermark_at (1970-01-01 ...) trails
  task_usage latest event (...) by more than 01:00:00 — backfill is
  incomplete or pg_cron is not running. Run cmd/backfill_task_usage_hourly
  (and let pg_cron catch up) before re-running migrate

Likely cause: this is migration 103's fail-closed guard. It refuses to drop the legacy daily rollups until task_usage_hourly has caught up with raw task_usage. The guard fires whenever existing rows are present and the rollup watermark still sits at the epoch — i.e. nothing has rolled history into the hourly table yet.

Since MUL-2957 the migrate command runs an idempotent monthly-slice backfill (under advisory lock 4246) automatically immediately before applying migration 103, so v0.3.4 → v0.3.5+ direct upgrades complete in a single migrate up invocation. If you are still seeing this error you are either on a pre-MUL-2957 binary or the hook itself failed — check the migrate logs for an earlier task_usage hourly rollup hook line.

How to fix:

  1. If you are on a pre-MUL-2957 binary and cannot upgrade the binary first, run the standalone backfill against the same database (idempotent, safe to interrupt, safe to re-run):

    # Docker Compose
    docker compose -f docker-compose.selfhost.yml exec backend \
      ./backfill_task_usage_hourly --sleep-between-slices=2s
    
    # Kubernetes
    kubectl -n multica exec deploy/multica-backend -- \
      ./backfill_task_usage_hourly --sleep-between-slices=2s

  2. Re-run the upgrade — restarting the backend container is enough, because migrations run on startup. The guard now sees a current watermark and lets 103 apply.

  3. The in-process scheduler then keeps the watermark advancing — see Self-host quickstart → Usage rollup.

--sleep-between-slices=2s is a polite default on production databases with years of history. Use --months-back N --force-partial if you only want to keep the last N months and are willing to permanently abandon older buckets.

Port conflicts

Symptom: multica server or multica daemon start fails with address already in use.

Likely causes:

  1. Server port taken (default 8080)
  2. Daemon health port taken (default 19514, offset by a hash per profile)
  3. Web dev server port conflict (3000 / 5173)
  4. Insufficient privileges for the port (binding a privileged port < 1024 requires sudo)

How to diagnose:

lsof -i :8080        # macOS / Linux
netstat -ano | findstr :8080    # Windows

How to fix:

  • Stop the conflicting process (kill <PID>, falling back to kill -9 <PID> only if it doesn't exit), or change ports via PORT=9000
  • To use 80 / 443 → don't bind directly; put a reverse proxy (Nginx / Caddy) in front, forwarding to a high port

Where to find logs

| Component | Location | Command |
| --- | --- | --- |
| Daemon | ~/.multica/daemon.log (background mode) or foreground stdout | multica daemon logs -f --lines 100 |
| Server (Docker) | Container stdout | docker logs -f <container> |
| Server (systemd) | journal | journalctl -u multica-server -f |
| Frontend (dev) | Terminal running pnpm dev | Read directly |
| Frontend (browser) | DevTools → Console | Press F12 |

For more detailed daemon logs, run the daemon in the foreground instead of the background: multica daemon stop && multica daemon start --foreground.