# Authentication reference

The API supports two complementary authentication paths:

1. **API key** on `/api/*` endpoints: always available, governed by `ENABLE_API_KEY_AUTH` and `API_KEYS`.
2. **Optional viewer auth**: when `AUTH_MODE != none`, browser sessions are established via username/password (`basic`) or OIDC SSO (`oidc`, PR2). API keys remain valid in parallel, so programmatic clients are unaffected.

Everything outside `/api/*` (the Pipeline Viewer SPA shell, Swagger UI, OpenAPI spec, ReDoc, health checks, metrics) is publicly accessible regardless of mode.

For the rationale behind the same-origin bypass, stream-token flows, and the layered auth design, see [authentication design](../explanation/authentication-design.md).

## Configuration

```bash
# .env: match the style in .env.example
ENABLE_API_KEY_AUTH=true
API_KEY_HEADER_NAME=X-API-Key
API_KEYS=your-secret-key-here
```

Generate a real value via the `uic-<uuid>` recipe at the bottom of this page, or follow whatever convention your deployment uses. Multiple keys are supported via a comma-separated list, which is useful for rolling rotations without downtime. Keys are stored as `SecretStr` internally and compared with `secrets.compare_digest()` for constant-time comparison. The header name is configurable via `API_KEY_HEADER_NAME`.

Implementation: `src/middleware/api_key_auth.py`.
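
The key handling described above can be sketched as follows. This is a minimal illustration, not the code in `src/middleware/api_key_auth.py`: the function names `parse_api_keys` and `is_valid_key` are hypothetical, and the sketch works on plain strings rather than `SecretStr`.

```python
import secrets


def parse_api_keys(raw: str) -> list[str]:
    """Split a comma-separated API_KEYS value, dropping blanks and whitespace."""
    return [key.strip() for key in raw.split(",") if key.strip()]


def is_valid_key(presented: str, configured: list[str]) -> bool:
    """Check the presented key against every configured key."""
    matched = False
    for key in configured:
        # compare_digest avoids leaking how many leading characters matched.
        if secrets.compare_digest(presented.encode(), key.encode()):
            matched = True
    return matched


# During a rotation both the old and the new key are accepted.
keys = parse_api_keys("old-key, new-key")
print(is_valid_key("new-key", keys))
print(is_valid_key("wrong", keys))
```

Checking every configured key instead of returning on the first match keeps the loop's duration independent of which key matched.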

## Public endpoints (no API key required)

- `/` and every SPA deep link: viewer HTML
- `/docs`, `/openapi.json`, `/redoc`: API documentation
- `/health`, `/health/ready`: load-balancer health checks
- `/metrics`: Prometheus scrape target
- `/api/dev/monitoring/*`, `/api/dev/minimal/*`, `/api/dev/pipeline-viewer/*`: public when `ENVIRONMENT=dev`
- `/api/v1/documents/{job_id}/stream?token=...`: SSE stream endpoints with a valid short-lived token
- `/lti/*`: authenticated via the Canvas LTI JWT flow, not by API key

## Stream tokens

Browser `EventSource` connections cannot send custom headers. For SSE, exchange an API key for a short-lived stream token:

1. `POST /api/v1/documents/{job_id}/stream/token` (with `X-API-Key`)
2. Server returns a single-use token with a 5-minute TTL
3. Client opens `GET /api/v1/documents/{job_id}/stream?token={token}`
4. Token is consumed on first use (`GETDEL` in Redis)

Tokens are job-scoped and deleted after first validation. Implementation: `src/services/job_service.py` creates and validates; `src/middleware/api_key_auth.py` recognises the `?token=` query parameter as an alternative credential.
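
To make the single-use, job-scoped semantics concrete, here is a small in-memory sketch. It is an assumption-laden stand-in, not the code in `src/services/job_service.py`: the `StreamTokenStore` class is hypothetical, a plain dict replaces Redis, and `dict.pop` plays the role of `GETDEL`.

```python
import secrets
import time
from typing import Optional


class StreamTokenStore:
    """In-memory stand-in for the Redis-backed token store (hypothetical)."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self._ttl = ttl_seconds  # 5-minute TTL, as described above
        self._tokens: dict[str, tuple[str, float]] = {}

    def issue(self, job_id: str, now: Optional[float] = None) -> str:
        """Create a random token bound to one job."""
        now = time.monotonic() if now is None else now
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (job_id, now + self._ttl)
        return token

    def consume(self, token: str, job_id: str, now: Optional[float] = None) -> bool:
        """Validate a token and delete it, whether or not validation succeeds."""
        now = time.monotonic() if now is None else now
        # pop() reads and deletes in one step, mirroring GETDEL.
        entry = self._tokens.pop(token, None)
        if entry is None:
            return False
        stored_job, expires_at = entry
        return stored_job == job_id and now < expires_at


store = StreamTokenStore()
token = store.issue("job-1")
print(store.consume(token, "job-1"))  # first use
print(store.consume(token, "job-1"))  # replay of the same token
```

Reading and deleting in one step is what makes the token single-use: two concurrent requests carrying the same token cannot both succeed.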

## Approval endpoints

`/api/v1/approval/*` requires both an API key and a valid approval token. See [authentication design](../explanation/authentication-design.md) for the defense-in-depth rationale.

## Middleware stack order

Middleware executes in reverse registration order (last added = first executed). The list below is in execution order, outermost first:

```
1. CORS
2. Security Headers
3. Logging
4. Rate Limit
5. Error Handler
6. Session Auth         (only when AUTH_MODE != none)
7. API Key Auth
8. Endpoint
```

`SessionAuthMiddleware` runs ahead of `APIKeyAuthMiddleware` so the latter can short-circuit when `request.state.identity` is set. Both middlewares coexist on purpose: API keys remain a parallel auth path for programmatic clients regardless of `AUTH_MODE`.
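
The "last added = first executed" rule follows from each newly registered middleware wrapping everything registered before it. The sketch below models that wrapping with plain functions; it is not the application's or the framework's code, and `make_middleware`, `build_app`, and the registration list are illustrative assumptions.

```python
from typing import Callable

Handler = Callable[[list[str]], list[str]]


def make_middleware(name: str) -> Callable[[Handler], Handler]:
    """Return a wrapper that records its name, then calls the inner handler."""
    def wrap(inner: Handler) -> Handler:
        def handler(trace: list[str]) -> list[str]:
            trace.append(name)
            return inner(trace)
        return handler
    return wrap


def endpoint(trace: list[str]) -> list[str]:
    trace.append("Endpoint")
    return trace


def build_app(registered: list[str]) -> Handler:
    """Wrap the endpoint in registration order; the last one ends up outermost."""
    app: Handler = endpoint
    for name in registered:
        app = make_middleware(name)(app)
    return app


# Assumed registration order: the reverse of the execution order listed above.
registration_order = [
    "API Key Auth", "Session Auth", "Error Handler",
    "Rate Limit", "Logging", "Security Headers", "CORS",
]
print(build_app(registration_order)([]))
```

Running it prints the names from `CORS` through `Endpoint`, matching the stack listed above.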

## Viewer authentication

| Variable | Default | Required when | Notes |
|---|---|---|---|
| `AUTH_MODE` | `none` | - | One of `none`, `basic`, `oidc`. `none` preserves today's behaviour. |
| `AUTH_SECRET_KEY` | - | `AUTH_MODE != none` | HMAC key for signing session and CSRF cookies. At least 32 characters; rotating it invalidates all sessions. |
| `AUTH_SESSION_TTL_SECONDS` | `28800` (8h) | - | Sliding re-issue at half-life. |
| `AUTH_SESSION_COOKIE_NAME` | `reflow_session` | - | CSRF companion cookie is named `<this>_csrf`. |
| `AUTH_COOKIE_SECURE` | `true` | - | Disable only for local HTTP dev. |
| `AUTH_BASIC_USERS` | - | `AUTH_MODE=basic` | Semicolon-separated `username:argon2hash` pairs (commas collide with argon2 parameter blocks). Generate hashes with `make auth-hash-password`. |
| `AUTH_OIDC_PROVIDERS` | - | `AUTH_MODE=oidc` | JSON array of `{id, display_name, discovery_url, client_id, client_secret, scopes?}` entries. The `id` field becomes the path segment in `/api/v1/auth/callback/{id}` and must match what's registered with the IdP. |
| `AUTH_POST_LOGIN_REDIRECT` | `/` | - | Where to send the browser after login when no `?next=` is present. |
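
As an illustration of why `AUTH_BASIC_USERS` uses semicolons, the sketch below parses a value in that format. The function `parse_basic_users` is hypothetical and the hashes are shortened placeholders, not real argon2 output; the application's actual parser may differ.

```python
def parse_basic_users(raw: str) -> dict[str, str]:
    """Parse semicolon-separated username:argon2hash pairs into a dict."""
    users: dict[str, str] = {}
    for pair in raw.split(";"):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first colon only; everything after it is the hash.
        username, sep, hashed = pair.partition(":")
        if not sep or not username or not hashed:
            # Keep the offending value out of the message so hashes never reach logs.
            raise ValueError("expected username:argon2hash")
        users[username] = hashed
    return users


# Placeholder hashes; note the commas inside the argon2 parameter block.
raw_value = (
    "alice:$argon2id$v=19$m=65536,t=3,p=4$c2FsdA$aGFzaA;"
    "bob:$argon2id$v=19$m=65536,t=3,p=4$c2FsdDI$aGFzaDI"
)
for name, hashed in parse_basic_users(raw_value).items():
    print(name, hashed)
```

Splitting on commas instead would cut each hash apart at `m=65536,t=3,p=4`, which is the collision the table note refers to.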

### Endpoints under `/api/v1/auth/*`

| Method & path | When available | Notes |
|---|---|---|
| `GET /auth/config` | always | Public. SPA reads on mount; reports `mode` and providers. |
| `POST /auth/login` | basic | JSON `{username, password}`. Sets session + CSRF cookies. |
| `GET /auth/login/{provider_id}` | oidc | 302 to IdP authorisation endpoint with PKCE. Sets the `reflow_oauth_tx` signed cookie carrying state, nonce, PKCE verifier, and `next_path`. |
| `GET /auth/callback/{provider_id}` | oidc | Validates state + tx cookie, exchanges code for tokens, validates ID token (signature, iss, aud, exp, nonce), sets session cookies, 302 to `next`. |
| `POST /auth/logout` | basic + oidc | CSRF required (`X-CSRF-Token` header). Clears cookies. |
| `GET /auth/me` | basic + oidc | Returns identity or 401. |

### Cookies

- `reflow_session`: `HttpOnly`, `Secure` (configurable), `SameSite=Lax`. Stateless signed cookie carrying `{sub, email, name, provider_id, issued_at, expires_at}`. The ID token itself is **not** stored in the cookie.
- `reflow_session_csrf`: NOT `HttpOnly`. HMAC of the session-cookie value. SPA echoes as `X-CSRF-Token` on non-GET requests under `/api/v1/auth/*`.
- `reflow_oauth_tx`: `HttpOnly`, `SameSite=Lax`, 10-minute TTL. Set during the OIDC kickoff route; carries the OAuth `state`, OIDC `nonce`, PKCE verifier, and the user's original `next_path`. Cleared by the callback route. Signed with a different `itsdangerous` salt from the session cookie so the two can never be confused.
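
The relationship between the session cookie and its CSRF companion can be sketched with the standard library. This is a minimal sketch under assumptions: the helper names are hypothetical, and SHA-256 with a hex digest is a guess at the encoding, which the application may implement differently (for example via `itsdangerous`).

```python
import hashlib
import hmac


def csrf_token_for(session_cookie: str, secret_key: str) -> str:
    """Derive the CSRF value as an HMAC of the session-cookie value."""
    return hmac.new(
        secret_key.encode(), session_cookie.encode(), hashlib.sha256
    ).hexdigest()


def csrf_is_valid(session_cookie: str, header_value: str, secret_key: str) -> bool:
    """Recompute the expected token and compare it in constant time."""
    expected = csrf_token_for(session_cookie, secret_key)
    return hmac.compare_digest(expected.encode(), header_value.encode())


secret = "x" * 32  # stands in for AUTH_SECRET_KEY
session = "signed-session-cookie-value"
token = csrf_token_for(session, secret)
print(csrf_is_valid(session, token, secret))
print(csrf_is_valid(session, "forged", secret))
```

Because the token is derived from the session cookie, the server needs no CSRF state of its own: it recomputes the HMAC on each non-GET request and compares it with the `X-CSRF-Token` header.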

### Audit logging

When auth is on, `LoggingMiddleware` adds `user_sub`, `user_email`, `user_provider` fields to every Response log line. Auth-state transitions emit a separate structured record with `category="auth_event"` covering `login_success`, `login_failure`, `logout`, `session_expired`. Failure reasons are categorical (`invalid_password`, `csrf_mismatch`, …): never the username attempted, never PII.
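
A structured auth event along these lines might look like the sketch below. Only `category="auth_event"`, the four event names, and the example reasons come from this page; the `event` and `reason` field names and the `build_auth_event` helper are assumptions for illustration.

```python
import json
from typing import Optional

AUTH_EVENTS = {"login_success", "login_failure", "logout", "session_expired"}


def build_auth_event(event: str, reason: Optional[str] = None) -> dict[str, str]:
    """Build a log record that carries a categorical reason and no user input."""
    if event not in AUTH_EVENTS:
        raise ValueError(f"unknown auth event: {event}")
    record = {"category": "auth_event", "event": event}
    if reason is not None:
        # Reasons are fixed category labels, never the attempted username.
        record["reason"] = reason
    return record


print(json.dumps(build_auth_event("login_failure", reason="invalid_password")))
```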

## Client IP extraction

The API key middleware handles reverse proxy setups (AWS ALB, Nginx, Cloudflare). Priority order:

1. `X-Forwarded-For`: take the first IP
2. `X-Real-IP`
3. `request.client.host`: direct connection

Extracted IPs are included in authentication logs for audit trails.
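
The priority order above can be sketched as a small function. It is an illustration only: `extract_client_ip` is a hypothetical name, and it takes a plain header mapping plus the peer address instead of a framework request object.

```python
from typing import Mapping, Optional


def extract_client_ip(
    headers: Mapping[str, str], peer_host: Optional[str]
) -> Optional[str]:
    """Resolve the client IP: X-Forwarded-For, then X-Real-IP, then the peer."""
    # Header names are case-insensitive, so normalise before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}

    # X-Forwarded-For is a comma-separated chain; the first entry is the client.
    first_hop = normalized.get("x-forwarded-for", "").split(",")[0].strip()
    if first_hop:
        return first_hop

    real_ip = normalized.get("x-real-ip", "").strip()
    if real_ip:
        return real_ip

    # No proxy headers: fall back to the direct connection.
    return peer_host


print(extract_client_ip({"X-Forwarded-For": "203.0.113.7, 10.0.0.1"}, "10.0.0.2"))
print(extract_client_ip({"X-Real-IP": "198.51.100.4"}, "10.0.0.2"))
print(extract_client_ip({}, "192.0.2.9"))
```

Both headers are supplied by whoever sends the request, so the extracted value is only trustworthy when a proxy you control sets or overwrites them.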

## Generating API keys

```bash
python3 -c "import uuid; print(f'uic-{uuid.uuid4()}')"
```

## Testing

Unit tests: `tests/unit/middleware/test_api_key_auth.py`, `tests/unit/services/test_job_service.py::TestStreamTokens`

Integration tests: `tests/integration/api/test_api_authentication.py`, `tests/integration/api/test_stream_auth.py`

Quick manual check (requires `make dev`):

```bash
# No key → 401
curl http://localhost:8080/api/v1/documents/test-id

# Valid key → 200 (or 404 if job not found)
curl -H "X-API-Key: $API_KEY" http://localhost:8080/api/v1/documents/test-id

# Swagger UI: public, no prompt
open http://localhost:8080/docs

# Health: public
curl http://localhost:8080/health
```