# Authentication

The API implements two independent authentication layers: API key authentication for programmatic access to endpoints, and HTTP Basic authentication for the API documentation.

## API Key Authentication

**Purpose:** Secure API endpoints for programmatic access
**Header:** `X-API-Key` (configurable via `API_KEY_HEADER_NAME`)
**Implementation:** `src/middleware/api_key_auth.py`

### Configuration

```bash
# .env
ENABLE_API_KEY_AUTH=true
API_KEY_HEADER_NAME=X-API-Key
API_KEYS=uic-2bd2c716-bc67-4032-ba66-e4f35c441759
```

### Public Endpoints (no API key required)

- `/health`, `/health/ready`, `/metrics` - Always public for monitoring
- `/api/dev/monitoring/*` - Public in dev environment only
- `/api/v1/documents/{job_id}/stream?token=...` - Stream endpoints with valid token (see below)
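
A minimal sketch of how the exemption rules above could be expressed, assuming the environment is available as a plain string such as `"dev"`. The names `PUBLIC_PATHS`, `DEV_ONLY_PREFIX`, and `is_public_path` are hypothetical and are not taken from `src/middleware/api_key_auth.py`. Stream endpoints are left out on purpose because they still need their token validated.

```python
# Hypothetical names for illustration; the real middleware may differ.
PUBLIC_PATHS = {"/health", "/health/ready", "/metrics"}
DEV_ONLY_PREFIX = "/api/dev/monitoring/"


def is_public_path(path: str, environment: str) -> bool:
    """Return True when a request path may skip the API key check."""
    # Monitoring endpoints are public in every environment.
    if path in PUBLIC_PATHS:
        return True
    # Dev monitoring routes are public only when running in dev
    # (the "dev" environment label is an assumption).
    if environment == "dev" and path.startswith(DEV_ONLY_PREFIX):
        return True
    return False
```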

### Stream Token Authentication

**Purpose:** Allow browser EventSource connections without exposing API key in URLs

The browser's native `EventSource` API cannot send custom headers, so it cannot carry the API key header. Stream tokens provide a secure alternative:

1. Client requests token via `POST /api/v1/documents/{job_id}/stream/token` (with API key header)
2. Server returns short-lived, single-use token (5-minute TTL)
3. Client connects to `GET /api/v1/documents/{job_id}/stream?token={token}`
4. Token is consumed on first use (GETDEL in Redis)

**Security properties:**
- Single-use: Token deleted after first validation
- Job-scoped: Token only valid for specific job_id
- Short TTL: 5-minute expiration via Redis
- Not logged: Short-lived tokens are less sensitive than API keys

**Implementation:** `src/services/job_service.py` (create/validate), `src/middleware/api_key_auth.py` (bypass)
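
The token lifecycle described above can be sketched with an in-memory dictionary standing in for Redis. This is an illustration under stated assumptions, not the code in `src/services/job_service.py`: the class name `InMemoryStreamTokenStore`, its methods, and the use of `secrets.token_urlsafe` are all hypothetical. `dict.pop` plays the role of `GETDEL` by reading and deleting the token in one step.

```python
import secrets
import time


class InMemoryStreamTokenStore:
    """Hypothetical stand-in for Redis: maps token -> (job_id, expiry time)."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds  # 5-minute TTL, as in the flow above
        self._clock = clock      # injectable so expiry can be tested
        self._tokens = {}

    def create(self, job_id):
        """Issue a random token bound to one job."""
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (job_id, self._clock() + self._ttl)
        return token

    def consume(self, token, job_id):
        """Validate a token exactly once; any second attempt fails."""
        # pop() mirrors GETDEL: the entry is removed on first lookup,
        # even if the later checks reject it.
        entry = self._tokens.pop(token, None)
        if entry is None:
            return False
        stored_job_id, expires_at = entry
        if self._clock() >= expires_at:
            return False
        # Job-scoped: the token is only valid for the job it was issued for.
        return stored_job_id == job_id
```

With a real Redis backend the expiry would typically be enforced by the key's TTL rather than a stored timestamp.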

### Approval Endpoints Security

Approval endpoints (`/api/v1/approval/*`) require BOTH:
1. **API Key** (Layer 1) - Ensures only authorized systems (UIC infrastructure) can make requests
2. **Approval Token** (Layer 2) - Ensures the requester has permission for the specific job

Both checks must pass before access is granted (defense in depth).

### Security Features

- Constant-time comparison (`secrets.compare_digest()`) prevents timing attacks
- Multiple keys supported (comma-separated in env var)
- Keys stored as `SecretStr` to prevent accidental logging
- Whitespace automatically stripped from configured keys
- **API keys cached at middleware initialization** for optimal performance (loaded once, not per-request)
- Authentication events logged for security auditing (missing/invalid keys)
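
As a rough illustration of the key-handling points above, the sketch below parses a comma-separated value once and then checks a candidate key with `secrets.compare_digest()`. The function names `parse_api_keys` and `is_valid_api_key` are hypothetical, and the real middleware additionally wraps keys in `SecretStr`, which is omitted here to keep the example free of third-party imports.

```python
import secrets
from typing import List, Optional


def parse_api_keys(raw: str) -> List[bytes]:
    """Split a comma-separated value, strip whitespace, drop empty entries."""
    return [part.strip().encode("utf-8") for part in raw.split(",") if part.strip()]


def is_valid_api_key(candidate: Optional[str], valid_keys: List[bytes]) -> bool:
    """Check a candidate against every configured key in constant time per key."""
    if not candidate:
        return False
    candidate_bytes = candidate.encode("utf-8")
    matched = False
    for key in valid_keys:
        # No early exit: every configured key is compared, so the loop's
        # duration does not reveal which key (if any) matched.
        if secrets.compare_digest(candidate_bytes, key):
            matched = True
    return matched
```

Parsing once at startup and reusing the resulting list corresponds to the caching behavior noted above.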

### Client IP Extraction

Both authentication middleware implementations support reverse proxy setups:
- Checks `X-Forwarded-For` header (load balancers, CDNs)
- Falls back to `X-Real-IP` header (nginx)
- Uses direct connection IP as final fallback
- Extracted IP included in authentication logs for audit trails
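
The fallback chain above could look roughly like the following. The function name `extract_client_ip` is hypothetical, and taking the first entry of `X-Forwarded-For` is an assumption based on the common convention that the left-most address is the original client; the source does not say which entry the middleware uses.

```python
from typing import Mapping, Optional


def extract_client_ip(headers: Mapping[str, str], direct_ip: Optional[str]) -> Optional[str]:
    """Pick the client IP from proxy headers, else the direct connection."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}

    # X-Forwarded-For may carry a chain such as "client, proxy1, proxy2".
    # Assumption: the first entry is treated as the client.
    first_hop = lowered.get("x-forwarded-for", "").split(",")[0].strip()
    if first_hop:
        return first_hop

    # X-Real-IP is typically set by nginx.
    real_ip = lowered.get("x-real-ip", "").strip()
    if real_ip:
        return real_ip

    # Final fallback: the address of the direct TCP peer.
    return direct_ip
```

Both headers can be set by the client unless a trusted proxy overwrites them, so a value obtained this way is suitable for audit logs rather than for access decisions.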

### Generating API Keys

```bash
python3 -c "import uuid; print(f'uic-{uuid.uuid4()}')"
```

## Swagger/Docs Authentication (HTTP Basic)

**Purpose:** Protect API documentation in production
**Endpoints:** `/docs`, `/openapi.json`, `/redoc`
**Implementation:** `src/middleware/docs_auth.py`

### Configuration

```bash
# .env
ENABLE_DOCS_AUTH=true
DOCS_USERNAME=dase
DOCS_PASSWORD=a11y
```

Environment-dependent: disabled in dev for easy testing, enabled in prod.

### Security Features

- HTTP Basic Authentication (RFC 7617)
- Constant-time credential comparison
- `WWW-Authenticate` header triggers browser login prompt
- Password stored as `SecretStr`
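
A minimal sketch of the credential check, assuming the middleware receives the raw `Authorization` header value. The function name `check_basic_auth` is hypothetical and this is not the code in `src/middleware/docs_auth.py`. On failure, a real middleware would respond with 401 and a `WWW-Authenticate: Basic` header so the browser shows its login prompt.

```python
import base64
import secrets
from typing import Optional


def check_basic_auth(authorization: Optional[str], username: str, password: str) -> bool:
    """Validate an HTTP Basic Authorization header against expected credentials."""
    if not authorization:
        return False
    scheme, _, encoded = authorization.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 (binascii.Error) and invalid UTF-8.
        return False
    # RFC 7617: the user-id cannot contain ":", so split on the first one.
    supplied_user, separator, supplied_password = decoded.partition(":")
    if not separator:
        return False
    # Evaluate both comparisons before combining them, so a wrong username
    # does not skip the password comparison.
    user_ok = secrets.compare_digest(supplied_user.encode("utf-8"), username.encode("utf-8"))
    password_ok = secrets.compare_digest(supplied_password.encode("utf-8"), password.encode("utf-8"))
    return user_ok and password_ok
```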

## Middleware Stack Order

Middleware executes in reverse order of registration (last added = first executed):

```python
# src/main.py execution order:
# 1. CORS
# 2. Logging
# 3. Rate Limit
# 4. Error Handler
# 5. Docs Auth
# 6. API Key Auth
# 7. Endpoint
```
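
The "last added = first executed" rule follows from each newly added middleware wrapping everything registered before it. The toy model below shows the effect without any framework. `ToyApp` and `make_middleware` are hypothetical, and the registration order shown is simply the reverse of the execution order listed above, not a copy of `src/main.py`.

```python
calls = []


def make_middleware(name):
    """Build a middleware that records its name, then calls the next handler."""
    def middleware(next_handler):
        def handler(request):
            calls.append(name)
            return next_handler(request)
        return handler
    return middleware


class ToyApp:
    """Toy stand-in for a web framework's middleware stack."""

    def __init__(self, endpoint):
        self._endpoint = endpoint
        self._registered = []

    def add_middleware(self, middleware):
        self._registered.append(middleware)

    def build(self):
        handler = self._endpoint
        # Each middleware wraps what came before it, so the last one
        # registered ends up outermost and runs first.
        for middleware in self._registered:
            handler = middleware(handler)
        return handler


def endpoint(request):
    calls.append("Endpoint")
    return "ok"


app = ToyApp(endpoint)
# Registered in the reverse of the desired execution order.
for name in ["API Key Auth", "Docs Auth", "Error Handler", "Rate Limit", "Logging", "CORS"]:
    app.add_middleware(make_middleware(name))

result = app.build()("request")
print(calls)
# → ['CORS', 'Logging', 'Rate Limit', 'Error Handler', 'Docs Auth', 'API Key Auth', 'Endpoint']
```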

## Testing Authentication

**Unit Tests:**
- `tests/unit/middleware/test_api_key_auth.py` - API key middleware
- `tests/unit/middleware/test_docs_auth.py` - Docs auth middleware
- `tests/unit/services/test_job_service.py::TestStreamTokens` - Stream token methods

**Integration Tests:**
- `tests/integration/api/test_api_authentication.py` - Full auth flow
- `tests/integration/api/test_stream_auth.py` - Stream token authentication

**Manual Testing:**

```bash
# Start services with auth enabled
make dev

# Test API endpoint without key (should fail)
curl http://localhost:8080/api/v1/documents/test-id
# → 401 Unauthorized

# Test with valid key (should work or 404 if job not found)
curl -H "X-API-Key: uic-2bd2c716-bc67-4032-ba66-e4f35c441759" \
  http://localhost:8080/api/v1/documents/test-id

# Test Swagger UI (should prompt for username/password)
open http://localhost:8080/docs
# → Browser login prompt: dase / a11y

# Test public endpoints still work
curl http://localhost:8080/health
# → 200 OK (no auth required)
```

## Important Notes

**Integration tests require authentication to be enabled** in `.env`:

```bash
# .env (required for integration tests)
ENABLE_API_KEY_AUTH=true
ENABLE_DOCS_AUTH=true
API_KEYS=uic-2bd2c716-bc67-4032-ba66-e4f35c441759
DOCS_USERNAME=dase
DOCS_PASSWORD=a11y
```

The authentication integration tests use the app instance from `src.main`, which is configured once at startup from the `.env` settings. If authentication is disabled there, these tests fail, so always keep it enabled in `.env`.