# Rate Limiting Documentation

## Overview

The Equalify PDF Converter implements **Redis-based sliding window rate limiting** to prevent abuse and control processing costs while maintaining system availability.

## Use Case Context

- **Primary Users**: UIC faculty and accessibility team (NOT public API)
- **Traffic Pattern**: Batch uploads of course materials
- **Cost Sensitivity**: ~$0.20/document AI processing cost
- **Processing Time**: 2-8 minutes per document
- **Critical Requirement**: Prevent accidental cost overruns from automated scripts or misconfigured integrations

## Rate Limit Tiers

### 1. Per-IP Submission Limits
**Limit**: 10 submissions per hour per IP address

**Purpose**: Prevent individual user abuse or runaway scripts

**Use Case**: Faculty member accidentally starts infinite upload loop

**Response**: HTTP 429 with `Retry-After` header

### 2. Per-IP Status Check Limits
**Limit**: 100 status checks per hour per IP address

**Purpose**: Prevent aggressive polling storms

**Use Case**: Frontend with 1-second polling interval

**Response**: HTTP 429 with `Retry-After` header

### 3. Global Submission Limits
**Limit**: 1000 submissions per day (system-wide)

**Purpose**: Cost control and capacity planning

**Use Case**: Prevent system-wide processing cost explosion

**Response**: HTTP 429 with `Retry-After` header
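
As a rough illustration of how the three tiers interact, the sketch below treats each tier as a (limit, window, scope) record and reports the first tier a request would trip. The `Tier` class, the `TIERS` table, and the `first_exceeded` helper are hypothetical names for this example, not the service's actual API; the limits and the ~$0.20/document cost come from this document.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Tier:
    name: str
    limit: int
    window_seconds: int
    scope: str  # "ip" or "global"


# Hypothetical table mirroring the three tiers described above.
TIERS = {
    "submit": [
        Tier("submit_per_ip", 10, 3600, "ip"),
        Tier("submit_global", 1000, 86400, "global"),
    ],
    "status": [Tier("status_per_ip", 100, 3600, "ip")],
}


def first_exceeded(action: str, counts: Dict[str, int]) -> Optional[Tier]:
    """Return the first tier whose current count has reached its limit, else None."""
    for tier in TIERS[action]:
        if counts.get(tier.name, 0) >= tier.limit:
            return tier
    return None


# Worst-case daily AI spend implied by the global cap:
# 1000 documents at roughly $0.20 each.
MAX_DAILY_COST_USD = 1000 * 0.20
```

Under these assumptions a submission is checked against both the per-IP and the global tier, while a status check only counts against its per-IP tier. The global cap bounds AI processing spend at roughly $200 per day.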

## Implementation Details

### Algorithm: Sliding Window

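The limiter uses Redis sorted sets with request timestamps as scores. A sketch of the check using the `redis-py` asyncio client:

```python
import time
import uuid

import redis.asyncio as redis


async def is_allowed(r: redis.Redis, key: str, limit: int, window: int) -> bool:
    now = time.time()
    # Remove entries that have fallen outside the time window
    await r.zremrangebyscore(key, 0, now - window)
    # Count the entries that remain
    count = await r.zcard(key)
    if count >= limit:
        return False  # DENIED
    # Under the limit: record this request, scored by its timestamp
    await r.zadd(key, {str(uuid.uuid4()): now})
    await r.expire(key, window)  # Let idle keys expire on their own
    return True  # ALLOWED
```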

**Advantages**:
- Precise sliding window (not fixed buckets)
- Distributed rate limiting across multiple ECS tasks
- Automatic cleanup via ZREMRANGEBYSCORE
- Atomic when the commands run in a single `MULTI`/`EXEC` transaction or Lua script, which prevents race conditions between concurrent tasks
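
To see the sliding behavior without a Redis instance, the following minimal sketch mimics the sorted set with a sorted list of timestamps per key. `InMemorySlidingWindow` is a hypothetical stand-in written for illustration; it assumes timestamps are passed in non-decreasing order and it is not distributed or thread-safe.

```python
from bisect import bisect_right
from typing import Dict, List


class InMemorySlidingWindow:
    """Illustrative stand-in for the Redis sorted set (single process only)."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window = window_seconds
        self.entries: Dict[str, List[float]] = {}  # key -> sorted timestamps

    def allow(self, key: str, now: float) -> bool:
        stamps = self.entries.setdefault(key, [])
        # Equivalent of ZREMRANGEBYSCORE key 0 window_start (inclusive)
        cutoff = bisect_right(stamps, now - self.window)
        del stamps[:cutoff]
        # Equivalent of ZCARD key
        if len(stamps) >= self.limit:
            return False
        # Equivalent of ZADD key now request_id
        stamps.append(now)
        return True


limiter = InMemorySlidingWindow(limit=3, window_seconds=60)
results = [limiter.allow("ip:203.0.113.1", t) for t in (0, 10, 20, 30, 60, 61, 70)]
# results == [True, True, True, False, True, False, True]
```

The request at `t=60` succeeds because the entry from `t=0` has just left the window, while the one at `t=61` is denied because the entries from `t=10`, `t=20`, and `t=60` are all still inside it. A fixed-bucket limiter would instead reset all three slots at once.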

### Fail-Open Philosophy

**Critical Design Decision**: Rate limiter **fails open** when Redis is unavailable.

**Rationale**:
- Availability > strict rate limiting for academic use case
- Faculty deadlines are inflexible
- Rate limiter should not be single point of failure
- Monitoring will alert on Redis issues

**Implementation**:
```python
try:
    rate_limiter = get_rate_limit_service()
    allowed, retry_after = await rate_limiter.check_submit_rate_limit(ip)
    if not allowed:
        return 429
except Exception:
    logger.error("Rate limiter unavailable")
    return ALLOW_REQUEST  # Fail open
```

## API Response Headers

### Rate Limit Information Headers

All rate-limited endpoints include these headers:

```http
X-RateLimit-Limit: 10
X-RateLimit-Remaining: 7
X-RateLimit-Reset: 1704124800
```

**Header Descriptions**:
- `X-RateLimit-Limit`: Maximum requests allowed in window
- `X-RateLimit-Remaining`: Requests remaining in current window
- `X-RateLimit-Reset`: Unix timestamp when limit resets
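
As an illustration, these headers could be derived from a limit, a usage count, and a reset time as follows. The `rate_limit_headers` helper is a hypothetical name for this sketch, not necessarily how the middleware builds them.

```python
from typing import Dict


def rate_limit_headers(limit: int, used: int, reset_at: float) -> Dict[str, str]:
    """Build the X-RateLimit-* headers; 'remaining' never goes below zero."""
    remaining = max(limit - used, 0)
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(remaining),
        "X-RateLimit-Reset": str(int(reset_at)),  # Unix timestamp, whole seconds
    }


headers = rate_limit_headers(limit=10, used=3, reset_at=1704124800)
# headers["X-RateLimit-Remaining"] == "7"
```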

### Rate Limit Exceeded Response

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 300
X-RateLimit-Remaining: 0

{
  "detail": "Rate limit exceeded for submission",
  "retry_after": 300,
  "limit_type": "submission"
}
```
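
On the client side, a caller can use this response to decide how long to pause before retrying. The sketch below is one possible approach; `seconds_to_wait` is a hypothetical helper, it assumes headers arrive as a plain dict with the exact casing shown above, and it only understands the delta-seconds form of `Retry-After` (an HTTP-date value falls back to the default).

```python
from typing import Any, Mapping, Optional


def seconds_to_wait(
    status_code: int,
    headers: Mapping[str, str],
    body: Optional[Mapping[str, Any]] = None,
    default: int = 60,
) -> int:
    """Return how many seconds to wait before retrying (0 if not rate limited)."""
    if status_code != 429:
        return 0
    raw = headers.get("Retry-After")
    if raw is None and body is not None:
        raw = body.get("retry_after")  # fall back to the JSON payload
    try:
        return max(int(raw), 0)
    except (TypeError, ValueError):
        return default  # header missing or not a plain integer


wait = seconds_to_wait(429, {"Retry-After": "300"})
# wait == 300
```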

## Exempt Endpoints

The following endpoints are **NOT** rate limited:

- `/health` - Health checks
- `/health/ready` - Readiness checks
- `/` - Root endpoint
- `/docs` - OpenAPI documentation
- `/redoc` - ReDoc documentation
- `/openapi.json` - OpenAPI schema

**Rationale**: Infrastructure monitoring and documentation must always be available.
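
A minimal sketch of the exemption check, assuming exact path matching with a tolerated trailing slash. Both the matching rule and the `is_exempt` name are assumptions for illustration; the actual middleware may match differently.

```python
EXEMPT_PATHS = frozenset(
    {"/health", "/health/ready", "/", "/docs", "/redoc", "/openapi.json"}
)


def is_exempt(path: str) -> bool:
    """Return True if the request path should bypass rate limiting."""
    normalized = path.rstrip("/") or "/"  # "/health/" -> "/health", "/" stays "/"
    return normalized in EXEMPT_PATHS
```

Exact matching keeps look-alike paths such as `/healthz` or `/docs-internal` subject to the normal limits.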

## Redis Data Structures

### Rate Limit Keys

```
# Per-IP submission limits
eq-pdf:ratelimit:submit:ip:{client_ip}

# Per-IP status check limits
eq-pdf:ratelimit:status:ip:{client_ip}

# Global submission limits
eq-pdf:ratelimit:submit:global
```
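
The key layout above can be expressed as a small helper. `rate_limit_key` is a hypothetical function name used for this sketch; the prefix and key shapes are the ones listed above.

```python
from typing import Optional

KEY_PREFIX = "eq-pdf:ratelimit"


def rate_limit_key(action: str, client_ip: Optional[str] = None) -> str:
    """Build a per-IP key when client_ip is given, otherwise the global key."""
    if client_ip is None:
        return f"{KEY_PREFIX}:{action}:global"
    return f"{KEY_PREFIX}:{action}:ip:{client_ip}"


per_ip_key = rate_limit_key("submit", "192.168.1.1")
# per_ip_key == "eq-pdf:ratelimit:submit:ip:192.168.1.1"
global_key = rate_limit_key("submit")
# global_key == "eq-pdf:ratelimit:submit:global"
```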

### Data Format

```python
# Sorted set with timestamp scores
ZADD eq-pdf:ratelimit:submit:ip:192.168.1.1 1704120000.123 "req-1"
ZADD eq-pdf:ratelimit:submit:ip:192.168.1.1 1704120005.456 "req-2"
ZADD eq-pdf:ratelimit:submit:ip:192.168.1.1 1704120010.789 "req-3"

# Automatic expiration (cleanup)
EXPIRE eq-pdf:ratelimit:submit:ip:192.168.1.1 3600
```

## Client IP Detection

### Reverse Proxy Support

Rate limiting correctly handles reverse proxies (AWS ALB, Nginx, Cloudflare):

```python
# Priority order:
# 1. X-Forwarded-For: 203.0.113.1, 198.51.100.1  (take the first IP)
# 2. X-Real-IP: 203.0.113.1
# 3. request.client.host  (direct connection)
```

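**Security Note**: `X-Forwarded-For` can be set by any client, so its first entry is only trustworthy when every request arrives through an ALB/reverse proxy that sets or sanitizes the header. In production, configure the proxy accordingly and block direct access to the service.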
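
The priority order can be sketched as a plain function. `get_client_ip` is a hypothetical name, and the sketch assumes header names have already been lower-cased into a dict; a framework request object with case-insensitive headers would work the same way.

```python
from typing import Mapping


def get_client_ip(headers: Mapping[str, str], peer_host: str) -> str:
    """Resolve the client IP using the priority order described above."""
    # 1. First (left-most) entry of X-Forwarded-For
    forwarded = headers.get("x-forwarded-for", "")
    first = forwarded.split(",")[0].strip()
    if first:
        return first
    # 2. X-Real-IP
    real_ip = headers.get("x-real-ip", "").strip()
    if real_ip:
        return real_ip
    # 3. Address of the direct connection
    return peer_host


ip = get_client_ip({"x-forwarded-for": "203.0.113.1, 198.51.100.1"}, "10.0.0.5")
# ip == "203.0.113.1"
```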

## Administrative Operations

### Reset Rate Limit for User

```python
from src.dependencies import get_rate_limit_service

rate_limiter = get_rate_limit_service()

# Run inside an async context (e.g. a REPL started with `python -m asyncio`)
# Reset submission limit for a specific IP
await rate_limiter.reset_rate_limit("192.168.1.100", "submit")

# Reset status check limit
await rate_limiter.reset_rate_limit("192.168.1.100", "status")
```

### Check Remaining Quota

```python
# Get quota info for IP
quota = await rate_limiter.get_remaining_quota("192.168.1.100", "submit")

# Returns:
{
    "limit": 10,
    "remaining": 3,
    "reset_at": 1704124800,
    "window_seconds": 3600
}
```
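
For intuition, a quota dict of this shape could be derived from the timestamps stored in the sorted set. The sketch below is an assumption about the semantics, in particular that `reset_at` means "when the oldest request in the window expires"; `remaining_quota` is a hypothetical helper and the service's own calculation may differ.

```python
from typing import Dict, Iterable


def remaining_quota(
    timestamps: Iterable[float], limit: int, window_seconds: int, now: float
) -> Dict[str, int]:
    """Summarize quota from request timestamps (assumed reset_at semantics)."""
    # Only requests newer than the window start still count.
    live = [t for t in timestamps if t > now - window_seconds]
    oldest = min(live) if live else now
    return {
        "limit": limit,
        "remaining": max(limit - len(live), 0),
        "reset_at": int(oldest + window_seconds),
        "window_seconds": window_seconds,
    }


stamps = [1704121200 + i for i in range(7)]  # seven requests, one second apart
quota = remaining_quota(stamps, limit=10, window_seconds=3600, now=1704121300)
# quota == {"limit": 10, "remaining": 3, "reset_at": 1704124800, "window_seconds": 3600}
```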

## Configuration

### Environment Variables

```bash
# Redis connection (required for rate limiting)
REDIS_URL=redis://redis:6379
REDIS_MAX_CONNECTIONS=10
```

### Adjusting Limits

Limits are configured in `src/services/rate_limit_service.py`:

```python
class RateLimitService:
    # Adjust these values based on usage patterns
    SUBMIT_PER_IP_LIMIT = 10        # Submissions per hour per IP
    SUBMIT_PER_IP_WINDOW = 3600     # 1 hour

    STATUS_PER_IP_LIMIT = 100       # Status checks per hour per IP
    STATUS_PER_IP_WINDOW = 3600     # 1 hour

    GLOBAL_SUBMIT_LIMIT = 1000      # Global submissions per day
    GLOBAL_SUBMIT_WINDOW = 86400    # 24 hours
```

## Monitoring

### Key Metrics to Track

1. **Rate Limit Hits**: How often 429 responses occur
2. **Top Rate-Limited IPs**: Identify problematic clients
3. **Global Quota Usage**: Track toward daily limit
4. **Redis Availability**: Monitor fail-open scenarios

### Recommended CloudWatch Alarms

```yaml
- Metric: HTTP 429 responses > 100/hour
  Action: Alert operations team

- Metric: Global quota > 80% of daily limit
  Action: Alert stakeholders

- Metric: Rate limiter exceptions > 10/hour
  Action: Check Redis health
```
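
The thresholds above can also be written as a plain check, for example to unit test alerting logic before wiring it into CloudWatch. This is only a sketch: `triggered_alarms` and its parameter names are hypothetical, and the real alarms would be defined on CloudWatch metrics rather than in application code.

```python
from typing import List

GLOBAL_DAILY_LIMIT = 1000  # matches the global submission limit


def triggered_alarms(
    http_429_last_hour: int,
    global_submissions_today: int,
    limiter_errors_last_hour: int,
) -> List[str]:
    """Return the actions whose thresholds (as listed above) are exceeded."""
    actions = []
    if http_429_last_hour > 100:
        actions.append("Alert operations team")
    # "> 80% of daily limit", using integer math to avoid float rounding
    if global_submissions_today * 100 > 80 * GLOBAL_DAILY_LIMIT:
        actions.append("Alert stakeholders")
    if limiter_errors_last_hour > 10:
        actions.append("Check Redis health")
    return actions
```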

## Testing

### Unit Tests

Located in `tests/services/test_rate_limit_service.py`:

- Sliding window algorithm correctness
- Per-IP and global limit enforcement
- Fail-open behavior on Redis errors
- Quota calculation accuracy

### Integration Tests

Located in `tests/test_rate_limit_middleware.py`:

- Middleware installation verification
- Exempt endpoint behavior
- Fail-open on Redis unavailability

### Manual Testing

```bash
# Test submission rate limit (11 requests should trigger limit)
for i in {1..11}; do
  curl -X POST http://localhost:8080/api/v1/documents/submit \
    -F "file=@test.pdf" \
    -H "X-Forwarded-For: 203.0.113.1" \
    -w "\n%{http_code}\n"
done

# Expected: First 10 succeed (201), 11th fails (429)
```

## Production Recommendations

### 1. Enable Redis Persistence

Rate limiting uses ephemeral data, but consider enabling persistence so the counters used for monitoring survive a Redis restart:

```yaml
# redis.conf
# Snapshot every 60s if at least 1 key changed
save 60 1
```

### 2. Configure Trusted Proxies

Ensure ALB/reverse proxy sets correct `X-Forwarded-For`:

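```python
# AWS ALB sets X-Forwarded-For automatically. By default it appends the
# connecting client's IP to any value the client already sent, so make sure
# clients cannot reach the service without going through the ALB.
```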

### 3. Monitor and Adjust

- **Week 1**: Log all rate limit hits without blocking
- **Week 2-4**: Gradually enforce limits based on actual usage
- **Ongoing**: Adjust limits based on semester patterns
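
One way to support the log-only phase is a single enforcement switch, sketched below. The `should_proceed` helper and its `enforce` flag are hypothetical; the document does not describe how the service toggles enforcement.

```python
import logging

logger = logging.getLogger("rate_limit")


def should_proceed(allowed: bool, enforce: bool, client_ip: str = "unknown") -> bool:
    """Decide whether a request continues, given the limiter's verdict."""
    if allowed:
        return True
    # Always record the hit so usage can be analyzed before enforcing.
    logger.warning("Rate limit exceeded for %s (enforce=%s)", client_ip, enforce)
    # Log-only mode lets the request through; enforce mode blocks it.
    return not enforce
```

During Week 1 the flag would stay off so hits are logged but never blocked; turning it on later changes behavior without touching the limiter itself.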

### 4. Whitelist Internal Systems

For automated integrations:

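```python
# Add to middleware:
import ipaddress

WHITELISTED_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]  # Internal network

# Compare as parsed addresses; a plain string check cannot match a CIDR range
if any(ipaddress.ip_address(client_ip) in net for net in WHITELISTED_NETWORKS):
    return await call_next(request)
```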

## Future Enhancements

1. **User-Based Rate Limiting**: Limit by authenticated user ID instead of IP
2. **Dynamic Limits**: Adjust limits based on system load
3. **Rate Limit Bypass Tokens**: Administrative bypass for urgent requests
4. **Distributed Rate Limiting**: Share limits across multiple data centers
5. **Analytics Dashboard**: Real-time rate limit visualization

## FAQ

### Q: What happens if Redis goes down?

**A**: Rate limiter fails open - all requests are allowed. CloudWatch alarm should alert operations team.

### Q: Can users see their rate limit status?

**A**: Yes, via the `X-RateLimit-*` headers on every response from a rate-limited endpoint.

### Q: How do I temporarily increase limits for a user?

**A**: Use administrative `reset_rate_limit()` function or whitelist their IP.

### Q: Do rate limits affect health checks?

**A**: No, health checks are explicitly exempted.

### Q: What if multiple users share an IP (NAT)?

**A**: Consider implementing user-based rate limiting (future enhancement).

### Q: How accurate is the sliding window?

**A**: Very accurate. Each request is recorded with its actual timestamp, so the window slides continuously instead of resetting at fixed bucket boundaries.

---

## Implementation Files

- Service: `src/services/rate_limit_service.py`
- Middleware: `src/middleware/rate_limit.py`
- Tests: `tests/services/test_rate_limit_service.py`
- Dependency Injection: `src/dependencies.py::get_rate_limit_service`

## Related Documentation

- [Architecture Overview](../docs/architecture.md)
- [Redis Data Structures](../docs/redis-schema.md)
- [API Documentation](http://localhost:8080/docs)