Docker Architecture
Emblema uses a modular Docker architecture based on multi-stage builds and Docker Compose orchestration to ensure scalability, security, and maintainability across the entire platform.
Architectural Overview
Container Design Patterns
Multi-Stage Build Strategy
All Emblema containers use multi-stage builds to optimize the resulting images:
Frontend Containers (Node.js/Next.js)
# Example from apps/www-emblema/Dockerfile
FROM node:20-alpine AS base
ENV APP_NAME=www-emblema
RUN apk add --no-cache curl bind-tools
FROM base AS builder
RUN apk add --no-cache libc6-compat
WORKDIR /app
RUN yarn global add turbo
COPY . .
RUN turbo prune ${APP_NAME} --docker
FROM base AS installer
WORKDIR /app
COPY --from=builder /app/out/json/ .
RUN corepack enable pnpm && pnpm i --frozen-lockfile
COPY --from=builder /app/out/full/ .
RUN pnpm turbo build --filter=${APP_NAME}...
FROM base AS runner
RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 nextjs
USER nextjs
COPY --from=installer --chown=nextjs:nodejs /app/apps/${APP_NAME}/.next/standalone ./
EXPOSE 3000
CMD ["node", "apps/www-emblema/server.js"]
Benefits:
- Optimized final image (~200MB vs ~2GB)
- Security via a non-privileged user
- Next.js output tracing for performance
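The standalone bundle copied by the runner stage relies on Next.js output tracing being enabled in the app itself (a minimal sketch, assuming a standard Next.js setup; not taken verbatim from the repo):

```javascript
// next.config.js — enable output tracing so `.next/standalone`
// contains a self-contained server bundle (apps/<name>/server.js)
module.exports = {
  output: 'standalone',
};
```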
Python AI Services
# Example from apps/background-task/Dockerfile
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04 AS base
# Stage 1: system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
curl libgomp1 libmagic1 poppler-utils libreoffice \
pandoc tesseract-ocr ffmpeg imagemagick \
# Playwright dependencies
libnss3 libnspr4 libatk1.0-0 fonts-liberation \
&& rm -rf /var/lib/apt/lists/*
# Stage 2: Python dependencies with uv
FROM base AS deps
COPY --from=ghcr.io/astral-sh/uv:0.7.4 /uv /bin/uv
COPY ${APP_PATH}/pyproject.toml ${APP_PATH}/uv.lock /app/
WORKDIR /app
RUN uv sync --frozen --no-cache
RUN /app/.venv/bin/playwright install chromium
# Stage 3: Final runtime
FROM base
COPY --from=deps /app/.venv /app/.venv
COPY --from=deps /root/.cache/ms-playwright /root/.cache/ms-playwright
COPY ${APP_PATH}/app /app/app
ENV PATH="/app/.venv/bin:$PATH"
CMD ["fastapi", "run", "app/main.py", "--port", "80"]
Key characteristics:
- CUDA base image for GPU workloads
- Dependency management with uv (fast)
- Shared cache for Playwright browsers
- Multi-GPU support with resource constraints
Security Hardening
# Security pattern applied to all containers
RUN addgroup --system --gid 1001 appgroup
RUN adduser --system --uid 1001 appuser
USER appuser
# Read-only filesystem where possible
RUN chmod -R 755 /app && chown -R appuser:appgroup /app
# Minimize the attack surface
RUN rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
Docker Compose Orchestration
Modular Architecture
Emblema uses a modular structure with 8+ compose files:
# docker-compose.yaml - main entry point
include:
- path:
- ./docker-compose-base.yaml # Traefik, Redis
- ./docker-compose-data-source.yaml # DB, Storage
- ./docker-compose-auth.yaml # Keycloak
- ./docker-compose-llm.yaml # AI Services
- ./docker-compose-monitoring.yaml # Observability
- ./docker-compose-notification.yaml # Novu
- ./docker-compose-background-task.yaml # Workers
Service Dependencies
Example of service dependencies:
www-emblema:
depends_on:
- document-render
- milvus
- minio
- graphql-engine
- keycloak
- litellm
- background-task
- background-task-worker
restart: always
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
interval: 30s
timeout: 10s
retries: 3
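Since several of these services define healthchecks, startup ordering can gate on health rather than on mere container start. Compose supports this via the long-form `depends_on` syntax (an illustrative sketch, not taken verbatim from the repo; `condition: service_healthy` requires the dependency to declare a healthcheck):

```yaml
www-emblema:
  depends_on:
    graphql-engine:
      condition: service_healthy   # wait until the healthcheck passes
    keycloak:
      condition: service_started   # plain container start is enough here
```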
Resource Management
GPU Resource Allocation
# GPU allocation strategy for AI services
vllm-bge-m3:
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
command:
- "--tensor-parallel-size=2"
- "--gpu-memory-utilization=0.05" # only ~4GB
vllm-llama33-70b-instruct:
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
command:
- "--tensor-parallel-size=2"
- "--gpu-memory-utilization=0.50" # 24GB per GPU
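When several AI services share one host, `count: all` can make them contend for the same devices. The Compose specification also accepts `device_ids` to pin a service to specific GPUs (a hedged sketch; the IDs are illustrative):

```yaml
vllm-bge-m3:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            device_ids: ["0"]      # pin this service to GPU 0 only
            capabilities: [gpu]
```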
Memory & CPU Limits
x-resource-limits: &default-limits
deploy:
resources:
limits:
memory: 2G
cpus: "1.0"
reservations:
memory: 512M
cpus: "0.5"
services:
www-emblema:
<<: *default-limits
deploy:
resources:
limits:
memory: 4G
cpus: "2.0"
Network Configuration
Network Topology
networks:
emblema:
external: true # Main network shared by all services
redis-net:
driver: bridge # Dedicated network for the Redis cluster
internal: true # Not exposed externally
Service Discovery
Services communicate via internal DNS names:
environment:
# Internal service URLs
HASURA_API_URL: http://graphql-engine:8080/v1/graphql
MILVUS_API_URL: http://milvus:19530/v2/vectordb
LITELLM_API_URL: http://litellm:4000/v1
BACKGROUND_TASK_API_URL: http://background-task
Port Mapping Strategy
# Principle: only Traefik exposes external ports
traefik:
ports:
- "80:80" # HTTP redirect
- "443:443" # HTTPS
- "8080:8080" # Dashboard (dev only)
# Other services: no direct exposure
milvus:
# ports: # commented out in production
# - "19530:19530"
labels:
- "traefik.http.services.milvus.loadbalancer.server.port=19530"
Volume Management
Persistent Storage Strategy
volumes:
# External volumes for data persistence
emblema-hasura-data:
external: true
emblema-minio-data:
external: true
emblema-milvus-data:
external: true
emblema-redis-master-data:
external: true
emblema-keycloak-postgres:
external: true
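Because these volumes are declared `external: true`, Compose will not create them on its own; they must exist before the first startup. A sketch of the one-time setup (volume names taken from the list above):

```shell
# Create the external volumes once, before the first `docker compose up`;
# volumes marked `external: true` are never created by Compose itself.
for vol in emblema-hasura-data emblema-minio-data emblema-milvus-data \
           emblema-redis-master-data emblema-keycloak-postgres; do
  docker volume create "$vol"
done
```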
Volume Mounting Patterns
Configuration Files
traefik:
volumes:
- "/var/run/docker.sock:/var/run/docker.sock:ro"
- "./config/traefik/certs:/certs:ro"
- "./config/traefik/traefik${CERT_RESOLVER:+-le}.yml:/etc/traefik/traefik.yml:ro"
Shared Cache
# Shared cache for AI models
x-ai-cache: &ai-cache
- "./shared-volume/huggingface:/root/.cache/huggingface"
vllm-bge-m3:
volumes: *ai-cache
vllm-llama31-8b-instruct:
volumes: *ai-cache
Data Persistence
postgres-vector:
volumes:
- emblema-hasura-data:/var/lib/postgresql/data
minio:
volumes:
- emblema-minio-data:/data
Backup Strategy
milvus-backup:
image: milvusdb/milvus-backup:latest
volumes:
- emblema-milvus-data:/milvus/data:ro
- ./backups:/backups
environment:
MINIO_ADDRESS: minio:9000
BACKUP_BUCKET_NAME: milvus-backup
command: |
backup create
--collection-names='*'
--backup-name=daily-$(date +%Y%m%d)
Logging & Monitoring
Centralized Logging
x-logging: &default-logging
driver: "json-file"
options:
max-size: "${MAX_LOG_SIZE:-1g}"
max-file: "3"
services:
www-emblema:
logging: *default-logging
Health Checks
# Standard health check pattern
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 60s # Extra time for AI services
Deployment Patterns
Development vs Production
# Development: optional services enabled
services:
traefik:
ports:
- "8080:8080" # Dashboard esposta
api:
dashboard: true
insecure: true
# Production: maximum security hardening
services:
traefik:
# ports: dashboard not exposed
api:
dashboard: false
insecure: false
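One common way to keep both profiles in a single repository (an illustrative sketch, not the project's documented setup) is to put the hardened settings in the base file and relax them in a development-only override that `docker compose up` picks up automatically:

```yaml
# docker-compose.override.yaml — read automatically in development;
# omit this file (or pass -f explicitly) in production.
services:
  traefik:
    ports:
      - "8080:8080"              # expose the dashboard locally only
    command:
      - "--api.dashboard=true"   # real Traefik CLI flags, mirroring the
      - "--api.insecure=true"    # static-config options shown above
```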
Rolling Updates
# Zero-downtime strategy
services:
www-emblema:
deploy:
update_config:
parallelism: 1
delay: 30s
failure_action: rollback
order: start-first
rollback_config:
parallelism: 1
delay: 30s
Environment Configuration
# Template-based configuration
traefik:
volumes:
# Load the dynamic config based on CERT_RESOLVER
- "./config/traefik/traefik${CERT_RESOLVER:+-le}.yml:/etc/traefik/traefik.yml:ro"
- "./config/traefik/dynamic${CERT_RESOLVER:+-le}:/etc/traefik/dynamic:ro"
Performance Optimization
Build Optimization
- Docker Layer Caching:
# Copy dependency manifests before the source code
COPY package.json pnpm-lock.yaml ./
RUN pnpm install --frozen-lockfile
COPY . .
RUN pnpm build
- Multi-platform Builds:
build:
platforms:
- linux/amd64
# - linux/arm64 # commented out for GPU compatibility
- Shared Volumes:
# Shared cache for AI models (avoids repeated downloads)
volumes:
- "./shared-volume/huggingface:/root/.cache/huggingface"
Runtime Optimization
- Resource Constraints:
deploy:
resources:
limits:
memory: 4G
cpus: "2.0"
- IPC & SHM:
# For AI services using tensor parallelism
ipc: host
shm_size: 2gb
Security Best Practices
Container Security
- Non-root Users: all containers run as non-privileged users
- Read-only Filesystems: where possible
- Minimal Base Images: Alpine Linux or distroless
- Security Scanning: integrated into CI/CD
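The read-only filesystem guideline can also be enforced at the Compose level (a hedged sketch; the tmpfs paths depend on what the service actually writes at runtime):

```yaml
www-emblema:
  read_only: true            # root filesystem mounted read-only
  tmpfs:
    - /tmp                   # writable scratch space only where needed
    - /app/.next/cache
```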
Network Security
- Internal Networks: service-to-service traffic stays on private networks
- Minimal Exposure: only Traefik exposes public ports
- TLS Termination: handled centrally by Traefik
Secrets Management
# Sensitive variables injected via environment
environment:
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
JWT_SECRET: ${JWT_SECRET}
# Never hardcoded in Dockerfiles
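For values that should not live in the process environment at all, Compose also supports file-based secrets, mounted under /run/secrets/ inside the container (an illustrative sketch, not the project's current setup; the official postgres image reads `*_FILE` variants of its variables):

```yaml
services:
  postgres-vector:
    secrets:
      - postgres_password
    environment:
      # Read the password from the mounted secret file instead of the env
      POSTGRES_PASSWORD_FILE: /run/secrets/postgres_password

secrets:
  postgres_password:
    file: ./secrets/postgres_password.txt
```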
Troubleshooting
Common Issues
- GPU Memory: monitor GPU allocation across AI services
- Network Connectivity: verify internal DNS resolution
- Volume Permissions: make sure volumes carry the correct permissions
- Health Check Failures: increase timeouts for AI services
Debug Commands
# Check service status
docker compose ps
# Logs for a specific service
docker compose logs -f www-emblema
# Shell into a container for debugging
docker compose exec www-emblema sh
# Check GPU resources
docker compose exec vllm-bge-m3 nvidia-smi
References
- Docker Compose File Reference
- Docker Multi-stage Builds
- Traefik Docker Provider
- NVIDIA Container Toolkit
Next Steps
- Network Topology - detailed network configuration
- Security Architecture - security patterns and RBAC
- System Requirements - hardware and software requirements