
Docker Architecture

Emblema uses a modular Docker architecture based on multi-stage builds and Docker Compose orchestration to ensure scalability, security, and maintainability across the entire platform.

Architectural Overview

(Diagram: overall platform architecture)

Container Design Patterns

Multi-Stage Build Strategy

All Emblema containers use multi-stage builds to optimize the final images:

Frontend Containers (Node.js/Next.js)

# Example from apps/www-emblema/Dockerfile
FROM node:20-alpine AS base
ENV APP_NAME=www-emblema
RUN apk add --no-cache curl bind-tools

FROM base AS builder
RUN apk add --no-cache libc6-compat
WORKDIR /app
RUN yarn global add turbo
COPY . .
RUN turbo prune ${APP_NAME} --docker

FROM base AS installer
WORKDIR /app
COPY --from=builder /app/out/json/ .
RUN corepack enable pnpm && pnpm i --frozen-lockfile
COPY --from=builder /app/out/full/ .
RUN pnpm turbo build --filter=${APP_NAME}...

FROM base AS runner
WORKDIR /app
RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 nextjs
USER nextjs
COPY --from=installer --chown=nextjs:nodejs /app/apps/${APP_NAME}/.next/standalone ./
EXPOSE 3000
CMD ["node", "apps/www-emblema/server.js"]

Benefits:

  • Optimized final image (~200MB vs ~2GB)
  • Security via a non-privileged user
  • Next.js output tracing for performance

Python AI Services

# Example from apps/background-task/Dockerfile
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04 AS base

# Stage 1: system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    curl libgomp1 libmagic1 poppler-utils libreoffice \
    pandoc tesseract-ocr ffmpeg imagemagick \
    # Playwright dependencies
    libnss3 libnspr4 libatk1.0-0 fonts-liberation \
    && rm -rf /var/lib/apt/lists/*

# Stage 2: Python dependencies with uv
FROM base AS deps
ARG APP_PATH
COPY --from=ghcr.io/astral-sh/uv:0.7.4 /uv /bin/uv
COPY ${APP_PATH}/pyproject.toml ${APP_PATH}/uv.lock /app/
WORKDIR /app
RUN uv sync --frozen --no-cache
RUN /app/.venv/bin/playwright install chromium

# Stage 3: final runtime
FROM base
ARG APP_PATH
COPY --from=deps /app/.venv /app/.venv
COPY --from=deps /root/.cache/ms-playwright /root/.cache/ms-playwright
COPY ${APP_PATH}/app /app/app
WORKDIR /app
ENV PATH="/app/.venv/bin:$PATH"
CMD ["fastapi", "run", "app/main.py", "--port", "80"]

Features:

  • CUDA base image for GPU workloads
  • Fast dependency management with uv
  • Shared cache for Playwright browsers
  • Multi-GPU support with resource constraints

Security Hardening

# Security pattern applied to all containers
RUN addgroup --system --gid 1001 appgroup
RUN adduser --system --uid 1001 appuser

# Restrict ownership and permissions (read-only filesystem where possible)
RUN chmod -R 755 /app && chown -R appuser:appgroup /app

# Minimize the attack surface
RUN rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

# Drop privileges last, so the RUN steps above still execute as root
USER appuser

Docker Compose Orchestration

Modular Architecture

Emblema uses a modular layout split across 8+ compose files:

# docker-compose.yaml - main entry point
include:
  - path:
      - ./docker-compose-base.yaml            # Traefik, Redis
      - ./docker-compose-data-source.yaml     # DB, Storage
      - ./docker-compose-auth.yaml            # Keycloak
      - ./docker-compose-llm.yaml             # AI Services
      - ./docker-compose-monitoring.yaml      # Observability
      - ./docker-compose-notification.yaml    # Novu
      - ./docker-compose-background-task.yaml # Workers

Service Dependencies

(Diagram: service dependency graph)

Example of service dependencies:

www-emblema:
  # Note: the short depends_on form only waits for containers to start,
  # not to become healthy; use "condition: service_healthy" for that.
  depends_on:
    - document-render
    - milvus
    - minio
    - graphql-engine
    - keycloak
    - litellm
    - background-task
    - background-task-worker
  restart: always
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
    interval: 30s
    timeout: 10s
    retries: 3

Resource Management

GPU Resource Allocation

# GPU allocation strategy for AI services
vllm-bge-m3:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
  command:
    - "--tensor-parallel-size=2"
    - "--gpu-memory-utilization=0.05" # only 4GB

vllm-llama33-70b-instruct:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
  command:
    - "--tensor-parallel-size=2"
    - "--gpu-memory-utilization=0.50" # 24GB per GPU
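Note that `--gpu-memory-utilization` is a fraction of each GPU's total VRAM, so the absolute budget depends on the card. A quick sanity check of the comments above (the card sizes are assumptions for illustration, not taken from the deployment):

```shell
# --gpu-memory-utilization is a fraction of each GPU's total VRAM.
# The VRAM sizes below are hypothetical examples.
vram_80=80   # GB, e.g. an 80 GB card
vram_48=48   # GB, e.g. a 48 GB card

# 0.05 of 80 GB  ->  80 / 20
gb_small=$(( vram_80 / 20 ))
# 0.50 of 48 GB  ->  48 / 2
gb_large=$(( vram_48 / 2 ))

echo "$gb_small"   # 4
echo "$gb_large"   # 24
```

The same fraction therefore yields very different budgets on different hardware, which is why the per-service comments should be re-checked whenever GPUs change.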

Memory & CPU Limits

x-resource-limits: &default-limits
  deploy:
    resources:
      limits:
        memory: 2G
        cpus: "1.0"
      reservations:
        memory: 512M
        cpus: "0.5"

services:
  www-emblema:
    <<: *default-limits
    # An explicit "deploy" key overrides the merged-in default entirely
    deploy:
      resources:
        limits:
          memory: 4G
          cpus: "2.0"

Network Configuration

Network Topology

networks:
  emblema:
    external: true  # main network shared by all services
  redis-net:
    driver: bridge  # dedicated network for the Redis cluster
    internal: true  # not exposed externally

Service Discovery

Services communicate via internal DNS names:

environment:
  # Internal service URLs
  HASURA_API_URL: http://graphql-engine:8080/v1/graphql
  MILVUS_API_URL: http://milvus:19530/v2/vectordb
  LITELLM_API_URL: http://litellm:4000/v1
  BACKGROUND_TASK_API_URL: http://background-task

Port Mapping Strategy

# Principle: only Traefik exposes external ports
traefik:
  ports:
    - "80:80"     # HTTP redirect
    - "443:443"   # HTTPS
    - "8080:8080" # Dashboard (dev only)

# All other services: no direct exposure
milvus:
  # ports:              # commented out in production
  #   - "19530:19530"
  labels:
    - "traefik.http.services.milvus.loadbalancer.server.port=19530"

Volume Management

Persistent Storage Strategy

volumes:
  # External volumes for data persistence
  emblema-hasura-data:
    external: true
  emblema-minio-data:
    external: true
  emblema-milvus-data:
    external: true
  emblema-redis-master-data:
    external: true
  emblema-keycloak-postgres:
    external: true

Volume Mounting Patterns

Configuration Files

traefik:
  volumes:
    - "/var/run/docker.sock:/var/run/docker.sock:ro"
    - "./config/traefik/certs:/certs:ro"
    - "./config/traefik/traefik${CERT_RESOLVER:+-le}.yml:/etc/traefik/traefik.yml:ro"

Shared Cache

# Shared cache for AI models
x-ai-cache: &ai-cache
  - "./shared-volume/huggingface:/root/.cache/huggingface"

vllm-bge-m3:
  volumes: *ai-cache

vllm-llama31-8b-instruct:
  volumes: *ai-cache

Data Persistence

postgres-vector:
  volumes:
    - emblema-hasura-data:/var/lib/postgresql/data

minio:
  volumes:
    - emblema-minio-data:/data

Backup Strategy

milvus-backup:
  image: milvusdb/milvus-backup:latest
  volumes:
    - emblema-milvus-data:/milvus/data:ro
    - ./backups:/backups
  environment:
    MINIO_ADDRESS: minio:9000
    BACKUP_BUCKET_NAME: milvus-backup
  # "$$" stops Compose interpolation; a shell is needed to expand $(date ...)
  command: >
    sh -c "backup create
    --collection-names='*'
    --backup-name=daily-$$(date +%Y%m%d)"
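The backup name embeds the current date. In the Compose file the dollar sign must be doubled (`$$`) so Compose does not try to interpolate it; at run time the container's shell expands it like this:

```shell
# Build the date-stamped backup name the same way the container's shell would.
name="daily-$(date +%Y%m%d)"
echo "$name"
```

This produces names such as `daily-20250101`, one per calendar day, which makes retention scripts and lookups straightforward.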

Logging & Monitoring

Centralized Logging

x-logging: &default-logging
  driver: "json-file"
  options:
    max-size: "${MAX_LOG_SIZE:-1g}"
    max-file: "3"

services:
  www-emblema:
    logging: *default-logging
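With these defaults, the json-file driver rotates logs so each container's disk usage is bounded by `max-size × max-file`; a quick check of the resulting budget:

```shell
# json-file keeps at most max-file rotated files of max-size each.
max_size_gb=1   # from the MAX_LOG_SIZE default "1g"
max_file=3
cap_gb=$(( max_size_gb * max_file ))
echo "$cap_gb"   # 3 GB worst case per container
```

Raising `MAX_LOG_SIZE` scales this cap linearly, so it is worth checking against available disk on hosts that run many services.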

Health Checks

# Standard health-check pattern
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 60s # extra time for AI services
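With these values, failures during the start period are not counted, so a container that never responds is marked unhealthy only after the start period plus `retries` failed probes. A rough worst-case estimate (each probe can take up to `interval + timeout`):

```shell
# Rough upper bound on time-to-unhealthy for the health check above.
start_period=60
interval=30
timeout=10
retries=3

worst=$(( start_period + retries * (interval + timeout) ))
echo "$worst"   # 180 seconds
```

This is why slow-starting AI services need a generous `start_period`: without it, the same three retries would burn through in well under two minutes and flap the service during model loading.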

Deployment Patterns

Development vs Production

# Development: optional features enabled
services:
  traefik:
    ports:
      - "8080:8080" # dashboard exposed

# In traefik.yml (Traefik static config):
api:
  dashboard: true
  insecure: true

# Production: security maximized
services:
  traefik:
    # ports: dashboard not published

# In traefik.yml (Traefik static config):
api:
  dashboard: false
  insecure: false

Rolling Updates

# Zero-downtime strategy (deploy.update_config requires Swarm mode;
# plain "docker compose up" ignores it)
services:
  www-emblema:
    deploy:
      update_config:
        parallelism: 1
        delay: 30s
        failure_action: rollback
        order: start-first
      rollback_config:
        parallelism: 1
        delay: 30s

Environment Configuration

# Template-based configuration
traefik:
  volumes:
    # Load the right config variant based on CERT_RESOLVER
    - "./config/traefik/traefik${CERT_RESOLVER:+-le}.yml:/etc/traefik/traefik.yml:ro"
    - "./config/traefik/dynamic${CERT_RESOLVER:+-le}:/etc/traefik/dynamic:ro"
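The `${CERT_RESOLVER:+-le}` form expands to `-le` only when `CERT_RESOLVER` is set and non-empty, which is what switches between the plain and Let's Encrypt config files. The same expansion can be tried directly in a shell (the resolver value here is just an example):

```shell
# ${VAR:+word} -> "word" if VAR is set and non-empty, "" otherwise.
CERT_RESOLVER="letsencrypt"   # example value
with_resolver="traefik${CERT_RESOLVER:+-le}.yml"

CERT_RESOLVER=""
without_resolver="traefik${CERT_RESOLVER:+-le}.yml"

echo "$with_resolver"     # traefik-le.yml
echo "$without_resolver"  # traefik.yml
```

A single environment variable thus selects an entire config variant without duplicating the compose service definition.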

Performance Optimization

Build Optimization

  1. Docker Layer Caching:

    # Copy dependency manifests before the source code
    COPY package.json pnpm-lock.yaml ./
    RUN pnpm install --frozen-lockfile
    COPY . .
    RUN pnpm build
  2. Multi-platform Builds:

    build:
      platforms:
        - linux/amd64
        # - linux/arm64 # commented out for GPU compatibility
  3. Shared Volumes:

    # Shared cache for AI models (avoids repeated downloads)
    volumes:
      - "./shared-volume/huggingface:/root/.cache/huggingface"

Runtime Optimization

  1. Resource Constraints:

    deploy:
      resources:
        limits:
          memory: 4G
          cpus: "2.0"
  2. IPC & SHM:

    # For AI services using tensor parallelism
    ipc: host
    shm_size: 2gb

Security Best Practices

Container Security

  1. Non-root Users: every container runs as a non-privileged user
  2. Read-only Filesystems: wherever possible
  3. Minimal Base Images: Alpine Linux or distroless
  4. Security Scanning: integrated into CI/CD

Network Security

  1. Internal Networks: service-to-service traffic stays on private networks
  2. Minimal Exposure: only Traefik publishes public ports
  3. TLS Termination: handled centrally by Traefik

Secrets Management

# Sensitive values injected via environment variables
environment:
  POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
  JWT_SECRET: ${JWT_SECRET}
  # Never hardcoded in Dockerfiles
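Compose interpolation also supports the `${VAR:?error}` form, which aborts startup when a secret is missing instead of silently passing an empty value. The shell expansion behaves the same way, which makes it easy to test (the variable name is taken from the example above; the value is a placeholder):

```shell
# ${VAR:?msg} aborts with "msg" when VAR is unset or empty.
unset JWT_SECRET
if ( : "${JWT_SECRET:?JWT_SECRET must be set}" ) 2>/dev/null; then
  status="set"
else
  status="missing"   # reached: the subshell aborted
fi
echo "$status"   # missing

JWT_SECRET="dev-only-placeholder"   # never commit real secrets
if ( : "${JWT_SECRET:?JWT_SECRET must be set}" ) 2>/dev/null; then
  status="set"
fi
echo "$status"   # set
```

Using `${JWT_SECRET:?JWT_SECRET must be set}` in the compose file turns a misconfigured deployment into an immediate, explicit failure rather than a service that starts with an empty secret.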

Troubleshooting

Common Issues

  1. GPU Memory: monitor GPU allocation across AI services
  2. Network Connectivity: verify internal DNS resolution
  3. Volume Permissions: make sure volumes have the correct ownership and permissions
  4. Health Check Failures: increase timeouts for AI services

Debug Commands

# Check service status
docker compose ps

# Logs for a specific service
docker compose logs -f www-emblema

# Shell into a container for debugging
docker compose exec www-emblema sh

# Check GPU resources
docker compose exec vllm-bge-m3 nvidia-smi
