
Self-Hosted Issues

TL;DR

Self-hosted CMDOP deployment issues commonly involve PostgreSQL connection failures, Redis connectivity, TLS certificate errors, and upgrade migration problems. Verify database connectivity with psql, check Redis with redis-cli ping, and always run pg_dump before upgrades. For Kubernetes deployments, use kubectl describe pod to diagnose ImagePullBackOff and kubectl logs --previous to find the cause of CrashLoopBackOff. Enable debug logging with LOG_LEVEL=debug for detailed diagnostics.

How do I fix database connection issues?

What are the symptoms?

Error: connection to database failed

or

Error: FATAL: password authentication failed

How do I diagnose database connection problems?

# Test connection
psql -h localhost -U cmdop -d cmdop -c "SELECT 1"

# Check PostgreSQL is running
systemctl status postgresql

# Check logs
tail -n 100 /var/log/postgresql/postgresql-15-main.log

How do I fix database connection failures?

Check Connection String

# Environment variable
echo "$DATABASE_URL"

# Should be:
# postgres://cmdop:password@localhost:5432/cmdop?sslmode=disable
# Special characters in the password must be percent-encoded (for example @ as %40)
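If the URL looks right but the connection still fails, it can help to test each component on its own. The helper below is a minimal sketch. The name parse_database_url and the DB_* variables are hypothetical and not part of CMDOP. It assumes the postgres://user:password@host:port/dbname?params form shown above and does not decode percent-encoded characters.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split a postgres:// URL into DB_USER, DB_PASS, DB_HOST,
# DB_PORT and DB_NAME. Assumes postgres://user:password@host[:port]/dbname[?params];
# percent-encoded characters are left as-is and IPv6 hosts are not handled.
parse_database_url() {
  local url=$1 rest userinfo hostport
  rest=${url#*://}            # drop the scheme
  userinfo=${rest%@*}         # user:password (everything before the last @)
  rest=${rest##*@}            # host:port/dbname?params
  hostport=${rest%%/*}        # host:port
  DB_USER=${userinfo%%:*}
  DB_PASS=${userinfo#*:}
  DB_HOST=${hostport%%:*}
  DB_PORT=${hostport##*:}
  if [ "$DB_PORT" = "$hostport" ]; then
    DB_PORT=5432              # no explicit port: PostgreSQL default
  fi
  rest=${rest#*/}             # dbname?params
  DB_NAME=${rest%%\?*}        # strip query parameters
}
```

Usage: run parse_database_url "$DATABASE_URL", then PGPASSWORD="$DB_PASS" psql -h "$DB_HOST" -p "$DB_PORT" -U "$DB_USER" -d "$DB_NAME" -c "SELECT 1". For a quick end-to-end test without splitting anything, psql also accepts the URL directly: psql "$DATABASE_URL" -c "SELECT 1".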

Verify PostgreSQL Accepts Connections

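# /etc/postgresql/15/main/pg_hba.conf
# Add a line for local TCP connections. Rules are matched top to bottom and the
# first match wins, so place it above any broader reject rule:
host    cmdop    cmdop    127.0.0.1/32    scram-sha-256

# Reload configuration
sudo systemctl reload postgresql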
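PostgreSQL uses the first pg_hba.conf line that matches, so an earlier reject line or a broader rule can shadow the one you added. The sketch below lists candidate rules in file order. The name hba_rules_for is hypothetical. It only looks at TCP (host*) lines whose database field names the given database or all, and it does not evaluate address masks. On PostgreSQL 10 and later, SELECT * FROM pg_hba_file_rules; shows the parsed rules from inside the database.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the uncommented TCP rules (host, hostssl,
# hostnossl, ...) in pg_hba.conf file $1 that apply to database $2 or "all",
# prefixed with their line numbers, in the order PostgreSQL evaluates them.
hba_rules_for() {
  local file=$1 db=$2
  awk -v db="$db" '
    /^[ \t]*#/ { next }                     # skip comment lines
    NF == 0    { next }                     # skip blank lines
    $1 ~ /^host/ {
      n = split($2, dbs, ",")               # the database field may be a list
      for (i = 1; i <= n; i++) {
        if (dbs[i] == db || dbs[i] == "all") { print NR ": " $0; break }
      }
    }
  ' "$file"
}
```

Usage: hba_rules_for /etc/postgresql/15/main/pg_hba.conf cmdop. If a reject line appears before your rule, move your rule above it.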

Check Firewall

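# Allow local PostgreSQL
sudo ufw allow from 127.0.0.1 to any port 5432
# ufw's default rules already accept loopback traffic, so a rule like this mainly
# matters when the control plane connects from another host; use that host's
# address instead, and make sure listen_addresses in postgresql.conf covers the
# interface it connects to.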
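To confirm that anything is listening on the port at all, before pg_hba.conf or passwords come into play, a plain TCP connect is enough. This sketch assumes bash (it relies on bash's /dev/tcp redirection) and GNU coreutils timeout. The name port_open is hypothetical. If the port is closed locally, check listen_addresses and port in postgresql.conf. If it is open locally but not from another host, suspect the firewall.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host $1, port $2 opens
# within $3 seconds (default 3). The connect runs in an explicit bash
# subprocess because /dev/tcp is a bash-only feature.
port_open() {
  local host=$1 port=$2 secs=${3:-3}
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

Usage: port_open localhost 5432 && echo open || echo closed.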

Reset Password

sudo -u postgres psql -c "ALTER USER cmdop PASSWORD 'newpassword';"
# Then update the password in DATABASE_URL and restart the control plane

How do I fix Redis issues?

What are the symptoms?

Error: ECONNREFUSED 127.0.0.1:6379

How do I diagnose Redis problems?

# Test connection
redis-cli ping
# Should return: PONG

# Check Redis is running
systemctl status redis

How do I fix Redis connection failures?

Start Redis

sudo systemctl start redis
sudo systemctl enable redis
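Right after systemctl start, Redis may need a moment before it answers (for example while it loads a large dump.rdb). Scripts that use it immediately can race it. A small retry loop avoids that. The name wait_for and its argument order are hypothetical and not part of CMDOP or Redis.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds, up to $1 attempts,
# sleeping $2 seconds between attempts. The remaining arguments are the command.
# Returns 0 on the first success, 1 if every attempt failed.
wait_for() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

Usage: sudo systemctl start redis && wait_for 30 1 sh -c '[ "$(redis-cli ping)" = PONG ]'. The command compares the reply text rather than relying on redis-cli's exit status. While Redis is still loading, it answers PING with a LOADING error instead of PONG.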

Check Memory

redis-cli INFO memory

# If memory is full (WARNING: clears all keys in the current database)
redis-cli FLUSHDB
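INFO memory reports used_memory and maxmemory in bytes, and maxmemory 0 means no limit is configured. The sketch below turns that output into a percentage, so you can check whether Redis is actually near its limit before reaching for FLUSHDB. The name redis_mem_pct is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "redis-cli INFO memory" output on stdin and print
# used_memory as a whole-number percentage of maxmemory, or "no-limit" when
# maxmemory is 0 or missing. INFO lines end in CRLF, so CRs are stripped first.
redis_mem_pct() {
  tr -d '\r' | awk -F: '
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory"   { max = $2 }
    END {
      if (max == "" || max == 0) { print "no-limit"; exit }
      printf "%d\n", used * 100 / max
    }'
}
```

Usage: redis-cli INFO memory | redis_mem_pct.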

Check Configuration

# Redis should listen on localhost
grep "^bind" /etc/redis/redis.conf
# Should be: bind 127.0.0.1

How do I fix certificate errors?

What are the symptoms?

Error: certificate verify failed

or

Error: x509: certificate signed by unknown authority

How do I fix certificate verification failures?

Self-Signed Certificates

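# Generate a new self-signed certificate, valid for 365 days.
# Go (1.15+) and current browsers ignore the CN for hostname checks, so the name
# also goes in subjectAltName (-addext needs OpenSSL 1.1.1 or later).
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout /etc/cmdop/server.key \
  -out /etc/cmdop/server.crt \
  -subj "/CN=cmdop.company.com" \
  -addext "subjectAltName=DNS:cmdop.company.com"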

Agent Trusts Custom CA

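# On agent machines (Debian/Ubuntu; on RHEL-family systems copy the file to
# /etc/pki/ca-trust/source/anchors/ and run update-ca-trust instead)
sudo cp company-ca.crt /usr/local/share/ca-certificates/
sudo update-ca-certificates

# Or set environment
export CMDOP_CA_FILE=/path/to/ca.crt
cmdop connect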

Let’s Encrypt

# Install certbot
sudo apt install certbot

# Get certificate (--standalone binds port 80, which must be free and reachable)
sudo certbot certonly --standalone -d cmdop.company.com

# Certificate at:
# /etc/letsencrypt/live/cmdop.company.com/fullchain.pem
# /etc/letsencrypt/live/cmdop.company.com/privkey.pem

Certificate Renewal

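# Auto-renew with cron. certbot renew only renews certificates close to expiry,
# and --deploy-hook restarts CMDOP only after an actual renewal.
0 3 * * * /usr/bin/certbot renew --quiet --deploy-hook "systemctl restart cmdop"
# Debian/Ubuntu certbot packages may already ship a certbot.timer systemd timer;
# in that case put the restart in a script under
# /etc/letsencrypt/renewal-hooks/deploy/ instead of adding this cron job.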
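To confirm that renewal is actually working, check how close the certificate is to expiry. openssl x509 -noout -enddate prints its notAfter date, and openssl x509 -checkend SECONDS exits non-zero if the certificate expires within that window. The sketch below converts the date into days remaining. The names cert_days_left and days_until are hypothetical, and date -d assumes GNU date.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the whole days between NOW_EPOCH ($2, default:
# the current time) and the date string $1, parsed with GNU date.
days_until() {
  local end now
  end=$(date -d "$1" +%s) || return 1
  now=${2:-$(date +%s)}
  echo $(( (end - now) / 86400 ))
}

# Hypothetical helper: read the notAfter field of PEM certificate $1 (the first
# certificate in the file) and print the days remaining; 0 or less means it
# expires within a day or has already expired.
cert_days_left() {
  local notafter
  notafter=$(openssl x509 -noout -enddate -in "$1") || return 1
  days_until "${notafter#notAfter=}"
}
```

Usage: cert_days_left /etc/letsencrypt/live/cmdop.company.com/fullchain.pem. Let's Encrypt certificates are normally renewed when about 30 days remain, so a value well below that suggests renewal is failing.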

How do I fix upgrade failures?

What are the symptoms?

Error: migration failed

or the service won’t start after an upgrade

How do I fix failed upgrades?

Rollback

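# Docker: set the service's image back to the previous tag in docker-compose.yml
# (for example image: cmdop/control-plane:v1.2.3), then:
docker pull cmdop/control-plane:v1.2.3   # Previous version
docker-compose down
docker-compose up -d

# Kubernetes
kubectl rollout undo deployment/cmdop-control-plane -n cmdop

# If the new version already applied database migrations, the previous version
# may not run against the new schema; restore the pre-upgrade pg_dump backup.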

Manual Migration

# Check pending migrations
cmdop-server migrate status

# Run migrations
cmdop-server migrate up

# If failed, check logs
journalctl -u cmdop -n 100

Database Backup Before Upgrade

# Always back up first
pg_dump cmdop > backup-$(date +%Y%m%d).sql

# Then upgrade
docker-compose pull
docker-compose up -d

How do I troubleshoot Kubernetes issues?

What if a pod won’t start?

# Check pod status
kubectl get pods -n cmdop

# Check events
kubectl describe pod -n cmdop <pod-name>

# Check logs
kubectl logs -n cmdop <pod-name>
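With many pods, it helps to filter kubectl get pods down to the ones that need attention. A pod in CrashLoopBackOff usually still has phase Running, so --field-selector=status.phase!=Running misses it. The sketch below reads the STATUS and READY columns instead. The name unhealthy_pods is hypothetical. The sketch assumes kubectl's default column layout (NAME READY STATUS RESTARTS AGE) and --no-headers.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "kubectl get pods --no-headers" output on stdin and
# print NAME READY STATUS for pods that are not Completed and are either not
# Running or not fully ready (READY such as 0/1).
unhealthy_pods() {
  awk '{
    split($2, ready, "/")
    if ($3 == "Completed") next
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
  }'
}
```

Usage: kubectl get pods -n cmdop --no-headers | unhealthy_pods. Then run kubectl describe pod on each name it prints.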

What are common Kubernetes errors?

How do I fix ImagePullBackOff?

# Check image exists
docker pull cmdop/control-plane:latest

# Check secret
kubectl get secret regcred -n cmdop

How do I fix CrashLoopBackOff?

# Check logs for crash reason
kubectl logs -n cmdop <pod-name> --previous

How do I fix pods stuck in Pending?

# Check resources: look for FailedScheduling in the Events section
kubectl describe pod -n cmdop <pod-name>
# Common cause: insufficient CPU/memory
# Scale down other workloads or increase node size

How do I fix Ingress issues?

# Check ingress
kubectl get ingress -n cmdop
kubectl describe ingress -n cmdop cmdop-ingress

# Check ingress controller logs
kubectl logs -n ingress-nginx <ingress-pod>

How do I configure a Service Mesh?

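If you use Istio or Linkerd, make sure gRPC traffic is treated as HTTP/2. Linkerd detects gRPC automatically. In Istio, protocol selection relies on the Service port name (a grpc or grpc- prefix) or the appProtocol field. The Istio DestinationRule below also asks the sidecars to upgrade connections to HTTP/2: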

# Allow gRPC
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: cmdop-grpc
spec:
  host: cmdop-grpc
  trafficPolicy:
    connectionPool:
      http:
        h2UpgradePolicy: UPGRADE

How do I troubleshoot Docker issues?

What if the container exits immediately?

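# Check logs
docker logs cmdop-control-plane

# Check the exit code and whether the container was OOM-killed
docker inspect --format '{{.State.ExitCode}} {{.State.OOMKilled}}' cmdop-control-plane

# Run a shell instead of the image's normal entrypoint
docker run -it --rm --entrypoint /bin/sh cmdop/control-plane:latest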
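The exit code often narrows down why a container stopped. docker inspect --format '{{.State.ExitCode}}' cmdop-control-plane prints it, and an exit code above 128 means the process was killed by signal (code - 128). The mapping below follows general Docker and POSIX conventions rather than anything CMDOP-specific. The name explain_exit is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a short, conventional explanation of a container
# exit code. Codes above 128 mean "killed by signal (code - 128)".
explain_exit() {
  local code=$1
  case $code in
    0)   echo "exited normally" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found" ;;
    137) echo "killed by SIGKILL (9); often the OOM killer or docker kill" ;;
    139) echo "killed by SIGSEGV (11); the process crashed" ;;
    143) echo "killed by SIGTERM (15); usually docker stop" ;;
    *)
      if [ "$code" -gt 128 ]; then
        echo "killed by signal $((code - 128))"
      else
        echo "application error (exit $code); check docker logs"
      fi
      ;;
  esac
}
```

Usage: explain_exit "$(docker inspect --format '{{.State.ExitCode}}' cmdop-control-plane)". For 137, check {{.State.OOMKilled}} to tell the OOM killer apart from a manual kill.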

How do I fix out of memory errors?

# Check memory usage
docker stats cmdop-control-plane

# Increase memory limit in docker-compose.yml:
services:
  control-plane:
    deploy:
      resources:
        limits:
          memory: 2G

How do I fix volume permission errors?

# Fix permissions (1000:1000 assumes the container runs as UID/GID 1000)
sudo chown -R 1000:1000 /data/cmdop

# Or in compose
user: "1000:1000"
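Before changing ownership recursively, you can check whether anything under the data directory is owned by a different user. The sketch below uses find's -uid test, which GNU and BSD find support. The name owner_matches is hypothetical. The UID you pass should be the one the container actually runs as; docker exec cmdop-control-plane id shows it, if the image includes the id command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if every file and directory under $1 is owned
# by numeric UID $2; otherwise print the first offending path and fail.
owner_matches() {
  local dir=$1 uid=$2 bad
  bad=$(find "$dir" ! -uid "$uid" -print | head -n 1)
  if [ -n "$bad" ]; then
    echo "not owned by $uid: $bad"
    return 1
  fi
  return 0
}
```

Usage: owner_matches /data/cmdop 1000 || sudo chown -R 1000:1000 /data/cmdop.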

How do I tune performance?

How do I optimize PostgreSQL?

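-- Requires shared_preload_libraries = 'pg_stat_statements' in postgresql.conf
-- and CREATE EXTENSION pg_stat_statements;
-- Check slow queries (PostgreSQL 13+ renamed mean_time to mean_exec_time)
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;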
# /etc/postgresql/15/main/postgresql.conf
# (changing shared_buffers requires a PostgreSQL restart, not just a reload)
shared_buffers = 256MB
effective_cache_size = 768MB
work_mem = 4MB
maintenance_work_mem = 64MB
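The shared_buffers and effective_cache_size values above fit a host with about 1 GB of RAM available to PostgreSQL. A common rule of thumb sets shared_buffers to about 25% of RAM and effective_cache_size to about 75%. The sketch below scales those two settings to the host's actual memory. The name pg_mem_settings is hypothetical, and the ratios are a starting point rather than a CMDOP recommendation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given total RAM in MB ($1), print postgresql.conf lines
# using the common 25% (shared_buffers) / 75% (effective_cache_size) rule of thumb.
pg_mem_settings() {
  local total_mb=$1
  echo "shared_buffers = $((total_mb / 4))MB"
  echo "effective_cache_size = $((total_mb * 3 / 4))MB"
}
```

Usage: pg_mem_settings "$(free -m | awk '/^Mem:/ {print $2}')". If PostgreSQL shares the host with the control plane and Redis, pass a smaller figure.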

How do I optimize Redis?

# /etc/redis/redis.conf
maxmemory 256mb
maxmemory-policy allkeys-lru

How do I set up connection pooling?

Use PgBouncer for high connection counts:

# /etc/pgbouncer/pgbouncer.ini
[databases]
cmdop = host=localhost dbname=cmdop

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
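Clients only benefit from PgBouncer if they connect to it instead of to PostgreSQL, so DATABASE_URL has to point at PgBouncer's listen_port (6432 by default). The sketch below rewrites the port in a URL of the postgres://user:password@host:port/dbname form. The name with_port is hypothetical. Note that pool_mode = transaction does not keep session state, such as SET commands or session-level advisory locks, between transactions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print URL $1 with its port replaced by (or set to) $2.
# Assumes postgres://user:password@host[:port]/dbname[?params].
with_port() {
  printf '%s\n' "$1" | sed -E "s#@([^/:@]+)(:[0-9]+)?/#@\1:$2/#"
}
```

Usage: export DATABASE_URL="$(with_port "$DATABASE_URL" 6432)", then restart the control plane so it picks up the new value.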

How do I set up logging?

How do I enable debug logs?

# Environment variable
export LOG_LEVEL=debug

# Or in config
log:
  level: debug
  format: json

How do I rotate container logs?

# docker-compose.yml (logging is set per service)
services:
  control-plane:
    logging:
      driver: "json-file"
      options:
        max-size: "100m"
        max-file: "3"

How do I aggregate logs with ELK or Loki?

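Filebeat can ship the container log files to Elasticsearch or Logstash (ELK). For Loki, Grafana Alloy or Promtail plays the same role. A minimal Filebeat input: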

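# Filebeat config (filebeat.yml)
filebeat.inputs:
  - type: container
    paths:
      - /var/lib/docker/containers/*/*.log
    processors:
      # On plain Docker hosts, add_docker_metadata is the equivalent processor
      - add_kubernetes_metadata: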

How do I back up and recover?

How do I create backups?

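#!/bin/bash
# backup.sh
set -euo pipefail
DATE=$(date +%Y%m%d)
pg_dump cmdop | gzip > /backups/cmdop-$DATE.sql.gz

# BGSAVE returns immediately; wait until LASTSAVE changes before copying
LAST=$(redis-cli LASTSAVE)
redis-cli BGSAVE
while [ "$(redis-cli LASTSAVE)" = "$LAST" ]; do sleep 1; done
cp /var/lib/redis/dump.rdb /backups/redis-$DATE.rdb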
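Daily dumps accumulate, so a backup script usually needs a retention step. The sketch below assumes the /backups naming used above and the -delete action of GNU or BSD find. The name prune_backups and the 14-day default are hypothetical choices.

```bash
#!/usr/bin/env bash
# Hypothetical helper: delete cmdop-*.sql.gz and redis-*.rdb files directly in
# directory $1 that find -mtime +N selects (N = $2, default 14). -mtime counts
# whole 24-hour periods, so +14 matches files at least 15 days old.
prune_backups() {
  local dir=$1 keep_days=${2:-14}
  find "$dir" -maxdepth 1 -type f \
    \( -name 'cmdop-*.sql.gz' -o -name 'redis-*.rdb' \) \
    -mtime +"$keep_days" -print -delete
}
```

Usage: call prune_backups /backups 14 at the end of backup.sh. It prints each file it deletes.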

How do I restore from a backup?

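# Restore PostgreSQL into an empty database, with the control plane stopped
gunzip -c backup.sql.gz | psql -v ON_ERROR_STOP=1 cmdop

# Restore Redis
sudo systemctl stop redis
sudo cp redis-backup.rdb /var/lib/redis/dump.rdb
sudo chown redis:redis /var/lib/redis/dump.rdb
sudo systemctl start redis
# If appendonly is enabled, Redis loads the AOF file instead of dump.rdb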

How do I test backup recovery?

Test backups regularly by restoring them to a staging environment.
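A full restore to staging is the real test. A cheap check between restores: a plain-format pg_dump ends with a "-- PostgreSQL database dump complete" comment. A gzip archive that decompresses cleanly and contains that line was at least not truncated. The name verify_sql_backup is hypothetical. It reads the whole archive, which takes a while for large dumps.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if $1 is a valid gzip file whose content
# contains the footer pg_dump writes at the end of a plain-format dump.
verify_sql_backup() {
  local f=$1 count
  if ! gzip -t "$f" 2>/dev/null; then
    echo "corrupt or not gzip: $f"
    return 1
  fi
  # grep -c reads the whole stream; "|| true" keeps a zero count non-fatal
  count=$(gzip -cd "$f" | grep -c -- '-- PostgreSQL database dump complete' || true)
  if [ "$count" -eq 0 ]; then
    echo "no pg_dump footer (truncated?): $f"
    return 1
  fi
  return 0
}
```

Usage: for f in /backups/cmdop-*.sql.gz; do verify_sql_backup "$f"; done.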

How do I check system health?

How do I check API health?

curl http://localhost:8080/health

How do I check gRPC health?

grpcurl -plaintext localhost:50051 grpc.health.v1.Health/Check

How do I run a full health check?

#!/bin/bash
# Combined health check script
# psql -w never prompts for a password; provide it via ~/.pgpass or PGPASSWORD
curl -sf --max-time 5 http://localhost:8080/health || exit 1
grpcurl -plaintext localhost:50051 grpc.health.v1.Health/Check || exit 1
psql -w -h localhost -U cmdop -d cmdop -c "SELECT 1" || exit 1
redis-cli ping || exit 1
echo "All checks passed"
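The script above stops at the first failure. To see every failing component in one run, a variant can run each check, print its result, and count failures. The name run_check and the 10-second limit are hypothetical choices. timeout assumes GNU coreutils and can only run external commands, so wrap pipelines in sh -c.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a named check (the command in $2...) with a
# 10-second timeout, print OK/FAIL, and count failures in FAILED. Call it
# directly, not in a pipeline or $(...), or the FAILED count is lost.
FAILED=0
run_check() {
  local name=$1
  shift
  if timeout 10 "$@" >/dev/null 2>&1; then
    echo "OK   $name"
  else
    echo "FAIL $name"
    FAILED=$((FAILED + 1))
  fi
}
```

Usage: run_check api curl -sf http://localhost:8080/health, run_check grpc grpcurl -plaintext localhost:50051 grpc.health.v1.Health/Check, run_check postgres psql -w -h localhost -U cmdop -d cmdop -c "SELECT 1", and run_check redis redis-cli ping. Finish with [ "$FAILED" -eq 0 ] so the script's exit status still reflects overall health.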