Kubernetes Documentation

Comprehensive guide to OVES Kubernetes infrastructure, cluster management, and application deployment.

Overview

OVES operates two Amazon EKS (Elastic Kubernetes Service) clusters in US-East-1:

  • Production Cluster: Hosts production workloads only
  • Development Cluster: Hosts dev applications and all third-party services

Cluster Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    US-East-1 Region                             │
│                                                                 │
│  ┌────────────────────────┐    ┌────────────────────────┐      │
│  │  Production Cluster    │    │  Development Cluster   │      │
│  │  (oves-prod)           │    │  (oves-dev)            │      │
│  │                        │    │                        │      │
│  │  ┌──────────────────┐  │    │  ┌──────────────────┐  │      │
│  │  │  Control Plane   │  │    │  │  Control Plane   │  │      │
│  │  │  (AWS Managed)   │  │    │  │  (AWS Managed)   │  │      │
│  │  └──────────────────┘  │    │  └──────────────────┘  │      │
│  │                        │    │                        │      │
│  │  ┌──────────────────┐  │    │  ┌──────────────────┐  │      │
│  │  │   Node Groups    │  │    │  │   Node Groups    │  │      │
│  │  │  - General (t3)  │  │    │  │  - General (t3)  │  │      │
│  │  │  - Compute (c5)  │  │    │  │  - Spot Instances│  │      │
│  │  │  - Memory (r5)   │  │    │  └──────────────────┘  │      │
│  │  └──────────────────┘  │    │                        │      │
│  │                        │    │  ┌──────────────────┐  │      │
│  │  Namespaces:           │    │  │  Namespaces:     │  │      │
│  │  - production          │    │  │  - development   │  │      │
│  │  - monitoring          │    │  │  - monitoring    │  │      │
│  │  - ingress-nginx       │    │  │  - logging       │  │      │
│  │  - cert-manager        │    │  │  - ingress-nginx │  │      │
│  │  - argocd              │    │  │  - cert-manager  │  │      │
│  │                        │    │  │  - argocd        │  │      │
│  └────────────────────────┘    │  │  - third-party   │  │      │
│                                │  └──────────────────┘  │      │
└─────────────────────────────────────────────────────────────────┘

Cluster Specifications

Production Cluster (oves-prod)

Control Plane:

  • Kubernetes Version: 1.28+
  • Managed by AWS EKS
  • Multi-AZ for high availability
  • Private API endpoint access

Node Groups:

Node Group  Instance Type  Min Nodes  Max Nodes  Purpose
General     t3.large       3          10         General workloads
Compute     c5.xlarge      2          5          CPU-intensive apps
Memory      r5.large       2          4          Memory-intensive apps

Workloads:

  • In-house microservices (account, auth, client, thing)
  • Production databases (MongoDB, Redis, PostgreSQL)
  • Critical third-party services

Access Control:

  • RBAC enabled with strict policies
  • Pod Security Standards: Restricted
  • Network Policies enforced
  • IAM Roles for Service Accounts (IRSA)

Development Cluster (oves-dev)

Control Plane:

  • Kubernetes Version: 1.28+
  • Managed by AWS EKS
  • Worker nodes in a single AZ (cost optimization); the EKS-managed control plane itself always spans multiple AZs
  • Public API endpoint access

Node Groups:

Node Group  Instance Type  Min Nodes  Max Nodes  Purpose
General     t3.medium      2          8          General workloads
Spot        Mixed          1          5          Cost-optimized workloads

Workloads:

  • Development versions of microservices
  • All third-party services:
      • Grafana, Prometheus
      • Elasticsearch, Kibana, Logstash
      • InfluxDB, Loki
      • AlertManager, Uptime Kuma

Access Control:

  • RBAC enabled with relaxed policies
  • Namespace isolation
  • Developer access via kubectl

Accessing Clusters

Prerequisites

# Install kubectl (macOS, via Homebrew)
brew install kubectl

# Install AWS CLI (macOS, via Homebrew)
brew install awscli

# Configure AWS credentials
aws configure

Cluster Access

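# Update kubeconfig for production
# (the production API endpoint is private, so kubectl commands only work
# from a machine with network connectivity to the cluster VPC)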
aws eks update-kubeconfig --name oves-prod --region us-east-1

# Update kubeconfig for development
aws eks update-kubeconfig --name oves-dev --region us-east-1

# Switch between clusters
kubectl config use-context arn:aws:eks:us-east-1:ACCOUNT:cluster/oves-prod
kubectl config use-context arn:aws:eks:us-east-1:ACCOUNT:cluster/oves-dev

# Verify access
kubectl cluster-info
kubectl get nodes

Namespaces

Production Namespaces

# production - Main application namespace
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    environment: production
    monitoring: enabled

---
# monitoring - Monitoring stack
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
  labels:
    environment: production

---
# ingress-nginx - Ingress controller
apiVersion: v1
kind: Namespace
metadata:
  name: ingress-nginx

---
# cert-manager - Certificate management
apiVersion: v1
kind: Namespace
metadata:
  name: cert-manager

---
# argocd - GitOps deployment
apiVersion: v1
kind: Namespace
metadata:
  name: argocd
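
The cluster specifications describe production as enforcing the Restricted Pod Security Standard. One common way to do that is with Pod Security Admission labels on the namespace. The following is a minimal sketch, assuming the built-in Pod Security Admission controller is the enforcement mechanism (the manifests above do not say which mechanism is used); the `warn` and `audit` labels are optional additions.

```yaml
# Sketch: production namespace with Pod Security Admission labels.
# Assumption: enforcement uses the built-in Pod Security Admission
# controller; the original manifests do not show these labels.
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    environment: production
    monitoring: enabled
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```

With `enforce: restricted`, pods that do not meet the Restricted profile are rejected at admission time, so workload manifests in this namespace need a compliant securityContext.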

Development Namespaces

# development - Dev applications
apiVersion: v1
kind: Namespace
metadata:
  name: development
  labels:
    environment: development

---
# third-party - Third-party services
apiVersion: v1
kind: Namespace
metadata:
  name: third-party
  labels:
    environment: development

Deployment Patterns

1. Deployment with Service

apiVersion: apps/v1
kind: Deployment
metadata:
  name: account-microservice
  namespace: production
  labels:
    app: account-microservice
    version: v1.2.3
spec:
  replicas: 3
  selector:
    matchLabels:
      app: account-microservice
  template:
    metadata:
      labels:
        app: account-microservice
        version: v1.2.3
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"   # the container port named "metrics"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: account-microservice
      containers:
      - name: account-microservice
        image: ghcr.io/oves/account-microservice:v1.2.3
        ports:
        - containerPort: 3000
          name: http
        - containerPort: 9090
          name: metrics
        env:
        - name: NODE_ENV
          value: "production"
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: account-secrets
              key: database-url
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 5
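        securityContext:
          runAsNonRoot: true
          runAsUser: 1001
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          # Also required by the Restricted Pod Security Standard
          capabilities:
            drop: ["ALL"]
          seccompProfile:
            type: RuntimeDefault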

---
apiVersion: v1
kind: Service
metadata:
  name: account-microservice
  namespace: production
  labels:
    app: account-microservice
spec:
  selector:
    app: account-microservice
  ports:
  - port: 80
    targetPort: 3000
    protocol: TCP
    name: http
  - port: 9090
    targetPort: 9090
    protocol: TCP
    name: metrics
  type: ClusterIP

2. StatefulSet for Databases

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb
  namespace: production
spec:
  serviceName: mongodb
  replicas: 3
  selector:
    matchLabels:
      app: mongodb
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      containers:
      - name: mongodb
        image: mongo:6
        ports:
        - containerPort: 27017
          name: mongodb
        volumeMounts:
        - name: data
          mountPath: /data/db
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: ebs-sc
      resources:
        requests:
          storage: 100Gi
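
The StatefulSet above sets `serviceName: mongodb`, which refers to a governing Service that is not shown. A headless Service with that name gives each replica a stable DNS identity. The following is a minimal sketch of what it might look like; the port name is carried over from the StatefulSet, and nothing else is assumed.

```yaml
# Sketch: headless Service backing the mongodb StatefulSet.
# clusterIP: None gives each pod a stable DNS name such as
# mongodb-0.mongodb.production.svc.cluster.local
apiVersion: v1
kind: Service
metadata:
  name: mongodb          # must match spec.serviceName in the StatefulSet
  namespace: production
spec:
  clusterIP: None
  selector:
    app: mongodb
  ports:
  - port: 27017
    targetPort: 27017
    name: mongodb
```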

3. ConfigMap and Secrets

# ConfigMap for configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: account-config
  namespace: production
data:
  app.conf: |
    server {
      port: 3000
      timeout: 30s
    }
  features.json: |
    {
      "newFeature": true,
      "betaFeatures": false
    }

---
# Secret for sensitive data
# (values below are placeholders; never commit real credentials to Git)
apiVersion: v1
kind: Secret
metadata:
  name: account-secrets
  namespace: production
type: Opaque
stringData:
  database-url: "mongodb://user:pass@mongo:27017/accounts"
  redis-url: "redis://redis:6379"
  api-key: "secret-api-key"
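
The manifests above define the ConfigMap and Secret but not how a workload consumes them. The fragment below is a sketch of the relevant part of a pod template (`spec.template.spec`), not a complete manifest. The mount path `/etc/account` is a hypothetical choice and the `REDIS_URL` variable name is an assumption modelled on the `DATABASE_URL` entry in the Deployment example.

```yaml
# Sketch: fragment of a pod template (spec.template.spec) that consumes
# account-config and account-secrets. The mount path /etc/account and the
# REDIS_URL variable name are hypothetical, not taken from the original.
containers:
- name: account-microservice
  image: ghcr.io/oves/account-microservice:v1.2.3
  env:
  - name: REDIS_URL
    valueFrom:
      secretKeyRef:
        name: account-secrets
        key: redis-url
  volumeMounts:
  - name: config
    mountPath: /etc/account
    readOnly: true
volumes:
- name: config
  configMap:
    name: account-config
```

Mounted this way, each ConfigMap key becomes a file, so the container would see `/etc/account/app.conf` and `/etc/account/features.json`. A volume mount also works with `readOnlyRootFilesystem: true`, since nothing has to be written to the container's root filesystem.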

Ingress Configuration

NGINX Ingress Controller

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: account-microservice
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/limit-rps: "100"  # requests per second per client IP
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.omnivoltaic.com
    secretName: api-tls
  rules:
  - host: api.omnivoltaic.com
    http:
      paths:
      - path: /account
        pathType: Prefix
        backend:
          service:
            name: account-microservice
            port:
              number: 80
      - path: /auth
        pathType: Prefix
        backend:
          service:
            name: auth-microservice
            port:
              number: 80
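
The Ingress above refers to a ClusterIssuer named `letsencrypt-prod` that is not shown. A minimal sketch of such an issuer follows, assuming Let's Encrypt with the HTTP-01 challenge solved through the NGINX ingress class; the contact email and the account-key Secret name are hypothetical placeholders.

```yaml
# Sketch: ClusterIssuer referenced by the cert-manager.io/cluster-issuer
# annotation. Assumes HTTP-01 validation; the email address and the
# Secret name are hypothetical.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: platform@example.com            # hypothetical contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key   # hypothetical Secret name
    solvers:
    - http01:
        ingress:
          class: nginx
```

With this in place, cert-manager would request a certificate for the hosts listed under `tls` and store it in the Secret named by `secretName` (`api-tls` in the Ingress above).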

TCP Services (Load Balancer)

apiVersion: v1
kind: Service
metadata:
  name: mqtt-broker
  namespace: production
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
spec:
  type: LoadBalancer
  selector:
    app: mqtt-broker
  ports:
  - port: 1883
    targetPort: 1883
    protocol: TCP
    name: mqtt
  - port: 8883
    targetPort: 8883
    protocol: TCP
    name: mqtts

Storage

StorageClass

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
  encrypted: "true"
  kmsKeyId: "arn:aws:kms:us-east-1:ACCOUNT:key/KEY-ID"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain

PersistentVolumeClaim

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-data
  namespace: production
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs-sc
  resources:
    requests:
      storage: 100Gi

Auto-Scaling

Horizontal Pod Autoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: account-microservice-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: account-microservice
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 2
        periodSeconds: 15
      selectPolicy: Max
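
To see how the targets above translate into replica counts, the standard HPA calculation can be worked through by hand. The numbers below are illustrative assumptions (4 running replicas, 90% average CPU utilization, 60% average memory utilization), not measurements from the cluster.

```latex
\[
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times
  \frac{\text{currentMetricValue}}{\text{targetMetricValue}} \right\rceil
\]
\[
\text{CPU: } \left\lceil 4 \times \frac{90}{70} \right\rceil
  = \lceil 5.14 \rceil = 6,
\qquad
\text{memory: } \left\lceil 4 \times \frac{60}{80} \right\rceil
  = \lceil 3 \rceil = 3
\]
```

When several metrics are configured, the HPA uses the largest result, so it would aim for 6 replicas here. The scaleUp policies permit adding either 100% of current replicas (4 pods) or 2 pods per 15 seconds, and `selectPolicy: Max` picks the larger allowance, so the step from 4 to 6 fits within a single period.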

Cluster Autoscaler

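The Cluster Autoscaler adds nodes when pods cannot be scheduled for lack of capacity and removes nodes that stay underutilized.

Configuration (priority expander; it takes effect only when the autoscaler runs with --expander=priority, and a higher number means a higher priority):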

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    10:
      - .*-spot-.*
    50:
      - .*-general-.*
    100:
      - .*-compute-.*

RBAC Configuration

ServiceAccount

apiVersion: v1
kind: ServiceAccount
metadata:
  name: account-microservice
  namespace: production
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT:role/account-microservice-role

Role and RoleBinding

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: account-microservice-role
  namespace: production
rules:
- apiGroups: [""]
  resources: ["configmaps", "secrets"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: account-microservice-binding
  namespace: production
subjects:
- kind: ServiceAccount
  name: account-microservice
  namespace: production
roleRef:
  kind: Role
  name: account-microservice-role
  apiGroup: rbac.authorization.k8s.io

Common Operations

Deployments

# Apply manifests
kubectl apply -f deployment.yaml

# Update image
kubectl set image deployment/account-microservice \
  account-microservice=ghcr.io/oves/account-microservice:v1.2.4 \
  -n production

# Scale deployment
kubectl scale deployment/account-microservice --replicas=5 -n production

# Rollout status
kubectl rollout status deployment/account-microservice -n production

# Rollback
kubectl rollout undo deployment/account-microservice -n production

# Restart deployment
kubectl rollout restart deployment/account-microservice -n production

Debugging

# View pods
kubectl get pods -n production

# Describe pod
kubectl describe pod account-microservice-abc123 -n production

# View logs
kubectl logs account-microservice-abc123 -n production

# Follow logs
kubectl logs -f account-microservice-abc123 -n production

# Previous container logs
kubectl logs account-microservice-abc123 -n production --previous

# Execute command in pod
kubectl exec -it account-microservice-abc123 -n production -- sh

# Port forward
kubectl port-forward account-microservice-abc123 3000:3000 -n production

# Copy files
kubectl cp account-microservice-abc123:/app/logs/app.log ./app.log -n production

Resource Management

# View resource usage
kubectl top nodes
kubectl top pods -n production

# View events
kubectl get events -n production --sort-by='.lastTimestamp'

# View common workload resources
# ("get all" does not include ConfigMaps, Secrets, Ingresses, or PVCs)
kubectl get all -n production

# Delete resources
kubectl delete deployment account-microservice -n production
kubectl delete -f deployment.yaml

Troubleshooting

Pod Not Starting

# Check pod status
kubectl get pods -n production

# Describe pod for events
kubectl describe pod POD_NAME -n production

# Check logs
kubectl logs POD_NAME -n production

# Common issues:
# - Image pull errors: Check image name and registry access
# - CrashLoopBackOff: Check application logs
# - Pending: Check resource requests and node capacity

Service Not Accessible

# Check service
kubectl get svc -n production

# Check endpoints
kubectl get endpoints account-microservice -n production

# Test from another pod
kubectl run -it --rm debug --image=busybox --restart=Never -- \
  wget -O- http://account-microservice.production.svc.cluster.local

# Check ingress
kubectl get ingress -n production
kubectl describe ingress account-microservice -n production

Storage Issues

# Check PVC status
kubectl get pvc -n production

# Describe PVC
kubectl describe pvc mongodb-data -n production

# Check PV
kubectl get pv

# Check StorageClass
kubectl get storageclass

Best Practices

1. Resource Limits

Always set resource requests and limits:

resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"
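
Per-container settings are easy to forget, so a namespace-level default can serve as a safety net. The following is a sketch of a LimitRange that applies the same values to any container in `production` that declares none of its own; the object name is hypothetical, and whether OVES uses a LimitRange at all is an assumption.

```yaml
# Sketch: LimitRange that applies default requests and limits to
# containers in the production namespace that do not declare their own.
# The name is hypothetical; the values mirror the example above.
apiVersion: v1
kind: LimitRange
metadata:
  name: default-container-resources
  namespace: production
spec:
  limits:
  - type: Container
    defaultRequest:
      memory: "256Mi"
      cpu: "250m"
    default:
      memory: "512Mi"
      cpu: "500m"
```

Explicit values in a manifest still take precedence; the LimitRange only fills in what is missing.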

2. Health Checks

Implement liveness and readiness probes:

livenessProbe:
  httpGet:
    path: /health
    port: 3000
readinessProbe:
  httpGet:
    path: /ready
    port: 3000

3. Security

  • Run as non-root user
  • Use read-only root filesystem
  • Enable Pod Security Standards
  • Use Network Policies
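
As an illustration of the Network Policies item, the sketch below denies all ingress traffic in `production` by default and then allows the NGINX ingress controller to reach `account-microservice` on its HTTP port. The policy names are hypothetical, and enforcement assumes the cluster's CNI supports NetworkPolicy (on EKS this has to be enabled or provided by an add-on).

```yaml
# Sketch: deny all ingress to pods in production by default.
# Policy names are hypothetical.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress

---
# Sketch: allow the NGINX ingress controller to reach account-microservice.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-nginx-to-account
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: account-microservice
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
    ports:
    - protocol: TCP
      port: 3000
```

The `kubernetes.io/metadata.name` label is set automatically on every namespace, so the selector works without extra labelling. Other callers, such as Prometheus scraping from the `monitoring` namespace, would need their own allow rules.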

4. Labels and Annotations

Use consistent labels:

labels:
  app: account-microservice
  version: v1.2.3
  environment: production
  team: backend