Kubernetes Documentation
Comprehensive guide to OVES Kubernetes infrastructure, cluster management, and application deployment.
Overview
OVES operates two Amazon EKS (Elastic Kubernetes Service) clusters in US-East-1:
- Production Cluster: Hosts production workloads only
- Development Cluster: Hosts dev applications and all third-party services
Cluster Architecture
┌─────────────────────────────────────────────────────────────────┐
│                        US-East-1 Region                         │
│                                                                 │
│   ┌────────────────────────┐       ┌────────────────────────┐   │
│   │   Production Cluster   │       │  Development Cluster   │   │
│   │      (oves-prod)       │       │       (oves-dev)       │   │
│   │                        │       │                        │   │
│   │  ┌──────────────────┐  │       │  ┌──────────────────┐  │   │
│   │  │  Control Plane   │  │       │  │  Control Plane   │  │   │
│   │  │  (AWS Managed)   │  │       │  │  (AWS Managed)   │  │   │
│   │  └──────────────────┘  │       │  └──────────────────┘  │   │
│   │                        │       │                        │   │
│   │  ┌──────────────────┐  │       │  ┌──────────────────┐  │   │
│   │  │ Node Groups      │  │       │  │ Node Groups      │  │   │
│   │  │ - General (t3)   │  │       │  │ - General (t3)   │  │   │
│   │  │ - Compute (c5)   │  │       │  │ - Spot Instances │  │   │
│   │  │ - Memory (r5)    │  │       │  └──────────────────┘  │   │
│   │  └──────────────────┘  │       │                        │   │
│   │                        │       │                        │   │
│   │  Namespaces:           │       │  Namespaces:           │   │
│   │  - production          │       │  - development         │   │
│   │  - monitoring          │       │  - monitoring          │   │
│   │  - ingress-nginx       │       │  - logging             │   │
│   │  - cert-manager        │       │  - ingress-nginx       │   │
│   │  - argocd              │       │  - cert-manager        │   │
│   │                        │       │  - argocd              │   │
│   │                        │       │  - third-party         │   │
│   └────────────────────────┘       └────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
Cluster Specifications
Production Cluster (oves-prod)
Control Plane:
- Kubernetes Version: 1.28+
- Managed by AWS EKS
- Multi-AZ for high availability
- Private API endpoint access
Node Groups:
| Node Group | Instance Type | Min | Max | Purpose |
|---|---|---|---|---|
| General | t3.large | 3 | 10 | General workloads |
| Compute | c5.xlarge | 2 | 5 | CPU-intensive apps |
| Memory | r5.large | 2 | 4 | Memory-intensive apps |
Workloads:
- In-house microservices (account, auth, client, thing)
- Production databases (MongoDB, Redis, PostgreSQL)
- Critical third-party services

Access Control:
- RBAC enabled with strict policies
- Pod Security Standards: Restricted
- Network Policies enforced
- IAM Roles for Service Accounts (IRSA)
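One way the Restricted standard could be applied is through Pod Security Admission labels on the namespace. The sketch below assumes the built-in admission controller is the enforcement mechanism, which this page does not confirm; the `warn` and `audit` labels are optional additions.

```yaml
# Assumption: Pod Security Standards are enforced with the built-in
# Pod Security Admission controller via namespace labels.
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    environment: production
    monitoring: enabled
    # Reject pods that violate the Restricted profile
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    # Also surface violations as client warnings and audit annotations
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```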
Development Cluster (oves-dev)
Control Plane:
- Kubernetes Version: 1.28+
- Managed by AWS EKS
- Single AZ (cost optimization)
- Public API endpoint access
Node Groups:
| Node Group | Instance Type | Min | Max | Purpose |
|---|---|---|---|---|
| General | t3.medium | 2 | 8 | General workloads |
| Spot | Mixed | 1 | 5 | Cost-optimized workloads |
Workloads:
- Development versions of microservices
- All third-party services:
  - Grafana, Prometheus
  - Elasticsearch, Kibana, Logstash
  - InfluxDB, Loki
  - AlertManager, Uptime Kuma

Access Control:
- RBAC enabled with relaxed policies
- Namespace isolation
- Developer access via kubectl
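Developer access scoped to a single namespace could look like the following RoleBinding. It is a sketch only: the group name `oves-developers` is hypothetical, and it assumes IAM identities are already mapped to that Kubernetes group (for example through the `aws-auth` ConfigMap or EKS access entries). `edit` is one of the default user-facing ClusterRoles.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developers-edit          # hypothetical name
  namespace: development
subjects:
  - kind: Group
    name: oves-developers        # hypothetical group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: edit                     # built-in role: read/write most namespaced objects
  apiGroup: rbac.authorization.k8s.io
```

Binding a ClusterRole through a RoleBinding grants its permissions only inside the `development` namespace, which keeps the isolation described above.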
Accessing Clusters
Prerequisites
# Install kubectl
brew install kubectl
# Install AWS CLI
brew install awscli
# Configure AWS credentials
aws configure
Cluster Access
# Update kubeconfig for production
aws eks update-kubeconfig --name oves-prod --region us-east-1
# Update kubeconfig for development
aws eks update-kubeconfig --name oves-dev --region us-east-1
# Switch between clusters
kubectl config use-context arn:aws:eks:us-east-1:ACCOUNT:cluster/oves-prod
kubectl config use-context arn:aws:eks:us-east-1:ACCOUNT:cluster/oves-dev
# Verify access
kubectl cluster-info
kubectl get nodes
Namespaces
Production Namespaces
# production - Main application namespace
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    environment: production
    monitoring: enabled
---
# monitoring - Monitoring stack
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
  labels:
    environment: production
---
# ingress-nginx - Ingress controller
apiVersion: v1
kind: Namespace
metadata:
  name: ingress-nginx
---
# cert-manager - Certificate management
apiVersion: v1
kind: Namespace
metadata:
  name: cert-manager
---
# argocd - GitOps deployment
apiVersion: v1
kind: Namespace
metadata:
  name: argocd
Development Namespaces
# development - Dev applications
apiVersion: v1
kind: Namespace
metadata:
  name: development
  labels:
    environment: development
---
# third-party - Third-party services
apiVersion: v1
kind: Namespace
metadata:
  name: third-party
  labels:
    environment: development
Deployment Patterns
1. Deployment with Service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: account-microservice
  namespace: production
  labels:
    app: account-microservice
    version: v1.2.3
spec:
  replicas: 3
  selector:
    matchLabels:
      app: account-microservice
  template:
    metadata:
      labels:
        app: account-microservice
        version: v1.2.3
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "3000"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: account-microservice
      containers:
        - name: account-microservice
          image: ghcr.io/oves/account-microservice:v1.2.3
          ports:
            - containerPort: 3000
              name: http
            - containerPort: 9090
              name: metrics
          env:
            - name: NODE_ENV
              value: "production"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: account-secrets
                  key: database-url
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
          securityContext:
            runAsNonRoot: true
            runAsUser: 1001
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
---
apiVersion: v1
kind: Service
metadata:
  name: account-microservice
  namespace: production
  labels:
    app: account-microservice
spec:
  selector:
    app: account-microservice
  ports:
    - port: 80
      targetPort: 3000
      protocol: TCP
      name: http
    - port: 9090
      targetPort: 9090
      protocol: TCP
      name: metrics
  type: ClusterIP
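Because the production cluster uses the Restricted Pod Security Standard, a container also has to drop all capabilities and set a seccomp profile, which the manifest above does not show. With `readOnlyRootFilesystem: true`, any path the application writes to needs its own volume. The fragment below is a sketch of the relevant pod spec fields; the `/tmp` mount is an assumption about what the application needs.

```yaml
# Fragment of spec.template.spec in the Deployment (not a complete manifest)
spec:
  containers:
    - name: account-microservice
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        # Additional fields required by the Restricted profile
        capabilities:
          drop: ["ALL"]
        seccompProfile:
          type: RuntimeDefault
      volumeMounts:
        # Assumed: the app needs a writable scratch directory
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: tmp
      emptyDir: {}
```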
2. StatefulSet for Databases
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb
  namespace: production
spec:
  serviceName: mongodb
  replicas: 3
  selector:
    matchLabels:
      app: mongodb
  template:
    metadata:
      labels:
        app: mongodb
    spec:
      containers:
        - name: mongodb
          image: mongo:6
          ports:
            - containerPort: 27017
              name: mongodb
          volumeMounts:
            - name: data
              mountPath: /data/db
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: ebs-sc
        resources:
          requests:
            storage: 100Gi
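The `serviceName: mongodb` field refers to a headless Service that gives each replica a stable DNS name (`mongodb-0.mongodb.production.svc.cluster.local`, and so on). That Service is not shown on this page; a minimal version, assuming the same labels and port as the StatefulSet above, might be:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mongodb              # must match spec.serviceName of the StatefulSet
  namespace: production
  labels:
    app: mongodb
spec:
  clusterIP: None            # headless: DNS resolves to individual pod IPs
  selector:
    app: mongodb
  ports:
    - port: 27017
      targetPort: 27017
      name: mongodb
```

Running three replicas does not by itself form a MongoDB replica set; replication still has to be configured inside MongoDB.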
3. ConfigMap and Secrets
# ConfigMap for configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: account-config
  namespace: production
data:
  app.conf: |
    server {
      port: 3000
      timeout: 30s
    }
  features.json: |
    {
      "newFeature": true,
      "betaFeatures": false
    }
---
# Secret for sensitive data
apiVersion: v1
kind: Secret
metadata:
  name: account-secrets
  namespace: production
type: Opaque
stringData:
  database-url: "mongodb://user:pass@mongo:27017/accounts"
  redis-url: "redis://redis:6379"
  api-key: "secret-api-key"
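To make these objects visible to the application, the Deployment's pod template has to reference them. The fragment below sketches one way to do that: the ConfigMap is mounted as files and a Secret key is exposed as an environment variable. The mount path `/etc/account` and the variable name `REDIS_URL` are assumptions, not taken from the service's actual configuration.

```yaml
# Fragment of spec.template.spec in the Deployment (not a complete manifest)
spec:
  containers:
    - name: account-microservice
      env:
        - name: REDIS_URL                # assumed variable name
          valueFrom:
            secretKeyRef:
              name: account-secrets
              key: redis-url
      volumeMounts:
        - name: config
          mountPath: /etc/account        # assumed path; yields app.conf and features.json
          readOnly: true
  volumes:
    - name: config
      configMap:
        name: account-config
```

Each key in the ConfigMap's `data` becomes one file under the mount path.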
Ingress Configuration
NGINX Ingress Controller
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: account-microservice
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    # ingress-nginx has no rate-limit annotation; limit-rps (requests per
    # second per client IP) is assumed to be the intended setting
    nginx.ingress.kubernetes.io/limit-rps: "100"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.omnivoltaic.com
      secretName: api-tls
  rules:
    - host: api.omnivoltaic.com
      http:
        paths:
          - path: /account
            pathType: Prefix
            backend:
              service:
                name: account-microservice
                port:
                  number: 80
          - path: /auth
            pathType: Prefix
            backend:
              service:
                name: auth-microservice
                port:
                  number: 80
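The `cert-manager.io/cluster-issuer` annotation expects a ClusterIssuer named `letsencrypt-prod` to exist. Its definition is not part of this page; a typical ACME issuer using the HTTP-01 challenge through the NGINX ingress class might look like the sketch below. The email address and the private key Secret name are placeholders, and the `ingressClassName` solver field assumes a recent cert-manager release (older releases use `class` instead).

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: platform@example.com              # placeholder
    privateKeySecretRef:
      name: letsencrypt-prod-account-key     # placeholder
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx
```

cert-manager then stores the issued certificate in the Secret named by `secretName` in the Ingress (`api-tls`).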
TCP Services (Load Balancer)
apiVersion: v1
kind: Service
metadata:
  name: mqtt-broker
  namespace: production
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
spec:
  type: LoadBalancer
  selector:
    app: mqtt-broker
  ports:
    - port: 1883
      targetPort: 1883
      protocol: TCP
      name: mqtt
    - port: 8883
      targetPort: 8883
      protocol: TCP
      name: mqtts
Storage
StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
  encrypted: "true"
  kmsKeyId: "arn:aws:kms:us-east-1:ACCOUNT:key/KEY-ID"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
reclaimPolicy: Retain
PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongodb-data
  namespace: production
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs-sc
  resources:
    requests:
      storage: 100Gi
Auto-Scaling
Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: account-microservice-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: account-microservice
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 2
          periodSeconds: 15
      selectPolicy: Max
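Cluster Autoscaler
The Cluster Autoscaler adds nodes when pods cannot be scheduled for lack of capacity and removes nodes that stay underutilized.
Configuration:
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
# Priority expander: the highest number wins, so node groups matching
# "compute" (100) are preferred over "general" (50), and "spot" (10) is tried last.
data:
  priorities: |-
    10:
      - .*-spot-.*
    50:
      - .*-general-.*
    100:
      - .*-compute-.*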
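This ConfigMap only takes effect when the autoscaler runs with the priority expander selected. The fragment below sketches the relevant container arguments on the cluster-autoscaler Deployment; the image tag and the auto-discovery tags are assumptions and should be checked against the actual cluster.

```yaml
# Fragment of the cluster-autoscaler Deployment in kube-system (not a complete manifest)
spec:
  template:
    spec:
      containers:
        - name: cluster-autoscaler
          # Assumed tag; the minor version should match the cluster's Kubernetes version
          image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.2
          command:
            - ./cluster-autoscaler
            - --cloud-provider=aws
            # Read priorities from the cluster-autoscaler-priority-expander ConfigMap
            - --expander=priority
            # Assumed tags on the Auto Scaling groups of the oves-prod cluster
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/oves-prod
```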
RBAC Configuration
ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: account-microservice
  namespace: production
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT:role/account-microservice-role
Role and RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: account-microservice-role
  namespace: production
rules:
  - apiGroups: [""]
    resources: ["configmaps", "secrets"]
    verbs: ["get", "list"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: account-microservice-binding
  namespace: production
subjects:
  - kind: ServiceAccount
    name: account-microservice
    namespace: production
roleRef:
  kind: Role
  name: account-microservice-role
  apiGroup: rbac.authorization.k8s.io
Common Operations
Deployments
# Apply manifests
kubectl apply -f deployment.yaml
# Update image
kubectl set image deployment/account-microservice \
  account-microservice=ghcr.io/oves/account-microservice:v1.2.4 \
  -n production
# Scale deployment
kubectl scale deployment/account-microservice --replicas=5 -n production
# Rollout status
kubectl rollout status deployment/account-microservice -n production
# Rollback
kubectl rollout undo deployment/account-microservice -n production
# Restart deployment
kubectl rollout restart deployment/account-microservice -n production
Debugging
# View pods
kubectl get pods -n production
# Describe pod
kubectl describe pod account-microservice-abc123 -n production
# View logs
kubectl logs account-microservice-abc123 -n production
# Follow logs
kubectl logs -f account-microservice-abc123 -n production
# Previous container logs
kubectl logs account-microservice-abc123 -n production --previous
# Execute command in pod
kubectl exec -it account-microservice-abc123 -n production -- sh
# Port forward
kubectl port-forward account-microservice-abc123 3000:3000 -n production
# Copy files
kubectl cp account-microservice-abc123:/app/logs/app.log ./app.log -n production
Resource Management
# View resource usage
kubectl top nodes
kubectl top pods -n production
# View events
kubectl get events -n production --sort-by='.lastTimestamp'
# View all resources
kubectl get all -n production
# Delete resources
kubectl delete deployment account-microservice -n production
kubectl delete -f deployment.yaml
Troubleshooting
Pod Not Starting
# Check pod status
kubectl get pods -n production
# Describe pod for events
kubectl describe pod POD_NAME -n production
# Check logs
kubectl logs POD_NAME -n production
# Common issues:
# - Image pull errors: Check image name and registry access
# - CrashLoopBackOff: Check application logs
# - Pending: Check resource requests and node capacity
Service Not Accessible
# Check service
kubectl get svc -n production
# Check endpoints
kubectl get endpoints account-microservice -n production
# Test from another pod
kubectl run -it --rm debug --image=busybox --restart=Never -- \
  wget -O- http://account-microservice.production.svc.cluster.local
# Check ingress
kubectl get ingress -n production
kubectl describe ingress account-microservice -n production
Storage Issues
# Check PVC status
kubectl get pvc -n production
# Describe PVC
kubectl describe pvc mongodb-data -n production
# Check PV
kubectl get pv
# Check StorageClass
kubectl get storageclass
Best Practices
1. Resource Limits
Always set resource requests and limits:
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"
2. Health Checks
Implement liveness and readiness probes:
livenessProbe:
  httpGet:
    path: /health
    port: 3000
readinessProbe:
  httpGet:
    path: /ready
    port: 3000
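Probe timing matters as much as the endpoints. For services that start slowly, a startup probe can hold off the liveness probe until the application is up. The thresholds below are illustrative values rather than tuned settings, and reusing `/health` for the startup check is an assumption.

```yaml
# Container-level probe fields (fragment)
startupProbe:
  httpGet:
    path: /health          # assumed: same endpoint as the liveness probe
    port: 3000
  periodSeconds: 5
  failureThreshold: 30     # allows up to 5 s x 30 = 150 s for startup
livenessProbe:
  httpGet:
    path: /health
    port: 3000
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 3      # restart after three consecutive failures
```

Liveness and readiness checks do not run until the startup probe has succeeded, so a long `initialDelaySeconds` on the liveness probe is no longer needed.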
3. Security
- Run as non-root user
- Use read-only root filesystem
- Enable Pod Security Standards
- Use Network Policies
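As an illustration of the Network Policies item, a namespace can start from a default-deny ingress policy and then allow specific sources. The sketch below assumes the ingress controller runs in the `ingress-nginx` namespace and relies on the `kubernetes.io/metadata.name` label that Kubernetes sets on every namespace. Policies only take effect when the cluster's network plugin enforces them.

```yaml
# Deny all inbound traffic to every pod in the namespace by default
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress        # hypothetical name
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Then allow the ingress controller to reach the account service
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ingress-nginx    # hypothetical name
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: account-microservice
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - protocol: TCP
          port: 3000
```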
4. Labels and Annotations
Use consistent labels:
labels:
  app: account-microservice
  version: v1.2.3
  environment: production
  team: backend
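Kubernetes also defines a set of recommended `app.kubernetes.io/*` labels that many tools understand. Whether OVES uses them is not stated here; the mapping below is a sketch, and the `component`, `part-of`, and `managed-by` values are assumptions.

```yaml
labels:
  app.kubernetes.io/name: account-microservice
  app.kubernetes.io/version: "v1.2.3"
  app.kubernetes.io/component: backend       # assumed
  app.kubernetes.io/part-of: oves-platform   # assumed
  app.kubernetes.io/managed-by: argocd       # assumed
```

These can sit alongside the existing `app` and `version` labels, so selectors that already depend on those keys keep working.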