How can I distribute a deployment across nodes?

Tags: Kubernetes, Google Cloud Platform

Kubernetes Problem Overview


I have a Kubernetes deployment that looks something like this (replaced names and other things with '....'):

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "3"
    kubernetes.io/change-cause: kubectl replace deployment ....
      -f - --record
  creationTimestamp: 2016-08-20T03:46:28Z
  generation: 8
  labels:
    app: ....
  name: ....
  namespace: default
  resourceVersion: "369219"
  selfLink: /apis/extensions/v1beta1/namespaces/default/deployments/....
  uid: aceb2a9e-6688-11e6-b5fc-42010af000c1
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ....
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: ....
    spec:
      containers:
      - image: gcr.io/..../....:0.2.1
        imagePullPolicy: IfNotPresent
        name: ....
        ports:
        - containerPort: 8080
          protocol: TCP
        resources:
          requests:
            cpu: "0"
        terminationMessagePath: /dev/termination-log
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 2
  observedGeneration: 8
  replicas: 2
  updatedReplicas: 2

The problem I'm observing is that Kubernetes places both replicas (in the deployment I've asked for two) on the same node. If that node goes down, I lose both containers and the service goes offline.

What I want Kubernetes to do is to ensure that it doesn't double up containers on the same node where the containers are the same type - this only consumes resources and doesn't provide any redundancy. I've looked through the documentation on deployments, replica sets, nodes etc. but I couldn't find any options that would let me tell Kubernetes to do this.

Is there a way to tell Kubernetes how much redundancy across nodes I want for a container?

EDIT: I'm not sure labels will work; labels constrain where a pod will run so that it has access to local resources (SSDs) etc. All I want to do is ensure no downtime if a node goes offline.

Kubernetes Solutions


Solution 1 - Kubernetes

There is now a proper way of doing this. You can use a topology spread constraint keyed on the node label "kubernetes.io/hostname" if you just want to spread the pods across all nodes. With two replicas of a pod and two nodes, each node should get one replica, since no two nodes share the same hostname.

Example:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
  labels:
    app: my-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: my-service
      containers:
      - name: pause
        image: k8s.gcr.io/pause:3.1

Solution 2 - Kubernetes

I think you're looking for the Affinity/Anti-Affinity Selectors.

Affinity is for co-locating pods: for example, I want my website to try to schedule on the same host as my cache. Anti-affinity is the opposite: it tells the scheduler not to place a pod on a host that already runs pods matching a set of rules.

So for what you're doing, I would take a closer look at these two links: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#never-co-located-in-the-same-node

https://kubernetes.io/docs/tutorials/stateful-application/zookeeper/#tolerating-node-failure
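
Below is a minimal sketch of the softer, preferred variant of pod anti-affinity (the name my-app and its labels are placeholders, not from the question). Unlike the required form shown in Solution 4, the scheduler will still co-locate two replicas if no other node has room:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          # Prefer (but do not require) spreading replicas across nodes
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: my-app
              topologyKey: kubernetes.io/hostname
      containers:
      - name: my-app
        image: nginx:1.25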

Solution 3 - Kubernetes

If you create a Service for that Deployment before creating the Deployment itself, Kubernetes will spread your pods across nodes. This behavior comes from the scheduler and is best-effort, provided that you have enough resources available on both nodes.

From the Kubernetes documentation (Managing Resources):

> it’s best to specify the service first, since that will ensure the scheduler can spread the pods associated with the service as they are created by the controller(s), such as Deployment.

Also related: Configuration best practices - Service.
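
For illustration, a minimal Service sketch; the name my-service, the app: my-service selector, and the Service port 80 are placeholders (the original names were redacted), and the selector must match the pod template labels of your Deployment:

apiVersion: v1
kind: Service
metadata:
  name: my-service           # placeholder name; create this before the Deployment
spec:
  selector:
    app: my-service          # must match the labels in the Deployment's pod template
  ports:
  - port: 80                 # port exposed by the Service (arbitrary here)
    targetPort: 8080         # containerPort from the manifest in the question
    protocol: TCP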

Solution 4 - Kubernetes

I agree with Antoine Cotten that you should create a Service for your Deployment. The Service keeps your application reachable by routing traffic only to healthy pods, while the Deployment's controller replaces a pod if, for some reason, one dies on a certain node. However, if you just want to distribute a deployment among all nodes, you can use pod anti-affinity in your pod manifest file. I put an example on my GitLab page, which you can also find in the Kubernetes blog. For your convenience, I'm providing the example here as well.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          # Never schedule two pods labelled app: nginx onto the same node
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx
            topologyKey: kubernetes.io/hostname
      containers:
      - name: nginx
        image: gcr.io/google_containers/nginx-slim:0.8
        ports:
        - containerPort: 80

In this example, the Deployment's pods carry the label app with the value nginx. The podAntiAffinity rule in the pod spec prevents the scheduler from placing two pods with the same label (app: nginx) on one node. You can also use podAffinity if you would like to co-locate pods from multiple Deployments on one node.

Solution 5 - Kubernetes

If a node goes down, any pods that were running on it and are managed by a controller (such as a Deployment) will be recreated automatically on another node.

If you start specifying exactly where you want them to run, then you actually lose the ability of Kubernetes to reschedule them on a different node.

The usual practice therefore is to simply let Kubernetes do its thing.

If, however, you do have a valid requirement to run a pod on a specific node, for example because it needs a certain local volume type, have a read of the Kubernetes documentation on assigning pods to nodes.
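
For that case, a minimal nodeSelector sketch; the pod name and the disktype: local-ssd node label are hypothetical, and the label must first be applied to the node yourself (for example with kubectl label nodes):

apiVersion: v1
kind: Pod
metadata:
  name: local-ssd-pod        # hypothetical name
spec:
  nodeSelector:
    disktype: local-ssd      # hypothetical node label you added beforehand
  containers:
  - name: app
    image: nginx:1.25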

Solution 6 - Kubernetes

Maybe a DaemonSet will work better. I'm using DaemonSets with nodeSelector to run pods on specific nodes and avoid duplication.

http://kubernetes.io/docs/admin/daemons/
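
For reference, a minimal DaemonSet sketch; the name my-daemon, its labels, and the role: frontend node label are hypothetical. A DaemonSet runs exactly one copy of the pod on every node that matches the nodeSelector (or on every node if the nodeSelector is omitted):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: my-daemon            # hypothetical name
spec:
  selector:
    matchLabels:
      app: my-daemon
  template:
    metadata:
      labels:
        app: my-daemon
    spec:
      nodeSelector:
        role: frontend       # hypothetical node label; omit to run on every node
      containers:
      - name: app
        image: nginx:1.25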

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Question: June Rhodes (original question on Stack Overflow)
Solution 1 - Kubernetes: Anton Blomström
Solution 2 - Kubernetes: Kevin Nisbet
Solution 3 - Kubernetes: Antoine Cotten
Solution 4 - Kubernetes: Abu Shoeb
Solution 5 - Kubernetes: Graham Dumpleton
Solution 6 - Kubernetes: Camil