Kubernetes

Table of Contents

source

Install and Deploy

  • install kubectl
  • install minikube

Commands:

minikube start  
minikube status

kubectl get node
kubectl get pods

kubectl create deployment hello-minikube --image=k8s.gcr.io/echoserver:1.4
kubectl expose deployment hello-minikube --type=NodePort --port=8080
minikube service hello-minikube --url
curl http://192.168.49.2:31223
kubectl delete deployments.apps hello-minikube 

minikube pause
minikube unpause
minikube stop

Namespaces

  • for organization and resource separation
  • kubectl --namespace=mystuff or kubectl -n=mystuff
  • kubectl --all-namespaces

Default namespaces for new clusters:

$ kubectl get ns  
NAME              STATUS   AGE
default           Active   13m      # k8s resources are created here by default
kube-node-lease   Active   13m      # storage for node lease information
kube-public       Active   13m      # world-readable
kube-system       Active   13m      # infrastructure pods

Commands:

kubectl get all --all-namespaces    # all objects in all ns
kubectl get pods -n default         # all pods in default ns
kubectl api-resources | grep -iE 'namespace|KIND'  # get res name: Namespace
kubectl explain Namespace | head -n 2              # get its version

# create app-ns.yml
kubectl create -f app-ns.yml 
kubectl create ns dev
kubectl get ns
kubectl describe ns dev
kubectl get ns default -o yaml

# create nginx-app.yml
kubectl create -f nginx-app.yml

kubectl get pods -n default nginx-app   # not found
kubectl get pods -n app nginx-app       # found

kubectl delete ns dev                       # delete everything in a ns incl. the ns itself
kubectl delete pods -n app --all            # keep namespace, delete all pods
kubectl delete all -n app --all             # delete all
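
The app-ns.yml and nginx-app.yml referenced above aren't shown; a minimal sketch (names taken from the commands, everything else assumed) could look like this:

# app-ns.yml -- namespace used by the commands above
apiVersion: v1
kind: Namespace
metadata:
  name: app

# nginx-app.yml -- a pod placed explicitly into the `app` namespace
apiVersion: v1
kind: Pod
metadata:
  name: nginx-app
  namespace: app
spec:
  containers:
  - name: nginx
    image: nginx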

Pods

  • declarative way: uses yaml files, like nginx-app.yml above
  • imperative way: uses kubectl
  • a pod contains containers, usually just one. If a pod has more than one container, they always run together on a single worker node and never span multiple workers.

Examples:

kubectl create -f nginx.yml
kubectl get pods -o wide
kubectl describe pod nginx          # shows the node
docker ps                           # get status on the node
kubectl exec -it nginx -c nginx -- /bin/bash    

kubectl port-forward nginx 8000:80 

multi-container Pods use-cases

  • sidecar container: enhances primary app, e.g. logging. Both are closely related.
  • ambassador container: represents the primary container to the outside world, e.g. a proxy
  • adapter container: to adapt/normalize traffic for other apps inside

Examples:

# create create-sidecar.yml
kubectl create -f create-sidecar.yml 
kubectl get pods -o wide

$ kubectl exec -it sidecar-pod -c sidecar -- /bin/bash
[root@sidecar-pod /]# curl http://localhost/date.txt
Sun Feb 21 08:53:05 UTC 2021
Sun Feb 21 08:53:15 UTC 2021
Sun Feb 21 08:53:25 UTC 2021
Sun Feb 21 08:53:35 UTC 2021
Sun Feb 21 08:53:45 UTC 2021
Sun Feb 21 08:53:55 UTC 2021
Sun Feb 21 08:54:05 UTC 2021
Sun Feb 21 08:54:15 UTC 2021
Sun Feb 21 08:54:25 UTC 2021
Sun Feb 21 08:54:35 UTC 2021
Sun Feb 21 08:54:45 UTC 2021
[root@sidecar-pod /]# 

kubectl logs sidecar-pod sidecar    # show logs
kubectl delete pod nginx
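
The create-sidecar.yml file isn't shown above; a sketch that would produce output like the one above (a main container appending the date to a shared emptyDir every 10 seconds and an nginx sidecar serving it; which container plays which role is assumed here):

apiVersion: v1
kind: Pod
metadata:
  name: sidecar-pod
spec:
  volumes:
  - name: shared
    emptyDir: {}
  containers:
  - name: main                    # writes a timestamp every 10 s
    image: busybox
    command: ["/bin/sh", "-c"]
    args:
    - while true; do date >> /html/date.txt; sleep 10; done
    volumeMounts:
    - name: shared
      mountPath: /html
  - name: sidecar                 # serves the shared file over HTTP
    image: nginx
    volumeMounts:
    - name: shared
      mountPath: /usr/share/nginx/html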

Jobs

completions=m           # stop after `m` successful completions
parallelism=n           # run `n` pods in parallel

Examples:

kubectl explain Job

# create k8s/pod-simple-job.yml
kubectl create -f k8s/pod-simple-job.yml
kubectl get pods        # each execution will add one more pod
kubectl get jobs        
kubectl delete job pod-simple-job

Use .spec.activeDeadlineSeconds to set an execution deadline, no matter how many pods are created.
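
A sketch of pod-simple-job.yml using these fields (image and command assumed):

apiVersion: batch/v1
kind: Job
metadata:
  name: pod-simple-job
spec:
  completions: 3              # stop after 3 successful runs
  parallelism: 1              # run one pod at a time
  activeDeadlineSeconds: 120  # overall execution deadline
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox
        command: ["/bin/sh", "-c", "echo hello from the job; sleep 5"]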

Init containers

  • run in the same pod as main container
  • to complete a task before the regular container is started
  • if a pod restarts, all init containers are executed again

Example:

$ cat pod-init-container.yml
...

$ k create -f pod-init-container.yml 

$ k get pods
NAME                       READY   STATUS     RESTARTS   AGE
init-container-example-1   0/1     Init:0/1   0          58s
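
The content of pod-init-container.yml is elided above; a typical sketch that would show the Init:0/1 status (the init task itself is assumed) is:

apiVersion: v1
kind: Pod
metadata:
  name: init-container-example-1
spec:
  initContainers:
  - name: prepare                 # must finish before the main container starts
    image: busybox
    command: ["/bin/sh", "-c", "echo preparing...; sleep 120"]
  containers:
  - name: main
    image: nginx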

Rolling Updates

  • A Replica Set manages pods. A Deployment manages a Replica Set.
  • minReadySeconds: how long a new pod should be ready before it is treated as available; the rollout does not continue until the pod is available
  • maxSurge: max number of pods that can be created over the desired replica count during the update
  • maxUnavailable: max number of pods that can be unavailable during the rollout (see the sketch below)
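
A sketch of rolling-nginx.yml showing where these properties live (replica count and app label match the output below; the strategy values and initial image tag are assumed):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: rolling-nginx
spec:
  replicas: 4
  minReadySeconds: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # at most 1 pod above the desired count
      maxUnavailable: 1    # at most 1 pod unavailable during the rollout
  selector:
    matchLabels:
      app: rolling-nginx
  template:
    metadata:
      labels:
        app: rolling-nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14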

Examples:

k create deployment nginx-deploy --image=nginx --dry-run=client -o yaml > nginx-deploy.yml
# adapt nginx-deploy.yml

k create -f nginx-deploy.yml
k get deployments
k get rs            # replica set

# rolling deployment example
k create -f rolling-nginx.yml
k get pods
k get deployments
k get event --field-selector involvedObject.name=rolling-nginx-74cf96d8bb-bn9jq     # filter for specific pod

k rollout history deployment rolling-nginx  # show rollout history (no record)

k delete deployments.apps rolling-nginx 

k create -f rolling-nginx.yml --record                          # add change cause to history
k set image deployment rolling-nginx nginx=nginx:1.15 --record  # changes will be recorded

k rollout status deployment rolling-nginx                       # status of deployment

# pause and resume deployment
$ k set image deployment rolling-nginx nginx=nginx:1.16 --record
$ k rollout pause deployment rolling-nginx
$ k rollout status deployment rolling-nginx
Waiting for deployment "rolling-nginx" rollout to finish: 2 out of 4 new replicas have been updated...
$ k get pods -l app=rolling-nginx
NAME                             READY   STATUS              RESTARTS   AGE
rolling-nginx-74cf96d8bb-4pbs2   1/1     Running             0          88s
rolling-nginx-74cf96d8bb-8t2gf   1/1     Running             0          90s
rolling-nginx-74cf96d8bb-jklcv   1/1     Running             0          90s
rolling-nginx-765c4fc67d-dfqgs   0/1     ContainerCreating   0          20s
rolling-nginx-765c4fc67d-nm5sr   0/1     ContainerCreating   0          20s
$ kubectl rollout resume deployment rolling-nginx

# rollback
$ k rollout undo deployment rolling-nginx --to-revision=2

Labels

  • key: value pairs for categorization.
  • Annotations are similar key/value metadata, but they are not meant to be queried against.

Selectors:

tier = frontend
tier != frontend
tier != frontend, game = super-shooter-2

environment in (production, qa)
tier notin (frontend, backend)
partition  # all pods that have the partition label - no matter what value

env in (prod, qa), tier notin (fe, be), partition   # joining

kubectl examples:

# Assign labels while creating new objects
k create deployment label-nginx-example --image=nginx --dry-run=client -o yaml > label-nginx-example.yml
# edit the file
k create -f label-nginx-example.yml 
k get deployments --show-labels
k get pods --show-labels 

# Assign new label to existing pod runtime as patch
k patch deployment label-nginx-example --patch "$(cat update-label.yml)"
k describe deployment label-nginx-example
k get pods --show-labels 

# Assign a new label to existing deployments using kubectl
k create -f nginx-deploy.yml 
k get deployments.apps --show-labels 
k label deployment nginx-deploy tier=backend
k get deployments.apps --show-labels    # `tier` added for nginx-deploy

# List resource objects
k get pods --selector 'app=prod'
k get pods --selector 'app=dev' 
k get deployment --selector "app in (prod, dev)"

# Removing labels
k label deployment nginx-deploy tier-
k get deployments.apps --show-labels             

Replication Controller and Replica Sets

  • A replication controller is a k8s resource that ensures that its pods are always running.
  • It ensures that the exact number of pods matching its label selector is always running.
  • If a node runs out of resources for creating new pods, it will automatically create new ones on another available cluster node.

Examples:

# create replication-controller.yml
kubectl api-resources | grep -iE 'KIND|replication'
kubectl explain ReplicationController | head -n2    # find out version
kubectl create -f replication-controller.yml 
kubectl get pods            # three new pods
kubectl get rc              # status and list of available rc

k delete pod myapp-rc-mgztn
k get pods                  # one is new
k get pods -o wide
k describe rc myapp-rc

# changing the pod template will affect only newly created pods
k edit rc myapp-rc      # reduce replicas to 2
k get pods              # one is terminating
k get rc                # display status

k scale rc myapp-rc --replicas=6        # horizontal scaling

k delete rc myapp-rc --cascade=false    # to keep its pods running

ReplicaSets

  • ReplicaSet supersedes ReplicationController. It has more expressive pod selectors: it can match pods lacking a certain label, or pods that merely have a label key regardless of its value (see the sketch below).
  • Pods aren’t owned by RC/RS and can be moved between them if necessary.
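
A sketch of replica-set.yml using the more expressive matchExpressions selector (the `app=myapp` label comes from the commands below; the Exists expression is just for illustration):

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: myapp-replicaset
spec:
  replicas: 3
  selector:
    matchExpressions:
    - key: app
      operator: In
      values: ["myapp"]
    - key: env                 # matches pods that merely have the key, any value
      operator: Exists
  template:
    metadata:
      labels:
        app: myapp
        env: dev
    spec:
      containers:
      - name: myapp
        image: nginx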

Examples:

k api-resources | grep -iE 'KIND|replica'
k explain ReplicaSet | head -n 2

k apply -f replica-set.yml      # to manage orphaned pods `app=myapp`
k get rs
k describe rs myapp-replicaset

k delete rs myapp-replicaset
k apply -f replica-set2.yml 
kubectl scale rs myapp-replicaset --replicas=6
k delete rs myapp-replicaset

Kubernetes Services

  • A Kubernetes Service provides a single, constant entry point (pods are ephemeral) to a group of pods providing the same service.
  • The kube-proxy agent on the nodes watches the k8s API for new services and endpoints.

DaemonSets

  • A DaemonSet ensures that a Pod is running across a set of nodes.
  • Deploy system daemons like log collectors and monitoring agents.

Examples:

kubectl api-resources | grep -iE 'KIND|daemonse'
kubectl explain DaemonSet | head -n 2

k create -f fluentd-daemonset.yml
k get ds            # inspect daemonsets
k get pods -o wide

# deploy on specific nodes only (label `ssd=true`)
k create -f nginx-daemonset.yml 
k get ds                        # desired = 0
k get nodes --show-labels
k label nodes minikube ssd=true
k get ds                        # desired = 1
k label nodes minikube ssd-     # remove label

k get ds
k delete ds nginx-fast-storage fluentd   # use --cascade=false to keep pods
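
A sketch of nginx-daemonset.yml with the node selector used above (only the name and the ssd label are taken from the commands; the rest is assumed):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nginx-fast-storage
spec:
  selector:
    matchLabels:
      app: nginx-fast-storage
  template:
    metadata:
      labels:
        app: nginx-fast-storage
    spec:
      nodeSelector:
        ssd: "true"            # run only on nodes labelled ssd=true
      containers:
      - name: nginx
        image: nginx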

Scheduling Jobs

kubectl api-resources | grep -iE 'KIND|cron'
kubectl explain cronjob.spec
kubectl explain cronjob.spec.schedule   # alternative

# create pod-cronjob.yml
k create -f pod-cronjob.yml

k get cronjobs.batch        # list avail. jobs
k get jobs --watch          # monitor status
k get pods                  # show completed pods

k delete cronjobs.batch pod-cronjob 
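
A sketch of pod-cronjob.yml (schedule and command assumed; on clusters older than 1.21 use apiVersion batch/v1beta1):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: pod-cronjob
spec:
  schedule: "*/1 * * * *"      # every minute
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: hello
            image: busybox
            command: ["/bin/sh", "-c", "date; echo hello from the cron job"]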

Volumes

  • pods have isolated FS.
  • Storage volumes aren’t top-level resources like pods, but are defined as components of the pod.
  • They aren’t standalone kubernetes objects and cannot be created or deleted on their own.
  • Volumes live with a Pod across container restarts.

Types of volumes (Volume Type: Storage Provider):

  • emptyDir: Localhost. Simplest volume type. Will be erased when the Pod is removed.
  • hostPath: Localhost
  • glusterfs: GlusterFS cluster
  • downwardAPI: Kubernetes Pod information
  • nfs: NFS server
  • awsElasticBlockStore: AWS Elastic Block Store
  • gcePersistentDisk: Google Compute Engine persistent disk
  • azureDisk: Azure disk storage
  • projected: Kubernetes resources; currently: secret, downwardAPI and configMap
  • secret: K8s secret resource
  • vSphereVolume: vSphere VMDK volume
  • gitRepo: git repository (volume content will be deleted when Pod is removed)

Once you define volumes in the pod’s volumes section, you can mount them via the volumeMounts section of a container.

Examples:

k create -f shared-volume-emptyDir.yml
k get pods shared-volume-emptydir 
k get pod shared-volume-emptydir -o json

k exec -it shared-volume-emptydir -c alpine1 -- touch /alpine1/someFile.txt
k exec -it shared-volume-emptydir -c alpine2 -- ls -l /alpine2
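
A sketch of shared-volume-emptyDir.yml matching the commands above (images and sleep command assumed):

apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-emptydir
spec:
  volumes:
  - name: data
    emptyDir: {}
  containers:
  - name: alpine1
    image: alpine
    command: ["/bin/sleep", "999999"]
    volumeMounts:
    - name: data
      mountPath: /alpine1
  - name: alpine2
    image: alpine
    command: ["/bin/sleep", "999999"]
    volumeMounts:
    - name: data
      mountPath: /alpine2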

To create the dir in memory using tmpfs:

volumes:
- name: data
  emptyDir:
    medium: Memory

With the Memory medium, the mount is a tmpfs:

$ k exec -it shared-volume-memory -c alpine2 -- df -h /alpine2
Filesystem                Size      Used Available Use% Mounted on
tmpfs                     1.4G         0      1.4G   0% /alpine2

instead of the node’s regular filesystem:

$ k exec -it shared-volume-memory -c alpine2 -- df -h /alpine2
/dev/mapper/rhel-root
                     10.3G      7.9G      1.8G  81% /alpine2

hostPath is the first type of persistent storage. Its contents are stored on a node’s FS. It is not a good idea to use hostPath for regular pods, because it makes the app sensitive to pod scheduling. Examples:

k create -f shared-volume-hostpath.yml 
k get pods shared-volume-hostpath -o wide   # get the node IP
k exec -it shared-volume-hostpath -c alpine1 -- touch /alpine1/someFile.txt      

[root@nodeIP]# ls -l /tmp/data/
total 0
-rw-r--r-- 1 root root 0 Jan  7 15:26 someFile.txt

NFS

First make sure that the nfs-utils package is installed on the k8s minions. Check /etc/exports and make sure the share can be mounted using mount -t nfs server:share mountpoint. Examples:

kubectl create -f shared-volume-nfs.yml
kubectl get pods shared-volume-nfs -o wide   # wait for Running state
kubectl describe pod                         # check mounting status

K8s actually mounts server:share into /var/lib/kubelet/pods/<id>/volumes/kubernetes.io-nfs/nfs and then mounts it into the container as the destination /<mount-point>.

Persistent Volumes

Capacity:

  • K (kilobyte: 1000 bytes)
  • Ki (kibibyte: 1024 bytes)

Volume mode is either Filesystem (default) or Block.

Access Modes:

  • RWO - ReadWriteOnce - a single node can mount for RW
  • ROX - ReadOnlyMany
  • RWX - ReadWriteMany

Storage is mounted to nodes, so even with RWO, multiple pods on the same node can mount the volume and write to it.

Reclaim Policy determines what happens when a persistent volume claim is deleted:

  • Retain: volume will need to be reclaimed manually
  • Delete: associated storage asset, such as AWS EBS, AzureDisk, OpenStack Cinder volume etc. is deleted.
  • Recycle: delete content only. Allows the volume to be claimed again.

Currently only NFS and HostPath support recycling. AWS EBS, GCE PD, Azure Disk and Cinder volumes support deletion.

Example:

k api-resources | grep -iE 'KIND|persistent'
k explain PersistentVolume | head -n 2

k create -f persistent-volume.yml

# kubectl get pv
NAME           CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
nfs-share-pv   1Gi        ROX,RWX        Recycle          Available                                   18s
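
A sketch of persistent-volume.yml matching the output above (the NFS server and export path are taken from the later examples; everything else is assumed):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-share-pv
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes:
  - ReadOnlyMany
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    server: 192.168.43.48
    path: /nfs_share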

PersistentVolumes don’t belong to any namespace. They are cluster-level resources like nodes.

PersistentVolumeClaim

Claiming a PersistentVolume is a completely separate process from creating a pod.

Example:

k explain PersistentVolumeClaim | head -n 2
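
A sketch of the claim, nfs-share-pvc.yml, that produces the binding shown below (the empty storageClassName is an assumption to avoid the cluster's default StorageClass):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-share-pvc
spec:
  storageClassName: ""
  accessModes:
  - ReadOnlyMany
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi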

[root@controller ~]# kubectl get pvc
NAME            STATUS   VOLUME         CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nfs-share-pvc   Bound    nfs-share-pv   1Gi        ROX,RWX   

[root@controller ~]# kubectl get pv
NAME           CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                   STORAGECLASS   REASON   AGE
nfs-share-pv   1Gi        ROX,RWX        Recycle          Bound    default/nfs-share-pvc                           7m26s

[root@controller ~]# kubectl describe pvc nfs-share-pvc
Name:          nfs-share-pvc
Namespace:     default
StorageClass:
Status:        Bound
Volume:        nfs-share-pv
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
            pv.kubernetes.io/bound-by-controller: yes
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  ROX,RWX
VolumeMode:    Filesystem
Mounted By:    <none>
Events:        <none>

PV is now yours to use. Nobody else can claim it until you release it.

Using a PersistentVolumeClaim in a Pod
# kubectl create -f nfs-share-pod.yml
# kubectl get pods pod-nfs-share      # make sure pod is Running
# kubectl exec -it pod-nfs-share -- df -h /var/www
Filesystem                Size  Used Avail Use% Mounted on
192.168.43.48:/nfs_share   14G  8.6G  4.1G  68% /var/www
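
The nfs-share-pod.yml used above might look roughly like this (image name assumed):

apiVersion: v1
kind: Pod
metadata:
  name: pod-nfs-share
spec:
  volumes:
  - name: www
    persistentVolumeClaim:
      claimName: nfs-share-pvc
  containers:
  - name: web
    image: nginx
    volumeMounts:
    - name: www
      mountPath: /var/www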

Recycle Persistent Volumes:

k delete pod pod-nfs-share
k delete pvc nfs-share-pvc

[root@controller ~]# kubectl get pv     # Status - Released
NAME           CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS     CLAIM                         STORAGECLASS    REASON   AGE
nfs-share-pv   1Gi        ROX,RWX        Recycle          Released   default/nfs-share-pvc                                  80m

Had we used the Retain reclaim policy, we would have to manually clean up the data before the PersistentVolume could be bound again.

Local Persistent Volumes with StorageClass
k create -f storage-class.yml 
k get sc    # check status

k create -f local-pv-sc.yml
k get pv
k describe pv local-pv

k create -f local-pvc.yml
k get pvc       # STATUS = Pending
k describe pvc local-storage-claim

k create -f local-pv-pod.yml
k get pods local-pod        # make sure it is `Running`
k get pvc                   # STATUS=Bound
k get pv                    # STATUS=Bound
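
A sketch of storage-class.yml for local volumes (the class name is assumed); WaitForFirstConsumer is what keeps the PVC Pending until a pod actually uses it:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner   # local PVs are provisioned manually
volumeBindingMode: WaitForFirstConsumer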

ConfigMaps

  • Env-specific data is provided to the application by the environment it is deployed into.
  • ConfigMap defines application related data.

Examples:

kubectl create configmap my-config
    --from-file=foo.json            # single file
    --from-file=bar=foo.json        # single file stored under custom key
    --from-file=config-opts/        # dir
    --from-literal=foo=bar          # literal value

kubectl create cm nginx-cm --from-file nginx-custom-config.conf
k get cm

k create -f nginx-cm.yml
k exec -it nginx-cm -- cat /etc/nginx/conf.d/default.conf

# create CM using CLI args
k create cm myconfig --from-literal=color=red
k get cm

k create -f test-cm-pod.yml
k exec -it test-pod -- env | grep COLOR
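
A sketch of test-cm-pod.yml that injects the `color` key of the `myconfig` ConfigMap into the COLOR environment variable (image and command assumed):

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: test
    image: busybox
    command: ["/bin/sleep", "999999"]
    env:
    - name: COLOR
      valueFrom:
        configMapKeyRef:
          name: myconfig
          key: color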

Kubernetes Secrets

  • Intended to store small amounts of sensitive data (1 MB per secret).
  • A secret is base64 encoded, so we cannot treat it as secure.
  • K8s ensures that Secrets are passed only to the nodes that are running Pods that need the respective secrets.

Examples:

k create secret generic test-secret  --from-literal=user=deepak --from-literal=password=test1234
k get secrets test-secret -o yaml
echo dGVzdDEyMzQ= | base64 --decode

k create -f secret-busybox.yml
k exec -it secret-busybox -- /bin/sh
    # cat data/password
    # cat data/user
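
A sketch of secret-busybox.yml mounting the test-secret keys as files (mount path assumed from the cat commands above):

apiVersion: v1
kind: Pod
metadata:
  name: secret-busybox
spec:
  volumes:
  - name: secret-vol
    secret:
      secretName: test-secret
  containers:
  - name: busybox
    image: busybox
    command: ["/bin/sleep", "999999"]
    volumeMounts:
    - name: secret-vol
      mountPath: /data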

Defining secrets from a file:

k create secret generic secret-tls --from-file=server.crt --from-file=server.key
k describe secret secret-tls

k create -f secret-tls-pod.yml
k get pods secret-tls-pod
k exec -it secret-tls-pod -- /bin/sh
    # ls -l /tls

Stateful sets

  • Available from k8s 1.5 as a bond between the Pod and the Persistent Volume.
  • Pod names consist of <name>-N, where N is a zero-based index.
  • Replaced Pods get the same name and hostname as the Pod that has been replaced.
  • Limitations:
    • storage for a given Pod must be provisioned by a PersistentVolume Provisioner
    • deleting and/or scaling will not delete associated volumes.
    • Headless Service is required to be responsible for the network identity of the Pods. You’re responsible for creating this Service.
    • No guarantee on termination order of pods on deletion. To achieve ordered and graceful termination of the pods, scale the StatefulSet down to 0 prior to deletion.
    • When using Rolling Updates with the default Pod Management Policy (OrderedReady), it’s possible to get into a broken state that requires manual intervention to repair.

Examples:

[root@controller ~]# exportfs -v
/share1         (sync,wdelay,hide,no_subtree_check,sec=sys,rw,secure,no_root_squash,no_all_squash)

k create -f nfs-pv-share1.yml -f nfs-pv-share2.yml -f nfs-pv-share3.yml
k api-resources | grep -iE 'KIND|stateful'
k explain StatefulSet | head -n 2

k create -f nfs-stateful.yml
k get statefulsets      # READY = 0/3
k get pvc               # STATUS = Bound
k get pv
k get pods              # predictable ordinal names, no random suffix
k get pv                # all claimed

k get pods -o wide
k exec -it nginx-statefulset-2 -c nginx-statefulset -- touch /var/www/pod3-file
[root@controller ~]# ls -l /share3/
total 0
-rw-r--r-- 1 root root 0 Jan  9 16:44 pod3-file

k delete pod nginx-statefulset-2
k get pods -o wide      # new pod is being created with same IP and nodename
kubectl exec -it nginx-statefulset-2 -c nginx-statefulset -- ls -l /var/www/ # `pod3-file` is there
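
A sketch of nfs-stateful.yml: a headless Service for the stable network identity plus the StatefulSet itself (names and mount path come from the commands above; access mode and size are assumed and must match the nfs-pv-share PVs):

apiVersion: v1
kind: Service
metadata:
  name: nginx-statefulset        # headless service
spec:
  clusterIP: None
  selector:
    app: nginx-statefulset
  ports:
  - port: 80
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nginx-statefulset
spec:
  serviceName: nginx-statefulset
  replicas: 3
  selector:
    matchLabels:
      app: nginx-statefulset
  template:
    metadata:
      labels:
        app: nginx-statefulset
    spec:
      containers:
      - name: nginx-statefulset
        image: nginx
        volumeMounts:
        - name: www
          mountPath: /var/www
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi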

K8s API Server

  • In k8s all communication between the control plane and external clients (e.g. kubectl) is translated into REST API calls handled by the API server. The API server is the only component that talks directly with the distributed storage, etcd.
  • The API server’s responsibilities are to provide the k8s API, proxy cluster components (e.g. the dashboard), stream logs, forward service ports and serve kubectl exec sessions.
  • Master: etcd <-> API Server (Controller Manager + Scheduler)
  • Worker: kubelet, native app
  • The API server is stateless and designed to scale horizontally. For HA it is recommended to have 3+ instances.

Examples:

k get pods -n kube-system

K8s HTTP Request flow: client (kubectl) + Service Account -> authentication -> authorization -> admission control -> etcd.

After the request is authenticated and authorized, it goes to the admission control modules. K8s comes with predefined admission controllers, but you can define custom ones as well.

Service Account + Roles

k apply -f service-account.yml -f cluster-role.yml -f cluster-role-binding.yml

k get pods --all-namespaces
k describe pod -n kube-system kube-apiserver-minikube

# forbidden - returned by authorization plugin
k --as=system:serviceaccount:default:read-only-user get pods --all-namespaces
k --as=system:serviceaccount:default:read-only-user describe pod -n kube-system kube-apiserver-minikube

# inquiry 
k auth can-i get pods --all-namespaces  # yes
k --as=system:serviceaccount:default:read-only-user auth can-i get pods --all-namespaces

k delete serviceaccount read-only-user
k delete clusterrole read-only-user-cluster-role
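
The three files applied above could look roughly like this (only the ServiceAccount and ClusterRole names come from the commands; the rules and the binding name are assumed for illustration):

# service-account.yml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: read-only-user
  namespace: default

# cluster-role.yml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: read-only-user-cluster-role
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]

# cluster-role-binding.yml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: read-only-user-binding
subjects:
- kind: ServiceAccount
  name: read-only-user
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: read-only-user-cluster-role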

HTTP Interface of the API Server:

  • POST: k create -f
  • PUT: k apply -f
  • GET: k get; k describe
  • PATCH: k set image deployment/kuberserve nginx=nginx:1.9.1
  • DELETE: k delete

API resources and versions:

k api-resources
k api-versions

k explain pod  # for any resource kind

K8s API via CLI:

k get --raw /       # get all API resources
k get --raw /api/v1/namespaces  | jq
k get --raw /api/v1/namespaces/default  | jq

# access using curl
k proxy --port=9000
curl http://127.0.0.1:9000/apis | less
curl http://127.0.0.1:9000/apis/apps/v1/namespaces/default/deployments

Container’s Security Context

  • Each pod gets its own IP and port space. Each pod has its own process tree, its own IPC namespace, allowing only processes in the same pod to communicate through IPC.
  • Here we learn to allow pods to access resources of the node they’re running on.

Example:

k create -f pod-with-host-network.yml
k get pods
k exec pod-with-host-network -- ifconfig

# Allow pods to bind a port in the node's default namespace 
# using `ports.hostPort`
# (only one instance of the pod can be scheduled to each node)
k create -f nginx-lab.yml
curl 127.0.0.1:9000     # XXX doesn't work (because of minikube?)

# Using node's PID and IPC namespaces
k create -f pod-with-host-pid-and-ipc.yml
k exec pod-with-host-pid-and-ipc -- ps aux  # includes container process

Configure container’s security context

# run with given user
$ k run pod-with-defaults --image alpine --restart Never -- /bin/sleep 99999
$ k exec pod-with-defaults -- id
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel),11(floppy),20(dialout),26(tape),27(video)
$ k create -f pod-as-user-guest.yml 
$ k exec pod-as-user-guest -- id
uid=65534(nobody) gid=65534(nobody)

# run as non-root
k create -f pod-run-as-non-root.yml
k get po pod-run-as-non-root        # STATUS=CreateContainerConfigError
k describe pods pod-run-as-non-root # Error: container has runAsNonRoot and image will run as root

# run in privileged mode (to get full access to node's kernel)
k create -f pod-privileged.yml
k exec -it pod-with-defaults -- ls /dev     # 16
k exec -it pod-privileged -- ls /dev     # 241

Individual Capabilities

# add individual capabilities to a container
k exec -it pod-with-defaults -- date +%T -s "12:00:00"      # date: can't set date: Operation not permitted
k create -f pod-add-settime-capability.yml
k exec -it pod-add-settime-capability -- date +%T -s "12:00:00"; date

# drop individual capabilities from a container
k create -f pod-drop-chown-capability.yml 
k exec pod-drop-chown-capability -- chown guest /tmp    # operation not permitted

# ro FS
k create -f pod-with-readonly-filesystem.yml
k exec -it pod-with-readonly-filesystem -- touch /new-file  # exit 1 - Read-only file system

k exec -it pod-with-readonly-filesystem -- touch /volume/newfile    # works
k exec -it pod-with-readonly-filesystem -- ls -la /volume/newfile

Authentication

  • Client certificates (most common) using X509 CA. --client-ca-file=file_path server option.
  • Static tokens: --token-auth-file=<file>. Tokens persist indefinitely and the API server needs to be restarted to update them. Username and password are passed in the request header: Authorization: Basic base64(user:password).
  • Bootstrap tokens: are dynamically managed and stored as secrets in kube-system. --enable-bootstrap-token-auth option for the CLI server.
  • Service account tokens: --service-account-key-file
  • Authentication proxy: --requestheader-{username-headers,group-headers,extra-headers-prefix} arguments.
  • Webhook tokens: --authentication-token-webhook-config-file=

kubectl uses certificates stored in ~/.kube/config or /etc/kubernetes/admin.conf.

Example using client certificates:

[root@controller ~]# kubectl config view | grep server
server: https://192.168.43.48:6443

[root@controller ~]# curl https://192.168.43.48:6443
curl: (60) SSL certificate problem: unable to get local issuer certificate

[root@controller ~]# export client=$(grep client-cert /etc/kubernetes/admin.conf | cut -d " " -f 6)
[root@controller ~]# export key=$(grep client-key-data /etc/kubernetes/admin.conf | cut -d " " -f 6)
[root@controller ~]# export auth=$(grep certificate-authority-data /etc/kubernetes/admin.conf | cut -d " " -f 6)

[root@controller ~]# echo $client | base64 -d - > client.pem
[root@controller ~]# echo $key | base64 -d - > client-key.pem
[root@controller ~]# echo $auth | base64 -d - > ca.pem

[root@controller ~]# curl --cert client.pem --key client-key.pem --cacert ca.pem  https://192.168.43.48:6443   # works 

Authorization

  • Node. Enabled by default.
  • ABAC: requests are validated against policies based on the attributes of the request. --authorization-policy-file= and --authorization-mode=ABAC options.
  • RBAC. To enable start the API server with --authorization-mode=RBAC.
  • Webhooks (uses remote API server to check for permissions). Option --authorization-webhook-config-file=.

RBAC

kubeconfig:

  • users: username and authentication mechanism
  • clusters: all data necessary to connect to the cluster
  • contexts: association between users and clusters

Create user example:

# create linux user
useradd -G nogroup user1
passwd user1

# create certs
openssl genrsa -out user1.key 4096
openssl req -new -key user1.key -out user1.csr -subj "/CN=user1/O=dev"  # cert. signing request
# openssl x509 -req -in user1.csr -CA /etc/kubernetes/pki/ca.crt -CAkey /etc/kubernetes/pki/ca.key -CAcreateserial -out user1.crt -days 365
openssl x509 -req -in user1.csr -CA ~/.minikube/ca.crt -CAkey ~/.minikube/ca.key -CAcreateserial -out user1.crt -days 365

# create namespace (optional)
k create namespace dev

# update k8s config with user credentials
k config view
k config set-credentials user1 --client-certificate=user1.crt --client-key=user1.key
k config view

# create security context for new user
k config set-context user1-context --cluster=kubernetes --namespace=dev --user=user1    # and set default namespace
k config get-contexts
k --context=user1-context get pods  # XXX failed (minikube?)

Define new role with “modify” permission:

kubectl api-resources | grep -iE 'role|KIND'
k explain Role | head -n 2

k create -f dev-role.yml
k create -f user1-rolebind.yml 

k --context=user1-context get pods      # XXX failed (minikube?)

k -n dev describe role dev

k create deployment devnginx --image=nginx --context=user1-context 
k get pods          # not there
k get pods -n dev   # but here

# testing
k auth can-i create pods --context=user1-context    # yes
k auth can-i create service --context=user1-context    # no

# Define role with "view-only" permission
k create namespace view-only
k create -f view-only-role.yml
k create -f view-only-rolebinding.yml

k config set-context viewonly-context --cluster=kubernetes --namespace=view-only --user=user1
k config get-contexts

k --context=viewonly-context get pods
k -n view-only describe role view-only
k create deployment testnginx --image=nginx --context=viewonly-context    # FORBIDDEN

k auth can-i create pods --context=viewonly-context     # no
k auth can-i get pods --context=viewonly-context        # yes
k auth can-i get service --context=viewonly-context     # no

Deleting Context, Role, RoleBinding:

k config delete-context user1-context
k config delete-context viewonly-context

k delete role dev -n dev
k delete rolebinding dev-role-binding -n dev

Limit Resources

  • Resource quota is applied on the namespace.
  • Resource limit is applied on the containers.
  • If creating or updating a resource violates a quota constraint -> HTTP 403.
  • If quota is enabled in a namespace for compute resources (cpu, mem), users must specify requests or limits for those values, otherwise the quota system may reject pod creation.

Resource quota types

Compute resources:

  • limits.cpu: sum of CPU limits cannot exceed this value
  • limits.memory: sum of memory limits cannot exceed this value
  • requests.cpu: same, but for requests
  • requests.memory: same, but for requests

Storage resource quota:

  • requests.storage: total amount of requested storage across all persistent volume claims
  • persistentvolumeclaims: maximum number of persistent volume claims allowed in the namespace
  • <storage-class-name>.storageclass.storage.k8s.io/requests.storage: total amount of requested storage across all persistent volume claims associated with the storage class name
  • <storage-class-name>.storageclass.storage.k8s.io/persistentvolumeclaims: maximum number of persistent volume claims allowed in the namespace that are associated with the storage class name
  • requests.ephemeral-storage: total amount of requested ephemeral storage across all pod claims in the namespace
  • limits.ephemeral-storage: total amount of ephemeral storage limits across all pod claims in the namespace

Object count quota:

  • count/<resource>.<group> for resources from non-core groups
  • count/<resource> for resources from the core group

Some of these:

count/persistentvolumeclaims
count/services
count/secrets
count/configmaps
count/replicationcontrollers
count/deployments.apps
count/replicasets.apps
count/statefulsets.apps
count/jobs.batch
count/cronjobs.batch
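
A sketch of a ResourceQuota combining a few of these types (namespace from the example below; the hard limits themselves are assumed):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: quota-example
spec:
  hard:
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi
    count/deployments.apps: "5"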

Example:

k create namespace quota-example
k apply -f ns-quota-limit.yml 
k describe ns quota-example
k create -f pod-nginx-lab-1.yml
k -n quota-example scale deployment/example --replicas=5
k -n quota-example get events
k -n quota-example get pods     # ready
k describe ns quota-example     # show current

k delete deployments -n quota-example example
k delete ns quota-example

# count quota for pods
k create ns pods-quota-ns
k apply -f pod-quota-limit.yml
k get resourcequota -n pods-quota-ns        # REQUEST pods: 0/2
k describe ns pods-quota-ns

k create -f nginx-example.yml 
k get pods -n pods-quota-ns                 # READY 1/1
k -n pods-quota-ns scale deployment/nginx-1 --replicas=5
k get pods -n pods-quota-ns                 # 2
k -n pods-quota-ns get events               # FailedCreate ...
k describe ns pods-quota-ns                 # 2/2

Limit Range

If a LimitRange object exists in a namespace, then any container created without the resource requests or limits configured will inherit these values from the limit range.
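
A sketch of assign-limit-range.yml providing such defaults (name and namespace come from the commands below; the values are assumed):

apiVersion: v1
kind: LimitRange
metadata:
  name: define-limit
  namespace: pods-quota-ns
spec:
  limits:
  - type: Container
    default:               # default limits for containers that define none
      cpu: 500m
      memory: 256Mi
    defaultRequest:        # default requests for containers that define none
      cpu: 100m
      memory: 128Mi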

Example:

k create -f assign-limit-range.yml
k get limits -n pods-quota-ns           # define-limit created $NOW
k describe ns pods-quota-ns
k describe limits -n pods-quota-ns define-limit     # only the limits

k delete deployments.apps -n pods-quota-ns nginx-1 
k get pods -n pods-quota-ns                 # 0

k apply -f nginx-example.yml
k describe pods -n pods-quota-ns nginx-1        # requests/limits inherited

k delete limitrange -n pods-quota-ns define-limit

Limiting resources

spec.containers[].resources.limits.cpu
spec.containers[].resources.limits.memory
spec.containers[].resources.limits.hugepages-<size>
spec.containers[].resources.requests.cpu
spec.containers[].resources.requests.memory
spec.containers[].resources.requests.hugepages-<size>

CPU:

  • CPU unit is 1 core for cloud providers and 1 hyperthread for bare metal Intel processors
  • .5 is half a core, or 500m (millicores).
  • Smallest addressable unit is 1m.

Memory:

  • K/M/G/T/P/E suffix - 1000**N
  • Ki/Mi/Gi/Ti/Pi/Ei - 1024**N
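
A sketch of pod-resource-limit.yml using these fields and units (pod name and namespace come from the example below; the values are assumed):

apiVersion: v1
kind: Pod
metadata:
  name: frontend
  namespace: cpu-limit
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: 250m          # a quarter of a core
        memory: 64Mi
      limits:
        cpu: 500m          # half a core
        memory: 128Mi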

Example:

k create ns cpu-limit
k get ns
k create -f pod-resource-limit.yml
k get pods -n cpu-limit -o wide
k get pods -o wide
k describe pods frontend

k delete pods -n cpu-limit frontend

If no CPU limit is specified and there is no LimitRange object, the container will use all the available CPU resources on the node it is running on.

Expose containers to external networks

kubectl port forwarding

kubectl port-forward nginx 8888:80
curl localhost:8888

kubectl expose

k create deployment nginx-lab-1 --image=nginx --replicas=3 --dry-run=client -o yaml > nginx-lab-1.yml
k create -f nginx-lab-1.yml
k get pods | grep nginx-lab-1

k expose deployment nginx-lab-1 --type=NodePort --port=80
k get services
# ...
# nginx-lab-1   NodePort    10.108.128.242   <none>        80:31060/TCP   3s
k describe svc nginx-lab-1  # to get more info about the service

Ingress

  • provide an externally visible URL to the service
  • load balance traffic
  • terminates SSL
  • provide name-based virtual hosting

Creating an Ingress resource alone has no effect. You need to select and deploy an ingress controller to your cluster (many implementations exist, e.g. nginx or HAProxy).

Examples:

minikube version
kubectl get nodes

minikube addons enable ingress
k get pods -n kube-system   # ingress-nginx-*

Configure Ingress using Host

k create deployment nginx --image=nginx
k scale deployment nginx --replicas=3
k get deployments

k expose deployment nginx --type=NodePort --port=80
k get service
minikube service nginx --url        # or `ip a`
# add this to /etc/hosts as host.example.com
k create -f nginx-ingress-rule.yml
k get ingress
k get ing nginx-ingress -o yaml
curl http://host.example.com
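
The host-based nginx-ingress-rule.yml might look roughly like this (service name and port come from the expose command above); the path-based variant in the next section simply adds another path entry pointing at the web2 service:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx-ingress
spec:
  rules:
  - host: host.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: nginx
            port:
              number: 80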

Configure Ingress using Path

k create deployment web2 --image=nginx
k scale deployment web2 --replicas=3
k expose deployment web2 --type=NodePort --port=80
k get svc   # +web2
minikube service web2 --url
# vi nginx-ingress-rule.yml
k apply -f nginx-ingress-rule.yml
k get ing nginx-ingress -o yaml
curl http://host.example.com/v2