In this article, we’ll guide you through the process of deploying a stateful, Dockerized Node.js app on Google Cloud Kubernetes Engine! As an example application, we will use Ghost - the open-source blogging platform we use to run the RisingStack blog and serve ~150K readers/month. The application will have persistent storage so it can persist its themes and images.

Takeaway: Once you go through this tutorial you’ll be able to follow the processes & ideas outlined here to deploy any stateful application on Kubernetes!

If you are not familiar with Kubernetes on Google Cloud Kubernetes Engine or with setting up clusters, I recommend reading our How to Get Started With Kubernetes article first. It will give you the basics to get started.

This article was written by Kristof Ivancza who is a software engineer at RisingStack & Tamas Kadlecsik, RisingStack's CEO. In case you need guidance with Kubernetes or Node.js, feel free to ping us at [email protected]


What is Ghost?

Ghost is an open-source blogging platform powered by a non-profit organization called the Ghost Foundation, and it's maintained by independent contributors. Ghost is written in Node.js on the server side, and uses Ember.js & Handlebars on the client side. Check out their GitHub repository for more information.


What will we need to deploy a stateful app on Kubernetes properly?

  • A cluster
  • Persistent Disks to store our images and themes
  • A Second Generation MySQL instance to connect to
  • A secret to store credentials
  • A deployment
  • A service to expose the application

Cluster creation

First, we need to create a cluster, set it as the default cluster for gcloud, and pass the cluster credentials to kubectl.

# create the cluster
$ gcloud container clusters create [CLUSTER_NAME]
# set the default cluster
$ gcloud config set container/cluster [CLUSTER_NAME]
# pass cluster credentials to kubectl
$ gcloud container clusters get-credentials [CLUSTER_NAME]

Get the ID of the project and assign it to a variable named PROJECT_ID.

$ export PROJECT_ID="$(gcloud config get-value project -q)"

Getting started with the container

Here you can find the official Dockerfile for ghost and docker-entrypoint.sh script. To test it locally, you can run:

$ docker run --name test-ghost -p 8080:2368 ghost:alpine

Now you should be able to reach your local Ghost by opening http://localhost:8080 in your browser.

If we want to deploy this container on Kubernetes, we’ll need to create a deployment.

# deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ghost-blog
  labels:
    app: ghost-blog
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ghost-blog
  template:
    metadata:
      labels:
        app: ghost-blog
    spec:
      containers:
      # ghost container
      - name: ghost-container
        image: ghost:alpine
        imagePullPolicy: IfNotPresent
        # ghost always starts on this port
        ports:
        - containerPort: 2368

We’re not production ready yet, so we’ll keep updating the deployment as we go!

As a second step, let’s create and mount the disks we’ll use to store our Ghost themes and blogpost images.


Creating persistent storage to store our themes and images

Kubernetes pods are stateless by default, meaning that it should be possible to kill and spin up new pods for a deployment at a moment’s notice. As a result, each pod’s file system is ephemeral, so whatever files were modified or created during the pod’s lifetime will be gone once the pod is shut down.

However, Ghost stores the themes and images we upload in /var/lib/ghost/content/, thus we have to make sure they are persisted properly. To do so, we need to use persistent storage and make our application stateful.

We have two ways of creating disks. We can create one manually on GCE and pass it on to Kubernetes, or just tell Kubernetes what we need and let it create the disk for us. The first method is called Static Provisioning and the second one is called - you guessed it - Dynamic Provisioning.

Static Provisioning is useful when you have an already existing disk from before, and you want your pods to use this disk. But if you don’t have a disk yet, it’s easier to let Kubernetes create one for you, which means using Dynamic Provisioning.

Side note: it is also easier on our wallet to go with Dynamic Provisioning as on GCE the smallest disk we can create is a 100GB volume, but when we let Kubernetes provision the disk for us, we can request whatever size we need.

To understand the disk creation we need to take a look at Persistent Volume Claims, so let’s get to it straight away!


Persistent Volume Claim

Let’s update our deployment first, so it will wait for a mountable disk to be present.

# deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ghost-blog
  labels:
    app: ghost-blog
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ghost-blog
  template:
    metadata:
      labels:
        app: ghost-blog
    spec:
      containers:
      # ghost container
      - name: ghost-container
        image: ghost:alpine
        imagePullPolicy: IfNotPresent
        # ghost always starts on this port
        ports:
        - containerPort: 2368
        volumeMounts:
        # define persistent storage for themes and images
        - mountPath: /var/lib/ghost/content/
          name: pd-blog-volume
      volumes:
      - name: pd-blog-volume
        persistentVolumeClaim:
          claimName: pd-blog-volume-claim

What changed is that we added the volumeMounts and volumes fields.

The volumeMounts field belongs to the container. The mountPath defines where the volume will be mounted in the container. So it’s basically the same as if we ran our container with docker run -v $(pwd):/var/lib/ghost/content/ --name ghost-blog -p 8080:2368 ghost:alpine.

The volumes field defines the PersistentVolumeClaim, or pvc, that will handle the attachment of the volume to the container. In our case it will look like this:

# PersistentVolumeClaim.yml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pd-blog-volume-claim
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard

As you can see, the name matches the one we referred to in the deployment. In the spec we define the accessModes to be ReadWriteOnce, as ext4 disks cannot be attached to multiple nodes. The part where we request 10Gi of storage speaks for itself IMO, and for all intents and purposes it’s enough to know that the storageClassName: standard field will let Kubernetes automatically provision an HDD for us.

To submit the pvc run the following command:

$ kubectl apply -f PersistentVolumeClaim.yml
# to make sure everything happened correctly
$ kubectl get pvc
# if something is not bound or you need more information for debugging
$ kubectl describe pvc

If everything went right, you should see after running $ kubectl get pvc that the persistent volume is created and bound to the volume claim.


Create and connect to MySQL using a Second Generation instance

We need to create a "Cloud SQL for MySQL Second Generation" instance.

By using a Second Generation instance, we can use a Cloud SQL Proxy sidecar in our deployment to communicate with the database. A sidecar is a second, helper container that runs inside the same pod, next to the application container, and handles auxiliary tasks such as encryption. (This also might shed some light on why the containers field is plural in the deployment.ymls and why it's an array.)

Setting up the instance and the sidecar will be a bit tricky, but at least this way we don't have to configure SSL connection, whitelist IP addresses or create a Static IP to connect to our CloudSQL instance, as the proxy handles all communication with the database.


Creating a Second Generation instance:

  • First we get machine types
$ gcloud sql tiers list
TIER           	AVAILABLE_REGIONS			RAM    	DISK
D0             	[long-ass region list]		128 MiB	250 GiB
D1             	[long-ass region list]	 	512 MiB	250 GiB
D2             	[long-ass region list]	 	1 GiB  	250 GiB
[...]
db-f1-micro		[even longer region list] 		614.4 MiB  	3.0 TiB
db-g1-small    	[even longer region list]		1.7 GiB	3.0 TiB
db-n1-standard-1   	[even longer region list]		 3.8 GiB	10.0 TiB
[...]
# to use a 2nd gen instance, you must choose from values that are starting with `db-`
  • Then we create the instance
$ gcloud sql instances create [INSTANCE_NAME] --tier=[TIER] --region=[REGION]
# [INSTANCE_NAME] = this will be the name of the db
# [TIER] = chosen machine tier from previous list
# [REGION] = preferably your clusters region (e.g: us-central1)
  • Finally, we set the root password for MySQL
$ gcloud sql users set-password root --host=% --instance=[INSTANCE_NAME] --password=[PASSWORD]
# [INSTANCE_NAME] = name of your previously created db
# [PASSWORD] = the password you want for root

Connect to CloudSQL using a Proxy sidecar

  1. First, we need to enable the Cloud SQL Admin API. You can do it here
  2. Create a Service Account
    1. Go to the Service Account Page
    2. Select the needed Cloud SQL instance
    3. Click Create Service Account
    4. Select Cloud SQL > Cloud SQL Client from the role dropdown menu
    5. Change the account ID to a value you will remember later, if needed
    6. Click Furnish a new Private Key
    7. Click Create
      • A JSON file with the private key will be downloaded to your machine. Keep it somewhere safe, as you will need it later. I will refer to this file later as [PATH_TO_DOWNLOADED_JSON_SECRET]
  3. Create the proxy user: a MySQL user that the proxy sidecar will use when connecting to the database. To do so, use the following command:
$ gcloud sql users create proxyuser --host=cloudsqlproxy~% --instance=[INSTANCE_NAME] --password=[PASSWORD]
# Instance name = MySQL instance you want to connect to (e.g: ghost-sql)
# The username of the proxyuser will be "proxyuser" with the password you pass as argument to the command
  4. Get your instance connection name
$ gcloud sql instances describe [INSTANCE_NAME]
$ gcloud sql instances describe ghost-sql | grep 'connectionName'
connectionName: ghost-blog:us-central1:ghost-sql
  5. Create the secrets that we’ll use in the deployment:

Two secrets are required to access data in Cloud SQL from your application. The cloudsql-instance-credentials secret contains the service account key (the JSON file you got in step 2.7). The cloudsql-db-credentials secret contains the proxy user's name and password.

To create cloudsql-instance-credentials run:

$ kubectl create secret generic cloudsql-instance-credentials --from-file=credentials.json=[PATH_TO_DOWNLOADED_JSON_SECRET]
# [PATH_TO_DOWNLOADED_JSON_SECRET] = JSON file you downloaded when you created the service account

To create cloudsql-db-credentials run:

$ kubectl create secret generic cloudsql-db-credentials --from-literal=username=proxyuser --from-literal=password=[PASSWORD]
# username=proxyuser - created username for CloudSQL in the 3rd step 
# password=[PASSWORD] - password for proxyuser we set in the 3rd step
  6. Add the proxy container to the deployment:

Replace [INSTANCE_CONNECTION_NAME] with the value you got in the 4th step.

# deployment.yml
[...]
spec:
  template:
    spec:
      containers:
      # ghost container
      - name: ghost-container
        image: ghost:alpine
        imagePullPolicy: IfNotPresent
        # ghost always starts on this port
        ports:
        - containerPort: 2368
        volumeMounts:
        # define persistent storage for themes and images
        - mountPath: /var/lib/ghost/content/
          name: pd-blog-volume
      # cloudsql proxy container
      - name: cloudsql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.11
        command: ["/cloud_sql_proxy",
                  "-instances=[INSTANCE_CONNECTION_NAME]=tcp:3306",
                  "-credential_file=/secrets/cloudsql/credentials.json"]
        volumeMounts:
        - name: cloudsql-instance-credentials
          mountPath: /secrets/cloudsql
          readOnly: true
      volumes:
      - name: pd-blog-volume
        persistentVolumeClaim:
          claimName: pd-blog-volume-claim
  7. Pass the Cloud SQL credentials to the ghost container.
# deployment.yml
[...]
spec:
  template:
    spec:
      containers:
      # ghost container
      - name: ghost-container
        image: ghost:alpine
        imagePullPolicy: IfNotPresent
        # ghost always starts on this port
        ports:
        - containerPort: 2368
        volumeMounts:
        # define persistent storage for themes and images
        - mountPath: /var/lib/ghost/content/
          name: pd-blog-volume
        # Env vars to be passed to the container
        env:
        - name: database__connection__host
          value: "127.0.0.1"
        - name: database__connection__user
          valueFrom:
            secretKeyRef:
              name: cloudsql-db-credentials
              key: username
        - name: database__connection__password
          valueFrom:
            secretKeyRef:
              name: cloudsql-db-credentials
              key: password
      # cloudsql proxy container
      - name: cloudsql-proxy
[...]
    volumes:
      - name: pd-blog-volume
        persistentVolumeClaim:
          claimName: pd-blog-volume-claim
      # db credentials stored in this volume to access our mysql
      - name: cloudsql-instance-credentials
        secret:
          secretName: cloudsql-instance-credentials

database__connection__host is 127.0.0.1 as containers in the same pod can access each other on localhost.

The secret named cloudsql-db-credentials stores the created username & password for the proxy.

We also added a new volume to volumes at the bottom of the yml. As you can see it is not an actual disk, but the secret we created before. This is the secret that stores the data from the JSON file we got when we created the service account in step 2.7.
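The double underscores in names such as database__connection__host are how Ghost maps flat environment variables onto its nested configuration keys. The sketch below only illustrates that naming convention; env_to_config_path is a hypothetical helper, not part of Ghost or kubectl.

```shell
#!/bin/sh
# Hypothetical helper: turn a Ghost-style environment variable name into
# the nested config path it corresponds to ("__" separates the levels).
env_to_config_path() {
  printf '%s\n' "$1" | sed 's/__/./g'
}

env_to_config_path database__connection__host   # prints database.connection.host
env_to_config_path mail__options__auth__user    # prints mail.options.auth.user
```

Reading the names this way makes it easier to see which part of Ghost's configuration each env entry in the deployment overrides.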


Set up the mail server connection

In our example, we will use Sendgrid to send emails. As before, we'll create a secret to pass on the values to the deployment.

In the previous section we used the following command to create a secret:

$ kubectl create secret generic cloudsql-db-credentials --from-literal=username=proxyuser --from-literal=password=[PASSWORD]

We can do the same here as well:

$ kubectl create secret generic mail-secrets --from-literal=mailuser=[SENDGRID_USERNAME] --from-literal=mailpass=[SENDGRID_PASSWORD]

If you run kubectl get secret mail-secrets -o yaml you'll get

$ kubectl get secret mail-secrets -o yaml

apiVersion: v1
data:
  mailpass: V2hhdCB3ZXJlIHlvdSBob3BpbmcgeW91J2QgZmluZCBoZXJlPyA7KQo= 
  mailuser: WW91J3JlIGEgdGVuYWNpb3VzIGxpdGxlIGZlbGxhLCBhcmVuJ3QgeW91PyA6KQo=
kind: Secret
metadata:
  creationTimestamp: 2018-03-13T15:48:39Z
  name: mail-secrets
  namespace: default
  resourceVersion: "2517884"
  selfLink: /api/v1/namespaces/default/secrets/mail-secrets
  uid: ffec2546-26d5-11e8-adfc-42010a800106
type: Opaque

As you can see the main information is in data. The values we passed to the command are base64 encoded and stored there.
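To make the encoding step concrete, here is a minimal sketch of what happens to a value on its way into data and back. The function names are hypothetical helpers, and the sketch assumes a base64 utility that accepts -d for decoding (GNU coreutils does). Keep in mind that base64 is an encoding, not encryption: anyone who can read the secret can decode it.

```shell
#!/bin/sh
# Encode a value the way it appears under "data:" in a Secret manifest.
# printf is used instead of echo so no trailing newline gets encoded.
encode_secret_value() {
  printf '%s' "$1" | base64
}

# Decode a value copied from the output of "kubectl get secret ... -o yaml".
decode_secret_value() {
  printf '%s' "$1" | base64 -d
}

encode_secret_value 'proxyuser'      # prints cHJveHl1c2Vy
decode_secret_value 'cHJveHl1c2Vy'   # prints proxyuser (without a newline)
echo
```

If you write the values into a yaml file by hand, encode them like this first, otherwise Kubernetes will reject the manifest or store garbage.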

If you prefer to create a yaml file for the secret as well, you can strip the auto-generated metadata from this one (keeping only the name, which Kubernetes requires), so it looks something like this:

apiVersion: v1
data:
  mailpass: V2hhdCB3ZXJlIHlvdSBob3BpbmcgeW91J2QgZmluZCBoZXJlPyA7KQo=
  mailuser: WW91J3JlIGEgdGVuYWNpb3VzIGxpdGxlIGZlbGxhLCBhcmVuJ3QgeW91PyA6KQo=
kind: Secret
metadata:
  name: mail-secrets
type: Opaque

and upload it with

$ kubectl create -f mail-secrets.yml

Now we also need to pass these as env vars to the app container:

[...]
spec:
  template:
    spec:
      containers:
      # ghost container 
      - name: ghost-container
        [...]
        env:
        - name: mail__transport
          value: SMTP
        - name: mail__options__service
          value: Sendgrid
        # use mail envvars from the mail-secrets file
        - name: mail__options__auth__user
          valueFrom:
            secretKeyRef:
              name: mail-secrets
              key: mailuser
        - name: mail__options__auth__pass
          valueFrom:
            secretKeyRef:
              name: mail-secrets
              key: mailpass
          # end of mailenvs
        - name: mail__options__port
          value: "2525"
        - name: database__client
          value: mysql
        # CloudSQL credentials to connect with the Proxyuser
        - name: database__connection__host
          value: "127.0.0.1"
        - name: database__connection__user
          valueFrom:
            secretKeyRef:
              name: cloudsql-db-credentials
              key: username
        - name: database__connection__password
          valueFrom:
            secretKeyRef:
              name: cloudsql-db-credentials
              key: password
      # cloudsql proxy container
      - name: cloudsql-proxy
[...]

Creating the deployment

By now we have all the objects our deployment needs to run. There is still some additional setup left, but let's see the whole package:

# deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ghost-blog
  labels:
    app: ghost-blog
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ghost-blog
  template:
    metadata:
      labels:
        app: ghost-blog
    spec:
      containers:
      # ghost container
      - name: ghost-container
        image: ghost:alpine
        # envs to run ghost in production
        env:
        - name: mail__transport
          value: SMTP
        - name: mail__options__service
          value: Sendgrid
        # use mail envvars from the mail-secrets file
        - name: mail__options__auth__user
          valueFrom:
            secretKeyRef:
              name: mail-secrets
              key: mailuser
        - name: mail__options__auth__pass
          valueFrom:
            secretKeyRef:
              name: mail-secrets
              key: mailpass
        # end of mailenvs
        - name: mail__options__port
          value: "2525"
        - name: database__client
          value: mysql
        # CloudSQL credentials to connect with the Proxyuser
        - name: database__connection__user
         # referencing to secret file
          valueFrom:
            secretKeyRef:
              name: cloudsql-db-credentials
              key: username
        - name: database__connection__password
          valueFrom:
            secretKeyRef:
              name: cloudsql-db-credentials
              key: password
        # end of Cloud SQL secrets
        - name: database__connection__host
          value: "127.0.0.1"
        # also recommended to put the database name inside a secret file
        - name: database__connection__database
          value: database_name
        - name: url
          value: "http://your_url.com"
        - name: NODE_ENV
          value: production
        # end of envs
        imagePullPolicy: IfNotPresent
        # ghost always starts on this port
        ports:
          - containerPort: 2368
        volumeMounts:
        # define persistent storage for themes and images
        - mountPath: /var/lib/ghost/content/
          name: pd-blog-volume
          subPath: blog
        # resource ghost needs
        resources:
          requests:
            cpu: "130m"
            memory: "256Mi"
          limits:
            cpu: "140m"
            memory: "512Mi"
      # cloudsql proxy container
      - name: cloudsql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.11
        command: ["/cloud_sql_proxy",
                  "-instances=[INSTANCE_CONNECTION_NAME]=tcp:3306",
                  "-credential_file=/secrets/cloudsql/credentials.json"]
        volumeMounts:
        - name: cloudsql-instance-credentials
          mountPath: /secrets/cloudsql
          readOnly: true
        # resource cloudsql needs
        resources:
          requests:
            cpu: "15m"
            memory: "64Mi"
          limits:
            cpu: "20m"
            memory: "128Mi"
      volumes:
      # db credentials stored in this volume to access our mysql
      - name: cloudsql-instance-credentials
        secret:
          secretName: cloudsql-instance-credentials
      - name: cloudsql
        emptyDir: {}
      # persistent storage used to store our themes and images
      # please note that we are using the predefined volume claim
      - name: pd-blog-volume
        persistentVolumeClaim:
          claimName: pd-blog-volume-claim

There are still some fields that might need some explanation.

Under spec, you can see replicas: 1. This tells Kubernetes that we want exactly one pod to be spawned by the deployment. If you want to achieve high availability you should set this value to at least 3. You could also set up pod autoscaling if you want to make sure that your pods are scaled up horizontally when the load is higher and scaled back after the peak is over.

You can also find labels at three different places. The first one, in metadata.labels, is the label of the deployment itself. So when you run kubectl get deployment -l app=ghost-blog you'll get all the deployments that have this label present. In selector.matchLabels we define which pods the deployment should handle. This also means that you could manually create pods and the deployment will handle them.

But as you saw we didn’t create pods manually. We used the spec.template field instead, which creates a pod template that the deployment will use when it spawns new pods. That is why you see the strange path before the container specification, that is, spec.template.spec.containers. The first spec is the specification of the deployment which has a pod template, and this pod template spawns pods based on its own spec. And that’s also the reason why we have another set of labels in the template. These are the labels the created pods will have, and this way the deployment can match the pods it needs to handle once they are spawned.

We also added the resources field with CPU and memory requests and limits. If you omit this, the first created pod can eat up all the resources of its host node and all other pods will be stuck in pending status. One thing to note though is that there is quite a small difference between the CPU request and limit. The reason for this is to be ready for autoscaling. If there is a big difference between the two, it might happen that your node will be filled with a lot of pods which each use just a small amount of CPU. When the need comes to scale them vertically, though, there is no processor time left, so you are stuck with pods that cannot serve their purpose fast enough and cannot be scaled up. To prevent this, keep the difference between the requested CPU and its limit small.
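A quick back-of-the-envelope calculation shows why the small gap matters. The sketch below uses the requests and limits from deployment.yml; the node's allocatable CPU (940m) is a made-up figure for illustration only, so check kubectl describe node for the real value on your cluster.

```shell
#!/bin/sh
# All values are in millicores. The node size is a hypothetical figure.
node_allocatable=940

# Requests and limits from deployment.yml (ghost container + cloudsql-proxy).
pod_request=$((130 + 15))
pod_limit=$((140 + 20))

# The scheduler places pods on a node based on their requests.
pods_per_node=$((node_allocatable / pod_request))

# Total demand if every pod bursts up to its limit at the same time.
peak_demand=$((pods_per_node * pod_limit))
overcommit=$((peak_demand - node_allocatable))

echo "pods per node: $pods_per_node"
echo "peak demand: ${peak_demand}m, overcommit: ${overcommit}m"
```

With these numbers six pods fit on the node and the worst case overshoots it by only 20m. Raising the limit while keeping the request the same would let the same six pods be scheduled, but the overshoot would grow with every pod.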

It is also worth mentioning that Google Compute Engine blocks outbound connections on ports 25, 465 and 587. All the popular third-party mail providers such as MailGun, MailJet or SendGrid use one of these ports by default in Ghost. That’s why we have overridden the default mail port to 2525 with the mail__options__port env var.

Now we are ready to apply the deployment:

$ kubectl apply -f deployment.yml
# get pods with watch flag, so the output is constantly updated when changes happen
$ kubectl get pods -w
# to get further info when a problem has occurred
$ kubectl describe pods

With the following command, you can also run a particular image directly, which can come in handy while you are testing whether your setup is correct. (Note that depending on your kubectl version, this either creates a deployment or manually starts a single pod without one.)

$ kubectl run ghost-blog --image=ghost:alpine --port=2368

And here are some more handy kubectl commands you can use while debugging:

# copy from your computer to pod - use for testing only!
$ kubectl cp SOURCE default/_podname_:/DESTINATION -c container_name
# view logs
$ kubectl logs _podname_
# if multiple containers are in the pod
$ kubectl logs _podname_ --container container_name
# get a shell to a running container
$ kubectl exec -it _podname_ -- sh

Creating a service to expose our application

All that’s left is to expose our application, so it can receive external traffic.

You can let Kubernetes get a static IP for you to expose your blog to the public internet, but then you have to reconfigure your DNS provider each time you recreate the service. It is better to provision one manually first and then pass it on to the service.

# create a Static IP address named ghost-blog-static-ip
$ gcloud compute addresses create ghost-blog-static-ip --region us-central1
# get the Static IP created with the previous command
$ gcloud compute addresses describe ghost-blog-static-ip --region us-central1 | grep 'address'
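Note that grep 'address' matches every line that contains the word, not just the IP. If you want only the value, a stricter filter helps. The sketch below is one way to do it; extract_address is a hypothetical helper, and the sample text is made-up output using a documentation-only IP address, not real gcloud output.

```shell
#!/bin/sh
# Hypothetical helper: keep only the value of the top-level "address:"
# field from describe-style YAML output read on stdin.
extract_address() {
  awk '$1 == "address:" { print $2 }'
}

# Made-up sample output; 203.0.113.10 is reserved for documentation.
sample='address: 203.0.113.10
addressType: EXTERNAL
kind: compute#address
name: ghost-blog-static-ip'

printf '%s\n' "$sample" | extract_address   # prints 203.0.113.10
```

In practice you would pipe the real command into it: gcloud compute addresses describe ghost-blog-static-ip --region us-central1 | extract_address.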

And now create the following service.yml file and replace loadBalancerIP with yours.

# service.yml
apiVersion: v1
kind: Service
metadata:
  name: ghost-blog-service
  labels:
    app: ghost-blog
spec:
  selector:
    # must match the labels of the pods created by the deployment
    app: ghost-blog
  ports:
  - port: 80
    targetPort: 2368 # exposed port of the ghost container
  type: LoadBalancer
  loadBalancerIP: [IP_ADDRESS]

This creates a service named ghost-blog-service. It finds every pod that has the label app: ghost-blog and exposes its container port 2368 on port 80 to the public internet, balancing the load between the pods.

$ kubectl apply -f service.yml
# watch the get service command
$ kubectl get service -w
# usually it takes about a minute to get the External IP
# if it's still stuck in <pending> status run the following
$ kubectl describe service

If you prefer one liners, you can achieve the same result by running the following command:

$ kubectl expose deployment ghost-blog --type="LoadBalancer" \
--name=ghost-blog-service --port=80 --target-port=2368

This will expose your previously created ghost-blog deployment on port 80 with the service name ghost-blog-service without the need to create the yaml file yourself.


Final thoughts on deploying to Kubernetes

I know, this whole thing might look daunting, especially if you have already deployed stateless apps to Kubernetes. However, take into account that when you deploy a Ghost blog, or any other stateful application, to simple VMs without containers or container orchestration, you need to go through the same steps, only manually. You need to create disks and attach them by hand, create a database instance and set up the connection. And you also need to store your credentials safely and set up your firewall rules. The majority of the complexity here comes from the fact that managing stateful apps is complex in its own right. Kubernetes makes it easier by handling the creation and attachment of disks to our service instances and helps to keep things organized when the app needs to be horizontally scaled.

The only part that is a bit more tedious than it would be otherwise is the Cloud SQL Proxy we needed to set up, but this was necessary because of Google Cloud, not Kubernetes. On top of that, by leveraging container technologies we get a ready-made proxy that takes away a lot of the manual setup we’d need to handle otherwise.

Now that we have deployed one stateful app, we are ready to package all our blogs in a similar way and set them up in a similar cluster, or even in the same one, if we want to reduce our costs. This way we are given a unified environment that we can interact with for each of our assets if needed. Even though Kubernetes is mostly used for distributed applications, now we’ve shown that it can be used for the deployment of several standalone apps easier than would be otherwise.

Happy infrastructuring!