
Infrastructure

CodeCollab infrastructure is split across multiple platforms.

  • DevOps

    • GitLab
    • Terraform Cloud
  • Deployments

    • Google Cloud
      • Cloud Run
      • Compute Engine
      • Redis Enterprise
      • MongoDB Dedicated
    • AWS
      • MongoDB Regional Serverless
  • Netlify

    • May move to Cloudflare Pages

See here for a diagram.

DANGER

All services, including IAM, must be provisioned via Terraform, with the exception of:

  • Netlify
  • MongoDB Atlas
  • Redis Enterprise

Avoid provisioning resources manually wherever possible, and never edit Terraform-provisioned resources outside of Terraform, as doing so causes configuration drift. Keeping everything in Terraform gives us a single source of truth for all infrastructure and makes regional services easier to deploy and manage.
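Drift shows up as a plan that proposes changes even though nobody changed the config. A minimal sketch of checking for it, assuming the Terraform CLI can plan a workspace from a local checkout of its repository; `check_workspace`, `interpret_plan_exit_code` and the status strings are hypothetical names, while the exit codes are the ones documented for `terraform plan -detailed-exitcode`:

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`:
#   0 = success, no changes; 1 = error; 2 = success, changes present.
PLAN_STATUS = {0: "in-sync", 2: "changes-pending"}


def interpret_plan_exit_code(code: int) -> str:
    """Translate a detailed plan exit code into a drift status."""
    return PLAN_STATUS.get(code, "error")


def check_workspace(path: str) -> str:
    """Run a plan (never an apply) in a local checkout of a workspace repository.

    Hypothetical helper: assumes the Terraform CLI is installed and that
    `terraform init` has already been run in `path`.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=path,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(result.returncode)
```

If `check_workspace` reports `"changes-pending"` for a workspace whose default branch has already been applied, something was most likely edited outside of Terraform.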

Making an infrastructure change

Terraform Cloud is set up as a GitLab application under the CodeCollab group and acts on behalf of @Qin Guan. All Terraform configs live under the DevOps subgroup, and each repository corresponds to one Terraform Workspace.

A push to the default branch does not apply the change automatically; an operator must confirm the run before it is applied.
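Operators normally confirm a run from the Terraform Cloud UI. The same confirmation is exposed by the Terraform Cloud runs API (`POST /runs/:run_id/actions/apply`). A minimal sketch of calling it, assuming the workspaces live on app.terraform.io and that an API token with permission to apply runs is available in a hypothetical `TFC_TOKEN` environment variable:

```python
import json
import os
import urllib.request

TFC_API = "https://app.terraform.io/api/v2"


def build_apply_request(run_id: str, token: str, comment: str = "") -> urllib.request.Request:
    """Build (but do not send) the request that confirms and applies a planned run."""
    body = json.dumps({"comment": comment}).encode()
    return urllib.request.Request(
        f"{TFC_API}/runs/{run_id}/actions/apply",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/vnd.api+json",
        },
        method="POST",
    )


def apply_run(run_id: str, comment: str = "") -> int:
    """Send the apply action and return the HTTP status code."""
    token = os.environ["TFC_TOKEN"]  # hypothetical variable name
    req = build_apply_request(run_id, token, comment)
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

A successful call should return 202 once the apply is queued; non-2xx responses (for example, when the run is not waiting for confirmation) raise `urllib.error.HTTPError`.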

Each Terraform Workspace may have its own service account for accessing Google Cloud. Do not grant basic roles (Owner, Editor, Viewer) to this service account; keep it least-privilege. Terraform Workspaces should not share service accounts. Set the service account's details in the following Terraform Workspace variables:

  • GCP_SERVICE_ACCOUNT
  • GCP_SERVICE_KEY

Once these variables are referenced in your Terraform config as described here, the workspace will be able to work with Google Cloud.
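The service account rules above (no basic roles, no sharing between workspaces) are easy to check mechanically. A minimal sketch, assuming each workspace's service account email and granted roles have already been collected into a dict; that input shape, and the workspace and account names in the example, are hypothetical:

```python
from collections import defaultdict

# Google Cloud's basic roles; none of these may be granted to a workspace account.
BASIC_ROLES = {"roles/owner", "roles/editor", "roles/viewer"}


def find_violations(workspaces: dict) -> list:
    """Return human-readable policy violations.

    `workspaces` maps a workspace name to
    {"service_account": "<email>", "roles": ["roles/...", ...]}
    (hypothetical input shape).
    """
    problems = []
    users = defaultdict(list)  # service account email -> workspaces using it
    for name, ws in sorted(workspaces.items()):
        users[ws["service_account"]].append(name)
        for role in sorted(set(ws["roles"]) & BASIC_ROLES):
            problems.append(f"{name}: basic role {role} granted")
    for account, names in sorted(users.items()):
        if len(names) > 1:
            problems.append(f"{account} shared by {', '.join(names)}")
    return problems


# Hypothetical example: one basic role grant and one shared account.
example = {
    "kubernetes": {"service_account": "tf-k8s@proj.iam.gserviceaccount.com",
                   "roles": ["roles/container.admin"]},
    "cloud-run": {"service_account": "tf-run@proj.iam.gserviceaccount.com",
                  "roles": ["roles/run.admin", "roles/editor"]},
    "dns": {"service_account": "tf-k8s@proj.iam.gserviceaccount.com",
            "roles": []},
}
```

For the example, `find_violations(example)` flags the Editor grant on `cloud-run` and the account shared by `dns` and `kubernetes`.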

Info on GitLab

This section is not relevant if we move to GitHub.

Terraform also manages parts of our GitLab infrastructure. An access token is provided:

  • GITLAB_ACCESS_TOKEN

Google Cloud

Services on Google Cloud will serve 3 regions from these locations:

  • United States
    • Iowa - us-central1
  • Europe
    • Finland - europe-north1
    • Netherlands - europe-west1
  • Asia
    • Taiwan - asia-east1
    • Tokyo - asia-northeast1
    • Singapore - asia-southeast1

As Cloud Run is the primary service, its pricing is the largest concern, and all locations used for it must have Tier 1 pricing. Compute Engine comes second in priority, as it does not run latency-sensitive services. Compute Engine instances should only be run in us-central1 or asia-southeast1, as these are the most cost-effective locations.

us-central1 is the primary region, mainly because of its cost and its direct undersea network links to other regions. By default, all resources should be deployed to us-central1, which has undersea links to Japan, Taiwan, Australia (via Guam) and regions in Europe. If latency becomes an issue, services can additionally be deployed to the locations listed above for each service.
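The placement rules above boil down to a default plus two allow-lists. A minimal sketch of encoding them, for example in a pre-deployment check; the service identifiers and `resolve_location` are hypothetical, and the location sets are copied from this page:

```python
from typing import Optional

# Locations listed above; us-central1 is the default for everything.
ALLOWED_LOCATIONS = {
    "us-central1",      # Iowa (primary)
    "europe-north1",    # Finland
    "europe-west1",     # Netherlands
    "asia-east1",       # Taiwan
    "asia-northeast1",  # Tokyo
    "asia-southeast1",  # Singapore
}
# Compute Engine is restricted to the two most cost-effective locations.
COMPUTE_ENGINE_LOCATIONS = {"us-central1", "asia-southeast1"}
DEFAULT_LOCATION = "us-central1"


def resolve_location(service: str, requested: Optional[str] = None) -> str:
    """Pick the deployment location for a resource of the given service.

    `service` is a hypothetical identifier such as "cloud-run" or
    "compute-engine". Without an explicit request, the default is used.
    """
    if requested is None:
        return DEFAULT_LOCATION
    allowed = COMPUTE_ENGINE_LOCATIONS if service == "compute-engine" else ALLOWED_LOCATIONS
    if requested not in allowed:
        raise ValueError(f"{requested} is not an approved location for {service}")
    return requested
```

Requesting `europe-west1` for Compute Engine raises an error, while the same location is accepted for Cloud Run.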

Kubernetes

Our Kubernetes setup runs on GKE and consists of two clusters. All k8s resources are managed by Terraform in the DevOps/kubernetes repository.

The k8s setup uses ingress-nginx because we’re too broke to use Google’s load balancer services. This also means we have to provision our own certificates; for now, cert-manager obtains them from Let’s Encrypt. For typical ingresses this uses HTTP-01 validation, which is relatively simple. Wildcard certificates, however, require DNS-01 validation, so cert-manager must be connected to our Cloudflare account in order to update DNS records when required. A Cloudflare API token is passed to the Terraform configuration through secret variables.
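Each name on a certificate is validated separately, so a certificate covering both an apex domain and its wildcard needs both challenge types. A minimal sketch of that rule (Let’s Encrypt only accepts DNS-01 for wildcard names); the function names and `example.com` are placeholders, and the actual solver selection is configured on the cert-manager issuer, not in code like this:

```python
def challenge_type(dns_name: str) -> str:
    """ACME challenge needed to validate a single DNS name.

    Wildcard names such as "*.example.com" can only be validated via
    DNS-01; ordinary hostnames can use HTTP-01 through ingress-nginx.
    """
    return "dns-01" if dns_name.startswith("*.") else "http-01"


def plan_challenges(dns_names: list[str]) -> dict[str, str]:
    """Map every name on a certificate to the challenge it requires."""
    return {name: challenge_type(name) for name in dns_names}
```

For example, `plan_challenges(["example.com", "*.example.com"])` gives HTTP-01 for the apex and DNS-01 for the wildcard, which is why the Cloudflare token is needed at all.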

Info on HTML server

Eventually, the HTML server will move from App Engine to either GKE or some other service.

If it is moved to GKE, ingress-nginx will need to handle the wildcard subdomain routing as well.

There have been issues getting this to work; see this issue.
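If the HTML server’s wildcard subdomains end up behind an Ingress, the host rule follows the Ingress wildcard semantics described in the Kubernetes documentation: `*.example.com` matches exactly one extra DNS label. A minimal sketch of that rule, useful for reasoning about which hosts a wildcard rule catches; `matches_ingress_host` and `example.com` are placeholders, and ingress-nginx’s actual matching should still be verified against the controller itself:

```python
def matches_ingress_host(pattern: str, host: str) -> bool:
    """Ingress-style host matching with an optional single leading wildcard.

    "*.example.com" matches "user.example.com" but neither
    "example.com" nor "a.user.example.com".
    """
    host = host.lower()
    pattern = pattern.lower()
    if not pattern.startswith("*."):
        return host == pattern
    suffix = pattern[1:]  # e.g. ".example.com"
    label = host[: -len(suffix)] if host.endswith(suffix) else ""
    # Exactly one non-empty label may replace the wildcard.
    return bool(label) and "." not in label
```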