High Availability
When you turn ON High Availability (HA), we will spin up at least 3 nodes in your cluster, each in a different, physically and geographically separate data center, and each able to service reads and writes.
So even if one of these nodes is inaccessible for any reason (for example, due to underlying infrastructure issues or maintenance), the other two nodes can continue servicing requests with zero downtime.
When you enable HA, you will get a special load-balanced hostname, and searches sent to this endpoint will automatically be distributed to one of the nodes in your cluster.
Writes can also be sent to this load-balanced endpoint and your data will be automatically replicated to all the nodes in your cluster.
We highly recommend enabling High Availability when running Typesense in a production environment.
With HA turned ON, you can avoid downtime during the following scenarios:
- Infrastructure issues: in a multi-node HA cluster, even if one node or one datacenter has underlying hardware issues, the other two nodes in the cluster will continue servicing traffic, as the problematic node recovers.
- Capacity changes: when RAM / CPU is changed in a multi-node HA cluster, we will upgrade/downgrade one node at a time and your cluster will continue servicing traffic from the other two nodes, as the configuration change happens on the 3rd one.
- Typesense version changes: when the Typesense Server version is changed in a multi-node HA cluster, we will upgrade/downgrade one node at a time and your cluster will continue servicing traffic from the other two nodes, as the version change happens on the 3rd one.
- Maintenance: we typically perform maintenance on the underlying OS every 1-2 months. In a multi-node HA cluster, we will service one node at a time and your cluster will continue servicing traffic from the other two nodes, as the 3rd one is being serviced.
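The one-node-at-a-time pattern described in the scenarios above can be illustrated with a small simulation. This is only a sketch of the general idea, not Typesense Cloud's actual orchestration: the node names, the `rolling_change` function and the `apply_change` callback are all hypothetical.

```python
from typing import Callable, List


def rolling_change(nodes: List[str], apply_change: Callable[[str], None]) -> List[List[str]]:
    """Apply a change to one node at a time.

    Returns, for each step, the list of nodes that keep serving traffic
    while the remaining node is being changed.
    """
    serving_log = []
    for node in nodes:
        # Every node except the one being changed keeps serving requests.
        serving = [n for n in nodes if n != node]
        serving_log.append(serving)
        # Hypothetical change: a resize, a version change or OS maintenance.
        apply_change(node)
    return serving_log


nodes = ["node-1", "node-2", "node-3"]  # hypothetical node names
changed: List[str] = []
log = rolling_change(nodes, changed.append)

for node, serving in zip(nodes, log):
    print(f"changing {node}; still serving from {serving}")
```

In a 3-node cluster, two nodes remain available at every step, which is why the change does not cause downtime. With a single node, the list of serving nodes would be empty while the change is applied.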
For clusters with HA enabled, we also provide Critical Production Support following the SLAs described here (see first row).
With HA turned OFF, you will experience downtime proportional to the size of your dataset during the scenarios above. For scenario #1 (infrastructure issues), the downtime could be several hours depending on infrastructure recovery times. For scenarios #2, #3 and #4 (capacity changes, Typesense version changes and maintenance), this could be anywhere from 15 minutes to 3 hours as the single non-HA node in your cluster recovers. We also do not provide any Critical Production Support SLAs for clusters without High Availability enabled.
Read more about High Availability in Typesense Cloud here.
Enabling High Availability (HA)
New Clusters:
To enable HA for a new cluster, make sure you toggle "High Availability" to ON in the cluster configuration page when you click on "New Cluster".
Existing Clusters:
To enable HA for an existing single-node cluster, open your Cluster Dashboard, go to Cluster Configuration in the left-side navigation, click on "Modify", toggle High Availability to ON, pick a time for the change and then click on Schedule.
Client-side configuration for HA
This is an important step: to take advantage of your cluster's HA features, you also need to update your Typesense client library configuration to use the new load-balanced hostname that you'll see in the dashboard after enabling HA.
Also include the 3 individual node hostnames, which are used as a fallback in case the node chosen by the load-balanced hostname is currently in a draining state.
This section in the Typesense Documentation has code snippets in various languages that show you how to configure your client libraries: https://typesense.org/docs/guide/high-availability.html#when-using-typesense-cloud-or-a-load-balancer
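As a rough illustration of the configuration described above, here is a minimal Python sketch that builds a client configuration with the load-balanced hostname plus the 3 individual node hostnames as fallbacks. The `build_ha_client_config` helper is hypothetical, the hostnames and API key are placeholders, and port 443 with HTTPS is an assumption. The `nearest_node` and `nodes` keys follow the shape used by the Typesense Python client, but please verify them against the documentation linked above for your client library and version.

```python
from typing import Any, Dict, List


def build_ha_client_config(lb_host: str, node_hosts: List[str], api_key: str) -> Dict[str, Any]:
    """Build a client config for an HA cluster (hypothetical helper)."""

    def node(host: str) -> Dict[str, str]:
        # Assumption: the cluster is reached over HTTPS on port 443.
        return {"host": host, "port": "443", "protocol": "https"}

    return {
        # The load-balanced hostname shown in the dashboard after enabling HA.
        "nearest_node": node(lb_host),
        # The individual node hostnames, used as a fallback.
        "nodes": [node(h) for h in node_hosts],
        "api_key": api_key,
        "connection_timeout_seconds": 2,
    }


# Placeholder values: replace them with the hostnames from your dashboard.
config = build_ha_client_config(
    lb_host="xxx.a1.typesense.net",
    node_hosts=[
        "xxx-1.a1.typesense.net",
        "xxx-2.a1.typesense.net",
        "xxx-3.a1.typesense.net",
    ],
    api_key="<API_KEY>",
)

# With the typesense package installed, the dictionary would be passed as:
#   import typesense
#   client = typesense.Client(config)
```

Keeping the hostnames in one place like this makes it easier to switch from a single-node configuration to the HA one once the new hostnames appear in the dashboard.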