Prometheus High Availability and Fault Tolerance Strategy (With VictoriaMetrics)


Luca Carboni

Prometheus is a great tool for monitoring small, medium, and large infrastructures.

Prometheus, and the development team behind it, are focused on scraping metrics. It is a great solution for short-term retention of metrics. Long-term retention is another story, unless it is used to collect only a small number of metrics. This is normal in a way, because most of the time, when we investigate a problem using the metrics scraped by Prometheus, we look at metrics no older than 10 days. But this is not always the case, especially when the statistics we are looking for are a correlation between different periods, such as different weeks of a month or different months, or when we are interested in keeping a historical record.

Prometheus is actually perfectly able to collect metrics and store them for a very long time, but storage becomes extremely expensive because Prometheus needs fast storage. Prometheus is also not known as a solution that achieves HA and FT in a sophisticated way (as we will show, there is a way; it is not sophisticated, but it is there). In this article we show how to achieve HA and FT for Prometheus, and why long-term storage for metrics is better handled by another tool.

That said, over the past years many tools have started competing, and many still are, to solve these problems and more.

The main components of a Prometheus installation are:

  • Prometheus
  • Blackbox
  • Exporters
  • AlertManager
  • PushGateway

Prometheus can use federation (hierarchical and cross-service), which lets you configure a Prometheus instance to scrape selected metrics from other Prometheus instances (see the federation page of the Prometheus documentation). This kind of solution is pretty good when you want to expose only a subset of selected metrics to tools like Grafana, or when you want to aggregate cross-functional metrics (for example, business metrics from one Prometheus and a subset of service metrics from another one that works in a federated way). This is perfectly fine, and it works in many use cases, but it does not satisfy the concept of High Availability, nor that of Fault Tolerance: we are still talking about a subset of metrics, and if one of the Prometheus instances goes down, its metrics are not gathered during the downtime. Making Prometheus HA and FT must be done in a different way: there is no native solution from the Prometheus project itself.

Prometheus can achieve HA and FT in a very easy way, without the need for complex clusters or consensus strategies.

What we have to do is duplicate the same configuration file, prometheus.yml, on two different instances configured in the same way, which will scrape the same metrics from the same sources. The only difference is that instance A also monitors instance B, and vice versa. The good old concept of redundancy is easy to implement and solid, and if we use IaC (Infrastructure as Code, like Terraform) and a CM (Configuration Manager, like Ansible) it is also extremely easy to manage and maintain. You do not want to duplicate an extremely big and expensive instance with another one; it is better to duplicate a small instance and keep only short-term metrics on it. This also makes the instances quick to recreate.
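The duplication described above can be sketched as follows. This is a minimal illustration, not the setup used in this article: the host names (prometheus-a, prometheus-b, node-1, node-2), the 15 s scrape interval, and the helper build_config are hypothetical, and the result is a plain dictionary that mirrors the layout of prometheus.yml rather than a complete configuration.

```python
"""Sketch: two identical Prometheus configs whose only difference is the peer they watch."""

# Hypothetical host names; 9090 and 9100 are the default Prometheus and node_exporter ports.
INSTANCES = {"A": "prometheus-a:9090", "B": "prometheus-b:9090"}
SHARED_TARGETS = ["node-1:9100", "node-2:9100"]


def build_config(instance: str) -> dict:
    """Return a prometheus.yml-like dict for instance "A" or "B"."""
    peer = next(addr for name, addr in INSTANCES.items() if name != instance)
    return {
        "global": {"scrape_interval": "15s"},
        "scrape_configs": [
            # Same sources on both instances.
            {"job_name": "node", "static_configs": [{"targets": list(SHARED_TARGETS)}]},
            # The only difference: each instance scrapes its twin.
            {"job_name": "prometheus-peer", "static_configs": [{"targets": [peer]}]},
        ],
    }


config_a = build_config("A")
config_b = build_config("B")
```

Generating both files from one function, or from one template in your CM tool, is what keeps the two instances from drifting apart.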

AlertManager, for its part, has the concept of a cluster: it can deduplicate data received from multiple Prometheus instances and coordinate with the other AlertManagers so that an alert fires only once. So let us install AlertManager on two different instances, perhaps the two that host Prometheus A and its copy, Prometheus B. Of course, we also use our IaC and CM solution to keep the AlertManager configuration in code.
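A two-node AlertManager cluster could be started along these lines. This is only a sketch under assumptions: the host names monitor-a and monitor-b and the configuration file path are hypothetical, while the --cluster.listen-address and --cluster.peer flags and the ports 9093 and 9094 are AlertManager defaults. Each Prometheus lists every AlertManager directly instead of going through a load balancer, so that the cluster itself takes care of deduplication.

```python
"""Sketch: command lines for a two-node Alertmanager cluster plus the Prometheus alerting block."""

# Hypothetical host names; 9093 (HTTP) and 9094 (cluster gossip) are Alertmanager defaults.
HOSTS = ["monitor-a", "monitor-b"]


def alertmanager_args(host: str) -> list:
    """Build the arguments for the Alertmanager running on `host`."""
    args = [
        "alertmanager",
        "--config.file=/etc/alertmanager/alertmanager.yml",  # hypothetical path
        "--cluster.listen-address=0.0.0.0:9094",
    ]
    # Every other node is declared as a gossip peer.
    args += [f"--cluster.peer={peer}:9094" for peer in HOSTS if peer != host]
    return args


def prometheus_alerting_block() -> dict:
    """Alerting fragment of prometheus.yml: list every Alertmanager explicitly."""
    targets = [f"{host}:9093" for host in HOSTS]
    return {"alertmanagers": [{"static_configs": [{"targets": targets}]}]}
```

The same alerting block goes into both Prometheus A and Prometheus B, since their configuration files are meant to be identical.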

NodeExporters are installed directly on the nodes that are the source of the metrics you are collecting, so there is nothing to duplicate there. The Prometheus configuration is the same on both instances, so the only requirement is to allow Prometheus A and Prometheus B to connect to them.

PushGateway is a little bit different: simply duplicating it is not enough, because you have to provide a single point of injection for the metrics that are pushed to it (whereas Prometheus itself works by pulling metrics). A way to make it HA and FT is to duplicate it on the two instances and put a DNS record in front of them, configured as an active/passive failover, so there is always one active PushGateway and, in case of failure, the second one is promoted to active. In this way, you can provide a single entry point to batch processes, lambdas, sporadic applications, and so on. You can also use a load balancer in front of them; personally I prefer an active/passive solution in this case, but it is up to you.
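The active/passive behaviour can be modelled in a few lines. This sketch only illustrates the decision that a failover DNS record makes for us; the endpoint names and the function pick_active are hypothetical, and the health check is simulated with a lambda instead of a real probe.

```python
"""Sketch: active/passive selection, the decision a failover DNS record makes for us."""

from typing import Callable, Sequence

# Hypothetical endpoints; 9091 is the default Pushgateway port.
ENDPOINTS = ["pushgateway-a:9091", "pushgateway-b:9091"]


def pick_active(endpoints: Sequence[str], is_healthy: Callable[[str], bool]) -> str:
    """Return the first healthy endpoint, in priority order.

    The primary is always preferred; the secondary is used only while the
    primary fails its health check.
    """
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    raise RuntimeError("no healthy Pushgateway available")


# Simulated health checks: everything up, then the primary down.
all_up = pick_active(ENDPOINTS, lambda e: True)
primary_down = pick_active(ENDPOINTS, lambda e: e != "pushgateway-a:9091")
```

With active/passive, all pushed metrics land on a single PushGateway at any given time, which is one reason to prefer it over spreading pushes across both with a balancer.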

BlackBox is another tool without a concept of HA and FT, but we can duplicate it as well, on the same two instances, A and B, that we have already configured.

Now we have two small instances of Prometheus, two AlertManagers working together as a cluster, two PushGateways in an active/passive configuration, and two BlackBoxes, so HA and FT are achieved.

There is no reason to use these instances to collect all the metrics in your farm, which may be composed of different VPCs that reside in different regions, belong to different accounts, or are even hosted by different cloud providers; and, if you are lucky, your farm also includes something on-premises. There is no reason to do so because the small instances would become extremely big in this case, and when something small fails it is usually easier to fix. It is common practice to have many groups of Prometheus instances in an HA and FT configuration (as described above), each responsible for a specific part of the infrastructure. The definition of "part" is really up to you: it depends on your needs, requirements, network and security configuration, trust between your teams, and so on.

So, as a recap, we have small or relatively small instances of Prometheus, duplicated together with all the services mentioned above, we have the code to recreate them quickly, and we can tolerate the full failure of one instance per group. This is definitely a step in the right direction if our HA and FT plan used to be called “hope”.

We have Prometheus and its ecosystem configured for HA and FT, and we have multiple groups of Prometheus instances, each focused on its own part of the infrastructure and each relatively small.

Cool, but we are keeping the data for only, let's say, 10 days. That is probably the most important period to query, but of course it is not enough. What about long-term storage for metrics?

This is where solutions like Cortex, Thanos, M3DB, VictoriaMetrics, and many others come in. They can collect the metrics from different Prometheus instances, deduplicate them (you will have a lot of duplicates: keep in mind that every Prometheus instance you have is duplicated, so every metric arrives twice), and provide a single point of storage for all the metrics you are collecting.

Even though Cortex, Thanos, and M3DB are great tools, definitely capable of providing long-term storage for metrics and of being HA and FT themselves, we chose the newcomer, VictoriaMetrics. This article is not going to compare all these tools, but I am going to explain why we chose VictoriaMetrics.

VictoriaMetrics is available in two different configurations. One is an all-in-one solution, easier to configure, with all the components together (it is a good and solid solution that can also scale, but only vertically, so it may be the right choice for you depending on your needs). The other is the cluster solution, with separate components, so that you can scale each component both vertically and horizontally.

We like complex things (that is definitely not true), so we decided to use the cluster solution.

The cluster version of VictoriaMetrics consists of three main components: “vmstorage” (responsible for storing the data), “vminsert” (responsible for writing the data into the storage), and “vmselect” (responsible for querying the data from the storage). The tool is very flexible, and vminsert and vmselect are essentially proxies.

Vminsert, as mentioned, is responsible for inserting the data into vmstorage. There are a lot of options you can configure, but for the scope of this article the important point is that you can easily duplicate vminsert on an arbitrary number of instances and put a Load Balancer in front of them as a single point of injection for incoming data. Vminsert is stateless, so it is easy to manage and duplicate, and it is a good candidate for immutable infrastructure and autoscaling groups. The component accepts some options that you should provide; the most important are the storage addresses (you have to provide the list of storage nodes) and “-replicationFactor=N”, where N is the number of storage nodes on which the data is replicated. So, who sends the data to the balancer in front of the vminsert nodes? The answer is Prometheus, using the “remote_write” configuration (described in the Prometheus configuration documentation), with the vminsert Load Balancer as the target.
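The two sides of this connection can be sketched as follows. The host names, the tenant ID 0, and the helper functions are hypothetical, chosen only for illustration; “-storageNode” is the vminsert flag that takes the list of storage addresses, 8400 and 8480 are the default ports of the cluster version, and the URL layout follows the insert path of the VictoriaMetrics cluster. Treat it as a sketch rather than a complete deployment.

```python
"""Sketch: vminsert arguments and the matching Prometheus remote_write block."""

# Hypothetical host names. 8400 is the default port vmstorage opens for vminsert,
# 8480 the default HTTP port of vminsert.
STORAGE_NODES = ["vmstorage-1:8400", "vmstorage-2:8400", "vmstorage-3:8400"]
VMINSERT_LB = "vminsert-lb.example.internal:8480"


def vminsert_args(storage_nodes: list, replication_factor: int) -> list:
    """Build the vminsert command line; it is identical on every stateless replica."""
    if not 1 <= replication_factor <= len(storage_nodes):
        raise ValueError("replication factor must be between 1 and the number of storage nodes")
    return [
        "vminsert",
        f"-storageNode={','.join(storage_nodes)}",
        f"-replicationFactor={replication_factor}",
    ]


def remote_write_block(lb_address: str, tenant: int = 0) -> dict:
    """Fragment of prometheus.yml pointing at the load balancer in front of vminsert."""
    url = f"http://{lb_address}/insert/{tenant}/prometheus/api/v1/write"
    return {"remote_write": [{"url": url}]}


args = vminsert_args(STORAGE_NODES, replication_factor=2)
block = remote_write_block(VMINSERT_LB)
```

Because every vminsert replica receives exactly the same arguments, adding or removing replicas behind the Load Balancer requires no change on the Prometheus side.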

Vmstorage is the core component and the most critical one. Unlike vminsert and vmselect, vmstorage is stateful, and each instance knows nothing about the other instances in the pool. From its own point of view, every vmstorage is an isolated component. It is optimized for the high-latency, low-IOPS storage offered by cloud providers, which makes it definitely less expensive than the storage used by Prometheus. The crucial options are:

  • “-storageDataPath”: the path on disk where the metrics are saved,
  • “-retentionPeriod”: as in Prometheus, the period for which the metrics are retained,
  • “-dedup.minScrapeInterval”: deduplicates the received metrics in the background.
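To see why deduplication matters with duplicated Prometheus instances, here is a simplified model of interval-based deduplication. It is not VictoriaMetrics' actual code: the function deduplicate, the sample data, and the rule that the sample with the largest timestamp wins within each interval are assumptions made for this illustration, and the real behaviour may differ in detail.

```python
"""Simplified model of interval-based deduplication (not VictoriaMetrics' actual code)."""


def deduplicate(samples: list, min_scrape_interval: int) -> list:
    """Keep one (timestamp, value) sample per interval of `min_scrape_interval` seconds.

    `samples` belong to a single time series. Within each interval the sample
    with the largest timestamp wins; this rule is an assumption made for the
    illustration.
    """
    kept = {}
    for timestamp, value in sorted(samples):
        bucket = timestamp // min_scrape_interval
        kept[bucket] = (timestamp, value)  # later samples overwrite earlier ones
    return [kept[bucket] for bucket in sorted(kept)]


# Prometheus A and B scrape the same target every 15 s, a couple of seconds apart.
from_a = [(0, 1.0), (15, 2.0), (30, 3.0)]
from_b = [(2, 1.0), (17, 2.0), (32, 3.0)]
merged = deduplicate(from_a + from_b, min_scrape_interval=15)
```

Six incoming samples collapse to three, one per scrape interval, which is the effect we want when every metric arrives twice.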

Every vmstorage has its own data, but the “replicationFactor” option of vminsert means that the data is distributed and therefore replicated across N storage nodes. The component can be scaled vertically if needed and bigger storage can be used, but because of the type of storage involved (high latency, low IOPS), it is not expensive even for long-term retention.

Vmselect is responsible for querying the data from the storage nodes. Like vminsert, it can easily be duplicated on an arbitrary number of instances and placed behind a Load Balancer, creating a single entry point for querying metrics. You can scale it horizontally, and it also accepts many options. The Load Balancer, as mentioned, is the single entry point for querying data that is now collected from multiple groups of Prometheus instances, with a retention that can be arbitrarily long, depending on your needs. The main consumer of all this data will probably be Grafana. Like vminsert, vmselect can be configured in an Autoscaling Group.

Grafana is a great tool for interacting with and querying metrics from Prometheus, and it can do the same with VictoriaMetrics through the Load Balancer in front of the vmselect instances. This is possible because VictoriaMetrics is compatible with PromQL (the query language of Prometheus), even though VictoriaMetrics also has its own query language (called MetricsQL). Now that all of our components are HA and FT, let's make Grafana HA and FT as well.

In many installations, Grafana uses SQLite as the default solution for keeping its state. The problem is that SQLite is a great database for application development, mobile applications, and many other purposes, but not really for achieving HA and FT. For this purpose it is better to use a standard database: for example, we can use an RDS PostgreSQL with Multi-AZ capabilities, which will be responsible for the state of the application, and this solves our main problem.

For the Grafana application itself, and in order to give users a single entry point to interact with it, we can create an arbitrary number of identical Grafana instances, all configured to connect to the same RDS PostgreSQL. How many Grafana instances to create depends on your needs; you can scale them horizontally and also vertically. PostgreSQL can also be installed on instances, but I am lazy and I like to use services from cloud providers when they do a great job and do not lock me in to a vendor. This is a perfect example of a service that can make our lives easier.
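One way to keep the Grafana instances identical is to give each of them the same database settings through environment variables. The sketch below assumes a hypothetical RDS endpoint, database name, user, and a GRAFANA_DB_PASSWORD variable; the GF_DATABASE_* names follow Grafana's convention for overriding the [database] section of its configuration.

```python
"""Sketch: identical environment for every Grafana replica, all sharing one PostgreSQL."""

import os

# Hypothetical RDS endpoint.
DB_HOST = "grafana-db.example.internal:5432"


def grafana_database_env(password: str) -> dict:
    """Return the GF_DATABASE_* variables that switch Grafana from SQLite to PostgreSQL."""
    return {
        "GF_DATABASE_TYPE": "postgres",
        "GF_DATABASE_HOST": DB_HOST,
        "GF_DATABASE_NAME": "grafana",  # hypothetical database name
        "GF_DATABASE_USER": "grafana",  # hypothetical user
        "GF_DATABASE_PASSWORD": password,
    }


# The password is read from the environment so it never lands in the code base.
env = grafana_database_env(os.environ.get("GRAFANA_DB_PASSWORD", "change-me"))
replicas = [dict(env) for _ in range(3)]  # three replicas, same state backend
```

Since no replica keeps state of its own, any of them can be destroyed and recreated without losing dashboards or users.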

Now we need a Load Balancer to balance the traffic between our users and the N instances of Grafana. We can also hide the ugly Load Balancer address behind a nice DNS name.

Grafana can be connected to the VictoriaMetrics vmselect Load Balancer using the Prometheus datasource type, and this completes our observability infrastructure. Our infrastructure is now HA and FT in all of its components, configured to be resilient, scope focused, capable of long-term storage, and cost optimized. We can also add an automated process that creates scheduled snapshots of the vmstorage nodes and sends them to an S3-compatible bucket, to make the retention period even longer.
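The datasource connection might look like the following. This is a sketch under assumptions: the Load Balancer address, the tenant ID 0, and the helper victoriametrics_datasource are hypothetical, 8481 is the default HTTP port of vmselect, and the dictionary is shaped like a Grafana datasource provisioning file.

```python
"""Sketch: a Grafana data source definition that targets vmselect through its load balancer."""

# Hypothetical load balancer address; 8481 is the default HTTP port of vmselect.
VMSELECT_LB = "vmselect-lb.example.internal:8481"


def victoriametrics_datasource(lb_address: str, tenant: int = 0) -> dict:
    """Return a dict shaped like a Grafana data source provisioning file."""
    return {
        "apiVersion": 1,
        "datasources": [
            {
                "name": "VictoriaMetrics",
                "type": "prometheus",  # VictoriaMetrics answers the Prometheus query API
                "access": "proxy",
                "url": f"http://{lb_address}/select/{tenant}/prometheus",
                "isDefault": True,
            }
        ],
    }


datasource = victoriametrics_datasource(VMSELECT_LB)
```

Provisioning the datasource from code, rather than through the UI, fits the same IaC and CM approach used for the rest of the stack.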

Well, this was the metrics part. We are still missing the logging part, but that is another story 🙂

The complete architecture:

[Figure: complete architecture]
