Simple Prometheus Blackbox Exporter setup for Kubernetes services monitoring
Let’s install Prometheus blackbox-exporter in a Kubernetes cluster, using its Helm chart.
We won’t change anything in the default values; they’re good enough as they are. Still, feel free to read through them, as you always should, before deploying.
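If you haven’t installed it yet, the install can look roughly like this (assuming the chart from the prometheus-community Helm repository; the release name and the monitoring namespace are just examples):

# Add the prometheus-community repo (assumed source of the chart) and install
# the blackbox-exporter with its default values.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus-blackbox-exporter prometheus-community/prometheus-blackbox-exporter \
  --namespace monitoring --create-namespace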
Then, monitoring an internal Kubernetes service is as simple as adding the following scrape configuration to Prometheus:
- job_name: 'prometheus-blackbox-exporter'
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
    - targets:
        - https://snipeit.snipeit.svc.cluster.local  # example
        - https://<my-app-service-name>.<my-app-namespace>.svc.cluster.local  # generic
  relabel_configs:
    # Pass the original target address as the ?target= parameter of the probe.
    - source_labels: [__address__]
      target_label: __param_target
    # Keep the probed URL as the instance label, so alerts stay readable.
    - source_labels: [__param_target]
      target_label: instance
    # Actually scrape the blackbox-exporter service instead of the target itself.
    - target_label: __address__
      replacement: prometheus-blackbox-exporter:9115
Now you can port-forward the blackbox-exporter, open localhost:9115, and see the checks for yourself. By default, each target is probed once a minute (Prometheus’s default scrape interval).
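For a quick manual check (assuming the release name and namespace from the install sketch above), something like this works:

# Forward the exporter's port locally.
kubectl -n monitoring port-forward svc/prometheus-blackbox-exporter 9115:9115

# In another terminal: trigger a probe by hand, the same request Prometheus
# ends up making after the relabeling shown earlier.
curl 'http://localhost:9115/probe?module=http_2xx&target=https://snipeit.snipeit.svc.cluster.local'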
After that, why not set up an alert to trigger when a check fails?
Add the following alerting rule to your Prometheus rules (Alertmanager will take care of routing it to a receiver in a moment):
groups:
  - name: BlackboxAlerts
    rules:
      - alert: DevOpsServiceDown  # alert names cannot contain spaces
        expr: probe_success != 1
        for: 1m
        labels:
          severity: critical
          slack-channel: k8s-monitoring
        annotations:
          summary: "Service {{ $labels.instance }} failed HTTP check."
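Before reloading Prometheus, you can validate the rule file with promtool; assuming you saved it as blackbox-alerts.yml:

# Check the alerting rule file for syntax and template errors.
promtool check rules blackbox-alerts.yml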
Now, assuming that the Alertmanager configuration knows how to render the summary annotation and has a Slack receiver for the #k8s-monitoring Slack channel, the alert will be delivered successfully.
Example snippet from the Alertmanager configuration:
global:
  slack_api_url: 'https://hooks.slack.com/services/<SLACK_SERVICE_INFO>'
route:
  group_wait: 10s
  group_interval: 5m
  repeat_interval: 15m
  receiver: 'k8s-monitoring-slack-receiver'
  routes:
    - receiver: k8s-monitoring-slack-receiver
      match_re:
        slack-channel: k8s-monitoring
      continue: true
receivers:
  - name: k8s-monitoring-slack-receiver
    slack_configs:
      - channel: '#k8s-monitoring'
        send_resolved: true
        text: '{{ range .Alerts }} {{ .Annotations.summary }}\n\n{{ end }}'
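As with the rules, you can validate the Alertmanager configuration before applying it; assuming it’s saved as alertmanager.yml:

# Check the Alertmanager configuration for syntax errors.
amtool check-config alertmanager.yml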
Simple, right?
Here’s how it may look:
Of course, you can now add more Slack channel receivers, or any other receiver that Alertmanager supports. Combine that with well-thought-out annotations and annotation handling, and you have yourself a nice setup that alerts you whenever a service seems to be unresponsive.