Overview
slingnode.ethereum_observability is an Ansible role used to deploy a full observability stack that seamlessly integrates with Ethereum nodes deployed by slingnode.ethereum role. Both roles use the same naming for common variables. This means you can define them once in your Playbook, group_vars or host_vars and deploy a full Ethereum node along with the observability stack.
Out of the box you get a fully functional stack with:
ethereum client metrics
node metrics
container metrics
parsed and aggregated client logs
dashboarding solution
The stack is comprised of:
ELK
Filebeat
Grafana
Prometheus
Node-Exporter
Ethereum-Metrics-Exporter
Container Advisor
Deployment types
SlingNode Observability Stack (SOS) can be deployed on a single server along with Ethereum clients or it can be used in a distributed deployment. The following sections describe both options.
Single server deployment
In a single server deployment (default type), all components (monitoring server and monitoring agents) are deployed to the same server as the Ethereum clients. In this deployment type, the services communicate over Docker Network as depicted in the diagram below.
Distributed deployment
In a distributed deployment, the monitoring server (Prometheus, ELK, Grafana) is deployed to a dedicated node and configured to monitor Ethereum Clients running on remote servers. The monitoring agents (Filebeat, node-exporter, ethereum-metrics-exporter, cadvisor) are deployed to the servers where the Ethereum clients are running. The monitoring server and agents communicate over the network as depicted in the diagram below.
Customization
The role comes with a configuration for Prometheus, EL and Filebeat that will work out of the box for single server and distributed deployments. However the configuration is fully customizable to let you control the how those services work. The configuration files are templated out and specified by variables. Please refer to configuration section for details:
Scalability
Scalability of an observability stacks is a big topic. There are a lot of variables that factor into it. Amongst others:
Scrape interval
Logging level
Retention period
Number of monitored hosts
Types of applications
Types of OSes
Data query patterns (frequency, bucket sizes)
Availability requirements
SlingNode Observability Stack is meant to be used for monitoring "small deployments". What's small? We have successfully monitored 10 servers running full Ethereum stack (execution, consensus, validator) with scrape interval of 5s, info logging level and 30 days data retention.
Having said that we don't have any precise benchmarks.
As outlined above this stack will work perfectly fine for a "small deployment". However if you need to scale your observability infrastructure and make it "production grade" you will need something more elaborate. SlingNode Team has extensive experience building and maintaining large scale observability solutions and is ready to help out. Contact us at "contact - at - slingnode.com".
Last updated