Monitoring
Attention
This is a living document with work-in-progress content, so I will keep updating it with new material.
Table of Contents
- Introduction
- What is the purpose of using extensions?
- What are the existing extensions?
- How can we create a new extension?
- How to deploy an extension?
- Why it is important to define extensions following these practices?
- How to define an automation process?
- Conclusion
- References
Introduction
This project focuses on how to define a strategy for monitoring systems, which will allow us to measure the health of applications, databases, and other services. We will also talk about how to design extensions, how to stay flexible when deploying those plugins, and how to auto-register them.
For this PoC (Proof of Concept), we will use Zabbix as the base monitoring system, as it provides some built-in features that we can take advantage of.
In addition, this PoC will show how to extend the monitoring system's capabilities using extensions and how to create new extensions to cover more services.
The minimum requirements for this project are the following:
Purpose | Nodes | vCPU | Memory | Disk |
---|---|---|---|---|
Zabbix Server | 1 | 1 | 8 GB | 250 GB |
Hint
Please consider increasing these requirements depending on the infrastructure that you plan to monitor. I will also write a post on how to set up a highly available Zabbix monitoring system.
What is the purpose of using extensions?
To monitor services, applications or databases, we will sometimes need to create extensions so that they are fully covered by our monitoring system. Despite the effort of the communities that provide them, there will be cases where we need to write our own. In this section, I will present the modules that I needed to create over time to fulfil those requirements. I will also try to explain how to create your own plugins and how I automated their deployment.
What are the existing extensions?
So far I have created more than 25 plugins covering different applications, operating system information, databases, automation tools, among others. Please feel free to propose new plugins or even to modify mine in their GitHub repositories. Please take into account that, as the owner of those projects, I will try to keep them clean and I will be the only person who defines their roadmap, but you are free to create your own fork.
Service / App | Extension |
---|---|
ArangoDB | aranix |
Bind | bindix |
Docker | zocker |
Dovecot | doveix |
Elasticsearch | elasix |
GlusterFS | glusix |
HAProxy | habbixy |
IPSec | zipsec |
Java Springboot | zaring |
Jenkins | jenkix |
KVM | virbix |
Keepalived | keepax |
Logstash | lostix |
MSSQL | msqlix |
MySQL | mysbix |
Nginx | znginx |
OpenLDAP | zaldap |
OpenVPN | zaovpn |
Oracle DB | zabora |
PostgreSQL | zapgix |
Python Gunicorn | gunbix |
Redis | zedisx |
Seafile | seabix |
Splunk | spluix |
UNIX / Linux | custix |
Windows | custiw |
Some plugins also help the monitoring system gather information and then generate an inventory based on it. Taking advantage of the inventory feature, I was also able to create an application that gathers this data from the monitoring system and provides dashboards and very useful information when you have several datacenters and a large environment.
How can we create a new extension?
In order to extend the monitoring system's capabilities, we can follow this procedure to ensure that the automation process is able to handle the new extensions.
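As a generic illustration, and not necessarily how the extensions listed above are built, a Zabbix agent can be extended through its UserParameter directive, which maps an item key to a command. The sketch below only prints such a line; the key `myext`, the script path and the helper name `render_userparameter` are hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print a UserParameter line for a flexible item key.
# "myext" and the script path used below are placeholders, not real extensions.
render_userparameter() {
    key="$1"      # item key prefix, e.g. "myext"
    script="$2"   # absolute path to the script the agent should run
    # [*] lets the key accept parameters; $1 is the first one, handed to the script.
    printf 'UserParameter=%s[*],%s $1\n' "$key" "$script"
}

render_userparameter "myext" "/etc/zabbix/scripts/myext.sh"
```

The resulting line would typically go into a file that the agent configuration includes, and the agent has to be restarted to pick it up.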
How to deploy an extension?
There are different ways to deploy an extension in an environment: manually, or integrated with an automation process as described in the workflow above. For the purpose of this post, we will show the manual steps.
After seeing the detailed steps of the workflow described above, performing the manual process should feel familiar. Let’s take the mysbix extension (MySQL) as an example. As a first step, we need to connect to the database server and do a git clone of the plugin repository. Then, from the directory where we ran the clone, we use the installation script provided by the extension; in this particular case, the deployment script is called deploy_zabbix.sh. If the installation was successful, we need to restart the zabbix-agent so that it loads the extension.
~# git clone https://github.com/sergiotocalini/mysbix.git
~# sudo ./mysbix/deploy_zabbix.sh -u "monitor" -p "changeme" -o "localhost"
~# sudo systemctl restart zabbix-agent
Hint
Please take this as an example. I also recommend checking the dependencies that each plugin may have.
In addition to this, we need to upload the Zabbix template through the web interface. These templates can usually be found in the same repository, and there can be slight changes between them depending on the Zabbix version that we are using. For this example, the extension provides the template for Zabbix 3.4 (zbx3.4_template_db_mysql.xml).
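A quick way to check dependencies before running a deployment script is to verify that the required commands are on the PATH. This is a minimal sketch: the helper name `missing_deps` and the example list (`git`, `mysql`) are assumptions, so check each plugin's repository for its real requirements.

```shell
#!/bin/sh
# missing_deps: print every command name from the arguments that is not on PATH.
missing_deps() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Example list only (an assumption); the real dependencies vary per plugin.
missing="$(missing_deps git mysql)"
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names are joined on a single line.
    echo "Missing dependencies:" $missing >&2
fi
```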
Why it is important to define extensions following these practices?
The way we define plugins plays an important role when we are trying to deploy them in large-scale environments. It also allows us to maintain those environments in a reliable and stable way.
If we manage the plugins following the steps described above, we can deploy them regularly using a configuration management tool such as Ansible, Chef, Puppet or SaltStack.
In this way we will have a monitoring system that is flexible enough to have everything automated and on which we can base our SLAs.
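To illustrate why a uniform layout matters, here is a dry-run sketch that prints the commands a scheduled job or a configuration management tool could run for a list of extensions. It assumes that every extension lives under the same GitHub account as mysbix and ships a deploy_zabbix.sh, which this post only confirms for mysbix. The checkout directory and the helper name `plan_extension` are hypothetical, and script arguments are omitted because they differ per plugin.

```shell
#!/bin/sh
# Assumptions: each extension is at $REPO_BASE/<name>.git and provides
# deploy_zabbix.sh like mysbix does; neither is confirmed for other plugins.
REPO_BASE="https://github.com/sergiotocalini"
WORKDIR="${WORKDIR:-/opt/zabbix-extensions}"   # hypothetical checkout location

# plan_extension: print the commands needed to install or update one extension.
plan_extension() {
    name="$1"
    if [ -d "$WORKDIR/$name/.git" ]; then
        # Already cloned: update in place.
        echo "git -C $WORKDIR/$name pull --ff-only"
    else
        echo "git clone $REPO_BASE/$name.git $WORKDIR/$name"
    fi
    echo "$WORKDIR/$name/deploy_zabbix.sh"
}

for ext in mysbix custix; do
    plan_extension "$ext"
done
echo "systemctl restart zabbix-agent"
```

Printing the plan instead of executing it keeps the sketch safe to run anywhere; a real job would execute each line and stop on the first failure.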
How to define an automation process?
If we follow these practices, we will also be able to define an automation process that handles the whole monitoring system and the services that we are measuring. One way to do this is with Ansible, which can provision the servers or applications and add them to the monitoring system along with the plugins they need.
Basically, we pass parameters to the provisioning tool, and it pulls in all the extensions needed to properly monitor the servers. In this way we will be able to maintain them too.
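The parameter-to-extension step can be sketched as a small lookup. The extension names come from the table earlier in this post; the service keys (`mysql`, `nginx`, ...), the function names, and the choice to always include custix as a base for Linux hosts are assumptions made for illustration.

```shell
#!/bin/sh
# extensions_for: map a service name to its extension (names from the table above).
extensions_for() {
    case "$1" in
        mysql)      echo "mysbix" ;;
        postgresql) echo "zapgix" ;;
        nginx)      echo "znginx" ;;
        redis)      echo "zedisx" ;;
        docker)     echo "zocker" ;;
        *)          return 1 ;;   # unknown service: no extension known
    esac
}

# resolve_extensions: start with custix (assumed base for UNIX / Linux hosts),
# then add one extension per requested service.
resolve_extensions() {
    echo "custix"
    for svc in "$@"; do
        extensions_for "$svc" || echo "warning: no extension for $svc" >&2
    done
}

# Example: a host provisioned with MySQL and Nginx.
resolve_extensions mysql nginx
```

A provisioning tool would receive the service list as parameters and feed the resolved names into its deployment step.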
Note
Please note that this is a conceptual example. If you're interested in more insights about the automation tools and how to provision your infrastructure, please feel free to contact me and we can start a collaboration. For more information, please visit the extension repositories.
Conclusion
In conclusion, our takeaway is that a strong focus on automation, following a DevOps approach, will always help us keep our monitoring system and our whole infrastructure up to date, making our systems more reliable and better protected from the attacks they could be exposed to.
After this implementation, we will be ready to start defining SLIs (Service Level Indicators), SLOs (Service Level Objectives) and SLAs (Service Level Agreements), starting a completely new journey into how they interact with different systems. For more information about these topics, see the references.
References
- YouTube - class SRE implements DevOps
- SRE fundamentals: SLIs, SLAs and SLOs
- SLA vs. SLO vs. SLI: What's the difference?