Server Monitoring with Slack

Slack is a collaboration platform designed to get people out of email and meetings so they can do their best work, stay connected to the people who matter most, and get time back for the things they love. In this post, we describe how XTREME-D uses Slack for server monitoring.

Introduction

XTREME-D Inc. uses Slack as its internal communication tool. Slack is a widely used business messaging app that supports a more collaborative, flexible, and inclusive way of working and makes it easy to connect and communicate with colleagues. XTREME-D also uses Slack, through the Slack API, as one of its monitoring tools. In recent years, Slack has attracted attention in system operations as a ChatOps tool.

Slack also lets you create channels: dedicated spaces where people communicate, collaborate, and share information. At XTREME-D, some channels are dedicated solely to monitoring. Behind the scenes, XTREME-D is building a system that identifies system anomalies immediately by sending system logs (including status messages, warnings, and errors) to Slack. By alerting members to possible system issues as they arise, this system is expected to help members respond to anomalies faster and to reduce service downtime.

Our channels include customer support, general notifications, and channels fed by monitoring applications such as Datadog and CloudWatch. Slack-based communication is especially convenient for remote work. We also use scripts, such as Bash and Python scripts, to extract information like disk usage, total time calculations, and AWS-related user usage, and post the output to Slack. For XTREME-D members, the working day now always starts with Slack, and we expect this to make server monitoring more efficient.
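As a rough illustration of the log-forwarding idea, and not XTREME-D's actual implementation, a script could collect recent error-level log entries and build an alert only when there is something to report. The sketch below assumes a systemd host (so `journalctl` is available); the function name `format_alert` is hypothetical.

```bash
#!/bin/bash
# Hypothetical formatter: read log lines on stdin and print a short alert
# message for HOST. Returns 1 (and prints nothing) when there are no lines.
format_alert() {
  local host=$1 lines count
  lines=$(cat)
  # Nothing to report: signal this to the caller instead of posting.
  [ -n "$lines" ] || return 1
  # Count the lines (grep -c '' counts every line, without padding).
  count=$(printf '%s\n' "$lines" | grep -c '')
  printf '%s: %d error line(s)\n%s\n' "$host" "$count" "$lines"
}

# Example usage (commented out): error-priority journal entries from the
# last hour on a systemd host, formatted for posting to Slack.
# journalctl -p err --since "1 hour ago" --no-pager -q | format_alert "$(hostname)"
```

Returning a non-zero status for empty input lets a cron job skip the Slack post entirely when nothing went wrong, which keeps a monitoring channel from filling up with "all clear" messages.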

In this post, we walk through how XTREME-D uses Slack for monitoring, using the disk usage of each group as an example.

Setup Steps

Script Creation

Create a shell script that calculates the disk usage of each group.

Home directories are organized as /home/<group>/<user>, so we check the disk usage of each group directory.

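#!/bin/bash
 
# get the total disk usage of each group directory under /home
usage=$(du -h /home --max-depth=1 | grep group | column -t)
 
# build the JSON payload with jq (must be installed) so that the newlines
# in $usage are escaped; raw newlines inside a JSON string are invalid
payload=$(jq -n --arg text "\`\`\`${usage}\`\`\`" '{text: $text}')
 
# push the data to a Slack channel through the incoming webhook
curl -X POST -H 'Content-type: application/json' --data "$payload" https://hooks.slack.com/services/XXXXXXXXX/YYYYYYYYY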
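The multi-line du output contains newlines, which must be escaped as `\n` inside a JSON string for the payload to be valid JSON. On servers where a JSON tool such as jq is not available, the escaping can also be done in plain Bash. This is a sketch under that assumption, not the original script: the function names `json_escape`, `build_payload`, and `post_to_slack` and the `SLACK_WEBHOOK_URL` environment variable are hypothetical.

```bash
#!/bin/bash
# Hypothetical helpers for posting multi-line text to a Slack incoming webhook
# without depending on jq.

# Escape a string so it can be placed inside a JSON string literal.
# Handles backslash, double quote, newline, tab and carriage return.
json_escape() {
  local s=$1 bs='\'
  s=${s//"$bs"/"$bs$bs"}   # backslash -> \\ (must run first)
  s=${s//\"/'\"'}          # double quote -> \"
  s=${s//$'\n'/'\n'}       # newline -> \n
  s=${s//$'\t'/'\t'}       # tab -> \t
  s=${s//$'\r'/'\r'}       # carriage return -> \r
  printf '%s' "$s"
}

# Wrap the text in a Slack code block and build the {"text": ...} payload.
build_payload() {
  printf '{"text": "```%s```"}' "$(json_escape "$1")"
}

# Post the text to the webhook URL stored in SLACK_WEBHOOK_URL (assumed set).
post_to_slack() {
  curl -sS -X POST -H 'Content-type: application/json' \
    --data "$(build_payload "$1")" "$SLACK_WEBHOOK_URL"
}

# Example usage (commented out):
# post_to_slack "$(du -h /home --max-depth=1 | grep group | column -t)"
```

Sourcing this file only defines the functions; nothing is posted until `post_to_slack` is called, which makes the escaping easy to check on its own before pointing it at a real channel.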

The result appears in Slack as follows:

374G  /home/group_1
544K  /home/group_2
496K  /home/group_3
1.4M  /home/group_4
0     /home/group_0
8.9M  /home/group_5
54G   /home/group_6
75M   /home/group_7
19G   /home/group_8
404K  /home/group_9
380K  /home/group_10
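Since the goal is to spot anomalies rather than read every figure, the same du output could also be filtered so that only groups above a size limit are reported. This is a minimal sketch, not part of the original setup: the function name `groups_over_limit` and the 50 GiB limit in the example are hypothetical, and it assumes the `du -h` format shown above (a size with an optional K/M/G/T suffix, followed by the path).

```bash
#!/bin/bash
# Hypothetical filter: read "SIZE PATH" lines as produced by `du -h`
# and print only the paths whose size exceeds a limit given in GiB.
groups_over_limit() {
  local limit_gib=$1
  awk -v limit="$limit_gib" '
    # Convert a du -h size such as 1.4M or 374G to GiB.
    function to_gib(s,    n, unit) {
      n = s + 0                       # numeric prefix, e.g. 1.4 from "1.4M"
      unit = substr(s, length(s), 1)  # last character is the unit suffix
      if (unit == "T") return n * 1024
      if (unit == "G") return n
      if (unit == "M") return n / 1024
      if (unit == "K") return n / (1024 * 1024)
      return n / (1024 * 1024 * 1024) # no suffix: plain bytes
    }
    to_gib($1) > limit { print $2 " (" $1 ")" }
  '
}

# Example usage (commented out): report groups above 50 GiB.
# du -h /home --max-depth=1 | grep group | groups_over_limit 50
```

With the sample output above and a 50 GiB limit, only /home/group_1 (374G) and /home/group_6 (54G) would be reported, so the channel shows just the groups that may need attention.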

Slack Settings

Next, let’s see how to set up Slack.

  1. Open the Slack Web API page.
  2. Click ‘Create an app’ on the Slack Web API page, and then ‘Create an App’ in the dialog box that follows.
  3. Choose ‘From scratch’, then enter an app name and select the workspace to develop the app in.

After the app is created, its settings page is displayed.


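Finally, we add the app to our workspace and select the channel to notify. With the app's Incoming Webhooks feature turned on, this step produces the webhook URL (https://hooks.slack.com/services/...) that the script posts to.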
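The webhook URL generated in this step works like a password: anyone who has it can post to the channel. One way to keep it out of the script itself, sketched here as an assumption rather than XTREME-D's actual setup, is to pass it in through an environment variable and check its shape before posting. The variable name `SLACK_WEBHOOK_URL` and the function `require_webhook_url` are hypothetical.

```bash
#!/bin/bash
# Hypothetical check: accept only URLs that look like Slack incoming webhooks
# and print the URL back so it can be captured by the caller.
require_webhook_url() {
  local url=$1
  case "$url" in
    https://hooks.slack.com/services/?*)
      printf '%s\n' "$url"
      ;;
    *)
      echo "error: SLACK_WEBHOOK_URL is missing or not a Slack webhook URL" >&2
      return 1
      ;;
  esac
}

# Example usage (commented out): fail early, then send a test message.
# webhook=$(require_webhook_url "${SLACK_WEBHOOK_URL:-}") || exit 1
# curl -X POST -H 'Content-type: application/json' --data '{"text": "test"}' "$webhook"
```

Failing early with a clear message is easier to debug from a cron log than a silent curl error caused by an empty or mistyped URL.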

Slack then shows a message in the selected channel confirming that the integration was added.

Crontab Settings

Next, let’s configure crontab to run the script you created on a regular basis.
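For example, the following entry runs it at 9:05 PM every day (the fields are minute, hour, day of month, month, and day of week). The script must be executable (chmod +x /path/to/script/filename01).

5 21 * * * /path/to/script/filename01 >>/path/to/log/file01.log 2>&1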
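The entry can be added by hand with `crontab -e`. To install it from a script instead, without duplicating it when that script is run more than once, a small filter placed between `crontab -l` and `crontab -` could append the line only when it is missing. This is a sketch; `add_cron_entry` is a hypothetical helper, and the paths are the same placeholders as above.

```bash
#!/bin/bash
# Hypothetical helper: read an existing crontab on stdin and print it with
# ENTRY appended, unless an identical line is already present.
add_cron_entry() {
  local entry=$1 existing
  existing=$(cat)
  # Keep the current entries (if any) unchanged.
  if [ -n "$existing" ]; then
    printf '%s\n' "$existing"
  fi
  # Append the new entry only if no line matches it exactly.
  if ! grep -qxF -- "$entry" <<<"$existing"; then
    printf '%s\n' "$entry"
  fi
}

# Example usage (commented out): install the 9:05 PM job idempotently.
# crontab -l 2>/dev/null \
#   | add_cron_entry '5 21 * * * /path/to/script/filename01 >>/path/to/log/file01.log 2>&1' \
#   | crontab -
```

Matching whole lines as fixed strings (`grep -xF`) avoids treating the `*` fields of the cron entry as a regular expression.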

Result

With these settings, crontab runs the script every day at 9:05 PM, and the result is posted to the designated Slack channel.

For example, the group ‘xdg-059’ uses around 527G of disk space, and this figure is posted to the xd-monitoring Slack channel.

Conclusion

At first, the usage rates of our various servers were unclear. Checking them was manual and time-consuming, and handling many servers this way was tough. Now that the desired output is posted to Slack, the usage rates are much easier to understand, and we can schedule the scripts to run whenever we need.

  • Previously, we had to log in to each cluster and check it after a problem occurred, which was very troublesome.
  • By aggregating information in Slack, we can now catch problems before they turn into trouble.
  • When an issue does occur, it can now be addressed within half a day, which is a major improvement.
  • Overall, the warnings and alerts help improve system operation and reduce service downtime.

<About the author>
T.M.Shanmathi has been a network engineer for more than two years. She has worked as a System Engineer at XTREME-D Inc. for more than five months on network and cloud-based projects, and she is also involved in a variety of full-stack web development and high-performance computing projects.