Being able to monitor how much power a server uses over time can be extremely handy. In this post I will show how I handled this using Grafana for graphing, Telegraf for gathering data from the iDRAC6 (over IPMI), and InfluxDB for storage.

This post assumes you already have Telegraf, InfluxDB, and Grafana installed and configured to work with each other. You also need a PowerEdge server that exposes power information over IPMI via the iDRAC. I am using a PowerEdge R710, which comes with iDRAC6.

Getting Started

Configuring iDRAC on your server

The iDRAC (short for Integrated Dell Remote Access Controller) is a small add-on card on the server motherboard that lets systems administrators update and manage Dell systems, even when the server is turned off.

Dell iDRAC6 Adapter Card

For this to work, make sure your server actually has one of these cards installed. When installed, the card adds an additional Ethernet port to the back of the system, which needs to be plugged into your network so you can communicate with it. Checking for this additional port is an easy way to see whether the card is installed.

While the server is booting, press CTRL+E to open the iDRAC Configuration Utility. Once in the Configuration Utility, set up these things:

  • IPMI Over LAN: TRUE (we will use IPMI to have Telegraf poll for data, so this has to be enabled)
  • NIC Selection: This depends on how you want it set up. See this help article from Dell for more information on the options.
  • LAN Parameters: Set up your static IP under this
  • LAN User Configuration: Set a username and password

Once all of that is configured, you should be able to access the iDRAC web UI from a system on the same network to confirm it's working:

iDRAC6 web interface when logged in

If you are able to log in, this step is done.
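Since Telegraf will talk to the iDRAC over IPMI rather than through the web UI, it can also be worth confirming that IPMI over LAN answers before moving on. Here is a minimal Python sketch of that check, assuming ipmitool is installed on the machine that will run Telegraf. The helper names `build_ipmitool_cmd` and `read_sensors` are my own for illustration, and the host and credentials are the same placeholders used later in this post.

```python
import subprocess


def build_ipmitool_cmd(host, user, password, interface="lanplus"):
    """Return the argument list for an `ipmitool ... sdr` sensor query."""
    return [
        "ipmitool",
        "-I", interface,  # IPMI interface: "lanplus" (IPMI v2.0) or "lan"
        "-H", host,       # iDRAC IP address
        "-U", user,
        "-P", password,
        "sdr",            # list the sensor data repository
    ]


def read_sensors(host, user, password, timeout=20):
    """Run ipmitool and return its raw sensor listing.

    Raises CalledProcessError if ipmitool exits with an error and
    TimeoutExpired if the iDRAC does not answer in time.
    """
    cmd = build_ipmitool_cmd(host, user, password)
    result = subprocess.run(
        cmd, capture_output=True, text=True, timeout=timeout, check=True
    )
    return result.stdout


# Example (placeholder values, replace with your own):
# print(read_sensors("192.168.1.11", "username", "password"))
```

If this prints a list of sensors, the IPMI side is reachable and the credentials work. Note that passing the password with `-P` makes it visible in the process list, so treat this as a quick test rather than something to leave running.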

Configuring Telegraf

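I assume you already have Telegraf installed and configured to output to InfluxDB. The plugin calls ipmitool to do the actual polling, so that needs to be installed on the machine running Telegraf. Open up your telegraf.conf file and configure the [[inputs.ipmi_sensor]] input plugin like so: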

# Read metrics from the bare metal servers via IPMI
[[inputs.ipmi_sensor]]
   ## optionally specify the path to the ipmitool executable
   # path = "/usr/bin/ipmitool"
   ##
   ## optionally force session privilege level. Can be CALLBACK, USER, OPERATOR, ADMINISTRATOR
   # privilege = "ADMINISTRATOR"
   ##
   ## optionally specify one or more servers via a url matching
   ##  [username[:password]@][protocol[(address)]]
   ##  e.g.
   ##    root:passwd@lan(127.0.0.1)
   ##
   ## if no servers are specified, local machine sensor stats will be queried
   ##
   servers = ["username:password@lanplus(192.168.1.11)"]

   ## Recommended: use metric 'interval' that is a multiple of 'timeout' to avoid
   ## gaps or overlap in pulled data
   interval = "30s"

   ## Timeout for the ipmitool command to complete
   timeout = "20s"

Make sure to replace username, password, and the example IP address 192.168.1.11 with your own details.
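A typo in the servers entry is an easy mistake to make, so here is a small Python sketch that checks an entry against the `[username[:password]@][protocol[(address)]]` shape described in the config comments. The regular expression and the `parse_server` name are my own and are not how Telegraf parses the value internally. As written it also assumes the password contains no `@` character.

```python
import re

# Mirrors the documented shape: [username[:password]@][protocol[(address)]]
SERVER_RE = re.compile(
    r"^(?:(?P<user>[^:@]+)(?::(?P<password>[^@]*))?@)?"
    r"(?P<protocol>\w+)(?:\((?P<address>[^)]+)\))?$"
)


def parse_server(entry):
    """Split a servers entry into its parts, or raise ValueError."""
    match = SERVER_RE.match(entry)
    if match is None:
        raise ValueError(f"unrecognized server entry: {entry!r}")
    return match.groupdict()


# Example:
# parse_server("username:password@lanplus(192.168.1.11)")
```

Parts that are left out of the entry, such as the credentials in `lan(127.0.0.1)`, come back as `None`.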

Now restart Telegraf and check its log file to make sure it's working correctly. If you don't see any errors after about a minute you should be all good here.
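If the log does show errors, or the numbers look off later, it can help to look at the kind of sensor listing ipmitool produces and see which readings are numeric. Below is a rough Python sketch that parses one line of `ipmitool sdr` style output (name, reading, and status separated by `|`). The sample line in the comment, including the sensor name "System Level" and its value, is only an illustration and may not match what your iDRAC reports.

```python
def parse_sdr_line(line):
    """Parse a 'name | reading | status' line into a dict.

    Returns None for lines that do not have three columns or whose
    reading is not numeric (for example "no reading" or "0x01").
    """
    parts = [part.strip() for part in line.split("|")]
    if len(parts) != 3:
        return None
    name, reading, status = parts
    value_text, _, unit = reading.partition(" ")
    try:
        value = float(value_text)
    except ValueError:
        return None
    return {"name": name, "value": value, "unit": unit, "status": status}


# Example with an illustrative (made up) line:
# parse_sdr_line("System Level     | 182 Watts         | ok")
```

Sensors reported in Watts are the ones that matter for the power graphs in the next section.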

Setting up our graphs in Grafana

Using Grafana you can build some pretty awesome analytic dashboards. Here is the power section of my server dashboard exported so I can share it with all of you:

Grafana server power usage dashboard I made

Here is the link to the dashboard I posted on Grafana's website:

Server Power Usage PowerEdge r710 dashboard for Grafana
Dashboard for viewing your PowerEdge r710 usage over time. Should also work with other systems using the iDRAC6 that expose the same information over IPMI. Requires InfluxDB, Telegraf, and a supported system.

And here is a copy of the JSON on GitHub for archival reasons:

grafana-power-usage-dashboard

I made the Cost Per kWh an input because it usually changes over time. Make sure to enter your own cost here (you can get it from your last electric bill).
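For anyone wondering how a cost figure follows from watt readings, the arithmetic is simple: average watts times hours, divided by 1000, gives kWh, and kWh times your rate gives the cost. Here is a small Python sketch of that calculation. It is the standard formula rather than a copy of the dashboard's queries, and the function names and example numbers are made up for illustration.

```python
def energy_kwh(mean_watts, hours):
    """Energy used, in kWh, for an average draw over a period."""
    return mean_watts * hours / 1000.0


def energy_cost(mean_watts, hours, cost_per_kwh):
    """Cost of that energy at a flat rate per kWh."""
    return energy_kwh(mean_watts, hours) * cost_per_kwh


# Example with made up numbers: a 200 W average over 24 hours
# at a rate of 0.10 per kWh.
# energy_kwh(200, 24)         is about 4.8 kWh
# energy_cost(200, 24, 0.10)  is about 0.48
```

This assumes a flat rate. If your utility charges tiered or time-of-day rates, a single Cost Per kWh value will only give an approximation.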

Conclusion

I'm really glad I was able to make this happen. If you end up adding your CPU, memory, and disk stats to the dashboard, you will be able to see how they all relate over time. I also really like that I can select any period of time (say the last hour) and it will tell me how much power was used and what the cost was ($0.02 right now).

I've been using this for almost a year now without any issues. I can tell you exactly what my power consumption has been for any period during that time. Pretty nifty ;)

If you have any feedback or need help with any of this, feel free to leave a comment below and I will get back to you.