Linux Management and Monitoring: Zabbix + Puppet ?!?

I’ve spent a good part of this week evaluating various solutions for monitoring and managing Linux servers.  There are many solutions that meet some of the needs, but I have not found a single product that does everything I need.  Without going into significant detail, here are my needs:

  • Simple service up/down monitoring and alerting
  • Detailed metrics for disk, CPU, memory, network, etc.
  • Storage and graphing of historical data
  • Flexible creation of custom monitors, triggers, etc.
  • A substantial set of pre-configured monitors, triggers, etc.
  • Deployment/upgrade/removal of packages
  • Configuration management
  • Inventory of hardware/software
  • Rich access control to satisfy various distributed administration and audit requirements
  • *ALL* functionality exposed through a web interface
  • Inexpensive, preferably open source
  • Able to scale to several hundred, maybe thousands of Linux servers – and possibly even Windows servers.
  • Should be developed in a language I know (C, Perl, Python, Java) so I can modify it myself
  • Should run on Debian, though it does not need to be in the repo.

Candidate solutions:

  • Nagios
  • OCS-Inventory
  • Hyperic
  • Puppet
  • Zabbix

Conclusion:

Nothing definitive yet, but I’m zeroing in on a combination of Zabbix and Puppet.  I am still looking for something to automate the collection of inventory data, but I think this could be done by using the Zabbix API to populate the Zabbix inventory (which is otherwise filled in by hand).
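As a rough sketch of that idea (not something I have tested against a live server): the Zabbix API is JSON-RPC, and its `host.update` method accepts an `inventory` object. The snippet below only builds the request body. The host ID, the token, and the choice of inventory fields (`name`, `os`, `os_full`) are illustrative assumptions, and the `auth` field reflects older API versions; newer releases may expect the token in an HTTP header instead.

```python
import json
import platform


def collect_facts():
    """Gather a few local facts, keyed by assumed Zabbix inventory field names."""
    return {
        "name": platform.node(),        # hostname as reported by the OS
        "os": platform.system(),        # e.g. "Linux"
        "os_full": platform.platform(), # longer OS/kernel description
    }


def build_inventory_update(host_id, facts, auth_token, request_id=1):
    """Build a JSON-RPC 2.0 body for Zabbix's host.update method.

    host_id and auth_token are placeholders supplied by the caller.
    """
    return {
        "jsonrpc": "2.0",
        "method": "host.update",
        "params": {
            "hostid": host_id,
            "inventory_mode": 0,  # assumption: 0 means manually managed inventory
            "inventory": facts,
        },
        "auth": auth_token,  # assumption: token-in-body style of older API versions
        "id": request_id,
    }


if __name__ == "__main__":
    # Hypothetical host ID and token, for illustration only.
    body = build_inventory_update("10084", collect_facts(), "example-token")
    print(json.dumps(body, indent=2))
```

The body would then be POSTed as `application/json` to the server's `api_jsonrpc.php` endpoint, most likely from a cron job or a Puppet-managed script on each host.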

LazyWeb – any opinions?

Enterprise, Large Scale File Services

Admittedly, the world of file services has changed since Novell ruled the roost with NetWare. All sorts of new buzzwords exist: Web Content Management, Enterprise Document Management, Document Archiving, Knowledge Management … but you still can’t beat a simple file storage service like the one Windows offers natively. The catch is that it is really hard to provide that kind of service at a very large scale.

So here is the question: how do I provide a file service with the following requirements?

  • Must scale beyond 25,000 users (potentially 100,000), each with private “home” directories, plus whatever permutations of group space can be imagined.
  • Must support large amounts of storage, including individual files of several hundred gigabytes and user/group quotas of several terabytes.
  • Must support access from OS X, Windows, and Linux such that applications on these systems can natively open, read, and write files. In other words, something similar to simple CIFS access, though a non-native client that provides this functionality is acceptable.
  • Must support some level of access from mobile devices, including Android, iPhone/iPad, Windows Mobile, and ideally Blackberry too.
  • Must provide a rich “sexy-looking” web interface.
  • Must provide a consistent, abstract interface. In other words, scaling across hundreds of servers is acceptable, as long as users never need to be told “connect to server #17 for X, and server #53 for Y”.  There should be some sort of abstracted virtual filesystem.
  • Must support user-controllable ACLs to facilitate sharing and security.
  • Must be accessible by non-technical end users with very little handholding – should be “intuitive”.
  • Must allow integration with a backup solution that can provide file-level restoration.
  • Should allow for storage of data to be accessed by Linux and Windows servers, such as user generated web content, HPC-generated research data, etc.
  • Should allow for attachment of metadata for searching.
  • Should allow integration with a backup solution that lets end users perform file-level restoration themselves.

Some have tried to convince me that Windows DFS can do all this, but I have yet to see a deployment that actually encompasses all of the above.  Anyone have any references?

I am quite intrigued by OpenAFS, using the filedrawers web interface, and possibly using the Samba gateway to avoid deploying the OpenAFS client to every machine.  Does anyone have experience doing this?  Has anyone served OpenAFS data over DAV via Apache, mod_dav, and mod_waklog?  Is filedrawers or DAV an acceptable access mechanism for mobile devices?  Any pitfalls?

What else should I be considering?