As a follow-up to my Army of Loggers presentation, I've been looking at ways to load balance the CEF feeds from the entire SmartConnector estate to the pool/army of Loggers - preferably in a way that is scalable, reliable, and freely available to all.
So far I've been looking purely at rsyslog for this, since the software is free to use, widely available in many distros, and reputed to be pretty fast. F5 is more tried and tested, and well understood, but not free. syslog-ng is similarly widely used but apparently not free in this application.
There are recommendations in the rsyslog community to perform this load balancing using a VIP to cluster IPs for the Loggers, which is fine for dedicated software installations but harder for appliances.
My current approach is very simple: the rsyslog process receives a CEF syslog message from any connector and, based on the time at which it received that message, sends it to a specific Logger. The choice of Logger is a modulo of the receipt time, i.e.
if (subsecond_component_of_event_receipt_time) MOD (total_number_of_loggers) = N then send to (logger_N)
Since rsyslog tracks receipt time in microseconds, this means that the destination logger is switched on a microsecond basis.
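To make the selection rule concrete, here is a minimal Python sketch of the same idea. It is not the rsyslog configuration itself; the Logger names and the pick_logger function are hypothetical and only illustrate the modulo step.

```python
from datetime import datetime

# Hypothetical destination names, used only for illustration.
LOGGERS = ["logger_0", "logger_1", "logger_2"]


def pick_logger(receipt_time: datetime, loggers=LOGGERS) -> str:
    """Choose a destination from the sub-second part of the receipt time."""
    # The microsecond field (0..999999) plays the role of the
    # sub-second component of the event receipt time.
    index = receipt_time.microsecond % len(loggers)
    return loggers[index]


# A message received 4 microseconds past the second: 4 MOD 3 = 1.
print(pick_logger(datetime(2024, 1, 1, 12, 0, 0, 4)))
```

Because the choice depends only on the receipt timestamp, nothing in the message body has to be parsed before the destination is known.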
Advantages and disadvantages of this approach
Advantages:
- Microsecond resolution should be fine enough to avoid heavy bursts to any one destination (e.g. even 100k EPS would still be divided evenly, with no more than one event, or at worst one TCP packet full of events, per destination per cycle)
- More evenly distributed than using event time (which may only be at one-second resolution, and hence lead to a one-second burst at only one Logger at the total input rate)
- Lower load / higher performance than using an event attribute like event time or an index like EventId (since no parsing of the event is required)
- Distributes all events evenly across all Loggers
Disadvantages:
- Does not support redistribution in the event of a destination Logger failure (events are queued or lost), although rsyslog can support 1:1 destination failover if using TCP
- Can require additional manual configuration
- Does not currently provide internal monitoring and reporting of performance, including whether events are lost
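The difference between keying on a microsecond timestamp and on a one-second timestamp can be illustrated with a small simulation. This is only a sketch under stated assumptions: arrival times are drawn at random within a single second, four Loggers are assumed, and the distribute function is a hypothetical helper, not part of rsyslog or of my config.

```python
import random
from collections import Counter


def distribute(timestamps_us, n_loggers, resolution_us=1):
    """Count events per Logger when the key is a timestamp truncated
    to the given resolution (in microseconds)."""
    counts = Counter()
    for ts in timestamps_us:
        counts[(ts // resolution_us) % n_loggers] += 1
    return [counts[i] for i in range(n_loggers)]


rng = random.Random(0)                 # fixed seed so runs are repeatable
base_us = 1_700_000_000 * 1_000_000    # an arbitrary whole second, in microseconds
# Assumption: 100,000 events arriving at random instants within that second.
events = [base_us + rng.randrange(1_000_000) for _ in range(100_000)]

by_microsecond = distribute(events, 4)                       # microsecond key
by_second = distribute(events, 4, resolution_us=1_000_000)   # one-second key

print("microsecond key:", by_microsecond)
print("one-second key: ", by_second)
```

With the microsecond key, each of the four counts comes out close to a quarter of the total. With the one-second key, every event in that second lands on the same Logger, which is the one-second burst described above.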
The files include a current /etc/rsyslog.conf file that I'm working with, and a tidied version of the Excel spreadsheet that I use to autogenerate a config file.
Feedback and suggestions for improvement are very welcome. So far this has been tested up to 20k EPS (events per second) at 5% CPU load on a 4-core VM.
EDIT: Note that this was using the latest rsyslog at the time (v8.x). It uses a language syntax that is not supported in the older versions bundled with older popular Linux distros, such as rsyslog 5.8 in RHEL 6.5. This config file will not work in those older versions, and the methods it relies on may not be available in them.