syslog-ng: From Raw Data to Big Data
syslog-ng is an enhanced logging daemon, with a focus on central log collection. It collects logs from many different sources.
Raw log messages come in a variety of formats:
- Most lack any structure and are usually just an almost proper English sentence with some variable parts in it, such as user names or IP addresses.
- Some have a fixed, table-like structure, such as Apache access logs.
- A small minority of logs arrive in an already structured form: JSON.
Parsers in syslog-ng make it possible to extract important information from any of these messages and create name-value pairs. Once you have name-value pairs instead of raw log messages, you have many possibilities. On the syslog-ng side, you can use them for filtering, for example, to send an alert if the username is “root”. You can also use them in file names, or modify messages to facilitate log rotation or to better suit the applications processing the logs.
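The idea of turning a free-form message into name-value pairs and then filtering on a field can be sketched in a few lines of Python. This is only an illustration of the concept, not how syslog-ng implements its parsers: the sample message, the regular expression, and the field names (`method`, `username`, `client_ip`, `port`) are all assumptions made up for this example.

```python
import re

# Hypothetical sshd-style message, made up for illustration.
SAMPLE = "Accepted password for root from 192.168.1.10 port 52144 ssh2"

# Named groups stand in for the variable parts of the sentence.
PATTERN = re.compile(
    r"Accepted (?P<method>\S+) for (?P<username>\S+) "
    r"from (?P<client_ip>\S+) port (?P<port>\d+)"
)


def parse(message):
    """Return the name-value pairs found in message, or {} if nothing matches."""
    match = PATTERN.search(message)
    return match.groupdict() if match else {}


def needs_alert(pairs):
    """Filter on a parsed field instead of searching the raw text."""
    return pairs.get("username") == "root"


pairs = parse(SAMPLE)
print(pairs)
print(needs_alert(pairs))
```

Filtering on the parsed `username` field avoids false matches that a plain text search for "root" could produce elsewhere in the message.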
Parsing and preprocessing log messages also allows you to store them more effectively:
- you can send them to the destination (for example, Elasticsearch or MongoDB) in a format that is easy to process (for example, JSON),
- you can filter out irrelevant data and forward only what is really needed,
- processing is off-loaded to very efficient C code.
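The first two points above can be illustrated with a short Python sketch: irrelevant records and fields are dropped, and what remains is serialized as JSON. The records, the `program` field, and the list of fields to keep are hypothetical; in syslog-ng itself this work is done by filters and templates in the configuration, not by Python code like this.

```python
import json

# Hypothetical name-value pairs, as a parser might have produced them.
RECORDS = [
    {"program": "sshd", "username": "root", "client_ip": "192.168.1.10", "pid": "4711"},
    {"program": "cron", "username": "backup", "pid": "4712"},
]

# Assumed list of fields the destination actually needs.
KEEP = ("program", "username", "client_ip")


def is_relevant(pairs):
    """Message-level filter: forward only sshd records in this example."""
    return pairs.get("program") == "sshd"


def to_json(pairs, keep=KEEP):
    """Field-level filter: keep only the wanted fields, then serialize."""
    slim = {name: pairs[name] for name in keep if name in pairs}
    return json.dumps(slim, sort_keys=True)


def prepare(records):
    """Return the JSON documents that would be sent to the destination."""
    return [to_json(pairs) for pairs in records if is_relevant(pairs)]


for document in prepare(RECORDS):
    print(document)
```

Here only one of the two records is forwarded, and it no longer carries the `pid` field, so the destination stores less data and receives it in a form it can index directly.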
Finally, you will learn about the “big data” destinations that syslog-ng supports, and how they benefit from message parsing:
- Hadoop Distributed File System (HDFS),
- Apache Kafka,
- Elasticsearch and Kibana, and
- MongoDB.
And if syslog-ng cannot already do something that you need, and you are not afraid of writing some code, you can learn how the language bindings of syslog-ng make it possible to add new destinations, not only in C, but also in Java, Lua, Perl, or Python.
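To give a feeling for what a destination written in Python might look like, here is a minimal, standalone sketch. It assumes that the destination is a class whose `send()` method is called once per message with the name-value pairs and returns `True` on success, with `open()` and `close()` handling the connection. The class name, the output path, and the dictionary-like message are assumptions of this example, and the exact interface should be checked against the documentation of the syslog-ng version in use.

```python
import json
import os
import tempfile


class JsonLinesDestination:
    """Hypothetical destination that appends each message as one JSON line."""

    def __init__(self, path="messages.jsonl"):
        self.path = path
        self.handle = None

    def open(self):
        """Open the output file; returns True when ready to receive messages."""
        self.handle = open(self.path, "a", encoding="utf-8")
        return True

    def send(self, msg):
        """Write one message; msg is assumed to behave like a dict of name-value pairs."""
        self.handle.write(json.dumps(dict(msg), sort_keys=True) + "\n")
        self.handle.flush()
        return True

    def close(self):
        """Close the output file if it is open."""
        if self.handle is not None:
            self.handle.close()
            self.handle = None


def demo():
    """Exercise the class outside syslog-ng and return the lines written."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "messages.jsonl")
        destination = JsonLinesDestination(path)
        destination.open()
        destination.send({"PROGRAM": "sshd", "username": "root"})
        destination.close()
        with open(path, encoding="utf-8") as handle:
            return handle.read().splitlines()


print(demo())
```

The `demo()` helper only simulates what the logging daemon would do; in a real deployment the class would be referenced from the syslog-ng configuration and the daemon would drive the calls.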