Additions #4

Open: wants to merge 28 commits into master

@faxm0dem (Author) commented Nov 6, 2014

This PR isn't intended to be merged in its entirety; it's more a way to ping @bazsi ;o)
Here are some of the open issues:

  • the naming schema for name-value pairs is probably a mess. If pdb is to make it into distributions (which would be awesome), we really need something consistent. Some form of convention has apparently already made it into syslog-ng-3.6.1 (unix.*, audit.*) or already exists (classifier.*), but I guess we need to discuss this
  • as we all seem to agree that actions should be separated from rules, the actions present in this PR can probably be ignored
  • the pam_unix ruleset obviously conflicts with various pdb files in access/* due to syslog-ng/syslog-ng#294 (add filename-based sorting to pdbtool merge, in a similar vein to what run-parts does)
  • Solaris rules might be cluttered by the msgid, which is on by default on Solaris; it should be disabled or pre-parsed (dropped)
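A minimal Python sketch of the ordering idea behind syslog-ng/syslog-ng#294, assuming "filename-based sorting" means ordering all .pdb files by base name, the way run-parts orders the scripts in a directory. The helper name and the file layout are hypothetical, and the sketch does not call pdbtool; it only shows how a numeric prefix would let a shared ruleset such as pam_unix be placed deterministically relative to access/*.

```python
from pathlib import PurePosixPath


def run_parts_order(paths):
    """Return the .pdb files among `paths` in run-parts-like order.

    run-parts sorts by file name, so the sort key here is the base name, with
    the full path as a tie-breaker to keep the result deterministic when two
    directories contain files with the same name.
    """
    pdb_files = [p for p in paths if p.endswith(".pdb")]
    return sorted(pdb_files, key=lambda p: (PurePosixPath(p).name, p))


# Hypothetical layout: a numeric prefix moves the shared pam_unix rules first.
order = run_parts_order(
    ["access/sshd.pdb", "access/pam_unix.pdb", "system/00-pam_unix.pdb", "README"]
)
print(order)  # → ['system/00-pam_unix.pdb', 'access/pam_unix.pdb', 'access/sshd.pdb']
```

Whether the first or the last file should win a conflict is not settled in this thread; the sketch only fixes the order.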

@csabamajor (Contributor)

Hi,

Thanks for the contribution, I'll contact Bazsi to discuss it with him.

Best regards,
Csaba

On Thu, Nov 6, 2014 at 1:23 PM, Fabien Wernli wrote:



You can merge this Pull Request by running

git pull https://github.com/ccin2p3/syslog-ng-patterndb additions

Commit Summary

  • additional pam_unix message
  • improve and move common rules to pam_unix.pdb
  • some additions
  • generic kernel debugging support
  • edac driver
  • rename xml files to pdb (so update-patterndb works)
  • add case where HOST is missing in message
  • add padding
  • correlate lines which are being continued '(command continued)'
  • add pattern with GROUP
  • added pubkey authentication
  • added some rules and ipmi
  • added dnsmasq
  • xinetd START/EXIT message correlation
  • added some rules
  • propose some additions to schema conventions
  • moved some rules from access/sshd.pdb
  • missing rhost
  • changed to reflect (more or less) SCHEMAS.txt
  • empty pattern rulesets are not traversed if a non-empty exists
  • AFS Ignoring superuser root
  • first shot at sendmail and its milters
  • add context length for debugging
  • context-length is a template function
  • context-length
  • some additions to sm-mta
  • improve rules
  • assuming msg id was starting with 'r' was stupid


@bazsi (Collaborator) commented Nov 10, 2014

> This PR isn't intended to be merged entirely, more to ping @bazsi ;o)

I consider myself pinged :)

> Here are some of the open issues:

> the naming schema for name-value pairs is probably a mess. If pdb is to make it to dists (which would be awesome) we really need something consistent. Some form of convention has apparently already made it to syslog-ng-3.6.1 (unix.*, audit.*) or is existing (classifier.*), but I guess we need to discuss this

There's a naming scheme I try to follow when adding name-value pairs, which is described here:

https://bazsi.blogs.balabit.com/2010/08/syslog-ng-name-value-pair-naming/

Other than the initial dot, the namespace is up to the user (or patterndb) to define. I've seen most projects (e.g. Kibana) gravitate toward the Common Information Model (a.k.a. CIM), which has a dictionary to encapsulate information.

CIM is defined by Splunk, and a specification is available here:
http://docs.splunk.com/Documentation/CIM/latest/User/Overview

I wouldn't want to create a new dictionary, but rather reuse an existing one. This was my intent with the CEE project, but that didn't take off. CIM is seeing more adoption: for example, nflogd and Suricata are two applications that can generate it, and Kibana has predefined reports that consume it.

Let me know what you think about this. Thanks.

> as we all seem to agree actions should be separated from rules, the actions present in this PR can probably be ignored

OK, got it. I don't know yet exactly how to resolve this. I first want to resolve bugzilla #294.

> pam_unix ruleset obviously conflicts with various pdb files in access/* due to balabit/syslog-ng#294

Yep, I was trying to work on this today, but was distracted by other work.

> solaris rules might be cluttered up by the msgid which is on by default on Solaris, and it should be disabled or preparsed (dropped)

We would probably need a system() source change that removes (extracts) this information and puts it into a name-value pair. Do you have a parser that would extract this information on a Solaris system? I might take a stab at integrating it into system(), and perhaps also publish it as an SCL snippet that can be reused even in cases that don't use system().

Thanks for the PR.
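A rough Python sketch of the extraction being discussed, assuming the Solaris message ID appears as a leading "[ID <number> <facility>.<level>] " token in the message text. The name-value pair names (.solaris.msgid, .solaris.facility, .solaris.level) are hypothetical, and a real integration would live in the system() source or an SCL snippet rather than in Python.

```python
import re

# Assumed layout of the Solaris prefix: "[ID <msgid> <facility>.<level>] ".
SOLARIS_MSGID = re.compile(r"^\[ID (\d+) ([a-z0-9]+)\.([a-z]+)\] ")


def split_solaris_msgid(message):
    """Return (name_value_pairs, message_without_prefix).

    The pair names are hypothetical; messages without the prefix are returned
    unchanged together with an empty dict.
    """
    match = SOLARIS_MSGID.match(message)
    if match is None:
        return {}, message
    pairs = {
        ".solaris.msgid": match.group(1),
        ".solaris.facility": match.group(2),
        ".solaris.level": match.group(3),
    }
    return pairs, message[match.end():]


pairs, text = split_solaris_msgid("[ID 800047 auth.info] Accepted publickey for root")
print(pairs)  # → {'.solaris.msgid': '800047', '.solaris.facility': 'auth', '.solaris.level': 'info'}
print(text)   # → Accepted publickey for root
```

Stripping the prefix before db-parser runs keeps the patterns identical across platforms, which is the "preparsed (dropped)" option from the PR description.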

@faxm0dem (Author)

I just had a quick look at the CIM spec. Is the Splunk CIM related or identical to the CIM of the DMTF (http://dmtf.org/standards/cim/cim_schema_v2420)?
Edit: they seem unrelated.
I'd love to see some kind of standard emerge, and stick to that.
About the Solaris thing, I have a way to disable it system-wide, so this could go into the documentation. Or, as you suggest, an SCL would be nice; I can provide you with a csv-parser based snippet.

@bazsi (Collaborator) commented Nov 12, 2014

I had never heard of the DMTF, and from quickly browsing through its specs,
it seems completely different: it defines objects that encapsulate
manageable things in an IT infrastructure.

CIM here stands for Common Information Model, and seems to originate from
Splunk.

Ah, I've found a definite answer on that:
http://answers.splunk.com/answers/60548/how-is-the-splunk-cim-related-to-the-dmtf-cim.html

A csv-parser to extract the msgid from Solaris messages would be nice; I'd
try to integrate that into the system() source so it's done automatically.

Bazsi


@faxm0dem (Author)

I couldn't find your reference to Kibana's implementation of CIM.
So, if you were to adopt Splunk/CIM, how would you see it?
Something like this maybe?

cim.<Object name>.<Field name> = <Value>
cim.CPU.cpu_mhz = 3600
cim.Authentication.user = root
cim.All_Inventory.hypervisor_id = 1234
cim.Alerts.dest = [email protected]
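A small Python sketch of how flat keys in the proposed cim.<Object name>.<Field name> layout map back onto CIM's object/field structure. The function is hypothetical and only illustrates the naming convention; it is not something syslog-ng does.

```python
def group_cim_fields(pairs):
    """Group flat "cim.<Object name>.<Field name>" keys by object.

    Keys outside the "cim." namespace are skipped; everything after the
    second dot is taken as the field name.
    """
    grouped = {}
    for key, value in pairs.items():
        parts = key.split(".", 2)
        if len(parts) != 3 or parts[0] != "cim":
            continue
        _, obj, field = parts
        grouped.setdefault(obj, {})[field] = value
    return grouped


print(group_cim_fields({
    "cim.CPU.cpu_mhz": "3600",
    "cim.Authentication.user": "root",
    "program": "sshd",
}))
# → {'CPU': {'cpu_mhz': '3600'}, 'Authentication': {'user': 'root'}}
```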

Other question: they seem to have an alias concept (http://docs.splunk.com/Splexicon:Alias). Any idea how this could be implemented in syslog-ng?

@faxm0dem (Author)

Found it on the mailing list (https://lists.balabit.hu/pipermail/syslog-ng/2014-October/021700.html); haven't tested it yet, though. I'll do so ASAP.

@bazsi (Collaborator) commented Nov 12, 2014

Hi,

how about this integration in syslog-ng:

https://github.com/balabit/syslog-ng/compare/f/solaris-msg-id-parsing?expand=1

Bazsi


@bazsi (Collaborator) commented Nov 12, 2014

Hi,

I think field aliasing should be implemented in the "search" layer. In many
cases multiple name-value pairs will refer to the same thing; searching
should cope with that.

You wouldn't want to reindex/reparse your logs once you realize you made a
mistake naming a field, whereas you can easily alias multiple fields when
searching for logs. At least, this is where I would do it.

Where do you store your logs? Is it feasible to do field aliasing there?

Bazsi


@faxm0dem (Author)

I store the logs in Elasticsearch. It is absolutely possible to do so there, and I agree it is the right place to do it.

@faxm0dem (Author)

About the msgid thing: LGTM, but I'll have to test it. That being said, I'm not sure it's relevant to keep .solaris.facility and .solaris.level, as they're redundant with FACILITY and LEVEL. I'd even drop the msgid, as it's always the same, contrary to what the manpage says. We can keep it for safety, but the other two are most certainly redundant.

@faxm0dem (Author)

@bazsi, could you please comment on #4 (comment) so I can start migrating to the new naming scheme?

@bazsi (Collaborator) commented Nov 17, 2014

> I couldn't find your reference about kibana's implementation of cim.
> So, if you were to adopt Splunk/CIM, how would you see it?
> Something like this maybe?
>
> cim.<Object name>.<Field name> = <Value>
> cim.CPU.cpu_mhz = 3600
> cim.Authentication.user = root
> cim.All_Inventory.hypervisor_id = 1234
> cim.Alerts.dest = [email protected]
>
> Other question: they seem to have an alias concept. Any idea on how this could be implemented in syslog-ng?

If you look at the $(format-cim) template function:

https://github.com/balabit/syslog-ng/blob/master/scl/cim/template.conf

It assumes that CIM-related name-value pairs have a .cim. prefix (i.e. with a leading dot); these are then formatted, with the prefix removed, for a CIM-enabled receiver.
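A Python sketch of the idea described here: select the name-value pairs under the .cim. prefix, strip the prefix, and serialize what remains. It emits flat, sorted keys and makes no claim about the exact output of the real $(format-cim) template in scl/cim/template.conf.

```python
import json


def format_cim(pairs, prefix=".cim."):
    """Keep only the pairs under `prefix`, strip the prefix, emit JSON.

    Sketch of the idea only: flat keys, sorted for stable output.
    """
    selected = {
        key[len(prefix):]: value
        for key, value in pairs.items()
        if key.startswith(prefix)
    }
    return json.dumps(selected, sort_keys=True)


print(format_cim({".cim.user": "root", ".cim.action": "login", "HOST": "h1"}))
# → {"action": "login", "user": "root"}
```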

The system() source has been able to parse json-formatted messages since 3.6.1. On my system (latest HEAD, but should be the same on 3.6.1 too):

$ share/tools/system-expand
## system() expands to:

channel {
    source {
        systemd-journal();
    }; # source
    channel {
        channel {
            parser {
                json-parser(prefix('.cim.') marker('@cim:'));
            };
            flags(final);
        };
        channel { };
    };
}; # channel
;

That is, messages sent to /dev/log will be parsed as JSON if they start with the @cim: marker, and the resulting name-value pairs will get a .cim. prefix, waiting to be picked up by $(format-cim).
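A Python approximation of what the json-parser(prefix('.cim.') marker('@cim:')) step does to a message. Flattening nested objects into dot-separated names is an assumption about json-parser's behaviour made for illustration, and the function names are hypothetical.

```python
import json


def parse_cim_message(message, marker="@cim:", prefix=".cim."):
    """Return prefixed name-value pairs, or None if the message is not @cim: JSON."""
    if not message.startswith(marker):
        return None
    try:
        document = json.loads(message[len(marker):])  # leading spaces are fine
    except json.JSONDecodeError:
        return None
    if not isinstance(document, dict):
        return None
    pairs = {}
    _flatten(document, prefix, pairs)
    return pairs


def _flatten(obj, path, out):
    # Nested objects become dot-separated names (assumed to mirror json-parser).
    for key, value in obj.items():
        if isinstance(value, dict):
            _flatten(value, path + key + ".", out)
        else:
            # Name-value pairs hold text, so non-strings are stored as JSON text.
            out[path + key] = value if isinstance(value, str) else json.dumps(value)


print(parse_cim_message('@cim: {"user": "root", "src": {"ip": "10.0.0.1"}, "port": 22}'))
# → {'.cim.user': 'root', '.cim.src.ip': '10.0.0.1', '.cim.port': '22'}
```

Messages without the marker return None here, which corresponds to them falling through to the empty channel { } in the expansion above.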

$(format-cim) can then be used to format messages for Elasticsearch/Kibana.

Did I answer your question?

@faxm0dem (Author)

Not really, but it's very interesting :-)
Sorry, I was not clear enough. You are talking about how CIM-preformatted messages are leveraged by system().
What I'm after is renaming all keys in this repo's pdb files to conform to CIM. I'm having some difficulty doing so from reading the spec (http://docs.splunk.com/Documentation/CIM/latest/User/RelationshipofCIMappstodata).

Example:

<pattern>ata@ESTRING:krnacct.rsid::@ limiting SATA link speed to @ANYSTRING:krnacct.lspeed@</pattern>
<values>
  <value name='os'>Linux 2.6.x</value>
  <value name='state'>warning</value>
  <value name='family'>ata</value>
  <value name='storage.mount'>ata1</value>
</values>
<tags>
  <tag>os</tag>
  <tag>storage</tag>
</tags>

Is that clearer?
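As a rough illustration of renaming keys in pdb files mechanically, here is a Python sketch using the standard library's ElementTree. The RENAMES mapping is hypothetical and does not assert which CIM fields these keys should become. The regex only handles named parser references of the form @PARSER:name...@ plus the escaped @@, and ElementTree drops XML comments and rewrites attribute quoting, so a real migration would need more care.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mapping from current names to .cim.-prefixed ones.
RENAMES = {
    "krnacct.lspeed": ".cim.link_speed",
    "storage.mount": ".cim.dest",
}

# "@@" is an escaped literal "@"; "@TYPE:name<args>@" is a named parser reference.
PARSER_REF = re.compile(r"@@|@([A-Za-z0-9]+):([^:@]*)([^@]*)@")


def rename_refs(text, renames):
    """Rename the name part of parser references inside pattern text."""
    def repl(match):
        if match.group(1) is None:  # escaped "@@": keep as-is
            return match.group(0)
        name = match.group(2)
        return "@%s:%s%s@" % (match.group(1), renames.get(name, name), match.group(3))
    return PARSER_REF.sub(repl, text)


def rename_rule(xml_text, renames):
    """Rename <value name=...> attributes and parser names in <pattern> text."""
    root = ET.fromstring(xml_text)
    for value in root.iter("value"):
        name = value.get("name")
        if name in renames:
            value.set("name", renames[name])
    for pattern in root.iter("pattern"):
        pattern.text = rename_refs(pattern.text or "", renames)
    return ET.tostring(root, encoding="unicode")


rule = ("<rule><patterns><pattern>ata@ESTRING:krnacct.rsid::@ limiting SATA link speed "
        "to @ANYSTRING:krnacct.lspeed@</pattern></patterns>"
        "<values><value name='storage.mount'>ata1</value></values></rule>")
print(rename_rule(rule, RENAMES))
```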

@bazsi (Collaborator) commented Nov 18, 2014

I thought I had answered your question, as the question I saw was what
kind of prefix to use for these name-value pairs.

Within syslog-ng we should use the .cim prefix. Your sample doesn't do that.

I would always try to populate certain values much earlier than in a
db-parser(), preferably on the client.

Things like os should probably be filled by the system source, right at the
client. There, we might even have the exact distribution available.

The device (ata1) is specified by the kmsg driver.

My point is that structured information should be collected as soon as it's
available, and db-parser should be used as a last resort, and only for things
that are actually in the message. I wouldn't try to guess context information,
and probably not even severity.

Hth,

Bazsi


@faxm0dem (Author)

I totally agree with "the closer to the source the better". However, what's the state of the art there? Which distributions already populate the keys you mention, like os and device? I am pretty confident that this is not the case with what I'm using in production (mainly EL6, some EL5 and EL7), so I would need to use patterndb.

@bazsi (Collaborator) commented Nov 19, 2014

You are right. But we could add these to the system source. Then deploy
syslog-ng on these machines, if it isn't there already, and you are set.

@faxm0dem (Author)

True for 'os', and for other things too, like Puppet/Facter facts (we're currently adding operatingsystem, operatingsystemmajrelease and productname by prepending key-value pairs to $MESSAGE, then parsing them out using csv-parser on the other end).

You mentioned that kmsg provides structured data; did I misunderstand? Could you elaborate?
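The fact-prepending workaround mentioned above can be sketched in Python as follows. The wire format ("key=value" fields joined by "|", followed by the original message), the delimiter and the example values are all assumptions; the receiving function is only a stand-in for the csv-parser step, not its actual configuration.

```python
# Hypothetical wire format: "k1=v1|k2=v2|k3=v3|<original message>".
FACT_KEYS = ("operatingsystem", "operatingsystemmajrelease", "productname")
DELIMITER = "|"  # assumption: never occurs inside a fact key or value


def prepend_facts(message, facts):
    """Sender side: put the facts, in a fixed order, in front of the message."""
    fields = ["%s=%s" % (key, facts.get(key, "")) for key in FACT_KEYS]
    return DELIMITER.join(fields + [message])


def parse_facts(line):
    """Receiver side: split off the fixed leading columns, keep the rest as the message."""
    parts = line.split(DELIMITER, len(FACT_KEYS))
    if len(parts) != len(FACT_KEYS) + 1:
        return {}, line  # not in the expected format: leave it untouched
    facts = {}
    for field in parts[:-1]:
        key, sep, value = field.partition("=")
        if not sep or key not in FACT_KEYS:
            return {}, line
        facts[key] = value
    return facts, parts[-1]


line = prepend_facts(
    "session opened for user root",
    {"operatingsystem": "CentOS", "operatingsystemmajrelease": "6", "productname": "PowerEdge R620"},
)
print(line)
# → operatingsystem=CentOS|operatingsystemmajrelease=6|productname=PowerEdge R620|session opened for user root
print(parse_facts(line)[1])  # → session opened for user root
```

Limiting the split to the number of fact columns keeps any "|" inside the original message intact.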
