-
-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added sampling to dnscap #15
base: develop
Are you sure you want to change the base?
Conversation
Added a module that allows the output to be sampled. With the new -q option, the user is now able to decide whether the output should be sampled as follows: Only every nth query is output and only responses that correspond to one of these sampled queries will be output. e.g.: dnscap -r ./file.pcap -g -q 10 will output only every 10th query and these queries' corresponding responses.
Updated dnscap.1.in to match the Sampling Module addition
uthash is used as the hash table library in the Sampling Module
Added a module that allows the output to be sampled. With the new -q option, the user is now able to decide whether the output should be sampled as follows: Only every nth query is output and only responses that correspond to one of these sampled queries will be output. e.g.: dnscap -r ./file.pcap -g -q 10 will output only every 10th query and these queries' corresponding responses.
You are calling this a module (or plugin) but it is not one, is there a reason you did not implement this under |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, please use more spaces in both control sections and argument lists and try to mimic the coding style of the rest of the code.
@@ -53,6 +53,7 @@ | |||
.Op Fl E Ar datetime | |||
.Op Fl P Ar plugin.so | |||
.Op Fl U Ar str | |||
.Op Fl q Ar unsigned int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The phrasing unsigned int
is not very user friendly, use something like nth
instead.
@@ -353,6 +354,11 @@ Enable immediate mode on interfaces. | |||
Append "and | |||
.Ar str | |||
" to the pcap filter. | |||
.It Fl q Ar unsigned int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I'll change it!
Causes the output to be sampled after the application of all other filters. | ||
Only every nth dns initiation will be output and responses are only output | ||
if they correspond to one of the sampled queries. This option cannot be | ||
used with option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Last sentence is unclear, specify the options that can't be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very sorry, this must have been a copy and paste error. The last sentence is only half of the sentence it should be, I'll correct that.
@@ -1099,6 +1130,11 @@ parse_args(int argc, char *argv[]) { | |||
usage("the -L and -l options are mutually exclusive"); | |||
if (background && (dumptrace || preso)) | |||
usage("the -b option is incompatible with -d and -g"); | |||
if (sample && chooseSidesResponse) | |||
{ | |||
if (!((dir_wanted & DIR_INITIATE) != 0) && ((dir_wanted & DIR_RESPONSE) != 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why check this? If both are true then both options have been used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be incompatible with -s r. I fixed this flaw.
I'm actually sorry, I thought I had tested this..
if (sample == TRUE) | ||
{ | ||
ns_msg dnsmsgSample; | ||
ns_initparse(dnspkt,dnslen,&dnsmsgSample); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is part of libbind and dnscap can be compiled without libbind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean ns_msg
and ns_initparse
? Is there another way to retrieve the transaction id of a query? I've seen ns_initparse
being used in the output functions in dump_dns.c
, so I assumed it was safe to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look at the top #if HAVE_NS_INITPARSE && ...
, you will need to disable all the sampling code if libbind is missing or create your own functions for parsing the required information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I'll disable the sampling code if libbind is missing. In future i might try to write my own parsing functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sampling code is still compiled even if libbind does not exist, checking if sample
is true or not does not remove the dependency of the functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but in parse_args()
sample
is only set to true if libbind does exist, by checking #if HAVE_NS_INITPARSE && HAVE_NS_PARSERR && HAVE_NS_SPRINTRR
. Is this checkback not suitable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do know the difference between if()
and #if
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do and I just realized where my mistake in thinking was at. I will work on it right away.
ns_initparse(dnspkt,dnslen,&dnsmsgSample); | ||
samplePacket *currentQuery; | ||
void *ipAddrBufferFrom = &from.u; | ||
void *ipAddrBufferTo = &to.u; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need for these void *
you can type cast it in one go below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh okay, I initially tried to cast it in one go, but i got a compiler error. Ill try again, thank you!
querycount++; | ||
if(querycount % sampleAmount == 0) | ||
{ | ||
currentQuery = malloc(sizeof(*currentQuery)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do not check if malloc()
was successful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mhm okay, thank you! I'll add all malloc checks that need to be added.
} | ||
else | ||
{ | ||
sample_lookup_key *lookup_key = (sample_lookup_key*)malloc(sizeof(*lookup_key));; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do not check if malloc()
was successful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mhm okay, thank you! I'll add all malloc checks that need to be added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, i should also you calloc
instead of malloc
, since I initialize the pointers with 0s afterwards anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, one calloc()
instead of malloc()
and memset()
.
{ | ||
currentQuery = malloc(sizeof(*currentQuery)); | ||
memset(currentQuery, 0, sizeof(*currentQuery)); | ||
currentQuery->from = *fromBuffer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are meant to be in samplePacket->from
?
From the struct definition from
is just an unsigned
and this would essentially put the first unsigned
from the fromBuffer
into currentQuery
. That would be the first part of iaddr.u
and that is only the IPv4 address (or a very small part of it) which means this does not work for IPv6.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
samplePacket->from
should contain IP addresses (both IPv4 and v6 would be optimal).
On my system unsigned
is 32 bits (which is the size of IPv4 adresses), so it should be able to store a full IPv4. I'll change it to uint_32
so it also works on all systems.
As to IPv6, I did not find out how to handle such big numbers properly. I thought it would be better to take a small portion of the IPv6 addresses than not to work with IPv6 at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry you need to find a way to support both v4 and v6 otherwise I will reject the pull request. I do not know how exactly uthash works but maybe you can use the either iaddr
which from
is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially used from
but I came to the conclusion that it doesn't work properly. I'll give it another try though. If I don't get it to work like this, I'll look for another way to support IPv6 addresses.
In case I can't get it to work with IPv6 at all, I'll inform you, so you can reject this pull request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look at plugins/rssm/rssm.c for an example of how to make a hash table with both v4 and v6 addresses ("my_hashtbl").
dnscap_common.h defines 'iaddr' which you should use to store v4/v6 addrs.
rssm.c (and other plugin .c files) have iaddr_hash().
At some point it would make sense to move hashtbl.c and supporting functions out of plugins and into src.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Found a way that will work perfectly fine. Ill commit it on Monday!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry but you have implemented this in a very dangerous way and the use of AMOUNT_32BIT_IN_128BIT
is unneeded. You are copying in and out of different structures without any check of structure sizes and seeing that the hash can take binary blobs as keys you should be able to put two iaddr
structures within the hash entry for both to and from v4/v6 addresses.
You are right, this is not a module or a plugin, that's why it's not implemented in
Alright! I was under the false impression that I did an alright job at mimicking the coding style. By the way, I'm genuinely grateful for your extremely constructive feedback and your time! It's worth a lot! |
Just "sampling" is sufficient. |
I also see that you have created this pull request from your
|
Okay, thanks for letting me know! As of now, I haven't even come to set up github locally on my computer yet. I'm still exclusively using the web interface. Takes some time to get into it. |
Changed parameter for the q option from "unsigned int" to "nth"
changed the parameter of the q option from "unsigned int" to "nth"
Applied most of the suggested corrections suggested by jelu
Disabled the sampling code if dnscap is not compiled with libbind.
if (!((dir_wanted & DIR_INITIATE) != 0) && ((dir_wanted & DIR_RESPONSE) != 0)) | ||
usage("the -q option is incompatible with -s r"); | ||
if(((dir_wanted & DIR_RESPONSE) != 0) && ((dir_wanted & DIR_INITIATE) == 0)) | ||
usage("the -q option is incompatible with -s r"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second glance I do not see the reason to have chooseSidesResponse
, dir_wanted
is set to both direction at start and you really only need to check if DIR_RESPONSE
is not set.
Corrected chooseSidesResponse logic.
Added a proper way to support ipv6 addresses for sampling.
Fixed the libbind dependency logic for sampling
Updated the support for ipv4/v6 in the sampling code. Also made some slight changes to the libbind dependency logic.
rephrased a comment from "Sampling Module" to "sampling"
Well, @jelu what do you think now? |
Sorry @SuperAmbitiousItGuy but I am pressed for time on other issues this week, hope to look at this next week and I won't be including this in v1.1.0 so it will not be merged just yet even if approved. |
Update pcap_thread to v1.0.1, add travis check that dnscap can run
Added sampling to dnscap. With the new -q option, the user is now able to decide whether the output should be sampled as follows: Only every nth query is output and only responses that correspond to one of these sampled queries will be output.
e.g.:
dnscap -r ./file.pcap -g -q 10 will output only every 10th query and these queries' corresponding responses.
Due to the nature of this sampling, the -q option is not compatible with -s r (of which the user would informed by usage()).
Updated dnscap.1.in to match this addition.
Added uthash/uthash.h since I needed a hashtable for the sampling to work. The dnscap and uthash licenses should be compatible.
I've made a very similar pull request at an outdated repository about 16 days ago already:
#10
Now that I' have based my work on the DNS-OARC repo, there are still two issues to talk about:
According to an unfinished master's thesis, sampling the way I implemented, is as suitable as a random distribution for DNS traffic. Since this thesis is not published yet, I cannot provide a source to it.
Also, might it not be difficult to produce good random numbers at such high package processing speeds? A way could be using the content of the single packets to produce random numbers from (like the transaction id)?.
All in all, I'm worried that certain packages could be favored over others.
Simply put, I think it would mean a lot of unnecessary work that I probably can't do properly, since I don't have a lot of experience yet. This would probably make the hash table rather inefficient.
I have absolutely no objection to somebody else doing so, though (although uthash seems quite OK to me).