ice-zc device initialization randomly fails #933
Comments
Hi Alfredo, regarding #933:
We see the problem only when we create a type #2 application with a TX queue. Note that with the X520 on a Dell G14 we never saw the problem. We use PF_RING 7.4. Now that the problem has been narrowed down, do you have any clue what could be happening? Thanks, |
@gyarom the additional info you provided would definitely help reproducing the issue, thank you. |
One more question: are you using a single queue or multiple RSS queues in 2.? |
No, I do not change the number of RSS queues; it remains 1 whether I use RX only or RX+TX.
Should I change RSS to 2 when I use RX+TX?
Guy
|
That's fine, I was just asking to collect all the info to reproduce the issue. Thank you. |
@cardigliano @gyarom
|
@gyarom I tried running pfcount and pfsend at the same time, while receiving 10Gbit/15Mpps, but I was not able to reproduce the issue. |
Yarom, Guy ***@***.***) has sent you a protected message.
Read the message Learn about messages protected by Microsoft Purview Message Encryption.
|
Sorry I cannot read this message
|
Hi Alfredo,
I don’t think the problem is in our code; it reproduces with the ntop zcount and zbalance tools.
The problem occurs in one setup, which is a little complicated. We also tried to reproduce it from a capture, and it does not reproduce for us either.
We do not know what the difference is.
We tried replacing the hardware (NICs) in this problematic setup, but the problem persists.
We are now trying to reproduce the problem in a simpler setup.
Remark: we do not have a PF_RING license because we moved from PF_RING 8.2 to 8.7; since the problem occurs at startup, we do not mind.
Can we enable some log level to understand more?
Example attached.
***@***.***
Thanks,
Guy
From: Alfredo Cardigliano
Sent: Wednesday, 3 July 2024 16:22
@gyarom I tried running pfcount and pfsend at the same time, while receiving 10Gbit/15Mpps, but I was not able to reproduce the issue.
Could you provide a code snippet (or a sample application source code) for reproducing this?
|
In the steps above about "How we reproduce the problem" you wrote:
But I am a bit confused:
|
Hi Alfredo,
Just to clarify, the problem reproduces both in zcount & zbalance, even without our application.
At the beginning I thought that TX might cause the problem, because in this environment we also transmit packets with our application, but I was wrong.
I could describe the scenario I observed then, but it is not relevant: pure ntop applications also have the problem in this environment, and we don’t know why.
Is there a way to enable verbose logs?
Thanks,
Guy
From: Alfredo Cardigliano
Sent: Friday, 5 July 2024 10:29
In the steps above about "How we reproduce the problem" you wrote:
1. Start our application with no traffic from the simulator.
2. Wait for the keep-alive between our application and another Cognyte device.
3. Start injecting traffic from the simulator (TestCenter/Spirent). Traffic comes in with no problem.
4. Stop our application and run the ntop application zbalance, which is similar to our application:
./zbalance -i zc:ens3f0 -c 2 -g 1:3:5:7:9:11:13:15 -r 31
All packets are dropped.
If you stop the simulator while zbalance is starting, there are no drops.
Can we have a short meeting in which I demonstrate the problem? Maybe you will have some idea how to continue.
But I am a bit confused:
* This means you run zbalance after stopping your application, so the only active traffic comes from the simulator at the time you open the socket; thus I do not understand how your application can affect zbalance.
* You said it happens when you receive and transmit at the same time, however zbalance is not transmitting in your configuration. Please clarify.
|
Hi Alfredo,
I compared the dmesg output from running zcount on a good and a bad NIC (a bad NIC is one where all packets are dropped).
There is one error that may explain our problem, but I’m not sure.
I highlighted the suspicious error in yellow.
Can you please advise whether it is relevant to our problem?
Run zcount on the problematic NIC:
/usr/local/vtps/pf_ring/zc/zcount -i zc:ens3f0 -c 3 -d
***@***.*** workspace]# dmesg
[192664.275604] [PF_RING] Trying to map ZC device ***@***.***
[192664.292795] device ens3f0 entered promiscuous mode
[192683.844325] device ens3f0 left promiscuous mode
[192683.846488] [PF_RING] Removing ZC device ***@***.*** [rx-ring=000000002ac0536a][tx-ring=00000000ec284ff2]
[192683.925362] ice 0000:98:00.0: PTP reset successful
[192683.946548] irq 889: Affinity broken due to vector space exhaustion.
[192683.946576] [PF_RING] Registering ZC device ***@***.*** [rx-ring=00000000c824e83b][tx-ring=00000000550cc3bc]
[192683.946582] ice 0000:98:00.0: VSI rebuilt. VSI index 0, type ICE_VSI_PF
[192683.951322] ice 0000:98:00.0: VSI rebuilt. VSI index 1, type ICE_VSI_CTRL
Run zcount with no problem:
/usr/local/vtps/pf_ring/zc/zcount -i zc:ens1f0 -c 3 -d
***@***.*** workspace]# dmesg
[192851.357644] [PF_RING] Trying to map ZC device ***@***.***
[192851.370795] device ens1f0 entered promiscuous mode
[192869.840987] device ens1f0 left promiscuous mode
[192869.842899] [PF_RING] Removing ZC device ***@***.*** [rx-ring=000000007972704d][tx-ring=00000000c3ed560f]
[192869.934103] ice 0000:17:00.0: PTP reset successful
[192869.961668] [PF_RING] Registering ZC device ***@***.*** [rx-ring=0000000085a59d9f][tx-ring=000000004a5c0f97]
[192869.961679] ice 0000:17:00.0: VSI rebuilt. VSI index 0, type ICE_VSI_PF
[192869.965606] ice 0000:17:00.0: VSI rebuilt. VSI index 1, type ICE_VSI_CTRL
Thanks,
Guy
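Editor's sketch: the comparison above was done by eye, and the yellow highlight did not survive the email. A small script can surface the lines that appear only in the problematic log. This is a minimal sketch, not part of PF_RING; the function names and the placeholder tokens (`<pci>`, `<if>`, `<ptr>`, `irq <n>`) are assumptions chosen for illustration, and the patterns only cover the kinds of values seen in the two excerpts above.

```python
import re

# Leading kernel timestamp such as "[192683.946548] ".
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")

# Values that legitimately differ between two NICs or two runs.
# (Assumed patterns, derived only from the excerpts in this thread.)
_VARIABLE = [
    (re.compile(r"\b[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]\b"), "<pci>"),
    (re.compile(r"\bens\d+f\d+\b"), "<if>"),
    (re.compile(r"\b[0-9a-f]{16}\b"), "<ptr>"),
    (re.compile(r"\birq \d+\b"), "irq <n>"),
]


def normalize(line):
    """Strip the timestamp and mask per-device values in one dmesg line."""
    line = _TIMESTAMP.sub("", line.strip())
    for pattern, placeholder in _VARIABLE:
        line = pattern.sub(placeholder, line)
    return line


def only_in_first(bad_log, good_log):
    """Return normalized lines present in bad_log but absent from good_log."""
    good = {normalize(l) for l in good_log.splitlines() if l.strip()}
    return [
        normalize(l)
        for l in bad_log.splitlines()
        if l.strip() and normalize(l) not in good
    ]
```

Feeding it the two dmesg excerpts as strings should leave only messages that have no counterpart on the good NIC, which makes a single extra line easier to spot than a visual comparison.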
|
I do not see the color, but I guess you mean "irq 889: Affinity broken due to vector space exhaustion". I will dig a bit, first time I see this error. |
Ofir, my manager, found this link:
https://www.suse.com/support/kb/doc/?id=000019936
Thanks,
Guy
|
Hi Alfredo,
We are trying to give you a direct connection to the machine with the problem.
You will be free to do whatever you like on it yourself.
We still have some work to do to arrange the setup.
Is that OK with you?
If yes, do you have a preferred time, let's say next week?
Thanks,
Guy
|
@gyarom that would be useful. I will be available next week in the CET (Italy) timezone. |
It seems a4e76ea fixed this, please reopen if it reoccurs. |
Hi Alfredo, your fix is a lot better, but it does not fix everything. Thx, |
@gyarom please ignore the pps and check the absolute packet count (e.g. send 10 million packets and count how many are captured). If there are more packets than expected, please print or dump them and let us see them, so we can figure out where they are coming from. |
Hi Alfredo, @cardigliano, I checked your assumption that it is only an issue of “absolute packet count”. |
@gyarom please ask for an evaluation license to avoid restarting the application every 5 minutes, as application crashes may corrupt data structures. As for the packet count, we cannot do much without evidence of what kind of packets exceed the expected count. It is strange that the adapter would produce extra packets; there may be a loop in the network or some other issue. |
Yarom, Guy ***@***.***) has sent you a protected message.
Read the message Learn about messages protected by Microsoft Purview Message Encryption.
|
Hi Alfredo, I ordered an evaluation license from Maria.
zcount sees the same as vtps.
|
@gyarom what is vtps doing? Is it injecting some traffic perhaps? |
@gyarom I connected to your machine, ran vtps, and checked the hardware packet counter on the network interface with ethtool -S ens1f0 at a 1 sec interval; the counter is increasing at 4.7 Mpps. This means there are actually 4.7 Mpps hitting the adapter. I think vtps is creating some loop in the network. |
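Editor's sketch: the check described above (sampling the hardware counter with `ethtool -S` at a fixed interval and looking at the increase) can be scripted roughly as follows. This is a minimal sketch under stated assumptions: the counter name `rx_packets` is hypothetical, since the counters exposed by `ethtool -S` differ between drivers, and the helper names are invented for illustration.

```python
import subprocess
import time


def parse_ethtool_stats(text):
    """Parse `ethtool -S <iface>` output into a {counter_name: int} dict."""
    stats = {}
    for line in text.splitlines():
        # Split on the last colon so counter names containing ':' survive.
        name, sep, value = line.rpartition(":")
        value = value.strip()
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def packets_per_second(before, after, counter, interval_s):
    """Rate of increase of one counter between two samples."""
    return (after[counter] - before[counter]) / interval_s


def sample(iface):
    """Take one sample by running `ethtool -S` on the given interface."""
    out = subprocess.run(
        ["ethtool", "-S", iface], capture_output=True, text=True, check=True
    ).stdout
    return parse_ethtool_stats(out)


def measure(iface, counter="rx_packets", interval_s=1.0):
    """Sample twice, interval_s apart, and return the rate.

    The default counter name is an assumption; check the names your
    driver actually prints before relying on it.
    """
    before = sample(iface)
    time.sleep(interval_s)
    after = sample(iface)
    return packets_per_second(before, after, counter, interval_s)
```

Because the rate comes from the adapter's own counters, it is independent of PF_RING and of the capturing application, which is what makes it useful for telling a capture problem apart from traffic that really reaches the port.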
@cardigliano |
Hi Alfredo, First, the 4.77 Mpps and 10G input issue is not related to pf_ring; it is the Cognyte environment that is causing the problem. Thanks, |
@gyarom please note that the adapter takes a bit to reload when opening/closing the socket. It may be that when the service is restarted due to a demo expiration, the socket reset is too fast, creating this issue. I suggest checking whether this still causes issues after fixing the license, as in that case you do not have such restarts. |
Hi Alfredo, Guy |
When starting applications on ice-zc, sometimes the device initialization fails or all packets are dropped