
System hang issue of FeKern (fireeye)

sekim edited this page Jul 8, 2023 · 2 revisions

In May 2023, we encountered an I/O hang on a system where our module (bsr/wdrbd) and the FeKern module were running together.

FeKern is a file system minifilter driver and bsr is a volume filter driver. I/O flows through them as follows:

  • I/O request: App -> FeKern -> NTFS -> bsr -> volume -> Disk
  • I/O response: Disk -> volume -> bsr -> NTFS -> FeKern -> App

If even one I/O in either the request flow or the response flow stalls and is left waiting for an extended period without being processed, the system hangs.

Fortunately, we were able to reproduce the problem and collect a memory dump for analysis. The dump revealed the I/O stack in which I/O was left waiting: FeKern's post-operation callback in the I/O response flow.

When analyzing an I/O hang on a system where security software and a volume filter run together, the job is to find the module that keeps I/O from progressing through the flows above. Keep in mind that a dump contains many I/O stacks, and you must pinpoint the specific point among them that is actually causing the problem. Take parcel delivery as an example.

If a parcel is stuck at a regional distribution center, it is hasty and complacent to blame the center for the failed delivery just because that is where the parcel is sitting. You need to determine at what point it is waiting, and find out specifically which courier employee dropped the package. If you still insist on calling the local distribution center and telling it to take responsibility, you are only advertising your own ignorance.

In other words, the evidence you present for the cause of a system hang must be specific.

Below is the I/O completion stack that was left waiting in FeKern. If you come across a stack like this in a dump, keep in mind the possibility of a similar issue.

    nt!KiSwapThread+0x17d
    nt!KiCommitThreadWait+0x14f
    nt!KeWaitForSingleObject+0x377
    nt!FsRtlpWaitForIoAtEof+0x11a
    nt!FsRtlAcquireEofLock+0x1e4
    NTFS!NtfsCommonQueryInformation+0x1f8
    NTFS!NtfsFsdDispatchSwitch+0xcc
    NTFS!NtfsFsdDispatchWait+0x40
    FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x1a6
    FLTMGR!FltPerformSynchronousIo+0x308
    FLTMGR!FltQueryInformationFile+0x52
    FeKern+0x6657
    FeKern+0x69fa
    FeKern+0x560a
    FLTMGR!FltpPerformPostCallbacksWorker+0x2fb
    FLTMGR!FltpPassThroughCompletionWorker+0x76
    nt!IopfCompleteRequest+0x112
    bsr!complete_master_bio+0x314
    ...