-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hg38 debug #23
base: hg38
Are you sure you want to change the base?
Hg38 debug #23
Conversation
Signed-off-by: Philip R. Kensche <[email protected]>
Signed-off-by: Philip R. Kensche <[email protected]>
Signed-off-by: Philip R. Kensche <[email protected]>
…as INVALID, like it used to be in the old ChrConverter. Signed-off-by: Philip R. Kensche <[email protected]>
Signed-off-by: Philip R. Kensche <[email protected]>
…ess. Signed-off-by: Philip R. Kensche <[email protected]>
Signed-off-by: Philip R. Kensche <[email protected]>
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThe recent changes enhance the SOPHIA bioinformatics tool, improving functionality and code readability. Key updates include renaming methods for clarity, adding new methods for chromosome validation, updating build instructions, and refining alignment processing. The changes also address dependencies, enhance error handling, and streamline logic flow in various components, ensuring better performance and maintainability. Changes
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 8
Outside diff range and nitpick comments (4)
README.md (4)
Line range hint
211-216
: Correct the indentation of unordered lists to improve markdown formatting.- * Boost 1.82.0 - * libbacktrace 20220708 - * gtest 1.14.0 - * gmock 1.14.0 - * rapidcsv 8.0.0 - * strtk 0.6.0 +* Boost 1.82.0 +* libbacktrace 20220708 +* gtest 1.14.0 +* gmock 1.14.0 +* rapidcsv 8.0.0 +* strtk 0.6.0Also applies to: 329-352
Tools
Markdownlint
211-211: Expected: 0; Actual: 2 (MD007, ul-indent)
Unordered list indentation
212-212: Expected: 0; Actual: 2 (MD007, ul-indent)
Unordered list indentation
168-168: Expected: 0 or 2; Actual: 1 (MD009, no-trailing-spaces)
Trailing spaces
169-169: Expected: 0 or 2; Actual: 1 (MD009, no-trailing-spaces)
Trailing spaces
161-161: Expected: 1; Actual: 2 (MD012, no-multiple-blanks)
Multiple consecutive blank lines
162-162: Expected: 1; Actual: 3 (MD012, no-multiple-blanks)
Multiple consecutive blank lines
201-201: Expected: indented; Actual: fenced (MD046, code-block-style)
Code block style
Line range hint
3-3
: Remove trailing spaces to clean up the markdown file.- SOPHIA is a Structural Variant(SV) detection algorithm based on the supplementary alignment(SA) concept of the aligner BWA-MEM, combined with filters based on expert-knowledge to increase specificity. + SOPHIA is a Structural Variant(SV) detection algorithm based on the supplementary alignment(SA) concept of the aligner BWA-MEM, combined with filters based on expert-knowledge to increase specificity.Also applies to: 11-11, 15-15, 30-30, 76-76, 90-90, 168-168, 169-169, 246-246, 247-247, 249-249, 318-318, 333-333
Tools
Markdownlint
21-21: Expected: h2; Actual: h3 (MD001, heading-increment)
Heading levels should only increment by one level at a time
15-15: Expected: 0 or 2; Actual: 1 (MD009, no-trailing-spaces)
Trailing spaces
20-20: Expected: 1; Actual: 2 (MD012, no-multiple-blanks)
Multiple consecutive blank lines
Line range hint
330-330
: Ensure there are no spaces inside code span elements to adhere to markdown best practices.- `true`, ` yes`, ` t`, ` y`, or ` 1`: The chromosome is part of the compressed master-ref set of chromosomes. + `true`, `yes`, `t`, `y`, or `1`: The chromosome is part of the compressed master-ref set of chromosomes.Tools
Markdownlint
21-21: Expected: h2; Actual: h3 (MD001, heading-increment)
Heading levels should only increment by one level at a time
15-15: Expected: 0 or 2; Actual: 1 (MD009, no-trailing-spaces)
Trailing spaces
20-20: Expected: 1; Actual: 2 (MD012, no-multiple-blanks)
Multiple consecutive blank lines
Line range hint
54-54
: Convert fenced code blocks to indented style for consistency across the document.- ```bash - samtools view -F 0x600 -f 0x001 /yourPositionSorted.bam \ - | sophia --assemblyname hg38 \ - --medianisizes 323.0 \ - --stdisizepercentage 21.0 \ - --properpairpercentage 94.32 \ - --defaultreadlength 101 \ - --clipsize 10 \ - --basequality 23 \ - --basequalitylow 12 \ - --lowqualclipsize 5 \ - --isizesigma 5 \ - --bpsupport 3 \ - | gzip > breakpoints.bed.gz - ``` + samtools view -F 0x600 -f 0x001 /yourPositionSorted.bam \ + | sophia --assemblyname hg38 \ + --medianisizes 323.0 \ + --stdisizepercentage 21.0 \ + --properpairpercentage 94.32 \ + --defaultreadlength 101 \ + --clipsize 10 \ + --basequality 23 \ + --basequalitylow 12 \ + --lowqualclipsize 5 \ + --isizesigma 5 \ + --bpsupport 3 \ + | gzip > breakpoints.bed.gzAlso applies to: 138-138, 201-201, 223-223, 234-234, 257-257, 273-273, 285-285, 295-295
Tools
Markdownlint
21-21: Expected: h2; Actual: h3 (MD001, heading-increment)
Heading levels should only increment by one level at a time
15-15: Expected: 0 or 2; Actual: 1 (MD009, no-trailing-spaces)
Trailing spaces
20-20: Expected: 1; Actual: 2 (MD012, no-multiple-blanks)
Multiple consecutive blank lines
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (1)
Overview.svg
is excluded by!**/*.svg
Files selected for processing (23)
- CONTRIBUTORS.md (1 hunks)
- Makefile (1 hunks)
- Overview.puml (1 hunks)
- README.md (13 hunks)
- include/Alignment.h (2 hunks)
- include/ChrConverter.h (3 hunks)
- include/GenericChrConverter.h (2 hunks)
- include/Hg37ChrConverter.h (4 hunks)
- include/SamSegmentMapper.h (1 hunks)
- src/Alignment.cpp (6 hunks)
- src/Breakpoint.cpp (6 hunks)
- src/ChrConverter.cpp (1 hunks)
- src/GenericChrConverter.cpp (6 hunks)
- src/Hg37ChrConverter.cpp (9 hunks)
- src/MasterRefProcessor.cpp (1 hunks)
- src/MrefEntry.cpp (1 hunks)
- src/SamSegmentMapper.cpp (7 hunks)
- src/SuppAlignment.cpp (2 hunks)
- src/SuppAlignmentAnno.cpp (1 hunks)
- src/SvEvent.cpp (1 hunks)
- src/sophia.cpp (1 hunks)
- src/sophiaAnnotate.cpp (1 hunks)
- tests/GenericChrConverter_test.cpp (3 hunks)
Files skipped from review due to trivial changes (3)
- CONTRIBUTORS.md
- Overview.puml
- src/MrefEntry.cpp
Additional context used
Learnings (5)
include/ChrConverter.h (1)
User: vinjana PR: DKFZ-ODCF/sophia#6 File: include/ChrConverter.h:19-20 Timestamp: 2023-12-22T13:09:49.772Z Learning: The user has updated the header guards in `ChrConverter.h` as suggested, changing from `_CHRCONVERTER_H_` to `CHRCONVERTER_H_` to avoid potential conflicts with reserved identifiers.
include/Hg37ChrConverter.h (1)
User: vinjana PR: DKFZ-ODCF/sophia#6 File: src/Hg37ChrConverter.cpp:0-0 Timestamp: 2023-12-22T12:57:25.832Z Learning: The user has added comments to the `Hg37ChrConverter` constructor to indicate the need for handling exceptions as suggested.
src/GenericChrConverter.cpp (1)
User: vinjana PR: DKFZ-ODCF/sophia#6 File: src/Hg37ChrConverter.cpp:0-0 Timestamp: 2023-12-22T12:57:25.832Z Learning: The user has added comments to the `Hg37ChrConverter` constructor to indicate the need for handling exceptions as suggested.
src/SuppAlignment.cpp (1)
User: vinjana PR: DKFZ-ODCF/sophia#6 File: src/SuppAlignment.cpp:0-0 Timestamp: 2024-02-01T14:13:51.950Z Learning: The user clarified that the changes to the `saCloseness` method in `SuppAlignment.cpp` were only related to code layout, not logic.
src/Hg37ChrConverter.cpp (2)
User: vinjana PR: DKFZ-ODCF/sophia#6 File: src/Hg37ChrConverter.cpp:0-0 Timestamp: 2023-12-22T12:57:25.832Z Learning: The user has added comments to the `Hg37ChrConverter` constructor to indicate the need for handling exceptions as suggested.
User: vinjana PR: DKFZ-ODCF/sophia#6 File: src/Hg37ChrConverter.cpp:0-0 Timestamp: 2023-12-22T13:00:49.431Z Learning: User plans to consider moving hardcoded values to a configuration file or database in the future.
Markdownlint
README.md
21-21: Expected: h2; Actual: h3 (MD001, heading-increment)
Heading levels should only increment by one level at a time
211-211: Expected: 0; Actual: 2 (MD007, ul-indent)
Unordered list indentation
212-212: Expected: 0; Actual: 2 (MD007, ul-indent)
Unordered list indentation
213-213: Expected: 0; Actual: 2 (MD007, ul-indent)
Unordered list indentation
214-214: Expected: 0; Actual: 2 (MD007, ul-indent)
Unordered list indentation
215-215: Expected: 0; Actual: 2 (MD007, ul-indent)
Unordered list indentation
216-216: Expected: 0; Actual: 2 (MD007, ul-indent)
Unordered list indentation
329-329: Expected: 0; Actual: 2 (MD007, ul-indent)
Unordered list indentation
330-330: Expected: 2; Actual: 4 (MD007, ul-indent)
Unordered list indentation
331-331: Expected: 2; Actual: 4 (MD007, ul-indent)
Unordered list indentation
332-332: Expected: 2; Actual: 4 (MD007, ul-indent)
Unordered list indentation
333-333: Expected: 2; Actual: 4 (MD007, ul-indent)
Unordered list indentation
334-334: Expected: 2; Actual: 4 (MD007, ul-indent)
Unordered list indentation
338-338: Expected: 0; Actual: 2 (MD007, ul-indent)
Unordered list indentation
339-339: Expected: 0; Actual: 2 (MD007, ul-indent)
Unordered list indentation
340-340: Expected: 2; Actual: 4 (MD007, ul-indent)
Unordered list indentation
341-341: Expected: 2; Actual: 4 (MD007, ul-indent)
Unordered list indentation
342-342: Expected: 0; Actual: 2 (MD007, ul-indent)
Unordered list indentation
343-343: Expected: 2; Actual: 4 (MD007, ul-indent)
Unordered list indentation
344-344: Expected: 2; Actual: 4 (MD007, ul-indent)
Unordered list indentation
345-345: Expected: 2; Actual: 4 (MD007, ul-indent)
Unordered list indentation
346-346: Expected: 2; Actual: 4 (MD007, ul-indent)
Unordered list indentation
347-347: Expected: 2; Actual: 4 (MD007, ul-indent)
Unordered list indentation
348-348: Expected: 2; Actual: 4 (MD007, ul-indent)
Unordered list indentation
349-349: Expected: 2; Actual: 4 (MD007, ul-indent)
Unordered list indentation
350-350: Expected: 2; Actual: 4 (MD007, ul-indent)
Unordered list indentation
351-351: Expected: 2; Actual: 4 (MD007, ul-indent)
Unordered list indentation
352-352: Expected: 2; Actual: 4 (MD007, ul-indent)
Unordered list indentation
3-3: Expected: 0 or 2; Actual: 1 (MD009, no-trailing-spaces)
Trailing spaces
11-11: Expected: 0 or 2; Actual: 1 (MD009, no-trailing-spaces)
Trailing spaces
15-15: Expected: 0 or 2; Actual: 1 (MD009, no-trailing-spaces)
Trailing spaces
30-30: Expected: 0 or 2; Actual: 1 (MD009, no-trailing-spaces)
Trailing spaces
76-76: Expected: 0 or 2; Actual: 1 (MD009, no-trailing-spaces)
Trailing spaces
90-90: Expected: 0 or 2; Actual: 1 (MD009, no-trailing-spaces)
Trailing spaces
168-168: Expected: 0 or 2; Actual: 1 (MD009, no-trailing-spaces)
Trailing spaces
169-169: Expected: 0 or 2; Actual: 1 (MD009, no-trailing-spaces)
Trailing spaces
246-246: Expected: 0 or 2; Actual: 1 (MD009, no-trailing-spaces)
Trailing spaces
247-247: Expected: 0 or 2; Actual: 1 (MD009, no-trailing-spaces)
Trailing spaces
249-249: Expected: 0 or 2; Actual: 1 (MD009, no-trailing-spaces)
Trailing spaces
318-318: Expected: 0 or 2; Actual: 1 (MD009, no-trailing-spaces)
Trailing spaces
333-333: Expected: 0 or 2; Actual: 1 (MD009, no-trailing-spaces)
Trailing spaces
20-20: Expected: 1; Actual: 2 (MD012, no-multiple-blanks)
Multiple consecutive blank lines
161-161: Expected: 1; Actual: 2 (MD012, no-multiple-blanks)
Multiple consecutive blank lines
162-162: Expected: 1; Actual: 3 (MD012, no-multiple-blanks)
Multiple consecutive blank lines
231-231: Expected: 1; Actual: 2 (MD012, no-multiple-blanks)
Multiple consecutive blank lines
304-304: Expected: 1; Actual: 2 (MD012, no-multiple-blanks)
Multiple consecutive blank lines
329-329: null (MD032, blanks-around-lists)
Lists should be surrounded by blank lines
330-330: null (MD038, no-space-in-code)
Spaces inside code span elements
39-39: null (MD045, no-alt-text)
Images should have alternate text (alt text)
54-54: Expected: indented; Actual: fenced (MD046, code-block-style)
Code block style
138-138: Expected: indented; Actual: fenced (MD046, code-block-style)
Code block style
201-201: Expected: indented; Actual: fenced (MD046, code-block-style)
Code block style
223-223: Expected: indented; Actual: fenced (MD046, code-block-style)
Code block style
234-234: Expected: indented; Actual: fenced (MD046, code-block-style)
Code block style
257-257: Expected: indented; Actual: fenced (MD046, code-block-style)
Code block style
273-273: Expected: indented; Actual: fenced (MD046, code-block-style)
Code block style
285-285: Expected: indented; Actual: fenced (MD046, code-block-style)
Code block style
295-295: Expected: indented; Actual: fenced (MD046, code-block-style)
Code block style
389-389: null (MD047, single-trailing-newline)
Files should end with a single newline character
Additional comments not posted (40)
src/ChrConverter.cpp (1)
35-35
: RenamingisInBlockedRegion
toisInIgnoredRegion
improves clarity.Verification successful
The renaming of the method from
isInBlockedRegion
toisInIgnoredRegion
has been successfully propagated across the codebase.
src/SamSegmentMapper.cpp
src/Hg37ChrConverter.cpp
include/ChrConverter.h
src/ChrConverter.cpp
include/Hg37ChrConverter.h
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify all references to `isInIgnoredRegion` have been updated. # Test: Search for the old function name. Expect: No occurrences. rg --type cpp $'isInBlockedRegion'Length of output: 33
Script:
#!/bin/bash # Description: Verify all references to `isInIgnoredRegion` have been updated. # Test: Search for the new function name. Expect: Multiple occurrences. rg --type cpp $'isInIgnoredRegion'Length of output: 686
include/SamSegmentMapper.h (1)
47-47
: Addingstd::istream
parameter toparseSamStream
enhances flexibility.Verification successful
The call to
parseSamStream
insrc/sophia.cpp
has been correctly updated to usestd::cin
, which is anstd::istream
. No other calls toparseSamStream
without the new parameter were found.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify all calls to `parseSamStream` have been updated. # Test: Search for the function usage without the new parameter. Expect: No occurrences. rg --type cpp $'parseSamStream\\('Length of output: 274
include/Alignment.h (2)
38-38
: Including<iostream>
suggests debugging or output enhancements.
48-48
: AddingparseSamLine
method aligns with input handling refactoring.Makefile (1)
173-173
: IncludingtestRunner
in thebinaries
target promotes a testing culture.Verification successful
The
testRunner
is correctly defined and integrated into the Makefile. It is linked properly and included in thebinaries
target.
testRunner
target is defined with the necessary linking flags.testRunner
is included in thebinaries
target.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the integration of `testRunner` in the build process. # Test: Check the build logs for successful compilation and linking of `testRunner`. Expect: Successful build. cat build.log | grep 'testRunner'Length of output: 75
Script:
#!/bin/bash # Description: Verify the integration of `testRunner` in the Makefile. # Test: Check the Makefile for the definition and usage of `testRunner`. grep -E 'testRunner' MakefileLength of output: 305
include/ChrConverter.h (3)
58-59
: Added virtual methodsisValid
andgetInvalid
enhance the flexibility of chromosome validation across different implementations.
126-126
: RenamingisInBlockedRegion
toisInIgnoredRegion
clarifies the intent and usage of the method, aligning better with its functionality.
154-157
: Modification of theparseChrReturnIndex
method signature to includestopCharExt
parameter enhances its flexibility in parsing chromosome names from various formats.include/Hg37ChrConverter.h (4)
56-62
: Renaming of validation methods to include a leading underscore (_isValid
,_assertValid
) indicates their intended private or internal use, which is a good practice for clarity in method visibility and usage.
125-126
: Addition of public methodsisValid
andgetInvalid
provides a clear interface for checking chromosome validity, which is crucial for robust chromosome handling in genomic applications.
179-179
: RenamingisInBlockedRegion
toisInIgnoredRegion
inHg37ChrConverter
maintains consistency with changes in the base class and clarifies the method's purpose.
208-211
: The modification inparseChrReturnIndex
to ignorestopCharExt
and maintain legacy behavior is a critical change. Ensure that this behavior is well-documented and understood by users, as it affects how chromosome names are parsed.include/GenericChrConverter.h (1)
122-123
: The addition ofisValid
andgetInvalid
methods inGenericChrConverter
aligns with similar changes in other parts of the project, ensuring a consistent approach to chromosome validity checking across different converter implementations.src/sophia.cpp (1)
186-186
: The update toparseSamStream
to takestd::cin
directly as an argument simplifies the function's interface and usage, making it more straightforward to integrate with standard input streams.tests/GenericChrConverter_test.cpp (2)
122-129
: LGTM! The testGenericChrConverterTest_ParseSimpleStrings
effectively validates the parsing of simple chromosome strings.
170-194
: LGTM! The testGenericChrConverterTest_ParseToIndex
effectively validates the parsing of chromosome strings to their respective indices.src/sophiaAnnotate.cpp (1)
203-203
: LGTM! The error handling in theparseChrReturnIndex
call is robust, ensuring that any domain errors are caught and appropriately logged.src/MasterRefProcessor.cpp (1)
213-213
: LGTM! The error handling in theparseChrReturnIndex
call withinprocessFile
is robust, ensuring that any domain errors are caught and appropriately logged.src/SuppAlignmentAnno.cpp (1)
84-84
: LGTM! The implementation ofparseChrReturnIndex
in theSuppAlignmentAnno
constructor is robust, ensuring accurate parsing of chromosome data from supplementary alignment strings.src/GenericChrConverter.cpp (2)
49-49
: LGTM! Good use ofstatic_cast
for type safety.Also applies to: 59-59
114-120
: LGTM! Proper use ofstd::numeric_limits
for handling invalid indices.src/SamSegmentMapper.cpp (3)
303-313
: Check for potential performance issues with frequent container operations.
[PERFORMANCE]
The methodincrementCoverages
performs multiple container operations within a loop, which could lead to performance issues, especially with large data sets. Consider optimizing these operations, possibly by reducing the frequency of container resizing or by using more efficient data structures.
Line range hint
244-271
: Validate the logic for handling mate chromosome indices.The conditions for handling mate chromosome indices and ignored regions seem complex and are critical for the correctness of the program. It would be beneficial to add explicit unit tests to ensure that these conditions are handled correctly, especially since they involve multiple logical operators and method calls.
58-76
: Ensure proper handling of invalid alignments.The logic for handling invalid alignments (i.e., those with a chromosome index greater than 1000) has been changed. It's crucial to ensure that this new logic correctly filters out invalid alignments as intended. Consider adding unit tests to verify this behavior.
README.md (1)
18-18
: Update the description to reflect support for both hg37 and hg38 assemblies.- Currently, SOPHIA only is optimized for the hg37 assembly of the human genome. + Currently, SOPHIA is optimized for both the hg37 and hg38 assemblies of the human genome.Likely invalid or redundant comment.
src/Alignment.cpp (6)
60-60
: Initialization ofvalidLine
tofalse
in the constructor.This change ensures that
validLine
is explicitly set tofalse
, which is a good practice to avoid uninitialized variables that might lead to undefined behavior.
Line range hint
70-93
: Addition ofparseSamLine
method to handle SAM line parsing.This method efficiently parses SAM lines and handles potential domain errors by rethrowing exceptions with additional error information. This is a robust way to ensure that any issues during parsing are caught and reported.
Line range hint
83-93
: Handling of unaligned reads in SAM format.The method correctly checks for unaligned reads with chromosome name '*'. This is crucial for handling SAM files correctly and avoiding errors in downstream processing.
213-215
: Updated method call toparseChrReturnIndex
for parsing mate chromosome index.The update reflects the renamed method
parseChrReturnIndex
, ensuring consistency across the codebase. This change is part of the broader refactoring to improve clarity and maintainability.
466-475
: Handling of 'SA' tag in SAM format.The method correctly identifies and processes the 'SA' tag, which is used for secondary alignments in SAM files. This is important for accurate representation and processing of alignment data.
700-708
: Validation of chromosome indices in supplementary alignments.The method
isValid
andisTechnical
are used to ensure that only valid and non-technical chromosome indices are processed. This is a good practice to prevent errors and ensure data integrity.src/Hg37ChrConverter.cpp (8)
302-302
: Update to usestd::numeric_limits
for definingNA
is a good practice for type safety and clarity.
362-364
: The methodgetInvalid()
correctly returns theINVALID
constant. This encapsulation enhances maintainability.
366-368
: The method_isValid(ChrIndex)
correctly checks for non-invalid and non-zero indices, aligning with the domain logic.
370-372
: The publicisValid()
method correctly delegates to the private_isValid()
method, maintaining encapsulation.
402-426
: The method_buildCompressedMrefIndexToIndex
effectively maps compressed mref indices to global indices and validates them. The use ofthrow_with_trace
for error handling is consistent with other parts of the code.
605-606
: The methodisInIgnoredRegion
uses a clear and concise logic to determine if a position is within a specific ignored region on chromosome 2. This is a good example of domain-specific optimizations.
374-376
: The use ofthrow_with_trace
in_assertValid
methods enhances error reporting by providing stack trace information. Ensure that this pattern is consistently used across the project for error handling.Also applies to: 391-391
Verification successful
The use of
throw_with_trace
for error handling is consistent across the project.
- Instances of
throw_with_trace
are found in multiple files, includingHg37ChrConverter.cpp
,Sdust.cpp
,SvEvent.cpp
,SuppAlignmentAnno.cpp
,MasterRefProcessor.cpp
,IndexRange.cpp
,GlobalAppConfig.cpp
,GenericChrConverter.cpp
,ChrInfoTable.cpp
,ChrInfo.cpp
,ChrCategory.cpp
,Alignment.cpp
, andinclude/global.h
.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify consistent use of `throw_with_trace` for error handling across the project. # Test: Search for error handling patterns. Expect: Consistent use of `throw_with_trace`. rg --type cpp 'throw_with_trace'Length of output: 3620
Line range hint
637-689
: The methodparseChrReturnIndex
has been refactored to handle various chromosome identifiers more robustly. The use of structured exception handling and detailed error messages will aid in debugging and maintenance. However, ensure that the handling of theG
case (GL chromosomes) correctly parses the numbers following 'GL'.src/Breakpoint.cpp (1)
1202-1202
: Ensure proper error handling forparseChrReturnIndex
.Verification successful
Error handling for
parseChrReturnIndex
is properly implemented across the codebase. Each usage is wrapped in a try-catch block handlingDomainError
.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify that the error handling for `parseChrReturnIndex` is implemented correctly across the codebase. # Test: Search for the usage of `parseChrReturnIndex` and check for error handling. rg --type cpp 'parseChrReturnIndex' --context 5Length of output: 12799
if (distant && | ||
chrIndex1 == chrIndex2 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider simplifying the nested conditionals for clarity.
- if (distant && chrIndex1 == chrIndex2 && (selectedSa1.isFuzzy() || selectedSa1.isStrictFuzzyCandidate())) {
- if (eventSize < 5000) {
- distant = false;
- }
- }
+ if (distant && chrIndex1 == chrIndex2 && (selectedSa1.isFuzzy() || selectedSa1.isStrictFuzzyCandidate()) && eventSize < 5000) {
+ distant = false;
+ }
This change combines the nested if
statements into a single conditional, improving readability and reducing the complexity of the code.
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
if (distant && | |
chrIndex1 == chrIndex2 && | |
if (distant && chrIndex1 == chrIndex2 && (selectedSa1.isFuzzy() || selectedSa1.isStrictFuzzyCandidate()) && eventSize < 5000) { | |
distant = false; | |
} |
SamSegmentMapper::parseSamStream(std::istream &inputStream) { | ||
const ChrConverter &chrConverter = GlobalAppConfig::getInstance().getChrConverter(); | ||
unsigned long count = 0; | ||
while (true) { | ||
auto alignment = std::make_shared<Alignment>(); | ||
|
||
// Used to be `alignment->getChrIndex() > 1000`, i.e. excluding MT and phiX and INVALID. | ||
// This is the same as `!chrConverter.isCompressedMref(alignment->getChrIndex())`. | ||
if (!chrConverter.isCompressedMref(alignment->getChrIndex())) { | ||
continue; | ||
} | ||
std::shared_ptr<Alignment> alignment = std::make_shared<Alignment>(Alignment()); | ||
alignment->parseSamLine(inputStream); | ||
|
||
// There used to be a `continue` statement here, if `alignment->getChrIndex() > 1000`. | ||
// The condition is equivalent to the new `!chrConverter.isCompressedMref(alignment->getChrIndex())`. | ||
// Instead, we negated the condition and only process the data, *after* the check, whether | ||
// the line is valid. | ||
if (alignment->isValidLine()) { | ||
if (alignment->getChrIndex() != chrIndexCurrent) { | ||
switchChromosome(*alignment); | ||
#ifdef DEBUG | ||
++count; | ||
if (count % 1000000 == 0) { | ||
std::cerr << "Read " << count << " lines ..." << std::endl; | ||
} | ||
#endif | ||
if (chrConverter.isCompressedMref(alignment->getChrIndex())) { | ||
if (alignment->getChrIndex() != chrIndexCurrent) { | ||
switchChromosome(*alignment); | ||
} | ||
alignment->continueConstruction(); | ||
printBps(alignment->getStartPos()); | ||
incrementCoverages(*alignment); | ||
assignBps(alignment); | ||
} | ||
alignment->continueConstruction(); | ||
printBps(alignment->getStartPos()); | ||
incrementCoverages(*alignment); | ||
assignBps(alignment); | ||
} else { | ||
break; | ||
} | ||
} | ||
// EOF event for the samtools pipe. printing the end of the very last | ||
// chromosome, | ||
std::cerr << "SamSegmentMapper(" << getpid() << "): Processed " << count << " lines." << std::endl; | ||
// EOF event for the samtools pipe. printing the end of the very last chromosome. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactor the parseSamStream
method to improve readability and maintainability.
The parseSamStream
method is quite lengthy and handles multiple responsibilities. Consider breaking it down into smaller, more manageable functions. For instance, the loop body could be extracted into a separate method that processes a single Alignment
object. This would make the code easier to understand and maintain.
ChrIndex GenericChrConverter::parseChrReturnIndex(std::string::const_iterator startIt, | ||
std::string::const_iterator endIt, | ||
char stopChar, | ||
const std::string &stopCharsExt) const { | ||
ChrName chrName = parseChr(startIt, endIt, stopChar, stopCharsExt); | ||
|
||
// Map to ChrIndex and return it, of if the chromosome is not registered, give a helpful | ||
// error message, that shows the parsed name and from what input it was parsed. | ||
try { | ||
return allChromosomeLookup.at(chrName); | ||
} catch (std::out_of_range& e) { | ||
throw_with_trace(DomainError( | ||
"Chromosome name '" + chrName + "' not found for assembly '" + | ||
getAssemblyName() + "'.")); | ||
if (chrName == "*") { | ||
// When reading SAM format, we may encounter '*' as chromosome for unaligned reads. | ||
return getInvalid(); | ||
} else { | ||
// Map to ChrIndex and return it, of if the chromosome is not registered, give a helpful | ||
// error message, that shows the parsed name and from what input it was parsed. | ||
try { | ||
return allChromosomeLookup.at(chrName); | ||
} catch (std::out_of_range& e) { | ||
throw_with_trace(DomainError( | ||
"Chromosome name '" + chrName + "' not found for assembly '" + | ||
getAssemblyName() + "'.")); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! The updated error handling in parseChrReturnIndex
enhances robustness.
Would you like me to help create a unit test for this method to ensure its robustness?
@@ -168,7 +168,7 @@ namespace sophia { | |||
// Update `chrIndex` field. | |||
const ChrConverter &chrConverter = GlobalAppConfig::getInstance().getChrConverter(); | |||
try { | |||
result.chrIndex = chrConverter.parseChrAndReturnIndex( | |||
result.chrIndex = chrConverter.parseChrReturnIndex( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Correct handling of chromosome parsing in parseSamSaTag
.
Consider refactoring the parsing logic to improve readability and maintainability. Here's a potential simplification:
// Simplified parsing logic
void SuppAlignment::parseSamSaTag(...) {
// Existing setup code...
auto parseField = [&](auto begin, auto end, auto delimiter) {
std::vector<std::string::const_iterator> parts;
std::copy_if(begin, end, std::back_inserter(parts), [&](auto c) { return c == delimiter; });
return parts;
};
auto fields = parseField(saCbegin, saCend, ',');
// Use fields to update chrIndex, pos, etc.
}
std::cerr << "SamSegmentMapper(" << getpid() << "): Processed " << count << " lines." << std::endl; | ||
// EOF event for the samtools pipe. printing the end of the very last chromosome. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improve logging for better traceability.
The logging statement at the end of parseSamStream
could include more context about the process, such as the total number of valid and invalid lines processed. This would provide better traceability and debugging capabilities.
- std::cerr << "SamSegmentMapper(" << getpid() << "): Processed " << count << " lines." << std::endl;
+ std::cerr << "SamSegmentMapper(" << getpid() << "): Processed " << count << " valid lines and " << invalidCount << " invalid lines." << std::endl;
Committable suggestion was skipped due to low confidence.
> | sophiaAnnotate | TBD | TBD | TBD | 4 GB | | ||
> | ||
> Note that this is data from single runs on a dedicated compute node at *different* times, but using the same input data and parameters, except for the assembly name. | ||
|
||
* 35 (9e3b6ed) | ||
* Forked from [bitbucket](https://bitbucket.org/compbio_charite/sophia/src/master/) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure the file ends with a single newline character.
-
+ \n
Committable suggestion was skipped due to low confidence.
Tools
Markdownlint
389-389: null (MD047, single-trailing-newline)
Files should end with a single newline character
@@ -15,7 +15,7 @@ For a detailed description of the algorithm, please refer to [Umut Topraks's dis | |||
SOPHIA is a very fast and resource-light algorithm. | |||
It uses 2GB RAM, 2 CPU cores and runs in ~3.5 hours for 50x coverage WGS, and can detect variants with a single pass of the input BAM files. No local assembly is done. | |||
|
|||
Sophia is included in the [SophiaWorkflow](https://github.com/DKFZ-ODCF/SophiaWorkflow) that uses the [Roddy Workflow Management Systems](https://github.com/TheRoddyWMS/Roddy). | |||
SOPHIA is included in the [SophiaWorkflow](https://github.com/DKFZ-ODCF/SophiaWorkflow) that uses the [Roddy Workflow Management Systems](https://github.com/TheRoddyWMS/Roddy). | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensure there are no multiple consecutive blank lines to maintain a clean and professional appearance of the markdown file.
-
-
+
Also applies to: 161-162, 231-231, 304-304
Committable suggestion was skipped due to low confidence.
Tools
Markdownlint
20-20: Expected: 1; Actual: 2 (MD012, no-multiple-blanks)
Multiple consecutive blank lines
@@ -36,12 +36,18 @@ The code here is a fork of that repository. If you use the newer versions here, | |||
|
|||
For instructions and short explanations of on commandline parameters, invoke each tool with `--help`. | |||
|
|||
![](Overview.svg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add alt text to the image for accessibility.
- ![](Overview.svg)
+ ![Overview of SOPHIA Workflow](Overview.svg)
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.
![](Overview.svg) | |
![Overview of SOPHIA Workflow](Overview.svg) |
Tools
Markdownlint
39-39: null (MD045, no-alt-text)
Images should have alternate text (alt text)
…file defaults. Also added unused annotation (for non-devel build). Note: The non-develop, non-static build is broken. Signed-off-by: Philip R. Kensche <[email protected]>
Changes collected during debugging, why there are differences to the 35 version even for the classic
HG37ChrConverter
.These differences indicate the refactoring did destroy something.
As soon as they are fixed, we can be sure, the Hg37ChrConverter is a correct refactoring of the SOPHIA 35 code, and then continue to test the GenericChrConverter with hg37.
Summary by CodeRabbit
Documentation
Refactor
New Features
parseSamLine
for parsing SAM lines in theAlignment
class.Bug Fixes
Alignment.cpp
.Tests
GenericChrConverter_test.cpp
for new and existing parsing functions.