Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speeding up C implementation #42

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dearblue
Copy link
Contributor

@dearblue dearblue commented Jul 13, 2023

Switch from Sarwate's algorithm to byte-order free Slicing-by-16.
Each table data file is generated by extconf.rb.

In addition, the inspection string, which was previously 10 bytes, is extended to 19 bytes.
This is to check the slicing-by-16 loop, which processes in 16-byte units.

@dearblue
Copy link
Contributor Author

For information, here is a summary of the benchmark results.
The environment is the same as #41 (comment).

CRC Models main slicing-by-8 slicing-by-16
Digest::CRC1#update ( 0.394643) ( 0.404651) ( 0.400259)
Digest::CRC5#update ( 0.015706) ( 0.004514) ( 0.003536)
Digest::CRC8#update ( 0.013652) ( 0.004703) ( 0.003673)
Digest::CRC8_1Wire#update ( 0.013575) ( 0.004455) ( 0.003519)
Digest::CRC15#update ( 0.020950) ( 0.004627) ( 0.003702)
Digest::CRC16#update ( 0.018752) ( 0.004552) ( 0.003753)
Digest::CRC16CCITT#update ( 0.019875) ( 0.004606) ( 0.003683)
Digest::CRC16DNP#update ( 0.016596) ( 0.004538) ( 0.003724)
Digest::CRC16Genibus#update ( 0.019800) ( 0.004601) ( 0.003699)
Digest::CRC16Modbus#update ( 0.018721) ( 0.004683) ( 0.003724)
Digest::CRC16QT#update ( 0.018728) ( 0.004732) ( 0.003727)
Digest::CRC16USB#update ( 0.018706) ( 0.004654) ( 0.003923)
Digest::CRC16X25#update ( 0.020320) ( 0.004558) ( 0.003866)
Digest::CRC16XModem#update ( 0.025078) ( 0.004610) ( 0.003897)
Digest::CRC16ZModem#update ( 0.022976) ( 0.004603) ( 0.003760)
Digest::CRC24#update ( 0.021261) ( 0.004867) ( 0.003787)
Digest::CRC32#update ( 0.017693) ( 0.004533) ( 0.003643)
Digest::CRC32BZip2#update ( 0.017650) ( 0.004801) ( 0.003785)
Digest::CRC32c#update ( 0.017690) ( 0.004551) ( 0.003679)
Digest::CRC32Jam#update ( 0.017696) ( 0.004685) ( 0.003636)
Digest::CRC32MPEG#update ( 0.017663) ( 0.004937) ( 0.003790)
Digest::CRC32POSIX#update ( 0.017704) ( 0.004839) ( 0.003791)
Digest::CRC32XFER#update ( 0.017731) ( 0.004858) ( 0.003827)
Digest::CRC64#update ( 0.017776) ( 0.004971) ( 0.004435)
Digest::CRC64Jones#update ( 0.017718) ( 0.004897) ( 0.004274)
Digest::CRC64XZ#update ( 0.017697) ( 0.004946) ( 0.004311)

Finally, FreeBSD uses clang, so if built with GCC 13, the results will be different.
Only the CRC-32 results are excerpted.

% gmake -C ext/digest/crc CC=gcc13 'cflags=$(optflags) $(debugflags) $(warnflags)' 'warnflags=-Wall -Wextra -Wundef' clean all

gcc13 sarwate
Digest::CRC32#update          0.017745   0.000000   0.017745 (  0.017761)

gcc13 slicing-by-8
Digest::CRC32#update          0.003948   0.000000   0.003948 (  0.003921)

gcc13 slicing-by-16
Digest::CRC32#update          0.002824   0.000000   0.002824 (  0.002815)

Copy link
Owner

@postmodern postmodern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot going on here which I worry will make it harder to maintain and for others to understand. I would prefer that each generated C extension be fully explicit and self-contained, allowing them to be loaded individually and easily be reviewed/audited.

@@ -27,6 +16,5 @@ void Init_crc12_3gpp_ext()
rb_ext_ractor_safe(true);
#endif

rb_undef_method(cCRC12_3GPP, "update");
rb_define_method(cCRC12_3GPP, "update", Digest_CRC12_3GPP_update, 1);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason why we undef and re-define the update method is to prevent warnings from ruby -w, which some people apparently use for testing/production. This solution will need to keep this, or prove it doesn't cause warnings under ruby -w.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix it to do undef.


have_header("stdint.h")
have_header('stddef.h')

slice_size = ENV["RUBY_DIGEST_CRC_ENABLE_SLICING_BY_16"].to_i > 0 ? 16 : 8
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use if/else with an implicit return value instead of ternary expressions? Also could we use shorter environment variable names, like RUBY_DIGEST_CRC_SLICE_SIZE?


have_header("stdint.h")
have_header('stddef.h')

slice_size = ENV["RUBY_DIGEST_CRC_ENABLE_SLICING_BY_16"].to_i > 0 ? 16 : 8
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer to use if/else with implicit return values over ternary expressions.


have_header("stdint.h")
have_header('stddef.h')

slice_size = ENV["RUBY_DIGEST_CRC_ENABLE_SLICING_BY_16"].to_i > 0 ? 16 : 8
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer to use if/else with an implicit return value instead of ternary expressions. Also this logic seems to be repeated in each file. Maybe it should be moved into gentable.rb?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it should be moved into gentable.rb?

Yes, I will.

@@ -0,0 +1,125 @@
using Module.new {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not use refinements or monkey-patching here. I think a regular method named bitreflrect() is good enough.

\
for (int i = 0; i < 256; i++) { \
crc_int_t n = V(table_initialize)(model->bitwidth, i); \
for (int j = 0; j < 8; j++) { \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

table[0][i] = n; \
} \
\
for (int s = 1; s < SLICE_SIZE; s++) { \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here.

} \
\
for (int s = 1; s < SLICE_SIZE; s++) { \
for (int i = 0; i < 256; i++) { \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here.

ext/digest/crc.h Outdated
#if !defined(HAVE_RB_EXT_RACTOR_SAFE)
static inline void ruby_digest_crc_ensure_ractor_main(const char *mesg)
{
(void)mesg;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what the purpose is of this statement? Seems to be a no-op?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used to suppress cc -Wextra warnings about unused variables.

#define BUILD_TABLE_CORE_DECL(V) \
const crc_int_t modified_poly = V(adapt)(model->bitwidth, model->polynomial); \
\
for (int i = 0; i < 256; i++) { \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also an old school optimization trick is to always define your loop variables above/outside of your loop. This prevents dumb compilers for re-allocating space on the stack for nested for loop index variables.

@dearblue
Copy link
Contributor Author

Summary of changes:.

  • Dedicated functions for each CRC are now provided.
  • No longer provides generic functions.
  • The CRC-64 table is now 32 KiB, but I believe modern PC processors have L1 data caches of 64 KiB or more.
  • Extend spec check string to 19 bytes (also in commit message).

Tables are still generated at build time of the installation.
As I wrote to #42 (comment), including the tables in the repository will probably increase the GEM package size to around 400 KiB.

@@ -15,52 +15,42 @@
*/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://github.com/tpircher/pycrc#copyright-of-the-generated-source-code, the generated code can be treated as public domain.
Do we need any addenda as a digest-crc project to prevent misidentification?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐈‍⬛ I can't see it because it's out of sight.

/**
* \file
* Functions and types for CRC checks.
*
* Generated on Sat Feb 29 02:30:42 2020
* by pycrc v0.9.2, https://pycrc.org
* using the configuration:
* - Width = 12
* - Poly = 0x80f
* - XorIn = 0x000
* - ReflectIn = False
* - XorOut = 0x000
* - ReflectOut = True
* - Algorithm = table-driven
*/

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not. We may need to change the wording to "Originally generated" or "Derived from", since I made some modifications by adding custom typdefs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😺 I'll not do anything in this PR as I believe you will make this change if necessary.

Switch from Sarwate's algorithm to byte-order free Slicing-by-16.
Each table data file was generated by `ext/digest/gentable.rb`.

In addition, the inspection string for rspec, which was previously 10 bytes, is extended to 19 bytes.
This is to check the Slicing-by-16 loop, which processes in 16-byte units.
@dearblue
Copy link
Contributor Author

Table data is to be prepared in advance.
However, ext/digest/gentable.rb will be kept for future extensions.

Also, I forgot to mention in the last change, "Slicing-by-16" is always used.
This is because I don't know how to write a test that switches to "Slicing-by-8".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants