Integer data type #91

Open
timmoon10 opened this issue Jun 28, 2017 · 2 comments

@timmoon10
Collaborator

As of d2c414a, int is the standard integer data type. However, we may need 64-bit integers for very large matrices. Variables susceptible to overflow should be changed to El::Int. As a rule of thumb, indices into Elemental matrices should be of type El::Int.
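
To make the overflow concern concrete, here is a minimal sketch (not LBANN or Elemental code). It assumes a 32-bit int, and `Index64`, `num_entries`, and `fits_in_int` are hypothetical names, with `Index64` standing in for El::Int in a 64-bit build:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for El::Int in a 64-bit build
// (an assumption for illustration, not Elemental's definition).
using Index64 = std::int64_t;

// Number of entries in a height x width matrix, computed in 64 bits.
// Assumes the dimensions are small enough that the 64-bit product itself fits.
inline Index64 num_entries(Index64 height, Index64 width) {
  assert(height >= 0 && width >= 0);
  return height * width;
}

// True if a count can be stored in a plain int without overflow.
inline bool fits_in_int(Index64 n) {
  return n >= std::numeric_limits<int>::min() &&
         n <= std::numeric_limits<int>::max();
}
```

With a 32-bit int, a 50,000 x 50,000 matrix already has 2,500,000,000 entries, which is more than the largest int value of 2,147,483,647, so `fits_in_int(num_entries(50000, 50000))` is false.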

@ndryden
Collaborator

ndryden commented Jun 28, 2017

With these changes, I'm getting a lot of warnings about narrowing casts.

@ndryden
Collaborator

ndryden commented Jun 28, 2017

(Background/refresher: fundamental C++ types and fixed-width integer types)

Just to add some context for this discussion, these are the main sources of integer-type issues, as far as I can recall:

  • Elemental defines El::Int to be either int or long long depending on our compile-time flags. (There is also El::Unsigned.) We want to use El::Int when interacting with Elemental matrices.
  • The C++ STL (e.g. std::vector) tends to use size_t for just about any size-related quantity. This can lead to us either narrowing a size_t or comparing signed/unsigned integers.
  • cuDNN expects parameters to be int.
  • MPI expects parameters to be int.

Ideally, we should come up with a consistent use of integers that satisfies all of these.
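
One possible way to make the conversions between these types explicit is a pair of checked helpers. This is only a sketch under assumptions: `to_int_checked`, `to_size_checked`, and `buffer_count` are hypothetical names, not existing LBANN or Elemental functions, and throwing on failure is just one policy among several.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical helper: convert an STL size to int for int-only APIs,
// throwing instead of silently narrowing.
inline int to_int_checked(std::size_t n) {
  if (n > static_cast<std::size_t>(std::numeric_limits<int>::max())) {
    throw std::overflow_error("size does not fit in int");
  }
  return static_cast<int>(n);
}

// Hypothetical helper: convert a signed index to size_t for STL containers,
// rejecting negative values and values too large for size_t.
inline std::size_t to_size_checked(long long i) {
  if (i < 0) {
    throw std::out_of_range("negative index");
  }
  if (static_cast<unsigned long long>(i) >
      std::numeric_limits<std::size_t>::max()) {
    throw std::overflow_error("index does not fit in size_t");
  }
  return static_cast<std::size_t>(i);
}

// Example use: the element count of a buffer as the int that an
// int-only interface would expect.
inline int buffer_count(const std::vector<float>& buf) {
  const int count = to_int_checked(buf.size());
  assert(static_cast<std::size_t>(count) == buf.size());
  return count;
}
```

Funnelling every conversion through helpers like these keeps the narrowing and signed/unsigned decisions in one place, so the warnings point at real decisions rather than scattered casts.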

Edit: An additional thought: while we don't want to do it for production, we could compile with -ftrapv, which will trap for signed overflow on addition, subtraction, and multiplication.
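
For individual operations where a check should stay in production code, a different mechanism from -ftrapv is to test the operation explicitly. A minimal sketch, assuming GCC or Clang (the `__builtin_mul_overflow` builtin is a compiler extension, not standard C++); `checked_mul` is a hypothetical name:

```cpp
#include <stdexcept>

// Hypothetical helper: multiply two ints, throwing on signed overflow.
// Relies on the GCC/Clang builtin __builtin_mul_overflow, which returns
// true when the mathematically exact product does not fit in the result type.
inline int checked_mul(int a, int b) {
  int result = 0;
  if (__builtin_mul_overflow(a, b, &result)) {
    throw std::overflow_error("signed overflow in multiplication");
  }
  return result;
}
```

Unlike -ftrapv, which is meant to instrument every signed addition, subtraction, and multiplication in the translation unit, this only guards the call sites that opt in.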

oyamay pushed a commit to oyamay/lbann that referenced this issue Nov 29, 2019
* Add environment variable LBANN_NUM_IO_PARTITIONS

Specify the number of partitions in the depth dimension of the Cosmoflow
samples.

* Adjust the base offset for parallel sample I/O

* WIP: Further adjustment of sample sizes

* WIP: sample size adjustment

* WIP: sample size adjustment

* Remove debug output

* Cosmoflow parallel io (LBANN#86)

* before rebase

* updating

* updating

* small changes

* moving around where data is read in NOT DONE YET

* updated some comments and some todo

* cleaning up

* added comm member variable

* cleaning up

* compiles, fixes stray variables and typos, adds correct member variables

* changing responses to float, taking away division, removing from image_data_reader

* fixing a mistake

* oops, changing m_all_responses back to float

* changed some variable names, changed indenting and fixed vim problems, fixed file access

* fixed duplicate count

* transposed dimensions, reverted resnet, took odd spacing and print statements out

* removed timing

* fixing comments and spacing

* Missing semicolon

* Fix type mismatch

* Remove trailing whitespaces

* HDF5 bug fixes

* Size adjustment fix

* Refactoring

* Support strided rank ordering

* Fix hang in HDF5 MPI-IO

HDF5 caused hanging, likely because an HDF5 property was created with MPI
at every fetch_datum. The property creation is now moved out of the function
and done only once, so it should not hang anymore. MPI-IO is still disabled
for now and should be looked into again once everything is working.

* Disables assertion

This assertion fails when the last mini-batch is not a full one. Not
sure why it fails now and not before.

* Use normalized parameters in Cosmoflow

* Fix copying of a non-halo-expanded host tensor to a halo-extended device
tensor.

The distconv::Copy function doesn't seem to be working correctly, though
more comprehensive investigation is needed.

* Enable assertion check on mini-batch size again

* Disable debug output

* Temporarily add debug dump in generic_input_layer

* Delete irrelevant comment

* Fix response value loading when rank reordering is not used

* Formatting

* Fix protobuf version in superbuild

* Cleanup before merging to the mainline branch

* Further cleanup

* Check if int16 input is enabled