Param dlrm #127

shz0116 · 2020-09-07T20:07:25Z

No description provided.

Adding support for binary loader proposed in pull request facebookresearch#49. Also, adding an option for early stopping based on AUC. Finally, adding exact model config for mlperf in run_and_time.sh script.

* adding script to visualize embedding tables
* updated embedding visualization
* updating visualization - adding data visualization - analysis of categorical variables
* updated data visualization - mapping test data into manifold
* created double plot for data visualization
* added plots for each data class
* added more plots: correct and errors, refactored the code
* more refactoring, added z data
* added intermediate Z layers
* updating visualization
* updated output directory and plots
* fixed silent bug

facebook-github-bot · 2021-06-17T02:33:48Z

Hi @shz0116!

Thank you for your pull request.

We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

huwan · 2021-11-07T09:00:30Z

README.params.md

+   Reducing the number of interaction features for the dot operation. 
+   A project operation is applied to the dotted features to reduce its dimension size.
+   This is mainly due to the memory concern. It reduces the memory size needed for top MLP. 
+   A side effect is that it may also imrpove the model accuracy.


Typo: imrpove->improve.
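
To make the README.params.md passage quoted above concrete, here is a minimal Python sketch of reducing the dot-interaction features with a projection. It is an illustration only and not the code from this PR: the helper names (`pairwise_dots`, `project`), the sizes, and the choice of a single random linear map applied to the pairwise dot products are all assumptions.

```python
import random


def pairwise_dots(vectors):
    """Dot products of all distinct vector pairs (strict lower triangle)."""
    feats = []
    for i in range(len(vectors)):
        for j in range(i):
            feats.append(sum(a * b for a, b in zip(vectors[i], vectors[j])))
    return feats


def project(features, weights):
    """Linear projection: one output value per row of `weights`."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


# Hypothetical sizes: 1 dense vector + 7 embedding vectors, each of dimension 4.
rng = random.Random(0)
num_vectors, dim, proj_size = 8, 4, 6
vectors = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_vectors)]

# Without projection: 8 * 7 / 2 = 28 interaction features.
dots = pairwise_dots(vectors)

# Assumed projection: random weights here; a real model would learn them.
proj_weights = [
    [rng.uniform(-1, 1) for _ in range(len(dots))] for _ in range(proj_size)
]
reduced = project(dots, proj_weights)

# Top MLP input = dense vector concatenated with the interaction features.
top_mlp_in_full = vectors[0] + dots
top_mlp_in_reduced = vectors[0] + reduced
print(len(top_mlp_in_full), len(top_mlp_in_reduced))  # prints "32 10"
```

With these sizes the top MLP input shrinks from 32 to 10 values, which is the kind of memory saving for the top MLP that the README text describes.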

mnaumovfb and others added 30 commits January 26, 2020 12:16

Adding support for testing and mlperf flags to caffe2 version.

58c2806

Fix latent bug in caffe2 version when --max-ind-range is used

1768658

Tgrel/minor mlperf fixes (facebookresearch#54)

11fcf01

* Fix command-line flag typo
* Remove the end-of-epoch evaluation in MLPerf mode to avoid two evals close to each other

Update README.md

eb3094c

Update README.md

bda0921

adding back end of epoch check for now.

73ac38a

Adding flexibility in saving/loading model to/from different devices.…

0e8818e

… Also, enforcing single copy of embeddings across devices on multiple GPUs.

Adjusting the use of --max-ind-range post processing as discussed in f…

916c0d6

…acebookresearch#53

Switch the binary dataloader to int32 datatype (facebookresearch#60)

819ef5f

Adjusting restart from saved model during training. Need to skip earl…

fde9723

…y batches when enumerate is used. Summary: Adjusting restart from saved model during training. Need to skip early batches when enumerate is used.

Update README.md

9074b5e

Update README.md

6f7711d

Update README.md

66fe6d8

added visualization of DLRM embeddings (facebookresearch#72)

7f2129e

* adding script to visualize embedding tables
* updated embedding visualization

Enable LR warmup and decay policy (facebookresearch#73)

fbabe61

Co-authored-by: rpremsee <[email protected]>

Adjusting ONNX calls to work with large models (more than 2GB in size).

75b02cf

Adjusting the learning rate to freeze at last, when passed the decay …

09017b8

…interval.

Mixd Bugfixes (facebookresearch#87)

1f25892

* bugfixes for mixd
* remove whitespace

Added gitignore from https://github.com/github/gitignore/blob/master/…

236e331

…Python.gitignore . (facebookresearch#91)

Adjusting parameters for onnx.export to work with any data loader.

32181f7

Fixing a typo.

6bd3adb

Tgrel/tgrel mlperf fixes (facebookresearch#93)

ae23fca

* Fix LR decay – allow a period of training with a constant base LR between warmup end step and decay start step
* Bump pytorch version for multiGPU memory corruption bugfix

Trimming trailing whitespaces (facebookresearch#100)

3ecf641

Adding tqdm package in requirements (facebookresearch#99)

d54c813

latest updates, 2020-06-27 (facebookresearch#103)

ce31eda

* sunc 2020-06-26
* cleanup, format, testing after all updates

Fixing saving of model protobuf with types and shapes in caffe2 version.

f8bf6ab

modifications for FAIR cluster

2944521

add projection

e6009d4

Hongzhang Shan and others added 10 commits July 28, 2020 06:15

test

18996f6

bug fix in projection

9951699

Add validation checks to arguments using dash separated lists and che…

cb44674

…ck arch-interaction-op for valid choice (facebookresearch#113)

add gaussian distribution

3a9b5cf

add synthetic data

53bf84b

Merge branch 'master' into dist_port

2da35a8

small fix

eaee70c

clean files

d32dfd7

add readme for param branch

f1d301c

put project into separate file

2fe7f81

facebook-github-bot added the CLA Signed label Sep 7, 2020

Hongzhang Shan and others added 17 commits September 11, 2020 19:10

add fb_synthetic data

788dc43

Fix hang problem and add README

ebef575

remove README.param

b0420b0

data module cean-up

38ecd56

data module cean-up

323a593

copy dlrm_data.py from PARAM-Bench

b016326

update project file

601ad2e

add boundary check for dlrm_data

e1e2ca5

usse synthetic data

0eb05fd

modify time computation method

a755b01

change output + turn off nvidia-smi + reuse syn data

40dddb7

start to change input

64b7355

add tt.py

c0fba86

tested version on FAIR

77541e5

hack data access

61875e4

modify tt to reuse input

4692d0e

fix corner case in tt.py

751a2ca

huwan reviewed Nov 7, 2021
