================================================================================
== TODO ========================================================================
================================================================================
- need to reproduce LeCun's MNIST results!
- compare with distances/angles (have a quantitative measure). For distances, compare same-distortion different-label distances instead, which makes more sense; we already know the qualitative picture, so show d(6, 6_+2) versus d(6_+2, 7_+2) (see the distance/angle sketch after this list)
- compare with two separate machines (one 1D for label, one 1D for same-translation; merge the embedding vectors), argue that it is more expensive to train and to run forward, and see if it is better or not (compare the 2D contrastive model versus two separate contrastive models)
- rotation, add up to 180°, see if it loops
- for translation/rotation 2D contrastive:
* use LeCun's (2D, for label)
* use ours (3D: 2D for label, 1D for distortion)
- to compare, use Euclidean distances, ours projected on x
- NORB dataset: try the plane category, see if we can train faster / with a smaller dataset than LeCun and still get the same shape of embedding
* reproduce LeCun: try a larger margin, try training longer
- write report: around 50-100 pages; the introduction is a big part, containing everything in summary
- oral: dry run (to prepare for criticism)
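
A minimal sketch of the quantitative distance/angle comparison mentioned above; embed(), shift(), x6 and x7 are placeholders (the trained contrastive network, the pixel-translation routine and two test digits), not actual code from this repo:

    import numpy as np

    def dist(a, b):
        return float(np.linalg.norm(a - b))

    def angle_deg(a, b):
        cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

    # compare d(6, 6_+2)  (same label, different shift)
    # against d(6_+2, 7_+2)  (different labels, same shift)
    e6    = embed(x6)               # hypothetical trained embedding
    e6_p2 = embed(shift(x6, dx=2))  # hypothetical translation helper
    e7_p2 = embed(shift(x7, dx=2))
    print("same label, diff shift:", dist(e6, e6_p2),    angle_deg(e6, e6_p2))
    print("diff label, same shift:", dist(e6_p2, e7_p2), angle_deg(e6_p2, e7_p2))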
================================================================================
== DONE ========================================================================
================================================================================
* find papers related to the "intriguing properties" paper that retrain on adversarial examples
* read papers
* read papers from email
* see video for intuition: "Are Deep Networks a Solution to Curse of Dimensionality"
* read colah's blog (article "Visualizing Representations")
* compare output difference (invariance) in t-SNE space w.r.t. the following (see the sketch after this item):
- translation
- rotation (take bigger image, crop with enough margin)
- deformation (translate with local direction for displacement)
- illumination/brightness
- colors (raw vs "enhanced")
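
A rough sketch of that invariance check, assuming cnn_code() returns the penultimate-layer features and ref_images is a reference batch (both placeholders):

    import numpy as np
    from scipy.ndimage import shift, rotate
    from sklearn.manifold import TSNE

    # transformed copies of one test digit x: horizontal shifts and small rotations
    variants = [x] \
        + [shift(x, (0, dx)) for dx in range(-4, 5, 2)] \
        + [rotate(x, a, reshape=False) for a in range(-30, 31, 10)]

    # embed the variants together with a reference batch, then map to 2D with t-SNE
    codes = np.stack([cnn_code(v) for v in variants]
                     + [cnn_code(r) for r in ref_images])
    emb2d = TSNE(n_components=2, init="pca").fit_transform(codes)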
* reproduce colah's visualisation on MNIST (because it is a small dataset), t-SNE map (see how projections are clustered)
* plot (in video) the position of a continuously deformed input in the CNN-code t-SNE transformation (try to see a pattern) <-- interactive plot
* explore other articles on colah's blog, e.g. NLP cluster map, table
* merge t-SNE: all transformations of all my numbers together, then hide on the plots the ones I'm not interested in
* take a look at samples close to our transformed points (esp. extremes of rotation), try to plot images instead of points (only for them?), try to make it clear <-- interactive plot
* use a point from the test set instead of a handmade one (mine are outliers, is that normal? an artefact of t-SNE, or because I made them? still on the boundary)
* make video of plots with visualisation of closest points (1 frame = 1 transformation, in increasing order) <-- interactive plot
* interactive plot in JS, try quickly
* try to train on the dataset but with added translations (test if it learns translation invariance)
* with shift-invariance: retrain correctly with test images and correct range x-y
* with shift-invariance: for each digit localize distorted digits, see if they form a cluster inside the digit cluster, can we understand the shape, is there one at all?
* try for all classes to apply distortions to one test image that is close to the cluster of pos/neg rotated input (see what happens, join/straight line/apart)
* add key for plot (to zoom-in/out)
* rerun closest-point t-SNE with a different seed
* get the closest point to the original digit (apply t-SNE)
* shift invariance: test transformation only with a single x+y sample (no grid) to be fair
* shift invariance: be fair with train range versus test range
* add adversarial transformations into the framework
* add more motivation/justification to the research (goal-oriented)
* data augmentation: how can we justify our artificially-transformed test set? no, we can't. Instead: test over a meaningful/natural dataset like the house-number (Google Street View) dataset.
* train model to cluster similar samples together in vector space (transformation invariance)
- try limited 2D pairing: 5 closest image-space neighbors with the same displacement distortion (no pairing across different displacements)
- Amos loss function, 1D (pair by label) + 1D (pair by displacement) (see the sketch after this item)
- multiple pairing strategies: all same-label samples together, or smart similar same-label pairs together, etc.
- if time permits, try the aligner loss and compare
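
A minimal sketch of the "1D by label + 1D by displacement" idea: a standard margin-based contrastive loss applied twice, once per embedding coordinate. The names (net, the 2D embedding layout, the margin) are illustrative assumptions, not the actual training code:

    import torch
    import torch.nn.functional as F

    def contrastive_1d(za, zb, same, margin=1.0):
        # za, zb: (batch, 1) slice of the embedding; same: 1.0 if the pair matches
        d = torch.abs(za - zb).squeeze(1)
        return (same * d.pow(2) + (1.0 - same) * F.relu(margin - d).pow(2)).mean()

    def pair_loss(net, x1, x2, same_label, same_shift):
        e1, e2 = net(x1), net(x2)   # embeddings of shape (batch, 2)
        return contrastive_1d(e1[:, :1], e2[:, :1], same_label) \
             + contrastive_1d(e1[:, 1:], e2[:, 1:], same_shift)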
directions:
* data-augmented (house numbers), meaningful training, should we drop this?
- data-augmentation transformations: scaling,
- house numbers: recenter for the receptive field
- otherwise train (data-augmented) on house numbers directly, drop MNIST
* contrastive loss:
- try Siamese continuous-translation pairs, in 2D/3D see useful properties (predictable directions, distances, visually, or create a tool to measure?)
- keep in mind, MNIST is (too) simple?
- if it doesn't work: other loss, using a combination of distance and angle/direction?
- can we force some dimensions of the embedding to encode displacement (x, y, z) or rotation angle?
- find a predictable mapping (linear, non-linear?) F(x) -> F(x+t) ~= F(x) + ~t (idea: train our own shallow network/function for that?); see the sketch at the end of this section
- otherwise: compare Siamese quantitatively, distance in vector space? what tool?
- keep aside: try to see if a fine-tuned Siamese embedding gives better accuracy than a regular one?
- keep aside: adversarial examples on Siamese, train against them, better resistance? does t-SNE show a difference?
- Siamese embedding: directions (not without special training?), digits = too simple, use ImageNet?
- train embedding: F(x+t) = F(x) + t
- use case: medical imaging, detect structures, need detection for every angle; then it is easy to compute F(x) and apply R on F(x) (instead of re-feeding the network for every rotation), then feed into a last application-oriented angle-dependent layer.
- try rotation on mnist (4/8 groups, large enough so we don't care about natural rotation)
- 2D contrastive rotation: fix bug with 90°, may be present in translation << FIXED: indeed
- variant of contrastive: use a logistic/non-zero loss instead of max, see if the embedding is more spread out and close pairs get closer << not working well (loses structure)
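
For the F(x+t) ~= F(x) + t idea above, one possible extra loss term, sketched only; net and the zero-padding of t to the embedding width are assumptions, not the method actually used:

    import torch
    import torch.nn.functional as F

    def equivariance_loss(net, x, x_shifted, t):
        # t: (batch, 2) pixel displacement; padded with zeros to the embedding size
        z, z_shifted = net(x), net(x_shifted)
        t_emb = torch.zeros_like(z)
        t_emb[:, :2] = t
        return F.mse_loss(z_shifted, z + t_emb)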
================================================================================
== ASIDE =======================================================================
================================================================================
* choose what I'd like to do
* ideas where papers said "further study needed":
- intriguing properties of NNs: For MNIST, we do not have results for convolutional models yet, but […] may behave similarly as well.
- towards deep NN architectures robust to adversarial examples: Evaluate the performance loss due to layer-wise penalties […]. In addition, exploring non-Euclidean adversarial examples […] could lead to insights into semantic attributes of features learned at high levels of representation.
* shift invariance: with another (easy) dataset, boundary detection
* try to relate data augmentation (cost of degrees-of-freedom invariance) to accuracy, and can we measure the capacity (afterwards)?
* try to plot capacity during training, compare different data augmentations (original versus shift), maybe the gradient contribution is higher for a certain kind of degree of freedom
* later on: try to combine two or more transformations at the same time, see if it forms a grid (rows = 1st transformation, cols = 2nd transformation) in t-SNE space, separate plots, by pairs
* later on: given a simple shape that a few neurons detect in a model A, how can we relate them to neurons in a second model B trained with translation invariance (w.r.t. neuron activation: activated at the same time in A and B, but with translation in B)? Can we find a relation between neurons? (ref. to visualizing CNNs; forget for now because it's hard)
* boundary detection (BSDS dataset): train a boundary detector (there is a curve passing through the center pixel = 1, or 0) on patches of an image. Plot t-SNE (of all patches of a single image), interpret what the clusters mean (never done before?), what about cluster intersections?
* boundary detection, different models: 0/1 curve on the central pixel, multi-class based on angles (8 classes), angle regression (directly: loss on the angle) <- never done in a CNN (?)
* analysis of adversarial attacks (better graphs, data-augmented adversarial training)
* compare the distance between the original image and the adversarial one in image space versus vector space (in which layer it happens), write code to detect the weights responsible for divergence (for adversarial examples) or convergence (for invariance), comparing two models
* then on subset of ImageNet (link in blog, ILSVRC)
* compare CNN codes for similar objects (animal breeds, similar to the article), e.g. v(bulldog) - v(dalmatian) ~ v(Siamese cat) - v(Egyptian cat); see the sketch at the end of this section
* compare line-wise in CNN code space, find directions/segments that make sense, perpendicular, other directions?
CNN code is taken from the layer before the last (before the softmax) in the article
* can use t-SNE again (or show as image for conv layers)
* try on other lower layers
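
A small sketch for the breed-analogy check above, assuming cnn_code() gives the penultimate-layer features and the four images are placeholders:

    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    d_dogs = cnn_code(bulldog_img) - cnn_code(dalmatian_img)
    d_cats = cnn_code(siamese_cat_img) - cnn_code(egyptian_cat_img)
    print("analogy cosine:", cosine(d_dogs, d_cats))  # close to 1 => parallel directions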