Commit 887bd19

committed
added indexes
1 parent bb0616f commit 887bd19

46 files changed

+612
-254
lines changed

CLASSIFICATION.canvas

Lines changed: 1 addition & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -5,19 +5,17 @@
55
{"id":"5cfa5299ec6e5e42","type":"file","file":"pages/classification/LINEAR PERCEPTRON.md","x":880,"y":-1140,"width":560,"height":400},
66
{"id":"9f5fffddfbaf0c19","type":"file","file":"pages/classification/SVM.md","x":880,"y":-440,"width":560,"height":560},
77
{"id":"030e01b3f666f84d","type":"file","file":"pages/classification/RETI NEURALI.md","x":850,"y":360,"width":620,"height":520},
8-
{"id":"7e5a119bbf27b16a","type":"file","file":"pages/VALUTARE UN CLASSIFICATORE PROBABILISTICO.md","x":105,"y":-460,"width":540,"height":600},
98
{"id":"5697c0e8d4fadc73","type":"file","file":"pages/classification/DECISION TREES.md","x":-405,"y":-1140,"width":400,"height":400},
109
{"id":"96f0383b7c3cdd2d","type":"file","file":"pages/classification/DECISION TREE PRUNING.md","x":-405,"y":-460,"width":400,"height":400},
1110
{"id":"12269943a4445a5d","type":"file","file":"pages/classification/REGRESSION.md","x":-880,"y":-1140,"width":400,"height":400},
1211
{"id":"fd16fde9c842e44d","type":"file","file":"pages/classification/TRAINING STRATEGIES.md","x":-605,"y":-1880,"width":400,"height":400},
13-
{"id":"5d76ed9475dd6a2b","type":"file","file":"pages/classification/PERFORMANCE OF A CLASSIFIER.md","x":880,"y":-1880,"width":400,"height":400}
12+
{"id":"5d76ed9475dd6a2b","type":"file","file":"pages/classification/PERFORMANCE_OF_A_CLASSIFIER.md","x":880,"y":-1880,"width":400,"height":400}
1413
],
1514
"edges":[
1615
{"id":"deda8623490138f7","fromNode":"5697c0e8d4fadc73","fromSide":"bottom","toNode":"96f0383b7c3cdd2d","toSide":"top"},
1716
{"id":"f9597acbc37cfe21","fromNode":"c0a3107194fb5ae7","fromSide":"bottom","toNode":"5697c0e8d4fadc73","toSide":"top"},
1817
{"id":"e1a41272a88507d2","fromNode":"c0a3107194fb5ae7","fromSide":"left","toNode":"fd16fde9c842e44d","toSide":"right"},
1918
{"id":"58b4fbfe6e88b23b","fromNode":"c0a3107194fb5ae7","fromSide":"bottom","toNode":"12269943a4445a5d","toSide":"top"},
20-
{"id":"96c8250cafb56e33","fromNode":"5eb3a3b43f9ae782","fromSide":"bottom","toNode":"7e5a119bbf27b16a","toSide":"top"},
2119
{"id":"a3c41bfd658ee9ff","fromNode":"c0a3107194fb5ae7","fromSide":"bottom","toNode":"5eb3a3b43f9ae782","toSide":"top"},
2220
{"id":"0c7c37d6d6ad53ed","fromNode":"c0a3107194fb5ae7","fromSide":"right","toNode":"5d76ed9475dd6a2b","toSide":"left"},
2321
{"id":"2828866a9b800c1e","fromNode":"5cfa5299ec6e5e42","fromSide":"bottom","toNode":"9f5fffddfbaf0c19","toSide":"top"},

DATA PREPROCESSING.canvas

Lines changed: 15 additions & 15 deletions
@@ -1,22 +1,22 @@
11
{
22
"nodes":[
3-
{"id":"bd5cc2c186c6513e","type":"file","file":"pages/preprocessing/FEATURE SUBSET SELECTION.md","x":-576,"y":-1119,"width":639,"height":870},
4-
{"id":"94839b4b2a1dec51","type":"file","file":"pages/preprocessing/SAMPLING.md","x":619,"y":400,"width":841,"height":1213},
5-
{"id":"ff325faa12fdff6f","type":"file","file":"pages/preprocessing/SCALING.md","x":-160,"y":560,"width":640,"height":743},
6-
{"id":"fbf4d8f8a68b0c3b","type":"file","file":"pages/preprocessing/FEATURE CREATION.md","x":1160,"y":-251,"width":400,"height":391},
7-
{"id":"4048a2750f515f64","type":"file","file":"pages/preprocessing/DIMENSIONALITY REDUCTION.md","x":676,"y":-1000,"width":684,"height":471},
8-
{"id":"d7262774c5d64ddb","type":"file","file":"pages/preprocessing/DATA PREPROCESSING.md","x":-680,"y":40,"width":343,"height":200},
9-
{"id":"dd7768016b37b514","type":"file","file":"pages/preprocessing/TYPE CONVERSIONS.md","x":-1360,"y":0,"width":400,"height":400},
10-
{"id":"7fd5b35f882fb209","x":-708,"y":870,"width":400,"height":400,"type":"file","file":"pages/preprocessing/SIMILARITY AND DISSIMILARITY.md"},
11-
{"id":"9938201f0bcc2a77","x":-1320,"y":731,"width":400,"height":400,"type":"file","file":"pages/preprocessing/DISTANCES.md"},
12-
{"id":"10b4c510401968cd","x":-823,"y":489,"width":225,"height":41,"type":"text","text":"# PROXIMITY"}
3+
{"id":"dd7768016b37b514","type":"file","file":"pages/preprocessing/TYPE CONVERSIONS.md","x":-1600,"y":-60,"width":400,"height":400},
4+
{"id":"d7262774c5d64ddb","type":"file","file":"pages/preprocessing/DATA PREPROCESSING.md","x":-680,"y":40,"width":360,"height":200},
5+
{"id":"10b4c510401968cd","type":"text","text":"# PROXIMITY","x":-612,"y":440,"width":225,"height":50},
6+
{"id":"7fd5b35f882fb209","type":"file","file":"pages/preprocessing/SIMILARITY AND DISSIMILARITY.md","x":-387,"y":732,"width":400,"height":400},
7+
{"id":"9938201f0bcc2a77","type":"file","file":"pages/preprocessing/DISTANCES.md","x":-1012,"y":732,"width":400,"height":400},
8+
{"id":"ff325faa12fdff6f","type":"file","file":"pages/preprocessing/SCALING.md","x":-680,"y":-1040,"width":360,"height":314},
9+
{"id":"bd5cc2c186c6513e","type":"file","file":"pages/preprocessing/FEATURE SUBSET SELECTION.md","x":-1380,"y":-560,"width":360,"height":314},
10+
{"id":"4048a2750f515f64","type":"file","file":"pages/preprocessing/DIMENSIONALITY REDUCTION.md","x":-1160,"y":-960,"width":360,"height":314},
11+
{"id":"94839b4b2a1dec51","type":"file","file":"pages/preprocessing/SAMPLING.md","x":-200,"y":-960,"width":360,"height":314},
12+
{"id":"fbf4d8f8a68b0c3b","type":"file","file":"pages/preprocessing/FEATURE CREATION.md","x":0,"y":-560,"width":360,"height":314}
1313
],
1414
"edges":[
15-
{"id":"4c3e9cf3050d1b64","fromNode":"d7262774c5d64ddb","fromSide":"right","toNode":"bd5cc2c186c6513e","toSide":"bottom"},
16-
{"id":"bb18dee657ecb168","fromNode":"d7262774c5d64ddb","fromSide":"right","toNode":"4048a2750f515f64","toSide":"bottom"},
17-
{"id":"dc3291d798198289","fromNode":"d7262774c5d64ddb","fromSide":"right","toNode":"fbf4d8f8a68b0c3b","toSide":"left"},
18-
{"id":"f97a2fe55380871a","fromNode":"d7262774c5d64ddb","fromSide":"right","toNode":"94839b4b2a1dec51","toSide":"top"},
19-
{"id":"a596bf5f8481ccd2","fromNode":"d7262774c5d64ddb","fromSide":"right","toNode":"ff325faa12fdff6f","toSide":"top"},
15+
{"id":"4c3e9cf3050d1b64","fromNode":"d7262774c5d64ddb","fromSide":"top","toNode":"bd5cc2c186c6513e","toSide":"bottom"},
16+
{"id":"bb18dee657ecb168","fromNode":"d7262774c5d64ddb","fromSide":"top","toNode":"4048a2750f515f64","toSide":"bottom"},
17+
{"id":"dc3291d798198289","fromNode":"d7262774c5d64ddb","fromSide":"top","toNode":"fbf4d8f8a68b0c3b","toSide":"bottom"},
18+
{"id":"f97a2fe55380871a","fromNode":"d7262774c5d64ddb","fromSide":"top","toNode":"94839b4b2a1dec51","toSide":"bottom"},
19+
{"id":"a596bf5f8481ccd2","fromNode":"d7262774c5d64ddb","fromSide":"top","toNode":"ff325faa12fdff6f","toSide":"bottom"},
2020
{"id":"6362fa8ff5cd2943","fromNode":"d7262774c5d64ddb","fromSide":"left","toNode":"dd7768016b37b514","toSide":"right"},
2121
{"id":"0a8f6608df3bf6a1","fromNode":"d7262774c5d64ddb","fromSide":"bottom","toNode":"10b4c510401968cd","toSide":"top"},
2222
{"id":"94854710411d170b","fromNode":"10b4c510401968cd","fromSide":"bottom","toNode":"7fd5b35f882fb209","toSide":"top"},

index.md

Lines changed: 7 additions & 1 deletion
@@ -1 +1,7 @@
1-
# DATAMINING
1+
# Datamining
2+
## CONTENTS
3+
- [ASSOCIATION_RULES](pages/association_rules/ASSOCIATION_RULES.md)
4+
- [CLASSIFICATION](pages/classification/CLASSIFICATION.md)
5+
- [CLUSTERING](pages/clustering/CLUSTERING.md)
6+
- [BUSINESS_INTELLIGENCE_AND_DATA_WAREHOUSE](pages/datamining_process/BUSINESS_INTELLIGENCE_AND_DATA_WAREHOUSE.md)
7+
- [DATA_PREPROCESSING](pages/preprocessing/DATA_PREPROCESSING.md)

pages/SCELTE DI PROGETTO.md

Lines changed: 0 additions & 6 deletions
This file was deleted.

pages/TIPI DI LEARNING.md

Lines changed: 0 additions & 4 deletions
This file was deleted.

pages/VALUTARE UN CLASSIFICATORE PROBABILISTICO.md

Lines changed: 0 additions & 11 deletions
This file was deleted.

pages/association_rules/APRIORI ALGORITHM.md renamed to pages/association_rules/APRIORI_ALGORITHM.md

Lines changed: 18 additions & 3 deletions
@@ -1,10 +1,23 @@
1+
---
2+
id: APRIORI_ALGORITHM
3+
aliases: []
4+
tags: []
5+
---
6+
7+
- [~] ---
8+
id: APRIORI ALGORITHM
9+
aliases: []
10+
tags: []
11+
index: 5
12+
---
13+
114
# APRIORI ALGORITHM
215

3-
The Apriori algorithm is a strategy to prune the tree of candidates in the [frequent item-set generation](FREQUENT%20ITEMSET%20GENERATION.md) phase; it is based on the Apriori principle.
16+
The Apriori algorithm is a strategy to prune the tree of candidates in the [frequent item-set generation](FREQUENT_ITEMSET_GENERATION.md) phase; it is based on the Apriori principle.
417

518
### APRIORI PRINCIPLE
619
If an itemset is frequent, then all of its subsets must also be frequent; conversely, if an itemset is infrequent, then all of its supersets are infrequent too.
7-
We can see this principle as follows:
20+
We can see this principle as follows:
821

922
$$
1023
\forall X,Y: (X \subset Y) \implies sup(X) \geq sup(Y)
@@ -26,4 +39,6 @@ flowchart TD
2639
C-->|repeat until the current level is empty|A
2740
```
2841

29-
The $threshold$ value is an important tuning parameter: it controls the complexity and sets the trade-off between the number of valid item-sets found and their quality.
42+
The $threshold$ value is an important tuning parameter: it controls the complexity and sets the trade-off between the number of valid item-sets found and their quality.
43+
44+
[PREVIOUS](FREQUENT_ITEMSET_GENERATION.md) [NEXT](FP-GROWTH.md)
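
The level-wise pruning described in APRIORI_ALGORITHM.md can be sketched in Python as follows. This is a minimal illustration, not the notes' own implementation: the function name `apriori`, the `min_sup` parameter (standing in for $threshold$) and the toy transactions are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {itemset: support} for every itemset with support >= min_sup."""
    n = len(transactions)

    def sup(itemset):
        # Fraction of transactions that contain the whole itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items if sup(frozenset([i])) >= min_sup}

    frequent = {}
    k = 1
    while level:  # repeat until the current level is empty
        for c in level:
            frequent[c] = sup(c)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori principle): drop any candidate that has an
        # infrequent (k-1)-subset, without counting its support at all.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = {c for c in candidates if sup(c) >= min_sup}
    return frequent


# Hypothetical toy data, only for illustration.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer"}),
    frozenset({"milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers"}),
]
result = apriori(transactions, min_sup=0.5)
```

Raising `min_sup` shrinks every level and therefore the amount of work, at the cost of returning fewer item-sets.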

pages/association_rules/ASSOCIATION RULES.md renamed to pages/association_rules/ASSOCIATION_RULES.md

Lines changed: 14 additions & 5 deletions
@@ -1,3 +1,10 @@
1+
---
2+
id: ASSOCIATION RULES
3+
aliases: []
4+
tags: []
5+
index: 1
6+
---
7+
18
# ASSOCIATION RULES
29

310
Association rules describe situations where the presence of a given element $\{A\}$ or a combination of elements $\{A,B\}$ implies the presence of another element $\{C\}$; they are based on statistics.
@@ -10,7 +17,7 @@ They are rules that describes situation where the presence of a given element $\
1017
- **SUPPORT** --> Fraction of transactions that contain an itemset.
1118
- **FREQUENT ITEMSET** --> An itemset whose support is greater than or equal to a minsup threshold.
1219

13-
Association rules can be described by the form
20+
Association rules can be described by the form
1421

1522
$$
1623
A \rightarrow C \space where \space A,C \in itemset
@@ -30,21 +37,23 @@ $$
3037

3138
### CONFIDENCE $conf$
3239

33-
The fraction of the transactions containing $A$ in which $C$ also appears
40+
The fraction of the transactions containing $A$ in which $C$ also appears
3441

3542
$$
3643
conf = \frac{(A,C)}{A}
3744
$$
3845

3946
#### CONFIDENCE FROM SUPPORT
4047

41-
confidence can also be computed from supports as
48+
confidence can also be computed from supports as
4249

4350
$$
44-
conf = \frac{(A,C)}{A} =\frac{\frac{(A,C)}{N}}{\frac{A}{N}} = \frac{sup(A,C)}{sup(A)}
51+
conf = \frac{(A,C)}{A} =\frac{\frac{(A,C)}{N}}{\frac{A}{N}} = \frac{sup(A,C)}{sup(A)}
4552
$$
4653

4754

4855
Support measures "how much" an occurrence can be considered a rule (there must be enough transactions backing it); a rule with low support may be produced by random associations.
4956

50-
Confidence measures how often a rule holds in the transactions that contain its antecedent.
57+
Confidence measures how often a rule holds in the transactions that contain its antecedent.
58+
59+
[NEXT](ASSOCIATION_RULES_MINING.md)
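
As a small worked example of the definitions in ASSOCIATION_RULES.md, the sketch below computes $sup$ and $conf$ directly from a list of transactions. The function names and the toy data are assumptions made for illustration; they do not come from the notes.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """conf(A -> C) = sup(A, C) / sup(A)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support(both, transactions) / support(a, transactions)


# Hypothetical toy data, only for illustration.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "diapers", "beer"}),
    frozenset({"milk", "diapers", "beer"}),
    frozenset({"bread", "milk", "diapers"}),
]

sup_bread_milk = support({"bread", "milk"}, transactions)          # 2 of 4 transactions
conf_bread_milk = confidence({"bread"}, {"milk"}, transactions)    # 2 of the 3 with bread
```

Here `{bread, milk}` appears in 2 of 4 transactions, and 2 of the 3 transactions containing bread also contain milk.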
Lines changed: 12 additions & 3 deletions
@@ -1,14 +1,23 @@
1+
---
2+
id: ASSOCIATION RULES MINING
3+
aliases: []
4+
tags: []
5+
index: 2
6+
---
7+
18
# ASSOCIATION RULES MINING
29

3-
Given a list of $N$ item-sets, the goal of this procedure is to find association rules whose $conf$ and $sup$ are greater than given thresholds
10+
Given a list of $N$ item-sets, the goal of this procedure is to find association rules whose $conf$ and $sup$ are greater than given thresholds
411

512
## BRUTE-FORCE APPROACH
613

714
Generate all possible combinations and compute $conf$ and $sup$ for each; this approach is always possible but computationally too expensive.
815

916
## TWO STEP APPROACH
1017

11-
This approach relies on the fact that rules generated from the same item-set have the same $sup$
18+
This approach relies on the fact that rules generated from the same item-set have the same $sup$
1219

13-
- **[frequent itemset generation](FREQUENT%20ITEMSET%20GENERATION.md)** -> in the first step, all item-sets with $sup \gt threshold$ are generated (**this step is still computationally expensive**)
20+
- **[frequent itemset generation](FREQUENT_ITEMSET_GENERATION.md)** -> in the first step, all item-sets with $sup \gt threshold$ are generated (**this step is still computationally expensive**)
1421
- **RULE GENERATION** -> in the second step, rules with high confidence are generated from the previously generated item-sets
22+
23+
[PREVIOUS](ASSOCIATION_RULES.md) [NEXT](RULES_GENERATION.md)
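
The second step of the two-step approach can be sketched as follows. Every rule split from one frequent item-set shares that item-set's $sup$, so only $conf$ has to be checked. The function name `rules_from_itemset`, the `support_of` lookup table and the sample supports are assumptions made for this illustration.

```python
from itertools import combinations


def rules_from_itemset(itemset, support_of, min_conf):
    """Return (antecedent, consequent, conf) for each high-confidence split.

    `support_of` maps frozensets to supports and is assumed to come from the
    frequent itemset generation step.
    """
    itemset = frozenset(itemset)
    rules = []
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            a = frozenset(antecedent)
            c = itemset - a
            # All rules from this itemset have support sup(itemset);
            # they differ only in sup(antecedent), hence in confidence.
            conf = support_of[itemset] / support_of[a]
            if conf >= min_conf:
                rules.append((a, c, conf))
    return rules


# Hypothetical supports, only for illustration.
support_of = {
    frozenset({"diapers"}): 0.75,
    frozenset({"beer"}): 0.5,
    frozenset({"diapers", "beer"}): 0.5,
}
rules = rules_from_itemset({"diapers", "beer"}, support_of, min_conf=0.8)
```

With these numbers only the rule beer -> diapers passes the confidence threshold, although both candidate rules have the same support.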

pages/association_rules/FP-GROWTH.md

Lines changed: 21 additions & 7 deletions
@@ -1,18 +1,25 @@
1-
The Apriori algorithm needs to generate the candidate sets, whose number can be very high!
2-
The FP-Growth algorithm consists of finding the shortest patterns and chaining them with suffixes.
1+
---
2+
id: FP-GROWTH
3+
aliases: []
4+
tags: []
5+
index: 6
6+
---
7+
8+
The Apriori algorithm needs to generate the candidate sets, whose number can be very high!
9+
The FP-Growth algorithm consists of finding the shortest patterns and chaining them with suffixes.
310
FP-Growth uses a compact representation of the DB via an FP-Tree, on which a recursive approach is used, following the "divide et impera" principle.
411

512
### HOW IT WORKS
613

714
1) The data are scanned to find the support of every single item. Non-frequent items are discarded. Frequent items are sorted by decreasing support.
815
2) A second scan is done to build the FP-Tree. When the first transaction is read, the A and B nodes are generated with a frequency count of 1.
9-
![](Pasted%20image%2020231231173158.png)
16+
![](Pasted_image_20231231173158.png)
1017
3) When the second transaction is read, a new set of nodes B, C and D is created, each one with its own path starting from the *null* node. Then, the subtree created during step 1) is linked to the newly generated one. The two paths do not overlap because they have different prefixes.
11-
![](Pasted%20image%2020231231173623.png)
18+
![](Pasted_image_20231231173623.png)
1219
4) If an overlapping path is found (it shares a prefix with an existing node, say A), the count of node A is increased by 1.
13-
![](Pasted%20image%2020231231174113.png)
20+
![](Pasted_image_20231231174113.png)
1421
5) The algorithm continues until the last transaction.
15-
![](Pasted%20image%2020231231174157.png)
22+
![](Pasted_image_20231231174157.png)
1623

1724
The tree is often smaller than the dataset, but its size depends on how the items are ordered within the transactions.
1825

@@ -21,4 +28,11 @@ The tree size is often lower than the dataset one, but it depends on the transac
2128
Then, FP-Growth proceeds with a ***bottom-up* strategy**:
2229
- The search goes from the least frequent item to the most frequent one. FP-Growth scans the tree looking for itemsets ending with the desired item (e.g. D).
2330
- Then it looks only for the paths that contain the D element. This search is sped up with a pointer data structure.
24-
- So the subproblem containing all the frequent itemsets that end in D has to be built. The search evaluates all the combinations found that include D and exceed $minSup$, in a divide-and-conquer fashion, from the leaves to the root.
31+
- So the subproblem containing all the frequent itemsets that end in D has to be built. The search evaluates all the combinations found that include D and exceed $minSup$, in a divide-and-conquer fashion, from the leaves to the root.
32+
33+
34+
35+
36+
37+
38+
[PREVIOUS](APRIORI_ALGORITHM.md)
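
The two scans that build the FP-Tree in FP-GROWTH.md can be sketched as follows. This is a simplified illustration under stated assumptions: the class `FPNode`, the function `build_fp_tree`, the absolute `min_count` threshold and the sample transactions are hypothetical, and the pointer structure that links nodes of the same item is left out.

```python
from collections import Counter


class FPNode:
    def __init__(self, item, parent=None):
        self.item = item        # None for the null root
        self.count = 0          # number of transactions sharing this prefix
        self.parent = parent
        self.children = {}      # item -> FPNode


def build_fp_tree(transactions, min_count):
    # First scan: count each item and discard the non-frequent ones.
    counts = Counter(i for t in transactions for i in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    root = FPNode(None)
    # Second scan: insert each transaction with its items sorted by
    # decreasing support (ties broken alphabetically, an arbitrary choice).
    for t in transactions:
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                # New prefix: create a fresh node on its own path.
                child = FPNode(item, node)
                node.children[item] = child
            # Shared prefix: only the counter grows.
            child.count += 1
            node = child
    return root, frequent


# Hypothetical toy data, only for illustration.
transactions = [
    ["A", "B"],
    ["B", "C", "D"],
    ["A", "C", "D", "E"],
    ["A", "D", "E"],
    ["A", "B", "C"],
]
root, frequent = build_fp_tree(transactions, min_count=2)
```

Transactions that start with the same frequent items reuse the same nodes, which is why the tree tends to be more compact than the raw data.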
