NOTE: As of 3.0.0, the three former functions (`map_file`, `map_terms`, `map_tagged_terms`) have been condensed into one. Users can rename any of these calls in old code to `map_terms`; the function inspects its input to preserve the behavior of each.
### Arguments
-For `map_file`, the first argument 'input_file' specifies a path to a file containing the terms to be mapped. It also has a `csv_column` argument that allows the user to specify a column to map if a csv is passed in as the input file.
-For `map_terms`, the first argument 'source_terms' takes in a list of the terms to be mapped.
-For `map_tagged_terms`, everything is the same as `map_terms` except the first argument is either a dictionary of terms to a list of tags, or a list of TaggedTerm objects (see below). Currently, the tags do not affect the mapping in any way, but they are added to the output dataframe at the end of the process.
+For `map_terms`, the first argument can be any of the following: 1) a string that specifies a path to a file containing the terms to be mapped, 2) a list of the terms to be mapped, or 3) a dictionary of terms to a list of tags, or a list of TaggedTerm objects (see below).
+Currently, the tags do not affect the mapping in any way, but they are added to the output dataframe at the end of the process. The exception is the Ignore tag, which causes the term not to be mapped at all, although it is still output in the results if the `incl_unmapped` argument is True (see below).
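The input handling described above can be pictured as a dispatch on the type of the first argument. The following is only a minimal sketch of that idea, not text2term's actual code: `TaggedTermSketch` and `classify_input` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedTermSketch:
    """Hypothetical stand-in for a tagged term (not text2term's TaggedTerm)."""
    term: str
    tags: list = field(default_factory=list)


def classify_input(source_terms):
    """Report which of the accepted input kinds was passed (illustrative only)."""
    if isinstance(source_terms, str):
        return "file_path"          # 1) path to a file of terms
    if isinstance(source_terms, dict):
        return "tag_dictionary"     # 3) dictionary of terms to lists of tags
    if isinstance(source_terms, (list, tuple)):
        if source_terms and all(isinstance(t, TaggedTermSketch) for t in source_terms):
            return "tagged_term_list"   # 3) list of tagged-term objects
        return "term_list"          # 2) plain list of terms
    raise TypeError(f"Unsupported input type: {type(source_terms).__name__}")
```

Under this assumption, old calls to any of the three former functions could pass their first argument unchanged, since each input kind is recognizable by its type alone.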
All other arguments are the same, and have the same functionality:
@@ -115,6 +86,9 @@ All other arguments are the same, and have the same functionality:
Map only to ontology terms whose IRIs start with one of the strings given in this tuple, for example:
+Allows the user to specify a column to map if a csv is passed in as the input file. Ignored if the input is not a file path.
+
`source_terms_ids` : tuple
Collection of identifiers for the given source terms
WARNING: While this is still available for the tagged term function, dictionaries do not necessarily preserve order, so it is not recommended. If using TaggedTerm objects, the source term identifiers can be attached to each object to guarantee order.
@@ -141,12 +115,18 @@ All other arguments are the same, and have the same functionality:
`save_mappings` : bool
Save the generated mappings to a file (specified by `output_file`)
+`seperator` : str
+Character that separates the source term values if a file input is given. Ignored if the input is not a file path.
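To make the role of the separator and the column selection concrete, here is a small sketch using only Python's standard `csv` module. The helper `read_term_column` is hypothetical and is not part of text2term; it only illustrates how a separator character and a column name could select the terms from a delimited file.

```python
import csv
import io


def read_term_column(file_text, column, separator=","):
    """Return the values of one named column from delimited text (illustrative)."""
    reader = csv.DictReader(io.StringIO(file_text), delimiter=separator)
    return [row[column] for row in reader]


# Tab-separated input with a header row; only the "term" column is read.
sample = "id\tterm\n1\tasthma\n2\tdiabetes\n"
terms = read_term_column(sample, column="term", separator="\t")
```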
+
`use_cache` : bool
Use the cache for the ontology. More details are below.
`term_type` : str
Determines whether the ontology should be parsed for its classes (ThingClass), properties (PropertyClass), or both. Possible values are 'classes', 'properties', and 'both'. Any other value raises a ValueError.
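The validation described for `term_type` could look roughly like the following. This is a sketch under the assumption that the check is a simple membership test; `check_term_type` and `VALID_TERM_TYPES` are hypothetical names, not the library's own.

```python
# The three values listed in the documentation above.
VALID_TERM_TYPES = ("classes", "properties", "both")


def check_term_type(term_type):
    """Return term_type unchanged, or raise ValueError if it is not recognized."""
    if term_type not in VALID_TERM_TYPES:
        raise ValueError(
            f"term_type must be one of {VALID_TERM_TYPES}, got {term_type!r}"
        )
    return term_type
```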
+`incl_unmapped` : bool
+Include all unmapped terms in the output. If a term has been tagged Ignore (see below) or falls below the `min_score` threshold, it is included without a mapped term at the end of the output.
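The effect of `incl_unmapped` can be sketched as follows. This is an illustration of the described behavior only: `collect_results`, its argument shapes, and the threshold value used here are assumptions made for the example, and the real output is a dataframe rather than a list of tuples.

```python
def collect_results(scored_terms, ignored_terms, min_score=0.3, incl_unmapped=False):
    """Build output rows from scored mappings (illustrative only).

    scored_terms: dict of source term -> (mapped label, score)
    ignored_terms: terms tagged Ignore, which are never mapped
    """
    mapped = []
    unmapped = list(ignored_terms)
    for term, (label, score) in scored_terms.items():
        if score >= min_score:
            mapped.append((term, label, score))
        else:
            unmapped.append(term)   # below threshold: treated as unmapped
    rows = list(mapped)
    if incl_unmapped:
        # Unmapped terms go at the end, without a mapped term or score.
        rows.extend((term, None, None) for term in unmapped)
    return rows
```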
+
All default values, if they exist, can be seen above.
### Return Value
@@ -185,9 +165,6 @@ As of version 1.2.0, text2term includes regex-based preprocessing functionality
Like the "map" functions above, the two functions differ on whether the input is a file or a list of strings:
@@ -202,7 +179,7 @@ NOTE: As of version 2.1.0, the arguments were changed to "blocklist" from "black
The Remove Duplicates `rem_duplicates` functionality will remove all duplicate terms after processing, if set to `True`.
WARNING: Removing duplicates at any point does not guarantee which original term is kept. This is particularly important if original terms have different tags, so user caution is advised.
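The warning above is easier to see with a small example. The sketch below assumes one possible policy (keep the first occurrence of each preprocessed term); the library does not promise this or any other policy, and `remove_duplicates` here is a hypothetical helper.

```python
def remove_duplicates(processed):
    """Keep one entry per preprocessed term; here, the first one seen.

    processed: list of (original_term, preprocessed_term, tags) tuples.
    """
    kept = {}
    for original, cleaned, tags in processed:
        kept.setdefault(cleaned, (original, cleaned, tags))
    return list(kept.values())


# Two originals collapse to the same preprocessed term but carry different tags.
entries = [
    ("Asthma ", "asthma", ["disease"]),
    ("asthma!", "asthma", ["ignore"]),
]
deduplicated = remove_duplicates(entries)
```

With this policy the second entry and its `ignore` tag are silently dropped; a different policy would drop the first instead, which is why caution is advised when tags differ.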
-The functions `preprocess_file()` and `preprocess_terms()`both return a dictionary where the keys are the original terms and the values are the preprocessed terms.
+The function `preprocess_terms()` returns a dictionary where the keys are the original terms and the values are the preprocessed terms.
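The shape of that return value can be illustrated with a generic regex-based sketch. This is not how text2term implements preprocessing; `preprocess_sketch` and its pattern handling are assumptions made purely to show a dictionary keyed by original term.

```python
import re


def preprocess_sketch(terms, patterns):
    """Map each original term to a cleaned version (illustrative only).

    Every regex in `patterns` is removed from the term, then whitespace is trimmed.
    """
    result = {}
    for term in terms:
        cleaned = term
        for pattern in patterns:
            cleaned = re.sub(pattern, "", cleaned)
        result[term] = cleaned.strip()
    return result


# Strip parenthesized qualifiers; keys stay the untouched original terms.
cleaned_terms = preprocess_sketch(["asthma (disorder)", "Fever  "], [r"\s*\(.*?\)"])
```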
The `preprocess_tagged_terms()` function returns a list of TaggedTerm items with the following function contracts:
-As mentioned in the mapping section above, this can then be passed directly to map_tagged_terms(), allowing for easy programmatic usage. Note that this allows multiple of the same preprocessed term with different tags.
+As mentioned in the mapping section above, this can then be passed directly to `map_terms`, allowing for easy programmatic usage. Note that the list may contain the same preprocessed term multiple times with different tags.
**Note on NA values in input**: As of v2.0.3, when the input to text2term is a table file, any rows that contain `NA` values in the specified term column, or in the term ID column (if provided), will be ignored.
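A rough sketch of the row filtering described in this note, using only the standard library. It assumes that missing values appear as the literal string `NA`; the helper `read_rows_skipping_na` is hypothetical and does not reflect how text2term reads tables internally.

```python
import csv
import io


def read_rows_skipping_na(file_text, term_column, id_column=None, separator=","):
    """Return tuples of the selected columns, dropping rows with NA in any of them."""
    columns = [term_column] if id_column is None else [term_column, id_column]
    reader = csv.DictReader(io.StringIO(file_text), delimiter=separator)
    kept = []
    for row in reader:
        if any(row[c] == "NA" for c in columns):
            continue  # NA in the term column or the (optional) ID column
        kept.append(tuple(row[c] for c in columns))
    return kept


table = "id,term\n1,asthma\nNA,fever\n3,NA\n4,cough\n"
with_ids = read_rows_skipping_na(table, term_column="term", id_column="id")
```

Note that a row with `NA` only in the ID column is kept when no ID column is specified, since only the selected columns are checked.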
+### Tag Usage
+As of 3.0.0, some tags have additional functionality that is added when attached to a term:
+
+IGNORE:
+If an ignore tag is added to a term, that term will not be mapped to any terms in the ontology. It will only be included in the output if the `incl_unmapped` argument is True. The following values count as ignore tags: