Commit 1bf4fb2

Update of Data Generation to accept file-based input data (#11)
* Added input-file based utterances and entities
* Updated Data Generation Notebook to use csv-based input data
* Fixed comment from review, added ps1 to run scenarios for testing, updated documentation to use example assets in sample commands
* Updated sample config
1 parent 920e02e commit 1bf4fb2

12 files changed: +21361 -235 lines changed

.gitignore

+2
@@ -112,6 +112,8 @@ config.ini
 # Working directories
 input/
 output/
+output_files/
+assets/customer_data/

 # Virtual envs
 .venv/

GET_YOUR_KEYS.md

+1
@@ -10,6 +10,7 @@ In this section, we describe how to set up ffmpeg in case you do TTS.
 #### Windows
 - Download the latest version of [ffmpeg](https://ffmpeg.org/download.html#build-windows).
 - Extract the archive locally and copy the file `bin/ffmpeg.exe` to a location of your choice, e.g. to the `assets` folder of GLUE.
+- Add the location to the system environment variables "PATH" setting.
 - Finally, copy the path and insert it in your config.ini as below.
 ```
 [driver]

README.md

+7-7
@@ -166,7 +166,7 @@ This scenario describes how you can batch-transcribe audio files using GLUE. A p
 2. Make sure your `.venv` is activated.
 3. Run the following command:
 ```bash
-python src/glue.py --audio C:/audio_files/ --do_transcribe
+python src/glue.py --audio assets\examples\input_files\audio --do_transcribe
 ```
 4. Wait for the run to finish.

@@ -190,7 +190,7 @@ This scenario describes how you can batch-synthesize text data using GLUE. A pot
 2. Make sure your `.venv` is activated.
 3. Run the following command:
 ```bash
-python src/glue.py --input ../input-files/text.csv --do_synthesize
+python src/glue.py --input assets\examples\input_files\example_tts.csv --do_synthesize
 ```
 4. Wait for the run to finish.

@@ -227,7 +227,7 @@ This scenario shows how you can use GLUE to batch-score textual data on a LUIS-e
 2. Make sure your `.venv` is activated.
 3. Run the following command:
 ```bash
-python src/glue.py --input /home/files/luis-utterances.csv --do_scoring
+python src/glue.py --input assets\examples\input_files\example_luis.csv --do_scoring
 ```
 4. Wait for the run to finish and see the command line outputs.

@@ -263,7 +263,7 @@ This scenario describes how you can compare already existing recognitions with a
 2. Make sure your `.venv` is activated.
 3. Run the following command:
 ```bash
-python src/glue.py --input ../input/transcriptions.txt --do_evaluate
+python src/glue.py --input assets\examples\output_files\example_transcriptions_full.csv --do_evaluate
 ```
 4. Wait for the run to finish and see the command line outputs.

@@ -292,7 +292,7 @@ This scenario describes how you can batch-transcribe audio files and compare the
 2. Make sure your `.venv` is activated.
 3. Run the following command:
 ```bash
-python src/glue.py --audio C:/audio_files/ --input C:/audio_files/transcriptions.txt --do_transcribe --do_evaluate
+python src/glue.py --audio assets\examples\input_files\audio --input assets\examples\input_files\example_stt_eval.csv --do_transcribe --do_evaluate
 ```
 4. Wait for the run to finish and see the command line outputs.

@@ -322,7 +322,7 @@ This scenario describes how you can batch-transcribe audio files, compare these
 2. Make sure your `.venv` is activated.
 3. Run the following command:
 ```bash
-python src/glue.py --audio C:/audio_files/ --input C:/audio_files/transcriptions.txt --do_transcribe --do_evaluate --do_scoring
+python src/glue.py --audio assets\examples\input_files\audio --input assets\examples\input_files\example_stt_eval_luis.csv --do_transcribe --do_evaluate --do_scoring
 ```
 4. Wait for the run to finish and see the command line outputs.

@@ -363,7 +363,7 @@ This scenario describes how you can batch-transcribe audio files and score both
 2. Make sure your `.venv` is activated.
 3. Run the following command:
 ```bash
-python src/glue.py --audio C:/audio_files/ --input C:/audio_files/transcriptions.txt --do_transcribe --do_scoring
+python src/glue.py --audio assets\examples\input_files\audio --input assets\examples\input_files\example_luis.csv --do_transcribe --do_scoring
 ```
 4. Wait for the run to finish and see the command line outputs.
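
The updated sample commands in this README diff use Windows-style backslashes, which a POSIX shell would treat as escape characters. As a rough sketch (not part of the commit), the same argument lists could be assembled with `pathlib` so the separators follow the host OS. The scenario names and the `build_command` helper are hypothetical; only the flags and the example paths are taken from the diff.

```python
import sys
from pathlib import Path

# Paths and flags are copied from the README diff; everything else is illustrative.
GLUE = Path("src") / "glue.py"
INPUT_FILES = Path("assets") / "examples" / "input_files"
OUTPUT_FILES = Path("assets") / "examples" / "output_files"

# Hypothetical scenario names mapped to the arguments shown in the sample commands.
SCENARIOS = {
    "transcribe": ["--audio", INPUT_FILES / "audio", "--do_transcribe"],
    "synthesize": ["--input", INPUT_FILES / "example_tts.csv", "--do_synthesize"],
    "score": ["--input", INPUT_FILES / "example_luis.csv", "--do_scoring"],
    "evaluate": ["--input", OUTPUT_FILES / "example_transcriptions_full.csv", "--do_evaluate"],
    "transcribe_evaluate": [
        "--audio", INPUT_FILES / "audio",
        "--input", INPUT_FILES / "example_stt_eval.csv",
        "--do_transcribe", "--do_evaluate",
    ],
}


def build_command(scenario, python=sys.executable):
    """Return the argument list for one scenario, with OS-appropriate path separators."""
    return [python, str(GLUE)] + [str(arg) for arg in SCENARIOS[scenario]]


if __name__ == "__main__":
    # Only print the commands; nothing is executed here.
    for name in SCENARIOS:
        print(name, "->", " ".join(build_command(name)))
```

A list built this way can be handed to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting altogether.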

assets/examples/input_files/example-luis-app-with-generated-data.lu

+964
Large diffs are not rendered by default.

@@ -0,0 +1,47 @@
+intent;text
+BookFlight;a ticket to {Airport} please
+BookFlight;could you book a flight to {Airport} on 2nd of april for me?
+BookFlight;i want a flight to {Airport}
+BookFlight;i would like to book a flight to {Airport}.
+BookFlight;no, i want to fly to {Airport}, not to {Airport}.
+BookSeat;book a seat as close as possible to the galley on my flight to {Airport}
+BookSeat;book a seat for me on my preferred seat.
+BookSeat;i need a special seat on my flight
+BookSeat;i want to have a window seat
+BookSeat;i would like to book a seat on my flight to {Airport}
+CancelFlight;cancel flight to {Airport}
+CancelFlight;cancellation of flight to {Airport}.
+CancelFlight;i need to cancel my flight to {Airport}
+CancelFlight;i want to cancel my journey to {Airport}.
+CancelFlight;please cancel my flight to {Airport}
+ChangeFlight;change flight to {Airport}.
+ChangeFlight;change my flight from {Airport} to {Airport}
+ChangeFlight;i need to change my flight from {Airport} to {Airport}
+ChangeFlight;i would like to change my flight.
+ChangeFlight;please rebook my flight to {Airport}, please.
+DepartureInfo;could you check whether my flight has a delay?
+DepartureInfo;i need the status of my flight {FlightNumber}
+DepartureInfo;is my flight on time?
+DepartureInfo;what is the status of my flight {FlightNumber}
+DepartureInfo;when does {FlightNumber} depart?
+GetEntities;{first_name} {last_name}
+GetEntities;my name is {first_name} {last_name}
+GetEntities;i am {first_name} {last_name}
+GetEntities;my firstname is {first_name}
+GetEntities;my lastname is {last_name}
+GetEntities;It's {last_name}
+GetEntities;it is {first_name}
+GetEntities;{first_name} {last_name}.
+GetEntities;my name is {first_name} {last_name}.
+GetEntities;i am {first_name} {last_name}.
+GetEntities;my firstname is {first_name}.
+GetEntities;my lastname is {last_name}.
+GetEntities;It's {last_name}.
+GetEntities;it is {first_name}.
+None;What are you talking about?
+None;You are dumb!
+None;Do you like candy?
+None;how are you doing
+None;i want to speak to a human
+None;what can you do
+None;whats the weather like today

@@ -0,0 +1,36 @@
+entity_name;entity_value
+first_name;Peter
+first_name;Paul
+first_name;Mary
+first_name;Margret
+first_name;Theo
+first_name;Tom
+first_name;Maria
+first_name;Linda
+first_name;Lisa
+last_name;Coskun
+last_name;Delaunay
+last_name;Maier
+last_name;Ebner
+last_name;Inhestern
+last_name;Rasztawicki
+last_name;Reichenbach
+last_name;Kistner
+last_name;Mahoney
+last_name;Müller
+FlightNumber;AB1234
+FlightNumber;LH7721
+FlightNumber;AA1834556
+FlightNumber;CP9919223
+FlightNumber;LH1346
+FlightNumber;AA1994557
+FlightNumber;Ab9164
+Airport;Berlin
+Airport;Tokyo
+Airport;New York
+Airport;Tel Aviv
+Airport;London
+Airport;Paris
+Airport;Los Angeles
+Airport;Toronto
+Airport;Seattle
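
The two semicolon-separated files above pair utterance templates containing `{placeholder}` slots with a list of entity values. The notebook code that consumes them is not shown in this diff, so the following is only a minimal sketch of how such files might be combined, assuming each placeholder is filled with a randomly chosen value of the matching `entity_name`. The function names and the inline excerpts are illustrative, not the repository's implementation.

```python
import csv
import io
import random
import re

# Matches slots such as {Airport} or {first_name}.
PLACEHOLDER = re.compile(r"\{(\w+)\}")

# Short excerpts of the two example files, inlined to keep the sketch self-contained.
UTTERANCES = """intent;text
BookFlight;i want a flight to {Airport}
DepartureInfo;when does {FlightNumber} depart?
None;what can you do
"""
ENTITIES = """entity_name;entity_value
Airport;Berlin
Airport;Tokyo
FlightNumber;AB1234
"""


def load_entities(handle):
    """Group entity values by entity name from a semicolon-separated file."""
    entities = {}
    for row in csv.DictReader(handle, delimiter=";"):
        entities.setdefault(row["entity_name"], []).append(row["entity_value"])
    return entities


def fill_template(text, entities, rng):
    """Replace each {slot} with a random value; slots without known values stay as-is."""
    def substitute(match):
        values = entities.get(match.group(1))
        return rng.choice(values) if values else match.group(0)
    return PLACEHOLDER.sub(substitute, text)


def generate(handle, entities, rng):
    """Yield (intent, filled text) pairs, one per template row."""
    for row in csv.DictReader(handle, delimiter=";"):
        yield row["intent"], fill_template(row["text"], entities, rng)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs draw the same values
    entities = load_entities(io.StringIO(ENTITIES))
    for intent, text in generate(io.StringIO(UTTERANCES), entities, rng):
        print(f"{intent};{text}")
```

Each slot is drawn independently here, so a template with two `{Airport}` slots may receive the same value twice; whether the actual generator prevents that is not visible in this diff.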
