Science has become data-driven in many fields. The more we move in this direction, the more clearly scientific work splits into different types:

* **theoretical**, where new results are obtained;
* **applied**, where previously obtained results are used to solve some practical problems;
* **science craft**, which could be summarised in one question: "How does one obtain scientific results?".

Since the last part is treated as an illegitimate child, contributors in that field are underestimated: their software receives fewer citations, and their methodological articles are less welcome in scientific journals. It is possible that some disciplines have standards and tutorials that list all the necessary steps.

I am a field linguist, so I have spent a lot of time working in villages of the Caucasus collecting audio recordings from speakers of indigenous languages. Theoretical linguists will be interested, for example, in whether [ejective consonants](https://en.wikipedia.org/wiki/Ejective_consonant) (a special type of consonant; you can listen to them on the Wikipedia page) and their acoustic characteristics are common to all three endemic language families of the Caucasus. Applied linguists will be interested, for example, in creating a sound generation system for some of those languages. Linguistic crafters should be interested in how to collect data in a way that lets both theoretical and applied linguists do their job. I wrote *should*, because I have never seen any linguistic crafters -- in most cases their functions were carried out by whichever linguist happened to be in the field.

During my fieldtrips I created my own pipeline for collecting data. It was a combination of scripts written in different programming languages, without any thought for the future. What they were doing was not science at all: automatic renaming, merging, automatic pre-annotation of files, making backups, visualising some data, etc. The problems started when I tried to pass this knowledge on to my students. Not all of them were familiar with all the programming languages, some of the code became outdated, and some of it did not work properly on all operating systems. That is why I decided to create a package that would become a toolkit for the phonetic researcher: written in one programming language and simple enough for non-coders and people who are not familiar with R.

## `phonfieldwork`

Most phonetic research consists of the following steps:

1. Formulate a research question. Think of what kind of data is necessary to answer this question, what is the appropriate amount of data, what kind of annotation you will do, what kind of statistical models and visualizations you will use, etc.
2. Create a list of stimuli.
3. Elicit the list of stimuli from speakers who have signed an Informed Consent statement, agreeing to participate in the experiment and to be recorded on audio and/or video. Keep an eye on the recording settings: the sampling rate, resolution (bit depth), and number of channels should be the same across all recordings.
4. Annotate the collected data.
5. Extract the collected data.
6. Create visualizations and evaluate your statistical models.
7. Report your results.
8. Publish your data.

The phonfieldwork package was created to help with step 3, partially with step 4, and with steps 5 and 8. If you are interested in the whole pipeline, please read the [Get started section](https://ropensci.github.io/phonfieldwork/articles/phonfieldwork.html) of the package documentation. I will try to show some of its more peculiar features.
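For example, for step 3 the package can render a presentation with one stimulus per slide to show to speakers during recording. Here is a minimal sketch (the stimuli are made up, the `create_presentation()` arguments follow the package documentation -- double-check them against your installed version -- and it assumes the package is loaded, as shown below):

```{r, eval = FALSE}
# a made-up stimuli list; replace it with your own
my_stimuli <- c("tip", "tap", "top")

# render an HTML presentation with one stimulus per slide
create_presentation(stimuli = my_stimuli,
                    output_file = "first_example",
                    output_dir = getwd())
```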

## What can be done with `phonfieldwork`?

Let's load the package:

```{r}
library(phonfieldwork)
```


### Sound annotation formats

One of the goals was to convert multiple sound annotation formats into the data.frame format, so you can find a whole bunch of functions that serve this goal:

* the `.TextGrid` file from Praat (just replace the `system.file()` call with the path to your own file); see also the [`rPraat`](https://fu.ff.cuni.cz/praat/#rpraat-package-for-r) and [`textgRid`](https://github.com/patrickreidy/textgRid) packages
```{r}
textgrid_to_df(system.file("extdata", "test.TextGrid", package = "phonfieldwork"))
```

* the `.eaf` file from ELAN (just replace the `system.file()` call with the path to your own file); see also the [FRelan](https://github.com/langdoc/FRelan) package by Niko Partanen

```{r}
eaf_to_df(system.file("extdata", "test.eaf", package = "phonfieldwork"))
```

* the `.exb` file from EXMARaLDA (just replace the `system.file()` call with the path to your own file)
```{r}
exb_to_df(system.file("extdata", "test.exb", package = "phonfieldwork"))
```

* the `.srt` subtitles file (just replace the `system.file()` call with the path to your own file)

```{r}
srt_to_df(system.file("extdata", "test.srt", package = "phonfieldwork"))
```

* the `.txt` file from Audacity

```{r}
audacity_to_df(system.file("extdata", "test_audacity.txt", package = "phonfieldwork"))
```

There is also an option to work with `.flextext` files from FLEx, but since it is not really connected to phonetics, I will skip this part.
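Since all these functions return data.frames with a common structure, annotations from different sources are easy to combine and analyse with ordinary R tools. A minimal sketch (it assumes the `time_start` and `time_end` columns shown in the outputs above; check the column names in your version of the package):

```{r}
# convert two different annotation formats to data.frames
tg <- textgrid_to_df(system.file("extdata", "test.TextGrid", package = "phonfieldwork"))
eaf <- eaf_to_df(system.file("extdata", "test.eaf", package = "phonfieldwork"))

# stack the two annotations by their shared columns
shared <- intersect(colnames(tg), colnames(eaf))
both <- rbind(tg[, shared], eaf[, shared])

# e.g. compute the duration of each annotated interval
both$duration <- both$time_end - both$time_start
head(both)
```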

### Sound visualisation

Sound visualisation is a common task that can be solved with different programs and R packages; it is also possible with `phonfieldwork`:

```{r}
file <- system.file("extdata", "test.wav", package = "phonfieldwork")
draw_sound(file)
```

The peculiar thing is that it is possible to zoom into a part of the sound together with its spectrogram (the `zoom` argument takes time values in seconds):

```{r}
draw_sound(file, zoom = c(0.2, 0.4))
```
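As far as I can tell from the package documentation, `draw_sound()` also accepts an `output_file` argument, so the same picture can be written to disk instead of plotted -- handy when you need one image per sound chunk (the file name here is made up):

```{r, eval = FALSE}
# save the zoomed plot as a picture instead of drawing it
draw_sound(file, zoom = c(0.2, 0.4), output_file = "test_zoomed")
```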

It is also possible to visualise any sound annotation format that was converted to a dataframe in the previous section:

```{r}
our_textgrid <- system.file("extdata", "test.TextGrid", package = "phonfieldwork")
draw_sound(file,
           annotation = textgrid_to_df(our_textgrid))
```

### Sound viewer

If you have folders with small sound chunks and their visualisations, it is possible to create a sound viewer like [this one](https://ropensci.github.io/phonfieldwork/additional/stimuli_viewer.html). This is done with the `create_viewer()` function:

```{r, eval = FALSE}
create_viewer(audio_dir = ".../sounds/",     # path to the folder with sounds
              picture_dir = ".../pictures/", # path to the folder with pictures
              table = df,                    # dataframe with additional information
              output_dir = "...",            # where to store the result
              output_file = "...")           # how to name the result file
```
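The `table` argument expects a dataframe with one row per sound chunk; as far as I understand the documented workflow, rows are matched to the sorted files in the folders, so keep the order consistent. A minimal sketch of such a dataframe (the columns here are purely illustrative):

```{r, eval = FALSE}
# a made-up table: one row per sound/picture pair in the folders above
df <- data.frame(stimulus = c("tip", "tap", "top"),
                 speaker = c("sp1", "sp1", "sp2"))
```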

If you are familiar with my package `lingtypology` (Moroz 2017) for interactive linguistic map generation and for accessing the APIs of typological databases, there is good news for you: it is possible to connect the two packages, creating an interactive map that shares the same hear and view buttons. Here is [an example](https://ropensci.github.io/phonfieldwork/additional/stimuli_viewer2.html).

I really hope that this format will become a new tool for searching, analysing and sharing phonetic data. However, there is always a risk that such a tool can be misused, so the rOpenSci board asked me to write a [text about ethical research with phonfieldwork](https://ropensci.github.io/phonfieldwork/articles/ethical_research_with_phonfieldwork.html).

## Acknowledgements

I would like to thank

* my rOpenSci reviewers Jonathan Keane and Niko Partanen for their interesting comments and ideas;
* Melina Vidoni for being the package review editor;
* participants of the seminars of the School of Linguistics and the Linguistic Convergence Laboratory at HSE, Moscow, where I first presented this package;
* my friends Neige Rochant and Samira Verhees for sharing their data and problems with me, which made `phonfieldwork` better.

## Announcement

Dear all,

It’s my pleasure to announce a new stable release of the phonfieldwork package for R (v. 0.0.10), which has just passed rOpenSci review. The main goal of this package is to provide a wide variety of tools that can be used by phonetic researchers in the field and in laboratories. The package helps to collect, annotate and convert different sound annotation formats to tables. It is also possible to create sound viewers (like [this one](https://ropensci.github.io/phonfieldwork/additional/stimuli_viewer.html)) that can be a handy tool for data analysis and data sharing in phonetics. Work with the package is documented [here](https://ropensci.github.io/phonfieldwork/), but I would be happy to answer your questions via GitHub issues or email.

George Moroz,
Linguistic Convergence Laboratory, HSE