\onecolumn \title[One-Step, Three-Factor Passthought Authentication]{One-Step, Three-Factor Passthought Authentication with Custom-Fit, In-Ear EEG}
\author[\firstAuthorLast ]{\Authors} \address{} \correspondance{} \maketitle
The hardware that drives EEG-based BCIs has improved dramatically over the past five years, decreasing in size and cost by orders of magnitude (\cite{Grierson2011a}). Many consumer devices leverage this technology: as of December 2018, there are at least seven EEG devices on the market, ranging from 100 to 500 USD, and featuring one to sixteen electrodes. Many of them transmit data wirelessly to computers and smart devices. Meanwhile, advances in machine learning have radically improved the reliability of BCI applications. Taken together, prospects seem bright for the wider adoption of BCIs in everyday life.
However, the head-worn form factor and awkward visibility of EEG-based BCIs have proven a stubborn challenge to BCI adoption (\cite{Mihajlovic2015}). Both disabled and healthy subjects complain about the comfort of head-worn devices, the difficulty of applying electrodes correctly to the scalp, and the questionable aesthetics of wearing such a visible device in public, social settings (\cite{Ekandem2012,DavidHairston2014}).
One possible solution to this problem is to embed EEG electrodes in earbuds, collecting EEG signals from the ear canal. While early work framed in-ear EEG largely as a tradeoff between ergonomics and signal quality (\cite{Kidmose2013a}), in-ear EEG signals are at least robust enough to detect auditory evoked responses (\cite{Kidmose2012}), and more recent work has indicated that EEG collected in the ear may have its own, unique affordances. For example, one study built a rudimentary eye-tracker using ocular signals (EOG, or electrooculography) collected from the ear canal (\cite{Manabe2013}).
To test in-ear EEG’s capacity to produce usable BCI applications, this paper attempts to use the sensing modality to construct a brain-based authentication system (\cite{Chuang2013b}) using custom-fit EEG earbuds. Authentication relies on one or more factors: knowledge (something one knows), possession (something one has), or inherence (properties of one’s body). While multifactor authentication provides added security over single-factor authentication such as passwords, multiple factors typically require multiple steps (e.g., entering a password, then entering a code from one’s cellphone). One particular brain-based authentication strategy, passthoughts, combines multiple factors of authentication into a single step: a knowledge factor (one’s secret thought), and a biometric factor (the unique way one expresses that thought neurally) (\cite{Chuang2014}). By incorporating a custom-fit earbud, we set out to combine all three factors of authentication into a single step (Figure \ref{fig:earpiece_diagram}).
This paper makes several distinct contributions. First, we achieve 99.82\% authentication accuracy with a zero false acceptance rate (FAR) using personalized custom-fit three-channel EEG earpieces and a passthoughts authentication paradigm. Second, we quantify the improvements over prior art in authentication accuracy due to the use of custom-fit versus generic earpieces, and the use of multiple electrodes versus a single electrode. Third, we evaluate multiple classification strategies that allow us to compare the relative contributions of the inherence factor and knowledge factor to authentication accuracy. Fourth, we perform simulation attacks to demonstrate the method’s robustness against impersonation via four scenarios where the attacker has access to the target’s earpiece and/or secret passthoughts.
Collectively, we build a case that in-ear EEG could offer a viable, usable road to accurate BCI applications, for healthy individuals or persons with disabilities. In addition, we argue that passthoughts authentication using personalized custom-fit earpieces offers a viable and attractive path towards one-step three-factor authentication.
The concept of in-ear EEG was introduced in 2011 with a demonstration of the feasibility of recording brainwave signals from within the ear canal (\cite{Looney2011}). The in-ear placement can produce signal-to-noise ratios comparable to those from conventional EEG electrode placements, is robust to common sources of artifacts, and can be used in a brain-computer interface (BCI) system based on auditory and visual evoked potentials (\cite{Kidmose2013a}). One previous study attempted to demonstrate user authentication using in-ear EEG, but was only able to attain an accuracy level of 80\%, limited by the use of a consumer-grade device with a single generic-fit electrode (\cite{curran2016passthoughts}). A follow-up study with a single, generic-fit electrode achieved an accuracy of 95.7\% over multiple days (\cite{nakamura2018ear}).
The use of EEG as a biometric signal for user authentication has a relatively short history. In 2005, Thorpe et al. motivated and outlined the design of a passthoughts system (\cite{Thorpe2005}). Since 2002, a number of independent groups have achieved 99-100\% authentication accuracy for small populations using research-grade and consumer-grade scalp-based EEG systems (\cite{Poulos2002,Marcel2007a,Ashby2011,Chuang2013b}). Several recent works on brainwave biometrics have independently demonstrated individuals’ EEG permanence over one to six months (\cite{Armstrong2015,Maiorana2016}) or even over one year (\cite{Ruiz2017}).
Behavioral authentication methods such as keystroke dynamics and speaker authentication can be categorized as one-step two-factor authentication schemes. In both cases, the knowledge factor (password or passphrase) and inherence factor (typing rhythm or speaker’s voice) are employed (\cite{Monrose1997}). In contrast, the Nymi band supports one-step two-factor authentication via the inherence factor (cardiac rhythm that is supposed to be unique to each individual) and the possession factor (the wearing of the band on the wrist) (\cite{Nymi}). However, as far as we know, no one has proposed or demonstrated a one-step three-factor authentication scheme.
When proposing or evaluating authentication paradigms, robustness against imposters is often a first consideration, but the usability of these systems is of equal importance, as they must conform to a person’s needs and lifestyle to warrant adoption and prolonged use. Sasse et al. describe usability issues with common knowledge-based systems like alphanumeric passwords, in particular that a breach in systems which require users to remember complex passwords that must be frequently changed is a failure on the part of the system’s design, not the fault of the user (\cite{sasse2001}). Other research analyzed some of the complexities of applying human factors heuristics for interface design to authentication, and indicated the importance of social acceptability, learnability, and simplicity of authentication methods (\cite{braz2006}). Technologies worn on the head entail particular usability issues; in their analysis of user perceptions of headworn devices, Genaro et al. identified design, usability, ease of use, and obtrusiveness among the top ten concerns of users, as well as qualitative comments around comfort and “looking weird” (\cite{Genaro2014}).
Mobile and wearable technologies’ continuous proximity to the user’s body provides favorable conditions for unobtrusively capturing biometrics for authentication. Many such uses have been proposed that embrace usability like touch-based interactions (\cite{Tartz2015,Holz2015}) and walking patterns (\cite{Lu2014}) using mobile phones, as well as identification via head movements and blinking in head-worn devices (\cite{Rogers2015}). However, these typically draw only from the inherence factor. Chen et al. proposed an inherence and knowledge two-factor method for multi-touch mobile devices based on a user’s unique finger tapping of a song (\cite{Chen2015}), though it may be vulnerable to “shoulder surfing”: imposters observing and mimicking the behavior to gain access.
It is well appreciated by experts and end-users alike that strong authentication is critical to cybersecurity and privacy, now and into the future. Unfortunately, news reports of celebrity account hackings serve as regular reminders that the currently dominant method of authentication in consumer applications, single-factor authentication using passwords or other user-chosen secrets, faces many challenges. Many major online services have strongly encouraged their users to adopt two-factor authentication (2FA). However, submitting two different authenticators in two separate steps has frustrated wide adoption due to its additional hassle to users. Modern smartphones, for instance, already support device unlock using either a user-selected passcode or a fingerprint. These devices could very well support a two-step two-factor authentication scheme if desired. However, it is easy to understand why users would balk at having to enter a passcode \emph{and} provide a fingerprint each time they want to unlock their phone.
“One-step two-factor authentication” has been proposed as a new approach to authentication that can provide the security benefits of two-factor authentication without incurring the hassle cost of two-step verification (\cite{Chuang2014}). In this work we undertake, to the best of our knowledge, the first-ever study and design of one-step, \textit{three}-factor authentication. In computer security, authenticators are classified into three types: knowledge factors (e.g., passwords and PINs), possession factors (e.g., physical tokens, ATM cards), and inherence factors (e.g., fingerprints and other biometrics). By taking advantage of a physical token in the form of personalized earpieces, the uniqueness of an individual’s brainwaves, and a choice of mental task to use as one’s “passthought”, we seek to achieve all three factors of authentication within a single step by the user.
In the system we propose here, we seek to incorporate recommendations from this research for improved usability while maintaining a highly secure system. The mental tasks we test are simple and personally relevant; instead of complex alphanumeric patterns like a traditional password, a mental activity like relaxed breathing or imagining a portion of one’s favorite song is easy for a user to remember and perform, as shown by participant feedback in previous passthoughts research and in our own results later in this paper. These mental activities are largely invisible to “shoulder surfing” attempts by onlookers, and furthermore present a possible solution to “rubber-hose attacks” (forceful coercion to divulge a password): a thought has a particular expression unique to an individual, the specific performance of which cannot be described and thus cannot be extracted by coercion or force, unlike, for example, the combination to a padlock or a fingerprint. Finally, to combat the wearability and obtrusiveness issues of scalp-based EEG systems used in other brain-based authentication research, our system’s form factor of earpieces with embedded electrodes is highly similar to earbud headphones or wireless headsets, technologies that are already commonly worn and generally socially accepted.
Seven male, right-handed participants (P1-P7), five students and two researchers, were recruited via a university mailing list and completed our study protocol approved by our local ethics review board. The two researcher participants were also involved in the development of this study. Though this sample is relatively homogenous and greater diversity is necessary for a larger real-world feasibility assessment, this quality interestingly functions to strengthen the results of a system designed to discriminate between users (see Discussion). After participants’ 3D ear molds were obtained, the custom-fit earpieces were manufactured, and their fit and electrical impedances were checked, we proceeded to the collection of study data.
Data collection consisted of participants completing a demographics questionnaire, a setup period with the OpenBCI system and the earpieces used for EEG collection (including a second impedance check), their performance of nine mental tasks, and finally a post-experiment questionnaire.
Earpieces were produced by an audiologist at Starkey, a manufacturer of hearing aids. To produce custom ear impressions, subjects’ ears were cleaned, a cotton ball with a string attached was placed inside the ear canal, and silicone was injected into the canals. Starkey “Precise3S Classic” two-part silicone impression material was used. When the silicone dried after a few minutes, the string was pulled to remove the impression from the ear canal. This impression was then scanned with a 3D scanner, and the resulting scan modified to achieve a comfortable fit and to ensure the intended electrode sites would make good contact with the skin. Channels were created in the 3D model to allow wire leads and associated EEG electrodes as well as a plastic tube to deliver audio. This 3D model was then sent to a 3D printer after which wires, leads, and associated AgCl electrodes were installed. Cortech EC-DC-AGP1 electrodes were used for the canal electrodes, and Cortech EC-DC-AGE6 electrodes were used for the concha electrode. The positions of the earpiece electrodes were simplified from those described in (\cite{Mikkelsen2015}). We reduced the number of canal electrodes in order to prevent electrical bridging and positioned them approximately 180 degrees apart in the canal (posterior/back and anterior/front locations in the canal). One other electrode was placed in the concha. An example of one of the manufactured earpieces is shown in Figure \ref{fig:earpiece_diagram}.
The electrodes were purchased from Cortech:
Canal electrodes: https://cortechsolutions.com/product/ec-dc-agp1/
Concha electrode: https://cortechsolutions.com/product/ec-dc-age6/
We selected a set of mental tasks based on findings in related work regarding the relative strengths of different tasks in authentication accuracy and usability as reported by participants (\cite{Chuang2013b,curran2016passthoughts}). Furthermore, given the in-ear placement of the electrodes and therefore the proximity to the temporal lobes containing the auditory cortex, we tested several novel authentication tasks based specifically on aural imagery or stimuli. The nine authentication tasks and their attributes are listed in Table \ref{tab:tasks}. Our strategy was to select tasks that captured a diversity across dimensions of external stimuli, involving a personal secret, eyes open or closed (due to known effects on EEG), and different types of mental imagery.
All sites were cleaned with ethanol prior to electrode placement and a small amount of conductive gel was used on each electrode. For EEG recording we used an 8-channel OpenBCI system (\cite{michalska2009openbci}) which is open-source and costs about 600 USD; an alternative to medical-grade EEG systems (which cost \textgreater20,000 USD), with demonstrated effectiveness (\cite{Frey2016}). We chose OpenBCI for its flexibility: despite the broad availability of low-cost EEG sensors, no commercially-available sensor allowed us to build our own recording configuration with a custom number, and configuration, of electrodes.
The ground was placed at the center of the forehead, at AFz according to the 10-20 International Standard for Electrode Placement (ISEP), and the reference on the left mastoid (behind the left ear). We chose the AFz ground location to minimize the chances that our measurement setup caused differences between readings from the left and right electrodes, though future systems using one ear only should test relocating the ground to a site on one ear (e.g., the earlobe). Six channels were used for the three electrodes on each earpiece (shown in Figure \ref{fig:earpiece_diagram}). For the remaining two channels, one AgCl ring electrode was placed on the right mastoid for later re-referencing, and one at Fp1 (ISEP location above the left eye) to validate the data collected in the ears against a common scalp-based placement. Before beginning the experiment, the data from each channel was visually inspected using the OpenBCI interface by having the participant clench their jaw and blink. Audio stimuli were delivered through small tubes in the earpieces.
During the experiment, participants were seated in a comfortable position in a quiet room facing a laptop on which the instructions and stimuli were presented and timings recorded using PsychoPy (\cite{peirce2007psychopy}). All tasks were performed for five trials each, followed by another set of five trials each to reduce boredom and repetition effects. Each trial was 10 seconds in length, for a total of 10 trials or 100 seconds of data collected per task. This collection protocol is outlined in Figure \ref{fig:data_collection_protocol}. The instructions were read aloud to participants by the experimenter, and participants advanced using a pointer held in their lap to minimize motion artifacts in the data. The experimenter also recorded the participant’s chosen secrets for the \textit{sport}, \textit{song}, \textit{face}, \textit{speech}, and \textit{sequence} tasks and reminded the participant of these for the second set of trials. After EEG data collection, participants completed a usability questionnaire assessing each task on 7-point Likert-type scales on dimensions of ease of use, level of engagement, repeatability, and likeliness to use for real-world authentication, as well as a few open response questions. Approximately two weeks after data collection, participants were contacted via e-mail and asked to recall their choices for those tasks that involved chosen secrets.
We confirm that the custom-fit earpieces were able to collect quality EEG data via two metrics: low impedances measured for the ear electrodes, and alpha-band EEG activity attenuation when a participant’s eyes were open versus closed.
It is important that the electrical impedances achieved for electrodes are low (\textless10 k\(Ω\)) to obtain quality EEG signals. Table \ref{tab:impedances} below summarizes the impedances across the seven participants’ six ear channels. With the exception of a few channels in select participants, impedances achieved were good overall. Most of the recorded impedances of the earpiece electrodes were less than 5 k\(Ω\), a benchmark used widely in previous ear EEG work, and all except two were less than 10 k\(Ω\). Nonetheless, the data from all electrodes were tested in our other data quality test.
For the alpha-attenuation test, data from the \textit{breathe} task was compared with that of the \textit{breathe - open} task. It is a well-known feature of EEG data that activity in the alpha-band (approx. 8-12 Hz) increases when the eyes are closed compared to when the eyes are open. This attenuation is clearly visible even in just a single trial’s data from our earpieces and matches that seen in our Fp1 scalp electrode data. Figure \ref{fig:alpha_atten} shows evidence of alpha attenuation in the left ear channels compared to Fp1, for one participant as an example. We see the same validation in the right ear channels.
\begin{figure} \centering \includegraphics[width=0.5\textwidth]{figures/002_AlphaAtt_all.jpg} \caption{Alpha-attenuation (8-12 Hz range) in left ear and Fp1 channels, referenced at left mastoid. Red indicates breathing data with eyes open, blue indicates the same task with eyes closed.} \label{fig:alpha_atten} \end{figure}
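The alpha-attenuation check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the plain discrete Fourier transform, the mean-power summary, and the sampling rate passed as \texttt{fs} are all hypothetical choices and are not taken from our analysis pipeline. The sketch compares mean spectral power in the 8-12 Hz band between an eyes-closed and an eyes-open recording; a ratio above one is the pattern expected from the text above.

```python
import cmath
import math


def band_power(signal, fs, lo_hz=8.0, hi_hz=12.0):
    """Mean DFT power of `signal` over bins between lo_hz and hi_hz.

    `fs` is the sampling rate in Hz (an assumed parameter).
    """
    n = len(signal)
    powers = []
    for k in range(n // 2 + 1):
        freq = k * fs / n  # centre frequency of bin k
        if lo_hz <= freq <= hi_hz:
            coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                        for t, v in enumerate(signal))
            powers.append(abs(coeff) ** 2)
    return sum(powers) / len(powers)


def alpha_ratio(eyes_closed, eyes_open, fs):
    """Ratio above 1 means more alpha power with eyes closed than open."""
    return band_power(eyes_closed, fs) / band_power(eyes_open, fs)
```

Only the bins inside the band are transformed, so the naive transform stays cheap even for a full 10-second trial.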
Since past work has shown that classification tasks in EEG-based brain-computer interfaces (BCI) are linear (\cite{Garrett2003a}), we used XGBoost, a popular tool for logistic linear classification (\cite{Chen2016}), to analyze the mental task EEG data. Compared to other linear classifiers, XGBoost uses gradient boosting, in which an algorithm builds an ensemble of weak classifiers stage by stage so as to minimize a given loss function. Gradient boosting generally improves linear classification results without manually tuning hyper-parameters.
To produce feature vectors, we took slices of 100 raw values from each electrode (about 500ms of data), and performed a Fourier transform to produce power spectra for each electrode during that slice. We concatenated all electrode power spectra together. No dimensionality reduction was applied. For each task, for each participant, 100 seconds of data were collected in total across 10 trials of 10 seconds each, resulting in 200 samples per participant, per task.
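As a rough illustration of this feature extraction, the following Python sketch cuts each electrode's signal into non-overlapping slices of 100 values, computes a power spectrum per slice, and concatenates the spectra across electrodes. The function names, the one-sided spectrum, the naive discrete Fourier transform, and the absence of windowing or normalization are assumptions made for the example, not details reported here.

```python
import cmath
import math

SLICE_LEN = 100  # raw values per slice, per electrode (from the text)


def power_spectrum(values):
    """One-sided power spectrum of a real-valued slice via a naive DFT."""
    n = len(values)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(values))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def feature_vectors(channels):
    """channels: list of equal-length sample lists, one per electrode.

    Returns one feature vector per non-overlapping slice, formed by
    concatenating the power spectra of all electrodes for that slice.
    """
    n_slices = len(channels[0]) // SLICE_LEN
    vectors = []
    for s in range(n_slices):
        start = s * SLICE_LEN
        vec = []
        for ch in channels:
            vec.extend(power_spectrum(ch[start:start + SLICE_LEN]))
        vectors.append(vec)
    return vectors
```

With three electrodes and a one-sided spectrum of 51 bins per slice, each feature vector in this sketch has 153 entries.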
We trained the classifier such that positive examples were from the target participant and target task, and negative examples were selected randomly from any task from any other participant. From this corpus of positive and negative samples, we withheld one third of the data for testing. The remaining training set was used to cross-validate the classifier over 100 rounds on different splits of the data. The results of each cross-validation (CV) step were used to iteratively tweak classifier parameters.
When evaluating predictions, instances with a predicted value greater than 0.5 were treated as positive, and all others as negative. After updating classifier parameters, the classifier was tested on the withheld test set. Since negative examples far outnumber positive examples in this dataset, XGBoost optimized against its binary classification error metric. Over a set of \(E\) examples containing \(E_W\) misclassified examples \(E_W⊂{E}\), XGBoost’s binary classification error rate \(ε\) is calculated as
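The construction of the labeled corpus and the one-third holdout can be sketched as follows. This is a hedged illustration: the dictionary layout, the function names, and the choice to use every sample from the other participants as a negative example are assumptions, since the number of negatives drawn is not specified above.

```python
import random


def build_corpus(samples, target_participant, target_task):
    """samples: dict mapping (participant, task) -> list of feature vectors.

    Positives come from the target participant performing the target task;
    negatives come from any task of any other participant (assumed here to
    be all such samples). Returns a list of (vector, label) pairs.
    """
    positives = list(samples[(target_participant, target_task)])
    negatives = [vec for (p, _task), vecs in samples.items()
                 if p != target_participant for vec in vecs]
    return [(v, 1) for v in positives] + [(v, 0) for v in negatives]


def hold_out_third(corpus, rng):
    """Shuffle and withhold one third of the corpus for final testing."""
    shuffled = list(corpus)
    rng.shuffle(shuffled)
    n_test = len(shuffled) // 3
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)
```

Passing a seeded \texttt{random.Random} instance as \texttt{rng} makes the split reproducible, which is convenient when comparing classifier settings across cross-validation rounds.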
\begin{equation}\label{eq1} ε = E_W / E \end{equation}
We calculated false acceptance and false rejection rates (FAR and FRR, respectively) from these results. Over false attempts \(FA\) of which some subset \(FA_S\) were successful, and true attempts \(TA\) over which some subset \(TA_U\) were unsuccessful:
\begin{equation}\label{eq2} FAR = FA_S / FA \end{equation} \begin{equation}\label{eq3} FRR = TA_U / TA \end{equation}
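The three quantities defined above can be computed directly from classifier outputs. The sketch below is illustrative only; the function name and the dictionary it returns are hypothetical, while the 0.5 decision threshold and the definitions of \(ε\), FAR, and FRR follow the text.

```python
def evaluate(scores, labels, threshold=0.5):
    """scores: classifier outputs in [0, 1].
    labels: 1 for a true attempt (target user), 0 for a false attempt.
    """
    accepted = [s > threshold for s in scores]
    false_attempts = [a for a, y in zip(accepted, labels) if y == 0]
    true_attempts = [a for a, y in zip(accepted, labels) if y == 1]
    fa_s = sum(false_attempts)                      # false attempts accepted
    ta_u = len(true_attempts) - sum(true_attempts)  # true attempts rejected
    return {
        "error": (fa_s + ta_u) / len(labels),
        "FAR": fa_s / len(false_attempts),
        "FRR": ta_u / len(true_attempts),
    }
```

A score of exactly 0.5 is rejected in this sketch, matching the "larger than 0.5" rule stated above.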
To further test the robustness of the system, we also conducted a “leave one out” process for the best performing tasks in which each participant’s FAR was calculated once with each other participant left out (e.g., CV for P1 with P2 left out, then CV for P1 with P3 left out, etc., for every participant combination).
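One way to enumerate the runs in this "leave one out" process is sketched below. The interpretation is an assumption: for each target, one other participant is excluded from the negative training examples, and the remaining participants supply the negatives. The function name and the returned dictionary keys are hypothetical.

```python
def leave_one_out_plan(participants):
    """Enumerate (target, left_out) runs and the participants that remain
    available as negative training examples in each run."""
    plan = []
    for target in participants:
        for left_out in participants:
            if left_out == target:
                continue
            negatives = [p for p in participants
                         if p not in (target, left_out)]
            plan.append({
                "target": target,
                "left_out": left_out,
                "train_negatives": negatives,
            })
    return plan
```

For seven participants this yields 42 runs, six per target.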
\begin{figure*} \centering \includegraphics[width=.9\linewidth]{./figures/mean-far-and-frr-by-electrode-config.png} \caption{Mean FAR and FRR by electrode configuration across all participants and tasks. All electrodes (Fp1, right, and left ear channels) combined achieved the best FAR score by mean and standard error. The right ear electrodes combined, and left ear electrodes combined, achieved next-best accuracy, both within error of one another.} \label{fig:meanByElectrode} \end{figure*}
For each configuration of electrodes, we calculated the mean FAR and FRR across all participants using each task as the passthought (Figure \ref{fig:meanByElectrode}). Incorporating all electrodes data resulted in the lowest FAR, followed by the combined right and left ear electrodes, respectively. For left ear (3 electrodes), right ear (3 electrodes), and both ears (6 electrodes) configurations, every participant had at least one task with zero FAR and FRR. Among the individual electrodes, the left canal front electrode produced a mean FAR of 0.12\% and a mean FRR just below 20\%. Counter to our expectations, Fp1 does not perform as well as most ear electrodes, though overall these reported FAR rates are \textless\textless 1\%.
For each position, FAR was about ten times lower than FRR, which is preferable for authentication, as false authentications are generally more costly than false rejections.
Our results indicate acceptable accuracy using data from the left ear alone. This corresponds to a desirable scenario, in which the device could be worn as a single earbud. As such, we focus on results from only the left ear in the following analyses.
As an additional validity check, we repeated our analysis using data from the left ear only, high-passing the original frequency-domain data at 32 Hz to select only data associated with non-cortical signals such as muscular activity. Our classifier performed roughly at chance. This analysis strongly suggests that EMG signals did not significantly contribute to our results. Future work may assess the relative contribution of different EEG frequency bands, as we discuss further in the Discussion.
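High-passing data that is already in the frequency domain amounts to discarding the spectral bins below the cutoff. A minimal sketch, assuming a one-sided spectrum whose bin \(k\) corresponds to frequency \(k \cdot f_s / n\); the function name and the sampling rate \texttt{fs} are hypothetical parameters:

```python
def highpass_bins(spectrum, fs, n_samples, cutoff_hz=32.0):
    """Keep only the one-sided spectrum bins at or above cutoff_hz.

    spectrum: power values for bins 0..n_samples//2.
    fs: sampling rate in Hz (assumed); n_samples: slice length.
    """
    return [power for k, power in enumerate(spectrum)
            if k * fs / n_samples >= cutoff_hz]
```

The filtered vectors are shorter than the originals, so a classifier must be retrained on them rather than reused.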
Before the end of the session, participants completed a usability questionnaire. Participants were asked to rate each mental task on four 7-point Likert-type scales: ease of use, level of engagement, repeatability, and likeliness to use in a real-world authentication setting. Mean ratings across participants for each of these dimensions for each task are shown in Table \ref{tab:usability}.
\begin{table}
\caption{Mental tasks ranked by mean ratings (\(μ\)) on 7-point Likert-type scales across participants in four usability dimensions.}
\label{tab:usability}
\begin{center}
\begin{tabular}{lrlr}
\hline
\multicolumn{2}{|c|}{\textbf{Ease of Use}} & \multicolumn{2}{|c|}{\textbf{Engagement}}\\
\textbf{Task} & \textbf{\(μ\)} & \textbf{Task} & \textbf{\(μ\)}\\
\hline
Breathe & 6.75 & Sequence & 5\\
Listen & 6.75 & Song & 5\\
Breathe - Open & 6.5 & Song - Open & 5\\
Song & 5.25 & Sport & 4.75\\
Song - Open & 5 & Face & 4.5\\
Speech & 5 & Speech & 4\\
Sport & 3.5 & Breathe & 2.5\\
Face & 2.75 & Breathe - Open & 2.25\\
Sequence & 2.25 & Listen & 2.25\\
\hline
\multicolumn{2}{|c|}{\textbf{Repeatability}} & \multicolumn{2}{|c|}{\textbf{Likeliness to Use}}\\
\textbf{Task} & \textbf{\(μ\)} & \textbf{Task} & \textbf{\(μ\)}\\
\hline
Breathe & 7 & Song - Open & 5\\
Breathe - Open & 6.75 & Sequence & 4.25\\
Listen & 6.75 & Song & 4\\
Song & 4.75 & Sport & 4\\
Speech & 4.75 & Breathe - Open & 3.75\\
Song - Open & 4.25 & Speech & 3.75\\
Face & 3 & Face & 3.5\\
Sport & 3 & Listen & 3\\
Sequence & 2.5 & Breathe & 2.75\\
\hline
\end{tabular}
\end{center}
\end{table}
Participants also ranked the tasks overall from most (1) to least (9) favorite. \textit{Song - open} ranked highest (\(μ\)=4.25) followed by a tie between \textit{breathe - open}, \textit{song}, and \textit{speech} (\(μ\)=4.75). \textit{Sequence} (\(μ\)=7.75) and \textit{face} (\(μ\)=6.75) were ranked least favorite overall.
In addition to the scales and rankings, we included a few open response questions to ascertain attitudes around use cases for in-ear EEG and passthoughts, and the comfort of wearing an in-ear EEG device in everyday life. Participants first read the prompt, “Imagine a commercially available wireless earbud product is now available based on this technology that you’ve just experienced. It requires minimal effort for you to put on and wear.”, and were asked about use cases for in-ear EEG and passthoughts. Responses about in-ear EEG expectedly included authentication for unlocking a phone or computer and building access, but also aspects of self-improvement such as P4’s response “Help people increase focus and productivity”. P5 and P6 also indicated a use for measuring engagement with media like movies and music, and relatedly P4 wrote “music playback optimized for current mental state and feelings”. In terms of comfort wearing such a device, participants generally responded they would be comfortable, though P5 and P6 stipulated only when they already would be wearing something in the ears like earphones. Notably, three participants also added that imagining a face was difficult and had concerns regarding their ability to repeat tasks in the same exact way each time.
A final component of usability we assessed was the ability of the participants to recall their specific chosen passthoughts. Participants were contacted via e-mail approximately two weeks after data collection and asked to reply with the passthoughts they chose for the \textit{song}, \textit{sport}, \textit{speech}, \textit{face}, and \textit{sequence} tasks. All participants correctly recalled all chosen passthoughts, with the exception of one participant who did not recall their chosen word component for the \textit{sequence} task.
While our authentication analysis establishes that passthoughts achieve low FAR and FRR when tested against other participants’ passthoughts, this does not tell us how robust passthoughts are against a spoofing attack, in which both a participant’s custom-fit earpiece, and details of that participant’s chosen passthought, are leaked to an imposter who attempts authentication. We performed four different analyses to investigate the system’s robustness against imposter attacks.
First, we tested the ability of an imposter to wear an earpiece acquired from someone else and achieve viable impedance values for EEG collection based on the fit of the pieces in their ears. P1 tried on each of the other participants’ customized earpieces. The impedances from each electrode were recorded and are listed in Table \ref{tab:p1_imposter_impedances} below. Across all cases, the impedances are not only higher (worse), but also deviate significantly from those achieved by the pieces’ intended owners themselves (Table \ref{tab:impedances}). These results come as no surprise given the uniqueness of ear canal shapes between individuals (\cite{Akkermans2005}), and point to the possibility that the presentation of a physical token that provides the correct impedance levels can be used as another demonstration of both the inherence and possession factors.
Second, to explore the scenario of an imposter attempting to gain access, we chose the case of the most vulnerable participant, P6, whose earpieces yielded the lowest impedances when worn by P1, P2, and P7 (Table \ref{tab:p1_imposter_impedances}). We collected data using the same data collection protocol, but had the “imposters” refer to P6’s list of chosen passthoughts.
Each imposter performed each of P6’s passthoughts (simulating an “inside imposter” from within the system). Following the same analysis steps, we generated 200 samples per task for our imposters, using data from all left ear electrodes.
Since every participant has one classifier per task (for which that task is the passthought), we are able to make 200 spoofed attempts with the correct passthought on each of P6’s classifiers. We find zero successful spoof attempts for tasks with a chosen secret (e.g., \textit{song} or \textit{face}). In addition, we also do not find any successful spoof attacks for tasks with no chosen secret (e.g., \textit{breathe}). In fact, in all 1,800 spoof attempts (200 attempts for each of the nine classifiers), we do not find a single successful attack on any of P6’s classifiers.
Since these imposters’ data appeared in the initial pool, the classifier may have been trained on their recordings as negative examples. As our third analysis, to explore the efficacy of an outsider spoofing recordings, we repeated the same protocol with an individual “PX” who did not appear in our initial set of participants (an “outside imposter”). Again, we find zero successful authentications out of 1,800 attempts.
\begin{table}
\caption{Left concha (C), canal-front (F) and canal-back (B) electrode impedances of “imposters” P1, P2, P7 and “PX” - a person completely outside of the system - wearing P6’s left earpiece.}
\label{tab:imposter_impedances}
\begin{center}
\begin{tabular}{lrrr}
& \multicolumn{3}{c}{Impedance [k\(Ω\)]}\\
\hline
\textbf{P} & \textbf{C} & \textbf{F} & \textbf{B}\\
\hline
1 & 18.7 & 10.0 & 8.4\\
2 & 46.7 & 35.7 & 24.8\\
7 & 44.5 & 20.5 & 26.3\\
X & 70.0 & 10.5 & 8.9\\
\hline
\end{tabular}
\end{center}
\end{table}
Fourth, our “leave one out” analysis can also be seen as another set of outside imposter attacks, in which each participant acts as an outside imposter for each other participant, but where the imposters have their own manufactured earpieces and passthoughts. The best task classifiers achieved FARs of 0\% across all combinations, successfully rejecting the simulated imposters.
Our findings demonstrate the apparent feasibility of a passthoughts system consisting of a single earpiece with three electrodes, a ground, and a reference, all in or on the left ear. Notably, the gain in performance from adding three more electrodes from the right ear is only marginal in our results, suggesting a single earpiece could suffice, though this may change with larger sample sizes. FARs and FRRs are consistently low across all participants and tasks, with FARs lower overall than FRRs, a desirable pattern because false acceptances are the more critical error when protecting potentially sensitive information. Participants’ best-performing tasks or passthoughts typically see no errors in our testing. Our various training/testing schemes show that the inherence factor alone performs better than the knowledge factor alone, but that the combination of the two achieves the lowest FAR, indicating a measurable benefit from multiple factors. Furthermore, we achieved these results with feature vectors based on only 500 ms of EEG signal (300 voltage readings across the three electrodes), suggesting that passthoughts can be captured and recognized quickly. Passthoughts also appear to be quite memorable given our two-week recall follow-up, and a few were rated highly repeatable and engaging. Finally, no spoofed attacks were successful in our analyses.
Compared against the 80\% authentication accuracy achieved with a single generic-fit electrode (\cite{curran2016passthoughts}), we are able to achieve 90\% accuracy with a custom-fit earpiece using data from a single electrode, and 99.8\% accuracy with the same custom-fit earpiece using three electrodes. This points to the importance of both the goodness-of-fit of the electrodes and the number of channels as contributors to authentication performance.
These personalized custom-fit earpieces can also be easily outfitted with a hardware keypair for signing authentication attempts, so as to function as a physical token similar to the way an electronic key fob can be used to unlock a car, but with additional inherence and knowledge factors in place.
Several tasks performed exceedingly well among participants, even tasks like \textit{breathe} and \textit{breathe - open}, which did not have an explicit secondary knowledge factor as \textit{song} or \textit{face} did. This suggests a passthoughts system could present users with an array of task options to choose from without significant loss in security. While \textit{sport} performed best in terms of low FAR and FRR, it was not rated highly on usability dimensions or as a favorite by our participants. Tasks like \textit{breathe - open} and \textit{song - open}, however, both performed well and were rated quite favorably. Interestingly, the \textit{sequence} task was rated low in ease of use and repeatability, and as the least favorite among participants, but was rated highest in likelihood of use in a real-world setting. \textit{Sequence} was arguably the most complex task, and its high rating in likelihood of use could indicate that users are more likely to use a task they perceive as more secure, even at the cost of additional effort. This is true, after all, of one of the most common forms of authentication, alphanumeric passwords, where increased complexity generally improves security. The topic of user perceptions of different passthoughts as means of authentication warrants its own research.
The difficulty of stealing someone else’s knowledge factor emerged in our spoofing attacks. In conventional password-based systems, once the knowledge factor is divulged, an attacker can essentially spoof the target with a 100\% success rate. In a passthought-based system, even though our target participant documented their chosen passthoughts, the spoofers found ambiguity in how these passthoughts could be expressed. For example, for the \textit{face} task, the spoofers did not know the precise face the original participant had chosen. For the \textit{song} tasks, though the song was known, the spoofers did not know what part of the song the original participant had imagined, or how it was imagined. This experience sheds light on passthoughts’ highly individual nature and suggests that spoofing attempts may be intrinsically difficult. Future work should examine this more explicitly to elucidate the effect of knowledge task specificity on defense against imposters.
Performance on Fp1 was not as high as performance in the ear, despite Fp1’s popularity in past work on passthoughts (\cite{Chuang2013b}). One plausible explanation is that several of our mental tasks involved audio (real or imagined), which we would expect to be better observed from the auditory cortex near the ears, as opposed to frontal lobe activity (e.g., concentration) that might be more easily picked up near Fp1. Another possible explanation is that Fp1 may be more sensitive to large, task-irrelevant artifacts from EOG and facial EMG. In either case, future work should continue to investigate what classes of mental tasks best lend themselves to in-ear recording.
The sample size of our study, while small, is comparable to that of other EEG authentication studies (\cite{Ashby2011,Marcel2007a,Poulos2002,Chuang2013b,curran2016passthoughts}) and other custom-fit in-ear EEG research (\cite{Kidmose2013a,Mikkelsen2015}). The fitting and manufacturing of custom-fit earpieces for each recruited participant was the main limitation to increasing our sample size. This may very well pose a limitation in the proliferation and adoption of such a technology as well, although recently there have been developments in at-home kits for creating one’s own custom-fitted earpieces (\cite{voix2015settable}) that could help overcome this barrier.
The relative homogeneity of our participant pool can be seen as a strength of the reported results, given that the system is meant to distinguish between individuals. Future studies, however, should expand the size and diversity of the participant pool to encompass users and use cases for which this system would be particularly applicable, such as those with extreme security needs and/or persons with disabilities that may prevent them from using other authentication methods, e.g., those that require the use of one’s hands, voice, or particular bodily movement patterns.
Our work aimed primarily to evaluate our authentication system’s security characteristics. As such, we have not investigated which EEG frequency bands drive the authentication results. Future work could re-analyze our data to determine which frequency bands contribute most to our authenticator’s results. This work would deepen our neuroscientific understanding of how the authentication system achieves the results we observe.
Applications for a system like the one we propose here span any use case for authentication, but some may be particularly well-suited. As has been the motivation for much of the original and ongoing BCI research and development, brain-based systems like this one are nearly universally accessible to a wide variety of people with different bodies. As previously mentioned, one’s particular passthought is immune to observation, and so is apt for use in public spaces or at times when malicious observation is likely; it would also be extremely difficult to coerce (or even willingly share). To aid in adoption, this system could be aligned with currently used technology of similar form factors. For example, speakers could be placed inside our current custom-fit pieces to produce working “hearables” that could be used as ordinary headphones.
A key limitation of this work is that our experiments were conducted in a controlled laboratory setting with participants in a stationary, sitting position. Future work should examine EEG data collected from a variety of different user states: ambulatory or distracting settings, during physical exertion or exercise, under the influence of caffeine or alcohol, etc., as well as over longer periods of time or in multiple recording sessions. While these additional conditions may limit the performance of the system, it is interesting to consider which limitations, if any, might be advantageous in some way. For example, a system could prevent or allow access only when a user is in a certain state of mind or setting, or could enforce a biologically based expiration that requires classifier re-training and thus offers protection in a scenario where a user’s original EEG pattern was somehow leaked or surreptitiously stored.
Finally, our work leaves room for some clear user experience improvements. Future work should test the performance of this system using dry electrodes, which are commonly found in consumer EEG devices and have shown recent promise for ear EEG systems (\cite{kappel2018dry}). Eliminating the need for conductive gel would very likely improve comfort and usability, and it is unlikely that any system involving gel will be widely adopted. Future work should also attempt a closed-loop (or online) passthought system, in which users receive immediate feedback on the result of their authentication attempt. A closed-loop BCI system would assist in understanding how human learning effects might impact authentication performance as the human and machine co-adapt.
Neuroscience fuels some of the most chilling predictions in science fiction (cite:Welsh2011). It also promises some of the greatest possible advances in medicine, mental health, and the understanding of human behavior. One ambitious goal is to detect or even predict seizures (cite:Mormann2006).
However, the original and most active area of BCI research concerns the creation of tools for persons with muscular disabilities (cite:Carrino2012). By collecting unstructured or semi-structured EEG data in the wild, passthought systems could help improve the development of such BCIs (cite:Grierson2011a). The small size of data repositories, limited mostly by the clinical trials needed to build BCIs for persons with disabilities, has consistently frustrated attempts to improve on algorithms and protocols in this field (cite:Allison2009). Although passthought users may not have muscular disabilities, pursuing passthoughts as an area of research will inevitably yield larger repositories of EEG data than have been collected to date. This data could prove invaluable for the development of EEG-based BCIs across a variety of fields, including (but not limited to) assistive technologies.
Again, these opportunities must strike a balance with the risks borne by users around privacy and security. Violating user privacy by revealing EEG data, even to researchers, could undermine any chance of wider BCI adoption in the long-term. Striking this balance will require a deeper understanding of the statistical properties of signals. How much data will users really need to give up? What counts as an “anomalous” reading? Answers to these questions could themselves inform neuroscientific inquiry. This balance will also require a deeper understanding of individuals’ attitudes about the meaning of such signals, and how private people believe them to be.
In general, as sensors grow smaller and cheaper, devices more connected, and machine learning more sophisticated, people will build increasingly high-resolution models of human physiology “in the wild.” Passthoughts present just a microcosm of the good such advances might bring, along with some of the most pressing anxieties: What does pervasive physiological recording mean for our privacy, security, and safety? The balancing act between these risks and opportunities will prove a recurring theme for decades to come. Perhaps passthought authentication could better protect sensitive readings such as EEG. Probing the outer limits of pervasive sensing can shed light on both the good and the bad of ubiquitous physiological monitoring.
Using custom-fit EEG earpieces, we produced a one-step, three-factor authentication system. We demonstrated that our system has high accuracy, higher than prior work using non-custom earpieces. We demonstrated that both inherence and knowledge factors contribute to authentication accuracy, and performed a simulated attack to show our system’s robustness against impersonation. We believe that custom-fit EEG earpieces provide a practical path forward for BCI applications, security-related and beyond, both for healthy individuals and for persons with disabilities.
This work was supported in part by the Berkeley Center for Long Term Cybersecurity (CLTC), and the Hewlett Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors.
\bibliographystyle{ext/frontiersinSCNS_ENG_HUMS} \bibliography{references}