Shared Keystroke Data for Continuous Authentication - Generation and Analysis

May 2018


The standard methods for authenticating a computer or network user typically run once, at the initial log-in, and are therefore exposed to vulnerabilities such as masquerading and potential system compromise. An effective solution to this one-time authentication problem is continuous authentication using behavioral biometrics, and monitoring a user's keystroke dynamics is a useful mechanism for it. Researchers have taken various approaches to the collection and use of keystroke dynamics. However, privacy, the lack of sufficiently large datasets for evaluation, reliability and scalability, and the robustness of the methods are still not well addressed. These issues are the focus of this dissertation.

First, a systematic study of the security and privacy of the keystroke dynamics approach to continuous authentication is conducted. A rule-based data sanitization scheme is developed to detect and remove personally identifiable and other sensitive information from the collected dataset. A data transmission scheme using the Extensible Messaging and Presence Protocol (XMPP) is implemented to guarantee privacy during transmission. Based on these two schemes, two distinct architectures are proposed that provide secure and privacy-preserving data processing support for continuous authentication. The choice between them depends on the application environment.
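As a rough illustration of what a rule-based sanitization pass can look like, the sketch below applies a small list of regular-expression rules to text reconstructed from key events and replaces each match with a placeholder. The rule set, the placeholder format, and the `sanitize` function name are assumptions made for this example; the dissertation's actual rules are not described in this abstract.

```python
import re

# Hypothetical rules (label, pattern). They are illustrative assumptions,
# not the rules used in the dissertation.
RULES = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    # 13 to 16 digits, optionally separated by single spaces or hyphens.
    ("CARD", re.compile(r"\b\d(?:[ -]?\d){12,15}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")),
]


def sanitize(text: str) -> str:
    """Replace every span matched by a rule with a placeholder such as <EMAIL>."""
    for label, pattern in RULES:
        text = pattern.sub(f"<{label}>", text)
    return text
```

Rules are applied in list order, so a more specific pattern (here, card numbers) should come before a more general one that could match part of the same text.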

Second, the largest publicly accessible keystroke dataset for continuous authentication has been generated, and the details of its collection as a shared dataset for the study of keystroke dynamics are provided. The raw keystroke data was collected from 301 subjects, who transcribed fixed text and answered questions freely. The dataset is characterized to reflect the temporal variations of typing patterns and the perturbations caused by different keyboard layouts.
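To make "raw keystroke data" concrete, the following minimal sketch derives two timing features that are widely used in keystroke dynamics, key hold time and press-to-press digraph latency, from a list of key events. The `KeyEvent` record and its millisecond fields are assumptions for illustration; the shared dataset's actual file format is not given in this abstract.

```python
from dataclasses import dataclass


@dataclass
class KeyEvent:
    """Hypothetical raw record: one key with press and release timestamps."""
    key: str
    press_ms: float
    release_ms: float


def hold_times(events):
    """Hold (dwell) time per key: release timestamp minus press timestamp."""
    return [(e.key, e.release_ms - e.press_ms) for e in events]


def digraph_latencies(events):
    """Press-to-press latency for each pair of consecutively pressed keys."""
    ordered = sorted(events, key=lambda e: e.press_ms)
    return [
        ((a.key, b.key), b.press_ms - a.press_ms)
        for a, b in zip(ordered, ordered[1:])
    ]
```

For example, typing "t" from 0 ms to 80 ms and then "h" from 120 ms to 190 ms gives hold times of 80 ms and 70 ms and a single digraph latency of 120 ms for the pair ("t", "h").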

Third, the effect of the number of subjects on the performance, reliability, and scalability of keystroke dynamics as an authentication mechanism is explored. Three sets of experiments are conducted on the previously generated large free-text dataset with 291 subjects, using two standard classification algorithms. By systematically varying the number of subjects and the size of the typing profile, the findings are: 1) the keystroke authentication system can still achieve a good classification rate when the number of subjects involved is large; 2) the performance is independent of the number of subjects after a certain threshold. The practical implications of these findings are also discussed.

Fourth, the user recognition rate is enhanced by adopting a group of keystroke features that has been overlooked by the research community. The research is conducted in two stages. To begin with, a standalone analysis is performed to identify the potential of a group of normally ignored features, namely, secondary features. The experimental results compare well with the results obtained by other researchers from letter-based features (primary features), and quality results are obtained with fewer data records. Then, a feature selection and fusion mechanism is designed to select and fuse the secondary features with the primary features to further improve the recognition rate of the underlying machine learning algorithms. The approach is evaluated using the previously generated dataset, and the result is better than the current state of the art.

Fifth, the robustness of continuous authentication using keystroke dynamics under synthetic forgery attacks is studied. It is commonly accepted that users of a biometric system may have differing degrees of accuracy within the system. Some users may have trouble authenticating, while others may be particularly vulnerable to impersonation. In this research, a mechanism is designed to select certain types of users from a large keystroke dataset. With their data, a master key is forged to attack the existing keystroke authentication system. The attacks are launched under both zero-effort and non-zero-effort scenarios. The initial results indicate that the proposed synthetic impostor attack can weaken the recognition ability of the keystroke authentication system.
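The abstract does not say how the master key is constructed. Purely as an illustration of the general idea of a synthetic profile built from several users' data, the sketch below averages per-digraph latencies over a chosen set of users. The `forge_profile` function, the profile layout (user to digraph to mean latency in milliseconds), and the use of a plain mean are all assumptions, not the dissertation's method.

```python
from statistics import mean


def forge_profile(profiles, selected_users):
    """Average each digraph's latency over the selected users who typed it.

    `profiles` maps a user id to a dict of digraph -> mean latency (ms).
    This layout is a hypothetical one chosen for the example.
    """
    pooled = {}
    for user in selected_users:
        for digraph, latency in profiles[user].items():
            pooled.setdefault(digraph, []).append(latency)
    return {digraph: mean(values) for digraph, values in pooled.items()}
```

A digraph typed by only some of the selected users is averaged over those users alone, so the synthetic profile covers the union of their digraphs.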
