Shared Keystroke Data for Continuous Authentication - Generation and Analysis

News and Resources for Members of the IEEE Signal Processing Society

By: Yan Sun, State University of New York at Buffalo

Advisor: Shambhu Upadhyaya

Standard methods for authenticating a computer or network user, which typically run once at the initial log-in, suffer from a variety of vulnerabilities such as masquerading and potential system compromise. An effective solution to this one-time authentication problem is continuous authentication using behavioral biometrics, and monitoring a user's keystroke dynamics is a useful mechanism for it. Researchers have taken various approaches to collecting and using keystroke dynamics. However, privacy, the lack of sufficiently large datasets for evaluation, reliability and scalability, and the robustness of the methods are still not well addressed; these issues are the focus of this dissertation.

First, a systematic study of the security and privacy of the keystroke dynamics approach to continuous authentication is conducted. A rule-based data sanitization scheme is developed to detect and remove personally identifiable and other sensitive information from the collected dataset. A data transmission scheme using the Extensible Messaging and Presence Protocol (XMPP) is implemented to guarantee privacy during transmission. Building on these two schemes, two distinct architectures are proposed that provide secure, privacy-preserving data processing support for continuous authentication. The architectures offer flexibility depending on the application environment.
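The abstract does not list the sanitization rules, so the following is only a minimal sketch of what a rule-based sanitizer for keystroke logs could look like. It assumes a hypothetical log format in which each event holds one printable character plus press and release timestamps (editing keys such as Backspace are ignored), and a hypothetical rule set of regular expressions. Matched characters are masked while every timestamp is kept, so timing features remain usable after sanitization.

```python
import re
from dataclasses import dataclass


@dataclass
class KeyEvent:
    key: str         # one printable character (hypothetical log format)
    press_ms: int    # key-down timestamp in milliseconds
    release_ms: int  # key-up timestamp in milliseconds


# Hypothetical rule set: one named regular expression per kind of sensitive text.
RULES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def sanitize(events, mask="*"):
    """Mask characters covered by any rule match; keep all timing information."""
    text = "".join(e.key for e in events)
    masked = [False] * len(text)
    hits = []  # (rule name, start index, end index) for each match
    for name, pattern in RULES.items():
        for m in pattern.finditer(text):
            hits.append((name, m.start(), m.end()))
            for i in range(m.start(), m.end()):
                masked[i] = True
    cleaned = [
        KeyEvent(mask if masked[i] else e.key, e.press_ms, e.release_ms)
        for i, e in enumerate(events)
    ]
    return cleaned, hits
```

Masking rather than deleting events is a design choice in this sketch: deleting would alter the inter-key intervals around the removed text, whereas masking hides the content but preserves the rhythm.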

Second, the largest publicly accessible keystroke dataset for continuous authentication has been generated. This research details the collection of a shared dataset for the study of keystroke dynamics. The raw keystroke data were collected from 301 subjects, who transcribed fixed text and answered questions freely. The dataset is characterized in terms of the temporal variation of typing patterns and the perturbations caused by different keyboard layouts.
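Raw keystroke data are usually turned into timing features before any analysis. The abstract does not define the dissertation's feature set, so this sketch uses two common keystroke-dynamics features as an assumption: the hold (dwell) time of each key and the press-to-press latency of each consecutive key pair (digraph). The input format, a list of `(key, press_ms, release_ms)` tuples in press order, is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def keystroke_features(events):
    """events: list of (key, press_ms, release_ms) tuples in press order (hypothetical format).

    Returns (mean hold time per key, mean press-to-press latency per digraph).
    """
    hold = defaultdict(list)    # key -> list of hold (dwell) times
    flight = defaultdict(list)  # (first key, second key) -> list of press-to-press latencies
    for key, press, release in events:
        hold[key].append(release - press)
    for (k1, p1, _), (k2, p2, _) in zip(events, events[1:]):
        flight[(k1, k2)].append(p2 - p1)
    return (
        {k: mean(v) for k, v in hold.items()},
        {d: mean(v) for d, v in flight.items()},
    )
```

Averaging per key and per digraph gives a fixed-length profile regardless of how much text a subject typed; comparing such profiles across sessions or keyboards is one way to observe the temporal and layout-related variation mentioned above.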

Third, the effect of the number of subjects on performance, and hence the reliability and scalability of keystroke dynamics as an authentication mechanism, is explored. Three sets of experiments are conducted on the previously generated large free-text dataset of 291 subjects, using two standard classification algorithms. Systematically varying the number of subjects and the size of the typing profile yields two findings: 1) the keystroke authentication system still achieves a good classification rate when the number of subjects is large; and 2) beyond a certain threshold, performance is independent of the number of subjects. The practical implications of these findings are also discussed.
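The experimental protocol of varying the number of subjects and the profile size can be sketched as follows. The abstract does not name its two classifiers, so a nearest-centroid classifier stands in here purely as an assumption; `profiles` maps a subject ID to a list of feature vectors, and the first `n_train` vectors of each subject form the training profile. The sketch reproduces only the shape of the experiment, not the dissertation's results.

```python
import math
import random


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(col) for col in zip(*vectors)]


def nearest_centroid_accuracy(train, test):
    """train/test: dict subject_id -> list of feature vectors. Returns identification accuracy."""
    centroids = {s: centroid(v) for s, v in train.items()}
    correct = total = 0
    for subject, vectors in test.items():
        for x in vectors:
            predicted = min(centroids, key=lambda s: math.dist(x, centroids[s]))
            correct += predicted == subject
            total += 1
    return correct / total


def accuracy_vs_subjects(profiles, subject_counts, n_train, trials=5, seed=0):
    """For each subject count n, average accuracy over random draws of n subjects."""
    rng = random.Random(seed)
    ids = sorted(profiles)
    results = {}
    for n in subject_counts:
        accs = []
        for _ in range(trials):
            chosen = rng.sample(ids, n)
            train = {s: profiles[s][:n_train] for s in chosen}
            test = {s: profiles[s][n_train:] for s in chosen}
            accs.append(nearest_centroid_accuracy(train, test))
        results[n] = sum(accs) / trials
    return results
```

Averaging over several random draws for each subject count reduces the chance that one unusually easy or hard subset drives the curve; changing `n_train` varies the typing-profile size in the same loop.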

Fourth, the user recognition rate is improved by adopting a group of keystroke features that the research community has overlooked. The research proceeds in two parts. First, a standalone analysis assesses the potential of a normally ignored group of features, called secondary features. The experimental results compare well with those obtained by other researchers using letter-based features (primary features), and good results are achieved with fewer data records. Second, a feature selection and fusion mechanism is designed to select secondary features and fuse them with primary features, further improving the recognition rate of the underlying machine learning algorithms. The approach is evaluated on the previously generated dataset, and its results are better than the current state of the art.
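The abstract does not specify how features are selected or fused, so the following is only a sketch of one common approach, chosen as an assumption: concatenate each sample's primary and secondary feature vectors, score every column with the Fisher score (between-class scatter of the class means divided by the pooled within-class variance), and keep the `k` highest-scoring columns. The function names are hypothetical.

```python
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Fisher score per column: sum n_c*(mu_c - mu)^2 divided by sum n_c*var_c."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        overall = mean(row[j] for row in X)
        between = within = 0.0
        for c in classes:
            vals = [row[j] for row, label in zip(X, y) if label == c]
            between += len(vals) * (mean(vals) - overall) ** 2
            within += len(vals) * pvariance(vals)
        if within > 0:
            scores.append(between / within)
        else:  # zero within-class spread: perfectly separating, or constant if between is also 0
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def fuse_and_select(primary, secondary, y, k):
    """Concatenate primary and secondary features per sample, keep the k best columns."""
    fused = [list(p) + list(s) for p, s in zip(primary, secondary)]
    scores = fisher_scores(fused, y)
    keep = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    keep.sort()  # preserve the original column order in the output
    return [[row[j] for j in keep] for row in fused], keep
```

Because the score is a ratio of variances, rescaling a feature does not change its rank, so primary and secondary features measured on different scales can be compared directly in this sketch.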

Fifth, the robustness of continuous authentication using keystroke dynamics under synthetic forgery attacks is studied. It is commonly accepted that a biometric system may recognize different users with differing degrees of accuracy: some users may have trouble authenticating, while others may be particularly vulnerable to impersonation. In this research, a mechanism is designed to select certain types of users from a large keystroke dataset. Their data are used to forge a master key that attacks the existing keystroke authentication system. The attacks are launched under both zero-effort and non-zero-effort scenarios. Initial results indicate that the proposed synthetic impostor attack can weaken the recognition ability of the keystroke authentication system.
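The abstract does not describe how users are selected or how the master key is built. One plausible reading, sketched here purely as an assumption, is: rank users by how many other users' templates would accept their own template (a proxy for users who impersonate others easily), average the top-ranked templates into one synthetic vector, and measure how many enrolled templates accept it. The distance-threshold verifier and all function names are hypothetical.

```python
import math


def accepts(template, sample, threshold):
    """Hypothetical verifier: accept when the sample lies within `threshold` of the template."""
    return math.dist(template, sample) <= threshold


def select_candidates(templates, threshold, top=3):
    """Rank users by how many *other* users' templates accept their own template."""
    counts = {
        u: sum(accepts(templates[v], templates[u], threshold) for v in templates if v != u)
        for u in templates
    }
    return sorted(counts, key=lambda u: (-counts[u], u))[:top]  # ties broken by user ID


def forge_master_key(templates, chosen):
    """Average the chosen users' templates into one synthetic feature vector."""
    return [sum(col) / len(chosen) for col in zip(*(templates[u] for u in chosen))]


def false_accept_rate(templates, forged, threshold):
    """Fraction of enrolled templates that accept the forged vector."""
    return sum(accepts(t, forged, threshold) for t in templates.values()) / len(templates)
```

In this framing, a zero-effort attack would submit an impostor's own natural typing profile, whereas the forged vector corresponds to a non-zero-effort attempt; comparing the two false-accept rates is one way to quantify how much the forgery weakens the verifier.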