Keystroke Biometrics Studies on a Variety of Short and Long Text and Numeric Input Ned Bakelman, DPS Candidate Charles C. Tappert, PhD, Advisor Seidenberg.


1 Keystroke Biometrics Studies on a Variety of Short and Long Text and Numeric Input
Ned Bakelman, DPS Candidate
Charles C. Tappert, PhD, Advisor
Seidenberg School of Computer Science and Information Systems, Pace University, White Plains, NY 10606, USA
DPS Defense, April 11, 2014

2 Research Questions
This study focuses on biometric authentication using long bursts of arbitrary input and short bursts of fixed input, with an improved classification system.
Long Input: 100 – 1500 characters (a paragraph, a couple of sentences, etc.)
Short Input: 10 – 15 characters (password, pass code, etc.)
Arbitrary Input: open, unrestricted text (of the user's choosing)

3 Research Questions (continued)
Purpose of the Study
Long Input – Unauthorized User Detection
  1) Can we accurately detect intruder use of a computer system in an office environment?
  2) How does the use of standard applications such as a word processor, spreadsheet, or browser affect intruder detection?
  3) Is an intruder still detectable when using a web browser (a low-text environment)?
Short Keypad Input – Detection Accuracy
  1) What is the detection accuracy of short, fixed numeric keypad input?
  2) Does the use of specific keypad features improve detection accuracy?
Classifier Comparison – Multi Match vs. Single Match
  1) What is the difference in accuracy between the two?
  2) Which performs better on long input?
  3) Which performs better on short input?

4 Background (Source of Features or Attributes)
Derived from raw timing data
Based on key press duration and transition times, also known as dwell time and flight time
Statistical in nature, mainly means and standard deviations
Pre-processing to remove outliers and standardize values to the range 0 – 1
Fallback procedure
Source: T. Olzak, Keystroke Dynamics: Low Impact Biometric Verification, Sep. 2006
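The feature pipeline on this slide (mean and standard deviation of timings, outlier removal, scaling to 0 – 1) can be illustrated with a minimal sketch. The two-standard-deviation outlier rule and the 0 – 500 ms scaling range are assumptions chosen for illustration; the slide does not state the values used in the study, and the function names are hypothetical.

```python
from statistics import mean, pstdev

def remove_outliers(values, n_sigma=2.0):
    """Drop values more than n_sigma standard deviations from the mean.
    The n_sigma cutoff is an assumed rule, not taken from the study."""
    if len(values) < 2:
        return list(values)
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return list(values)
    return [v for v in values if abs(v - mu) <= n_sigma * sigma]

def standardize(value, lo, hi):
    """Clamp value to [lo, hi] and scale it linearly to the range 0 - 1."""
    clamped = min(max(value, lo), hi)
    return (clamped - lo) / (hi - lo)

def dwell_features(dwell_ms, lo=0.0, hi=500.0):
    """Return (mean, standard deviation) of dwell times, each scaled to 0 - 1.
    lo and hi are assumed bounds in milliseconds."""
    kept = remove_outliers(dwell_ms)
    return standardize(mean(kept), lo, hi), standardize(pstdev(kept), lo, hi)

# Dwell times (ms) for one key; the 1000 ms press is treated as an outlier.
mu_feature, sigma_feature = dwell_features([100, 110, 90, 105, 95, 1000])
```

The same two statistics would be computed for each flight-time (transition) feature.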

5 Background (continued) (Target of Features or Attributes)
[Figure: keyboard layout with the QWERTY and Numeric Keypad regions labeled. Source: Wikipedia.org, http://en.wikipedia.org/wiki/Computer_keyboard, last updated March 6, 2012]
Separate features for QWERTY and keypad
Durations and transitions for individual keys, groups of keys, etc.
QWERTY: each letter, each number, vowels, consonants, all letters, etc.
Keypad: each digit, each operator (+ - * /), all digits, all operators, etc.

6 Background (continued) (Pace Classifier: Single Match)
Dichotomy Model
Uses vector differences
Transforms a multi-class problem into a two-class problem
K-Nearest Neighbor (k-NN) is used for classification
Feature Vector Space: 3 subjects, 4 samples each
Feature Difference Space: 18 within, 48 between
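The counts on this slide follow from pairing every sample with every other sample. The sketch below is an illustration rather than the Pace implementation: it assumes element-wise absolute differences and uses made-up two-feature vectors, and the function name is hypothetical.

```python
from itertools import combinations

def dichotomy_transform(samples):
    """samples: list of (subject_id, feature_vector).
    Returns (within, between): difference vectors for same-subject pairs
    and for different-subject pairs. Absolute differences are an assumption."""
    within, between = [], []
    for (subj_a, vec_a), (subj_b, vec_b) in combinations(samples, 2):
        diff = [abs(x - y) for x, y in zip(vec_a, vec_b)]
        (within if subj_a == subj_b else between).append(diff)
    return within, between

# 3 subjects x 4 samples each, with toy feature values.
samples = [(s, [s + 0.1 * i, 2 * s + 0.05 * i]) for s in range(3) for i in range(4)]
within, between = dichotomy_transform(samples)
print(len(within), len(between))  # 18 48
```

Each subject contributes C(4,2) = 6 within-class pairs (18 in total), and the remaining 66 − 18 = 48 of the C(12,2) pairs are between-class.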

7 Background (continued) (Pace Classifier: Multi Match)
Authentication Process
User Focused Reduction Method (reduces the training space)
System performance obtained using the Leave-One-Out method
The “left out” test sample is differenced against other vectors to create test differences
Each test difference is classified (k-NN)
Results are grouped together
Authentication decision is based on all of the results
Feature Vector Space: 3 subjects, 4 samples each
Feature Difference Space: 18 within, 48 between
Feature Reduction Space: 6 within, 32 between
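One reading of the reduction that reproduces the slide's counts is to keep only difference vectors that involve the claimed user. The sketch below rests on that assumption (along with absolute differences and toy data); it is not confirmed as the exact method used, and the function name is hypothetical.

```python
from itertools import combinations

def user_focused_differences(samples, claimed_user):
    """Keep only difference vectors in which at least one sample belongs
    to claimed_user (assumed interpretation of the reduction)."""
    within, between = [], []
    for (subj_a, vec_a), (subj_b, vec_b) in combinations(samples, 2):
        if claimed_user not in (subj_a, subj_b):
            continue  # pair does not involve the claimed user
        diff = [abs(x - y) for x, y in zip(vec_a, vec_b)]
        (within if subj_a == subj_b else between).append(diff)
    return within, between

# 3 subjects x 4 samples each, with toy feature values.
samples = [(s, [s + 0.1 * i, 2 * s + 0.05 * i]) for s in range(3) for i in range(4)]
reduced_within, reduced_between = user_focused_differences(samples, claimed_user=0)
print(len(reduced_within), len(reduced_between))  # 6 32
```

The claimed user's 4 samples give C(4,2) = 6 within pairs, and each of those 4 samples pairs with the 8 samples of the other two subjects, giving 32 between pairs.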

8 Background (continued) (Performance: ROC Curves, Equal Error Rate)
Receiver Operating Characteristic (ROC) Curves
Historically used in signal detection, such as radar, to distinguish an actual signal from noise
Used in biometrics to plot the False Accept Rate (FAR) and False Reject Rate (FRR) at various operating points (thresholds)
Equal Error Rate (EER)
The operating point on the ROC curve where the FAR and FRR are equal, that is, where the FAR and FRR curves intersect
[Figures: ROC Curve; FAR / FRR Intersection]
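As an illustration of the EER definition, the sketch below locates the crossing of sampled FAR and FRR curves by linear interpolation between adjacent thresholds. The interpolation scheme and the sample values are assumptions for illustration; the slides do not say how the EER was computed in the study.

```python
def equal_error_rate(far, frr):
    """far, frr: error rates measured at the same ordered thresholds.
    Returns the rate at the point where the two curves cross,
    using linear interpolation between neighboring thresholds."""
    for i in range(len(far) - 1):
        d0 = far[i] - frr[i]
        d1 = far[i + 1] - frr[i + 1]
        if d0 == 0:
            return far[i]
        if d0 * d1 < 0:  # sign change: the curves cross in this interval
            t = d0 / (d0 - d1)
            return far[i] + t * (far[i + 1] - far[i])
    if far[-1] == frr[-1]:
        return far[-1]
    raise ValueError("FAR and FRR curves do not cross")

# Toy curves: FAR falls and FRR rises as the threshold gets stricter.
far = [1.0, 0.5, 0.1, 0.0]
frr = [0.0, 0.1, 0.5, 1.0]
eer = equal_error_rate(far, frr)  # approximately 0.3
```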

9 Data Collection (Numeric Keypad)
Subjects: 30
Number entered: 914 193 7761
Sessions: 4
Samples: 20 per subject
Only “perfect” samples were used (no mistakes)
Rest period of at least one day between sessions
Data entered into a spreadsheet using the right hand

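10 Features (Feature Attribute Summary)
Transitions are counted per transition type (Type I and II), so each transition attribute yields two features per statistic.

Group                  Measurement   Attributes   Mean (µ)   Standard Deviation (σ)   Total
QWERTY (Non-Numeric)   Durations     53           53         53                       106
QWERTY (Non-Numeric)   Transitions   35           70         70                       140
QWERTY (Numeric)       Durations     27           27         27                       54
QWERTY (Numeric)       Transitions   26           52         52                       104
Keypad                 Durations     29           29         29                       58
Keypad                 Transitions   128          256        256                      512
Totals                               298          487        487                      974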

11 Features (Keypad Durations)
Numeric Keypad
  Digits with Decimal: 0 1 2 3 4 5 6 7 8 9 .
  Arithmetic Operators with Num Lock and Enter: Num Lock, Enter, / * - +
  All Keys
Centerpad
  Print Screen, Sys Rq, Scroll Lock, Pause, Break
  Home, Page Up, Page Dn, End, Del, Ins
  Four Arrows

12 Features (continued) (Keypad Transitions)
Any Key -> Any Key
keypad -> keypad
any digit -> any digit
Digit to each digit: 1->1,2,3…0; 2->1,2,3…0; 3->1,2,3…0; 4->1,2,3…0; 5->1,2,3…0; 6->1,2,3…0; 7->1,2,3…0; 8->1,2,3…0; 9->1,2,3…0; 0->1,2,3…0
Digit to digits: 1->digits; 2->digits; 3->digits; 4->digits; 5->digits; 6->digits; 7->digits; 8->digits; 9->digits; 0->digits
Any Digit -> Arithmetic Operators
Digit to operators: 1->Arithmetic Operators; 2->Arithmetic Operators; 3->Arithmetic Operators; 4->Arithmetic Operators; 5->Arithmetic Operators; 6->Arithmetic Operators; 7->Arithmetic Operators; 8->Arithmetic Operators; 9->Arithmetic Operators; 0->Arithmetic Operators
Arithmetic Operator -> any digit
Operator to digits: div->digits; mult->digits; sub->digits; add->digits

13 Results – Short Input Experiments (Equal Error Rate for each keypad experiment per Classifier)
[Figures: Equal Error Rate for the 10-subject, 20-subject, and 30-subject experiments, each comparing Multi Match and Single Match]

14 Results – Short Input Experiments (continued) (ROC Curve for each keypad experiment per Classifier)
[Figures: ROC curves for the Multi Match Classifier and the Single Match Classifier]
10 - 20: 10 subjects, 20 samples each
20 - 20: 20 subjects, 20 samples each
30 - 20: 30 subjects, 20 samples each

15 Results – Short Input Experiments (continued) (Independent Variables for the short input experiments)

Numeric Keypad
Subjects                        10       20       30
Samples per Subject             20       20       20
Total Samples (All Subjects)    200      400      600
EER % (Multi Match)             5.50%    5.65%    6.14%
EER % (Single Match)            15.56%   15.72%   14.95%
EER Improvement %               64.65%   64.06%   58.93%

Independent Variable 1: Number of Subjects
Independent Variable 2: Classifier
Conclusion 1: EER increases (but not by much) as the number of subjects increases *
Conclusion 2: The new classifier is much better than the old classifier
* Except for the old classifier
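The "EER Improvement %" row is consistent with the relative reduction in EER from Single Match to Multi Match. The slide does not state the formula, so the one below is inferred; it does reproduce the three reported values.

```python
def eer_improvement(old_eer, new_eer):
    """Relative EER reduction, in percent, going from old_eer to new_eer."""
    return (old_eer - new_eer) / old_eer * 100.0

# (subjects, Multi Match EER %, Single Match EER %) from the slide
results = [(10, 5.50, 15.56), (20, 5.65, 15.72), (30, 6.14, 14.95)]
improvements = [round(eer_improvement(single, multi), 2) for _, multi, single in results]
print(improvements)  # [64.65, 64.06, 58.93]
```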

16 CMU Experiment – Keypad (Feature Set Comparison – CMU vs. PaceU)
Input: 914 193 7761 + Enter key = 11 characters
Carnegie Mellon Features (from their numeric keypad study *)
  10 key-down ---> key-down
  10 key-up ---> key-down
  11 dwell times
  31 features
Pace University Features (from our numeric keypad study)
  (10 key-down ---> key-down) per µ, per σ = 20
  (10 key-up ---> key-down) per µ, per σ = 20
  (7 dwell) per µ, per σ = 14
  54 timing features
* R. Maxion and K. Killourhy, "Keystroke Biometrics with Number-Pad Input," 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN), Chicago, IL, 2010, pp. 201-210.
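The feature counts above can be reproduced with a short sketch. The event format and the synthetic timings are assumptions for illustration, the function name is hypothetical, and reading "7 dwell" as one dwell feature per distinct key is an inference from the input string.

```python
def cmu_style_features(events):
    """events: list of (key, down_time, up_time) in typing order.
    Returns dwell times, key-down to key-down latencies, and
    key-up to key-down latencies as one flat list."""
    dwell = [up - down for _, down, up in events]
    down_down = [events[i + 1][1] - events[i][1] for i in range(len(events) - 1)]
    up_down = [events[i + 1][1] - events[i][2] for i in range(len(events) - 1)]
    return dwell + down_down + up_down

keys = list("9141937761") + ["Enter"]
# Synthetic timings in ms: a key every 200 ms, each held for 80 ms.
events = [(key, 200 * i, 200 * i + 80) for i, key in enumerate(keys)]

cmu_features = cmu_style_features(events)
print(len(cmu_features))  # 31 = 11 dwell + 10 down-down + 10 up-down

# Pace-style count: mean and standard deviation for each of the 10 + 10
# transitions and for each distinct key's dwell time (7 distinct keys).
pace_feature_count = 2 * (10 + 10 + len(set(keys)))
print(pace_feature_count)  # 54
```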

17 CMU Experiment – Keypad (continued) (Equal Error Rate and ROC Curves using Multi Match only)
PU Data with CMU Features
[Figures: Equal Error Rate; ROC Curves; PU Features vs. CMU Features]

18 CMU Experiment – Keypad (continued) (Independent Variable for the CMU Keypad experiment)

Numeric Keypad (30 – 20)
Feature Set                     CMU       PU
Subjects                        30        30
Samples per Subject             20        20
Total Samples (All Subjects)    600       600
EER % (Multi Match)             10.47%    6.14%
EER Improvement %               41.36%

Independent Variable: Feature Set
Conclusion: The PU feature set outperformed the CMU feature set

19 Conclusions
Long Input Experiments – Intruder Detection Accuracy
  Keystroke biometrics can be effective at detecting the unauthorized use of a computer system in a closed environment (government office, school, business office, etc.)
  Performance varied with input type:
    Spreadsheet: Good Performance (EER: 8.1%)
    Text: Very Good Performance (EER: 5.8%)
    Browser: Fair Performance (EER: 15.7%)
Short Input Experiments – Detection Accuracy
  Numeric keypad yields very good performance (EER range: 5.5% - 6.2%)
  PaceU feature set is effective: CMU features performed much worse (10.5% vs. 6.2%)
Classifier Comparison – Multi Match vs. Single Match
  1) Multi Match outperformed Single Match significantly (EER improvement from 50% - 64%)
  2) Multi Match outperformed the detector study from CMU using their data and features (EER: 7.6%)

20 Conclusions (continued)
Long Input Performance: Weaker performance compared to previous studies at PU… Why?
  Less optimal samples
  No designated entry window for sample collection (less control over quality of entry)
  Large fluctuations in the number of keystrokes
  Input types most likely had substantial mouse activity that “interrupts” keystroke entry
  Possible sparseness of keystrokes (meaning less concentrated and more spread out, especially with browser entry)
Future Considerations: Do keystroke counts tell the whole story?
  Propose that correlating performance simply to the number of keystrokes is not sufficient
  Need to factor in the density of the keystrokes as well
  Simply stated: it may take many more keystrokes to maintain an effective level of performance if the sparseness is high

21 Suggestions for Future Work
Further studies on numeric entry from QWERTY
  Compare performance to numeric entry from keypad
Study free text entry from keypad
Feature Analysis
  Which features contributed to performance from the keypad?
  How do equivalent numeric features from QWERTY perform compared to keypad?
Perform mixed mode experiments
  Collect input that combines spreadsheet, browser, and text
  Collect spreadsheet input which includes all numeric entry from keypad
Incorporate Multi Biometric
  Keystroke + Mouse Movement + Stylometry

22 Backup Slides

23 Generate ROC Curves from kNN Data (vary m from 0 to k [m is the controlling or threshold parameter])
The m-kNN procedure with k = 9 and m = 5
For each Q (questioned) test sample:
  Examine the top k nearest neighbors
  Count the number of within-class matches
  If the number of within-class matches >= a threshold of matches (m), the user is authenticated; otherwise the user is rejected
Generate the ROC curve as follows:
  Vary m from 0 to k
  Calculate FAR / FRR in each of the following cases:
    m = 0: authenticate if 0 or more of the k choices are within
    m = 1: authenticate if 1 or more of the k choices are within
    and so on until m = 9 in this case
Linear Rank Weighting Method:
  1st choice weight = k, 2nd choice weight = k-1, …, last choice weight = 1
  Authenticate a user if the sum of the weighted within-class choices >= the m threshold
  Threshold varies from 0 to k(k+1)/2 (maximum score)
R. Zack, C. Tappert, and S. Cha, "Performance of a Long-Text-Input Keystroke Biometric Authentication System Using an Improved k-Nearest-Neighbor Classification Method," IEEE 4th Int Conf Biometrics (BTAS 2010), Washington D.C., 2010.
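The procedure above can be sketched as follows. The data layout (a list of trials, each holding the ranked labels of its k nearest neighbors) and the function names are assumptions for illustration, not the authors' implementation.

```python
def mknn_decision(neighbor_labels, m):
    """neighbor_labels: labels of the top-k nearest neighbors, nearest first,
    each either "within" or "between". Authenticate if at least m are within."""
    return sum(1 for label in neighbor_labels if label == "within") >= m

def weighted_score(neighbor_labels):
    """Linear rank weighting: 1st choice weighs k, 2nd weighs k-1, ..., last weighs 1.
    Returns the summed weight of the within-class choices."""
    k = len(neighbor_labels)
    return sum(k - rank for rank, label in enumerate(neighbor_labels) if label == "within")

def roc_points(trials, k):
    """trials: list of (is_genuine, neighbor_labels).
    Returns one (m, FAR, FRR) tuple for each threshold m = 0..k."""
    genuine = [labels for is_genuine, labels in trials if is_genuine]
    impostor = [labels for is_genuine, labels in trials if not is_genuine]
    points = []
    for m in range(k + 1):
        frr = sum(not mknn_decision(labels, m) for labels in genuine) / len(genuine)
        far = sum(mknn_decision(labels, m) for labels in impostor) / len(impostor)
        points.append((m, far, frr))
    return points

# Toy example with k = 3: one genuine trial and one impostor trial.
trials = [
    (True, ["within", "within", "between"]),
    (False, ["between", "within", "between"]),
]
points = roc_points(trials, k=3)
max_score = weighted_score(["within"] * 9)  # k(k+1)/2 = 45 for k = 9
```

With the weighted variant, the same sweep would run over thresholds from 0 to k(k+1)/2 and compare `weighted_score` against each threshold.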

24 Equal Error Rates (From the Literature)
Long Input:
  Ferreira and Santos: 1.4%
  Monaco using data from Villani: 1.7%

25 Future Work (continued): Multi Biometrics for Intrusion Detection
Motor Control Level: keystroke + mouse movement
Linguistic Level: stylometry (char, word, syntax)
Semantic Level: target likely intruder commands
[Figure: an intruder's input analyzed at three levels: Motor Control Level (Keystroke + Mouse), Linguistic Level (Stylometry), Semantic Level]

26 Intruder Experiment Design (continued)
Authenticate the user on various window sizes, beginning with 300-keystroke windows
Window Type 1: use overlapping windows to:
  Minimize the “wait” period for the next authentication
  Maximize fast intruder detection
[Figure 1.5-1: Overlapping Window Burst Authentication. One series of 300-keystroke (300 KS) windows with boundaries at keystrokes 1, 300, 600, 900, 1200, 1500, 1800, and a second, overlapping series with boundaries at 150, 450, 750, 1050, 1350, 1650]
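A minimal sketch of overlapping window generation, assuming 300-keystroke windows that start every 150 keystrokes (the offset suggested by the figure's tick marks). The function name and the 0-based, end-exclusive index convention are choices made here for illustration.

```python
def overlapping_windows(n_keystrokes, size=300, step=150):
    """Return (start, end) index pairs, 0-based with end exclusive,
    for every full window of `size` keystrokes, starting every `step`."""
    return [(start, start + size) for start in range(0, n_keystrokes - size + 1, step)]

windows = overlapping_windows(1800)
print(len(windows))  # 11 windows: 6 aligned to multiples of 300, 5 offset by 150
```

With a step of 150, a new authentication decision becomes available after every 150 keystrokes rather than every 300, which is the reduced "wait" period the slide refers to.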

27 Continuous vs Continual Authentication with Data Capture Windows
Continuous (ongoing) burst authentication
[Figure: timeline with ticks at 0, 5 min, and 10 min showing three 1-minute bursts, Burst 1 to Burst 3]
Continual burst authentication with pauses
[Figure: timeline with ticks at 0, 8 min, and 30 min showing three 1-minute bursts, Burst 1 to Burst 3, separated by pause thresholds]
Source: EISIC 2012

28 Background (continued)
DARPA (Defense Advanced Research Projects Agency), through its Cyber Genome Program, is funding research for the development of new software-based authentication biometric modalities. These include keystrokes, and the program targets a desktop environment running Microsoft Office applications as the standard computer system platform.
DARPA. Active Authentication Program. https://www.fbo.gov/index?s=opportunity&mode=form&id=c7968647352f0276fc1b28817c581d86&tab=core&_cview=0, accessed 2014.
The 2008 United States Higher Education Opportunity Act requires institutions of higher learning to make greater online access control efforts by adopting ubiquitous identification technologies.
HEOA. Higher Education Opportunity Act (HEOA) of 2008. http://www2.ed.gov/policy/highered/leg/hea08/index.html, accessed 2014.

29 Spreadsheet Template
Columns: 2011, 2010, 2009
Assets
  Cash
  Investments:
    Cash
    Equity Securities
    Corporate debt securities
    US government securities
    Private equity
    Real estate
  Total Investments: 0 0 0
  Other Assets
  Total Assets: $0
Liabilities and Net Assets
  Liabilities:
    Penalties
    Accounts Payable
    Advance from Lender
    Federal excise tax
  Total Liabilities: 0 0 0
  Net Assets:
    Tangible
    Non Tangible
  Total Net Assets: 0 0 0
  Total Net Assets and Liabilities: $0
Special Journal Entries
  Enter Journal Entry name here
  Total Journal Entries: $0.00

