The New Complex Trial Protocol for Deception Detection with P300: Mock Crime Scenario and Enhancements J. Peter Rosenfeld, John Meixner, Michael Winograd,


1 The New Complex Trial Protocol for Deception Detection with P300: Mock Crime Scenario and Enhancements J. Peter Rosenfeld, John Meixner, Michael Winograd, Elena Labkovsky, Alex Sokolovsky, Xiaoxing Hu, Alex Haynes, Northwestern University

2 OLD 3-STIMULUS, P300-BASED CIT (GKT)
PROBE: GUILTY KNOWLEDGE ITEM: $5000 Press non-target button. IRRELEVANT: OTHER AMOUNT: $200 TARGET: OTHER AMOUNT: $3000 Press target button.

3 Previous P300 DD protocols used separate Probe (P), Irrelevant (I), and Target (T) trials.
80% to 95% correct detection rates... but... *Rosenfeld et al. (2004) and Mertens, Allen et al. (2008): These methods are vulnerable to countermeasures (CMs) that turn I’s into covert T’s.

4 Old P300 Protocol
1 of 3 stimuli on each trial: Probe (P), or Irrelevant (I), or Target (T). Subject presses either Target or Non-Target (NT) button. Both P and I can be Non-Targets. A special I is defined as T. This leads to 2 tasks for each stimulus: 1. implicit probe recognition vs. 2. explicit Target/Non-Target discrimination. Possible result: mutual interference, so more task demand leads to reduced P300 to P. CMs hurt the old test. A CM is an attempt to defeat the test by converting irrelevants into covert targets.

5 How to do CMs: When you see a specific irrelevant, SECRETLY make some response, mental/physical. After all, if you can make special response to TARGET on instruction from operator, you can secretly instruct yourself. Irrelevant becomes secret target. It makes big P300. If P = I, no diagnosis.

6 Results from Rosenfeld et al. (2004): Farwell-Donchin paradigm
(BAD and BC-AD are 2 analysis methods.) Diagnoses of Guilty:
Method | Guilty Group | Innocent Group | CM Group
Amplitude Difference (BAD), p=.1 | 9/11 (82%) | 1/11 (9%) | 2/11 (18%)
Cross-Correlation (BC-AD), p=.1 | 6/11 (54%) | 0/11 (0%) | 6/11 (54%)

7 Results (hit rates) from Rosenfeld et al. (2004): Rosenfeld paradigm
Week | BAD* | BC-AD*
1: no CM | /13 (.92) | 9/13 (.69)
2: CM | /12 (.50) | 3/12 (.25)
3: no CM | /12 (.58) | 3/12 (.25)
*Note: BC-AD and BAD are 2 kinds of analytic bootstrap procedures.

8 Farwell’s response:

9 What would have happened..
…If somebody beat the test? Would he pay the $100,000? No worries about that if you are 100% confident that it can’t happen (‘cuz you rigged it!)

10 Anyway, that was the abstract he sent to SPR ’08, published in the program which we all eagerly awaited. But here is what was posted:

11 NEW COMPLEX TRIAL PROTOCOL (CTP)

12

13 New Complex Trial Protocol (CTP)
2 stimuli per trial, separated by about 1 s: S1, either P or I... then... S2, either T or NT.
*There is no conflicting discrimination task when P is presented, so P300 to probe is expected to be as large as possible due to P’s salience, which should lead to good detection; % in Rosenfeld et al. (2008) with autobiographical information. It is also CM resistant. (Delayed T/NT still holds attention.)
*“I saw it” response to S1. RT indexes CM use.

14 Main Study: With false positive (FP) group.
Within-subject correct detections of guilty subjects based on bootstrap comparison of probe P300 against the average of all irrelevant P300s over 3 weeks.
WEEK | Hit Rate | [Hit Rate]
Week 1 (no CM) | /12 (92%) | [12/12* (100%)]
Week 2 (CM) | /11 (91%) | [11/12* (92%)]
Week 3 (no CM) | /12 (92%) | [12/12* (100%)]
Confidence= | Confidence=.95
Test | FPs | Hits | A’ | FPs | Hits | A’
Iall
Imax
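The bootstrap comparison of probe P300 against the irrelevant P300s can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: it resamples single-trial P300 amplitudes (hypothetical numbers) rather than averaged waveforms, and the function names, the `mode` switch for Iall versus Imax, and the default of 1000 iterations are all illustrative choices.

```python
import random


def bootstrap_proportion(probe_amps, irrelevant_amps, iterations=1000, seed=0):
    """Proportion of bootstrap iterations in which the resampled mean
    probe amplitude exceeds the resampled mean irrelevant amplitude."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(iterations):
        # Resample each set with replacement, keeping its original size.
        p = [rng.choice(probe_amps) for _ in probe_amps]
        i = [rng.choice(irrelevant_amps) for _ in irrelevant_amps]
        if sum(p) / len(p) > sum(i) / len(i):
            wins += 1
    return wins / iterations


def diagnose(probe_amps, irrelevant_sets, confidence=0.9, mode="Iall", seed=0):
    """Return (guilty?, proportion). `irrelevant_sets` holds one list of
    single-trial amplitudes per irrelevant item (hypothetical layout)."""
    if mode == "Iall":
        # Compare the probe against all irrelevant trials pooled together.
        comparison = [a for item in irrelevant_sets for a in item]
    elif mode == "Imax":
        # Compare the probe against the irrelevant item with the largest mean.
        comparison = max(irrelevant_sets, key=lambda item: sum(item) / len(item))
    else:
        raise ValueError("mode must be 'Iall' or 'Imax'")
    proportion = bootstrap_proportion(probe_amps, comparison, seed=seed)
    return proportion >= confidence, proportion


# Hypothetical amplitudes in microvolts, for illustration only.
probe = [10.0, 11.0, 12.0, 9.0, 10.5]
irrelevants = [[2.0, 3.0, 1.0, 2.5], [3.0, 4.0, 2.0, 3.5]]
print(diagnose(probe, irrelevants, confidence=0.9, mode="Iall"))
```

The Imax variant is the stricter test, since the probe has to beat the single largest irrelevant rather than the pooled average.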

15 EXP 1: How does this CTP do in detecting incidental mock crime details?
Subjects were divided into three groups (n=12): Simple Guilty (SG), Countermeasure (CM), and Innocent Control (IC).
All subjects first participated in a baseline reaction time (RT) test in which they chose a playing card and then completed the CTP using cards as stimuli.
SG and CM subjects then committed a mock crime. Subjects stole a ring out of an envelope in a professor’s mailbox. Subjects were never told what the item would be, to ensure any knowledge would be incidentally acquired through the commission of the mock crime.
All subjects were then tested for knowledge of the item that was stolen. There were 1 P (the ring) and 6 I (necklace, watch, etc.).
CM subjects executed covert assigned responses to irrelevant stimuli in an attempt to evoke P300s to these stimuli and so beat the Probe vs. Irrelevant P300 comparison.

16 A CTP Trial

17 Results: Grand Averages: SG, CM, IC, all P

18 Guilty Diagnoses
Condition | Detections | Percentage
SG | 10/12 | 83
CM | / |
IC | / |

19 RTs to S1 (P or I)

20 Conclusions As with autobiographical information, the CTP was found to be highly sensitive at detecting incidentally acquired concealed knowledge in a mock-crime scenario. Detection rates using the CTP compare favorably to similar polygraph CITs. The main advantage of the CTP over the old P300 or polygraph CIT is its resistance to CM use. The traditional covert-response CMs used to defeat past P300 CITs were found to be ineffective against the CTP, and actually led to larger Probe-Irrelevant amplitude differences and detection rates. CM use was also easily identified by a large increase in RT between the baseline and experimental blocks.
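The RT-based check for CM use mentioned in these conclusions could look something like the sketch below. The source says only that CM use produced a large RT increase between the baseline and experimental blocks; the 150 ms cutoff, the function names, and the sample RTs here are hypothetical values chosen for illustration.

```python
from statistics import mean


def rt_increase_ms(baseline_rts, experimental_rts):
    """Mean "I saw it" RT in the experimental block minus the baseline mean."""
    return mean(experimental_rts) - mean(baseline_rts)


def flags_cm_use(baseline_rts, experimental_rts, threshold_ms=150.0):
    """Flag possible CM use when the RT increase exceeds a cutoff.
    The cutoff is an assumed value, not one taken from the studies."""
    return rt_increase_ms(baseline_rts, experimental_rts) > threshold_ms


# Hypothetical RTs in milliseconds.
baseline = [400, 420, 380]
print(flags_cm_use(baseline, [700, 720, 680]))  # slow block
print(flags_cm_use(baseline, [410, 430, 390]))  # near-baseline block
```

In practice a cutoff would have to be calibrated on data from known CM users and non-users rather than fixed in advance.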

21 New study with autobiographical info, 2 mental CMs to 4 irrelevants.
So now we have a 5-button box for the left hand. The subject is instructed to press, at random*, one of the 5 buttons as the “I saw it” response to S1 on each trial with no repeats. T and NT (S2) stimuli and responses are as previously. We also hoped that this would make CMs harder to do. It didn’t, but we caught the CM users anyway. * We have done other studies with non-random, explicitly assigned responses also.

22 Design: Autobiographical information (birthdates): one P and 4 I (other, non-meaningful dates).
*3 groups as before: SG, CM, IC.
*NEW: mental CMs to only 2 of the 4 Irrelevants: say to yourself your first name as CM1, your last name as CM2. These are assigned prior to the run.
*Why 2 irrels? Meixner & Rosenfeld (2010) showed that countering all Irrels, but not the probe, gives the probe extra, special significance. They did a study with only 5 irrels, one of which was not countered. It had a big P300. So doing CMs to all irrels is not a good strategy from the perp’s perspective.
*Why mental CMs? They should be faster and a bigger challenge for our CTP.
Only one block per group (no baseline).

23 Results: Grand Averages (Pz, 2 uV/ division)

24 Detection rates:
Group | BT/Iall.9 | BT/Imax.9
IC | /13 (7.6%) | /13 (7.6%)
CM | /12 (100%) | /12 (83%)*
*These are screened via RT, which still nicely represents CM use within a block.

25 RTs (to “I saw it”) in this study clearly index use of CMs:

26 New ERP: “P900—the CM potential” :largest at Fz, Cz (P=black, Iall=red, 2uV/division)

27 New study: Effects of various numbers of CMs, 1-5, with 5 total stimuli
Elena Labkovsky & Peter Rosenfeld

28 GAs: SG, IN, and 1, 2, 3, 4, and 5 CM groups (panels: SG, 1CM, 2CM, 3CM, 4CM, 5CM)

29

30 A Mock Terrorism Study
John Meixner & Peter Rosenfeld
How do you catch bad guys before crimes are committed, and before you know what was done, where, when?

31

32 A Mock Terrorism Application of the P300-based Concealed Information Test
Department of Psychology, Northwestern University, Evanston, IL

33 Table 1. Individual bootstrap detection rates
Numbers indicate the average number of iterations (across all three blocks) of the bootstrap process in which probe was greater than Iall or Imax. Blind Imax numbers indicate the average number of iterations in which the largest single item (probe or irrelevant) was greater than the second largest single item. Mean values for each column are displayed in bold above detection rates.
Iall Imax Blind Imax Guilty Innocent
1000 648 985 287 603 610 999 416 998 602 955 598 889 476 892 649 996 611 898 430 893 605 994 150 946 17 943 689 909 475 698 284 761 547 945 600 677 365 702 536 997 555 959 250 961 569 586 908 217 907 565 690 888 382 886 706 912 390 667 129 650 903 644 837 215 842 966 546 863 289 872 619
12/12 0/12 10/12 AUC = 1.0 AUC = .979
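An AUC like the ones reported with Table 1 can be computed directly from per-subject scores, for example the number of bootstrap iterations (out of 1000) in which probe exceeded the comparison. The sketch below assumes the standard pairwise (Mann-Whitney) definition of AUC, which may differ from the exact procedure used in the study, and the scores are hypothetical rather than taken from the table.

```python
def auc(guilty_scores, innocent_scores):
    """Area under the ROC curve as the fraction of (guilty, innocent)
    pairs in which the guilty score is higher; ties count as half."""
    wins = 0.0
    for g in guilty_scores:
        for n in innocent_scores:
            if g > n:
                wins += 1.0
            elif g == n:
                wins += 0.5
    return wins / (len(guilty_scores) * len(innocent_scores))


# Hypothetical iteration counts out of 1000, for illustration only.
guilty = [950, 990, 1000]
innocent = [300, 600, 640]
print(auc(guilty, innocent))
```

An AUC of 1.0 means every guilty score is above every innocent score, so some cutoff separates the two groups perfectly; an AUC of .5 is chance.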

34 So…………. CTP is a promising, powerful paradigm, against any number of CMs, mental and/or physical and RT reliably indicates CM use. The new “P900” might also.

35 This slide through 68 is optional 2016, except for slides 48-50.

36 So far, all CMs are done separately from and before the “I saw it” response.
CMs that are separated or split away from the response are called “splitting CMs.” What happens if subjects are instructed to do the CM and the “I saw it” response at the same time? They lump these acts together. This is called “lumping CMs.”

37 Here’s what happens: P3 still detects (83%) P vs Iall (b), but RT no longer indicates CMs!!

38 Note that this means you can no longer screen irrelevant comparison waves associated with large RTs.
Xiaoxing Hu to the rescue! (with Dan Hegeman and Elizabeth Landry). He simply increased irrelevants from 4 to 8, which should increase demand and RT…

39 Here are RT results with 8 irrels and 2,4,6 lumping CM groups here combined

40 RTs sorted by lumping CM groups.

41 Tabulated data…the more CMs you do, the harder the task, the more likely that RT will expose even lumping CM use…

42 P300 still catches CM users…

43 We were actually able to do some screening with 6-CM subjects which improved hit rate to 77%, A’ to .91
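For readers unfamiliar with A’, the sketch below computes it from a hit rate and a false positive rate. It assumes the A’ in these slides is Grier's (1971) nonparametric sensitivity index, which the transcript does not confirm; the function name and the input rates are hypothetical.

```python
def a_prime(hit_rate, fp_rate):
    """Grier's (1971) nonparametric A' from a hit rate and a false
    positive rate, both given as proportions in [0, 1]."""
    h, f = hit_rate, fp_rate
    if h == f:
        return 0.5  # chance-level discrimination
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    # Below-chance case: mirror of the formula above.
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))


# Hypothetical rates, for illustration only.
print(round(a_prime(0.9, 0.1), 3))
```

Like AUC, A’ runs from .5 (chance) to 1.0 (perfect), so a better hit rate at the same false positive rate raises it.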

44 Remember, Allen Hu gave the CMs to Ss in advance and let them rehearse.
And his subjects were geniuses, like you all…

45 So we are now working with 10 Irrelevant items… and 3,5,7 CMs.

46 BUT... ... it is obvious that having to form, on the spot, and hold 6 CMs for 6 of 8 Irrels in your head (as must happen in the field) is probably too hard for most bad guys to do.

47 RECENT STUDIES

48 The “OMIT” effect: Recall from Winograd & Rosenfeld (2011):
Condition | Detections | Percentage
SG | / |
CM | / |
Why are more Ss detected with CMs? (The same thing happened in another study.) We suspected that because they countered each of 4 Irrels, but not the probe, the probe now had 2 sources of salience: 1) probe meaningfulness; 2) “oddballness”: it was the only non-countered stimulus (i.e., omitted CM response).

49 Meixner & Rosenfeld (2010): A study of CM mechanisms

50 ERPs

51 Another study of CM mechanisms: (Winograd & Rosenfeld, submitted)
Applied question: One for all or One for each? Theoretical Question: What causes the increase in P300 to the countered Irrelevant? Recognition of to-be countered, generally? vs… Recall/Execution of specific CM?

52 Initially, with, say, one probe stimulus and 6 irrelevants…
Subjects would be taught to make a different CM for each and every Irrel (Winograd & Rosenfeld, 2011).

53 We came to realize that the best CM strategy from Perpetrator’s perspective…
…would be to counter about half the irrelevants, with a different CM for each Irrel. This should make each of these countered irrels very salient so they would make large Irrelevant P300s. Indeed, this reduced the Probe-Irrelevant difference--although we still detected most guilty subjects…

54 …as we saw before:

55 …& RT clearly identified 2/4 CM-users (Rosenfeld & Labkovsky, 2010):

56 We later hypothesized that if the subject executed the same CM for half the irrels…
…this could greatly reduce task demand (since there would now be only one CM to remember) which should reduce the RT index of CM use (Mike’s argument. I argued that now the CMs wouldn’t be so effective in increasing irrelevant P300 since individual salience would be lost.)

57 A theoretical issue also addressed by this “one for all” experiment:
Is it execution of a specific CM response which causes the increased Irrelevant P300? Or is it simply the recognition that “Uh Oh! Here’s an irrelevant requiring a CM” ?

58 Design of Experiment: 20 subjects in 2 counterbalanced blocks/conditions. One probe stimulus and 8 irrelevant stimuli, of which 4 were to be countered. Stimuli: home towns, birthdates in one of 2 blocks. In one block (OA “One for all”), the subjects were given one mental CM response (one person to imagine) for all 4 of the countered irrelevants. In the other block (OE” One for Each”) the S imagines (as their CM) a different person for each of the 4 to-be-countered irrelevants.

59 Detailed (Latin Square) Design:

60 New wrinkle: lumping CMs: subject makes the CM and “I Saw it” responses together:another RT indicator killer (Sokolovsky et al., 2011):

61 (ERPs are unaffected by Lumping)

62 Current Study ERP Results:

63 Bar graph representation of p-p P300s:

64 ERP Results statistically:
There were no differences between groups in the probe P300 amplitudes, nor in the probe-minus-irrelevant P300 differences (p > .1). In the OE block, 14/20 subjects were detected with .9 bootstrap tests; in the OA block, 15/20 subjects were detected (p > .2). These results support the view that it is merely the recognition of a stimulus as one requiring a CM that produces the enlarged irrelevant P300, not the need to select the correct CM response.

65 Are the OA RTs faster than OE?
NO, suggesting equal task demand OA=OE. The only significant effect: Block 1 > Block 2. (Of course, Ic>Inc)

66 Bottom Line… OA is NOT less demanding than OE, so from applied perspective, RT as a CM index is not compromised, even if it seems easier for perpetrator. In any case, the P300s still catch most CM users, (75%) over 2 blocks. 80% in first block for both OA and OE.

67 CTP is a promising, powerful paradigm…
… against any number of CMs, mental and/or physical and RT reliably indicates CM use. The new “P900” might also. BUT: In recent studies testing for truly incidentally acquired information with mock crime scenarios, using symmetric protocols, hit rates have dropped to %, though still CM-resistant.

68 So lately, we have been seeking...
New enhancements for incidental information detection.

69 One Enhancement: Effects of feedback that focuses attention on probe-irrelevant dimension. +First we tried 3SP to follow up Verschuere et al, (2009). +Probes were home towns. +Two groups: One got “You are lying” feedback (deception group). The other (control group) received feedback about button pressing.

70 e.g., 3 of 6 feedbacks given to

71 e.g., 3 of 6 feedbacks given to

72 3SP-effect of deception awareness. P-I bigger in deception group.

73 3SP-effect of deception awareness

74 OK, but is 3SP the ideal protocol? Why not? What is? Yes, the CTP
So in the next study, we use CTP with mock crime items (less salient), and feedback is about recognition in high awareness group (like previous “deception group.”) In control group, feedback is about irrelevancies: if they are holding still, not blinking, etc..

75 CTP Effect of awareness: P300 AND N200 (1)

76 CTP Effect of awareness: P300 (2)

77 CTP Effect of awareness: N200 (3)

78 CTP Effect of awareness: N2+P3 (4)
N200: The ROC analyses showed that N200 at Fz discriminates guilty from innocent participants above the chance level only in the high awareness condition (AUC=.72, p<.05). (AUC in low cond. = .5)
P300: In the high awareness condition, the P300 could effectively differentiate guilty from innocent participants (AUC=.79, p<.01). However, the AUC in the low awareness condition was not significantly different from chance level (AUC=.59).
Combining N200 and P300: The ROC analyses based on this combined measurement achieved the highest classification efficiency (AUC=.91, p<.001) compared with the N200 or P300 indicator alone in the high awareness condition, but not in the low awareness condition, where AUC = .48.

79 Another Enhancement: aIAT plus CTP with delay (1)
In this study, a mock crime is committed: steal an exam from the mailbox of Prof. Rosenfeld. So the probe is information incidentally acquired during the mock crime. Subjects in 3 groups:
1. Guilty/Immediate, tested right after the crime with BOTH CTP and IAT.
2. Guilty/Delay, tested a month later.
3. Innocent Group, tested right away.

80 IAT plus CTP with delay: ERPs (2)

81 IAT plus CTP with delay

82 A FINAL ENHANCEMENT: Efficiency is enhanced. Sensitivity?

83 Labkovsky’s “Dual Probe” CTP
The original CTP tested for one item per trial in the first part of the trial:

84 Elena L. figured “Why waste the second part of the trial?”

85 First we did a pilot study with autobiographical information
3 groups: Simple Guilty (SG), N=13 (with P1 & P2); Innocent (IN), N=12 (no probes); and Countermeasure (CM), in which 2 (of 4) irrelevants from the 1st and 2nd parts of a trial were countered with mental responses.
In the first part of a trial there were four different Irrelevant stimuli (I1, I2, I3, and I4) and one Probe (P1). All stimuli in the first part were dates. The Probe was the subject’s birth date.
In the second part the stimuli were city names. There were three Irrelevants (NT1, NT2, and NT3), one Probe (P2), and one Target (T). The Irrelevants were irrelevant to the subject. The Probe was the subject’s hometown name, and the Target was a city with an assigned significance (“Chicago” was used as the Target).

86 SG waves

87 CM Waves

88 IN Waves

89 More Quantitatively…

90 Mock Crime DPCTP with SG, CM, IN (this one...)

91 There are 8 Irrels, and in the CM group, 3 are countered.

92

93 Table 5. Results of DPCTP test on a mock crime:

94 So…. …we are optimistic that enhancements, including (1) specific feedback, (2) utilization of other measures (N200) and tests (aIAT), and (3) more efficient protocols like the dual-probe protocol may yield a field ready protocol in the not too distant future.

95 When will the P300-CTP be admissible in U.S. Courts?
John Meixner & J.Peter Rosenfeld Northwestern University.

96 Admissibility issues:
Any scientific expert testimony offered by a party in either a civil or criminal case must be evaluated by the judge as to its reliability. In all federal courts and in the majority of state courts, the judge assesses reliability under the Daubert standard, derived from the United States Supreme Court’s opinion in Daubert v. Merrell Dow Pharmaceuticals (1993).

97 Daubert Standard: 4 criteria:
(1) “whether [the theory or technique] can be (and has been) tested,” (2) “whether the theory or technique has been subjected to peer review or publication,” (3) “the known or potential rate of error,” (in the field---there’s the rub.) (4) the “general acceptance” of the technique

98 Frye Standard In a minority of state courts, a different (and much simpler) standard is followed, derived from the nearly 100-year-old case Frye v. United States (1923). Under the Frye standard, any scientific evidence that is to be admissible “must be sufficiently established to have gained general acceptance in the particular field in which it belongs” (Frye, 1923). (Like Daubert #4)

99 A second kind of standard…
Any testimony that solely assesses the credibility of other witnesses competes with the role of the jury, which has been given the role of determining the credibility of the witnesses. A court might rule inadmissible a polygraph expert whose testimony essentially amounts to a statement that one of the other witnesses was or was not lying.

100 For example… This view was espoused in the United States Supreme Court’s most recent polygraph decision, United States v. Scheffer (1998), in which the Court barred a defendant’s attempt to admit ANS polygraph-based control question test evidence indicating that he was truthful.

101 Thus…. The principal opinion, written by Justice Clarence Thomas, focused on “[p]reserving the court members’ core function of making credibility determinations in criminal trials,” (p. 312–313) stating plainly that “the jury is the lie detector.” (p. 313).

102 Regardless of whether the standard is a good one…..
Thus, the way the CIT is represented to judges will determine whether it is ruled inadmissible as impinging on the role of the jury. While the CQT and its variants are truly credibility assessment tests in that they make claims about whether the tested individual was truthful or deceptive, the CIT makes no such claim.

103 Instead-- The CIT provides substantive evidence of whether an individual recognizes information that is relevant to the legal question at hand, and leaves the credibility assessment itself to the jury. While this information may undermine the credibility of a witness indirectly, it is no different than any other piece of evidence offered at trial, such as the presence of a fingerprint that may undermine the defendant’s denial of guilt.

104 OK, we can do that… …and the main remaining obstacle to admission to court is determining error rate in the field, which involves a test on real detainees. This leads to a Catch 22: IRBs essentially forbid work on prisoners, who are considered a vulnerable population. Thus no grants.

