What’s optimal about N choices? Tyler McMillen & Phil Holmes, PACM/CSBMB/Conte Center, Princeton University. Banbury Center, May 2005 at CSH.

1 What’s optimal about N choices? Tyler McMillen & Phil Holmes, PACM/CSBMB/Conte Center, Princeton University. Banbury Center, May 2005 at CSH. Thanks to NSF & NIMH.

2 Neuro-inspired decision-making models*
1. The two-alternative forced-choice task (2-AFC). Optimal decisions: SPRT, LAM and DDM.**
2. Optimal performance curves.
3. MSPRT: an asymptotically optimal scheme for n > 2 choices (Dragalin et al., 1999-2000).
4. LAM realizations of n-AFC; mean RT vs ER; Hick’s law.
5. Summary (the maximal order statistics).
* Optimality viewpoint: maybe animals can’t do it, but they can’t do better.
** Sequential probability ratio test, leaky accumulator model, drift-diffusion model.

3 2-AFC, SPRT, LAM & DDM. Choosing between 2 alternatives with noisy incoming data drawn from density p_1(x) or p_2(x). Set thresholds +Z, -Z and form the running sum of log likelihood ratios:
R_n = Σ_{i=1}^{n} log[ p_1(x_i) / p_2(x_i) ].
Decide 1 (resp. 2) when R_n first exceeds +Z (resp. falls below -Z). Theorem (Wald, 1947; Barnard, 1946): the SPRT is optimal among fixed or variable sample size tests in the sense that, for a given error rate (ER), the expected # samples to decide is minimal. (Or, for a given # samples, ER is minimal.)
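The SPRT can be sketched in a few lines; this is an illustrative simulation with assumed equal-variance Gaussian hypotheses (means ±0.1) and an assumed threshold, not parameters from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def sprt(mu1=0.1, mu2=-0.1, sigma=1.0, z=2.0, true_mean=0.1):
    """Run one SPRT trial: accumulate log likelihood ratios of
    samples x ~ N(true_mean, sigma^2) until the running sum log_r
    first exceeds +z (decide 1) or falls below -z (decide 2)."""
    log_r, n = 0.0, 0
    while -z < log_r < z:
        x = rng.normal(true_mean, sigma)
        # log[p1(x)/p2(x)] for equal-variance Gaussian hypotheses
        log_r += (mu1 - mu2) * (x - (mu1 + mu2) / 2) / sigma**2
        n += 1
    return (1 if log_r >= z else 2), n
```

Averaging over many trials exhibits Wald’s trade-off: raising z lowers the ER at the cost of more samples.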

4 The DDM is the continuum limit of the SPRT: let the accumulated log likelihood ratio x(t) evolve as
dx = a dt + c dW,
with drift a, noise strength c, and decision thresholds +Z, -Z. Extensive modeling of behavioral data (Stone, Laming, Ratcliff et al., ~1960-2005).
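A minimal Euler-Maruyama sketch of the DDM; the drift, noise, threshold, and step size below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def ddm_trial(a=0.1, c=1.0, z=1.0, dt=0.01):
    """Integrate dx = a dt + c dW from x = 0 until |x| reaches z;
    return the choice (1 for +z, 2 for -z) and the decision time."""
    x, t = 0.0, 0.0
    while abs(x) < z:
        x += a * dt + c * np.sqrt(dt) * rng.normal()
        t += dt
    return (1 if x >= z else 2), t
```

For these parameters the known closed forms give ER = 1/(1 + e^{2aZ/c²}) ≈ 0.45 and mean DT = (Z/a) tanh(aZ/c²) ≈ 1.0, which the simulation approximates up to discretization error.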

5 There’s also increasing neural evidence for DDM: FEF: Schall, Stuphorn & Brown, Neuron, 2002. LIP: Gold & Shadlen, Neuron, 2002.

6 The balanced LAM reduces to the DDM on an invariant line. Linearized two-unit LAM: dx_i = (-k x_i - β x_j + ρ_i) dt + c dW_i (a race model if β = 0). Uncouple via y_1 = x_1 + x_2, y_2 = x_2 - x_1: y_1 follows a stable OU flow (collapsing rapidly if k + β is large), and y_2 undergoes drift-diffusion (pure DD if k = β, the balanced case). Absolute thresholds in (x_1, x_2) become relative thresholds ±Z on x_2 - x_1!

7 LAM sample paths collapse towards an attracting invariant manifold (cf. C. Brody: Machens et al., Science, 2005). First passage across threshold determines choice.

8 Simple expressions for the first passage times and ERs:
ER = 1 / (1 + e^{2aZ/c²}), DT = (Z/a) tanh(aZ/c²).
Reduction to 2 parameters: only the ratios a/c (signal-to-noise) and Z/a enter. Can compute thresholds that maximize the reward rate
RR = (1 - ER) / (DT + delays)  (1)
(Gold-Shadlen, 2002; Bogacz et al., 2004-5). This leads to …
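Using the standard DDM expressions ER = 1/(1 + e^{2aZ/c²}) and DT = (Z/a) tanh(aZ/c²), the reward-rate-maximizing threshold can be located by a simple grid search. The drift, noise, and delay values here are illustrative assumptions, and the delay terms t0 (non-decision latency) and d (intertrial interval) are one common way to fill in the "delays" in the reward rate:

```python
import numpy as np

def er(z, a=1.0, c=1.0):
    """DDM error rate for drift a, noise c, threshold z."""
    return 1.0 / (1.0 + np.exp(2 * a * z / c**2))

def dt(z, a=1.0, c=1.0):
    """DDM mean decision time."""
    return (z / a) * np.tanh(a * z / c**2)

def reward_rate(z, t0=0.5, d=1.0):
    """Proportion correct per unit total trial time: decision time
    plus assumed non-decision latency t0 and intertrial delay d."""
    return (1.0 - er(z)) / (dt(z) + t0 + d)

zs = np.linspace(0.01, 5.0, 500)
z_opt = zs[np.argmax(reward_rate(zs))]  # interior optimum
```

Very low thresholds decide fast but at chance accuracy; very high thresholds are accurate but slow. The reward rate peaks at an intermediate z.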

9 Optimal performance curves (OPCs). Human behavioral data: the best subjects are optimal, but what about the rest? A bad objective function, or bad learners? Left: RR as defined previously; right: a family of RRs weighted increasingly for accuracy. Learning is not considered here. (Bogacz et al., 2004; Simen, 2005.)

10 n-AFC: MSPRT & LAM. The MSPRT chooses among n alternatives by a max vs. next test: accumulate log likelihoods L_i(t) and stop, choosing alternative i, when L_i - max_{j≠i} L_j first exceeds the threshold Z. The MSPRT is asymptotically optimal in the sense that the # samples is minimal in the limit of low ERs (Dragalin et al., IEEE Trans., 1999-2000). A LAM realization of the MSPRT (Usher-McClelland, 2001) asymptotically predicts mean RT growing as log(n-1) (cf. Usher et al., 2002).
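The max-vs-next stopping rule can be sketched as follows; the setup (unit 0 carries the true drift, the others are pure noise) and all parameter values are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def msprt_max_vs_next(n=4, mu=0.5, sigma=1.0, z=3.0):
    """Accumulate noisy evidence for n alternatives (only unit 0
    has drift mu) and stop when the leading total exceeds the
    runner-up by z; return the choice and the number of samples."""
    L = np.zeros(n)
    t = 0
    while True:
        L += rng.normal(0.0, sigma, n)
        L[0] += mu
        t += 1
        srt = np.sort(L)
        if srt[-1] - srt[-2] >= z:  # max vs. next gap test
            return int(np.argmax(L)), t
```

Running this for increasing n shows the mean sample count growing roughly like log(n-1), the scaling discussed on the next slide.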

11 The log(n-1) dependence is similar to Hick’s Law: RT = A + B log n or RT = B log (n+1). W.E. Hick, Q.J. Exp. Psych, 1952. We can provide a theoretical basis and predict explicit SNR and ER dependence in the coefficients A, B.

12 The multiplicative constants blow up logarithmically as ER → 0. Behavior for small and larger ERs: an empirical formula (2) generalizes (1).

13 But a running max vs. next test is computationally costly (?). The LAM can approximately execute a max vs. average test via absolute thresholds. The n-unit LAM is decoupled by separating the summed mode y_1 = Σ x_i from the difference modes: y_1 is attracted to the hyperplane y_1 = A, so max vs. average becomes an absolute test! Attraction is faster for larger n: the stable eigenvalue λ_1 ~ n. Drift-diffusion takes place on the hyperplane.
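An n-unit leaky accumulator with mutual inhibition and an absolute threshold can be sketched as below. All parameter values are illustrative assumptions; with leak k equal to inhibition β the difference modes undergo pure drift-diffusion, as in the two-unit case:

```python
import numpy as np

rng = np.random.default_rng(3)

def lam_trial(n=4, k=1.0, beta=1.0, mu=0.5, c=0.3, theta=0.4, dt=0.01):
    """n leaky accumulators with mutual inhibition; unit 0 receives
    the larger input mu, the rest receive 0. The summed activity
    relaxes quickly toward a hyperplane, so an absolute threshold
    theta acts like a max vs. average test. Returns (winner, time)."""
    y = np.zeros(n)
    rho = np.zeros(n)
    rho[0] = mu
    t = 0.0
    while y.max() < theta:
        inhibition = beta * (y.sum() - y)  # input from the other units
        y += (-k * y - inhibition + rho) * dt \
             + c * np.sqrt(dt) * rng.normal(size=n)
        t += dt
    return int(np.argmax(y)), t
```

Note that the stopping rule uses only each unit's own (absolute) level, never an explicit comparison across units; the collapse of the summed mode does that comparison implicitly.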

14 Max vs. average is not optimal, but it is not so bad (compare the absolute, max vs. average, and max vs. next tests; unbalanced LAMs give OU processes). Max vs. next and max vs. average coincide for n = 2. As n increases, max vs. average deteriorates, approaching absolute-test performance. But it is still better for n < 8-10!

15 The simple LAM/DD predicts log(n-1), not log n or log(n+1) as in Hick’s law: but a distribution of starting points gives approximately log n scaling for 2 < n < 8, and ER and SNR effects may also enter.

16 Nonlinear vs. linearized LAMs: the effect of nonlinear activation functions, bounded below, is to shift the scaling toward linear in n. The limited dynamic range degrades performance, but this can be offset by a suitable bias (recentering).

17 Summary: n-AFC
1. The MSPRT max vs. next test is asymptotically optimal in the low-ER limit. The LAM (& race model) can perform the max vs. next test.
2. Hick’s law emerges for the max vs. next, max vs. average & absolute tests; A, B are smallest for max vs. next, acceptable for max vs. average.
3. The LAM executes a max vs. average test on its attracting hyperplane using absolute thresholds.
4. Variable starting points give log n scaling for small n.
5. Nonlinear LAMs degrade performance: RT ~ n for sufficiently small dynamic range.
More info: http://mae.princeton.edu/people/e21/holmes/profile.html

