How to evaluate the performance of a panel and its panelists?


1 How to evaluate the performance of a panel and its panelists?

2 How to evaluate performance?
Outline:
The dataset
How to evaluate the performance of a panel?
How to evaluate the performance of a panelist?
Which effects and interactions may influence the scores?

3 Presentation of the dataset
6 dark chocolates (Excellence, Amère, Mi-doux, Amazonie, Pâtissier, Supérieur)
29 panelists (students)
2 sessions
14 descriptors: cocoa aroma, milk aroma, sweet, acid, bitter, cocoa, milk, caramel, astringent, crunchy, melt, sticky, granular, vanilla
Scores between 0 and 10

4 Presentation of the dataset
(figure)

5 Presentation of the dataset
(figure)

6 Presentation of the dataset
Estimated density of the scores, compared with the normal distribution whose mean and variance are those of the dataset.

7 Brief reminder on analysis of variance
Definition: Score_ijk = μ + α_i + β_j + γ_k + αβ_ij + αγ_ik + βγ_jk + ε_ijk, where:
μ is the mean effect
α_i is the product effect
β_j is the panelist effect
γ_k is the session effect
αβ_ij is the product – panelist interaction effect
αγ_ik is the product – session interaction effect
βγ_jk is the panelist – session interaction effect
ε_ijk is the error term
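The slides work in R with SensoMineR; as a minimal illustration of how the effects of such a model are estimated in a balanced design, here is a Python sketch on made-up scores (the 6 x 29 x 2 shape mirrors the chocolate dataset, the values are random):

```python
import numpy as np

# Hypothetical balanced scores: scores[i, j, k] = product i, panelist j, session k
rng = np.random.default_rng(0)
scores = rng.uniform(0, 10, size=(6, 29, 2))

mu = scores.mean()                       # mean effect
alpha = scores.mean(axis=(1, 2)) - mu    # product effects (sum to 0)
beta = scores.mean(axis=(0, 2)) - mu     # panelist effects (sum to 0)
gamma = scores.mean(axis=(0, 1)) - mu    # session effects (sum to 0)

# Product-session interaction: cell means minus the main effects
ps = scores.mean(axis=1) - mu - alpha[:, None] - gamma[None, :]
```

Each effect is the deviation of a marginal mean from the grand mean, which is how the zero-sum constraints of the ANOVA model are satisfied in a balanced design.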

8 Meaning of the different effects
Product effect: the products are discriminated (very interesting)
Panelist effect: the use of the scores differs from one panelist to another (doesn't really matter)
Session effect: the use of the scores differs from one session to the other (doesn't really matter)

9 Meaning of the different effects
Product – Session interaction: for this descriptor the panelists as a whole (i.e. the panel) are not repeatable (problematic)
Product – Panelist interaction: for this descriptor there is no consensus between panelists (problematic)
Panelist – Session interaction: for some panelists the use of the scale has changed from one session to the other (doesn't really matter)

10 How to evaluate performance?
Fixed or random?
Fixed effect: the effect of interest is linked to the categories of the factor (people are interested in these panelists' results).
Random effect: the effect of interest is not strictly speaking linked to the categories of the factor (people want to understand what "panelists in general" say, from a sample of panelists; in that case we talk about "inference").

11 How to evaluate performance?
What is performance? A panel or a panelist shows good performance if:
it discriminates between products (which is the case when the product effect is significant),
it is repeatable (i.e. it discriminates between products the same way from one session to the other).

12 How to evaluate the repeatability?
To evaluate the repeatability of the panelists as a whole, look at the Product – Session interaction of the model Score = P + J + S + J*P + J*S + P*S, where P stands for the Product, J for the Panelist and S for the Session.
If the Product – Session interaction is significant, the judges as a whole do not evaluate the products in the same manner from one session to the other, which is problematic.
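To make the test concrete, here is a hedged Python sketch (the slides use R; the scores are random, and, as a simplification, panelists are treated as replicates within each product x session cell rather than fitting the full six-term model):

```python
import numpy as np

# Hypothetical scores[i, j, k]: 6 products x 29 panelists x 2 sessions
rng = np.random.default_rng(1)
scores = rng.uniform(0, 10, size=(6, 29, 2))
I, J, K = scores.shape

mu = scores.mean()
cell = scores.mean(axis=1)                    # product-session cell means
alpha = scores.mean(axis=(1, 2)) - mu         # product effects
gamma = scores.mean(axis=(0, 1)) - mu         # session effects
inter = cell - mu - alpha[:, None] - gamma[None, :]

ss_inter = J * (inter ** 2).sum()             # interaction sum of squares
df_inter = (I - 1) * (K - 1)                  # 5 degrees of freedom
ss_err = ((scores - cell[:, None, :]) ** 2).sum()
df_err = I * K * (J - 1)                      # 336 degrees of freedom
F = (ss_inter / df_inter) / (ss_err / df_err)
# Compare F with the F(df_inter, df_err) distribution: a large F
# (small p-value) flags a significant Product-Session interaction
```

The decision rule is the usual one: if the p-value of F is below the chosen threshold, the interaction is significant and the panel is not repeatable for that descriptor.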

13 Repeatability through the Product – Session interaction
The products have not all been evaluated in the same manner during the two sessions for the "sticky" descriptor: either the panelists are not repeatable, or the products have changed (different T°, ...).
P-values of the Product – Session interaction, per descriptor: Sticky, Caramel and Astringency are significant; MilkF 0.1182; Sweetness 0.1188; Bitterness 0.1897; Granular 0.283; Melting 0.3041; Vanilla 0.3183; CocoaF 0.4487; Crunchy 0.6869; CocoaA 0.8011; MilkA 0.8287; Acidity 0.9835.
For the other descriptors the panel is repeatable.

14 How to evaluate performance?
Which interactions have contributed to the Product – Session interaction?
(table: coefficients of the Product – Session interaction for the "Sticky" descriptor, one row per chocolate, one column per session)
choc4 is less sticky at the second session than at the first; scores for choc6 are homogeneous over both sessions.

15 The Product – Session interaction
(figure: interaction plot for the "Sticky" descriptor; x-axis: mean over the whole of the sessions; y-axis: mean per session; one line per session; products sorted by ascending means; no parallelism = interaction)
graphinter(sensochoc, col.p = 4, col.j = 2, firstvar = 17, lastvar = 17, numr = 1, numc = 1)

16 The Product – Session interaction
(figure: barplot of each chocolate's contribution to the Product – Session interaction for the "Sticky" descriptor; chocolate 4 contributes more than 50% of the interaction)
Obtained the following way:
resinteract <- interact(sensochoc, col.p = 4, col.j = 2, firstvar = 17, lastvar = 17)
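Per-product contributions of this kind can be sketched as each product's share of the squared Product – Session interaction coefficients; a Python illustration with invented coefficients (the slides use R, and these numbers are not the chocolate data):

```python
import numpy as np

# Hypothetical product-session interaction coefficients (6 products x 2 sessions)
inter = np.array([[ 0.10, -0.10],
                  [ 0.05, -0.05],
                  [-0.02,  0.02],
                  [ 0.60, -0.60],   # choc4: large swing between sessions
                  [-0.08,  0.08],
                  [ 0.03, -0.03]])

# Share of the total squared interaction carried by each product
contrib = (inter ** 2).sum(axis=1) / (inter ** 2).sum()
print(contrib.round(3))  # choc4 dominates the interaction
```

The shares sum to 1, so a product whose share exceeds 0.5 carries more than half of the interaction, as choc4 does here.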

17 Panel’s and Panelists’ repeatability
Evaluation through the P*J*S interaction of the model Score = P + J + S + J*P + J*S + P*S + P*J*S: impossible, because there is no repetition (i.e. only one score for a given triplet P*J*S).
Evaluation, by panelist, of the P*S interaction of the model Score = P + S + P*S: impossible, because there is no repetition.
Evaluation by the standard deviation of the error term (for the panel or for each panelist). Problem: no test is possible (only a qualitative comparison between panelists or descriptors).

18 Panel’s repeatability
Standard deviation of the error term of the model Score = P + J + S + J*P + J*S + P*S.

19 Panelists’ repeatability
(table: standard deviation of the error term per panelist (columns, sorted by median) and per descriptor (rows: CocoaF, CocoaA, Vanilla, MilkF, Caramel, Sweetness, Crunchy, Astringency, Bitterness, Melting, Acidity, Sticky, MilkA, Granular); values below 1 and above 2 are highlighted by coltable)
res <- paneliperf(sensochoc, formul = "~Product+Panelist+Session+Product:Panelist+Product:Session+Panelist:Session", formul.j = "~Product", col.j = 1, firstvar = 5, synthesis = TRUE)
res.res <- magicsort(res$res.ind, method = "median")
coltable(res.res, level.lower = 1, level.upper = 2)

20 Panelists’ repeatability
Standard deviation of the error term of the model Score = P, for the "Cocoa" descriptor, per panelist (sorted in ascending order):
(table: panelists 14, 24, 3, 1, 15, 27, 10, 5, 13, 25, 9, 28, 21, 26, 2, 19, 4, 12, 23, 11, 22, 16, 17, 6, 8, 18, 7, 20, 29, with values ranging from 0.764 to 4.11)
Panelist 14 is the most repeatable; panelist 29 is the least repeatable.
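For a single panelist, the residual standard deviation of the model Score = P reduces to the within-product spread of that panelist's scores over the sessions; a Python sketch on invented scores (6 products x 2 sessions, not the real data):

```python
import numpy as np

# Hypothetical scores of one panelist: rows = products, columns = sessions
scores = np.array([[3.0, 4.0],
                   [5.0, 5.0],
                   [7.0, 6.0],
                   [2.0, 4.0],
                   [6.0, 6.0],
                   [8.0, 7.0]])
I, K = scores.shape

resid = scores - scores.mean(axis=1, keepdims=True)  # deviations from product means
df = I * (K - 1)                                     # 6 residual degrees of freedom
sd_err = np.sqrt((resid ** 2).sum() / df)
print(round(sd_err, 3))  # → 0.764
```

A panelist who gives each product nearly the same score at both sessions gets a small sd_err, i.e. is repeatable.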

21 Panelist’s global performance
Objective: screening of the panelists that show high performance, in order to set up a panel of trained panelists.
A panelist shows high performance if:
he discriminates well between the products,
he is repeatable,
he agrees with what the other panelists say.

22 Panelist’s global performance, ability to discriminate
Evaluation, per panelist, of the Product effect of the model Score = Product (+ Session).
The Product effect of this model is measured by the F statistic: its numerator is big if the products are discriminated, and its denominator is small if the error term (which absorbs the Product – Session interaction) is small, i.e. if the panelist is repeatable.
This makes it a good test to evaluate the performance.
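As a hedged Python sketch of this F statistic for one panelist (the slides use R; the scores are invented, and the model is simplified to Score = Product, without the Session term):

```python
import numpy as np

# Hypothetical scores of one panelist: rows = products, columns = sessions
scores = np.array([[3.0, 4.0],
                   [5.0, 5.0],
                   [7.0, 6.0],
                   [2.0, 4.0],
                   [6.0, 6.0],
                   [8.0, 7.0]])
I, K = scores.shape

grand = scores.mean()
prod_means = scores.mean(axis=1)
ss_prod = K * ((prod_means - grand) ** 2).sum()        # between-product variation
ss_err = ((scores - prod_means[:, None]) ** 2).sum()   # within-product variation
F = (ss_prod / (I - 1)) / (ss_err / (I * (K - 1)))
print(round(F, 2))  # → 10.54
```

A large F (p-value below the threshold when compared with the F(I-1, I*(K-1)) distribution) means the panelist separates the products well relative to his own noise.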

23 Panelist’s global performance, consensus with the panel
Evaluation, by panelist, of the adjusted means for the Product effect (model: Score = Product).
Evaluation, for the whole panel, of the adjusted means for the Product effect (model: Score = P + J + S + PJ + PS + JS).
The consensus between one panelist and the panel is evaluated by the correlation coefficient between the panelist's adjusted means and the panel's.
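The consensus measure is then a plain correlation between two series of product means; a Python sketch with invented adjusted means (the slides use R):

```python
import numpy as np

# Hypothetical adjusted product means (6 products): one panelist vs the panel
panelist_means = np.array([3.5, 5.0, 6.5, 3.0, 6.0, 7.5])
panel_means = np.array([3.8, 4.9, 6.1, 3.4, 5.7, 7.0])

# Consensus = correlation between the two series of product means
r = np.corrcoef(panelist_means, panel_means)[0, 1]
print(round(r, 3))  # close to 1: the panelist ranks the products like the panel
```

A correlation near 1 means the panelist orders the products like the panel; a low or negative correlation flags disagreement even if his own scores are repeatable.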

24 Panelist’s global performance, ability to discriminate
(table: P-values of the F-test of the Product effect (model Score = Product + Session), per panelist (columns, sorted by median) and per descriptor (rows: MilkF, Crunchy, Bitterness, CocoaF, Sweetness, Melting, Caramel, Vanilla, Acidity, Astringency, Granular, MilkA, Sticky, CocoaA); values below 0.05 and 0.01 are highlighted by coltable)
res <- paneliperf(sensochoc, formul = "~Product+Panelist+Session+Product:Panelist+Product:Session+Panelist:Session", formul.j = "~Product+Session", col.j = 1, firstvar = 5, synthesis = TRUE)
resprob <- magicsort(res$prob.ind, method = "median")
coltable(resprob, level.lower = 0.05, level.lower2 = 0.01, level.upper = 1, main.title = "P-value of the F-test (by panelist)")
hist(resprob, main = "Histogram of the P-values", xlab = "P-values", xlim = c(0, 1), nclass = 21, col = c("red", "mistyrose", rep("white", 19)))

25 Panelist’s global performance, ability to discriminate
(table: excerpt of the P-value table for the best panelists; rows: panelists 16, 28, 24, 9, 17, 3, 15, 23, 22, 11, 10, 1, 14, plus the median; columns: MilkF, Crunchy, Bitterness, CocoaF, Sweetness, Melting, Caramel, Vanilla; for instance panelist 16: 0.024, 0.044, 0.001, 0.023, 0.0078, 0.0015, 0.002, 0.16; medians: 0.019, 0.068, 0.071, 0.08, 0.095, 0.11, 0.13, 0.18)

26 Panelist’s global performance, consensus with the panel
resagree <- magicsort(res$agree, sort.mat = res$r2.ind, method = "median")
coltable(resagree, level.lower = 0.00, level.lower2 = -0.50, level.upper = 0.8, level.upper2 = 0.9, main.title = "Agreement between panelists")

27 Panelist’s global performance, consensus with the panel
(table: correlation between each panelist's adjusted product means and the panel's, per panelist (columns: 16, 28, 24, 9, 17, 3, 15, 23, 22, 11, 10, 1, 14, plus the median) and per descriptor (rows: MilkF, Crunchy, Bitterness, CocoaF, Sweetness, Melting, Caramel, Vanilla, Acidity); most correlations are high, e.g. 0.96 or 0.98, but some panelists disagree with the panel on some descriptors, e.g. -0.46)

