1 TCR TUMOUR MATCHING ALGORITHM The structure of the algorithm- Heather Bourne Using the algorithm - Jason Hiscox
2 CURRENT METHOD OF TUMOUR MATCHING The UKACR multiple primary rules are interpreted by the registry staff entering the data. Usually… In general...
3 AUTOMATING TUMOUR MATCHING In order to carry out tumour matching automatically, the computer must be able to make the value judgements once made by cancer registry staff.
4 THE TCR TUMOUR MATCHING ALGORITHM (1) In order to automate the registration of patients with multiple records, the decision-making process had to be reduced to a series of yes/no/don't know questions covering all eventualities. Two comprehensive reference tables were produced with entries for all values between: M8000 and M9989 for morphology; … and … (ICDO-1) or C00.0-C80.9 (ICDO-2) for topography. An algorithm was then written to allow all conceivable topography and morphology pairs to be processed appropriately.
5 THE TCR TUMOUR MATCHING ALGORITHM (2) The algorithm looks at: 1. MORPHOLOGY 2. TOPOGRAPHY 3. BEHAVIOUR 4. SIDE, to see if they are: 1. THE SAME 2. DIFFERENT 3. QUERY - manual check required. The morphology and topography codes are assigned to groups in the morphology and topography tables, and the algorithm then carries out a number of tests to establish whether the values should match or be checked manually. Any values that have not been matched or sent for checking are treated as being different.
6 TCR TUMOUR MATCHING ALGORITHM (3) Pre-match validation is carried out to ensure that all records entering the matching process are as accurate as possible. The algorithm has 4 basic steps, each of which attempts to prove that the 2 tumours are different. If, after the 4th step, no differences have been identified, one registration only is made. 1. Morphologies defined as equal? NO: 2 registrations. ?: Query for manual resolution. YES: go to step 2. 2. Topographies defined as equal? NO: 2 registrations. ?: Query for manual resolution. YES: go to step 3. 3. Behaviour the same? NO: Query for manual resolution. YES: go to step 4. 4. Laterality the same? NO: Query for manual resolution. YES: Records match, make one registration only. There were found to be too many errors in the source data to allow records with different values for behaviour and laterality to be processed automatically.
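The four-step cascade on this slide can be sketched in a few lines of Python. This is only an illustration of the flow described above, not the TCR implementation: the names `Step`, `Decision` and `decide` are hypothetical, and the sketch assumes the morphology and topography comparisons have already been reduced to same / different / query.

```python
from enum import Enum


class Step(Enum):
    """Result of a morphology or topography comparison (hypothetical names)."""
    SAME = "same"
    DIFFERENT = "different"
    QUERY = "query"


class Decision(Enum):
    ONE_REGISTRATION = "one registration"
    TWO_REGISTRATIONS = "two registrations"
    MANUAL_QUERY = "query for manual resolution"


def decide(morphology: Step, topography: Step,
           behaviour_same: bool, laterality_same: bool) -> Decision:
    """Walk the four steps in order, stopping at the first non-match."""
    # Steps 1 and 2: a clear difference means two tumours,
    # an uncertain comparison is sent to a person.
    for result in (morphology, topography):
        if result is Step.DIFFERENT:
            return Decision.TWO_REGISTRATIONS
        if result is Step.QUERY:
            return Decision.MANUAL_QUERY
    # Steps 3 and 4: differences are never resolved automatically.
    if not behaviour_same or not laterality_same:
        return Decision.MANUAL_QUERY
    # No difference found after the fourth step.
    return Decision.ONE_REGISTRATION
```

For example, `decide(Step.SAME, Step.SAME, True, False)` returns `Decision.MANUAL_QUERY`, because a laterality difference is always sent for manual resolution.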
7 STEP 1 MORPHOLOGY (a) The morphology matching table assigns each morphology code to a group of codes considered to be the same, e.g. some lines from the morphology table: = 1A (Neoplasm NOS); = 1B (Carcinoma NOS); = 6A (Specific digestive adenocarcinomas (neuroendocrine)); = 6B (Specific liver and bile duct carcinomas); = 36B (Non-Hodgkin's lymphoma NOS); = 36B
8 STEP 1 MORPHOLOGY (b) The algorithm treats all morphologies in the same group as being the same, but 3 types of processing are carried out: 1. Certain automatic inter-group MATCHES are allowed, e.g. Carcinoma NOS (group 1B) matches with any other epithelial malignancy (groups 1-15); Myeloid leukaemia NOS (group 50) matches with specified types of myeloid leukaemia (groups 50A-50C). 2. Some morphology group pairs are sent for MANUAL RESOLUTION, e.g. germ cell neoplasms (group 21) are checked against trophoblastic tumours (group 22); germ cell neoplasms (group 21) are checked against carcinoma NOS (group 1B).
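A minimal sketch of how the group rules on this slide might be applied, using only the four examples given. The function names and the string form of the group labels are assumptions, and the real table holds many more rules. Groups 1-15 are read literally here, so every label whose number falls in that range counts as an epithelial malignancy.

```python
import re

CARCINOMA_NOS = "1B"
MYELOID_LEUKAEMIA_NOS = "50"
SPECIFIED_MYELOID = {"50A", "50B", "50C"}
# Pairs from the slide that are always sent for manual resolution.
MANUAL_PAIRS = {frozenset({"21", "22"}), frozenset({"21", "1B"})}


def group_number(group: str) -> int:
    """Numeric part of a group label, e.g. '50A' gives 50."""
    found = re.match(r"\d+", group)
    if found is None:
        raise ValueError(f"not a group label: {group!r}")
    return int(found.group())


def compare_groups(a: str, b: str) -> str:
    """Return 'same', 'query' or 'different' for two morphology groups."""
    if a == b:
        return "same"
    if frozenset({a, b}) in MANUAL_PAIRS:
        return "query"
    # The inter-group matches are symmetric, so try both orders.
    for nos, other in ((a, b), (b, a)):
        if nos == CARCINOMA_NOS and 1 <= group_number(other) <= 15:
            return "same"
        if nos == MYELOID_LEUKAEMIA_NOS and other in SPECIFIED_MYELOID:
            return "same"
    # Anything not matched or queried is treated as different.
    return "different"
```

Here `compare_groups("1B", "6A")` gives `"same"` and `compare_groups("21", "22")` gives `"query"`. The final fall-through reflects the rule that unmatched, unqueried values are treated as different.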
9 STEP 1 MORPHOLOGY (c) 3. A series of steps has been added to the algorithm to cope with POORLY CODED SOURCE DATA, e.g. one type of sarcoma is checked against any other type of sarcoma, spindle cell or pseudosarcomatous carcinoma or tumour. Any cases where the morphologies have not matched, but where the dates of diagnosis are within 100 days or the tumour site is the same, are sent for checking.
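The safety net for unmatched morphologies could look something like the following. The function and parameter names are hypothetical, and treating "within 100 days" as inclusive of day 100 is an assumption, since the slide does not say.

```python
from datetime import date


def needs_checking(morphologies_matched: bool,
                   diagnosed_a: date, diagnosed_b: date,
                   site_a: str, site_b: str,
                   window_days: int = 100) -> bool:
    """True if an unmatched pair should still be sent for manual checking."""
    if morphologies_matched:
        # The safety net only applies when the morphologies did not match.
        return False
    # Assumption: a gap of exactly window_days still counts as "within".
    close_in_time = abs((diagnosed_a - diagnosed_b).days) <= window_days
    same_site = site_a == site_b
    return close_in_time or same_site
```

Two unmatched records diagnosed 30 days apart are sent for checking even when the sites differ. Two diagnosed a year apart are checked only if the site is the same.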
10 STEP 2 TOPOGRAPHY(a) The topography matching table assigns each topography code to a value for processing. The values used correspond with ICD10 values, e.g. rectosigmoid junction, rectum and anus are allocated separate values. Any other topography codes with which a code should match, and any site combinations that require checking are given in separate columns in the table.
11 STEP 2 (b) TOPOGRAPHY TABLE Some lines from the topography table (columns: Site, Morphology, Query with, Match with, Group): (Caecum) 153.4 - 020,022+025 031023; (Rectum) 154.1 - 031+032 -033.X; (Connective 171.3
12 STEP 2 TOPOGRAPHY (c) Most of the processing of the topography codes is carried out within the topography table. Two topographies are regarded as being equal if: The internal values to which they have been assigned are equal, or One is a more general term for the other. A series of processes has been added to deal with special cases and poorly coded data, e.g.: If one or both morphologies are site specific and the appropriate site code does not appear in either record the case is checked. When the 2 tumours appear to be at different sites a series of tests is carried out to avoid duplicate registrations e.g.: To check that one is not a metastasis from the other. Records for non-melanoma skin cancers are merged according to UKACR rules.
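The two equality conditions for topography can be illustrated with a small lookup. The internal values and the `MORE_GENERAL` mapping below are invented for the example and do not come from the real topography table. Only the rule itself (equal values, or one term more general than the other) is taken from the slide.

```python
# Hypothetical internal values: each general term maps to the
# more specific values it covers. Not the real TCR table.
MORE_GENERAL = {
    "colon_nos": {"caecum", "sigmoid_colon"},
}


def topographies_equal(a: str, b: str) -> bool:
    """True if the internal values are equal or one generalises the other."""
    if a == b:
        return True
    # Check the general/specific relation in both directions.
    return b in MORE_GENERAL.get(a, set()) or a in MORE_GENERAL.get(b, set())
```

With this mapping, `topographies_equal("colon_nos", "caecum")` is true, while `topographies_equal("rectum", "anus")` is false, consistent with rectum and anus being allocated separate values.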
13 STEP 3 TUMOUR BEHAVIOUR Two records are matched only if the behaviour code recorded for each is the same. Any pair of records where the behaviour codes are different is resolved manually. We found that differences in behaviour code were just as likely to be the result of error as to be genuine.
14 STEP 4 LATERALITY Two records are matched only if the laterality recorded for each is the same. Any pair of records where the laterality recorded is different is resolved manually. We found that differences in laterality were just as likely to be the result of error as to be genuine. Notes: Not known, and either right, left or bilateral are considered to be the same. Bilateral, and either right or left are not considered to be the same.
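The laterality notes translate into a short comparison. A sketch under stated assumptions: the four string values and the function name are hypothetical stand-ins for whatever codes the registry actually stores.

```python
VALID_SIDES = {"right", "left", "bilateral", "not known"}


def laterality_same(a: str, b: str) -> bool:
    """Apply the slide's notes: 'not known' agrees with anything,
    while bilateral never agrees with right or left."""
    for side in (a, b):
        if side not in VALID_SIDES:
            raise ValueError(f"unexpected laterality: {side!r}")
    if a == b:
        return True
    # Any remaining mismatch is only acceptable if one side is unknown.
    return "not known" in (a, b)
```

A `False` result here does not mean two registrations. Under the step 4 rule, the pair is sent for manual resolution.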
15 USING THE TUMOUR MATCHING ALGORITHM The algorithm has been used for: processing electronic data from a number of sources; a nightly QA check for duplicate registrations in all data entered at the registry on the previous day. The basic algorithm seems to be stable and working well. Most of the amendments made now are to cope with poorly coded source data. However, there are some cases that could be termed casualties of electronic tumour matching, where the coding of the source data defies logic. To cope with these: some cases fail the nightly consistency checks, but the standard site/histology consistency checks need to be tightened up. To do this a table of unusual diagnoses is being prepared.
16 TCR TUMOUR MATCHING ALGORITHM The structure of the algorithm- Heather Bourne Using the algorithm - Jason Hiscox
17 Inputs and Outputs Inputs: Source Identifier; Site, Morphology and Side; Event Date. Outputs: Definite Match, New Tumour, No Decision
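One way to picture the inputs and outputs listed on this slide is as a record type and a three-valued result. All field and class names below are hypothetical. The helper `definitive_share` assumes that both Definite Match and New Tumour count as definitive decisions, which the slides imply but do not state.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Outcome(Enum):
    DEFINITE_MATCH = "Definite Match"
    NEW_TUMOUR = "New Tumour"
    NO_DECISION = "No Decision"


@dataclass(frozen=True)
class TumourRecord:
    """The inputs named on the slide (field names are illustrative)."""
    source_id: str
    site: str
    morphology: str
    side: str
    event_date: date


def definitive_share(outcomes: list[Outcome]) -> float:
    """Fraction of outcomes that are not 'No Decision'."""
    if not outcomes:
        raise ValueError("no outcomes to summarise")
    decided = sum(1 for o in outcomes if o is not Outcome.NO_DECISION)
    return decided / len(outcomes)
```

A summary of this kind is what the outcome slides that follow compare across algorithm versions and data sources.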
20 Outcomes (TCR v4.0)
21 Outcomes (TCR v5.1)
23 Differences (v4.0 vs. v5.1) Fewer definitive decisions in later algorithm but ENCR test cases are designed to be difficult / awkward!!! Are these results the same with real-life data?
24 Outcomes (Cancer Deaths)
25 Outcomes (Tertiary Centre)
26 Differences (Deaths vs. Hospital) Quality of behaviour coding: more metastatic behaviour (Hospital) -> more excluded / queried. Quality of topography coding: more inappropriately coded (e.g. site=blood) records (Hospital) -> more excluded / queried. Quality of morphology coding: more specific codes (Hospital) -> more rejected as no match at morphology comparison stage.
27 Summary Acceptable performance depends on the quality of incoming diagnosis data. 100% definitive decision making is not possible. Matching algorithms need: continual assessment vs. real life (70-80% definite decisions made on real life data compared to % on test data); to be flexible; to be conservative.