Logic form identification of medical clinical trials
Clint Tustison

2 Introduction
- The what: identify and extract logic forms from the (in)eligibility criteria of medical clinical trials
- The why:
  - Understand the data
  - Match the information against other data, i.e., patients' medical records
- The how:
  - Syntactic parser
  - Cognitive modeling architecture

3 Process
Clinical trials (input) -> Text processing -> Syntactic parser -> Cognitive modeling engine -> Predicate calculus -> Post-processing (output)

4 Input
- ClinicalTrials.gov
  - Sponsored by NIH, other federal agencies, and private industry
  - 8,800 current trials online
  - 3,000,000 page views per month
  - Each trial lists purpose, eligibility, location, and more information

5 Text processing
- Convert trials to .xml format
  Eligibility Criteria
  Inclusion criteria: Adenocarcinoma of the pancreas.

6 Process: Input
Clinical trials (input) -> Syntactic parser -> Cognitive modeling engine -> Predicate calculus -> Post-processing (output)
Input sentence: A criterion equals adenocarcinoma of the pancreas.
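The step from the bare criterion on slide 5 to the full sentence on slide 6 can be sketched in a few lines of Python. This is only an illustration, not the talk's actual text-processing code: the function name `criterion_to_sentence` and the rule for keeping a leading acronym in upper case are assumptions made for the example.

```python
def criterion_to_sentence(criterion: str) -> str:
    """Wrap a bare eligibility criterion in a full sentence for the parser.

    Hypothetical helper: the slides only show the input and the resulting
    sentence, not how the conversion was implemented.
    """
    text = criterion.strip().rstrip(".")
    first_word = text.split(" ", 1)[0]
    # Lowercase the first letter unless the first word looks like an acronym.
    if not first_word.isupper():
        text = text[:1].lower() + text[1:]
    return f"A criterion equals {text}."


print(criterion_to_sentence("Adenocarcinoma of the pancreas."))
# prints: A criterion equals adenocarcinoma of the pancreas.
```

Giving the parser a complete sentence with a subject and a verb means even a noun-phrase criterion receives an ordinary clause-level parse.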

7 Syntactic parser
- Link-Grammar Parser
  Characteristics
  - Syntactic dependency parse
  - Constraints for determining grammaticality
  - Links give clues on how to process constituents
  Benefits
  - Written in C
  - Very fast
  - Robust: able to process spelling errors
  - Free: http://www.link.cs.cmu.edu/link
  - Can be easily integrated with other applications

8 Process: Syntactic Parser
Input sentence: A criterion equals adenocarcinoma of the pancreas.

    +--------------------------------Xp--------------------------------+
    +-----Wd-----+                                   +----Js----+      |
    |     +--Ds--+---Ss----+------Os------+----Mp----+   +--Ds--+      |
    |     |      |         |              |          |   |      |      |
LEFT-WALL a criterion.n equals.v adenocarcinoma[?].n of the pancreas.n .
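To make "reading links" concrete, the parse above can be written down as a list of labeled links and queried by label. The sketch below transcribes the links by hand from the diagram and does not call the Link Grammar parser itself. The `Link` type and the `right_of` helper are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple


class Link(NamedTuple):
    label: str  # link type, e.g. Ss (subject-verb) or Os (verb-object)
    left: str   # word at the left end of the link
    right: str  # word at the right end of the link


# Links transcribed by hand from the slide's parse diagram.
LINKS = [
    Link("Xp", "LEFT-WALL", "."),
    Link("Wd", "LEFT-WALL", "criterion.n"),
    Link("Ds", "a", "criterion.n"),
    Link("Ss", "criterion.n", "equals.v"),
    Link("Os", "equals.v", "adenocarcinoma[?].n"),
    Link("Mp", "adenocarcinoma[?].n", "of"),
    Link("Js", "of", "pancreas.n"),
    Link("Ds", "the", "pancreas.n"),
]


def right_of(links: List[Link], label: str, left: str) -> List[str]:
    """Return the words attached to `left` by a link with the given label."""
    return [link.right for link in links if link.label == label and link.left == left]


# Follow Wd -> Ss -> Os to recover subject, verb, and object.
subject = right_of(LINKS, "Wd", "LEFT-WALL")[0]
verb = right_of(LINKS, "Ss", subject)[0]
obj = right_of(LINKS, "Os", verb)[0]
print(subject, verb, obj)
# prints: criterion.n equals.v adenocarcinoma[?].n
```

The `[?]` suffix on adenocarcinoma marks a word the parser did not find in its dictionary and guessed a category for, which is the robustness the previous slide mentions.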

9 Intelligent Processing
- Soar Architecture
  - Model and theory of cognition used in AI programming
  - Translates the syntactic parse to logic output by reading links
  Benefits
  - Goal-directed problem solving
  - Agent-based architecture
  - Ability to learn
  - Proven in multiple applications
    - Natural Language-Soar
    - Tactical Air-Soar
    - NASA Test Director-Soar

10 Process: Intelligent processing
(M1 ^idea N5 ^idea N4 ^idea N3 ^idea N2)
(N5 ^annotation feat-dumped ^annotation seq-dumped ^annotation seq-prep ^aug N4 ^nuc pancreas ^wcount 7)
(N4 ^annotation seq-dumped ^annotation seq-prep ^aug N3 ^nuc adenocarcinoma ^of N5 ^wcount 4)
(N3 ^ext N2 ^int N4 ^nuc equals ^wcount 3)
(N2 ^annotation feat-dumped ^annotation seq-dumped ^annotation seq-prep ^aug N3 ^nuc criterion ^wcount 2)

11 Tools: Representation
- Predicate Logic
  - Formal properties; allows a wide range of applications; usable cross-linguistically
  - Vocabulary, syntax, semantics
  - First-order: quantification over individuals (FOPC)
  - Higher-order: quantification over relations, etc.

12 Process: Logic Output
Clinical trials (input) -> Syntactic parser -> Cognitive modeling engine -> Predicate calculus -> Post-processing (output)
Input sentence: A criterion equals adenocarcinoma of the pancreas.
Predicate calculus: criterion(N2) & adenocarcinoma(N4) & pancreas(N5) & equals(N2,N4) & of(N4,N5).
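One way to see how the node structure on slide 10 relates to this formula is the Python sketch below. It is not the Soar model from the talk. It assumes that `^nuc` holds the predicate name, that a node with both `^ext` and `^int` is a two-place predicate over those arguments, and that any remaining attribute such as `^of` names a relation to another node. These readings are inferred from the example, and the function name `to_predicates` is hypothetical.

```python
# Node attributes copied from the slide-10 dump (annotations and wcount omitted).
NODES = {
    "N2": {"nuc": "criterion", "aug": "N3"},
    "N3": {"nuc": "equals", "ext": "N2", "int": "N4"},
    "N4": {"nuc": "adenocarcinoma", "of": "N5", "aug": "N3"},
    "N5": {"nuc": "pancreas", "aug": "N4"},
}

# Attributes treated as structural bookkeeping rather than relations (assumption).
BOOKKEEPING = {"nuc", "aug", "ext", "int", "annotation", "wcount"}


def to_predicates(nodes):
    """Render the nodes as a conjunction: unary predicates first, then binary."""
    unary, binary = [], []
    for node_id in sorted(nodes):
        attrs = nodes[node_id]
        if "ext" in attrs and "int" in attrs:
            # Two-place predicate over its external and internal arguments.
            binary.append(f"{attrs['nuc']}({attrs['ext']},{attrs['int']})")
        else:
            unary.append(f"{attrs['nuc']}({node_id})")
        for name, target in attrs.items():
            if name not in BOOKKEEPING:
                # Any other attribute is read as a relation to another node.
                binary.append(f"{name}({node_id},{target})")
    return " & ".join(unary + binary) + "."


print(to_predicates(NODES))
# prints: criterion(N2) & adenocarcinoma(N4) & pancreas(N5) & equals(N2,N4) & of(N4,N5).
```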

13 Post-processing
- Prolog axioms
  - Remove elements that were not part of the criterion's own wording (the added criterion and equals terms).
  - Format elements needed in the output (ampersands).
- Rule:
    % Y is the input list of terms; Z is Y without the criterion/1 term
    % and the equals/2 term that links it to another predicate.
    reduce(Z, Y) :-
        member(Criterm, Y), functor(Criterm, criterion, 1), arg(1, Criterm, Critvar),
        member(Predterm, Y), functor(Predterm, _Xterm, 1), arg(1, Predterm, Predvar),
        member(Equalsterm, Y), functor(Equalsterm, equals, 2),
        arg(1, Equalsterm, Critvar), arg(2, Equalsterm, Predvar),
        delete(Y, Criterm, Z2), delete(Z2, Equalsterm, Z).
- Turns the previous statement:
    criterion(N2) & adenocarcinoma(N4) & pancreas(N5) & equals(N2,N4) & of(N4,N5).
- Into:
    adenocarcinoma(N4) & pancreas(N5) & of(N4,N5).
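For readers who do not use Prolog, the same reduction can be sketched in Python. This is a rough equivalent rather than a translation of the original code. It assumes each term is stored as a `(name, args)` tuple, and the helper names `reduce_criterion` and `render` are hypothetical.

```python
def reduce_criterion(terms):
    """Drop criterion(X) and equals(X, Y) when Y is the argument of a unary term."""
    for crit in terms:
        if crit[0] != "criterion" or len(crit[1]) != 1:
            continue
        crit_var = crit[1][0]
        for eq in terms:
            if eq[0] != "equals" or len(eq[1]) != 2 or eq[1][0] != crit_var:
                continue
            pred_var = eq[1][1]
            # Require a one-place predicate over the variable equated with the criterion.
            if any(len(args) == 1 and args[0] == pred_var for _, args in terms):
                return [t for t in terms if t not in (crit, eq)]
    return list(terms)  # nothing to remove


def render(terms):
    """Join terms with ampersands and end with a period, as on the slides."""
    return " & ".join(f"{name}({','.join(args)})" for name, args in terms) + "."


statement = [
    ("criterion", ("N2",)),
    ("adenocarcinoma", ("N4",)),
    ("pancreas", ("N5",)),
    ("equals", ("N2", "N4")),
    ("of", ("N4", "N5")),
]
print(render(reduce_criterion(statement)))
# prints: adenocarcinoma(N4) & pancreas(N5) & of(N4,N5).
```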

14 Output
Eligibility Criteria
Inclusion Criteria: Adenocarcinoma of the pancreas
pancreas(N5) & adenocarcinoma(N4) & of(N4,N5).

15 Results/Conclusion
- Data can be matched against patients' medical records to determine whether they meet the criteria posted in the clinical trial.
- Disadvantages
  - Grammar is difficult to write
  - Only one parsed output per utterance
- Advantages
  - Fast
  - Robust
  - Implementation in other languages
  - Can be easily integrated with other applications/corpora

