A Comparative Study of Two Natural Language Processing Frameworks
Yixin Bian, Gunes Koru, Hongfang Liu
Department of Information Systems, University of Maryland, Baltimore County, MD 21250, USA
June 11, 2012
Introduction
UIMA (Unstructured Information Management Architecture) is a framework for natural language processing, originally developed by IBM and now maintained by the Apache Software Foundation. GATE (General Architecture for Text Engineering) is a Java suite of tools originally developed at the University of Sheffield and now used worldwide by a large community of scientists and companies for all sorts of natural language processing tasks.
Introduction
Both are developed in Java. Although they share common goals, the two architectures differ in many aspects. Which one should be adopted?
Introduction
In this paper, we compare them from three perspectives: software design quality (code metrics), software maintenance (code smells, bugs, and bug survival curves), and the user's manual.
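The Comparison of Metrics
Number of classes: UIMA 2,187; GATE 2,822.
For each framework, the table reports the minimum, median, maximum, total, and average value of Lines of Code (LOC) and of six class-level metrics: CBO (Coupling Between Object classes), NOC (Number Of Children), RFC (Response For a Class), DIT (Depth of Inheritance Tree), LCOM (Lack of Cohesion in Methods), and WMC (Weighted Methods per Class).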
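Each metric in the table is first computed per class and then summarized over all classes by five columns. A minimal Python sketch of that summary step, assuming per-class values are already available (the slides do not name the metrics tool used). It also shows how two of the metrics, DIT and NOC, follow from a child-to-superclass map; the class names and the `parent` map are hypothetical, not taken from UIMA or GATE.

```python
from statistics import median


def dit(cls, parent):
    """Depth of Inheritance Tree: number of superclasses above cls.

    Roots of the analyzed hierarchy get depth 0; some tools also count
    java.lang.Object, which would add 1 to every value.
    """
    depth = 0
    while cls in parent:  # climb until a class with no recorded superclass
        cls = parent[cls]
        depth += 1
    return depth


def noc(cls, parent):
    """Number Of Children: classes whose direct superclass is cls."""
    return sum(1 for p in parent.values() if p == cls)


def summarize(values):
    """The five summary columns used in the metrics table."""
    return {
        "min": min(values),
        "median": median(values),
        "max": max(values),
        "total": sum(values),
        "average": sum(values) / len(values),
    }


# Hypothetical hierarchy (child -> direct superclass), not from either framework
parent = {"Annotator": "Component", "Tokenizer": "Annotator", "Tagger": "Annotator"}
classes = ["Component", "Annotator", "Tokenizer", "Tagger"]

dit_summary = summarize([dit(c, parent) for c in classes])
noc_summary = summarize([noc(c, parent) for c in classes])
print("DIT:", dit_summary)
print("NOC:", noc_summary)
```

Here the per-class DIT values are 0, 1, 2, 2, so the DIT summary is min 0, median 1.5, max 2, total 5, average 1.25. The median is reported alongside the average because metric distributions over thousands of classes are often skewed by a few very large classes.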
The Number of Code Smells
For each code smell, the table reports the number of occurrences in UIMA and in GATE, together with the average per KLOC (thousand lines of code) for each framework. Smells counted: Data Class, Data Clumps, Feature Envy, Refused Bequest, Long Message Chain, Shotgun Surgery, God Class, and the total.
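The "Average" columns normalize raw smell counts by code size, so that a larger code base is not penalized simply for being larger. A minimal Python sketch of that per-KLOC normalization; the function names are hypothetical and the counts and line total are invented for illustration, not the study's figures.

```python
def per_kloc(count, loc):
    """Occurrences per KLOC (1,000 lines of code)."""
    if loc <= 0:
        raise ValueError("loc must be positive")
    return count / (loc / 1000)


def smell_density(smell_counts, loc):
    """Per-KLOC density for each smell, plus a 'Total' entry over all smells."""
    densities = {name: per_kloc(n, loc) for name, n in smell_counts.items()}
    densities["Total"] = per_kloc(sum(smell_counts.values()), loc)
    return densities


# Hypothetical counts and size; these are NOT the study's figures.
counts = {"Data Class": 18, "Feature Envy": 30, "God Class": 12}
density = smell_density(counts, loc=150_000)
for name, value in density.items():
    print(f"{name}: {value:.2f} per KLOC")
```

With 150,000 lines (150 KLOC), 12 God Class instances give 0.08 per KLOC and the 60 smells in total give 0.40 per KLOC. Since GATE has more classes (2,822) than UIMA (2,187), comparing densities rather than raw counts is the fairer basis.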
The Number of Bugs Detection ToolUIMAGATE FindBugs (2.0.0)6178 PMD (5.0) Lint4j (0.9.13)84494
The Comparison of Bug Survival Curves
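The slides do not state how the survival curves were estimated. One standard choice is the Kaplan-Meier estimator, where a bug "survives" until it is fixed and bugs still open when observation ends are treated as right-censored. A minimal Python sketch under that assumption; the function name and the bug lifetimes are hypothetical.

```python
from collections import Counter


def kaplan_meier(observations):
    """Kaplan-Meier estimate of S(t), the probability a bug is still unfixed after t.

    observations: (duration, fixed) pairs. duration is how long the bug was
    observed open; fixed is False when the bug was still open at the end of
    observation (right-censored).
    Returns (t, S(t)) at every time at which at least one fix occurred.
    """
    fixes = Counter(t for t, fixed in observations if fixed)
    leaving = Counter(t for t, _ in observations)  # fixed or censored at t
    at_risk = len(observations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = fixes[t]
        if d:
            # Bugs censored at t still count as at risk at t (usual convention).
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical bug lifetimes in days: (days open, fixed?)
bugs = [(2, True), (3, False), (5, True), (5, True), (8, False), (10, True)]
for t, s in kaplan_meier(bugs):
    print(f"day {t}: S = {s:.3f}")
```

For these six hypothetical bugs the curve steps to 0.833 on day 2, 0.417 on day 5, and 0.000 on day 10. A curve that drops faster means bugs are fixed sooner, which is how two projects' curves can be compared.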
The Comparison of User Manuals
Contents compared in the UIMA and GATE manuals:
Catalog
Tutorial of the manual
Overview and characteristics of the software product
Installation and setup
Introduction to product application
Frequently Asked Questions (FAQ) ×
Known issues and problems with the software ×
Terms, concepts and their basic definitions in the software ×
Conclusion
In terms of software design quality, software maintenance, and the user's manual, UIMA is better than GATE.
Thank you!