A Comparative Study of Two Natural Language Processing Frameworks Yixin Bian, Gunes Koru, Hongfang Liu Department of Information Systems, University of Maryland, Baltimore County, MD 21250, USA June 11, 2012
Introduction UIMA (Unstructured Information Management Architecture) is a framework for natural language processing, originally developed by IBM and now maintained by the Apache Software Foundation. GATE (General Architecture for Text Engineering) is a Java suite of tools originally developed at the University of Sheffield and now used worldwide by a large community of scientists and companies for all sorts of natural language processing tasks.
Introduction Both are developed in Java. Although they share common goals, the two architectures differ in many respects. Which one should be adopted?
Introduction In this paper, we compare them from three perspectives: software design quality (code metrics), software maintenance (code smells, bugs, and bug survival curves), and the user's manual.
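The Comparison of Metrics
Number of classes: UIMA 2,187; GATE 2,822
Metric | UIMA Min | UIMA Median | UIMA Max | UIMA Total | UIMA Average | GATE Min | GATE Median | GATE Max | GATE Total | GATE Average
Lines of Code | 0 | 25 | 2944 | 169,516 | 77.51 | 0 | 23 | 3869 | 228,454 | 80.95
CBO | 0 | 2 | 84 | 11822 | 5.41 | 0 | 2 | 65 | 11203 | 3.97
NOC | 0 | 0 | 71 | 1170 | 0.53 | 0 | 0 | 81 | 1027 | 0.36
RFC | 0 | 6 | 347 | 35220 | 16.1 | 0 | 3 | 214 | 29909 | 10.6
DIT | 0 | 1 | 10 | 3837 | 1.75 | 0 | 1 | 8 | 4731 | 1.68
LCOM | 0 | 16 | 100 | 79374 | 36.29 | 0 | 0 | 100 | 85051 | 30.14
WMC | 0 | 4 | 345 | 15166 | 6.93 | 0 | 2 | 180 | 15220 | 5.39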
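The Average Value column on this slide appears to be each metric's total divided by the number of classes. The following is a minimal sketch of that check, assuming this relationship holds; the function name `average_per_class` and the dictionary layout are illustrative choices, not part of the original study.

```python
# Class counts and per-metric totals copied from the metrics slide.
CLASS_COUNT = {"UIMA": 2187, "GATE": 2822}
TOTALS = {
    "LOC":  {"UIMA": 169516, "GATE": 228454},
    "CBO":  {"UIMA": 11822,  "GATE": 11203},
    "NOC":  {"UIMA": 1170,   "GATE": 1027},
    "RFC":  {"UIMA": 35220,  "GATE": 29909},
    "DIT":  {"UIMA": 3837,   "GATE": 4731},
    "LCOM": {"UIMA": 79374,  "GATE": 85051},
    "WMC":  {"UIMA": 15166,  "GATE": 15220},
}


def average_per_class(metric: str, framework: str) -> float:
    """Assumed definition: metric total divided by the number of classes."""
    return TOTALS[metric][framework] / CLASS_COUNT[framework]


if __name__ == "__main__":
    # Print the recomputed averages side by side for both frameworks.
    for metric in TOTALS:
        uima = average_per_class(metric, "UIMA")
        gate = average_per_class(metric, "GATE")
        print(f"{metric:5s} UIMA {uima:6.2f}  GATE {gate:6.2f}")
```

Under this assumption the recomputed values agree with the reported averages to two decimal places, which also helps confirm how the flattened digits on the slide split into columns.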
The Number of Code Smells
Code smell | UIMA count | UIMA average (per KLOC) | GATE count | GATE average (per KLOC)
Data Class | 6 | 0.035 | 11 | 0.05
Data Clumps | 63 | 0.372 | 21 | 0.091
Feature Envy | 26 | 0.153 | 0 | 0
Refused Bequest | 101 | 0.6 | 448 | 2.05
Long Message Chain | 19 | 0.112 | 30 | 0.137
Shotgun Surgery | 23 | 0.136 | 189 | 0.863
God Class | 16 | 0.094 | 48 | 0.219
Total | 254 | 1.5 | 747 | 3.41
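The per-KLOC averages normalize smell counts by code size so that projects of different sizes can be compared. The sketch below recomputes the UIMA column, assuming the denominator is the UIMA Line of Code total from the metrics slide (169,516 lines); the slides do not state the denominator, so that is an assumption, and only UIMA is shown for that reason. The name `smells_per_kloc` is illustrative.

```python
# Assumption: KLOC is the UIMA "Line of Code" total from the metrics slide.
UIMA_LOC = 169_516

# Smell counts for UIMA copied from the code-smell slide.
UIMA_SMELLS = {
    "Data Class": 6,
    "Data Clumps": 63,
    "Feature Envy": 26,
    "Refused Bequest": 101,
    "Long Message Chain": 19,
    "Shotgun Surgery": 23,
    "God Class": 16,
}


def smells_per_kloc(count: int, lines_of_code: int) -> float:
    """Smell density: occurrences per thousand lines of code."""
    return count / (lines_of_code / 1000)


if __name__ == "__main__":
    for smell, count in UIMA_SMELLS.items():
        print(f"{smell:20s} {smells_per_kloc(count, UIMA_LOC):.3f}")
    total = sum(UIMA_SMELLS.values())
    print(f"{'Total':20s} {smells_per_kloc(total, UIMA_LOC):.3f}")
```

With this denominator, the recomputed UIMA densities match the slide's values up to rounding.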
The Number of Bugs
Detection tool | UIMA | GATE
FindBugs (2.0.0) | 6178
PMD (5.0) | 1798 | 1794
Lint4j (0.9.13) | 84494
The Comparison of Bug Survival Curves
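Only the title of this slide survives in the transcript, and the slides do not say how the curves were estimated. One common way to build a bug survival curve is the Kaplan-Meier estimator, which gives the estimated fraction of bugs still open after a given number of days while accounting for bugs that were never closed during the observation window. The sketch below is an illustration under that assumption; the function name `kaplan_meier` and the sample data are hypothetical and are not taken from the study.

```python
from collections import Counter


def kaplan_meier(durations, closed):
    """Return [(t, S(t))] at each time t where at least one bug was closed.

    durations: days each bug was observed open.
    closed:    True if the bug was closed at that time, False if it was
               still open when observation ended (censored).
    """
    at_risk = len(durations)
    closures = Counter(t for t, c in zip(durations, closed) if c)
    leaving = Counter(durations)  # closed or censored at each time
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = closures.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk bugs that stayed open.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    days_open = [5, 10, 10, 20, 30]
    was_closed = [True, True, False, True, False]
    for t, s in kaplan_meier(days_open, was_closed):
        print(f"day {t:3d}: estimated fraction still open = {s:.2f}")
```

A curve that drops faster means bugs are closed sooner, which is the sense in which two projects' curves can be compared.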
The Comparison of User Manuals
Contents | UIMA | GATE
Catalog
Tutorial of manual
Overview and characteristics of software product
Installation and setup
Introduction of product application
Frequently Asked Questions (FAQ) | ×
Known issues and problems with the software | ×
Terms, concepts and their basic definitions in software | ×
Conclusion In software design quality, software maintenance, and the user's manual, UIMA is better than GATE.