CIS 895 – MSE PROJECT: KDD-Service-Based Numerical Entity Searcher (KSNES), Presentation 2, March 31st, 2009, Naga Sowjanya Karumuri

1 CIS 895 – MSE PROJECT: KDD-Service-Based Numerical Entity Searcher (KSNES)
Presentation 2, March 31st, 2009
Naga Sowjanya Karumuri (sowji@ksu.edu)

2 OUTLINE
- Project Data Flow Diagram
- Action Items
- Architectural Design
- Test Plan
- Formal Inspection Checklist
- Project Plan
- Prototype Demonstration
- Questions / Comments

3 PROJECT DATA FLOW DIAGRAM: NUMERICAL ENTITY SEARCHER

4 MODULES IN THE PROJECT
- Webpage (JSP): sends requests to the service and displays the information it returns.
- POS Tagger (Java): the Stanford POS Tagger.
- Numerical Phrase Extractor (Java): implemented using a shallow parsing technique.
- Number-Unit/Date Pattern Recognizer (C++): implemented based on the Numerical Quantifier developed by Benjamin Sapp, UIUC.

5 ACTION ITEMS
- Implemented the Numerical Phrase Extractor
- Wrote a detailed description of the Test Plan
- Wrote the formal specification using USE
- Created the UML representation of the system

6 ARCHITECTURAL DESIGN
Service-Oriented Architecture

7 PACKAGE VIEW
Overall package view. The class descriptions, attributes, and operations are given in the Architecture Design Document.

8 SEQUENCE DIAGRAM

9 CLASS DIAGRAM (NPE PACKAGE)

10 CLASS DIAGRAM (NDPR PACKAGE)

11 IMPLEMENTING THE NUMERICAL PHRASE EXTRACTOR
Input: tagged text
I/PRP lost/VBD thirty-three/JJ dollars/NNS in/IN 1998/CD
Regular expressions are used to find the numerical patterns in the input:
thirty-three/JJ dollars/NNS in/IN 1998/CD
Output: numerical phrases
thirty-three dollars in 1998
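The extraction step on this slide can be sketched in Java, the language the extractor is written in. This is a minimal illustration, not the project's code: the class name PhraseExtractorSketch and its single regular expression are assumptions chosen only to reproduce the slide's example, whereas the real extractor combines many patterns. Each match is found in the word/TAG text and the tags are then stripped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: find one numerical pattern in word/TAG text,
// then strip the POS tags from every match.
class PhraseExtractorSketch {
    // Assumed pattern (not from the project): a number word or digits tagged
    // JJ or CD, a noun (NN or NNS), and an optional "<prep>/IN <year>/CD".
    private static final Pattern NUMBER_NOUN_YEAR = Pattern.compile(
            "([a-z]+(-[a-z]+)?|\\d+)/(JJ|CD) [a-zA-Z]+/NNS?( [a-zA-Z]+/IN \\d{4}/CD)?");

    // Removes "/TAG" suffixes such as /JJ, /NNS or /PRP$.
    static String stripTags(String tagged) {
        return tagged.replaceAll("/[A-Z$]+", "");
    }

    // Returns every matching phrase, with tags removed, in input order.
    static List<String> extract(String taggedText) {
        List<String> phrases = new ArrayList<>();
        Matcher m = NUMBER_NOUN_YEAR.matcher(taggedText);
        while (m.find()) {
            phrases.add(stripTags(m.group()));
        }
        return phrases;
    }

    public static void main(String[] args) {
        String input = "I/PRP lost/VBD thirty-three/JJ dollars/NNS in/IN 1998/CD";
        System.out.println(extract(input)); // [thirty-three dollars in 1998]
    }
}
```

Matching on the tagged text and stripping tags afterwards keeps the patterns tied to POS categories while the output stays readable.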

12 TAGSET

13 SOME PATTERNS
- "\\d+-\\d+(/JJ|/CD) [a-zA-Z]+/NN" parses "3-2/JJ lead/NN" and "20-20/JJ match/NN".
- "(between|Between|from|From|In|in|since|Since|during|During)/IN..../CD (([a-zA-Z]+/CC|[a-z]+/TO)..../CD)?" parses "between 1987 and 1997" and "in 2007 and 2008".
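The patterns above can be checked with a short Java sketch. The first pattern is copied from the slide. The second is an assumed variant: the slide's "...." parts are replaced here with a space and \\d+ (one or more digits) only so that the example runs, since the exact text of the original pattern is not recoverable from the slide. The class name PatternCheckSketch is hypothetical.

```java
import java.util.regex.Pattern;

// Sketch: the slide's hyphenated-score pattern, plus an ASSUMED concrete
// version of the "between ... and ..." range pattern.
class PatternCheckSketch {
    // Copied from the slide (Java string-literal form).
    static final Pattern SCORE = Pattern.compile("\\d+-\\d+(/JJ|/CD) [a-zA-Z]+/NN");

    // Assumption: " \\d+" stands in for the slide's "...." placeholders.
    static final Pattern RANGE = Pattern.compile(
            "(between|Between|from|From|In|in|since|Since|during|During)/IN \\d+/CD"
            + "( ([a-zA-Z]+/CC|[a-z]+/TO) \\d+/CD)?");

    // True if the pattern occurs anywhere in the tagged text.
    static boolean parses(Pattern p, String tagged) {
        return p.matcher(tagged).find();
    }

    public static void main(String[] args) {
        System.out.println(parses(SCORE, "3-2/JJ lead/NN"));                    // true
        System.out.println(parses(SCORE, "20-20/JJ match/NN"));                 // true
        System.out.println(parses(RANGE, "between/IN 1987/CD and/CC 1997/CD")); // true
        System.out.println(parses(RANGE, "in/IN 2007/CD and/CC 2008/CD"));      // true
    }
}
```

Because the tags are part of the matched text, a single space between "/JJ" and the following word is significant: "3-2/JJlead/NN" does not match the first pattern.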

14 ASSIGNING BOUNDS
The extractor looks for the following words to set a bound (>, <, ~ or =) on a value. "=" is used if none of these words occur.
Bound | Corresponding words
>     | more than, no less than, no fewer than, at least, over
<     | up to, not over, no more than, at most, less than, not over than
~     | about, around, approximately, some, nearly, almost
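A minimal Java sketch of bound assignment using the word lists above, with "at least" mapped to > and "at most" to <, which is what those phrases mean. The class name BoundAssigner, the whole-word matching, and the longest-phrase-first rule are assumptions of this sketch. Longest-first order matters because "no more than" contains "more than" but implies the opposite bound.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: map bound words in a numerical phrase to >, <, ~ or =.
class BoundAssigner {
    private static final Map<String, String> BOUND_WORDS = Map.ofEntries(
            Map.entry("more than", ">"), Map.entry("no less than", ">"),
            Map.entry("no fewer than", ">"), Map.entry("at least", ">"),
            Map.entry("over", ">"),
            Map.entry("up to", "<"), Map.entry("not over", "<"),
            Map.entry("no more than", "<"), Map.entry("at most", "<"),
            Map.entry("less than", "<"), Map.entry("not over than", "<"),
            Map.entry("about", "~"), Map.entry("around", "~"),
            Map.entry("approximately", "~"), Map.entry("some", "~"),
            Map.entry("nearly", "~"), Map.entry("almost", "~"));

    // Bound words sorted longest first, so "no more than" is tried before "more than".
    private static final List<String> BY_LENGTH = new ArrayList<>(BOUND_WORDS.keySet());
    static {
        BY_LENGTH.sort((a, b) -> Integer.compare(b.length(), a.length()));
    }

    static String assignBound(String phrase) {
        // Pad with spaces so only whole words match ("over" does not match "overall").
        String padded = " " + phrase.toLowerCase(Locale.ROOT) + " ";
        for (String words : BY_LENGTH) {
            if (padded.contains(" " + words + " ")) {
                return BOUND_WORDS.get(words);
            }
        }
        return "="; // no bound words found: exact value
    }
}
```

For example, assignBound("no more than 5 people") returns "<", assignBound("About 100 miles per hour") returns "~", and assignBound("5.987 ml") returns "=".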

15 SOME PATTERNS
"[a-zA-Z0-9]+/CD( percent/NN)?( out/IN)? of/IN( the/DT)?( [a-zA-Z]+/CD)?( [a-zA-Z]+/JJ)? [a-zA-Z]+(/NN|/NNS|/NNP)" parses:
- one of the five people
- two of the groups
- one of the rare cases
- 89 percent of people
- five of the seven former employees
- 3 out of 5 people
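A Java sketch of this "N (out) of (the) M noun" pattern, under two assumed adjustments. As transcribed, the second number slot accepts letters only, so "3 out of 5 people" (tagged 3/CD out/IN of/IN 5/CD people/NNS) would not match; the sketch lets that slot accept digits too. It also orders the noun tags as NNS|NNP|NN, because regex alternation tries alternatives left to right and "/NN" would otherwise match only the first part of "/NNS". The class name PartitivePatternSketch and the tagged inputs are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the partitive pattern ("five of the seven former employees").
class PartitivePatternSketch {
    // Assumed tweaks vs. the slide: [a-zA-Z0-9]+ for the second number,
    // and the longer noun tags listed before /NN.
    static final Pattern PARTITIVE = Pattern.compile(
            "[a-zA-Z0-9]+/CD( percent/NN)?( out/IN)? of/IN( the/DT)?"
            + "( [a-zA-Z0-9]+/CD)?( [a-zA-Z]+/JJ)? [a-zA-Z]+(/NNS|/NNP|/NN)");

    // Returns each matched phrase with its POS tags removed.
    static List<String> extract(String tagged) {
        List<String> phrases = new ArrayList<>();
        Matcher m = PARTITIVE.matcher(tagged);
        while (m.find()) {
            phrases.add(m.group().replaceAll("/[A-Z$]+", ""));
        }
        return phrases;
    }

    public static void main(String[] args) {
        System.out.println(extract(
                "I/PRP met/VBD five/CD of/IN the/DT seven/CD former/JJ employees/NNS"));
        // [five of the seven former employees]
        System.out.println(extract("3/CD out/IN of/IN 5/CD people/NNS"));
        // [3 out of 5 people]
    }
}
```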

16 PHRASES THAT CAN BE PARSED
Numerical phrases: 27 year-old boy; A 3-2 lead; 9 in 10 people; About 100 miles per hour; 200 adults and children; $3 million; About two-thirds of the vote; The 17-mile drive; Less than 10% support; Six-bedroom apartment; 5.987 ml; 10:00 a.m. CST; From 400 to 500 miles
Temporal phrases: Last year; Next week; Monday – Sunday; January–December; 1956-60; Mid-1990s; Between 1999 and 2008; 17th century; 18 April 2008; Dec 21, 2009; October 10th 1984; John, 67; Since 1998

17 PHRASES THAT ARE NOT CURRENTLY PARSED
Numerical phrases | Temporal phrases
six-pack of drinks | 31st of March 1998
$100 more | Since mid-November
252° (the POS tagger cannot tag this) | the January-April period
Future work: these phrases could also be parsed by adding more patterns to the current system, but for now only the most important and most common patterns are handled. The current goal is to establish a basic approach to numerical phrase extraction.

18 FORMAL SPECIFICATION
- Created and validated using USE 2.3.1
- All classes are specified
- All important attributes and methods are specified
- Constructor methods are not specified
- Included at the end of the Architecture Design Document

19 TEST PLAN
- The developer checks the output of each module by comparing it with manually calculated results:
  - Check that the POS tagger produces the tagged text.
  - Check that the numerical phrases are extracted.
  - Check that each numerical phrase is resolved into Value, Unit, and Unit-Type.
- Two fellow MSE students will check the UML diagrams and the required specifications for consistency.
- The developer and the technical inspectors will test user interaction.
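The per-module check in this test plan (comparing a module's output with manually calculated results) could be partly automated. The following is a hypothetical Java helper, not part of the project: the class name ModuleCheck and its methods are assumptions, and it treats each output as a set of strings, so order and duplicate counts are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: compare a module's output with hand-computed results.
// Comparison is set-like: order and duplicate counts are ignored.
class ModuleCheck {
    // Items expected by hand but absent from the module output.
    static List<String> missing(List<String> expected, List<String> actual) {
        List<String> result = new ArrayList<>(expected);
        result.removeAll(actual);
        return result;
    }

    // Items the module produced that were not expected.
    static List<String> spurious(List<String> expected, List<String> actual) {
        List<String> result = new ArrayList<>(actual);
        result.removeAll(expected);
        return result;
    }

    // Prints the differences (if any) and reports whether the module passed.
    static boolean passes(String module, List<String> expected, List<String> actual) {
        List<String> miss = missing(expected, actual);
        List<String> extra = spurious(expected, actual);
        if (!miss.isEmpty() || !extra.isEmpty()) {
            System.out.println(module + ": missing " + miss + ", spurious " + extra);
        }
        return miss.isEmpty() && extra.isEmpty();
    }
}
```

Reporting missing and spurious items separately distinguishes a pattern that is too narrow from one that is too broad.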

20 FORMAL INSPECTION CHECKLIST
The following items are to be checked:
- The symbols used in the class diagrams conform to UML standards.
- The symbols used in the sequence diagrams conform to UML standards.
- Every class in the class diagrams has a corresponding description in the Architecture Design Document.
- The class descriptions in the Architecture Design Document are clear and concise.
- The classes in the USE model are consistent with those in the Architecture Design Document.
- All requirements in the Software Requirements Specification are covered in the Architecture Design Document.
- The multiplicities in the USE model are depicted in the class diagram.

21 PROJECT SCHEDULE
Key dates:
- Presentation 1: February 24th, 2009 (complete the Numerical Sub-Chunker)
- Presentation 2: March 31st, 2009 (complete the Numerical Phrase Extractor)
- Presentation 3: April 10th, 2009 (integrate the modules, develop a GUI, and set them up on the server)
- Submit all documents to the committee by April 13th, 2009
- Submit the final portfolio by April 15th, 2009

22 PROJECT SCHEDULE

23 PROTOTYPE DEMONSTRATION
- POS Tagger: working; for now it runs only on the local machine.
- Numerical Phrase Extractor: working; for now it runs only on the local machine.

24 PHASE 3 DELIVERABLES
- Action Items
- Component Design
- Assessment Evaluation
- Project Evaluation
- User's Manual
- Formal Technical Inspection Checklists
- Presentation 3
- Executable Project
- Source Code

25 TO-DO LIST
- Revise the documents
- Revise the project schedule
- Work on the Phase 3 deliverables
- Final demo

26 Questions? Suggestions?
THANK YOU

