Supporting Creativity in Science: Cooperative Knowledge Acquisition & Knowledge Refinement Systems Derek Sleeman Department of Computing Science The University.


1 Supporting Creativity in Science: Cooperative Knowledge Acquisition & Knowledge Refinement Systems
Derek Sleeman, Department of Computing Science, The University, ABERDEEN AB24 3FX
Tel: +44 (0)1224 272296 Email: d.sleeman@abdn.ac.uk WWW: http://www.csd.abdn.ac.uk
Acknowledgements: EPSRC support for the AKT Consortium
Students: Eugenio Alberdi, David Corsar, Andy Aiken, Mark Winter

2 OVERVIEW of TALK
I: Context: Advanced Knowledge Technologies (AKT) Consortium
II: Co-operative Knowledge Acquisition & Knowledge Refinement Systems
III: ReTAX system
IV: The REFINER++ System
Questions / Discussion

3 I: AKT’s CHALLENGES
- Knowledge Acquisition
- Knowledge Maintenance
- Knowledge Publishing
- Knowledge Modelling
- Knowledge Reuse
- Life Cycle, Integration Issues & Testbeds
- Knowledge Retrieval

4 II: Co-operative KA & Knowledge Refinement Systems
Knowledge-based systems inevitably require a sizeable amount of domain knowledge. This can be acquired from:
- domain experts (KA)
- detailed examples (using ML techniques)
- etc.
However, for complex tasks these KBs are inevitably:
- incomplete, in which case further knowledge acquisition is needed;
- inconsistent, in which case the KB needs to be refined.
Also, background knowledge is likely to be incomplete, thus requiring an expert to act as an oracle.
Hence the need for: Co-operative (Problem Solving) Knowledge Acquisition & Knowledge Refinement Systems

5 II: Co-operative KA & Knowledge Refinement Systems
- KRUST (Classical KB; Classification) (Susan Craw)
- STALKER (Efficient Truth Maintenance based system; Classification) (Leo Carbonara)
- REFINER / Refiner++ / R5 (Case-base; Classification) (Sunil Sharma; Mark Winter; Andy Aiken)
- RETAX (Revision of Taxonomies) (Eugenio Alberdi; David Corsar)
- CRIMSON (Refinement of Constraints) (Mark Winter)
- TIGON (Time Series Data / Causal Model; Diagnosis) (Fraser Mitchell)
- SALT+ (Rules & Constraints; Propose & Revise) (Piero Leo)
References: see WWW: http://www.csd.abdn.ac.uk

6 II: Co-operative KA & Knowledge Refinement Systems
- KRUST & STALKER: Wine Adviser
- REFINER+: Attendance at Medical Clinics & Stock control
- CRIMSON/ConRef: Stock control
- RETAX: Botanical Taxonomies
- TIGON: Turbines (Fault Detection & Diagnosis)
- SALT+: Elevators/Lifts
References: see WWW: http://www.csd.abdn.ac.uk

7 III: RETAX+
The heuristics in RETAX are based on a study of how botanists reacted to one or more rogue items.
There are 2 (principal) rules which determine whether a taxonomy is well formed:
- each child node must be more specialized than its parent
- each of a node’s siblings must be unique.
RETAX was used to replicate the revision of a major botanical taxonomy done “manually” in Aberdeen’s Botany dept in the 90s.
References:
- Middleton & Wilcox (1990) Edinburgh Journal of Botany {revision of taxonomy for Pernettya / Gaultheria}
- Alberdi & Sleeman (1997) AI Journal, p257-279.
- Alberdi, Sleeman & Korpi (1999) Cognitive Science Journal

8 (The first row gives each column's type.)
Label | Wheels | Size | Motor | Engine-Power | Parent | Depth
string ANY | integer-range (2 – 8) | ordered-set 4 (low medium large high) | ordered-set 2 (yes no) | integer-range (0 20) | string ANY | integer-range (0 3)
vehicle | 2 - 8 | (low medium large high) | (yes no) | 0 - 20 | root | 0
train | 6 - 8 | (medium large) | (yes) | 15 - 20 | vehicle | 1
car | 3 - 6 | (low medium high) | (yes) | 2 - 10 | vehicle | 1
cycle | 2 - 3 | (low) | (yes no) | 0 - 3 | vehicle | 1
lorry | 4 - 8 | (medium high large) | (yes) | 5 - 20 | vehicle | 1

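9 (continued)
Label | Wheels | Size | Motor | Engine-Power | Parent | Depth
sports-car | 4 | (low) | (yes) | 5 – 10 | car | 2
salon-car | 4 | (medium) | (yes) | 3 – 5 | car | 2
bicycle | 2 | (low) | (no) | 0 | cycle | 2
motor-cycle | 2 | (low) | (yes) | 1 – 3 | cycle | 2
large-lorry | 4 – 8 | (large) | (yes) | 6 - 20 | lorry | 2
small-van | 4 | (medium) | (yes) | 5 – 10 | lorry | 2
smaller-van | 4 | (medium) | (yes) | 6 | small-van | 3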

10
Vehicle
  Train
  Car
    Sports Car
    Salon Car
  Cycle
    Bicycle
    Motorbike
  Lorry
    Large Lorry
    Small Van
      Smaller Van
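The two well-formedness rules given for RETAX can be sketched against part of the vehicle taxonomy. This is a minimal illustration, not the RETAX implementation: the `Node` class, the helper names and the reading of "more specialized" (no feature broader than the parent's, and at least one narrower) are assumptions, and the `estate-car` node is a hypothetical rogue item invented to trigger the sibling rule.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass
class Node:
    label: str
    parent: Optional[str]   # None marks the root
    features: dict          # feature name -> (lo, hi) range or frozenset of symbols


def within(child_value, parent_value):
    """True if the child's value is no broader than the parent's."""
    if isinstance(child_value, tuple):          # inclusive integer range
        return parent_value[0] <= child_value[0] and child_value[1] <= parent_value[1]
    return child_value <= parent_value          # subset test for frozensets


def more_specialised(child, parent):
    """Assumed reading of rule 1: no feature is broader, and at least one differs."""
    no_broader = all(within(child.features[f], parent.features[f]) for f in parent.features)
    return no_broader and child.features != parent.features


def violations(nodes):
    """Return (rule, label, label) tuples; an empty list means well formed."""
    by_label = {n.label: n for n in nodes}
    problems = []
    for n in nodes:
        if n.parent is not None and not more_specialised(n, by_label[n.parent]):
            problems.append(("not-specialised", n.label, n.parent))
    for a, b in combinations(nodes, 2):
        if a.parent is not None and a.parent == b.parent and a.features == b.features:
            problems.append(("duplicate-siblings", a.label, b.label))
    return problems


def feats(wheels, size, motor, power):
    return {"wheels": wheels, "size": frozenset(size),
            "motor": frozenset(motor), "power": power}


TAXONOMY = [
    Node("vehicle", None, feats((2, 8), ["low", "medium", "large", "high"], ["yes", "no"], (0, 20))),
    Node("car", "vehicle", feats((3, 6), ["low", "medium", "high"], ["yes"], (2, 10))),
    Node("sports-car", "car", feats((4, 4), ["low"], ["yes"], (5, 10))),
    Node("salon-car", "car", feats((4, 4), ["medium"], ["yes"], (3, 5))),
]

# Hypothetical rogue node: an exact copy of salon-car's description.
ROGUE = Node("estate-car", "car", feats((4, 4), ["medium"], ["yes"], (3, 5)))

print(violations(TAXONOMY))            # []
print(violations(TAXONOMY + [ROGUE]))  # [('duplicate-siblings', 'salon-car', 'estate-car')]
```

Reporting the offending pair, rather than just returning a boolean, leaves the choice of repair (for example merging the two siblings) to the expert.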

11 RETAX+
Let’s refer to a new object/node as N, the existing hierarchy/tree as T, and the potential parent node as P. Then possible operations are:
- Is T well formed? (If not, report nodes which violate the rules.) {E.g., if sibling nodes N1 & N2 are equal, then merge the 2 nodes.}
- Is N already in T?
- Assuming T is well formed, to which parent node, P, can N be attached without causing T to be rearranged or N modified? (Answer could be none.)
- What changes have to be made to N to make it a “legal” child of node P?
- What changes have to be made to T so that N can be a child of P?
- Combinations of the last 2 operations
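Two of these operations ("Is N already in T?" and "to which parent node, P, can N be attached?") can be sketched as follows. This is an illustrative sketch only: the function names, the set-based encoding of ranges and the attachment test (N must be narrower than P and differ from all of P's existing children) are assumptions, and the tricycle node is a hypothetical N that does not appear in the talk.

```python
def narrower(child, parent):
    """Assumed test: every feature of child is a subset of parent's, and they differ."""
    return all(child[f] <= parent[f] for f in parent) and child != parent


def already_present(new, tree):
    """Labels of existing nodes whose description equals the new node's."""
    return [label for label, (_, feats) in tree.items() if feats == new]


def legal_parents(new, tree):
    """Labels P under which `new` can hang with no change to the tree or to `new`."""
    result = []
    for label, (_, feats) in tree.items():
        children = [f for (p, f) in tree.values() if p == label]
        if narrower(new, feats) and all(new != c for c in children):
            result.append(label)
    return result


def node(wheels, size, motor, power):
    return {"wheels": frozenset(wheels), "size": frozenset(size),
            "motor": frozenset(motor), "power": frozenset(power)}


# label -> (parent label, features); integer ranges are expanded into sets.
TREE = {
    "vehicle": (None, node(range(2, 9), ["low", "medium", "large", "high"], ["yes", "no"], range(0, 21))),
    "car": ("vehicle", node(range(3, 7), ["low", "medium", "high"], ["yes"], range(2, 11))),
    "cycle": ("vehicle", node(range(2, 4), ["low"], ["yes", "no"], range(0, 4))),
    "bicycle": ("cycle", node([2], ["low"], ["no"], [0])),
    "motor-cycle": ("cycle", node([2], ["low"], ["yes"], range(1, 4))),
}

# Hypothetical new node N: a three-wheeled, unmotorised cycle.
TRICYCLE = node([3], ["low"], ["no"], [0])

print(already_present(TRICYCLE, TREE))  # []
print(legal_parents(TRICYCLE, TREE))    # ['vehicle', 'cycle']
```

When several parents are legal, a tool would presumably prefer the deepest one (here `cycle`); that preference is also an assumption rather than something stated on the slide.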

12 ReTAX
Ericaceae
  Arctostaphylos
    A. uva-ursi
  Arbutus
    A. unedo
  Pernettya
    P. tasmanica
  Leucothoe
  Gaultheria
    G. oppositifolia
    G. rupestris
    G. antipoda
  Agauria
  Andromeda
    A. polifolia

13 ReTAX
- Historical: In Bentham & Hooker’s (1876*) classification, the main differences detected between the Pernettya & Gaultheria genera were the type of fruit and the succulence of the calyx.
- Subsequent botanical investigations in the 20th century challenged this analysis, but did not suggest any further distinguishing features for the 2 genera; hence the 2 genera were combined (Middleton & Wilcox, 1990).
* G Bentham & JD Hooker (1876). Genera Plantarum, Vol II, Part 2. (Publ: Reeves & Co, London)

14 ReTAX Simulation (Simplified)
- The descriptions of several species of the Pernettya & Gaultheria genera were replaced by others with revised features (descriptors), which affect the definitions of the parent nodes (P + G).
- When the parent nodes (Pernettya & Gaultheria) are found to be the same, the system checks a set of other features (a further facility of ReTAX) to see if they are distinctive; when no differences are found, the 2 nodes (P + G) are collapsed.

15 RETAX+: Current / Future activities
- Use with other experts to help them formulate / refine taxonomies (eg other aspects of botany, microbiology)
- Use RETAX+, or a variant, to formulate / refine ontologies (eg medical terminologies). This has resulted in the Protégé RepairTAB, which detects inconsistencies in OWL ontologies & gives advice about removing them. (Lam, Sleeman, Pan, & Vasconcelos (2008) Journal of Data Semantics)

16 IV: REFINER++ System
- The Refiner++ algorithm
  - Sample dataset
- Interaction with experts
- Current / future work

17 The Sample Dataset
Case | Age | DBP | Associated Disease | Category
1 | 50 | 90 | D1 | A
2 | 56 | 90 | D2 | A
3 | 52 | 101 | D3 | A
4 | 50 | 95 | D3 | B
5 | 56 | 97 | D3 | B
6 | - | 89 | D5 | A
7 | 52 | 97 | D3 | A

18 The Refiner++ Algorithm
- Each case is assigned to a category
- Category descriptions are inferred from the case values
- When a case matches a category to which the expert did not assign it, this is an inconsistency
- While inconsistencies exist:
  - A selection of disambiguation strategies is suggested
  - The user chooses a strategy to be performed
  - The list of inconsistencies is re-evaluated
- The refined dataset is now consistent

19 Generating Descriptions
- Generalise each field
  - Numeric: range from lowest to highest
  - String: set of all unique items
  - Taxon: nearest common parent
  - Boolean: set of all unique items from the set {‘true’, ‘false’, ‘any’}
- Combine to get category description
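The numeric and string generalisation rules can be illustrated on the sample dataset from slide 17. This is a minimal sketch, not the Refiner++ code: the function names are invented, missing values (case 6's age) are simply skipped as an assumption, and the Taxon and Boolean field types are left out.

```python
# Sample dataset from slide 17; None stands for the missing age in case 6.
CASES = {
    1: {"Age": 50, "DBP": 90, "Disease": "D1", "Category": "A"},
    2: {"Age": 56, "DBP": 90, "Disease": "D2", "Category": "A"},
    3: {"Age": 52, "DBP": 101, "Disease": "D3", "Category": "A"},
    4: {"Age": 50, "DBP": 95, "Disease": "D3", "Category": "B"},
    5: {"Age": 56, "DBP": 97, "Disease": "D3", "Category": "B"},
    6: {"Age": None, "DBP": 89, "Disease": "D5", "Category": "A"},
    7: {"Age": 52, "DBP": 97, "Disease": "D3", "Category": "A"},
}
FIELDS = ["Age", "DBP", "Disease"]


def generalise(values):
    """Numeric field -> (lowest, highest); string field -> set of unique items."""
    present = [v for v in values if v is not None]   # assumption: ignore missing values
    if not present:
        return None
    if all(isinstance(v, (int, float)) for v in present):
        return (min(present), max(present))
    return frozenset(present)


def describe(cases, fields):
    """Group cases by category and generalise each field within each group."""
    by_category = {}
    for case in cases.values():
        by_category.setdefault(case["Category"], []).append(case)
    return {
        category: {f: generalise([c[f] for c in members]) for f in fields}
        for category, members in by_category.items()
    }


DESCRIPTIONS = describe(CASES, FIELDS)
print(DESCRIPTIONS["A"]["Age"], DESCRIPTIONS["A"]["DBP"])  # (50, 56) (89, 101)
print(DESCRIPTIONS["B"]["Age"], DESCRIPTIONS["B"]["DBP"])  # (50, 56) (95, 97)
```

Category A's disease field generalises to the set of all four diseases seen in the data, which corresponds to the "All" entry on the category-description slide.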

20 Category Descriptions
Category | Age | DBP | Disease
A | 50 – 56 | 89 – 101 | All
B | 50 – 56 | 95 – 97 | D3
There are inconsistencies:
- Cases 4 and 5 match A
- Case 7 matches B
We need to remove the overlap
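The inconsistency check (a case matching the description of a category the expert did not assign it to) can be sketched with the two category descriptions above. The function names are illustrative, and treating a missing value as compatible with any description is an assumption, not something the slides state.

```python
# Cases from slide 17: (Age, DBP, Disease, Category); None is the missing age.
CASES = {
    1: (50, 90, "D1", "A"), 2: (56, 90, "D2", "A"), 3: (52, 101, "D3", "A"),
    4: (50, 95, "D3", "B"), 5: (56, 97, "D3", "B"), 6: (None, 89, "D5", "A"),
    7: (52, 97, "D3", "A"),
}

# Category descriptions from slide 20; "All" is written out as the diseases seen.
DESCRIPTIONS = {
    "A": {"Age": (50, 56), "DBP": (89, 101), "Disease": {"D1", "D2", "D3", "D5"}},
    "B": {"Age": (50, 56), "DBP": (95, 97), "Disease": {"D3"}},
}


def matches(case, description):
    """True if every known field value falls inside the category description."""
    age, dbp, disease, _ = case
    for field, value in (("Age", age), ("DBP", dbp), ("Disease", disease)):
        allowed = description[field]
        if value is None:
            continue                      # assumption: a missing value never blocks a match
        if isinstance(allowed, tuple):
            if not allowed[0] <= value <= allowed[1]:
                return False
        elif value not in allowed:
            return False
    return True


def inconsistencies(cases, descriptions):
    """(case id, category) pairs where a case matches a category it was not assigned."""
    return [
        (case_id, category)
        for case_id, case in cases.items()
        for category, description in descriptions.items()
        if category != case[3] and matches(case, description)
    ]


print(inconsistencies(CASES, DESCRIPTIONS))  # [(4, 'A'), (5, 'A'), (7, 'B')]
```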

21 Disambiguation Strategies
- Change values for certain cases
- Remove values from a category (eg, create a disjunction)
- Reclassify a case
- Make a case match an additional category
- Shelve a problem case
- Add a new field

22 Refiner++ [Figure: categories C1, C2 and C3]

23 Strategies for this problem
- Change value of DBP in case 7 to 90
- Change value of DBP in case 5 to 95
- Reclassify case 7 to category B
- Add case 7 to category B
- Shelve case 7
- Change value of Disease in cases 3 and 7 to D3
- Reclassify cases 4 and 5 to category A
- Add cases 4 and 5 to category A
- Shelve cases 4 and 5
- Add a new field

24 Strategy Ordering
- Typically, many strategies are suggested
- We need heuristics to order them:
  - Ordered by number of times suggested; prefer strategies which are suggested many times
  - Ordered by number of cases affected; prefer strategies which affect fewer cases
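One way to combine the two ordering heuristics is a two-key sort. The slide does not say how Refiner++ combines them, so using "times suggested" as the primary key and "cases affected" as the tie-breaker is an assumption. The strategy texts come from slide 23, but the `times_suggested` counts are invented purely for illustration.

```python
from typing import NamedTuple


class Strategy(NamedTuple):
    description: str
    times_suggested: int   # hypothetical counts, not taken from the talk
    cases_affected: int


def order_strategies(strategies):
    """Most frequently suggested first; ties broken by fewest cases affected."""
    return sorted(strategies, key=lambda s: (-s.times_suggested, s.cases_affected))


SUGGESTED = [
    Strategy("Reclassify cases 4 and 5 to category A", 2, 2),
    Strategy("Change value of DBP in case 7 to 90", 1, 1),
    Strategy("Shelve case 7", 2, 1),
    Strategy("Change value of Disease in cases 3 and 7 to D3", 1, 2),
]

for strategy in order_strategies(SUGGESTED):
    print(strategy.description)
```

Because `sorted` is stable, strategies that tie on both keys keep the order in which they were suggested.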

25 The Refiner++ Main Screen

26 Scalability Measured the time taken to perform validation on randomly-generated datasets with varying numbers of cases, fields and categories For most datasets, time taken is under 1 second

27 Use of REFINER++ by Experts*
Refiner++ has been used with various experts including:
- Pain Control Expert (Anaesthesiology)
- Child psychologist
- High Dependency Unit (HDU) Physician
* KCAP-2003 paper (Aiken & Sleeman)

28 Pain Control
- Pre-existing Access dataset on epidural patients
- Many cases, lots of fields / descriptors
- Refiner++ imported the data (almost) perfectly
- Expert categorised cases based on the length of the epidural (in days)
- REFINER++ took only a few seconds to create category descriptions and validate
- But…

29 Pain Control
- Hundreds of inconsistencies found
- Hundreds of strategies suggested
  - Almost all of which were ‘change value’
- Why did it not work better?
  - Subjective nature of the subject domain
  - Categories were contiguous

30 Child Psychology
- The session was a series of anecdotes and outlines of specific cases
- Three types of cases were identified:
  - Severely autistic
  - Mildly autistic
  - Difficulties with language development

31 Child Psychology
- The expert stated that autistic children usually had the following characteristics:
  - Problems with language and verbal communication
  - Problems with social interaction
  - Obsessive behaviour
- These characteristics were abstracted by the knowledge engineers and subsequently confirmed with the expert
- The expert showed no inclination to use REFINER++, but a case set was created by the knowledge engineers

32 HDU
- Task posed by domain expert: when to move high dependency unit (HDU) patients to a general ward or to the intensive care unit (ICU), or leave them in the HDU.
- Used Refiner++ with three datasets, one for each condition (cardiac, neuro & respiratory)
- Expert did not use the system, but did dictate the descriptors & the sets of cases to the knowledge engineers, who typed this information into REFINER.
- Refiner++ found that 2 of the datasets were consistent & identified inconsistencies in the third

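33 Inconsistent Dataset
Case | HR | RR | AVPU | Sat O2 | Cat.
1 | 105 | 27 | 1 | 94 | Higher
2 | 120 | 35 | 2 | 88 | Higher
3 | 140 | 45 | 3 | 80 | Higher
4 | 105 | 28 | 1 | 94 | Same
5 | 90 | 22 | 1 | 95 | Same
6 | 80 | 18 | 1 | 96 | Lower
7 | 70 | 15 | 1 | 98 | Lower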

34 Category Descriptions
Category | HR | RR | AVPU | Sat O2
higher | 105-140 | 27-45 | 1-3 | 80-94
same | 90-105 | 22-38 | 1 | 94-95
lower | 70-80 | 15-18 | 1 | 96-98
There are inconsistencies:
- Case 1 matches Category SAME
- Case 4 matches Category HIGHER
We need to remove the overlap
Refiner++ suggested lower and upper ‘danger zones’ for each field

35 Future Work: Use with Domain Experts
- Make the system’s GUI more intuitive (some changes already made)
- Ask the expert to come to the session with a document which summarizes the main features of the dataset they wish to discuss. (In the session, ask them to highlight principal concepts.)
- For each domain expert contacted, record an AVI session of a simple but related domain (eg simple childhood diseases before approaching a paediatrician) (demo)

36 Current Work (ICU domain)
- Developed a system which is statistically based: given a case description, it returns the likelihood of that case belonging to one of the predefined categories (R5: Andy Aiken)
- Acquired a data set of patients’ physiological parameters from an ICU DB, and had clinicians assign patients, on a day-by-day & hour-by-hour basis, to a 5-point severity score. (Developed in conjunction with Glasgow Royal Infirmary)
- Using R5 with the above data set to assign new patient reports to a severity class. (Practically important, as the descriptors include clinical interventions which “standard” scales don’t.)
- Identify & analyse (explain) anomalous / unusual cases (segments of cases)

37 VI: Dimensional Analysis ?? Outline issue Pointer to TR Pointer to WWW systems / sources

38 Questions/Comments

39 V: (Causal) Explanations for Anomalous Medical Cases
- Discuss ICU context
- Experiment to detect anomalous cases / sections of cases
- Outline a typical investigation

40 V: Seeking to Explain an Anomalous Observation
EXPECTED: An injection of X will cause the heart (Organ, O) to increase its contraction rate within T seconds.
SUPPOSE that does not happen; then here are some of the investigations which might be performed:
a) Is the injection being given effectively?
b) IF so, then check whether the drug X is being transported to Organ O:
   a) Is the transport path physically / bio-chemically blocked?
   b) Is the transport mechanism inhibited or slowed down?
c) IF the drug is actually arriving at Organ O & the concentration is OK, then investigate:
   a) Is the drug mechanism within the organ being blocked?
   b) Is the organ for some reason unable to respond in the usual way (eg weakened heart muscle)?

