A research overview Professor Philip Sallis Auckland University of Technology New Zealand.

1 A research overview Professor Philip Sallis Auckland University of Technology New Zealand

2 Greetings from Aotearoa-New Zealand (the official bi-cultural name) Maori and English

3 Introductions
Dr Philip Sallis (Computer Science)
- NLP and Computational Linguistics
- Professor and Senior University Academic
- The University Deputy Vice Chancellor (Vice Rector or Provost)
Dr Kathy Garden (Electrical Engineering)
- Computer Tomography and Signal Processing
- Dean, Faculty of Design and Creative Technologies and Regional Pro Vice Chancellor

4 Brief Curriculum Vitae
Kathy
- PhD (NZ) and Post Doc Fellow (USA) (Electrical Engineering)
- Univ teaching, research & supervision (NZ)
- Government science policy advisor
- Industry and Regional Govt strategic advisor
- AUT – Dean and Pro Vice Chancellor
Philip
- PhD (Computer Science) (England)
- Univ teaching, research & supervision (UK, Aust, NZ)
- Visiting research professor (UK, USA, HK...and Chile!)
- Industry consulting and government commissions
- Full Professor since 1987 and HoD three times since 1979
- Deputy Vice Chancellor at AUT since 1999

5 Auckland University of Technology
- 26,000 students (full and part-time)
- 10% post-graduate (Masters and PhD)
- 23% International students (70% in degrees, 30% short courses)
- Faculties:
  - Design and Creative Technologies
  - Business and Law
  - Health and Environmental Sciences
  - Humanities
  - Maori Development

6 About this Presentation
- Research in general
- Overview of my own research
- Description of two areas of work:
  - Software forensics
  - Digital Libraries
- Publications and further information: www.aut.ac.nz/serl
- Enquiries re PhD supervision: philip.sallis@aut.ac.nz, kathy.garden@aut.ac.nz
Leopoldo Leoncio

7 Research Mix (‘Typical’)
- Papers & Confs; Reports; Products
- Alone; In teams
- (Ideas + People + Funding + Work) = Results
- Funding sources: university funds, government grants, international grants, industry contracts

8 To choose one aspect of research that is effort and cost effective - Clustering P P P Usually more appealing for grant providers too! New one has recently emerged at UCM

9 ‘Teams are best’ – why?
- Sharing and testing ideas
- Mix of expertise for multi dimension and inter disciplinary research
- Division of labour (efficient and effective)
- Peer pressure to reach conclusions and achieve outcomes – publish papers etc
- Writing grant applications
- Using more names to strengthen proposals
- Demonstrating collaboration
- Inter colleague, inter institution, inter national

10 A team at work and play!

11 My Research Map
- Program and data structures. Compilers. Text Parsing Algorithms
- Data modelling & DBMS
- Software development process models (CMM) etc
- Measurement and improvement of effort, activity and product
- Computational linguistics (stylometrics)
- Software Metrics
- Software Forensics
- PhD research NLP/NLU

12 A journey with computing
Elliott 503; PdP1100 & 1125; B6700 & 2700; HP2100A & 3000; ICL1902T & 1905E; IBM 1401, 360, 6000; Prime 710; VAX 700 series; Onyx, Sun, Mac, PC

13 Inspiration

14 Performance Analysis and Improvement milieu
The computer; Developers; Users; Data & schema; Program code & structure; Outputs; URS; Unplanned input
Stress Testing using value changes to parameters and variables in all aspects of the system. Simulation.

15 An early interest in NLP
- Program code meta languages, Compilers, S-Grammars, parsing algorithms for text processing
- Command and Edit languages and parsing
- NLP and symbol processing – symbolic AI methods
- Full text, narrative and discourse analysis
1972-6; 1976-9; 1980-n

16 Two research areas emerged and then merged as Software Forensics
- Software Engineering
- Computational Linguistics
A fascination with the delta!

17 Software Engineering research
- Algorithms, program and data structures, programming style
- Measuring aspects of the process such as programmer productivity (4GLs)
- User and use profiling for system optimisation using simulation and other methods.
- ‘Programming in the large’ and the software system development process
- Process and URS improvement. Data modelling and DB design. Time & Cost estimation. CASE.
- Mathematics and Computer Science: Prog Lang, Operating Systems, Compilers
- System integration, blended data applications (GIS) and their usability measurement.

18 Computational Linguistics research
- String handling Algs, Command Editors, Text processing, Bibliometrics (NLP)
- Transformational Grammars, meta-information and ‘deep’ structures
- Authorship authentication
- Topic clustering depictions
- Symbolic AI. Formal representation of meaning & semantics (NLU). Epistemology.
- PhD - A domain grammar and parser for generating abstracts from journal articles
- Mathematics and Computer Science: Prog Lang, Operating Systems, Compilers
- Stylometric parsers for thematic analysis, topic clustering, etc

19 Forensics - convergence and incorporation of new technologies
- Programming languages, interpreters, CASE, etc
- Geographic Information Systems (GIS)
- Global Positioning Systems (GPS)
- Voice over IP (VoIP)
- Voice Recognition
- Wireless and GPRS...now RFID
- Computational Neural Networks
- Algorithms, data structures, pattern matching
- Connectionist alternatives for clustering etc
- RISC technologies
- Bio-informatics (first NZ course as PG Dip)
  (bio-medical data [text, image and telemetry] and technologies)

20 Fingerprint parsing algorithms
Count everything. Compare everything. A lot of data.
- Program, data and image names
- File extensions and Temporary files
- Variable, parameter and label names
- Expressions and data structures (arrays etc)
- Structure – iteration, recursion, formulae, etc
- Algorithm characteristics
- Sub routines, case statements, DB calls, etc
- Word, sentence & paragraph count
- Length of words, sentences, etc
- Word frequencies
- Phrases and adjacent word pairs
- Nouns and pronouns, adjectives, etc
- Prepositions, positive/negative exp
- Compare with Canon Corpora
- Differences in expression
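
The text-side counts listed on this slide (word and sentence counts, lengths, word frequencies, adjacent word pairs) can be sketched in a few lines of Python. This is only an illustration, not the parser used in the research: the function name `fingerprint`, the tokenising rules and the sample sentence are all assumptions made for the example.

```python
import re
from collections import Counter


def fingerprint(text):
    """Return simple stylometric counts for one text (illustrative sketch)."""
    # Assumed rule: a sentence ends at ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    # Assumed rule: a word is a run of letters, optionally with one internal apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    # Adjacent word pairs; note that pairs are allowed to span sentence boundaries.
    pairs = list(zip(words, words[1:]))
    return {
        "sentence_count": len(sentences),
        "word_count": len(words),
        "mean_word_length": sum(map(len, words)) / len(words) if words else 0.0,
        "mean_sentence_length": len(words) / len(sentences) if sentences else 0.0,
        "word_frequencies": Counter(words),
        "adjacent_pairs": Counter(pairs),
    }


# Hypothetical sample text, not taken from any case material.
sample = "The cat sat. The cat ran away! Did the dog see the cat?"
fp = fingerprint(sample)
print(fp["word_count"], fp["sentence_count"])
print(fp["word_frequencies"].most_common(2))
print(fp["adjacent_pairs"].most_common(1))
```

Two such dictionaries, one per document, give the raw material for the 'compare everything' step; how the differences are weighted is a separate question that this sketch does not address.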

21 Some Forensics Tools
- Identify (program and data structure comparisons): www.aut.ac.nz/serl
- Beyond Compare (file, variable, labels, line match): www.scootersoftware.com
- Signature (stylometric comparisons): www.signature.com
- Viscovery (Data and results visualisation): www.eudaptics.com
- Improve English (readability & comprehension tests): www.improve-english.com

22 Processing Forensics Data
- Programming Languages - SNOBOL, LISP, Prolog, C++, Perl
- Data Management - flat and I-S files, RDBMS, MySQL, php, ASP etc
- Statistical methods - probability, inference, prediction SPSS and Excel
- Connectionist alternatives for dependency analysis (FNN) - KEDRI
- Cluster analysis MatLab and Visualisation alternatives (Viscovery)
A lot of data to analyse, represent and reach conclusions about
Not an exact science (closeness of fit but also human interpretation)

23 Example formatted results after program code comparison
Beyond Compare Software: Computer Comparison report
Document (DJ Fruits vs    Lines  Left side  Right side  Important    Unimportant  Sections
Plaintiff)                match  only       only        differences  differences  different
Terms of Trade            60     13         49          25           0            17
Personal Hygiene Policy   3      7          93          31           0            4
Process Policy            1      2          21          22           0            2
Trace back and Recalls    1      0          34          23           0            2
Control                   1      1          20          14           0            2
Validation                2      0          64          14           0            3
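
A line-level comparison of this kind can be approximated with Python's standard `difflib` module. The sketch below is not Beyond Compare's algorithm, and its counts will not necessarily agree with that tool's report; the function name `compare_lines`, the summary keys and the two short sample documents are assumptions made for illustration.

```python
import difflib


def compare_lines(left_lines, right_lines):
    """Summarise a line-by-line comparison of two documents (illustrative sketch)."""
    matcher = difflib.SequenceMatcher(None, left_lines, right_lines, autojunk=False)
    summary = {"match": 0, "left_only": 0, "right_only": 0, "sections_different": 0}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Lines present, in the same order, on both sides.
            summary["match"] += i2 - i1
        else:
            # 'replace', 'delete' or 'insert': count unmatched lines on each side
            # and treat each contiguous block as one differing section.
            summary["left_only"] += i2 - i1
            summary["right_only"] += j2 - j1
            summary["sections_different"] += 1
    return summary


# Hypothetical sample documents, not the case material shown on the slide.
left = ["Payment is due in 30 days.", "Goods remain our property.", "Claims within 7 days.", "Governing law: NZ."]
right = ["Payment is due in 30 days.", "Title passes on payment.", "Claims within 7 days.", "Governing law: NZ.", "Signed by both parties."]
result = compare_lines(left, right)
print(result)
```

Distinguishing 'important' from 'unimportant' differences would need an extra rule (for example ignoring whitespace or letter case), which is left out here.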

24 Example formatted results after documentation comparison
Title                              Grade  Reading  Readability  # of   Avg syllables  # of       Avg words
                                   level  ease                  words  per word       sentences  per sentence
DJ Fruits Terms of Trade           19.2   23.16    23.79        1218   1.73           33         36.91
Plaintiff Terms of Trade           15.72  31.82    19.85        1249   1.74           45         27.76
DJ Fruits Personal Hygiene Policy  22.57  17.22    26.37        330    1.68           7          47.14
Plaintiff Personal Hygiene Policy  32.39  -9.79    36.43        286    1.7            4          71.5
DJ Fruits Process Policy           23.2   17.32    26.73        199    1.64           4          49.75
Plaintiff Process Policy           14.63  31.1     19.39        161    1.8            7          23
DJ Fruits Trace back and Recalls   10.86  42.64    13.97        286    1.77           20         14.3
Plaintiff Trace back and Recalls   17.61  26.23    21.03        290    1.75           9          32.22
DJ Fruits Control                  14.49  31.88    18.39        160    1.79           7          22.86
Plaintiff Control                  12.51  38       15.33        165    1.78           9          18.33
DJ Fruits Validation               35.01  -43.77   39.8         63     2.21           1          63
Plaintiff Validation               37.61  -44.57   42.35        73     2.1            1          73
English Language Comparisons of files: using www.improve-english.com

25 Raw Data Counting
begin to build a ‘fingerprint picture’ (‘stylometrics’)
filename                 characters_incl_blanks  characters  word_count  uniq_words
CasketLetterEight.txt    1303                    1014        255         143
CasketLetterFive.txt     1405                    1092        280         140
CasketLetterFour.txt     2663                    2070        527         239
CasketLetterOne.txt      1439                    1117        285         149
CasketLetterSeven.txt    1337                    1045        259         145
CasketLetterSix.txt      2243                    1731        443         213
CasketLetterThree.txt    3638                    2867        698         292
CasketLetterTwo.txt      18550                   14382       3631        900
Letter1.txt              1381                    1089        263         142
Letter2.txt              1271                    1000        230         132
Letter3.txt              2808                    2223        521         251
Letter4.txt              3020                    2368        573         270
Totals                   41058                   31998       7965        3016
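
A minimal sketch of how the four raw columns might be produced for one text. The slide does not say how the original tool defined 'characters' or 'uniq_words', so the rules here (characters excludes whitespace; unique words are case-folded with surrounding punctuation stripped) are assumptions, as is the function name `raw_counts`.

```python
import string


def raw_counts(text):
    """Compute the four raw counts for one text (definitions are assumptions)."""
    words = text.split()
    # Case-fold and strip leading/trailing punctuation before counting distinct words.
    unique = {w.lower().strip(string.punctuation) for w in words}
    unique.discard("")  # a token made only of punctuation is not a word
    return {
        "characters_incl_blanks": len(text),
        "characters": sum(1 for ch in text if not ch.isspace()),
        "word_count": len(words),
        "uniq_words": len(unique),
    }


# Hypothetical sample text; in practice the text would be read from each letter file.
counts = raw_counts("To be, or not to be.")
print(counts)
```

Running the same function over every file and summing each column gives a totals row of the kind shown in the table.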

26 Writing/Readability Tests
the picture becomes more complex
filename                 Fog      Flesch   FleschKincaid
CasketLetterEight.txt    14.4706  63.6178  11.4247
CasketLetterFive.txt     18.7143  49.91    16.235
CasketLetterFour.txt     15.2843  58.3427  12.823
CasketLetterOne.txt      13.786   63.9201  11.4239
CasketLetterSeven.txt    12.507   69.9186  9.3564
CasketLetterSix.txt      10.6954  70.65    8.6457
CasketLetterThree.txt    24.5142  32.5883  22.0526
CasketLetterTwo.txt      11.7655  68.1661  9.3893
Letter1.txt              14.4744  61.4432  11.2229
Letter2.txt              16.1957  49.6503  13.4764
Letter3.txt              25.1394  30.0575  22.01
Letter4.txt              17.7456  49.2696  15.284
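
The three scores named in this table have standard published formulas, sketched below. Whether the tools used in the research applied exactly these formulas, and how they counted syllables and 'complex' words, is not stated on the slides, so treat this as an assumption. The functions take averages as inputs so that no syllable-counting heuristic is needed.

```python
def flesch_reading_ease(words_per_sentence, syllables_per_word):
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def flesch_kincaid_grade(words_per_sentence, syllables_per_word):
    """Flesch-Kincaid Grade Level: approximate US school grade."""
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


def gunning_fog(words_per_sentence, complex_word_fraction):
    """Gunning Fog index; complex_word_fraction is complex words / total words."""
    return 0.4 * (words_per_sentence + 100.0 * complex_word_fraction)


# Averages reported on slide 24 for the DJ Fruits Terms of Trade document.
ease = flesch_reading_ease(36.91, 1.73)
grade = flesch_kincaid_grade(36.91, 1.73)
print(round(ease, 2), round(grade, 2))
```

With those rounded averages the standard formulas give roughly 23.0 and 19.2, close to the Reading Ease (23.16) and Grade Level (19.2) reported on slide 24; the small gap is consistent with rounding in the published averages.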

27 Word and sentence frequencies, length, etc – visualisation

28 Typical Histogram Visualisation

29 Typical Clustering Visualisation

30 Conventional line graph visualisation all assist interpretation

31 Typical co-efficient vector linkage visualisation

32 Transposed co-efficients for greater granularity (more precision)

33 The greater the data comparison set, the greater the need for clarity

34 Alternative cluster depiction: SOM (Kohonen methods), Viscovery

35 Example cluster dependency depiction for border coefficients

36 Still a need for conventional depictions to reach conclusions

37 Especially for multivariate clusters

38 In summary A blend of conventional statistics and visualisation methods with new alternative (connectionist) methods brings more precision and greater clarity to the mix of precise and imprecise data!

39 Sample published research combining software metrics and stylometrics
- Semantic structures in empirical science text (from PhD)
- Generating abstracts from journal articles (from PhD)
- Railway fault report narrative analysis (with GIS)
- Emergency services events and resources (with GIS)
- Case Law comparisons with Legislation Preambles
- Family Law topic clustering, Law and Action Taken
- Dialogue topic clustering (email traffic project)
- Text editing for the visually impaired (Voice Recognition)
- Semantic dependency depiction (CNNs and SOMs)
- Canonical Scripture analysis (themes eg. “justice”)
- English Language expression/readability algorithms
- Letters of St. Ignatius of Antioch (authorship - extra)
- Letters of Mary Queen of Scots (authorship - intra)
- Litigation projects for copyright etc (Law Courts)

40 Ongoing Forensics Work
- Conduct litigation work as it comes
- Authorship authentication, including plagiarism
- Narrative analysis (topics, themes, etc)
- Always interested in new approaches, methods and tools...also joint projects and PhD Students!
- Life after DVC administration work

41 End of Forensics Presentation

42 In 1994 a new and different project
Digital Libraries
- Alexandria Digital Library Project www.alexandria.edu
- NZADL www.nzadl.org

43 Alexandria Digital Library (1994) www.alexandria.ucsb.edu
- to map the surface of the earth using Landsat, radio spectrometry and orthophoto imagery from NASA etc
- US$9 million (ADL). New US$15 million (NGDA)
- a distributed digital library with collections of geo referenced materials and services for accessing collections...a super powerful GIS for research!
- Expectation to build applications by integrating environmental and other data with the images
- Researchers from 5 US univs, 4 other countries...and AUT

44 The ADL Project
- Invitation to UCSB
- Map and Imagery Laboratory
- Alexandria Digital Library Project (NSF)
- Methods for measuring system performance
- Profile system users
- Profile system use
- Observe correlations and process dynamics
- System optimisation & operation management
- Result = a sampling and simulation suite - metrics again!

45 Numerous collections of ADL digital images
- Topographical and terrain maps
- Geospatial and geodetic images
- Marine geodetic and composition
- Environmental and climatological
- Demographic and land utilisation
- Object location mapping
- Sundry specific image collections

46 Applications
- Forestry and crop management
- Land utilisation changes
- Environmental influence mapping
- Tectonic displacement
- Topographical alterations post typhoon
- Marine pollution and fisheries management
- Demographic density trends
- etc

47 Upon loading the NGDA collection browser, Landsat imagery over the US is loaded by default

48 USA - The Great Lakes Area

49 NASA Landsat image of NZ

50 Telephoto Terrain Projection

51 Stereoscopic (orthophoto) showing physical boundary features (NZ)

52 Satellite image collection of the Maya Forest Mexico

53 Scripps Institute Collection An orchid greenhouse in Hawaii

54 Wine Research
An example of using ADL and other technologies could be...
- Chile and NZ both have excellent wines!
- What makes for a ‘good wine’?
- Four factors apparently:
  - Soil, Climate, Variety, Terrain
- Personal taste of flavour, robustness, etc
- Landsat images, historical data, telemetry devices and analytical methods to:
  - Identify the ‘good years’ in both countries
  - Compare the data values and develop a set of correlation coefficients
  - Build a real-time system to predict the next ‘good year’...then buy up!!!
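
The 'set of correlation coefficients' step could start from something as simple as a Pearson coefficient between one measured factor and a vintage rating. The sketch below assumes Pearson correlation is the measure meant, which the slide does not confirm, and the rainfall and score figures are invented purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: growing-season rainfall (mm) and a vintage score per year.
rainfall_mm = [520, 610, 480, 700, 560]
vintage_score = [88, 82, 91, 75, 86]
r = pearson(rainfall_mm, vintage_score)
print(round(r, 3))
```

One coefficient per factor (soil, climate, variety, terrain measurements) and per country would give the comparison set described on the slide; a single coefficient says nothing about cause, so it is only a first screening step.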

55 GIS DB Soil {d...........n} Climate {d...........n} Variety {d...........n} Terrain {d...........n} Analytical software (CNN) Information to growers and consumers Telemetry Devices {d...........n} Spatial data Kept current by NASA Real Time Location related Historical data Fuzzy ‘good year’ Input Data Fuzzy feedback Chemical and marketing f’back

56 Project team undertaking research

57 Perhaps you would like to join our team? Research is a serious matter but it has to be fun too! Thank you for listening

