A research overview Professor Philip Sallis Auckland University of Technology New Zealand.

Greetings from Aotearoa-New Zealand (the official bicultural name, in Maori and English).

Introductions. Dr Philip Sallis (Computer Science): NLP and computational linguistics; Professor and senior university academic; University Deputy Vice Chancellor (Vice Rector or Provost). Dr Kathy Garden (Electrical Engineering): computed tomography and signal processing; Dean, Faculty of Design and Creative Technologies, and Regional Pro Vice Chancellor.

Brief Curriculum Vitae. Kathy: PhD (NZ) and postdoctoral fellow (USA) in Electrical Engineering; university teaching, research and supervision (NZ); government science policy advisor; industry and regional government strategic advisor; AUT Dean and Pro Vice Chancellor. Philip: PhD in Computer Science (England); university teaching, research and supervision (UK, Australia, NZ); visiting research professor (UK, USA, HK... and Chile!); industry consulting and government commissions; full Professor since 1987 and HoD three times since 1979; Deputy Vice Chancellor at AUT since 1999.

Auckland University of Technology: 26,000 students (full and part-time); 10% postgraduate (Masters and PhD); 23% international students (70% in degrees, 30% short courses). Faculties: Design and Creative Technologies; Business and Law; Health and Environmental Sciences; Humanities; Maori Development.

About this Presentation: research in general; an overview of my own research; a description of two areas of work, software forensics and digital libraries; publications and further information (www.aut.ac.nz/serl); enquiries re PhD supervision (Leopoldo Leoncio).

Research Mix (‘Typical’): papers and conferences, reports, products; working alone or in teams. (Ideas + People + Funding + Work) = Results. Funding sources: university funds, government grants, international grants, industry contracts.

Choose one aspect of research that is effort- and cost-effective: clustering (P P P). Usually more appealing to grant providers too! A new one has recently emerged at UCM.

‘Teams are best’: why? Sharing and testing ideas; a mix of expertise for multidimensional and interdisciplinary research; division of labour (efficient and effective); peer pressure to reach conclusions and achieve outcomes (publish papers, etc.); writing grant applications, using more names to strengthen proposals and demonstrating collaboration (inter-colleague, inter-institution, international).

A team at work and play!

My Research Map: program and data structures; compilers; text parsing algorithms; data modelling and DBMS; software development process models (CMM, etc.); measurement and improvement of effort, activity and product; computational linguistics (stylometrics); software metrics; software forensics; PhD research: NLP/NLU.

A journey with computing: Elliott 503; PDP 1100 and 1125; B6700 and 2700; HP2100A and 3000; ICL 1902T and 1905E; IBM 1401, 360, 6000; Prime 710; VAX 700 series; Onyx, Sun, Mac, PC.

Inspiration

Performance analysis and improvement milieu: the computer; developers and users; data and schema; program code and structure; outputs; URS (user requirements specification); unplanned input. Stress testing using value changes to parameters and variables in all aspects of the system. Simulation.

An early interest in NLP: program code metalanguages, compilers, S-grammars and parsing algorithms for text processing; command and edit languages and parsing; NLP and symbol processing (symbolic AI methods); full-text, narrative and discourse analysis.

Two research areas, software engineering and computational linguistics, emerged and then merged as software forensics. A fascination with the delta!

Software Engineering research: algorithms, program and data structures, programming style; measuring aspects of the process such as programmer productivity (4GLs); user and use profiling for system optimisation using simulation and other methods; ‘programming in the large’ and the software system development process; process and URS improvement; data modelling and DB design; time and cost estimation; CASE. Mathematics and Computer Science: programming languages, operating systems, compilers. System integration, blended-data applications (GIS) and their usability measurement.

Computational Linguistics research: string handling algorithms, command editors, text processing, bibliometrics (NLP); transformational grammars, meta-information and ‘deep’ structures; authorship authentication; topic clustering depictions; symbolic AI; formal representation of meaning and semantics (NLU); epistemology. PhD: a domain grammar and parser for generating abstracts from journal articles. Mathematics and Computer Science: programming languages, operating systems, compilers. Stylometric parsers for thematic analysis, topic clustering, etc.

Forensics: the convergence and incorporation of new technologies, including programming languages, interpreters, CASE, etc.; Geographic Information Systems (GIS); Global Positioning Systems (GPS); Voice over IP (VoIP); voice recognition; wireless and GPRS... now RFID; computational neural networks; algorithms, data structures and pattern matching; connectionist alternatives for clustering, etc.; RISC technologies; bioinformatics (first NZ course, as a PG Dip), covering biomedical data (text, image and telemetry) and technologies.

Fingerprint parsing algorithms: count everything, compare everything; a lot of data. For program code: program, data and image names; file extensions and temporary files; variable, parameter and label names; expressions and data structures (arrays, etc.); structure (iteration, recursion, formulae, etc.); algorithm characteristics; subroutines, case statements, DB calls, etc. For text: word, sentence and paragraph counts; lengths of words, sentences, etc.; word frequencies; phrases and adjacent word pairs; nouns and pronouns, adjectives, etc.; prepositions, positive/negative expressions. Compare with canon corpora; differences in expression.
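To make "count everything, compare everything" concrete for the text features, here is a minimal Python sketch, assuming plain-text input and a simple regular-expression tokenizer. It counts word frequencies and adjacent word pairs, then compares two profiles with cosine similarity. Cosine similarity is our choice of comparison measure, not necessarily the one used in the work described, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens from a simple regex (an assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def fingerprint(text: str) -> Counter:
    """Count every word and every adjacent word pair in the text."""
    words = tokenize(text)
    profile = Counter(words)
    # Adjacent word pairs, stored as "first second" keys alongside single words.
    profile.update(f"{a} {b}" for a, b in zip(words, words[1:]))
    return profile


def cosine_similarity(p: Counter, q: Counter) -> float:
    """Similarity of two profiles: 1.0 for identical proportions, 0.0 if nothing is shared."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)
```

For example, cosine_similarity(fingerprint(text_a), fingerprint(text_b)) approaches 1.0 when two texts use words and word pairs in similar proportions. Word pairs here run across sentence boundaries, which is a simplification.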

Some Forensics Tools: Identify (program and data structure comparisons); Beyond Compare (file, variable, label and line matching); Signature (stylometric comparisons); Viscovery (data and results visualisation, www.eudaptics.com); Improve English (readability and comprehension tests).

Processing Forensics Data: programming languages (SNOBOL, LISP, Prolog, C++, Perl); data management (flat and I-S files, RDBMS, MySQL, PHP, ASP, etc.); statistical methods (probability, inference, prediction) in SPSS and Excel; connectionist alternatives for dependency analysis (FNN), KEDRI; cluster analysis in MATLAB and visualisation alternatives (Viscovery). A lot of data to analyse, represent and reach conclusions about. Not an exact science: closeness of fit, but also human interpretation.

Example formatted results after program code comparison (Beyond Compare software comparison report). Each row compares a DJ Fruits document with the Plaintiff document of the same title:

| Document | Lines match | Match on left side only | Match on right side only | Lines with important differences | Lines with only unimportant differences | Sections different |
|---|---|---|---|---|---|---|
| Terms of Trade | 60 | 13 | 49 | 25 | 0 | 17 |
| Personal Hygiene Policy | 3 | 7 | 93 | 31 | 0 | 4 |
| Process Policy | 1 | 2 | 21 | 22 | 0 | 2 |
| Trace back and Recalls | 1 | 0 | 34 | 23 | 0 | 2 |
| Control | 1 | 1 | 20 | 14 | 0 | 2 |
| Validation | 2 | 0 | 64 | 14 | 0 | 3 |
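Counts similar to the report columns can be approximated with Python's standard difflib module. This sketch is not Beyond Compare's algorithm: it treats "replace" blocks as lines with differences and cannot separate important from unimportant differences. The function and key names are hypothetical.

```python
import difflib


def compare_lines(left: list[str], right: list[str]) -> dict[str, int]:
    """Summarise a line-by-line comparison, loosely mirroring the report columns."""
    matcher = difflib.SequenceMatcher(a=left, b=right, autojunk=False)
    summary = {"lines_match": 0, "left_only": 0, "right_only": 0,
               "lines_with_differences": 0, "sections_different": 0}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            summary["lines_match"] += i2 - i1
            continue
        summary["sections_different"] += 1  # every non-equal block is one section
        if tag == "delete":
            summary["left_only"] += i2 - i1
        elif tag == "insert":
            summary["right_only"] += j2 - j1
        else:  # "replace": lines present on both sides but changed
            summary["lines_with_differences"] += max(i2 - i1, j2 - j1)
    return summary
```

Reading two files with path.read_text().splitlines() and passing the lists to compare_lines gives one row of a table like the one above.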

Example formatted results after documentation comparison. Columns: date, title, grade level, reading ease, readability, number of words, average syllables per word, number of sentences, average words per sentence. Rows: the DJ Fruits and Plaintiff versions of Terms of Trade, Personal Hygiene Policy, Process Policy, Trace back and Recalls, Control, and Validation. English Language Comparisons of files: using

Raw data counting begins to build a ‘fingerprint picture’ (‘stylometrics’). Columns: filename, characters_incl_blanks, characters, word_count, uniq_words. Rows: CasketLetterEight.txt, CasketLetterFive.txt, CasketLetterFour.txt, CasketLetterOne.txt, CasketLetterSeven.txt, CasketLetterSix.txt, CasketLetterThree.txt, CasketLetterTwo.txt, Letter1.txt, Letter2.txt, Letter3.txt, Letter4.txt, and Totals.
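A minimal sketch of how these raw counts might be computed, assuming whitespace-separated words and case-insensitive uniqueness (both assumptions; the original tool's rules are not stated, and punctuation stays attached to words here). The Totals row sums the first three columns and counts unique words over the whole collection rather than summing per-file values.

```python
def raw_counts(text: str) -> dict[str, int]:
    """Per-file counts matching the table columns."""
    words = text.split()  # assumption: words are whitespace-separated
    return {
        "characters_incl_blanks": len(text),
        "characters": sum(1 for ch in text if not ch.isspace()),
        "word_count": len(words),
        "uniq_words": len({w.lower() for w in words}),  # case-insensitive
    }


def count_table(texts: dict[str, str]) -> list[tuple[str, dict[str, int]]]:
    """One row per file plus a Totals row."""
    rows = []
    vocabulary = set()
    totals = {"characters_incl_blanks": 0, "characters": 0, "word_count": 0}
    for name, text in texts.items():
        counts = raw_counts(text)
        rows.append((name, counts))
        for key in totals:
            totals[key] += counts[key]
        vocabulary.update(w.lower() for w in text.split())
    totals["uniq_words"] = len(vocabulary)  # unique across all files, not a sum
    rows.append(("Totals", totals))
    return rows
```

To fill the table from files on disk, one option is count_table({p.name: p.read_text(encoding="utf-8") for p in sorted(Path(".").glob("*.txt"))}) with Path imported from pathlib.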

Writing/readability tests: the picture becomes more complex. Columns: filename, Fog, Flesch, Flesch-Kincaid. Rows: the same twelve files, CasketLetterOne.txt to CasketLetterEight.txt and Letter1.txt to Letter4.txt.
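The three tests rest on published formulas: Flesch Reading Ease = 206.835 - 1.015 (words/sentences) - 84.6 (syllables/words); Flesch-Kincaid Grade Level = 0.39 (words/sentences) + 11.8 (syllables/words) - 15.59; Gunning Fog = 0.4 (words/sentences + 100 x complex words/words), where complex words have three or more syllables. The Python sketch below applies them. Its sentence splitting and vowel-group syllable counter are rough heuristics of our own, and it skips Fog's usual exclusions (proper nouns, common suffixes), so scores will differ somewhat from dedicated tools such as Improve English.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: vowel groups, minus a silent final 'e'; at least one."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict[str, float]:
    """Gunning Fog, Flesch Reading Ease and Flesch-Kincaid Grade Level for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one word and one sentence")
    syllables = [count_syllables(w) for w in words]
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(syllables) / len(words)
    # Simplification: "complex" = three or more syllables, with no exclusions.
    complex_share = sum(1 for s in syllables if s >= 3) / len(words)
    return {
        "fog": 0.4 * (words_per_sentence + 100 * complex_share),
        "flesch": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        "flesch_kincaid": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
    }
```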

Word and sentence frequencies, lengths, etc.: visualisation

Typical Histogram Visualisation

Typical Clustering Visualisation

Conventional line graph visualisation: all assist interpretation

Typical coefficient vector linkage visualisation
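One common way to link coefficient vectors for this kind of depiction is agglomerative hierarchical clustering, whose merge history is what a dendrogram draws. Assuming that reading of "linkage" (the slide does not say which method was used), here is a minimal average-linkage sketch in Python: it repeatedly merges the two clusters with the smallest mean pairwise Euclidean distance. The function name is hypothetical.

```python
import math
from itertools import combinations


def average_linkage(vectors: dict[str, list[float]]) -> list[tuple[frozenset, frozenset, float]]:
    """Agglomerative clustering; returns merges in order as (cluster_a, cluster_b, distance)."""

    def distance(a: frozenset, b: frozenset) -> float:
        # Average linkage: mean Euclidean distance over all cross-cluster pairs.
        pairs = [(x, y) for x in a for y in b]
        return sum(math.dist(vectors[x], vectors[y]) for x, y in pairs) / len(pairs)

    clusters = [frozenset([name]) for name in vectors]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: distance(*pair))
        merges.append((a, b, distance(a, b)))
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
    return merges
```

This brute-force version is fine for a dozen files, such as the letters above; larger collections would call for a library implementation.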

Transposed coefficients for greater granularity (more precision)

The greater the data comparison set, the greater the need for clarity

Alternative cluster depiction: SOM (Kohonen methods) in Viscovery
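For readers unfamiliar with Kohonen's self-organising map, here is a minimal pure-Python sketch of the basic training loop. It finds each sample's best-matching unit, then pulls that unit and its grid neighbours towards the sample, using a shrinking Gaussian neighbourhood and a decaying learning rate. It is a generic textbook-style SOM, not Viscovery's implementation; the linear initialisation, the linear decay schedules and the default parameters are our assumptions.

```python
import math
import random


def best_matching_unit(weights, sample):
    """Grid node whose weight vector is nearest (Euclidean) to the sample."""
    return min(weights, key=lambda node: math.dist(weights[node], sample))


def train_som(data, grid_w, grid_h, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns {(x, y): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(v[i] for v in data) for i in range(dim)]
    hi = [max(v[i] for v in data) for i in range(dim)]
    span = max(grid_w + grid_h - 2, 1)
    # Linear initialisation: spread nodes between the data minimum and maximum.
    weights = {
        (x, y): [lo[i] + (hi[i] - lo[i]) * (x + y) / span for i in range(dim)]
        for x in range(grid_w) for y in range(grid_h)
    }
    radius0 = max(grid_w, grid_h) / 2
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for sample in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total
            lr = lr0 * (1 - frac)                    # learning rate decays linearly
            radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks to a floor
            bmu = best_matching_unit(weights, sample)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                influence = math.exp(-grid_d2 / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * influence * (sample[i] - w[i])
            step += 1
    return weights
```

After training, samples that share a best-matching unit (or neighbouring units) form a cluster, which is what SOM-based tools colour and display on the map.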

Example cluster dependency depiction for border coefficients

Still a need for conventional depictions to reach conclusions

Especially for multivariate clusters

In summary: a blend of conventional statistics and visualisation methods with new alternative (connectionist) methods brings more precision and greater clarity to the mix of precise and imprecise data!

Sample published research combining software metrics and stylometrics: semantic structures in empirical science text (from PhD); generating abstracts from journal articles (from PhD); railway fault report narrative analysis (with GIS); emergency services events and resources (with GIS); case law comparisons with legislation preambles; family law topic clustering, law and action taken; dialogue topic clustering (traffic project); text editing for the visually impaired (voice recognition); semantic dependency depiction (CNNs and SOMs); canonical scripture analysis (themes, e.g. “justice”); English language expression/readability algorithms; letters of St. Ignatius of Antioch (authorship, extra); letters of Mary Queen of Scots (authorship, intra); litigation projects for copyright, etc. (law courts).

Ongoing Forensics Work: conduct litigation work as it comes; authorship authentication, including plagiarism; narrative analysis (topics, themes, etc.). Always interested in new approaches, methods and tools... also joint projects and PhD students! Life after DVC administration work.

End of Forensics Presentation

In 1994, a new and different project: Digital Libraries. Alexandria Digital Library Project; NZADL (www.nzadl.org).

Alexandria Digital Library (1994): to map the surface of the earth using Landsat, radio spectrometry and orthophoto imagery from NASA, etc. US$9 million (ADL); new US$15 million (NGDA). A distributed digital library with collections of georeferenced materials and services for accessing collections... a super powerful GIS for research! The expectation was to build applications by integrating environmental and other data with the images. Researchers from 5 US universities, 4 other countries... and AUT.

The ADL Project: invitation to the UCSB Map and Imagery Laboratory; Alexandria Digital Library Project (NSF). Methods for measuring system performance: profile system users; profile system use; observe correlations and process dynamics; system optimisation and operation management. Result: a sampling and simulation suite (metrics again!).

Numerous collections of ADL digital images: topographical and terrain maps; geospatial and geodetic images; marine geodetic and composition; environmental and climatological; demographic and land utilisation; object location mapping; sundry specific image collections.

Applications: forestry and crop management; land utilisation changes; environmental influence mapping; tectonic displacement; topographical alterations post-typhoon; marine pollution and fisheries management; demographic density trends; etc.

Upon loading the NGDA collection browser, Landsat imagery over the US is loaded by default.

USA - The Great Lakes Area

NASA Landsat image of NZ

Telephoto Terrain Projection

Stereoscopic (orthophoto) showing physical boundary features (NZ)

Satellite image collection of the Maya Forest, Mexico

Scripps Institute collection: an orchid greenhouse in Hawaii

Wine Research: an example of using ADL and other technologies could be... Chile and NZ both have excellent wines! What makes for a ‘good wine’? Four factors, apparently: soil, climate, variety and terrain; plus personal taste in flavour, robustness, etc. Use Landsat images, historical data, telemetry devices and analytical methods to: identify the ‘good years’ in both countries; compare the data values and develop a set of correlation coefficients; build a real-time system to predict the next ‘good year’... then buy up!!!
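The "set of correlation coefficients" step could start with Pearson's r between each factor's yearly values and a yearly quality score. The sketch below assumes such aligned yearly series exist; the factor names and the idea of a single numeric quality score are hypothetical, and Pearson's r captures only linear association.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)


def factor_correlations(factors: dict[str, list[float]], quality: list[float]) -> dict[str, float]:
    """Correlate each factor's yearly values with the yearly quality score."""
    return {name: pearson(values, quality) for name, values in factors.items()}
```

On Python 3.10 and later, statistics.correlation computes the same coefficient.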

System diagram components: GIS DB; soil, climate, variety and terrain data ({d n}); telemetry devices ({d n}); spatial data kept current by NASA; real-time, location-related data; historical data; fuzzy ‘good year’ input data; analytical software (CNN); information to growers and consumers; fuzzy feedback; chemical and marketing feedback.
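The diagram's fuzzy ‘good year’ input could be modelled with fuzzy membership functions. As a hypothetical sketch (the shapes, ranges and factor names are assumptions, not taken from the project), each factor gets a triangular membership function, and a year's overall degree of "goodness" is the minimum across factors, the standard fuzzy AND.

```python
def triangular(x: float, low: float, peak: float, high: float) -> float:
    """Membership rising linearly from low to peak, falling to high; 0 outside."""
    if x <= low or x >= high:
        return 0.0
    if x == peak:
        return 1.0
    if x < peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)


def good_year_degree(readings: dict[str, float],
                     ideal_ranges: dict[str, tuple[float, float, float]]) -> float:
    """Degree (0 to 1) to which a year is 'good': fuzzy AND (minimum) over all factors."""
    return min(triangular(readings[name], *ideal_ranges[name]) for name in ideal_ranges)
```

A product or weighted combination could replace the minimum if no single factor should veto a year; that is a modelling choice the source leaves open.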

Project team undertaking research

Perhaps you would like to join our team? Research is a serious matter, but it has to be fun too! Thank you for listening.