Science: Text and Language Dr Andy Evans. Text analysis Processing of text. Natural language processing and statistics.

Processing text: Regex
Java Regular Expressions: java.util.regex
Regular expressions: powerful search, compare (and replace) tools.
(Other types of regex include direct replace options; in Java regex these are separate methods.)

Regex Standard java: if > 0) && ( .endsWith(“.org”))) { return true; } Regex version: return true;
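The code on this slide is incomplete as transcribed. As a minimal sketch of the comparison it seems to make, assume a hypothetical address string that is checked for an "@" and a ".org" ending. The class name, method names, the indexOf("@") call and the regex pattern below are all assumptions, not the slide's original code:

```java
import java.util.regex.Pattern;

class OrgAddressCheck {
    // Hypothetical pattern: some non-space, non-@ characters, an @,
    // more of the same, and a literal ".org" at the end.
    private static final Pattern ORG_ADDRESS =
            Pattern.compile("[^@\\s]+@[^@\\s]+\\.org");

    // Plain String methods: an '@' after the first character and a ".org" suffix.
    static boolean checkWithStringMethods(String address) {
        return (address.indexOf("@") > 0) && address.endsWith(".org");
    }

    // Regex version: the whole string must match the pattern.
    static boolean checkWithRegex(String address) {
        return ORG_ADDRESS.matcher(address).matches();
    }

    public static void main(String[] args) {
        String address = "someone@example.org";
        System.out.println(checkWithStringMethods(address));
        System.out.println(checkWithRegex(address));
    }
}
```

The regex version states the whole expected shape in one place, whereas the String-method version grows by one condition for every extra rule.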

Example components
[abc]           a, b, or c (simple class)
[^abc]          Any character except a, b, or c (negation)
[a-zA-Z]        a through z, or A through Z, inclusive (range)
[a-d[m-p]]      a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]    d, e, or f (intersection)
[a-z&&[^bc]]    a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]   a through z, and not m through p: [a-lq-z] (subtraction)
.               Any character (may or may not match line terminators)
\d              A digit: [0-9]
\D              A non-digit: [^0-9]
\s              A whitespace character: [ \t\n\x0B\f\r]
\S              A non-whitespace character: [^\s]
\w              A word character: [a-zA-Z_0-9]
\W              A non-word character: [^\w]
?               Once or not at all
*               Zero or more times
+               One or more times
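A few of these components can be tried directly with Pattern.matches, which tests whether an entire string matches a regex. This is only a small sketch; the class name, the helper fullMatch and the sample inputs are made up for illustration:

```java
import java.util.regex.Pattern;

class CharacterClassDemo {
    // True only if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("[abc]", "b"));         // simple class
        System.out.println(fullMatch("[^abc]", "b"));        // negation
        System.out.println(fullMatch("[a-z&&[^bc]]", "d"));  // subtraction
        System.out.println(fullMatch("\\d+", "2024"));       // one or more digits
        System.out.println(fullMatch("\\w*", ""));           // zero or more word characters
        System.out.println(fullMatch("colou?r", "color"));   // 'u' once or not at all
    }
}
```

Note that each backslash is doubled inside a Java string literal, so the regex \d is written "\\d" in source code.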

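Matching
Find all words that start with a number.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// \b = word boundary, \d = leading digit, \w* = rest of the word
Pattern p = Pattern.compile("\\b\\d\\w*");
Matcher m = p.matcher(stringToSearch);
while (m.find()) {
    String temp = m.group();   // the text of the current match
    System.out.println(temp);
}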
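To try the find() loop on real input, here is a self-contained sketch that collects the matches into a list. It assumes "words that start with a number" means a word boundary, one digit, then any further word characters; the class name, method name and sample sentence are invented for the example:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitWordFinder {
    // Assumed reading of the task: word boundary, one digit, rest of the word.
    private static final Pattern DIGIT_WORD = Pattern.compile("\\b\\d\\w*");

    // Returns every match in the order it appears in the text.
    static List<String> findDigitWords(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = DIGIT_WORD.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findDigitWords("Room 42b opens at 9am, not room b42."));
    }
}
```

The word boundary matters: without it, the "42" inside "b42" would also be reported even though that word starts with a letter.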

Replacing
String methods:
replaceFirst(String regex, String replacement)
replaceAll(String regex, String replacement)
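Both methods are available on java.lang.String. A short sketch of each follows; the class name, method names and sample strings are illustrative only:

```java
class ReplaceDemo {
    // replaceAll: every run of whitespace becomes a single space.
    static String collapseWhitespace(String text) {
        return text.replaceAll("\\s+", " ");
    }

    // replaceFirst: only the first run of digits is replaced.
    static String maskFirstNumber(String text) {
        return text.replaceFirst("\\d+", "#");
    }

    public static void main(String[] args) {
        System.out.println(collapseWhitespace("too    many \t spaces"));
        System.out.println(maskFirstNumber("call 555 or 666"));
    }
}
```

Strings are immutable in Java, so both calls return a new string and leave the original unchanged.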

Regex
A good start is the tutorial at:
Also Mehran Habibi’s Java Regular Expressions.

Natural Language Processing
A large part is Part of Speech (POS) tagging: marking up text as nouns, verbs, etc., usually based on each word's location in the text and other context rules.
These rules are often formulated using machine learning (of various kinds), training the program on corpora of marked-up text.
Used for:
Text understanding.
Knowledge capture and use.
Text forensics.
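To make "context rules" concrete, here is a deliberately tiny, hand-written tagger. It is a toy sketch and not how production taggers work (they learn their rules from marked-up corpora); the tag names, suffix rules and class name are all assumptions made for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ToyPosTagger {
    private static final Set<String> DETERMINERS =
            new HashSet<>(Arrays.asList("the", "a", "an"));

    // Tags whitespace-separated tokens using a handful of hand-written rules.
    static List<String> tag(String sentence) {
        List<String> tagged = new ArrayList<>();
        String previousTag = "";
        for (String token : sentence.trim().split("\\s+")) {
            String word = token.toLowerCase();
            String tag;
            if (DETERMINERS.contains(word)) {
                tag = "DET";
            } else if (previousTag.equals("DET")) {
                tag = "NOUN";   // context rule: the word after a determiner
            } else if (word.endsWith("ly")) {
                tag = "ADV";    // suffix rule
            } else if (word.endsWith("ed") || word.endsWith("ing")) {
                tag = "VERB";   // suffix rule
            } else {
                tag = "UNK";    // no rule applies
            }
            tagged.add(token + "/" + tag);
            previousTag = tag;
        }
        return tagged;
    }

    public static void main(String[] args) {
        System.out.println(tag("The dog barked loudly"));
    }
}
```

Rules like these break quickly on real text ("the red dog" would tag "red" as a noun), which is why taggers are usually trained on corpora rather than written by hand.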

NLP Libraries
Popular libraries are:
Natural Language Toolkit (NLTK; Python)
OpenNLP (Java)

OpenNLP
Sentence recognition and tokenising.
Name extraction (including placenames).
POS Tagging.
Text classification.
For clear examples, see the manual at:
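Running OpenNLP itself needs the library and its trained model files, so the sketch below does not use OpenNLP. As a stand-in, it uses the JDK's rule-based java.text.BreakIterator to show what sentence recognition and tokenising produce. The class and method names are invented, and a trained OpenNLP model would generally handle abbreviations and other edge cases better:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BoundarySketch {
    // Splits text into sentences using the JDK's rule-based sentence iterator.
    static List<String> sentences(String text) {
        return split(BreakIterator.getSentenceInstance(Locale.ENGLISH), text);
    }

    // Splits text into word and punctuation tokens, dropping whitespace.
    static List<String> tokens(String text) {
        return split(BreakIterator.getWordInstance(Locale.ENGLISH), text);
    }

    // Walks consecutive boundaries and keeps each non-blank segment.
    private static List<String> split(BreakIterator it, String text) {
        List<String> parts = new ArrayList<>();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String part = text.substring(start, end).trim();
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        String text = "It rains. We stay in.";
        System.out.println(sentences(text));
        System.out.println(tokens("We stay in."));
    }
}
```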

Other info
Other than the Numerical Recipes books, the other classic texts are Donald E. Knuth’s The Art of Computer Programming:
Fundamental Algorithms
Seminumerical Algorithms
Sorting and Searching
Combinatorial Algorithms
But at this stage, you’re better off getting…

Other info
Michael T. Goodrich and Roberto Tamassia’s Data Structures and Algorithms in Java.
Basic Java, arrays and lists.
Recursion in algorithms.
Key mathematical algorithms.
Algorithm analysis.
Data storage structures (stacks, queues, hashtables, binary trees, etc.)
Search and sort.
Text processing.
Graph/network analysis.
Memory management.