NOOJ 0.1 Max Silberztein Université de Franche-Comté 6th INTEX Workshop Sofia, Bulgaria, May 2003.

NOOJ v0.1
-- Rewritten entirely
-- Fully compatible with INTEX 4.3x
-- Corpus processor
-- WEB support
-- Support for any Text Encoding & Format
-- Object-Oriented linguistic engine
-- Dynamic Programming with Published methods

Corpus Processor
-- A corpus is a set of homogeneous files: same language, same linguistic resources
-- A corpus may include tens of thousands of small files (e.g. WEB pages) or large files (e.g. Le Monde, 1 year), stored anywhere
-- Different corpora can share certain text files at no extra cost
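One way to read "share certain text files with no extra cost" is that a corpus stores only the locations of its files, so two corpora can list the same file without duplicating it. The sketch below illustrates that reading; the class and member names are hypothetical and are not taken from NOOJ.

```csharp
using System.Collections.Generic;

// Hypothetical model, not NOOJ's actual classes: a corpus stores
// references (paths or URLs) to its text files, never copies of them.
public sealed class Corpus
{
    private readonly HashSet<string> files = new HashSet<string>();

    public Corpus(string name, string language)
    {
        Name = name;
        Language = language;   // one language (and one resource set) per corpus
    }

    public string Name { get; }
    public string Language { get; }
    public int Count => files.Count;

    // Records the file's location only; returns false if it is already listed.
    public bool Add(string location) => files.Add(location);

    public bool Contains(string location) => files.Contains(location);
}
```

Under this assumption, adding the same path to two `Corpus` objects costs one string reference per corpus, whatever the size of the file.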

Text Encoding & Format support
Native support means NO FILE CONVERSION
-- for TXT files: Windows Default (8 or 16 bits), DOS & ISO (any codepage), Unicode (7 bits, 8 bits, 16 bits, little and big endian)
-- for HTML (any encoding) and RTF (ASCII & Unicode) files
-- for Microsoft Word files, any version including Apple
-- No limit: XML, PDF, LaTeX, Outlook...
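Reading a TXT file without a conversion step requires knowing its encoding first. The slides do not say how NOOJ decides; a common technique, shown here as a minimal sketch, is to inspect the byte-order mark at the start of the file and fall back to a caller-supplied default when there is none. `EncodingSniffer` and `Detect` are hypothetical names. UTF-32, whose little-endian mark begins with the same two bytes as UTF-16, is not distinguished here.

```csharp
using System.Text;

public static class EncodingSniffer
{
    // Returns the encoding announced by a byte-order mark at the start of
    // 'head', or 'fallback' (e.g. the user's default codepage) when none is found.
    public static Encoding Detect(byte[] head, Encoding fallback)
    {
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return Encoding.UTF8;               // Unicode, 8 bits
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return Encoding.Unicode;            // UTF-16, little endian
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return Encoding.BigEndianUnicode;   // UTF-16, big endian
        return fallback;
    }
}
```

Files without a mark (plain 8-bit Windows, DOS or ISO text) cannot be identified this way, which is why the fallback is left to the caller.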

WEB support
-- NOOJ includes a WEB crawler that can import WEB sites
-- Exploration is performed up to a user-declared depth, or until the WEB site is fully explored (danger!)
-- Indirections are processed during the exploration; they may produce empty text files (i.e. no text unit in the WEB page)
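The depth limit described above can be illustrated with a breadth-first exploration that stops following links once a page is `maxDepth` hops from the start. This is a generic sketch, not NOOJ's crawler: `Crawler`, `Explore` and the `getLinks` callback (which stands in for downloading a page and extracting its links) are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

public static class Crawler
{
    // Breadth-first exploration from 'start', following links at most
    // 'maxDepth' hops away. 'getLinks' is supplied by the caller and
    // returns the outgoing links of a page.
    public static List<string> Explore(
        string start, int maxDepth, Func<string, IEnumerable<string>> getLinks)
    {
        var visited = new HashSet<string> { start };
        var order = new List<string>();
        var queue = new Queue<(string Url, int Depth)>();
        queue.Enqueue((start, 0));

        while (queue.Count > 0)
        {
            var (url, depth) = queue.Dequeue();
            order.Add(url);
            if (depth == maxDepth) continue;   // depth limit reached: do not follow links
            foreach (var next in getLinks(url))
                if (visited.Add(next))         // skip pages already seen (cycles)
                    queue.Enqueue((next, depth + 1));
        }
        return order;
    }
}
```

The `visited` set guarantees termination on cyclic links, but with a very large `maxDepth` the exploration still covers every reachable page, which is presumably the "danger" the slide warns about.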

OO linguistic engine
The engine can be easily adapted:
– inheritance means that one can quickly build a new module by inheriting another module's properties, by default
– override means that one needs to provide a description only for the methods that perform tasks differently
Dynamic programming:
– NOOJ loads parts of the linguistic engine only when needed
– describing extremely specific phenomena or behavior carries no cost for the overall architecture
Open interface:
– Applications access NOOJ from command-line programs (e.g. SHELL), as well as from object & class methods, from user's programs or from other applications (such as Microsoft Office or Adobe Acrobat)
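The three ideas on this slide (inherit defaults, override only what differs, load a module only when needed) can be sketched in a few lines of C#. All names below are hypothetical and do not reflect NOOJ's published interface; `Lazy<T>` is used as one possible way to defer loading, and the `Loads` counter exists only to make the deferral visible.

```csharp
using System;

public class LanguageModule
{
    // Defaults that most languages can inherit unchanged.
    public virtual bool RightToLeft => false;
    public virtual bool OneCharPerToken => false;
}

public class ArabicModule : LanguageModule
{
    // Only the behavior that differs from the default is described.
    public override bool RightToLeft => true;
}

public static class Engine
{
    // Counts how many times the module was actually built (illustration only).
    public static int Loads;

    // Lazy<T> defers construction until Value is first read.
    private static readonly Lazy<LanguageModule> arabic =
        new Lazy<LanguageModule>(() => { Loads++; return new ArabicModule(); });

    public static LanguageModule Arabic => arabic.Value;
}
```

In this sketch, a program that never reads `Engine.Arabic` never constructs the module, and repeated reads reuse the instance built on first access.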

Français.il

namespace Nooj {
  public class Français: Language // Language is a virtual class; MUST BE OVERRIDDEN
  {
    // tokenizer
    public override static bool rightToLeft () { return false; } // true for Arabic
    public override static bool oneCharPerToken () { return false; } // true for Chinese
    public override static bool transcription () { return false; } // true if text processed != text displayed
    ...
    // tokens' properties
    public override static bool upperCaseLetter (char letter) { ... }
    public override static bool lowerCaseLetter (char letter) { ... }
    public override static bool lowerCaseForm (string token) { ... }
    ...
    // dictionary & list lookup & match
    public override static int compareForms (string wform1, string wform2) { ... }
    public override static int matchForm (string wform, string entry) { ... }
    ...
    // localization
    ...
  }
}

Perspectives
-- Text processing is fully operational; the linguistic engine is ½ operational
-- Morphological module by September
-- Dictionaries (new types & tools) by Sept.
-- Grammar development tools (new types and tools) by end of 2003
=> Alpha version by the end of 2003
-- All 4.3x functionalities by May 2004