Social Network Inspired Models of NLP and Language Evolution

Presentation transcript:

Social Network Inspired Models of NLP and Language Evolution Monojit Choudhury (Microsoft Research India) Animesh Mukherjee (IIT Kharagpur) Niloy Ganguly (IIT Kharagpur)

What is a Social Network? Nodes: Social entities (people, organization etc.) Edges: Interaction/relationship between entities (Friendship, collaboration, sex) Courtesy: http://blogs.clickz.com

Social Network Inspired Computing Society and the nature of human interaction form a complex system. Complex network: a generic tool to model complex systems. There is a growing body of work on Complex Network Theory (CNT), applied to a variety of fields: social, biological, physical & cognitive sciences, engineering & technology. Language is a complex system.

Objective of this Tutorial To show that SNIC (Soc. Net. Inspired Comp.) is an emerging and promising technique Apply it to model Natural Languages NLP, Quantitative Linguistics, Language Evolution, Historical Linguistics, Language acquisition Familiarize with tools and techniques in SNIC Compare it with other standard approaches to NLP

Outline of the Tutorial Part I: Background Introduction [25 min] Network Analysis Techniques [25 min] Network Synthesis Techniques [25 min] Break [3:20pm – 3:40pm] Part II: Case Studies Self-organization of Sound Systems [20 min] Modeling the Lexicon [20 min] Unsupervised Labeling (Syntax & Semantics) [20 min] Conclusion and Discussions [20 min]

Complex System Non-trivial properties and patterns emerging from the interaction of a large number of simple entities Self-organization: The process through which these patterns evolve without any external intervention or central control Emergent Property or Emergent Behavior: The pattern that emerges due to self-organization

Emergence of a networked life: Atom, Molecule, Cell, Tissue, Organs, Organisms, Communities

Language – a complex system Language: medium for communication through an arbitrary set of symbols Constantly evolving An outcome of self-organization at many levels Neurons Speakers and listeners Phonemes, morphemes, words … 80-20 Rule in every level of structure

Syntactic Network of Words [Figure: weighted network over the words color, sky, weight, light, blue, blood, heavy and red, with edge weights such as 1, 20 and 100]

Complex Network Theory Handy toolbox for modeling complex systems Marriage of Graph theory and Statistics Complex because: Non-trivial topology Difficult to specify completely Usually large (in terms of nodes and edges) Provides insight into the nature and evolution of the system being modeled

Internet

9-11 Terrorist Network Social Network Analysis is a mathematical methodology for connecting the dots -- using science to fight terrorism. Connecting multiple pairs of dots soon reveals an emergent network of organization.

What Questions can be asked? Do these networks display some symmetry? Are these networks the creation of intelligent objects, or have they emerged? How have these networks emerged? What are the underlying simple rules leading to their complex formation?

Bi-directional Approach Analysis of the real-world networks Global topological properties Community structure Node-level properties Synthesis of the network by means of some simple rules Small-world models …….. Preferential attachment models

Application of CNT in Linguistics - I Quantitative linguistics Invariance and typology (Zipf’s law, syntactic dependencies) Natural Language Processing Unsupervised methods for text labeling (POS tagging, NER, WSD, etc.) Textual similarity (automatic evaluation, document clustering) Evolutionary Models (NER, multi-document summarization)

Application of CNT in Linguistics - II Language Evolution How did sound systems evolve? Development of syntax Language Change Innovation diffusion over social networks Language as an evolving network Language Acquisition Phonological acquisition Evolution of the mental lexicon of the child

Linguistic Networks
Name | Nodes | Edges | Why?
PhoNet | Phonemes | Co-occurrence likelihood in languages | Evolution of sound systems
WordNet | Words | Ontological relation | Host of NLP applications
Syntactic Network | Words | Similarity between syntactic contexts | POS Tagging
Semantic Network | Words, Names | Semantic relation | IR, Parsing, NER, WSD
Mental Lexicon | Words | Phonetic similarity and semantic relation | Cognitive modeling, Spell Checking
Tree-banks | Words | Syntactic Dependency links | Evolution of syntax
Word Co-occurrence | Words | Co-occurrence | IR, WSD, LSA, …

Summarizing SNIC and CNT are emerging techniques for modeling complex systems at mesoscopic level Applied to Physics, Biology, Sociology, Economics, Logistics … Language - an ideal application domain for SNIC SNIC models in NLP, Quantitative linguistics, language change, evolution and acquisition

Topological Characterization of Networks

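Types Of Networks and Representation Types: unipartite or bipartite; binary or weighted; undirected or directed. Representation: adjacency matrix or adjacency list. Example adjacency list for a triangle on the nodes a, b, c: a {b,c}, b {a,c}, c {a,b}; the corresponding adjacency matrix has a 1 in every off-diagonal entry.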
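The two representations on this slide can be sketched in a few lines of Python. This is only an illustration for the three-node example (a, b, c); the variable names are assumptions, not part of the tutorial.

```python
# Undirected, unweighted triangle on the vertices a, b, c.
nodes = ["a", "b", "c"]
edges = [("a", "b"), ("a", "c"), ("b", "c")]

index = {name: i for i, name in enumerate(nodes)}

# Adjacency matrix: entry [i][j] is 1 if an edge joins nodes i and j.
adj_matrix = [[0] * len(nodes) for _ in nodes]
# Adjacency list: each node maps to the set of its neighbors.
adj_list = {name: set() for name in nodes}

for u, v in edges:
    adj_matrix[index[u]][index[v]] = 1
    adj_matrix[index[v]][index[u]] = 1  # symmetric because the graph is undirected
    adj_list[u].add(v)
    adj_list[v].add(u)
```

The matrix costs memory quadratic in the number of nodes, so for large sparse networks the adjacency list is usually the more practical choice.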

Characterization of Complex N/ws?? They have a non-trivial topological structure Properties: Heavy tail in the degree distribution (non-negligible probability mass towards the tail; more than in the case of an exp. distribution) High clustering coefficient Centrality Properties Social Roles & Equivalence Assortativity Community Structure Random Graphs & Small avg. path length Preferential attachment Small World Properties

Degree Distribution (DD) Let $p_k$ be the fraction of vertices in the network that have degree k. The plot of $p_k$ against k is the degree distribution of the network. For most real-world networks these distributions are right-skewed, with a long right tail of values far above the mean: $p_k$ varies as $k^{-\alpha}$. Because data are often noisy and insufficient, the definition is sometimes modified slightly: the cumulative degree distribution is plotted instead, i.e. the probability that the degree of a node is greater than or equal to k.
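A minimal sketch of both quantities, assuming the network is stored as an adjacency list (a dict mapping each node to the set of its neighbors). The function names and the toy star graph are illustrative assumptions.

```python
from collections import Counter


def degree_distribution(adj_list):
    """Return {k: p_k}, the fraction of vertices that have degree k."""
    n = len(adj_list)
    counts = Counter(len(neighbors) for neighbors in adj_list.values())
    return {k: c / n for k, c in sorted(counts.items())}


def cumulative_degree_distribution(adj_list):
    """Return {k: P_k}, the fraction of vertices with degree >= k."""
    pk = degree_distribution(adj_list)
    cumulative, running = {}, 0.0
    for k in sorted(pk, reverse=True):  # accumulate from the tail downwards
        running += pk[k]
        cumulative[k] = running
    return dict(sorted(cumulative.items()))


# Hypothetical toy graph: a star with one hub and four leaves.
star = {"hub": {"a", "b", "c", "d"},
        "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}
```

On real data the cumulative version is smoother in the tail because each point aggregates all vertices of degree k or more.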

A Few Examples Power law: $P_k \sim k^{-\alpha}$

Friend of Friends Consider the following scenario: Sourish and Ravi are friends; Sourish and Shaunak are friends. Are Shaunak and Ravi friends? If so then … This property is known as transitivity. [Figure: triangle of Ravi, Sourish and Shaunak]

Measuring Transitivity: Clustering Coefficient The clustering coefficient for a vertex ‘v’ in a network is defined as the ratio of the total number of connections among the neighbors of ‘v’ to the total number of possible connections between those neighbors. A high clustering coefficient means my friends know each other with high probability – a typical property of social networks.

Mathematically… The clustering coefficient of a vertex i with n neighbors is $C_i = \frac{\#\text{ of links between the } n \text{ neighbors}}{n(n-1)/2}$. The clustering coefficient of the whole network is the average $C = \frac{1}{N} \sum_i C_i$. Alternatively, $C = \frac{3 \times \#\text{ triangles in the n/w}}{\#\text{ connected triples in the n/w}}$, where the factor 3 accounts for each triangle containing three connected triples.
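The per-vertex definition (links among the n neighbors divided by n(n-1)/2) and its network average can be sketched as follows. The adjacency-list input, the function names, and the convention of returning 0 for vertices with fewer than two neighbors are assumptions made for this example.

```python
from itertools import combinations


def local_clustering(adj_list, v):
    """C_v: links among the neighbors of v, divided by n(n-1)/2."""
    neighbors = adj_list[v]
    n = len(neighbors)
    if n < 2:
        return 0.0  # assumed convention for vertices of degree 0 or 1
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj_list[a])
    return links / (n * (n - 1) / 2)


def average_clustering(adj_list):
    """C: the mean of C_v over all N vertices."""
    return sum(local_clustering(adj_list, v) for v in adj_list) / len(adj_list)


# Hypothetical toy graph: a triangle a-b-c with a pendant vertex d attached to a.
toy = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
```

In the toy graph only one of the three neighbor pairs of a is linked, so its coefficient is 1/3, while b and c each score 1.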

Centrality Centrality measures are commonly described as indices of 4 Ps -- prestige, prominence, importance, and power. Degree – count of immediate neighbors. Betweenness – nodes that form a bridge between two regions of the n/w: $g(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$, where $\sigma_{st}$ is the total number of shortest paths between s and t and $\sigma_{st}(v)$ is the total number of shortest paths from s to t via v.

Eigenvector centrality – Bonacich (1972) It is not just how many people know me that counts towards my popularity (or power), but how many people know the people who know me – this is recursive! In the context of HIV transmission – a person x with one sex partner is less prone to the disease than a person y with multiple partners. But imagine what happens if the partner of x has multiple partners. This is the basic idea of eigenvector centrality.

Definition Eigenvector centrality is defined as the principal eigenvector of the adjacency matrix. An eigenvector of a symmetric matrix $A = \{a_{ij}\}$ is any vector e such that $\lambda e_i = \sum_j a_{ij} e_j$, where λ is a constant and $e_i$ is the centrality of node i. What does it imply? The centrality of a node is proportional to the sum of the centralities of the nodes it is connected to (recursively)… Practical example: Google PageRank
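One common way to obtain the principal eigenvector is power iteration. The sketch below is an illustration, not the method used by the authors. It iterates with A + I rather than A (an assumption made here so that the iteration also settles on bipartite graphs; the shift leaves the eigenvectors unchanged) and rescales so that the largest centrality is 1.

```python
def eigenvector_centrality(adj_matrix, iterations=200):
    """Approximate the principal eigenvector of a symmetric 0/1 matrix."""
    n = len(adj_matrix)
    e = [1.0] * n
    for _ in range(iterations):
        # One step of (A + I) e: keep the old score and add the neighbors' scores.
        new = [e[i] + sum(adj_matrix[i][j] * e[j] for j in range(n))
               for i in range(n)]
        top = max(new)
        e = [x / top for x in new]  # rescale so the largest entry is 1
    return e


# Hypothetical toy graph: the path a - b - c.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
scores = eigenvector_centrality(path)
```

For the path, the middle node gets score 1 and each end node gets 1/√2, matching the eigenvector (1, √2, 1) of the adjacency matrix.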

Assortativity (homophily) Rich goes with the rich (selective linking). A famous actor (e.g., Shah Rukh Khan) would prefer to pair up with some other famous actor (e.g., Rani Mukherjee) in a movie rather than a newcomer in the film industry. [Figure: an assortative scale-free network and a disassortative scale-free network]

Measures of Assortativity ANND (Average nearest neighbor degree) Find the average degree of the neighbors of each node i with degree k Find the Pearson correlation (r) between the degree of i and the average degree of its neighbors For further reference see the supplementary material
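The two steps listed on this slide can be sketched directly: compute each node's average nearest neighbor degree, then correlate it with the node's own degree. Function names and the toy graph are assumptions. The Pearson helper divides by zero when all degrees are equal (for example in a regular graph), which this sketch does not handle.

```python
from math import sqrt


def average_neighbor_degree(adj_list):
    """Map each node with at least one neighbor to its neighbors' mean degree."""
    deg = {v: len(ns) for v, ns in adj_list.items()}
    return {v: sum(deg[u] for u in ns) / deg[v]
            for v, ns in adj_list.items() if deg[v] > 0}


def pearson(xs, ys):
    """Pearson correlation coefficient r of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def degree_annd_correlation(adj_list):
    """r between a node's degree and the average degree of its neighbors."""
    annd = average_neighbor_degree(adj_list)
    nodes = list(annd)
    return pearson([len(adj_list[v]) for v in nodes], [annd[v] for v in nodes])


# Hypothetical toy graph: a star, where the hub only meets low-degree leaves.
star = {"hub": {"a", "b", "c", "d"},
        "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}
```

A positive r indicates assortative mixing and a negative r disassortative mixing; the star is the extreme disassortative case with r = -1.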

Community structure Community structure: a group of vertices that have a high density of edges within them and a low density of edges in between groups Example: Friendship n/w of children Citation n/ws: research interest World Wide Web: subject matter of pages Metabolic networks: Functional units Linguistic n/ws: similar linguistic categories

Some Examples Community Structure in Political Books Community structure in a Social n/w of Students (American High School)

Community Identification Algorithms Hierarchical Girvan-Newman Radicchi et al. Chinese Whispers Spectral Bisection See (Newman 2004) for a comprehensive survey (you will find the ref. in the supplementary material)

Evolution of Networks Processes on Networks

The World is Small! “Registration fees for IJCNLP 2008 are being waived for all participants – get it collected from the registration counter” How long do you think the above information will take to spread among yourselves? Experiments say it will spread very fast: within 6 hops from the initiator it would reach everyone. This is Milgram’s famous six degrees of separation.

The Small World Effect Even in very large social networks, the average distance between nodes is usually quite short. Milgram’s small world experiment: Target individual in Boston Initial senders in Omaha, Nebraska Each sender was asked to forward a packet to a friend who was closer to the target Friends asked to do the same Result: Average of ‘six degrees’ of separation. S. Milgram, The small world problem, Psych. Today, 2 (1967), pp. 60-67.

Measure of Small-Worldness Low average geodesic path length and high clustering coefficient. Geodesic path – shortest path through the network from one vertex to another. Mean path length $\ell = \frac{2}{n(n+1)} \sum_{i \ge j} d_{ij}$, where $d_{ij}$ is the geodesic distance from vertex i to vertex j. Most of the networks observed in the real world have $\ell \le 6$: Film actors 3.48; Company directors 4.60; Emails 4.95; Internet 3.33; Electronic circuits 4.34
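The mean path length formula can be evaluated with one breadth-first search per vertex. This sketch assumes an unweighted, connected graph stored as an adjacency list; the function names are illustrative.

```python
from collections import deque


def bfs_distances(adj_list, source):
    """Geodesic distance from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj_list[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def mean_path_length(adj_list):
    """l = 2 * sum_{i >= j} d_ij / (n (n + 1)); assumes a connected graph."""
    nodes = list(adj_list)
    n = len(nodes)
    total = 0
    for i, u in enumerate(nodes):
        dist = bfs_distances(adj_list, u)
        total += sum(dist[v] for v in nodes[:i + 1])  # only pairs with i >= j
    return 2 * total / (n * (n + 1))


# Hypothetical toy graph: the path a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
```

For the path the distances sum to 1 + 1 + 2 = 4, so the formula gives 2 · 4 / (3 · 4) = 2/3. The sum includes the zero distance from each vertex to itself, which is why the denominator is n(n+1) rather than n(n-1).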

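Random Graphs & Small Average Path Length Q: What do we mean by a ‘random graph’? A: Erdős-Rényi random graph model: for every pair of nodes, draw an edge between them with equal probability p. The degree distribution is Poisson: $P(k) \sim e^{-\langle k \rangle} \langle k \rangle^{k} / k!$. Degrees of separation in a random graph: with N nodes and $z = \langle k \rangle$ neighbors per node on average, about $z^D$ nodes lie within D steps of a given node, so $z^D \approx N$ gives $D \approx \ln N / \ln z$ degrees of separation.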
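A minimal sketch of the Erdős-Rényi construction, together with the rough estimate ln N / ln z for the degrees of separation. The function names, the seed parameter and the chosen sizes are assumptions for illustration.

```python
import random
from itertools import combinations
from math import log


def erdos_renyi(n, p, seed=None):
    """Join every pair of the n nodes independently with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def mean_degree(adj):
    return sum(len(ns) for ns in adj.values()) / len(adj)


graph = erdos_renyi(200, 0.05, seed=1)
z = mean_degree(graph)                     # expected value is p * (n - 1) = 9.95
estimated_separation = log(200) / log(z)   # rough estimate D ~ ln N / ln z
```

With a few hundred nodes and a mean degree near 10, the estimate already comes out at only two to three steps, which is the small-world effect in its simplest form.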

Clustering C = probability that two of a node’s neighbors are themselves connected. In a random graph: $C_{rand} \sim 1/N$ (if the average degree is held constant)

Watts-Strogatz ‘Small World’ Model Watts and Strogatz introduced this simple model to show how networks can have both short path lengths and high clustering. D. J. Watts and S. H. Strogatz, Collective dynamics of “small-world” networks, Nature, 393 (1998), pp. 440–442.

Power Law

Degree distributions for various networks World-Wide Web Coauthorship networks: computer science, high energy physics, condensed matter physics, astrophysics Power grid of the western United States and Canada Social network of 43 Mormons in Utah

How do Power law DDs arise? Barabási-Albert Model of Preferential Attachment (Rich gets Richer) (1) GROWTH: Starting with a small number of nodes ($m_0$), at every timestep we add a new node with $m$ ($\le m_0$) edges (connected to the nodes already present in the system). (2) PREFERENTIAL ATTACHMENT: The probability Π that a new node will be connected to node i depends on the connectivity $k_i$ of that node: $\Pi(k_i) = k_i / \sum_j k_j$. A.-L. Barabási, R. Albert, Science 286, 509 (1999)
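The growth and preferential attachment steps can be sketched as below. Two choices are assumptions of this example rather than part of the slide: the seed network is a complete graph on m + 1 nodes, and the m targets of a new node are forced to be distinct, which slightly perturbs exact proportionality to degree.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Grow a graph to n nodes; each new node attaches m edges preferentially."""
    rng = random.Random(seed)
    m0 = m + 1  # assumption: start from a complete graph on m + 1 nodes
    edges = [(u, v) for u in range(m0) for v in range(u)]
    # Every node appears in `stubs` once per incident edge, so a uniform draw
    # from this list picks node i with probability k_i / sum_j k_j.
    stubs = [x for edge in edges for x in edge]
    for new in range(m0, n):
        targets = set()
        while len(targets) < m:          # keep drawing until m distinct targets
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


edge_list = barabasi_albert(500, 3, seed=0)
```

Keeping a list of edge endpoints avoids recomputing the normalizing sum of degrees at every step, which keeps each attachment cheap.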

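Growth analysis Markov chain representation Let $p_k$ be the fraction of vertices with degree k. Probability that the new edge is attached to any of the vertices of degree k: $\frac{k p_k}{\sum_k k p_k} = \frac{k p_k}{2m}$, where the mean degree $\sum_k k p_k = 2m$ follows from the total number of edges ($mn$ for $n$ vertices). Each new vertex brings $m$ edges, so on average $m \cdot \frac{k p_k}{2m} = \frac{1}{2} k p_k$ vertices of degree k gain an edge per vertex added.

Growth analysis Markov chain representation Growth dynamics at time (t+1): the number of nodes of degree k at t+1 is the number of nodes of degree k at t, plus the nodes of degree (k-1) at t that gained an edge, minus the nodes of degree k at t that gained an edge.

Growth analysis Markov chain representation The net change in $n p_k$ per vertex added: $(n+1)\,p_{k,n+1} - n\,p_{k,n} = \frac{1}{2}(k-1)\,p_{k-1,n} - \frac{1}{2}k\,p_{k,n}$ for k > m, and $(n+1)\,p_{m,n+1} - n\,p_{m,n} = 1 - \frac{1}{2}m\,p_{m,n}$ for k = m. In the stationary solution, we find $p_k = \frac{1}{2}(k-1)\,p_{k-1} - \frac{1}{2}k\,p_k$ for k > m and $p_m = 1 - \frac{1}{2}m\,p_m$, that is, $p_m = \frac{2}{m+2}$ and $p_k = \frac{k-1}{k+2}\,p_{k-1}$. Which results in $p_k = \frac{2m(m+1)}{k(k+1)(k+2)} \sim k^{-3}$.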
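As a numerical sanity check of the stationary solution, the sketch below iterates the standard recurrence for this model, assumed here to be p_m = 2/(m+2) and p_k = p_{k-1}(k-1)/(k+2), and compares it with the closed form 2m(m+1)/(k(k+1)(k+2)), which decays as k^-3. The function names are illustrative.

```python
def stationary_pk(m, k_max):
    """Iterate p_m = 2/(m+2), p_k = p_{k-1} (k-1)/(k+2) for m < k <= k_max."""
    p = {m: 2.0 / (m + 2)}
    for k in range(m + 1, k_max + 1):
        p[k] = p[k - 1] * (k - 1) / (k + 2)
    return p


def closed_form_pk(m, k):
    """Closed-form stationary solution 2m(m+1) / (k(k+1)(k+2))."""
    return 2.0 * m * (m + 1) / (k * (k + 1) * (k + 2))


pk = stationary_pk(3, 1000)
largest_gap = max(abs(pk[k] - closed_form_pk(3, k)) / pk[k] for k in pk)
total_mass = sum(pk.values())  # approaches 1 as k_max grows
```

The probabilities up to k = 1000 already sum to within about 1e-5 of 1, so the truncated tail carries almost no mass even though it decays only as a power law.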

CASE STUDY I: Self-Organization of the Sound Inventories

Human Speech Sounds Human speech sounds are called phonemes – the smallest unit of a language. Phonemes are characterized by certain distinctive features such as place of articulation, manner of articulation and phonation. [Figure: Mermelstein’s Model]

Types of Phonemes Vowels: /i/, /a/, /u/. Consonants: /t/, /k/. Diphthongs: /ai/.

Choice of Phonemes How does a language choose a set of phonemes in order to build its sound inventory? Is the process arbitrary? Certainly not! What are the forces affecting this choice?

Vowels: A (Partially) Solved Mystery Languages choose vowels based on maximal perceptual contrast. For instance, if a language has three vowels then in more than 95% of the cases they are /a/, /i/, and /u/. [Figure: /a/, /i/ and /u/ are maximally distinct]

Consonants: A (Jigsaw) Puzzle Research: from 1929 to date. No single satisfactory explanation of the organization of the consonant inventories. The set of features that characterize consonants is much larger than that of vowels. No single force is sufficient to explain this organization; rather, a complex interplay of forces goes on in shaping these inventories.

Principle of Occurrence PlaNet – The “Phoneme-Language Network”: a bipartite network $N = (V_L, V_C, E)$. $V_L$: nodes representing languages of the world. $V_C$: nodes representing consonants. E: set of edges which run between $V_L$ and $V_C$. There is an edge $e \in E$ between two nodes $v_l \in V_L$ and $v_c \in V_C$ if the consonant c occurs in the language l. Data source: UPSID (317 languages). [Figure: The Structure of PlaNet, with language nodes L1 to L4 linked to consonant nodes /m/, /ŋ/, /p/, /d/, /s/, /θ/] Choudhury et al. 2006, ACL; Mukherjee et al. 2007, Int. Jnl of Modern Physics C.

Degree Distribution of PlaNet DD of the language nodes follows a β-distribution: $p_k = \mathrm{beta}(k)$ with α = 7.06 and β = 47.64, i.e. $p_k = \frac{\Gamma(54.7)}{\Gamma(7.06)\,\Gamma(47.64)}\, k^{6.06} (1-k)^{46.64}$; $k_{min} = 5$, $k_{max} = 173$, $k_{avg} = 21$. [Figure: $p_k$ against language inventory size (degree k)] DD of the consonant nodes follows a power law with an exponential cut-off: $P_k = k^{-0.71}$. [Figure: log-log plot of $P_k$ against the degree of a consonant, k] The distribution of consonants over languages follows a power law.

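Synthesis of PlaNet Non-linear preferential attachment: iteratively construct the language inventories given their inventory sizes. A consonant $C_i$ with degree $d_i$ is chosen with probability $\Pr(C_i) = \frac{d_i^{\alpha} + \epsilon}{\sum_{x \in V^*} (d_x^{\alpha} + \epsilon)}$. [Figure: the growing network with languages L1 to L4, after step 3 and after step 4]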
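The attachment rule can be sketched as follows. The reading of V* as the consonants the current language has not yet chosen is an assumption, as are the function name, the integer consonant ids and the toy inventory sizes; the default α and ε are the values quoted in the simulation results (1.44 and 0.5).

```python
import random


def synthesize_planet(inventory_sizes, n_consonants, alpha=1.44, eps=0.5, seed=None):
    """Build one inventory per language by non-linear preferential attachment."""
    rng = random.Random(seed)
    degree = [0] * n_consonants  # number of languages using each consonant
    inventories = []
    for size in inventory_sizes:  # each size must not exceed n_consonants
        chosen = set()
        for _ in range(size):
            # Assumed V*: consonants not yet in this language's inventory.
            candidates = [c for c in range(n_consonants) if c not in chosen]
            weights = [degree[c] ** alpha + eps for c in candidates]
            chosen.add(rng.choices(candidates, weights=weights, k=1)[0])
        for c in chosen:
            degree[c] += 1
        inventories.append(chosen)
    return inventories, degree


# Hypothetical toy run: three languages over thirty consonants.
inventories, degree = synthesize_planet([5, 10, 21], 30, seed=0)
```

The ε term gives consonants of degree zero a nonzero chance of being picked, while an exponent α above 1 makes frequent consonants disproportionately more attractive.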

Simulation Result [Figure: log-log plots of $P_k$ against degree k for PlaNet, PlaNetsyn and PlaNetrand] The parameters α and ε are 1.44 and 0.5 respectively. The results are averaged over 100 runs.

Principle of Co-occurrence Consonants tend to co-occur in groups or communities. These groups tend to be organized around a few distinctive features (based on manner of articulation, place of articulation & phonation) – the principle of feature economy. Example (plosives): bilabial /b/ (voiced) and /p/ (voiceless); dental /d/ (voiced) and /t/ (voiceless). If a language has … in its inventory then it will also tend to have …

How to Capture these Co-occurrences? PhoNet – “Phoneme Phoneme Network”: a weighted network $N = (V_C, E)$. $V_C$: nodes representing consonants. E: set of edges which run between the nodes in $V_C$. There is an edge $e \in E$ between two nodes $v_{c1}, v_{c2} \in V_C$ if the consonants c1 and c2 co-occur in a language. The number of languages in which c1 and c2 co-occur defines the edge-weight of e. The number of languages in which c1 occurs defines the node-weight of $v_{c1}$. [Figure: a fragment of PhoNet over the consonants /kw/, /k′/, /k/ and /d′/, labeled with node and edge weights]
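The node and edge weights defined above amount to a one-mode projection of the language-consonant data. A minimal sketch, assuming the inventories are given as a dict from language name to a set of consonant symbols; the toy inventories are made up and are not UPSID data.

```python
from collections import Counter
from itertools import combinations


def build_phonet(inventories):
    """Return (node_weight, edge_weight) counters for the consonant network."""
    node_weight = Counter()  # number of languages containing each consonant
    edge_weight = Counter()  # number of languages containing both consonants
    for consonants in inventories.values():
        node_weight.update(consonants)
        # Sorting makes (c1, c2) a canonical key for the undirected edge.
        for c1, c2 in combinations(sorted(consonants), 2):
            edge_weight[(c1, c2)] += 1
    return node_weight, edge_weight


# Hypothetical toy inventories (not UPSID data).
toy = {
    "L1": {"/p/", "/t/", "/k/"},
    "L2": {"/p/", "/t/", "/m/"},
    "L3": {"/p/", "/k/"},
}
node_weight, edge_weight = build_phonet(toy)
```

Each language with k consonants contributes k(k-1)/2 pairs, which is why a few hundred inventories can already produce tens of thousands of edges.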

Construction of PhoNet Data Source : UPSID Number of nodes in VC is 541 Number of edges is 34012 PhoNet

Community Formation: Radicchi et al. Algorithm [Figure: a six-node example network whose edge weights are replaced by strength values S; thresholding at η > 1 separates the communities] For different values of η we get different sets of communities.

Consonant Societies! [Figure: communities obtained at η = 0.35, 0.60, 0.72 and 1.25] The fact that the communities are good can be shown quantitatively by measuring the feature entropy.

Problems to ponder on … Physical significance of PA: Functional forces Historical/Evolutionary process Labeled synthesis of PlaNet and PhoNet Language diversity vs. Preferential attachment

CASE STUDY II: Modeling the Mental Lexicon

Mental Lexicon (ML) – Basics It refers to the repository of the word forms that resides in the human brain. Two questions: How are words stored in the long term memory, i.e., the organization of the ML? How are words retrieved from the ML (lexical access)? The above questions are highly inter-related – to predict the organization one can investigate how words are retrieved and vice versa.

Ways of Organization of Mental Lexicon Un-organized (a bag full of words) or, Organized By sound (phonological similarity) E.g., start the same: banana, bear, bean … End the same: look, took, book … Number of phonological segments they share By Meaning (semantic similarity) Banana, apple, pear, orange … By age at which the word is acquired By frequency of usage By POS Orthographically

Some Unsolved Mysteries – You can Give it a Try What can be a model for the evolution of the ML? How is the ML acquired by a child learner? Is there a single optimal structure for the ML, or is it organized based on multiple criteria (i.e., a combination of the different n/ws)? Towards a single framework for studying ML!

CASE STUDY III: Syntax – Unsupervised POS Tagging

Labeling of Text Lexical Category (POS tags) Syntactic Category (Phrases, chunks) Semantic Role (Agent, theme, …) Sense Domain dependent labeling (genes, proteins, …) How to define the set of labels? How to (learn to) predict them automatically?

“Nothing makes sense, unless in context” Distribution-based definition of Lexical category Sense (meaning) The X is … If you X then I shall … … looking at the star PP

General Approach Represent the context of a word (token). Define some notion of similarity between the contexts. Cluster the contexts of the tokens. Get the label of the tokens. [Figure: tokens w1, w2, w3, w4, … regrouped as w1, w3 and w2, w4]

Issues How to define the context? How to define similarity How to Cluster? How to evaluate?

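Syntactic Network of Words [Figure: the weighted word network over color, sky, weight, light, blue, blood, heavy and red, with edge weights such as 1, 20 and 100; the edge between red and blue is weighted by 1 / (1 – cos(red, blue))]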

The Chinese Whisper Algorithm [Figure: the word network over color, sky, weight, light, blue, blood, heavy and red, with edge weights 0.9, 0.8, -0.5, 0.7, 0.9 and 0.5]
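The slide shows the algorithm only as a figure. As a hedged sketch, the commonly described form of Chinese Whispers label propagation gives every node its own class, then repeatedly lets each node, visited in random order, adopt the class with the largest total edge weight among its neighbors. The input format, the fixed iteration count, the random tie-breaking and the toy graph below are assumptions of this example.

```python
import random


def chinese_whispers(weights, iterations=20, seed=None):
    """weights: {node: {neighbor: weight}}, symmetric. Returns {node: class}."""
    rng = random.Random(seed)
    label = {v: v for v in weights}  # every node starts in its own class
    nodes = list(weights)
    for _ in range(iterations):
        rng.shuffle(nodes)
        for v in nodes:
            score = {}
            for u, w in weights[v].items():
                score[label[u]] = score.get(label[u], 0.0) + w
            if score:
                best = max(score.values())
                # Break ties at random among the strongest classes.
                label[v] = rng.choice(sorted(c for c, s in score.items() if s == best))
    return label


# Hypothetical toy graph: two disconnected word triangles.
toy = {
    "red": {"blue": 0.9, "color": 0.8}, "blue": {"red": 0.9, "color": 0.7},
    "color": {"red": 0.8, "blue": 0.7},
    "heavy": {"light": 0.9, "weight": 0.5}, "light": {"heavy": 0.9, "weight": 0.6},
    "weight": {"heavy": 0.5, "light": 0.6},
}
classes = chinese_whispers(toy, seed=0)
```

The procedure is randomized and need not converge to a unique partition, but classes only travel along edges, so the two triangles can never end up sharing a class.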

Word Sense Disambiguation Véronis, J. 2004. HyperLex: lexical cartography for information retrieval. Computer Speech & Language 18(3):223-252. Let the word to be disambiguated be “light” Select a subcorpus of paragraphs which have at least one occurrence of “light” Construct the word co-occurrence graph

HyperLex Example contexts for “light”: (1) A beam of white light is dispersed into its component colors by its passage through a prism. (2) Energy efficient light fixtures including solar lights, night lights, energy star lighting, ceiling lighting, wall lighting, lamps. (3) What enables us to see the light and experience such wonderful shades of colors during the course of our everyday lives? [Figure: co-occurrence graph over the words prism, beam, dispersed, white, colors, shades, energy, efficient, fixtures and lamps]

Hub Detection and MST [Figure: hubs and spanning tree of the co-occurrence graph around “light”, over the words prism, beam, dispersed, white, colors, shades, lamps, fixtures, energy and efficient] Example sentence: White fluorescent lights consume less energy than incandescent lamps.

Other Related Works Solan, Z., Horn, D., Ruppin, E. and Edelman, S. 2005. Unsupervised learning of natural languages. PNAS, 102 (33): 11629-11634 Ferrer i Cancho, R. 2007. Why do syntactic links not cross? Europhysics Letters Also applied to: IR, Summarization, sentiment detection and categorization, script evaluation, author detection, …

Discussions & Conclusions What we learnt Advantages of SNIC in NLP Comparison to standard techniques Open problems Concluding remarks and Q&A

What we learnt What is SNIC and Complex Networks; analytical tools for SNIC; applications to human languages. Three case studies:
Case | Area | Perspective | Technique
I | Sound systems | Language evolution and change | Synthesis models
II | Lexicon | Psycholinguistic modeling and linguistic typology | Topology and search
III | Syntax & Semantics | Applications to NLP | Clustering

Insights Language features complex structure at every level of organization Linguistic networks have non-trivial properties: scale-free & small-world Therefore, Language and Engineering systems involving language should be studied within the framework of complex systems, esp. CNT

Advantages of SNIC Fully unsupervised techniques: no labeled data required, a good solution to resource scarcity; the problem of evaluation is circumvented by semi-supervised techniques. Ease of computation: simple and scalable; distributed and parallel computable. Holistic treatment: language evolution & psycho-linguistic theories.

Comparison to Standard Techniques Rule-based vs. statistical NLP. Graphical models: generative models in machine learning; HMM, CRF, Bayesian belief networks. [Figure: tag sequence JJ NN RB VF]

Graphical Models vs. SNIC
Graphical models: Principled: based on Bayesian theory. Structure is assumed and parameters are learnt. Focus: decoding & parameter estimation. Data-driven or computationally intensive. The generative process is easy to visualize, but there is no visualization of the data.
Complex network: Heuristic, but with underlying principles of linear algebra. Structure is discovered and studied. Focus: topology and evolutionary dynamics. Unsupervised and computationally easy. Easy visualization of the data.

Language Modeling A network of words as a model of language vs. n-gram models Hierarchical, hyper-graph based models Smoothing through holistic analysis of the network topology Jedynak, B. and Karakos, D. 2007. Unigram Language Models using Diffusion Smoothing over Graphs. Proc. of TextGraphs - 2

Open Problems Universals and variables of linguistic networks Superimposition of networks: phonetic, syntactic, semantic Which clustering algorithm for which topology? Metrics for network comparison – important for language modeling Unsupervised dependency parsing using networks Mining translation equivalents

Resources
Conferences: TextGraphs, Sunbelt, EvoLang, ECCS
Journals: PRE, Physica A, IJMPC, EPL, PRL, PNAS, QL, ACS, Complexity, Social Networks
Tools: Pajek, C#UNG, http://www.insna.org/INSNA/soft_inf.html
Online resources: bibliographies, courses on CNT

Contact
Monojit Choudhury: monojitc@microsoft.com, http://www.cel.iitkgp.ernet.in/~monojit/
Animesh Mukherjee: animeshm@cse.iitkgp.ernet.in, http://www.cel.iitkgp.ernet.in/~animesh/
Niloy Ganguly: niloy@cse.iitkgp.erent.in, http://www.facweb.iitkgp.ernet.in/~niloy/

Thank you!! Book Volume on Dynamics on and of Complex Networks To be published by May 2008 from Birkhauser, Springer http://www.cel.iitkgp.ernet.in/~eccs07/