Social Network Inspired Models of NLP and Language Evolution


1 Social Network Inspired Models of NLP and Language Evolution
Monojit Choudhury (Microsoft Research India) Animesh Mukherjee (IIT Kharagpur) Niloy Ganguly (IIT Kharagpur)

2 What is a Social Network?
Nodes: social entities (people, organizations, etc.). Edges: interactions/relationships between entities (friendship, collaboration, sex).

3 Social Network Inspired Computing
Society and the nature of human interaction form a complex system. A complex network is a generic tool to model complex systems. There is a growing body of work on complex network theory (CNT), applied to a variety of fields – social, biological, physical and cognitive sciences, engineering and technology. Language, too, is a complex system.

4 Objective of this Tutorial
To show that SNIC (Social Network Inspired Computing) is an emerging and promising technique. To apply it to model natural languages: NLP, quantitative linguistics, language evolution, historical linguistics, language acquisition. To familiarize the audience with tools and techniques in SNIC. To compare it with other standard approaches to NLP.

5 Outline of the Tutorial
Part I: Background Introduction [25 min] Network Analysis Techniques [25 min] Network Synthesis Techniques [25 min] Break [3:20pm – 3:40pm] Part II: Case Studies Self-organization of Sound Systems [20 min] Modeling the Lexicon [20 min] Unsupervised Labeling (Syntax & Semantics) [20 min] Conclusion and Discussions [20 min]

6 Complex System Non-trivial properties and patterns emerging from the interaction of a large number of simple entities Self-organization: The process through which these patterns evolve without any external intervention or central control Emergent Property or Emergent Behavior: The pattern that emerges due to self-organization

7 Emergence of a networked life
Atom → Molecule → Cell → Tissue → Organs → Organisms → Communities

8 Language – a complex system
Language: a medium for communication through an arbitrary set of symbols. Constantly evolving; an outcome of self-organization at many levels: neurons; speakers and listeners; phonemes, morphemes, words … The 80-20 rule shows up at every level of structure.

9 Syntactic Network of Words
(Figure: a syntactic network of words – blue, red and heavy linked to contextual words such as color, sky, blood, weight and light, with co-occurrence counts as edge weights.)

10 Complex Network Theory
Handy toolbox for modeling complex systems Marriage of Graph theory and Statistics Complex because: Non-trivial topology Difficult to specify completely Usually large (in terms of nodes and edges) Provides insight into the nature and evolution of the system being modeled

11 Internet

12 9-11 Terrorist Network Social Network Analysis is a mathematical methodology for connecting the dots -- using science to fight terrorism. Connecting multiple pairs of dots soon reveals an emergent network of organization.

13 What Questions can be asked
Do these networks display some symmetry? Are these networks the creation of intelligent agents, or have they emerged? How have these networks emerged? What are the underlying simple rules leading to their complex formation?

14 Bi-directional Approach
Analysis of real-world networks: global topological properties, community structure, node-level properties. Synthesis of the network by means of some simple rules: small-world models, preferential attachment models, …

15 Application of CNT in Linguistics - I
Quantitative linguistics Invariance and typology (Zipf’s law, syntactic dependencies) Natural Language Processing Unsupervised methods for text labeling (POS tagging, NER, WSD, etc.) Textual similarity (automatic evaluation, document clustering) Evolutionary Models (NER, multi-document summarization)

16 Application of CNT in Linguistics - II
Language Evolution How did sound systems evolve? Development of syntax Language Change Innovation diffusion over social networks Language as an evolving network Language Acquisition Phonological acquisition Evolution of the mental lexicon of the child

17 Linguistic Networks
PhoNet – nodes: phonemes; edges: co-occurrence likelihood in languages; why: evolution of sound systems.
WordNet – nodes: words; edges: ontological relations; why: a host of NLP applications.
Syntactic Network – nodes: words; edges: similarity between syntactic contexts; why: POS tagging.
Semantic Network – nodes: words, names; edges: semantic relations; why: IR, parsing, NER, WSD.
Mental Lexicon – nodes: words; edges: phonetic similarity and semantic relations; why: cognitive modeling, spell checking.
Tree-banks – nodes: words; edges: syntactic dependency links; why: evolution of syntax.
Word Co-occurrence – nodes: words; edges: co-occurrence; why: IR, WSD, LSA, …

18 Summarizing
SNIC and CNT are emerging techniques for modeling complex systems at the mesoscopic level. Applied to physics, biology, sociology, economics, logistics … Language is an ideal application domain for SNIC: SNIC models appear in NLP, quantitative linguistics, language change, evolution and acquisition.

19 Topological Characterization of Networks

20 Types of Networks and Representation
Unipartite or bipartite; binary or weighted; undirected or directed. Representations: adjacency matrix, adjacency list. Example: the triangle graph on {a, b, c} has adjacency list a: {b, c}, b: {a, c}, c: {a, b}.

21 Characterization of Complex Networks
They have a non-trivial topological structure. Properties: heavy tail in the degree distribution (non-negligible probability mass towards the tail, more than for an exponential distribution); high clustering coefficient; centrality properties; social roles and equivalence; assortativity; community structure; small average path length (cf. random graphs); preferential attachment; small-world properties.

22 Degree Distribution (DD)
Let pk be the fraction of vertices in the network that have degree k. The plot of pk versus k is defined as the degree distribution of the network. For most real-world networks these distributions are right-skewed, with a long right tail of values far above the mean: pk varies as k^(−α). Because data are often noisy and insufficient, the definition is sometimes modified and the cumulative degree distribution is plotted instead: the probability that the degree of a node is greater than or equal to k.
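To make the definition concrete, here is a minimal sketch (ours, not the tutorial's) that computes both the raw and the cumulative degree distribution; networkx is an assumed toolkit, and the generated BA graph merely stands in for a real network.

```python
# Empirical degree distribution p_k and its cumulative version P(K >= k).
import collections
import networkx as nx

G = nx.barabasi_albert_graph(10000, 3, seed=42)  # stand-in for a real network
n = G.number_of_nodes()

counts = collections.Counter(d for _, d in G.degree())
p_k = {k: c / n for k, c in sorted(counts.items())}

# Cumulative distribution: fraction of nodes with degree >= k.
# As the slide notes, this is smoother under noisy or insufficient data.
P_cum, running = {}, 0.0
for k in sorted(p_k, reverse=True):
    running += p_k[k]
    P_cum[k] = running

for k in sorted(p_k)[:5]:
    print(k, round(p_k[k], 4), round(P_cum[k], 4))
```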

23 A Few Examples
Power law: pk ~ k^(−α)

24 Friend of Friends
Consider the following scenario: Sourish and Ravi are friends, and Sourish and Shaunak are friends. Are Shaunak and Ravi friends? If so, the friendship triangle closes. This property is known as transitivity.

25 Measuring Transitivity: Clustering Coefficient
The clustering coefficient for a vertex v in a network is defined as the ratio of the number of connections that actually exist among the neighbors of v to the total number of possible connections between those neighbors. A high clustering coefficient means my friends know each other with high probability – a typical property of social networks.

26 Mathematically…
The clustering coefficient of a vertex i with n neighbors is
C_i = (number of links between the n neighbors of i) / (n(n−1)/2),
and the clustering coefficient of the whole network is the average
C = (1/N) Σ_i C_i.
Alternatively,
C = 3 × (number of triangles in the network) / (number of connected triples in the network).
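A minimal sketch of the two definitions above, assuming networkx; for most graphs the two numbers are close but not identical.

```python
import networkx as nx

G = nx.watts_strogatz_graph(1000, 10, 0.1, seed=7)  # toy graph

# First definition: C_i per vertex, averaged over the network.
C_avg = nx.average_clustering(G)

# Alternative definition: 3 * (# triangles) / (# connected triples).
C_global = nx.transitivity(G)

print(f"average C_i = {C_avg:.3f}, transitivity = {C_global:.3f}")
```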

27 Centrality
Centrality measures are commonly described as indices of the four Ps – prestige, prominence, importance, and power. Degree – count of immediate neighbors. Betweenness – picks out nodes that form a bridge between two regions of the network:
C_B(v) = Σ_{s≠v≠t} σ_st(v) / σ_st,
where σ_st is the total number of shortest paths between s and t, and σ_st(v) is the number of shortest paths from s to t that pass through v.

28 Eigenvector centrality – Bonacich (1972)
It is not just how many people know me that counts towards my popularity (or power), but how many people know the people who know me – the notion is recursive! In the context of HIV transmission: a person x with one sex partner is less prone to the disease than a person y with multiple partners. But imagine what happens if the single partner of x has multiple partners. This is the basic idea of eigenvector centrality.

29 Definition
Eigenvector centrality is defined as the principal eigenvector of the adjacency matrix. An eigenvector of a symmetric matrix A = {a_ij} is any vector e such that Ae = λe, i.e., λ e_i = Σ_j a_ij e_j, where λ is a constant and e_i is the centrality of node i. The implication: the centrality of a node is proportional to the centrality of the nodes it is connected to (recursively). Practical example: Google PageRank.
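As an illustration (our example, not the slides'), eigenvector centrality can be computed by power iteration on the adjacency matrix and checked against networkx's built-in; Zachary's karate club graph is an arbitrary toy choice.

```python
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)

# Power iteration converges to the principal eigenvector of A
# (the graph is connected, so Perron-Frobenius applies).
e = np.ones(A.shape[0])
for _ in range(200):
    e = A @ e
    e /= np.linalg.norm(e)

builtin = nx.eigenvector_centrality_numpy(G)
print(np.round(e[:5], 4))
print([round(builtin[i], 4) for i in range(5)])
```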

30 Assortativity (homophily)
Rich goes with the rich (selective linking). A famous actor (e.g., Shah Rukh Khan) would prefer to pair up with another famous actor (e.g., Rani Mukherjee) in a movie rather than with a newcomer to the film industry. (Figures: an assortative scale-free network vs. a disassortative scale-free network.)

31 Measures of Assortativity
ANND (average nearest-neighbor degree): find the average degree of the neighbors of each node i with degree k, then compute the Pearson correlation r between the degree of i and the average degree of its neighbors. For further reference see the supplementary material; a sketch follows.
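A minimal sketch of both measures, assuming networkx; the BA graph is only a stand-in for real data.

```python
import networkx as nx

G = nx.barabasi_albert_graph(5000, 4, seed=1)

# ANND: average degree of the neighbors of each node.
annd = nx.average_neighbor_degree(G)

# Pearson correlation r between degrees at the two ends of each edge.
r = nx.degree_assortativity_coefficient(G)
print(f"assortativity r = {r:.3f}")  # BA graphs come out roughly neutral
```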

32 Community structure Community structure: a group of vertices that have a high density of edges within them and a low density of edges in between groups Example: Friendship n/w of children Citation n/ws: research interest World Wide Web: subject matter of pages Metabolic networks: Functional units Linguistic n/ws: similar linguistic categories

33 Some Examples Community Structure in Political Books
Community structure in a Social n/w of Students (American High School)

34 Community Identification Algorithms
Hierarchical clustering; Girvan-Newman; Radicchi et al.; Chinese Whispers; spectral bisection. See (Newman 2004) for a comprehensive survey (the reference is in the supplementary material). A sketch of one of these follows.
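As a concrete example, here is a minimal sketch of Girvan-Newman as implemented in networkx, run on a toy graph of our choosing.

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

G = nx.karate_club_graph()
# Girvan-Newman repeatedly removes the highest-betweenness edge;
# each split of the graph is one level of the dendrogram.
hierarchy = girvan_newman(G)
first_split = next(hierarchy)
print([sorted(c) for c in first_split])
```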

35 Evolution of Networks Processes on Networks

36 The World is Small!
“Registration fees for IJCNLP 2008 are being waived for all participants – get it collected from the registration counter.” How long do you think the above information will take to spread among yourselves? Experiments say it will spread very fast – within six hops from the initiator it would reach everyone. This is the famous Milgram six degrees of separation.

37 The Small World Effect
Even in very large social networks, the average distance between nodes is usually quite short. Milgram's small-world experiment: a target individual in Boston; initial senders in Omaha, Nebraska. Each sender was asked to forward a packet to a friend who was closer to the target, and the friends were asked to do the same. Result: an average of 'six degrees' of separation. S. Milgram, The small world problem, Psychology Today, 2 (1967), pp. 60–67.

38 Measure of Small-Worldness
Low average geodesic path length together with a high clustering coefficient. Geodesic path – the shortest path through the network from one vertex to another. Mean path length: ℓ = (2 / n(n+1)) Σ_{i≥j} d_ij, where d_ij is the geodesic distance from vertex i to vertex j. Most networks observed in the real world have ℓ ≤ 6: film actors 3.48, company directors 4.60, e-mail messages 4.95, Internet 3.33, electronic circuits 4.34.

39 Random Graphs & Small Average Path Length
Q: What do we mean by a 'random graph'? A: The Erdős-Rényi random graph model: for every pair of nodes, draw an edge between them with equal probability p. The degree distribution is Poisson: P(k) = e^(−⟨k⟩) ⟨k⟩^k / k!. Degrees of separation in a random graph: N nodes with z neighbors per node on average (z = ⟨k⟩) give D degrees of separation, with z^D ≈ N, i.e., D ≈ log N / log z.
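A minimal sketch, assuming networkx and scipy, that generates an ER graph and compares its empirical degree distribution with the Poisson formula above.

```python
import networkx as nx
from scipy.stats import poisson

n, p = 10000, 0.0008
G = nx.gnp_random_graph(n, p, seed=0)
z = 2 * G.number_of_edges() / n          # empirical mean degree <k>

degrees = [d for _, d in G.degree()]
for k in range(5):
    empirical = degrees.count(k) / n
    print(k, round(empirical, 4), round(poisson.pmf(k, z), 4))
```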

40 Clustering
C = probability that two of a node's neighbors are themselves connected. In a random graph: C_rand = ⟨k⟩/N ~ 1/N (if the average degree is held constant).

41 Watts-Strogatz ‘Small World’ Model
Watts and Strogatz introduced this simple model to show how networks can have both short path lengths and high clustering. D. J. Watts and S. H. Strogatz, Collective dynamics of “small-world” networks, Nature, 393 (1998), pp. 440–442.
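A minimal sketch of the model via networkx: as the rewiring probability p grows, the path length collapses long before the clustering does, which is exactly the small-world regime.

```python
import networkx as nx

for p in (0.0, 0.01, 0.1, 1.0):
    G = nx.watts_strogatz_graph(n=1000, k=10, p=p, seed=3)
    ell = nx.average_shortest_path_length(G)
    C = nx.average_clustering(G)
    print(f"p={p:<5} path length={ell:7.2f}  clustering={C:.3f}")
```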

42 Power Law

43 Degree distributions for various networks
World-Wide Web Coauthorship networks: computer science, high energy physics, condensed matter physics, astrophysics Power grid of the western United States and Canada Social network of 43 Mormons in Utah

44 How do Power law DDs arise?
Barabási-Albert model of preferential attachment ('the rich get richer'): (1) GROWTH: starting with a small number of nodes (m0), at every timestep we add a new node with m (≤ m0) edges connected to nodes already present in the system. (2) PREFERENTIAL ATTACHMENT: the probability Π that the new node connects to node i depends on the degree ki of that node: Π(ki) = ki / Σj kj (a sketch follows). A.-L. Barabási and R. Albert, Science 286, 509 (1999).
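A minimal sketch with the attachment rule written out explicitly (networkx is used only for bookkeeping; nx.barabasi_albert_graph would produce the same ensemble).

```python
import random
import networkx as nx

def ba_graph(n, m, seed=0):
    rng = random.Random(seed)
    G = nx.complete_graph(m + 1)          # small initial core (m0 = m + 1)
    # Each node appears in this list once per unit of degree, so uniform
    # sampling from it realizes Pi(k_i) = k_i / sum_j k_j.
    repeated = [v for v, d in G.degree() for _ in range(d)]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:           # m distinct, degree-biased targets
            targets.add(rng.choice(repeated))
        for t in targets:
            G.add_edge(new, t)
            repeated.extend((new, t))
    return G

G = ba_graph(20000, 3)
degs = sorted((d for _, d in G.degree()), reverse=True)
print(degs[:10])                          # hubs far above the mean degree 2m
```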

45 Growth analysis Markov chain representation
The probability that a new edge attaches to any of the vertices of degree k is k pk / Σ_k′ k′ pk′ = k pk / 2m, since the mean degree Σ_k′ k′ pk′ equals 2m (each new node brings m edges, so after n steps there are mn edges in total).

46 Growth analysis Markov chain representation
Growth dynamics at time t+1: with n nodes present, the expected number of nodes of degree k obeys
(n+1) pk,n+1 = n pk,n + ½(k−1) pk−1,n − ½ k pk,n,
where the gain term counts nodes of degree k−1 at time t that receive one of the m new edges, and the loss term counts nodes of degree k at time t that do.

47 Growth analysis Markov chain representation
The net change in n pk per vertex added is ½(k−1) pk−1 − ½ k pk for k > m, and 1 − ½ m pm for k = m. In the stationary solution we find pm = 2/(m+2) and pk = ((k−1)/(k+2)) pk−1 for k > m, which results in pk = 2m(m+1) / [k(k+1)(k+2)] ~ k^(−3).

48 CASE STUDY I: Self-Organization of the Sound Inventories

49 Human Speech Sounds
Human speech sounds are called phonemes – the smallest sound units of a language. Phonemes are characterized by certain distinctive features (as in, e.g., Mermelstein's model): place of articulation, manner of articulation, and phonation.

50 Types of Phonemes
Vowels (e.g., /a/, /i/, /u/), consonants (e.g., /t/, /k/) and diphthongs (e.g., /ai/).

51 Choice of Phonemes
How does a language choose a set of phonemes to build its sound inventory? Is the process arbitrary? Certainly not! What are the forces affecting this choice?

52 Vowels: A (Partially) Solved Mystery
Languages choose vowels based on maximal perceptual contrast. For instance, if a language has three vowels, then in more than 95% of the cases they are the maximally distinct triple /a/, /i/ and /u/.

53 Consonants: A Jigsaw Puzzle
Researched from 1929 to date, yet there is no single satisfactory explanation of the organization of the consonant inventories. The set of features that characterize consonants is much larger than that for vowels. No single force is sufficient to explain this organization; rather, a complex interplay of forces shapes these inventories.

54 Principle of Occurrence
PlaNet – the “Phoneme-Language Network”: a bipartite network N = (VL, VC, E), where VL is the set of nodes representing the languages of the world, VC the set of nodes representing consonants, and E the set of edges running between VL and VC. There is an edge e Є E between two nodes vl Є VL and vc Є VC if the consonant c occurs in the language l. Data source: UPSID (317 languages). (Figure: the structure of PlaNet – language nodes L1–L4 connected to consonant nodes such as /m/, /ŋ/, /p/, /d/, /s/, /θ/.) References: Choudhury et al., ACL; Mukherjee et al., Int. Jnl. of Modern Physics C. A sketch of the construction follows.
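A minimal sketch of the construction; the inventories below are made up for illustration, with UPSID as the real data source.

```python
import networkx as nx

inventories = {                 # hypothetical language -> consonant inventory
    "L1": ["m", "p", "d"],
    "L2": ["m", "p", "s"],
    "L3": ["p", "d", "s", "θ"],
    "L4": ["m", "ŋ"],
}

B = nx.Graph()
B.add_nodes_from(inventories, bipartite="language")
for lang, consonants in inventories.items():
    B.add_nodes_from(consonants, bipartite="consonant")
    B.add_edges_from((lang, c) for c in consonants)

# Degree of a consonant node = number of languages it occurs in.
print(B.degree("p"), B.degree("ŋ"))
```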

55 Degree Distribution of PlaNet
The degree distribution of the language nodes (the inventory sizes; kmin = 5, kmax = 173, kavg = 21) follows a β-distribution:
pk = [Γ(54.7) / (Γ(7.06) Γ(47.64))] k^6.06 (1 − k)^46.64, i.e., beta(k) with α = 7.06 and β = 47.64.
The degree distribution of the consonant nodes – the distribution of consonants over languages – follows a power law with an exponential cutoff: Pk ∝ k^(−0.71).

56 Synthesis of PlaNet Non-linear preferential attachment
Iteratively construct the language inventories given their inventory sizes, attaching a consonant node Ci with probability Pr(Ci) = (di^α + ε) / Σ_{x Є V*} (dx^α + ε), where di is the current degree of Ci and V* is the set of candidate consonant nodes. A sketch follows.
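A minimal sketch of this synthesis rule; the consonant pool, inventory sizes and seed are toy values, while α = 1.44 and ε = 0.5 follow the next slide.

```python
import random

def synthesize(inventory_sizes, consonants, alpha=1.44, eps=0.5, seed=0):
    rng = random.Random(seed)
    degree = {c: 0 for c in consonants}
    inventories = []
    for size in inventory_sizes:
        chosen = set()
        while len(chosen) < size:
            pool = [c for c in consonants if c not in chosen]
            # Pr(C_i) proportional to d_i**alpha + eps
            weights = [degree[c] ** alpha + eps for c in pool]
            c = rng.choices(pool, weights=weights, k=1)[0]
            chosen.add(c)
            degree[c] += 1
        inventories.append(chosen)
    return inventories, degree

invs, deg = synthesize([5, 8, 12, 20], [f"c{i}" for i in range(50)])
print(sorted(deg.values(), reverse=True)[:10])  # a few consonants dominate
```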

57 Simulation Result
The degree distribution of the synthesized network PlaNetsyn closely matches that of the real PlaNet, while a random-attachment baseline PlaNetrand does not. The parameters α and ε are 1.44 and 0.5 respectively; the results are averaged over 100 runs.

58 Principle of Co-occurrence
Consonants tend to co-occur in groups or communities, organized around a few distinctive features (based on manner of articulation, place of articulation and phonation) – the principle of feature economy. Example: the plosives /p/, /b/, /t/, /d/ span voiceless/voiced × bilabial/dental; if a language has /b/, /p/ and /d/ in its inventory, it will also tend to have /t/, since /t/ reuses features already present.

59 How to Capture these Co-occurrences?
PhoNet – the “Phoneme-Phoneme Network”: a weighted network N = (VC, E), where VC is the set of nodes representing consonants and E the set of edges running between the nodes in VC. There is an edge e Є E between two nodes vc1, vc2 Є VC if the consonants c1 and c2 co-occur in a language. The number of languages in which c1 and c2 co-occur defines the edge weight of e; the number of languages in which c1 occurs defines the node weight of vc1. (Figure: a fragment of PhoNet around /k/, /kw/, /k′/ and /d′/ with node and edge weights.)
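A minimal sketch of the construction, reusing toy inventories in place of UPSID.

```python
from itertools import combinations
import networkx as nx

inventories = [["m", "p", "d"], ["m", "p", "s"], ["p", "d", "s", "θ"]]

P = nx.Graph()
for inv in inventories:
    for c in inv:                 # node weight = # languages containing c
        if c not in P:
            P.add_node(c, weight=0)
        P.nodes[c]["weight"] += 1
    for c1, c2 in combinations(sorted(inv), 2):
        if P.has_edge(c1, c2):    # edge weight = # languages in which
            P[c1][c2]["weight"] += 1   # the pair co-occurs
        else:
            P.add_edge(c1, c2, weight=1)

print(P.nodes["p"]["weight"], P["m"]["p"]["weight"])
```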

60 Construction of PhoNet
Data Source : UPSID Number of nodes in VC is 541 Number of edges is 34012 PhoNet

61 Community Formation – Radicchi et al. Algorithm
Each edge is assigned a strength S (following a variant of the Radicchi et al. algorithm); edges whose strength falls below a threshold η are removed, and the connected components that remain are the communities. For different values of η we get different sets of communities. (Figure: a toy weighted network before and after pruning edges at η > 1.)

62 Consonant Societies!
Communities obtained at η = 0.35, 0.60, 0.72 and 1.25. That the communities are good can be shown quantitatively by measuring their feature entropy.

63 Problems to ponder on …
Physical significance of preferential attachment: functional forces or a historical/evolutionary process? Labeled synthesis of PlaNet and PhoNet. Language diversity vs. preferential attachment.

64 CASE STUDY II: Modeling the Mental Lexicon

65 Mental Lexicon (ML) – Basics
It refers to the repository of word forms that resides in the human brain. Two questions: How are words stored in long-term memory, i.e., how is the ML organized? How are words retrieved from the ML (lexical access)? The two questions are highly interrelated: to predict the organization, one can investigate how words are retrieved, and vice versa.

66 Ways of Organization of Mental Lexicon
Either un-organized (a bag full of words), or organized: by sound (phonological similarity) – e.g., words that start the same (banana, bear, bean …) or end the same (look, took, book …), or by the number of phonological segments they share; by meaning (semantic similarity) – banana, apple, pear, orange …; by the age at which the word is acquired; by frequency of usage; by POS; orthographically.

67 Some Unsolved Mysteries – You can Give it a Try 
What can be a model for the evolution of the ML? How is the ML acquired by a child learner? Is there a single optimal structure for the ML; or is it organized based on multiple criteria (i.e., a combination of the different n/ws) – Towards a single framework for studying ML!!!

68 CASE STUDY III: Syntax Unsupervised POS Tagging

69 Labeling of Text Lexical Category (POS tags)
Syntactic Category (Phrases, chunks) Semantic Role (Agent, theme, …) Sense Domain dependent labeling (genes, proteins, …) How to define the set of labels? How to (learn to) predict them automatically?

70 “Nothing makes sense, unless in context”
Distribution-based definitions of lexical category and of sense (meaning). Example contexts: “The X is …”, “If you X then I shall …”, “… looking at the star” (PP).

71 General Approach Represent the context of a word (token)
Define some notion of similarity between the contexts. Cluster the contexts of the tokens. Get the labels of the tokens from the clusters. (Figure: tokens w1 … w4 regrouped into clusters.)

72 Issues
How to define the context? How to define similarity? How to cluster? How to evaluate?

73 Syntactic Network of Words
(Figure: the syntactic network of words again – blue, red and heavy linked to contexts such as color, sky, blood, weight and light, with distances given by 1 − cos(red, blue).)

74 The Chinese Whisper Algorithm
(Figure, animated across slides 74–76: a weighted word graph over color, sky, weight, light, blue, blood, heavy and red, with edge weights such as 0.9, 0.8, 0.7, 0.5 and −0.5, on which class labels propagate step by step until the clusters stabilize. A sketch of the algorithm follows.)
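A minimal sketch of Chinese Whispers label propagation (Biemann's algorithm) on a toy word graph; the edge weights echo the figure but are otherwise illustrative.

```python
import random
import networkx as nx

def chinese_whispers(G, iterations=20, seed=0):
    rng = random.Random(seed)
    labels = {v: i for i, v in enumerate(G)}      # one class per node
    nodes = list(G)
    for _ in range(iterations):
        rng.shuffle(nodes)
        for v in nodes:                           # adopt the label with the
            scores = {}                           # largest weighted support
            for u in G[v]:
                w = G[v][u].get("weight", 1.0)
                scores[labels[u]] = scores.get(labels[u], 0.0) + w
            if scores:
                labels[v] = max(scores, key=scores.get)
    return labels

G = nx.Graph()
G.add_weighted_edges_from([("blue", "color", 0.9), ("blue", "sky", 0.8),
                           ("red", "color", 0.9), ("red", "blood", 0.7),
                           ("heavy", "weight", 0.5), ("light", "heavy", -0.5)])
print(chinese_whispers(G))
```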

77 Word Sense Disambiguation
Véronis, J. (2004). HyperLex: lexical cartography for information retrieval. Computer Speech & Language, 18(3). Let the word to be disambiguated be “light”. Select a subcorpus of paragraphs that contain at least one occurrence of “light”, and construct the word co-occurrence graph.

78 HyperLex
Example contexts: “A beam of white light is dispersed into its component colors by its passage through a prism.” “Energy efficient light fixtures including solar lights, night lights, energy star lighting, ceiling lighting, wall lighting, lamps.” “What enables us to see the light and experience such wonderful shades of colors during the course of our everyday lives?” (Figure: the resulting co-occurrence graph over prism, beam, dispersed, white, colors, shades, energy, efficient, fixtures, lamps.)

79 Hub Detection and MST
(Figure: hubs detected in the co-occurrence graph around “light”, and the minimum spanning tree linking words such as prism, beam, dispersed, white, colors, shades, energy, efficient, fixtures and lamps. Example sentence: “White fluorescent lights consume less energy than incandescent lamps.”)

80 Other Related Works Solan, Z., Horn, D., Ruppin, E. and Edelman, S Unsupervised learning of natural languages. PNAS, 102 (33): Ferrer i Cancho, R Why do syntactic links not cross? Europhysics Letters Also applied to: IR, Summarization, sentiment detection and categorization, script evaluation, author detection, …

81 Discussions & Conclusions
What we learnt Advantages of SNIC in NLP Comparison to standard techniques Open problems Concluding remarks and Q&A

82 What we learnt
What SNIC and complex networks are; analytical tools for SNIC; applications to human languages. Three case studies:
I. Sound systems – perspective: language evolution and change; technique: synthesis models.
II. Lexicon – perspective: psycholinguistic modeling and linguistic typology; technique: topology and search.
III. Syntax & semantics – perspective: applications to NLP; technique: clustering.

83 Insights
Language features complex structure at every level of organization. Linguistic networks have non-trivial properties: scale-free and small-world. Therefore, language – and engineering systems involving language – should be studied within the framework of complex systems, especially CNT.

84 Advantages of SNIC
Fully unsupervised techniques: no labeled data required – a good answer to resource scarcity; the problem of evaluation can be circumvented by semi-supervised techniques.
Ease of computation: simple and scalable; distributed and parallel computable.
Holistic treatment: connects to language evolution and psycholinguistic theories.

85 Comparison to Standard Techniques
Rule-based vs. statistical NLP. Graphical models: generative models in machine learning – HMM, CRF, Bayesian belief networks. (Figure: a graphical model over POS tags JJ, NN, RB, VF.)

86 Graphical Models vs. SNIC
Graphical models: principled, based on Bayesian theory; the structure is assumed and the parameters are learnt; focus on decoding and parameter estimation; data-driven and computationally intensive; the generative process is easy to visualize, but the data are not.
Complex networks: heuristic, but with underlying principles of linear algebra; the structure is discovered and studied; focus on topology and evolutionary dynamics; unsupervised and computationally easy; easy visualization of the data.

87 Language Modeling
A network of words as a model of language vs. n-gram models. Hierarchical, hypergraph-based models. Smoothing through holistic analysis of the network topology. Jedynak, B. and Karakos, D. (2007). Unigram Language Models using Diffusion Smoothing over Graphs. Proc. of TextGraphs-2.

88 Open Problems Universals and variables of linguistic networks
Superimposition of networks: phonetic, syntactic, semantic Which clustering algorithm for which topology? Metrics for network comparison – important for language modeling Unsupervised dependency parsing using networks Mining translation equivalents

89 Resources
Conferences: TextGraphs, Sunbelt, EvoLang, ECCS. Journals: PRE, Physica A, IJMPC, EPL, PRL, PNAS, QL, ACS, Complexity, Social Networks. Tools: Pajek, C#UNG. Online resources: bibliographies and courses on CNT.

90 Contact Monojit Choudhury Animesh Mukherjee Niloy Ganguly

91 Thank you!! Book Volume on Dynamics on and of Complex Networks
To be published by May 2008 by Birkhäuser (Springer).

