Social Network Inspired Models of NLP and Language Evolution


1 Social Network Inspired Models of NLP and Language Evolution
Monojit Choudhury (Microsoft Research India) Animesh Mukherjee (IIT Kharagpur) Niloy Ganguly (IIT Kharagpur)

2 What is a Social Network?
Nodes: social entities (people, organizations, etc.). Edges: interactions/relationships between entities (friendship, collaboration, sex).

3 Social Network Inspired Computing
Society and the nature of human interaction form a complex system. A complex network is a generic tool to model complex systems. There is a growing body of work on complex network theory (CNT), applied to a variety of fields – social, biological, physical and cognitive sciences, engineering and technology. Language, too, is a complex system.

4 Objective of this Tutorial
To show that SNIC (Social Network Inspired Computing) is an emerging and promising technique. To apply it to model natural languages: NLP, quantitative linguistics, language evolution, historical linguistics, language acquisition. To familiarize the audience with tools and techniques in SNIC. To compare it with other standard approaches to NLP.

5 Outline of the Tutorial
Part I: Background Introduction [25 min] Network Analysis Techniques [25 min] Network Synthesis Techniques [25 min] Break [3:20pm – 3:40pm] Part II: Case Studies Self-organization of Sound Systems [20 min] Modeling the Lexicon [20 min] Unsupervised Labeling (Syntax & Semantics) [20 min] Conclusion and Discussions [20 min]

6 Complex System Non-trivial properties and patterns emerging from the interaction of a large number of simple entities Self-organization: The process through which these patterns evolve without any external intervention or central control Emergent Property or Emergent Behavior: The pattern that emerges due to self-organization

7 Emergence of a networked life
Atom → Molecule → Cell → Tissue → Organs → Organisms → Communities

8 Language – a complex system
Language: a medium for communication through an arbitrary set of symbols. Constantly evolving; an outcome of self-organization at many levels: neurons; speakers and listeners; phonemes, morphemes, words … The 80-20 rule shows up at every level of structure.

9 Syntactic Network of Words
(Figure: a syntactic network of words – blue, red and heavy linked to contextual words such as color, sky, blood, weight and light, with co-occurrence counts as edge weights.)

10 Complex Network Theory
Handy toolbox for modeling complex systems Marriage of Graph theory and Statistics Complex because: Non-trivial topology Difficult to specify completely Usually large (in terms of nodes and edges) Provides insight into the nature and evolution of the system being modeled

11 Internet

12 9-11 Terrorist Network Social Network Analysis is a mathematical methodology for connecting the dots -- using science to fight terrorism. Connecting multiple pairs of dots soon reveals an emergent network of organization.

13 What Questions can be asked
Do these networks display some symmetry? Are these networks the creation of intelligent agents, or have they emerged? How have these networks emerged? What are the underlying simple rules leading to their complex formation?

14 Bi-directional Approach
Analysis of real-world networks: global topological properties, community structure, node-level properties. Synthesis of the network by means of some simple rules: small-world models, preferential attachment models, …

15 Application of CNT in Linguistics - I
Quantitative linguistics Invariance and typology (Zipf’s law, syntactic dependencies) Natural Language Processing Unsupervised methods for text labeling (POS tagging, NER, WSD, etc.) Textual similarity (automatic evaluation, document clustering) Evolutionary Models (NER, multi-document summarization)

16 Application of CNT in Linguistics - II
Language Evolution How did sound systems evolve? Development of syntax Language Change Innovation diffusion over social networks Language as an evolving network Language Acquisition Phonological acquisition Evolution of the mental lexicon of the child

17 Linguistic Networks
PhoNet – nodes: phonemes; edges: co-occurrence likelihood in languages; why: evolution of sound systems.
WordNet – nodes: words; edges: ontological relations; why: a host of NLP applications.
Syntactic Network – nodes: words; edges: similarity between syntactic contexts; why: POS tagging.
Semantic Network – nodes: words, names; edges: semantic relations; why: IR, parsing, NER, WSD.
Mental Lexicon – nodes: words; edges: phonetic similarity and semantic relations; why: cognitive modeling, spell checking.
Tree-banks – nodes: words; edges: syntactic dependency links; why: evolution of syntax.
Word Co-occurrence – nodes: words; edges: co-occurrence; why: IR, WSD, LSA, …

18 Summarizing
SNIC and CNT are emerging techniques for modeling complex systems at the mesoscopic level. Applied to physics, biology, sociology, economics, logistics … Language is an ideal application domain for SNIC: SNIC models appear in NLP, quantitative linguistics, language change, evolution and acquisition.

19 Topological Characterization of Networks

20 Types of Networks and Representation
Unipartite or bipartite; binary or weighted; undirected or directed. Representations: adjacency matrix, adjacency list. Example: the triangle graph on {a, b, c} has adjacency list a: {b, c}, b: {a, c}, c: {a, b}.

21 Characterization of Complex Networks
They have a non-trivial topological structure. Properties: heavy tail in the degree distribution (non-negligible probability mass towards the tail, more than for an exponential distribution); high clustering coefficient; centrality properties; social roles and equivalence; assortativity; community structure; small average path length (cf. random graphs); preferential attachment; small-world properties.

22 Degree Distribution (DD)
Let pk be the fraction of vertices in the network that have degree k. The plot of pk versus k is defined as the degree distribution of the network. For most real-world networks these distributions are right-skewed, with a long right tail of values far above the mean: pk varies as k^(−α). Because data are often noisy and insufficient, the definition is sometimes modified and the cumulative degree distribution is plotted instead: the probability that the degree of a node is greater than or equal to k.
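To make the definition concrete, here is a minimal sketch (ours, not the tutorial's) that computes both the raw and the cumulative degree distribution; networkx is an assumed toolkit, and the generated BA graph merely stands in for a real network.

```python
# Empirical degree distribution p_k and its cumulative version P(K >= k).
import collections
import networkx as nx

G = nx.barabasi_albert_graph(10000, 3, seed=42)  # stand-in for a real network
n = G.number_of_nodes()

counts = collections.Counter(d for _, d in G.degree())
p_k = {k: c / n for k, c in sorted(counts.items())}

# Cumulative distribution: fraction of nodes with degree >= k.
# As the slide notes, this is smoother under noisy or insufficient data.
P_cum, running = {}, 0.0
for k in sorted(p_k, reverse=True):
    running += p_k[k]
    P_cum[k] = running

for k in sorted(p_k)[:5]:
    print(k, round(p_k[k], 4), round(P_cum[k], 4))
```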

23 A Few Examples
Power law: pk ~ k^(−α)

24 Friend of Friends
Consider the following scenario: Sourish and Ravi are friends, and Sourish and Shaunak are friends. Are Shaunak and Ravi friends? If so, the friendship triangle closes. This property is known as transitivity.

25 Measuring Transitivity: Clustering Coefficient
The clustering coefficient for a vertex v in a network is defined as the ratio of the number of connections that actually exist among the neighbors of v to the total number of possible connections between those neighbors. A high clustering coefficient means my friends know each other with high probability – a typical property of social networks.

26 Mathematically…
The clustering coefficient of a vertex i with n neighbors is
C_i = (number of links between the n neighbors of i) / (n(n−1)/2),
and the clustering coefficient of the whole network is the average
C = (1/N) Σ_i C_i.
Alternatively,
C = 3 × (number of triangles in the network) / (number of connected triples in the network).
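A minimal sketch of the two definitions above, assuming networkx; for most graphs the two numbers are close but not identical.

```python
import networkx as nx

G = nx.watts_strogatz_graph(1000, 10, 0.1, seed=7)  # toy graph

# First definition: C_i per vertex, averaged over the network.
C_avg = nx.average_clustering(G)

# Alternative definition: 3 * (# triangles) / (# connected triples).
C_global = nx.transitivity(G)

print(f"average C_i = {C_avg:.3f}, transitivity = {C_global:.3f}")
```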

27 Centrality
Centrality measures are commonly described as indices of the four Ps – prestige, prominence, importance, and power. Degree – count of immediate neighbors. Betweenness – picks out nodes that form a bridge between two regions of the network:
C_B(v) = Σ_{s≠v≠t} σ_st(v) / σ_st,
where σ_st is the total number of shortest paths between s and t, and σ_st(v) is the number of shortest paths from s to t that pass through v.

28 Eigenvector centrality – Bonacich (1972)
It is not just how many people know me that counts towards my popularity (or power), but how many people know the people who know me – the notion is recursive! In the context of HIV transmission: a person x with one sex partner is less prone to the disease than a person y with multiple partners. But imagine what happens if the single partner of x has multiple partners. This is the basic idea of eigenvector centrality.

29 Definition
Eigenvector centrality is defined as the principal eigenvector of the adjacency matrix. An eigenvector of a symmetric matrix A = {a_ij} is any vector e such that Ae = λe, i.e., λ e_i = Σ_j a_ij e_j, where λ is a constant and e_i is the centrality of node i. The implication: the centrality of a node is proportional to the centrality of the nodes it is connected to (recursively). Practical example: Google PageRank.
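As an illustration (our example, not the slides'), eigenvector centrality can be computed by power iteration on the adjacency matrix and checked against networkx's built-in; Zachary's karate club graph is an arbitrary toy choice.

```python
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)

# Power iteration converges to the principal eigenvector of A
# (the graph is connected, so Perron-Frobenius applies).
e = np.ones(A.shape[0])
for _ in range(200):
    e = A @ e
    e /= np.linalg.norm(e)

builtin = nx.eigenvector_centrality_numpy(G)
print(np.round(e[:5], 4))
print([round(builtin[i], 4) for i in range(5)])
```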

30 Assortativity (homophily)
Rich goes with the rich (selective linking). A famous actor (e.g., Shah Rukh Khan) would prefer to pair up with another famous actor (e.g., Rani Mukherjee) in a movie rather than with a newcomer to the film industry. (Figures: an assortative scale-free network vs. a disassortative scale-free network.)

31 Measures of Assortativity
ANND (average nearest-neighbor degree): find the average degree of the neighbors of each node i with degree k, then compute the Pearson correlation r between the degree of i and the average degree of its neighbors. For further reference see the supplementary material; a sketch follows.
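A minimal sketch of both measures, assuming networkx; the BA graph is only a stand-in for real data.

```python
import networkx as nx

G = nx.barabasi_albert_graph(5000, 4, seed=1)

# ANND: average degree of the neighbors of each node.
annd = nx.average_neighbor_degree(G)

# Pearson correlation r between degrees at the two ends of each edge.
r = nx.degree_assortativity_coefficient(G)
print(f"assortativity r = {r:.3f}")  # BA graphs come out roughly neutral
```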

32 Community structure Community structure: a group of vertices that have a high density of edges within them and a low density of edges in between groups Example: Friendship n/w of children Citation n/ws: research interest World Wide Web: subject matter of pages Metabolic networks: Functional units Linguistic n/ws: similar linguistic categories

33 Some Examples Community Structure in Political Books
Community structure in a Social n/w of Students (American High School)

34 Community Identification Algorithms
Hierarchical clustering; Girvan-Newman; Radicchi et al.; Chinese Whispers; spectral bisection. See (Newman 2004) for a comprehensive survey (the reference is in the supplementary material). A sketch of one of these follows.
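As a concrete example, here is a minimal sketch of Girvan-Newman as implemented in networkx, run on a toy graph of our choosing.

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

G = nx.karate_club_graph()
# Girvan-Newman repeatedly removes the highest-betweenness edge;
# each split of the graph is one level of the dendrogram.
hierarchy = girvan_newman(G)
first_split = next(hierarchy)
print([sorted(c) for c in first_split])
```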

35 Evolution of Networks Processes on Networks

36 The World is Small!
“Registration fees for IJCNLP 2008 are being waived for all participants – get it collected from the registration counter.” How long do you think the above information will take to spread among yourselves? Experiments say it will spread very fast – within six hops from the initiator it would reach everyone. This is the famous Milgram six degrees of separation.

37 The Small World Effect
Even in very large social networks, the average distance between nodes is usually quite short. Milgram's small-world experiment: a target individual in Boston; initial senders in Omaha, Nebraska. Each sender was asked to forward a packet to a friend who was closer to the target, and the friends were asked to do the same. Result: an average of 'six degrees' of separation. S. Milgram, The small world problem, Psychology Today, 2 (1967), pp. 60–67.

38 Measure of Small-Worldness
Low average geodesic path length together with a high clustering coefficient. Geodesic path – the shortest path through the network from one vertex to another. Mean path length: ℓ = (2 / n(n+1)) Σ_{i≥j} d_ij, where d_ij is the geodesic distance from vertex i to vertex j. Most networks observed in the real world have ℓ ≤ 6: film actors 3.48, company directors 4.60, e-mail messages 4.95, Internet 3.33, electronic circuits 4.34.

39 Random Graphs & Small Average Path Length
Q: What do we mean by a 'random graph'? A: The Erdős-Rényi random graph model: for every pair of nodes, draw an edge between them with equal probability p. The degree distribution is Poisson: P(k) = e^(−⟨k⟩) ⟨k⟩^k / k!. Degrees of separation in a random graph: N nodes with z neighbors per node on average (z = ⟨k⟩) give D degrees of separation, with z^D ≈ N, i.e., D ≈ log N / log z.
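A minimal sketch, assuming networkx and scipy, that generates an ER graph and compares its empirical degree distribution with the Poisson formula above.

```python
import networkx as nx
from scipy.stats import poisson

n, p = 10000, 0.0008
G = nx.gnp_random_graph(n, p, seed=0)
z = 2 * G.number_of_edges() / n          # empirical mean degree <k>

degrees = [d for _, d in G.degree()]
for k in range(5):
    empirical = degrees.count(k) / n
    print(k, round(empirical, 4), round(poisson.pmf(k, z), 4))
```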

40 Clustering
C = probability that two of a node's neighbors are themselves connected. In a random graph: C_rand = ⟨k⟩/N ~ 1/N (if the average degree is held constant).

41 Watts-Strogatz ‘Small World’ Model
Watts and Strogatz introduced this simple model to show how networks can have both short path lengths and high clustering. D. J. Watts and S. H. Strogatz, Collective dynamics of “small-world” networks, Nature, 393 (1998), pp. 440–442.
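A minimal sketch of the model via networkx: as the rewiring probability p grows, the path length collapses long before the clustering does, which is exactly the small-world regime.

```python
import networkx as nx

for p in (0.0, 0.01, 0.1, 1.0):
    G = nx.watts_strogatz_graph(n=1000, k=10, p=p, seed=3)
    ell = nx.average_shortest_path_length(G)
    C = nx.average_clustering(G)
    print(f"p={p:<5} path length={ell:7.2f}  clustering={C:.3f}")
```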

42 Power Law

43 Degree distributions for various networks
World-Wide Web Coauthorship networks: computer science, high energy physics, condensed matter physics, astrophysics Power grid of the western United States and Canada Social network of 43 Mormons in Utah

44 How do Power law DDs arise?
Barabási-Albert model of preferential attachment ('the rich get richer'): (1) GROWTH: starting with a small number of nodes (m0), at every timestep we add a new node with m (≤ m0) edges connected to nodes already present in the system. (2) PREFERENTIAL ATTACHMENT: the probability Π that the new node connects to node i depends on the degree ki of that node: Π(ki) = ki / Σj kj (a sketch follows). A.-L. Barabási and R. Albert, Science 286, 509 (1999).
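A minimal sketch with the attachment rule written out explicitly (networkx is used only for bookkeeping; nx.barabasi_albert_graph would produce the same ensemble).

```python
import random
import networkx as nx

def ba_graph(n, m, seed=0):
    rng = random.Random(seed)
    G = nx.complete_graph(m + 1)          # small initial core (m0 = m + 1)
    # Each node appears in this list once per unit of degree, so uniform
    # sampling from it realizes Pi(k_i) = k_i / sum_j k_j.
    repeated = [v for v, d in G.degree() for _ in range(d)]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:           # m distinct, degree-biased targets
            targets.add(rng.choice(repeated))
        for t in targets:
            G.add_edge(new, t)
            repeated.extend((new, t))
    return G

G = ba_graph(20000, 3)
degs = sorted((d for _, d in G.degree()), reverse=True)
print(degs[:10])                          # hubs far above the mean degree 2m
```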

45 Growth analysis Markov chain representation
The probability that a new edge attaches to any of the vertices of degree k is k pk / Σ_k′ k′ pk′ = k pk / 2m, since the mean degree Σ_k′ k′ pk′ equals 2m (each new node brings m edges, so after n steps there are mn edges in total).

46 Growth analysis Markov chain representation
Growth dynamics at time t+1: with n nodes present, the expected number of nodes of degree k obeys
(n+1) pk,n+1 = n pk,n + ½(k−1) pk−1,n − ½ k pk,n,
where the gain term counts nodes of degree k−1 at time t that receive one of the m new edges, and the loss term counts nodes of degree k at time t that do.

47 Growth analysis Markov chain representation
The net change in n pk per vertex added is ½(k−1) pk−1 − ½ k pk for k > m, and 1 − ½ m pm for k = m. In the stationary solution we find pm = 2/(m+2) and pk = ((k−1)/(k+2)) pk−1 for k > m, which results in pk = 2m(m+1) / [k(k+1)(k+2)] ~ k^(−3).

48 CASE STUDY I: Self-Organization of the Sound Inventories

49 Human Speech Sounds
Human speech sounds are called phonemes – the smallest sound units of a language. Phonemes are characterized by certain distinctive features (as in, e.g., Mermelstein's model): place of articulation, manner of articulation, and phonation.

50 Types of Phonemes
Vowels (e.g., /a/, /i/, /u/), consonants (e.g., /t/, /k/) and diphthongs (e.g., /ai/).

51 Choice of Phonemes
How does a language choose a set of phonemes to build its sound inventory? Is the process arbitrary? Certainly not! What are the forces affecting this choice?

52 Vowels: A (Partially) Solved Mystery
Languages choose vowels based on maximal perceptual contrast. For instance, if a language has three vowels, then in more than 95% of the cases they are the maximally distinct triple /a/, /i/ and /u/.

53 Consonants: A Jigsaw Puzzle
Researched from 1929 to date, yet there is no single satisfactory explanation of the organization of the consonant inventories. The set of features that characterize consonants is much larger than that for vowels. No single force is sufficient to explain this organization; rather, a complex interplay of forces shapes these inventories.

54 Principle of Occurrence
PlaNet – the “Phoneme-Language Network”: a bipartite network N = (VL, VC, E), where VL is the set of nodes representing the languages of the world, VC the set of nodes representing consonants, and E the set of edges running between VL and VC. There is an edge e Є E between two nodes vl Є VL and vc Є VC if the consonant c occurs in the language l. Data source: UPSID (317 languages). (Figure: the structure of PlaNet – language nodes L1–L4 connected to consonant nodes such as /m/, /ŋ/, /p/, /d/, /s/, /θ/.) References: Choudhury et al., ACL; Mukherjee et al., Int. Jnl. of Modern Physics C. A sketch of the construction follows.
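A minimal sketch of the construction; the inventories below are made up for illustration, with UPSID as the real data source.

```python
import networkx as nx

inventories = {                 # hypothetical language -> consonant inventory
    "L1": ["m", "p", "d"],
    "L2": ["m", "p", "s"],
    "L3": ["p", "d", "s", "θ"],
    "L4": ["m", "ŋ"],
}

B = nx.Graph()
B.add_nodes_from(inventories, bipartite="language")
for lang, consonants in inventories.items():
    B.add_nodes_from(consonants, bipartite="consonant")
    B.add_edges_from((lang, c) for c in consonants)

# Degree of a consonant node = number of languages it occurs in.
print(B.degree("p"), B.degree("ŋ"))
```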

55 Degree Distribution of PlaNet
The degree distribution of the language nodes (the inventory sizes; kmin = 5, kmax = 173, kavg = 21) follows a β-distribution:
pk = [Γ(54.7) / (Γ(7.06) Γ(47.64))] k^6.06 (1 − k)^46.64, i.e., beta(k) with α = 7.06 and β = 47.64.
The degree distribution of the consonant nodes – the distribution of consonants over languages – follows a power law with an exponential cutoff: Pk ∝ k^(−0.71).

56 Synthesis of PlaNet Non-linear preferential attachment
Iteratively construct the language inventories given their inventory sizes, attaching a consonant node Ci with probability Pr(Ci) = (di^α + ε) / Σ_{x Є V*} (dx^α + ε), where di is the current degree of Ci and V* is the set of candidate consonant nodes. A sketch follows.
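A minimal sketch of this synthesis rule; the consonant pool, inventory sizes and seed are toy values, while α = 1.44 and ε = 0.5 follow the next slide.

```python
import random

def synthesize(inventory_sizes, consonants, alpha=1.44, eps=0.5, seed=0):
    rng = random.Random(seed)
    degree = {c: 0 for c in consonants}
    inventories = []
    for size in inventory_sizes:
        chosen = set()
        while len(chosen) < size:
            pool = [c for c in consonants if c not in chosen]
            # Pr(C_i) proportional to d_i**alpha + eps
            weights = [degree[c] ** alpha + eps for c in pool]
            c = rng.choices(pool, weights=weights, k=1)[0]
            chosen.add(c)
            degree[c] += 1
        inventories.append(chosen)
    return inventories, degree

invs, deg = synthesize([5, 8, 12, 20], [f"c{i}" for i in range(50)])
print(sorted(deg.values(), reverse=True)[:10])  # a few consonants dominate
```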

57 Simulation Result
The degree distribution of the synthesized network PlaNetsyn closely matches that of the real PlaNet, while a random-attachment baseline PlaNetrand does not. The parameters α and ε are 1.44 and 0.5 respectively; the results are averaged over 100 runs.

58 Principle of Co-occurrence
Consonants tend to co-occur in groups or communities, organized around a few distinctive features (based on manner of articulation, place of articulation and phonation) – the principle of feature economy. Example: the plosives /p/, /b/, /t/, /d/ span voiceless/voiced × bilabial/dental; if a language has /b/, /p/ and /d/ in its inventory, it will also tend to have /t/, since /t/ reuses features already present.

59 How to Capture these Co-occurrences?
PhoNet – the “Phoneme-Phoneme Network”: a weighted network N = (VC, E), where VC is the set of nodes representing consonants and E the set of edges running between the nodes in VC. There is an edge e Є E between two nodes vc1, vc2 Є VC if the consonants c1 and c2 co-occur in a language. The number of languages in which c1 and c2 co-occur defines the edge weight of e; the number of languages in which c1 occurs defines the node weight of vc1. (Figure: a fragment of PhoNet around /k/, /kw/, /k′/ and /d′/ with node and edge weights.)
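A minimal sketch of the construction, reusing toy inventories in place of UPSID.

```python
from itertools import combinations
import networkx as nx

inventories = [["m", "p", "d"], ["m", "p", "s"], ["p", "d", "s", "θ"]]

P = nx.Graph()
for inv in inventories:
    for c in inv:                 # node weight = # languages containing c
        if c not in P:
            P.add_node(c, weight=0)
        P.nodes[c]["weight"] += 1
    for c1, c2 in combinations(sorted(inv), 2):
        if P.has_edge(c1, c2):    # edge weight = # languages in which
            P[c1][c2]["weight"] += 1   # the pair co-occurs
        else:
            P.add_edge(c1, c2, weight=1)

print(P.nodes["p"]["weight"], P["m"]["p"]["weight"])
```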

60 Construction of PhoNet
Data Source : UPSID Number of nodes in VC is 541 Number of edges is 34012 PhoNet

61 Community Formation – Radicchi et al. Algorithm
Each edge is assigned a strength S (following a variant of the Radicchi et al. algorithm); edges whose strength falls below a threshold η are removed, and the connected components that remain are the communities. For different values of η we get different sets of communities. (Figure: a toy weighted network before and after pruning edges at η > 1.)

62 Consonant Societies!
Communities obtained at η = 0.35, 0.60, 0.72 and 1.25. That the communities are good can be shown quantitatively by measuring their feature entropy.

63 Problems to ponder on …
Physical significance of preferential attachment: functional forces or a historical/evolutionary process? Labeled synthesis of PlaNet and PhoNet. Language diversity vs. preferential attachment.

64 CASE STUDY II: Modeling the Mental Lexicon

65 Mental Lexicon (ML) – Basics
It refers to the repository of word forms that resides in the human brain. Two questions: How are words stored in long-term memory, i.e., how is the ML organized? How are words retrieved from the ML (lexical access)? The two questions are highly interrelated: to predict the organization, one can investigate how words are retrieved, and vice versa.

66 Ways of Organization of Mental Lexicon
Either un-organized (a bag full of words), or organized: by sound (phonological similarity) – e.g., words that start the same (banana, bear, bean …) or end the same (look, took, book …), or by the number of phonological segments they share; by meaning (semantic similarity) – banana, apple, pear, orange …; by the age at which the word is acquired; by frequency of usage; by POS; orthographically.

67 Some Unsolved Mysteries – You can Give it a Try 
What can be a model for the evolution of the ML? How is the ML acquired by a child learner? Is there a single optimal structure for the ML; or is it organized based on multiple criteria (i.e., a combination of the different n/ws) – Towards a single framework for studying ML!!!

68 CASE STUDY III: Syntax Unsupervised POS Tagging

69 Labeling of Text Lexical Category (POS tags)
Syntactic Category (Phrases, chunks) Semantic Role (Agent, theme, …) Sense Domain dependent labeling (genes, proteins, …) How to define the set of labels? How to (learn to) predict them automatically?

70 “Nothing makes sense, unless in context”
Distribution-based definitions of lexical category and of sense (meaning). Example contexts: “The X is …”, “If you X then I shall …”, “… looking at the star” (PP).

71 General Approach Represent the context of a word (token)
Define some notion of similarity between the contexts. Cluster the contexts of the tokens. Get the labels of the tokens from the clusters. (Figure: tokens w1 … w4 regrouped into clusters.)

72 Issues
How to define the context? How to define similarity? How to cluster? How to evaluate?

73 Syntactic Network of Words
(Figure: the syntactic network of words again – blue, red and heavy linked to contexts such as color, sky, blood, weight and light, with distances given by 1 − cos(red, blue).)

74 The Chinese Whisper Algorithm
(Figure, animated across slides 74–76: a weighted word graph over color, sky, weight, light, blue, blood, heavy and red, with edge weights such as 0.9, 0.8, 0.7, 0.5 and −0.5, on which class labels propagate step by step until the clusters stabilize. A sketch of the algorithm follows.)
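A minimal sketch of Chinese Whispers label propagation (Biemann's algorithm) on a toy word graph; the edge weights echo the figure but are otherwise illustrative.

```python
import random
import networkx as nx

def chinese_whispers(G, iterations=20, seed=0):
    rng = random.Random(seed)
    labels = {v: i for i, v in enumerate(G)}      # one class per node
    nodes = list(G)
    for _ in range(iterations):
        rng.shuffle(nodes)
        for v in nodes:                           # adopt the label with the
            scores = {}                           # largest weighted support
            for u in G[v]:
                w = G[v][u].get("weight", 1.0)
                scores[labels[u]] = scores.get(labels[u], 0.0) + w
            if scores:
                labels[v] = max(scores, key=scores.get)
    return labels

G = nx.Graph()
G.add_weighted_edges_from([("blue", "color", 0.9), ("blue", "sky", 0.8),
                           ("red", "color", 0.9), ("red", "blood", 0.7),
                           ("heavy", "weight", 0.5), ("light", "heavy", -0.5)])
print(chinese_whispers(G))
```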

77 Word Sense Disambiguation
Véronis, J. (2004). HyperLex: lexical cartography for information retrieval. Computer Speech & Language, 18(3). Let the word to be disambiguated be “light”. Select a subcorpus of paragraphs that contain at least one occurrence of “light”, and construct the word co-occurrence graph.

78 HyperLex
Example contexts: “A beam of white light is dispersed into its component colors by its passage through a prism.” “Energy efficient light fixtures including solar lights, night lights, energy star lighting, ceiling lighting, wall lighting, lamps.” “What enables us to see the light and experience such wonderful shades of colors during the course of our everyday lives?” (Figure: the resulting co-occurrence graph over prism, beam, dispersed, white, colors, shades, energy, efficient, fixtures, lamps.)

79 Hub Detection and MST
(Figure: hubs detected in the co-occurrence graph around “light”, and the minimum spanning tree linking words such as prism, beam, dispersed, white, colors, shades, energy, efficient, fixtures and lamps. Example sentence: “White fluorescent lights consume less energy than incandescent lamps.”)

80 Other Related Works Solan, Z., Horn, D., Ruppin, E. and Edelman, S Unsupervised learning of natural languages. PNAS, 102 (33): Ferrer i Cancho, R Why do syntactic links not cross? Europhysics Letters Also applied to: IR, Summarization, sentiment detection and categorization, script evaluation, author detection, …

81 Discussions & Conclusions
What we learnt Advantages of SNIC in NLP Comparison to standard techniques Open problems Concluding remarks and Q&A

82 What we learnt
What SNIC and complex networks are; analytical tools for SNIC; applications to human languages. Three case studies:
I. Sound systems – perspective: language evolution and change; technique: synthesis models.
II. Lexicon – perspective: psycholinguistic modeling and linguistic typology; technique: topology and search.
III. Syntax & semantics – perspective: applications to NLP; technique: clustering.

83 Insights
Language features complex structure at every level of organization. Linguistic networks have non-trivial properties: scale-free and small-world. Therefore, language – and engineering systems involving language – should be studied within the framework of complex systems, especially CNT.

84 Advantages of SNIC
Fully unsupervised techniques: no labeled data required – a good answer to resource scarcity; the problem of evaluation can be circumvented by semi-supervised techniques.
Ease of computation: simple and scalable; distributed and parallel computable.
Holistic treatment: connects to language evolution and psycholinguistic theories.

85 Comparison to Standard Techniques
Rule-based vs. statistical NLP. Graphical models: generative models in machine learning – HMM, CRF, Bayesian belief networks. (Figure: a graphical model over POS tags JJ, NN, RB, VF.)

86 Graphical Models vs. SNIC
Graphical models: principled, based on Bayesian theory; the structure is assumed and the parameters are learnt; focus on decoding and parameter estimation; data-driven and computationally intensive; the generative process is easy to visualize, but the data are not.
Complex networks: heuristic, but with underlying principles of linear algebra; the structure is discovered and studied; focus on topology and evolutionary dynamics; unsupervised and computationally easy; easy visualization of the data.

87 Language Modeling
A network of words as a model of language vs. n-gram models. Hierarchical, hypergraph-based models. Smoothing through holistic analysis of the network topology. Jedynak, B. and Karakos, D. (2007). Unigram Language Models using Diffusion Smoothing over Graphs. Proc. of TextGraphs-2.

88 Open Problems Universals and variables of linguistic networks
Superimposition of networks: phonetic, syntactic, semantic Which clustering algorithm for which topology? Metrics for network comparison – important for language modeling Unsupervised dependency parsing using networks Mining translation equivalents

89 Resources
Conferences: TextGraphs, Sunbelt, EvoLang, ECCS. Journals: PRE, Physica A, IJMPC, EPL, PRL, PNAS, QL, ACS, Complexity, Social Networks. Tools: Pajek, C#UNG. Online resources: bibliographies and courses on CNT.

90 Contact Monojit Choudhury Animesh Mukherjee Niloy Ganguly

91 Thank you!! Book Volume on Dynamics on and of Complex Networks
To be published by May 2008 by Birkhäuser (Springer).

