
The Small World of Human Language Ramon Ferrer i Cancho & Richard V. Sole presented by Emre Erdem.




1 The Small World of Human Language Ramon Ferrer i Cancho & Richard V. Sole presented by Emre Erdem

2 Introduction: Zipf's Law (Zipf 1972). A complete theory of language requires a theoretical understanding of its implicit statistical regularities, of which Zipf's Law is the best known. Zipf's Law: the frequency of a word decays as a power function of its rank. In spite of its relevance and universality, such a law can be obtained by various mechanisms and does not provide deep insight into the organization of language.
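The rank-frequency relation stated on this slide can be checked on any token list. The following is a minimal sketch, not the authors' procedure: the function names `rank_frequency` and `zipf_exponent` are hypothetical, and the token list is synthetic, built so that counts fall exactly as 1/rank.

```python
from collections import Counter
import math

def rank_frequency(tokens):
    """Return (rank, word, count) triples, most frequent word first."""
    counts = Counter(tokens)
    # Sort by descending count; ties are broken alphabetically.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]

def zipf_exponent(table):
    """Least-squares slope of log(count) against log(rank)."""
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(count) for _, _, count in table]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic tokens (an assumption for illustration): count = 120 / rank.
tokens = []
for rank, word in enumerate(["the", "of", "and", "to"], start=1):
    tokens.extend([word] * (120 // rank))

table = rank_frequency(tokens)
slope = zipf_exponent(table)
```

On real text the slope is only approximately constant, so the fitted value depends on which range of ranks is used.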

3 Introduction: Lexicons. Lexicon: 1. a dictionary; 2. a list of vocabulary belonging to a specific field. Kernel lexicon: a common lexicon for successful basic communication. Human brains store lexicons that are usually formed by thousands of words. (in the range of words)

4 Introduction. Co-occurrence of words in sentences relies on the network structure of the lexicon. Human language can be described in terms of a graph of word interactions. This graph has some unexpected properties that might underlie its diversity and flexibility, and that raise new questions about its origins and organization.

5 Graph Properties of Human Language. Words co-occur in sentences through syntactical relationships and through stereotyped expressions or collocations (New York, take it easy).

6 Graph Properties of Human Language: Links. Links: significant co-occurrences between words in the same sentence. The most correlated words in a sentence are the closest. A decision must be taken about the maximum distance considered for forming links. If the distance is too long, the risk of capturing spurious co-occurrences increases. If the distance is too short, certain strong co-occurrences can be systematically missed.

7 Graph Properties of Human Language: Links. A toy network constructed with four sentences: "John is tall", "John drinks water", "Mary is blonde", "Mary drinks wine". The graph is constructed by linking words at a distance of one or two in the same sentence.
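The toy construction can be written out in a few lines. This is a minimal sketch under stated assumptions: words are lowercased and split on whitespace, the graph is undirected and unweighted, and `build_cooccurrence_graph` is a hypothetical name, not something from the paper.

```python
def build_cooccurrence_graph(sentences, max_distance=2):
    """Link every pair of words at most max_distance positions apart
    within the same sentence. Returns {word: set of neighbours}."""
    graph = {}
    for sentence in sentences:
        words = sentence.lower().split()
        for word in words:
            graph.setdefault(word, set())
        for i, word in enumerate(words):
            # Look ahead one and two positions (for max_distance=2).
            for j in range(i + 1, min(i + max_distance + 1, len(words))):
                other = words[j]
                if other != word:  # no self-loops
                    graph[word].add(other)
                    graph[other].add(word)
    return graph

toy = build_cooccurrence_graph([
    "John is tall",
    "John drinks water",
    "Mary is blonde",
    "Mary drinks wine",
])
n_edges = sum(len(neighbours) for neighbours in toy.values()) // 2
```

With these four sentences the words "is" and "drinks" end up bridging the "John" and "Mary" halves of the graph, while "john" and "mary" are never linked directly.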

8 Graph Properties of Human Language: Links. The maximum distance is decided according to the minimum distance at which most of the co-occurrences are likely to happen. Many co-occurrences take place at a distance of one: red flowers (adjective-noun), stay here (verb-adverb), can see (modal-verb), getting dark (verb-adjective), the/this house (article/determiner-noun). Many co-occurrences take place at a distance of two: hit the ball (verb-object), Mary usually cries (subject-verb), table of wood (noun-noun through a prepositional phrase), live in Boston (verb-noun).

9 Graph Properties of Human Language: Links. The search for links is stopped at a distance of two. There is no automatic technique for capturing the exact relationships: the method fails to capture the exact relationships but does capture almost every possible type of link. We are not interested in all the relationships; our goal is to capture as many links as possible through an automatic procedure. A long-distance syntactic link implies the existence of lower-distance syntactic links. By contrast, a short-distance link does not imply a long-distance link.

10 Graph Properties of Human Language: Improving the technique. Choose only pairs of consecutive words whose mutual co-occurrence is larger than expected by chance, i.e. $p_{ij} > p_i p_j$. $p_{ij}$: presence of correlations (co-occurrences in the real case). $p_i p_j$: expected from random ordering (theoretical probability of co-occurrence). If this condition is used, the resulting graph is the restricted graph.
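The "larger than expected by chance" filter can be sketched as follows. The estimators are assumptions made for illustration, not taken from the slides: the probability of a word is its share of all tokens, and the probability of a pair is its share of all consecutive-word pairs. The name `restricted_links` and the synthetic corpus are hypothetical.

```python
from collections import Counter

def restricted_links(sentences):
    """Keep a pair of consecutive words only if its observed pair
    probability exceeds the product of the two word probabilities."""
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        word_counts.update(words)
        for left, right in zip(words, words[1:]):
            if left != right:
                pair_counts[frozenset((left, right))] += 1
    total_words = sum(word_counts.values())
    total_pairs = sum(pair_counts.values())
    links = set()
    if total_pairs == 0:
        return links
    for pair, count in pair_counts.items():
        a, b = tuple(pair)
        p_pair = count / total_pairs             # observed co-occurrence
        p_a = word_counts[a] / total_words       # word probabilities
        p_b = word_counts[b] / total_words
        if p_pair > p_a * p_b:                   # more often than chance
            links.add(pair)
    return links

# Synthetic corpus: "x" and "y" are both frequent but adjacent only once.
corpus = (["x y"]
          + [f"x w{k}" for k in range(9)]
          + [f"y w{k}" for k in range(9)])
links = restricted_links(corpus)
```

In this corpus the pair of "x" and "y" is dropped because two frequent words would be expected to meet more often by chance, while each pair involving a rare word is kept.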

11 Graph Properties of Human Language: The Graph. $\Omega_L = (W_L, E_L)$: the graph of human language. $W_L$: the set of words. $E_L$: the set of edges or connections between words.

12 Graph Properties of Human Language: The Graph. Possible pattern of wiring in the graph. Black nodes are common words and white nodes are rare words. Two words are linked if they co-occur significantly.

13 Graph Properties of Human Language: The Small World Properties. The small-world pattern can be detected from the analysis of two basic statistical properties: C, the clustering coefficient, and d, the path length.

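14 Graph Properties of Human Language: The Small World Properties. $\xi_{ij}$: the set of links, with $\xi_{ij} = 1$ if there is a link between words $w_i$ and $w_j$, and $\xi_{ij} = 0$ otherwise. $\bar{k}$: the average number of links per word. $\Gamma_i = \{ j \mid \xi_{ij} = 1 \}$: the set of nearest neighbors of a word $w_i$. The clustering coefficient for this word, $c_v(i)$, is defined as the number of connections between the words $w_j \in \Gamma_i$, relative to the maximum possible number of such connections.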

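15 Graph Properties of Human Language: Clustering coefficient. Define $L_i$ (the total number of edges that exist between the neighbors of $w_i$) and $\Gamma_i$ (the set of nearest neighbors of $w_i$). Then $c_v(i) = 2 L_i / (|\Gamma_i| (|\Gamma_i| - 1))$, where the denominator $|\Gamma_i| (|\Gamma_i| - 1)$ is the possible number of edges times 2, and the clustering coefficient of the graph is the average $C = \langle c_v(i) \rangle$ over all words.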
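The clustering coefficient defined on the last two slides can be computed directly from an adjacency structure. This is a minimal sketch assuming an undirected graph stored as a dictionary of neighbour sets with no self-loops; words with fewer than two neighbours are given a coefficient of 0, which is a convention chosen here rather than one stated in the slides. The graph is the four-sentence toy network with all words at distance one or two linked.

```python
def clustering_coefficient(graph, word):
    """Existing edges among the neighbours of `word`, divided by the
    number of possible edges among them."""
    neighbours = graph[word]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention: undefined case treated as 0
    # Every edge between two neighbours is seen from both ends.
    twice_links = sum(1 for a in neighbours
                      for b in graph[a] if b in neighbours)
    return twice_links / (k * (k - 1))

def average_clustering(graph):
    """Mean of the per-word clustering coefficients."""
    return sum(clustering_coefficient(graph, w) for w in graph) / len(graph)

toy = {
    "john": {"is", "tall", "drinks", "water"},
    "is": {"john", "tall", "mary", "blonde"},
    "tall": {"john", "is"},
    "drinks": {"john", "water", "mary", "wine"},
    "water": {"john", "drinks"},
    "mary": {"is", "blonde", "drinks", "wine"},
    "blonde": {"mary", "is"},
    "wine": {"mary", "drinks"},
}
C = average_clustering(toy)
```

In the toy graph the four words of degree two sit in closed triangles and get a coefficient of 1, while the four words of degree four get 1/3, so the average is 2/3.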

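16 Graph Properties of Human Language: Average path length. $d_{min}(i, j)$: the minimum path length between two words $w_i$ and $w_j$. $d_v(i) = \langle d_{min}(i, j) \rangle_j$: the average path length of a word $w_i$. $d = \langle d_v(i) \rangle$: the average path length of the graph.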
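Minimum path lengths in an unweighted graph can be found with breadth-first search, and the two averages then follow. This is a minimal sketch under the assumption that the graph is connected; unreachable pairs are simply left out of the averages. The function names are hypothetical and the graph is the four-sentence toy network.

```python
from collections import deque

def shortest_path_lengths(graph, source):
    """Breadth-first search: minimum number of links from `source`
    to every reachable word."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def average_path_length(graph):
    """Average over words of each word's mean distance to the others."""
    per_word = []
    for word in graph:
        dist = shortest_path_lengths(graph, word)
        others = [d for node, d in dist.items() if node != word]
        if others:
            per_word.append(sum(others) / len(others))
    return sum(per_word) / len(per_word)

toy = {
    "john": {"is", "tall", "drinks", "water"},
    "is": {"john", "tall", "mary", "blonde"},
    "tall": {"john", "is"},
    "drinks": {"john", "water", "mary", "wine"},
    "water": {"john", "drinks"},
    "mary": {"is", "blonde", "drinks", "wine"},
    "blonde": {"mary", "is"},
    "wine": {"mary", "drinks"},
}
d = average_path_length(toy)
```

In the toy graph no two words are more than three links apart, and the average works out to 23/14, about 1.64.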

17 Scaling and Small-World Patterns. UWN (Unrestricted Word Network): the network that results from the basic method. RWN (Restricted Word Network): the network that results from the improved method. [Table: nodes, edges, average connectivity]

18 Scaling and Small-World Patterns: Distribution of degrees. Degree distributions for both the UWN and the RWN, obtained after processing three-quarters of the words. The exponent in the second regime is similar to that of the so-called Barabási-Albert (BA) model (exponent -3). The BA model leads to scale-free distributions using the rule of preferential attachment.
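The rule of preferential attachment can be illustrated with a small growth simulation in the spirit of the BA model. This is a sketch, not the authors' code: the starting core, the parameter names `n_nodes` and `m`, and the seed are choices made here. Each new node links to `m` distinct existing nodes, chosen with probability proportional to their current degree.

```python
import random
from collections import Counter

def preferential_attachment_graph(n_nodes, m, seed=0):
    """Grow a graph in which each new node attaches to m existing nodes
    picked with probability proportional to their degree."""
    rng = random.Random(seed)
    # Start from a fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # A node appears in this list once per incident edge, so a uniform
    # draw from it is a degree-proportional draw over nodes.
    endpoints = [node for edge in edges for node in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges

def degree_counts(edges):
    """Degree of every node that appears in the edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

edges = preferential_attachment_graph(n_nodes=200, m=2)
degree = degree_counts(edges)
histogram = Counter(degree.values())  # how many nodes have each degree
```

For large graphs grown this way the degree histogram is expected to approach a power law with exponent -3; a graph of 200 nodes is far too small to estimate that exponent reliably.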

19 Scaling and Small-World Patterns. The more frequent a word, the more available it is for production and comprehension. This phenomenon, known as the frequency or recency effect, explains why preferential attachment shapes the scale-free distribution in our case. The higher the degree of a word, the higher its availability. [Formula: relation between the degree k and the frequency f for the most frequent words] [Figure: complete relationship between k and f in the RWN]

20 Scaling and Small-World Patterns: Kernel Words. The network formed exclusively by the interaction of kernel words, hereafter called the kernel word network (KWN), agrees better with the predictions that can be made when preferential attachment is at play.

21 Scaling and Small-World Patterns: Kernel Words. [Figure: connectivity distribution for the kernel word network formed by the 5000 most connected vertices in the RWN] The average connectivity in the kernel is
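Extracting a kernel as "the most connected vertices" amounts to ranking words by degree and keeping only the links among the top ones. This is a minimal sketch on the four-sentence toy network with a kernel of size 4 instead of 5000; the alphabetical tie-breaking rule and the function names are assumptions made for illustration.

```python
def kernel_subgraph(graph, size):
    """Subgraph induced by the `size` most connected words."""
    # Rank by descending degree; ties are broken alphabetically.
    ranked = sorted(graph, key=lambda w: (-len(graph[w]), w))
    kernel = set(ranked[:size])
    # Keep only links whose two endpoints are both kernel words.
    return {w: graph[w] & kernel for w in kernel}

def average_connectivity(graph):
    """Average number of links per word."""
    return sum(len(nb) for nb in graph.values()) / len(graph)

toy = {
    "john": {"is", "tall", "drinks", "water"},
    "is": {"john", "tall", "mary", "blonde"},
    "tall": {"john", "is"},
    "drinks": {"john", "water", "mary", "wine"},
    "water": {"john", "drinks"},
    "mary": {"is", "blonde", "drinks", "wine"},
    "blonde": {"mary", "is"},
    "wine": {"mary", "drinks"},
}
kernel = kernel_subgraph(toy, size=4)
k_kernel = average_connectivity(kernel)
```

Note that the average connectivity is recomputed inside the kernel, so links to words outside the kernel no longer count.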

22 Scaling and Small-World Patterns: Kernel Words. The connectivity distribution of the kernel has a power-law tail, and the exponent of that tail indicates that preferential attachment is happening.

23 Discussion. If the small-world (SW) features derive from optimal navigation needs, two predictions follow: (1) words whose main purpose is to speed up navigation must exist; (2) brain disorders characterized by navigation deficits in which such words are involved must exist.

24 Discussion: First Prediction. The 10 most connected words: and, the, of, in, a, to, s, with, by, is. These words are characterized by a very low or zero semantic content (meaning). Although they are supposed to contribute to the sentence structure, they are not generally crucial for sentence understanding.

25 Discussion: Second Prediction. Aphasia: total or partial loss of the ability to use or understand spoken or written language; it is a symptom of brain disease or injury. Agrammatism: a kind of aphasia in which speech is non-fluent, laboured, halting and lacking in function words. Agrammatism is the only syndrome in which function words are particularly omitted. Function words are the most connected ones. Such halts and lack of fluency are due to the fragility associated with the removal of highly connected words. It is known that omission of function words is often accompanied by substitution of such words. Patients in whom substitutions predominate and speech is fluent are said to suffer from paragrammatism. Paragrammatism recovers fluency (i.e. low average word-word distance) by inadequately using the remaining highly connected vertices, thus often producing substitutions of words during discourse.
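The fragility argument can be illustrated on the four-sentence toy network by deleting words and measuring what stays connected. This is only an illustration of the graph-theoretic idea, not a model of aphasia, and the function names are hypothetical. The size of the largest connected component is used here as a simple proxy for how navigable the remaining network is.

```python
def remove_words(graph, removed):
    """Copy of the graph without the given words and their links."""
    removed = set(removed)
    return {w: nb - removed for w, nb in graph.items() if w not in removed}

def largest_component_size(graph):
    """Number of words in the largest connected component."""
    seen = set()
    best = 0
    for start in graph:
        if start in seen:
            continue
        # Depth-first traversal of the component containing `start`.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best

toy = {
    "john": {"is", "tall", "drinks", "water"},
    "is": {"john", "tall", "mary", "blonde"},
    "tall": {"john", "is"},
    "drinks": {"john", "water", "mary", "wine"},
    "water": {"john", "drinks"},
    "mary": {"is", "blonde", "drinks", "wine"},
    "blonde": {"mary", "is"},
    "wine": {"mary", "drinks"},
}
without_hubs = remove_words(toy, ["is", "drinks"])
without_leaves = remove_words(toy, ["tall", "wine"])
```

Removing the two highly connected bridging words splits the toy graph into two halves of three words each, whereas removing two poorly connected words leaves the remaining six words connected.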

26 Thank you…

