Human Wayfinding in Information Networks

Presented by Ori Yaish. Authors: Jure Leskovec (Computer Science Department, Stanford University) and Robert West (Computer Science Department, Stanford University).

Navigation: finding a path between two nodes, from a “start” node to a “target” node, using only local information. Local information: seeing only the current and previously visited nodes, as well as their direct neighbors.

What do you think is a good online resource for understanding how humans navigate?

Why Wikipedia? 1. Hyperlinks. 2. It represents human knowledge.
(Speaker note: the hyperlinks help us navigate; because the content is human knowledge, there is more room for intuition.)

Present work: Wikispeedia
We use the online human-computation game Wikispeedia, in which players are given two random articles and aim to navigate from one to the other by clicking as few hyperlinks as possible. (Speaker note: show an example game from DIK-DIK to ALBERT EINSTEIN.)

Contributions: Providing insights into the methods used by information seekers and into human efficiency. Predicting what piece of information the information seeker is trying to locate.

EFFICIENCY OF HUMAN SEARCH
Black: shortest possible paths. Blue: effective human paths (ignoring back clicks). Red: complete human paths (including back clicks). (Speaker notes: explain the graph and each color in full detail. Point out that human path lengths show variance, yet humans appear to be efficient. Ask the audience: why the variance?)
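The three path lengths in the plot can be illustrated with a minimal sketch. It assumes that a recorded game is a list of article names in which a back click is stored as the marker "<", and that "ignoring back clicks" means dropping each back click together with the click it undoes. The toy graph and click sequence are hypothetical.

```python
from collections import deque

BACK = "<"  # assumed marker for a back click in the recorded click sequence

def effective_path(clicks):
    """Remove every back click together with the click it undoes."""
    stack = []
    for page in clicks:
        if page == BACK:
            if len(stack) > 1:  # never pop the start article
                stack.pop()
        else:
            stack.append(page)
    return stack

def shortest_path_length(graph, start, target):
    """Breadth-first search over hyperlinks; returns None if unreachable."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# Hypothetical toy link graph and game, for illustration only.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["T"], "D": ["T"], "T": []}
clicks = ["A", "B", "D", BACK, BACK, "C", "T"]

complete_length = len(clicks) - 1                        # red: all clicks, back clicks included
effective_length = len(effective_path(clicks)) - 1       # blue: back-tracked detours removed
optimal_length = shortest_path_length(graph, "A", "T")   # black: shortest possible path
```

In this toy game the player wanders into a dead end and backs out twice, so the complete path is much longer than the effective path, which here happens to match the optimum.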

What causes the high variance?
Hardness of the mission? Individual skills?

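And the answer is… both.
(Speaker notes: the graph shows both. 1) All the games still show variance even though they all have a shortest path length (SPL) of 3, which points to individual skill. 2) Green versus blue shows that the green game is harder even though both have SPL 3, which points to mission difficulty.)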

The second question: Why is human search so efficient on average?

What about drop-outs? 54% of all games in the data set were canceled before finishing. The probability of giving up at each step is around 10%. (Speaker note: because players drop out, and the number of drop-outs grows as paths get longer, a correction is needed.)
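To see why long games are under-represented among finished games, here is a minimal sketch that treats the roughly 10% per-step give-up probability as constant. The reweighting function shows one simple way a correction could work (inverse-probability weighting); it is an illustrative assumption, not necessarily the correction used in the paper, and the counts are hypothetical.

```python
def survival_probability(steps, quit_prob=0.10):
    """Chance that a player is still playing after `steps` clicks,
    assuming a constant per-step probability of giving up."""
    return (1.0 - quit_prob) ** steps

def reweight_for_dropouts(observed_counts, quit_prob=0.10):
    """Scale the count of finished games of each length by the inverse
    of the chance of surviving that many steps (illustrative only)."""
    return {
        length: count / survival_probability(length, quit_prob)
        for length, count in observed_counts.items()
    }

# Hypothetical counts of finished games by path length.
observed = {3: 100, 6: 100}
corrected = reweight_for_dropouts(observed)
```

Under this assumption only about 73% of players survive three steps and about 53% survive six, so longer games receive a larger correction factor.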

Effect of correcting for drop-outs
Green: drop-out corrected. The effect on the mode/median is small. (Speaker note: even after correcting for drop-outs, efficiency is still good.)

EFFICIENCY OF HUMAN SEARCH
Another explanation: Wikipedia graph is efficiently navigable for humans because they have an intuition about what links to expect.

It’s hard to formalize a human measure of node distance…
(Speaker note: but we need to formalize that intuition. Which method can we use to compute it?)

TF-IDF (a reminder) We define the similarity of two articles as the cosine of their TF-IDF vectors and the distance as (1-similarity).

TF-IDF (a reminder) Term frequency tf(t,d): the number of times that term t occurs in document d. Inverse document frequency idf(t): a measure of how much information the word provides, that is, whether the term is common or rare across all documents.

TF-IDF (a reminder) TF-IDF is then calculated as the product tfidf(t,d) = tf(t,d) × idf(t).
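The distance defined above (one minus the cosine of the TF-IDF vectors) can be sketched in a few lines. This is a minimal illustration, assuming raw term counts for tf and the common logarithmic form log(N / df) for idf; the slides do not say which idf variant was used, and the three tiny "articles" are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs maps an article name to its list of tokens.
    Returns a sparse TF-IDF vector (term -> weight) per article."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))  # count each term once per document
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    return {
        name: {term: count * idf[term] for term, count in Counter(tokens).items()}
        for name, tokens in docs.items()
    }

def tfidf_distance(u, v):
    """Distance = 1 - cosine similarity of two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)

# Hypothetical token lists, for illustration only.
docs = {
    "DIK-DIK": ["antelope", "africa", "mammal"],
    "GAZELLE": ["antelope", "africa", "savanna"],
    "ALBERT EINSTEIN": ["physics", "relativity", "germany"],
}
vectors = tfidf_vectors(docs)
near = tfidf_distance(vectors["DIK-DIK"], vectors["GAZELLE"])
far = tfidf_distance(vectors["DIK-DIK"], vectors["ALBERT EINSTEIN"])
```

Articles that share rare terms end up close together, while articles with no terms in common are at distance 1.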

Elements of human wayfinding
They investigate how some key features change as games progress from the start towards the target article. List of features: 1) Average degree of an article at every step. 2) Lucrative degree: the number of outgoing links that decrease the SPL to the target. 3) TF-IDF distance. … (Speaker note: average degree reflects how central an article is, that is, whether it is a hub; a lucrative link is a significant one, that is, one that brings us closer to the target.)

1) Average degree. (Speaker note: already at the first step, in all games, the average degree rises to 80-100.)

2) Lucrative degree- the number of outgoing links that decrease the SPL to the target.
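Lucrative degree can be computed once the shortest path length (SPL) from every article to the target is known. The following is a minimal sketch on a hypothetical toy graph: a breadth-first search on the reversed links gives the SPL to the target, and a link counts as lucrative when it leads to an article with a smaller SPL.

```python
from collections import deque

def spl_to_target(graph, target):
    """Shortest path length from every node to `target`,
    found by breadth-first search along reversed links."""
    reverse = {}
    for source, links in graph.items():
        for dest in links:
            reverse.setdefault(dest, []).append(source)
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for prev in reverse.get(node, ()):
            if prev not in dist:
                dist[prev] = dist[node] + 1
                queue.append(prev)
    return dist

def lucrative_degree(graph, node, dist):
    """Number of outgoing links of `node` that decrease the SPL to the target."""
    if node not in dist:
        return 0  # the target is unreachable from this node
    return sum(
        1 for nxt in graph.get(node, ())
        if nxt in dist and dist[nxt] < dist[node]
    )

# Hypothetical toy link graph, for illustration only.
graph = {"A": ["B", "C", "D"], "B": ["T"], "C": ["T"], "D": ["A"], "T": []}
dist = spl_to_target(graph, "T")
```

Here article A has three outgoing links, but only two of them (to B and C) bring the player closer to T, so its degree is 3 while its lucrative degree is 2.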

3) TF-IDF distance. (Speaker note: similarity is higher toward the end of the path.)

So what are the most important factors in human wayfinding in Wikipedia?

Most important factors: 1) Lucrative & average Degree – in the beginning, when finding a good hub is important. 2) Similarity – later, when homing in is important.

Endgame strategy: full category and top-level category (C). Example: the full category of DIK-DIK is SCIENCE/BIOLOGY/MAMMALS, and C(DIK-DIK) = SCIENCE. The endgame strategy corresponding to an endgame (u[n−2], u[n−1], u[n]) is defined as (C(u[n−2]), C(u[n−1]), C(u[n])).
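The definition above translates directly into code. In this minimal sketch, full categories are assumed to be stored as slash-separated strings; only the DIK-DIK category comes from the slide, and the other articles, categories, and the path are hypothetical.

```python
def top_level_category(full_category):
    """C(article): the first component of a slash-separated full category."""
    return full_category.split("/")[0]

def endgame_strategy(path, categories):
    """Top-level categories of the last three articles on a path,
    i.e. (C(u[n-2]), C(u[n-1]), C(u[n]))."""
    if len(path) < 3:
        raise ValueError("an endgame needs at least three articles")
    return tuple(top_level_category(categories[a]) for a in path[-3:])

categories = {
    "DIK-DIK": "SCIENCE/BIOLOGY/MAMMALS",       # from the slide
    "AFRICA": "GEOGRAPHY/AFRICAN_GEOGRAPHY",    # hypothetical
    "ANTELOPE": "SCIENCE/BIOLOGY/MAMMALS",      # hypothetical
    "ZEBRA": "SCIENCE/BIOLOGY/MAMMALS",         # hypothetical
}
path = ["ZEBRA", "AFRICA", "ANTELOPE", "DIK-DIK"]  # hypothetical game
strategy = endgame_strategy(path, categories)
```

An endgame such as (SCIENCE, SCIENCE, SCIENCE) stays within the target's category, whereas this example approaches the target from GEOGRAPHY.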

Endgame strategy There are two approaches:
‘Simple’ strategy: people tend to approach the target through articles from the same category as the target. ‘Multi-category’ strategy: people tend to approach the target through a category different from the target’s. Which is more efficient? (Speaker note: ask the lecturer.)

Endgame strategy: Overhead = (l−l∗)/l∗, where l is the length of the human path and l∗ is the shortest possible path length.
Categories from left to right: PEOPLE, MUSIC, IT, LANGUAGE AND LITERATURE, HISTORY, SCIENCE, RELIGION, DESIGN AND TECHNOLOGY, CITIZENSHIP, ART, BUSINESS STUDIES, MATHEMATICS, EVERYDAY LIFE, GEOGRAPHY.
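As a quick worked example of the overhead measure, the sketch below assumes that l is the number of clicks on the human path and l∗ is the shortest possible number of clicks; the sample values are hypothetical.

```python
def overhead(path_length, shortest_length):
    """Relative excess of a human path over the optimum: (l - l*) / l*."""
    if shortest_length <= 0:
        raise ValueError("shortest path length must be positive")
    return (path_length - shortest_length) / shortest_length

# Hypothetical games, all with a shortest path length of 3.
optimal_game = overhead(3, 3)   # no wasted clicks
longer_game = overhead(5, 3)    # two extra clicks on a three-click optimum
```

An overhead of 0 means the player found an optimal path, and an overhead of 1 means the path was twice as long as necessary.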

Endgame strategy - Conclusion
Reaching the target via the ‘simple’ strategy is safer and conceptually simpler, but it can prolong the game. It often pays off to think outside the box.

Target prediction

Target prediction: designing a learning algorithm for predicting an information seeker’s target t, given only a prefix q of k−1 clicks. We define the likelihood of t given the prefix q as P(q|t;Θ).

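Human Markov model: the likelihood of a prefix q = (u[1], …, u[k]) is obtained by multiplying the local click probabilities, P(q|t;Θ) = P(u[2]|u[1];t,Θ) × … × P(u[k]|u[k−1];t,Θ).
The most likely target is the t that maximizes P(q|t;Θ).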

Target prediction: two models of click probability:
1) Binomial logistic model. 2) Multinomial logistic model. (Speaker notes: discuss the features; point out that the denominator is in effect a sum over all neighbors of u(i); discuss the difference between the two equations with the lecturer.)
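The Markov model with a multinomial logistic click probability can be sketched as follows. This is only an illustration under stated assumptions: the click probability is taken to be a softmax over all neighbors of the current article (matching the note about the denominator), the prefix likelihood is the product of the local click probabilities, and the single feature, the weight vector theta, and the toy graph are all hypothetical rather than the features or parameters of the paper.

```python
import math

def click_probability(graph, features, theta, u, v, t):
    """Multinomial logistic model: probability of clicking u -> v given
    target t, normalized over all neighbors of u."""
    scores = {
        w: sum(th * x for th, x in zip(theta, features(u, w, t)))
        for w in graph[u]
    }
    top = max(scores.values())  # subtract the max for numerical stability
    normalizer = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[v] - top) / normalizer

def prefix_log_likelihood(graph, features, theta, prefix, t):
    """log P(q | t; theta): sum of the log local click probabilities."""
    return sum(
        math.log(click_probability(graph, features, theta, u, v, t))
        for u, v in zip(prefix, prefix[1:])
    )

def predict_target(graph, features, theta, prefix, candidates):
    """Return the candidate target that maximizes the prefix likelihood."""
    return max(
        candidates,
        key=lambda t: prefix_log_likelihood(graph, features, theta, prefix, t),
    )

# Hypothetical toy setup, for illustration only.
graph = {"S": ["A", "B"], "A": ["T1"], "B": ["T2"], "T1": [], "T2": []}

def features(u, w, t):
    # One made-up feature: does the clicked article w link directly to t?
    return [1.0 if t in graph[w] else 0.0]

theta = [1.0]  # hypothetical weight, not a learned value
guess = predict_target(graph, features, theta, ["S", "A"], ["T1", "T2"])
```

Working in log space avoids underflow when many small click probabilities are multiplied; the argmax is unchanged because the logarithm is monotonic.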

Target prediction Features for learning: Local features:

Target prediction

Target prediction: performance of the target prediction algorithms. Legend: bold: multinomial; thin: binomial; dashed: TF-IDF.
1. Given a prefix q and a choice of two targets. 2. Given a prefix q and a set of articles. (Speaker notes: the left graph is a choice between two articles, one correct and one wrong; the right graph is a choice among all articles that have the same degree. Discuss “cumulative” with the lecturer.)

Conclusions: The study analyzes more than 30,000 goal-directed human search paths and identifies aggregate strategies people use when navigating information spaces. In the opening of a game it is common to navigate through hubs, and later by similarity. The study also builds a predictive model of human wayfinding that can be applied towards intelligent browsing interfaces.
