An Introduction to Triple Scoring (WSDM Cup T2)

1 An Introduction to Triple Scoring (WSDM Cup T2)
Meng Jiang
TA: Huan Gui
Slides: the slides and a complementary dataset (Freebase feature) are available online.

2 Task Definition (Data)
Profession
  Train (.train): 515 pairs, e.g.
    Barack Obama, Politician, 7
    Barack Obama, Law professor, 1
    Lady Gaga, Singer-songwriter, 7
    Lady Gaga, Fashion Designer, 3
  Candidates (.list): 200 values, e.g.
    Baseball Manager/player
    Fashion Designer/Model
    Film (Art) Director
  .all: 499,244 pairs
Nationality
  Train (.train): 162 pairs, e.g.
    George Santayana, United States of America, 6
    George Santayana, Spain, 5
    Leslie Cheung, Hong Kong, 7
    Leslie Cheung, Canada, 3
  Candidates (.list): 100 values, e.g.
    China/France/Spain (Chinese? French? Spanish?)
    United States of America (USA? US? U.S.? America?)
    England/Scotland/Wales/Northern Ireland/United Kingdom (UK?)
  .all: 318,779 pairs
Wiki: 385,426 persons; 33,159,354 sentences (every person has >= 3 sentences)
Freebase: available for all persons; each record is Person (Entity) – Ontology ID – Type
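A minimal sketch of reading triples in the comma-separated layout shown on this slide. The real .train file may use a different delimiter (a tab, for example), so the layout is an assumption; Triple and parse_triple are hypothetical names introduced here for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    person: str
    value: str
    score: int  # relevance score on the 0-7 scale


def parse_triple(line: str) -> Triple:
    """Parse 'person, value, score' as laid out on the slide (assumed format)."""
    # Split from the right so a comma inside the person name is tolerated.
    person, value, score = (part.strip() for part in line.rsplit(",", 2))
    score_int = int(score)
    if not 0 <= score_int <= 7:
        raise ValueError(f"score out of range 0-7: {score_int}")
    return Triple(person, value, score_int)


rows = [
    "Barack Obama, Politician, 7",
    "Barack Obama, Law professor, 1",
    "George Santayana, United States of America, 6",
]
triples = [parse_triple(r) for r in rows]
```

Values that themselves contain a comma would need a stricter format than this sketch handles.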

3 How to Evaluate Your Software?
Learning: for every person/value, find a K-length vector v(.).
Testing: similarity(v(person), v(value)) in [0,1], compared with score(0-7)/7.
Ev. 1: Cross validation on .train, reporting Acc. ± Std. (10 times)
  Profession: Training 80%: 412 pairs; Testing 20%: 103 pairs
  Nationality: Training 80%: 130 pairs; Testing 20%: 32 pairs
Ev. 2: Human judgment, reporting Acc. (100 pairs by TAs)
  Profession: Training 100% of .train: 515 pairs; Testing on .all: 499,244 pairs
  Nationality: Training 100% of .train: 162 pairs; Testing on .all: 318,779 pairs
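The first evaluation protocol could be sketched as follows. The slide does not define "Acc.", so this sketch assumes a prediction counts as correct when it is within 2 of the true score; cosine similarity is likewise only one possible choice of similarity. All function names are hypothetical.

```python
import math
import random
import statistics


def cosine01(u, v):
    """Cosine similarity clipped to [0, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        return 0.0
    return min(1.0, max(0.0, dot / norm))


def predict_score(u, v):
    """Map a similarity in [0, 1] back to the 0-7 score scale."""
    return round(7 * cosine01(u, v))


def accuracy(pred, truth, tolerance=2):
    """Fraction of predictions within `tolerance` of the truth (assumed metric)."""
    hits = sum(abs(p - t) <= tolerance for p, t in zip(pred, truth))
    return hits / len(truth)


def repeated_holdout(triples, fit_predict, runs=10, test_fraction=0.2, seed=0):
    """Mean and standard deviation of accuracy over random 80/20 splits."""
    rng = random.Random(seed)
    accs = []
    for _ in range(runs):
        shuffled = triples[:]
        rng.shuffle(shuffled)
        n_test = max(1, round(test_fraction * len(shuffled)))
        test, train = shuffled[:n_test], shuffled[n_test:]
        pred = fit_predict(train, test)
        accs.append(accuracy(pred, [score for _, _, score in test]))
    return statistics.mean(accs), statistics.stdev(accs)


def mean_baseline(train, test):
    """Trivial model: always predict the rounded mean training score."""
    mean = round(statistics.mean(s for _, _, s in train))
    return [mean] * len(test)


# Toy (person, value, score) triples, only to exercise the protocol.
data = [(f"person{i}", "value", i % 8) for i in range(40)]
acc_mean, acc_std = repeated_holdout(data, mean_baseline)
```

With 515 profession pairs, the 20% split in this sketch yields 103 test pairs, matching the numbers on the slide. Any real model would replace mean_baseline with a function that learns vectors on the training pairs and calls predict_score on the test pairs.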

4 What You Have for Learning
Weighted small sparse matrix (.train): Person by Value, entries in [0, 1] (score/7)
Binary large sparse matrix (.all): Person by Value, entries 0/1
Methods: Non-negative matrix factorization, Singular Value Decomposition, Network embedding, etc.
Optimization function?
[Diagram: the Person (few) by Value (few) matrix is factorized into a Person (few) by K matrix and a K by Value (few) matrix; likewise, the Person (full) by Value (full) matrix is factorized into Person (full) by K and K by Value (full).]
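As one illustration of the first method listed, here is a plain-Python sketch of non-negative matrix factorization with the standard multiplicative updates for squared error. It treats unobserved pairs as zeros, which is a simplifying assumption, and the toy matrix is made up for the example.

```python
import random


def matmul(a, b):
    """Product of two dense matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(x, w, h):
    """Squared Frobenius norm of x - w h."""
    wh = matmul(w, h)
    return sum((xi - yi) ** 2 for xr, yr in zip(x, wh) for xi, yi in zip(xr, yr))


def nmf(x, k, iters=200, seed=0, eps=1e-9):
    """Factorize non-negative x (n by m) as w (n by k) times h (k by m)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # Update h: h <- h * (w^T x) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # Update w: w <- w * (x h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h


# Toy Person-by-Value matrix with entries score/7 (rows: persons, columns: values).
x = [
    [1.0, 1 / 7, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 3 / 7, 1.0],
    [0.0, 0.0, 1.0],
]
w, h = nmf(x, k=2)
```

Each row of w is then a K-length person vector and each column of h a K-length value vector, which can be compared with whatever similarity measure is chosen.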

5 S1: Learning Freebase Features
Freebase feature dataset: Barack Obama m.02mjmr award.award_winner people.measured_person organization.organization_member award.ranked_item government.u_s_congressperson base.type_ontology.non_agent film.person_or_entity_appearing_in_film base.type_ontology.physically_instantiable tv.tv_personality base.liveuspoliticians.topic people.appointer user.robert.default_domain.my_favorite_things government.political_appointer music.featured_artist user.neothilic.default_domain.funny_guy symbols.name_source base.type_ontology.agent education.honorary_degree_recipient base.crime.topic base.nobelprizes.nobel_prize_winner medicine.diagnostic_test base.samplepro.topic base.type_ontology.abstract people.person base.politicalconventions.topic base.qualia.topic internet.blogger base.famouspets.pet_owner base.academia.topic user.robert.x2008_presidential_election.candidate government.polled_entity business.board_member common.topic base.poldb.topic base.type_ontology.animate base.schemastaging.person_extra base.todolists.topic award.award_nominee business.employer internet.social_network_user user.narphorium.people.topic biology.animal_owner government.us_president broadcast.producer base.ovguide.topic fictional_universe.person_in_fiction base.x2011internationalyearforpeopleofafricandescent.topic base.creativemindsatwork.topic base.cannapedia.topic influence.influence_node organization.organization_founder base.politicalconventions.convention_speaker base.coinsdaily.design base.duiattorneys.topic book.author base.firsts.first fictional_universe.fictional_character base.crime.lawyer base.qualia.recreational_drug_user tv.tv_program_guest base.politicalconventions.primary_candidate base.nobelprizes.topic user.robert.us_congress.topic base.mybase4.topic base.popstra.organization base.popstra.sww_base base.litcentral.named_person base.litcentral.topic government.politician user.robert.default_domain.presidential_candidate user.robert.x2008_presidential_election.topic 
user.colin.default_domain.twitter_topic base.propositions.proposition_issue base.schemastaging.government_position_held_extra base.firsts.topic base.famouspets.topic book.book_subject user.loveyou2madly.default_domain.famous_author visual_art.art_subject base.inaugurations.topic event.public_speaker base.endorsements.endorsee base.saturdaynightlive.topic music.composer people.family_member base.blackhistorymonth.topic book.poem_character music.artist base.schemastaging.context_name royalty.chivalric_order_officer base.popstra.topic celebrities.celebrity base.schemastaging.topic base.popstra.celebrity base.tagit.concept base.saturdaynightlive.person_impersonated_on_snl film.film_subject architecture.building_occupant base.popstra.company user.jamie.sunlight.legislator music.group_member base.inaugurations.inauguration_speaker base.propositions.topic base.politicalconventions.presidential_nominee user.narphorium.people.nndb_person media_common.quotation_subject
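A record like the one above could be split into its parts roughly as follows. The sketch assumes each record is whitespace-separated as displayed on the slide (name, then an ontology ID of the form m.xxxx, then type names); the actual file layout may differ, and parse_feature_line is a hypothetical helper.

```python
import re

# Ontology IDs on the slide look like "m.02mjmr" (assumed pattern).
MID_PATTERN = re.compile(r"^m\.[0-9a-z_]+$")


def parse_feature_line(line):
    """Split 'name ... mid type type ...' into (name, mid, set of types)."""
    tokens = line.split()
    for i, tok in enumerate(tokens):
        if MID_PATTERN.match(tok):
            # Everything before the ID is the (possibly multi-word) name.
            return " ".join(tokens[:i]), tok, set(tokens[i + 1:])
    raise ValueError("no ontology ID (m.xxxx) found in line")


name, mid, types = parse_feature_line(
    "Barack Obama m.02mjmr award.award_winner people.person government.politician"
)
```

The set of types per person is exactly one row of a binary Person by Freebase-feature matrix.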

6 S1: Learning Freebase Features
Binary large sparse matrix: Person by Freebase-feature, entries 0/1
How to integrate the three matrices?
[Diagram: the Person (full) by Feature (many) matrix is factorized into a Person (full) by K matrix and a K by Feature (many) matrix, shown alongside the Person (few) by Value (few) and Person (full) by Value (full) factorizations.]
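One possible answer, offered only as a sketch: share the person factors across all three matrices and minimize a weighted sum of reconstruction errors. The symbols are assumptions, not taken from the slides: Y is the .train matrix (score/7) zero-padded to Person (full) by Value (full), M is its 0/1 mask of observed entries, A is the binary .all matrix, F is the Person by Freebase-feature matrix, P, V, and G hold the K-length person, value, and feature vectors as rows, and alpha, beta, lambda are trade-off weights that would have to be tuned by cross-validation.

```latex
\min_{P,\,V,\,G \,\ge\, 0}\;
\bigl\| M \odot (Y - P V^{\top}) \bigr\|_F^2
+ \alpha \,\bigl\| A - P V^{\top} \bigr\|_F^2
+ \beta \,\bigl\| F - P G^{\top} \bigr\|_F^2
+ \lambda \,\bigl( \|P\|_F^2 + \|V\|_F^2 + \|G\|_F^2 \bigr)
```

Setting alpha or beta to zero recovers factorization of fewer matrices, which gives a simple way to measure how much each data source helps.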

7 S2: Learning with Text
Truth: (Adeyto, France, 7), (Adeyto, Germany, 1)
Sentences:
  [Adeyto] ( France ) .
  [Adeyto] ( born 1976 ) , French singer-songwriter , actress and director .
  [Adeyto] was born in Strasbourg , France , to a German father and French mother .
word2vec [Mikolov et al. NIPS’13]: “Distributed Representations of Words and Phrases and their Compositionality”
  Skip-gram model architecture: input, projection, output
  Hierarchical Softmax
  Negative sampling
  Subsampling of frequent words
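To make the skip-gram idea concrete, the sketch below generates the (center, context) training pairs for one sentence and draws negative samples for a pair. It is not the word2vec implementation: the window size is an arbitrary choice, and negatives are drawn uniformly here for simplicity, whereas word2vec samples them from a smoothed unigram distribution.

```python
import random


def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs at most `window` positions apart."""
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]


def negative_samples(vocab, positive, k, rng):
    """Draw k words other than the true context word (uniform, for simplicity)."""
    candidates = [w for w in vocab if w != positive]
    return [rng.choice(candidates) for _ in range(k)]


sentence = "[Adeyto] was born in Strasbourg , France".split()
pairs = list(skipgram_pairs(sentence, window=2))

rng = random.Random(0)
vocab = sorted(set(sentence))
negatives = negative_samples(vocab, positive="France", k=3, rng=rng)
```

The model is trained to score each true pair higher than the same center word paired with its negative samples; the learned input vectors are the K-length embeddings.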

8 Pipeline
Enrich the type vocabulary (manually attaching frequent words)
Linking/unifying value candidates in the text
  [Adeyto] ( [France] ) .
  [Adeyto] ( born 1976 ) , [French] [singer-songwriter] , [actress] and [director] .
    Unified: [France] [Singer-songwriter] , [Actress] and [Film Director]
  [Adeyto] was born in Strasbourg , [France] , to a [German] father and [French] mother .
Embedding (word2vec, etc.) for K-length vectors
Candidates (.list)
  Profession: 200 values, e.g.
    Baseball Manager/player
    Fashion Designer/Model
    Film (Art) Director
  Nationality: 100 values, e.g.
    China/France/Spain (Chinese? French? Spanish?)
    United States of America (USA? US? U.S.? America?)
    England/Scotland/Wales/Northern Ireland/United Kingdom (UK?)
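The linking step could be sketched with a small alias table. The table below is hypothetical and hand-written for this one example (the slide suggests attaching such frequent words manually); it handles single-token surface forms only, so multi-word names such as "United States of America" would need phrase matching.

```python
import re

# Hypothetical alias table: lowercased surface form -> canonical candidate.
ALIASES = {
    "france": "France",
    "french": "France",
    "germany": "Germany",
    "german": "Germany",
    "singer-songwriter": "Singer-songwriter",
    "actress": "Actress",
    "director": "Film Director",
}

TOKEN = re.compile(r"\S+")


def link_values(sentence):
    """Replace known surface forms with bracketed canonical values."""
    def repl(match):
        tok = match.group(0)
        canonical = ALIASES.get(tok.lower())
        return f"[{canonical}]" if canonical else tok

    return TOKEN.sub(repl, sentence)


linked = link_values(
    "[Adeyto] ( born 1976 ) , French singer-songwriter , actress and director ."
)
```

After linking, each canonical value can be treated as a single token by the embedding step.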

9 Type-Aware Factorization/Embedding
Typed sentences
  [$Person:Adeyto] ( [$Nation:France] ) .
  [$Person:Adeyto] ( born 1976 ) , [$Nation:France] [$Profession:Singer-songwriter] , [$Profession:Actress] and [$Profession:Film Director] .
  [$Person:Adeyto] was born in Strasbourg , [$Nation:France] , to a [$Nation:German] father and [$Nation:French] mother .
Factorization: rich non-negative matrices built from co-occurrence in sentences
[Diagram: matrices over Person (almost full), Profession (almost full), and Nation (almost full).]
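A sketch of how the co-occurrence counts might be collected from typed sentences, assuming the [$Type:Value] markup shown on this slide. Counting each pair at most once per sentence is a design choice made here, not something the slide specifies.

```python
import re
from collections import Counter
from itertools import product

# Matches typed mentions such as [$Nation:France] or [$Profession:Film Director].
TYPED = re.compile(r"\[\$(\w+):([^\]]+)\]")


def cooccurrence(sentences, row_type="Person", col_type="Nation"):
    """Count sentences in which a row_type and a col_type mention co-occur."""
    counts = Counter()
    for sentence in sentences:
        mentions = TYPED.findall(sentence)
        rows = {value for kind, value in mentions if kind == row_type}
        cols = {value for kind, value in mentions if kind == col_type}
        counts.update(product(rows, cols))
    return counts


sentences = [
    "[$Person:Adeyto] ( [$Nation:France] ) .",
    "[$Person:Adeyto] ( born 1976 ) , [$Nation:France] [$Profession:Singer-songwriter] .",
    "[$Person:Adeyto] was born in Strasbourg , [$Nation:France] , to a [$Nation:German] father .",
]
nation_counts = cooccurrence(sentences)
profession_counts = cooccurrence(sentences, col_type="Profession")
```

The resulting counts form a non-negative Person by Nation (or Person by Profession) matrix that is much denser than .train and can be factorized in the same way.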

10 S3: Learning with Meta Patterns
Typed sentences
  [$Person:Adeyto] ( [$Nation:France] ) .
  [$Person:Adeyto] ( born 1976 ) , [$Nation:France] [$Profession:Singer-songwriter] , [$Profession:Actress] and [$Profession:Film Director] .
  [$Person:Adeyto] was born in Strasbourg , [$Nation:France] , to a [$Nation:German] father and [$Nation:French] mother .
Segmentation: Meta Patterns for Precise Value
  [$Person] ( [$Nation] ) : 100% accuracy?
  [$Person] ( born [$Year] ) , [$Nation] : 100% accuracy?
  [$Nation] [$Profession] : 100% accuracy?
  [$Nation] [$Profession] , [$Profession] and [$Profession] : (find context [$Person]) 100% accuracy?
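Matching a meta pattern against a typed sentence could look like the sketch below. It assumes patterns are whitespace-separated token sequences in which [$Type] slots match any mention of that type; discovering the patterns themselves (the segmentation step) is not covered, and the helper names are hypothetical.

```python
import re

TYPED = re.compile(r"\[\$(\w+):([^\]]+)\]")


def tokenize(sentence):
    """Return (pattern_token, value) pairs; plain words carry value None."""
    tokens = []
    pos = 0
    for m in TYPED.finditer(sentence):
        tokens += [(word, None) for word in sentence[pos:m.start()].split()]
        # A typed mention becomes one slot token, e.g. ("[$Nation]", "France").
        tokens.append((f"[${m.group(1)}]", m.group(2)))
        pos = m.end()
    tokens += [(word, None) for word in sentence[pos:].split()]
    return tokens


def match_pattern(pattern, sentence):
    """Return the slot bindings of every window that matches the pattern."""
    pat = pattern.split()
    toks = tokenize(sentence)
    hits = []
    for start in range(len(toks) - len(pat) + 1):
        window = toks[start:start + len(pat)]
        if all(p == tok for p, (tok, _) in zip(pat, window)):
            hits.append([(tok, val) for tok, val in window if val is not None])
    return hits


hits = match_pattern(
    "[$Person] ( [$Nation] )",
    "[$Person:Adeyto] ( [$Nation:France] ) .",
)
```

Whether a pattern really yields 100% accuracy can then be checked by comparing its extracted pairs with the scores in .train.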

11 Summary
Learning how to do data-mining experiments/projects (cross-validation, evaluation, etc.)
Machine learning for low-dimensional representation
Selecting the best similarity/distance measure
S1: Learning vectors with Freebase features
S2: Learning vectors with Text (embedding, etc.)
S3: Learning precise values with Meta Patterns (segmentation)
Finally, how to merge the results of the above to achieve the best performance?

