Statistical Models of (Social) Networks Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Xuerui Wang, Natasha.

Presentation transcript:

Statistical Models of (Social) Networks Andrew McCallum Computer Science Department University of Massachusetts Amherst Joint work with Xuerui Wang, Natasha Mohanty, Andres Corrada

Automatically Managing and Understanding Connections of People in our World
Workplace effectiveness ~ Ability to leverage network of acquaintances
But filling Contacts DB by hand is tedious, and incomplete.
[Diagram labels: Inbox, Contacts DB, WWW]

System Overview
[Diagram components: Contact Info and Person Name Extraction (CRF), Person Name Extraction, Name Coreference, Homepage Retrieval (WWW), Social Network Analysis, Keyword Extraction; names are passed between stages]

An Example
To: “Andrew McCallum”
Subject...
First Name: Andrew
Middle Name: Kachites
Last Name: McCallum
JobTitle: Associate Professor
Company: University of Massachusetts
Street Address: 140 Governor’s Dr.
City: Amherst
State: MA
Zip: 01003
Company Phone: (413)
Links: Fernando Pereira, Sam Roweis,…
Key Words: Information extraction, social network,…
Search for new people

Summary of Results
Contact info and name extraction performance (25 fields)
CRF: Token Acc, Field Prec, Field Recall, Field F1
Example keywords extracted
Person: Keywords
William Cohen: Logic programming, Text categorization, Data integration, Rule learning
Daphne Koller: Bayesian networks, Relational models, Probabilistic models, Hidden variables
Deborah McGuiness: Semantic web, Description logics, Knowledge representation, Ontologies
Tom Mitchell: Machine learning, Cognitive states, Learning apprentice, Artificial intelligence
1. Expert Finding: When solving some task, find friends-of-friends with relevant expertise. Avoid “stove-piping” in large orgs by automatically suggesting collaborators. Given a task, automatically suggest the right team for the job. (Hiring aid!)
2. Social Network Analysis: Understand the social structure of your organization. Suggest structural changes for improved efficiency.
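The expert-finding idea (friends-of-friends with relevant expertise) can be sketched as a two-hop graph search followed by a keyword match. This is only an illustration, not the system described in the talk: the keyword lists and McCallum's two links are taken from the slides, while the second-hop links and the function names friends_of_friends and find_experts are hypothetical.

```python
from collections import deque

def friends_of_friends(graph, start, max_hops=2):
    """Return {person: hop distance} for everyone within max_hops of start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if dist[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(person, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[person] + 1
                queue.append(neighbor)
    del dist[start]
    return dist

def find_experts(graph, keywords, start, topic, max_hops=2):
    """People near start whose extracted keywords mention topic, nearest first."""
    reachable = friends_of_friends(graph, start, max_hops)
    hits = [(hops, person) for person, hops in reachable.items()
            if any(topic.lower() in k.lower() for k in keywords.get(person, ()))]
    return [person for hops, person in sorted(hits)]

# Keywords as listed on the slide.
keywords = {
    "William Cohen": ["Logic programming", "Text categorization",
                      "Data integration", "Rule learning"],
    "Daphne Koller": ["Bayesian networks", "Relational models",
                      "Probabilistic models", "Hidden variables"],
    "Tom Mitchell": ["Machine learning", "Cognitive states",
                     "Learning apprentice", "Artificial intelligence"],
}
# First-hop links come from the example contact record; the second-hop
# links are hypothetical and exist only to make the search non-trivial.
graph = {
    "Andrew McCallum": ["Fernando Pereira", "Sam Roweis"],
    "Fernando Pereira": ["Andrew McCallum", "Daphne Koller", "William Cohen"],
    "Sam Roweis": ["Andrew McCallum", "Tom Mitchell"],
}

bayes_experts = find_experts(graph, keywords, "Andrew McCallum", "bayesian")
learning_experts = find_experts(graph, keywords, "Andrew McCallum", "learning")
```

A substring match on keywords is the simplest possible relevance test; a real system would presumably rank by topical similarity instead.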

Social Network in an Email Dataset

Outline Social Network Analysis with (Language) Attributes –Roles and Topics (Author-Recipient-Topic Model) –Groups and Topics (Group-Topic Model) Demo: Rexa, a Web portal for researchers


Clustering words into topics with Latent Dirichlet Allocation [Blei, Ng, Jordan 2003]
Generative Process:
For each document:
  Sample a distribution over topics, θ
  For each word in doc:
    Sample a topic, z
    Sample a word from the topic, w
Example: θ = 70% Iraq war, 30% US election; z = Iraq war; w = “bombing”
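The generative process on this slide can be sketched in a few lines of NumPy. The vocabulary, the two topic-word distributions, and the Dirichlet parameter alpha below are made-up illustration values, and generate_corpus is a hypothetical helper name.

```python
import numpy as np

def generate_corpus(num_docs, doc_len, topic_word, alpha, rng):
    """Sample documents from the LDA generative process."""
    num_topics, vocab_size = topic_word.shape
    docs, mixtures = [], []
    for _ in range(num_docs):
        # Per-document distribution over topics, theta.
        theta = rng.dirichlet(np.full(num_topics, alpha))
        # One topic z per word position.
        z = rng.choice(num_topics, size=doc_len, p=theta)
        # One word w per position, drawn from that position's topic.
        w = np.array([rng.choice(vocab_size, p=topic_word[k]) for k in z])
        docs.append(w)
        mixtures.append(theta)
    return docs, mixtures

# Hypothetical toy vocabulary and topics ("Iraq war", "US election").
vocab = ["bombing", "troops", "ballot", "campaign"]
topic_word = np.array([
    [0.6, 0.4, 0.0, 0.0],   # Iraq war
    [0.0, 0.0, 0.5, 0.5],   # US election
])
rng = np.random.default_rng(0)
docs, mixtures = generate_corpus(3, 8, topic_word, 0.5, rng)
for theta, doc in zip(mixtures, docs):
    print(np.round(theta, 2), [vocab[i] for i in doc])
```

Inference runs this process in reverse: given only the words, recover plausible topics and per-document mixtures.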

Example topics induced from a large collection of text [Tenenbaum et al]
STORY STORIES TELL CHARACTER CHARACTERS AUTHOR READ TOLD SETTING TALES PLOT TELLING SHORT FICTION ACTION TRUE EVENTS TELLS TALE NOVEL
MIND WORLD DREAM DREAMS THOUGHT IMAGINATION MOMENT THOUGHTS OWN REAL LIFE IMAGINE SENSE CONSCIOUSNESS STRANGE FEELING WHOLE BEING MIGHT HOPE
WATER FISH SEA SWIM SWIMMING POOL LIKE SHELL SHARK TANK SHELLS SHARKS DIVING DOLPHINS SWAM LONG SEAL DIVE DOLPHIN UNDERWATER
DISEASE BACTERIA DISEASES GERMS FEVER CAUSE CAUSED SPREAD VIRUSES INFECTION VIRUS MICROORGANISMS PERSON INFECTIOUS COMMON CAUSING SMALLPOX BODY INFECTIONS CERTAIN
FIELD MAGNETIC MAGNET WIRE NEEDLE CURRENT COIL POLES IRON COMPASS LINES CORE ELECTRIC DIRECTION FORCE MAGNETS BE MAGNETISM POLE INDUCED
SCIENCE STUDY SCIENTISTS SCIENTIFIC KNOWLEDGE WORK RESEARCH CHEMISTRY TECHNOLOGY MANY MATHEMATICS BIOLOGY FIELD PHYSICS LABORATORY STUDIES WORLD SCIENTIST STUDYING SCIENCES
BALL GAME TEAM FOOTBALL BASEBALL PLAYERS PLAY FIELD PLAYER BASKETBALL COACH PLAYED PLAYING HIT TENNIS TEAMS GAMES SPORTS BAT TERRY
JOB WORK JOBS CAREER EXPERIENCE EMPLOYMENT OPPORTUNITIES WORKING TRAINING SKILLS CAREERS POSITIONS FIND POSITION FIELD OCCUPATIONS REQUIRE OPPORTUNITY EARN ABLE


From LDA to Author-Recipient-Topic (ART)
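The slide gives only the title, so the following is a sketch based on the published description of ART rather than on anything in this transcript: for each word of a message, a recipient x is picked uniformly from the message's recipients, a topic z is drawn from a distribution specific to the (author, recipient) pair, and the word w is drawn from that topic. The array names theta and phi, the sizes, and the helper generate_message are assumptions for illustration.

```python
import numpy as np

def generate_message(author, recipients, theta, phi, length, rng):
    """Sample (recipient, topic, word) triples for one message.

    theta[a, r] is a topic distribution for the author-recipient pair (a, r);
    phi[z] is a word distribution for topic z.
    """
    triples = []
    for _ in range(length):
        x = rng.choice(recipients)                           # recipient, chosen uniformly
        z = rng.choice(phi.shape[0], p=theta[author, x])     # topic for the (author, x) pair
        w = rng.choice(phi.shape[1], p=phi[z])               # word from topic z
        triples.append((int(x), int(z), int(w)))
    return triples

# Hypothetical sizes: 4 people, 3 topics, 5 vocabulary words.
rng = np.random.default_rng(1)
theta = rng.dirichlet(np.ones(3), size=(4, 4))   # shape (4, 4, 3)
phi = rng.dirichlet(np.ones(5), size=3)          # shape (3, 5)
message = generate_message(0, [1, 2], theta, phi, 10, rng)
```

The only structural change from LDA in this sketch is that the topic distribution is indexed by a pair of people instead of by a document.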

Inference and Estimation
Gibbs Sampling:
- Easy to implement
- Reasonably fast
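To make the "easy to implement" claim concrete, here is a minimal collapsed Gibbs sampler. Note the substitution: it is written for plain LDA, not for ART, because the slide does not give the ART update equations. The hyperparameters alpha and beta, the toy documents, and the function name lda_gibbs are assumptions.

```python
import numpy as np

def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, sweeps=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns the final count matrices."""
    rng = np.random.default_rng(seed)
    doc_topic = np.zeros((len(docs), num_topics))
    topic_word = np.zeros((num_topics, vocab_size))
    topic_total = np.zeros(num_topics)
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = rng.integers(num_topics, size=len(doc))
        assign.append(z_d)
        for w, z in zip(doc, z_d):
            doc_topic[d, z] += 1
            topic_word[z, w] += 1
            topic_total[z] += 1
    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d, z] -= 1
                topic_word[z, w] -= 1
                topic_total[z] -= 1
                # Conditional over topics given all other assignments.
                p = (doc_topic[d] + alpha) * (topic_word[:, w] + beta) \
                    / (topic_total + vocab_size * beta)
                z = rng.choice(num_topics, p=p / p.sum())
                # Record the new assignment.
                assign[d][i] = z
                doc_topic[d, z] += 1
                topic_word[z, w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

# Toy corpus of word ids: two documents use words {0, 1}, two use {2, 3}.
toy_docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 0, 1], [3, 2, 3]]
doc_topic, topic_word = lda_gibbs(toy_docs, num_topics=2, vocab_size=4)
```

The Dirichlet-distributed parameters never appear explicitly; they are integrated out, and only count tables are maintained.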

Enron Email Corpus
250k messages
23k people
Date: Wed, 11 Apr :56: (PDT)
From:
To:
Subject: Enron/TransAltaContract dated Jan 1, 2001
Please see below. Katalin Kiss of TransAlta has requested an electronic copy of our final draft? Are you OK with this? If so, the only version I have is the original draft without revisions.
DP
Debra Perlingiere
Enron North America Corp.
Legal Department
1400 Smith Street, EB 3885
Houston, Texas

Topics, and prominent senders / receivers discovered by ART Topic names, by hand

Topics, and prominent senders / receivers discovered by ART Beck = “Chief Operations Officer” Dasovich = “Government Relations Executive” Shapiro = “Vice President of Regulatory Affairs” Steffes = “Vice President of Government Affairs”

Comparing Role Discovery
connection strength (A,B) =
Traditional SNA: distribution over recipients
Author-Topic: distribution over authored topics
ART: distribution over authored topics
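Each method above represents a person by a distribution, so comparing roles reduces to comparing two distributions. The slide does not name the measure; one common choice, assumed here, is the Jensen-Shannon divergence, where a lower value means more similar roles. The example distributions are hypothetical.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical topic distributions for three people.
person_a = [0.7, 0.2, 0.1]
person_b = [0.6, 0.3, 0.1]
person_c = [0.05, 0.05, 0.9]
similar = js_divergence(person_a, person_b)
different = js_divergence(person_a, person_c)
```

The same function applies unchanged to distributions over recipients (the traditional SNA view) or over topics (the Author-Topic and ART views).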

Comparing Role Discovery
Tracy Geaconne vs. Dan McCarty
Methods compared: Traditional SNA, Author-Topic, ART
Results shown: Similar roles, Different roles
Geaconne = “Secretary”
McCarty = “Vice President”

Comparing Role Discovery
Tracy Geaconne vs. Rod Hayslett
Traditional SNA: Different roles
Author-Topic: Very similar
ART: Not very similar
Geaconne = “Secretary”
Hayslett = “Vice President & CTO”

Comparing Role Discovery
Lynn Blair vs. Kimberly Watson
Traditional SNA: Different roles
Author-Topic: Very different
ART: Very similar
Blair = “Gas pipeline logistics”
Watson = “Pipeline facilities planning”

McCallum Email Corpus 2004
January - October
k messages
825 people
From:
Subject: NIPS and....
Date: June 14, :27:41 PM EDT
To:
There is pertinent stuff on the first yellow folder that is completed either travel or other things, so please sign that first folder anyway. Then, here is the reminder of the things I'm still waiting for: NIPS registration receipt. CALO registration receipt. Thanks, Kate

McCallum Blockstructure

Four most prominent topics in discussions with ____?

Two most prominent topics in discussions with ____?

Pairs with highest rank difference between ART & SNA
5 other professors
3 other ML researchers

Role-Author-Recipient-Topic Models

Results with RART: People in “Role #3” in Academic Email
olc: lead Linux sysadmin
gauthier: sysadmin for CIIR group
irsystem: mailing list CIIR sysadmins
system: mailing list for dept. sysadmins
allan: Prof., chair of “computing committee”
valerie: second Linux sysadmin
tech: mailing list for dept. hardware
steve: head of dept. I.T. support

Roles for allan (James Allan)
Role #3: I.T. support
Role #2: Natural Language researcher
Roles for pereira (Fernando Pereira)
Role #2: Natural Language researcher
Role #4: SRI CALO project participant
Role #6: Grant proposal writer
Role #10: Grant proposal coordinator
Role #8: Guests at McCallum’s house

Summary Traditionally, SNA examines links, but not the language content on those links. This talk introduced ART, a Bayesian network model for messages sent in a social network: it captures topics and role-similarity. RART explicitly represents roles. Future work: –Explicitly model & discover roles and groups –Integrate with coreference and relation extraction –Model correlations and topic/group trends over time

ART: Roles but not Groups
Enron TransWestern Division
Methods compared: Traditional SNA, Author-Topic, ART
Labels shown: Block structured, Not

Outline Social Network Analysis with (Language) Attributes –Roles and Topics (Author-Recipient-Topic Model) –Groups and Topics (Group-Topic Model) Demo: Rexa, a Web portal for researchers

Groups and Topics
Input:
–Observed relations between people
–Attributes on those relations (text, or categorical)
Output:
–Attributes clustered into “topics”
–Groups of people, varying depending on topic

Discovering Groups from Observed Set of Relations
Admiration relations among six high school students.
Student Roster: Adams, Bennett, Carter, Davis, Edwards, Frederking
Academic Admiration: Acad(A, B) Acad(C, B) Acad(A, D) Acad(C, D) Acad(B, E) Acad(D, E) Acad(B, F) Acad(D, F) Acad(E, A) Acad(F, A) Acad(E, C) Acad(F, C)

Adjacency Matrix Representing Relations
[Figure: adjacency matrix of the Academic Admiration relation, shown first in roster order (A B C D E F) with group labels G1, G2, G3, then with rows and columns reordered as A C B D E F so that the groups form blocks]
Student Roster: Adams, Bennett, Carter, Davis, Edwards, Frederking
Academic Admiration: Acad(A, B) Acad(C, B) Acad(A, D) Acad(C, D) Acad(B, E) Acad(D, E) Acad(B, F) Acad(D, F) Acad(E, A) Acad(F, A) Acad(E, C) Acad(F, C)
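The adjacency matrix for this slide can be rebuilt directly from the listed Acad relations. The sketch below constructs it in roster order and again in the order A C B D E F, which makes the block structure visible. The helper name adjacency is hypothetical.

```python
import numpy as np

# Academic admiration relations as listed on the slide (source, target).
acad = ["AB", "CB", "AD", "CD", "BE", "DE", "BF", "DF", "EA", "FA", "EC", "FC"]

def adjacency(relations, order):
    """Directed 0/1 adjacency matrix with rows and columns in the given order."""
    index = {name: i for i, name in enumerate(order)}
    matrix = np.zeros((len(order), len(order)), dtype=int)
    for source, target in relations:
        matrix[index[source], index[target]] = 1
    return matrix

roster_order = adjacency(acad, "ABCDEF")
grouped_order = adjacency(acad, "ACBDEF")
print(roster_order)
print(grouped_order)
```

In the reordered matrix every 2x2 block is either all ones or all zeros: {A, C} admire {B, D}, {B, D} admire {E, F}, and {E, F} admire {A, C}.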

Group Model: Partitioning Entities into Groups
Stochastic Blockstructures for Relations [Nowicki, Snijders 2001]
S: number of entities
G: number of groups
Enhanced with arbitrary number of groups in [Kemp, Griffiths, Tenenbaum 2004]
[Graphical model labels: Beta, Dirichlet, Binomial, Multinomial]
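The four distribution labels on this slide suggest the following generative reading, sketched here under assumptions: group proportions come from a Dirichlet, each entity's group from a multinomial, each group pair's relation probability from a Beta, and each observed relation from a binomial with one trial. The hyperparameter values and the name sample_blockmodel are illustrative, not taken from the talk.

```python
import numpy as np

def sample_blockmodel(num_entities, num_groups, rng, group_conc=1.0, a=0.5, b=0.5):
    """Sample group assignments and a directed relation from a stochastic blockmodel."""
    # Dirichlet prior over group proportions, multinomial group assignment.
    group_probs = rng.dirichlet(np.full(num_groups, group_conc))
    groups = rng.choice(num_groups, size=num_entities, p=group_probs)
    # Beta prior on the probability of a relation between each ordered group pair.
    block_probs = rng.beta(a, b, size=(num_groups, num_groups))
    # Each entity pair inherits the probability of its group pair.
    edge_probs = block_probs[groups][:, groups]
    relation = rng.binomial(1, edge_probs)
    np.fill_diagonal(relation, 0)  # no self-relations
    return groups, block_probs, relation

rng = np.random.default_rng(0)
groups, block_probs, relation = sample_blockmodel(6, 3, rng)  # S = 6, G = 3
```

With Beta parameters below 1, block probabilities tend toward 0 or 1, which produces the crisp block patterns seen in the student example.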

Two Relations with Different Attributes
[Figure: two adjacency matrices. Academic Admiration is ordered A C B D E F with groups G1, G2, G3; Social Admiration is ordered A C E B D F with groups G1, G2]
Student Roster: Adams, Bennett, Carter, Davis, Edwards, Frederking
Academic Admiration: Acad(A, B) Acad(C, B) Acad(A, D) Acad(C, D) Acad(B, E) Acad(D, E) Acad(B, F) Acad(D, F) Acad(E, A) Acad(F, A) Acad(E, C) Acad(F, C)
Social Admiration: Soci(A, B) Soci(A, D) Soci(A, F) Soci(B, A) Soci(B, C) Soci(B, E) Soci(C, B) Soci(C, D) Soci(C, F) Soci(D, A) Soci(D, C) Soci(D, E) Soci(E, B) Soci(E, D) Soci(E, F) Soci(F, A) Soci(F, C) Soci(F, E)
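One way to check that the two relations call for different groupings is to compute, for each ordered pair of groups, the fraction of possible relations that are present. A good grouping gives densities near 0 or 1. The relation lists are from the slide; the function block_density and the group labels are illustrative.

```python
acad = ["AB", "CB", "AD", "CD", "BE", "DE", "BF", "DF", "EA", "FA", "EC", "FC"]
soci = ["AB", "AD", "AF", "BA", "BC", "BE", "CB", "CD", "CF",
        "DA", "DC", "DE", "EB", "ED", "EF", "FA", "FC", "FE"]

def block_density(relations, groups):
    """Fraction of possible directed relations present between each pair of groups."""
    member_of = {p: g for g, members in groups.items() for p in members}
    counts = {(g, h): 0 for g in groups for h in groups}
    for source, target in relations:
        counts[member_of[source], member_of[target]] += 1
    density = {}
    for (g, h), count in counts.items():
        possible = len(groups[g]) * len(groups[h])
        if g == h:
            possible -= len(groups[g])  # exclude self-relations
        density[g, h] = count / possible if possible else 0.0
    return density

acad_groups = {"G1": "AC", "G2": "BD", "G3": "EF"}
soci_groups = {"G1": "ACE", "G2": "BDF"}

acad_fit = block_density(acad, acad_groups)       # every density is 0 or 1
soci_fit = block_density(soci, soci_groups)       # every density is 0 or 1
soci_misfit = block_density(soci, acad_groups)    # mixed densities
```

Applying the academic grouping to the social relation yields intermediate densities, which is the motivation for letting groups vary with the attribute or topic.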

Simple Topic Model: Good for Single Topic Documents
Mixture of Unigrams
D: number of documents
T: number of topics
: number of tokens in document d
[Graphical model labels: Dirichlet, Multinomial, Uniform]

Goal: Model relations and their (textual) attributes simultaneously to obtain better groups and more meaningful topics.
Example topic words: budget, funding, annual, cash
Example topic words: document, corrections, review, annual

The Group-Topic Model: Discovering Groups and Topics Simultaneously
[Graphical model labels: Dirichlet, Multinomial, Uniform (topic part); Beta, Dirichlet, Binomial, Multinomial (group part)]

Inference and Estimation
Gibbs Sampling:
- Many r.v.s can be integrated out
- Easy to implement
- Reasonably fast
We assume the relationship is symmetric.

Dataset #1: U.S. Senate
16 years of voting records in the US Senate (1989 – 2005)
A Senator may respond Yea or Nay to a resolution
3423 resolutions with text attributes (index terms)
191 Senators in total across 16 years
Example: S.543
Title: An Act to reform Federal deposit insurance, protect the deposit insurance funds, recapitalize the Bank Insurance Fund, improve supervision and regulation of insured depository institutions, and for other purposes.
Sponsor: Sen Riegle, Donald W., Jr. [MI] (introduced 3/5/1991)
Cosponsors (2)
Latest Major Action: 12/19/1991 Became Public Law No:
Index terms: Banks and banking, Accounting, Administrative fees, Cost control, Credit, Deposit insurance, Depressed areas, and 110 other terms
Votes: Adams (D-WA), Nay; Akaka (D-HI), Yea; Bentsen (D-TX), Yea; Biden (D-DE), Yea; Bond (R-MO), Yea; Bradley (D-NJ), Nay; Conrad (D-ND), Nay; ……
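Voting records are per-person responses, while the group model needs relations between pairs. One natural symmetric relation, assumed here for illustration, is whether two senators cast the same vote on a resolution. The votes are the ones listed for S.543 on the slide; agreement_relation is a hypothetical helper.

```python
# Votes on S.543 as listed on the slide.
votes = {
    "Adams": "Nay", "Akaka": "Yea", "Bentsen": "Yea", "Biden": "Yea",
    "Bond": "Yea", "Bradley": "Nay", "Conrad": "Nay",
}

def agreement_relation(votes):
    """Map each unordered pair of voters to True if they voted the same way."""
    names = sorted(votes)
    return {
        (a, b): votes[a] == votes[b]
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

relation = agreement_relation(votes)
agreeing_pairs = [pair for pair, same in relation.items() if same]
```

Because each pair is stored once, the relation is symmetric by construction, and one such relation is produced per resolution.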

Topics Discovered (U.S. Senate)

Mixture of Unigrams
Education    Energy     Military Misc.  Economic
education    energy     government      federal
school       power      military        labor
aid          water      foreign         insurance
children     nuclear    tax             aid
drug         gas        congress        tax
students     petrol     aid             business
elementary   research   law             employee
prevention   pollution  policy          care

Group-Topic Model
Education + Domestic  Foreign       Economic   Social Security + Medicare
education             foreign       labor      social
school                trade         insurance  security
federal               chemicals     tax        insurance
aid                   tariff        congress   medical
government            congress      income     care
tax                   drugs         minimum    medicare
energy                communicable  wage       disability
research              diseases      business   assistance

Groups Discovered (US Senate) Groups from topic Education + Domestic

Senators Who Change Coalition the Most Dependent on Topic
e.g. Senator Shelby (D-AL) votes
with the Republicans on Economic
with the Democrats on Education + Domestic
with a small group of maverick Republicans on Social Security + Medicare

Dataset #2: The UN General Assembly
Voting records of the UN General Assembly ( )
A country may choose to vote Yes, No or Abstain
931 resolutions with text attributes (titles)
192 countries in total
Also experiments later with resolutions from
Example: Vote on Permanent Sovereignty of Palestinian People, 87th plenary meeting
The draft resolution on permanent sovereignty of the Palestinian people in the occupied Palestinian territory, including Jerusalem, and of the Arab population in the occupied Syrian Golan over their natural resources (document A/54/591) was adopted by a recorded vote of 145 in favour to 3 against with 6 abstentions:
In favour: Afghanistan, Argentina, Belgium, Brazil, Canada, China, France, Germany, India, Japan, Mexico, Netherlands, New Zealand, Pakistan, Panama, Russian Federation, South Africa, Spain, Turkey, and 126 other countries.
Against: Israel, Marshall Islands, United States.
Abstain: Australia, Cameroon, Georgia, Kazakhstan, Uzbekistan, Zambia.

Topics Discovered (UN)

Mixture of Unigrams
Everything Nuclear  Human Rights  Security in Middle East
nuclear             rights        occupied
weapons             human         israel
use                 palestine     syria
implementation      situation     security
countries           israel        calls

Group-Topic Model
Nuclear Non-proliferation  Nuclear Arms Race  Human Rights
nuclear                    nuclear            rights
states                     arms               human
united                     prevention         palestine
weapons                    race               occupied
nations                    space              israel

Groups Discovered (UN) The country list for each group is ordered by 2005 GDP (PPP), and only 5 countries are shown for groups that have more than 5 members.

Do We Get Better Groups with the GT Model?
Baseline Model:
1. Cluster bills into topics using mixture of unigrams;
2. Apply group model on topic-specific subsets of bills.
GT Model:
1. Jointly cluster topics and groups at the same time using the GT model.
Agreement Index (AI) measures group cohesion. Higher is better.
Datasets   Avg. AI for Baseline   Avg. AI for GT   p-value
Senate                                             <.01
UN                                                 <.01
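The slide does not spell out how the Agreement Index is computed. A widely used definition for three voting options, assumed here, takes the largest vote count in a group, subtracts half of the remaining votes, and divides by the total. It equals 1 when the group is unanimous and 0 when the group splits evenly three ways.

```python
def agreement_index(yes, no, abstain=0):
    """Agreement Index of one group on one vote (assumed definition, see lead-in)."""
    total = yes + no + abstain
    if total == 0:
        raise ValueError("no votes cast")
    largest = max(yes, no, abstain)
    # Largest bloc minus half of everyone else, normalized by group size.
    return (largest - 0.5 * (total - largest)) / total

unanimous = agreement_index(7, 0)          # all members vote Yea
three_way_split = agreement_index(1, 1, 1)
# Arithmetic illustration using the assembly-wide counts from the UN example
# (145 in favour, 3 against, 6 abstentions); the index is normally computed per group.
un_example = agreement_index(145, 3, 6)
```

For Senate data, where only Yea and Nay occur, the abstain count is simply left at zero.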

Groups and Topics, Trends over Time (UN)

Outline Social Network Analysis with (Language) Attributes –Roles and Topics (Author-Recipient-Topic Model) –Groups and Topics (Group-Topic Model) Demo: Rexa, a Web portal for researchers

Previous Systems

Previous Systems
[Diagram: Research Paper, with a Cites relation]

More Entities and Relations
[Diagram: Research Paper, Cites, Person, University, Venue, Grant, Groups, Expertise]

Outline Examples of IE and Data Mining. Brief introduction of Conditional Random Fields Joint inference: Motivation and examples –Joint Labeling of Cascaded Sequences (Belief Propagation) –Joint Labeling for Transfer Learning (Piecewise Training & BP) –Joint Labeling of Distant Entities (BP by Tree Reparameterization) –Joint Co-reference Resolution (Graph Partitioning) –Joint Segmentation and Co-ref (Sparse BP) Joint Topic Discovery and Social Network Analysis –Roles and Topics (Author-Recipient-Topic Model) –Groups and Topics (Group-Topic Model) Demo: Rexa, a Web portal for researchers

Summary CRFs: conditional probability structured models Joint inference can avoid accumulating errors in a pipeline from extraction to data mining Early examples –Factorial finite state models –Jointly labeling distant entities –Coreference analysis –Segmentation uncertainty aiding coreference Email, contact management, expert-finding, SNA –Discover topics, roles, & groups from text and relational data. New research paper search engine coming soon.

End of Talk

Summary Traditionally, SNA examines links, but not the language content on those links. Presented ART, a Bayesian network for messages sent in a social network: captures topics and role-similarity. RART explicitly represents roles. Additional work –Group-Topic model discovers groups and clusters attributes of relations. [Wang, Mohanty, McCallum, LinkKDD 2005]