Faculty: Dr. Chengcui Zhang
Students: Wei-Bang Chen, Song Gao, Richa Tiwari

Past projects: Image Spam Clustering Project
– Cluster image spam through common visual features present in image attachments
– Reveal common origins of image spam

Examples:
– These two spam images exemplify illustrations with similar color composition but different layouts.
– This example demonstrates illustrations in spam with similar layouts but different color composition.

Ongoing projects: – Phishing website clustering by text and visual similarity

Nat West Helpful Bonking Accessibility I Help Got a question? We can help … Nat West Helpful Bonking Help 24x7 can’t I log in? Accessibility I Help … RBS ThQ Roy& Bank cq3codand Make it happen … Text Recognized by OCR

A Sample Cluster for PayPal

4 Clusters Related to PayPal
– Cluster ID: 15 (76 images)
– Cluster ID: 28 (20 images)
– Cluster ID: 49 (13 images)
– Cluster ID: 57 (22 images)

Dataset Statistics
– 8 days (7-10, 17-19 & 22 Feb. 2011)
– Total number of phishing website screenshot images: 1461
– Total number of produced clusters (cutoff similarity value = 60%): (ungrouped)

Observations:
– High cluster purity
– Hard to measure completeness
Next step:
– Incorporate visual features such as visual layout
– Brand

Ongoing projects: – Uncovering auction fraud from eBay transaction graph - Initial study

Data set: eBay transaction feedback
– A total of 220,000 users were crawled.
Idea of belief propagation:
– Fraudsters create two types of identities: fraud and accomplice. Fraud identities are the ones eventually used to carry out the actual fraud; accomplice identities are used to build up the reputation of the fraud identities.
– This pattern forms a near-bipartite core in the transaction graph.

Algorithm:
– Each vertex in the transaction graph is labeled as one of {fraud, accomplice, honest} based on its pattern of interaction with other vertices.
– Belief propagation (BP) is used to optimize the labeling across the entire graph by maximizing the joint probability of all the vertices.
– Honest user model: Barabási-Albert model
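The labeling step above can be illustrated with a minimal sketch of sum-product loopy belief propagation over the three states. This is not the project's implementation: the compatibility values in `COMPAT`, the function name `loopy_bp`, and the seeded prior in the demo are all hypothetical, chosen only so that fraud and accomplice identities favor each other as in a near-bipartite core. The sketch computes per-vertex marginal beliefs, which is a common approximation rather than an exact maximization of the joint probability.

```python
STATES = ("fraud", "accomplice", "honest")

# Hypothetical, symmetric compatibility values (not from the slides):
# COMPAT[a][b] scores how plausible a transaction is between states a and b.
COMPAT = {
    "fraud":      {"fraud": 0.05, "accomplice": 0.90, "honest": 0.05},
    "accomplice": {"fraud": 0.90, "accomplice": 0.10, "honest": 0.45},
    "honest":     {"fraud": 0.05, "accomplice": 0.45, "honest": 0.50},
}


def loopy_bp(edges, priors, iterations=20):
    """Return {vertex: {state: belief}} for every vertex that has an edge."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # messages[(i, j)][s]: what vertex i tells neighbor j about j being in state s.
    messages = {(i, j): {s: 1.0 for s in STATES}
                for i in neighbors for j in neighbors[i]}

    for _ in range(iterations):
        updated = {}
        for (i, j) in messages:
            out = {}
            for sj in STATES:
                total = 0.0
                for si in STATES:
                    # Combine what i hears from all neighbors except j.
                    incoming = 1.0
                    for k in neighbors[i]:
                        if k != j:
                            incoming *= messages[(k, i)][si]
                    total += priors[i][si] * COMPAT[si][sj] * incoming
                out[sj] = total
            norm = sum(out.values())
            updated[(i, j)] = {s: out[s] / norm for s in STATES}
        messages = updated

    beliefs = {}
    for i in neighbors:
        belief = {s: priors[i][s] for s in STATES}
        for k in neighbors[i]:
            for s in STATES:
                belief[s] *= messages[(k, i)][s]
        norm = sum(belief.values())
        beliefs[i] = {s: belief[s] / norm for s in STATES}
    return beliefs


if __name__ == "__main__":
    uniform = {s: 1.0 / 3 for s in STATES}
    # Toy near-bipartite core: f1, f2 trade only with a1, a2; a1, a2 also trade with h1, h2.
    edges = [("f1", "a1"), ("f1", "a2"), ("f2", "a1"), ("f2", "a2"),
             ("a1", "h1"), ("a2", "h2")]
    priors = {v: dict(uniform) for v in ("f1", "f2", "a1", "a2", "h1", "h2")}
    priors["f1"] = {"fraud": 0.9, "accomplice": 0.05, "honest": 0.05}  # assumed known fraudster
    for vertex, belief in sorted(loopy_bp(edges, priors).items()):
        print(vertex, max(belief, key=belief.get))
```

Each pass recomputes every message from the previous pass's values, so the cost per iteration grows with the number of edges times the vertex degree.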

Evaluation results on the sparse eBay transaction dataset
– 20% accomplice
– 50% fraud???
What can be improved:
– Network too sparse (average degree is ~5, ideally >= 10)
– Initial probabilities (1/3, 1/3, 1/3) may not make sense.
– BP does not seem to scale well to large graphs.

Projects under plan: – Modeling online user navigation patterns and detecting anomalies using click stream data

Idea #1: Each user session is represented by an n-dimensional feature vector, where n is the number of Web pages in the session.
– The value of each feature is a weight indicating the user's degree of interest in the particular Web page.
– Based on these vectors, clusters of similar sessions are produced and characterized by the Web pages with the highest associated weights.
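A minimal sketch of Idea #1 follows. It makes several assumptions that the slides do not state: all sessions are mapped onto one shared, ordered page list so that vectors are comparable; weights are something like seconds spent on a page; similarity is cosine; and clustering is a simple greedy "leader" scheme with a hypothetical threshold of 0.5. All function names are illustrative.

```python
import math


def session_vector(session, pages):
    """Map a {page: weight} session onto a fixed, shared page order."""
    return [float(session.get(page, 0.0)) for page in pages]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_sessions(vectors, threshold=0.5):
    """Greedy clustering: join the first cluster whose centroid is similar enough."""
    clusters = []  # each cluster: {"members": [indices], "centroid": [weights]}
    for index, vector in enumerate(vectors):
        for cluster in clusters:
            if cosine(vector, cluster["centroid"]) >= threshold:
                cluster["members"].append(index)
                size = len(cluster["members"])
                # Update the running mean of the member vectors.
                cluster["centroid"] = [
                    (mean * (size - 1) + x) / size
                    for mean, x in zip(cluster["centroid"], vector)
                ]
                break
        else:
            clusters.append({"members": [index], "centroid": list(vector)})
    return clusters


def top_pages(cluster, pages, k=2):
    """Characterize a cluster by its k highest-weighted pages."""
    ranked = sorted(zip(pages, cluster["centroid"]), key=lambda pw: pw[1], reverse=True)
    return [page for page, _ in ranked[:k]]


if __name__ == "__main__":
    pages = ["home", "search", "item", "checkout", "help"]
    sessions = [
        {"home": 1, "search": 5, "item": 4},
        {"home": 1, "search": 4, "item": 5},
        {"home": 1, "help": 6},
        {"help": 5},
    ]
    vectors = [session_vector(s, pages) for s in sessions]
    for cluster in cluster_sessions(vectors):
        print(cluster["members"], top_pages(cluster, pages))
```

The greedy scheme depends on session order; it is used here only because it is short, and any standard clustering method could take its place.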

Idea #2: Markov model
– Pages (or page categories) as states, or page + parameters as nodes
– Transition probabilities between nodes
Idea #3: Graph partitioning
– Pages as nodes
– Edges as connectivity/weight between a pair of pages (co-occurrence, time difference, etc.)
– Graph partitioning to find groups of strongly correlated pages
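As a sketch of Idea #2, the code below estimates first-order transition probabilities from page sequences and scores a session by its average log-likelihood per transition, so that unusually low scores can be flagged as candidate anomalies. The additive smoothing, the `floor` value for unseen pages, and the function names are assumptions made for illustration; the slides do not specify an estimator or a scoring rule.

```python
import math
from collections import defaultdict


def fit_transitions(sessions, smoothing=1.0):
    """Estimate P(next page | current page) with additive smoothing (smoothing > 0)."""
    counts = defaultdict(lambda: defaultdict(float))
    states = set()
    for session in sessions:
        states.update(session)
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1.0
    ordered = sorted(states)
    probs = {}
    for current in ordered:
        total = sum(counts[current].values()) + smoothing * len(ordered)
        probs[current] = {
            following: (counts[current][following] + smoothing) / total
            for following in ordered
        }
    return probs


def avg_log_likelihood(session, probs, floor=1e-6):
    """Mean log-probability per transition; lower means more unusual."""
    steps = list(zip(session, session[1:]))
    if not steps:
        return 0.0
    total = 0.0
    for current, following in steps:
        # Pages never seen in training fall back to a small floor probability.
        total += math.log(probs.get(current, {}).get(following, floor))
    return total / len(steps)


if __name__ == "__main__":
    training = [["home", "search", "item", "checkout"]] * 3
    model = fit_transitions(training)
    for session in (["home", "search", "item", "checkout"],
                    ["checkout", "home", "checkout", "home"]):
        print(session, round(avg_log_likelihood(session, model), 3))
```

Averaging per transition keeps long and short sessions comparable; a threshold on this score would have to be chosen from real click-stream data.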

Projects under plan: – Novel biometrics

Palm print photo

Touch panel: hand drawing