Motivation and Problem Statement
The problem addressed in this survey is predicting a user's location purely from content the user produces, such as tweets and blog posts. The surveyed work also covers inferring a user's personal information and the methods proposed to defend against such inference and protect privacy. Predicting a user's geolocation enables content personalization, e.g. targeted advertising, publishing locally relevant stories, and public-health web mining. Solving this problem allows a user's location to be predicted without relying on the IP address or private user information.
Our Contributions
The geographical scope of online content such as Twitter, web pages, digital libraries, search engine query logs, and even web users has attracted many researchers. Based on the papers we surveyed, the techniques for predicting user geolocation can be classified into 3 categories:
1. Content analysis with terms in a gazetteer
2. Content analysis with probabilistic language models
3. Inference via social relations
Classification
Content analysis with terms in a gazetteer: The location of a user is estimated from web content using geo-related terms found in a specialized external knowledge base, the gazetteer. This involves extracting addresses, postal codes, and other information listed in a geographical gazetteer from the web content to identify the associated geographical scope of web pages and blogs.
Geolocation techniques: content analysis with terms in a gazetteer; content analysis with probabilistic language models; inference via social relations.
Content analysis with probabilistic language models: The probabilistic models are built from labelled data, such as photos tagged on Flickr. Using these models and Bayesian inference, the location of a photo is estimated.
Inference via social relations: In the case of private information, a user's private attributes may be inferred from the user's social relations. Many researchers assume that users who are connected in a social network usually share common attributes.
Content Analysis with terms in a gazetteer
M1: The system is implemented within the WebFountain data-mining framework developed at IBM Research. The framework helps in crawling the web, storing the resulting pages, and indexing their contents. The process used 600 pages containing over 7000 geotags and the following collections of pages: the Arbitrary collection, the gov collection, and the ODP collection.
M2: The method proposed by the authors extracts all geographic location entities mentioned in a blog. Using the disambiguated locations, the method determines the geographic focus of the blog.
M3: The main aim of the algorithm is to determine, for a web page or page segment, the set of places it describes. As a web page is essentially a document tree, the algorithm traverses the tree in a depth-first fashion to construct segments and a segment tree.
M4: Toponym disambiguation is used. It consists of identifying, categorizing, and disambiguating names. The method applied by the authors depends on evidence both internal and external to the text. A heuristic method is used, since the emphasis is on identifying the geographic terms. The algorithm sorts terms into geographic and non-geographic ones, then classifies them accordingly.
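The gazetteer-based pipeline described in M2 and M4 (extract place mentions, disambiguate them, then derive a geographic focus) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy `GAZETTEER`, the function names, and the disambiguation rule (prefer the country supported by unambiguous mentions) are hypothetical and are not taken from any of the surveyed papers.

```python
from collections import Counter

# Hypothetical toy gazetteer: lower-cased place name -> candidate (city, country) senses.
GAZETTEER = {
    "paris": [("Paris", "France"), ("Paris", "United States")],
    "lyon": [("Lyon", "France")],
    "marseille": [("Marseille", "France")],
    "austin": [("Austin", "United States")],
}


def extract_mentions(text):
    """Return the gazetteer names found in the text, in order of appearance."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    return [t for t in tokens if t in GAZETTEER]


def disambiguate(mentions):
    """Resolve each mention to one sense using the unambiguous mentions as context."""
    # Count the countries supported by names that have exactly one sense.
    context = Counter(
        GAZETTEER[m][0][1] for m in mentions if len(GAZETTEER[m]) == 1
    )
    resolved = []
    for m in mentions:
        # max() keeps the first listed sense on ties, which acts as a default sense.
        resolved.append(max(GAZETTEER[m], key=lambda sense: context[sense[1]]))
    return resolved


def geographic_focus(text):
    """Return the country mentioned most often after disambiguation, or None."""
    resolved = disambiguate(extract_mentions(text))
    if not resolved:
        return None
    countries = Counter(country for _, country in resolved)
    return countries.most_common(1)[0][0]
```

For instance, `geographic_focus("We drove from Lyon to Paris, then on to Marseille.")` resolves the ambiguous "Paris" to the French sense because the other mentions point to France.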
Content Analysis with Probabilistic Language Models
Method 1: TweoLocator
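You are where you Tweet
Baseline location estimation: Based on maximum likelihood estimation, the probability distribution over cities for a word w is formalized as p(i|w), the likelihood that w was issued by a user located in city i.
Identifying local words in tweets: Local words, i.e. words that are specific to a location, are identified. Each word is characterized by a focus C and a dispersion alpha, which are related through the model C d^(-alpha), the probability of observing the word at distance d from its geographic center.
Laplace smoothing: Applied to the per-word city distributions so that cities in which a word was never observed do not receive zero probability.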
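As a rough illustration of the baseline estimator, the sketch below computes p(i|w) from word counts with add-one (Laplace) smoothing and scores each city by summing p(i|w) p(w) over the words of a user's text. The function names, the whitespace tokenizer, and the toy data are assumptions made for this example; it is a simplified reading of the baseline, not the authors' implementation.

```python
from collections import Counter, defaultdict


def train(tweets, cities, smoothing=1.0):
    """tweets: list of (city, text) pairs. Returns (p_city_given_word, p_word)."""
    counts = defaultdict(Counter)  # counts[word][city] = occurrences of word in city
    total_words = 0
    for city, text in tweets:
        for w in text.lower().split():
            counts[w][city] += 1
            total_words += 1

    p_city_given_word = {}
    p_word = {}
    for w, per_city in counts.items():
        n_w = sum(per_city.values())
        p_word[w] = n_w / total_words
        # Laplace smoothing: every city gets a pseudo-count, so no probability is zero.
        p_city_given_word[w] = {
            c: (per_city[c] + smoothing) / (n_w + smoothing * len(cities))
            for c in cities
        }
    return p_city_given_word, p_word


def estimate_city(text, cities, p_city_given_word, p_word):
    """Score each city by the sum of p(city|w) * p(w) over known words in the text."""
    scores = {c: 0.0 for c in cities}
    for w in text.lower().split():
        if w not in p_word:
            continue  # words never seen in training carry no evidence
        for c in cities:
            scores[c] += p_city_given_word[w][c] * p_word[w]
    return max(scores, key=scores.get)
```

With a handful of training tweets labelled "Houston" and "Seattle", a call such as `estimate_city("rockets traffic", cities, *train(tweets, cities))` picks the city whose tweets contained those words.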
Placing Flickr photos on a Map
In this framework a location is represented by a multinomial probability distribution over the vocabulary of tags. The locations are then ranked by the probability that they generate the tag set of the image. The ranking is refined in several ways:
1. Tag-based smoothing with neighbours
2. Smoothing cell relevance probabilities: it is reasonable to assume that "good" locations come from "good" neighbourhoods.
3. Boosting geo-related tags
4. Spatial ambiguity-aware smoothing
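The idea of ranking locations by their probability of generating a photo's tags can be sketched as follows. Each grid cell gets a tag language model, and cells are ranked by the log-likelihood of the query tags. The smoothing used here (linear interpolation with a global tag model, weight `lam`) is an assumption chosen for brevity and stands in for the smoothing variants listed above; the cell identifiers and function names are hypothetical.

```python
import math
from collections import Counter


def build_models(photos):
    """photos: list of (cell_id, [tags]). Returns per-cell tag counts and global counts."""
    cell_counts = {}
    global_counts = Counter()
    for cell, tags in photos:
        cell_counts.setdefault(cell, Counter()).update(tags)
        global_counts.update(tags)
    return cell_counts, global_counts


def log_likelihood(tags, counts, global_counts, lam=0.8):
    """Log-probability that a cell's smoothed tag model generates the given tags."""
    cell_total = sum(counts.values())
    global_total = sum(global_counts.values())
    vocab = len(global_counts)
    score = 0.0
    for t in tags:
        p_cell = counts[t] / cell_total if cell_total else 0.0
        # Add-one estimate of the global model keeps unseen tags above zero.
        p_global = (global_counts[t] + 1) / (global_total + vocab + 1)
        score += math.log(lam * p_cell + (1 - lam) * p_global)
    return score


def rank_cells(tags, cell_counts, global_counts):
    """Return cell ids sorted from most to least likely to have generated the tags."""
    return sorted(
        cell_counts,
        key=lambda c: log_likelihood(tags, cell_counts[c], global_counts),
        reverse=True,
    )
```

The first element of `rank_cells(tags, cell_counts, global_counts)` is the predicted cell for a photo with the given tags.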
Mapping the world's photos
Estimating location from visual features and tags: The authors report results using both Bayesian classifiers and linear support vector machines.
Features: The experiments considered image features only, text features only, and a combination of both. Visual features have the advantage that they are inherent to the photo itself, whereas textual tags are only available if a human user has added them, and even then they can be irrelevant to geo-classification.
Visual features: Invariant interest point detection has been a popular technique for handling variations.
Representative images: The main difference is that detailed 3D information about a landmark is neither reconstructed nor used; instead, canonical images for each landmark are found among vast amounts of data.
Social Network Classification Incorporating Link Types
Two new Bayesian classification methods are proposed: the link-type relational Bayes classifier (Link Type rBC) and the weighted link-type Bayes classifier.
Link Type: The algorithm uses link types to differentiate what kind of link two individuals share, and this information enters the probability calculations.
Weighted Link: The authors add weights to the Link Type rBC in order to differentiate the priority of the link types in the dataset, such as director, producer, hero, etc.
Inferring Private Information Using Social Network Data
The paper considers ways to infer private information via friendship links by creating a Bayesian network from the links inside a social network. The authors wrote a program that crawls the Facebook network to collect the data used in the experiment. They implemented a new version of the traditional Naive Bayes classifier to create a list of the most representative traits in the graph, so that the K links and the K most predictive traits can be removed.
Find Me If You Can: Improving Geographical Prediction with Social and Spatial Proximity
The authors introduce an algorithm that predicts the location of an individual from a sparse set of located users, with performance that exceeds IP-based geolocation. Knowledge of a user's location helps in providing improved services and security to users.
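One way to picture location prediction from located friends is a maximum-likelihood search: assume the probability of friendship decays with distance, then pick the candidate location that makes the observed friends most likely. The sketch below is a simplified illustration only. The decay function `friendship_prob` and its parameters `a`, `b`, `c` are hypothetical placeholders, candidates are restricted to the friends' own locations, and the contribution of non-friends is ignored.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def friendship_prob(d_km, a=0.2, b=1.0, c=1.0):
    """Hypothetical decaying probability of friendship at distance d_km."""
    return a * (b + d_km) ** (-c)


def predict_location(friend_locations):
    """Return the friend location that maximizes the log-likelihood of all friendships."""
    best, best_score = None, -math.inf
    for candidate in friend_locations:
        score = sum(
            math.log(friendship_prob(haversine_km(candidate, f)))
            for f in friend_locations
        )
        if score > best_score:
            best, best_score = candidate, score
    return best
```

Given three friends near one city and one friend far away, `predict_location` returns a point in the dense cluster, because a nearby candidate explains most friendships with short distances.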
Inferring Privacy Information via Social Relations
The algorithm chooses discriminative groups in the social network, extracts the sub-network formed by those groups, and finally classifies users with missing labels only within this sub-network. The algorithm stops when no more users can be classified.
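The iterative classification step, including its stopping condition, can be sketched as a simple label propagation loop. This is an assumption-laden illustration: the selection of discriminative groups is taken as already done (the `edges` passed in are assumed to be the extracted sub-network), and the majority vote over labelled neighbours is a stand-in for whatever classifier the paper uses.

```python
from collections import Counter


def iterative_classify(edges, labels, min_votes=1):
    """edges: iterable of (u, v) pairs in the sub-network.
    labels: dict mapping already-labelled users to their label.
    Returns a new dict that also contains every user that could be classified."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    labels = dict(labels)  # do not mutate the caller's dict
    while True:
        newly_labelled = {}
        for user, friends in neighbours.items():
            if user in labels:
                continue
            votes = Counter(labels[f] for f in friends if f in labels)
            if votes:
                label, count = votes.most_common(1)[0]
                if count >= min_votes:
                    newly_labelled[user] = label
        if not newly_labelled:
            break  # stop when no more users can be classified
        labels.update(newly_labelled)
    return labels
```

Users in components without any labelled member are never classified, which matches the stopping rule: the loop ends as soon as a full pass assigns no new label.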
Conclusion
The 12 surveyed papers were classified into 3 categories of techniques that can be used to predict the location of an individual. These categories draw on the massive human-powered sensing capabilities of Twitter and related microblogging services, which depend heavily on the presence of location information. To predict user information without using the IP address, the authors proposed new content-based techniques that enable location-based personalization of data. Methods such as the probabilistic models were reported to predict the location of an individual accurately. Understanding which private information can be extracted from users' social relations helps in defending users' personal data from attackers.
Future Work
The methods described are purely data-driven approaches, which can be refined by incorporating more data, i.e. at a global scale. Future work could take city population into account when normalizing the probability of a word occurring in different cities. Accuracy can be improved by refining the "disambiguating context" heuristics and by devising additional ones. Heuristics based on the coordinates of places should be compared with the taxonomy-based methods. Future work can also combine content-based approaches with social ties and incorporate temporal information into location estimation.
References
1. Einat Amitay, Nadav Har'El, Ron Sivan, Aya Soffer. Web-a-Where: Geotagging Web Content. SIGIR '04: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 273-280.
2. Wenbo Zong, Dan Wu, Aixin Sun, Dion Hoe-Lian Goh. On Assigning Place Names to Geography Related Web Pages. JCDL '05: Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 354-362.
3. Clayton Fink, Christine Piatko, James Mayfield, Danielle Chou, Tim Finin, Justin Martineau. The Geolocation of Web Logs from Textual Clues. CSE '09: International Conference on Computational Science and Engineering, 29-31 Aug. 2009.
4. David A. Smith, Gregory Crane. Disambiguating Geographic Names in a Historical Digital Library. ECDL '01: Proceedings of the 5th European Conference on Research and Advanced Technology for Digital Libraries, pages 127-136.
5. Rodolfo Gonzalez, Gerardo Figueroa, Yi-Shin Chen. TweoLocator: A Non-Intrusive Geographical Locator System for Twitter. LBSN '12: Proceedings of the 5th ACM SIGSPATIAL International Workshop on Location-Based Social Networks, pages 24-31.
6. Zhiyuan Cheng, James Caverlee, Kyumin Lee. You Are Where You Tweet: A Content-Based Approach to Geo-locating Twitter Users. CIKM '10: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pages 759-768.
7. Pavel Serdyukov, Vanessa Murdock, Roelof van Zwol. Placing Flickr Photos on a Map. SIGIR '09: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 484-491.
8. David Crandall, Lars Backstrom, Daniel Huttenlocher, Jon Kleinberg. Mapping the World's Photos. WWW '09: Proceedings of the 18th International Conference on World Wide Web, pages 761-770.
9. R. Heatherly, M. Kantarcioglu, B. Thuraisingham. Social Network Classification Incorporating Link Types. ISI '09: IEEE International Conference on Intelligence and Security Informatics, 8-11 June 2009.
10. Jack Lindamood, Raymond Heatherly, Murat Kantarcioglu. Inferring Private Information Using Social Network Data. WWW '09: Proceedings of the 18th International Conference on World Wide Web, pages 1145-1146.
11. Lars Backstrom, Eric Sun, Cameron Marlow. Find Me If You Can: Improving Geographical Prediction with Social and Spatial Proximity. WWW '10: Proceedings of the 19th International Conference on World Wide Web, pages 61-70, 2010.
12. Wanhong Xu, Xi Zhou, Lei Li. Inferring Privacy Information via Social Relations. ICDEW 2008: IEEE 24th International Conference on Data Engineering Workshop, 7-12 April 2008.