HOMEPAGE & SEARCH ENGINE, 2008.12.08. Contents: 2. About Cloud Computing; 3. Application Introduction (Nutch, Google App Engine); 4. Presentation.


HOMEPAGE & SEARCH ENGINE

Contents
• 2. About Cloud Computing
• 3. Application Introduction
  - Nutch
  - Google App Engine
• 4. Presentation

2. ABOUT CLOUD COMPUTING

2. What is Cloud computing?
• Cloud computing is the Internet-based ("cloud") development and use of computer technology ("computing").
• Cloud computing is a general concept that incorporates software as a service (SaaS), Web 2.0 and other recent, well-known technology trends, in which the common theme is reliance on the Internet for satisfying the computing needs of the users.

3. APPLICATION INTRODUCTION

3-1. What is 'Nutch'?
• Open source web-search software based on Lucene
• Originally a sub-project of the Apache Lucene project
• Its purpose is to make Lucene easier to use
• Lucene Java: Apache's very well-known open source search engine library

What is Nutch?
• Transparency. Nutch is open source, so anyone can see how the ranking algorithms work.
• Understanding. Nutch has been built using ideas from academia and industry; for instance, core parts of Nutch are currently being re-implemented to use the MapReduce distributed processing model. Nutch is attractive for researchers who want to try out new search algorithms, since it is so easy to extend.

What is Nutch?
• Extensibility. Nutch is very flexible: it can be customized and incorporated into your application. For developers, Nutch is a great platform for adding search to heterogeneous collections of information, customizing the search interface, or extending the out-of-the-box functionality through the plugin mechanism.

3-1. What is Nutch?
• Nutch divides naturally into two pieces: the crawler and the searcher.
• Crawl
  - Collects pages
  - Builds an index of the pages
  - The index acts as the bridge between Crawl and Search
• Search
  - Finds and displays the information needed in response to a user's request
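The role of the index as a bridge between the two pieces can be illustrated with a toy inverted index. This is a minimal sketch, not Nutch's or Lucene's actual implementation: the function names `build_index` and `search`, and the sample pages, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

def build_index(pages):
    """Crawler side: map each lowercase word to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Searcher side: return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

# Hypothetical crawled pages (URL -> extracted text), for illustration only.
PAGES = {
    "http://example.com/a": "Nutch is an open source search engine",
    "http://example.com/b": "Lucene is a search library",
}

if __name__ == "__main__":
    idx = build_index(PAGES)
    print(search(idx, "search engine"))
```

The crawler only writes the index and the searcher only reads it, which is why the two pieces can run separately.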

What is Nutch?
• More detail about the crawler: the Nutch crawler system produces three key data structures:
  - The WebDB, containing the web graph of pages and links.
  - A set of segments, containing the raw data retrieved from the Web by the fetchers.
  - The merged index, created by indexing and de-duplicating parsed data from the segments.

What is Nutch?
• More detail about the searcher: Nutch looks for the index and the segments in the index and segments subdirectories of the directory defined in the searcher.dir property. The default value for searcher.dir is the current directory (.), which is where you started Tomcat.
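A quick way to check whether a directory has the layout the searcher expects is sketched below. It assumes only what the slide states (subdirectories named index and segments under searcher.dir); the helper names `check_searcher_dir` and `demo` are hypothetical and not part of Nutch.

```python
import tempfile
from pathlib import Path

def check_searcher_dir(searcher_dir="."):
    """Report whether the 'index' and 'segments' subdirectories exist."""
    base = Path(searcher_dir)
    return {name: (base / name).is_dir() for name in ("index", "segments")}

def demo():
    """Build a throwaway directory that has 'index' but no 'segments'."""
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "index").mkdir()
        return check_searcher_dir(tmp)

if __name__ == "__main__":
    # With the default ".", this checks the directory the script is run from,
    # mirroring the default value of searcher.dir.
    print(check_searcher_dir())
```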

3-1. What is Nutch?
1. Generate a list of URLs from the crawl db.
2. Fetch the list of URLs in the segment.
3. Parse the contents fetched in the segment.
4. Update the crawl db with the parsed data from the segment.
5. Invert the links from the segments.
6. Build an index over the segment documents and the anchor text.
This part is run repeatedly.
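The six steps above can be sketched as a small in-memory simulation. This is an illustrative toy under stated assumptions, not Nutch code: `TOY_WEB` stands in for the real Web, the `crawl` function and its `depth` and `top_n` parameters only loosely mirror the options on the Nutch command line, and the list of linking pages stands in for anchor text in step 6.

```python
# Toy web graph standing in for the real Web: URL -> (page text, outgoing links).
TOY_WEB = {
    "http://a.example/": ("home page", ["http://b.example/", "http://c.example/"]),
    "http://b.example/": ("about nutch", ["http://c.example/"]),
    "http://c.example/": ("contact", []),
}

def crawl(seeds, depth, top_n):
    crawl_db = {url: "unfetched" for url in seeds}  # URL -> fetch status
    segments = []                                   # one dict per crawl round
    for _ in range(depth):
        # 1. Generate: pick up to top_n unfetched URLs from the crawl db.
        fetch_list = sorted(u for u, s in crawl_db.items() if s == "unfetched")[:top_n]
        if not fetch_list:
            break
        segment = {}
        for url in fetch_list:
            # 2. Fetch and 3. parse (the toy web is stored already parsed).
            text, links = TOY_WEB.get(url, ("", []))
            segment[url] = {"text": text, "outlinks": links}
        segments.append(segment)
        # 4. Update the crawl db with fetch status and newly discovered links.
        for url, doc in segment.items():
            crawl_db[url] = "fetched"
            for link in doc["outlinks"]:
                crawl_db.setdefault(link, "unfetched")
    # 5. Invert links: for each target URL, record which pages link to it.
    inlinks = {}
    for segment in segments:
        for url, doc in segment.items():
            for link in doc["outlinks"]:
                inlinks.setdefault(link, set()).add(url)
    # 6. Index: combine each document's text with its inlink sources.
    index = {
        url: {"text": doc["text"], "inlinks": sorted(inlinks.get(url, ()))}
        for segment in segments
        for url, doc in segment.items()
    }
    return crawl_db, segments, index

if __name__ == "__main__":
    db, segs, idx = crawl(["http://a.example/"], depth=3, top_n=10)
    print(len(segs), idx["http://c.example/"]["inlinks"])
```

In this sketch, steps 1 to 4 form the loop that runs once per depth level, and steps 5 and 6 run once at the end over all segments.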

What is Nutch?
• How to run Nutch
  - Start crawling from the directory where Nutch is installed:
    >> /bin/nutch crawl -dir urls crawl -depth 3 -topN 10
  - Start Tomcat 5.5. Note: Tomcat must be started from the Nutch directory:
    >> /opt/apache-tomcat /bin/catalina.sh start

3-2. Development environment of Nutch
• Nutch 0.9, from the apache-nutch homepage
• Java JDK 6
• Tomcat version 5.5 or later
• OS: Linux server edition
  - Cygwin for Windows developers

What is 'Google App Engine'?
• Google's cloud computing project
• Google's web application platform
• Easy to build, easy to maintain, and easy to scale as your traffic and data storage needs grow
• No servers to maintain with App Engine: just upload an application, and it's ready to serve your users

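3-3. What is 'Google App Engine'?
• Features provided by Google App Engine
  - The standard features of Python (because it is built with Python)
  - A robust Datastore backed by BigTable/GFS technology (a Google-built database that fills the role of existing databases such as Oracle and MySQL)
  - Hosting space that provides scalability
  - Free 'Google' account
  - Local development and testing using the SDK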

3-3. What is 'Google App Engine'?
Google's motto: "Web Development that doesn't hurt"
With Google App Engine, web service developers gain the option of developing without additional pain. Google App Engine provides load balancing, automatic scaling, dynamic web serving and so on, so developers can concentrate on building the application without worrying about the rest.
However, this choice comes with three constraints.
1. All code must be written in Python. (Perl support is currently under development.)
2. Usage limits mean that you may have to pay. The free quota is 500MB of persistent storage and enough CPU and bandwidth for about 5 million page views a month.
3. All data lives on the Google platform and ends up in Google's hands. This makes it hard for an application that depends on the Google platform to leave it.
The third constraint is the most critical one for Google App Engine.
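To get a feel for the free quota, the monthly page-view figure can be converted into an average request rate. This is a back-of-the-envelope sketch: the 5 million figure comes from the slide, while the 30-day month and the function name `average_rate` are assumptions made for illustration.

```python
PAGE_VIEWS_PER_MONTH = 5_000_000   # figure quoted on the slide
DAYS_PER_MONTH = 30                # assumption: a 30-day month

def average_rate(views_per_month=PAGE_VIEWS_PER_MONTH, days=DAYS_PER_MONTH):
    """Return (views per day, views per second) averaged over the month."""
    per_day = views_per_month / days
    per_second = per_day / (24 * 60 * 60)
    return per_day, per_second

if __name__ == "__main__":
    per_day, per_second = average_rate()
    print(f"{per_day:,.0f} views/day, {per_second:.2f} views/second")
```

Under these assumptions the free quota corresponds to roughly 167 thousand page views a day, or just under two per second on average; real traffic is bursty, so peaks would be higher.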

What is Google App Engine?
• How to run Google App Engine
  - Move to the directory where Google-engine is installed
  - Google-engine run commands:
    dev_appserver.py bono/ : for testing
    appcfg.py update bono/ : uploads to the Web
    (the ID and password must be entered on every upload)
  - Check the result screen
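As a rough idea of the kind of Python code that would live in an application directory such as bono/, here is a minimal request handler. It is a sketch under stated assumptions: it is a plain WSGI callable written against the Python standard library and modern Python 3, not App Engine's own framework or the Python 2.5 runtime mentioned in the slides, and the name `application` and the greeting text are hypothetical. A real App Engine project also needs its app.yaml configuration, which is omitted here.

```python
def application(environ, start_response):
    """Answer every request with a short plain-text greeting."""
    body = b"Hello from bono!\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]

if __name__ == "__main__":
    # Local test server from the standard library; this only stands in
    # for the role dev_appserver.py plays during development.
    from wsgiref.simple_server import make_server
    with make_server("localhost", 8080, application) as server:
        server.serve_forever()
```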

3-4. The Development Environment
• Google App Engine, using the App Engine software development kit (SDK)
• Python 2.5
  - On Windows, an active Python installation is required
• OS: Windows, Mac OS X, Linux

4. PRESENTATION

4. Presentation
• Nutch
• Google App Engine + Nutch
• Another example of using Google App Engine