Cluster / Grid with Web and Semantic Services

Dr G Sudha Sadasivam, Professor, CSE, PSG College of Technology, Coimbatore

Agenda
• Web Services
• SOA
• Semantics
• Grid Architecture
• 3rd Generation Grid Architecture
• Semantic Grid
• Cluster Architecture: Hadoop
• Amazon Web Services
• Work at Grid and Cloud Computing Lab, PSGCT

ORGANISING A BIRTHDAY PARTY?

PRODUCTS AND SERVICES – A TRADITIONAL WAY OF DISCOVERING AND ACCESSING

INFORMATION SERVICES

1. Web Service
• A service is a set of actions that form a coherent whole from the point of view of service providers and service requesters, for example arranging a birthday party.
• Web services provide a standard means of interoperating between different software applications, running on a variety of platforms and/or frameworks, in a transparent and loosely coupled manner.
• A Web service is a software system designed to support interoperable machine-to-machine interaction. It:
  – has an interface described in a machine-processable format (WSDL)
  – communicates using standard SOAP messages, typically conveyed over HTTP with an XML serialization, in conjunction with other Web-related standards
  – can be listed in a UDDI registry
  – is identified by a URI
• A Web service is an entity that can be described (using WSDL), published, discovered, and invoked by a client.
• The technology is standardized through the W3C process.
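As a rough illustration of the kind of SOAP message a requester might send, the sketch below builds an envelope with Python's standard library. The `ArrangeParty` operation, its parameters and the `http://example.org/party` namespace are invented for illustration; only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/party"  # hypothetical service namespace (assumption)


def build_request(operation: str, params: dict) -> bytes:
    """Wrap an operation call and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        # Each parameter becomes a child element of the operation element.
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


# Hypothetical request for the birthday-party example.
message = build_request("ArrangeParty", {"guests": 20, "date": "2010-05-01"})
```

In a real deployment the element names and types would come from the service's WSDL rather than being chosen by hand.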

Web Service Interactions

COMPONENTS
A Web service is an abstract notion that is implemented by a concrete agent.
Elements
• Provider entity: the person or organization that provides an appropriate agent to implement a particular service.
• Requester entity: a person or organization that wishes to make use of a provider entity's Web service.
• Registry: where the services are registered.
Web Service Discovery
• Before message exchange, the requester entity and the provider entity must first agree on both the semantics and the mechanics of the message exchange.
• The Web service description (WSD) covers message formats, datatypes, transport protocols, and transport serialization formats. It represents a contract governing the mechanics of interacting with a particular service.
• The semantics represent a contract governing the meaning (consequence and purpose) of that interaction.

2. SOA
Aim: alignment of business needs with IT
• An architectural style for building enterprise solutions based on services.
• SOA is a blueprint that governs creation, deployment, execution and management of reusable business services.
• WSA (Web Services Architecture) is an instance of SOA; the architecture itself is independent of technology.
• Services provide independent, loosely coupled, transparent, composable invocation of tasks in a standard way.
• SOA separates functions into distinct units (services), which can be distributed over a network and can be combined and reused to create business applications.
• These services communicate with each other by passing data from one service to another, or by coordinating an activity between two or more services.
• Guiding principles: reusability, open standards.
• Services: identification, categorization, monitoring and tracking.

Alignment of Business needs with IT


Service Oriented architecture
• Services created using an SOA and provided by an organisation's IT should directly support the services that the organisation provides to its customers (business process to IT).
• Delivery modes: human-mediated service, self-service, system-to-system delivery service.
• [Figure labels: service oriented architecture, contract, legacy system, new system, composite system.]
• SOA is a blueprint that governs creation, deployment, execution and management of reusable business services.
• It aligns business and technology.
• A service-oriented (SO) business delivers services to its customers.

SOA roles
• Business role: SOA is viewed as a set of services that a business wants to expose to customers and clients.
• Architectural role: SOA is an architectural style that requires a service provider, a service requestor and a service description. It provides services that foster modularity, encapsulation, loose coupling, separation of concerns, reuse, composability and single implementation.
• Implementation role: SOA is a complete programming model (process) with standards, tools, methods, techniques and technologies.

SOA suite
• Model and capture business processes and policies
• Integrate the services using an ESB and orchestrate the services into business processes (BP)
• Develop, connect and bind services to build composite applications
• Deploy composite applications and perform service level management
• Apply runtime policies to services and govern them
• Monitor activity to gain real-time information on business processes

Service
• A service is a manager entity that consists of a collection of components that work together to deliver a business function (e.g. currency conversion, airline reservations).
• A service maps to a business function, whereas a component maps to business entities and the business rules that operate on them.
• Bank teller application components: loan component, savings bank component (with withdrawal / deposit), account manager (to create new accounts).
• Services: the interfaces of all components in the group can be composed and exposed as services, e.g. creation of new accounts, withdrawal and deposit services, and a loan service.

[Figure labels: services, business goals (supports), service description (has), combined, dynamic reconfiguration, choreography.]

UI, Business processes, Service Layer, Component Layer, Object Layer
• Presentation: portal for aggregation of contents to users
• Business process layer: automation logic, orchestration of services
• Service layer: collection of units of work (interfaces), processing logic
• Component layer: operations that are units of work, SLA
• Object layer / legacy: messages for communication (operational)

Terms in SOA
• Services
• Service provider
• Service consumer (or service requestor)
• Service locator or service registry
• Service broker: passes service requests to one or more service providers

SOA LIFE CYCLE
• Incremental and iterative, driven by business drivers
• Expose: creation of services from existing or new components
• Compose: combine existing services
• Consume: use services
• Consumer view: service identification, service categorisation, service exposure, choreography, QoS
• Provider view: component identification, component specification, service realisation, service management, standards implementation

Advantages
• Standardisation
• Faster time to market
• Operational efficiency and adaptability
• Agility to collaborate
• Continuous improvement
• Aligns business to IT
• Ease of introducing new technologies
• Return on Investment (ROI)
• Vendor diversity
• Services: encapsulation, loose coupling, contract, reuse, composability, autonomy, dynamism, higher granularity

SERVICE ORIENTED ARCHITECTURE
• Transport layer
• Service Communication Protocol (ESB)
• Service Description
• Service Business Process
• Service Registry
• Policy
• Security
• Transaction
• Management

Problems in Web services (point-to-point)
• Service consumers need to be modified whenever the service provider interface changes (dynamic change).
• Every consumer needs a suitable protocol adapter for each provider it is connected to (interoperability).
ESB
• The Enterprise Service Bus (ESB) is an enterprise-class messaging bus. It acts as a mediator that transforms, routes, notifies and augments information.
• It provides virtualization of the enterprise resources.
• It offers the following facilities:
  – messaging infrastructure
  – message transformation between consumer and provider
  – content-based routing between service consumers and providers
  – conversion of transport protocols between consumer and provider
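The mediator role of an ESB can be sketched in a few lines. This is a toy model, not a real ESB product: the `ServiceBus` class, the message fields and the two providers are all hypothetical. It only shows content-based routing plus a per-route message transformation, so the consumer never talks to a provider interface directly.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]
Predicate = Callable[[Message], bool]
Transform = Callable[[Message], Message]
Provider = Callable[[Message], str]


class ServiceBus:
    """Toy mediator: picks a provider by message content, transforms, delivers."""

    def __init__(self) -> None:
        self._routes: List[Tuple[Predicate, Transform, Provider]] = []

    def register(self, matches: Predicate, transform: Transform, provider: Provider) -> None:
        self._routes.append((matches, transform, provider))

    def send(self, message: Message) -> str:
        for matches, transform, provider in self._routes:
            if matches(message):
                # The consumer's format is adapted to the provider's format here.
                return provider(transform(message))
        raise LookupError("no provider registered for this message")


def loan_provider(msg: Message) -> str:
    return f"loan request accepted for {msg['customer_id']}"


def savings_provider(msg: Message) -> str:
    return f"deposited {msg['amount']}"


bus = ServiceBus()
bus.register(lambda m: m["type"] == "loan",
             lambda m: {"customer_id": m["customer"]},  # rename field for the provider
             loan_provider)
bus.register(lambda m: m["type"] == "deposit", lambda m: m, savings_provider)

reply = bus.send({"type": "loan", "customer": "C42"})
```

If the loan provider later changes its expected field name, only the transform registered on the bus changes; the consumer's message stays the same.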

SOA based Web services
• Transport layer (HTTP, JMS, SMTP)
• Service Communication Protocol (SOAP)
• Service Description (XML, WSDL)
• Service Business Process (BPEL)
• Service Registry (UDDI)
• Policy (WS-Policy)
• Security (WS-Security)
• Transaction (WS-Transaction)
• Management (WS-Manageability)

[Figure: SAHANA architecture. Presentation / UI and channel access (wired Internet, office systems, laptop/PDA/cell, web client, mobile, SMS alerts); business processes; business services offered by the server grid (missing person, organisation registry, camps registry, request management, shelter, person and family search, volunteer, camp and request matching, aid placement); search procedures; DDoS protection and load balancing; responders.]

• Missing person's registry with efficient search
• Organisation registry with efficient match and volunteer coordination
• Camps registry
• Request management registry with inventory management and optimisation (search)
• Shelter registry
• Messaging alerts
• Damages registry
• Grid management module to manage coordination efforts among districts and relief organisations
• Bulletin board (user area)

SOA – screen shots
1. Organisation Registry
• New organization registration with the system
• Maintaining details about each organization with a unique ID
• Updating an organization's services

DESCRIPTION
• When an organization wants to provide a service, it must give the system its organization name, city and branch.
• By default, every organization that registers for the first time has to provide a single service.
• On successful registration, an automatically generated organization ID is displayed to the organization authority.
• To update the services provided, both the organization ID and the password are validated.
• The available services are displayed in a form, from which the service provider selects its additional services.

[Figure: new organization registration. The service provider supplies org name, city, branch and service to the registration system, which stores them in the organization DB (updation) and returns the organization ID.]

[Figure: organization's service updation. The service provider sends org ID and password to the registration system; the record is retrieved and validated; the validation result and services list are returned; the selected service is sent back for service updation; a service-information-updated form is shown.]

BUSINESS PROCESSES
• Service provider registers with the system
• Service provider login validation
• Services updating
FORMS (3 XForms)
• Login XForm
• Organization details XForm
• Service updated XForm

BUSINESS PROCESS

LOG IN X FORM

ORGANISATION DETAILS X FORM & GETTING DETAILS XML

SERVICE UPDATED X FORM

SERVICE SELECTION XFORM

DATABASE RELATIONSHIPS

P2P
  – Mainly for file sharing
  – Geographically dispersed peers
  – Autonomous nodes
  – Decentralised
Clusters
  – Resource sharing
  – Close to each other, usually homogeneous
  – Centralised control, cooperative working
Shared Memory Computing
  – Parallel systems, multicore
  – Divide and conquer, synchronization
  – Tightly coupled
High Throughput Computing
  – Distributed computing, loosely coupled
  – Disparate, autonomous, heterogeneous systems
  – Computation intensive; sharing, single administration
High Performance Computing
  – Tightly coupled, fine-grain parallelism
  – Homogeneous systems
  – High computing power for a short period
  – Low latency communication
GRID
  – Heterogeneous systems, HTC
  – VO: trust groups, dynamic, cross-organisational
  – Geographically dispersed
  – Resource sharing
  – Scientific; distribution of work among all resources
CLOUD
  – Heterogeneous systems, HPC
  – On-demand resource provisioning over the Internet
  – Data centric with grid backbone, utility value
  – Elastic, business oriented, full utilization of resources
Web Services
  – Application integration
  – Separation of concerns
  – Data integration, interoperability
Virtualisation
  – System integration
  – Viewing a single system as multiple resources
Multi tenancy
  – Sharing a resource among multiple clients

Some Characteristics of Grids
• Numerous resources
• Owned by multiple organizations and individuals
• Connected by heterogeneous, multi-level networks
• Different security requirements and policies
• Different resource management policies
• Geographically separated
• Unreliable resources and environments
• Resources are heterogeneous

Stages to using the Grid – Classical View
[Figure labels, in the order extracted from the diagram:]
• Write (code) to solve the problem
• “Compile” against middleware
• Accounting
• Steering and visualisation
• Stage data
• Submit to Grid
• Security middleware
• Advertise
• Select resources
• Deploy to resources

Technical capabilities
• Resource modeling
• Monitoring and notification
• Allocation
• Provisioning, life-cycle management, and decommissioning
• Accounting and auditing
• Security

G2
• Fabric layer: provides the resources for shared access.
• Connectivity layer: core communication and authentication protocols.
• Resource layer: protocols for secure negotiation, initiation, monitoring, control and accounting on individual resources.
• Collective layer: protocols and services to capture interactions among a collection of resources.
• Application layer: user applications that operate within a VO environment.

G3 - Services - OGSA: service-based infrastructure for the Grid
• The Grid aims to integrate, virtualize, and manage resources and services within distributed, heterogeneous, dynamic “virtual organizations”.
• Standardization is critical to create interoperable, portable, secure, robust, scalable and reusable components and systems.
• The goal is to standardize Grid services by specifying a set of standard interfaces, and to develop a common, standard and open architecture for Grid-based applications.
• A service-oriented architecture based on the Open Grid Services Architecture (OGSA) addresses this need for standardization by defining a set of core capabilities and behaviors that address key concerns in Grid systems.
• OGSA is based on the Grid service (an extension of the Web service).

OGSA realizes the logical middle layer in terms of services, the interfaces these services expose, the individual and collective state of resources belonging to these services, and the interaction between these services within a service-oriented architecture (SOA). The architecture is not layered. Services are loosely coupled peers that work either singly or as part of an interacting group of services.

OGSI
• Requirements not met by Web services were implemented as Grid services conforming to the OGSI specification.
• The OGSI specification defines:
  – how Grid service instances are named and referenced
  – which interfaces and behaviors are common to all Grid services
  – how to specify additional interfaces, behaviors and extensions
• GWSDL (Grid WSDL) introduces:
  – Service Data Elements (SDEs)
  – portType inheritance
• Other OGSI concepts: Grid Service Handle (GSH), Grid Service Reference (GSR), Factory, Handle resolver, Notification, Service groups (light-weight registries)

Service relationships

Grid vs Web services
• Web Services
  – Message exchange
  – Documents
  – No notion of a “pointer”
  – Service orientation?
• Grid Services
  – The architecture encourages everything to be exposed through an interface rather than being sent as a document
  – The GSH is the “pointer”
  – Object orientation? (CORBA?)
• 2-level naming scheme: GSH and GSR
• SDE: Web services offer static discovery, whereas SDEs allow dynamic discovery
• Instantiation and life cycle management: factory

STATEFUL WEBSERVICE


G4 - Grid WSRF
OGSA services are defined and implemented as Web Services.

3. Semantic Web
• Information management approaches: keywords, statistical, natural language, Semantic Web
• Semantic Web architecture:
  – automated conversion and storage of unstructured text in a machine-processable format
  – automatic extraction and processing of the concepts and context in the database, using intelligent techniques
  – metadata captures the meaning of the information

To capture knowledge
• Metadata
• Ontology: a formal specification of information
  – A network of concepts, relationships, and constraints that provides context for data and information as well as processes
  – Classes (concepts) and relationships (hierarchy) in the domain
  – Provides a shared understanding of the domain
  – Ontology languages: XML, RDF, OWL
• Logic: formal languages for representing knowledge with semantics; reasoners infer conclusions
• Agents: pieces of software that work autonomously and proactively, e.g. search personalisation; they use metadata, ontologies and logic

Semantic Web Architecture

Architecture
• Unicode: international encoding standard; any language can be used on the Web in one standardized form.
• Uniform Resource Identifier (URI): uniquely identifies resources (e.g. documents); URL + URN.
• XML: a language for writing structured Web documents with a user-defined vocabulary; used to send documents across the Web.
• RDF: a data model (representation) of Web objects, with an XML-based syntax.

• RDFS: modeling primitives for organising Web objects into hierarchies (taxonomies): class, subclass, properties, domain and range restrictions. It is based on RDF and is used to write ontologies.
• Logic layer: application-specific declarative knowledge (RIF and SWRL).
• Proof layer: the deductive process. SPARQL, an SQL-like language, can be used for querying ontologies and knowledge bases.
• Trust layer: users' trust when using Web services.
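To make the class/subclass idea concrete, here is a minimal sketch of the inference a reasoner draws from a subclass hierarchy: an instance of a class is also an instance of every ancestor class. The banking taxonomy is a made-up example, and the code assumes single inheritance with no cycles; real RDFS reasoners handle far more.

```python
from typing import Dict, Set

# Hypothetical taxonomy: child class -> parent class (single inheritance assumed).
subclass_of: Dict[str, str] = {
    "SavingsAccount": "Account",
    "LoanAccount": "Account",
    "Account": "BankProduct",
}


def superclasses(cls: str) -> Set[str]:
    """Follow subclass links upward and collect every ancestor class."""
    ancestors: Set[str] = set()
    while cls in subclass_of:
        cls = subclass_of[cls]
        ancestors.add(cls)
    return ancestors


def inferred_types(asserted_class: str) -> Set[str]:
    """Types of an individual: the asserted class plus all inferred ancestors."""
    return {asserted_class} | superclasses(asserted_class)


types_of_my_account = inferred_types("SavingsAccount")
```

A query for all `BankProduct` individuals would therefore also return savings accounts, even though nobody labelled them that way explicitly.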

RDF triples
• An RDF statement is a subject-predicate-object triple.
• Example: “Joe Smith has homepage”. The subject is a URI intended to identify Joe Smith, the predicate is the “has homepage” property, and the object is Joe's homepage.

"Joe has family name Smith"
RDF graph describing Joe Smith
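A graph like the one describing Joe Smith can be sketched as a plain list of (subject, predicate, object) tuples with a simple pattern matcher. The identifiers below are placeholders on `example.org`, not the URIs from the original slide, and the predicate names are shortened for readability.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

JOE = "http://example.org/people#joe"  # placeholder URI (assumption)

graph: List[Triple] = [
    (JOE, "hasHomepage", "http://example.org/~joe/"),
    (JOE, "familyName", "Smith"),
    (JOE, "givenName", "Joe"),
]


def match(triples: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


family_name = match(graph, s=JOE, p="familyName")
```

Query languages such as SPARQL generalise this wildcard matching to patterns that span several triples.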

RDFS for the company ( resource) identified by URI Name is Webify Solutions, address is and phone number is WEBIFY.

OWL
• Classes: named classes, intersection classes, union classes, complement classes, restrictions, and enumerated classes
• Properties: object type, data type
• Property types: functional, inverse functional, symmetric, transitive
• Individuals: instances of classes; properties relate them

Need for ontology in IT
Bank
• Offers a number of services that can use the same data, but with redundancy
• New services can be added, reusing existing data and functionality
An ontology-driven approach can
• capture and represent the bank's total product knowledge in a language-neutral form
• deploy the knowledge in a central (shared) repository
• provide a single, unified view of data across its applications
• give precise retrieval of information and seamless enterprise integration
• let business processes and various data sources map to each other through a common meta-model
• eliminate point-to-point integration through a shared ontology
• simplify application integration
• reduce data redundancy and provide the same semantic meaning across applications
• ease the bank's maintenance and upgrades

Need for semantic web
• The WWW holds a vast amount of heterogeneous information.
• Searching is based on contents.
• Semantic meaning attached to content items describes the information precisely.
• The relevancy of information extraction can be improved.
• Services can be tagged with meaning; Web-based software agents can dynamically find these services on the fly and use them for the user's benefit or in collaboration with other services.

Need for semantics in SOA
• In SOA, representations of the available services must be maintained.
• Metadata is used to discover and organize services, to model and assemble services, to encapsulate business logic for dynamic binding, and to manage services.
• Ontologies provide a powerful and flexible way to aggregate, visualize, and normalize the service metadata layer.
• Ontologies enhance service discovery, modeling, assembly, mediation, and semantic interoperability.
• Semantic technologies provide an abstraction layer above existing IT technologies, one that enables the bridging and interconnection of data, content, and processes across business and IT silos.

Semantics for Business
• A business ontology is a formal specification of business concepts and their interrelationships that facilitates machine reasoning and inference.
• A business ontology ties systems together using metadata, much as a database ties together discrete pieces of data.
• With it, organizations can:
  – provide a single, unified view of data across their applications
  – retrieve information precisely
  – simplify enterprise and SOA integration
  – reduce data redundancy
  – provide uniform semantic meaning across applications
  – ease development, maintenance, and upgrades across the enterprise

Grid semantics
• The Grid's vision of sharing diverse resources in a flexible, coordinated and secure manner, through dynamic formation and disbanding of virtual communities, depends strongly on metadata.
• Ad hoc expression and use of metadata causes chronic dependency on human intervention.
• The Semantic Grid is an extension of the Grid in which rich resource metadata is exposed and handled explicitly, and shared and managed via Grid protocols.
• It exposes semantically rich information associated with Grid resources to build more intelligent Grid services.
• Layering an explicit semantic infrastructure over the Grid infrastructure leads to increased interoperability and greater flexibility.
• S-OGSA is a reference architecture that extends OGSA (standardisation) to support the explicit handling of semantics, and defines the associated knowledge services to support a spectrum of service capabilities.
• S-OGSA defines a model (abstraction), the capabilities (what) and the mechanisms (how) for the Semantic Grid.

• Metadata: labels Grid resources and entities with concepts (e.g. a data file according to its application domain).
• Rules and classification-based reasoning can be used to generate new metadata from existing metadata (e.g. VO membership).
• S-OGSA has:
  – Model (elements and relationships)
  – Capabilities (services for the components)
  – Mechanisms (elements to deliver the service)

S-OGSA entities and relationships
• Grid entities: anything with an identity in the Grid.
• Knowledge entities (K-entities): Grid entities that operate on knowledge.
• Semantic bindings: associations between Grid entities and knowledge entities.
• Semantic Grid entities: Grid entities that are subject to a semantic binding, are themselves a semantic binding, or are a knowledge entity.

S-OGSA

• Fabric layer: resources are virtualised through Web services.
• Grid middleware with services: OGSA services interact with one another. The middleware deploys Web services with port types through which resources are accessed.
• OGSA is extended with lightweight semantics and knowledge services to support a spectrum of service capabilities.
• Top: the application layer.
• The semantics of the middleware and fabric layers are considered.

Services
• Semantic provisioning services
  – Knowledge provisioning services
  – Semantic binding provisioning services
• Semantic aware Grid services
  – Consume semantic bindings and take actions based on knowledge and metadata

Semantic aware authorisation service
• Subject: John Doe; object: resource
• Semantic bindings are evaluated based on a match
• The ontology service provides the knowledge needed to understand the semantic bindings

Hadoop
What is Hadoop?
• A framework for running applications that produce and process huge amounts of data on large clusters of commodity hardware
• Apache Software Foundation project, open source
• Amazon's EC2, Google
• Alpha (0.21) release available for download
Hadoop includes
• HDFS: a distributed filesystem
• Map/Reduce: Hadoop implements this programming model. It is an offline computing engine.
Concept
• Moving computation is more efficient than moving large data

Data intensive applications with petabytes of data
• Web pages: billions of web pages x 20 KB = 400+ terabytes
• One computer can read only some MB/sec from disk, so it needs ~four months to read the web
• The same problem with 1,000 machines: under 3 hours (four months divided by 1,000, assuming ideal speedup)
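The back-of-the-envelope figures can be checked from the numbers that are stated (400 terabytes, 20 KB per page, four months, 1,000 machines). The sketch below derives the implied page count and disk read rate rather than assuming them; treating four months as 120 days and using decimal units are assumptions, and the 1,000-machine figure is the ideal-speedup estimate under those assumptions.

```python
KB = 10**3
MB = 10**6
TB = 10**12

corpus_bytes = 400 * TB   # stated corpus size
page_bytes = 20 * KB      # stated average page size

# Number of pages implied by the two stated figures.
pages = corpus_bytes // page_bytes

# Four months taken as 120 days (assumption).
seconds_single_machine = 120 * 24 * 3600

# Sustained read rate one machine would need to finish in four months.
implied_rate_mb_per_s = corpus_bytes / seconds_single_machine / MB

# With perfect parallelism, 1,000 machines each read 1/1000 of the data.
machines = 1000
parallel_hours = seconds_single_machine / machines / 3600
```

Under these assumptions the implied read rate is a few tens of MB/sec and the parallel run takes just under three hours; coordination overhead and stragglers would make a real run slower.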

FACTS
• Single-thread performance doesn't matter: we have large problems, and total throughput/price is more important than peak performance.
• Stuff breaks, so more reliability is needed:
  – If you have one server, it may stay up three years (1,000 days)
  – If you have 10,000 servers, expect to lose ten a day
• “Ultra-reliable” hardware doesn't really help: at large scales, super-fancy reliable hardware still fails, albeit less often, so software still needs to be fault-tolerant.
• Commodity machines without fancy hardware give a better price-performance ratio.
DECISION: COMMODITY HARDWARE.
DFS: HADOOP. REASONS? WHAT SOFTWARE MODEL?

HDFS Why? Seek vs Transfer
• B-tree (relational DBs): operates at seek rate, log(N) seeks/access; memory / stream based
• Sort/merge of flat files (MapReduce): operates at transfer rate, log(N) transfers/sort; batch based

Characteristics
• Fault tolerant, scalable, efficient, reliable distributed storage system
• Moving computation to the place of data
• Single cluster with computation and data
• Processes huge amounts of data
• Scalable: stores and processes petabytes of data
• Economical

• Data Model
  – Data is organized into files and directories
  – Files are divided into uniform-sized blocks and distributed across cluster nodes
  – Blocks are replicated to handle hardware failure
  – Checksums of data for corruption detection and recovery
  – Block placement is exposed so that computation can be migrated to data
  – Supports large streaming reads and small random reads

• Files are broken into large blocks
  – Typically 128 MB block size
  – Blocks are replicated for reliability: one replica on the local node, another replica on a remote rack, a third replica on the local rack; additional replicas are randomly placed
• Understands rack locality
  – Data placement is exposed so that computation can be migrated to data
• Client talks to both NameNode and DataNodes
  – Data is not sent through the NameNode; clients access data directly from DataNodes
  – Throughput of the file system scales nearly linearly with the number of nodes
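The block and replica rules above can be sketched as two small functions. This follows the placement order as stated on the slide (local node, remote rack, local rack, then random); the rack layout, node names and function names are hypothetical, and real HDFS placement is more involved and varies between versions.

```python
import math
import random
from typing import Dict, List, Optional

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the typical block size quoted above


def block_count(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of blocks needed to hold a file of the given size in bytes."""
    return math.ceil(file_size / block_size)


def place_replicas(writer: str, racks: Dict[str, List[str]],
                   replication: int = 3,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Choose nodes for one block: local node, remote rack, local rack, then random."""
    rng = rng or random.Random()
    local_rack = next(r for r, nodes in racks.items() if writer in nodes)
    remote_nodes = [n for r, nodes in racks.items() if r != local_rack for n in nodes]
    local_nodes = [n for n in racks[local_rack] if n != writer]

    chosen = [writer]                                  # replica 1: the writer's node
    if replication >= 2 and remote_nodes:
        chosen.append(rng.choice(remote_nodes))        # replica 2: a remote rack
    if replication >= 3 and local_nodes:
        chosen.append(rng.choice(local_nodes))         # replica 3: the local rack
    # Any further replicas go to randomly chosen unused nodes.
    remaining = [n for nodes in racks.values() for n in nodes if n not in chosen]
    rng.shuffle(remaining)
    chosen.extend(remaining[: max(0, replication - len(chosen))])
    return chosen


racks = {"rack1": ["a", "b"], "rack2": ["c", "d"]}     # hypothetical cluster
replicas = place_replicas("a", racks, rng=random.Random(0))
blocks_for_300mb = block_count(300 * 1024 * 1024)
```

Spreading replicas across racks means the loss of a whole rack still leaves at least one copy of every block.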

Block Placement

Hadoop Cluster Architecture:

Components
DFS Master “Namenode”
• Manages the file system namespace
• Controls read/write access to files
• Manages block replication
• Checkpoints the namespace and journals namespace changes for reliability
Metadata of the NameNode is in memory
  – The entire metadata is in main memory
  – No demand paging of FS metadata
Types of metadata: list of files, file and chunk namespaces; list of blocks, location of replicas; file attributes, etc.

DFS SLAVES or DATA NODES
• Serve read/write requests from clients
• Perform replication tasks upon instruction by the NameNode
Data nodes act as:
1) A block server
  – Stores data in the local file system
  – Stores metadata of a block (e.g. CRC)
  – Serves data and metadata to clients
2) Block report: periodically sends a report of all existing blocks to the NameNode
3) Heartbeat: periodically sends a heartbeat to the NameNode (to detect node failures)
4) Pipelining of data: forwards data to other specified DataNodes

Map/Reduce Master “Jobtracker”
• Accepts MR jobs submitted by users
• Assigns Map and Reduce tasks to Tasktrackers
• Monitors task and Tasktracker status, re-executes tasks upon failure
Map/Reduce Slaves “Tasktrackers”
• Run Map and Reduce tasks upon instruction from the Jobtracker
• Manage storage and transmission of intermediate output

SECONDARY NAME NODE
• Copies FSImage and Transaction Log from the NameNode to a temporary directory
• Merges FSImage and Transaction Log into a new FSImage in the temporary directory
• Uploads the new FSImage to the NameNode
  – The Transaction Log on the NameNode is purged
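The checkpoint steps can be sketched as replaying a log of edits over a snapshot of the namespace. The in-memory data shapes and the three operation names (`create`, `append`, `delete`) are simplifications invented for this example; the real FSImage and edit log are binary files with many more operation types.

```python
from typing import Dict, List, Tuple

Namespace = Dict[str, List[int]]     # file path -> list of block ids
Edit = Tuple[str, str, List[int]]    # (operation, path, block ids)


def apply_edits(fsimage: Namespace, edits: List[Edit]) -> Namespace:
    """Replay the transaction log over a copy of the image."""
    merged = {path: list(blocks) for path, blocks in fsimage.items()}
    for op, path, blocks in edits:
        if op == "create":
            merged[path] = list(blocks)
        elif op == "append":
            merged.setdefault(path, []).extend(blocks)
        elif op == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return merged


def checkpoint(fsimage: Namespace, edit_log: List[Edit]) -> Tuple[Namespace, List[Edit]]:
    """Merge image and log into a new image and return it with a purged log."""
    return apply_edits(fsimage, edit_log), []


old_image = {"/data/a.txt": [1, 2]}
log = [("create", "/data/b.txt", [3]),
       ("append", "/data/a.txt", [4]),
       ("delete", "/data/b.txt", [])]
new_image, purged_log = checkpoint(old_image, log)
```

Because the merged image already reflects every logged change, the log can be emptied, which keeps NameNode restart time short.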

HDFS Architecture
• NameNode: maps <filename, offset> to block id, and block to DataNode
• DataNode: maps block to local disk
• Secondary NameNode: periodically merges edit logs
• A block is also called a chunk

JOBTRACKER, TASKTRACKER AND JOBCLIENT

Software Model?
• Parallel programming improves performance and efficiency.
• In a parallel program, the processing is broken up into parts, each of which can be executed concurrently.
• First identify whether the problem can be parallelised: Fibonacci (fib), where each term depends on earlier terms, is hard to split, whereas matrix operations on independent elements parallelise well.

CALCULATING PI
• The area of the square is As = (2r)^2 = 4r^2.
• The area of the inscribed circle is Ac = pi * r^2.
• Hence pi = 4 * Ac / As, which is approximately 4 * (number of points in the circle) / (number of points in the square).
• MAP: count the number of randomly generated points that fall both in the circle and in the square.
• REDUCE: PI = 4 * (points in the circle / points in the square).
• MapReduce is a restricted parallel programming model meant for large clusters; the user implements Map() and Reduce().
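The pi estimate maps naturally onto a map step that counts hits per batch of random points and a reduce step that combines the counts. The sketch below runs the "map tasks" sequentially in one process purely for illustration; the function names, seeds and sample sizes are arbitrary choices, not part of any Hadoop API.

```python
import random
from typing import Iterable, Tuple


def map_count(seed: int, samples: int) -> Tuple[int, int]:
    """Map task: throw `samples` random points at the square [-1, 1] x [-1, 1]
    and count how many land inside the unit circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return inside, samples


def reduce_pi(partials: Iterable[Tuple[int, int]]) -> float:
    """Reduce task: pi is about 4 * (points in circle) / (points in square)."""
    partials = list(partials)
    inside = sum(hit for hit, _ in partials)
    total = sum(n for _, n in partials)
    return 4.0 * inside / total


# Four independent map tasks, each with its own seed; they share no state,
# so on a cluster they could run on different machines.
partials = [map_count(seed, 50_000) for seed in range(4)]
estimate = reduce_pi(partials)
```

The estimate improves slowly with sample count (the error shrinks roughly with the square root of the number of points), which is why spreading the sampling over many machines is attractive.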

WORD COUNT EXAMPLE

File
Hello World Bye World
Hello Hadoop Goodbye Hadoop
Map
For the given sample input the first map emits: <Hello, 1> <World, 1> <Bye, 1> <World, 1>
The second map emits: <Hello, 1> <Hadoop, 1> <Goodbye, 1> <Hadoop, 1>

The output of the first combine: <Bye, 1> <Hello, 1> <World, 2>
The output of the second combine: <Goodbye, 1> <Hadoop, 2> <Hello, 1>
Thus the output of the job (reduce) is: <Bye, 1> <Goodbye, 1> <Hadoop, 2> <Hello, 2> <World, 2>

Map()
• Input: <filename, file text>
• Parses the file and emits <word, count> pairs, e.g. <"hello", 1>
Reduce()
• Sums all values for the same key and emits <word, TotalCount>, e.g. <"hello", (list of counts)> => <"hello", 17>
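The word count walk-through can be reproduced with three small functions standing in for the map, combine and reduce phases. This is a single-process sketch of the data flow, not Hadoop's Java API; the function names are illustrative, and the two input lines are treated as two input splits.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, int]


def map_words(text: str) -> List[Pair]:
    """Map: emit <word, 1> for every word in the split."""
    return [(word, 1) for word in text.split()]


def combine(pairs: Iterable[Pair]) -> List[Pair]:
    """Combine: local aggregation of one map's output, sorted by key."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return sorted(local.items())


def reduce_counts(combined: Iterable[List[Pair]]) -> Dict[str, int]:
    """Reduce: sum the counts for each word across all combiner outputs."""
    totals = Counter()
    for pairs in combined:
        for word, count in pairs:
            totals[word] += count
    return dict(sorted(totals.items()))


splits = ["Hello World Bye World", "Hello Hadoop Goodbye Hadoop"]
combined = [combine(map_words(split)) for split in splits]
result = reduce_counts(combined)
```

The combiner is optional: it only reduces the amount of intermediate data shipped to the reducers, and the final counts are the same without it.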


MR model
• Map(): process a key/value pair to generate intermediate key/value pairs
• Reduce(): merge all intermediate values associated with the same key
• Users implement an interface of two primary methods:
  1. Map: (key1, val1) → (key2, val2)
  2. Reduce: (key2, [val2]) → [val3]
• SQL analogy: Map plays the role of the GROUP BY clause (it picks the key) of an aggregate query; Reduce is the aggregate function (e.g. average) computed over all the rows with the same group-by attribute (key).
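The two signatures and the SQL analogy can be seen together in a tiny generic driver. The sketch groups mapper output by key in memory and then calls the reducer once per key; the `map_reduce` helper, the sample rows and the team/hours fields are all made up for the example, which mirrors a query of the form SELECT team, AVG(hours) ... GROUP BY team.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple


def map_reduce(records: Iterable[Tuple[Any, Any]],
               mapper: Callable[[Any, Any], Iterable[Tuple[Any, Any]]],
               reducer: Callable[[Any, List[Any]], Any]) -> Dict[Any, Any]:
    """Run mapper on each (key1, val1), group by key2, then reduce each group."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for key1, val1 in records:
        for key2, val2 in mapper(key1, val1):
            groups[key2].append(val2)
    return {key2: reducer(key2, values) for key2, values in groups.items()}


# Hypothetical input rows: (row id, (team, hours)).
rows = [(0, ("grid", 40.0)), (1, ("grid", 60.0)), (2, ("cloud", 30.0))]


def by_team(row_id: int, row: Tuple[str, float]) -> List[Tuple[str, float]]:
    """Map: choose the group-by key (team) and the value to aggregate (hours)."""
    team, hours = row
    return [(team, hours)]


def average(team: str, values: List[float]) -> float:
    """Reduce: the aggregate function applied to one group."""
    return sum(values) / len(values)


averages = map_reduce(rows, by_team, average)
```

In a real cluster the grouping step is the shuffle: the framework moves all values for one key to the same reducer.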

Cloud need
• “Era of tera”: ever-growing datasets
• Changing demands/loads, unpredictable traffic patterns, and the demand for faster response times
• Elasticity: use and relinquish resources as per demand
• Software applications should be Internet accessible
• Large-scale applications: the cloud provides a large number of machines when needed, distributes work among them, provisions new machines on failure, auto-scales, and relinquishes machines when not needed

Advantages
• Almost zero upfront infrastructure investment
• Just-in-time infrastructure
• More efficient resource utilization
• Usage-based costing
• Potential for shrinking the processing time
• Less time for development
• Basis: automated elasticity (on-demand and elastic nature)
• Example: e-ticketing application

AWS The Amazon Web Services (AWS) cloud provides a highly reliable and scalable infrastructure for deploying web-scale solutions, with minimal support and administration costs, and good flexibility

• Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud.
  – Operating system, application software and associated configuration settings can be bundled in an Amazon Machine Image (AMI).
  – Scaling up / down is done by provisioning / decommissioning multiple instances using simple web service calls.
  – On-Demand Instances / Reserved Instances / Spot Instances
• Amazon S3 is used to retrieve/store input/output datasets.
  – Stores / retrieves large amounts of data as objects in buckets (containers) on the web using standard HTTP.
  – Copies can be made in 14 locations using CloudFront.
• Amazon Simple Queue Service (Amazon SQS) is a reliable, highly scalable, distributed queue for storing messages as they travel between computers and application components.

• Amazon SimpleDB is a web service for real-time lookup and simple querying of structured data.
• Amazon Relational Database Service (Amazon RDS) provides an easy way to set up, operate and scale a relational database in the cloud.
• Amazon Elastic MapReduce provides a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud: an on-demand Hadoop cluster with distributed processing, automatic parallelization, and job scheduling.
• Amazon Virtual Private Cloud (Amazon VPC) extends a corporate network into a private cloud contained within AWS.

• Availability Zones are distinct locations engineered to be insulated from failures in other Availability Zones, and they provide inexpensive, low-latency network connectivity to other Availability Zones in the same Region.
• Elastic IP addresses: a static IP address is allocated and programmatically assigned to an instance.
• CloudWatch can monitor an Amazon EC2 instance for resource utilization, operational performance, and overall demand patterns.
• The Auto Scaling feature is used to create an Auto Scaling group.
• Incoming traffic can be distributed using the Elastic Load Balancing service.
• Amazon Elastic Block Store (EBS) volumes provide network-attached persistent storage to Amazon EC2 instances.
• AWS offers payment and billing services.
• Amazon CloudFront provides a high-performance, globally distributed content delivery system.

GrepTheWeb Application

Cloud Services best practices
• Design for failure and nothing will fail: design, implement and deploy for automated recovery from failure. In AWS:
  – Fail over gracefully using Elastic IPs
  – Utilize multiple Availability Zones
  – Maintain an Amazon Machine Image
  – Utilize Amazon CloudWatch
• Decouple the components: follow the SOA design principle of loosely coupled components for scalability.
• Message queues: if one component fails, the system buffers the messages and gets them processed when the component comes back up.
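The buffering behaviour of a message queue between two components can be sketched with an in-memory queue. This is only a model of the idea: `BufferedQueue` and `Worker` are hypothetical classes, not the Amazon SQS API, and a real queue service would also handle persistence, visibility timeouts and retries.

```python
from collections import deque
from typing import Deque, List, Optional


class BufferedQueue:
    """In-memory stand-in for a durable message queue."""

    def __init__(self) -> None:
        self._messages: Deque[str] = deque()

    def send(self, message: str) -> None:
        self._messages.append(message)

    def receive(self) -> Optional[str]:
        return self._messages.popleft() if self._messages else None

    def __len__(self) -> int:
        return len(self._messages)


class Worker:
    """Consumer that may be down; it drains the queue whenever it is up."""

    def __init__(self, queue: BufferedQueue) -> None:
        self.queue = queue
        self.up = True
        self.processed: List[str] = []

    def poll(self) -> None:
        if not self.up:
            return  # messages simply stay in the queue
        message = self.queue.receive()
        while message is not None:
            self.processed.append(message.upper())
            message = self.queue.receive()


queue = BufferedQueue()
worker = Worker(queue)

worker.up = False                      # the consumer component fails
for job in ["job-1", "job-2", "job-3"]:
    queue.send(job)                    # the producer keeps working
worker.poll()
backlog_while_down = len(queue)

worker.up = True                       # the consumer comes back up
worker.poll()
```

The producer never needs to know whether the consumer is alive, which is the loose coupling the best practice asks for.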

• SQS for decoupling and buffering
• Service interfaces for components
• AMI created
• Stateless applications

Implement elasticity
Think parallel
• The beauty of the cloud shines when you combine elasticity and parallelization
• Keep dynamic data closer to the compute and static data closer to the end-user

PSG-Yahoo Grid and Cloud Computing Lab, 2008 till date
• 54 rack servers: SC145 & PowerEdge 2950
• 40 end connectors
• 10 client nodes
• Software: RHEL, Hadoop, Globus, OpenVZ, Xen

• Courses conducted: 10
• Papers published: 11
• Internship: 3
• Placement: 3
• PhD: 4
• Conference talks: 3

• An Efficient Approach to Task Scheduling in Computational Grids
• Data Discovery in Grid using Content Based Searching Technique
• P2P Information Retrieval Framework for Digital Library System using Hadoop DFS
• Integration of Xen and Hadoop framework
• DNA sequencing using Hadoop data grids
• DNA sequencing in public clouds
• Virtualisation using Xen and OpenVZ: a comparison of performance
• Grid Security: a tree based dynamic approach
• Study of some existing scheduling algorithms
• Grid Task Scheduling using PPSO
• Content based Image Retrieval
• Modification of fairshare scheduling in Hadoop
• Two level scheduler for clouds
• Hybrid Search using content based and semantic approaches
