Grid Computing Research and Applications Sornthep Vannarat Large scale Simulation Research Laboratory National Electronics and Computer Technology Center.

1 Grid Computing Research and Applications Sornthep Vannarat Large scale Simulation Research Laboratory National Electronics and Computer Technology Center

2 Outline • Introduction to Grid computing • Open Grid Service Architecture • Bioinformatics applications on Grid • Information Grid project • GEO Grid project • Knowledge Grid • Web 2.0 and Grid computing • Grid activities at NECTEC

3 Introduction to Grid computing

4 4 Large-scale Simulation Research Laboratory • Understand, innovate and manage problems through computer simulations • Computer simulation leads to the discovery of new knowledge, which is necessary for developing advanced technology for the economy and for people's quality of life • Applying computer simulation to engineering design leads to products of higher quality and capability, and to efficient manufacturing processes that save energy and raw materials • In addressing environmental problems and disasters, computer simulation can help predict changes and the effects of various factors, leading to an understanding of the problem and supporting good planning • Develop knowledge • Create innovation • Manage and solve problems

5 5 Main activities 1. Building high-performance computing systems and large-scale data storage systems 2. Studying and applying virtualization middleware 3. Developing infrastructure and middleware for integrating computer systems and data 4. Developing programs for building models 5. Applying computer modeling to create knowledge, for engineering design, and for managing and solving problems • Related technologies: • Cluster computing, Grid computing, large-scale data storage systems • Distributed processing: Web Services, XML, Java programming • Numerical computation: finite element method (FEM) and computational fluid dynamics (CFD)

6 What is Grid computing? • Next-generation computing platform and global cyberinfrastructure for solving large-scale problems in science, engineering, and business • Grid Café [http://gridcafe.web.cern.ch/gridcafe/] • Web is a service for sharing information over the Internet, the Grid is a service for sharing computer power and data storage capacity over the Internet • Ian Foster – 1998: Computational Grid is a hardware and software infrastructure that provides dependable, consistent, and pervasive access to high-end computational capabilities – 2000: Grid computing is concerned with coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations – 2002: Grid is a system that (1) coordinates resources that are NOT subject to centralized control (2) uses standard, open, general purpose protocols and interfaces (3) delivers non-trivial qualities of service

7 Status of Grid computing • A promising work in progress • Usable with a lot of effort • WISDOM: – EGEE Docking project – Find new inhibitors for proteins produced by Plasmodium falciparum – Over 46 million docking simulations in 6 weeks using 1,700 computers in 15 countries, equivalent to 80 CPU-years • Beyond computing power

8 Types of Grids • Computing grid • Data/storage grid • Information grid • Instrument grid • Access grid

9 9 The Grid Problem • Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources (from “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”) • Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals, assuming the absence of… – central location, – central control, – omniscience, – existing trust relationships.

10 10 Elements of the Problem • Resource sharing – Computers, storage, sensors, networks, … – Sharing always conditional: issues of trust, policy, negotiation, payment, … • Coordinated problem solving – Beyond client-server: distributed data analysis, computation, collaboration, … • Dynamic, multi-institutional virtual orgs – Community overlays on classic org structures – Large or small, static or dynamic

11 Challenges • To provide seamless access • Heterogeneous environments • Multiple administrative domains and autonomy issues • Scalability • Dynamicity/adaptability

12 Grid computing middleware • “Global Grids and Software Toolkits: A Study of Four Grid Middleware Technologies”, Parvin Asadzadeh et al. • UNICORE – Uniform Interface to Computing Resources – Ready-to-run Grid system including client and server software – UNICORE release 26 Nov 2007: WSRF based implementation • Globus Toolkit – Developed by Globus Alliance – Open source software toolkit used for building grids with services written in a combination of C and Java – GT OGSA WSRF based • Legion, Gridbus • EGEE’s gLite

13 13 One View of Requirements • Identity & authentication • Authorization & policy • Resource discovery • Resource characterization • Resource allocation • (Co-)reservation, workflow • Distributed algorithms • Remote data access • High-speed data transfer • Performance guarantees • Monitoring • Adaptation • Intrusion detection • Resource management • Accounting & payment • Fault management • System evolution • Etc. • …

14 14 Layered Grid Architecture (diagram, shown beside the Internet Protocol Architecture) • Grid layers, top to bottom: – Application – Collective: “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services – Resource: “Sharing single resources”: negotiating access, controlling use – Connectivity: “Talking to things”: communication (Internet protocols) & security – Fabric: “Controlling things locally”: access to, & control of, resources • Internet Protocol Architecture layers, top to bottom: Application, Transport, Internet, Link

15 Open Grid Services Architecture

16 • Service-oriented architecture – Key to virtualization, discovery, composition, local-remote transparency • Leverage industry standards – Internet, Web services • Distributed service management – A “component model for Web services” • A framework for the definition of composable, interoperable services “The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration”, Foster, Kesselman, Nick, Tuecke, 2002

17 Web Services • XML-based distributed computing technology • Web service = a server process that exposes typed ports to the network • Described by the Web Services Description Language, an XML document that contains – Type of message(s) the service understands & types of responses & exceptions it returns – “Methods” bound together as “port types” – Port types bound to protocols as “ports” • A WSDL document completely defines a service and how to access it • WSRF
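To make the WSDL vocabulary on this slide concrete, here is a minimal sketch that parses a small hand-written WSDL 1.1 fragment with Python's standard library and lists each port type with its operations. The `Counter` service, its port type and its operations are invented for illustration; only the WSDL 1.1 namespace URI is real. Messages, bindings and ports are omitted, so this is not a complete service description.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace

# Hypothetical, heavily trimmed WSDL document (messages and bindings omitted).
SAMPLE_WSDL = """
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="Counter">
  <portType name="CounterPortType">
    <operation name="add"/>
    <operation name="getValue"/>
  </portType>
</definitions>
"""

def list_port_types(wsdl_text):
    """Return {port type name: [operation names]} for a WSDL 1.1 document."""
    root = ET.fromstring(wsdl_text.strip())
    result = {}
    for port_type in root.findall(f"{{{WSDL_NS}}}portType"):
        operations = port_type.findall(f"{{{WSDL_NS}}}operation")
        result[port_type.get("name")] = [op.get("name") for op in operations]
    return result

port_types = list_port_types(SAMPLE_WSDL)
print(port_types)
```

A client-side tool would read the same structure to discover which "methods" a service offers before binding to one of its ports.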

18 Extension of WS • Lifecycle management • Stateful • Subscribable

19 Writing Grid Service • Define the interface with WSDL, wsrp • Implement the service (Java) • Define the deployment parameters (WSDD, JNDI) • Compile GAR file (Ant) • Deploy service (GT4)

20 Notification • Polling and pushing • WS-Topics: topic trees • WS-BaseNotification: subscribe, notify • WS-BrokeredNotification: broker
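The push model with a topic tree can be sketched in a few lines. This is a toy in-process illustration, not the WS-Notification API: the class name `TopicBroker`, the slash-separated topic paths, and the convention that a subscription to a parent topic also receives messages for its child topics are all assumptions of the sketch.

```python
from collections import defaultdict

class TopicBroker:
    """Toy push-style notifier; not the WS-Notification API."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic path -> callbacks

    def subscribe(self, topic, callback):
        """Register a callback for a topic such as 'jobs' or 'jobs/failed'."""
        self._subscribers[topic].append(callback)

    def notify(self, topic, message):
        """Push the message to subscribers of the topic and of each ancestor."""
        parts = topic.split("/")
        delivered = 0
        for depth in range(1, len(parts) + 1):
            prefix = "/".join(parts[:depth])
            for callback in self._subscribers.get(prefix, []):
                callback(topic, message)
                delivered += 1
        return delivered

received = []
broker = TopicBroker()
broker.subscribe("jobs", lambda t, m: received.append(("all-jobs", t, m)))
broker.subscribe("jobs/failed", lambda t, m: received.append(("failures", t, m)))
broker.notify("jobs/failed", "job 42 exited with status 1")
broker.notify("jobs/done", "job 43 finished")
```

With polling, each client would instead have to ask the service repeatedly whether anything changed; pushing trades that repeated traffic for the need to keep subscriber state on the producer or broker side.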

21 Lifecycle management • Creation operation: factory service • Access and destroy operations: instance service • Destroy operation – Immediate – Scheduled (lease based)
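The difference between immediate and lease-based destruction can be illustrated with a small in-memory sketch. For brevity a single hypothetical class, `CounterService`, plays both the factory role (`create`) and the instance role (`add`, `destroy`); real WSRF services separate these and use their own operation names. Time is passed in explicitly so the behaviour is deterministic.

```python
import itertools

class CounterService:
    """Toy resource store with immediate and lease-based destruction."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._resources = {}  # id -> {"value": int, "expires_at": float}

    # Factory role: create a new stateful resource with a lease.
    def create(self, now, lease_seconds):
        rid = next(self._next_id)
        self._resources[rid] = {"value": 0, "expires_at": now + lease_seconds}
        return rid

    # Instance role: access an existing resource by its identifier.
    def add(self, rid, amount, now):
        self._expire(now)
        self._resources[rid]["value"] += amount  # KeyError if already destroyed
        return self._resources[rid]["value"]

    def destroy(self, rid):
        """Immediate destruction requested by a client."""
        self._resources.pop(rid, None)

    def exists(self, rid, now):
        self._expire(now)
        return rid in self._resources

    def _expire(self, now):
        """Scheduled destruction: drop resources whose lease has run out."""
        expired = [r for r, s in self._resources.items() if s["expires_at"] <= now]
        for rid in expired:
            del self._resources[rid]

service = CounterService()
first = service.create(now=0.0, lease_seconds=60.0)
second = service.create(now=0.0, lease_seconds=60.0)
total = service.add(first, 5, now=10.0)
service.destroy(second)  # gone at once; 'first' disappears when its lease ends
```

The lease acts as a safety net: if a client crashes without calling `destroy`, the resource is still reclaimed once its termination time passes.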

23 23 GT4 GRAM Architecture (diagram; layout not recoverable from the transcript) • Components shown: Client; service host(s) and compute element(s) with a GT4 Java Container hosting GRAM services, Delegation, and RFT File Transfer; sudo; GRAM adapter; local scheduler; user job; SEG; GridFTP; remote storage element(s) with GridFTP • Labelled interactions: job functions, delegate, transfer request, FTP control, FTP data, local job control, job events

24 24 Monitoring & Discovery (diagram; layout not recoverable from the transcript) • Components shown: GT4 containers, each with an Index; GRAM; User; RFT; GridFTP; adapter; Clients (e.g., WebMDS) • Labels: WS-ServiceGroup; Registration & WSRF/WSN Access; Custom protocols for non-WSRF entities; Automated registration in container

25 Security • Privacy • Integrity • Authentication • Authorization • Non-repudiation

26 PKI • Public Key Infrastructure • Key based encryption • Symmetric and asymmetric encryption • Public and private keys • Digital signature • Digital certificate • CA
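How a digital signature uses the private key to sign and the public key to verify can be shown with textbook RSA on tiny numbers. This is a deliberately insecure teaching sketch: the primes 61 and 53 are the usual classroom example, there is no padding, and the hash is reduced modulo a 12-bit modulus. A real PKI uses vetted cryptographic libraries and keys of 2048 bits or more.

```python
import hashlib

# Toy RSA key pair from tiny primes (p = 61, q = 53); insecure, illustration only.
N = 61 * 53   # public modulus, 3233
E = 17        # public exponent
D = 2753      # private exponent: (E * D) % 3120 == 1, where 3120 = 60 * 52

def digest(message: bytes) -> int:
    """Hash the message and reduce it into the toy modulus range."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Sign the message digest with the private exponent."""
    return pow(digest(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Check the signature using only the public values N and E."""
    return pow(signature, E, N) == digest(message)

signature = sign(b"submit job 42")
is_valid = verify(b"submit job 42", signature)
print(is_valid)
```

A digital certificate builds on the same operation: a CA signs a statement binding a subject name to a public key, and anyone holding the CA's public key can verify that binding. With a modulus this small a tampered message can occasionally collide with the original digest, which is one reason the sketch must not be used for anything real.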

27 GSI • Grid Security Infrastructure • Transport and message-level security • Authorization schemes • Credential delegation and single sign-on • Different levels of security: container, service, and resource

28 28 OGSA-DAI • An extensible framework for data access and integration • Expose heterogeneous data resources to a grid through web services • Interact with data resources – Queries and updates – Data transformation / compression – Data delivery – Application-specific functionality • A base for higher-level services – Federation, mining, visualisation,… • Open Grid Forum DAIS Working Group – DAIS (Database Access and Integration) specifications – OGSA-DAI to be a reference implementation of DAIS

29 29 OGSA-DAI functionality • Interaction with data resources – Relational – MySQL, SQL Server, DB2, PostgreSQL, Oracle – XMLDB – eXist, Xindice – Files – text, binary, indexed – SQL multi-resources – aggregation of OGSA-DAI services exposing relational resources • Transformation and compression – ZIP, GZIP, XSLT, ResultSet-to-WebRowSet, ResultSet-to-CSV, … – WebRowSet projection, frequency distribution, random sample, … • Delivery – Local file, HTTP, SMTP, SOAP attachments, GridFTP, other OGSA-DAI services • Resource creation and destruction • Document-oriented interface – service interface is resource agnostic

30 30 Bioinformatics Applications on Grid

31 31 Bioinformatics and Grid • Bioinformatics applications often require high-performance computing and large data handling • Tools: bioinformatics tools and web services • Data: – Public databases – Biological knowledge: ontology and meta data – unpublished data • Grid computing meets the requirements – Computing Grids – Data Grids – Knowledge Grids

32 32 Computing Grid • High throughput computing – Thousands of small independent tasks • Grid computing vs. cluster computing – Both aim at parallel and distributed computing – They differ in network latency and robustness – Task failures are much more frequent in grid computing • Two types of high-throughput computing – numerical processing – symbolic processing

33 33 High throughput numerical processing • Systems biology aims at modeling of biological dynamics in molecules, cells, organs and individuals • Huge computational power is needed for – molecular folding – molecular docking – spatiotemporal molecular interaction – kinetic parameter estimation • Problem decomposition techniques – parameter sweep – stochastic modeling
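Parameter sweep decomposition amounts to enumerating every combination of parameter values and treating each combination as an independent task. The sketch below does this with `itertools.product`; the parameter names `k_on` and `k_off` and their values are hypothetical, and the `seed` entry stands in for the repeated runs that stochastic modeling needs.

```python
from itertools import product

def parameter_sweep(grid):
    """Yield one independent task (a dict of parameter values) per grid point."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))

# Hypothetical kinetic parameters; each seed is one stochastic replicate.
grid = {"k_on": [0.1, 1.0, 10.0], "k_off": [0.01, 0.1], "seed": [1, 2]}
tasks = list(parameter_sweep(grid))
print(len(tasks))
```

Because the tasks share no state, each one can be submitted as a separate grid job and resubmitted on its own if it fails.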

34 34 WISDOM • EGEE Docking project • Find new inhibitors for proteins produced by Plasmodium falciparum • Over 46 million docking simulations in 6 weeks • 1,700 computers in 15 countries • Equivalent to 80 CPU-years
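A back-of-the-envelope calculation puts the WISDOM figures in relation to each other. The sketch assumes a 365-day year, treats "over 46 million" as exactly 46 million, and counts each of the 1,700 computers as one CPU; none of these assumptions comes from the slides, so the results are rough indications only.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year

cpu_years = 80
dockings = 46_000_000   # slide says "over 46 million"
computers = 1_700       # assumption: one CPU per computer
wall_weeks = 6

cpu_seconds = cpu_years * SECONDS_PER_YEAR
seconds_per_docking = cpu_seconds / dockings      # average CPU time per docking
wall_seconds = wall_weeks * 7 * 24 * 3600
average_busy_cpus = cpu_seconds / wall_seconds    # CPUs kept busy on average
average_utilisation = average_busy_cpus / computers

print(round(seconds_per_docking), round(average_busy_cpus), round(average_utilisation, 2))
```

Under these assumptions a docking takes roughly 55 CPU-seconds on average, and about 700 CPUs (around 41% of the machines) were busy at any one time over the six weeks.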

35 35 DIANE • Enhanced version of WISDOM • Light-weight framework • Search for drugs for predicted variants of H5N1 • 2 million docking complexes with a size of 600 gigabytes • 2,000 grid worker nodes in 17 countries

36 36 Limitations of EGEE Infrastructure • Experiences from virtual screening projects • Overall grid efficiency about 50 percent • Major sources of failure – Server license failure 23% – Workload management failure 10% – Site failure 9%

37 37 Study of kinetic pathways • Estimation of ODEs for modeling of metabolic pathways and signal transduction pathways • Genetic algorithms: – Estimating optimal parameter fitting to biological experimental results – High degrees of parallelism (multiple trials with initial conditions) • Parameter-parameter dependencies: – Calculating moment parameters, such as AUC, MRT, VRT

38 38 High throughput symbolic processing • Sequence analysis: Homology searches, Genome comparisons, Genome-wide analyses • Sequencing data are expected to increase more rapidly – High-throughput DNA sequencing technologies – Metagenomic projects – Human resequencing projects – Genome sequencing projects on other species • Requires large databases such as DNA and protein sequence • Sharing and updating of biological databases on the grid are of key importance

39 39 Sharing biological databases • Become more and more difficult and intractable • Automatic updating of databases is necessary • Concerns – Duplicated database copying – Disk overflow – Unexpected shutdown – Version management – File checksum integrity verification – Parallel and pipelined mechanisms for high-throughput data transfer
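File checksum integrity verification, one of the concerns listed above, can be sketched with the standard library. The slides do not name a hash algorithm, so SHA-256 is an assumption here, and a small temporary file stands in for a database replica.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large database files need not fit in memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

def verify_replica(path, expected_hex):
    """Return True when the local copy matches the published checksum."""
    return sha256_of_file(path) == expected_hex

# Demo: a temporary file stands in for a transferred database replica.
content = b">seq1\nACGT\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(content)
    replica_path = tmp.name

published_checksum = hashlib.sha256(content).hexdigest()
replica_ok = verify_replica(replica_path, published_checksum)
os.remove(replica_path)
print(replica_ok)
```

Running the check after every transfer catches truncated or corrupted copies before an analysis job reads them.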

40 40 EGEE Framework • EGEE provides a general framework for sharing replicas of biological databases, represented by: • Physical File Name (PFN) • Logical File Name (LFN) • Globally Unique Identifier (GUID) • Replica Manager System (RMS) – Replica Metadata Catalog (RMC) – Replica Location Service (RLS) (Diagram: LFN-1, LFN-2 and LFN-3 linked to a GUID via the RMC; the GUID linked to PFN-1 and PFN-2 via the RLS)
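The naming scheme can be read as two lookup tables: several logical names point to one GUID, and one GUID points to several physical replicas. The sketch below models that reading with plain dictionaries; it assumes the RMC holds the LFN-to-GUID mapping and the RLS the GUID-to-PFN mapping, and the class, method names and example file names are all hypothetical rather than the EGEE API.

```python
import uuid

class ReplicaCatalog:
    """Toy stand-in for RMC (LFN -> GUID) plus RLS (GUID -> PFNs)."""

    def __init__(self):
        self.rmc = {}  # logical file name -> GUID
        self.rls = {}  # GUID -> list of physical file names

    def register(self, pfn, lfn):
        """Register a new file: create a GUID, one LFN and one PFN."""
        guid = str(uuid.uuid4())
        self.rmc[lfn] = guid
        self.rls[guid] = [pfn]
        return guid

    def add_alias(self, guid, lfn):
        """Attach another logical name to an existing GUID."""
        self.rmc[lfn] = guid

    def add_replica(self, guid, pfn):
        """Record another physical copy of the same file."""
        self.rls[guid].append(pfn)

    def resolve(self, lfn):
        """Return every physical location for a logical name."""
        return list(self.rls[self.rmc[lfn]])

catalog = ReplicaCatalog()
guid = catalog.register("gsiftp://site-a.example.org/db/nr.fasta", "lfn:/bio/nr")
catalog.add_alias(guid, "lfn:/bio/nr-latest")
catalog.add_replica(guid, "gsiftp://site-b.example.org/db/nr.fasta")
locations = catalog.resolve("lfn:/bio/nr-latest")
```

A job can then ask for a database by its logical name and be directed to whichever physical replica is closest, without knowing where copies are stored.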

41 41 GADU • Genome Analysis and Database Update system • Automated, scalable, high-throughput computational workflow engine • Executes bioinformatics tools (BLAST, BLOCKS, PFam, Chisel and InterPro) • Public databases (NCBI RefSeq, PIR, InterPro and KEGG)

42 42 Homology Search • GRID BLAST implementations have been developed and reported – Prestaging of sequence databases to minimize the runtime overhead of transferal of large sequence databases – Databases update which keeps data consistency on the data-grid – Dynamic load balancing of query sequences – Assembling of the results from distributed jobs

43 43 Genome Comparison • One of the most promising life science applications for grid computing • Expandable and flexible large scale computing facility is needed • E.g. Investigation of horizontal gene transfer among 354,606 ORFs extracted from more than 100 microbial genomes – Used 229 CPUs located in 5 institutions • Number of pair-wise sequence comparisons ∝ N²
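The quadratic growth is easy to quantify. The sketch below counts unordered pairs, N(N-1)/2, for the 354,606 ORFs mentioned on the slide. It assumes every pair is compared exactly once and that work divides evenly over the 229 CPUs; the study itself may have organised the comparisons differently.

```python
def pair_count(n):
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2

orfs = 354_606
cpus = 229

comparisons = pair_count(orfs)
comparisons_per_cpu = comparisons / cpus  # assumes an even split

print(comparisons)
print(round(comparisons_per_cpu))
```

Under these assumptions the study involves roughly 6.3 × 10^10 comparisons, or about 2.7 × 10^8 per CPU. Doubling the number of ORFs would roughly quadruple both figures.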

44 44 Integration of bioinformatics services • Resourceome – Uniform and secure interface – Providing workflows – Using Metadata and ontology • Metadata, ontology, XML: fill the semantic gap of heterogeneous databases • Framework: OGSA based on WSRF

45 45 BioGrid

46 46 RbsB in Different Formats • DDBJ • SWISS-PROT • PDB

47 47 BioPfuga • Workflow system integrating application programs • Separating application programs into smaller parts. • Standardize the data format for transferring data between different application programs.

48 48 Bioinformatics workflow • Necessary for end-users of bioinformatics web/grid services • Taverna provides a workflow language and graphical user interface for: building, running and editing of workflows • Semantic indexing system of bioinformatics services has become essential for choosing resources • Searching functionally similar bioinformatics workflows is also important • Bioinformatics ontology is essential for automatic generation of bioinformatics workflows

49 49 Secure Data Access • Many bioinformatics databases are public and freely available • But access to the data needs to be strictly controlled in distributed collaborative research (for example, clinical data) • Public Key Infrastructure (PKI) is the predominant method for enforcing authentication • Virtual Organization for Trials and Epidemiological Studies (VOTES) project uses Internet2 Shibboleth technology

50 50 Information Grid

51 51 Information Grid • An open and flexible infrastructure that facilitates the integration of any information anywhere across heterogeneous data sources under grid environment • 3 essential components – MDL: Marker Description Language – Information Services – Information Brokers

52 52 MDL: Marker Description Language • A unified language that defines: – standard schema model – integration configuration model – standard schema discovery model

53 53 Information Service • Acts as an agent to publish information • Responsibilities: – connect to a current data source of an organization – transform generic query (mdlQuery) into specific query – transform query result into standard schema defined in the specified MDL document • Data flow (diagram): generic query (mdlQuery) → specific query (SQL) → query result (table) → query result (mdl-based result) • Generic Information Service Tool – manual mapping – RDBMS – no authentication
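The two transformations performed by an Information Service can be sketched as follows. The slides do not show the actual MDL or mdlQuery syntax, so the dictionary-based query format, the `MAPPING` structure, and the `staff` table are all hypothetical; the sketch only illustrates the idea of a manual mapping between a standard schema and one organization's relational database, using an in-memory SQLite database as the data source.

```python
import sqlite3

# Hypothetical manual mapping: standard-schema field -> local column name.
MAPPING = {
    "table": "staff",
    "fields": {"id": "staff_no", "name": "full_name", "unit": "dept_code"},
}

def to_sql(mdl_query, mapping):
    """Translate a generic query (a dict here) into parameterised SQL."""
    columns = ", ".join(mapping["fields"][f] for f in mdl_query["select"])
    sql = f"SELECT {columns} FROM {mapping['table']}"
    params = []
    conditions = mdl_query.get("where", {})
    if conditions:
        clauses = []
        for field, value in conditions.items():
            clauses.append(f"{mapping['fields'][field]} = ?")
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

def to_standard_rows(rows, mdl_query):
    """Relabel result tuples with the standard-schema field names."""
    return [dict(zip(mdl_query["select"], row)) for row in rows]

# Demo data source standing in for an organisation's RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (staff_no INTEGER, full_name TEXT, dept_code TEXT)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?)",
                 [(1, "A. Example", "LSR"), (2, "B. Example", "HPC")])

query = {"select": ["id", "name"], "where": {"unit": "LSR"}}
sql, params = to_sql(query, MAPPING)
result = to_standard_rows(conn.execute(sql, params).fetchall(), query)
conn.close()
```

Because every Information Service returns rows labelled with the same standard field names, a broker can merge results from several organizations without knowing their local schemas.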

54 54 Information Broker • Acts as a broker of Information Services • Responsibilities: – connect to Information Services – connect to other Information Brokers – discover potential Information Brokers and Services – integrate information • Data flow (diagram): mdlQuery in, integrated mdl-based result out, at each broker level

55 55 Information Grid Deployment (diagram; labels: MDL, Query, result)

56 GEO Grid

66 Knowledge Grid

67 67 Knowledge Grid • Tacit knowledge – "We should start from the fact that we can know more than we can tell", Michael Polanyi, a 20th-century philosopher • Knowledge represented on computers is just a part of our knowledge • Grid as a place where people work together and create knowledge • Sharing explicit and tacit knowledge • This framework gives a meta-philosophical approach to rationalise the current Grid phenomenon.

68 68 Knowledge spiral theory • Knowledge creation requires a cyclic process of knowledge conversion between tacit knowledge and explicit knowledge – Socialization (tacit knowledge to tacit knowledge) – Externalization (tacit knowledge to explicit knowledge) – Combination (explicit knowledge to explicit knowledge) – Internalization (explicit knowledge to tacit knowledge)

69 69 Socialization • First step in formulating a community • Grid portals are helpful for attracting those who are interested in some specific field • Must allow formulation of user-defined communities • Knowledge grids should provide social communication system-like facilities – Participants formulate new communities – Participants recruit other participants • Face-to-face meeting or off-site meeting will be also helpful in promoting mutual understanding in a community.

70 70 Externalization • For example publication of research papers • Externalization is the essence of knowledge creation • Knowledge grid should provide facilities for participants to publish their knowledge in a community • Web-based dynamic contents are one of the promising ways of publication of knowledge

71 71 Combination • Combination expands knowledge by the sharing of explicit knowledge in a community • Synergy effects can be expected if participants bring together their own knowledge • Grid portals and application-oriented grids play an essential role in this process

72 72 Internalization • Internalization is a process of acquiring tacit knowledge by experience • To make use of a grid for real world life science problems, a problem solving layer for bioinformatics must be developed • Gridification of public databases and bioinformatics tools is a necessary but not sufficient condition • Bioinformatics environment should provide secure facilities to deal with unpublished data and customization facilities to develop one's own bioinformatics environment coordinated with the global bioinformatics environment

73 Web 2.0 and Grid computing

74 Web 2.0 Design Patterns • The Long Tail • Data is the Next Intel Inside • Users Add Value ("architecture of participation") • Network Effects by Default • Some Rights Reserved (design for "hackability" and "remixability") • The Perpetual Beta • Cooperate, Don't Control • Software Above the Level of a Single Device • Source: "What Is Web 2.0" by Tim O'Reilly, what-is-web-20.html

75 Web 2.0 Core Competencies • Services, not packaged software, with cost-effective scalability • Control over unique, hard-to-recreate data sources that get richer as more people use them • Trusting users as co-developers • Harnessing collective intelligence • Leveraging the long tail through customer self-service • Software above the level of a single device • Lightweight user interfaces, development models, AND business models

76 What are we talking about? • Communities & all that social stuff? – Great, love it, should have done all this 20 years ago… • Easier to use web interfaces? – Love them as a user but they are (still) hard to build (tried JSF+AJAX+Swing Webflow - argh!!!) – Is it worth the effort? Researchers are not occasional users! • Existing web 2.0 applications? – Each great individually but try using them in combination… – How can I share my connotea™ bookmarks with my Facebook™ friends? • REST as an architectural style? – Good idea - for some applications - flipside of the Grid btw.

77 Web 2.0 and Grid computing • Simplify user interface • More flexible than (conventional) portal • Software as a service • Collaboration Grid • Knowledge Grid

78 Tools and mashups based on web service infrastructure

79 Shared Bookmarking for Social Networks • MSI-CIEC project to support tagging and online shared bookmarking. – Pioneered by del.icio.us in 2003 (!) • Bookmarking services allow you to – Share links (URLs) with networks of friends – Organize your links by mnemonic tags – Find other interesting URLs by popularity (most bookmarked) – Find interesting URLs by keywords • When used collectively, tags form folksonomies. – “Pave the cow paths” – Typically about tagged URLs. – But also about people who tag. – Semantic Web Lesson: everything is a URI.
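The bookmarking operations listed above map onto a few simple indexes. The sketch below is a hypothetical in-memory model, not the MSI-CIEC or del.icio.us implementation: it keeps one index from tags to URLs, one from tags to the people who used them, and a popularity count per URL.

```python
from collections import Counter, defaultdict

class Bookmarks:
    """Toy shared-bookmark store; tags used by many users form a folksonomy."""

    def __init__(self):
        self.urls_by_tag = defaultdict(set)   # tag -> URLs carrying that tag
        self.users_by_tag = defaultdict(set)  # tag -> users who applied it
        self.popularity = Counter()           # URL -> number of bookmark entries

    def add(self, user, url, tags):
        """Record that a user bookmarked a URL under the given tags."""
        self.popularity[url] += 1
        for tag in tags:
            self.urls_by_tag[tag].add(url)
            self.users_by_tag[tag].add(user)

    def find_by_tag(self, tag):
        return sorted(self.urls_by_tag.get(tag, set()))

    def most_bookmarked(self, n=1):
        return [url for url, _ in self.popularity.most_common(n)]

store = Bookmarks()
store.add("alice", "http://example.org/grid-intro", ["grid", "tutorial"])
store.add("bob", "http://example.org/grid-intro", ["grid"])
store.add("bob", "http://example.org/blast-howto", ["bioinformatics", "tutorial"])

tutorials = store.find_by_tag("tutorial")
top = store.most_bookmarked(1)
grid_taggers = sorted(store.users_by_tag["grid"])
```

The `users_by_tag` index reflects the point that folksonomies are also about the people who tag: users who share tags are candidates for the same community.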

80 Grid activities of NECTEC

81 Grid Activity Summary • Grid Testbed – CFD Application – Virtualization • Information Grid • Grid CA

82 Grid Computing Testbed • Internal Level – NECTEC conducts tests of Globus Toolkit 4 on the following issues: • Middleware • Pre-WS and Web Services components • CFD on Grid • Gfarm file system

83 Grid Computing Testbed • National Level – NECTEC cooperates with the Thai National Grid Project (TNGP) to set Thailand's Grid community standards.

84 Grid Computing Testbed • International Level – NECTEC has been an active member of the PRAGMA resources and data working group. – Improve the interoperability of Grid middleware in the Asia Pacific region and make the Grid usable by scientists.

85 CFD on Grid • Computation requirements for CFD are very high. • Nevertheless, some major challenges still exist in CFD research, for example turbulence research and very-large-scale CFD simulations. • Grid provides dependable, consistent, pervasive and inexpensive access to high-end computational capability. • We investigate the feasibility and scalability of cross-platform simulation paradigms for a fine-grained application such as CFD on our Grid testbed.

86 CFD on Grid • Hardware – Grid64 Cluster (Itanium2 1.4GHz, 4 nodes) • Memory 4 GB • Network 100 Mbps – Grid3 Cluster (AMD Athlon 1.6GHz, 2 nodes) • Memory 1 GB • Network 100 Mbps • Software – Rocks Cluster version 4.1 – GT (Pre-WS) – MPICH-G2 version – Intel Compiler version 8 – PBS scheduler

87 CFD on Grid • Some Remarks – Grid infrastructure that does not depend on public IP addresses. – Ability to migrate jobs automatically. – Dedicated Grid environment for fine-grained applications such as CFD. – Improvement of algorithms for high-latency Grid infrastructure.

89 NECTEC GOC CA • A digital certificate issuer developed specifically to support authentication for Grid resources. • Developed under X.509 Public Key Infrastructure by the Large Scale Simulation Research Laboratory (LSR), National Electronics and Computer Technology Center. • The CA issues certificates to users, hosts and services. • Current Status: – Production Level CA under APGrid PMA – ~ 10 Certificates issued (all for internal users)

90 NECTEC GOC CA • Collaboration

91 Questions?

