Apt Software Avenues Pvt Ltd, Unit G302 Block DC, City Centre, Salt Lake, Kolkata 700064 Informatics perspectives in Bio-Informatics Atul P Agarwal Apt.


1 Informatics perspectives in Bio-Informatics
Atul P Agarwal, Apt Software Avenues Pvt Ltd
Unit G302 Block DC, City Centre, Salt Lake, Kolkata

2 Two aspects of Informatics
Computational Biology
All the plumbing needed to put a Bio-informatics application together

3 Application architecture
Standalone
  Local computation
  Needs to be installed on individual machines
  Can connect to a web service
  Updates are difficult to manage
Web based
  Runs in a browser
  Needs no install
  Updates are easy
  Can connect to other web services

4 Web application architecture
[Diagram] Components: Browser, Web server, Application logic, Application, Database
Protocols and formats: HTTP, MIME; HTML, XHTML, DHTML, JavaScript, AJAX; CGI/ASP.NET/JSP; Database driver, SQL; SOAP, XML; Proprietary, SOAP Lite
Web servers: Apache, JBoss, IIS
Languages: Perl, Python, PHP, C/C++, C#
Databases: MySQL, PostgreSQL, SQL Server, Oracle

5 Platforms - Two camps
Public domain
  LAMP
    Linux
    Apache, JBoss
    MySQL
    Perl, Python, PHP, Java
Microsoft
  .NET
  SQLServer
  ASP.NET (C, C++, C#, VB.net)

6 World Wide Web
The World Wide Web (WWW, or simply Web) is an information space in which the items of interest, referred to as resources, are identified by global identifiers called Uniform Resource Identifiers (URI).
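To make the idea of a URI as a global identifier concrete, the following minimal sketch splits a URI into its named parts using Python's standard library. The function name `describe_uri` and the example address are assumptions made for illustration; they do not come from the slides.

```python
from urllib.parse import urlsplit


def describe_uri(uri):
    """Break a URI into the components a browser works with."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,      # how to retrieve the resource, e.g. "http"
        "authority": parts.netloc,   # the host responsible for the resource
        "path": parts.path,          # which resource on that host
        "query": parts.query,        # optional parameters
        "fragment": parts.fragment,  # optional position inside the resource
    }


if __name__ == "__main__":
    # Hypothetical URI on a reserved example domain.
    print(describe_uri("http://www.example.org/path/file.html?id=42#top"))
```

The scheme tells the client which retrieval behavior to use, and the authority names the server that answers for the resource.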

7 Browsers – the display
Responsible for user input and result display
No algorithmic computation
Displays HTML
Some programmability through JavaScript

8 Browser Operation
The browser recognizes that what a user has typed is a URI.
The browser performs an information retrieval action in accordance with its configured behavior for resources identified via the "http" URI scheme.
The authority responsible for handling the URI provides information in a response to the retrieval request.
The browser interprets the response, identified as HTML by the server, and performs additional retrieval actions for inline graphics and other content as necessary.
The browser displays the retrieved information, which includes hypertext links to other information. The user can follow these hypertext links to retrieve additional information.

9 Portability across Browsers
There are many browsers out there
  IE
  Firefox
  Safari
  Opera
They have their own idiosyncrasies
Application needs lots of testing

10 Web Server
Handle multiple incoming requests
Process the HTTP requests
Serve the requests
  Multiple possibilities
    static pages
    cgi-bin
    jsp
    servlets
Form the HTTP responses
Send back the responses
Maintain sessions

11 HTTP (Hypertext Transfer Protocol)
RFC 2616 (the official specification)
A request/response protocol.
A client sends a request to the server in the form of a request method, URI, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possible body content over a connection with a server.
The server responds with a status line, including the message's protocol version and a success or error code, followed by a MIME-like message containing server information, entity meta-information, and possible entity-body content.

12 HTTP Message format
The formats of the request and response messages are similar, and English-oriented. Both kinds of messages consist of:
  an initial line,
  zero or more header lines,
  a blank line (i.e. a CRLF by itself), and
  an optional message body (e.g. a file, or query data, or query output).
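The four-part message layout described above can be sketched as a small parser. This is a simplified illustration, not a complete HTTP implementation: the function name `parse_http_message` is an assumption, and the sketch ignores details such as repeated or folded header lines.

```python
def parse_http_message(raw):
    """Split a raw HTTP/1.x message (bytes) into initial line, headers, body."""
    # The first blank line (a CRLF by itself) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    initial_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Each header line has the form "Name: value"; names are case-insensitive.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return initial_line, headers, body


if __name__ == "__main__":
    request = b"GET /path/file.html HTTP/1.0\r\nUser-Agent: HTTPTool/1.0\r\n\r\n"
    print(parse_http_message(request))
```

The same function handles requests and responses, because only the initial line differs between the two kinds of message.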

13 Example request
To retrieve the file at the URL
open a connection to the host
send something like the following through the connection:
  GET /path/file.html HTTP/1.0
  From:
  User-Agent: HTTPTool/1.0
  [blank line here]

14 Example response
The server will respond with something like
  HTTP/ OK
  Date: Fri, 31 Dec :59:59 GMT
  Content-Type: text/html
  Content-Length: 1354
  Happy New Millennium!
After sending the response, the server closes the network connection.

15 HTML (Hypertext Markup Language)
A markup language which consists of tags embedded in the text of a document.
The browser reading the document interprets these markup tags to help format the document for subsequent display to a reader.
However, many of the decisions about layout are made by the browser.

16 Basic HTML tags
Tag            Description
<html>         Defines an HTML document
<body>         Defines the document's body
<h1> to <h6>   Defines header 1 to header 6
<p>            Defines a paragraph
<br>           Inserts a single line break
<hr>           Defines a horizontal rule
<!-- ... -->   Defines a comment

17 Evolution of HTML
Emergence of new platforms
  Mobiles, TVs, Digital phones
Dynamic HTML
  Interactive web pages
  Combines HTML, JavaScript, DOM, CSS
XHTML
  Stricter and cleaner version of HTML

18 Evolution of the Web technologies
Static content
Cgi-bin
Servlets
JSP
ASP
Struts
JSF
AJAX

19 AJAX
Asynchronous JavaScript and XML
Improve the User experience
  The browser can continue to communicate with the web server while the user interacts with the page
  The User can do something during long running computationally intensive jobs
  The User can manipulate complex data in a more friendly manner
  Aggregate data from multiple sources into a single view

20 Enhancing the User experience
iPhone has set a new standard
More demands from the Browser
Rich Internet Applications (RIA)
  Silverlight – Microsoft
  Flex – Adobe
  GWT – Google
Web 2.0
  Communities and sharing

21 Building your application
Choice of programming language
  Lightweight
    Perl, Ruby, Python
  Heavyweight
    C#, Java, C++
  Specialized
    R, Matlab, Mathematica
Choice of architecture/framework
Costs

22 Perl – The language
An interpreted language
Easy and fast
Very good for prototyping
Powerful text manipulation features
Has been used a lot for “plumbing”

23 Disadvantages of Perl
Interpreted, hence slow
Poor GUI support, screen based or command line user interaction only
Novices can be caught on the wrong foot
  Variables can be used without initialization
  No type checking of variables

24 BioPerl
A collection of Perl modules
Specifically for Bio-Informatics
Object oriented
Can be a little difficult to get started with

25 Objects in BioPerl
Sequences
Databases
Alignments
Features and genes on sequences

26 Parallel Computing
Advent of cheap multi-core CPUs
Availability of libraries to help parallel processing
  STAPL
    Standard Template Adaptive Parallel Library
    Protein folding problem using STAPL
  Intel TBB
    Intel Threading Building Blocks
  Google MapReduce
    Parallelized version of Smith-Waterman algorithm
Specialized hardware
  FPGA implementation of BLAST
Very hard to program parallel algorithms

27 CGI (Common Gateway Interface)
a standard way for a web server to invoke a script, passing certain environment variables and user input data to the script, and allow the script to return a result.
one of the oldest ways of providing dynamic web content.
supported on innumerable low cost web hosting services
included out of the box with many Apache installations, such as that provided on Red Hat Linux.
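As a rough sketch of the mechanism described above, the script below reads user input from the `QUERY_STRING` environment variable, which the web server sets before invoking a CGI script, and writes a response (headers, blank line, body) to standard output. The `name` parameter, the greeting, and the function name `handle_request` are hypothetical choices made for this illustration.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def handle_request(environ):
    """Build a CGI response from the environment the web server provides."""
    # User input from the URL arrives in the QUERY_STRING environment variable.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]  # "name" is a made-up parameter
    body = "<html><body><h1>Hello, {}!</h1></body></html>".format(escape(name))
    # A CGI script returns header lines, a blank line, and then the body.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    sys.stdout.write(handle_request(os.environ))
```

Keeping the logic in a function that takes the environment as an argument makes the script testable without a web server; escaping the user input avoids echoing raw markup back to the browser.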

28 CGI in operation

29 XML (eXtensible Markup Language)
XML is a data format that represents data in a structured form
XML is a simple, standard way for interchange of structured textual data between multi-vendor platforms
XML can be used to store data

30 XML is used to create new languages
XHTML the latest version of HTML
WSDL for describing available web services
WAP and WML as markup languages for handheld devices
RSS languages for news feeds
RDF and OWL for describing resources and ontology
SMIL for describing multimedia for the web

31 Domain Specific XML
WITSML
  Oil drilling
JDF
  Printing
Gen2Phen

32 XML documents
Well formed
  Conform to the syntax
Valid
  Conform to the semantics (as defined by a DTD or schema)
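The "well formed" half of this distinction can be checked with nothing more than a parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the function name `is_well_formed` and the sample documents are assumptions for illustration. Checking validity would additionally need a DTD or schema and a validating parser, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, i.e. it conforms to the syntax."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    # Made-up documents: the second one closes its tags in the wrong order.
    print(is_well_formed("<sequence id='s1'><residues>ACGT</residues></sequence>"))
    print(is_well_formed("<sequence><residues>ACGT</sequence>"))
```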

33 Data Models in BioInformatics
Not much standardization so far
Laboratory specific modeling
New initiative for genome data modeling
  Based on XML

34 Databases
Public domain databases
  MySQL, PostgreSQL
Commercial databases
  Oracle, SQLServer
SQL is the language
The heart and soul of BioInformatics applications
Commercial deployments are expensive!

35 RDBMS (Relational Database Management System)
Based on a “Relational” model proposed by Codd
A “Relation” is a formal mathematical concept
The operations on Relations are based on “Relational Algebra”
Implemented as tables
Each table represents a relation; each row is one tuple of that relation

36 Relational Algebra
3 primitive operations
  Projection
    Select a subset of columns
  Selection
    Select a subset of rows
  Join
    Cross product of two tables
Set Operations
  Union
  Intersection
  Difference
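The three primitive operations above can be sketched on small in-memory tables, with each table held as a list of dictionaries. This is only an illustration of the concepts, not how a database implements them; the function names and the sample `genes` and `organisms` tables are invented for the example, and the join assumes the two tables use distinct column names.

```python
def project(rows, columns):
    """Projection: keep a subset of columns, dropping duplicate rows."""
    result = []
    for row in rows:
        reduced = {column: row[column] for column in columns}
        if reduced not in result:
            result.append(reduced)
    return result


def select(rows, predicate):
    """Selection: keep the subset of rows for which the predicate holds."""
    return [row for row in rows if predicate(row)]


def cross_join(left, right):
    """Join as a cross product: pair every left row with every right row."""
    return [{**l, **r} for l in left for r in right]


if __name__ == "__main__":
    # Hypothetical sample tables.
    genes = [{"gene": "g1", "org_id": 1}, {"gene": "g2", "org_id": 2}]
    organisms = [{"id": 1, "organism": "human"}, {"id": 2, "organism": "mouse"}]
    pairs = cross_join(genes, organisms)
    matched = select(pairs, lambda row: row["org_id"] == row["id"])
    print(project(matched, ["gene", "organism"]))
```

Combining the cross product with a selection on matching keys gives the kind of join that SQL queries usually express.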

37 SQL (Structured Query Language)
For manipulating an RDBMS
Data Definition Language (DDL) statements
  To build and modify the structure of tables
Data Manipulation Language (DML) statements
  To work with the data in the tables
  4 basic statements
    SELECT
    INSERT
    UPDATE
    DELETE
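A short sketch of one DDL statement followed by the four basic DML statements is given here. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in, since that needs no server; the slides themselves discuss MySQL accessed from Perl. The `sequence` table and its columns are hypothetical.

```python
import sqlite3


def run_basic_statements():
    """Run one DDL statement and the four basic DML statements."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # DDL: build the structure of a (hypothetical) table.
    cur.execute(
        "CREATE TABLE sequence (id INTEGER PRIMARY KEY, name TEXT, length INTEGER)"
    )
    # DML: work with the data in the table.
    cur.execute("INSERT INTO sequence (name, length) VALUES (?, ?)", ("seq_a", 120))
    cur.execute("INSERT INTO sequence (name, length) VALUES (?, ?)", ("seq_b", 300))
    cur.execute("UPDATE sequence SET length = ? WHERE name = ?", (125, "seq_a"))
    cur.execute("DELETE FROM sequence WHERE name = ?", ("seq_b",))
    rows = cur.execute("SELECT name, length FROM sequence ORDER BY id").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_basic_statements())
```

The `?` placeholders let the driver pass values separately from the SQL text, which avoids building statements by string concatenation.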

38 Transaction
RDBMS are multi-user systems
Different programs may be updating the database at the same time
A DML operation that changes the database is “effected” only when a COMMIT is issued
To undo a DML change, you can use the ROLLBACK command instead
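The difference between COMMIT and ROLLBACK can be seen in a minimal sketch, again using Python's built-in `sqlite3` module as a stand-in for a full multi-user RDBMS. The `sample` table and its values are made up, and a single connection cannot show the multi-user aspect; it only shows that a rolled-back change disappears.

```python
import sqlite3


def demo_transaction():
    """Insert two rows, committing the first and rolling back the second."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample (name TEXT)")
    conn.commit()

    conn.execute("INSERT INTO sample VALUES ('kept')")
    conn.commit()    # the change is now permanent

    conn.execute("INSERT INTO sample VALUES ('discarded')")
    conn.rollback()  # the uncommitted change is undone

    names = [row[0] for row in conn.execute("SELECT name FROM sample")]
    conn.close()
    return names


if __name__ == "__main__":
    print(demo_transaction())
```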

39 Datatype
An RDBMS has its own type system
The service provider “maps” from the programming language types to the database types

40 MySQL – the database
The ‘M’ in LAMP architecture
Free (GPL License)
Many enterprise features
  Distributed databases
  Triggers and stored procedures
Poor XML support

41 Some MySQL DataTypes
INT          Integer
FLOAT        Small floating-point number
DOUBLE       Double-precision floating-point number
CHAR(N)      Text N characters long (N=1..255)
VARCHAR(N)   Variable length text up to N characters long
TEXT         Text up to characters long
LONGTEXT     Text up to characters long

42 DBI (Database Interface), Perl
to access databases from different vendors transparently
  e.g., MySQL, Oracle, Sybase (even plain text files)
relies on proper DBD (DataBase Driver) modules to talk to the real databases
  there is one DBD module for every different type of database
to connect to different databases (of different types) at the same time and easily move data between them.
single generalized API for all types of databases
program at a "higher level" than the API provided by the database system

43 DBD (Database Driver), Perl
converts the general DBI API into the database system-specific API.
also provides a mechanism to access database specific functionality directly (won’t be used)

44 Future Databases in Bioinformatics
Parallel database architectures
Data mining
Data warehousing
Improved query techniques
Object oriented databases?

45 Web Services
Simulates a remote function invocation
  A calling program wants to use a function hosted on another machine
  Inputs are passed to a remote function
  The remote function is executed
  The output is returned to the calling program
WSDL to define services
SOAP/XML to invoke services
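To show what "passing inputs to a remote function" looks like on the wire, the sketch below builds a SOAP 1.1 request envelope as XML with Python's standard library. It only constructs the message and does not contact any service. The service namespace, the method name `getSequence`, and the parameter are hypothetical; a real service publishes its own names in its WSDL.

```python
import xml.etree.ElementTree as ET

# Namespace of the SOAP 1.1 envelope.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_request(service_ns, method, params):
    """Wrap a remote function call and its inputs in a SOAP envelope."""
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    # The remote function is an element named after the method, and each
    # input becomes a child element carrying the value as text.
    call = ET.SubElement(body, "{%s}%s" % (service_ns, method))
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical service namespace, method, and argument.
    print(build_soap_request("urn:example:sequences", "getSequence", {"id": "ID123"}))
```

A client library hides this step: the caller invokes something that looks like a local function, and the library produces and sends an envelope of this shape.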

46 SOAP::Lite
a collection of Perl modules
provides a simple and lightweight interface to the Simple Object Access Protocol (SOAP)
on client and server side
the programmer doesn’t have to worry about the details of the SOAP protocol

47 Service Oriented Architecture
Structuring large applications as an ad hoc collection of smaller modules called "services"
encapsulation
  Many web services are consolidated to be used under the SOA.
loose coupling
  Services maintain a relationship that minimizes dependencies and only requires that they maintain an awareness of each other
contract
  Services adhere to a communications agreement, as defined collectively by one or more service description documents
abstraction
  Beyond what is described in the service contract, services hide logic from the outside world
reusability
  Logic is divided into services with the intention of promoting reuse
composability
  Collections of services can be coordinated and assembled to form composite services
autonomy
  Services have control over the logic they encapsulate
discoverability
  Services are designed to be outwardly descriptive so that they can be found and assessed via available discovery mechanisms

48 Cloud Computing
Thin clients
Software as a service
Pay per use?
Data stored on servers

49 Web 3.0 (wiki)
transformation of the Web from a network of separately siloed applications and content repositories to a more seamless and interoperable whole
ubiquitous connectivity, broadband adoption, mobile Internet access and mobile devices
network computing, software-as-a-service business models, Web services interoperability, distributed computing, grid computing and cloud computing
open technologies, open APIs and protocols, open data formats, open-source software platforms and open data (e.g. Creative Commons)
open identity, OpenID, open reputation, roaming portable identity and personal data
the intelligent web, Semantic Web technologies such as RDF, OWL, semantic application platforms, and statement-based datastores
distributed databases, the "World Wide Database" (enabled by Semantic Web technologies)
intelligent applications, natural language processing, machine learning, machine reasoning, autonomous agents

50 Example Bio-workflow
Quickly integrate different web services
  PDB
  EBI
  KEGG
AJAX and Microsoft Atlas technologies
All data exchanged as XML

51 The Lab
A simple cgi-bin application
Reads some EBI sequence ids from a local MySQL database
Retrieves the DNA sequence from EBI corresponding to an id
Transcribes the DNA to RNA
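The final step of the lab, transcribing DNA to RNA, can be sketched in a few lines. The lab itself is described as a cgi-bin application and its code is not shown in the slides, so this is only an illustration in Python under two assumptions: the input is the coding strand, and it contains only the letters A, C, G and T. The function name `transcribe` is made up.

```python
def transcribe(dna):
    """Return the RNA for a DNA coding-strand sequence (T replaced by U)."""
    dna = dna.upper()
    # Assumption: only the four unambiguous bases are accepted.
    invalid = set(dna) - set("ACGT")
    if invalid:
        raise ValueError("unexpected characters: " + "".join(sorted(invalid)))
    return dna.replace("T", "U")


if __name__ == "__main__":
    print(transcribe("ATGCTT"))
```

Rejecting unexpected characters early keeps a malformed database entry or a failed download from silently producing a wrong RNA string.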

