BioM 3 Basics of Systems Bioinformatics 沈祖望 博士 Tsu-Wang (David) Shen, Ph.D. Medical Informatics Department Tzu Chi University, Taiwan


2 BioM 3 Basics of Systems Bioinformatics 沈祖望 博士 Tsu-Wang (David) Shen, Ph.D. Medical Informatics Department Tzu Chi University, Taiwan E-mail: tshen@mail.tcu.edu.tw Web: http://www.biom3.tcu.edu.tw

3 Agenda (2015/5/10)
1. About me & data mining
2. Introduction to biological signal processing
3. Basics of biology
4. Basics of signals & systems
5. Applications
6. Future work

4 Introduce myself
• Education
  - Ph.D., Biomedical Engineering, University of Wisconsin-Madison, WI, USA
  - M.S., Electrical and Computer Engineering, University of Wisconsin-Madison, WI, USA
  - M.S., Electrical and Computer Engineering, Illinois Institute of Technology, Chicago, IL, USA
• Specialty
  - Biomedical engineering, biomedical signal processing, biometric security identification, neural networks, biostatistics, artificial intelligence, biomedical electronics

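5 Introduce myself
• Experience
  - Head of the Curriculum Section, Tzu Chi University (2005.8.1–2006.7.31)
  - Adjunct Assistant Professor, Graduate Institute of Global Logistics Management, National Dong Hwa University (2007)
  - Joint-appointment Assistant Professor, Institute of Medical Sciences, Tzu Chi University (2006–)
  - Ministry of Education seed instructor for high-tech patent acquisition and defense (2006)
  - IEEE Member since 1996
  - Member of the Taiwan Epilepsy Society since 2007
  - Certified Biomedical Engineer, Republic of China (2007, I1090)
  - Computer Project Assistant, Educational Psychology Department, University of Wisconsin-Madison, WI, USA (1999–2005)
  - Full-time Lecturer, Ta Hwa Institute of Technology (1995–1996)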

6 Data mining
Why mining? We need information from large data warehouses, but there is far too much of it to inspect directly.
What is data mining? Data mining is the process of sorting through large amounts of data and picking out relevant information.

7 Traditional data mining skills

8 Traditional data mining skills

9 Today: a new view of data mining
• Systems Bioinformatics: An Engineering Case-Based Approach, by Gil Alterovitz and Marco F. Ramoni
Computational biology: This trail-blazing work introduces a quantitative systems approach to bioinformatics research using powerful computational tools drawn from signal processing, circuit analysis, control systems, and communications. It presents the functionality of biological processes in an engineering context to facilitate the application of technical skills in solving the field's challenges, from the lab bench to data analysis and modeling, and to enable reverse engineering from biology in the development of synthetic biological devices.

10 Systems Bioinformatics
• Today we will focus on using signal-processing and systems techniques as the "mining" tools for solving problems in bioinformatics and biology.

11 What are signals?
♦ Signals are everywhere! For example:
  - Signals for detecting the presence of an object.
  - Signals for making a medical diagnosis, e.g. the electrocardiogram (ECG) and electroencephalogram (EEG).
  - The trading volume in the stock market.
♦ Signals can be categorized in various ways.

12 What are signals?
• A signal is an abstract element of information or, more commonly, a flow of information (in one or more dimensions).
• One variable or more: one-dimensional vs. multidimensional signals.
• Signal vs. noise: it depends on whether the component is wanted.

13 Continuous (Analog), Discrete, and Digital signals

14 Continuous (Analog), Discrete, and Digital signals

15 What is a system?
• A system is defined as an entity that manipulates one or more signals to accomplish a function, thereby yielding new signals.
x[n] → h[n] → y[n]

16 Specific systems
• Communication systems
• Control systems
• Microelectromechanical systems (MEMS)
• Remote sensing
• Biomedical signal systems
• Auditory system
• Bioinformatics systems
• Others …

17 Biological Signal Processing
• The discipline aims at:
  - Understanding and modeling the biological algorithms implemented by living systems, using signal processing theory (biology as an endpoint).
  - Using biology as a metaphor to formulate novel signal processing algorithms for engineering systems (signal processing as an endpoint).

18 History
• Claude Bernard, 1800s: the concept of homeostasis.
• Ludwig von Bertalanffy, 1968: General System Theory: "there appear to exist general system laws which apply to any system of a particular type, irrespective of the particular properties of the systems and the elements involved …"
• Norbert Wiener, 1900s: system feedback.
• Reiner, 1960s: systems work vs. DNA findings.
• Focus on a reductionist, component-level view.

19 Electrical vs. Biological Signal Processing
Integrating signal processing with biology

20 BSP at the cell level: today's coverage
• Signal detection and estimation
  - DNA sequencing
  - Gene identification
  - Protein hotspot identification
• System identification and analysis
  - Gene regulation systems
  - Protein signal systems

21 Basics of Biology
DNA (gene) -> RNA -> amino acid -> protein -> cells

22 Introduction to the basics of biology
• In April 2003, sequencing of all three billion nucleotides in the human genome was declared complete.
• 1,631 human genetic diseases are now associated with known DNA sequences.
• The human genome was not the only one targeted (as of June 2004: 1,557 viruses, 165 microbes, and 26 eukaryotes).

23 DNA and Gene Expression
DNA is found in the nucleus and mitochondria of eukaryotic cells, which include animal, plant, and fungal cells. DNA in the nucleus contains information from both parents. DNA is the chemical that directs protein synthesis and serves as the genetic blueprint.

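24 DNA
• Deoxyribonucleic acid (DNA) is the main chemical component of chromosomes and the material genes are made of. It is sometimes called the "hereditary particle" because, during reproduction, parents copy part of their own DNA and pass it to their offspring, thereby transmitting traits.
• In prokaryotic cells (which have no nucleus) DNA is found in the cytoplasm, whereas in eukaryotes it is found in the nucleus. Strictly speaking, DNA consists of two single strands wound around each other, like grapevines, into a double helix. Depending on the helix geometry it is classified as A-DNA, B-DNA, or Z-DNA. The double helix discovered by James Watson and Francis Crick is the hydrated B form, which is the most common form in cells.
• This nucleic acid polymer is a sequence of linked nucleotides, each made of one deoxyribose, one phosphate, and one base. DNA has four kinds of nucleotide: adenine (A), thymine (T), cytosine (C), and guanine (G). In double-helical DNA the two strands consist of complementary paired nucleotides held together by hydrogen bonds. Because of the number of hydrogen bonds each base can form, the only possible pairings are A with T and C with G. The base sequence of one strand therefore determines the base sequence of the other. DNA replication relies on the same complementarity: when the double helix unwinds, each strand serves as a template on which the other strand is rebuilt.
• By convention, the start of a strand is called the 5' end and its end the 3' end; the numbers refer to the numbering of the carbon atoms in the deoxyribose.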
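
The pairing rule on this slide (A with T, C with G) means one strand fully determines the other. A minimal Python sketch of that idea follows; the function names are illustrative choices, not taken from the slides, and the input is assumed to contain only the letters A, C, G, and T.

```python
# Watson-Crick pairing rule from the slide: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'->3' direction."""
    return complement(strand)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```

Applying `reverse_complement` twice returns the original strand, which is the same property replication exploits when each strand acts as a template.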

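25 Base pairs
• Base pairs are the chemical structures that form the monomers of the nucleic acids DNA and RNA and encode genetic information. The bases involved are A, G, T, C, and U. Strictly speaking, a base pair is a pair of matching bases (A:T, G:C, or A:U) joined by hydrogen bonds. However, the term is often used as a unit of length for DNA and RNA (even though RNA is single-stranded). It is also used interchangeably with "nucleotide", although a nucleotide consists of a five-carbon sugar, a phosphate, and a base.
• Base pair is usually abbreviated bp; a thousand base pairs is kbp, or simply kb (for double-stranded nucleic acids; for single-stranded nucleic acids, kb means kilobases).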

26 Base pairs, DNA

27 DNA and Gene
The human genome has been estimated to contain about 30,000 genes.

28 DNA -> Gene
Each gene has a particular location on a specific chromosome and contains the "code" for producing one of three forms of RNA: ribosomal RNA (rRNA), messenger RNA (mRNA), or transfer RNA (tRNA).

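29 Genomes
• In biology, the genome of an organism is all the genetic information contained in its DNA (or, for some viruses, its RNA). The genome includes both genes and non-coding sequences. The term was first used in 1920 by Hans Winkler, professor of botany at the University of Hamburg, Germany.
• More precisely, the genome of an organism is the complete DNA sequence of one set of chromosomes. For example, a diploid somatic cell carries two sets of chromosomes, and the DNA sequence of one of those sets is a genome.
• Yu-Gi-Oh!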

30 The Central Dogma of biology
• DNA is copied into RNA (transcription); the RNA is used to make proteins (translation); and the proteins perform functions such as copying the DNA (replication). (www.biologyforengineers.org)

31 Replication
• Replication of DNA by DNA polymerase. After the two strands of DNA separate (top), DNA polymerase uses nucleotides to synthesize a new strand complementary to the existing one (bottom). Images from the online tutorial "Biological Information Handling: Essentials for Engineers" (www.biologyforengineers.org).

32 Replication
• During replication, some enzymes check for accuracy.
• Error rate: approximately one per billion nucleotides.

33 Transcription
• Transcription of DNA by RNA polymerase. Note that RNA contains U's instead of T's. Image from the online tutorial "Biological Information Handling: Essentials for Engineers" (www.biologyforengineers.org).
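
As a rough illustration of the "U instead of T" remark, the sketch below produces an mRNA string in two equivalent ways. The function names are hypothetical, and both inputs are assumed to be written 5'->3' using only A, C, G, and T.

```python
def transcribe_coding_strand(dna: str) -> str:
    """mRNA has the same sequence as the coding strand, with U in place of T."""
    return dna.upper().replace("T", "U")


def transcribe_template_strand(template: str) -> str:
    """Build the mRNA by pairing against the template strand.

    Assumes the template is written 5'->3', so it is read in reverse
    to emit the mRNA in its own 5'->3' direction.
    """
    pairing = {"A": "U", "T": "A", "C": "G", "G": "C"}
    return "".join(pairing[base] for base in reversed(template.upper()))


print(transcribe_coding_strand("ATGGCC"))    # AUGGCC
print(transcribe_template_strand("GGCCAT"))  # AUGGCC
```

`GGCCAT` is the reverse complement of `ATGGCC`, so both calls describe the same gene and yield the same transcript.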

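34 Transcription
mRNA is short for messenger RNA. It carries the information transcribed from DNA that is needed for translation into protein: mRNA brings the genetic message from the DNA to the ribosomes, where proteins are synthesized in the cell. During its brief lifetime an mRNA goes through several steps. In transcription, an enzyme called RNA polymerase copies a gene from the DNA into mRNA as needed. In prokaryotes the mRNA is not processed further (with rare exceptions), and translation often proceeds while transcription is still under way. In eukaryotes, transcription and translation take place in different parts of the cell: transcription in the nucleus, where the DNA is stored, and translation in the cytoplasm, where the ribosomes are.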

35 Transcription
The life cycle of mRNA in a eukaryotic cell: RNA is created by transcription; after splicing and the addition of a poly(A) tail, it is exported to the cytoplasm and translated at the ribosome.

36 Transcription vs. replication
• Only a certain stretch of DNA acts as the template, not the whole strand.
• Different enzymes are used.
• Only a single strand is produced.

37 Translation
• In the process of translation, each mRNA codon attracts a tRNA molecule containing a complementary anticodon. (www.biologyforengineers.org)

38 Anticodon
• The anticodon is located on the tRNA. It determines which amino acid the tRNA carries, and it pairs with the codon on the mRNA.

39 Transcription & Translation

40 DNA to RNA to Protein
• The genetic information is stored in DNA, copied to RNA, and then interpreted from the RNA copy to form a functional protein.
• mRNA, rRNA, tRNA
• Proteins are chains of amino acids. Cells use 20 amino acids, which can be strung together into proteins of different lengths, shapes, and functions.

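41 Proteins are synthesized according to the genetic code
DNA is the genetic material. Three adjacent bases on the DNA form one codon. To make a protein, the gene's DNA code is first transcribed into RNA; the RNA leaves the nucleus, enters the cytoplasm, and binds to a ribosome, which synthesizes the protein according to the codons on the RNA.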

42 DNA to RNA to Protein

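43 How do we know a codon is three adjacent bases?
Four bases must encode twenty amino acids:
• One base: 4^1 = 4 combinations (too few)
• Two bases: 4^2 = 16 combinations (too few)
• Three bases: 4^3 = 64 combinations (enough)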

44 64 codons and the amino acid for each
http://www.medigenomix.de/ http://en.wikipedia.org/
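
The counting argument and the 64-entry codon table can be sketched in a few lines of Python. The compact string below is the standard genetic code listed in T, C, A, G order (`*` marks a stop codon); `translate` is a hypothetical helper that reads a DNA coding sequence from its first base and stops at the first stop codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Why three bases: 4 and 16 words cannot cover 20 amino acids, 64 can.
for length in (1, 2, 3):
    print(length, "base(s):", 4 ** length, "possible words")


def translate(dna: str) -> str:
    """Translate codon by codon until a stop codon or the end of the input."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))  # MA
```

The table maps 61 codons onto 20 amino acids and 3 onto stop, so several codons share an amino acid.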

45 Complexity at the protein level exceeds complexity at the gene and transcript levels. Individual genes in the genome are transcribed into RNA. In eukaryotes, the RNA may be further processed to remove intervening sequences (RNA splicing) and result in a mature transcript that encodes a protein. Different proteins may be encoded by differently spliced transcripts (alternative splicing products). Moreover, once proteins are produced, they can be processed (e.g. cleaved by a protein-cutting protease) or modified (e.g. by addition of a sugar or lipid molecule). In addition, proteins may have non-covalent interactions with other proteins (and/or with other biomolecules such as lipids or nucleotides). Each of these can have tissue-, stage-, and cell-type-specific effects on the abundance, function, and/or stability of proteins produced from a single gene.

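46 Transcription
In eukaryotes, moreover, mRNA must go through several processing steps before it is ready for translation:
1. Addition of a 5' cap: a modified guanine is added to the 5' end of the mRNA. The 5' cap is important for recognition by, and attachment to, the ribosome.
2. Splicing: the pre-mRNA (mRNA that has not yet been modified, or only partly modified; also called heterogeneous nuclear RNA, hnRNA) is processed to remove introns, the segments that are not translated. The remaining sequences, which can be translated into protein, are called exons. A single pre-mRNA can often be spliced in several different ways to yield different mature mRNAs, so that one gene (in different tissues or organs) can produce different proteins with different functions. This is called alternative splicing. Most mRNA splicing is carried out by enzymes, but some RNA molecules, such as ribozymes, can catalyze their own splicing.
3. Polyadenylation: the enzyme poly(A) polymerase adds a run of adenines (usually several hundred) to the 3' end of the pre-mRNA. (This modification does not occur in prokaryotes.) The poly(A) site is marked by a specific signal sequence in the transcript, AAUAAA.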

47 Gel electrophoresis
• A method for separating DNA, RNA, or protein molecules by using an electric field to drive them through a three-dimensional matrix. The larger the DNA fragment, the more slowly it moves through the matrix. DNA isolated on a gel can be recovered and purified away from the matrix.

48 Polymerase Chain Reaction (PCR)
• A method for amplifying a specific DNA fragment in which paired DNA strands are separated (by high temperature) and each is then used as a template for production of the complementary strand by an enzyme (a DNA polymerase).
• The development of PCR can be traced to the discovery of DNA polymerase. The original enzyme, however, is easily destroyed by heat and so is unsuited to the repeated high-temperature cycles that the chain reaction requires.
• The enzyme used today (Taq polymerase) was isolated in 1976 from the hot-spring bacterium Thermus aquaticus. It tolerates high temperatures, which makes it an ideal enzyme for the reaction.

49 Polymerase Chain Reaction (PCR)
Flash demo

50 Gene clone
• Flow chart of the major steps involved in gene clone set production and use.

51 Bioinformatic approaches to select target genes for a cloning project. For bacterial genomes, target selection primarily draws from genome sequence, where introns are not a consideration and genome-scale projects are feasible. For eukaryotes, researchers commonly use one or more informatics-based methods to identify sub-groups of target genes that share a common feature, such as function, localization, expression, or disease association. As noted, these information sources draw significantly on one another (as experimental data is in genome annotation, etc.).

52 Basics of Signals and Systems

53 Signal processing basics
• Time-domain representation: a signal that changes over time.
• Frequency-domain representation: Fourier's discovery that any signal can be expressed as a combination of sine and cosine functions (surprise!).

54 Time Operations
• Time scaling: y(t) = x(at)
• If a > 1, y(t) is compressed.
• If 0 < a < 1, y(t) is expanded.

55 Time scaling

56 Reflection
• y(t) = x(-t)
• y(t) represents a reflected version of x(t) about t = 0.

57 Time shifting
• y(t) = x(t - t_0)
• If t_0 > 0, the signal shifts toward the right.
• If t_0 < 0, the signal shifts toward the left.

58 Example: Precedence Rule
• Target: y(t) = x(at - b)
• First shift: v(t) = x(t - b)
• Then scale: y(t) = v(at) = x(at - b)
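
The precedence rule is easy to check numerically. In the sketch below the test signal (a unit triangle) and the helper names are assumptions made for illustration; the point is only that shifting before scaling gives x(at - b), while the opposite order gives x(at - ab).

```python
import numpy as np


def x(t):
    """Example signal (an assumption): unit-height triangle on [-1, 1]."""
    return np.maximum(0.0, 1.0 - np.abs(t))


def shift_then_scale(signal, a, b):
    """Precedence rule: v(t) = x(t - b), then y(t) = v(at) = x(at - b)."""
    def shifted(t):
        return signal(t - b)
    return lambda t: shifted(a * t)


def scale_then_shift(signal, a, b):
    """Wrong order: w(t) = x(at), then w(t - b) = x(at - ab)."""
    def scaled(t):
        return signal(a * t)
    return lambda t: scaled(t - b)


t = np.linspace(-5.0, 5.0, 1001)
a, b = 2.0, 3.0
y_correct = shift_then_scale(x, a, b)(t)
y_wrong = scale_then_shift(x, a, b)(t)
print(t[np.argmax(y_correct)], t[np.argmax(y_wrong)])  # peaks near 1.5 and 3.0
```

With a = 2 and b = 3, the correct result peaks where 2t - 3 = 0, while the wrong order peaks where 2t - 6 = 0.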

59 Time signal, Fourier transform, Z-transform
"People of the blue planet" = "Earthlings": only the viewpoint differs, so the wording differs!

60 Relationship between time properties of a signal and the appropriate Fourier representation

Time property    | Periodic                            | Non-periodic
Continuous (t)   | Fourier Series (FS)                 | Fourier Transform (FT)
Discrete [n]     | Discrete-time Fourier Series (DTFS) | Discrete-time Fourier Transform (DTFT)

61 Relations Among Fourier Methods

62 Transform methods
Time, Fourier, s (Laplace), z

63 Filters

64 Advantages of digital filters over analog filters
• Highly immune to noise because of the way they are implemented (software/digital circuits).
• Accuracy depends only on round-off error, which is directly determined by the number of bits.
• Easy and inexpensive to change a filter's operating characteristics (e.g., cutoff frequency).
• Performance is not a function of component aging, temperature variation, or power supply voltage.

65 Signal conversion
Continuous signal → band-limiter and sampler → sampled signal → digital filter (processor) → reconstruction filter → continuous signal

66 Sampling as multiplication by a train of impulses

67 Sampling theorem
• Must sample at a rate at least twice the highest frequency present in the signal (including noise).
• If a signal contains no frequencies higher than f_c, the original signal can be completely recovered from samples taken at a rate of at least 2 f_c samples/s.
• The sampling frequency f_s must be at least twice the highest frequency present in the signal (the Nyquist rate).
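
A small numerical sketch of what goes wrong below the required rate: sampled at 10 Hz, a 7 Hz cosine produces exactly the same samples as a 3 Hz cosine (aliasing). The frequencies and function names are illustrative assumptions.

```python
import numpy as np


def sample_cosine(freq_hz, fs_hz, n_samples):
    """Samples of cos(2*pi*f*t) taken at t = n / fs."""
    n = np.arange(n_samples)
    return np.cos(2 * np.pi * freq_hz * n / fs_hz)


def satisfies_sampling_theorem(f_max_hz, fs_hz):
    """Strict inequality is the safe case: a sinusoid at exactly fs/2
    can be sampled at its zero crossings and lost."""
    return fs_hz > 2 * f_max_hz


fs = 10.0
aliased = sample_cosine(7.0, fs, 50)  # 7 Hz is above fs/2 = 5 Hz
low = sample_cosine(3.0, fs, 50)      # 3 Hz is below fs/2
print(np.allclose(aliased, low))      # True: the two are indistinguishable
```

This is why the band-limiter precedes the sampler: anything above f_s/2 must be removed before sampling, since it cannot be told apart afterwards.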

68 LTI system
What is an LTI (linear time-invariant) system? Two conditions: linearity and time invariance.

69 System operation
Convolution in the time domain; multiplication in the frequency domain.
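
For an LTI system y[n] = x[n] * h[n], the same output can be computed by multiplying spectra. The sketch below checks this with NumPy on made-up example sequences; zero-padding to length len(x) + len(h) - 1 makes the FFT product match linear (not circular) convolution.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])  # example input x[n] (assumed values)
h = np.array([0.5, 0.5])            # example impulse response: 2-point average

# Time domain: direct convolution.
y_time = np.convolve(x, h)

# Frequency domain: multiply zero-padded spectra, then invert.
n = len(x) + len(h) - 1
y_freq = np.fft.ifft(np.fft.fft(x, n) * np.fft.fft(h, n)).real

print(y_time)
print(np.allclose(y_time, y_freq))  # True
```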

70 Autocorrelation and PSD
The power spectral density (PSD) of a signal is defined as the Fourier transform of its autocorrelation function.
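
For a finite real sequence this definition can be checked directly: the DFT of the circular autocorrelation equals the squared magnitude of the DFT of the signal. The sketch below uses unnormalized estimates and hypothetical function names; it illustrates the relationship and is not a production PSD estimator.

```python
import numpy as np


def circular_autocorrelation(x):
    """r[k] = sum_n x[n] * x[(n + k) mod N] for a real sequence x."""
    x = np.asarray(x, dtype=float)
    return np.array([np.dot(x, np.roll(x, -k)) for k in range(len(x))])


def psd_from_autocorrelation(x):
    """PSD estimate as the DFT of the autocorrelation sequence."""
    return np.fft.fft(circular_autocorrelation(x)).real


def psd_direct(x):
    """Squared DFT magnitude, which should match the estimate above."""
    return np.abs(np.fft.fft(np.asarray(x, dtype=float))) ** 2


rng = np.random.default_rng(0)
signal = rng.normal(size=32)
print(np.allclose(psd_from_autocorrelation(signal), psd_direct(signal)))  # True
```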

71 Example
Digital or analog?

72 Signal Detection and Estimation

73 Estimation and detection

74 Signal detection and estimation
• Estimation
• Detection: hypothesis testing
• Five steps for analyzing genomic and proteomic data:
  1. Describe and identify the measurement system S.
  2. Define the signal of interest x[n]: map the biological space into a numerical space.
  3. Formulate the problem (estimation, detection, or analysis).
  4. Solve the problem and compute the output signals.
  5. Interpret the results in a biological context.

75 DNA sequencing
• The DNA sequencing process
  - DNA sample preparation
  - Electrophoresis
  - Processing
• Processing the electropherogram data to identify the DNA sequence
  - Conditioning the signal and increasing the S/N ratio
  - Identifying the underlying DNA sequence

76 DNA sequencing: electropherogram
http://www.genome.uab.edu/

77 Model of DNA sequencing
Blurring: for example, diffusion effects and instrument noise.

78 Wiener filter
http://en.wikipedia.org/wiki/Wiener_filter
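
To make the idea concrete, here is a minimal frequency-domain Wiener deconvolution sketch for a signal blurred by a known LTI kernel. It is not the method used in the slides: the kernel, the constant noise-to-signal ratio, and the spike train standing in for an electropherogram are all assumptions, and circular convolution is used for simplicity.

```python
import numpy as np


def wiener_deconvolve(y, h, noise_to_signal):
    """Estimate x from y = h (*) x + noise, with (*) circular convolution.

    Uses G = conj(H) / (|H|^2 + NSR) with a constant, assumed
    noise-to-signal power ratio NSR.
    """
    n = len(y)
    H = np.fft.fft(h, n)
    G = np.conj(H) / (np.abs(H) ** 2 + noise_to_signal)
    return np.fft.ifft(G * np.fft.fft(y)).real


# Toy "peaks" blurred by a short kernel (all values are illustrative).
x_true = np.zeros(64)
x_true[[5, 20, 21, 40]] = [1.0, 2.0, -1.0, 0.5]
blur = np.array([0.6, 0.3, 0.1])
y_blurred = np.fft.ifft(np.fft.fft(x_true) * np.fft.fft(blur, 64)).real

x_hat = wiener_deconvolve(y_blurred, blur, noise_to_signal=1e-9)
print(np.max(np.abs(x_hat - x_true)))  # small when there is no noise
```

With real, noisy data a larger `noise_to_signal` trades sharpness for noise suppression; choosing it well requires knowledge of the noise level.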

79 DNA sequence estimation using LTI filtering

80 Homomorphic Blind Deconvolution
The Wiener filter can lead to significant errors in the presence of diffusion effects. A high-pass filter is used to reduce blurring (a low-frequency effect).

81 Results
An error rate of 1.06% is better than the rates reported for fluorescence-based sequencing instruments!

82 Model-based estimation techniques
There are four main distortions introduced by sequencing:
1. Loading artifacts
2. Diffusion effects
3. Fluorescence interference
4. Additive instrument noise
The sequence is inferred by working backward from a model.

83 Gene identification
• Once the DNA sequence has been identified, it needs to be analyzed to identify genes and coding sequences.

84 DNA signal properties
Consider the autocorrelation function of x_a[n]. The non-flat shape of the spectrum reveals correlations at low frequencies, indicating that base pairs that are far apart seem to be correlated.
Bacterium: Aquifex aeolicus

85 DNA signal properties
• At the thin peak at 2π/3, the increased correlation corresponds to the tendency of nucleotides to repeat along the DNA sequence with period 3, and is indicative of coding regions.
• The triplet nature of the codon
• Potentially, codon bias (unequal usage of codons)
• The biased usage of nucleotide triples in genomic DNA (triplet bias)
• Yin and Yau showed that the period-3 property is not affected by codon bias.
• The period-3 property of coding regions seems to be generated by unbalanced nucleotide distributions in the three codon positions.

86 DNA signal processing for gene identification
• Fickett: the problem of interpreting nucleotide sequences by computer, in order to provide tentative annotation on the location, structure, and functional class of protein-coding genes, i.e. to predict the amino acid sequence of the protein and so gain insight into its function.
• The premise of all methods is to exploit the period-3 property of coding regions by processing the DNA signal to identify regions with strong period-3 correlation.

87 DNA signal processing for gene identification
• DNA spectrum
• Signal-to-noise ratio
• Tiwari et al. observed that, for most coding sequences in a variety of organisms, P_x is large, whereas it is not in non-coding regions.
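
The period-3 idea can be sketched with four binary indicator sequences (one per base) and their DFTs. The function names are hypothetical, and the signal-to-noise ratio used here (power at k = N/3 divided by the mean power over non-DC bins) is one plausible definition assumed for illustration; the slides do not spell out the exact formula.

```python
import numpy as np


def dna_spectrum(seq):
    """Sum over A, C, G, T of the squared DFT magnitude of each indicator sequence."""
    seq = seq.upper()
    total = np.zeros(len(seq))
    for base in "ACGT":
        indicator = np.array([1.0 if s == base else 0.0 for s in seq])
        total += np.abs(np.fft.fft(indicator)) ** 2
    return total


def period3_snr(seq):
    """Power at frequency 2*pi/3 (bin N/3) relative to the mean non-DC power."""
    n = len(seq)
    if n % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    spectrum = dna_spectrum(seq)
    return spectrum[n // 3] / spectrum[1:].mean()


# Deterministic toy sequences: strong period 3 vs. period 4 (no period-3 content).
print(period3_snr("ATG" * 100) > 100)  # True
print(period3_snr("ACGT" * 75) < 1)    # True
```

On real genomic data the ratio would be computed in a sliding window, with high values flagging candidate coding regions.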

88 Fourier spectra
Coding stretch of DNA; non-coding stretch of DNA

89 Filtering methods applied
Window; IIR anti-notch; multistage filters
Predicting the five exons of gene F56F11.4 on C. elegans chromosome III.

90 Protein hotspot identification
• Once coding regions have been identified, the corresponding protein sequence can be determined by mapping the coding region to the amino acid sequence using the genetic code.

91 Protein Signal Definition
• The new physicomathematical approach presented here is called the Resonant Recognition Model (RRM). The RRM is based on the representation of the protein primary structure as a numerical series, obtained by assigning to each amino acid a physical parameter value relevant to the protein's biological activity.

92 Protein Signal Definition
• The RRM is a physical and mathematical model that interprets the linear information in a protein sequence using signal analysis methods.
• It comprises two stages:
  - The first involves the transformation of the amino acid sequence into a numerical sequence. Each amino acid is represented by the value of the electron-ion interaction potential (EIIP), which describes the average energy states of all valence electrons in that particular amino acid.
  - The numerical series obtained this way is then analyzed by digital signal analysis methods in order to extract information pertinent to the biological function.
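
The two stages can be sketched as follows. The function names are hypothetical, and the numbers in `TOY_TABLE` are placeholders for illustration only: they are NOT real EIIP values, which should be taken from the published table.

```python
import numpy as np


def encode_protein(sequence, parameter_table):
    """Stage 1: map each amino acid letter to its numerical parameter value."""
    return np.array([parameter_table[aa] for aa in sequence.upper()])


def amplitude_spectrum(series):
    """Stage 2: DFT magnitude of the mean-removed numerical series."""
    centered = series - series.mean()  # remove the DC component
    return np.abs(np.fft.rfft(centered))


# Placeholder values for illustration only; NOT real EIIP values.
TOY_TABLE = {"A": 0.1, "G": 0.2, "L": 0.0, "K": 0.3}

series = encode_protein("AL" * 8, TOY_TABLE)
spectrum = amplitude_spectrum(series)
print(int(np.argmax(spectrum)))  # 8: the alternating pattern peaks at the highest bin
```

In practice the spectra of several functionally related proteins would be compared to look for a shared peak; that comparison step is not shown here.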

93 EIIP

94 RESONANT RECOGNITION MODEL (RRM)

95 Prediction of protein hotspots: cytochrome C proteins

96 System identification and Analysis

97 Signal processing view of the cell

98 Signal coordination at cell level
System view

99 Gene expression
• A DNA microarray (also commonly known as a gene chip, DNA chip, or biochip) is a collection of microscopic DNA spots attached to a solid surface, such as a glass, plastic, or silicon chip, forming an array.
• DNA microarrays include cDNA microarrays and oligonucleotide microarrays.
• In genetics, complementary DNA (cDNA) is DNA synthesized from a mature mRNA template. cDNA is often used to clone eukaryotic genes in prokaryotes.

100 Gene expression
One sample from a tumor; one sample from normal tissue.
White: highly expressed under treatment A
Gray: no difference
Dark: highly expressed under treatment B

101 Time changes: measure drugs in 6 hours.

102 cDNA microarrays
The procedure begins by attaching the DNA sequences of thousands of genes onto a microscope slide in a pattern of spots, with each spot containing DNA sequences of only a single gene.

103 Oligonucleotide microarrays
In oligonucleotide microarrays (or single-channel microarrays), the probes are designed to match parts of the sequence of known or predicted mRNAs. Instead of attaching full-length DNAs, oligonucleotide microarrays make use of short oligonucleotides chosen to be specific to individual genes.

104 Oligonucleotide microarrays
These microarrays give estimates of the absolute level of gene expression, so comparing two conditions requires two separate microarrays.

105 DNA microarray

106 Principal component analysis
• Principal component analysis transforms the original set of variables into a smaller set of linear combinations that account for most of the variance of the original set. Its purpose is to determine factors (i.e., principal components) that explain as much of the total variation in the data as possible with as few factors as possible.
The principal components are those uncorrelated linear combinations PC(1), PC(2), …, PC(m) whose variances are as large as possible.
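
One common way to compute principal components is an eigendecomposition of the sample covariance matrix. The sketch below follows that route with NumPy; the function name, the toy data, and the return values are illustrative assumptions rather than the procedure used for the results in these slides.

```python
import numpy as np


def pca(data, n_components):
    """PCA via eigendecomposition of the covariance matrix.

    data: array of shape (n_samples, n_variables).
    Returns (scores, components, explained_variance_ratio).
    """
    centered = data - data.mean(axis=0)
    covariance = np.cov(centered, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)  # ascending order
    order = np.argsort(eigenvalues)[::-1]                   # largest variance first
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    components = eigenvectors[:, :n_components]
    scores = centered @ components
    explained = eigenvalues[:n_components] / eigenvalues.sum()
    return scores, components, explained


# Toy data lying exactly on the line y = 2x: one component explains everything.
t = np.arange(10.0)
toy = np.column_stack([t, 2.0 * t])
scores, components, explained = pca(toy, n_components=1)
print(explained)  # close to [1.]
```

The sign of each component is arbitrary, so only its direction (here proportional to [1, 2]) is meaningful.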

107 Eigenvalues and Eigenvectors
Definition: Let $A$ be an $n \times n$ matrix. A scalar $\lambda$ is called an eigenvalue of $A$ if there exists a nonzero vector $x$ in $\mathbb{R}^n$ such that $Ax = \lambda x$. The vector $x$ is called an eigenvector corresponding to $\lambda$.

108 Computation of Eigenvalues and Eigenvectors
Let $A$ be an $n \times n$ matrix with eigenvalue $\lambda$ and corresponding eigenvector $x$. Thus $Ax = \lambda x$. This equation may be rewritten $Ax - \lambda x = 0$, giving $(A - \lambda I_n)x = 0$. Solving the equation $|A - \lambda I_n| = 0$ for $\lambda$ leads to all the eigenvalues of $A$. On expanding the determinant $|A - \lambda I_n|$, we get a polynomial in $\lambda$. This polynomial is called the characteristic polynomial of $A$. The equation $|A - \lambda I_n| = 0$ is called the characteristic equation of $A$.

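109 Example 1
Find the eigenvalues and eigenvectors of the matrix
$$A = \begin{bmatrix} -4 & -6 \\ 3 & 5 \end{bmatrix}$$
Solution: Let us first derive the characteristic polynomial of $A$. We get
$$|A - \lambda I_2| = (-4 - \lambda)(5 - \lambda) + 18 = \lambda^2 - \lambda - 2.$$
We now solve the characteristic equation of $A$: $\lambda^2 - \lambda - 2 = (\lambda - 2)(\lambda + 1) = 0$. The eigenvalues of $A$ are 2 and –1.

110 For $\lambda = 2$: $(A - 2I_2)x = 0$ leads to the system of equations
$$-6x_1 - 6x_2 = 0, \qquad 3x_1 + 3x_2 = 0,$$
giving $x_1 = -x_2$. The solutions to this system of equations are $x_1 = -r$, $x_2 = r$, where $r$ is a scalar. Thus the eigenvectors of $A$ corresponding to $\lambda = 2$ are nonzero vectors of the form $r\,[-1 \;\; 1]^T$.
For $\lambda = -1$: $(A + I_2)x = 0$ gives $x_1 = -2x_2$. The eigenvectors of $A$ corresponding to $\lambda = -1$ are nonzero vectors of the form $s\,[-2 \;\; 1]^T$.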
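
The eigenvalues (2 and –1) and eigenvector directions ([–1, 1] and [–2, 1]) quoted in Example 1 determine the matrix uniquely through $A = P D P^{-1}$. The NumPy sketch below rebuilds $A$ from them and checks the defining relation $Ax = \lambda x$; the variable names are illustrative.

```python
import numpy as np

# Columns of P are the eigenvectors from the example; D holds the eigenvalues.
P = np.array([[-1.0, -2.0],
              [1.0, 1.0]])
D = np.diag([2.0, -1.0])

# A matrix with these eigenpairs must equal P D P^{-1}.
A = P @ D @ np.linalg.inv(P)
print(A)

# Check A x = lambda x for each eigenpair.
for eigenvalue, eigenvector in zip(np.diag(D), P.T):
    print(np.allclose(A @ eigenvector, eigenvalue * eigenvector))  # True

# NumPy recovers the same eigenvalues (order may differ).
print(np.sort(np.linalg.eigvals(A).real))
```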

111 Singular value decomposition (SVD)
• Suppose $M$ is an m-by-n matrix whose entries come from the field $K$, which is either the field of real numbers or the field of complex numbers. Then there exists a factorization of the form
$$M = U \Sigma V^{*}$$
  - The matrix $V$ contains a set of orthonormal "input" or "analysing" basis vector directions for $M$.
  - The matrix $U$ contains a set of orthonormal "output" basis vector directions for $M$.
  - The matrix $\Sigma$ contains the singular values, which can be thought of as scalar "gain controls" by which each corresponding input is multiplied to give a corresponding output.

112 SVD example
• A non-negative real number $\sigma$ is a singular value for $M$ if and only if there exist unit-length vectors $u$ in $K^m$ and $v$ in $K^n$ such that
$$Mv = \sigma u \quad \text{and} \quad M^{*}u = \sigma v.$$
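
These relations can be verified numerically with `numpy.linalg.svd`, which returns $U$, the singular values, and $V^{*}$ (as `Vh`). The 2-by-2 matrix below is an arbitrary real example chosen for illustration.

```python
import numpy as np

M = np.array([[3.0, 0.0],
              [4.0, 5.0]])  # arbitrary example matrix

U, singular_values, Vh = np.linalg.svd(M)

# Factorization M = U Sigma V*.
print(np.allclose(U @ np.diag(singular_values) @ Vh, M))  # True

# For each singular triple: M v = sigma u and M* u = sigma v.
for i, sigma in enumerate(singular_values):
    u, v = U[:, i], Vh[i]
    print(np.allclose(M @ v, sigma * u), np.allclose(M.T @ u, sigma * v))
```

For a real matrix the conjugate transpose $M^{*}$ is just the transpose, which is why `M.T` is used here.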

113 Eigengenes from applying SVD

114 Project CLB2 and CLN3

115 Apoptosis System Identification

116 Apoptosis System Identification

117 PCA results

118 Summary
• We tried to link signals, systems, and biology.
• Filtering is necessary for removing artifacts.
• We learned "Signal detection and estimation":
  1. DNA sequencing
  2. Gene identification
  3. Protein hotspot identification
• We learned "System identification and analysis":
  1. Gene regulation systems
  2. Protein signal systems
• We offered a new way of thinking about data mining.

119 Thanks

