(Really) Basic Computer Science James A. Foster U. Idaho, IBEST.


1 (Really) Basic Computer Science James A. Foster U. Idaho, IBEST

2 What it is
The study of computation and how to design and implement artifacts that compute correctly and efficiently
Computer science is not programming (“to a biologist, a computer scientist is a wrench”)

3 What it is, practically
• Characterize fundamental limits of algorithmic manipulations of formal languages (CS theory)
• Design efficient, correct algorithms (all loaded terms)
• Implement algorithms as useful programs
• Organize data for efficient, flexible access
• Design and build hardware

4 Formal languages
Formal language: abstract symbols combined into sequences according to algorithmic rules
Alphabet Σ: any finite set of meaningless characters, e.g. {A,C,G,T}
• Sentences: any sequence of symbols from Σ
• Language: any set of sentences
• Expressions: patterns to describe languages
• Automata: finite states, symbol-driven transitions. Recognizers.
• Grammars: rewriting rules. Generators.

5 Types of languages
• Regular: can describe, recognize, or generate (DRG) algorithmically without memory
• Context free: can DRG everywhere with only local information
• Context sensitive: need global information
• Recursively enumerable: need unrestricted information
• Undecidable: cannot DRG with algorithms

6 Expressions
• One string of symbols used to match a language
• E.g., [A,C,M][C,G,S]A* matches all sequences that begin with A, C, or M, followed by C, G, or S, followed by zero or more A’s
• Regular expressions: describe alternatives, sequences, repeats
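As a minimal sketch, the expression on this slide can be tried out with Python's `re` module. The pattern below drops the commas, since Python character classes list their members without separators. The helper name `matches` is an illustrative choice, not part of the original deck.

```python
import re

# The slide's expression in Python regex syntax:
# one of A/C/M, then one of C/G/S, then zero or more A's.
PATTERN = re.compile(r"[ACM][CGS]A*")

def matches(sentence: str) -> bool:
    """Return True if the whole sentence belongs to the language."""
    return PATTERN.fullmatch(sentence) is not None

for s in ["AC", "MGAAA", "CS", "AA", "ACG", ""]:
    print(repr(s), matches(s))
```

`fullmatch` is used rather than `search` so that the entire sequence, not just a prefix or substring, has to fit the pattern.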

7 Automata
• Finite set of states, with rules that specify when to change state in response to reading a symbol

8 Types of Automata
Name                    Memory                 Language
Finite A. (DFA, NFA)    None                   Regular
Pushdown A.             Stack                  Context free
Turing machine          RAM (tape)             Recursively enumerable
Stochastic              -any variety above-    -varies-
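A finite automaton needs nothing beyond its current state. The sketch below is one possible DFA for the language of the earlier example expression [A,C,M][C,G,S]A*. The state numbering and the names `TRANSITIONS`, `ACCEPTING`, and `dfa_accepts` are assumptions made for illustration.

```python
# States: 0 = start, 1 = read the first symbol,
# 2 = read the second symbol (and any trailing A's); 2 is the only accepting state.
TRANSITIONS = {
    (0, "A"): 1, (0, "C"): 1, (0, "M"): 1,
    (1, "C"): 2, (1, "G"): 2, (1, "S"): 2,
    (2, "A"): 2,
}
ACCEPTING = {2}

def dfa_accepts(sentence: str) -> bool:
    """Run the DFA over the sentence; reject on any missing transition."""
    state = 0
    for symbol in sentence:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return False
        state = TRANSITIONS[key]
    return state in ACCEPTING

print(dfa_accepts("MGAAA"), dfa_accepts("ACG"))
```

The only memory is the integer `state`, which is what the "None" entry in the table refers to.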

9 Grammars
• Terminal & non-terminal symbols
• Productions (rewriting rules)
Terminals = {A,C,G,M,S}
Nonterminals = {σ,a,b,c} (Initial symbol is σ)
Productions =
  σ → a
  a → Ab | a → Cb | a → Mb
  b → Cc | b → Gc | b → Sc
  c → Ac | c → ε
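To show a grammar acting as a generator, here is a small sketch that expands the productions on this slide breadth-first and collects every sentence up to a length limit. The initial symbol is called `start` here purely as a placeholder name, and the empty list stands for a production with an empty right-hand side. `PRODUCTIONS` and `generate` are illustrative names.

```python
from collections import deque

# Lowercase keys are nonterminals; uppercase letters are terminals.
PRODUCTIONS = {
    "start": [["a"]],
    "a": [["A", "b"], ["C", "b"], ["M", "b"]],
    "b": [["C", "c"], ["G", "c"], ["S", "c"]],
    "c": [["A", "c"], []],  # [] is the empty right-hand side
}

def generate(max_len):
    """Return the set of all sentences with at most max_len symbols."""
    sentences = set()
    queue = deque([("start",)])
    while queue:
        form = queue.popleft()
        # Find the leftmost nonterminal, if any remains.
        idx = next((i for i, sym in enumerate(form) if sym in PRODUCTIONS), None)
        if idx is None:
            sentences.add("".join(form))
            continue
        for rhs in PRODUCTIONS[form[idx]]:
            new_form = form[:idx] + tuple(rhs) + form[idx + 1:]
            terminals = sum(1 for sym in new_form if sym not in PRODUCTIONS)
            if terminals <= max_len:  # prune forms that are already too long
                queue.append(new_form)
    return sentences

print(sorted(generate(3)))
```

Every generated sentence is one the expression [A,C,M][C,G,S]A* would match, which is the sense in which the grammar and the expression describe the same language.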

10 Types of grammars
• Regular: productions like α → Aβ
• Context free: productions like α → βAγ (lhs has no terminals)
• Context sensitive: productions like anything → anything (lhs not longer than rhs)
• Phrase structured: anything goes
• Stochastic: productions have probabilities

11 Algorithm
Abstract, precise description of a step-by-step process for transforming data
• Programming language independent
• Machine independent
• Using “reasonable” steps

12 Algorithm example
Input: DNA sequences S1, S2, with lengths m, n
For each position p1 from 1 to n-m+1 in S2
    For each position p2 from 1 to m in S1
        If character p2 in S1 doesn’t match character p1+p2-1 in S2
            Then break out of this loop to the outer loop
    If the inner loop finished without a mismatch
        Return “S1 starts at position p1 in S2”
Return “S1 is not in S2”
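A runnable sketch of the substring search this slide describes, assuming the intended comparison is between each character of S1 and the correspondingly offset character of S2. The function name `find_first` is illustrative. It reports a 1-based position to follow the slide's numbering, and returns `None` where the slide returns “S1 is not in S2”.

```python
def find_first(s1, s2):
    """Return the 1-based position where s1 first occurs in s2, or None."""
    m, n = len(s1), len(s2)
    for p1 in range(n - m + 1):          # candidate start positions in s2
        for p2 in range(m):              # positions within s1
            if s1[p2] != s2[p1 + p2]:
                break                    # mismatch: try the next start position
        else:
            # The inner loop ran to completion, so every character matched.
            return p1 + 1
    return None

print(find_first("GAT", "ACGATT"))
print(find_first("TTT", "ACGATT"))
```

Python's `for ... else` runs the `else` branch only when the loop was not exited with `break`, which maps directly onto "break out of this loop to the outer loop".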

13 Efficiency
• Implementation independent
  – Property of algorithms, not programs
  – Same for all hardware
• Describes resource (time, space) consumption as a function of increasing amount of input data
• Usually pessimistic: worst case
• Represented with big-Oh notation

14 Algorithm example
The above algorithm uses O(mn) time and O(n+m) space
• Worst-case scenario: S1 is not in S2
  – Outer loop iterates up to n times, inner loop up to m times
  – Space is used only to store S1, S2, p1, and p2: mx+nx+c+c
• c is independent of the amount of input
• x is the number of bits needed to store a nucleotide, also a constant
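One way to see the O(mn) bound is to count character comparisons directly. The sketch below instruments a naive substring search with a counter. The function name `count_comparisons` and the test inputs are made up for illustration, and the inputs are chosen so that every alignment fails only on the last character of S1.

```python
def count_comparisons(s1, s2):
    """Count character comparisons made by a naive search for s1 in s2."""
    m, n = len(s1), len(s2)
    comparisons = 0
    for p1 in range(n - m + 1):
        for p2 in range(m):
            comparisons += 1
            if s1[p2] != s2[p1 + p2]:
                break
        else:
            return comparisons  # match found, stop counting
    return comparisons

# Near-worst case: S1 never occurs, and each attempt scans all of S1.
m, n = 4, 10
print(count_comparisons("A" * (m - 1) + "T", "A" * n), "vs bound", m * n)
```

With these inputs the count is m(n-m+1), which grows like mn when m is small relative to n, while an early match costs only m comparisons.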

15 Limits to efficiency
• Complexity classes
  – P: (decision) problems solvable in O(n^k) time for some k
  – NP: (decision) problems with solutions that are easy to verify
• Completeness
  – A problem is complete for a complexity class if every other problem in the class is “really” a special instance of this one
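To make "easy to verify" concrete, here is a sketch using subset sum, a standard NP-complete problem chosen for this illustration (it does not appear in the deck). Checking a proposed answer takes one pass over it, while the brute-force solver shown may try every subset. The names `verify` and `solve_brute_force` are hypothetical.

```python
from itertools import combinations

def verify(numbers, target, certificate):
    """Check a proposed solution: a list of distinct indexes into numbers."""
    return (
        len(set(certificate)) == len(certificate)
        and all(0 <= i < len(numbers) for i in certificate)
        and sum(numbers[i] for i in certificate) == target
    )

def solve_brute_force(numbers, target):
    """Try every subset of indexes; exponential in len(numbers)."""
    for size in range(len(numbers) + 1):
        for idx in combinations(range(len(numbers)), size):
            if sum(numbers[i] for i in idx) == target:
                return list(idx)
    return None

numbers = [3, 34, 4, 12, 5, 2]
answer = solve_brute_force(numbers, 9)
print(answer, verify(numbers, 9, answer))
```

Verification is polynomial in the input size; whether solving can also be done in polynomial time is the open P versus NP question.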

16 Complexity lessons
• NP-complete problems have no known O(n^k) solutions
  – Do they exist? Beats the hell out of me.
• For biological problems, k=2 is too big
• Most interesting problems are NP-complete
  – E.g., multiple sequence alignment, phylogenetic inference

17 Correctness
The input-output transformation of a correct algorithm is consistent with that given by the algorithm’s specifications
• Ways of being wrong: incorrect output, or no output, for an input covered by the specifications
• Specifications have to be clear, complete, consistent (very hard!)

18 Types of algorithms
• Iterative
• Recursive
• Branch and bound
• Divide and conquer
• Dynamic programming
• Stochastic
• Parallel and distributed
• Software engineering
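As a small illustration of two entries in this list, the sketch below computes the length of the longest common subsequence of two sequences twice: once with plain recursion and once with dynamic programming. The problem choice and the names `lcs_recursive` and `lcs_dp` are assumptions made for this example, not taken from the deck.

```python
def lcs_recursive(a, b):
    """Plain recursion: simple, but recomputes subproblems (exponential time)."""
    if not a or not b:
        return 0
    if a[-1] == b[-1]:
        return 1 + lcs_recursive(a[:-1], b[:-1])
    return max(lcs_recursive(a[:-1], b), lcs_recursive(a, b[:-1]))

def lcs_dp(a, b):
    """Dynamic programming: fill a table of subproblem answers once, O(len(a)*len(b))."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]

print(lcs_recursive("ACCGGTA", "ACGTA"), lcs_dp("ACCGGTA", "ACGTA"))
```

Both functions give the same answer; the difference is that the table version solves each subproblem only once.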

19 Software engineering
Implementing algorithms as programs
• Levels of implementation
  – In hardware
  – Machine code
  – Compiled code
  – Interpreted code
• Types of programming languages
  – Imperative: describe how to do things
  – Declarative: describe facts and constraints
  – Object-oriented: describe objects, methods, and relations between them
  – “Scripting” languages

20 Useful software
• Graphical user interface (GUI)
• I/O devices
• I’m rotten at this; most computer scientists are

21 Programming advice
• Read lots of code, not lots of books
• Write your programs to be read
• Play “what if” a lot, get to know errors
• Play two roles: maker & breaker
• If it doesn’t work, YOU screwed up

22 Data organization
• Just data, no structure: FASTA
• With field tags: Genbank, PDB
• Marked for relational DBs: ASN.1
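FASTA is about as simple as a data format gets: a header line starting with `>` followed by one or more lines of sequence. A minimal parser sketch follows. The function name `parse_fasta` and the sample records are made up for illustration, and real files have variations this sketch ignores.

```python
def parse_fasta(text):
    """Map each header line (without the leading '>') to its joined sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()       # start a new record
            records[name] = []
        elif name is not None:
            records[name].append(line)    # sequence may span several lines
    return {header: "".join(parts) for header, parts in records.items()}

# Invented sample data, not real sequences.
sample = """>seq1 toy example
ACGT
ACGT
>seq2
GGCC
"""
print(parse_fasta(sample))
```

Anything beyond the header and the raw sequence has to be packed into the header text by convention, which is the "no structure" the slide refers to.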

23 Relational databases
• Data
  – Data stored in tables (like spreadsheets)
  – Joining tables: merges their information
  – Projecting tables: selects subsets of information
• Advantages
  – Flexible and efficient storage
  – Easy to update
  – Supports ad hoc queries
  – Little extra processing necessary
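Joining and projecting can be sketched with Python's built-in `sqlite3` module and an in-memory database. The table names, column names, and rows below are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two toy tables; every value here is made up.
conn.executescript("""
    CREATE TABLE organisms (org_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE genes (gene_id INTEGER PRIMARY KEY, symbol TEXT,
                        length INTEGER, org_id INTEGER);
    INSERT INTO organisms VALUES (1, 'organism_one'), (2, 'organism_two');
    INSERT INTO genes VALUES (10, 'geneA', 300, 1),
                             (11, 'geneB', 120, 1),
                             (12, 'geneC', 250, 2);
""")

# Projection: keep only some of the columns of one table.
projection = conn.execute(
    "SELECT symbol, length FROM genes ORDER BY gene_id"
).fetchall()

# Join: merge information from both tables through the shared org_id.
joined = conn.execute(
    "SELECT genes.symbol, organisms.name "
    "FROM genes JOIN organisms ON genes.org_id = organisms.org_id "
    "ORDER BY genes.gene_id"
).fetchall()

print(projection)
print(joined)
conn.close()
```

Both queries are ad hoc: neither had to be anticipated when the tables were defined.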

24 Hardware
• Data storage: hard disk, CD, Zip, etc.
• Transmission: ethernet, phone, wireless, other (T1)
• Computer
  – CPU(s)
  – Cache
  – RAM (consider virtual storage)
  – Bus
  – Peripherals

25 Client-server paradigm
• Types of servers
  – Mail
  – Name
  – File
  – Web
  – Data
  – X-window
  – Compute

26 Operating system
• Windows
  – Easy to use, and to keep running
  – Very inflexible
  – Inefficient
• Unix
  – Harder to use, and to keep running
  – Very flexible
  – Efficient

