Published by Arthur Daniel, modified over 8 years ago
1
(Really) Basic Computer Science
James A. Foster, U. Idaho, IBEST
2
What it is
The study of computation, and of how to design and implement artifacts that compute correctly and efficiently
Computer science is not programming (“to a biologist, a computer scientist is a wrench”)
3
What it is, practically
Characterize the fundamental limits of algorithmic manipulations of formal languages (CS theory)
Design efficient, correct algorithms (all loaded terms)
Implement algorithms as useful programs
Organize data for efficient, flexible access
Design and build hardware
4
Formal languages
Formal language: abstract symbols combined into sequences according to algorithmic rules
Alphabet Σ: any finite set of uninterpreted characters, e.g. {A,C,G,T}
Sentences: any finite sequence of symbols from the alphabet (the set of all sentences is written Σ*)
Language: any set of sentences
Expressions: patterns that describe languages
Automata: finite states, symbol-driven transitions. Recognizers.
Grammars: rewriting rules. Generators.
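A minimal Python sketch of these definitions, assuming the DNA alphabet from the slide. It enumerates a finite slice of Σ* and then picks out one language as a plain set of sentences; the choice of language (sequences equal to their own reverse complement) and the helper names `sentences` and `reverse_complement` are illustrative assumptions, not from the slides.

```python
from itertools import product

ALPHABET = ("A", "C", "G", "T")
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sentences(alphabet, max_len):
    """All sequences over `alphabet` of length 0..max_len (a finite slice of Σ*)."""
    for length in range(max_len + 1):
        for symbols in product(alphabet, repeat=length):
            yield "".join(symbols)


def reverse_complement(seq):
    return "".join(COMPLEMENT[c] for c in reversed(seq))


# A language is just a set of sentences; here, the reverse-complement palindromes
# of length <= 2 (the empty sentence qualifies trivially).
L = {s for s in sentences(ALPHABET, 2) if s == reverse_complement(s)}
print(sorted(L))  # ['', 'AT', 'CG', 'GC', 'TA']
```

There are 1 + 4 + 16 = 21 sentences of length at most 2, and only five of them belong to this language.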
5
Types of languages
Regular: can describe, recognize, or generate (DRG) algorithmically without auxiliary memory
Context free: can DRG everywhere using only local information
Context sensitive: need global information
Recursively enumerable: need unrestricted information
Not recursively enumerable: cannot DRG with algorithms at all
6
Expressions
One string of symbols used to match a language
E.g., [ACM][CGS]A* matches all sequences that begin with A, C, or M, followed by C, G, or S, followed by zero or more A’s
Regular expressions: describe alternatives, sequences, and repeats
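The example expression can be tried directly with Python's standard `re` module. This is a minimal sketch; `in_language` is a hypothetical helper name. Note that in `re` syntax a character class lists its members without separators, so a class written as [A,C,M] would also match a comma. `fullmatch` is used so the whole sequence, not just a prefix, must match.

```python
import re

# The slide's expression: one of A/C/M, then one of C/G/S, then zero or more A's.
PATTERN = re.compile(r"[ACM][CGS]A*")


def in_language(seq: str) -> bool:
    """True if the entire sequence matches the expression."""
    return PATTERN.fullmatch(seq) is not None


for s in ["AC", "MSAAA", "CGAAAT", "AB", "A"]:
    print(s, in_language(s))
# AC True, MSAAA True, CGAAAT False, AB False, A False
```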
7
Automata
Finite set of states, with rules that specify when to change state in response to reading a symbol
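As an illustration, here is a deterministic finite automaton (DFA) for the expression [ACM][CGS]A* from the Expressions slide, written as a transition table. It is a sketch under the assumption that a missing table entry means "reject" (an implicit dead state); the state names and `run_dfa` are hypothetical. The only thing the machine remembers between symbols is its current state.

```python
# Transition table: (current state, symbol read) -> next state.
TRANSITIONS = {
    ("start", "A"): "first", ("start", "C"): "first", ("start", "M"): "first",
    ("first", "C"): "accept", ("first", "G"): "accept", ("first", "S"): "accept",
    ("accept", "A"): "accept",
}
ACCEPTING = {"accept"}


def run_dfa(seq: str) -> bool:
    """Read seq one symbol at a time; accept if we end in an accepting state."""
    state = "start"
    for symbol in seq:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no transition defined: the implicit dead state
            return False
    return state in ACCEPTING


print(run_dfa("MSAAA"), run_dfa("ACT"))  # True False
```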
8
Types of Automata
Name | Memory | Language
Finite automaton (DFA, NFA) | None | Regular
Pushdown automaton | Stack | Context free
Turing machine | Unbounded read/write tape | Recursively enumerable
Stochastic | Any variety above | Varies
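To show what the stack buys over a finite automaton, here is a sketch of a stack-based recognizer in the spirit of a (deterministic) pushdown automaton. It assumes RNA secondary structure in dot-bracket notation as the example language: properly nested brackets are context free but not regular, because no fixed number of states can track arbitrarily deep nesting. `balanced_dot_bracket` is a hypothetical name.

```python
def balanced_dot_bracket(structure: str) -> bool:
    """Check that '(' and ')' in a dot-bracket string nest properly, using a stack."""
    stack = []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)  # remember the position of an unmatched '('
        elif symbol == ")":
            if not stack:
                return False  # a ')' with no partner
            stack.pop()
        elif symbol != ".":
            return False  # symbol outside the alphabet {(, ), .}
    return not stack  # every '(' must have been closed


print(balanced_dot_bracket("((..))"), balanced_dot_bracket("(()"))  # True False
```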
9
Grammars
Terminal and non-terminal symbols
Productions (rewriting rules)
Example:
Terminals = {A, C, G, M, S}
Nonterminals = {σ, a, b, c} (initial symbol is σ)
Productions = σ → a
a → Ab | Cb | Mb
b → Cc | Gc | Sc
c → Ac | ε
This grammar generates the same language as the expression [ACM][CGS]A*.
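A grammar is a generator, so one way to check it is to run it. The sketch below encodes this slide's grammar, assuming the lost start symbol is σ and the final production is c → ε (written ""). It then rewrites the leftmost nonterminal breadth-first, keeping every all-terminal string up to a length bound. Uppercase letters are terminals and lowercase letters plus σ are nonterminals; `GRAMMAR` and `generate` are hypothetical names.

```python
from collections import deque

# Nonterminal -> list of alternatives; "" stands for the empty string ε.
GRAMMAR = {
    "σ": ["a"],
    "a": ["Ab", "Cb", "Mb"],
    "b": ["Cc", "Gc", "Sc"],
    "c": ["Ac", ""],
}
START = "σ"


def generate(max_len):
    """Sentences of length <= max_len derivable from START (breadth-first,
    leftmost rewriting). Rewriting never removes terminals, so forms with too
    many terminals can be pruned; this terminates because each sentential form
    here holds at most one nonterminal."""
    sentences, seen, queue = set(), {START}, deque([START])
    while queue:
        form = queue.popleft()
        # Index of the leftmost nonterminal, or None if the form is all terminals.
        i = next((k for k, sym in enumerate(form) if sym in GRAMMAR), None)
        if i is None:
            sentences.add(form)
            continue
        for alternative in GRAMMAR[form[i]]:
            new_form = form[:i] + alternative + form[i + 1:]
            n_terminals = sum(1 for sym in new_form if sym not in GRAMMAR)
            if n_terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sentences


print(sorted(generate(2)))
# ['AC', 'AG', 'AS', 'CC', 'CG', 'CS', 'MC', 'MG', 'MS']
```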
10
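Types of grammars
Regular: productions like A → aB or A → a (one nonterminal on the left; a terminal, optionally followed by one nonterminal, on the right)
Context free: productions like A → α (lhs is a single nonterminal, no terminals)
Context sensitive: productions like anything → anything, as long as the rhs is not shorter than the lhs
Phrase structured: anything goes
Stochastic: productions have probabilities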
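The differences between these types come down to the shape of each production. The function below is a sketch that reports the most restrictive type a single production fits. It uses the right-linear form of regular productions (terminals optionally followed by one nonterminal) and treats an empty rhs as ε. It does not model the special-case rules some textbooks add for ε in context-sensitive grammars. `classify` and its tuple-of-symbols input format are assumptions for illustration.

```python
def classify(lhs, rhs, nonterminals):
    """Most restrictive grammar type that one production lhs -> rhs fits.

    lhs and rhs are tuples of symbols; any symbol not in `nonterminals`
    is treated as a terminal. An empty rhs () stands for ε.
    """
    def is_nt(sym):
        return sym in nonterminals

    if len(lhs) == 1 and is_nt(lhs[0]):
        # Regular (right-linear): terminals, optionally ending in one nonterminal.
        body = rhs[:-1] if rhs and is_nt(rhs[-1]) else rhs
        if not any(is_nt(sym) for sym in body):
            return "regular"
        return "context free"
    if any(is_nt(sym) for sym in lhs) and len(lhs) <= len(rhs):
        # Noncontracting: the rhs is not shorter than the lhs.
        return "context sensitive"
    return "unrestricted"


NT = {"σ", "a", "b", "c"}
print(classify(("a",), ("A", "b"), NT))              # regular
print(classify(("S",), ("(", "S", ")"), {"S"}))      # context free
print(classify(("B", "C"), ("C", "B"), {"B", "C"}))  # context sensitive
print(classify(("A", "B"), ("A",), {"A", "B"}))      # unrestricted
```

A whole grammar belongs to the least restrictive type among its productions.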
11
Algorithm
Abstract, precise description of a step-by-step process for transforming data
Programming-language independent
Machine independent
Uses “reasonable” steps
12
Algorithm example
Input: DNA sequences S1, S2, of lengths m, n
For each position p1 from 1 to n - m + 1 in S2
  For each position p2 from 1 to m in S1
    If character p2 of S1 doesn’t match character p1 + p2 - 1 of S2
    Then break out of this loop to the outer loop
  If the inner loop finished without breaking, return “S1 starts at position p1 in S2”
Return “S1 is not in S2”
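A minimal Python version of the search on this slide. It is a sketch that keeps the slide's 1-based positions by subtracting 1 whenever it indexes a Python string, and it returns `None` for "S1 is not in S2". The outer loop stops at n - m + 1, the last start position where S1 still fits inside S2. The name `find_first` is hypothetical. Python's `for`/`else` runs the `else` branch only when the inner loop ends without `break`, which is exactly the "every character matched" case.

```python
from typing import Optional


def find_first(s1: str, s2: str) -> Optional[int]:
    """Return the 1-based position where s1 first occurs in s2, or None."""
    m, n = len(s1), len(s2)
    for p1 in range(1, n - m + 2):          # candidate start positions in S2
        for p2 in range(1, m + 1):          # positions in S1
            if s1[p2 - 1] != s2[p1 + p2 - 2]:
                break                       # mismatch: try the next start position
        else:
            return p1                       # inner loop never broke: full match
    return None                             # S1 is not in S2


print(find_first("GAT", "CGATT"))  # 2
```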
13
Efficiency
Implementation independent
– Property of algorithms, not programs
– Same for all hardware
Describes resource (time, space) consumption as a function of the amount of input data, as that amount grows
Usually pessimistic: worst case
Represented with big-Oh notation
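For reference, the standard definition behind big-Oh notation, followed by a small worked instance (the particular function is just an example): f grows no faster than g once constant factors and small inputs are ignored.

```latex
f(n) \in O\bigl(g(n)\bigr) \iff \exists\, c > 0,\ \exists\, n_0 :\ f(n) \le c\, g(n) \ \text{for all } n \ge n_0
\text{e.g. } 3n^2 + 5n \le 4n^2 \iff 5n \le n^2 \iff n \ge 5, \ \text{so } 3n^2 + 5n \in O(n^2) \text{ with } c = 4,\ n_0 = 5
```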
14
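Algorithm example
The algorithm above uses O(mn) time and O(n+m) space
Worst case: S1 is not in S2, but nearly matches at every position (e.g. S1 = AAAC, S2 = AAAAAAAA)
– The outer loop iterates about n times, and the inner loop about m times for each of those
– Space is used only to store S1, S2, p1, and p2: mx + nx + c + c bits
– c is independent of the amount of input
– x is the number of bits needed to store a nucleotide, also a constant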
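To see where the O(mn) worst case comes from, this small sketch counts character comparisons in the naive search (hypothetical helper `count_comparisons`, 0-based indices). A pattern whose first character never matches costs only one comparison per start position. A pattern that almost matches everywhere, such as AAAC against a run of A's, costs m comparisons at each of the n - m + 1 positions.

```python
def count_comparisons(s1: str, s2: str) -> int:
    """Number of character comparisons the naive search makes."""
    m, n = len(s1), len(s2)
    comparisons = 0
    for p1 in range(n - m + 1):
        for p2 in range(m):
            comparisons += 1
            if s1[p2] != s2[p1 + p2]:
                break
        else:
            return comparisons  # found S1: stop early
    return comparisons


print(count_comparisons("AAAC", "A" * 10))  # 28 = (10 - 4 + 1) * 4, the worst case
print(count_comparisons("CAAA", "A" * 10))  # 7: one comparison per start position
```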
15
Limits to efficiency
Complexity classes
– P: (decision) problems solvable in O(n^k) time for some constant k
– NP: (decision) problems whose solutions are easy (polynomial time) to verify
Completeness
– A problem is complete for a complexity class if it is in the class and every other problem in the class is “really” a special instance of this one (it can be efficiently reduced to it)
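The gap between verifying and finding a solution can be made concrete with subset sum. Subset sum is not on the slides; it is used here only as a standard NP-complete example. The question is: do some of the given numbers add up to a target? Checking a proposed answer (a list of indices) takes linear time. The obvious solver tries all 2^n subsets. `verify` and `search` are hypothetical names, and this sketch says nothing about whether a polynomial-time solver exists.

```python
from itertools import combinations


def verify(numbers, target, certificate):
    """Polynomial-time check: distinct, valid indices whose values sum to target."""
    if len(set(certificate)) != len(certificate):
        return False
    if not all(0 <= i < len(numbers) for i in certificate):
        return False
    return sum(numbers[i] for i in certificate) == target


def search(numbers, target):
    """Brute force over all 2**n index subsets, smallest subsets first."""
    for k in range(len(numbers) + 1):
        for subset in combinations(range(len(numbers)), k):
            if sum(numbers[i] for i in subset) == target:
                return list(subset)
    return None


nums = [3, 34, 4, 12, 5, 2]
answer = search(nums, 9)
print(answer, verify(nums, 9, answer))  # [2, 4] True
```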
16
Complexity lessons
NP-complete problems have no known O(n^k) solutions
– Do they exist (is P = NP)? Beats the hell out of me.
For biological problems, even k=2 is often too big
Most interesting problems are NP-complete (or NP-hard)
– e.g. multiple sequence alignment, phylogenetic inference
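A back-of-the-envelope check of why even k=2 can be too big. The figures are rough assumptions, not measurements: an input of about 3 × 10^9 bases (roughly one human genome) and a machine doing about 10^9 simple steps per second.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def seconds_for(steps, steps_per_second=1e9):
    """Wall-clock seconds for `steps` simple operations at an assumed speed."""
    return steps / steps_per_second


n = 3e9                                        # assumed input size, in bases
print(seconds_for(n))                          # O(n): 3.0 seconds
print(seconds_for(n ** 2) / SECONDS_PER_YEAR)  # O(n^2): about 285 years
```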
17
Correctness
A correct algorithm’s input-output transformation is consistent with the one given by the algorithm’s specification
Ways of being wrong: incorrect output, or no output, for an input covered by the specification
Specifications have to be clear, complete, and consistent (very hard!)
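One practical way to hunt for "incorrect output for an input in the specification" is to compare an implementation against an executable specification on many random inputs. The sketch below checks a 0-based version of the naive substring search from the algorithm example against Python's `str.find`. Here `str.find` serves as the specification: it returns the lowest index where the substring occurs, or -1. The helper names are hypothetical. Finding no disagreement is evidence, not a proof of correctness.

```python
import random


def find_first(s1, s2):
    """Naive search (0-based); returns -1 if s1 does not occur in s2."""
    m, n = len(s1), len(s2)
    for p1 in range(n - m + 1):
        if all(s1[p2] == s2[p1 + p2] for p2 in range(m)):
            return p1
    return -1


def random_dna(rng, max_len):
    return "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_len)))


def check_against_spec(trials=2000, seed=1):
    """Return the first (s1, s2) where find_first disagrees with str.find, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        s1, s2 = random_dna(rng, 3), random_dna(rng, 8)
        if find_first(s1, s2) != s2.find(s1):
            return (s1, s2)  # a counterexample to correctness
    return None


print(check_against_spec())  # None: no disagreement found in these trials
```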
18
Types of algorithms
Iterative
Recursive
Branch and bound
Divide and conquer
Dynamic programming
Stochastic
Parallel and distributed
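As one concrete instance of dynamic programming, here is edit (Levenshtein) distance. It is not from the slides; it is chosen because it is close to sequence alignment. Each cell of the table holds the answer for a pair of prefixes and is computed once from three smaller subproblems, giving O(mn) time. This is a minimal sketch with unit costs for insertion, deletion, and substitution. Real alignment tools use scoring matrices and gap penalties.

```python
def edit_distance(s, t):
    """Minimum number of single-character insertions, deletions, and
    substitutions that turn s into t."""
    m, n = len(s), len(t)
    # d[i][j] = edit distance between the prefixes s[:i] and t[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete s[i-1]
                          d[i][j - 1] + 1,         # insert t[j-1]
                          d[i - 1][j - 1] + cost)  # match or substitute
    return d[m][n]


print(edit_distance("ACGT", "AGT"))  # 1
```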
19
Software engineering
Implementing algorithms as programs
Levels of implementation
– In hardware
– Machine code
– Compiled code
– Interpreted code
Types of programming languages
– Imperative: describe how to do things
– Declarative: describe facts and constraints
– Object-oriented: describe objects, methods, and the relations between them
– “Scripting” languages
20
Useful software
Graphical user interface (GUI)
I/O devices
I’m rotten at this, and so are most computer scientists
21
Programming advice
Read lots of code, not lots of books
Write your programs to be read
Play “what if” a lot; get to know errors
Play two roles: maker & breaker
If it doesn’t work, YOU screwed up
22
Data organization
Just data, no structure: FASTA
With field tags: GenBank, PDB
Marked up for relational DBs: ASN.1
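Because FASTA carries almost no structure, a reader needs only a few lines. This is a minimal sketch, assuming each record starts with a '>' header line and the sequence may wrap over several lines. Blank lines are ignored, lines before the first header are dropped, and a repeated header overwrites the earlier record. `parse_fasta` and the example records are hypothetical.

```python
def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # close the previous record
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence lines may wrap
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = """>seq1 hypothetical example
ACGTACGT
ACGT
>seq2
GGGCCC
"""
print(parse_fasta(example))
# {'seq1 hypothetical example': 'ACGTACGTACGT', 'seq2': 'GGGCCC'}
```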
23
Relational databases
Data
– Data stored in tables (like spreadsheets)
– Joining tables: merges their information
– Projecting tables: selects a subset of the columns
Advantages
– Flexible and efficient storage
– Easy to update
– Supports ad hoc queries
– Little extra processing necessary
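The join and projection operations can be tried with Python's built-in `sqlite3` module and an in-memory database. The schema, table names, and values below are made up purely for illustration. The query joins the two tables on `gene_id`, keeps rows with a level above 1.0, and projects to just two columns.

```python
import sqlite3

# Hypothetical toy schema with made-up values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (gene_id TEXT PRIMARY KEY, name TEXT, organism TEXT)")
conn.execute("CREATE TABLE expression (gene_id TEXT, medium TEXT, level REAL)")
conn.executemany("INSERT INTO gene VALUES (?, ?, ?)",
                 [("g1", "lacZ", "E. coli"), ("g2", "GAL4", "S. cerevisiae")])
conn.executemany("INSERT INTO expression VALUES (?, ?, ?)",
                 [("g1", "lactose", 5.2), ("g1", "glucose", 0.3),
                  ("g2", "galactose", 4.1)])

# Join on gene_id, select rows with level > 1.0, project to (name, medium).
rows = conn.execute(
    """SELECT g.name, e.medium
       FROM gene AS g JOIN expression AS e ON g.gene_id = e.gene_id
       WHERE e.level > 1.0
       ORDER BY g.name"""
).fetchall()
print(rows)  # [('GAL4', 'galactose'), ('lacZ', 'lactose')]
conn.close()
```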
24
Hardware
Data storage: hard disk, CD, Zip, etc.
Transmission: Ethernet, phone, wireless, other (T1)
Computer
– CPU(s)
– Cache
– RAM (consider virtual storage)
– Bus
– Peripherals
25
Client-server paradigm
Types of servers
– Mail
– Name
– File
– Web
– Data
– X Window
– Compute
26
Operating system
Windows
– Easy to use, and to keep running
– Very inflexible
– Inefficient
Unix
– Harder to use, and to keep running
– Very flexible
– Efficient