
Combinatorics (Important to algorithm analysis). Problem I: How many N-bit strings contain at least 1 zero? Problem II: How many N-bit strings contain more than 1 zero?
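For small N the two counts can be checked by brute force; the sketch below is my own verification of the closed forms 2^N - 1 and 2^N - 1 - N (it is not part of the slides).

    from itertools import product

    def at_least_one_zero(n):
        # Count n-bit strings containing at least one '0'.
        return sum(1 for bits in product("01", repeat=n) if "0" in bits)

    def more_than_one_zero(n):
        # Count n-bit strings containing two or more '0's.
        return sum(1 for bits in product("01", repeat=n) if bits.count("0") > 1)

    for n in range(1, 8):
        assert at_least_one_zero(n) == 2**n - 1        # exclude only the all-ones string
        assert more_than_one_zero(n) == 2**n - 1 - n   # also exclude the n strings with exactly one zero
    print("formulas verified for N = 1..7")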

Which areas of CS need all this?
1. Information Theory
2. Data Storage, Retrieval
3. Data Transmission
4. Encoding

Information (Shannon) Entropy quantifies, in the sense of an expected value, the information contained in a message.
Example 1: A fair coin has an entropy of 1 bit. If the coin is not fair, the uncertainty is lower (if asked to bet on the next outcome, we would bet preferentially on the most frequent result), so the Shannon entropy is lower than 1.
Example 2: A long string of one repeating character: S = 0.
Example 3: English text: S ≈ 0.6 to 1.3 bits per character.
The source coding theorem: as the length of a stream of independent and identically distributed data tends to infinity, it is impossible to compress the data such that the code rate (average number of bits per symbol) is less than the Shannon entropy of the source, without information loss.
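As a minimal illustration (my own sketch, not from the slides), the entropy formula H = -Σ p_i log2(p_i) reproduces the three examples above:

    import math

    def shannon_entropy(probs):
        # H = -sum_i p_i * log2(p_i), in bits; zero-probability outcomes contribute nothing.
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(shannon_entropy([0.5, 0.5]))   # fair coin: 1.0 bit
    print(shannon_entropy([0.9, 0.1]))   # unfair coin: ~0.47 bits, less than 1
    print(shannon_entropy([1.0]))        # a single repeating character: 0.0 bits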

Information (Shannon) Entropy, Cont'd.
The source coding theorem: it is impossible to compress data such that the code rate (average number of bits per symbol) is less than the Shannon entropy of the source, without information loss. (This holds in the limit of a long stream of independent and identically distributed data.)
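A small empirical illustration of the theorem, assuming Python's standard zlib module as the compressor (my own sketch, not from the slides): for a long i.i.d. stream the achievable rate approaches, but does not drop below, the source entropy.

    import math
    import random
    import zlib

    p = 0.9                                                       # biased two-symbol source: P('a') = 0.9, P('b') = 0.1
    entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))    # ~0.469 bits per symbol

    random.seed(0)
    n = 200_000
    data = bytes(random.choices([ord("a"), ord("b")], weights=[p, 1 - p], k=n))

    rate = 8 * len(zlib.compress(data, 9)) / n   # bits per source symbol after compression
    print(f"entropy = {entropy:.3f} bits/symbol, compressed rate = {rate:.3f} bits/symbol")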

Data Compression.
Lossless compression: => 25.[9]8
Lossy compression: => 26
Lossless compression exploits statistical redundancy. For example, in English the letter "e" is common but "z" is not, and you never have a "q" followed by a "z". Drawback: it is not universal; if there is no pattern, there is no compression.
Lossy compression (e.g. JPEG images) accepts some loss of information in exchange for better compression.
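As one very simple lossless scheme, here is a run-length encoding sketch of my own (not from the slides); it compresses well only when the input has long runs, which illustrates the "no pattern, no compression" caveat.

    from itertools import groupby

    def rle_encode(s):
        # 'aaaabbc' -> [('a', 4), ('b', 2), ('c', 1)]
        return [(ch, len(list(run))) for ch, run in groupby(s)]

    def rle_decode(pairs):
        return "".join(ch * count for ch, count in pairs)

    s = "aaaaaaaaaaaabbbc"
    encoded = rle_encode(s)
    assert rle_decode(encoded) == s    # lossless: the original is recovered exactly
    print(encoded)                     # [('a', 12), ('b', 3), ('c', 1)]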

Combinatorics cont'd. Problem: "Random Play" on your I-Touch works like this: when pressed once, it plays a random song from your library of N songs. The song just played is then excluded from the library, so the next time "Random Play" is pressed it draws another song at random from the remaining N-1 songs. Suppose you have pressed "Random Play" k times. What is the probability that you will have heard your single favorite song?

Combinatorics cont'd: the "Random Play" problem.
1. Tactics: get your hands dirty.
2. P(1) = 1/N. P(2) = ? Be careful: what if the song did not play 1st? What if it did? This becomes complicated for large N, so let's compute the complement: the probability that the song has NOT played.
3. Table: press # (k) vs. the chance the favorite plays on that press (given it has not played yet), and the complement:
   1:  1/N        (complement 1 - 1/N)
   2:  1/(N-1)    (complement 1 - 1/(N-1))
   ...
   k:  1/(N-k+1)  (complement 1 - 1/(N-k+1))
4. Key tactic: find the complementary probability P(not) = (1 - 1/N)(1 - 1/(N-1))*…*(1 - 1/(N-k+1)); then P(k) = 1 - P(not).
5. Re-arrange: P(not) = (N-1)/N * (N-2)/(N-1) * (N-3)/(N-2) * … * (N-k)/(N-k+1) = (N-k)/N (the product telescopes). Thus P = 1 - P(not) = k/N.
6. If you try to guess the solution, make sure your guess works for simple cases where the answer is obvious, e.g. k = 1 and k = N. Also, P ≤ 1.
7. The very simple answer suggests that a simpler solution may be possible. Can you find it?
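The k/N answer is easy to sanity-check by simulation; the sketch below is mine (the library size N = 20 and press count k = 7 are arbitrary, and song 0 plays the role of the favorite).

    import random

    def estimate(n_songs, k_presses, trials=100_000):
        # Fraction of trials in which the favorite (song 0) appears among the first k non-repeating plays.
        favorite = 0
        hits = sum(favorite in random.sample(range(n_songs), k_presses) for _ in range(trials))
        return hits / trials

    N, k = 20, 7
    print(estimate(N, k), k / N)   # both close to 0.35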

The "clever" solution. [Diagram: the N songs in their random play order, with the first k positions marked and the favorite song somewhere among the N slots.] By symmetry, the favorite is equally likely to land in any of the N positions, so the probability that it falls within the first k presses is k/N.

What is DNA? All organisms on this planet are built from the same type of genetic blueprint. Within the cells of any organism is a substance called DNA, which is a double-stranded helix of nucleotides. DNA carries the genetic information of a cell. This information is the code used within cells to form proteins and is the building block upon which life is formed. Strands of DNA are long polymers of millions of linked nucleotides.

Graphical Representation of inherent bonding properties of DNA

Combinatorics Cont'd. Problem: a DNA sequence contains only 4 letters (A, T, G and C). Short "words" made of K consecutive letters form the genetic code; each word (called a "codon") codes for a specific amino acid in proteins. For example, ATTTC is a 5-letter word. There are a total of 20 amino acids. Prove that a genetic code based on a fixed K is degenerate, that is, there are amino acids which are coded for by more than one "word". It is assumed that every word codes for an amino acid.
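The heart of the argument is a pigeonhole count of the possible words; the tiny check below is my own illustration, not part of the slides.

    alphabet_size, num_amino_acids = 4, 20

    for K in (1, 2, 3):
        print(K, alphabet_size ** K)   # 1 -> 4, 2 -> 16, 3 -> 64

    # 4^K is never exactly 20: for K <= 2 there are too few words to cover all 20 amino acids,
    # and for K >= 3 there are more than 20 words. Since every word codes for some amino acid,
    # the pigeonhole principle forces at least one amino acid to be coded by more than one word,
    # so the code is degenerate.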

How can you use the solution in an argument for intelligent design? And in an argument against intelligent design?

Permutations. How many three-digit integers (in decimal representation) are there if you cannot use a digit more than once?
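Answer check (my own sketch, not from the slides): the product rule gives 9 * 9 * 8 = 648, which brute force confirms.

    count = sum(1 for n in range(100, 1000) if len(set(str(n))) == 3)
    print(count, 9 * 9 * 8)   # 648 648  (9 choices for the first digit, then 9, then 8)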

P(n, r): the number of ways to select and order r elements from a set of n elements (the r-permutations of n objects) is given by P(n, r) = n!/(n - r)!.
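A quick standard-library check of this formula, for illustrative values of n and r (math.perm is available in Python 3.8+; this is my own sketch):

    import math

    n, r = 10, 3
    print(math.perm(n, r), math.factorial(n) // math.factorial(n - r))   # 720 720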

Theorem. Suppose we have n objects of k different types, with n_i identical objects of the i-th type, so that n_1 + n_2 + … + n_k = n. Then the number of distinct arrangements of those n objects is n!/(n_1! n_2! … n_k!). Visualize: n_i identical balls of color i; total # of balls = n.

Combinatorics Cont'd. The Mississippi formula. Example: what is the number of letter permutations of the word BOOBOO? 6!/(2! 4!) = 15.

Permutations and Combinations with Repetitions. How many distinct arrangements are there of the letters in the word MISSISSIPPI? Each of the 11 letters in this word has a unique color (e.g. the 1st "I" is blue, the 2nd is red, etc.). Each arrangement must still read the same MISSISSIPPI.
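The sketch below (mine, not from the slides) verifies the repeated-letters formula by brute force for the short word BOOBOO and applies it to MISSISSIPPI; it also counts the colored arrangements that still read MISSISSIPPI.

    from itertools import permutations
    from math import factorial

    # Brute force is feasible for BOOBOO (6 letters): distinct rearrangements vs. 6!/(2! 4!).
    print(len(set(permutations("BOOBOO"))), factorial(6) // (factorial(2) * factorial(4)))   # 15 15

    # MISSISSIPPI: 1 M, 4 I's, 4 S's, 2 P's.
    print(factorial(11) // (factorial(4) * factorial(4) * factorial(2)))   # 34650 distinct arrangements

    # Ways to arrange the 11 uniquely colored letters so the word still reads MISSISSIPPI:
    print(factorial(4) * factorial(4) * factorial(2) * factorial(1))       # 1152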

The "shoot the hoops" wager (Google interview). You are offered one of two wagers: (1) you make one hoop in one throw, or (2) you make at least 2 hoops in 3 throws. You get $1000 if you succeed. Which wager would you choose?

Warm-up. Use our "test the solution" heuristic to immediately see that P(2/3) so calculated is wrong.

The hoops wager. Find P(2/3): with p the probability of making a single hoop, P(at least 2 of 3) = C(3,2) p²(1 - p) + p³ = 3p²(1 - p) + p³ = 3p² - 2p³.

The hoops wager. Compare P and P(2/3). Notice: if p > ½, then 3p² - 2p³ > p, and so, if you are LeBron James, you are more likely to get the $1000 if you go with the 2nd wager, 2 out of 3.
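A quick numerical comparison (my own sketch; the values of p are illustrative) shows the crossover at p = 1/2:

    for p in (0.3, 0.5, 0.8):
        p_two_of_three = 3 * p**2 * (1 - p) + p**3   # exactly two makes, or all three
        print(f"p = {p}: one throw -> {p:.3f}, at least 2 of 3 -> {p_two_of_three:.3f}")
    # The second wager wins only when p > 0.5 (0.216 < 0.3, 0.500 = 0.500, 0.896 > 0.8).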