Using a Genetic Algorithm for Approximate String Matching on Genetic Code Carrie Mantsch December 5, 2003.


1 Using a Genetic Algorithm for Approximate String Matching on Genetic Code Carrie Mantsch December 5, 2003

2 Outline
Problem Statement
Current Techniques
GA Motivation
My Algorithm
Results
Extension Possibilities

3 Problem Statement The problem is to search and align strands of DNA using a genetic algorithm.

4 Current Techniques
Approximate string matching
  – Usually meant for smaller strings
  – Many are set up for k mismatches
Two DNA strands of lengths 90 and 85
  – Allowing for 5 gaps in the second strand gives almost 44 million possible alignments
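The "almost 44 million" figure can be checked with a short calculation. The sketch below assumes one reading that is consistent with the slide: the 85-letter strand is padded to length 90 with 5 gaps, and an alignment is a choice of which 5 of the 90 positions hold gaps. The variable names are illustrative and do not come from the presentation.

```python
from math import comb

long_len = 90    # length of the longer strand (from the slide)
short_len = 85   # length of the shorter strand (from the slide)
gaps = long_len - short_len  # 5 gaps pad the shorter strand to length 90

# Assumption: an alignment is fixed by choosing which positions are gaps;
# the 85 letters then fill the remaining positions in their original order.
alignments = comb(long_len, gaps)
print(alignments)  # 43949268
```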

5 Current Techniques (cont.)
Needleman-Wunsch
  – Gap penalty: -1
  – Match bonus: +1
  – Mismatch: 0
Not practical if the match starts in the middle of the longer sequence
  – Gaps at the beginning and end are counted as penalties
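A minimal sketch of Needleman-Wunsch global alignment scoring with the parameters listed on the slide (gap -1, match +1, mismatch 0). The function name and the example strings are illustrative, not taken from the presentation.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=0, gap=-1):
    """Return the optimal global alignment score of strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per character,
    # which is why leading and trailing gaps are penalized.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


print(needleman_wunsch_score("ACGT", "ACGT"))      # 4
print(needleman_wunsch_score("TTTTACGT", "ACGT"))  # 0
```

The second call illustrates the drawback noted on the slide: "ACGT" occurs exactly inside the longer string, yet the four unavoidable end gaps cancel the four matches.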

6 Current Techniques (cont.)
BLAST (Basic Local Alignment Search Tool) and FASTA
  – Use domain specific knowledge
http://www.ncbi.nlm.nih.gov/BLAST
http://fasta.bioch.virginia.edu

7 GA Motivation
Alien DNA
Junk DNA
Extendable to similar text searches without domain specific knowledge

8 My Algorithm
The population
  – Bit strings of 0’s and 1’s
  – 0’s are spaces, 1’s mean a letter is placed there
  – The number of 1’s stays constant, equal to the number of letters in the smaller search string
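The representation above can be sketched as follows. This assumes each individual is a fixed-length list of bits and that letters of the smaller string are placed left to right at the 1 positions; `random_individual`, `decode`, and the "-" gap character are hypothetical names chosen for illustration.

```python
import random


def random_individual(length, num_letters, rng=random):
    """Build a bit list of the given length with exactly num_letters ones."""
    positions = set(rng.sample(range(length), num_letters))
    return [1 if i in positions else 0 for i in range(length)]


def decode(individual, letters, gap_char="-"):
    """Place the letters, in order, at the 1 positions; 0 positions become gaps."""
    remaining = iter(letters)
    return "".join(next(remaining) if bit else gap_char for bit in individual)


print(decode([1, 0, 1, 1], "ACG"))  # A-CG
```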

9 My Algorithm (cont.)
Breeding
  – Rank based selection
Crossover
  – The common place markers are kept the same
  – The rest of the place markers are split evenly between the two children
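One possible reading of the breeding and crossover steps, as a sketch. It assumes linear rank weights for selection, parents of equal length with the same number of 1's, and a random even split of the non-shared markers; the slides do not specify any of these details, and the function names are hypothetical.

```python
import random


def rank_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness rank."""
    # Indices sorted from worst to best; the worst gets weight 1, the best gets n.
    order = sorted(range(len(population)), key=lambda i: fitnesses[i])
    weights = list(range(1, len(population) + 1))
    return population[rng.choices(order, weights=weights, k=1)[0]]


def crossover(parent_a, parent_b, rng=random):
    """Keep markers shared by both parents; split the rest between two children."""
    if len(parent_a) != len(parent_b):
        raise ValueError("this sketch assumes parents of equal length")
    pairs = list(zip(parent_a, parent_b))
    common = [i for i, (x, y) in enumerate(pairs) if x == 1 and y == 1]
    differing = [i for i, (x, y) in enumerate(pairs) if x != y]

    # Assumption: the non-shared markers are dealt out at random, half to each child.
    rng.shuffle(differing)
    half = len(differing) // 2

    def build(extra):
        child = [0] * len(parent_a)
        for i in common:
            child[i] = 1
        for i in extra:
            child[i] = 1
        return child

    return build(differing[:half]), build(differing[half:])
```

When both parents carry the same number of 1's, the count of non-shared markers is even, so each child ends up with that same number of 1's.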

10 My Algorithm (cont.)
Mutation
  – If the number of gaps is less than one tenth of the small string size, add a gap
  – Otherwise, delete a gap
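A sketch of the mutation rule, under assumptions the slides leave open: every 0 in the bit string counts as a gap, adding a gap inserts a 0 at a random position, and deleting a gap removes a randomly chosen 0. Under this reading the bit string changes length, so a full implementation would need to reconcile that with crossover. The name `mutate` is hypothetical.

```python
import random


def mutate(individual, num_letters, rng=random):
    """Add a gap when gaps are scarce, otherwise delete one; returns a new list."""
    child = list(individual)
    gap_positions = [i for i, bit in enumerate(child) if bit == 0]

    if len(gap_positions) < num_letters / 10:
        # Fewer gaps than one tenth of the small string size: insert a 0.
        child.insert(rng.randrange(len(child) + 1), 0)
    elif gap_positions:
        # Otherwise remove one existing 0; the letters (1's) are never touched.
        del child[rng.choice(gap_positions)]
    return child
```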

11 Results The target match

12 Results (cont.)
Ran for 50 generations
Runs with different random numbers over the same number of generations gave best fitness values between about 32 and 67 (optimal fitness: 90)

13 Extension Possibilities
Better representation of population
Be able to alter fitness evaluation to be more specific to different problems
Ability to add domain specific knowledge
Parallel searching

14 Questions?

