A Music Search Engine for Plagiarism Detection


A Music Search Engine for Plagiarism Detection
Yicong Tao, Jinglin Wang

Motivation
Detect music plagiarism in a huge music dataset: the Lakh MIDI Dataset (associated with the Million Song Dataset).
Help people capture a great melody that suddenly crosses their mind by searching for songs with a similar melody.

Observation about Music Plagiarism
Similar patterns appear in small sections of the whole song.
There are three types of plagiarism, depending on how the similar patterns are reproduced:
(1) Sampling Plagiarism: with different instruments
(2) Rhythm Plagiarism: at a different speed
(3) Melody Plagiarism: with tones raised or lowered
Our focus: Melody Plagiarism. Rhythm Plagiarism (2) can be ignored if we regularize the note speed, and Sampling Plagiarism (1) is a direct combination of Rhythm Plagiarism (2) and Melody Plagiarism (3).

Algorithm: BLAST
We mainly adapt an algorithm called the Basic Local Alignment Search Tool (BLAST).
BLAST was originally designed for rapid comparison of nucleotide and protein sequences against large sequence databases.
The algorithm assumes that some short segments of the query sequence exactly match entries in the database index.
It then treats these hits as starting points and extends them in both directions until the match score drops below a cut-off limit.
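The seed-and-extend idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' implementation: the function names, the window length `w`, the `match`/`mismatch` weights, and the `drop` rule (stop once the running score falls `drop` below the best score seen so far) are all assumptions chosen for the example.

```python
from collections import defaultdict

def build_index(db, w):
    """Map every length-w window to the (seq_id, position) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in db.items():
        for pos in range(len(seq) - w + 1):
            index[tuple(seq[pos:pos + w])].append((seq_id, pos))
    return index

def extend_one_way(query, target, qi, ti, step, match, mismatch, drop):
    """Walk in one direction from (qi, ti); return (best gain, length of best extension)."""
    score = best = 0
    length = best_len = 0
    while 0 <= qi < len(query) and 0 <= ti < len(target):
        score += match if query[qi] == target[ti] else mismatch
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score >= drop:  # assumed cut-off rule
            break
        qi += step
        ti += step
    return best, best_len

def search(query, db, w=3, match=2, mismatch=-1, drop=3):
    """Return {seq_id: (score, start, end)} for the best extended hit per sequence."""
    index = build_index(db, w)
    results = {}
    for q in range(len(query) - w + 1):
        # Exact seed hits: windows of the query that appear verbatim in the index.
        for seq_id, t in index.get(tuple(query[q:q + w]), []):
            target = db[seq_id]
            right, r_len = extend_one_way(query, target, q + w, t + w, 1, match, mismatch, drop)
            left, l_len = extend_one_way(query, target, q - 1, t - 1, -1, match, mismatch, drop)
            hit = (w * match + left + right, t - l_len, t + w + r_len)
            if seq_id not in results or hit > results[seq_id]:
                results[seq_id] = hit
    return results

db = {"a": [60, 62, 64, 65, 67, 69, 71], "b": [50, 50, 50, 50]}
print(search([62, 64, 65, 67], db))
```

Here the query shares a four-note run with sequence "a", so a three-note seed is found and extended by one note; sequence "b" produces no seed and is never scored.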

Source: http://bioinformatica.upf.edu/P13_2011/

Design
The database should be incremental: new entries can be added without rebuilding it.
The index and the music should be stored efficiently and be quick to retrieve.
Index structure: file name (the index key, i.e. the ‘window’), file content (a list of (MIDI_ID, position) tuples).
The ‘window’ key is generated by converting the notes to their binary representation and reading the result as a single integer.
MIDI file structure: file name (MIDI_ID), file content (a map, {notes: note sequence, title: song_name, and other metadata…}).
The index and the music database are randomly distributed among N index servers and M music servers.
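One way to read the index design is sketched below in Python. Several details are assumptions made for illustration: each note is packed into 7 bits (enough for MIDI note numbers 0 to 127), the window length is fixed, and in-memory dictionaries stand in for the index files and MIDI files. The class and function names are hypothetical, and distribution across servers is left out.

```python
from collections import defaultdict

BITS_PER_NOTE = 7  # assumption: MIDI note numbers 0..127 fit in 7 bits

def window_key(notes):
    """Pack a fixed-length window of notes into one integer, 7 bits per note."""
    key = 0
    for n in notes:
        if not 0 <= n <= 127:
            raise ValueError(f"not a MIDI note number: {n}")
        key = (key << BITS_PER_NOTE) | n
    return key

class MusicStore:
    """In-memory stand-in for the index files and the MIDI files."""

    def __init__(self, window=4):
        self.window = window
        self.index = defaultdict(list)  # window key -> [(midi_id, position), ...]
        self.songs = {}                 # midi_id -> {"notes": [...], "title": ...}

    def add_song(self, midi_id, title, notes):
        """Add one song; existing index entries are only appended to, never rebuilt."""
        self.songs[midi_id] = {"notes": list(notes), "title": title}
        for pos in range(len(notes) - self.window + 1):
            key = window_key(notes[pos:pos + self.window])
            self.index[key].append((midi_id, pos))

    def lookup(self, notes):
        """Return every (midi_id, position) whose window equals `notes`."""
        return self.index.get(window_key(notes), [])

store = MusicStore(window=3)
store.add_song("m1", "Song", [60, 62, 64, 65])
store.add_song("m2", "Other", [62, 64, 65])
print(store.lookup([62, 64, 65]))
```

Because every window has the same length, two different windows never pack to the same integer, and adding a song touches only the entries for its own windows.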

Standard Note Encoding
MIDI note number range: 0 to 127
MIDI octave size: 12
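The slide does not spell out the encoding itself. A plausible reading, assumed here, is that the "standard note" is the note number reduced modulo the octave size, so that the same note in different octaves maps to the same value. The function name is hypothetical.

```python
MIDI_MIN, MIDI_MAX = 0, 127
OCTAVE = 12

def standard_note(note):
    """Reduce a MIDI note number to its position within the octave (assumed encoding)."""
    if not MIDI_MIN <= note <= MIDI_MAX:
        raise ValueError(f"not a MIDI note number: {note}")
    return note % OCTAVE

# Middle C (60) and the C one octave above (72) share the same standard note.
print(standard_note(60), standard_note(72), standard_note(61))
```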

Query Score Calculation
Perfect matching: the query note number is exactly the same as the MIDI note number. The highest weight is given for this category.
"Standard note" matching: the query note number differs from the MIDI note number, but their difference is divisible by the octave size (12). A moderate weight is given for this category.
Mismatching: any case other than perfect matching or "standard note" matching. A penalty is given for this category.
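The three categories translate directly into a per-note scoring function. In the Python sketch below the weight values are placeholders, since the slides give only their relative order (highest, moderate, penalty); the function names are hypothetical.

```python
OCTAVE = 12

# Placeholder weights: only their ordering comes from the slides.
PERFECT_WEIGHT = 3
STANDARD_WEIGHT = 1
MISMATCH_PENALTY = -2

def note_score(query_note, midi_note):
    """Score one aligned pair of note numbers."""
    if query_note == midi_note:
        return PERFECT_WEIGHT
    if (query_note - midi_note) % OCTAVE == 0:
        return STANDARD_WEIGHT  # same note, different octave
    return MISMATCH_PENALTY

def alignment_score(query, target):
    """Sum the per-note scores over two aligned note sequences."""
    return sum(note_score(q, m) for q, m in zip(query, target))

# One perfect match, one octave-shifted match, one mismatch.
print(alignment_score([60, 62, 64], [60, 74, 65]))
```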

Demonstration
Just do it!