Published by Angelina Crawford. Modified over 2 years ago.

1
Shailendra Mishra Director (CEP)

2
Case Study – Collusion Detection in Multi-player Games

Consider the following problem:
(A) commits an identity theft.
(A) acquires (n) credit cards as a result of the identity theft.
(A) goes to an online gaming site and uses the credit cards to play online poker with his (s) friends.
(A) loses all his money to his (s) friends.
The online gaming company now has to pay his friends.

Analysis of the problem:
Assume, for a moment, that (A) didn't commit identity theft. (A) is playing a game with his friends that may or may not be fair.
The results of this game generate a stream of outcomes, wins and losses by (A) to any number of his friends, where 1 <= i <= s.
The problem is to detect whether the pattern of wins and losses is genuine or not.
More formally, we are asking: when is a certain number of occurrences of a particular subsequence unlikely to be fortuitous?

3
Modeling the Collusion Detection Problem

Let $T$ be an ordered sequence of events.
Let $W$ be the window of observation, of size $w$, within which the analysis is confined.
Formally, consider an alphabet Д of cardinality |Д|.
Consider an event sequence $T = t_1, t_2, \ldots, t_n$ of length $n$ over Д.
We then define an episode over Д as one of the following:
A single pattern $S = s_1 s_2 s_3 \ldots s_m$ of length $m$.
A set of patterns $S_1, S_2, \ldots, S_d$.
The set of all distinct permutations of $S$, where ordering within the window of observation doesn't matter.

4
Formal Statement of the Problem

Assume the event sequence is generated by a memoryless (Bernoulli) or Markov source.
Let's restate our problem formally: we are interested in finding $\Omega^{\exists}(n, w, m)$, the number of windows containing at least one occurrence of $S$ when sliding the window over $n$ events of $T$.

To address this:
Compute the expected value $E(\Omega^{\exists}(n, w, m))$.
Compute the variance $\mathrm{Var}(\Omega^{\exists}(n, w, m))$.
Show that $\Omega^{\exists}(n, w, m)$, suitably normalized, converges to a normal distribution.

This allows us to set a threshold $\tau(n, w, m)$ such that, for a given confidence level $\beta$, $P\{\Omega^{\exists}(n, w, m) > \tau(n, w, m)\} < \beta$.
This implies that when more than $\tau(n, w, m)$ such windows are observed, it is highly unlikely that the count was generated by randomness.

5
Formulation of the Equivalent Pattern Matching Problem

Given an alphabet Д = $\{a_1, a_2, \ldots, a_{|Д|}\}$ and a pattern $S = s_1 s_2 \ldots s_m$ of length $m$.
Search for occurrences of $S$ as a subsequence within a window $W$ of size $w$ in another sequence, known as the event sequence, $T = t_1 t_2 \ldots t_n$ of length $n$.
A valid occurrence of $S$ in $T$ corresponds to a set of integers $i_1, i_2, \ldots, i_m$ such that the following hold:
$1 \le i_1 < i_2 < \cdots < i_m \le n$
$t_{i_1} = s_1,\ t_{i_2} = s_2,\ \ldots,\ t_{i_m} = s_m$
$i_m - i_1 < w$
We now estimate $\Omega^{\exists}(n, w, m, S, Д)$, the number of windows that contain at least one occurrence of $S$ when sliding the window over $n$ consecutive events of the event sequence $T$ over the alphabet Д.
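The window count defined on this slide can be computed by brute force on a small sequence. The following is a minimal sketch, not the authors' algorithm: the function names are hypothetical, and it assumes that only the n - w + 1 full windows are counted.

```python
def contains_subsequence(window, pattern):
    """True if `pattern` occurs in `window` as a (not necessarily contiguous) subsequence."""
    remaining = iter(window)
    # `symbol in remaining` advances the iterator, so symbols must appear in order.
    return all(symbol in remaining for symbol in pattern)


def count_windows_with_episode(events, pattern, w):
    """Count windows of w consecutive events that contain at least one occurrence of pattern."""
    if w < len(pattern):
        raise ValueError("window size w must be at least the pattern length m")
    # Assumption: only full windows are counted, giving len(events) - w + 1 windows.
    return sum(
        contains_subsequence(events[start:start + w], pattern)
        for start in range(len(events) - w + 1)
    )


# Only the first window "abxc" contains a, b, c in order.
print(count_windows_with_episode("abxcab", "abc", 4))  # prints 1
```

An occurrence that fits in a window of w consecutive events satisfies the condition i_m - i_1 < w automatically, so the check reduces to a subsequence test per window.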

6
Theorems & Results – Gwadera, Atallah & Szpankowski (Purdue)

Consider a memoryless source with $p_i$ being the probability of generating symbol $a_i \in$ Д. Also, assume $P(S) = \prod_{i=1}^{m} p_i$.

Result 1: Probability that a window of size $w$ contains at least one occurrence of episode $S$. For all $m$ and $w \ge m$ we have:
$$P^{\exists}(w, m) = P(S) \sum_{i=0}^{w-m} \; \sum_{n_1 + \cdots + n_m = i} \; \prod_{k=1}^{m} q_k^{n_k}, \quad \text{where } q_k = 1 - p_k$$

Result 2: Let now $m$ be fixed and $i \ne j \Rightarrow p_i \ne p_j$. Then for any $\varepsilon > 0$, as $w \to \infty$:
$$P^{\exists}(w, m) = 1 - P(S) \sum_{i=1}^{m} \frac{(1 - p_i)^{w}}{p_i} \prod_{j \ne i} \frac{1}{p_j - p_i} + O(\varepsilon^{w})$$
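Both results can be checked numerically on small inputs. The sketch below is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, `p` lists the probabilities of the pattern symbols s_1 ... s_m in order, and Result 1 is read as a sum, for each i from 0 to w - m, over all non-negative n_1, ..., n_m that add up to i.

```python
from math import prod


def p_exists_exact(p, w):
    """Result 1: P(S) times the sum over i = 0..w-m of all products q_1^n_1 ... q_m^n_m with n_1+...+n_m = i."""
    m = len(p)
    if w < m:
        raise ValueError("window size w must be at least the pattern length m")
    # coeffs[i] accumulates the inner sum for total exponent i: start from the
    # constant 1 and multiply by the series 1 / (1 - q_k z) for each k.
    coeffs = [1.0] + [0.0] * (w - m)
    for pk in p:
        qk = 1.0 - pk
        for i in range(1, w - m + 1):
            coeffs[i] += qk * coeffs[i - 1]
    return prod(p) * sum(coeffs)


def p_exists_closed_form(p, w):
    """Leading terms of Result 2; requires pairwise distinct probabilities."""
    total = 0.0
    for i, pi in enumerate(p):
        denom = prod(pj - pi for j, pj in enumerate(p) if j != i)
        total += (1.0 - pi) ** w / (pi * denom)
    return 1.0 - prod(p) * total


print(p_exists_exact([0.5, 0.25], 3))        # 0.28125
print(p_exists_closed_form([0.5, 0.25], 3))  # 0.28125 up to rounding
```

For m = 1 both functions reduce to 1 - (1 - p)^w, the probability that the symbol appears at least once in the window.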

7
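Computation of Bounds

Assume a memoryless source. Then, for $x = O(1)$ and for fixed $m$ and $w$, we have
$$\lim_{n \to \infty} P\left\{ \frac{\Omega^{\exists}(n, w, m) - E(\Omega^{\exists}(n, w, m))}{\sqrt{\mathrm{Var}(\Omega^{\exists}(n, w, m))}} < x \right\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2} \, dt$$

Now let's establish the threshold $\tau(n, w, m)$. First we find an $\alpha_0$ for a given $\beta$ such that
$$\beta = \frac{1}{\sqrt{2\pi}} \int_{\alpha_0}^{\infty} e^{-t^2/2} \, dt = P\{N(0, 1) > \alpha_0\}$$
where $N(0, 1)$ is the standard normal distribution.

We set the threshold:
$$\tau(n, w, m) = E(\Omega^{\exists}(n, w, m)) + \alpha_0 \sqrt{\mathrm{Var}(\Omega^{\exists}(n, w, m))}$$

As long as we are in the region where the central limit theorem applies,
$$P\{\Omega^{\exists}(n, w, m) > \tau(n, w, m)\} \le \beta$$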
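The threshold computation on this slide amounts to inverting the standard normal tail. A minimal sketch follows, assuming the mean and variance of the window count have already been obtained elsewhere; the function name and the sample numbers are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def alert_threshold(mean, variance, beta):
    """Return mean + alpha_0 * sqrt(variance), where P{N(0, 1) > alpha_0} = beta."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    # inv_cdf(1 - beta) is the point whose upper tail probability is beta.
    alpha0 = NormalDist().inv_cdf(1.0 - beta)
    return mean + alpha0 * sqrt(variance)


# Illustrative numbers only: mean 100 windows, variance 25, beta = 2.5%.
print(round(alert_threshold(100.0, 25.0, 0.025), 2))  # prints 109.8
```

A window count above the returned value would then be flagged as unlikely to be fortuitous at level beta, provided the normal approximation holds.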

8
Q & A

9
Shailendra Mishra Director (CEP)

10
SQL Standards Update

Pattern Matching Proposal – Version 12 of the review draft has been circulated.
  Participants – Coral8, IBM, Oracle, Streambase. BEA Systems also reviewed the draft.
  Status – The 12th version of the draft is ready and has been circulated.
  Objective – Submit a working draft to ANSI SQL.

Discussing a streams language proposal with IBM
  Participants – IBM & Oracle.
  Status – Exchanged docs regarding language specifications.
  Objective – Submit a working draft to ANSI SQL.

Discussing a convergence language proposal with Streambase
  Participants – IBM & Streambase.
  Status – Discussing the convergence proposal for the last 6 months.
  Objective – Submit a paper to Transactions on Databases (TODS).

11
Pattern Query With ONE ROW PER MATCH

SELECT a_symbol,
       a_tstamp,      /* start time */
       a_price,       /* start price */
       max_c_tstamp,  /* inflection time */
       last_c_price,  /* low price */
       max_f_tstamp,  /* end time */
       last_f_price,  /* end price */
       matchno
FROM Ticker MATCH_RECOGNIZE (
  PARTITION BY Symbol
  MEASURES A.Symbol       AS a_symbol,
           A.Tstamp       AS a_tstamp,
           A.Price        AS a_price,
           MAX (C.Tstamp) AS max_c_tstamp,
           LAST (C.Price) AS last_c_price,
           MAX (F.Tstamp) AS max_f_tstamp,
           LAST (F.Price) AS last_f_price,
           MATCH_NUMBER   AS matchno
  ONE ROW PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  MAXIMAL MATCH
  PATTERN (A B C* D E* F+)
  DEFINE B AS (B.price < PREV(B.price)),
         C AS (C.price <= PREV(C.price)),
         D AS (D.price > PREV(D.price)),
         E AS (E.price >= PREV(E.price)),
         F AS (F.price >= PREV(F.price) AND F.price > A.price)
)
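To make the semantics of PATTERN (A B C* D E* F+) concrete, the sketch below replays the same V-shape search in plain Python over a list of prices for one symbol. It is a hand-written illustration, not how a SQL engine evaluates MATCH_RECOGNIZE: the function names are hypothetical, rows are assumed to be in timestamp order, and "maximal match" is taken to mean that the rising leg extends as far as prices keep not falling.

```python
def match_from(prices, a):
    """Try to match A B C* D E* F+ with row `a` as A; return (start, low, end) indexes or None."""
    n = len(prices)
    i = a + 1
    # B: one strictly falling row.
    if i >= n or not prices[i] < prices[i - 1]:
        return None
    i += 1
    # C*: any further rows that do not rise.
    while i < n and prices[i] <= prices[i - 1]:
        i += 1
    low = i - 1  # last row of the falling leg (last C row, or B if C* is empty)
    # D: one strictly rising row (guaranteed by the loop exit unless rows ran out).
    if i >= n:
        return None
    i += 1
    # E* F+: rows that do not fall; the run must end above the start price.
    run_start = i
    while i < n and prices[i] >= prices[i - 1]:
        i += 1
    end = i - 1
    if end < run_start or not prices[end] > prices[a]:
        return None
    return (a, low, end)


def find_v_shapes(prices):
    """Scan one partition, emitting one result per match (ONE ROW PER MATCH)."""
    matches = []
    start = 0
    while start < len(prices):
        match = match_from(prices, start)
        if match is None:
            start += 1
        else:
            matches.append(match)
            start = match[2] + 1  # AFTER MATCH SKIP PAST LAST ROW
    return matches


print(find_v_shapes([10, 8, 7, 7, 9, 9, 12, 11]))  # prints [(0, 3, 6)]
```

Because the rising leg is non-decreasing, checking only its last row against the start price is enough to decide whether some suffix of it can serve as F+.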

12
Pattern Query With ALL ROWS PER MATCH

SELECT a_symbol,
       a_tstamp,      /* start time */
       a_price,       /* start price */
       max_c_tstamp,  /* inflection time */
       last_c_price,  /* low price */
       max_f_tstamp,  /* end time */
       last_f_price,  /* end price */
       matchno
FROM Ticker MATCH_RECOGNIZE (
  PARTITION BY Symbol
  MEASURES A.Symbol                 AS a_symbol,
           A.Tstamp                 AS a_tstamp,
           A.Price                  AS a_price,
           MAX (C.Tstamp) OVER ()   AS max_c_tstamp,
           LAST (C.Price) OVER ()   AS last_c_price,
           MAX (F.Tstamp) OVER ()   AS max_f_tstamp,
           LAST (F.Price) OVER ()   AS last_f_price,
           MATCH_NUMBER             AS matchno,
           CLASSIFIER               AS classy
  ALL ROWS PER MATCH
  AFTER MATCH SKIP PAST LAST ROW
  MAXIMAL MATCH
  PATTERN (A B C* D E* F+)
  DEFINE B AS (B.price < PREV(B.price)),
         C AS (C.price <= PREV(C.price)),
         D AS (D.price > PREV(D.price)),
         E AS (E.price >= PREV(E.price)),
         F AS (F.price >= PREV(F.price) AND F.price > A.price)
)

13
MATCH_RECOGNIZE syntax

The full syntax of the MATCH_RECOGNIZE clause is as follows:
PARTITION BY – optional
MEASURES – optional, but we expect this will always be used
{ ONE ROW | ALL ROWS } PER MATCH – defaults to ONE ROW
AFTER MATCH SKIP { TO NEXT ROW | PAST LAST ROW | TO | TO LAST | TO FIRST } – defaults to AFTER MATCH SKIP PAST LAST ROW
{ MAXIMAL | INCREMENTAL } MATCH – defaults to MAXIMAL MATCH
PERMUTE – optional
PERMUTE EXPAND – optional
PATTERN – mandatory
SUBSET – optional
DEFINE – mandatory
CLASSIFIER – optional (ALL ROWS PER MATCH only)
MATCH_NUMBER – optional

14
Q & A
