Bahman Bahmani Password Security [Schechter et al. 10] Semantic Analytics [Goyal et al. 11] Reputation Systems [Bahmani et al. 11]

1 Bahman Bahmani

2 Outline: Password Security [Schechter et al. 10]; Semantic Analytics [Goyal et al. 11]; Reputation Systems [Bahmani et al. 11]; Conclusion

3 Password Security [Schechter et al. 10]

4 Length of 8 to 20. Both letters and numbers. Both lower- and upper-case letters. Non-alphanumeric characters. A number between the first and last character. Not your dog's name. … Oh, by the way, change it once a month!

5 Rule | Consequence
Require minimum length | Use dictionary words, write down passwords
Include special characters | E → 3
No simple character replacements | # → {lb, hash}, ^ → {hat, top}, ...

7 Statistical guessing attacks

8 Popularity oracle: map passwords to counts. If a password is popular, prompt the user to change it. Can limit attack to % rather than 0.22% (MySpace) or 0.9% (RockYou).

9 Allows no salting. If compromised, the attack is optimized!

10 Keep counts without keeping passwords. Quick updates. Quick queries.

11-25 [Figure sequence: a grid of counters with d rows and w columns. Each arriving item increments one counter per row; the slides show counters going to 1 (=0+1) on a first hit and to 2 (=0+1+1) when hit again.]
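The d × w counter updates in this slide sequence match the Count-Min sketch. Below is a minimal sketch of that structure, not the presenter's implementation: the class name, the method names, and the use of salted BLAKE2b as a stand-in for the d independent hash functions are all assumptions made for illustration.

```python
import hashlib


class CountMinSketch:
    """d rows of w counters; each row uses its own hash function."""

    def __init__(self, w: int, d: int) -> None:
        self.w = w
        self.d = d
        self.table = [[0] * w for _ in range(d)]

    def _column(self, row: int, item: str) -> int:
        # Assumption: BLAKE2b salted with the row index stands in for
        # the pairwise-independent hash family used in the analysis.
        salt = row.to_bytes(8, "little")
        digest = hashlib.blake2b(
            item.encode("utf-8"), digest_size=8, salt=salt
        ).digest()
        return int.from_bytes(digest, "little") % self.w

    def add(self, item: str, count: int = 1) -> None:
        # Update: bump exactly one counter in every row.
        for row in range(self.d):
            self.table[row][self._column(row, item)] += count

    def estimate(self, item: str) -> int:
        # Query: collisions can only inflate a counter, so the minimum
        # over the rows is the tightest available estimate.
        return min(
            self.table[row][self._column(row, item)] for row in range(self.d)
        )


sketch = CountMinSketch(w=1000, d=4)
for password in ["123456", "123456", "password", "123456"]:
    sketch.add(password)
print(sketch.estimate("123456"))
```

A popularity oracle along the lines of slide 8 could compare `estimate(password)` with a threshold and ask the user to pick another password when the estimate is too high. The sketch never stores the passwords themselves, only the counters.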

26 Choosing d and w properly leads to tiny errors in frequencies with very large probability. Formally: at most ε error with probability 1-δ.
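The formula for this slide is not in the transcript. For reference, the standard Count-Min guarantee (Cormode and Muthukrishnan) reads as follows, where a_i is the true count of item i, â_i is the sketch estimate, and N is the total count inserted so far. Whether the slide used exactly this parameterization is an assumption.

```latex
\[
a_i \le \hat{a}_i
\quad\text{always, and}\quad
\Pr\!\left[\hat{a}_i \le a_i + \varepsilon N\right] \ge 1 - \delta,
\qquad\text{with}\qquad
w = \left\lceil e/\varepsilon \right\rceil,\quad
d = \left\lceil \ln(1/\delta) \right\rceil .
\]
```

Dividing by N, the estimated frequency of any item exceeds its true frequency by at most ε, with probability at least 1-δ.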

27 With w=270,000 and d=14, error in frequencies less than = with probability = ! 26
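The numeric values on this slide are missing from the transcript. As a rough check, the snippet below computes what the standard Count-Min parameterization (w = e/ε, d = ln(1/δ)) would give for these w and d. That parameterization and the helper name `cm_guarantee` are assumptions, so the results may differ from the figures originally shown.

```python
import math


def cm_guarantee(w: int, d: int) -> tuple[float, float]:
    """Return (epsilon, delta) implied by w = e/epsilon and d = ln(1/delta)."""
    epsilon = math.e / w
    delta = math.exp(-d)
    return epsilon, delta


w, d = 270_000, 14
epsilon, delta = cm_guarantee(w, d)
counters = w * d
print(f"epsilon ~ {epsilon:.2e}, delta ~ {delta:.2e}, counters = {counters:,}")
```

Under these assumptions ε is about 1e-5 and δ is about 8e-7, using 3.78 million counters in total.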

28 The guarantee is independent of the number of passwords. Example: fit (approximate) counts of 100M passwords in fewer than 4M counters!

29 Choose d and w small enough to ensure a minimum false positive rate! Trouble users just a little bit, but confound attackers.

30 Small memory: remember only what matters. Quick updates. Quick queries. That's the definition of a sketch.

31 Stream of numbers a_1, a_2, …, a_t, … SUM sketch: running sum. AVG sketch: (running sum, count).
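The SUM and AVG sketches are small enough to write out in full. The class and method names below are illustrative choices, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class AvgSketch:
    """Keeps (running sum, count) and nothing else about the stream."""

    running_sum: float = 0.0
    count: int = 0

    def update(self, value: float) -> None:
        # Constant-time update; the value itself is not stored.
        self.running_sum += value
        self.count += 1

    def average(self) -> float:
        if self.count == 0:
            raise ValueError("no values seen yet")
        return self.running_sum / self.count


stream_sketch = AvgSketch()
for a_t in [2, 4, 9]:
    stream_sketch.update(a_t)
print(stream_sketch.running_sum, stream_sketch.average())
```

The state is two numbers no matter how long the stream is, which is the "small memory, quick updates, quick queries" property in its simplest form.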

32 Stream of sensory observations. Remember only parts of observations. Still function properly. Everyone is doing it! [Muthukrishnan, 2005]

33 Semantic Analytics [Goyal et al. 11]

34 Is a word used more in a positive or a negative sense?

35 [Figure: a stream of text snippets in which "myPhone" appears alongside words such as nice, great, excellent, bad, terrible, and good.]

36 myPhone with the words good, great, nice, ... versus myPhone with the words bad, awful, terrible, …

37 Statistical machine translation, spelling correction, part-of-speech tagging, paraphrasing, word sense disambiguation, language modeling, speech and character recognition, …

38 Large corpus of documents: tweet stream, Web corpus. Vocabulary {w_1, w_2, …, w_N}. English language: N ≈ 10^5. Web: N ≈ 10^9. Goal: for any two words in the vocabulary, compute the number of documents containing both.

39 Example [Goyal et al., 2010]: a 78M-word corpus of size 577MB has 63K unique words and 118M unique word pairs; it takes 2GB just to store the pairs.

41 Compute all co-occurrence counts exactly. Ref. [Data-Intensive Text Processing with MapReduce, Lin et al.]. Problem: too inefficient.
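For comparison, here is a minimal single-machine version of the exact baseline. It is only meant to show where the memory goes, and is not the MapReduce formulation from the cited reference. The function name and the whitespace tokenization are assumptions.

```python
from collections import Counter
from itertools import combinations


def exact_cooccurrence(documents: list[str]) -> Counter:
    """Count, for every word pair, the number of documents containing both."""
    counts: Counter = Counter()
    for doc in documents:
        # Each distinct word counts once per document; sorting makes
        # (a, b) and (b, a) map to the same key.
        words = sorted(set(doc.lower().split()))
        counts.update(combinations(words, 2))
    return counts


docs = ["myPhone is great", "great myPhone", "bad myPhone"]
pair_counts = exact_cooccurrence(docs)
print(pair_counts[("great", "myphone")])
```

The dictionary holds one entry per distinct pair, so its size grows with the number of unique word pairs rather than with the vocabulary. That is the cost the sketch-based approach avoids.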

42 Use a Count-Min (CM) sketch to track the counts of word pairs.

43-47 [Figure sequence: the sentence "How do you shoot a yellow elephant?" yields the word pairs (shoot, yellow), (shoot, elephant), and (yellow, elephant), which are added one at a time to the d × w sketch.]

48 Query the CM sketch with the pairs (myPhone, good), (myPhone, nice), (myPhone, bad), (myPhone, terrible), …
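A compact sketch of the pair-counting idea: each document is reduced to its word pairs, each pair is hashed into a d × w table, and queries take the minimum over rows. The table size, the stop-word list, the punctuation stripping, and the BLAKE2b-based hashing are all illustrative assumptions.

```python
import hashlib
from itertools import combinations

W, D = 2000, 5  # illustrative sketch dimensions
STOP_WORDS = {"how", "do", "you", "a"}  # assumption: a tiny stop-word list
table = [[0] * W for _ in range(D)]


def _column(row: int, key: str) -> int:
    # Assumption: salted BLAKE2b stands in for the d hash functions.
    digest = hashlib.blake2b(
        key.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % W


def _pair_key(a: str, b: str) -> str:
    # Order-insensitive key, so (a, b) and (b, a) share counters.
    return "\t".join(sorted((a.lower(), b.lower())))


def add_document(text: str) -> None:
    words = {token.strip("?.,!").lower() for token in text.split()}
    for a, b in combinations(sorted(words - STOP_WORDS), 2):
        key = _pair_key(a, b)
        for row in range(D):
            table[row][_column(row, key)] += 1


def pair_estimate(a: str, b: str) -> int:
    key = _pair_key(a, b)
    return min(table[row][_column(row, key)] for row in range(D))


add_document("How do you shoot a yellow elephant?")
print(pair_estimate("shoot", "yellow"), pair_estimate("yellow", "elephant"))
```

Comparing `pair_estimate("myPhone", "good")` against `pair_estimate("myPhone", "bad")` then gives a rough sentiment signal without any pair being stored explicitly.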

49 Does not store the word pairs themselves. 30X less space (37GB corpus, almost no error) [Goyal et al., 2010].

50 Reputation Systems [Bahmani et al. 11]

52 PageRank: a well-known reputation system [Page et al., 1998]. Treats each link as an endorsement. A node is highly reputed if it is endorsed by many other such nodes.

53 Network edges arrive over time: friendships, social events. Maintain an accurate estimate of the PageRank of every node after each edge arrival.

54 A random surfer traverses the network. At each step, the surfer teleports to a completely random node with some probability ε (e.g., ε=0.2) and follows a random link otherwise. PageRank: the stationary distribution of this walk.
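The stationary distribution of the random-surfer walk can be approximated by repeatedly applying one step of the walk to a probability vector. This is a minimal sketch: the function name, the adjacency-list format, and the handling of nodes without out-links (their mass is spread uniformly) are assumptions rather than details from the slides.

```python
def pagerank_power_iteration(
    out_links: dict[str, list[str]], eps: float = 0.2, iterations: int = 100
) -> dict[str, float]:
    """out_links maps every node to its list of link targets (also keys)."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Teleport: with probability eps, jump to a uniformly random node.
        nxt = {v: eps / n for v in nodes}
        for u in nodes:
            targets = out_links[u]
            if targets:
                # Otherwise follow one of u's links uniformly at random.
                share = (1.0 - eps) * rank[u] / len(targets)
                for v in targets:
                    nxt[v] += share
            else:
                # Assumption: a node with no out-links jumps uniformly.
                for v in nodes:
                    nxt[v] += (1.0 - eps) * rank[u] / n
        rank = nxt
    return rank


cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
star = {"hub": ["x"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
cycle_rank = pagerank_power_iteration(cycle)
star_rank = pagerank_power_iteration(star)
print(cycle_rank, star_rank)
```

On the symmetric three-node cycle every node ends up with rank 1/3, while in the star graph the node that everyone links to gets the largest share.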

61 Power iteration: iterative linear-algebraic method. Monte Carlo: simulate the PageRank walk and use the empirical distribution to approximate PageRank. Neither can be done efficiently on the fly.
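The Monte Carlo approach can be sketched by running a fixed number of walks from every node and normalizing the visit counts. Here a walk simply ends when the surfer would teleport, and it also ends at a node with no out-links. Both choices, along with the function name and parameters, are assumptions made for illustration.

```python
import random


def pagerank_monte_carlo(
    out_links: dict[str, list[str]],
    eps: float = 0.2,
    walks_per_node: int = 1000,
    seed: int = 0,
) -> dict[str, float]:
    """out_links maps every node to its list of link targets (also keys)."""
    rng = random.Random(seed)
    visits = {v: 0 for v in out_links}
    for start in out_links:
        for _ in range(walks_per_node):
            node = start
            while True:
                visits[node] += 1
                # End the walk on a teleport (probability eps) or when
                # there is no link to follow (an assumption).
                if rng.random() < eps or not out_links[node]:
                    break
                node = rng.choice(out_links[node])
    total = sum(visits.values())
    # Empirical visit distribution approximates PageRank.
    return {v: c / total for v, c in visits.items()}


star_graph = {"hub": ["x"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
estimate = pagerank_monte_carlo(star_graph)
print(estimate)
```

Because walks start equally often from every node, ending a walk at a teleport plays the same role as jumping to a uniformly random node and continuing.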

62 Store R random walks starting at each node. Whenever a new edge arrives, modify only the random walks needing an update. For a new edge (u, v): only walks passing through u, each with probability 1/degree(u).
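A rough sketch of the update rule on this slide, under several simplifying assumptions: both endpoints already exist in the graph, walks are stored as plain lists, and every stored walk is scanned for visits to u (a real system would keep an index from nodes to the walks passing through them). The helper names are hypothetical.

```python
import random


def simulate_from(node, out_links, eps, rng):
    """Walk from `node`, stopping with probability eps per step or at a dead end."""
    path = [node]
    while out_links[node] and rng.random() >= eps:
        node = rng.choice(out_links[node])
        path.append(node)
    return path


def add_edge(u, v, out_links, walks, eps, rng):
    """Insert edge (u, v) and reroute affected walks; returns how many changed."""
    was_dangling = not out_links[u]
    out_links[u].append(v)
    degree = len(out_links[u])
    updated = 0
    for i, walk in enumerate(walks):
        for pos, node in enumerate(walk):
            if node != u:
                continue
            if pos == len(walk) - 1:
                # The walk ended at u. It may continue only if it had
                # stopped because u had no links (assumption).
                reroute = was_dangling and rng.random() >= eps
            else:
                # A step out of u now uses the new edge w.p. 1/degree(u).
                reroute = rng.random() < 1.0 / degree
            if reroute:
                walks[i] = walk[: pos + 1] + simulate_from(v, out_links, eps, rng)
                updated += 1
                break
    return updated


rng = random.Random(1)
graph = {"a": ["b"], "b": ["a"], "c": []}
R = 50  # walks stored per node
walks = [simulate_from(s, graph, 0.2, rng) for s in graph for _ in range(R)]
changed = add_edge("a", "c", graph, walks, 0.2, rng)
print(changed, "of", len(walks), "walks rerouted")
```

Walks that never visit u are left untouched, so the work per arriving edge depends only on how many stored walks pass through its source node.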

63-64 [Figure: random walks stored per node (Node 1, Node 2, ...).]

65 Most edges miss most random walks! This is even more pronounced as the network grows larger.

70 As the network grows, the marginal number of operations per update decreases! Theorem: Given random arrivals, if M_t is the update work at time t

71 Conclusion

72 A different view of big data analysis. Nimble and on the fly, compared to bulky and inefficient. Direct reduction in data infrastructure costs, both CAPEX and OPEX.

73 Mathematical guarantees behind rates and sizes of errors. If you cannot make a decision based on an analytics result which has less than % error with probability , then you most likely should not make that decision!

74 Lots of applications: security, social media analytics, recommendation systems, sensor networks, intelligent mobile applications. The math and algorithms are there. Needed: technologists to build systems with sketching techniques; entrepreneurs to build products with these techniques; big business leaders to learn about, adopt, and benefit from these techniques.

75 Get in touch: Office Hour, 2:20pm


