Bahman Bahmani Password Security [Schechter et al. 10] Semantic Analytics [Goyal et al. 11] Reputation Systems [Bahmani et al. 11]


1 Bahman Bahmani bahman@stanford.edu

2 Password Security [Schechter et al. 10] Semantic Analytics [Goyal et al. 11] Reputation Systems [Bahmani et al. 11] Conclusion 1

3 Password Security [Schechter et al. 10] Semantic Analytics [Goyal et al. 11] Reputation Systems [Bahmani et al. 11] Conclusion 2

4 Length of 8 to 20; both letters and numbers; both lower- and upper-case letters; non-alphanumeric characters; a number between the first and last character; not your dog's name; … Oh, by the way, change it once a month! 3

5
Rule | Consequence
Require minimum length | Use dictionary words, write down passwords
Include special characters | E → 3, a → @, …
No simple character replacements | # ↔ {lb, hash}, ^ ↔ {hat, top}, …
4

6 5

7 Statistical guessing attacks 6

8 Popularity oracle: map passwords to counts. If a password is popular, prompt the user to change it. This can limit the attack's success to 0.0001%, rather than 0.22% (MySpace) or 0.9% (RockYou). 7
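A minimal sketch of the exact popularity oracle described above. The threshold that decides what counts as "popular" (a fraction of all registered passwords) is a hypothetical parameter, and the class and method names are illustrative. Note that it keeps the passwords themselves as dictionary keys.

```python
from collections import Counter


class PopularityOracle:
    """Exact popularity oracle: maps each password to the number of users who chose it.

    `threshold` is a hypothetical parameter: a password is "popular" once its
    share of all registered passwords reaches this fraction.
    """

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.counts = Counter()  # stores the passwords themselves as keys
        self.total = 0

    def is_popular(self, password: str) -> bool:
        return self.total > 0 and self.counts[password] >= self.threshold * self.total

    def try_register(self, password: str) -> bool:
        """Count the password and return True, or return False to prompt a change."""
        if self.is_popular(password):
            return False
        self.counts[password] += 1
        self.total += 1
        return True


oracle = PopularityOracle(threshold=0.25)
for pw in ["hunter2", "letmein", "trustno1", "dragon"]:
    oracle.try_register(pw)
print(oracle.try_register("hunter2"))    # False: already 1 of 4 users (25%)
print(oracle.try_register("s7!qL#pe2"))  # True
```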

9 A plain count table allows no salting. If it is compromised, the attack is optimized! 8

10 Keep counts without keeping the passwords; quick updates; quick queries. 9

11-25 [Animation, slides 10-24: a count-min (CM) sketch, i.e. a grid of counters with d rows and w columns, all initially 0. Each new password increments one counter in every row (0 → 1, shown as "=0+1"); when the same password, or another one hashing to the same counter, arrives again, that counter keeps growing (=0+1+1, =0+1+1+1). After several arrivals the rows hold small counts such as 1, 2 and 3 scattered among zeros.]
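The counter grid animated above can be sketched as a small count-min sketch class. Assumptions: row hashes come from SHA-256 with the row index mixed in (an illustrative stand-in for a family of independent hash functions), and the names `CountMinSketch`, `update` and `query` are hypothetical. Only counters are stored, never the passwords.

```python
import hashlib


class CountMinSketch:
    """d rows x w columns of counters; every row has its own hash function."""

    def __init__(self, d: int, w: int) -> None:
        self.d, self.w = d, w
        self.table = [[0] * w for _ in range(d)]

    def _column(self, row: int, item: str) -> int:
        # Row-specific hash: prefix the row index before hashing (illustrative choice).
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.w

    def update(self, item: str, count: int = 1) -> None:
        # Increment one counter per row, like the "=0+1" steps in the animation.
        for row in range(self.d):
            self.table[row][self._column(row, item)] += count

    def query(self, item: str) -> int:
        # Collisions only ever add to a counter, so each row over-estimates;
        # the minimum over rows is the tightest estimate.
        return min(self.table[row][self._column(row, item)] for row in range(self.d))


sketch = CountMinSketch(d=4, w=1000)
for pw in ["123456", "password", "123456", "qwerty", "123456"]:
    sketch.update(pw)
print(sketch.query("123456"))  # at least 3; exactly 3 unless all 4 rows collide
```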

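26 Choosing d and w properly leads to tiny errors in frequencies with very high probability. Formally, at most $\varepsilon$ error with probability $1 - \delta$: with $w = \lceil e/\varepsilon \rceil$ and $d = \lceil \ln(1/\delta) \rceil$, the estimate $\hat{f}_x$ of an item's frequency $f_x$ satisfies $f_x \le \hat{f}_x \le f_x + \varepsilon$ with probability at least $1 - \delta$. 25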

27 With w = 270,000 and d = 14, the error in frequencies is less than $10^{-5}$ = 0.00001 with probability $1 - 10^{-6}$ = 0.999999! 26

28 The guarantee is independent of the number of passwords. Example: fit (approximate) counts of 100M passwords in fewer than 4M counters! 27
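The slide's numbers are consistent with the standard count-min parameter choice $w = \lceil e/\varepsilon \rceil$, $d = \lceil \ln(1/\delta) \rceil$, which is assumed here. A quick check: the number of counters depends only on ε and δ, not on how many passwords are inserted.

```python
import math


def cm_parameters(epsilon: float, delta: float) -> tuple[int, int]:
    """Depth d and width w for frequency error <= epsilon w.p. >= 1 - delta."""
    d = math.ceil(math.log(1 / delta))
    w = math.ceil(math.e / epsilon)
    return d, w


d, w = cm_parameters(epsilon=1e-5, delta=1e-6)
print(d, w, d * w)  # 14 271829 3805606
```

The width of 271,829 is what the slide rounds to 270,000, and 14 × 271,829 ≈ 3.8M counters matches the "fewer than 4M counters" of the example.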

29 Choose d and w small enough to ensure a minimum false-positive rate! Trouble users just a little bit, but confound attackers. 28

30 Small memory (remember only what matters), quick updates, quick queries: that's the definition of a sketch! 29

31 Stream of numbers $a_1, a_2, \ldots, a_t, \ldots$. SUM sketch: the running sum. AVG sketch: (running sum, count). 30
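The two sketches on this slide are small enough to write out in full. A minimal sketch; the class names are illustrative.

```python
class SumSketch:
    """Keeps only the running sum of the stream a_1, a_2, ..."""

    def __init__(self) -> None:
        self.total = 0.0

    def update(self, a: float) -> None:
        self.total += a


class AvgSketch:
    """Keeps (running sum, count); the average is their ratio."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def update(self, a: float) -> None:
        self.total += a
        self.count += 1

    def average(self) -> float:
        if self.count == 0:
            raise ValueError("average of an empty stream")
        return self.total / self.count


s, avg = SumSketch(), AvgSketch()
for a in [3, 5, 10]:
    s.update(a)
    avg.update(a)
print(s.total, avg.average())  # 18.0 6.0
```

Both use constant memory however long the stream runs; the AVG sketch needs the count because a running average alone cannot absorb a new element.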

32 Stream of sensory observations: remember only parts of the observations, and still function properly. Everyone is doing it! [Muthukrishnan, 2005] 31

33 Password Security [Schechter et al. 10] Semantic Analytics [Goyal et al. 11] Reputation Systems [Bahmani et al. 11] Conclusion 32

34 Is a word used more in a positive or a negative sense? 33

35 34 [Illustration: snippets of text in which "myPhone" appears near words such as nice, great, excellent, bad, terrible and good; the surrounding words are masked with asterisks.]

36 myPhone with the words good, great, nice, … versus myPhone with the words bad, awful, terrible, … 35

37 Statistical machine translation, spelling correction, part-of-speech tagging, paraphrasing, word sense disambiguation, language modeling, speech and character recognition, … 36

38 Large corpus of documents: a tweet stream, a Web corpus. Vocabulary $\{w_1, w_2, \ldots, w_N\}$; English language: $N \approx 10^5$; Web: $N \approx 10^9$. Goal: for any two words in the vocabulary, compute the number of documents containing both. 37

39 38 Example [Goyal et al., 2010]: a 78M-word corpus of size 577MB has 63K unique words and 118M unique word pairs; merely storing the pairs takes 2GB.

40 39

41 Compute all co-occurrence counts exactly (ref.: Data-Intensive Text Processing with MapReduce, Lin et al.). Problem: too inefficient. 40
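For reference, the exact approach fits in a few lines when the corpus is small. This sketch assumes whitespace tokenization and lower-casing (illustrative choices) and counts each unordered word pair once per document. The resulting table has one entry per distinct pair, which is what becomes too large at corpus scale.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents: list[str]) -> Counter:
    """Map each unordered word pair to the number of documents containing both words."""
    counts = Counter()
    for doc in documents:
        words = sorted(set(doc.lower().split()))  # dedupe: count once per document
        counts.update(combinations(words, 2))     # sorted order gives canonical pairs
    return counts


docs = ["myPhone is good", "good myPhone battery", "myPhone screen bad"]
counts = cooccurrence_counts(docs)
print(counts[("good", "myphone")], counts[("bad", "myphone")])  # 2 1
```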

42 Use a CM sketch to track the counts of word pairs 41

43-47 [Animation, slides 42-46: the same d × w CM sketch, initially all 0, processing the sentence "How do you shoot a yellow elephant?". The word pairs (shoot, yellow), (shoot, elephant) and (yellow, elephant) are inserted one at a time, each adding 1 to one counter per row; counters shared by two pairs reach 2.]
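The word-pair animation can be reproduced in a few lines. The stop-word list, the SHA-256-based row hashes and the sizes d = 3, w = 8 are assumptions for illustration; pairs are kept in sentence order to mirror the slides.

```python
import hashlib
from itertools import combinations

STOPWORDS = {"how", "do", "you", "a"}  # hypothetical stop-word list for this example
D, W = 3, 8                            # illustrative sketch size


def content_words(sentence: str) -> list[str]:
    """Lower-case, strip punctuation, drop stop words, keep sentence order."""
    tokens = (t.strip("?.,!").lower() for t in sentence.split())
    return [t for t in tokens if t and t not in STOPWORDS]


def column(row: int, pair: tuple[str, str]) -> int:
    """Row-specific hash of a word pair (SHA-256 based, an illustrative choice)."""
    digest = hashlib.sha256(f"{row}:{pair[0]}|{pair[1]}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % W


def estimate(sketch: list[list[int]], pair: tuple[str, str]) -> int:
    return min(sketch[row][column(row, pair)] for row in range(D))


sketch = [[0] * W for _ in range(D)]
pairs = list(combinations(content_words("How do you shoot a yellow elephant?"), 2))
for pair in pairs:
    for row in range(D):
        sketch[row][column(row, pair)] += 1  # one counter per row, never the pair itself

print(pairs)  # [('shoot', 'yellow'), ('shoot', 'elephant'), ('yellow', 'elephant')]
```

A real system would probably canonicalize the pair order so that (a, b) and (b, a) share counters.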

48 Query the CM sketch with the pairs (myPhone, good), (myPhone, nice), (myPhone, bad), (myPhone, terrible), … 47
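One simple way to turn these queries into an answer to "positive or negative?" is to compare summed co-occurrence counts with positive and negative seed words. The difference-of-counts score and the name `polarity` are assumptions for illustration (the cited work may use a different association measure); `query` can be any pair-count lookup, such as a CM-sketch estimate.

```python
from typing import Callable, Iterable

Pair = tuple[str, str]


def polarity(query: Callable[[Pair], int], target: str,
             positive: Iterable[str], negative: Iterable[str]) -> int:
    """Positive-minus-negative co-occurrence count for `target` (> 0 leans positive)."""
    pos = sum(query((target, word)) for word in positive)
    neg = sum(query((target, word)) for word in negative)
    return pos - neg


# Toy counts standing in for sketch estimates (hypothetical numbers).
toy = {("myPhone", "good"): 5, ("myPhone", "nice"): 2, ("myPhone", "bad"): 3}
score = polarity(lambda p: toy.get(p, 0), "myPhone",
                 positive=["good", "great", "nice"], negative=["bad", "awful", "terrible"])
print(score)  # 4
```

Because a CM sketch only over-counts, estimates for rare pairs are inflated slightly; with the error bounds discussed for the password case, this bias is small relative to frequent pairs.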

49 The sketch does not store the word pairs themselves: 30× less space (37GB corpus, almost no error) [Goyal et al., 2010] 48

50 Password Security [Schechter et al. 10] Semantic Analytics [Goyal et al. 11] Reputation Systems [Bahmani et al. 11] Conclusion 49

51 50

52 Well-known reputation system [Page et al., 1998]. Treats each link as an endorsement: a node is highly reputed if it is endorsed by many other highly reputed nodes. 51

53 52 Network edges arrive over time (friendships, social events). Maintain an accurate estimate of the PageRank of every node after each edge arrival.

54 A random surfer traverses the network: at each step it teleports to a uniformly random node with some probability ε (e.g., ε = 0.2), and otherwise follows a random link. PageRank is the stationary distribution of this walk. 53

55-60 [Animation, slides 54-59: a random surfer moving step by step on an example graph with nodes 1 to 11.]

61 60 Power iteration: an iterative linear-algebraic method. Monte Carlo: simulate the PageRank walk and use the empirical distribution to approximate PageRank. Neither can be done efficiently on the fly.
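A minimal power-iteration sketch for the random-surfer walk. It assumes a directed graph given as an adjacency dict in which every link target is also a key, and assumes (a common convention, not stated on the slides) that a surfer at a node without out-links teleports. Every iteration touches every edge, which is why rerunning it after each edge arrival is expensive.

```python
def pagerank_power_iteration(out_links: dict[int, list[int]], eps: float = 0.2,
                             iterations: int = 100) -> dict[int, float]:
    """Stationary distribution of the walk that teleports w.p. eps at each step."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        nxt = {v: eps / n for v in nodes}  # teleport mass, spread uniformly
        for u in nodes:
            targets = out_links[u]
            if targets:
                share = (1 - eps) * rank[u] / len(targets)  # follow a random link
                for v in targets:
                    nxt[v] += share
            else:
                for v in nodes:  # dangling node: assume the surfer teleports
                    nxt[v] += (1 - eps) * rank[u] / n
        rank = nxt
    return rank


ranks = pagerank_power_iteration({1: [2], 2: [1], 3: [1]})
print(max(ranks, key=ranks.get))  # 1
```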

62 61 Store R random walks starting at each node. Whenever a new edge arrives, modify only the random walks that need an update: for a new edge (u, v), only walks passing through u, and each of those with probability 1/degree(u).
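A simplified sketch of this update rule. Assumptions: a directed graph, R stored walks per node, and walks that stop with probability ε at each step (the teleport) or at a node with no out-links; the class and method names are hypothetical. On edge (u, v), each visit to u from which a walk moved on switches to the new link with probability 1/outdegree(u), and the walk is then re-simulated from v. A walk that stopped at u only because u had no links simply continues along (u, v). PageRank is estimated as each node's share of all visits across the stored walks.

```python
import random
from collections import Counter


class IncrementalPageRank:
    """Monte Carlo PageRank maintained under edge arrivals (illustrative sketch)."""

    def __init__(self, nodes, R: int = 10, eps: float = 0.2, seed: int = 0) -> None:
        self.R, self.eps = R, eps
        self.rng = random.Random(seed)
        self.out_links: dict = {}
        self.walks: list = []  # each entry: [visited nodes, stopped_at_dangling_node]
        for s in nodes:
            self._add_node(s)

    def _add_node(self, s) -> None:
        if s not in self.out_links:
            self.out_links[s] = []
            self.walks.extend(self._extend([s]) for _ in range(self.R))

    def _extend(self, path: list) -> list:
        """Continue a walk from path[-1] until it stops."""
        while True:
            if self.rng.random() < self.eps:
                return [path, False]  # teleport: the walk ends here
            targets = self.out_links[path[-1]]
            if not targets:
                return [path, True]   # wanted to move on, but no link to follow
            path.append(self.rng.choice(targets))

    def add_edge(self, u, v) -> int:
        """Insert edge (u, v) and update walks; return the number of walks changed."""
        self._add_node(u)
        self._add_node(v)
        self.out_links[u].append(v)
        degree = len(self.out_links[u])
        changed = 0
        for walk in self.walks:
            path, dangling = walk
            for i, node in enumerate(path):
                if node != u:
                    continue
                if i < len(path) - 1:
                    # The walk left u via an old link; now it picks (u, v) w.p. 1/degree.
                    reroute = self.rng.random() < 1 / degree
                else:
                    # The walk ended at u: reroute only if it stopped for lack of links.
                    reroute = dangling
                if reroute:
                    walk[:] = self._extend(path[: i + 1] + [v])
                    changed += 1
                    break
        return changed

    def pagerank(self) -> dict:
        """Estimate PageRank as each node's share of all visits in the stored walks."""
        visits = Counter(node for path, _ in self.walks for node in path)
        total = sum(visits.values())
        return {v: visits[v] / total for v in self.out_links}


pr = IncrementalPageRank([1, 2, 3], R=10, eps=0.2, seed=1)
for u, v in [(1, 2), (2, 3), (3, 1), (2, 1)]:
    print((u, v), "walks updated:", pr.add_edge(u, v))
print(pr.pagerank())
```

The value returned by `add_edge` makes the point of the next slide measurable: on larger graphs, most arriving edges touch only a small fraction of the stored walks.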

63 [Table: R = 10 stored random walks (rows 1 to 10) starting at each of Node 1, Node 2 and Node 3 of a three-node example graph; each walk is written as the sequence of nodes it visits.] 62

64 [The same table of walks after a new edge arrives and the affected walks are updated.] 63

65 64 Most edges miss most random walks! This is even more pronounced as the network grows larger.

66 65

67 66

68 67

69 68

70 As the network grows, the marginal number of operations per update decreases! Theorem: given random arrivals, if $M_t$ is the update work at time $t$ 69

71 Password Security [Schechter et al. 10] Semantic Analytics [Goyal et al. 11] Reputation Systems [Bahmani et al. 11] Conclusion 70

72 Different view of big data analysis: nimble and on the fly, compared to bulky and inefficient. Direct reduction in data infrastructure costs, both CAPEX and OPEX. 71

73 Mathematical guarantees behind the rates and sizes of errors. If you cannot make a decision based on an analytics result that has less than 0.0001% error with probability 0.99999, then you most likely should not make that decision! 72

74 Lots of applications: security, social media analytics, recommendation systems, sensor networks, intelligent mobile applications. The math and algorithms are there. Needed: technologists, to build systems with sketching techniques; entrepreneurs, to build products with these techniques; big business leaders, to learn about, adopt, and benefit from these techniques. 73

75 Get in touch: office hour at 2:20pm, bahman@stanford.edu 74

76 Slide 4: http://www.the-games-blog.com/and-the-cat-and-mouse-game-continues/
Slide 6: http://www.security-faqs.com/what-exactly-is-a-dictionary-attack.html
Slide 7: http://krepon.armscontrolwonk.com/archive/3182/forecasting-proliferation/crystalball-2
Slide 8: http://www.hdwallpaperspics.com/crystal-ball-wallpapers.html
Slides 9, 27, 41, 48: http://lissarankin.com/do-you-expect-people-to-read-your-mind
Slide 18: http://ouroregon.org/category/content-authors/alina-harway?page=2
Slide 31: http://sciencesoup.tumblr.com/post/39608896216/learning-foreign-languages-triggers-brain
Slide 33: http://livingqlikview.blogspot.com/2012/03/my-sentiments-on-sentiment-analysis.html
Slide 34: http://www.presentermedia.com/index.php?target=closeup&maincat=clipart&id=2221
Slide 40: http://www.clker.com/clipart-yellow-elephant.html
Slide 51: http://en.wikipedia.org/wiki/PageRank
75

