By Jyh-haw Yeh Department of Computer Science Boise State University.

- Measuring the correlation between the inputs and outputs of complicated functions.
- The model was designed for measuring cryptographic algorithms.
- Other possible applications:
  - Environmental factors vs. gene mutation
  - Independent variables vs. natural change, such as climate, land surface, sea level, etc.

- Use neural networks to learn the relationship between a set of inputs and its corresponding set of outputs.
- Predict the outputs for N other sets of inputs.
- Compare the predictions with the real outputs, and generate N chi-square statistics, one for each set of data.

- From these N statistics, quantitative measurements can be formulated.
- These measurements indicate how strongly the tested inputs are related to the known outputs.

- Cryptographic algorithms:
  - For each algorithm, the model generates measurements that indicate how random the algorithm is.
  - The more random an algorithm is, the more secure it is considered.
  - Through this model, the security strength of different algorithms can be compared quantitatively.

- Natural change: scientists record a natural change (the dependent variable) over a period of time T; these records are the outputs of our model. Over the same period T, they also record changes in several other factors (independent variables) that may cause the natural change; these are the inputs of our model. Our model evaluates which factor is most related to the natural change.

- Gene mutation:
  - Outputs of our model: recorded mutations over a time period T.
  - Inputs of our model: recorded environmental factors over the same period T, such as temperature, humidity, …
  - Our model evaluates which factor may be most related to gene mutation.

- Raw data generation:
  - A data set: M pairs of plaintexts and ciphertexts, e.g., M = 1,000k.
  - For each algorithm, generate N data sets, e.g., N = 101.
  - One data set (the training set) is used to train the networks.
  - The other 100 data sets (the testing sets) are used to test the networks.
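To make the data-generation step concrete, here is a minimal Python sketch for MD5, the one tested algorithm available in Python's standard library (`hashlib.md5`). It is an illustration under stated assumptions, not the author's code: the helper names `make_data_set` and `make_data_sets` are hypothetical, and drawing uniformly random 128-bit inputs is an assumption, since the slides do not say how plaintexts were chosen. M and N are parameters so they can be scaled down from 1,000k and 101 for a quick run.

```python
import hashlib
import random

# Assumption: 128-bit (16-byte) random inputs, matching MD5's 128-bit output.
BLOCK_BYTES = 16


def make_data_set(m: int, rng: random.Random) -> list[tuple[bytes, bytes]]:
    """Return m (input, MD5 output) pairs with uniformly random 128-bit inputs."""
    pairs = []
    for _ in range(m):
        plain = rng.getrandbits(8 * BLOCK_BYTES).to_bytes(BLOCK_BYTES, "big")
        cipher = hashlib.md5(plain).digest()
        pairs.append((plain, cipher))
    return pairs


def make_data_sets(n: int, m: int, seed: int = 0):
    """Generate n data sets: the first is the training set, the other n - 1 are testing sets."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    sets = [make_data_set(m, rng) for _ in range(n)]
    return sets[0], sets[1:]
```

For example, `make_data_sets(n=101, m=1000)` returns one training set and 100 testing sets of 1,000 pairs each; raising `m` to 1,000,000 reproduces the slide's scale at the cost of run time.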

- Network training: use the training set to train the network.
- Network testing: use each testing set to test the network. For each testing set, the network produces 1,000k ciphertext predictions.
- Observed data generation: 1,000k Hamming distances (HDs) are computed from the 1,000k (prediction, real ciphertext) pairs. If the algorithm is truly random, each predicted bit matches the real bit with probability 1/2, so the HDs over a block of d bits follow a binomial distribution, Binomial(d, 1/2).
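The observed-data step reduces to counting differing bits. The Python sketch below, with hypothetical helper names, computes one HD per (prediction, real ciphertext) pair and the binomial probability of each HD value that a truly random algorithm would produce. It does not include the neural network itself; `predictions` stands in for whatever the trained network outputs.

```python
from math import comb


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def observed_hds(predictions: list[bytes], ciphers: list[bytes]) -> list[int]:
    """One Hamming distance per (prediction, real ciphertext) pair."""
    return [hamming_distance(p, c) for p, c in zip(predictions, ciphers)]


def binomial_hd_probability(i: int, d: int) -> float:
    """P(HD = i) over d bits when each predicted bit is an independent fair guess."""
    return comb(d, i) / 2 ** d
```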

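- Chi-square analysis: apply a chi-square analysis to these 1,000k HDs and generate a statistic V:

  $$V = \sum_{i=0}^{d} \frac{(N_i - N P_i)^2}{N P_i}, \qquad P_i = \binom{d}{i} 2^{-d}$$

  where
  - N = 1,000k: the number of HDs;
  - N_i: the number of HDs with value i;
  - P_i: the probability of an HD with value i for a truly random algorithm;
  - d: the degrees of freedom (equal to the block size).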
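A short Python sketch of the statistic, assuming the standard goodness-of-fit form V = Σ (N_i − N·P_i)² / (N·P_i) over HD values i = 0..d with binomial P_i = C(d, i)/2^d. The function name `chi_square_statistic` is hypothetical.

```python
from collections import Counter
from math import comb


def chi_square_statistic(hds: list[int], d: int) -> float:
    """V = sum over i = 0..d of (N_i - N*P_i)^2 / (N*P_i), with P_i = C(d, i) / 2^d."""
    n = len(hds)  # N: number of HDs
    counts = Counter(hds)  # counts[i] = N_i, the number of HDs equal to i (0 if absent)
    v = 0.0
    for i in range(d + 1):
        expected = n * comb(d, i) / 2 ** d  # N * P_i
        v += (counts[i] - expected) ** 2 / expected
    return v
```

For large d (e.g., a whole 128-bit block), the expected counts N·P_i in the extreme tails are tiny, which weakens the usual chi-square approximation; a common remedy is to pool sparse tail categories. The slides do not say whether that was done, so this sketch keeps every category.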

- Chi-square analysis: a critical statistic value CV can be calculated from a pre-chosen significance level α; CV is the value that a chi-square variable with d degrees of freedom exceeds with probability α. If V > CV, the analysis is considered failed, i.e., the tested data set is statistically non-random, and the algorithm is considered non-random based on that data set.
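The critical value is the (1 − α) quantile of the chi-square distribution with d degrees of freedom. To keep the sketch dependency-free, the Python code below computes the chi-square CDF from the series for the regularized lower incomplete gamma function and inverts it by bisection; in practice a statistics library quantile function would do the same job. The names `chi2_cdf`, `critical_value`, and `analysis_fails` are hypothetical, and the series is intended for the moderate α and d used here (d up to a few hundred).

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """Chi-square CDF: regularized lower incomplete gamma P(df/2, x/2), summed as a series."""
    if x <= 0:
        return 0.0
    s, z = df / 2.0, x / 2.0
    term = math.exp(s * math.log(z) - z - math.lgamma(s + 1.0))  # n = 0 term
    total = term
    n = 0
    while n < 100_000:
        n += 1
        term *= z / (s + n)
        total += term
        # Terms shrink once s + n > z; stop when they no longer affect the sum.
        if s + n > z and term < 1e-16 * total:
            break
    return min(total, 1.0)


def critical_value(alpha: float, df: int) -> float:
    """CV such that P(chi-square with df degrees of freedom > CV) = alpha, by bisection."""
    target = 1.0 - alpha
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < target:  # grow the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def analysis_fails(v: float, d: int, alpha: float = 0.05) -> bool:
    """Decision rule from the slide: the analysis fails when V > CV, with d degrees of freedom."""
    return v > critical_value(alpha, d)
```

For instance, `critical_value(0.05, 1)` is about 3.841, the familiar 5% cutoff for one degree of freedom.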

- More chi-square analyses:
  - Deciding random/non-random from one data set and one chi-square analysis is risky.
  - Use 100 or more data sets.
  - For each data set, perform many chi-square analyses: one for each bit, each 2-bit portion, each 4-bit portion, … up to the whole block (portion sizes are powers of 2).
  - Let S be the set of portion sizes used for chi-square analysis. For a 128-bit algorithm, S = {1, 2, 4, …, 128}, giving 255 analyses per data set and 25,500 chi-square analyses in total over 100 data sets.
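The count of 25,500 follows from splitting the block into block_size/d portions for each d in S and running one analysis per portion. The Python sketch below checks that arithmetic and shows how per-portion HDs could be extracted. It assumes the portions are contiguous, non-overlapping bit ranges with portion 0 at the low-order bits; the slides do not specify the bit ordering, and the helper names are hypothetical.

```python
def portion_sizes(block_bits: int) -> list[int]:
    """S = {1, 2, 4, ..., block_bits}: every power-of-two portion size up to the whole block."""
    sizes, d = [], 1
    while d <= block_bits:
        sizes.append(d)
        d *= 2
    return sizes


def analyses_per_data_set(block_bits: int) -> int:
    """One d-bit analysis per portion: block_bits // d analyses for each d in S."""
    return sum(block_bits // d for d in portion_sizes(block_bits))


def portion_hds(pred: int, real: int, block_bits: int, d: int) -> list[int]:
    """HD inside each d-bit portion of two block_bits-bit values (assumed: portion 0 = low bits)."""
    diff = pred ^ real
    mask = (1 << d) - 1
    return [bin((diff >> (j * d)) & mask).count("1") for j in range(block_bits // d)]
```

Collecting the j-th entry of `portion_hds` over all pairs in a data set gives the HDs for the j-th d-bit analysis, which then feed a chi-square test with d degrees of freedom. For a 128-bit block, `analyses_per_data_set(128)` is 128 + 64 + … + 1 = 255, and 100 data sets give 25,500.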

- Generate quantitative measurements: after the 100 testing sets are tested, 25,500 statistics are produced. The following quantities are used:
  - the statistic for the j-th d-bit analysis in the i-th data set;
  - the critical statistic for a d-bit analysis;
  - the failure weight for a d-bit analysis. For example, set

  - the failure frequency of the d-bit analyses in the i-th data set;
  - the estimated failure rate for the i-th data set.

Estimated Failure Rate (EFR): represents the expected failure percentage for a data set generated from the algorithm.
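The exact weighting formula is not given in the text, so the following Python sketch is only one plausible reading, labeled as an assumption throughout. It takes the failure frequency of the d-bit analyses as the fraction whose statistic exceeds the critical statistic, assumes the per-data-set failure rate is a weight-normalized average of those frequencies over d in S, and assumes EFR is the mean failure rate over the testing sets. All function names are hypothetical.

```python
def failure_frequency(statistics: list[float], cv: float) -> float:
    """Fraction of the d-bit analyses in one data set whose statistic exceeds the critical statistic."""
    return sum(v > cv for v in statistics) / len(statistics)


def failure_rate(freqs: dict[int, float], weights: dict[int, float]) -> float:
    """ASSUMED form: weight-normalized average of one data set's per-d failure frequencies."""
    total_w = sum(weights[d] for d in freqs)
    return sum(weights[d] * f for d, f in freqs.items()) / total_w


def estimated_failure_rate(rates: list[float]) -> float:
    """ASSUMED form: mean of the per-data-set failure rates over all testing sets."""
    return sum(rates) / len(rates)
```

Under this reading, giving larger weights to larger portion sizes would make failures of whole-block analyses count more than failures of single-bit analyses; the slides leave the actual choice of weights open.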

Estimated Failure Variance (EFV): estimates how bad each (failed) non-random data set is, i.e., roughly by what factor the chi-square statistics of the tested non-random data sets exceed the critical statistics.

- Neither EFR nor EFV is absolute; both are relative quantities.
- They are used to measure the relative security strength of different algorithms.
- In general, the smaller the EFR and EFV, the more random the algorithm.

- The measuring methodology described above is called the ANN test (using Artificial Neural Networks).
- For comparison, two other measuring methodologies, the avalanche test and the plain-cipher test, were also performed.
- The observed data for each test:
  - Avalanche: the Hamming distance between two ciphertexts whose plaintexts differ by one bit.
  - Plain-cipher: the Hamming distance between a plaintext and its ciphertext.
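The observed data for the two comparison tests can be sketched in Python with MD5 from the standard library. This is an illustration, not the author's test harness: the function names are hypothetical, 16-byte inputs are assumed so that plaintext and MD5 output have the same 128-bit length, and which bit is flipped in the avalanche test is left as a parameter because the slides do not specify it.

```python
import hashlib


def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def avalanche_hd(plain: bytes, bit: int) -> int:
    """HD between the MD5 outputs of plain and of plain with one bit flipped."""
    flipped = bytearray(plain)
    flipped[bit // 8] ^= 1 << (bit % 8)  # flip bit `bit` (bit 0 = LSB of byte 0)
    return hamming_distance(hashlib.md5(plain).digest(), hashlib.md5(bytes(flipped)).digest())


def plain_cipher_hd(plain: bytes) -> int:
    """HD between a 16-byte input and its 16-byte MD5 output."""
    return hamming_distance(plain, hashlib.md5(plain).digest())
```

Either function replaces the (prediction, real ciphertext) HD of the ANN test; the chi-square analyses and the EFR/EFV computation then proceed unchanged.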

- AES, MD5, and DES have been measured, each with 100 ANN tests, 100 avalanche tests, and 100 plain-cipher tests.
- Comparing AES and MD5, the portion sizes to be chi-square analyzed are S = {1, 2, 4, …, 128}. Thus, there are 255 chi-square analyses in each test.
- Comparing all three algorithms, S = {1, 2, 4, …, 64}, since the block size of DES is 64 bits. Thus, there are 127 chi-square analyses in each test.

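EFR, AES and MD5:
- ANN test: MD5 12.98%, AES 11.91%
- Avalanche test: MD5 11.31%, AES 10.88%
- Plain-cipher test: MD5 10.48%, AES 10.61%

EFR, DES, MD5, and AES:
- ANN test: DES 12.95%, MD5 11.92%, AES 10.94%
- Avalanche test: DES 12.22%, MD5 10.20%, AES 9.91%
- Plain-cipher test: DES 6.44%, MD5 9.58%, AES 9.38%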

- A hypothesis: the ANN test is more effective at identifying security weaknesses; more measuring methodologies are needed to confirm this.
- What is a good ANN architecture? What are appropriate parameter settings for the ANN training process?
- Should a single ANN or multiple ANNs be used to simulate the encryption mapping?
- In the ANN test, what is a good prediction logic?
- Besides the Hamming distance, are there other ways to generate observed data, e.g., cumulative sums or approximate entropy?

- To avoid over- or under-counting non-randomness, how many different portions within a block should be analyzed in a test?
- Besides EFR and EFV, are there other meaningful quantitative measurements?
- What comparison strategy should be used when the quantitative measurements give conflicting indications?
- What is a fair comparison method for algorithms with different block sizes?

- Data from other applications may not be binary.
- Unlike with cryptographic algorithms, in other applications it may be difficult to gather large amounts of data.
- The model is not intended to predict the future, but to measure the relative correlation among different factors.
- Different applications may require modifying the model to different extents and in different ways.