Why and How should we experiment in computer science?

1 Why and How should we experiment in computer science?
Gu Mingyang, Dept. of Computer Science, NTNU

2 The Three Topics
- Should Computer Scientists Experiment More? – why do we experiment?
- Experimental Models for Validating Technology – the validation models
- Case Studies for Method and Tool Evaluation – how to use a case study?

3 Why Do We Experiment _summary
Computer scientists and practitioners defend their lack of experimentation with a wide range of arguments. This article examines several of those arguments to show why experimentation is important for computer science.

4 Why Do We Experiment _Is CS an engineering discipline or a science?
- Objection: CS is not a science but a synthetic, engineering discipline whose subject is the computer, and in an engineering field testing theories by experiment would be misplaced
- Counter: the primary subjects of CS are not computers but information structures and information processes
- Such processes also arise in nature: nervous systems, immune systems, genetic processes, and so on

5 Why Do We Experiment _the purpose of experiment
Test theory
- An experiment can show that a theory has flaws; example: measuring the failure probabilities of multi-version programs
- A community gradually accepts a theory if:
  - all known facts within its domain can be deduced from the theory
  - it has withstood numerous experimental tests
  - it correctly predicts new phenomena
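The multi-version example above can be sketched as a small simulation. This is an illustrative toy, not the original experiment: the failure rates, the "hard input" fraction, and the correlation mechanism are all invented for demonstration. It shows how two versions that tend to fail on the same difficult inputs break the independence assumption, under which the joint failure probability would simply be the product of the individual ones.

```python
import random

random.seed(42)  # reproducible run

def simulate(trials=100_000, hard_frac=0.01):
    """Simulate two independently developed program versions that both
    tend to fail on the same 'hard' inputs (hypothetical rates)."""
    fail1 = fail2 = both = 0
    for _ in range(trials):
        hard = random.random() < hard_frac           # difficult input class
        # each version fails often on hard inputs, rarely otherwise
        f1 = random.random() < (0.5 if hard else 0.001)
        f2 = random.random() < (0.5 if hard else 0.001)
        fail1 += f1
        fail2 += f2
        both += f1 and f2
    return fail1 / trials, fail2 / trials, both / trials

p1, p2, p_both = simulate()
# independence would predict p_both == p1 * p2; correlated faults
# make the observed joint failure rate far higher
print(f"p1={p1:.4f}  p2={p2:.4f}  observed joint={p_both:.5f}  "
      f"independent prediction={p1 * p2:.7f}")
```

The observed joint failure rate exceeds the independence prediction by orders of magnitude; a theory-refuting result of exactly this kind is what the slide refers to.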

6 Why Do We Experiment _the purpose of experiment
Exploration
- Probe the influence of assumptions
- Eliminate alternative explanations of phenomena
- Unearth new phenomena

7 Why Do We Experiment _the arguments against experiment
- Objection: the traditional scientific method is not applicable to CS
- Yet plenty of CS theories have never been tested: functional programming, object-oriented programming, formal software development processes, and so on
- Like other scientific fields, CS should test and explore such theories iteratively, both to validate them and to formulate new ones

8 Why Do We Experiment _the arguments against experiment
- Objection: the current level of experimentation is good enough
- Surveys by the author and others show otherwise: 40 to 50 percent of CS papers with claims needing empirical support had none at all, while the corresponding rates in other fields are much smaller
- The data suggest that CS publishes many untested ideas; we should work to improve this

9 Why Do We Experiment _the arguments against experiment
- Objection: experiments cost too much
- Experimentation clearly requires more resources than theory does, but the expense is worthwhile: answering questions is the aim of science (the tests of general relativity were no waste)
- We waste far more resources by accepting a wrong theory that was never validated by experiment (consider the moves from C to C++ and to OO)
- The software industry is beginning to value experiments, because results may give a company a three-to-five-year lead over the competition

10 Why Do We Experiment _the arguments against experiment
- Objection: demonstrations will suffice
- A demonstration depends critically on the observers' imagination and their willingness to extrapolate, and it cannot produce solid evidence
- Solid evidence requires careful analysis involving experiments, data, and replication
- Examples: evaluating SE methods, testing algorithm behavior, comparing the relative merits of parallel systems

11 Why Do We Experiment _the arguments against experiment
- Objection: there is too much noise in the way
- Benchmarking is an effective way to simplify repeated experiments: a benchmark provides a level playing field for competing ideas and allows repeatable, objective comparisons
- For experiments involving human subjects, we can borrow methods from medicine and psychology: control groups, random assignment, placebos, and so on
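As a minimal illustration of the benchmarking idea, the sketch below times two sorting routines on the same fixed input with Python's `timeit`. The workload size, the naive `insertion_sort` baseline, and the repetition count are invented for the example; the principle is that an identical workload and an identical measurement procedure for every competitor are what make the comparison repeatable and objective.

```python
import random
import timeit

random.seed(0)
DATA = [random.random() for _ in range(2_000)]  # shared benchmark workload

def builtin_sort():
    return sorted(DATA)

def insertion_sort():
    # deliberately naive O(n^2) baseline for comparison
    out = []
    for x in DATA:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out

# same input, same repetition count: a level playing field
t_builtin = timeit.timeit(builtin_sort, number=3)
t_insertion = timeit.timeit(insertion_sort, number=3)
print(f"built-in sort: {t_builtin:.4f}s   insertion sort: {t_insertion:.4f}s")
```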

12 Why Do We Experiment _the arguments against experiment
- Objection: if everything must be experimentally supported, publication will slow down
- In fact, papers with meaningful validation are accepted more readily than questionable ideas, so progress accelerates rather than slows
- Moreover, papers presenting good concepts or formulating new hypotheses can be published first, with experimental testing to follow

13 Why Do We Experiment _the arguments against experiment
- Objection: "Change in CS is so fast that by the time results are confirmed, they may no longer be of any relevance."
- Behind many questions with a short lifetime lurks a fundamental problem with a long lifetime; for example, behind the many SE methods lies the fundamental question of what characterizes software development
- Scientists should anticipate changes in assumptions and proactively use experiments to explore the consequences of such changes

14 Why Do We Experiment _the arguments against experiment
- Objection: papers about experiments are hard to get published
- In fact, the author's experience shows that publishing experimental results is not difficult if one chooses the right outlet
- Rather than just building systems, experimenters should seek out something new and contribute to current knowledge through the concepts and phenomena underlying their experiments

15 Why Do We Experiment _the arguments against experiment
Why substitutes won't work; the traditional paper types are:
- work describing a new idea, perhaps prototyped in a small system (suitable only for a radically new idea)
- work that claims its place in science by making feature comparisons
Instead, scientists should create models, formulate hypotheses, and test them using experiments

16 Why Do We Experiment _the arguments against experiment
- "Trust your intuition": intuition can mislead; for example, the long-held assumption that meetings are essential for software reviews
- "Trust the experts": sound practice is to check results carefully and to withhold acceptance until they have been independently confirmed
- "Problems do exist": problems certainly exist in CS experimentation, but that is no reason to discard it

17 Why Do We Experiment _the arguments against experiment
- Competing theories: a prerequisite for competition among theories is falsifiability; without observation and experiment, CS will have difficulty discovering new and interesting phenomena worthy of better theories
- Unbiased results: a mere list of merits can lead managers or funding agencies to a decision regardless of whether it is right, which is dangerous

18 Validation models _summary
To determine whether a particular technique is effective, we need refined experimentation to measure it. This article presents a set of validation models and explains how to choose among them and how to evaluate them.

19 Validation models _Introduction of experiment
- Classification of experimentation: scientific, engineering, empirical, analytical
- Aspects of data collection: replication, local control
- Other aspects relevant to SE: the influence of the experimental design (active or passive) and the temporal properties of the data (historical or current)

20 Validation models _Validation Models
The article lists 12 validation models, grouped into three categories:
- Observational (little control): collecting relevant data as a project develops
- Historical (no control): collecting data from projects that have already been completed
- Controlled (most control): providing multiple instances for statistical validation

21 Validation models _Observational Category
Project monitoring
- Feature: the lowest-level, passive model
- Shortcoming: difficulty in retrieving information later
Case study
- Feature: data collection is driven by a specific goal; an active model with little additional cost
- Shortcoming: each project is relatively unique, and the goal of process improvement can conflict with competing management goals

22 Validation models _Observational Category
Assertion
- Feature: the developer and the experimenter of the technology are the same person
- Shortcoming: the experiment is not a real test but a selection of favorable evidence
Field study
- Feature: examines data collected from several projects simultaneously; less intrusive
- Fit: measurement of processes and products

23 Validation models _Historical methods
Literature search
- Feature: the least invasive and most passive method; collects data from publications
- Shortcoming: selection bias (positive results get published); lack of quantitative data
Legacy data
- Feature: quantitative data from source programs, specifications, designs, testing documentation, and data collected during the program's development stages
- Shortcoming: data about cost, schedule, and so on are missing, and comparisons between projects are not possible

24 Validation models _Historical methods
Lessons learned
- Feature: collects data from lessons-learned documents; suited to improving future developments
- Shortcoming: lack of concrete data, and such documents are often written only pro forma
Static analysis
- Feature: collects data from the completed product, analyzing its structure to determine its characteristics
- Shortcoming: the model's quantitative definitions are hard to relate to the attribute of interest

25 Validation models _ Controlled methods
Replicated experiment
- Feature: several projects are staffed to perform the same task in multiple ways
- Shortcoming: high cost, and subjects may not take the task seriously
Synthetic environment experiment
- Feature: performed in a smaller, artificial setting
- Shortcoming: results obtained in a small artificial setting may not transfer to a real environment

26 Validation models _ Controlled methods
Dynamic analysis
- Feature: adds tools and debugging or testing code to demonstrate the product's features; allows comparison between different products
- Shortcoming: perturbs the product's behavior and may not carry over to different data sets
Simulation
- Feature: uses a model of the real environment to evaluate a technology
- Shortcoming: we do not know how well the synthetic environment models reality

27 Validation models _Choose model
When designing an experiment, we must select one or more data-collection approaches that conform to these validation models.

28 Validation models _Model validation
Use of validation methods in 612 published papers:
- Too many papers have no experimental validation at all (36%, 29%, 19%)
- Too many papers use an informal (assertion) form of validation (about one third)
- Researchers use lessons learned and case studies in about 19 percent of papers
- Experimentation terminology is sloppy (there are no standards)

29 How to use Case study _summary
Case studies can be used to evaluate the benefits of methods and tools, but unlike formal experiments, case studies do not have a well-understood theoretical basis. This article provides guidelines for organizing and analyzing case studies.

30 How to use Case study _empirical investigation methods
Classification: how to choose an investigation design
- Focuses on a single project: case study
- Involves many projects or a single type of project: formal experiment or case study
- Looks at many teams and many projects: formal experiment or survey (planned or not)

              | Single project             | Multiple projects
  Single team | Single-project studies     | Multi-project studies
  Multi-team  | Replicated-project studies | Blocked subject-project studies

31 How to use Case study _empirical investigation methods
How to choose a method (the factors involved):
- Case studies are easier to plan but harder to interpret and to generalize from
- Case studies suit process improvement within a particular organization
- Conditions that especially favor case studies:
  - the process changes are very wide-ranging
  - the effects of the change cannot be identified immediately

32 How to use Case study _empirical investigation methods
- The results of a well-designed formal experiment can be applied to many types of project
- Formal experiments are most useful for self-standing tasks, i.e. tasks that:
  - can be isolated from the overall product-development process
  - yield results that can be judged immediately
  - allow results to be isolated so that small differences caused by individual variables can be identified

33 How to use Case study _empirical investigation methods
Surveys
- Can be used to confirm that process changes have been successful
- Data collection takes considerable time, and results may not be available until many projects have completed
- The most common form is based on questionnaires

34 How to use Case study _Case study guidelines
Seven guidelines:
1. Define the hypothesis
2. Select the pilot projects
3. Identify the method of comparison
4. Minimize the effect of confounding factors
5. Plan the case study
6. Monitor the case study against the plan
7. Analyze and report the results

35 How to use Case study _Case study guidelines
Define the hypothesis
- Define the effects you expect the method to have, e.g. on quality or reliability
- The definition must be detailed enough to test
- Because it is easier to disprove a statement than to prove it, we usually state a null hypothesis and try to refute it
- The more clearly you define your hypotheses, the more likely you are to collect the right measures

36 How to use Case study _Case study guidelines
Select the pilot projects
- The pilot project should be representative
- Use significant characteristics to characterize the project, such as application domain, programming language, design method, and so on

37 How to use Case study _Case study guidelines
Identify the method of comparison; three options:
- select a sister project against which to compare
- compare the results of using the new method against a company baseline
- if the method applies to individual components, apply it at random to some product components and not to others
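The third option, random assignment at the component level, can be sketched in a few lines; the component names and group sizes below are hypothetical placeholders, not part of the guideline itself:

```python
import random

random.seed(7)  # fixed seed so the split is reproducible

# hypothetical component names standing in for a real product's modules
components = [f"module_{i:02d}" for i in range(10)]

# apply the new method to a random half; the other half, developed with
# the current method, serves as a within-project control group
new_method = set(random.sample(components, k=len(components) // 2))
current_method = [c for c in components if c not in new_method]

print("new method:    ", sorted(new_method))
print("current method:", sorted(current_method))
```

Randomizing which components receive the treatment is what later licenses standard statistical comparisons, since it spreads confounding factors (component size, author skill) evenly across both groups on average.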

38 How to use Case study _Case study guidelines
Minimize the effect of confounding factors
- Separate learning from assessment
- Avoid staff who are either very enthusiastic about or very skeptical of the method being evaluated
- Be careful when comparing different application types

39 How to use Case study _Case study guidelines
Plan the case study
- The plan identifies all the issues to be addressed, such as training requirements, the necessary measures, the data-collection procedures, and so on
- The evaluation should have a budget, schedule, and staffing plan separate from those of the actual project

40 How to use Case study _Case study guidelines
Monitor the case study against the plan
- Compare the case study's progress and results with the plan to ensure the methods or tools are being used correctly, and record any factors that could bias the results

41 How to use Case study _Case study guidelines
Analyze and report the results
- The analysis procedures you follow depend on the number and characteristics of the data items you must analyze
- For example, if treatments were assigned at random, you can use standard statistical methods; if you have only one value from each method or tool being evaluated, no analysis techniques are available

42 How to use Case study _analysis methods for case studies
Report not only the data that bear directly on the case study's goal, but also data about the environment, such as the development group's experience and how representative the case is within the research domain.

43 End of the literature review
Thank you!

44 Questions about why we experiment
- How should we choose a research method?
  - Reading - Thinking - Idea - Discussion - Paper
  - Reading - Thinking - Idea - Experiment - Paper
  - Experiment + Reading - Thinking - Idea - Paper
- As a Ph.D. student in SE, how should we experiment? Consider time, papers, experiments, and the depth versus breadth of research

45 Questions about validation models
- How should we choose an experimental model? Consider data collection, goals, cost limitations, the type of projects, the number of projects involved, and so on
- As a Ph.D. student, which type of experiment should we select? Consider time, papers, and the thesis

