CddMeeting01082008

Latest revision as of 02:20, 11 January 2008

CDD Meeting, Tuesday, 8.1.2008, 10:00

Next Meeting

  • Friday, 11.1.2008, 10:00

Tasks

  • Add filters and tags for extracted, manual and automated tests
  • Fix extraction for tuples -> DONE
  • Look at/fix test case execution for agents (Stefan)
  • CDD log window in IDE (Arno)
  • "New manual test case" Button (Arno)
  • Better Icons for GUI (Arno)
  • Status / Progress bar (Arno)
  • Port to 6.1 (?, probably only after Beta 1)
  • Manual re-run to find true prestate (Jocelyn, Stefan)
  • Logging (Stefan)
    • What data to log?
    • Implement storing
    • Define how students should submit logs
  • Data Gathering (Stefan)
    • Define what data to gather
    • Define how to process gathered data
  • Formulate experiment hypotheses (Andreas)
  • Define Project for SoftEng (Manu)
    • Find a system-level test suite to test the students' code
    • Find a project with a purely functional part
  • "Execute visible test cases only" Button (?)
  • Restore open nodes and selection after grid update (Arno)
    • Maybe better/easier solved via incremental updates from tree
  • Automate CDD System level tests (Stefan)
  • Install CDD in student labs (Manu)
  • Pause test execution and compilation during regular compilation and execution (Arno)
  • Add the most important convenience routines to CDD_TEST_CASE (Stefan)
  • Add failure context window (Arno)
    • Maybe also additional information such as previous outcomes?
  • Check why Gobo slows down the compilation of projects that do not use Gobo when melting (performance issue for compiling interpreter)
  • Fix AutoTest for courses
    • Integrate AUT_TEST_CASE into CDD_TEST_CASE hierarchy
    • Variable declaration for failing test cases
    • New release
  • Move logs below cdd_tests
  • Environment variable (or better user preference) for qualifying class names (to avoid svn conflicts)
  • Unique id to tag test cases with, to be used in logs, so that test logs are resilient to test class renamings
  • While extracting test cases, flag objects that are targets of a currently executing routine
  • During setup, check the invariants of all objects that are not flagged
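The last two tasks can be illustrated with a minimal Python sketch. It is not CDD's Eiffel implementation: `ExtractedObject`, `flag_targets`, `check_setup` and the `is_target` field are hypothetical names. The sketch assumes that an object which is the target of a routine still executing when the state is captured may legitimately violate its class invariant (invariants need only hold between qualified calls), so setup checks invariants only for unflagged objects.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ExtractedObject:
    """Hypothetical snapshot of one object captured during test case extraction."""
    name: str
    invariant: Callable[[], bool]  # stands in for evaluating the class invariant
    is_target: bool = False        # set when a routine was executing on this object


def flag_targets(objects: List[ExtractedObject], executing_targets: Iterable[str]) -> None:
    """Flag every object that is the target of a routine still on the call stack."""
    targets = set(executing_targets)
    for obj in objects:
        obj.is_target = obj.name in targets


def check_setup(objects: List[ExtractedObject]) -> List[str]:
    """Return the names of unflagged objects whose invariant does not hold.

    Flagged objects are skipped: their invariant may be temporarily broken
    because one of their routines had not yet returned at capture time.
    """
    return [obj.name for obj in objects if not obj.is_target and not obj.invariant()]


if __name__ == "__main__":
    snapshot = [
        ExtractedObject("account", invariant=lambda: False),  # mid-routine, broken invariant
        ExtractedObject("bank", invariant=lambda: True),
        ExtractedObject("log", invariant=lambda: False),      # genuinely inconsistent
    ]
    flag_targets(snapshot, ["account"])
    print(check_setup(snapshot))  # ['log']
```

Only `log` is reported: `account` also fails its invariant, but it is skipped because it was flagged as the target of an executing routine.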

Software Engineering Project

  • One large project, but divided into testable subcomponents
  • Students required to write test cases
  • Fixed API to make things uniformly testable
  • Public/secret test cases (similar to Zeller's course)
  • Competitions:
    • Group A test cases applied to Group A project
    • Group A test cases applied to Group B project

Data to harvest

  • IDE Time with CDD(extraction) enabled / IDE Time with CDD(extraction) disabled
  • Test Case Source (just final version, or all versions?)
    • Use Profiler to get coverage approximation
  • TC Meta Data (with timestamps -> Evolution of Test Case)
    • TC Added/Removed
    • TC Outcome (transitions from FAIL/PASS/UNRESOLVED[bad_communication <-> does_not_compile <-> bad_input])
    • TC execution time
    • Modifications to a test case (compiler needs to recompile)
  • Development Session Data
    • IDE Startup
    • File save
  • Questionnaires
    • Initial
    • Final
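One way to record "TC Meta Data (with timestamps -> Evolution of Test Case)" is as a stream of timestamped events keyed by a unique test id, from which each test case's outcome history can be rebuilt. The following is a minimal Python sketch under assumptions: the `TcEvent` record, its field names, and the `"UNRESOLVED:<reason>"` encoding of the unresolved subkinds are hypothetical, not an existing CDD log format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TcEvent:
    """Hypothetical log record for one test case event."""
    timestamp: float               # seconds since the start of the session
    test_id: str                   # unique id, stable across test class renamings
    kind: str                      # "added", "removed", "modified" or "outcome"
    outcome: Optional[str] = None  # "PASS", "FAIL" or "UNRESOLVED:<reason>"


def outcome_transitions(events: List[TcEvent]) -> Dict[str, List[Tuple[str, str]]]:
    """For each test id, list the outcome changes in chronological order."""
    last: Dict[str, str] = {}
    transitions: Dict[str, List[Tuple[str, str]]] = {}
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.kind != "outcome" or ev.outcome is None:
            continue  # added/removed/modified events carry no outcome
        prev = last.get(ev.test_id)
        if prev is not None and prev != ev.outcome:
            transitions.setdefault(ev.test_id, []).append((prev, ev.outcome))
        last[ev.test_id] = ev.outcome
    return transitions


if __name__ == "__main__":
    log = [
        TcEvent(0.0, "tc1", "added"),
        TcEvent(1.0, "tc1", "outcome", "UNRESOLVED:does_not_compile"),
        TcEvent(2.0, "tc1", "outcome", "FAIL"),
        TcEvent(3.0, "tc1", "outcome", "FAIL"),
        TcEvent(4.0, "tc1", "outcome", "PASS"),
    ]
    print(outcome_transitions(log))
```

Keying events by a stable id rather than by class name means a renamed test class does not split one test case's history into two.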

Experiment Hypotheses

Use of CDD increases development productivity

  • Did the use of testing decrease development time?
  • This can be measured by looking at:
    • Number of compilations
    • Number of saves
    • Number of revisions
    • IDE time
    • Asking the students

None of the above strikes me as particularly reliable, though. Also, it is easy to develop quickly if you do a bad job. To compare apples to apples, we must be careful to compare projects of similar correctness and completeness. We could use an external test suite, or the students' grades, to assess correctness.


Use of CDD increases code correctness

  • Is there a relation between code correctness of project (vs. some system level test suite) and test activity?

Measures for test activity:

  • number of tests
  • number of times tests were run
  • Number of pass/fail, fail/pass transitions

Developer Profile

  • How did students use the testing tools?
  • Are there clusters of similar use?
  • What is characteristic of these clusters?
  • Measures:
    • Asking students before and after
    • Are there projects where tests initially always fail, or always pass?
    • How often do they test?
    • How correct is their project?

I am not completely sure yet what to assess here.

How do extracted, synthesized and manually written test cases compare?

  • Which tests are the most useful to students?
  • How many tests are there in each category?
  • What's the test suite quality of each category?
  • Were some excluded from testing more often than others?
  • How many red/green and green/red transitions are there in each category?
  • Which category most often had compile-time errors that were never fixed?