CddMeeting01082008
CDD Meeting, Tuesday, 8.1.2008, 10:00
Next Meeting
- Friday, 11.1.2008, 10:00
 
Tasks
- Add filters and tags for extracted, manual, and automated tests
- Fix extraction for tuples -> DONE
- Look at/fix test case execution for agents (Stefan)
- CDD log window in IDE (Arno)
- "New manual test case" button (Arno)
- Better icons for GUI (Arno)
- Status / progress bar (Arno)
- Port to 6.1 (?, probably only after Beta 1)
- Manual re-run to find true prestate (Jocelyn, Stefan)
- Logging (Stefan)
  - What data to log?
  - Implement storing
  - Define how students should submit logs
 
- Data gathering (Stefan)
  - Define what data to gather
  - Define how to process the gathered data
 
- Formulate experiment hypotheses (Andreas)
- Define project for SoftEng (Manu)
  - Find a system-level test suite for us to test the students' code
  - Find a project with a pure functional part
 
 - "Execute visible test cases only" Button (?)
 -  Restore open nodes and selection after grid update (Arno)
- Maybe better/easier solved via incremental updates from tree
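
A sketch of the save-and-restore alternative, in case incremental updates turn out to be too involved. All feature names here are hypothetical placeholders, not the actual grid API:

    deferred class
        CDD_GRID_STATE
            -- Sketch only: remember expanded nodes and the selection
            -- across a full grid rebuild.

    feature -- Basic operations

        rebuild_preserving_state
                -- Rebuild the grid, then re-open previously expanded
                -- nodes and restore the selection.
            local
                ids: ARRAYED_LIST [STRING]
                selected: STRING
            do
                ids := expanded_node_ids
                selected := selected_node_id
                rebuild_grid
                from
                    ids.start
                until
                    ids.after
                loop
                    expand_node (ids.item)
                    ids.forth
                end
                if selected /= Void then
                    select_node (selected)
                end
            end

    feature -- Hooks (to be effected against the real grid)

        expanded_node_ids: ARRAYED_LIST [STRING]
                -- IDs of the currently expanded nodes
            deferred
            end

        selected_node_id: STRING
                -- ID of the selected node (Void if none)
            deferred
            end

        rebuild_grid
                -- Rebuild the grid from the test tree.
            deferred
            end

        expand_node (an_id: STRING)
                -- Re-open the node identified by `an_id'.
            deferred
            end

        select_node (an_id: STRING)
                -- Select the node identified by `an_id'.
            deferred
            end

    end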
 
- Automate CDD system-level tests (Stefan)
- Install CDD in student labs (Manu)
- Pause test execution and compilation during regular compilation and execution (Arno)
- Add the most important convenience routines to CDD_TEST_CASE (Stefan)
- Add failure context window (Arno)
  - Maybe also additional information, such as previous outcomes?
 
- Check why Gobo slows down the compilation of projects not using Gobo when melting (performance issue when compiling the interpreter)
- Fix AutoTest for courses
  - Integrate AUT_TEST_CASE into the CDD_TEST_CASE hierarchy (see the sketch below)
  - Variable declarations for failing test cases
  - New release
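
A minimal sketch of the intended hierarchy. The deferred feature `run' is an assumption about the CDD_TEST_CASE interface, not its actual contents:

    deferred class
        CDD_TEST_CASE
            -- Common ancestor of all CDD test cases.

    feature -- Execution

        run
                -- Execute this test case.
            deferred
            end

    end

    class
        AUT_TEST_CASE
            -- AutoTest-generated test case, fitted into the
            -- CDD hierarchy.

    inherit
        CDD_TEST_CASE

    feature -- Execution

        run
                -- Replay the generated call sequence.
            do
                -- Generated test body goes here.
            end

    end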
 
- Move logs below cdd_tests
- Environment variable (or better, a user preference) for qualifying class names (to avoid SVN conflicts)
- Unique ID to tag test cases with, to be used in logs, so that test logs are resilient to test class renamings (see the sketch below)
- While extracting test cases, flag objects that are the target of a currently executing routine
- During setup, check the invariant of all objects that are not flagged
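
One way the tagging could look, sketched with a made-up `cdd_id' indexing entry (the entry name and ID format are illustrative assumptions):

    indexing
        description: "Extracted test case for {STRING}.append"
        cdd_id: "tc-4f2a9c31"
            -- Stable ID generated once at extraction time; log entries
            -- reference this ID rather than the class name, so renaming
            -- the class does not orphan its history.

    class
        TEST_STRING_APPEND_1

    inherit
        CDD_TEST_CASE

    feature -- Execution

        run
                -- Replay the extracted call sequence.
            do
                -- Extracted prestate and call go here.
            end

    end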
 
Software Engineering Project
- One large project, but divided into testable subcomponents
- Students required to write test cases
- Fixed API to make things uniformly testable
- Public/secret test cases (similar to the Zeller course)
- Competitions:
  - Group A test cases applied to Group A's project
  - Group A test cases applied to Group B's project
 
 
Data to harvest
- Test case source (just the final version, or all versions?)
  - Use the profiler to get a coverage approximation
- TC metadata (with timestamps; see the example below)
  - TC added/removed
  - TC outcome
  - TC execution time
- Development session data
  - IDE startup
  - File save
- Questionnaires
  - Initial
  - Final
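
Purely for illustration, a line-based log carrying the metadata above could look as follows; the field layout and event names are assumptions, not a decided format:

    2008-01-11T10:32:05  ADDED    tc-4f2a9c31
    2008-01-11T10:33:17  OUTCOME  tc-4f2a9c31  FAIL  0.08s
    2008-01-11T10:41:02  OUTCOME  tc-4f2a9c31  PASS  0.07s
    2008-01-11T10:55:49  REMOVED  tc-4f2a9c31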
 
 
Experiment Hypotheses
Use of CDD increases development productivity
- Did the use of testing decrease development time?
- This can be measured by looking at:
  - Number of compilations
  - Number of saves
  - Number of revisions
  - IDE time
  - Asking the students
 
 
None of the above strikes me as particularly reliable, though. Also, it is easy to develop quickly if you do a bad job. To compare apples to apples, we must be careful to compare projects of similar correctness and completeness. We could use an external test suite to assess correctness, or the students' grades.
Use of CDD increases code correctness
- Is there a relation between the code correctness of a project (measured against some system-level test suite) and test activity?
 
Measures for test activity:
- Number of tests
- Number of times tests were run
- Number of pass/fail and fail/pass transitions (see the sketch below)
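
A sketch of how the transition measure could be computed from a per-test outcome history; class and feature names are illustrative:

    class
        CDD_OUTCOME_STATISTICS
            -- Sketch only: derive the transition measure from a
            -- chronologically ordered outcome list (True = pass).

    feature -- Measurement

        transition_count (outcomes: ARRAYED_LIST [BOOLEAN]): INTEGER
                -- Number of pass/fail and fail/pass transitions
                -- in `outcomes'.
            local
                previous: BOOLEAN
                has_previous: BOOLEAN
            do
                from
                    outcomes.start
                until
                    outcomes.after
                loop
                    if has_previous and then outcomes.item /= previous then
                        Result := Result + 1
                    end
                    previous := outcomes.item
                    has_previous := True
                    outcomes.forth
                end
            end

    end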
 
Developer Profile
- How did students use the testing tools?
- Are there clusters of similar use?
- What is characteristic of these clusters?
- Measures:
  - Asking students before and after
  - Are there projects where tests initially always fail or always pass?
  - How often do they test?
  - How correct is their project?
 
 
I am not completely sure yet what to assess here.
How do extracted, synthesized and manually written test cases compare?
- How many tests are there in each category?
- Were some excluded from testing more often than others?
- How many red/green and green/red transitions are there in each category?
- Which most often had compile-time errors that did not get fixed?
- Which did students find the most useful (questionnaire)?
 

