CA Adding New Rules

Latest revision as of 15:27, 3 June 2014


The Inspector Eiffel framework was designed so that adding new rules is as simple and as fast as possible. Of the initial set of rules, nearly all are implemented in fewer than 200 lines of code, and many in fewer than 100. Rules that search the code for certain patterns (the vast majority of rules) are particularly simple to implement.

This page shows you how to implement a rule in the form of a class. After you have written such a class, you must add the rule to the list of rules. This list is populated in {CA_CODE_ANALYZER}.make. There, just below the lines where all the other rules are added, add a line like

rules.extend (create {YOUR_RULE}.make),

where YOUR_RULE must be replaced by the name of your rule class and the creation procedure (make) must be adapted if necessary.

Standard Rules

All rules must conform to CA_RULE. The class you implement for a rule is responsible for checking the rule and also contains metadata about it (such as its title and description). As of now, rules must moreover conform to either CA_STANDARD_RULE or CA_CFG_RULE, both of which are subtypes of CA_RULE. A large number of possible rules are standard rules, no matter whether they are trivial or more complicated.

All standard rules are checked by iterating over the abstract syntax tree (AST) of the class code. When you add a new rule you can ignore the details of this iteration; you only need to know which AST nodes your rule has to process. For each type of AST node, you add an agent so that your routine is called during the iteration over the AST.

To start implementing your rule you have two options: (1) start from scratch, implementing all deferred features of CA_STANDARD_RULE, or (2) use the following template.

Standard Rule Template

class
  CA_YOUR_RULE
 
inherit
  CA_STANDARD_RULE
 
create
  make
 
feature {NONE} -- Initialization
 
  make (a_pref_manager: attached PREFERENCE_MANAGER)
      -- Initialization for `Current'.
    do
      make_with_defaults
        -- This initializes the attributes to their default values:
        -- Severity = warning
        -- Default Severity Score = 50 (`severity score' can be changed by user)
        -- Rule enabled by default = True (`Rule enabled' can be changed by user)
        -- Only for system wide checks = False
        -- Checks library classes = True
        -- Checks nonlibrary classes = True
 
        initialize_options (a_pref_manager)
 
        -- TODO: Add your initialization here.
    end
 
  initialize_options (a_pref_manager: attached PREFERENCE_MANAGER)
      -- Initializes the rule preferences.
    local
      l_factory: BASIC_PREFERENCE_FACTORY
    do
      create l_factory
 
        -- TODO: Add the initialization of your custom preferences here.
        -- Example:
        --   threshold := l_factory.new_integer_preference_value (a_pref_manager,
        --     preference_namespace + "Threshold",
        --     30)  -- default value
        --   threshold.set_default_value ("30") -- default value, too
        --   threshold.set_validation_agent (agent is_integer_string_within_bounds (?, 1, 1_000_000))
    end
 
feature {NONE} -- Activation
 
  register_actions (a_checker: attached CA_ALL_RULES_CHECKER)
    do
      -- TODO: Add agents for the features in section `Rule checking' here.
    end
 
feature {NONE} -- Rule checking
 
  -- TODO: Add the AST processing here.
 
feature -- Properties
 
  title: STRING_32
    do
        -- TODO: Add the title of your rule here.
      Result := "(Your title)"
    end
 
    -- TODO: Add the ID of your rule here. Should be unique!
  id: STRING_32 = "(YourID)"
 
  description: STRING_32
    do
        -- TODO: Add the rule description here.
      Result :=  "(Your description)"
    end
 
  format_violation_description (a_violation: attached CA_RULE_VIOLATION; a_formatter: attached TEXT_FORMATTER)
    do
      -- TODO: Add a formatted description of a concrete violation of this rule here.
    end
 
end

Let us have a closer look at the various parts of a rule class.

Initialization

Calling make_with_defaults initializes the attributes to their default values and makes sure that the class invariant holds. To give an attribute a custom value, set it after the call to make_with_defaults.

The creation procedure from the template takes an argument of type PREFERENCE_MANAGER. This is used for initializing preferences that are specific to your rule. Such preferences usually represent integer or boolean values. If you do not need any custom preferences, you can leave out the argument a_pref_manager of make and remove the whole initialize_options feature.
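
As a minimal sketch of the simpler variant just described, a rule without custom preferences could reduce its initialization to the following. It follows the template, with the a_pref_manager argument and the initialize_options feature removed; nothing else is assumed.

```eiffel
feature {NONE} -- Initialization

  make
      -- Initialization for `Current' (no custom preferences).
    do
      make_with_defaults
        -- TODO: Set attributes to custom values here, after the defaults are in place.
    end
```

With this creation procedure, the registration line in {CA_CODE_ANALYZER}.make takes no argument, as in rules.extend (create {CA_YOUR_RULE}.make). A creation procedure that keeps the PREFERENCE_MANAGER argument has to be given a preference manager at that point instead.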

AST Processing

The main part of your rule implementation consists of checking the source code for rule violations. Say, for example, that you want to check that if instructions have certain properties. Then you would add a feature like process_if (a_if_ast: IF_AS) to the section Rule checking. You would also need to modify the register_actions feature by adding the line

a_checker.add_if_pre_action (agent process_if).

Of course you may register as many such agents as you want.
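
Put together, the two pieces might look like the fragment below, which would replace the corresponding TODO sections of the template. It is only a sketch: the check itself (an if instruction whose then-branch is empty) is an illustrative choice, and the names compound, start_location, make_with_rule, set_location and violations do not appear on this page. They are assumptions about the AST and framework classes and should be verified against the library source.

```eiffel
feature {NONE} -- Activation

  register_actions (a_checker: attached CA_ALL_RULES_CHECKER)
    do
        -- Have `process_if' called for every `if' instruction in the AST.
      a_checker.add_if_pre_action (agent process_if)
    end

feature {NONE} -- Rule checking

  process_if (a_if_ast: IF_AS)
      -- Report `a_if_ast' if its then-branch contains no instructions.
    local
      l_violation: CA_RULE_VIOLATION
    do
        -- Assumption: `compound' is Void when the branch is empty.
      if a_if_ast.compound = Void then
          -- Assumption: violations are created and collected like this.
        create l_violation.make_with_rule (Current)
        l_violation.set_location (a_if_ast.start_location)
        violations.extend (l_violation)
      end
    end
```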

Properties

The title and the description of the rule may be constant strings or localized strings. The rule ID must be unique among all rules. It should not contain spaces and should be reasonably short. The main rules that come with Code Analysis have IDs that are numbered from CA001 to CA999 (many of which are not used).

Formatted Violation Description

Your rule should be able to produce a formatted description of a concrete rule violation. This description is used, for example, in the Code Analysis tool panel of the GUI. There, class names and feature names are enabled for pick-and-drop, and variable names, numbers, and strings are displayed with their own formatting. The same description is also used in command line mode. To produce plain, unformatted text, use {TEXT_FORMATTER}.add. To add formatted elements, use features such as {TEXT_FORMATTER}.add_local and {TEXT_FORMATTER}.add_feature_name.

You should store all the data you need for this description (variable names, numbers, etc.) in {CA_RULE_VIOLATION}.long_description_info. format_violation_description can then retrieve this data for the formatted output. Here is a simple example of producing a formatted description:

a_formatter.add ("Feature ")
if attached {STRING_32} a_violation.long_description_info.first as l_feat_name then
  a_formatter.add_feature_name (l_feat_name, a_violation.affected_class)
end
a_formatter.add (" is very long.")
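
To show both ends of this data flow in one place, the following sketch pairs the retrieval code above with a storing side. The helper store_feature_name is hypothetical, and the use of extend assumes that long_description_info is a list-like container (the call to first above suggests this, but the page does not state its type).

```eiffel
feature {NONE} -- Rule checking

  store_feature_name (a_name: STRING_32; a_violation: CA_RULE_VIOLATION)
      -- Store `a_name' in `a_violation' for later use in the description.
      -- Hypothetical helper; `extend' is an assumption about the container.
    do
      a_violation.long_description_info.extend (a_name)
    end

feature -- Properties

  format_violation_description (a_violation: attached CA_RULE_VIOLATION; a_formatter: attached TEXT_FORMATTER)
      -- Write "Feature <name> is very long." using the stored name.
    do
      a_formatter.add ("Feature ")
      if attached {STRING_32} a_violation.long_description_info.first as l_feat_name then
        a_formatter.add_feature_name (l_feat_name, a_violation.affected_class)
      end
      a_formatter.add (" is very long.")
    end
```

The order in which items are stored matters: the formatting code has to read them back in the same positions and with the same types.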

More Customized Rules

For rules that do not fit into a simple AST visitor scheme, it is still best to inherit your rule from {CA_STANDARD_RULE}. You can, for example, register agents that get called when a class or a feature is processed. Based on these agents you can perform your customized analysis on the classes and/or features. Using multiple inheritance or just aggregation, it should hardly be a problem to include any functionality you need for your analysis.

Accessing Type Information

The AST classes do not contain type information. Suppose your rule processes function calls. Feature calls in the AST do not contain any information on the types, such as the type of the result.

The code analysis framework, however, provides functionality to retrieve the type of AST nodes. Before the analyzer lets the rules analyze a class, it computes the types of the AST nodes of that class. Hence this data is available to your rule while it is being checked.

While your rule is being checked you can retrieve the type of node a_node from feature a_feature by calling current_context.node_type (a_node: AST_EIFFEL; a_feature: FEATURE_I). {CA_RULE}.current_context is of type {CA_ANALYSIS_CONTEXT} and also contains other information about the current rule check, such as the currently processed class or the matchlist for this class.
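
A call to node_type from inside a rule might be sketched as follows. The attribute current_feature and the routine process_node are hypothetical names introduced for illustration; the rule itself would have to record the feature being checked, for example from an agent that is called for each feature. The sketch also does not assume whether node_type can return Void and simply guards against it.

```eiffel
feature {NONE} -- Rule checking

  current_feature: detachable FEATURE_I
      -- Feature currently being checked.
      -- Hypothetical attribute; the rule has to set it itself.

  process_node (a_node: AST_EIFFEL)
      -- Look up the type computed for `a_node', if available.
    do
      if
        attached current_feature as l_feature and then
        attached current_context.node_type (a_node, l_feature) as l_type
      then
          -- `l_type' can now be inspected by the rule.
      end
    end
```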

Accessing the Control Flow Graph

Some kinds of static code analysis need and use the control flow graph of a program. The code analysis framework supports rules that use the control flow graph. If there is at least one such rule, the code analyzer computes the control flow graph of the procedures of the analyzed class before letting the rule check this class.

Worklist Algorithms

Control flow graph rules iterate over the control flow graph. They do so using a worklist, that is, a list of CFG edges that still have to be processed. At the beginning, the worklist contains all edges of the control flow graph. The algorithm picks edges from the worklist for processing in an arbitrary order, and the iteration stops as soon as no edges are left in the worklist. The worklist shrinks because each edge that is processed is removed from it. After processing an edge, you decide dynamically whether to add all its outgoing (or incoming, depending on the direction) edges to the worklist again. This lets you account for the fact that some analyses need certain edges to be processed more than once (a fixed point iteration is one example).

Implementation

A control flow analysis may iterate in either direction. For a forward-directed analysis, inherit your rule from {CA_CFG_FORWARD_RULE}; for a backward analysis, use {CA_CFG_BACKWARD_RULE} instead. In either case you then have to implement the following deferred features:

initialize_processing (a_cfg: attached CA_CONTROL_FLOW_GRAPH) 
This is called before a routine is processed using the worklist. Essentially you may use it to initialize and prepare all the data structures you will need during analysis.
visit_edge (a_from, a_to: attached CA_CFG_BASIC_BLOCK): BOOLEAN 
This is called when an edge is being visited, and it is where you put the analysis. If you set Result to False, no further edges are added to the worklist. If, on the contrary, you set Result to True, edges are added to the worklist: in a forward analysis all the outgoing edges of the current one are added; in a backward analysis all the incoming edges are added.
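
The two features can be sketched in a forward rule as follows. This is a fragment under stated assumptions, not a complete rule: the class name CA_YOUR_CFG_RULE and the attribute seen_targets are hypothetical, and the "analysis" merely records which blocks have been the target of an edge, to illustrate how Result controls the worklist.

```eiffel
class
  CA_YOUR_CFG_RULE

inherit
  CA_CFG_FORWARD_RULE

  -- The creation procedure and the remaining deferred features of a rule
  -- (title, ID, description, ...) are omitted from this fragment.

feature {NONE} -- Analysis

  seen_targets: detachable ARRAYED_LIST [CA_CFG_BASIC_BLOCK]
      -- Blocks that were already the target of a visited edge.

  initialize_processing (a_cfg: attached CA_CONTROL_FLOW_GRAPH)
      -- Reset the per-routine data before the worklist iteration starts.
    do
      create seen_targets.make (0)
    end

  visit_edge (a_from, a_to: attached CA_CFG_BASIC_BLOCK): BOOLEAN
      -- Record `a_to'; request follow-up edges only if it was not seen before.
    do
      if attached seen_targets as l_seen and then not l_seen.has (a_to) then
        l_seen.extend (a_to)
        Result := True
      end
        -- Otherwise `Result' keeps its default value False.
    end

end
```

Because each block can be new at most once, visit_edge returns True only a bounded number of times, so the worklist eventually becomes empty. A real fixed point analysis would follow the same pattern, returning True only while the information attached to a block still changes.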

Non-Worklist Algorithms

If your control flow analysis does not fit into the structure of an algorithm as described above, you may inherit directly from {CA_CFG_RULE} and implement the feature process_cfg (a_cfg: attached CA_CONTROL_FLOW_GRAPH) (in addition to the features explained above). In this case you do not have to use a worklist; you can process the control flow graph in any way you want.

Exceptions During Analysis

If a bug in a rule leads to an exception being thrown during analysis, the exception is caught by the analyzer. It shows up as an error at the very top of the list in the panel, above all rule violations. You can double-click on the entry to see the exception details (the call stack, which rule caused it, and so forth).

When an exception occurs while a class is being analyzed, the analyzer continues with the next class. Despite exceptions, code analysis tries to analyze as much as possible. However, some rule violations (of bug-free rules) may be missing.

You can report exceptions to the responsible developers.