Difference between revisions of "CA Adding New Rules"
Latest revision as of 15:27, 3 June 2014
The Inspector Eiffel framework was designed so that adding new rules is as simple and as fast as possible. Of the initial set of rules, nearly all have an implementation of fewer than 200 lines of code, and many need fewer than 100. Rules that search the code for certain patterns (the vast majority of rules) are particularly simple to implement.
This page shows you how to implement a rule in the form of a class. After you have written such a class you must add the rule to the list of rules. This list is populated in {CA_CODE_ANALYZER}.make. There, just below the lines where all the other rules are added, add a line like
    rules.extend (create {YOUR_RULE}.make)
where YOUR_RULE must be replaced by the name of your rule class, and the creation procedure (make) must be adapted if necessary.
Standard Rules
All rules must conform to CA_RULE. The class you implement for a rule is responsible for checking the rule and also contains metadata about it (i.e. title and description). As of now, rules must moreover conform to either CA_STANDARD_RULE or CA_CFG_RULE, both of which are subtypes of CA_RULE. A large number of possible rules are standard rules, whether they are trivial or more complicated.
All standard rules are checked by iterating over the abstract syntax tree (AST) of the class code. As the developer of a new rule you can ignore the details of this iteration, but you do need to know which AST nodes your rule has to process. For each type of AST node you need to add an agent so that your routine is called during the iteration over the AST.
To start implementing your rule you have two possibilities: (1) start from scratch, implementing all deferred features of CA_STANDARD_RULE, or (2) use the following template.
Standard Rule Template
class
	CA_YOUR_RULE

inherit
	CA_STANDARD_RULE

create
	make

feature {NONE} -- Initialization

	make (a_pref_manager: attached PREFERENCE_MANAGER)
			-- Initialization for `Current'.
		do
			make_with_defaults
				-- This initializes the attributes to their default values:
				-- Severity = warning
				-- Default Severity Score = 50 (`severity score' can be changed by user)
				-- Rule enabled by default = True (`Rule enabled' can be changed by user)
				-- Only for system wide checks = False
				-- Checks library classes = True
				-- Checks nonlibrary classes = True

			initialize_options (a_pref_manager)

				-- TODO: Add your initialization here.
		end

	initialize_options (a_pref_manager: attached PREFERENCE_MANAGER)
			-- Initializes the rule preferences.
		local
			l_factory: BASIC_PREFERENCE_FACTORY
		do
			create l_factory
				-- TODO: Add the initialization of your custom preferences here.
				-- Example:
				--   threshold := l_factory.new_integer_preference_value (a_pref_manager,
				--     preference_namespace + "Threshold",
				--     30) -- default value
				--   min_local_name_length.set_default_value ("30") -- default value, too
				--   min_local_name_length.set_validation_agent (agent is_integer_string_within_bounds (?, 1, 1_000_000))
		end

feature {NONE} -- Activation

	register_actions (a_checker: attached CA_ALL_RULES_CHECKER)
		do
				-- TODO: Add agents for the features in section `Rule checking' here.
		end

feature {NONE} -- Rule checking

	-- TODO: Add the AST processing here.

feature -- Properties

	title: STRING_32
		do
				-- TODO: Add the title of your rule here.
			Result := "(Your title)"
		end

		-- TODO: Add the ID of your rule here. Should be unique!
	id: STRING_32 = "(YourID)"

	description: STRING_32
		do
				-- TODO: Add the rule description here.
			Result := "(Your description)"
		end

	format_violation_description (a_violation: attached CA_RULE_VIOLATION; a_formatter: attached TEXT_FORMATTER)
		do
				-- TODO: Add a formatted description of a concrete violation of this rule here.
		end

end
Let us have a closer look at the various parts of a rule class.
Initialization
Calling make_with_defaults initializes the attributes to their default values and makes sure that the class invariant is true. If you want to set an attribute to a custom value you can do so by setting it after the call to make_with_defaults.
The creation procedure from the template takes an argument of type PREFERENCE_MANAGER. This is used for initializing preferences that are specific to your rule. Such preferences usually represent integer or boolean values. If you do not need any custom preferences, you can leave out the argument a_pref_manager of make and remove the whole initialize_options feature.
AST Processing
The main part of your rule implementation consists of checking the source code for rule violations. Say, for example, that you want to check that if instructions have certain properties. Then you would add a feature like process_if (a_if_ast: IF_AS) to the section Rule checking. You would also need to modify the register_actions feature by adding the line
    a_checker.add_if_pre_action (agent process_if)
Of course you may register as many such agents as you want.
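As a minimal sketch of how the two pieces fit together, the fragment below registers an agent for if instructions and reports a violation whenever an if instruction has no else part (a made-up rule, chosen only for illustration). The names register_actions, add_if_pre_action, process_if, IF_AS and CA_RULE_VIOLATION come from this page; else_part, start_location, make_with_rule, set_location and violations are assumptions about the AST and framework classes and should be checked against the sources.

```eiffel
feature {NONE} -- Activation

	register_actions (a_checker: attached CA_ALL_RULES_CHECKER)
		do
				-- Have `process_if' called for every if instruction in the AST.
			a_checker.add_if_pre_action (agent process_if)
		end

feature {NONE} -- Rule checking

	process_if (a_if_ast: IF_AS)
			-- Report `a_if_ast' if it has no else part (hypothetical rule).
		local
			l_violation: CA_RULE_VIOLATION
		do
				-- Assumption: `else_part' is Void when there is no else branch.
			if a_if_ast.else_part = Void then
					-- Assumption: violations are created and collected like this.
				create l_violation.make_with_rule (Current)
				l_violation.set_location (a_if_ast.start_location)
				violations.extend (l_violation)
			end
		end
```

Both features belong in the rule class itself, replacing the corresponding TODO placeholders of the template.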
Properties
The title and the description of the rule may be constant strings or localized strings. The rule ID must be unique among all rules. It should not contain spaces and should be reasonably short. The main rules that come with Code Analysis have IDs that are numbered from CA001 to CA999 (many of which are not used).
Formatted Violation Description
Your rule should be able to produce a formatted description of a concrete rule violation. This description is used, for example, in the Code Analysis tool panel of the GUI. There, class names and feature names are enabled for pick-and-drop, and variable names, numbers, and strings are displayed with appropriate formatting, too. In addition, this description is used in command line mode. To produce normal, unformatted text, use {TEXT_FORMATTER}.add. For adding formatted elements use features like {TEXT_FORMATTER}.add_local, {TEXT_FORMATTER}.add_feature_name and similar.
You should store all the data you need for this description (variable names, numbers, etc.) in {CA_RULE_VIOLATION}.long_description_info. format_violation_description can then retrieve this data for the formatted output. Here is a simple example of producing a formatted description:
    a_formatter.add ("Feature ")
    if attached {STRING_32} a_violation.long_description_info.first as l_feat_name then
        a_formatter.add_feature_name (l_feat_name, a_violation.affected_class)
    end
    a_formatter.add (" is very long.")
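The counterpart of this example is the code that fills long_description_info when the violation is created. The following is only a sketch: report_long_feature is a hypothetical helper, and it assumes that long_description_info is a list accepting extend and that make_with_rule, set_location and violations exist under these names, none of which is confirmed on this page.

```eiffel
	report_long_feature (a_name: STRING_32; a_location: LOCATION_AS)
			-- Hypothetical helper: record a violation for the feature called `a_name'.
		local
			l_violation: CA_RULE_VIOLATION
		do
				-- Assumption: creation procedure and setter names.
			create l_violation.make_with_rule (Current)
			l_violation.set_location (a_location)
				-- Store the feature name as the first item, so that
				-- `format_violation_description' can read it back via `first'.
			l_violation.long_description_info.extend (a_name)
			violations.extend (l_violation)
		end
```

Storing the name as a STRING_32 matters here, because the formatting code above only prints the feature name if the first stored item is attached to that type.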
More Customized Rules
For rules that do not fit into a simple AST visitor scheme it is still best to inherit your rule from {CA_STANDARD_RULE}. You can, for example, register agents that get called when a class or a feature is processed. Based on these agents you can perform your customized analysis on the classes and/or features. Using multiple inheritance or just aggregation, it should hardly be a problem to include any functionality you need for your analysis.
Accessing Type Information
The AST classes do not contain type information. Suppose your rule processes function calls. Feature calls in the AST do not contain any information on the types, such as the type of the result.
The code analysis framework, however, provides functionality to retrieve the type of AST nodes. Before the analyzer lets the rules analyze a class, it computes the types of the AST nodes of that class. Hence this data is available to your rule while it is being checked.
While your rule is being checked you can retrieve the type of node a_node from feature a_feature by calling current_context.node_type (a_node: AST_EIFFEL; a_feature: FEATURE_I). {CA_RULE}.current_context is of type {CA_ANALYSIS_CONTEXT} and contains other information about current rule checking, too, such as the currently processed class or the matchlist for this class.
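A short sketch of such a query is shown below. is_boolean_node is a hypothetical helper, and the code assumes that node_type returns a detachable TYPE_A and that TYPE_A offers an is_boolean query; only current_context, node_type, AST_EIFFEL and FEATURE_I are taken from this page.

```eiffel
	is_boolean_node (a_node: AST_EIFFEL; a_feature: FEATURE_I): BOOLEAN
			-- Does `a_node', found in `a_feature', have a boolean type?
			-- (Hypothetical helper; False if no type is known for `a_node'.)
		do
				-- Assumption: the result of `node_type' may be Void.
			if attached current_context.node_type (a_node, a_feature) as l_type then
				Result := l_type.is_boolean
			end
		end
```

Guarding the call with an attached test keeps the rule from failing on nodes for which no type could be computed.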
Accessing the Control Flow Graph
Some kinds of static code analysis need and use the control flow graph of a program. The code analysis framework supports rules that use the control flow graph. If there is at least one such rule, the code analyzer computes the control flow graph of the procedures of the analyzed class before letting the rule check this class.
Worklist Algorithms
Control flow graph rules iterate over the control flow graph. They do so using a worklist, which is a list of CFG edges that still have to be processed. At the beginning, the worklist contains all edges of the control flow graph. The algorithm picks edges from the worklist for processing in an arbitrary order. The iteration stops as soon as there are no more edges left in the worklist. How does the worklist get smaller? Each edge that is processed is removed from the worklist. After processing an edge you have to decide dynamically whether to add all the outgoing (or incoming, depending on the direction) edges to the worklist. This lets you account for the fact that some analyses need certain edges to be processed more than once (a fixed point iteration is such an example).
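To make the scheme concrete, here is a self-contained sketch of a forward worklist iteration. It is independent of the framework: WORKLIST_SKETCH, the hard-coded graph, and the use of integers for blocks are all assumptions made for illustration, and the framework's own loop may well differ. The analysis computes which blocks are reachable from block 1, and visit_edge returns True only when it learned something new, which guarantees termination.

```eiffel
class
	WORKLIST_SKETCH

create
	make

feature {NONE} -- Initialization

	make
			-- Run a forward worklist iteration over a small hard-coded graph.
		local
			l_worklist: ARRAYED_QUEUE [TUPLE [source, target: INTEGER]]
			l_edge: TUPLE [source, target: INTEGER]
			i: INTEGER
		do
			create edges.make (4)
			edges.extend ([1, 2])
			edges.extend ([2, 3])
			edges.extend ([3, 2])
			edges.extend ([3, 4])
			create reached.make_filled (False, 1, 4)
			reached [1] := True

				-- At the beginning the worklist contains all edges.
			create l_worklist.make (edges.count)
			from i := 1 until i > edges.count loop
				l_worklist.extend (edges.i_th (i))
				i := i + 1
			end

			from until l_worklist.is_empty loop
					-- Each processed edge is removed from the worklist.
				l_edge := l_worklist.item
				l_worklist.remove
				if visit_edge (l_edge.source, l_edge.target) then
						-- Forward direction: add the outgoing edges of the target block.
					from i := 1 until i > edges.count loop
						if edges.i_th (i).source = l_edge.target then
							l_worklist.extend (edges.i_th (i))
						end
						i := i + 1
					end
				end
			end
		end

feature -- Access

	edges: ARRAYED_LIST [TUPLE [source, target: INTEGER]]
			-- Edges of the example graph.

	reached: ARRAY [BOOLEAN]
			-- Is the block with the given index reachable from block 1?

feature {NONE} -- Analysis

	visit_edge (a_from, a_to: INTEGER): BOOLEAN
			-- Propagate reachability along the edge; True if something changed.
		do
			if reached [a_from] and not reached [a_to] then
				reached [a_to] := True
				Result := True
			end
		end

end
```

Because an edge is re-added whenever its source block changes, the order in which edges are picked does not affect the final result, only the number of visits.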
Implementation
A control flow analysis may iterate in either direction. For a forward-directed analysis inherit your rule from {CA_CFG_FORWARD_RULE}; for a backward analysis use {CA_CFG_BACKWARD_RULE} instead. In either case you will then have to implement the following deferred features:
initialize_processing (a_cfg: attached CA_CONTROL_FLOW_GRAPH)
    This is called before a routine is processed using the worklist. Use it to initialize and prepare all the data structures you will need during analysis.
visit_edge (a_from, a_to: attached CA_CFG_BASIC_BLOCK): BOOLEAN
    This is called when an edge is being visited. This is where you put the analysis. If you set Result to False then no further edges will be added to the worklist. If, on the contrary, you set Result to True then edges will be added to the worklist: in a forward analysis all the outgoing edges of the current one will be added; in a backward analysis all the incoming edges will be added.
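The two features might be implemented along the following lines in a forward rule. This is a fragment, not a complete class: CA_MY_FORWARD_RULE and the attribute visited are hypothetical, the remaining rule features (title, id, description and so on) are omitted, and in a void-safe system visited would also have to be initialized in the creation procedure.

```eiffel
class
	CA_MY_FORWARD_RULE
		-- Hypothetical rule; only the CFG-related features are shown.

inherit
	CA_CFG_FORWARD_RULE

feature {NONE} -- Analysis

	visited: ARRAYED_LIST [CA_CFG_BASIC_BLOCK]
			-- Blocks already seen as the target of an edge (hypothetical).

	initialize_processing (a_cfg: attached CA_CONTROL_FLOW_GRAPH)
			-- Reset the analysis data before a routine is processed.
		do
			create visited.make (0)
		end

	visit_edge (a_from, a_to: attached CA_CFG_BASIC_BLOCK): BOOLEAN
			-- Visit the edge from `a_from' to `a_to'.
			-- True (re-add outgoing edges) only the first time `a_to' is seen.
		do
			if not visited.has (a_to) then
				visited.extend (a_to)
					-- TODO: perform the actual analysis of `a_to' here.
				Result := True
			end
		end

end
```

Returning True only when the analysis state changes is what keeps the worklist from growing forever.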
Non-Worklist Algorithms
If your control flow analysis does not fit into the structure of an algorithm as described above, you may inherit directly from {CA_CFG_RULE} and implement the feature process_cfg (a_cfg: attached CA_CONTROL_FLOW_GRAPH) (in addition to the features explained above). In this case you do not have to use a worklist; you can process the control flow graph in any way you want.
Exceptions During Analysis
If a bug in a rule leads to an exception being thrown during analysis, the exception is caught by the analyzer. It will show up as an error at the very top of the list in the panel, above all rule violations. You can double-click on the entry to see the exception details (the call stack, which rule caused it, and so forth).
When an exception occurs while a class is being analyzed, the analyzer continues with the next class. Despite exceptions, code analysis tries to analyze as much as possible. However, some rule violations (of bug-free rules) may be missing.
You can report exceptions to the responsible developers.