Dots of Tech Perception

October 28, 2008

On Role Mining

Filed under: Role Mining — Raluca Teodora Stoian @ 4:04 am

Role mining is the wrong way to approach the role definition problem. We need to distinguish between actual roles and potential (candidate) roles.

  • The actual roles are a complete and definite set of roles that have semantics in the considered organizational context.
  • Role mining yields a set of potential roles, and we can only obtain an incomplete solution to it. Using approximation and heuristics, we can find a near-optimal solution, or one that works reasonably well, which is only a subset of all the potential roles.
  • Whereas every role in the actual set is assumed to exist and be useful in an enterprise context, an overwhelming majority of the candidate roles will be eliminated.

So instead of trying to find all roles, why not redefine the problem and start from the correct premises to derive the set of correct roles? In the end it all comes down to the old saying: why buy the cow (i.e. define the set of ALL candidate roles) when you can have the milk for free (i.e. the actual roles)?

October 10, 2008

Definition of Role Mining

Filed under: Role Mining, role definition — Raluca Teodora Stoian @ 4:17 am

A definition by extrapolation:

Data mining is the use of automated data analysis techniques to uncover previously undetected relationships among data items. Data mining often involves the analysis of data stored in a data warehouse. So, role mining = the use of automated data analysis techniques to uncover previously undetected relationships among users and entitlements (resources, permissions/privileges/rights). Role mining involves the analysis of user information (attributes and) stored in various systems.

It is obvious how one can use data mining to perform privilege-to-role assignment. User-to-role assignment, however, can only be done from a business point of view: it requires careful analysis of the business processes to define job functions and then derive appropriate roles from them. While this approach can be quite accurate, it is tedious and time consuming, since it requires understanding the business semantics. Unless text mining is used, I don’t think this can be called role mining.
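To make the bottom-up, privilege-to-role side concrete, here is a minimal sketch in Python. It is only one naive heuristic, not a full role mining algorithm: users who hold exactly the same set of entitlements are grouped into one candidate role. The function name, the `min_users` threshold and the sample data are all made up for illustration.

```python
from collections import defaultdict


def candidate_roles(user_entitlements, min_users=2):
    """Group users with identical entitlement sets into candidate roles.

    user_entitlements: dict mapping a user name to an iterable of entitlements.
    min_users: hypothetical threshold; groups with fewer users are dropped.
    Returns a dict mapping each shared entitlement set to the users holding it.
    """
    groups = defaultdict(set)
    for user, entitlements in user_entitlements.items():
        groups[frozenset(entitlements)].add(user)
    return {ents: users for ents, users in groups.items() if len(users) >= min_users}


# Hypothetical sample data, not taken from any real system.
assignments = {
    "alice": {"read_hr", "write_hr"},
    "bob": {"write_hr", "read_hr"},
    "carol": {"read_finance"},
}
roles = candidate_roles(assignments)
```

Whether a candidate found this way is an actual role still has to be decided from the business side; the grouping only says that the entitlements occur together.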

Distinctions to be made:

  • Role Mining = bottom up
  • Role Engineering [?] = top down
  • Role definition [?] = both

Provisioning – A Definition

Filed under: provisioning, role definition — Raluca Teodora Stoian @ 3:25 am

In an enterprise environment, provisioning mechanisms are used to ensure that users have access only to the entitlements they need to perform the responsibilities assigned to them throughout their full life-cycle (i.e. from employment to separation). Provisioning technologies should automate the previously manual responsibilities of the human resources and information technology departments. More formally, provisioning is the automation of all the life-cycle steps required to set up, maintain and terminate user access to directory and/or data target systems.

The life-cycles to be defined in order to assign users to the required level of access depend on the chosen provisioning model (i.e. rule-based provisioning, role-based provisioning).

September 14, 2008

Paper Published

Filed under: bugzilla, defect prediction, mozilla, mozilla firefox — Raluca Teodora Stoian @ 6:19 pm

So finally my work in defect prediction was published. Here is a presentation.

March 5, 2008

Mining for defects – Mozilla Firefox

Filed under: bugzilla, defect prediction, mozilla, mozilla firefox — Raluca Teodora Stoian @ 1:58 am

The main problem with using software repositories in defect prediction is the lack of integration between the CVS history files and the defect tracking systems. You can link the problem reports (PRs) with the modification reports (MRs) using the PR identification number, which is available both in the MRs in CVS and in the PRs in Bugzilla.

A real challenge is to associate the bug reports in Bugzilla with specific Firefox releases. The data collection process takes place at moment t3, and the goal is to collect the bugs that were in the source code at the moment of the release, t1. This is not trivial, as the following example illustrates. Suppose that at the time of the release, t1, a defect was in the source code. If the defect was solved after the release, say at t2 or t’, then at t3, when the data is collected, the bug is labeled as resolved.

But we are dealing with an open source environment. A bug may have been solved, with the commit message in the CVS history file at t2, while its status was never updated in Bugzilla. The commit message in CVS may also fail to reflect the change performed: it may carry no PR identification number even though the change resolves a problem that is reported in Bugzilla at t’.
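The timing problem can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the record layout and the function name are hypothetical, and the rule ignores when a defect was introduced. A bug whose linked fix was committed after t1 is treated as present in the release; a bug with no linked commit is left undecided, which is exactly the ambiguous case described above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class BugRecord:
    bug_id: int
    # Timestamp of the linked CVS commit, or None if no commit could be linked.
    fix_committed_at: Optional[datetime]


def present_at_release(bug: BugRecord, release_time: datetime) -> Optional[bool]:
    """Return True/False if the fix time decides it, None if it cannot be decided."""
    if bug.fix_committed_at is None:
        return None  # no PR number in any commit message: cannot decide from CVS
    return bug.fix_committed_at > release_time


# Hypothetical dates, chosen only to show the three outcomes.
t1 = datetime(2006, 1, 1)
fixed_after = BugRecord(1, datetime(2006, 2, 1))
fixed_before = BugRecord(2, datetime(2005, 12, 1))
unlinked = BugRecord(3, None)
```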

There is a lot of debate about whether size and complexity can predict defects. We argue that size and complexity metrics do have value for defect prediction, and that research should rather focus on the extent to which they can predict defects, or on the particular cases in which they can. In this context, we present our interpretation of the results.

Data Collection

Filed under: bugzilla, defect prediction, mozilla, mozilla firefox — Raluca Teodora Stoian @ 1:50 am

Firefox is a non-critical software system, and it is widely recognized that it contains post-release defects. Open source software (OSS) facilitates the collection of data to be used in defect prediction models. An important requirement for OSS code is that it should be rigorously modular, self-contained and self-explanatory, to allow development at remote sites. Therefore, the data that can be used for prediction models in OSS can be retrieved from the source code version (CVS) repositories and bug tracking systems. On the other hand, OSS development is characterized by the lack of a formal process, poor design and architecture, and development tools that are not comparable to those used in commercial development. Few of the defect prediction approaches used for commercial software can be directly applied to OSS development; however, results obtained from OSS prediction models can be used in an industry environment.

1. Versions

Firefox is based on independent Mozilla Core components layered together. Due to this architecture some of Mozilla’s applications share many components, but they are fundamentally different in functionality.

The Mozilla source code is organized in several branches. The trunk is the main branch, the central source code that is used for continuous and ongoing development. Trunk builds contain the very latest changes and updates. However, the trunk can also be very unstable at times. When development is started for a specific Mozilla version a new branch is created. At conception, a derived branch contains everything that the principal branch contains. Firefox 1.0 branch was derived from Mozilla Branch 1.7 while Firefox 1.5 from Mozilla Branch 1.8. Firefox branches that are forked from the existing Mozilla branch will be used for all future releases of Firefox. The term release is used in OSS development to refer to different types of releases: major and minor, alpha and beta.

Firefox Branch 1.5 resynchronized the code base with the trunk, which contained additional features not available in Firefox 1.0. On the other hand, in release 1.5.0.3 the focus was not on adding features but on improving security-related aspects, which were bypassed in version 1.5.0. This peculiarity of the three selected releases (1.0, 1.5 and 1.5.0.3) allowed us to test whether the performance of a defect prediction model increases when it is trained on data collected from major releases instead of minor ones.

2. Module Selection

The reason behind branching is that components that need to be prepared for a future release are at the same time continuously developed on the trunk. A distinction needs to be made between Firefox-specific source code, i.e. code that does not support any other Mozilla application, and the Mozilla components that support Firefox.

3. Metrics

To derive the product metrics for each source file, Understand C++ can be used. The tool computes the source code metrics for C and C++ programs and generates metrics reports. The reports contain three categories of metrics: project level, file level, and function level. They also contain object-oriented metrics for the .cpp files.

March 4, 2008

Mozilla Bugzilla Reporting Process – aka a bug’s lifecycle

Filed under: bugzilla, defect prediction, mozilla firefox — Raluca Teodora Stoian @ 6:51 am

The Mozilla project relies on Bugzilla, a defect tracking system, to monitor problem reports (PR), i.e. bugs. A PR in Bugzilla has several pre-defined attributes. Some fields, such as the PR identification number and creation timestamp, are created when the report is first filed. Other fields, such as the product, component, and severity, are selected by the testers when the report is filed and may be changed over the lifetime of the report. Other fields routinely change over time, such as the current status of the report, and if resolved, its resolution state.

Studying the lifecycle of a bug facilitates linking the Bugzilla PRs and CVS Modification Reports (MRs). The status and resolution fields define bugs as evolving entities that change over time. When a tester enters a new bug in Bugzilla, the status of the bug is set to UNCONFIRMED. The Mozilla quality assurance team will look at it, confirm that the bug exists, and change its status to NEW. After a developer looks at the bug and either accepts it or assigns it to someone else, the bug’s status becomes ASSIGNED. Once the bug is fixed, its status changes to RESOLVED. Finally, the quality assurance team verifies that the bug was indeed fixed, and the status is set to VERIFIED and then CLOSED. If the quality assurance team is not satisfied with the solution, then the bug is REOPENED and the process starts again.

A report can be RESOLVED in various ways. Bugzilla PRs indicate this in the resolution field. If the bug was solved and this resulted in a change to the code base, the bug is resolved as FIXED. When a developer determines that the bug is a duplicate of an existing report, it is marked as DUPLICATE. If the developer is unable to reproduce the defect, the resolution is set to WORKSFORME. If the report describes a problem that will not be fixed, or one that is not an actual bug, the report is marked as WONTFIX or INVALID.
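The status changes described above can be written down as a small transition table. The Python sketch below only encodes the path described in this post, not Bugzilla's full workflow configuration. Where a REOPENED bug goes next is my assumption ("the process starts again" is modelled as going back to ASSIGNED), and the function name is made up.

```python
# Allowed status changes, as described in the text above.
TRANSITIONS = {
    "UNCONFIRMED": {"NEW"},
    "NEW": {"ASSIGNED"},
    "ASSIGNED": {"RESOLVED"},
    "RESOLVED": {"VERIFIED", "REOPENED"},
    "VERIFIED": {"CLOSED"},
    "REOPENED": {"ASSIGNED"},  # assumption: a reopened bug is picked up again
    "CLOSED": set(),
}

# Values of the resolution field for a RESOLVED report.
RESOLUTIONS = {"FIXED", "DUPLICATE", "WORKSFORME", "WONTFIX", "INVALID"}


def is_valid_history(statuses):
    """Check that a sequence of statuses follows the transition table."""
    if not statuses or statuses[0] != "UNCONFIRMED":
        return False
    return all(
        nxt in TRANSITIONS.get(cur, set())
        for cur, nxt in zip(statuses, statuses[1:])
    )
```

A check like this can flag PRs whose recorded history skips a step, which is worth knowing before trying to link them to MRs.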

In Bugzilla terminology, a bug can be anything that needs to be tracked. Some entries are not real bugs, i.e. defects, but rather enhancements. When analyzing a report in Bugzilla, the quality assurance team rates severity of the bug using one of the following labels: blocker, critical, major, normal, minor, trivial, or enhancement.

While Bugzilla contains information about defects, it does not contain information about the location of the defects in the source code. Instead, this information is captured in the CVS log files. CVS Modification Reports (MRs) keep the complete history of any file in the project, including when and what was modified. Bonsai, Mozilla’s web interface to its CVS repository, can be used to retrieve MRs related to source files, the comments associated with the files, and the timestamp of the commit message. Each comment acknowledges the people who submitted the change and contains the relevant PR identification numbers (if any). Every number that appeared in an MR’s comment field was a potential link to a bug, indicating that the commit solved a PR. We selected a number as a candidate bug id if the following two conditions were met: the number was less than 6 digits long, and the comment contained one of the keywords bug, bug id, id or # before the number.
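The two conditions can be approximated with a regular expression. The Python sketch below is not the script used in the study, and it makes one simplifying assumption: the keyword has to appear directly before the number (optionally separated by whitespace, a colon or a #). The pattern and the function name are mine.

```python
import re

# Keyword (bug id, bug, id or #) followed directly by a run of digits.
BUG_REF = re.compile(r"(?:\bbug\s+id|\bbug|\bid|#)\s*[:#]?\s*(\d+)\b", re.IGNORECASE)


def candidate_bug_ids(comment, max_digits=5):
    """Return the numbers in a commit comment that look like bug ids.

    max_digits=5 encodes the "less than 6 digits" condition.
    """
    return [int(num) for num in BUG_REF.findall(comment) if len(num) <= max_digits]


# Hypothetical commit comments, written only for this example.
ids_a = candidate_bug_ids("Bug 12345: fix crash on startup, r=someone")
ids_b = candidate_bug_ids("fixes #4321 and bug id 99")
ids_c = candidate_bug_ids("bug 1234567 is too long; updated 2008 copyright")
```

Numbers without a keyword, such as a year in a copyright line, are ignored, and so are numbers with six or more digits.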

Firefox Development Process

Filed under: mozilla, mozilla firefox — Raluca Teodora Stoian @ 6:47 am

Firefox is based on independent Mozilla Core components layered together. Due to this architecture some of Mozilla’s applications share many components, but they are fundamentally different in functionality.

The Mozilla source code is organized in several branches. The trunk is the main branch, the central source code that is used for continuous and ongoing development. Trunk builds contain the very latest changes and updates. However, the trunk can also be very unstable at times. When development is started for a specific Mozilla version a new branch is created. At conception, a derived branch contains everything that the principal branch contains (Figure 1). Firefox 1.0 branch was derived from Mozilla Branch 1.7 while Firefox 1.5 from Mozilla Branch 1.8. Firefox branches that are forked from the existing branch will be used for all future releases of Firefox. The term release is used in OSS development to refer to different types of releases: major and minor, alpha and beta. Due to data availability constraints, we have only considered two major releases, 1.0 and 1.5, and a minor release, 1.5.0.3, in our work presented here.

Firefox Branch 1.5 resynchronized the code base with the trunk, which contained additional features not available in Firefox 1.0. On the other hand, in release 1.5.0.3 the focus was not on adding features but on improving security-related aspects, which were bypassed in version 1.5.0. This characteristic of the three selected releases allowed us to test whether the performance of a defect prediction model increases when it is trained on data collected from major releases instead of minor ones.

The reason behind branching is that components that need to be prepared for a future release are at the same time continuously developed on the trunk. A distinction needs to be made between Firefox-specific source code, i.e. code that does not support any other Mozilla application, and the Mozilla components that support Firefox.
