MSR Mining Challenge 2007

May 19-20, 2007

Special track within MSR 2007,
International Workshop on Mining Software Repositories  

Co-located with ICSE 2007,
IEEE International Conference on Software Engineering


Thomas Zimmermann (chair)
Dept. of Informatics
Saarland University, Germany
Harald Gall
Dept. of Informatics
University of Zurich, Switzerland
Michele Lanza
Dept. of Informatics
University of Lugano, Switzerland

Research Jury

Marco D'Ambros
(U. of Lugano, Switzerland)
Sung Kim
Martin Pinzger
(U. of Zürich, Switzerland)
Peter Weißgerber
(U. of Trier, Germany)

Eclipse Jury

Erich Gamma
(IBM, Switzerland)
Darin Swanson
Andre Weinand
(IBM, Switzerland)

Firefox Jury

Mike Shaver
(Mozilla Corporation, Canada)


Co-located with ICSE 2007,
Minneapolis, USA


This year's International Workshop on Mining Software Repositories (MSR 2007) will host a mining challenge. The MSR Mining Challenge brings together researchers and practitioners who are interested in applying, comparing, and challenging their mining tools and approaches on software repositories for two common open source projects: Eclipse and Firefox.

The challenge will be an excellent opportunity to get feedback on your research from open source stars such as Erich Gamma, Darin Swanson, and Andre Weinand (for Eclipse) and Mike Shaver (for Firefox).

The input data sources for the challenge include but are not limited to the following: source code releases, source control data, bug data, mailing lists, execution traces, and design and project documentation. The basic data sources of both open source projects are available online and can be downloaded from the projects' web sites.

There will be two challenges: #1: scale and #2: predict.


Challenge #1: Scale

In this category you can demonstrate the usefulness of your mining tools. The main task will be to find interesting insights by analyzing the software repositories of Eclipse and Firefox. Both systems are large in size, several years mature, and provide lots of input for mining tools. Since the submissions will be evaluated together with experts and developers of Eclipse and Firefox, this will be a unique opportunity to get candid feedback on your tools.

Participation is straightforward:

  1. Select your mining area (one of bug analysis, change analysis, architecture and design, process analysis, team structure).
  2. Get project data of Eclipse and/or Firefox.
  3. Formulate your mining questions.
  4. Use your mining tool(s) to answer them.
  5. Write up and submit your challenge report.

The challenge report should describe the results of your work and cover the following aspects: questions addressed, input data, approach and tools used, derived results and interpretation of them, and conclusions. Keep in mind that the report will be evaluated by both developers and researchers. Reports must be at most 4 pages long and in ICSE format.

Submission will be via EasyChair. Each report will undergo a thorough review, and accepted challenge reports will be published as part of the MSR 2007 proceedings. Authors of selected papers will be invited to give a presentation at the MSR workshop in the MSR Challenge track.

Challenge #2: Predict

This year, the MSR Mining Challenge will have a special task:

Predict for Eclipse the number of bugs/changes that will happen between February 1 and April 30, 2007 (both days included).

Participation is as follows:

  1. Pick a team name, e.g., SCHNITZL.
  2. Come up with predictions for changes and/or bugs based on some criteria or prediction model. A very simple model is for instance the number of past changes/bugs.
  3. Annotate the corresponding files with your predictions. Here is a sample annotation: plugins-annotated-schnitzl.txt
  4. Write a paragraph (max 200 words) that describes how you computed your predictions.
  5. Submit everything by email before February 7 (Apia time; extended from January 31).
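The "very simple model" mentioned in step 2 can be sketched as follows. This is only an illustration under stated assumptions: the function name `predict_from_history`, the `window` parameter, and the input layout (a dict mapping each component or plug-in to a list of counts for past periods of equal length, oldest first) are hypothetical and not part of the challenge materials.

```python
def predict_from_history(history, window=1):
    """Predict the next period's count as the mean of the last `window` counts.

    `history` maps a component or plug-in name to its past counts, oldest first.
    This input layout is an assumption made for illustration only.
    """
    predictions = {}
    for name, counts in history.items():
        recent = counts[-window:]
        # With no history at all, fall back to predicting zero.
        predictions[name] = round(sum(recent) / len(recent)) if recent else 0
    return predictions
```

With `window=1` the prediction is simply the most recent past count; a larger window averages over several past periods.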

Obviously, the team with the best predictions will win. However, to increase the competition, we will organize a set of "benchmark" predictions.

Bug Prediction

The predictions for bugs should be on the component level. A component is specified directly in each bug report. For instance, bug report 42233 was reported for the component "UI" of the product "JDT". For the challenge, we will consider the core products of Eclipse: Equinox, JDT, PDE, and Platform. A complete list of relevant products and components is in the file components.txt. Note that we will not remove duplicates from the final counts.
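As a rough illustration of counting on the component level, the sketch below tallies bug reports per (product, component) pair for the four core products and deliberately keeps duplicates. The record layout (dicts with `product` and `component` keys) and the function name are assumptions; the authoritative list of products and components is components.txt.

```python
from collections import Counter

# The four core products named in the challenge description.
CORE_PRODUCTS = {"Equinox", "JDT", "PDE", "Platform"}

def count_bugs_by_component(reports):
    """Count reports per (product, component); duplicate reports are NOT removed."""
    counts = Counter()
    for report in reports:
        if report["product"] in CORE_PRODUCTS:
            counts[(report["product"], report["component"])] += 1
    return counts
```

A report filed against product "JDT" and component "UI", such as bug 42233, would increment the `("JDT", "UI")` entry.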

Change Prediction

The predictions for changes should be on the plug-in level. We will count the number of 'Exp' changes (deleting a file is not a change) in "org.eclipse.*" modules. We will consider all branches. To keep things simple, we map modules to plug-ins by taking the first three parts of their namespace. For example, "org.eclipse.jdt.ui" maps to "org.eclipse.jdt". You can find more examples in the file mapping.txt.
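The mapping rule can be written down in a few lines. This is a minimal sketch with hypothetical function names; mapping.txt remains the reference for actual examples.

```python
from collections import Counter

def module_to_plugin(module):
    """Keep the first three dot-separated parts of a module name."""
    return ".".join(module.split(".")[:3])

def changes_per_plugin(changes_per_module):
    """Sum per-module change counts into per-plug-in totals."""
    totals = Counter()
    for module, count in changes_per_module.items():
        totals[module_to_plugin(module)] += count
    return totals
```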

More precisely, to determine the actual number of changes we will use the following script:

grep -r -E "date\W+2007\.0[234].+\W+author.+state Exp;" * | grep "^org.eclipse" | sed 's/org.eclipse.//' | sed 's/\..*//' | sed 's/\/.*//' | sed 's/-feature//' | uniq -c
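To check your own counts without the shell, the pipeline above can be approximated in Python. This is a sketch, not the organizers' tooling: the names `plugin_key` and `count_changes` are illustrative, and the input is assumed to be lines in `grep -r` output form (file path, colon, matched text). Note that the script reduces each path to the third namespace part only (for example "jdt" rather than "org.eclipse.jdt"), and that `uniq -c` counts consecutive runs, which the sketch mirrors with `groupby`.

```python
import re
from itertools import groupby

# Same extended regex as the first grep stage: Feb-Apr 2007 revisions in state Exp.
REVISION = re.compile(r"date\W+2007\.0[234].+\W+author.+state Exp;")

def plugin_key(line):
    """Apply the four sed stages to one line of grep output."""
    s = line.rstrip("\n")
    s = re.sub(r"org.eclipse.", "", s, count=1)  # sed 's/org.eclipse.//'
    s = re.sub(r"\..*", "", s, count=1)          # sed 's/\..*//'
    s = re.sub(r"/.*", "", s, count=1)           # sed 's/\/.*//'
    return s.replace("-feature", "", 1)          # sed 's/-feature//'

def count_changes(grep_lines):
    """Return (count, key) pairs for consecutive runs, like `uniq -c`."""
    kept = [
        line for line in grep_lines
        if REVISION.search(line) and re.match(r"org.eclipse", line)
    ]
    keys = [plugin_key(line) for line in kept]
    return [(len(list(group)), key) for key, group in groupby(keys)]
```

Because only consecutive identical keys are merged, the input should be ordered by path, as the output of a recursive grep over one directory tree normally is.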


Feel free to use any data source for the Mining Challenge. For your convenience, we provide mirrors for some of the repositories of Eclipse and Mozilla. We are sorry for splitting up the files, but our Apache server only supports files up to 2GB.

We also have links to pre-processed data:

If you want to provide additional data, please send an email to the discussion list (below) and I will link the data.

Discussion List

Since many mining approaches require advanced extraction techniques to process the basic data sources, participants are encouraged to collaborate by sharing extracted data and co-authoring their reports. To facilitate and encourage the sharing of extracted repository data and early results, a dedicated discussion list has been established.


Important Dates

  • Submission of reports and predictions: 7th February 2007 (Apia time; extended from 31st January 2007)
  • Acceptance notification: 24th February 2007 (moved from 20th February 2007)
  • Camera-ready: 2nd March 2007
  • Workshop date: 19-20 May 2007

