The section will focus on publishing papers that present benchmarks: input data that methods are meant to operate upon, expected output data against which tool output can be compared, a specification of the metrics used to assess performance, and performance values for the sets of tools run through the benchmark. The editors provide the following checklist of criteria for benchmarking research articles:
- Methods being benchmarked must be in an active area of research.
- The set of tools evaluated should be comprehensive, or at least judiciously selected from those publicly available in the field.
- The input and expected-output datasets used in the benchmark must be made freely available in a form that makes them easy to apply and reuse, so they can serve as validation datasets for new method development.
- Benchmark results must be trustworthy. This criterion is best achieved by being completely transparent about how the benchmark was conducted and making the results reproducible, ideally by making code to perform the assessment publicly available.
- Metrics used to measure tool performance should reflect different goals with practical relevance for potential users. There should be a limited number of metrics evaluating orthogonal aspects of tool performance (e.g., classification accuracy measured by an area under the curve (AUC) value and classification speed measured in seconds). If there are multiple distinct applications for the methods, inclusion of a suite of goal-specific datasets and metrics can be more appropriate.
- Methods benchmarked should be publicly available. Tools that are available free of charge as executables or—even better—at the source code and platform level should be appropriately credited and given preference for inclusion in evaluations. The training conditions of each tool should be clearly indicated.
- Novel tools created by the authors in parallel with the benchmark should not be included in the article, as they are essentially guaranteed to perform well. If they are included, this caveat should be prominently stated.
We encourage community members working on manuscripts that fit these considerations to submit to this new section dedicated to benchmarking. Overall, we hope that creation of this section in PLOS Computational Biology will help elevate this research activity to where it belongs: the heart of computational biology.
Interested in reading our current benchmarking research and editorial? It can all be found at the PLOS Computational Biology Benchmarking collection page: https://collections.staging.plos.org/benchmarking