We used the article review template to review the articles in a common format. Each author first scored all 27 articles independently. Once every author had finished, we compared the ratings and met repeatedly to discuss the differences until we reached a consensus on the score of each article. Where we could not reach a consensus, we sought the opinion of an expert in the fields of measurement and evaluation and science education.