Addressing Challenges When Applying GRADE to Public Health Guidelines: A Scoping Review Protocol and Pilot Analysis

This is a protocol for a scoping review that aims to determine how guideline authors using the Grading of Recommendations, Assessment, Development, and Evaluations (GRADE) approach have addressed previously identified challenges related to public health. The Joanna Briggs Institute (JBI) methodology for scoping reviews will be followed. We will search and screen the titles of guidelines in all languages published in 2013–2021 in the GIN library, BIGG database, Epistemonikos GRADE guidelines repository, GRADEpro Database, MAGICapp, and the NICE and WHO websites. Two reviewers will independently screen full texts of the documents identified. The following information will be extracted: methods used for identifying different stakeholders and incorporating their perspectives; methods for identification and prioritization of non-health outcomes; methods for determining thresholds for decision-making; methods for incorporating and grading evidence from non-randomized studies; methods for addressing concerns with conditional recommendations in public health; methods for reaching consensus; additional methodological concerns; and any modifications made to GRADE. A combination of directed content analysis and descriptive statistics will be used for data analysis, and the findings will be presented narratively and in tabular and graphical form. In this protocol, we present the pilot results from 13 identified eligible guidelines issued between January and August 2021. We will publish the full review results when they become available.


Introduction
Guidelines have been defined as 'systematically developed evidence-based statements which assist providers, recipients and other stakeholders to make informed decisions about appropriate health interventions' [1]. As such, guidelines require a rigorous and transparent approach for development, and developers should aim to meet the standards set by groups such as the Guidelines International Network (GIN) [2]. Guideline developers across the globe are increasingly endorsing the Grading of Recommendations, Assessment, Development, and Evaluations (GRADE) approach for guideline development, a trustworthy and sensible approach for moving from the evidence to making a recommendation [3]. The outputs of GRADE are evidence summaries (with an assessment of the certainty of the evidence) and graded recommendations (with an assessment of the strength of a recommendation and an overall certainty of the evidence). It is being used for all types of evidence synthesis and for the development of guidelines [3].
Guidelines may be developed across many different fields, including clinical practice, environmental health, and public health. Public health guidelines may be challenging to develop, partly due to the complex nature of the interventions assessed in such guidelines [4]. In public health, policy makers go beyond the usual efficacy and safety aspects of interventions, as is more common in clinical practice guidelines, and guidance on how to deliver interventions is just as important [4]. What constitutes a "public health guideline" has not been exactly defined, and although some organizations use the term to define their guidelines, most do not label the type of developed guidance as clearly. The most important aspect in defining a public health guideline seems to be the population perspective (rather than the perspective of the individual), the complexity of the interventions, and the scope encompassing broader policies, health reforms, population-wide interventions, with the respective target users (such as policy makers, governments, community leaders, relevant organizations).
The GRADE Public Health Group (the Group) was approved by the GRADE Guidance Group in October 2017 to improve the methodology of applying GRADE in the development of public health guidelines [5]. The Group conducted a scoping review investigating the experiences of applying GRADE in public health and existing research activity in this area [5]. The result was an overview of current scientific knowledge in the field and the challenges identified in the literature and by experts, which include difficulties incorporating diverse perspectives in guideline panels, selecting outcomes (especially non-health outcomes), interpreting outcomes and identifying a threshold for decision-making, assessing the certainty of evidence from diverse sources, and addressing implications for decision-makers (e.g., concerns about conditional recommendations, or strong ones based on very low certainty of evidence) [5].
The Group proposed the following solutions to answer the identified challenges: to identify the training needs of public health guideline developers (and their stakeholders) in understanding and using the GRADE concept; to develop and disseminate detailed examples of the application of GRADE to public health topics; to adapt GRADE training materials for public health and public policy audiences [5]. This paper mainly addresses the second proposed solution and aims to provide both concrete examples of how GRADE is currently being used in public health guidelines and a discussion on which parts of the current GRADE guidance need to be adapted to public health topics.
In the proposed scoping review, we intend to build on the work of the Group by searching for GRADE public health guidelines and determining how the guideline authors may (or may not) have approached the challenges identified in the Group's recent paper [5], if encountered within the guideline development process, and to document any additional challenges or modifications to GRADE. We aim to analyze the guideline authors' experience, methods, and examples related to these specific challenges, as described in the guidelines themselves.
A preliminary search of MEDLINE, Epistemonikos, PROSPERO, and Open Science Framework (OSF) was conducted on 22 August 2021, and no current or ongoing systematic reviews or scoping reviews on the topic were identified.
The aim of this review will be to identify the key methodological characteristics of public health guidelines that used the GRADE approach, particularly in terms of how developers addressed or overcame challenges when applying GRADE. We chose a scoping review as the appropriate method as it aligns with the purpose of the review to provide an overview of the methods used by guideline authors in relation to the observed challenges when applying GRADE to public health guidelines [6]. The review does not aim to assess the guideline methodological quality or develop methodological guidance on addressing possible challenges, nor will it be an overview of recommendations.
The results of the review will be used to further address the challenges of using GRADE in the development of public health guidelines, and, together with other research, will form a basis for GRADE concept articles or guidance on the topic. The work will benefit all relevant stakeholders (public health policy makers, governmental and nongovernmental organizations, professional organizations, and all end-users of public health interventions) involved in the development, dissemination, and implementation of public health guidelines by allowing for a production of more trustworthy guidelines in public health. Too often, public health interventions are being implemented that do not benefit the public fully or are not based on robust scientific evidence. This work aims to improve the methods and approaches to developing public health guidance so that activities for health protection are based on trusted and robust evidence, cost-effective (avoiding waste of resources), useful, acceptable to all relevant stakeholders including the public, feasible, and ethical. We aim to enhance the health and financial literacy of the wider lay public. The results of the full review will help overcome some of the commonly observed challenges when formulating public health recommendations, such as the complexity of interventions and interpretations or situations when recommendations need to be formulated in the absence of robust evidence.

Methods
The proposed scoping review will be conducted in accordance with the Joanna Briggs Institute (JBI) methodology for scoping reviews [7,8] and will be informed mostly by the previous work of the GRADE Public Health group [5]. The Preferred Reporting Items for Systematic Reviews and Meta-analyses extension for scoping reviews (PRISMA-ScR [9]) will be used, bearing in mind the recently updated PRISMA 2020 [10] with changes relevant to scoping reviews described in the chapter on scoping reviews in the JBI manual on evidence synthesis [7].
According to the JBI methodology for scoping reviews, as with all good quality systematic reviews, an a priori protocol needs to be developed and published before undertaking the scoping review. The aim of such a protocol is to predefine the objectives, the review question and eligibility criteria, the detailed methods, and the reporting of the review, allowing for transparency of process. The protocol serves as a plan for the scoping review and should limit reporting bias. Any deviations of the scoping review from the protocol will be clearly highlighted and explained in the full scoping review [7].
An international team of experts in the field of guideline methodology and public health was assembled to cooperate on this scoping review. A piloting phase was conducted to improve the previously drafted methods and to further develop the necessary tools (e.g., for data extraction). After the piloting phase, the objectives were rephrased more clearly, the eligibility criteria refined, the simple screening tool tested (see Appendix), and the data extraction tool modified and placed in an online form. In this section, we describe the basic methods of the proposed scoping review, followed by a presentation of the results of the piloting phase.

Review Question
How did guideline developers handle the specific challenges of applying the GRADE approach to developing recommendations in public health?
Challenges include: incorporating diverse perspectives in guideline panels, selecting and prioritizing outcomes (especially non-health outcomes), interpreting outcomes and identifying a threshold for decision-making, assessing the certainty of evidence from diverse sources, and addressing implications for decision-makers (e.g., concerns about conditional recommendations, or strong ones based on very low certainty of the evidence) [5].
We will aim to answer the following specific review questions:
1. How were previously identified challenges addressed within public health guidelines?
2. What additional challenges have been identified within public health guidelines?
3. Have any modifications been made to the GRADE approach within public health guidelines, and how were they justified?

Concept
The focus of this review is how guideline development groups addressed some of the pre-identified challenges when issuing public health guidance. This includes the methods used for identifying different stakeholders and incorporating their perspectives; methods for identification and prioritization of any non-health outcomes; methods for determining thresholds for decision-making; methods for incorporating and grading evidence from non-randomized studies; methods for addressing concerns with conditional recommendations in public health; and methods for reaching consensus and the formal approval of recommendations.

Context
We will include only public health guidelines that used the GRADE approach and were developed from a population perspective, i.e., not intended to cover questions about preventive or treatment measures in any specific group of individuals. Guidelines may target policy makers and cover questions of health policies, management, and broader public policies. We will include only the most recent version of the guideline (the latest update). We will include guidelines published in any language. Geographically, guidelines of all scopes can be included: local, national, regional, or global.

Types of Sources
This review will focus on guidelines for public health interventions related to health protection, health services, and health improvement, dealing mostly with primordial and primary prevention. It will mainly include interventions implemented on a population level, e.g., population-level prevention programs, health system reform, regulation of unhealthy commodities, infrastructure development, social security policies, and the reduction of health inequalities. Examples include screening/prophylaxis, infection prevention and control, eradication efforts, vaccination and related topics, healthcare management, healthcare services, environmental health, and promotion of health in populations outside of the healthcare system (e.g., schools). Secondary and tertiary prevention guidelines covering topics such as the prevention of specific conditions in a healthcare setting, prevention of fractures in long-term facilities, prevention of surgical site infections, and the prescription of preventive medications (e.g., statins) will be excluded as these often do not use the population perspective and are linked to the clinical setting. We expect to exclude rapid guidelines as they often do not use robust and transparent methods.

Search
The search will aim to locate public health guidelines that used GRADE. Based on a preliminary search, the pilot search, and consultations with information specialists, we decided to focus on databases and repositories of GRADE (or mostly GRADE) guidelines. The following databases, repositories, and websites will be searched: the GIN international guideline library and registry of guidelines in development ( The search will be mostly manual and will not use any text, keywords, or index terms. The reference list of included guidelines will not be searched as it is unlikely that this would result in additional relevant guidelines; however, the citations recommended by guideline authors (either on the websites or in the introductory parts of the documents) will be screened for relevant documents. If the identified document is found to be part of a bigger set of guidelines (e.g., Module 1, Module 2) or a Supplementary Document, all of the related documents and any Supplementary Files or tools will be collated in one folder and regarded as one guideline. Any related documents found on the websites/source of the citation or in the reference lists will be collated and used during extraction (e.g., stakeholder workshops, minutes, recordings, methodological manuals, annexes). Only the latest version of the guideline will be used unless previous versions are referenced as providing necessary information for the interpretation of the updated version. Guidelines published in any language will be included. We will limit the search to 2013-2021 because we are looking for recent developments in the field. The year 2013 has been agreed on by an expert group due to major developments in using GRADE in public health occurring around that time [11][12][13]. The guideline authors will not be contacted; only information provided in the guidelines will be used.

Evidence Selection
During the manual search and title screening, the senior researchers and information specialists will note the potentially relevant records that have been identified into a Microsoft Excel sheet (title, organization/author, publication date, source, URL) and manually remove duplicates. Two reviewers will independently screen full texts of the identified documents against the eligibility criteria. Any disagreements will be resolved via discussion or with a third reviewer. A simple screening algorithm will be used (see Table S1). The relevant full texts of guidelines will be collated and uploaded into an online shared folder, including any related documents and Supplementary Material. The results of the search and the study inclusion process will be reported in full in the final scoping review and presented in a Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) flow diagram [10].
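The protocol specifies manual removal of duplicates from the record sheet. As an optional aid, the following minimal sketch shows how candidate duplicates could be flagged programmatically. The `Record` fields mirror the sheet columns named above; the matching rule (normalized title plus organization) and all example records and URLs are hypothetical assumptions, not part of the protocol.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One row of the record sheet (columns as listed in the protocol)."""
    title: str
    organization: str
    publication_date: str
    source: str
    url: str


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def deduplicate(records):
    """Keep the first record seen for each (title, organization) key.

    Assumption: two records with the same normalized title and
    organization refer to the same guideline.
    """
    seen = set()
    unique = []
    for record in records:
        key = (normalize(record.title), normalize(record.organization))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical records found in different sources.
records = [
    Record("Guideline on X: Update", "WHO", "2021-03", "WHO website", "https://example.org/a"),
    Record("guideline on x - update", "who", "2021-03", "GIN library", "https://example.org/b"),
    Record("Guideline on Y", "NICE", "2021-05", "NICE website", "https://example.org/c"),
]
unique_records = deduplicate(records)
print(len(unique_records), "unique records")
```

Exact matching on normalized strings will miss translated or retitled versions of the same guideline, so the output of such a script would still need the manual check described above.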

Data Extraction
Data will be extracted from guidelines included in the scoping review by a team of experienced reviewers who will hold regular meetings for consultations on any issues that may arise. Furthermore, the meetings will aim to ensure that all reviewers are trained and consistent. The project leader will revise the data extracted for each guideline. The following data will be extracted: A draft extraction form is provided in Table S2 in the Supplementary Materials. Data on the methods used in the respective guidelines will be extracted in the form of: (a) reproduction of the relevant text; (b) yes/no responses; (c) if none of the above is possible, interpretation may sometimes be needed (e.g., interpretation of the evidence profiles). The last option will always be checked by another extractor. The questions and items of the data extraction form were revised based on the piloting phase and consultations with GRADE methodologists.
Int. J. Environ. Res. Public Health 2022, 19, 992

Data Analysis and Presentation
A combination of content analysis (deductive and inductive) and descriptive statistics will be used for data analysis. Some of the collected data (the yes/no items) will be quantified, and simple descriptive statistics (distribution) will be used for analysis.
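The descriptive step for the yes/no items can be illustrated with a minimal sketch that tabulates the distribution of responses for one extraction item. The function name and the example responses are hypothetical; the protocol does not prescribe any software for this step.

```python
from collections import Counter


def distribution(responses):
    """Return {response: (count, rounded percentage)} for one extraction item."""
    counts = Counter(responses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        answer: (n, round(100 * n / total))
        for answer, n in counts.most_common()
    }


# Hypothetical responses to one item across 13 guidelines.
responses = ["no"] * 6 + ["not applicable"] * 4 + ["yes"] * 3
summary = distribution(responses)
for answer, (n, pct) in summary.items():
    print(f"{answer}: {n} ({pct}%)")
```

Reporting the count alongside the percentage keeps the denominator visible, which matters when some items apply to only a subset of guidelines.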
To analyze the textual data, we will use directed content analysis [14], imposing predefined evidence/theory from the recent GRADE Public Health Group paper [5] on the data while also allowing emergent codes to be identified. The paper describes five challenges when using the GRADE approach to develop public health guidelines: (1) incorporating diverse perspectives, (2) selecting and prioritizing outcomes, (3) interpreting outcomes and identifying a threshold for decision-making, (4) assessing the certainty of evidence from diverse sources, including non-randomized controlled studies (NRS), and (5) addressing implications for decision-makers, including concerns about conditional recommendations. These challenges will each constitute one theme. Based on other recent work of the GRADE Working Group, we will elaborate on some of the challenges (e.g., for theme 5, we will also explore the use of good practice statements and their equivalents) and add one more theme: (6) formulating and agreeing on recommendations. Furthermore, we will address any new challenges and any modifications made to GRADE. If we identify additional challenges besides those mentioned above, we will add new themes. We will use a combination of deductive and inductive content analysis.
Two coders will independently read the data extracted from the guidelines twice. During the coding process, the coders will highlight the important and relevant portions of the text and choose a word, phrase, or description to represent the meaning of the text segment. The codes that were developed in the pilot phase will be used as predefined codes in the full analysis, and new ones will be added. Codes with similar concepts will be grouped together to form categories. Where appropriate, codes or categories will be accompanied by explanations and examples from the text. Coders will not aim to quantify the codes' occurrence. We will use the codes and categories to populate the six predefined themes, and add new themes as needed.
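One way to picture the resulting structure is as themes containing categories, which in turn contain codes. The sketch below is only an illustration of that nesting under stated assumptions: the `Codebook` class and its method names are hypothetical, the theme labels are shortened from the six themes listed above, and the example codes are taken from the pilot results reported later in this protocol.

```python
from collections import defaultdict

# The six predefined themes (labels shortened from the protocol text).
PREDEFINED_THEMES = {
    1: "Incorporating diverse perspectives",
    2: "Selecting and prioritizing outcomes",
    3: "Interpreting outcomes and identifying a threshold for decision-making",
    4: "Assessing the certainty of evidence from diverse sources",
    5: "Addressing implications for decision-makers",
    6: "Formulating and agreeing on recommendations",
}


class Codebook:
    """Hypothetical container: theme -> category -> list of codes."""

    def __init__(self, themes):
        self.themes = dict(themes)
        self._codes = defaultdict(lambda: defaultdict(list))

    def add_theme(self, name):
        """Add an emergent theme and return its new identifier."""
        new_id = max(self.themes) + 1
        self.themes[new_id] = name
        return new_id

    def add_code(self, theme_id, category, code):
        """File a code under a category of an existing theme (no duplicates)."""
        if theme_id not in self.themes:
            raise KeyError(f"unknown theme: {theme_id}")
        codes = self._codes[theme_id][category]
        if code not in codes:
            codes.append(code)

    def categories(self, theme_id):
        """Return a plain-dict copy of the categories and codes of one theme."""
        return {cat: list(codes) for cat, codes in self._codes.get(theme_id, {}).items()}


codebook = Codebook(PREDEFINED_THEMES)
codebook.add_code(2, "Identifying and selecting outcomes",
                  "outcomes identified via review of the literature")
codebook.add_code(2, "Prioritizing outcomes",
                  "priority outcomes were aligned with a previous (related) guideline")
print(codebook.categories(2))
```

The structure deliberately stores no counts, in line with the decision not to quantify the occurrence of codes.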
In the final review, we will present the results of the search in a modified PRISMA flowchart [10], the summary of the basic characteristics of the identified guidelines, and the results of the screening process. Analysis results will be presented narratively; codes and categories for each theme, where applicable, will be presented in tabular form; and the quantified data will be presented in graphs. Any deviations and changes from the protocol will be reported in the full review.
The whole process was piloted on public health guidelines published from January to August 2021; the results are presented below. The purpose of the pilot phase was to: (a) test and improve the methods proposed in the protocol; (b) develop and improve the necessary tools; (c) refine the process and content of data extraction; (d) assemble and train a team of reviewers and set up the necessary group processes; and (e) determine the scope of the work. We used a modified approach for the pilot, and after its completion, refined the methods of the proposed scoping review.

Preliminary Results (Pilot Phase) and Discussion
In August 2021, we manually searched the databases and repositories according to the protocol outlined above for any relevant public health guidelines that used the GRADE approach and were issued between January and August 2021, all languages included. Two independent reviewers screened the full texts of documents against the screening criteria and identified 13 relevant guidelines. One was issued by the Registered Nurses' Association of Ontario (RNAO, Boucherville, QC, Canada), one by the National Institute for Health and Care Excellence (NICE, London, UK), and the remaining 11 by the World Health Organization (WHO) (Table 1).
Table 1 footnote: 1 We included this guideline to incorporate more organizations in the pilot extraction.
One reviewer extracted relevant text and data from the guidelines and performed the analysis as specified in the protocol above, and two other co-authors reviewed the work. The results of this pilot analysis of the 13 public health guidelines are organized around the six predefined themes. No conclusions should be drawn based on the pilot analysis, as its purpose was to determine the best approach to conducting the full review. Therefore, we do not provide a discussion of the pilot results here.
For theme 1 (incorporating diverse perspectives and identifying stakeholders), we extracted relevant text related to the methods for identifying stakeholders and including them in the guideline development process and performed a content analysis. The results are shown in Table 2. For theme 2 (identifying and prioritizing outcomes), we extracted the number of guidelines that used non-health outcomes and provided examples (Figure 1) and the number of guidelines that used the GRADE approach for prioritization of outcomes (Figure 2). For content analysis in theme 2, we extracted and coded text related to how outcomes were identified and selected, and what methods, if any, were used for their prioritization (Table 3). In the full review, we will also present the non-GRADE approaches to outcome prioritization and any modifications to the GRADE approach.
Due to a lack of data, we did not analyze the data for theme 3 (interpreting outcomes and identifying a threshold for decision-making) in the pilot phase. No data has been found in the guidelines to populate theme 3.
For theme 4 (assessing certainty of evidence from diverse sources, including NRS), we extracted information on which study designs were used to inform the guidelines (Figure 3), whether evidence from NRS led to a ranking of moderate or high certainty of evidence (CoE) (Figure 4), the number of guidelines in which NRS started with high CoE (Figure 5), and whether RCTs and NRS evidence was pooled when assessing CoE (Figure 6). In the full review, these data will be accompanied by an analysis of the specific reasons and/or examples of assessing moderate or high CoE based on NRS evidence and of the rationale for NRS starting at high CoE and for pooling of RCTs and NRS. Furthermore, we will be extracting and analyzing data on the methods for assessing the overall CoE (whether this was done in the guideline or whether the GRADE approach was used) and providing a description of any modifications to GRADE in assessing the overall CoE.
Figure 2. The use of the GRADE approach to prioritization of outcomes (chart title: Using the GRADE prioritization of outcomes approach). The GRADE approach is defined as ranking outcomes 1-9 and dividing them into critical, important, and non-critical.
Table 3. Categories and codes for theme 2.

THEME 2 (Predefined): Identifying and Prioritizing Outcomes

CATEGORY: Identifying and selecting outcomes
CODE: due to the variability of outcome reporting, decision rules for selecting outcomes were used
CODE: outcomes identified via review of the literature
CODE: outcomes identified via key informant interviews and discussion groups
CODE: outcomes identified via expert panel survey prior to the in-person meeting
CODE: outcomes identified via expert panel discussion at an in-person meeting
CODE: primary outcomes were agreed upon by the GDG [seems after the relevant studies were identified, not a priori]
CODE: scoping exercise of guidelines and systematic reviews of the guideline topics informed the outcomes
CODE: scoping review of target population's values and preferences informed the outcomes

CATEGORY: Prioritizing outcomes
CODE: outcomes were aligned with the Sustainable Development Goals
CODE: priority outcomes were aligned with a previous (related) guideline
CODE: critical and important outcomes were agreed between the review team and the Steering Group and were endorsed by the GDG
CODE: outcomes were prioritized via an online survey; members ranked the importance of each outcome on the GRADE rating scale of 1-9 (0-3: not important; 4-6: important; 7-9: critical)
CODE: up to five priority outcomes were determined based on confidential voting by each member, and a subsequent facilitated discussion of the voting results
CODE: online vote determined critical outcomes if 70% of the votes were ranked 7-9 on a 9-point Likert scale
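Two of the prioritization rules reported in the codes for theme 2 lend themselves to a short worked example: banding ratings on the 1-9 scale, and the 70% voting threshold for critical outcomes. The sketch below is an illustration only. The bands used are 1-3 (not important), 4-6 (important), and 7-9 (critical); aggregating panel ratings by their median is an assumption made here, since the guidelines did not all state how individual ratings were combined, and the panel ratings are hypothetical.

```python
from statistics import median


def importance_band(rating):
    """Map a rating on the 1-9 scale to an importance band."""
    if not 1 <= rating <= 9:
        raise ValueError("rating must lie between 1 and 9")
    if rating >= 7:
        return "critical"
    if rating >= 4:
        return "important"
    return "not important"


def classify_by_median(ratings):
    """Assumed aggregation rule: band of the median panel rating.

    A median that falls between two bands (e.g. 6.5) is assigned
    to the lower band.
    """
    return importance_band(median(ratings))


def is_critical_by_vote(ratings, threshold=0.70):
    """Voting rule: critical if at least `threshold` of votes are 7-9."""
    share = sum(1 for r in ratings if 7 <= r <= 9) / len(ratings)
    return share >= threshold


# Hypothetical ratings from a ten-member panel for two outcomes.
outcome_a = [9, 8, 7, 7, 6, 8, 9, 5, 7, 8]
outcome_b = [7, 7, 6, 5, 8, 4, 6, 7, 3, 6]

print(classify_by_median(outcome_a), is_critical_by_vote(outcome_a))
print(classify_by_median(outcome_b), is_critical_by_vote(outcome_b))
```

The two rules can disagree for borderline outcomes, which is one reason the full review will record which prioritization method each guideline actually used.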

Figure 3. Eligible study designs searched in the guidelines (n = 13): any type of study, 54%; only RCTs, 23%; any type of study was included but only RCTs were used to determine the overall certainty of evidence, 8%; not reported, 8%; any comparative study, 7%.
Figure 4. Number of guidelines where high or moderate overall certainty of evidence rating was based on non-randomized controlled studies (n = 9): no, 78%; yes, 22%.
Figure 5. Certainty of evidence assessment for NRS-initial rating at high. Number of guidelines in which non-randomized controlled studies start at high certainty of evidence in the GRADE assessment (n = 13): no, 46%; not applicable (no NRS), 31%; yes, 23%.

For theme 5 (addressing implications for decision-makers, including concerns about conditional recommendation), we first identified whether the guideline included strong recommendations based on low or very low CoE (Figure 7). The units of analysis were the extracted texts on the rationale for such recommendations. We identified the codes and developed categories describing the panels' different reasons for developing strong recommendations based on low or very low CoE. We then added further explanations and examples of such recommendations, one for each category (Table 4). Four reasons have been previously identified by Hilton Boon et al. [5]: life-threatening situations; uncertain benefit but certain harm; potential equivalence of effectiveness in which one option is clearly more or less risky or costly; and potential for catastrophic harm. We will include these as predefined categories in the full review. Here, we only present the newly identified categories (Table 4).

Figure 7. Number of guidelines with strong recommendations based on low or very low certainty of evidence (n = 13).

THEME 5: Addressing Implications for Decision-Makers, Including Concerns about Conditional Recommendations
Reasons for Developing Strong Recommendations Based on Low or Very Low Certainty of Evidence

Category: There is a substantial experience of using the intervention (already widely implemented) and no harm.
Explanation (by the review team): The "intervention" is already implemented, seems effective, and this is causing the lack of research leading to low or very low certainty of evidence.
Example recommendation: "WHO recommends making the self-management of folic acid supplements available as an additional option to health worker-led provision of folic acid supplements for individuals who are planning pregnancy within the next three months." [19]

Category: Greatly valued and/or needed by the target population and no known harm.
Explanation (by the review team): The target population is suffering greatly from a problem (or an intervention is needed), and any non-harmful intervention will be greatly valued and likely effective. Assessing effectiveness/costs and other aspects seems secondary.
Example recommendation: "WHO recommends investing in rural infrastructure and services to ensure decent living conditions for health workers and their families." [22]

Category: Using other types of evidence with high confidence (indirect, pharmacokinetic modelling, programmatic data).
Explanation (by the review team): Various other-than scientific data (not experience or expert evidence) are available, and the panel has high confidence in them, or is confident that the identified indirect evidence can completely substitute the missing direct evidence (e.g., when one disease has much more evidence than another, but they are essentially the same, common for infectious diseases).
Example recommendation: "Children weighing < 20 kg should receive a higher dose of artesunate (3 mg/kg bw per dose) than larger children and adults (2.4 mg/kg bw per dose) to ensure equivalent exposure to the drug." (artesunate is recommended for adult populations with high certainty of the evidence) [27]

Category: Potentially equivalent in benefits and harms, and doing intervention (vs. not doing) seems better in all other EtD domains (no reasons against).
Explanation (by the review team): When considering whether to perform an intervention or not, in the context of no obvious effects or harms, one option seems better in all other aspects, and there seems to be no reason not to perform the intervention.
Example recommendation: "WHO recommends designing and enabling access to continuing education and professional development programmes that meet the needs of rural health workers to support their retention in rural areas." [22]

Category: The intervention is ethically necessary ("sound", "basic human right").
Example recommendation: "WHO recommends ensuring a safe and secure working environment for health workers in rural and remote areas." [22]

Category: A perfect balance of effects, the only problem is low or very low certainty of evidence (lack of higher-certainty research).
Explanation (by the review team): Recommendation formulated in the context of lack of higher-certainty evidence (usually due to limited evidence, e.g., only observational studies downgraded by 0 or 1 level, or RCTs downgraded by 2-3 levels) when all other aspects are in favour of the intervention.
Example recommendation: "People established on anti-retroviral therapy should be offered refills lasting 3-6 months, preferably six months if feasible." [23] Rationale (from the guideline): "Some of the evidence supporting these recommendations came from observational studies with methodological limitations, and there was important variability (heterogeneity) in outcomes across studies."

For theme 6 (formulating and agreeing on recommendations), the units of analysis were texts related to the methods for reaching consensus and agreeing on recommendations. We identified text relevant to the course of the guideline development group (or panel) meetings for formulating and agreeing on recommendations, how the panel prepared for the meeting (what happened prior to the meeting), how those meetings were facilitated and by whom, how the panel agreed on recommendations, and the specific voting thresholds if voting was used (Table 5).

Table 5. Categories and codes for theme 6.

THEME 6 (Predefined): Formulating and Agreeing on Recommendations

CATEGORY: Prior to the meeting for agreeing on recommendations
CODE: PanelVoice was used to allow panelists to pre-vote on the EtD framework questions
CODE: members could draw attention to any new evidence prior to the meeting
CODE: recommendations were drafted a priori to the meeting
CODE: members received detailed background documents prior to meetings
CODE: reviewed the preliminary judgements and comments posted by all members in the online EtD form
CODE: reviewed evidence summaries
CODE: all members posted comments in an online form (EtD)
CODE: members were provided with the EtD frameworks, evidence profiles, and full-text articles

CATEGORY: The course of the meetings to agree on recommendations
CODE: discussion facilitated by the chair and/or methodologist
CODE: the meeting was guided by a clear protocol
CODE: process guided by the organization's manual/handbook
CODE: detailed background documents had been summarized in presentations during each GDG meeting
CODE: methodologist facilitated discussions
CODE: discussion was facilitated by co-chairs and methodologists
CODE: formulate recommendations through a process of group discussion, engagement, and revision
CODE: members were presented with a 'neutral' recommendation and decided on its direction and strength

CATEGORY: Facilitation methods
CODE: voting was used as a starting point to build consensus (not as a formal vote)
CODE: members were asked to raise their hands in support of each separate option as a decision-making aid (not as a formal vote)
CODE: in the online environment, close attention was paid to eliciting responses from all members
CODE: regular straw-polling and chat function were used to gain an initial indication of members' views

Limitations and Strengths
The work presented here serves, first and foremost, as a protocol for a scoping review; it does not present the results of primary research. The aim of protocols is, generally, to provide a detailed outline of the proposed review project [7]. The review itself should not start before the protocol is finalized and, ideally, published. We have rigorously followed the JBI methodology for scoping reviews and protocols and included all necessary information in detail [7]. Furthermore, we have piloted the review methods to make sure all of the steps are feasible to carry out. We expect the work to be very extensive, and the review will therefore be carried out by many international experts in the field working in cooperation. The reporting of the preliminary pilot results serves solely to explain the proposed methods better, especially how the analysis will be done and in what way the full results will be reported. The main limitation of this paper is, therefore, that it does not include the full results, discussion, and conclusion.
From the pilot results, it can already be seen that most of the identified guidelines were issued by the World Health Organization (WHO). This raises the question of over-saturation with the same kind of data if the WHO has used more or less the same methods throughout. So far, we have observed that, although the WHO methods are well described, there are modifications to the usual processes in many of the public health guidelines that have been issued. We are therefore confident that it is worthwhile to include large numbers of guidelines even when they are issued by the same organization. In the preliminary full search for the years 2013-2021, however, there were many guidelines issued by organizations other than the WHO, and it seems that the full review will comprise a much wider variety of issuing organizations than was the case in the pilot phase. The limited number of issuing organizations in the pilot phase may have been caused by the COVID-19 pandemic.
Lastly, we have not included any COVID-19 guidelines in the pilot phase as none fulfilled the eligibility criteria of this review (e.g., excluded for one or more of the following reasons: rapid guideline methodology; mostly focusing on treatment or individual perspective rather than public health perspective; not using the GRADE approach; not clearly defined as a guideline; methods not described appropriately).

Conclusions
This protocol provides a description of the objectives, inclusion criteria, methods, and analysis of a scoping review to be undertaken by an international group of experts, building on the work of the GRADE Public Health Group in addressing challenges in public health guideline development. It concludes with the pilot phase results, during which 13 public health guidelines issued between January and August 2021 were analyzed. We draw no conclusions from this limited pilot evidence. The piloting phase was conducted to refine the proposed methods and to further develop the necessary tools (e.g., data extraction tool). Following the piloting phase, the objectives have been rephrased more clearly, the eligibility criteria refined, the simple screening tool tested, and the data extraction tool modified and placed in an online form. The basic outline of the workflow and communication methods between team members has been established. The pilot analysis helped to outline the predefined themes and identify new categories to be used in the full review. We will publish the full results of the scoping review in a peer-reviewed journal when available. The full review will aim to provide concrete examples of how GRADE is currently being used in public health guidelines, as well as a discussion on which parts of the current GRADE guidance need to be adapted to be better suited for public health topics. The results of the review will be used to further address the challenges of using GRADE in the development of public health guidelines and, together with other research, will form a basis for GRADE concept articles or guidance on the topic. The work will benefit all relevant stakeholders (public health policy makers, governmental and non-governmental organizations, professional organizations, and all end-users of public health interventions) involved in the development, dissemination, and implementation of public health guidelines.