In October 2009, the Lupus Research Institute (LRI) convened a meeting of experts in New York to discuss critical issues in the design of clinical trials for new agents to treat systemic lupus erythematosus (SLE) and maximize the likelihood of successful outcomes. Attendees included representatives from academia and clinical practice as well as pharmaceutical and biotech companies developing products for immune-mediated diseases.
Founded by families and guided by leading scientists, the LRI is the world’s leading private supporter of lupus research. With its National Coalition of state and local lupus organizations, the LRI is dedicated to finding new and safer options for treating the disease by improving the design of clinical studies and promoting broad participation in clinical trials.
Through presentations and discussions, the meeting addressed a number of fundamental issues affecting trial design, including a drive to establish standard methodology for lupus clinical trials, the need for flexibility in trial design for such a heterogeneous disease, regulatory requirements for approval of new agents, and an interest in addressing the effects of any new product in the various clinical settings that may be encountered in practice. The need to adopt a standard set of outcome measures was a related concern, in view of the different characteristics of the current instruments used to assess responses. While these measures have been assessed in clinical practice and research studies, their performance in clinical trials remains largely untested.
While falling short of reaching consensus on these matters, participants generated valuable discussion and insights on current practices and suggested potential directions in research into the design of clinical trials in lupus, which should prove useful for future clinical studies. Here are some highlights from the discussion.
Key Challenges
The heterogeneity of lupus poses a significant challenge for clinical trial design. Success in achieving statistical significance for new therapies in lupus trials needs to account for the marked heterogeneity of disease manifestations and severity of patient populations that, in some cases, may involve stratification with respect to:
- Ethnicity. In seeking trial designs likely to yield statistically significant results, researchers may need to consider ethnicity more carefully as a factor that can affect outcomes.
- Disease activity levels. Lupus can manifest with different levels and patterns of disease activity, such as relapsing and remitting versus persistently active disease, flare, or more active versus less active disease. Attendees addressed the question of whether there is any scientific, medical, or business rationale to focus trial design on flare prevention versus decreasing persistent disease activity of a patient on standard therapy.
- Trial size. The larger the trial, the more likely that a therapeutic effect in a particular ethnic group will be lost in analyses of the whole population. The recent successful BLISS-52 was a large trial, with 90% power to detect differences between different treatments. Could a smaller trial have achieved similar success? With the current outcome measures and disease heterogeneity, are large trials necessary to achieve statistically significant results? Will the need for large trials hamper clinical development?
Other issues that pose challenges for achieving statistical significance in clinical trial design include duration of disease prior to treatment, the specific organ systems involved, current and past treatments in a disease whose standard-of-care treatments are nonapproved, and comorbidities.
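The concern that a benefit confined to one subgroup can be lost in a whole-population analysis can be illustrated with simple arithmetic. The sketch below is illustrative only: the function name and every number in it are hypothetical, and none comes from a trial discussed at the meeting. It assumes the drug raises the response rate in one subgroup and has no effect in anyone else.

```python
def diluted_effect(subgroup_fraction: float, subgroup_effect: float) -> float:
    """Overall difference in response rate when only one subgroup benefits.

    Hypothetical assumption: the drug has zero effect outside the subgroup,
    so the overall difference is the subgroup effect scaled by its share.
    """
    if not 0.0 <= subgroup_fraction <= 1.0:
        raise ValueError("subgroup_fraction must be between 0 and 1")
    return subgroup_fraction * subgroup_effect


# Hypothetical: a 20-percentage-point benefit confined to one subgroup.
for fraction in (1.0, 0.5, 0.25):
    overall = diluted_effect(fraction, 0.20)
    print(f"subgroup is {fraction:.0%} of enrollees -> overall difference {overall:.1%}")
```

Under these assumptions, the smaller the responsive subgroup's share of enrollment, the smaller the difference the whole-population analysis has to detect.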
Design Flexibility
Clinical trials need an acceptable outcome measure—a reasonable surrogate for an important clinical outcome—for a drug to obtain U.S. Food and Drug Administration (FDA) approval. Design flexibility, including the use of adaptive design, is considered a crucial factor in achieving successful outcomes.
It was noted that, for business reasons, pharmaceutical companies often seek the broadest possible indication(s) for a new product. Companies may tend to avoid smaller, focused trials and favor trials with larger numbers of relatively heterogeneous patients. Of note, although some recent trials enrolled subjects with relatively uniform patient profiles, these trials had little success.
The most successful trials were those of belimumab, which included large numbers of patients, permitted flexibility in background medication for a substantial portion of the trial, and employed a novel composite outcome that involved both measures of therapeutic success and the absence of deterioration. Whether this trial design or similar designs should become the paradigm for future lupus trials was discussed, but no consensus was reached.
Lessons Learned from Recent Trials
Rituximab: Genentech’s Study to Evaluate the Efficacy and Safety of Rituximab in Subjects With ISN/RPS Class III or IV Lupus Nephritis (LUNAR) and Rituximab in Patients with Severe SLE (EXPLORER) trials tested the anti-CD20 monoclonal antibody in two different patient populations: individuals with lupus nephritis and those with active nonrenal lupus. Both trials were unsuccessful in achieving their primary endpoint. Their failure may have stemmed at least partly from clinical trial design issues (including designated co-therapy for each of the treatment arms), inclusion criteria (i.e., the patient population studied), or ineffectiveness of the drug.
Because it has been claimed—either anecdotally or from data for early trials that were small or not randomized—that rituximab is effective in some lupus patients, some conference attendees speculated that rituximab may be effective for a subset of patients, possibly those with a CD20-positive B-cell driven pathogenesis, though this subset has not been identified. It was also noted that SLE patients with periodic flares may differ from those with chronically active lupus, SLE patients from different racial groups may exhibit different pathologies, and conditions within a single patient can vary over time.
Attendees discussed the validity or practicality of subclassifying SLE patients for highly targeted therapies, such as monoclonal antibodies, in future trials, and agreed that more targeted trials for these subgroups might be indicated if researchers can draw clear distinctions among these groups and if companies would be willing to narrow their potential drug markets, at least initially.
Belimumab: In Human Genome Sciences’ (HGS) Study of Belimumab in Subjects with SLE (BLISS-52) phase 3 trial, Benlysta (belimumab), a human monoclonal antibody that inhibits B-lymphocyte stimulator (BLyS), also known as tumor necrosis factor ligand superfamily member 13B (TNFSF13B), proved more effective than placebo in treating people with serologically active SLE.
The success of BLISS-52 may have resulted from a new design based on a post-hoc analysis of data from a disappointing phase 2 trial. In this analysis, HGS researchers identified a subpopulation of patients who had improved with treatment. The patients in this subpopulation had detectable anti-DNA antibodies and shared other characteristics as well: they were, on average, younger, were more likely to be African American, and had higher disease activity, more detectable serum BLyS levels, and higher serum immunoglobulin G. The post-hoc analysis and identification of this subpopulation became the basis for a promising strategy that allowed the drug to demonstrate its efficacy.
In BLISS-52, HGS focused specifically on this seropositive subpopulation, and the conference group agreed that this focus likely contributed to the success of the trial. It was also noted that, in order to retain patients in the trial, HGS allowed participants wide latitude in using other medications and did not mandate a particular steroid dosing schedule. Several attendees noted that this approach may have handicapped belimumab, because some of belimumab’s efficacy might have been masked by the presence of these other medications in both the belimumab and “placebo” arms of the study. Despite this handicap, the trial met its primary outcome measure.
Importantly, the flexible criteria helped in the recruitment and retention of 865 patients in the study, providing a power of more than 90% for the study to achieve a statistically significant difference in its primary outcome measure between the belimumab and control groups. Because BLISS-52 recruited so many patients, the trial was able to show statistical significance despite relatively modest findings: 57.6% of the belimumab-treated patients met the primary composite outcome compared with 43% of the controls.
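The link between sample size and power can be made concrete with a rough calculation. The sketch below uses the standard normal approximation for a two-sided comparison of two proportions. It is not the trial's own power calculation, which would have relied on prespecified assumptions not reported here. Two assumptions are made purely for illustration: the observed response rates (57.6% and 43%) are treated as the true rates, and the 865 patients are assumed to be split evenly across three arms, giving about 288 per arm. The function name is hypothetical.

```python
from statistics import NormalDist


def power_two_proportions(p_treat: float, p_control: float, n_per_arm: int,
                          alpha: float = 0.05) -> float:
    """Approximate power of a two-sided z-test comparing two proportions.

    Normal approximation with equal arm sizes; the negligible chance of
    rejecting in the wrong direction is ignored.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    p_bar = (p_treat + p_control) / 2
    # Standard error under the null (pooled) and under the alternative.
    se_null = (2 * p_bar * (1 - p_bar) / n_per_arm) ** 0.5
    se_alt = ((p_treat * (1 - p_treat) + p_control * (1 - p_control)) / n_per_arm) ** 0.5
    return std_normal.cdf((abs(p_treat - p_control) - z_crit * se_null) / se_alt)


# Assumption: about 288 patients per arm (865 split evenly over three arms).
for n in (100, 200, 288):
    print(n, round(power_two_proportions(0.576, 0.43, n), 2))
```

Under these assumptions, power exceeds 90% at about 288 patients per arm and falls well short of that at 100 per arm, which is consistent with the point that a modest difference needed a large trial to reach statistical significance.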
The successful HGS trial demonstrated that application of these particular trial design strategies can result in a statistically significant difference between a new agent and placebo on background standard-of-care treatment. The adoption of similar strategies as a general model for other lupus clinical trials was discussed, but no consensus was reached.
TABLE 1: The BILAG-2004 Index
This comprehensive computerized index measures changes in clinical disease activity over time and is based on the principle of the physicians’ intention to treat. It updates the Classic BILAG assessment and consists of questions on patient history, examination findings, and laboratory results. Separate alphabetic scores are assigned to each of nine organ-based systems:
- Constitutional
- Mucocutaneous
- Neuropsychiatric
- Musculoskeletal
- Cardiorespiratory
- Gastrointestinal
- Ophthalmic
- Renal
- Hematologic
Source: Isenberg DA, Rahman A, Allen E, et al. BILAG 2004. Development and initial validation of an updated version of the British Isles Lupus Assessment Group’s disease activity index for patients with systemic lupus erythematosus. Rheumatology (Oxford). 2005;44(7):902-906.
Definition of Improvement
Among outcome instruments used in clinical trials, the British Isles Lupus Assessment Group index (BILAG) is based on an intention-to-treat approach according to an extensive series of criteria to classify a patient’s SLE manifestations arising from different organ systems (see Table 1). The BILAG score was used as the primary outcome measure in the unsuccessful trials of Genentech’s rituximab and Bristol-Myers Squibb’s abatacept in lupus, and is currently being used as the primary outcome measure in an ongoing trial of EMD Serono’s atacicept (TACI-Ig).
During the meeting, representatives of all three companies described their efforts to compensate for the BILAG’s limitations, including rigorous physician training programs, the use of adjudication panels to review scores to ensure the authenticity of any changes, and the use of simple questionnaires to allow more subjective measures. In the abatacept trial, the physicians’ subjective reports suggested that the drug was working, even though the BILAG scores and related analyses found no evidence of benefit. Other drug trials presented at the meeting also showed significant differences between physician opinions and BILAG outcomes, which reinforced the prevailing view that the BILAG index is not sufficiently sensitive to detect benefit from the new treatments. Particularly problematic may be the use of BILAG B events as outcome measures of lupus flares.
In contrast, HGS’s phase 3 trials of belimumab incorporated a combination of assays, characterized as a composite or anchored index, called the SLE Responder Index, or SRI. SRI included the Safety of Estrogens in Lupus Erythematosus: National Assessment version of the SLE Disease Activity Index (SELENA-SLEDAI) as a measure of drug efficacy and the BILAG Index and a physician’s global assessment as measures of patient deterioration.
Although the SRI approach worked in BLISS-52, some critics believe that the amalgamation of several outcome measures into one composite may create an uncertain foundation on which to base results. Some attendees believed that using any one of the component measures would have worked just as well.
Design Considerations
Clinical trial design considerations were extensively debated, with no clear conclusions; however, there was agreement on the key considerations for future trial design:
Heterogeneity versus homogeneity: Heterogeneity is inherent in lupus. One question discussed was whether it is advantageous in lupus clinical trials to study a more homogeneous population. In trials with heterogeneous populations, researchers must be careful not to overextend application of results. On the other hand, while study of a homogeneous population may produce “cleaner” results, these results are restricted to the particular population studied, and extrapolation to other populations may not be warranted. It was noted that large heterogeneous populations can be subdivided into smaller more homogeneous populations that could be studied in separate trials to detect clinical differences between the effects of study drug and placebo.
One participant noted that rheumatoid arthritis (RA) was considered a heterogeneous systemic disease 30 years ago. Today, trial entry is based on the number of active joints, suggesting that heterogeneity may not be important.
Another consideration was activity of disease among patients studied (e.g., more active versus less active patients). A targeted biologic, for example, which treats some features of lupus, might not show benefit in an active patient who is flaring but might show benefit in patients with chronic disease of lower activity, or vice versa. In the belimumab phase 2 trial, the higher the baseline disease activity, the better the response was over time.
Drug mechanism: The success of the BLISS-52 trial suggests that drug mechanism may be critical to trial design. However, there is also the risk that some drugs might make patients worse. Hypothetically, an agent that is effective for active disease might make a patient with quiescent disease worse. Because lupus may have different pathogenic mechanisms in different people, it may be advantageous to select the trial population based on the drug’s mechanism. Participants at the meeting acknowledged that such a selection might be difficult to accomplish but that the approach should be considered.
The value of “withdrawal” trial designs: This is when all enrolled patients initially receive the drug for a period of time before one group switches to placebo. Though common in pediatric trials, withdrawal designs are rare elsewhere, partially because of the difficulty of structuring the trials to meet ethical, business, and FDA requirements in a chronic, slow-developing disease like lupus. Withdrawal trials are also difficult to conduct with a new drug because it is best to have some initial evidence that the drug is of some benefit before initiation of a withdrawal trial. In the absence of an approved “gold standard” therapy, it might be difficult to establish such benefit. Moreover, in the absence of gold standard therapy, superiority and not equivalency or noninferiority trials are required.
New and different clinical design elements: Participants also discussed several other elements that could be used to inform the design of trials, including biomarkers, randomized delayed treatment design (placebo, then drug versus drug, then placebo in different groups), durability studies, and human observational studies.
Return on investment: Attendees agreed that companies sponsoring clinical trials should consider testing SLE drugs for narrower indications, noting that studies of rigorously defined patient subpopulations may offer lower initial financial returns but also present lower risks of outright failure.
Despite these variables and the remaining hurdles in refining outcome measures and design issues, researchers remain optimistic about the future of clinical trials for lupus. It was noted that the first biologic drugs for RA entered clinical trials in the late 1980s, but the first approval did not come until 1999, after a series of false starts and failures. Moreover, even after an effective clinical trial strategy was identified in RA, many products failed to meet the designated outcome measures. And lupus biologic drugs only entered clinical trials in the mid-1990s.
Andrea Peirce is editorial and communications director for the S.L.E. Lupus Foundation and Lupus Research Institute in New York. Dr. Lipsky is editor-in-chief of Nature Reviews Rheumatology. Dr. Schwartz is professor of clinical medicine at Washington University School of Medicine in St. Louis.
Background on Lupus Clinical Trials Discussed
EXPLORER
Genentech’s EXPLORER trial enrolled 257 patients with moderately to severely active extrarenal lupus who displayed moderate to high activity breaking through background treatments. Because the FDA had suggested BILAG as the favored measure, the trial used BILAG v3. The trial used a graded responder endpoint (with a focus on major, partial, and non-response), rather than a binary “response/no response” measure. Patients were randomized to receive rituximab plus prednisone or placebo plus prednisone and were followed for 78 weeks. The EXPLORER trial failed to achieve primary or secondary endpoints but revealed no new major safety issues. The requirement for high activity made it difficult to recruit patients. EXPLORER also had a high (>25%) dropout rate (dropouts were counted as nonresponders, thereby increasing the nonresponse rate). Another potential confounder was the lack of a clear definition of flares—30% of patients met a stringent definition of response, but 70% did not.
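The effect of counting dropouts as nonresponders can be shown with a small calculation. The numbers below are hypothetical and are not EXPLORER data. They only illustrate how the same number of responders yields a lower response rate once every dropout is added to the denominator as a nonresponder.

```python
def response_rates(n_randomized: int, n_dropouts: int, n_responders: int) -> tuple[float, float]:
    """Return (completer_rate, nonresponder_imputation_rate).

    completer_rate ignores dropouts; the imputed rate counts every dropout
    as a nonresponder, so its denominator is everyone randomized.
    """
    completers = n_randomized - n_dropouts
    if completers <= 0 or not 0 <= n_responders <= completers:
        raise ValueError("inconsistent counts")
    return n_responders / completers, n_responders / n_randomized


# Hypothetical arm: 100 randomized, 25 drop out, 30 of the 75 completers respond.
completer_rate, imputed_rate = response_rates(100, 25, 30)
print(f"completers only: {completer_rate:.0%}; dropouts as nonresponders: {imputed_rate:.0%}")
```

Because the imputation applies to every arm, a high dropout rate lowers the apparent response rate in both the treatment and control groups.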
LUNAR
This trial enrolled patients with biopsy-proven active lupus nephritis who were being treated with one background immunosuppressant (mycophenolate mofetil) and prednisone, which enabled researchers to leverage data from ongoing studies.
Success was defined by renal response at one year. The overall design was very similar to EXPLORER, other than differences in the patient population and the background therapy. The drug failed primary and secondary endpoints. No new safety issues emerged. The primary endpoint—a quantitative endpoint—was significantly easier to interpret than the EXPLORER BILAG endpoint.
HGS–Phase 2 Study
The trial enrolled 449 patients with active SLE in double-blind, extension, and long-term phases for a total of four years of follow-up. Patients, who were on the standard of care throughout the trial, had to have an SLE diagnosis with a history of measurable auto-antibodies. The study did not meet primary or secondary endpoints of percent reduction of SELENA-SLEDAI score at week 24 or time to first SLE flare over 52 weeks. The assumption of a 65–70% annual flare rate was too low in moderate to severe SLE. Permitting changes in prednisone and immunosuppressive medications also confounded SLE disease activity assessments. Researchers found the SELENA-SLEDAI was a better measure of sustained overall improvement than BILAG, and that BILAG B flares could be triggered easily. But researchers also found that BILAG was a better measure of organ-specific changes/worsening. Specifically, SELENA-SLEDAI tracks complete elimination, but not partial changes, of signs and symptoms, and BILAG tracks changes in disease activity for a defined time period.
BLISS-52
The BLISS-52 study involved 90 study centers in 13 countries and enrolled 865 patients. The study was a 52-week, double-blind, placebo-controlled trial of belimumab (1 or 10 mg/kg) plus standard-of-care therapy or placebo plus standard of care. Efficacy analyses included the SELENA-SLEDAI, BILAG, and SELENA-SLEDAI Flare Index (SFI). The primary endpoint was the Week 52 SRI: improvement in SELENA-SLEDAI (≥4 point decrease), no new BILAG A flare and no more than one new BILAG B flare, and no >0.3 point worsening in Physician’s Global Assessment (PGA) versus baseline. At Week 52, there were 57.6% responders in the 10 mg/kg belimumab arm, 51.4% responders in the 1 mg/kg belimumab arm, and 43% responders in the control arm, with both doses of belimumab showing a statistically significant difference from placebo. Statistically significant differences from placebo were also seen in at least one belimumab dose group in the secondary endpoints. Belimumab was generally well tolerated and generally allowed reduction of steroid use.
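The composite logic of the SRI can be sketched as a simple classification rule. This is an illustration of the three criteria as described above, not the trial's scoring algorithm. The record type and field names are hypothetical, and the BILAG criterion is read here as no new A flare and fewer than two new B flares.

```python
from dataclasses import dataclass


@dataclass
class Week52Assessment:
    """Hypothetical record of one patient's change from baseline at Week 52."""
    sledai_change: float  # SELENA-SLEDAI change; negative means improvement
    new_bilag_a: int      # number of new BILAG A flares
    new_bilag_b: int      # number of new BILAG B flares
    pga_change: float     # PGA change; positive means worsening


def is_sri_responder(a: Week52Assessment) -> bool:
    """A patient responds only if all three components are satisfied."""
    improved = a.sledai_change <= -4                                # at least a 4-point decrease
    no_bilag_worsening = a.new_bilag_a == 0 and a.new_bilag_b < 2   # assumed reading of the BILAG rule
    no_pga_worsening = a.pga_change <= 0.3                          # no worsening greater than 0.3 points
    return improved and no_bilag_worsening and no_pga_worsening


# Hypothetical patients: the second improves on SELENA-SLEDAI but has a new BILAG A flare.
print(is_sri_responder(Week52Assessment(-6, 0, 1, 0.1)))
print(is_sri_responder(Week52Assessment(-6, 1, 0, 0.0)))
```

The rule shows why the index is described as anchored: improvement is measured by one instrument, while the other two can only veto a response.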
Abatacept
This Bristol-Myers Squibb abatacept trial was a phase 2B, multicenter, randomized, double-blind, placebo-controlled study to evaluate the efficacy and safety of abatacept on a background of oral glucocorticosteroids to prevent lupus flares. Eligible subjects must have been diagnosed with SLE and must have been experiencing an active lupus flare that satisfied BILAG A or B event criteria within 14 days, and have been on a stable dose of prednisone (<30 mg) for at least one month. Patients were randomized to receive abatacept IV or placebo and were followed for one year. The study also included a fixed steroid taper for two months. The primary endpoint was new adjudicated BILAG A or B flares after the start of steroid taper. One hundred eighty patients received treatment: 121 received abatacept, and 59 received placebo. The incidence of a new flare (defined by adjudication of all BILAG A or B events) was 79.7% for the abatacept group compared with 82.5% for the placebo group, a nonstatistically significant difference.
The study, while exploratory and not sufficiently powered, uncovered some difference in flare rates of BILAG A. The placebo group showed more adjudicated BILAG A flares than the study drug group did, but this difference was also not statistically significant. The study also found that the majority of serious adverse events, mostly exacerbated lupus symptoms, occurred during or shortly after the steroid taper. In addition, the high-dose steroids required by the protocol at entry, because of the flare design, did not permit sufficient flexibility to individualize treatment according to patient response.
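The lack of a statistically significant difference in the primary flare endpoint can be checked approximately with a two-proportion z-test. The sketch below is a rough illustration, not the study's analysis. It assumes the 121 and 59 treated patients are the denominators for the reported flare incidences (79.7% and 82.5%), which may not match the actual analysis population. The function name is hypothetical.

```python
from statistics import NormalDist


def two_proportion_z_test(p1: float, n1: int, p2: float, n2: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for a pooled two-proportion z-test."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Assumed denominators: 121 abatacept patients, 59 placebo patients.
z, p_value = two_proportion_z_test(0.797, 121, 0.825, 59)
print(f"z = {z:.2f}, two-sided p = {p_value:.2f}")
```

Under these assumptions, the p-value is far above the conventional 0.05 threshold, in line with the reported result.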