A Yardstick for Lupus

Today the British Isles Lupus Assessment Group (BILAG) Index is commonly used to chart multi-system progression of lupus from year to year, but just three decades ago there was no system in place to track this complex chronic condition.

By the early 1980s the lupus research community recognized that the formal assessment of patients with lupus was hopelessly in disarray. As Matthew Liang, MD, MPH, professor of medicine at Harvard Medical School in Boston, later pointed out, around 60 different activity indices for lupus were published between the mid-1950s and the mid-1980s, and all of them were inadequate.¹ Many individual clinicians attempted to capture lupus activity in ways that were never validated or shown to be reliable, and all of these were global score indices.

The 1980s saw a considerable improvement. At the University of Toronto, Dafna Gladman, MD, and Murray Urowitz, MD, devised the Systemic Lupus Erythematosus Disease Activity Index (SLEDAI); although it is a global score, it was shown to be reliable, valid, and sensitive to change.² Dr. Liang also developed the Systemic Lupus Activity Measure (SLAM).³

Different Concept of Measurement

In the United Kingdom, the relatively small number of rheumatologists interested in lupus began to think along different lines. To Paul Bacon, MD, University of Birmingham, and Michael Snaith, MD, University of Sheffield Medical School, in particular, the notion of capturing disease activity with a single score, however thoughtfully derived and validated, seemed a suboptimal way to reflect the reality of a multisystem disease like lupus. Above all, they balked at the idea that severe disease in a single system, leaving a patient fighting for her life in hospital, could, at least theoretically, receive the same score as a patient who had little activity in five or six systems and who might still be working. In 1984, during a discussion in Dr. Bacon’s garden in Birmingham, they decided that a better approach would be an index that:

Showed the level of activity in different organs and/or systems;
Captured change in activity over time;
Was based upon the principle of the physician’s intention to treat; and
Recognized that disease activity was one part of a larger equation. To determine the totality of the effect of lupus, you needed a patient self-assessment index and a damage index.

Birth of BILAG

Realizing that developing this index would require a great effort, Drs. Snaith and Bacon decided to summon help. At that time I was working as Dr. Snaith’s senior registrar at University College Hospital. Others who were drafted in included Deborah Symmons, MD, University of Manchester, United Kingdom (then senior registrar to Dr. Bacon); Peter Maddison, MD, University of Wales in Bangor; and Asad Zoma, MD, Hairmyres Hospital in East Kilbride, Scotland (then in training in Glasgow). Additional support came from Barry Bresnihan, MD, St. Vincent’s University Hospital in Dublin; Nick Viner, MD, Peninsula Medical School in Exeter, England; John Coppock, MD, University Hospitals Coventry and Warwickshire NHS Trust in Coventry, England; and Elaine Hay, MD, Keele University in Staffordshire, England. The group began meeting every three to four months (and still does).

We agreed that activity should be assessed in eight organs or systems: constitutional features, mucocutaneous, central nervous system, musculoskeletal, cardiovascular/respiratory, vasculitis, renal, and hematological. In some instances, we asked outside experts for help. The original renal criteria were drawn up with the help of J. Stewart Cameron, MD, now emeritus professor of nephrology at Guy’s Hospital of King’s College in London.⁴ In keeping with the intention-to-treat principle, we had to agree which particular signs and symptoms in each organ or system, if present, would lead us to treat patients with a significant dose of corticosteroids (more than 25 mg prednisolone per day), with or without additional immunosuppressive drugs. Such features would constitute the most “Active” form of disease in that organ or system and would be given grade A.

The B grade in each organ or system, in effect a “Be aware” grade, envisaged a patient with active disease, also carefully defined, who required continuing steroid or immunosuppressive therapy but at a lower level. A C grade implied “Contentment,” meaning low-grade disease activity only, which might require just symptomatic therapy. The D grade implied inactivity in the respective organ or system. This was later divided into two grades: D for “Discount,” meaning the disease had once been active in this organ or system but was no longer active, and E, meaning that the disease had never “Ever” been active in that organ or system.

Test of the System

The system we established provided a testable hypothesis. With a grant from the Arthritis Research Campaign, Dr. Hay visited five different rheumatology units around the United Kingdom to review patient notes to determine whether patients with grade A symptoms or signs were actually treated with the large steroid doses or immunosuppression. We also determined whether Dr. Hay’s assessment of the patients agreed with that of the local physician.

We were greatly encouraged by the results of Dr. Hay’s study, which showed strong correlations in seven of the eight systems.⁵ Only for the central nervous system was it difficult to obtain satisfactory agreement. By 1986, BILAG felt able to “go public” by presenting a poster at the 1st International Lupus Meeting in Calgary, Alberta. The organizers placed this poster (defended by Dr. Maddison and me) next to a poster describing the origins of the SLEDAI (defended by Drs. Urowitz and Gladman). The four of us felt that disease activity assessment was something that ought to be agreed upon globally. With the help of a grant from NATO, three meetings were held between 1988 and 1991 to explore these indices and Dr. Liang’s SLAM index. Both real and paper patient exercises were undertaken with the support of other interested parties, including Gunnar Sturfelt, MD, PhD, and Ola Nived, MD, PhD, both at Lund University in Sweden, and Kenneth Kalunian, MD, at the University of California, San Diego.

In practice, the BILAG Index requires few blood or urine tests, and the bulk of the form can be completed on paper or on a computer in three or four minutes. For each clinical feature, which must be due to lupus, the clinician records whether it is absent or present. If it is present, the clinician records whether it is better, worse, or the same as a month ago, or whether it is a new or recurrent problem. These data are then converted into an A–E grade for each organ or system.

Although this is somewhat antithetical to the concept, the BILAG letter grades were assigned numbers so that they could be converted into a global score and thus compared more easily with SLEDAI and SLAM. In spite of their varying origins, there is a strong correlation among these three indices and also, as shown in later studies, with the European Community Lupus Activity Measure (ECLAM), which was devised by Stefano Bombardieri, MD, and colleagues.⁶,⁷ The choice of the A=9, B=3, C=1, D/E=0 point notation, while widely used now, was a rather spur-of-the-moment decision. BILAG is currently performing more objective modeling studies to determine how valid this numbering system is, and it is likely that the C score will be downgraded.
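To make the arithmetic concrete, here is a minimal Python sketch of how letter grades might be turned into a single number under the A=9, B=3, C=1, D/E=0 notation. It assumes that the global score is simply the sum of the points for each organ or system; the names `global_score`, `BILAG_POINTS`, and `CLASSIC_SYSTEMS` are hypothetical, and the system list follows the eight named for the classic index. It is an illustration, not BILAG’s own software.

```python
# Hypothetical sketch: converting BILAG letter grades into one numeric score.
# Point values follow the A=9, B=3, C=1, D/E=0 notation; summing across the
# organs/systems is an assumption made for illustration.

BILAG_POINTS = {"A": 9, "B": 3, "C": 1, "D": 0, "E": 0}

# The eight organs/systems of the classic index, as named in the text.
CLASSIC_SYSTEMS = (
    "constitutional",
    "mucocutaneous",
    "central nervous system",
    "musculoskeletal",
    "cardiovascular/respiratory",
    "vasculitis",
    "renal",
    "hematological",
)


def global_score(grades: dict[str, str]) -> int:
    """Sum the points for each system; systems left out are treated as grade E."""
    unknown = set(grades) - set(CLASSIC_SYSTEMS)
    if unknown:
        raise ValueError(f"unknown systems: {sorted(unknown)}")
    bad = {g for g in grades.values() if g not in BILAG_POINTS}
    if bad:
        raise ValueError(f"invalid grades: {sorted(bad)}")
    return sum(BILAG_POINTS[grades.get(s, "E")] for s in CLASSIC_SYSTEMS)


# Two hypothetical patients with very different disease receive the same score.
severe_single_system = {"renal": "A"}
moderate_multisystem = {"mucocutaneous": "B", "musculoskeletal": "B", "hematological": "B"}
print(global_score(severe_single_system))   # 9
print(global_score(moderate_multisystem))   # 9
```

The two example patients show how collapsing the grades into one number hides which systems are involved: grade A renal disease alone and grade B disease in three systems both come to 9, the kind of equivalence that led Drs. Bacon and Snaith to favor reporting a grade per organ or system.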

By 1991, most of the necessary comparisons had been undertaken; the NATO group developed a damage index and selected a patient self-assessment index.⁸,⁹ The group was named the Systemic Lupus International Collaborating Clinics (SLICC) and began to expand in 1991, when Dr. Gladman was appointed its first chair. During the 1990s, a number of distinguished rheumatologists joined, including Michelle Petri, MD, MPH, Johns Hopkins University, Baltimore; Ellen Ginzler, MD, MPH, SUNY Downstate, Brooklyn; John Hanly, MD, Dalhousie University in Halifax, Nova Scotia; Jorge Sanchez-Guerrero, MD, Instituto Nacional de Ciencias Medicas y Nutricion, Mexico City; Jill Buyon, MD, New York University School of Medicine; and Rosalind Ramsey-Goldman, MD, DrPH, Northwestern University, Chicago (who succeeded me as chair in 2004).

Drs. Bacon, Snaith, and Isenberg (shown left to right) sit in Dr. Bacon’s garden, where – in 1984 – he and Dr. Snaith developed the initial concept for the BILAG Index.

FDA and Tech Boosts

The turning point for the BILAG Index came in 2003, when the Food and Drug Administration (FDA), faced with the prospect of many drug companies wanting to conduct lupus trials, held a meeting to review the available indices for capturing disease activity in patients with lupus. Although no member of BILAG was present, the FDA concluded that the BILAG Index would be of particular value for clinical trials in SLE. The index in its classic form is now widely used in clinical trials around the world.

There are many good reasons to adopt the BILAG Index for clinical trials, but perhaps the most important is its ability to capture degrees of change that global score systems cannot. Because it asks the physician to record whether the patient has improved, worsened, or stayed the same, the BILAG Index reflects real life more accurately and is better suited to clinical trial use.

In another boost, Gordon Hamilton of ADS-Limathon in Sheffield, U.K., built upon Dr. Viner’s original BILAG Index software to develop a highly sophisticated computer program. The new software can record a large amount of demographic information, a BILAG activity index, a SLICC damage index, the SF-36 (which has been widely used to capture patients’ assessment of their own disease), a wide range of laboratory results, and the various drug treatments the patients are receiving. It also has graphing capabilities. Caroline Gordon, MBBS, senior lecturer and consultant in rheumatology at the University of Birmingham, has worked to smooth the original BILAG Index’s rough edges and to ensure that the computer program reflects these improvements.

The BILAG Index has proven very useful in long-term observational studies.¹⁰,¹¹ It has also demonstrated its utility in a double-blind, controlled trial of 90 patients that compared prednisolone plus azathioprine with prednisolone plus ciclosporin in lupus flares.¹² The key to an accurate BILAG score is to score a clinical feature only if you are certain it is due to lupus. One “glitch” that has emerged is that, in some organs or systems, the classic BILAG Index allows an improving grade A feature to become a C at the next assessment a month later; if the feature then remains the same, it can score a B at the third assessment, giving the false impression of an extra flare. The revised BILAG 2004 index makes this jump unlikely, firmly establishing the more natural progression from grade A to B to C.
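The glitch can be seen in a toy grade sequence. The Python sketch below assumes, purely for illustration, that any move to a more active grade between consecutive monthly assessments would be read as a flare; this simplified rule and the helper name `apparent_flares` are hypothetical and are not the formal BILAG flare definition.

```python
# Hypothetical illustration of the classic-index "glitch" described above.
# Simplifying assumption: any move to a more active grade between consecutive
# assessments is read as a flare (not the formal BILAG flare definition).

# Lower rank means more active disease.
ACTIVITY_RANK = {"A": 0, "B": 1, "C": 2, "D": 3, "E": 4}


def apparent_flares(grades: list[str]) -> list[int]:
    """Return the indices of assessments graded more active than the one before."""
    return [
        i
        for i in range(1, len(grades))
        if ACTIVITY_RANK[grades[i]] < ACTIVITY_RANK[grades[i - 1]]
    ]


# Classic-index pattern: A improves to C, then the unchanged feature scores B.
print(apparent_flares(["A", "C", "B"]))  # [2]: a spurious "flare" at the third assessment
# The natural A -> B -> C progression shows no worsening.
print(apparent_flares(["A", "B", "C"]))  # []
```

Under this simplified rule, the A, C, B sequence flags the third assessment even though the patient never actually deteriorated, while the A, B, C sequence flags nothing.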

An updated version, BILAG 2004, has been published and is being tested in large studies.¹⁰ The revised index removes the vasculitis section, placing individual clinical features more appropriately within the other organs or systems. It also adds a gastrointestinal section and an ophthalmology section, both missing from the original. Furthermore, some items that actually reflected damage rather than activity have been removed. A software version of the new index will be available soon.

Dr. Isenberg, ARC Diamond Jubilee professor of rheumatology at University College London, would like to thank the other current members of BILAG, particularly Drs. Bacon and Snaith, who provided constructive criticism.
