Journal of Informatics in Primary Care 1998 (May):3-7


Papers


Reliability of central coding of patient reasons for encounter in general practice, using the International Classification of Primary Care

Helena Britt BA PhD

Director, Family Medicine Research Unit, Department of General Practice, University of Sydney, Acacia House, Westmead Hospital, WESTMEAD NSW 2145, AUSTRALIA

 

 ABSTRACT

Objective – To assess the accuracy and the intra- and inter-coder reliability of secondary, centralised coding of patient reasons for encounter (RFEs) with the International Classification of Primary Care (ICPC).

Design – Almost 150,000 RFEs were secondarily coded with ICPC in a central coding situation by trained coders. Efforts were made to maximise coder reliability. A random sample of 2,369 RFEs was independently assessed for coding accuracy. A further sample of 162 RFEs was tested for inter-coder reliability by comparison with a gold standard, and for intra-coder reliability by matched comparison of double-coded RFEs.

Setting – Primary health care.

Subjects – Doctor–patient contact records.

Main outcome measures – Accuracy was qualitatively assessed as: absent; incorrect; acceptable but could be improved; correct. Inter- and intra-coder reliability were measured as mean percentage correct scores calculated at ICPC chapter level, at individual rubric level, and within each ICPC chapter.

Results – Only 1.8% of RFE codes were missing, incorrect or in need of improvement. Inter-coder reliability was 91.7% at ICPC chapter level and 81.8% at rubric level. Intra-coder reliability was 96.2% at chapter level and 90.0% at rubric level. Reliability varied with ICPC chapter.

Conclusion – High coder reliability can be gained with ICPC in a central, secondary coding environment, but training, an improved index and coding rules are required.

Key words – ICPC, reliability, classification, family medicine

 

INTRODUCTION

Of the 18 million people resident in Australia, over 85% visit a general practitioner (family physician) at least once in any year. There is no patient registration, and consultation rates are constantly increasing. In 1996 more than 95 million general practice services were provided by about 20,000 practitioners. Investigation of patient reasons for attendance was thought useful in trying to identify areas of demand that could be addressed through public education. Patient RFEs were studied as part of the Australian Morbidity and Treatment Survey 1990–1991. This was a cross-sectional study which relied on active, paper-based recording of more than 100,000 consultations by a random sample of 495 general practitioners (GPs). For each consultation at least one and up to three RFEs were recorded, together with patient demographics, the problems managed and the treatments provided [1].

Participating GPs were instructed to ask the patient why they came or requested a house call, and to record the reasons as they were expressed by the patient, prior to gathering clinical information and beginning the diagnostic process. The extent to which GP perceptions of patient RFEs accurately reflect patient-recalled RFEs has been reported elsewhere [2]. A team of thirteen nurses and medical records clerks coded the RFEs and problems managed according to the International Classification of Primary Care (ICPC) [3]. This represented the first large-scale application of this classification system to primary care data in Australia.

Applying ICPC in a secondary, central coding situation is unusual. ICPC has fewer than 1,400 codes and was designed as an epidemiological tool for use by the primary care provider during the consultation. Study of the validity of the results therefore required investigation of the reliability of the central coding method.

Studies of reliability have traditionally used vignettes and measured inter-practitioner agreement about the problem(s) managed in the described consultation. Results suggest that the majority of error occurs at the point of practitioner selection of the label, and that coding and data entry account for only about 5% of data error [4–8]. In these studies two actions are being measured: selection of the problem label and the choice of code that "best fits" the label. Coding reliability is therefore affected by the concurrent selection of the label.

Ideally, clinical data should be coded by the practitioner at the time of the consultation, so that practitioners can draw on their knowledge of the patient's presentation while remaining aware of the limitations of the selected classification system. However, while data collection and coding in a structured medical record are encouraged by the Royal Australian College of General Practitioners, fewer than 10% of practices have computerised clinical systems and most GPs find coding paper-based records too time-consuming. General practice-based research in Australia will therefore continue to rely on active paper-based surveys, and the reliability of secondary coding of GP data will continue to be an issue.

The accuracy of coding is rarely tested and reported in morbidity studies. Only one study has investigated inter-coder reliability within a group of non-practitioner secondary coders: inter-coder reliability of 92–97% was found among eight secretarial staff using the International Classification of Health Problems in Primary Care (ICHPPC), the precursor to ICPC, but unfortunately the measurement methods were not reported [9]. This paper reports the steps taken to ensure maximum reliability in the coding of RFEs, and the resulting inter- and intra-coder reliability gained using ICPC in a secondary, centralised coding situation.

 

METHODS

Maximising reliability

The thirteen coding staff were given the published copy of ICPC as background reading prior to a four-hour group training session in which the underlying philosophy and structure of the classification were discussed. Some broad coding rules were also developed.

Example 1 – The diagnoses or problems managed at the consultation were recorded on the same form as, and just below, the patient RFEs. When faced with an RFE with multiple possible codes, a coder could be tempted to use the more specific diagnostic information to clarify the patient's RFE: for example, RFE: "my mother's trying to kill me"; diagnosis: "schizophrenia". Without knowledge of the diagnosis, this RFE should be classified as ICPC code Z20 – relationship problem with parent. With the diagnosis available, however, the coder may be tempted to classify the RFE as paranoia (P72).

Rule 1 – Ignore the diagnostic section when coding RFEs so as to reflect as closely as possible the patient's viewpoint rather than the practitioner's final definition of the problem.

Example 2 – While ICPC provides individual codes in each chapter for "check-up, partial", "check-up, full", "follow-up", and "doctor-initiated review", GPs rarely record such specific descriptions.

Rule 2 – Code all "check-ups" as partial unless the practitioner specifies "full", "complete", "insurance", "diving", "driver's licence", or gives any other indication that a full medical examination was undertaken.
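These rules lend themselves to mechanical expression. Coding in the study was performed entirely by hand on paper forms, but as a purely illustrative sketch, the two rules above might be written as follows (Python; the index entries are drawn from the examples above, while the A30/A31 check-up codes and the function itself are illustrative assumptions, not the study's procedure):

    # Illustrative sketch only: coding in the study was done manually.
    # A toy ICPC index lookup applying Rule 1 (ignore the diagnosis)
    # and Rule 2 (default "check-up" to partial unless a full medical
    # examination is indicated).

    ICPC_INDEX = {
        "relationship problem with parent": "Z20",  # Example 1 above
        "check-up, full": "A30",     # illustrative: complete examination
        "check-up, partial": "A31",  # illustrative: partial examination
    }

    FULL_CHECKUP_QUALIFIERS = (
        "full", "complete", "insurance", "diving", "driver's licence",
    )

    def code_rfe(rfe_text, diagnosis=None):
        """Return an ICPC code for a recorded reason for encounter."""
        del diagnosis  # Rule 1: the diagnostic section is ignored
        text = rfe_text.lower()
        if "check-up" in text or "check up" in text:
            # Rule 2: partial unless a full examination is indicated
            if any(q in text for q in FULL_CHECKUP_QUALIFIERS):
                return ICPC_INDEX["check-up, full"]
            return ICPC_INDEX["check-up, partial"]
        # Unmatched terms were flagged for the coding supervisor
        return ICPC_INDEX.get(text, "UNCODED")

    # Example 1: the recorded diagnosis must not influence the code
    assert code_rfe("relationship problem with parent", "schizophrenia") == "Z20"
    assert code_rfe("check-up for driver's licence") == "A30"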

During the training session each coder attempted to code twenty sample recording forms, raising issues for discussion when coding difficulties arose. Each then took home another fifty test recording forms for practice; these highlighted further problematic coding decisions and led to more specific individual instruction.

A coding supervisor was available to the coders for telephone advice, and RFEs remaining uncoded after discussion were flagged for the supervisor's personal attention. When the supervisor was in doubt, the term was noted and returned to the Research Unit, where the appropriate code was designated and the term added to the ICPC index. Where necessary, queries were referred to members of the WONCA Classification Committee. The supervisor also checked a random sample of at least one in ten encoded contact forms and provided weekly lists of coding errors and intra-coder variance to the research team. These measures highlighted many areas of doubt requiring further definition and led to a complex iterative process between the coding and research teams. As classification decisions were made and encoding rules formulated, they were regularly communicated to all coding staff in writing.

Some coding errors occurred with sufficient regularity to suggest that the fault lay in the layout of the ICPC index. There was general consensus that it was too compressed, with too few levels of indentation and no font variation to designate group headings. This led to the development of a far more comprehensive index with a clearer layout. At increasingly regular intervals over the year of the study, coders were given a fully updated index printed from the growing term database.

Measuring coder reliability

Reliability was tested at two stages. A member of the research team checked the accuracy of coding by reviewing all contact forms coded during the twelfth week of data collection. The code selected for each RFE was subjectively rated as: absent, incorrect, acceptable but could be more accurate, correct.

Inter- and intra-coder reliability was measured during the last month of the year-long survey. One pad of GP recording forms, including 162 RFEs, was randomly selected. A photocopy was given to all coders as part of their normal weekly work. To measure inter-coder reliability the codes selected by each coder were compared with the gold standard set by the most experienced coder. Four weeks later each coder was given a second copy of these forms. Intra-coder reliability was measured by comparing the codes applied to the first and second copies of the forms. Coders were not informed of the process.

ICPC has a bi-axial structure, with seventeen chapters on one axis and seven components on the other. For inter-coder reliability, the mean percentage of correct codes was calculated at ICPC chapter level (that is, agreement as to the chapter in which the RFE belonged, irrespective of agreement about the specific ICPC code) and at specific rubric level, for each RFE.

To assess the extent to which reliability varied according to the type of reasons being coded, mean agreement at rubric level within each ICPC chapter was calculated. A parallel analysis was conducted between the matched forms from the two batches for measurement of intra-coder reliability.
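Since each ICPC code consists of a chapter letter followed by two digits (for example, L18 in the musculoskeletal chapter), chapter-level agreement can be scored on the first character of the code alone. A minimal sketch of the two agreement measures, assuming simple per-coder lists of codes (apart from Z20 and L18 from the text, the example codes are hypothetical):

    # Sketch of the two agreement measures. An ICPC code is a chapter
    # letter plus two digits (e.g. "L18"), so chapter-level agreement
    # compares only the first character; rubric-level agreement
    # compares the whole code.

    def percent_agreement(coder_codes, gold_codes, chapter_only=False):
        """Mean percentage of one coder's codes matching the gold standard."""
        assert len(coder_codes) == len(gold_codes)
        matches = sum(
            (coded[0] == gold[0]) if chapter_only else (coded == gold)
            for coded, gold in zip(coder_codes, gold_codes)
        )
        return 100.0 * matches / len(gold_codes)

    # Hypothetical example: one coder's codes for three RFEs against
    # the gold standard set by the most experienced coder.
    gold  = ["L18", "R74", "Z20"]
    coder = ["L19", "R74", "Z20"]   # same chapter as L18, different rubric

    print(percent_agreement(coder, gold, chapter_only=True))  # 100.0
    print(percent_agreement(coder, gold))                     # 66.7 (approx)

Intra-coder reliability uses the same comparison, with the coder's own second-batch codes in place of the gold standard.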

  

RESULTS

Coder accuracy

Of the 2,369 patient RFEs checked and rated for accuracy, only 42 (1.8%) codes were missing, incorrect, or had room for improvement.

Inter-coder reliability

The mean percentage of correct codes for patient RFEs was 91.7% at chapter level (SD = 2.4%, range = 87.1–98.2%) and 81.8% at individual rubric level (SD = 4.5%, range = 73.5–90.1%) (see Table 1).

Inter-coder reliability was perfect (100%) in the less frequently used ICPC chapters pertaining to the eye, the endocrine/nutritional/metabolic system, and pregnancy/family planning. All coders gained over 90% agreement with the gold standard in the respiratory (mean agreement 99.1%, SD = 1.8%) and digestive (98.6%, SD = 3.5%) chapters. With two exceptions, mean correct scores remained greater than 80% in the remaining chapters and, though standard deviations increased, the lowest individual coder score was better than 70%. In the ICPC chapters related to urinary, male genital and social problems, mean agreement remained above 70% but the standard deviations widened; this may be partly due to the small number of cases in these chapters.

Table 1. Inter-coder reliability at specific rubric level, by ICPC chapter. Average percent agreement of coders with the ‘gold standard’

ICPC chapter                 N¹     Mean %   SD (%)    Range %
------------------------------------------------------------------
General                      26      83.4      7.6     71.4 – 96.4
Blood                         0         –        –          –
Digestive                    10      98.6      3.5     90.0 – 100.0
Eye                           1     100.0      0.0    100.0 – 100.0
Ear                           7      98.0      7.4     71.4 – 100.0
Cardiovascular                6      94.1     12.0     66.7 – 100.0
Musculoskeletal              15      90.3      5.8     78.6 – 100.0
Neurological                 11      98.1      3.7     90.9 – 100.0
Psychological                 7      89.8     11.4     71.4 – 100.0
Respiratory                  34      99.1      1.8     93.8 – 100.0
Skin                         24      91.0      6.8     75.0 – 95.8
Endocrine/nutrit./metab.      1     100.0      0.0    100.0 – 100.0
Urological                    3      92.9     17.5     50.0 – 100.0
Pregnancy/family planning     2     100.0      0.0    100.0 – 100.0
Female genital               10      93.6      6.3     81.8 – 100.0
Male genital                  2      92.9     17.5     50.0 – 100.0
Social                        3      71.4     30.5     33.3 – 100.0
All RFEs                    161      81.8      4.5     73.5 – 90.1

1. N is the average number of cases coded in each ICPC chapter; the column does not sum to the total due to rounding.

Intra-coder reliability

As shown in Table 2, mean intra-coder agreement was 96.2% at chapter level (SD = 1.3%, range = 94.3–98.7%) and 90.0% at specific rubric level (SD = 2.5%, range = 86.8–93.9%). Mean intra-coder agreement within each ICPC chapter was in general comparable to, or slightly better than, inter-coder agreement, being over 87% for all chapters. The improvement in mean agreement in the social chapter (from 71.4% to 97.7%) and in the general and unspecified chapter (from 83.4% to 93.9%) was marked.

Table 2. Intra-coder reliability at specific rubric level for RFEs at matched contacts, by ICPC chapter. Average percent agreement of each coder's batch 1 with batch 2 codes

ICPC chapter                 N¹     Mean %   SD (%)    Range %
------------------------------------------------------------------
General                      26      93.9      3.4     90.6 – 100.0
Blood                         0         –        –          –
Digestive                    10      99.2      2.6     90.9 – 100.0
Eye                           1     100.0      0.0    100.0 – 100.0
Ear                           7      87.5      3.6     87.5 – 100.0
Cardiovascular                6      98.5      4.8     83.3 – 100.0
Musculoskeletal              15      94.7      5.8     83.3 – 100.0
Neurological                 11      99.2      2.4     91.7 – 100.0
Psychological                 7      89.5     10.3     75.0 – 100.0
Respiratory                  34      99.5      1.1     97.1 – 100.0
Skin                         24      96.8      3.3     90.5 – 100.0
Endocrine/nutrit./metab.      1      90.9     28.8      0.0 – 100.0
Urological                    3      93.9     12.9     66.7 – 100.0
Pregnancy/family planning     2      95.5     14.4     50.0 – 100.0
Female genital               10      95.0      6.9     81.8 – 100.0
Male genital                  2      90.9     19.3     50.0 – 100.0
Social                        3      97.7      7.2     75.0 – 100.0
All RFEs                    161      90.0      2.5     86.8 – 93.9

1. N is the average number of cases coded in each ICPC chapter; the column does not sum to the total due to rounding.

 

DISCUSSION

The inter- and intra-coder reliability gained in the ICPC classification of patient RFEs was exceptionally good. The better results found at ICPC chapter level compared with those at individual rubric level were expected, given the smaller number of choices available (17 chapters compared with 1,380 codes).

At specific rubric level, reliability was surprisingly good considering that many concepts have multiple possible codes, only one of which is "preferred" (gold standard). Further, the results of the accuracy check suggest that inter- and intra-coder variance may not reflect "incorrect" coding. Many RFEs are of a rather subtle nature and multiple rubrics may closely approximate the recorded concept. For example, a patient RFE of "injured leg" could be coded as other injury musculoskeletal (ICPC code L18) or accident/injury NOS (A80). While the former option is more specific, the latter could hardly be said to be incorrect.

At times such multiple code options cross ICPC chapters and affect reliability at chapter level. For example some of the "error" in the coding of social RFEs may be due to the interaction of the psychological and social chapters. Should "very stressed due to work" be coded as "acute stress" in the psychological chapter or as a "work related problem" in the social chapter?

In general, intra-coder reliability was slightly better than inter-coder reliability, particularly in the skin and social chapters, suggesting that individual coders tend consistently to select the same rubric for the same concept. This parallels Crombie's finding that, while there may be considerable variation between practitioners in the selection of a label to represent a concept, individual practitioners are remarkably consistent in selecting the same label for the same concept over time [10].

The results suggest that differences in code selection between coders, whether they are practitioners coding at the time of the consultation or a team of central coders, may well account for only 5% of data error, but that variance in code selection could lead to unreliable enumeration of the occurrence of an individual rubric (or disease group). In future studies, researchers should consider grouping similar rubrics in analysis, but the concepts to be included in each group must first be identified.
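As a purely hypothetical illustration of such grouping, the two injury rubrics discussed above might be collapsed into a single analysis group before enumeration (the group name is invented for the example):

    # Hypothetical rubric grouping: near-equivalent ICPC codes are
    # mapped to one analysis group before counting, so that coder
    # variance in code selection does not distort frequencies.
    from collections import Counter

    RUBRIC_GROUPS = {
        "L18": "injury NOS/musculoskeletal",  # other injury musculoskeletal
        "A80": "injury NOS/musculoskeletal",  # accident/injury NOS
    }

    def enumerate_groups(codes):
        """Count occurrences after collapsing grouped rubrics."""
        return Counter(RUBRIC_GROUPS.get(code, code) for code in codes)

    print(enumerate_groups(["L18", "A80", "R74"]))
    # Counter({'injury NOS/musculoskeletal': 2, 'R74': 1})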

Considering the multiple possible rubrics that could be applied to many undifferentiated patient RFEs, it is questionable whether these results could have been achieved had the coders relied only on the published copy of ICPC. While the results were good, it must be remembered that considerable effort was made to define coding rules and to develop an extensive index.

The new version of ICPC, ICPC-2, is to be released in late 1997, with inclusion and exclusion criteria for many rubrics and a vastly improved index, in both size and layout. Such additional information should have a positive effect on coder reliability, and this study should be repeated after its release.

 

CONCLUSION

This study has demonstrated that high inter- and intra-coder reliability can be gained in the application of the International Classification of Primary Care to the secondary coding of patient RFEs by a group of trained allied health professionals in a central coding situation. However, considering the effort required to achieve these results, it is questionable whether data coded by clinicians in multiple practices would show sufficient inter-coder reliability. The changes incorporated in ICPC-2 should aid more reliable code selection in the future.

 

ACKNOWLEDGMENTS

The study on which this work is based was generously supported by a grant from the Australian National Health and Medical Research Council and the (then) Australian Commonwealth Department of Human Services and Health, through the General Practice Evaluation Program. The paper could not have been prepared without the statistical assistance of Geoffrey Sayer and the administrative support of Donna McIntyre of the Family Medicine Research Unit.

 

REFERENCES

  1. Bridges-Webb C, Britt H, Miles DA, Neary S, Charles J, Traynor V. Morbidity and treatment in general practice in Australia 1990–1991. Med J Aust 1992; 157(Supplement):S1–S56
  2. Britt H, Harris M, Driver B, Bridges-Webb C, O'Toole B, Neary S. Reasons for encounter and diagnosed health problems: convergence between doctors and patients. Fam Pract 1992; 9:191–194
  3. Lamberts H, Woods M (eds). ICPC. The International Classification of Primary Care. Oxford University Press, Oxford, 1987
  4. Bridges-Webb C. Classifying and coding morbidity in general practice: validity and reliability in an international trial. J Fam Pract 1986; 23:147–150
  5. Morrell D, Gage HG, Robinson NA. Symptoms in general practice. J R Coll Gen Pract 1971; 21:32–43
  6. Gray D, Ward A, Underwood P, Fatovich B, Winkler R. Morbidity coding in general practice. Fam Pract 1989; 6:92–97
  7. Schneeweiss R, Rosenblatt R. Diagnostic clusters: a new tool for analysing the content of ambulatory medical care. Med Care 1983; 21:105–123
  8. Boyle RM, Schneeweiss R. Accuracy and reliability of ICHPPC-2 recording. J Fam Pract 1983; 17:922–928
  9. Anderson JE. Reliability of morbidity data in family practice. J Fam Pract 1980; 10:677–683
  10. Crombie DL. The problem of variability in general practitioner activities. In: Yearbook of research and development, 1989. Her Majesty's Stationery Office, London, 1990; 733–742
