

Developing a standardised language examination for medical purposes: Lessons learnt from the sTANDEM project

Mise en œuvre d’une certification standardisée en anglais médical : les enseignements du projet sTANDEM
Jean-Pierre Charpy, Didier Carnet et Michael Friedbichler
p. 9-27




The French government’s decree issued on the 3rd of April 2020, which stipulates that certification in English will be mandatory at Bachelor’s level from 2022 onward, clearly demonstrates that there is a real need for English language certification, not only nationally but also internationally. A seemingly effective way for French universities to implement the decree might be to resort to internationally recognised general English proficiency tests; the TOEFL (Test of English as a Foreign Language) or the TOEIC (Test of English for International Communication), for example, might be administered to all students, including those in domains such as law or medicine, which are well known for their highly specific language needs. This very situation and the current lack of tests in some specialised domains highlight the need for English for Specific Purposes (ESP) certification tests to be promoted and harmonised in France and beyond. This is why the lessons learnt from the sTANDardised language Examination for Medical purposes (sTANDEM), an EU-funded project in which D. Carnet, J.-P. Charpy and M. Friedbichler were involved, could prove useful in the near future.

Anthony Saber, editor: How was the sTANDEM project initiated and what were its objectives?

  • 1 Professor Barron († 2019), who was head of the Medical Communications Centre at Tokyo Medical Unive (...)

The sTANDEM project was officially launched in 2011 after preliminary exchanges between English for Medical Purposes (EMP) experts from six European countries and Japan. The original idea was the brainchild of the late Patrick J. Barron1. In January 2009, the group first met in Budapest to explore the options of setting up a special interest group and working toward a standardised international examination and certification system tailored to the needs of non-native speakers of English in the medical, pharmaceutical and nursing fields in Europe and beyond.

At the Budapest meeting, it was decided to submit the project for EU funding under the Erasmus Modernisation of Higher Education Programme. In addition, the group agreed to use existing bilingual tests in Japan and Hungary as a stepping stone. Unfortunately, the 2010 application was not selected for EU funding, the focus on English as the only target language being one of the main reasons why the project did not meet the award criteria. When the project was resubmitted in 2011, this time involving nine countries, the new application took into account the remarks made by the EU’s Executive Agency on the 2010 proposal. Notably, five examination sets were also to be developed, one each in French, German, Hungarian, Polish and Romanian.

The European Commission granted financial support for the project under the Lifelong Learning Programme (Key activity 2 - languages) for the years 2011-2014. The project was coordinated by the Jagiellonian University (Krakow, Poland) and the consortium involved EMP researchers and teachers from Austria, France, Hungary, Poland, Romania, and the United Kingdom, as well as an international panel of domain-specific experts from Japan, Malta, and the Netherlands.

In view of the widespread need to standardise the specialised language proficiency required for clinical practice, education and training, particularly in English, the original aim of sTANDEM was to assess language skills for medical purposes in order to facilitate mobility and improve communication and cooperation between healthcare professionals, patients and families.

Editor: Could you describe the methodology used to develop sTANDEM?

The methodology adopted by the consortium partners was based on medically oriented tasks, in keeping with an action-oriented approach. It emphasised the authenticity of the tasks to be performed by test takers (Douglas 2000) and the role of language users as socio-professional agents, and it favoured real-life situations within the context of actual communication (Bachman 1990, 2007). As a result, test developers had to make sure that the tests were in line with a communicative view of language and with the Common European Framework of Reference for Languages (CEFR). Genre-based specifications and detailed descriptors of language competence were defined to suit the needs of healthcare professionals.

To ensure a high level of authenticity and specificity, the consortium developed a test construct based on current research into the distinct features of specific purpose language competence, such as knowledge of EMP discourse, genres, moves, terminology and nomenclature, to mention but the most important ones. In the development and validation of the tasks and items produced for the sTANDEM exam, the relevance of these features was constantly monitored against corpus-informed materials to provide solutions that truly reflect the language used by professionals. A corpus-based study, on the basis of which a compendium of medical English usage was published, provided sound guidelines for both global and specific issues (Friedbichler/Friedbichler 2003).

Editor: Could you describe a brief timeline of the project?

The sTANDEM project comprised three distinct phases. The first one was a research-oriented phase. A detailed language needs analysis for medical purposes was carried out by the consortium partners in order to define the profiles of future test takers according to linguistic needs in the field of health care. More than 300 healthcare professionals, medical students and language experts all over Europe answered three different questionnaires. The statistical analysis of the various language needs led to the production of a Manual for sTANDEM Test Developers and Item Writers (Rebek-Nagy 2012). This handbook was meant to serve as a common framework of reference in the process of test development and administration of sTANDEM tests. Its purpose was to ensure the validity (i.e. that the tests measured the skills they aimed to measure) and reliability of the sTANDEM certification system.

The second stage was a development phase in which 24 sets of examination papers in English (plus one set each in five other European languages) were developed by partners working in tandems. These comprised reading papers (Romania + UK team), writing papers (Poland + Austria), listening papers (Austria + Poland) and speaking papers, including oral interaction (Romania + UK). Four examination sets were developed at B1 level, ten at B2 and ten at C1.

  • 2 At Skype SVB meetings, usually convened on a monthly basis, two teams of experts comprising each a (...)
  • 3 Internal and external validation of the receptive skills papers was carried out by pre-testing and (...)

All exam papers systematically went through a two-step face validity investigation, first by the tandem team, then by the two Hungarian chief examiners, before being validated by a Social Validation Board (SVB) under the responsibility of the French partners2. Before examination centres were established in Europe, pre-testing sessions were organised in the consortium centres so as to obtain statistical data confirming the construct validity of the tests (i.e. how well the testing method actually measures the features it is supposed to measure). After internal and external validation3, master copies were formatted, edited, and finalised.
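The statistics gathered at such pre-testing sessions are typically classical item statistics: facility (the proportion of candidates answering an item correctly) and point-biserial discrimination (how sharply an item separates strong from weak candidates). Purely as an illustration, not as the consortium’s actual protocol, such an analysis might be sketched as follows:

```python
from statistics import mean, pstdev

def item_statistics(responses):
    """Classical item analysis of dichotomously scored pre-test data.

    responses: list of lists; responses[c][i] is 1 if candidate c
    answered item i correctly, 0 otherwise.
    Returns one (facility, point_biserial) pair per item.
    """
    n_items = len(responses[0])
    totals = [sum(candidate) for candidate in responses]  # raw total scores
    grand_mean = mean(totals)
    sd_total = pstdev(totals)
    stats = []
    for i in range(n_items):
        scores = [candidate[i] for candidate in responses]
        p = mean(scores)  # facility: proportion answering correctly
        if 0 < p < 1 and sd_total > 0:
            # Point-biserial: mean total of those who got the item right,
            # compared with the overall mean, scaled by the score spread.
            mean_correct = mean(t for t, s in zip(totals, scores) if s == 1)
            r_pb = (mean_correct - grand_mean) / sd_total * (p / (1 - p)) ** 0.5
        else:
            r_pb = 0.0  # undiscriminating or degenerate item
        stats.append((p, r_pb))
    return stats
```

Items with very low facility or near-zero discrimination would then be flagged for revision before the master copies are finalised.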

The final phase was a testing phase during which two more handbooks were developed: one for the training of sTANDEM test assessors (Warta 2013) and one to inform future sTANDEM test takers (2014). Following the protocols agreed upon in five executive conferences of the consortium members, examiners were trained either in face-to-face or online tutorials. In 2015, after the European Commission had granted a one-year extension to the project, exam sessions were administered in a total of 20 centres all over Europe (in Austria, France, Germany, Greece, Hungary, Malta, Poland, Romania, Spain, the Czech Republic and the United Kingdom). Interestingly enough, two non-EU countries (India and Turkey) also hosted the tests. Overall, some 300 test takers who successfully passed the exam received an official sTANDEM diploma. The certificate contained information on the CEFR level (B1, B2 or C1), the professional module (general medicine, pharmacy, nursing) and the examination type (written and/or oral papers) each candidate had taken. It also included a Statement of Results to help test takers better understand their scores and to give potential employers, recruitment or admissions officers and other stakeholders more insight into the proficiency in professional medical communication they could expect from the candidates. In the Candidate Proficiency Profile, the scores achieved in each paper were documented separately and the grades for overall achievement were specified as follows: PASS WITH DISTINCTION (overall score: 90-100), PASS WITH MERIT (75-89), PASS (60-74), FAIL (0-59).
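The grade bands listed above map directly onto overall scores, so the certificate’s overall grade can be written as a simple threshold function (a minimal sketch; the function name is ours, not part of the sTANDEM documentation):

```python
def standem_grade(overall_score):
    """Map an overall score (0-100) to the sTANDEM grade band
    printed on the Statement of Results."""
    if not 0 <= overall_score <= 100:
        raise ValueError("overall score must be between 0 and 100")
    if overall_score >= 90:
        return "PASS WITH DISTINCTION"
    if overall_score >= 75:
        return "PASS WITH MERIT"
    if overall_score >= 60:
        return "PASS"
    return "FAIL"
```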

Editor: Unfortunately, the sTANDEM project did not prosper, and is not available to test takers today, although the need for medical English certification is widely acknowledged. Ultimately, why was the project not sustainable? What were the project’s merits and its possible shortcomings?

  • 4 As opposed to first-generation tests (characterised by a lack of objectivity and reliability), seco (...)

One of the major achievements of the sTANDEM project undoubtedly lies in the fact that it explored the then largely uncharted territory of EMP certification. It resulted in the conception and implementation of a prototype suite of language proficiency exams designed to set an international standard for the medical professions, one that could prove useful for future test developers as a fourth-generation test4 (Tardieu 2013): action-oriented, close to authentic professional situations, and focused on useful language skills.

At the time, EMP certification for healthcare professionals was limited to a few national tests such as the OET (Occupational English Test) developed for Australian needs by the University of Melbourne, the CLES 3 (Certificat de Compétences en Langues de l’Enseignement Supérieur) developed in France and the PROFEX tests in Hungary. By capitalising on these test constructs and extending their national scope, sTANDEM’s main goal was to cater for healthcare professionals worldwide, for whom the command of specialised English has become a key pre-requisite for carrying out clinical, experimental or epidemiological research and taking part in international communication. In today’s global world, the quality of interactions in English among professionals from different parts of the world is an obvious factor of successful and effective communication.

The international scope of sTANDEM manifested itself in the selection of reading and listening texts, which comprised samples from all major varieties of English from around the globe, most notably British, American, and Australian sources. This was made possible by the generous support of several prestigious medical schools and institutions, e.g. the Johns Hopkins School of Medicine/Baltimore (USA), the World Health Organisation, Podmedics/London (GB), Emergency Medicine Cases/Canada, Mayo Clinic/Minnesota (USA), the New England Journal of Medicine (NEJM), or the Virtual Medical Centre/Perth (Australia), which offered assistance and provided sTANDEM test developers with authentic testing materials.

In the early stages of the project, the language needs analysis conducted among medical students, healthcare professionals and EMP teachers was a key asset for the development of a common frame of reference for certification in medical English, which prompted a rewriting of the CEFR descriptors to tailor them to EMP needs.

Another notable success of the project was the intense cooperation of EMP experts from a dozen different countries and various European universities over a period of three to five years, which was made possible by the financial support of the EU. In this respect, the Social Validation Board, which promoted close collaborative work between EMP teachers and researchers, healthcare professionals and experts from other domains (biology, applied linguistics, etc.) proved to be a major step forward in cross-cultural communication. In the same way, during the development phase, the fact that partners worked in international tandems was mutually enriching.

On top of that, sTANDEM guidelines and activities were disseminated worldwide by a number of associated partners, among which were GERAS (Groupe d’Étude et de Recherche en Anglais de Spécialité) in France, International Medical Publications in the Czech Republic, and the Ioannina Medical School in Greece. Although the sTANDEM project did not outlive the 2015 exam sessions, the tests themselves proved a resounding success with stakeholders across borders, as evidenced by the highly favourable informal feedback from both test takers and test examiners.

Unfortunately, the other side of the coin with this fledgling certification system was that consortium members were unable to overcome several pitfalls. First among them was the original selection of consortium members, which was partly based on co-option, goodwill and personal ties, owing to time pressure and a lack of other options. This resulted in a heterogeneous group of partners: stakeholders in the project had diverse interests and varying degrees of involvement and competence.

In the course of test development, coordination problems, inconsistent quality control and the half-hearted involvement of some partners became recurrent matters of debate. Besides, red tape issues with the European Commission and partner institutions in various countries slowed down the development of tests, resulting in a one-year extension of the project.

On the one hand, collaborative work such as the international cooperation of EMP experts based on multinational standards proved to be a unique strength, widely perceived as holding great promise for the healthcare community across the globe. On the other hand, it also proved a weakness, since the harmonisation efforts and lofty goals of the project were eventually undermined by more nationalistic agendas. As a result, when it came to making the project sustainable in the long run, consortium members failed to set up a business plan or to find private investors and reliable sponsor institutions once the EU’s financial support was discontinued. By way of illustration, when some partners asked the academic authorities that had endorsed the project until then for post-funding financial help, the answer they got was: “Why don’t you create your own startup?”.

Editor: In EMP testing, should one assess medical knowledge, medical terminology or communicative skills, or all of them (or something more specific, such as the ability to show empathy, for example)?

While content knowledge is reputed to distort test results in general proficiency tests, the sTANDEM examination construct was based on the widely held understanding that content knowledge is a pre-requisite for successfully eliciting performance in ESP testing. This is confirmed by Douglas, who is in line with practically all authorities in the field when he describes domain-specific professional competence as "inextricably linked to language performance in those fields" (2013: 369) and postulates:

It should always be a part of the construct of specific purpose tests that learners’ specific purpose language needs include not only linguistic knowledge but also background knowledge relevant to the communicative context in which learners need to operate. (2013: 371)

In EMP testing, as is the case in all ESP domains, it is the test takers’ language skills and their ability to communicate effectively in their professional environment that are to be assessed, not domain-specific knowledge per se. In EMP certification, medical knowledge is to be regarded as a basic pre-requisite, as stated by Warta in the introduction to the Manual for sTANDEM Test Assessors (2013):

In addition to authenticity, LSP tests, as opposed to the so-called general language tests, are also required to provide for interaction between language knowledge and specific purpose content knowledge. While in general language tests content knowledge is widely regarded as a factor distorting test results, in LSP testing it is a pre-requisite for successfully eliciting LSP performance. Although the two types of knowledge are difficult to separate, it must be stated clearly that the focus of assessment can only be language knowledge, not specific purpose content knowledge.

This is why it is crucial to define language profiles in the field of health care through a careful needs analysis campaign involving professionals, students, experts and linguists before even thinking of developing EMP tests. Basically, the same five skills should be tested in ESP/EMP certification as in general language proficiency tests, albeit in the specialised domain of health care. This certainly includes the selective use of medical terminology, which implies a good command of terms related to diseases, symptoms, treatments, surgical procedures, etc.

From a linguistic viewpoint, it is also of great interest to take into account the distinction between purely medical terms, as used by physicians and medical students when they meet or work in a professional setting, and the lay terms doctors resort to when they want to make themselves understood within the context of a doctor-patient conversation. Thus, when a patient consults a doctor, it is advisable for the clinician to use lay terms such as kneecap and whooping cough rather than patella and pertussis, the corresponding medical terms.

In the same way, specific medical and non-medical communicative skills such as the ability to take a patient’s history, to elicit tell-tale answers from the patient, to give factual information, to break bad news or to show empathy should be systematically included in EMP test tasks.

Within the sTANDEM project, the interaction between content and language knowledge was further specified. On the basis of a study by Clapham (1996), the role of content knowledge relative to language knowledge was defined depending on the CEFR level of the exam. In lower level exams (B1), the primary focus was on language knowledge, while requirements on content knowledge were kept down by using materials of relatively low specificity. Conversely, content knowledge was given a more prevalent role in higher level exams (C1) by selecting highly domain-specific texts which were more complex and demanding not only language-wise, but also content-wise.

For example, sTANDEM test takers were required to perform three distinct tasks to test their speaking skills in an action-oriented context. Task 1 consisted of an introductory conversation about the job and/or research field of the test taker (at all three levels). Task 2 was a simulation of a doctor-patient conversation focusing on history taking (level B2), and a simulated interview between a doctor and a patient discussing and/or explaining diagnostic, therapeutic or prognostic details (level C1). Task 3 consisted in commenting on a diagram, table or graph (level B2), or giving a short presentation on a professional issue (level C1).

Editor: In your view, what is the value of off-the-shelf general English certification tests used to assess the language proficiency of students receiving specialised instruction at university level?

It is important to draw the line between standards for generalist certification and minimal standards for professional certification. In the literature, both Douglas (2000) and O’Sullivan (2012) describe certification tests as a continuum of specificity, ranging from very general to very specific tests. Thus, general English exams such as the TOEFL or TOEIC certify that students have reached a certain level in the linguistic and communicative skills needed in everyday life and common workplace situations, but rarely in specific professional settings (Fries-Verdeil 2009, Charpy & Carnet 2014). Similarly, among professional certification tests, the OET, which has been developed for a total of twelve healthcare professions ranging from dentistry to veterinary science, can be regarded as a suite of broader tests situated closer to the general end of the ESP continuum (O’Sullivan 2012), whereas narrow tests such as the ones developed in the course of the sTANDEM project were closer to the more specialised end of the continuum.

There is ample evidence in the literature on language testing and ESP assessment that general English tests are not suitable for assessing the linguistic and professional needs of specialised communities. For example, Eggly et al. (1999) report that many medical graduates who scored highly on the TOEIC were found by both colleagues and patients in clinical settings to display weaknesses in their professional communicative abilities. Similarly, Wette (2011) discusses the fact that, although they have met the English requirements, many overseas-qualified healthcare professionals report experiencing difficulties as they endeavour to acquire professional communicative competence. Douglas draws on O’Neill et al.’s study (2007) and concludes his review of ESP research by laying emphasis on the fact that "the use of a test designed for one purpose for another purpose without accompanying evidence supporting such a use, is questionable at best and at worst unethical" (2013: 377).

Over and above these fundamental concerns, the washback effect on ESP teaching and learning must not be overlooked. Using off-the-shelf general English tests for assessing a candidate’s ability to use language precisely in authentic domain-specific contexts would definitely have a highly problematic impact on specific purpose language learning and teaching. As will be explained in more detail below, there is reason to assume that such tests would do more harm than good when used to assess the language proficiency of ESP students.

To sum up, it cannot be denied that, although general English certification tests are efficacious when it comes to evaluating the language skills of students in general or academic settings, their lack of accuracy and validity in the assessment of the language proficiency of students receiving specialised instruction is fraught with all sorts of ethical issues and genuine docimological concerns. This is why there is an urgent need for the development of specialised tests based on authentic professional situations, all the more so as ESP researchers and teachers are increasingly aware of the various kinds of language assessments used to determine a candidate’s abilities to function linguistically in the workplace (Knoch & Macqueen 2020).

Editor: Should ESP certification tests be based on the standards and levels of the CEFR or should they be adapted to fit specialised communicative needs and settings encountered in domain-specific contexts?

The CEFR and its accompanying guide (Milanovic 2002) have long been regarded as a gold standard in language certification in Europe (Bruderman et al. 2012). Today, they still seem to be the best tool to provide a solid basis, even for specialised language proficiency testing.

CEFR levels, which have achieved international recognition, may be useful, even within the context of ESP certification, to take into account varying degrees of language and professional skills. Although CEFR descriptors, particularly at level C1, sometimes do refer to professional contexts, they are mostly too vague (as shown by the frequent reference to "the candidate’s particular fields of interest" at levels B1 and B2) and often irrelevant, as demonstrated by Charpy & Carnet (2014). In the same way, Fries-Verdeil suggests that:

In practice, however, making the mosaic of the specific aims that characterise English for special purposes coherent with the plethora of CECRL descriptors is not an easy task, especially as these descriptors are often either inexistent, or inappropriate. (2009: 118)

  • 5 Some of the sTANDEM descriptors are accessible in Charpy & Carnet (2014).

This is why specific criteria and descriptors have to be defined in ESP certification. In the course of the sTANDEM project, CEFR descriptors were revisited and more professionally-based specifications5 were provided to make sure that the tests were adapted to the needs and settings typically encountered by healthcare professionals in the fields of medicine, pharmacy and nursing.

It is our view that future test developers working in the field of ESP certification should use CEFR descriptors and specifications as a basis for further progress by amending and adapting them to meet the needs of medical students and healthcare professionals. Inventory lists of clearly-defined items pertaining to particular CEFR proficiency levels such as specific lexis, structural features and genres would definitely help them achieve that goal.

Editor: Considering that the development of ESP certification tests is a time-consuming and therefore costly task, how can the costs be contained?

There cannot be any doubt that developing high-stakes certification testing suites takes a lot of time, know-how and perseverance, particularly if they are to be internationally-oriented. It actually took consortium members four years to complete the sTANDEM testing system. And it also took a lot of precious EU money (398,101 euros, to be precise) to fund the project.

On paper, developing tests in international tandems was an excellent approach, intended to avoid national bias and promote cross-cultural harmonisation in terms of both medical and linguistic standards. However, with hindsight, some tandems proved to be dysfunctional, which led to unnecessary discussions and the rather unfortunate one-year extension of the project. This largely contributed to one of the more costly issues encountered over the project’s lifetime, namely the constant need to revise and fix minor problems, which kept other partners from working on their own assignments in the project. Given the institutional framework, questioning the fitness of, or even replacing, a partner who failed to deliver work of the required quality or within the agreed timeline was not an option.

Concerning financial issues, EU funding stopped the moment the exam sessions were over. Since neither post-development funding nor a business plan had been on the project agenda, partners found themselves in dire straits and were subsequently unable to obtain further financial support from either the public or the private sector. This goes to show that solid financial support for the international dissemination of certification tests should be secured from either private or institutional investors from the very start of an ambitious certification project.

  • 6 Apart from subject-matter and context specialists, future certification projects should include ass (...)

Therefore, one of the key lessons learnt from the sTANDEM experience is to make sure that future international testing systems are implemented with reliable experts6 in the most pragmatic and cost-effective way. International expertise in ESP testing, including from well-established institutions such as the Association of Language Testers in Europe (ALTE), should be involved throughout the process. Furthermore, the cooperation of international professional bodies in the domain must be enlisted. To promote cost-effectiveness, project managers should endeavour to select and recruit a good mix of competent, enthusiastic ESP instructors guided by experienced testing experts in charge of running the exam.

  • 7 According to Cambridge English Assessment, "Linguaskill Business tests English used in a business a (...)

Other cost-effective measures might consist in building on existing ESP exams in the domain, such as the CLES in the case of EMP, and upgrading them into the desired international format. In the case of domains where existing ESP exams are on offer, e.g. Cambridge English: C1 Business Higher or Linguaskill Business7, adopting them would definitely be preferable to using one of the general English certification tests on the market.

  • 8 Methods used to determine the passing percentage, or cutscore, for a test. The passing grade of a t (...)

By tailoring and streamlining the test construct to the bare necessities that guarantee the required quality criteria, test developers could also save both time and money. For instance, by replacing the development of an extra set of exam papers for each CEFR level with either a multilevel or a single-level construct, test development timelines could be reduced dramatically. Another option for containing costs might be to select a cost-effective validation protocol: when it comes to standard setting, the Angoff or the Bookmark method8 could be used instead of the statistical analysis of pre-testing results, which in the sTANDEM experience proved quite time-consuming and lacking in efficiency. Indirectly, the costs of developing an ESP exam suite might be counterbalanced in the long run by reusing past examination papers in examination preparation materials and course books.
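For illustration, the modified Angoff procedure mentioned above asks a panel of judges to estimate, for each item, the probability that a minimally competent candidate would answer it correctly; the cut score is then derived from these estimates rather than from pre-testing statistics. A minimal sketch, with hypothetical ratings:

```python
from statistics import mean

def angoff_cutscore(judge_ratings):
    """Estimate a cut score with the modified Angoff method.

    judge_ratings: list of lists; judge_ratings[j][i] is judge j's
    estimate (0.0-1.0) of the probability that a minimally competent
    candidate answers item i correctly.
    Returns the cut score as a percentage of the maximum raw score.
    """
    n_items = len(judge_ratings[0])
    # Each judge's implied passing score: the sum of their item estimates.
    per_judge = [sum(ratings) for ratings in judge_ratings]
    # The panel's cut score is the mean across judges, as a percentage.
    return 100 * mean(per_judge) / n_items
```

With two judges rating a two-item paper at (0.8, 0.6) and (0.6, 0.6), the implied cut score would be 65% of the maximum raw score.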

Finally, it is critical to enhance the testing system’s effectiveness, longevity and profitability by making sure that the project receives long-term financial support, either from universities via their departments of continuing education or from private companies, including startups involving a range of subject-matter experts, from ESP linguists to assessment experts and representatives of professional bodies in the domain.

Editor: Should ESP certification tests be tailored to curriculum-related learning outcomes or rather focus on the professional needs and the specialised discourse that learners are likely to encounter in their future careers?

This is a tricky question because two seemingly incompatible aspects have to be kept in mind. On the one hand, the test materials, especially the texts and genres as well as their medical specificity and linguistic complexity, definitely have to be in keeping with the test takers’ learning objectives at Bachelor’s level: in French medical schools, for example, specialisation begins in the third year. On the other hand, the language proficiency required for education and training, clinical practice and international communication, which is obviously more specific and complex than the study materials pre-med students are confronted with on a regular basis, must definitely be a key target too. In other words, particular emphasis should also be laid on the professional needs and the specialised discourse that students are likely to encounter in their future careers. However, the English language proficiency of the large majority of medical students at Bachelor’s level is likely to be at CEFR level B2 at best, whereas professional discourse standards are mostly at a higher level.

Looking back at the sTANDEM project may also help shed some light on this issue. When working on the assessment of reading skills, consortium partners decided to focus on textbook extracts and information leaflets at level B1. At levels B2 and C1, which generally correspond to the language proficiency of students taking their Bachelor’s degree, the focus was on specialised case reports and different types of work-related documents (B2) and on informed consent statements and research papers (C1).

45This seems to create a dilemma when designing ESP test constructs, both language- and content-wise, and it is not confined to the medical domain. According to the ALTE guidelines mentioned above, the criteria used in ESP testing need to be clearly stated:

Where the test is designed to integrate the language and the content more robustly, as would be the case in a highly specific test, then clear lines should be drawn between criteria that focus on language and those that focus on content. (2018: 18)


46Several best practice models demonstrate how this impasse can be overcome, namely by implementing a mixed two-level approach which integrates B2 and C1 level tasks in the exam. This solution was first applied several years ago in the International Legal English Certificate (ILEC) and the Cambridge International Certificate in Financial English (ICFE) examination suites.9 In the medical domain, the same approach is used in the more recent dual-level German language examination for doctors from a different language background seeking employment in German hospitals; this test (Deutsch B2·C1 Medizin Fachsprachprüfung) is offered by Telc GmbH, a subsidiary of the German Adult Education Association. It makes allowances for specialised texts and genres associated with distinct CEFR levels, as shown by the following extract10 from their site:

In the [...] written exam general and workplace-related German language situations at level B2 are tested. In the oral exam and the composition of a doctor’s note specific medical situations at level C1 is (sic) the focus of attention.

47Another approach that might help overcome this issue lies in the rating and assessment construct: abandoning the pass-fail system in favour of grading test takers’ language proficiency along a continuous band, as demonstrated by the Cambridge Scale recently introduced by Cambridge English Language Assessment. Such a scale can represent performance across a wider range of language ability than a single-level exam, with the additional benefit of better meeting the needs of candidates and stakeholders.

48To sum up, as there is a considerable overlap between the two objectives implied in this question, we suggest a mixed approach. Ideally, both the students’ curricular learning objectives and their future professional needs should be targeted.

Editor: How can ESP certification tests promote a positive washback effect and contribute to the quality of language instruction in specialised domains?

49The phrase “washback effect” refers to the impact of testing on curriculum design, teaching practices, and learning behaviours (Alderson & Wall 1993). Ideally, best teaching and learning practices result from positive washback, also known as washforward (Wall & Horak 2008).

50As already stated, situations in which the language proficiency of medical students is certified by off-the-shelf general English tests ought to be avoided at all costs because they would have a very negative, even counterproductive washback effect.

51The findings of Eggly et al. (1999) and Wette (2011) mentioned above are confirmed by a study by Ajideh (2011), who compared the scores medical students obtained in general English tests with those they obtained in EMP tests. The results showed that students who scored higher in one test did not do so in the other, which suggests that the two tests assess quite disparate aspects of language proficiency.

52On a pragmatic level, the discrepancy described by Ajideh is quite tangible when one looks at lexical chunks typically found in medical contexts and compares them to their occurrences in general English texts. Let us consider, for instance, the phrase ‘rebound tenderness’. In EMP contexts, the phrase describes a tell-tale sign that plays a key role in the diagnosis of acute appendicitis. In general English exams, by contrast, ‘rebound’ is a lexical item that might be encountered in texts related to basketball and ‘tenderness’ is more likely to appear in a romantic novel or in the description of the services of a steak house. Needless to say, there is hardly any helpful semantic overlap that might give test takers a clue when trying to transfer meaning from medical to general English contexts and vice versa.

53As a result, it must be assumed that the exclusive use of general English tests to certify the language proficiency of medical students would practically ruin any effort leading to adequate EMP teaching.

54Given the dynamic interaction between language testing, learning and teaching, state-of-the-art EMP certification tests that are authentic, in that they actually reflect domain-specific language profiles and needs, definitely provide the best substrate for enhancing language instruction intended to promote the professional, discursive and cross-cultural skills required for effective communication in professional settings. As pointed out by Knoch & Macqueen:


One of the key considerations in the development of LAPPs11 is that of authenticity. Authenticity is important to consider because it provides test-takers the opportunity to use language in a very similar task environment to that in the real world, with the hope that performances are truer representations of what test-takers would achieve in the real world. Authenticity is also considered to help promote positive washback as test candidates are provided with awareness of their future work environment. (2020: 101)

55The washforward to be expected from such testing practices is enhanced course design and better educational practices on the teaching side, as well as higher motivation and improved language skills among students. Although the project was nipped in the bud, this effect was readily observable in the sTANDEM experience.

56In this context, it must not be forgotten that, in some domains such as aviation or medicine, the professional exchange of real-time information, advice and opinions between experts and their colleagues or clients is associated with risks, as even the slightest misunderstanding can lead to the most tragic consequences. There is copious evidence in the literature (Flores et al. 2003; Benfield & Feak 2006; Moreno et al. 2007; Friedbichler et al. 2008) that it is in these domains that inadequate professional language skills are particularly detrimental and, in some cases, can even cause devastating harm.

57This is illustrated by Nogue-Bonet (2019), who reports the case of a comatose teenager who was rushed to hospital by his Spanish-speaking family. They told healthcare staff that the patient was "intoxicado", which means in Spanish that he had ingested something that made him ill. The healthcare staff took the Spanish word to mean he was intoxicated and initiated treatment for substance abuse. In reality, he had sustained a brain haemorrhage that subsequently went untreated for more than two days. As a result of the linguistic misunderstanding and the untreated haemorrhage, the patient became quadriplegic. Had the correct diagnosis been made right away, the patient could have left the hospital without any major lasting sequelae.

58In our view, a positive washforward effect for specialised certification tests can only be promoted by setting up a quality ESP certification construct that reflects the needs of key stakeholders in the professional field, by capitalising on best practice models, and by paying special attention to the adequacy and authenticity of the test tasks.

59At the end of the day, tests specially designed for ESP certification could also provide clear objectives and guidelines for ESP teachers who sometimes lack specific training to teach domain-specific courses. Another collateral effect of washforward lies in the fact that domain experts might be encouraged to publish textbooks with relevant teaching material based on past examination papers. And last but not least, positive washback could progressively lead to the international harmonisation of ESP teaching practices by highlighting the discursive and communicative needs of students and professionals all over the world.

Editor: How can ESP teachers be formally trained and certified to implement and assess certification tests?


60Existing general English certification models can be used as a starting point for discussion. For instance, IELTS (International English Language Testing System)12 requirements for test examiners are extremely demanding, as shown by the following instructions taken from their site:

  • future examiners should have a recognised language teaching qualification and proof of substantial, relevant, recent teaching experience ideally equivalent to at least 1800 hours.

  • future examiners are invited to complete Induction, Training, and Certification of Procedure and Assessment modules.

  • quality assurance modules may be delivered by online self-access, by telephone and/or via face-to-face meetings.

61To address the issue of EMP certification, let us now have a look at the sTANDEM tests. Basically, consortium members followed the best practice models set by state-of-the-art general English tests. Pre-testing sessions were organised by consortium members in their own examination centres so as to obtain statistical data confirming the construct validity of the testing procedures. As mentioned previously, a manual for the training of sTANDEM test assessors was compiled. Future examiners in prospective European examination centres were formally trained and certified in 2014 under the responsibility of senior consortium members; the training took place either on location, as in Dijon or Innsbruck, via videoconferencing, as in Graz (Austria), or in face-to-face meetings organised before the examination sessions, as was the case in Créteil and Angers (France).

62Assessors were required to hold a valid sTANDEM Test Assessor Certificate for the given examination period. To become a qualified sTANDEM assessor, future examiners were required to hold at least a Bachelor’s degree in linguistics or a branch of health sciences and have at least one year’s experience in EMP teaching or testing. They were also asked to sign a declaration of confidentiality and a declaration of secrecy.

63Considering that the training and validation of ESP teachers as future test examiners is vital for the success of certification tests, it is important to inform and train them well ahead of the examination sessions. This is why any future consortium would do well to make a handbook available for the training of examiners and the implementation of future tests. Such a handbook should rest on the observation of best practice models (e.g. ALTE guidelines, Cambridge English Language Assessment, TOEFL, etc.) and standardised testing based on statistical data to confirm the construct validity of the tests. Pragmatically speaking, most of the training should be done online.

Editor: What docimological criteria should the tests be based on to ensure quality ESP certification?

64In an early attempt to define docimology, De Landsheere (1971) stated that it is a discipline which has as its object the systematic study of examinations, in particular their grading systems and the proper conduct of examiners and examinees.

65In the sTANDEM project, the docimological cornerstones as well as detailed instructions were laid down in three manuals: one for test developers, one for examiners and one for candidates. One of the conventional features of the rating procedure in every segment of the sTANDEM testing system was that the test takers’ performances were assessed by two trained assessors, who had specific assessment scales at their disposal. However, unlike various general English tests, consortium members adopted a holistic approach to the five targeted skills, since sTANDEM certification did not depend on the score achieved in each of the five skills separately; they introduced a form of compensation to make sure that test takers were not penalised if they were weaker in any one skill. By way of illustration, a minimum requirement of 40 percent was introduced for each paper (speaking and listening comprehension, which constituted the oral test, and writing and reading comprehension, which constituted the written test), while the overall requirement was 60 percent for the written and the oral tests taken together.
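The compensatory scoring model described above can be sketched in a few lines of code. The thresholds (40 percent per paper, 60 percent overall) are taken from the interview; the function name, the data structure and the equal weighting of the four papers are illustrative assumptions, not part of the sTANDEM specification.

```python
# Hypothetical sketch of the compensatory pass/fail rule described above:
# each of the four papers must reach at least 40 %, while the oral and
# written tests taken together must average at least 60 %. Thresholds come
# from the interview; names and equal weighting are our own assumptions.

PAPER_MINIMUM = 40.0    # per-paper floor, in percent
OVERALL_MINIMUM = 60.0  # required average over all four papers

PAPERS = ("speaking", "listening", "writing", "reading")

def passes_standem(scores: dict) -> bool:
    """Return True if the candidate passes under the compensatory model.

    `scores` maps each paper name to a percentage, e.g.
    {"speaking": 55, "listening": 70, "writing": 48, "reading": 80}.
    """
    # Rule 1: no single paper may fall below the 40 % floor.
    if any(scores[p] < PAPER_MINIMUM for p in PAPERS):
        return False
    # Rule 2: the overall average must reach 60 %, so a strong paper
    # can compensate for a weaker (but passing) one.
    overall = sum(scores[p] for p in PAPERS) / len(PAPERS)
    return overall >= OVERALL_MINIMUM

# A weak writing paper (48 %) is compensated by a strong reading paper (80 %):
# all papers clear the 40 % floor and the average is 63.25 %.
print(passes_standem({"speaking": 55, "listening": 70,
                      "writing": 48, "reading": 80}))  # True
```

The point of the two-rule structure is precisely the holistic compensation the authors describe: a candidate cannot pass on averages alone (the floor catches a collapsed skill), but a single weaker paper does not automatically fail them.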

66Although we were not able to conduct a scientific analysis of the sTANDEM exam results because the project was discontinued after the first sessions, we can confirm that it is critical to provide a manual for test assessors and another for test takers. This should make it possible to define docimological criteria that lead to best practices and flawless protocols. Domain-specific certification must not be purely business-oriented; it should be based on sound, internationally-recognised models, from the process of test design and development all the way to the procedures for grading, rating and monitoring. Thus, the following passage taken from ALTE’s guidelines is of particular interest:

Rating productive performance in LSP tests might pose greater challenges than general language tests. In the case of highly specialised tests it might be useful to involve subject and context experts as well as assessment experts. The necessary degree of cooperation might depend on the degree of specialisation of the test and on the degree of jargon typical of the subject area. (2018: 17)

67To ensure quality certification, we believe that subject and context experts in specialised domains, in other words professionals and ESP teachers, should play a major role in the definition of solid docimological criteria. Similarly, pre- and post-testing auditing by international experts in the field should be considered to monitor the quality of tests and guarantee that best practice models are being implemented. Once again, this should be done in keeping with recent recommendations for the development of ESP tests published by ALTE in 2018.



Ajideh, Parviz. 2011. “EGP or ESP Test for Medical Fields of Study”. Journal of English Language Teaching and Learning 3/7, 19–37.

Alderson, Charles & Diane Wall. 1993. “Does Washback Exist?”. Applied Linguistics 14, 115–129.

ALTE. 2018. Guidelines for the Development of Language for Specific Purposes Tests, retrieved from <> on 11/08/2020.

Bachman, Lyle. 1990. Fundamental Considerations in Language Testing. Oxford: Oxford University Press.

Bachman, Lyle. 2007. Statistical Analyses for Language Assessment. Cambridge: Cambridge University Press.

Benfield, John & Christine Feak. 2006. “How Authors Can Cope With the Burden of English as an International Language”. Chest 129/6, 1728–1730.

Bruderman, Cédric & Christine Demaison. 2012. « Le CECRL : un outil pour construire une politique des langues ? Retour d’expérience sur l’évaluation et la certification à l’université UPMC (2009-2011) ». Cahiers de l’APLIUT 31/3, 31–41. <>

Charpy, Jean-Pierre & Didier Carnet. 2014. “The European sTANDEM Project for Certification in Medical English: Standards, Acceptability and Transgression(s)”. ILCEA 19. <>

Clapham, Caroline. 1996. The Development of IELTS: A Study of the Effect of Background Knowledge on Reading Comprehension. Cambridge: University of Cambridge Local Examinations Syndicate.

Council of Europe. 2001. "Common European Framework of Reference for Languages: Learning, Teaching, Assessment", retrieved from <> on 14/06/2020.

De Landsheere, Gilbert. 1971. Évaluation continue et examens : précis de docimologie. Bruxelles et Paris: Labor/Fernand Nathan.

Douglas, Dan. 2000. Assessing Languages for Specific Purposes. Cambridge: Cambridge University Press.

Douglas, Dan. 2013 "ESP and Assessment". In Paltridge, Brian & Sue Starfield (eds.). The Handbook of English for Specific Purposes. Boston: Wiley-Blackwell, 367–383.

Eggly, Susan, Joseph Musial & Jack Smulowitz. 1999. "Research and Discussion Note the Relationship between English Language Proficiency and Success as a Medical Resident". English for Specific Purposes 18, 201–208.

Flores, Glen, Michael Laws & Sandra Mayo. 2003. “Errors in Medical Interpretation and their Potential Clinical Consequences in Pediatric Encounters”. Pediatrics 111/1, 6–14.

Friedbichler, Ingrid & Michael Friedbichler. 2003 [2016]. KWiC–Web Fachwortschatz Medizin Englisch Sprachtrainer & Fachwörterbuch in einem KWiC – Key Words in Context. Stuttgart : Thieme.

Friedbichler, Michael, Ingrid Friedbichler & Jens Christoph Türp. 2008. « La communication scientifique à l’âge de la globalisation ». Schweizerische Monatsschrift für Zahnmedizin / Revue mensuelle suisse d’odonto-stomatologie 118/12, 1193–1212.

Fries-Verdeil, Marie-Hélène. 2009. « Mise en cohérence de l’anglais de spécialité et du CECRL en France : difficultés et enjeux ». ASp 56, 105–125. <>

Knoch, Ute & Suzie Macqueen. 2020. Assessing English for Professional Purposes. London & New York: Routledge.

Milanovic, Michael. 2002. Common European Framework of Reference for Languages: Learning, Teaching, Assessment: Language examining and test development. Strasbourg: Council of Europe Language Policy Division.

Moreno, Maria, Regina Otero-Sabogal & Jeffrey Newman. 2007. “Assessing Dual Role Staff-Interpreter Linguistic Competency in an Integrated Healthcare System”. Journal of General Internal Medicine 22/2, 331–335.

Nogue-Bonet, Betlem. 2019. “Ad Hoc Interpreters – A Risk in the Clinical Setting” <>. Retrieved on 11/08/2020.

O’Neill, Thomas, Chad Buckendahl, Barbara Plake & Lynda Taylor. 2007. “Recommending a Nursing-Specific Passing Standard for the IELTS Examination”, Language Assessment Quarterly 4/4, 295–317.

O’Sullivan, Barry. 2012. “Assessment Issues in Languages for Specific Purposes”. The Modern Language Journal 96, 71–88.

Rebek-Nagy, Gabor. 2012. Manual for sTANDEM Test Developers and Item Writers (unpublished). Pécs: Faculty of General Medicine, University of Pécs.

Tardieu, Claire. 2013. « Testing et certification ». ACEDLE 10/2, 237–251. <>

Wall, Diane & Tania Horak. 2008. “The Impact of Changes in the TOEFL Examination on Teaching and Learning in Central and Eastern Europe: Phase 2, Coping with Change”. ETS Report Research Series 2, i–105.

Warta, Vilmos. 2013. Manual for sTANDEM Test Assessors (unpublished). Pécs: Faculty of General Medicine, University of Pécs.

Wette, Rosemary. 2011. “English Proficiency Tests and Communication Skills Training for Overseas Qualified Health Professionals in Australia and New Zealand”. Language Assessment Quarterly 8/2, 200–210.



1 Professor Barron († 2019), who was head of the Medical Communications Centre at Tokyo Medical University at the time, was harbouring plans to launch an international medical communications association intended to be a cross-fertilisation hotbed harmonising the expertise of associations such as JASMEE (Japan Society for Medical English Education), EMWA (European Medical Writers Association) and EASE (European Association of Science Editors).

2 At Skype SVB meetings, usually convened on a monthly basis, two teams of experts, each comprising a test developer, an EMP teacher and a medical/scientific expert, assessed at least two different types of exam papers drafted by test developers working in tandem. Their conclusions were passed on to the chief examiners.

3 Internal and external validation of the receptive skills papers was carried out by pre-testing and traditional statistical analysis in one procedure. For external validation purposes, same-level tasks of the PROFiciency EXamination (PROFEX) administered by the University of Pécs (Hungary) were incorporated in the pretesting papers.

4 As opposed to first-generation tests (characterised by a lack of objectivity and reliability), second-generation tests (emblematic of the structuralist-psychometric approach, and disconnected from real life contexts) and third-generation tests (assessing linguistic skills in context and offering a global view of language proficiency).

5 Some of the sTANDEM descriptors are accessible in Charpy & Carnet (2014).

6 Apart from subject-matter and context specialists, future certification projects should include assessment experts as well as project management and executive positions.

7 According to Cambridge English Assessment, "Linguaskill Business tests English used in a business and corporate setting, and is most suitable for recruitment in organisations where employees are expected to be familiar with the language of business". The tests are currently being used in several French business schools.

8 Methods used to determine the passing percentage, or cutscore, for a test. The passing grade of a test cannot be arbitrary; it must be justified with empirical data.

9 Both domain-specific certificates were discontinued on economic grounds in December 2016.

10 Telc Deutsch B2·C1 Medizin Fachsprachprüfung is a German language test at competence levels B2 and C1 of the CEFR. <>. Retrieved on 14/07/2020.

11 LAPP stands for Language Assessment for Professional Purposes.

12 A general English certification aimed at improving the linguistic and communicative skills of migrants and higher education students; it is one of Cambridge English’s better-known products. Retrieved on 10/07/2020.


How to cite this article

Print reference

Jean-Pierre Charpy, Didier Carnet & Michael Friedbichler, “Developing a standardised language examination for medical purposes: Lessons learnt from the sTANDEM project”, ASp, 79 | 2021, 9-27.

Electronic reference

Jean-Pierre Charpy, Didier Carnet & Michael Friedbichler, “Developing a standardised language examination for medical purposes: Lessons learnt from the sTANDEM project”, ASp [Online], 79 | 2021, published online on 1 March 2022, accessed on 20 June 2024. URL : ; DOI :



Jean-Pierre Charpy

Jean-Pierre Charpy is a retired senior lecturer in English for Medical Purposes. He used to teach EMP courses at the Dijon School of Medicine. He is affiliated with research unit EA 4182 from Université de Bourgogne Franche-Comté. His research interests cover the Fasp (Fiction à Substrat Professionnel) genre, the diachronic study of medical discourse and the exploration of EMP certification. He is the co-author of L’Anglais des Spécialités Médicales (Ellipses 2015) and L’Anglais pour les Sciences de Santé (Ellipses 2019). Within the context of the sTANDEM project, he was convenor and co-head of the French-based Social Validation Board. He also supervised the organisation of sTANDEM tests in Angers and


Didier Carnet

Didier Carnet is a senior lecturer in the English Department of the Dijon School of Medicine, Université de Bourgogne Franche-Comté, France. He holds a PhD in English applied linguistics and is affiliated with the research unit EA 4182. His main research covers didactics, discourse analysis and applied linguistics in English for Medical Purposes. He is the co-author of L’Anglais des Spécialités Médicales (Ellipses 2015), L’Anglais pour les Sciences de Santé (Ellipses 2019), and L’Anglais de la LCA sans Galère (Ellipses 2019). Within the context of the sTANDEM project, he was co-head of the French-based Social Validation Board. He also supervised the organisation of sTANDEM tests in Créteil and Dijon.


Michael Friedbichler

Michael Friedbichler is a retired senior lecturer who used to teach English for Medical Purposes at Innsbruck Medical University, Austria. Since the 1980s, he has been involved in medical translation and has served as an English language publishing consultant for biomedical researchers. His research interests focus on corpus-based translation and lexicography as well as EMP teaching and certification. He is the co-author of several learner’s dictionaries in different medical domains and language pairs: Fachwortschatz Medizin Englisch (Thieme, Germany, 3rd ed. 2016), Fachwortschatz Zahnmedizin Englisch (Thieme, Germany 4th ed. 2019), Pinkhof Medisch Engels: KWiC-Web taaltrainer en vakwoordenboek voor onderwijs en onderzoek (BSL/Springer, the Netherlands 2009), and Dictionary of English Usage in Medicine (Medical View, Japan 2012). Within the context of the sTANDEM project, he headed the team developing the listening tasks, oversaw tandem validation of the writing tasks and piloted the prototype exam for medical German in cooperation with the University of Freiburg/


Copyright

The text alone may be used under the CC BY-NC-ND 4.0 licence. All other elements (illustrations, imported files) are “All rights reserved”, unless otherwise stated.
