- 1 This article is based on a paper read at the GERAS workshop (Anglais de spécialité) at the SAES Con (...)
The aim of this paper¹ is to examine the notions of phraseology and collocation in the field of English for Specific Purposes (ESP) and to recast these terms from the point of view of Systemic Functional Linguistics (SFL). Broadly speaking, phraseology involves the study of formulaic sequences of words, including idiomatic phrases and proverbial expressions, which stand in contrast to other more prosaic constructions in the language in that they have a highly conventionalised form and frame of reference. For example, the rhetorical impact of the phrase (to) cut (one’s) losses (cited in sample text T1 in the Appendix) cannot quite be captured by paraphrases such as: accept what one has lost and move on, stop doing something in order not to make a bad situation worse, etc. Whereas phraseology is phrase-oriented and rhetorical (involving a contrastive choice between marked phrases and their unmarked paraphrases), the notion of collocation is essentially word-oriented and cohesive: it refers to the extent to which the presence and meaning of a word ‘coheres’ with, or depends on, the presence of another word (or words) in the same stretch of text. For example, the noun loss refers to ‘debit, decrease in revenue’ in contexts such as to cut one’s losses and to make a loss, whereas loss refers to ‘bereavement, death’ when used in the context of verbs such as mourn, regret, suffer, etc.
In the first half of this paper, I contrast the traditional, lexicological approach to phraseology and collocation with the ‘lexicogrammar’ approach adopted by the proponents of Systemic Functional grammar (Halliday 1961, Halliday & Matthiessen 2004). The notion of lexicogrammar encompasses a much broader set of phenomena than are usually considered in mainstream lexicology. In the final sections of this paper, I demonstrate this by showing that it is possible to explore the lexicogrammatical properties of high-frequency, closed-class grammatical items (also called ‘small words’). My point is that individual grammatical signs not only enter into collocational relations, but also form relatively predictable and productive chains of expression, with one construction cascading into another. It can be shown that these extended lexical patterns are often unique to a particular register or genre. I would argue that the identification of such patterns should be a fundamental step in the systematic analysis of ESP texts. For demonstration purposes, throughout this paper I refer to examples taken from two related pieces of science writing on Genomic Imprinting (set out in the Appendix: T1, an extract from a popular science book by Dan Dennett; T2, an abstract from a research paper by David Haig).
The term lexicogrammar refers to two distinct but related notions: (1) the typical lexical and grammatical environment of a sign as it is habitually used in naturally occurring texts or ‘discourse’, and (2) the core stratum of ‘wording’ in Michael Halliday’s model of language, which serves to mediate between the lower stratum of ‘sounding’ (graphology/phonology) and higher ‘meaning’ (semantics/discourse). As this notion was first developed in the framework of Systemic Functional Linguistics (SFL) (Halliday 1961, Fries et al. 2002, Halliday & Matthiessen 2004), it is important to set out here some of the core features of the SFL approach.
One of the central tenets of SFL is that lexis (a structured system of signs which serves to organise the vocabulary of a language) and grammar (a structured system of choices which serves to organise sequences of signs into texts) are not different in nature, but rather form a unified stratum in the language: the lexicogrammar. A further central assumption of SFL, following Firth (1957), is that no aspect of lexis or grammar can be properly defined without reference to its typical context of use (or ‘co-text’), that is to say in actual stretches of text or discourse. It follows from this that SFL rejects the structuralist view that the abstract system of language (langue) is independent from language in use or discourse (parole). Rather, the language system is constantly interacting with and being shaped by different types of speech event (the ‘context of situation’) within a community of speakers (the ‘context of culture’). Another way of putting this, following Martin (2001), is to say that everything in language, from lexical items and grammatical constructions to whole texts, has evolved to express very specific discourse functions, in the form of situational ‘registers’ (the lexicogrammatical resources associated with a specific speech activity, such as impersonal expressions, nominal style, taxonomies of terms, etc.), as well as ‘genres’ (goal-oriented, culturally specific speech activities, such as conversation on a scientific topic, exposition in popular science, narration in a research article, etc.). It is this focus on the underlying communicative functions of language and the systemic choices that are made available by the language system that makes SFL distinct from other models of language.
It follows from what has just been written that the SFL viewpoint on phraseology and collocation is very different from that of mainstream lexicology. Lexicographers and other analysts typically conceive of phraseological phenomena in terms of a continuum that ranges from ‘free combinations’ at one end to ‘fixed phrases’ at the other. Here is how Howarth (1996) puts it:
[…] a ‘scale of idiomaticity’, ranging from the most freely co-occurring lexical items and transparent combinations to […] the most cast-iron and opaque idiomatic expressions. [… It] is desirable for purposes of efficiency to eliminate from the description those combinations whose co-occurrence can be accounted for by normal grammatical and syntactic processes. (Howarth 1996: 32-47)
The SFL approach is diametrically opposed to this view of language. Firstly, SFL assumes that any normal construction in the language can potentially be promoted to the rhetorical status of idiom, and there is thus no need to establish a separate category of phraseological unit outside the lexicogrammar (this point is discussed in terms of lexicalisation, below). Secondly, Howarth’s notion of free combination supposes that grammatical rules or structures operate independently from lexical signs or lexical relations. The lexicogrammar approach assumes instead that even the most mechanical or abstract grammatical process depends on lexical relationships and has a lexical realisation (e.g., the grammatical mechanism of ‘raising’ depends on cognitive, reporting verbs as in the pattern N has been {found, shown, thought} to V). In this respect, it is useful to return to Firth’s (1957) original conception of collocation, which states that all signs in the language are mutually dependent on and mutually defined by the other signs with which they are habitually used within actual stretches of text:
Words must not be treated as if they had isolate meaning and occurred and could be used in free distribution. (Firth 1968b: 18)
The collocation of a word or a ‘piece’ is not to be regarded as mere juxtaposition, it is an order of mutual expectancy. The words are mutually expectant and mutually prehended. (Firth 1957: 181)
The main objects of study from an SFL perspective are thus not phraseological units or grammatical constructions, but rather lexicogrammatical (LG) patterns (Stubbs 1995, Hunston & Francis 1998, Tucker 1998, Legallois & François 2006). Lexicogrammatical patterns have the following properties:
- an LG pattern is a predictable but also productive sequence of signs, which as a whole shares a stable, coherent frame of reference;
- an LG pattern can be composed of lexical signs, or more abstract signs, including grammatical morphemes and constructions;
- an LG pattern is composed of permanent ‘pivotal’ signs and a more productive ‘paradigm’, a feature which allows the pattern to be reformulated and integrated into other patterns and thus into on-going discourse;
- an LG pattern may extend over a long stretch of text; it may be discontinuous; and it may or may not be a syntactic constituent or phrase.
It is possible to explore some specific examples of LG patterns that occur in the research article abstract (T2, see Appendix), such as the sequences mount (a) response and gene + express. Using a Web browser, it is possible to find over 16,000 examples of mount a response in texts relating to molecular biology, including:
(1) Patients with muscular dystrophy mount immune response to dystrophin protein prior to gene therapy.
(2) Target cells however mount a response to such membrane damage...
(3) [...] the host might mount a response against the cancer cells...
(4) Pure-bred S. salar were susceptible but frequently mounted a response to G. salaris without eliminating the infection.
(5) We describe an HA-A1 melanoma patient who has mounted a spontaneous cytolytic T cell (CTL) response against an antigenic peptide encoded by gene MAGE-A3 and presented by HLA-A1.
- 2 Here I follow the usual SFL practice of using capitalised initials for semantic roles and grammatic (...)
From the point of view of the lexis, mount (a) response is a ‘lexicalised’ phrase, an extended lexical sign in which there is only a small degree of variation. For example, an on-line search for a passive sequence such as response (is, was) mounted reveals only four occurrences, suggesting that the pattern is relatively invariable. From a grammatical point of view, the pattern involves a Predicator² mount, which expresses a ‘light’ or generic Material Process (change, create), plus a Complement response which specifies the type of Process expressed by the verb (for Halliday & Matthiessen 2004, its semantic role is ‘Process Range’). Finally, it is important to point out that the pattern is not restricted to the pivotal elements mount (a) response: in its wider context, it also includes a relatively stable set of Subjects (cells, hosts, patients) and an ‘indirect’ Complement (introduced by against or to), which is in effect the main (Affected) Participant of the clause.
The sequence gene + express involves a much more productive set of LG patterns, as the following examples suggest (these are taken from the 500,000-word Pharmaceutical Sciences Corpus (PSC), reported in Gledhill 1995, 1997):
(6) Under these conditions, we did not detect PAF-R gene expression (Ma and Bazan, 2000).
(7) However, expression of the gene was not confined to the hair follicle, as the transgene phenotype included not only hair abnormalities, but also vertebral defects and bladder, liver and intestinal tumors.
(8) In the present study, we report our attempt to identify differentially expressed genes with respect to the confluence/proliferative status of MGH-U3 cells in culture.
(9) [...] level was determined semiquantitatively by calculating the ratio of density metric value from specific genes expressed in relation to the internal standard
(10) Results: the Muc2 mucin gene was expressed in middle ear mucosa of the control rats.
The signs gene + express occur in two basic LG patterns. The first involves a nominalisation, in which gene is a (pre-modifying) Classifier or (post-modifying) Qualifier of a nominalised Process (gene expression, expression of the gene). In these contexts, the emphasis is on the investigation or observation of a ‘metaphorical’ (nominal, static) process (we did not detect, was not confined to...). In the second pattern, gene is typically post-modified by an embedded passive clause, or is the Subject of a passive (examples 8-10). In these contexts, the emphasis is on explaining the physical or genetic location of a ‘congruent’ (verbal, dynamic) process. In both patterns, the implicit semantic role played by gene is not Agent but rather Medium (Halliday & Matthiessen 2004), the location or vehicle in which the self-regulating Process of expression takes place.
It is interesting to note that these examples represent two fairly typical perspectives that can be adopted in science writing. In the LG patterns typically associated with gene + express(-ed, -ion), there is no explicit Agent. In contrast, the LG pattern mount (a) response always involves an Agent: it is either the host’s cells, the host or more generally the patient. In the contexts above (1-5) mount (a) response appears to be a deliberately dramatic choice of expression, and in text T2 this fits in coherently with the other conflictual metaphors used throughout the rest of the text.
Having set out the main principles of the lexicogrammar approach, it is now worth revisiting the well-known terms ‘phraseology’ and ‘collocation’. One of the principal assumptions of the traditional lexicological approach is that phraseological phenomena generally correspond to lexical units. This is reflected in the terminology of phraseology studies, especially the phraseological unit (PU) – in contrast to ‘phraseologism’ and ‘phraseme’, which are used differently. The prototypical examples of PUs studied in the literature tend to be idioms (it’s raining cats and dogs), catchphrases (the rain in Spain stays mainly on the plain), proverbs (it never rains but it pours), and the like. These kinds of phrases are clearly essential to the cultural life of a language. However, examples such as these give the impression that PUs generally have an idiosyncratic structure or meaning. They also suggest that PUs correspond to fully-formed constituent phrases or clauses.
There have been few studies on phraseological units in ESP and science writing, at least in the traditional ‘idiom-oriented’ sense of the term. The exception perhaps lies in the areas of LSP, terminology and translation studies (Pavel 1993, Fiedler 2007). However, not all phraseological studies adopt this perspective, or indeed refer to phraseological units. An alternative approach has emerged in discourse analysis (Gréciano 1997, Tollis 2001, Gonzalez-Rey 2002, Gledhill & Frath 2007) and corpus-based lexicography (Moon 1994, Fernando 1996, Hunston & Francis 1998, Pecman 2005). On the basis of empirical evidence, these analysts emphasise the fact that idiomatic expressions change over time, have variable interpretations in on-going discourse, and are often reformulated or serve as the basis for new constructions. Similarly, analysts working in psycholinguistics and language acquisition (Wray 2002, Jones & Haywood 2004, Granger & Meunier 2008) refer to ‘formulaic sequences’, a term which can be applied to the invariable sequences encountered in children’s speech (allgone) or in conversation (d’you know what I mean?).
Rather than concentrate on the notion of ‘idiomaticity’ or on specific types of phraseological phenomena, it may be more relevant to those working in the SFL perspective and areas such as ESP to refer to more general, underlying processes. An important notion to emerge recently in cognitive and comparative linguistics involves lexicalisation, the historical process of language change in which a sequence of signs gradually coalesces in structure and in sense to become a single sign. Brinton & Traugott (2005) claim that this process involves a continuum ranging from L1: partially productive lexicalised compounds and phrases (airbrush, to bear witness, cutting-edge), through L2: non-productive lexicalised composites (auburn hair, with bated breath, to curry favour) and finally L3: fully lexicalised items (altogether, breakfast, causeway). It is important to note that although lexicalisation is defined in the same terms as idioms and other phraseological phenomena, the process potentially involves a much broader set of patterns:
Lexicalization is the change whereby in certain linguistic contexts speakers use a syntactic construction or word formation as a new contentful form with formal and semantic properties that are not completely derivable or predictable from the constituents of the construction or the word formation pattern. Over time there may be further loss of internal constituency and the item may become more lexical. (Brinton & Traugott 2005: 96)
Various examples of lexicalisation can be seen in the sample texts T1 and T2. As might be expected, there are few phraseological units in the traditional sense of the term in these texts, except perhaps some stereotypes or clichés (the conflict plays out, cut her losses) in T1 (these appear to be quite appropriate to a popular science account). However, in the same text there are also a variety of lexical frameworks, also known as ‘sentence stems’ (it is the embryo’s best interests...that...), locutions or lexicalised verb phrases (given the choice, taking whatever steps are available), lexicalised adverbial/prepositional phrases (and so on, of course, on the one hand...on the other) and lexicalised noun groups (by-product, tug-of-war, trying circumstances). Similarly, text T2 (the more ‘serious’ research article abstract) does not contain any clear examples of phraseological units. Instead, there are many examples of partially lexicalised noun groups (blood glucose levels, natural selection) and lexicalised verb groups such as the examples examined above (genes expressed as, mount a response).
Unlike phraseological units, there has been a long tradition of studies on collocation in applied linguistics, especially in the fields of English for Academic Purposes (EAP) and Language for Special Purposes (LSP) (Sager, Dungworth & McDonald 1980, Howarth 1996, Nesselhauf 2003, Williams 2003, Cavalla 2008) as well as in related areas such as terminology (Gläser 1988, Béjoint & Thoiron 1992, Thomas 1993, Pearson 1998, Grossmann & Tutin 2002, Tutin 2007). Many of these studies adopt a semantic definition of collocation proposed by lexicologists such as Hausmann (1985) and Mel’čuk et al. (1995). The key concept in this approach is the lexical function, a privileged semantic relation between two lexical items in which one element retains its core meaning as the ‘base’ while the other is a relatively restricted or metaphorised ‘collocator.’ For example, constructions such as express a gene and mount a response (to take the examples from T2 examined above) are considered to be collocations because they exploit a metaphorised, or in this case specialised sense of express (‘to process information in order to synthesise proteins or other gene products’) or response (‘a hormonal defence mechanism’). In contrast, Predicator + Complement sequences such as produce hormones, provide nutrition, release hormones (these examples are from text T1) are considered to be simple ‘combinations’ (to use Howarth’s 1996 term), because they refer to one of the usual senses of a polysemous verb. This approach has been particularly influential in LSP and terminology, as can be seen in the distinction often made between ‘LGP’ and ‘LSP’ collocations (Sager et al. 1980, Benson et al. 1986, Howarth 1996).
The advent of computer-based corpus analysis has meant that many linguists use statistical methods for identifying collocations as well as or instead of semantic criteria. The statistical approach emphasises factors such as the frequency of co-occurrence of lexical items (Smadja 1993, Stubbs 1995, Evert 2004), the distribution and co-occurrence of collocations across text-types (Muller 1968, Williams 1998, Biber et al. 2004) and more recently the co-occurrence of lexical items and grammatical constructions (Stefanowitsch & Gries 2003). Since corpus analysis is necessarily based upon the observation of texts, a probabilistic approach has often been central to the lexicogrammar approach, as can be seen in this early definition from Halliday:
Collocation is the syntagmatic association of lexical items, quantifiable, textually, as the probability that there will occur at n removes (a distance of n lexical items) from an item x, the items a, b, c... Any given item thus enters into a range of collocation, the items with which it is collocated being ranged from more to less probable. (Halliday 1961: 276)
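Halliday’s definition can be read almost directly as a counting procedure. The following is only a minimal sketch under stated assumptions: the function name `collocation_range` is hypothetical, the text is assumed to be already tokenised into a list of strings, and distance is measured in running tokens rather than in lexical items as the definition strictly requires. It ranks the items found within n removes of a node by their share of all co-occurrences, which gives a rough picture of a ‘range of collocation’ ordered from more to less probable.

```python
from collections import Counter

def collocation_range(tokens, node, n=5):
    """Rank the items found within n tokens of `node` by relative frequency.

    Assumption: `tokens` is a pre-tokenised list of word forms; distance is
    counted in running tokens, not only in lexical items.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start, end = max(0, i - n), min(len(tokens), i + n + 1)
        for j in range(start, end):
            if j != i:  # skip the node itself
                counts[tokens[j]] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    # Each score is the item's share of all co-occurrences with the node.
    return [(word, count / total) for word, count in counts.most_common()]
```

A call such as `collocation_range(tokens, "response", n=5)` would return a list of (item, probability) pairs, most probable first. Restricting the count to lexical items would require a stop list or part-of-speech tagging, which is left out here.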
In this light it is interesting to re-examine the collocations express + gene and mount + response mentioned in the previous section. The pair of signs express(-ed, -es, -ing, -ion) + gene(s) co-occur (i.e., occur together within a window of five words, right and left) 156 times in the PSC, and 263 times in the BNC. In contrast, the pair mount(-ed, -ing, -s) + response(s) does not occur in the PSC, and only occurs four times in the BNC. Since the BNC is twenty times the size of the PSC, there is proportionally a stronger rate of co-occurrence between express + gene than mount + response. But there are also many other ways of looking at these data. For example, using the AntConc program (Anthony 2007), it is possible to find all of the exactly repeated sequences (‘clusters’) that are formed within a given span of a single lexical item (related terms include ‘N-Grams’ and ‘bundles’, as reported in Biber et al. 2004). Thus within a span of five words, we can find 3,214 clusters for gene(s) in the PSC. It is interesting to note that common collocates of gene such as express(-ed, -ion) occur quite low down in the frequency list of clusters. This is because clusters, such as expression of fibronectin gene was, expression of genes encoding biotransformation, expressed genes with respect to, etc., mostly only occur once (although segments of the same cluster are also counted again, as parts of other clusters). This type of analysis shows that frequently co-occurring pairs of signs are not necessarily involved in strictly fixed sequences.
Finally, since collocation is now usually associated with the large-scale analysis of corpora, there has been less research on the role of collocation as a textual resource in individual texts. In this respect, it is appropriate to return briefly to Halliday and Hasan’s (1976) view of collocation as a form of cohesion, that is to say a linking device that contributes to the overall coherence of a text. Halliday and Hasan (1976) originally made a distinction between grammatical forms of cohesion (reference, ellipsis, substitution, conjunction) and lexical cohesion, which involves explicit and implicit links between lexical signs, including such relations as reiteration, synonymy, complementarity, membership of ordered series or any other systematic lexical relationship, including ‘collocation’:
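The observation that most clusters occur only once can be illustrated with a simplified cluster count. This sketch does not reproduce AntConc’s own procedure; the function names `clusters` and `share_of_singletons` are hypothetical, the corpus is assumed to be a pre-tokenised list of word forms, and a cluster is taken to be any exactly repeated sequence of `size` words that contains one of the node forms.

```python
from collections import Counter

def clusters(tokens, node_forms, size=5):
    """Count every `size`-word sequence that contains one of the node forms."""
    counts = Counter()
    for i in range(len(tokens) - size + 1):
        gram = tuple(tokens[i:i + size])
        if any(tok in node_forms for tok in gram):
            counts[gram] += 1
    return counts

def share_of_singletons(counts):
    """Proportion of cluster types that occur exactly once."""
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)
```

Calling `clusters(tokens, {"gene", "genes"}, size=5)` and then `share_of_singletons` on the result would give a rough measure of how far the collocates of a node are tied up in fixed sequences: a share close to 1 indicates that frequent collocates seldom recur in identical strings.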
laugh…joke, blade…sharp, garden…dig […]. In general, any two lexical items having similar patterns of collocation – that is, tending to appear in similar contexts – will generate a cohesive force if they occur in adjacent sentences. (Halliday & Hasan 1976: 285-6)
Unlike the other approaches to collocation mentioned here, Halliday and Hasan’s definition is very informal and has not been generally taken up outside the SFL approach. Nevertheless, some analysts (Hoey 2005, Siepmann 2005 and Gledhill 2009) have recently argued that a ‘textual’ approach to collocation would be a useful corrective to semantic and statistical approaches, which are essentially de-contextualised, and do not account for the role of collocation in on-going discourse. These analysts also point out that the lexical items that are usually involved in cohesive chains are necessarily embedded in lexicogrammatical patterns, whose distribution throughout a text must therefore also contribute to the development of coherence throughout the text (Firth 1957 coined the term ‘colligation’ for this kind of relation). It is possible to observe this kind of development in the sample texts T1 and T2, in particular by examining the typical context of use of the key term embryo in T1 and the equivalent in T2 fetus (later reformulated as placenta). In the Popular Science account (T1) embryo is embedded in three types of cohesive chains which emphasise:
1. the status of the embryo as an Agent or (potentially conscious) Participant (the embryo produces a hormone, [if the embryo were] given a choice, in the embryo’s interests, the embryo...can be entirely oblivious of this conflict, this is just what the embryo does);
2. the relationship between the embryo and the mother as co-dependent Participants (brains of the mother and embryo, nutrition she provides her embryo, the mother bearing it [=the embryo], the genes of her embryo...);
3. the embryo as an (Affected) Participant (the embryo... being stillborn, its own survival, threat to the embryo’s survival).
These patterns contrast with the Research Article abstract (T2), in which three cohesive chains are formed around the terms fetus and (later in the text) placenta:
1. the fetus (or placenta) as Agent (the attempt by the fetus... to increase its supply of nutrients, the fetus gains direct access to, the placenta is able to release hormones, Placental hormones... manipulate maternal physiology for fetal benefit);
2. the relationship between the fetus and mother as conflictual Participants (conflict can be said to exist between maternal and fetal genes, fetal actions are opposed by maternal countermeasures, This (fetal) action … is countered by increased maternal production of insulin, the mother is unable to …mount an adequate response to fetal manipulation, poorly nourished fetus);
3. the fetus as Classifier or Circumstance, a location for the activities of cells and genes (genes expressed in fetuses, fetal genes will be selected, fetally derived cells, a similar conflict exists within fetal cells).
The overall effect in Dan Dennett’s text (T1) is to emphasise the embryo’s viewpoint or predicament and to underline that there is an equilibrium between two opposing but complementary Participants (encapsulated by metaphors such as tug-of-war). This kind of ‘human story’ is perhaps to be expected in a popular science book. In contrast, and in keeping with the conventions of science writing, the research article abstract (T2) de-humanises the fetus by embedding it in complex noun groups, or by transferring Agency to other Participants (such as cells, genes, the placenta). The text is not without drama, however: the author David Haig consistently underlines the conflictual nature of the relationship between mother and fetus, and couches this in surprisingly warlike terms (conflict, countermeasures, escalation, invasion, resistance).
In the previous sections, I have shown that it is possible to analyse various aspects of the sample texts T1 and T2 in terms of phraseological units, lexicalised phrases, collocational pairs and cohesive chains. However, as can be seen in the above analyses, all of these phenomena can be discussed in terms of a more general unit of analysis, namely the ‘lexicogrammatical pattern’. LG patterns have been studied before in the SFL and applied linguistics literature, but as Hunston and Francis (2000) point out, the starting point has usually been that of the lexical item. In this section (following Gledhill 1995, 1997, 2000a and 2000b), I set out an alternative method, which involves the identification of LG patterns on the basis of grammatical items. More recently, the term ‘small word’ has been used to analyse grammatical items in LG patterns (Groom 2005). However, I find it preferable to refer to the ‘grammatical sign’, a term which includes not only grammatical (‘small’) items, but also lexicalised function words (such as complex prepositions because of, by dint of, in so far as), grammatical morphemes (including inflections such as the plural, -ing, -ly and more abstract forms, such as tense) and grammatical categories (such as the active sequence N+V in mount a response or the passive N+V in gene expressed, etc.).
Although there exists a growing body of research on collocational frameworks and other discontinuous patterns involving grammatical items (Renouf & Sinclair 1991, Luzon Marco 2000), the idea that grammatical signs enter into collocational relations is still not generally accepted, as can be seen in the following definition:
collocation, n. A term used in lexicology by some (especially Firthian linguists) to refer to the habitual co-occurrence of individual lexical items […]. Some words have no specific collocational restrictions – grammatical words such as the, of, after, in […]. (Crystal 2008: 86-87)
Yet there is ample evidence that grammatical signs are involved in collocational patterns. For example, in Gledhill (2000a and 2000b), I explored the hypothesis that every text-type, and in particular every sub-section of a research article, has a particular configuration of grammatical items. The first step in this analysis is to establish the statistical distribution of grammatical words (not including morphemes and other grammatical signs) in the sub-sections of 500 research articles (the Pharmaceutical Sciences Corpus, PSC). For example, the first five statistically most ‘salient’ items in each research article sub-section are set out below in Table 1.
Table 1. The distribution of verb forms in the Pharmaceutical Sciences Articles (Gledhill 2000b: 112)
| Relative Rank | Titles | Abstracts | Introductions | Methods | Results | Discussions |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | of | but | been | were | no | that |
| 2 | for | these | has | was | in | be |
| 3 | on | of | have | at | did | may |
| 4 | and | there | is | then | not | is |
| 5 | in | in | such | for | had | our |
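A ranking of this kind can be approximated with a very simple salience score. The sketch below is not the statistic used to produce Table 1, which is not specified here; it assumes, purely for illustration, that salience is the ratio between a word’s relative frequency in one sub-section and its relative frequency in all the other sub-sections, with add-one smoothing. The function name `salient_items` and the input format (a dictionary mapping section names to token lists) are hypothetical.

```python
from collections import Counter

def salient_items(sections, target, top=5):
    """Rank words in `target` by relative frequency there vs. the other sections.

    Assumption: `sections` maps a section name to a list of tokens.
    """
    inside = Counter(sections[target])
    outside = Counter(
        tok for name, toks in sections.items() if name != target for tok in toks
    )
    n_in, n_out = sum(inside.values()), sum(outside.values())
    scores = {}
    for word, count in inside.items():
        rate_in = count / n_in
        # Add-one smoothing keeps the ratio finite for words absent elsewhere.
        rate_out = (outside[word] + 1) / (n_out + 1)
        scores[word] = rate_in / rate_out
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

In practice the input would first be restricted to a closed list of grammatical words, and a significance test such as log-likelihood or chi-square would normally replace the raw ratio.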
To some extent this kind of analysis simply confirms the findings of previous research on the research article genre (Swales 1990); for example the prevalence of have been in Introductions signals the perfect, the prevalence of was, were in the Methods signals the use of past passive forms, and so on. However, the important point is that these items are not used in isolation, but co-occur with others to form longer lexicogrammatical patterns. The following examples give some idea of how the ‘salient’ grammatical items in Abstracts (examples 11-15) and Discussions (examples 16-20) co-occur in sequences which ultimately represent some of the most typical LG patterns for each of these sub-sections of the research article (here each statistically salient item is indicated in bold):
Abstracts
(11) the mechanism of action of {compound Y} was shown to {empirical process} (nominal expression of findings)
(12) there was a significant increase in toxicity (quantitative report)
(13) It is concluded that propagation did not increase (impersonal expression of quantitative report)
(14) but subjects who receive active management (contrastive results expressed in embedded clause)
(15) both normal and tumor cells (contrastive framework)
Discussions
(16) data suggests that reactive oxygen would be important (projected report of biochemical process)
(17) It is interesting to note that (evaluation of research process)
(18) increasing data does not result in any further enhancement (metaphorical empirical report)
(19) This evidence suggests that (reformulation of previous data and projection of research process)
(20) we have found that (projected report of research process)
It is important to point out that examples such as these represent prototypical but also productive sequences. This can be shown by using a concordancer (Anthony 2007) to search for discontinuous sequences, as in * of * was *(-ed) to (where * represents a ‘wild-card’, either a whole word or part of a word). This pattern, based on example (11) above, is often found in Abstracts in the PSC, usually in phrases which summarise experimental data. A search of the PSC reveals that two reporting verbs are typically involved in this pattern (find and show), and the subjects of these verbs typically have the structure: Empirical process of Biochemical entity X:
(21) Another neuroprotective activity of cannabimimetics was shown to be associated with the CB1-mediated inhibition of nitric oxide release from rat microglia cells.
(22) In our case the optimum content of acetonitrile was found to vary between 25 and 30%, depending on the column efficiency.
(23) The efficacy of zidovudine was shown to reduce risk of transmission by 66% in the treated group.
(24) The prevalence of restraint was found to be 68% (n=69).
(25) As compared with the non-pregnant women, the sensitivity to the glucose-lowering effect of insulin was found to be reduced 45–70% in the 3rd trimester
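The wild-card search described above can be approximated with regular expressions. The following is only a minimal sketch and not the concordancer cited above: the function names `wildcard_to_regex` and `find_sequences` are hypothetical, and the sketch assumes that a bare * stands for exactly one word while a * attached to letters stands for part of a word. Since the participles in examples (21-25) are irregular (shown, found), the participle slot is treated here as a plain wild-card. The sample sentences are taken from examples (12) and (22-24).

```python
import re


def wildcard_to_regex(query: str) -> re.Pattern:
    """Compile a concordancer-style query into a regular expression.

    Assumed convention: a bare '*' stands for one whole word, while a '*'
    attached to letters (e.g. '*ly') stands for part of a word.
    """
    parts = []
    for token in query.split():
        if token == "*":
            # Any single word; hyphenated words count as one word.
            parts.append(r"[\w-]+")
        else:
            # Escape the literal letters, then reopen '*' as
            # "zero or more word characters".
            parts.append(re.escape(token).replace(r"\*", r"[\w-]*"))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def find_sequences(query: str, sentences: list[str]) -> list[str]:
    """Return every stretch of text that matches the query."""
    regex = wildcard_to_regex(query)
    return [m.group(0) for s in sentences for m in regex.finditer(s)]


# Sample sentences copied from examples (22), (23), (24) and (12).
SENTENCES = [
    "In our case the optimum content of acetonitrile was found to vary "
    "between 25 and 30%, depending on the column efficiency.",
    "The efficacy of zidovudine was shown to reduce risk of transmission "
    "by 66% in the treated group.",
    "The prevalence of restraint was found to be 68% (n=69).",
    "there was a significant increase in toxicity",
]

hits = find_sequences("* of * was * to", SENTENCES)
for hit in hits:
    print(hit)
```

The same helper accepts part-of-word wild-cards, so a query such as `* has *ly been * as *` can be used for the adverbial framework found in T2. A real concordancer would also handle tokenisation, punctuation and corpus markup, which this sketch ignores.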
39I have so far examined patterns on the basis of large-scale corpus analysis. But it is also worthwhile examining the specific patterns that emerge in individual texts, and asking to what extent they are related to neighbouring segments of the same text, or to examples of the same register or the general language as a whole. For example, in sample text T2 the first few lines of the text contain a number of LG patterns. In the first sentence Pregnancy has commonly been viewed as a cooperative interaction, the sequence * has * ly been * (-ed, -en) as * occurs 16 times in the BNC and 56 times in the PSC. This clearly suggests that this is a significant LG pattern in science writing. Even in the BNC, the kinds of (cognitive, communicative, reporting) verbs used in this pattern correspond closely to the wording adopted in T2, and their context of use is typically that of ‘academic exposition’:
(26) May Sinclair has frequently been described as shy and scholarly.
(27) Providing support has previously been identified as a key aspect of the district nurses role in palliative care
(28) This expansion of the role of the state has variously been interpreted as a functional response to old age incapacity,
(29) Indeed, iconoclasm has frequently been portrayed as little more than mindless vandalism perpetrated by
(30) Coeliac disease has traditionally been regarded as a disorder of childhood and early adult life.
40A similar pattern emerges in the second sentence of text T2, The effects of natural selection on genes expressed in fetuses. The framework * the * s of * on * s is highly productive in the BNC and the PSC, and in both cases the noun effects emerges as a pivotal lexical item (20 out of 74 sequences in the BNC; in the PSC all 25 examples). However, whereas text T2 presents effects as a Theme/Subject, in both the BNC and PSC, the noun effects is usually embedded in a complex noun group as a qualifier, or as a complement of a Research-oriented process (concentrate on, examine, investigate, questions about, work on, here marked in italics):
(31) Thus the scarce research work on the effects of participation on effectiveness is further limited by its inability clearly to
(32) so too, on the other hand, do questions about the effects of adrenalin on sitters in an examination room, or family genetics on the
(33) Based on the expectation that cellular functions would be adversely affected by such increased steroid levels, most research has concentrated on the effects of glucocorticoids on lens metabolism and ion levels.
(34) Therefore, we undertook a series of studies that examined the effects of cannabinoids on noxious stimulation-evoked activity in nociceptive spinal and thalamic neurons
(35) Our study is novel with respect to investigate the effects of erythromycin on LPS-induced preterm labor model in rats
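One way of checking which noun is pivotal in a framework such as * the * s of * on * s is to tally the words that fill the first noun slot. The sketch below is an illustration under stated assumptions rather than the procedure used for the BNC and PSC counts: the names `FRAME` and `slot_fillers` are hypothetical, the first five sentences are abridged from examples (31-35), and the sixth sentence is invented purely to show a competing filler.

```python
import re
from collections import Counter

# Framework "* the *s of * on *s": the first noun slot is captured so
# that its fillers can be counted.
FRAME = re.compile(
    r"\b\w+\s+the\s+(\w+s)\s+of\s+[\w-]+\s+on\s+\w+s\b",
    re.IGNORECASE,
)


def slot_fillers(sentences: list[str]) -> Counter:
    """Count the nouns that occupy the first slot of the framework."""
    counts = Counter()
    for sentence in sentences:
        for match in FRAME.finditer(sentence):
            counts[match.group(1).lower()] += 1
    return counts


SENTENCES = [
    # Abridged from examples (31)-(35).
    "Thus the scarce research work on the effects of participation on effectiveness is further limited",
    "do questions about the effects of adrenalin on sitters in an examination room",
    "most research has concentrated on the effects of glucocorticoids on lens metabolism",
    "studies that examined the effects of cannabinoids on noxious stimulation-evoked activity",
    "with respect to investigate the effects of erythromycin on LPS-induced preterm labor model",
    # Invented sentence (not from either corpus), added as a contrast.
    "We compared the results of treatment on patients in both groups.",
]

counts = slot_fillers(SENTENCES)
print(counts.most_common())
```

On a full corpus, the resulting tally would be the basis for statements of the kind made above about the noun effects, although the proportions reported in this paper come from the author's own searches and not from this sketch.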
41A final, rather curious, example can be found in the third sentence of T2: In this sense, a genetic conflict can be said to exist. This is in fact similar to the pattern that we have already seen in research article abstracts (examples 11 and 21-25). The difference here is that the phrase used in T2 involves a modal verb rather than the past tense (can be *(-ed) to *) and the clause is introduced by a textual marker, In this sense. From a functional point of view, the phrase In this sense signals that a previous discourse referent is to be reformulated by an explicit evaluation (above, marked in italics), a discourse function similar to that observed in the PSC examples (11, 21-25). Interestingly, if we look for this framework in the BNC, a very similar pattern emerges:
(36) In this sense, Keynes’s General Theory may well be regarded as self-defeating in terms of its impact on political economy.
(37) In this sense, the definition of standards and routines can be seen as a defensive process: the housewife is defending herself against the allegation that she does nothing at all.
(38) In this sense, the placement in industry will not be viewed as an end in itself but as an essential ingredient in the process of change...
(39) [...] secondary-school pupils in China have only a limited opportunity to go into higher education and in this sense, those who do make it can be viewed as a privileged elite.
(40) In this sense, this strategy may be viewed as a tactic employed in the air quality management strategy to meet air quality standards.
42The examples (36-40) all involve the same phrase In this sense as in T2, as well as the same verb (or a verb with a similar meaning: see, regard, view). The main difference from T2, however, is that the verb introduces a Complement (after as). Generally speaking, therefore, it can be seen here that the wording used in text T2 does not quite correspond to the ‘prevailing phraseology’, that is to say the typical LG patterns to be found either in the BNC or the PSC. Rather, in text T2 the author appears to have created a hybrid construction, which exploits in the first instance a pattern from the BNC (albeit in an academic register), and then reverts to a pattern that is more generally found in the PSC. This example shows that what looks like a novel creation may in fact involve the seamless joining of two regular patterns from the general discourse of academic and/or scientific writing.
43In this paper I have argued that the lexicogrammatical pattern should be seen as a fundamental unit of text analysis, and that it is perhaps a more useful unit for the purposes of ESP than such notions as ‘phraseological unit’, ‘collocational pair’ and the like. I have based my observations here on Halliday’s theory of ‘lexicogrammar’, which in this paper I conceive as a system of choices for the creation of meanings, with each choice corresponding to a cascade of lexical and grammatical features associated with a particular register or discourse function. By ‘cascade’ I am highlighting the fact that any choice of expression inevitably leads to a further set of choices and associated expressions, with the result that stretches of speech appear to be at the same time pre-constructed and coherent, but also highly varied and productive.
44If the notion of lexicogrammar is such a useful concept, what are we to make of phraseology and collocation? In fact, these terms present different perspectives on the same object of enquiry. I have argued in this paper that, from an SFL perspective, it would be useful to view phraseology in terms of the diachronic process of lexicalisation. This view, as mentioned above, has the advantage of avoiding any artificial distinction between ‘idiomatic’ and ‘normal’ forms of expression. In addition, if a distinction has to be made between idiomatic expressions and other types of phrase, it is perhaps better to conceive of this in terms of rhetorical effect, a perspective that I have discussed elsewhere (Gledhill 2008). Finally, the SFL perspective on collocation is that it is a semantic concept, which refers to the dependent relationship between a sign and its habitual context of use. Although many linguists prefer to analyse collocations on the basis of large-scale corpus analysis, I have argued here (and elsewhere, Gledhill 2009) that it would also be useful, from an ESP perspective at least, to look at how collocation operates in individual texts. In this case, it is worth seeing collocation, as originally proposed by Halliday and Hasan (1976), as a form of textual cohesion.