

Auditory objects as higher-order objects

Vincenzo Santarcangelo
p. 8-21


In this article I first identify the areas of conceptual confusion surrounding the concept of auditory object, which is widespread in the scientific literature but still little explored as regards its philosophical implications. I then examine the neuroscientific, psychological and philosophical literature in order to propose a new definition of auditory object based on i) the relation between sounds, understood as physical entities, and the auditory system that processes them; ii) the relation between entities that are given predominantly in time (sounds) and spatiotemporal objects (sources); iii) the relation between higher-order objects, in Meinong’s terms, and inferiora.



The purpose of this paper is to propose a tentative definition of auditory objects based on a clear distinction between sounds, i.e. entities generally referred to as acoustic waves generated by physical bodies (or sound sources), and auditory objects, i.e. entities conceived as perceptual constructs or mental representations of a sound or a series of sounds actually perceived by a hearer.

The concept of auditory object has recently attracted much attention among neuroscientists and neuropsychologists. According to Griffiths and Warren (2004), the main questions raised by this notion can be summarized as follows: how do our senses treat environmental stimuli and form object-like representations that are specific for different objects and different senses, yet remain stable for the same objects in time and space? Secondly, can perceptual objects in general be considered as mental representations based on generic mechanisms that analyze and process sensory information conveyed by our perceptual systems?

The concept of a perceptual object makes intuitive sense in vision, but rapidly becomes counter-intuitive when it comes to audition, touch and other senses, to the point that the «most pressing issue concerns the validity of visual analogies for auditory object processing mechanisms» (Griffiths and Warren 2004: 892). Furthermore, the very fact that these authors use the term object instead of, for instance, stream (Bregman 1990), or simply sound, immediately raises some philosophical issues.

Nevertheless, terminological issues aside, it is a given that neuroscientists increasingly refer to auditory objects. It still remains to be clarified: 1) what properties these objects should possess; 2) whether and how they might be represented in the brain; and 3) how they might relate to the more familiar objects of vision.

1. Preliminary remarks

Let’s begin with the last point. Whereas sounds, just like bodies, need space and are causally connected with the physical entities (i.e. sound sources) generating them, auditory objects, to rephrase Strawson (1959), could – at least from a metaphysical point of view – be conceived as being a-spatial, while still somewhat depending on spatial entities. In line with most definitions borrowed from neuroscientific literature – especially those provided by Bizley and Cohen (2013) and Denham and Winkler (2015) – my definition of auditory object therefore proposes to distinguish between the following:

  1. Sound source: a material object existing in space-time (e.g. a coffee-maker);

  2. Sound: a physical entity (a vibration) existing in space-time, produced by the interaction (i.e. the collision) of material objects and propagating through a transmission medium such as a gas, liquid or solid (e.g. the hiss of the coffee-maker);

  3. Auditory object: a perceptual object, i.e. an object as perceived, a mental representation of the sound, conceived as an object segregated from the overall auditory scene and existing in time but not in space (e.g. the output of my auditory system’s analysis of the hiss). Rephrasing some of the neuroscientific descriptions of auditory objects in the so-called Graz and Berlin Schools of Psychology’s jargon (with a particular reference to Meinong and Benussi’s works), I will define auditory objects as «higher-order objects».

The sound source, of course, is not identical with the auditory object. The sound source allows, as a rule, multimodal access to its features, insofar as it is an object bound to several perceptual modalities. For example, you can see a coffee-machine, hear it, smell it, or touch it. The real object, however, can generate a sound, say a hiss, that can itself be represented by a hearer’s auditory system as an auditory object. This auditory object is obviously connected to the sound source – the coffee-machine – but it is not identical to it. Auditory objects are not identical to sounds, either. For example, unlike the sounds they (may) refer to, which are transient and ephemeral, auditory objects (as the representations of perceived objects) are characterised by stable perceptual boundaries along the dimensions, as we will see, of time and pitch.

2. Objecthood

In the neuroscientific literature (see Santarcangelo 2015 for a review), the word «object» is often used with no regard to its philosophical implications. Furthermore, the debate about intentionality, presentation, and objects (Gegenstände) held among Brentano’s disciples – temporally and geographically circumscribable (for an overview, see Smith 1994) – has been largely disregarded.

Discussion of auditory objects and accounts of their nature and perception is new among philosophers (for significant exceptions see O’Callaghan 2008; Bullot and Egré 2010; Nudds 2010), but seems «ripe for philosophical contributions» (O’Callaghan 2009). Paradoxically, the very concept of object needs further clarification. Is the auditory object an intentional object or a material object? Is it an objective entity – could it exist even if no subject perceived it? Is it an individual or a type; a concrete, re-identifiable entity or an abstract object, a state or a property?

As we will see, although the definition of auditory objects (see, for example, O’Callaghan 2008; Nudds 2010) and sounds (for a review see O’Callaghan 2009) is a matter of discussion in philosophy, most thinkers tend to agree on the metaphysical nature of the admissible content of auditory perception: auditory objects are largely taken to be distinct particular perceptual units (in metaphysical terms, re-identifiable individuals or objective particulars) extended in time rather than in space, and recognized primarily in virtue of their temporal features.

The intrinsically temporal nature of sounds has been stressed, on metaphysical grounds, by Strawson (1959), who imagined a hypothetical No-Space world in which sounds are the only objects of sense-experience. The questions raised by Strawson are the following: could we conceive of a being whose experience was purely auditory? Could such a being have a conceptual scheme that provided for objective particulars? Would this being be able to identify and re-identify particular entities in a world with no spatial references and relations?

Strawson’s thought experiment and its consequences for the philosophy of perception (for an in-depth analysis of the thought experiment, see Santarcangelo and Terrone 2015) still echo in the contemporary debate about sound perception – at least implicitly. Take, for instance, the abstract versus concrete object issue. In discussions of the matter, the term abstract seems to be used sometimes as an equivalent of a-spatial, and at other times as an equivalent of universal. A vague distinction is offered, for example, in Wightman and Jenison (1995), where concrete auditory objects, constituted by sounds emitted by real objects in the environment (e.g. the instruments of an orchestra), are distinguished from abstract auditory objects, which often do not correspond to real environmental objects (e.g. a melody, a symphony, or the mixture of sounds produced by the orchestra).

As we said, most psychologists and neuroscientists tend to speak of auditory objects in order to draw analogies with visual perception, whereas in the philosophical debate the notion of sounds as event-like individuals seems to be the prevailing one (see, for instance, Casati and Dokic 1994, 2005; Scruton 1997; O’Callaghan 2007; Matthen 2010). The only two philosophers who speak of auditory objects are Nudds (2010) and O’Callaghan (2008). Let us see whether their arguments can help clarify the notion.

According to Nudds (2010), auditory objects are sequences of discrete (countable) or non-discrete (uncountable) sounds: «an auditory object is a sequence of sounds which are such that they are normally experienced as having been produced by a single source» (Nudds 2010: 22). Not all sequences of sounds are auditory objects: the peculiarity of auditory objects lies in the very fact that the sequences are experienced as grouped. Why do we hear some sequences as grouped (on this topic, see Martina, Voltolini, this volume, pp. 22-46)? Why, in other terms, do some sequences of sounds constitute auditory objects? Rather than answering these questions in purely auditory terms, using the principles of Gestalt psychology (as formulated, for example, in Bregman 1990) as explanantia, Nudds addresses the question in terms of the function of the auditory system in ecological environments.

The goal of the auditory system is to recognize the sources of sounds and to perceive their properties. Via a three-stage process (which includes sensory detection or transduction, the grouping process, and the extraction of information from the frequency component groupings), the auditory system represents sounds and auditory objects that correspond to their sources: «As a result our experience represents both sounds and the sources of sounds, and we normally experience sounds that correspond to their sources» (Nudds 2010: 24). Experiencing auditory objects, thus, means experiencing a sequence of sounds as produced by the same source. Nudds’ position appears to be opposed to Scruton’s theory of sounds as pure events, according to which streaming processes attribute to sounds «an identity distinct from any process in their source and involve the creation of a world of coherent sounds, rather than a world of coherent spatio-temporal objects» (Scruton 2009: 58). On the other hand, Nudds’ definition shares some similarities with that of Alain and Arnott (2000). However, in the latter, a clear distinction is made between the perceptual construct (the auditory object), the sound source and the sounds it emits. An auditory object is a «mental description of a sound source in the environment» (Alain and Arnott 2000: 202), whereas the sound source is the physical entity that generates an acoustic wave, and a sound (although this is not clearly stated) is simply the acoustic wave. In this regard, I wish to recall a further distinction made by the authors: that between auditory object and auditory event. While the former refers to the perception of a sound source and its behaviour over time, the latter is used when referring to the perceptual dimension of hearing a sound that occurs at a particular time, in a particular space, and has particular attributes (e.g., intensity, duration, timbre).
The predominance of temporal features over spatial ones in Alain and Arnott’s definition of auditory objects, as opposed to auditory events, highlights the «current dilemma for researchers whether an auditory object is understood in terms of temporal dimensions (making it more like an auditory stream) or whether an auditory object is understood independently of time (making it more like an auditory event)» (Plack 2010: 189).

O’Callaghan (2008) was the first to propose a notion of auditory object stronger than just that of an intentional object of audition. Objects are central components of our perceptual schemes; again, this conviction comes primarily from the comparison with vision:

Given the prominence of objects in visual perception, it is tempting to think that all perceiving concerns objects, their features, and their arrangement. Audition, touch, olfaction, and gustation thus may follow the model of vision’s organization, character, and function. According to this line of thought, the various sense modalities involve phenomenologically distinctive ways of becoming acquainted with objects. (O’Callaghan 2008: 803-804).

However, according to the philosopher, we should not overstress this comparison. We cannot assume that the structure and function of other sense modalities mirror that of vision: «Doing so risks neglecting the diversity that is most striking about experience across the modalities» (O’Callaghan 2008: 805). For O’Callaghan, auditory experience presents sounds as independent from ordinary material things in a different way from visual and tactual features. Nevertheless, he notes, researchers have recently extended discussions of object perception beyond vision, to other modalities. Auditory perception of objects, in particular, has come into focus. However, as said above, the objects in question are not ordinary material objects, but auditory objects. O’Callaghan defines auditory objects as mereologically complex individuals that persist through time. Because of their unity and continuity, they should be called objects just like the visual ones, but what makes them different from material objects is that the mereology according to which they are perceptually individuated and identified is primarily temporal rather than spatial. Vision deals with objects based primarily on space: we identify and individuate objects by means of spatial criteria, like continuity, rather than qualities like shape and colour. We can individuate an object by separating it from the background even without the latter information, insofar as its spatial characteristics are unambiguous. When it comes to audition, on the other hand, space seems to be less important. Auditory processes, in fact, determine the organization of streams in time according to principles that parallel how vision determines the constitution and arrangement of objects in space. Notably, the way in which we are able to distinguish one auditory object from another is by means of its temporal succession: the object cde is different from edc.
A sequence of notes, say cccdeee, could be segmented differently (say, cccd + eee or ccc + deee) depending on the pitch distance between the single notes. The very use of the term distance, here, suggests that pitch might be taken to play in audition the role that space plays in vision. In audition, therefore, it is time that defines the auditory object as such, as an individual with its own structure and composition. In fact, just as visual objects appear to fill space and to have spatial parts and boundaries, audible individuals appear to occupy time and to possess temporal parts and boundaries. Pitch, on the other hand, makes it possible to distinguish between different auditory objects and has a relational role – in vision, this function is fulfilled by spatial location.

Just as different visible individuals have different spatial locations, and just as (all else equal) difference in location suffices for different visual individuals, different auditory individuals have different locations in pitch space, and (all else equal) difference in pitch suffices for difference of auditory individuals (O’Callaghan 2008: 824).

Basically, space is all that is needed for vision to individuate its objects. When it comes to audition, however, we have time and a surrogate of space: that is, pitch. This is in line with Strawson’s experiment, according to which auditory objects, insofar as they claim objective reality, cannot do entirely without space but need an analogue of space – that is, in Strawson’s jargon, the master sound, something very similar to the concept of an absolute pitch.

According to O’Callaghan, the comparison between audition and vision is illuminating. Whether through time or space, perception seems to identify individual objects. Audition, like vision, assigns discernible individual elements or parts to unified but complex perceptible individuals. In the case of auditory objects, as we have seen, the allocation is temporal (sequential) and not spatial. Auditory scene analysis leads one to think that one primary task of audition is to carve up the acoustic scene into discrete sounds, each of which may possess its own pitch, timbre, and loudness.

There are very good commonsensical reasons to think that what we hear are individual objects. Take, again, a melody. We hear one note after the other just as, say, in a room we see one table, then one chair, then another chair and so forth. We can say which note came first, in a temporal sense, just as we can say how many chairs there are, or where they are located in the room, in a spatial sense. Also, O’Callaghan tells us, auditory objects have properties: a given melody (but also, say, the sound of a knock on the door) has features such as timbre, volume, pitch and so forth. So if, as some have thought, the things that bear properties are all individuals, then this shows that sounds must be individuals too. Auditory objects should therefore be taken to be individuals bearing properties (and not, as some have suggested, sensible qualities; see Cohen 2010). This is also supported by the common experience that the very same auditory object might have different features across time – it may be louder at some point and softer at another – without changing its identity. We still recognize it as an individual, despite its changes. Basically, it seems that both visual and auditory perception work by isolating a unity, an object of perception. Both perceptual forms are organized so that perceiving objects involves individuating and tracking mereologically complex individuals. However, whereas vision tracks, in space, individuals that tend to coincide with material, three-dimensional objects, audition tracks, in time, individuals which are continuous or coherent collections of temporally bounded tones and sounds.

Auditory objects, as we have characterized them so far, are individual perceptual objects that are temporally – rather than spatially – bounded. Auditory objects are certainly different from objects of vision, which not only seem to have rich internal spatial structure, but are also individuated in terms of inherent spatial features. As we have seen, while vision’s objects possess a spatial mereology and are individuated and tracked in terms of spatial features, audition’s objects have a temporal mereology and are individuated and tracked in terms of both pitch and temporal characteristics. So, it is clear enough that space has greater importance in vision than in audition. But could auditory objects be conceived of as totally independent of space, or should space still be considered, albeit as playing a different, and probably less significant, role than in vision?

In the scientific worldview, space and time are taken to be the inextricable dimensions of a unitary structure called space-time. From a scientific point of view, everything that exists in time also exists in space. There are no instants or intervals in time detached from spatial locations. In psychology, it was natural for the first scientists who studied waves to adopt a spatiotemporal mapping. Although differences between light and sound became clear in the twentieth century, «spatiotemporal mapping has been retained» (Kubovy and Van Valkenburg 2001: 115), and auditory phenomena such as auditory segregation and dichotic listening have been considered as spatial in nature.

Space and time are closely intertwined in the ordinary worldview as well. The bias against the idea of non-visual objects is embedded in folk ontology. Language itself may lead us to believe that objects are visible by nature. We ascribe reality, and therefore objecthood, primarily to things such as inorganic substances, living organisms and concrete artefacts – that is, things that individually exist in space-time. However, the ordinary worldview seems also to have room for individual entities whose existence takes place in time but not in space. Think, for example, of entities created within social and cultural practices: things like promises, for instance, do not seem to exist in space. In this respect, Rohrbaugh (2003: 200) defines historical individuals such as linguistic texts or musical compositions as being «in time but not in space». Thomasson (1999: 36) describes fictional characters as «abstract artefacts» that «lack a spatiotemporal location», but specifies that their existence can have a beginning and an end in time. Smith (2003: 23) calls «freestanding Y terms» objects like symphonies, debts or corporations, which can exist independently of a particular spatial embodiment, and observes that «a symphony (as contrasted with the performance of a symphony) is not a token physical entity at all; rather – like a debt, or a corporation – it is a special type of abstract formation (an abstract formation with a beginning, and perhaps an ending, in time)».

As regards what is of interest here, it is important to note that sounds too can be taken by common sense as existing in time only. If science didn’t tell us otherwise, we wouldn’t naturally picture sounds as space-located waves, at least from a phenomenological point of view. As Nudds (2010: 122) puts it, «we don’t experience sounds as having a spatial location independently of their sources having a spatial location, nor do we experience sounds as having spatial parts». In conclusion, the preliminary analysis carried out in the first sections of this paper seems to indicate that auditory objects are primarily defined by their temporal boundaries: auditory objects are individuated in terms of pitch and temporal features.

3. Higher-order objects and the notion of production

As we said, Nudds (2010) makes explicit reference to the processes of auditory grouping described by Bregman (1990) – the heuristics that our auditory system uses in order to represent sounds and auditory objects that correspond to their sources. However, in Bregman’s implicit ontology there is no room for objects. Sound is created when things of various types happen. Acoustic information, therefore, tells us about physical happenings; some of them go on at the same time in the world, each one a distinct event. Bregman refers to the perceptual unit that represents a single happening as an auditory stream.

Why not just call it a sound? […] a physical happening (and correspondingly its mental representation) can incorporate more than one sound, just as a visual object can have more than one region […] By coining a new word, “stream”, we are free to load it with whatever theoretical properties seem appropriate (Bregman 1990: 10).

As for what interests us most here, the notion of auditory object can be considered something very similar to Bregman’s stream: a word that neuroscientists are free to load with theoretical properties, often vague ones, borrowed from analogies with visual objects. So far, an auditory object has been taken to be an object-like representation or the output of an acoustic experience (Griffiths and Warren 2004), a perceptual representation (Denham and Winkler 2015), a computational result of the auditory system’s performance (Bizley and Cohen 2013), or that entity which is susceptible to figure-ground segregation (Kubovy and Van Valkenburg 2001).

Kubovy’s definition, in particular, is grounded in the so-called Theory of Indispensable Attributes (Kubovy 1981), which provides a heuristic argument indicating the conditions under which a perceptual aggregate preserves the individuality of its elements. The theory specifies what features a stimulus should have to ensure that our perception of the surrounding environment results in the perception of individuals x, y or z segregated as single entities or objects – that is, a certain number of discrete objects rather than one indistinct scene without discontinuities. An attribute is defined as essential if and only if it is a necessary requirement for perceiving an environment as being composed of several elements (entities). These attributes are of course different for vision and for hearing: some kind of spatial discontinuity is a necessary prerequisite for us to see something as consisting of two (or more) elements rather than as a whole; some kind of discrimination related to the pitch of a sound is an indispensable attribute for identifying one sound and distinguishing it from another. Even if the obvious reference, here, is the law of figure/ground segregation (Wertheimer 1923), no reference is made to the legacy of Brentano’s disciples in order to clarify what kind of perceptual object an auditory (or an audio-visual) object is.

All members of the Austrian Gestalt tradition of Meinong, Witasek and Benussi (the so-called Graz School, as opposed to the Berlin School co-founded by Wertheimer) share a two-storey conception of experience according to which experienced objects are partitioned into objects of lower and higher order. Lower-order objects are, for example, colours or tones, which are immediately given in sensation. Higher-order objects are, for example, shapes and melodies, that is, complex organized wholes which are founded on the former and require different psychic acts to be comprehended by consciousness.

The notion of higher-order objects (Gegenstände höherer Ordnung) was borrowed by Meinong from Fechner (1876), and is a consequence of Meinong’s distinction between psychical act, content (Inhalt) and object (Gegenstand). Only objects can possess an ideal nature, whereas the content is always real. Higher-order objects are objects founded on the objects of passive perceptions. Founded objects are ideal objects: typical examples are melodies or the similarity between two colours.

Four nuts thrown on the table one after the other form a quadrilateral, the shape of which depends on the positions of the nuts. Slightly moving just one of them changes more than one property of the quadrilateral. The quadrilateral is therefore non-independent of the position of the nuts […] The quadrilateral is a higher-order object; the four nuts are its ‘inferiora’; the ‘superiora’ depend on the ‘inferiora’: a ‘superius’ without ‘inferiora’ is not possible, but the reverse is not the case (Bozzi 1996: 286).

The question is what takes place, at the corresponding psychological level of content, when a higher-order object is considered as founded. What are the characteristics of melodic perception, at the level of psychological content proper, where a melody is seen as a founded, i.e. an ideal, object? This question is answered by the theory of Vorstellungsproduktion, developed by Ameseder and Benussi under Meinong’s supervision. What production does in the realm of content, foundation does in that of objects. Geometrical shapes, velocities and distances or, in other words, higher-order objects, cannot be perceived but need to be grasped through higher-order intellectual acts, reflecting the division between those elements that exist and those that subsist which permeates Meinong’s thought. Thus, the experience of higher-order objects, which include facts or states of affairs, is an amalgamation of existents and subsistents at one and the same time. An example of this is the experience of a sequence of tones. A degree of intellectual application permits us to hear that the sequence in question is divided into phrasal clusters that change from one grouping to another. Unchanging founding elements result in different Gestalt qualities at different times and, occasionally, to such a degree that the resultant qualitative experiences alternate in an uncontrollable way. In the Gestalt tradition, the changing of Gestalten is a generally recognized phenomenon. It is less widely acknowledged, however, that Benussi (1904) was the first to subject it to a detailed treatment. Benussi’s main concern was the ambiguity of Gestalt objects rather than their ideal nature, and he held that this ambiguity does not exist in purely sensory phenomena.
He arrived at the conclusion that there must be a particular form of non-sensory mental process, or an additional brain level, that explains the occasionally complex and subtle resolutions of Gestalt ambiguity on a case-by-case basis. He reached this conclusion by considering that differing Gestalt qualities can arise under the same stimulus conditions. Gestalt perception, in Benussi’s view, requires a specific intellectual act as well as sense activity. The same inferiora can result in the founding of different Gestalten – these inferiora being both stimulus and conscious content.

The advancement through time of a melodic line which can be only abstractly decomposed into notes and time relations among notes consists in the progressive appearance of an object which is already in itself complete, like the progressive development of a landscape seen from the window of a train; which is landscape even before we have seen it, as various as it could reveal itself in time (Bozzi 1996: 302).

4. Completeness in time

It follows from Strawson’s experiment that, although we cannot do without space or something that serves its functions, we can still conceive of existence outside of space, in time only. Now, even though, as pointed out earlier, Strawson himself did not engage in this kind of definition, I think his thought experiment, together with the results stemming from the psychological debate on the nature of intentional objects, can be used as a successful demonstration of the fact that auditory objects are mind-dependent temporal individuals, or Meinongian higher-order objects, that may link several individual sounds in a coherent and meaningful sequence or perceptual unity.

For the sake of simplicity, by sounds I here mean sounds as they are understood by classical physics (i.e. acoustic waves). My definition could, however, hold for any possible metaphysical characterization of sound. In fact, what really matters to the definition I am proposing is that auditory objects are taken to be higher-order entities compared to sounds, no matter what we agree that a sound is.

In ordinary experience, auditory objects depend on the existence of sounds (just as, in Strawson’s account, purely temporal individuals need a notion of spatial bodies, or at least of something like spatial bodies) but do not amount to sounds. Auditory objects are sounds as we perceive them. To give an example, if I hear someone walking, there will be many distinct sounds corresponding to each step. However, my brain will perceive them as a sequence of sounds: a series of steps, as a perceptual unit. The latter, as opposed to the former, is the auditory object of my perception. In cases of single sounds (e.g. a single step), the auditory object corresponds to the sound of that step. However, whereas the sound of the step coexists in the auditory scene with many other (perceptible and imperceptible) sounds, the auditory object is the outcome of my brain segregating it from the background.

In this respect, my proposal shares significant similarities with the definitions offered, for instance, by Denham and Winkler (2015), according to which an auditory object is a perceptual representation of a possible sound source, derived from regularities in the sensory input, that has temporal persistence and can link events separated in time. This representation forms a separable unit that generalises across natural variations in the sounds and generates expectations of parts of the object not yet available. The fact that this definition speaks of a possible sound source fits with the conceptual separation I am proposing between sound sources, sounds and auditory objects. For instance, if I hear a buzz in my ears, or even if I have an auditory hallucination, what I perceive is in fact an auditory object even though there is no corresponding sound, nor any corresponding sound source. Moreover, auditory objects are characterized by generating expectations in the subject about sounds that are not yet available. The mind-dependency of auditory objects is therefore strongly stressed.

34Another definition that is very much in line with mine is that offered by Bizley and Cohen (2013), from whom I borrowed my coffee-machine example. On their account, an auditory object is the computational result of the auditory system’s capacity to detect, extract, segregate, and group spectrotemporal regularities in the acoustic environment. The authors accordingly define an auditory object as a perceptual construct, corresponding to the sound (e.g. the hiss) that can be assigned to a particular source (e.g. the coffee machine). The auditory object is thus the outcome of the activity of the subject’s auditory system. In this sense, once again, the auditory object is a mind-dependent temporal individual that may or may not be assigned to a particular source. Auditory objects are mind-dependent as regards their organization and segregation from the auditory scene, but dependent on a mind-independent object (the sound) as regards their properties (pitch, loudness etc.).


35So far, sounds, sound streams and auditory objects have often been used as synonyms, especially in the neuroscientific literature. As far as science is concerned, no clear, unitary definition of auditory object is available yet. On the other hand, when it comes to the philosophical landscape, most thinkers tend to regard sounds as event-like individuals. For this reason, at present there are only two philosophical definitions of auditory object, which I briefly sketched in the first sections of this paper. However, they limit themselves to:

  1. distinguishing between single sounds and sequences of (discrete or non-discrete) sounds (Nudds 2010);

  2. characterizing auditory objects as generic perceptual objects, in order to highlight the analogies between objects of vision and objects of audition (O’Callaghan 2008);

  3. underlining the importance of the sound source-auditory object relationship.

36In order to arrive at a clearer definition of auditory object, I therefore proceeded by exclusion: I first chose the definition of object over that of event, then the definition (in most cases) of perceptual object over those of material object and intentional object. Finally, based on the available literature, I defined auditory objects as temporally bounded individuals.

37I mentioned Strawson’s No-Space experiment to show that auditory objects can exist outside of space and still have objective reality, as long as they depend on spatial entities (in my view, sounds). If this condition fails, however, then auditory objects are fully mind-dependent. It is worth noting that in ordinary cases (that is, in cases of veridical perception as opposed to illusions and hallucinations), the objects of perception appear to be hybrid cases of intentional objects whose properties depend to varying degrees on the properties of mind-independent objects and sources: in normal situations, we take the experience of perceptual objects, whether visual, auditory, or multimodal, to depend in a reliable way on a mind-independent object (such as an external and observer-independent light or sound source). This definition is consistent with the psychological and neuroscientific literature according to which auditory objects are considered primarily as constructions of the brain, and in particular as entities processed in primary or higher-order cortex or even at earlier stages of the auditory system. As Strawson’s No-Space experiment shows, however, these mind-dependent constructions can (and in most cases do) have objective reality as long as they depend on spatial entities (i.e. sounds, which in turn depend on sound sources).



Alain, C., Arnott, S.R.

– 2000, Selectively attending to auditory objects, “Frontiers in Bioscience”, 5: D202-D212.

Benussi, V.

– 1904, Zur Psychologie des Gestalterfassens (Die Müller-Lyersche Figur), in Meinong (ed.) 1904: 303-448.

Bizley, J.K., Cohen, Y.E.

– 2013, The what, where and how of auditory-object perception, “Nature Reviews Neuroscience”, 14, 10: 693-707.

Bozzi, P.

– 1996, Higher-order objects, in L. Albertazzi et al. (eds), The School of Franz Brentano, Dordrecht, Kluwer Academic Publishers: 285-304.

Bregman, A.S.

– 1990, Auditory Scene Analysis, Cambridge (Mass.), The MIT Press.

Bullot, N.J., Egré, P.

– 2010, Editorial: Objects and sound perception, “The Review of Philosophy and Psychology”, 1, 1: 5-17.

Casati, R., Dokic, J.

– 1994, La philosophie du son, Nîmes, Chambon.

– 2005, Sounds, “The Stanford Encyclopedia of Philosophy” (Fall 2014 edition), Zalta, E.N. (ed.),

Cohen, J.

– 2010, Sounds and temporality, “Oxford Studies in Metaphysics”, 5: 303-320.

Denham, S., Winkler, I.

– 2015, Auditory perceptual organization, in Wagemans, J. (ed.), The Oxford Handbook of perceptual organization, Oxford, Oxford University Press: 601-620.

Fechner, G.T.

– 1876, Vorschule der Aesthetik, Leipzig, Breitkopf & Härtel.

Griffiths, T.D., Warren, J.D.

– 2004, What is an auditory object?, “Nature Reviews Neuroscience”, 5: 887-892.

Kubovy, M.

– 1981, Concurrent-pitch segregation and the theory of indispensable attributes, in M. Kubovy, J. Pomerantz (eds), Perceptual Organization, Hillsdale (NJ), Lawrence Erlbaum: 55-99.

Kubovy, M., Van Valkenburg, D.

– 2001, Auditory and Visual Objects, “Cognition”, 80: 97-126.

Matthen, M.

– 2010, On the diversity of auditory objects, “Review of Philosophy and Psychology”, 1: 63-89.

Nudds, M.

– 2010, What are auditory objects?, “Review of Philosophy and Psychology”, 1: 105-122.

O’Callaghan, C.

– 2007, Sounds. A Philosophical Theory, Oxford, Oxford University Press.

– 2008, Object perception: vision and audition, “Philosophy Compass”, 3.

– 2009, Auditory perception, “The Stanford Encyclopedia of Philosophy” (Winter 2016 Edition), Edward N. Zalta (ed.), <>.

Plack, C.J. (ed.)

– 2010, The Sense of Hearing, London - New York, Psychology Press.

Rohrbaugh, G.

– 2003, Artworks as historical individuals, “European Journal of Philosophy”, 11, 2: 177-205.

Santarcangelo, V.

– 2015, Auditory Objects: A New Way to Define Old Things, Doctoral Dissertation, University of Turin.

Santarcangelo, V., Terrone, E.

– 2015, Sounds and other denizens of time, “The Monist”, 98, 2: 168-180.

Scruton, R.

– 1997, The Aesthetics of Music, Oxford, Clarendon Press.

– 2009, Sounds as secondary objects and pure events, in Nudds, M., O’Callaghan, C. (eds), Sound and Perception: New Philosophical Essays, Oxford, Oxford University Press: 50-68.

Smith, B.

– 1994, Austrian Philosophy. The Legacy of Franz Brentano, Chicago and LaSalle, Open Court.

– 2003, John Searle: From speech acts to social reality, in B. Smith (ed.), John Searle, Cambridge University Press: 1-33.

Strawson, P.F.

– 1959, Individuals. An Essay in Descriptive Metaphysics, London, Methuen.

Thomasson, A.

– 1999, Fiction and Metaphysics, Cambridge, Cambridge University Press.

Wertheimer, M.

– 1923, Untersuchungen zur Lehre von der Gestalt II, “Psychologische Forschung”, 4: 301‑350.

Wightman, F., Jenison, R.

– 1995, Auditory spatial layout, in W. Epstein, S.J. Rogers (eds), Handbook of Perception and Cognition, New York, Academic Press: 365-399.


To cite this article

Bibliographic reference

Vincenzo Santarcangelo, «Auditory objects as higher-order objects», Rivista di estetica, 66 | 2017, 8-21.

Electronic reference

Vincenzo Santarcangelo, «Auditory objects as higher-order objects», Rivista di estetica [Online], 66 | 2017, online since 01 December 2017, accessed 14 June 2024. URL:; DOI:


Copyright


Only the text may be used under the CC BY-NC-ND 4.0 licence. Unless otherwise stated, copying of all other elements (illustrations, imported attachments) is not authorized ("All rights reserved").
