- 1 This research was supported by the project “Average-Transaction Costs and Risk Management during t (...)
1This article introduces the Italian side of the AveTransRisk database and reflects generally on the process of creating a large online database of historical data in order to draw lessons for future scholars. The AveTransRisk database was recently published by the ERC project “Average-Transaction Costs and Risk Management during the First Globalization” (AveTransRisk), based at the University of Exeter (UK), and captures a wealth of data on early modern maritime trade, particularly in the Mediterranean basin.1 The article discusses the particularities of the Italian data and the solutions put in place to capture it, outlining the choices that confronted the database creators during the four-year period of its construction, and demonstrates the capabilities of the resulting database as a tool for historical research. It first outlines the context in which the Italian sources were produced, with particular attention to how the values they contain were arrived at. It then describes the technical solutions adopted and shows the database’s potential for historical analysis. The final section reflects on the end result and the process of creation, and offers some thoughts for the future.
- 2 G. Felloni, 1998 [1978].
2The Italian data come from the port cities of Genoa and Tuscany, where the local state archives contain some of the best serial data available on historical general average (GA). AveTransRisk – based on the pioneering research of Giuseppe Felloni at the University of Genoa almost fifty years ago – thus adopts the Italian data as “standard” and the original database architecture was created in response to the way this data is structured.2 The article thus includes technical discussion of design decisions that were made with this context in mind. Nevertheless, although the two cities were only 150 km away as the ship sails, different political economies and administrative traditions impacted greatly upon the structure of the respective sources, something that the database endeavours to respect. AveTransRisk now contains information on 1,149 voyages whose documents are preserved in Genoa and 213 from Tuscany. More sources are still being uploaded as we write.
- 3 M. H. Sánchez et al., 2011.
- 4 G. Doria, 1995; C. Bitossi, 1990.
3Genoa is located at the Northern edge of the Tyrrhenian Gulf. During the early modern period, this port city was the capital of a small republic squeezed onto the Ligurian coast between Savoy to the West and Tuscany to the South. From 1528, it was a close ally of the Spanish Empire.3 The Genoese granted loans (the so-called asientos) to the Spanish monarchy and, in return, they obtained formal and informal privileges that allowed them to engage profitably in trade and finance operations on a global scale. The Republic’s independence and “free trade” also rested on this alliance. Genoa was an oligarchic Republic: the main patrician families, who often also owned fiefs inland and had multiple commercial and financial interests, shared amongst themselves all political power and controlled GA procedures. The ruling class coincided with the state’s economic elite.4
- 5 On Livorno see A. Prosperi, 2009; J.-P. Filippini, 1998; L. Frattarelli Fischer, 2018.
- 6 L. Frattarelli Fischer, 1993.
- 7 C. Tazzara, 2014.
4The Tuscan case, meanwhile, reflects a quite different political-economic set up. Most early modern maritime traffic in Tuscany arrived in the port of Livorno, a so-called “free port” which was established by the Medici Grand Dukes in the late-sixteenth century (though GA cases continued to be the prerogative of the court of the Consoli del Mare in nearby Pisa even after Livorno’s rise ended Pisa’s role as an international port).5 The conditions established in the free port granted persons of all religions and nationalities the right to trade freely, at least in theory. The free port also offered tax exemptions, eventually resulting in the abolition of all import and export duties in 1676.6 More generally, the institutional set up in Livorno gave significant leeway to merchants and adopted a general spirit of non-interference in their dealings.7 Livorno relied above all on settled communities of foreign merchants, especially those originating from North-Western Europe, and did not possess a strong native merchant corps of its own, in sharp contrast to Genoa where oligarchic nobles had historically controlled both the state and a significant segment of the maritime trade.
5These different traditions of political economy in the two states may help explain the different approaches taken to GA documentation: the Genoese documents are generally more punctilious and detailed, outlining individual rights and interests in very precise terms, perhaps in order to better protect the interests of the merchants who ruled the state as an oligarchy. The Tuscan documentation, whilst still very rich, is less detailed than its Genoese counterpart, which may reflect the free port’s emphasis on the expeditious resolution of disputes and a more relaxed attitude towards formalities. More importantly, these different political economies affected the way that GA cases were handled in the two centres, an important consideration for users of the data. This will be outlined in further detail below.
6Almost all early modern GA records, regardless of jurisdiction, rest upon two documentary pillars (see the companion piece in this issue for further details). The first is the narrative element: the “sea protest” or consolato in Italian. This was the shipmaster’s account of the voyage and the misfortunes that had occasioned the sacrifice of property or extraordinary expense. Sea protests could vary in their form and content but usually provide a rich array of quantifiable information. While some of this data directly concerns early modern maritime trade (ship types and tonnage, itineraries, voyage times), we also find abundant weather information (wind direction, storms, the presence of ice) which could be used for reconstructing historical climate patterns. For example, the sea protest drawn up for the voyage of the Speranza Incoronata in 1668/1669 from Arkhangelsk to Livorno tells us that the seasonal winter ice, which had blocked the exit of the harbour of Arkhangelsk, had finally melted to an extent sufficient for navigation on 21 May 1669.8 Finally, sea protests also contain social information, in part because they were witnessed by several members of the crew (giving us master’s and witnesses’ names, places of origin, and roles on board). This sea protest content does not tend to differ greatly between the two centres and could, at any rate, have been produced somewhere other than Genoa or Livorno as sea protests were normally written (“declared”) in the first port encountered after the accident. The table below lists the declaration places found so far in the Genoese Average procedures uploaded in AveTransRisk, with the exception of Genoa (Table 1).9
Table 1. Places in which sea protests were declared in GA cases found in the Genoese State Archive
| Declaration place | n. | Declaration place | n. | Declaration place | n. |
| --- | --- | --- | --- | --- | --- |
| Livorno, Italy | 80 | Toulon, France | 5 | Almería, Spain | 2 |
| Trapani, Italy | 20 | Alghero, Italy | 4 | Anzio, Italy | 2 |
| Gaeta, Italy | 17 | Bonifacio, France | 4 | Cannes, France | 2 |
| Civitavecchia, Italy | 16 | Capoliberi, Italy | 4 | Castellamare del Golfo, Italy | 2 |
| Portovenere, Italy | 16 | La Ciotat, France | 4 | Corsica, France | 2 |
| Messina, Italy | 15 | Milazzo, Italy | 4 | Cowes, England | 2 |
| Portofino, Italy | 14 | Pisa, Italy | 4 | Majorca, Spain | 2 |
| Savona, Italy | 14 | Pozzuoli, Italy | 4 | Malaga, Spain | 2 |
| Alicante, Spain | 12 | Tabarka, Spain | 4 | Monterosso al mare, Italy | 2 |
| Cadiz, Spain | 11 | Valencia, Spain | 4 | Oristano, Italy | 2 |
| Portoferraio, Italy | 11 | Viareggio, Italy | 4 | Porto Azzurro, Italy | 2 |
| Cagliari, Italy | 10 | Agrigento, Italy | 3 | Reggio Calabria, Italy | 2 |
| Naples, Italy | 9 | Alassio, Italy | 3 | Saint-Florent, France | 2 |
| Piombino, Italy | 8 | Crotone, Italy | 3 | Saint-Tropez, France | 2 |
| Porto Ercole, Italy | 8 | Ibiza, Spain | 3 | Santa Margherita, Italy | 2 |
| Ajaccio, France | 7 | La Spezia, Italy | 3 | Sestri Levante, Italy | 2 |
| Bastia, France | 6 | Marseille, France | 3 | Talamone, Italy | 2 |
| Calvi, France | 6 | Nettuno, Italy | 3 | Tarragona, Spain | 2 |
| Palermo, Italy | 6 | Orbetello, Italy | 3 | Villefranche-sur-Mer, France | 2 |
| Antibes, France | 5 | Porto-Vecchio, France | 3 | Ports mentioned only once | 59 |
| Barcelona, Spain | 5 | Sanremo, Italy | 3 | Unknown | 11 |
| Malta, Malta | 5 | Rossignano marittimo, Italy | 3 | Total | 494 |
| Syracuse, Italy | 5 | Alcudia, Spain | 2 | | |
Source. Procedures declared in Genoa between 1590 and 1700, present in the online database AveTransRisk.
7The main point of difference here is the level of detail consistently present in the Genoese sea protests. Elements that are reliably present in Genoa – a record of ship tonnage for example – are present only intermittently in the Tuscan documentation.
8The other main documentary element in a GA case was the calculation that determined how much each interested party owed. Here the data produced in the two centres diverges to a somewhat greater degree. In both cases the calculation first lists all the contributing elements and their value; it then lists all the items that were lost or damaged and/or any expense incurred. It then divides the total of the second (the losses) by the total of the first (the contributing values) to arrive at a contribution rate for every quotum of investment. A calculation in Genoa would follow these steps (with examples and simplified for heuristic purposes):
- List of the contributing values (ship and cargo) before the GA event: Ship Owners (50)10; Merchant A (100); Merchant B (40); Merchant C (60) = 250
- List of the damages after the GA event: Ship Owners (–15); Merchant A (–25); Merchant B (0); Merchant C (–10) = –50
- Calculation of the contributing rate: (50/250)*100 = 20%
- Contribution to be paid/net loss: Ship Owners (–10); Merchant A (–20); Merchant B (–8); Merchant C (–12) = –50
- List of the net contributing values after calculation: Ship Owners (40); Merchant A (80); Merchant B (32); Merchant C (48) = 200
- Money to pay (–) or to receive (+) by each asset following the calculation: Ship Owners (+5); Merchant A (+5); Merchant B (–8); Merchant C (–2)
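The six steps above can be reproduced in a few lines of code. The following is a minimal sketch in Python using the figures of the simplified example; the function and field names are illustrative assumptions and do not correspond to fields in the AveTransRisk database.

```python
from fractions import Fraction

# Steps 1 and 2: contributing values before the GA event and damages suffered
# (figures taken from the simplified worked example).
values = {"Ship Owners": 50, "Merchant A": 100, "Merchant B": 40, "Merchant C": 60}
damages = {"Ship Owners": 15, "Merchant A": 25, "Merchant B": 0, "Merchant C": 10}


def general_average(values, damages):
    """Return the contribution rate and the per-party figures of one GA calculation."""
    total_value = sum(values.values())
    total_damage = sum(damages.values())
    # Step 3: total damages divided by total contributing values.
    rate = Fraction(total_damage, total_value)
    result = {}
    for party, value in values.items():
        contribution = value * rate  # step 4: share of the loss borne by the party
        result[party] = {
            "contribution": contribution,
            "net_value": value - contribution,         # step 5
            "balance": damages[party] - contribution,  # step 6: + to receive, - to pay
        }
    return rate, result


rate, result = general_average(values, damages)
print(f"contribution rate: {float(rate):.0%}")
for party, row in result.items():
    print(party, {key: int(amount) for key, amount in row.items()})
```

The balances sum to zero, which is a useful check when transcribing a calculation: whatever is received by the parties that suffered damage must be paid by the others. A Tuscan calculation would stop once `rate` has been obtained.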
9In Tuscany, only the first three steps outlined above would be carried out, with the latter three steps considered surplus to requirements: once the contributing rate had been ascertained the calculation was apparently deemed to have served its practical and juridical purposes. This fact neatly illustrates the streamlined approach of the Tuscan documentation and the more detailed approach of the Genoese. Nevertheless, in both cases, the calculation presents the historian with a wealth of data: the cargo that was on board the ship, its weight, quantity, and value; the value of the ship and its equipment; and, often, the names of merchants who were interested in each shipment, whether as sender, receiver, buyer, or seller. This sort of information is not readily available in other early modern sources. The value of goods, for example, cannot be reliably obtained from other data sources, such as insurance contracts or fiscal records, where the parties had both a strong incentive and the means to over- or understate the value of their assets. The information on the monetary and measurement units used in the ports of loading, and their equivalences in local (Genoese or Tuscan) units, offers a concrete contribution, drawn from daily practice, to the solution of many problems of comparative metrology, and can complement or be compared with the information available in mercantile handbooks and trade dictionaries. Commodity prices, being established by experts on the basis of attached documents or sworn statements of merchants, can be used to compare the contemporary prices given to the same commodity in different places and to gauge the degree of integration of markets over time. The place of origin and the composition of the cargoes, specified in the GA calculations, allow for a first approximation of the geographical and commodity distribution of maritime trade over time.
10The problem of fraudulent valuation besets GA records to a far lesser extent than other sources mentioned above, as individual interests were usually a lot smaller and single parties rarely had the means to distort values to their advantage. Calculations were mostly carried out by neutral experts working on behalf of the collective. Nevertheless, several local factors affecting the valuation process had to be taken into account when designing the database and should also be borne in mind when using the data despite the best efforts of its authors to make design choices that promote transparency in this regard.
- 11 Before the civil statutes, the tribunal of the Rota Civile chose the experts for each Average case (...)
11The Genoese civil statutes drafted in 1589 ruled, among other things, on GA and the persons responsible for drawing up calculations.11 They were officially called calculators (calcolatori). They were selected by the Genoese Senate, remained in office for 18 months, signed all the calculations, and had their own specialised notary/chancellor with a renewable five-year mandate. The calculators listened to all parties involved (shipmaster, merchants, and any insurers) and their witnesses and then validated or nullified the sea protests presented. Following the report’s approval, calculators could order the unloading of the goods and mandate the presence of guards on the ship to prevent any fraud. During the seventeenth century, GA procedures in Genoa underwent only slight changes, which consisted in an increase in the responsibilities, competences, and control exercised by the maritime tribunal of the Conservatori del Mare, composed of members of the nobility. With the passing of time, the Conservatori gradually absorbed the calculators’ functions. By the 1660s they had complete control over the procedure and the calculators seem to disappear.
- 12 A. Addobbati & J. Dyble, 2021, p. 837.
12In Tuscany, the figures responsible for drawing up calculations changed over the period under consideration. The calculations up until the mid-seventeenth century were carried out by two merchants who were selected by lot: one had to be a “recognised” Florentine merchant, and the other a “recognised” Pisan merchant. Around the 1650s, this procedure appears to have fallen into disuse, and from this point on, two calculators were selected at the discretion of the Pisan Consoli del Mare themselves. Little is known about their identity or profession, though the Consoli drew on a wide range of candidates: in one case we can be sure that the calculator was a merchant based in Livorno but hailing from a family of Pisan lawyers.12 The layout of the calculations suggests that the calculators used the ship’s bill of lading to draw up the document.
- 13 A. Watson, 2011 [1998], vol. 2, p. 420.
- 14 See E. Maccioni, 2019; V. Piergiovanni, 2012; L. Tanzini, 2015.
- 15 G. M. Casaregi, 1802, p. 28.
- 16 J. Dyble, 2021, p. 156.
- 17 J. Dyble, 2021, p. 158.
13The Tuscan calculators were explicitly tasked with the job of valuation, though it is not always clear on what basis this was carried out. The relevant normative material addresses this problem in part, but gives contradictory guidance on what practice should be adopted. The Lex Rhodia of the Corpus Iuris Civilis stated that contributing goods should be valued at the price they can fetch (i.e. the market price at the destination), while sacrificed goods should be valued at their purchase price “since what is made good is loss suffered not gain foregone”.13 The Llibre del Consolat de Mar, on the other hand, a highly influential collection of maritime customs issued in Barcelona during the fifteenth century,14 stated that cargo should be valued at the price of the port of origin (i.e. the purchase price) if lost in the first half of the voyage and at the destination (selling price) if lost in the second half.15 On an operational level, the only certainty is that a variety of valuation approaches were adopted and that there was no one fixed way to value the cargo. Sacrificed and saved cargo seems to have received the same value in most cases. Two cases from the Tuscan data clearly used the “rule of halves” approach outlined by the Llibre, but there is no indication of its use in several instances where we would expect to see it applied: in a few cases (with multiple jettisons of the same cargo during a voyage, for example) we can be sure that it was not.16 Very often, we simply cannot tell: the most likely occurrence is that all cargo was simply being valued at the Livornese market rate. A government report of a somewhat later date (1785) states that merchants submitted invoices of their goods in order to establish values, though it is not clear that this practice was being applied in our period, and we have never found any trace of these invoices among our surviving sources.17
- 18 R. Davis, 1957, p. 410. For further details see A. Addobbati & J. Dyble, 2021, p. 838.
14The Tuscan calculators were likewise responsible for judging the value of the ship. In one case, historical records preserved elsewhere give the age of the ship in the GA in question, and thus allow us to compare the value given by the Tuscans with Ralph Davis’ estimates about the value of seventeenth-century vessels over their working life. In this case, we know that the ship was at least 13 years old and probably older. According to Davis’ estimates it should have been worth no more than 4,700 pieces of eight, and no less than 2,300. The Tuscan estimate of 4,000 pieces of eight is perhaps a touch generous – we know that the ship was in a poor state of repair even before the vicissitudes of the voyage – but certainly not unreasonable.18
- 19 See S. Corrieri, 2005.
- 20 As an example of this shift in the contribution criteria see the two voyages IDs 51105 (total vess (...)
- 21 On the differences in the contributing criteria between Genoa and Barcelona see A. Iodice, 2023.
- 22 Archivio di Stato di Genova, Genoa, Conservatori del Mare 109, 1691. On Carlo Targa see M. G. Mere (...)
15Once the ship had been valued, the two centres proceeded differently regarding the ship’s contribution to the GA. The Llibre del Consolat suggested that the vessel should count for only half its value.19 The effect of this provision was to lessen the financial burden upon ship masters and/or owners, whose interest in the venture – and hence final contribution – was thus reduced: this was probably introduced to promote the health of the maritime transport sector. Yet the documents we found in Genoa and Pisa tell a different tale, a tale about the decision of Genoese and Livornese institutions to adapt the Consolat by tailoring it to their own needs. In fact it is only from the 1660s that we find this rule being applied in Genoa: the Genoese, reeling from the economic and social effects of the 1656 plague, tried to make the port more attractive to shipmasters by copying the practices current in nearby Livorno, where the Llibre del Consolat’s rule on the partial contribution of the ship was followed.20 Previously, the whole value of the vessel was taken into account, a move that favoured merchants. In order to capture this information accurately, the database thus had to include a field for the vessel’s contributing value, one for the vessel’s non-contributing value, and one for the vessel’s total value. In Tuscany, meanwhile, the ship always counted for only half its value. Freight likewise contributed for one-third of its value according to the Llibre del Consolat but contributed in its entirety in Genoese practice.21 The Genoese lawyer and jurist Carlo Targa explained, in a document hidden inside a folder of GA procedures, how the Genoese authorities chose not to follow the Llibre del Consolat in this respect.22 In Targa’s opinion, the value of the freights is part of the value of the cargo. Decreasing the contribution rate of the former without decreasing the contribution rate of the latter – as goods always contributed for their full value – would be unfair and would benefit the shipowners. In Tuscany, meanwhile, freights contributed for one third. This is why the freight tab in the database, like the vessel tab, is divided into three subsections.
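The three-way split between total, contributing, and non-contributing value can be illustrated with a short sketch. The shares below are those described in the text; the rule labels, the function name, and the freight figure are assumptions made purely for illustration, and the vessel figure borrows the Tuscan estimate of 4,000 pieces of eight mentioned earlier.

```python
from fractions import Fraction

# Share of the total value that contributes to the GA, as described in the text.
# The dictionary keys are labels invented for this sketch.
RULES = {
    "genoa_before_1660s": {"vessel": Fraction(1), "freight": Fraction(1)},
    "genoa_from_1660s": {"vessel": Fraction(1, 2), "freight": Fraction(1)},
    "tuscany": {"vessel": Fraction(1, 2), "freight": Fraction(1, 3)},
}


def split_value(total, share):
    """Return the three values that the vessel and freight tabs need to hold."""
    contributing = total * share
    return {
        "total": total,
        "contributing": contributing,
        "non_contributing": total - contributing,
    }


# A vessel worth 4,000 pieces of eight and a hypothetical freight of 900.
vessel = split_value(4000, RULES["tuscany"]["vessel"])
freight = split_value(900, RULES["tuscany"]["freight"])
print(vessel)
print(freight)
```

Storing all three values, rather than deriving two of them from a rule, keeps the record faithful to what the document states, including the cases where local practice departed from the expected rule.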
- 23 J. Dyble, 2021, p. 163; see, for example, Archivio di Stato di Pisa, Consoli del Mare, Atti Civili (...)
- 24 On these dynamics, see J. Dyble, 2023, pp. 378-379.
- 25 A. Addobbati & J. Dyble, 2021; for more general diplomatic conditions between England and Tuscany s (...)
- 26 A. Addobbati & J. Dyble, 2021.
16Tuscany’s political economy likewise rendered the question of valuation complicated at times, specifically with regard to the valuation of lost or damaged ship’s equipment. Here Livorno’s reliance on Northern European traffic led to distortions by contemporary actors. The financial import of damages suffered by a ship was usually assessed by neutral experts nominated by the court, often naval carpenters attached to the galleys of the Order of St Stephen.23 From the second half of the seventeenth century, however, English and Dutch shipmasters were apparently allowed to submit their own unnotarised damage reports in which they told the court how much each lost or damaged part of the ship was worth. The total sum requested was then “reduced” by the Pisan court by an arbitrary amount, probably to give a semblance of probity.24 This practice reflected the commercial power held by these actors and the desire of the Tuscan authorities to keep them onside and to maintain a good working relationship with them at a time of diplomatic tension with their home states.25 Our research even turned up one verified case of fraud, in which the English shipmaster and the (mostly English) receiving merchants colluded to push through a spuriously large GA claim in favour of the master, whilst the master, for his part, promised to help the merchants recoup their larger contributions by defrauding the Tuscan customs house.26 Since this would have been presented to the Pisan Consoli del Mare as an amicable settlement, and since the English were very important to the port economy, the Consoli had every reason to wave it through, and there is some reason to think that this may not have been an isolated example.
17These compromises may have suited historical actors, but they presented an issue when collecting data. The sum mandated by the court was not itemised and could thus tell the historian very little of value; the submitted sums, on the other hand, came solely from one interested party – the shipmaster and his crew – and were thus not of the same reliability as more “neutral” figures contained in other GA cases: mixing this “self-assessment” data with the figures provided by the Tuscan carpenters would risk contaminating all of the data on ships’ equipment. It was therefore necessary to build a new section of the database to accommodate this reality. A new drop-down menu was added to the entry form for damages and expenses in which the inputter could specify “awarded” (the default option) or “claimed”. Users are thus able to distinguish between a value that had been awarded consensually or through the decision of a third party (i.e., a court or a court-nominated expert) and a valuation given by an interested party that expected to benefit directly from an inflated figure.
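For a user working with exported data, the practical effect of the “awarded”/“claimed” distinction might look like the following sketch. The record structure, field names, and amounts are hypothetical; only the two labels and the default come from the database design described above.

```python
from dataclasses import dataclass


@dataclass
class Damage:
    item: str
    amount: float
    basis: str = "awarded"  # "awarded" is the default; "claimed" marks self-assessed sums


# Hypothetical damage entries for ship's equipment.
damage_entries = [
    Damage("mainmast", 120.0),
    Damage("anchor", 80.0, basis="claimed"),
    Damage("sails", 45.0, basis="claimed"),
    Damage("rudder", 60.0),
]


def total_by_basis(entries, basis):
    """Sum only the entries valued on the given basis."""
    return sum(entry.amount for entry in entries if entry.basis == basis)


print(total_by_basis(damage_entries, "awarded"))  # 180.0
print(total_by_basis(damage_entries, "claimed"))  # 125.0
```

Keeping the two bases in the same table but filtering on the label lets a study of equipment prices exclude self-assessed figures without losing them from the record.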
- 27 A. Addobbati & J. Dyble, 2021, p. 837.
18In any case, and whatever the precise mode of valuation, it is worth bearing in mind that in the majority of cases, wildly inaccurate valuations could not be made to stick. The incentive of each interested party was to have their cargo valued at the lowest possible price: the lower the price, relative to the other property involved, the lower its owner’s contribution. Yet the fact that all interested parties faced the same incentive was itself a guard against falsely low valuation. Many receiving merchants would be receiving the same cargoes, making it difficult to favour particular parties. Moreover, an ever-present dynamic of the GA case is the tension between “hull” and “cargo”. The shipmaster would have hoped for a low valuation for his ship and freight, and a high valuation for the merchandise; the merchants hoped for the opposite. Save for cases of fraud like those outlined above, in which one party found a way to secretly “reimburse” the other for an unfavourable valuation, the presence of the shipmaster and at least some of the receiving merchants during a GA procedure meant that an acceptable balance had to be found between these opposing interests, and hence a mutually acceptable appraisal of goods, ship, and freight. GA valuations were thus, to a certain extent, self-correcting. It is possible, of course, that absent merchants based in other ports of call were in a weaker position, but the analysis published by members of the “AveTransRisk” team suggested that valuations were not distorted in this regard.27 If anything, it is likely that all valuations of cargo contained in the database err on the side of generosity, as in many cases this would have produced a lower contribution rate when expressed as a percentage, fraction, or ratio – thus mollifying all participants in the venture.
19A number of challenges involved in the construction of the database were known at the outset. One of these was the use of different calendars during the early modern period: whilst many places had adopted the Gregorian calendar, others, notably England, continued to use the Julian calendar. In order to overcome the potential distortions this would introduce, a “date: place” field was included: when recording a date, the inputter must include the place from which the document originates. Each place is linked to a central record of which calendar was in use there at what time. This was considered a more straightforward solution than forcing the inputter to determine which calendar was being used in each individual case.
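The logic behind the “date: place” field can be sketched as follows. This is not the project’s implementation: the lookup table is a two-entry illustrative stand-in for the central record, the function names are assumptions, and the sketch ignores both changes of calendar over time and differing conventions for the start of the year. The conversion itself uses the standard Julian Day Number arithmetic.

```python
from datetime import date

# Illustrative stand-in for the central record: place -> calendar in use
# during the seventeenth century.
CALENDAR_BY_PLACE = {"Rome": "gregorian", "London": "julian"}


def julian_to_gregorian(year, month, day):
    """Convert a Julian-calendar date to a Gregorian date via the Julian Day Number."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    # 1721425 is the offset between the JDN and Python's proleptic Gregorian ordinal.
    return date.fromordinal(jdn - 1721425)


def normalise(year, month, day, place):
    """Return the Gregorian date for a date recorded at the given place."""
    if CALENDAR_BY_PLACE[place] == "julian":
        return julian_to_gregorian(year, month, day)
    return date(year, month, day)


print(normalise(1669, 5, 11, "London"))  # 1669-05-21
print(normalise(1669, 5, 21, "Rome"))    # 1669-05-21
```

The inputter only records the date and the place as written; the ten-day difference that applied in the seventeenth century is resolved by the lookup rather than by individual judgement.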
20Currencies, weights, and measures, which varied enormously during the early modern period, presented a similar problem. It was decided that currencies would be linked to a silver equivalent to facilitate comparison. The question of weights and measures, on the other hand, was too complicated for automatic equivalence to be contemplated. In some cases of generic units, e.g. “bales”, this would be simply impossible. Instead, the database settles for maximum atomisation, i.e. using separate fields for number and unit (3 rubbi is entered as “3” in one field and “rubbi” in another).
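A brief sketch shows what this atomisation permits. The rows and field names below are hypothetical; the point is simply that quantities can be summed within a unit while no conversion between units is attempted.

```python
from collections import defaultdict

# Hypothetical cargo quantities, stored as separate number and unit fields.
cargo_quantities = [
    {"cargo": "lana", "number": 3, "unit": "rubbi"},
    {"cargo": "lana", "number": 5, "unit": "rubbi"},
    {"cargo": "lana", "number": 2, "unit": "bales"},
]


def totals_by_unit(rows):
    """Sum quantities per (cargo, unit) pair without converting between units."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["cargo"], row["unit"])] += row["number"]
    return dict(totals)


print(totals_by_unit(cargo_quantities))
```

Any conversion, for instance of rubbi into a modern weight, is left to the user, who can apply it to the numeric field without having to parse free text.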
21Language, terminology, and translation were foreseen issues. Most of the sources were written in different regional Italian languages, whilst the database itself was to be published in English. In line with the guiding principle of data cleanliness, it was decided that terms would be captured exactly as they appear in the sources. This puts the onus for analysis onto the user, who must be versed in the languages in question in order to take advantage of the more language-sensitive data. This is preferable to a situation where input errors are carried deep into the historical record by unsuspecting and perhaps inexpert users. Roles onboard, for example, are kept in the original language rather than “translated”, as these often did not have a direct equivalent. A glossary of these and other terms allows non-speakers an entry point into this data.28
22A particularly tricky instance of this problem occurred in relation to the names given to cargo. The cargo values are some of the most valuable information in the database, and we hoped to give users as much access to this as possible. Cargo descriptions, on the other hand, are difficult to translate accurately and often require specialist knowledge. A comprehensive glossary would have been beyond the means and scope of the project. Without knowledge of Italian, however, users would have difficulty in finding the cargoes that interested them. In the end, it was decided that the original name would be preserved in the entry field, and that two further fields would be added: “type” and “category”. These capture individual judgements by the inputter on the nature of the cargo. Type is intended to be the narrowest useful descriptor, and category the widest useful descriptor: lana for example is given the type “wool” and the category “raw materials for textiles”; tonnina is given the type “fish” and the category “food”. Since these categories are imposed at the discretion of the inputter, this solution is admittedly somewhat imperfect and subjective but at least helps users to begin their search inquiry whilst simultaneously allowing for better transparency and data cleanliness. Figures 1 and 2 show an example of a batch of cargo as entered in the offline and online databases. The cargo consisted of rocchiella wheat, a variety of durum wheat, which was assigned the type “cereal” and the category “food”.
Figure 1. Cargo visualisation in the offline AveTransRisk database
Figure 2. Cargo visualisation in the online AveTransRisk database
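The way “type” and “category” support searching without altering the original name can be shown with a small sketch. The two records use the examples given in the text; the function and its parameters are assumptions for illustration and do not describe the web interface.

```python
# Cargo records: the original name is preserved; type and category are
# the inputter's judgements (examples taken from the text).
cargo_records = [
    {"name": "lana", "type": "wool", "category": "raw materials for textiles"},
    {"name": "tonnina", "type": "fish", "category": "food"},
]


def find_cargo(records, cargo_type=None, category=None):
    """Return the original names of cargoes matching the given type and/or category."""
    matches = []
    for record in records:
        if cargo_type is not None and record["type"] != cargo_type:
            continue
        if category is not None and record["category"] != category:
            continue
        matches.append(record["name"])
    return matches


print(find_cargo(cargo_records, category="food"))    # ['tonnina']
print(find_cargo(cargo_records, cargo_type="wool"))  # ['lana']
```

A search by category returns the terms as they appear in the sources, so a user who does not read Italian can find relevant entries and then consult the original wording.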
23We followed the same approach in creating categories to group the events behind the sea protests. We have information, for example, on 192 jettisons that occurred mainly in the Mediterranean, with more or less specific geographical indications and the unforeseen costs they caused.
24A final layer of complexity, not anticipated at the outset, was the coexistence of multiple Averages, whether multiple GAs or a mixture of different types of Average: Particular Average (PA) and GA, in particular, could occur during the same voyage. PA occurs when a ship or cargo has incurred direct damage as the result of a casus fortuitus, the cost of which is borne solely by the afflicted party. Given the overwhelming nature of a storm, it is no surprise that these two types of Average often occurred in the same voyage. For Genoa, we currently have 188 GA calculations and 146 PA calculations, and 54 voyages ended up with both a GA and a PA calculation. Here too, the exigencies of real life confounded the neat, classical architecture of a database. For example, a jettison in a storm could give rise to a GA, while the soaking of the cargo in the same storm could give rise to a PA. A single calculation could be used to manage both Averages, however. The values of the ship, freight, and cargo, together with the jettison’s damage, would be included in the GA. Only the cargo (or a portion of it), with its damage – the soaking, in our example – would be accounted for in the PA section. The more punctilious Genoese documentation sometimes provides different values for the same cargo depending on where the accident took place: the value in the port of loading, if the accident happened in the first half of the voyage, or in the port of unloading, if it happened in the second half of the voyage. This forced us to add another sub-tab in the Average tab, called simply “Risk”, to attach each valued element to the relevant Average. This allows users to see which items on the calculation were pertinent to which Average. Importantly, it also avoids false totals, whereby the same cargo is effectively counted twice because the ship suffered two different Averages.
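The double-counting problem that the “Risk” sub-tab addresses can be illustrated with a minimal sketch. All item names and values are hypothetical, and the link structure is an assumption about how such an attachment could be modelled, not a description of the database schema.

```python
# Hypothetical valued elements for one voyage.
items = {"ship": 4000, "freight": 900, "cargo lot 1": 1500, "cargo lot 2": 600}

# "Risk" links: each valued element is attached to the Average(s) it belongs to.
# Cargo lot 2 was both part of the GA and the subject of a PA (e.g. soaking).
risk_links = [
    ("GA", "ship"),
    ("GA", "freight"),
    ("GA", "cargo lot 1"),
    ("GA", "cargo lot 2"),
    ("PA", "cargo lot 2"),
]


def total_for_average(average):
    """Sum the values of the elements attached to one Average."""
    return sum(items[item] for avg, item in risk_links if avg == average)


def voyage_total():
    """Sum each valued element once, however many Averages it is attached to."""
    distinct_items = {item for _, item in risk_links}
    return sum(items[item] for item in distinct_items)


print(total_for_average("GA"))  # 7000
print(total_for_average("PA"))  # 600
print(voyage_total())           # 7000
```

Adding the two per-Average totals would give 7,600, counting cargo lot 2 twice; summing over distinct elements gives the value actually on board.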
25The “AveTransRisk” project used a series of offline databases to capture data before periodically uploading each dataset into a central database from which the data is displayed online via the web interface. At the beginning of the project, it was envisaged that the interpretation of source material and subsequent capturing of data would be performed in remote archives, possibly in areas without Wi-Fi or any connection to the internet at all. Photographs of the source material might be allowed in some cases, but it was considered that at least some archives would forbid this or charge per photo. Copyright issues also made this approach less attractive. With this in mind, it was decided that a digital tool would be required to capture the data in situ. Ideally, this solution would allow for all required data to be captured and made instantly available to other researchers (after moderation) via the project website. However, due to the possible lack of internet connectivity in the archives, the data capture tool would need to be able to work offline at first, before the data could be uploaded to a web-enabled database at a later date.
26A three-dimensional, relational structure clearly suited the complex source material better than a two-dimensional spreadsheet. Microsoft Access was chosen as a relatively simple database engine included in the same Office suite as Excel and thus similarly intuitive. It also has the benefit of familiarity for many users. Using Access, tables for each data entity (e.g., Vessel, Cargo, Master’s Report, Events, Average and Damages) can be easily constructed and data input made simple through the creation of input forms. Unlike many larger database systems, the data is contained in a single file allowing for easy copying and transporting. It also has several export options making the transfer of data into the web-enabled database very straightforward. With Access, a powerful yet easy-to-use data entry tool could be created. A separate version could be given to each researcher to be used offline and then periodically emailed to the data technician for upload into the main online database. This also gave researchers and the wider project team an opportunity to inspect and check the data before uploading it to the central data store. This approach presents no downtime while the offline database is being uploaded, and allows researchers to continue collecting and amending beyond the end of the project as they have their own version of the dataset.
27An automatic upload, avoiding the need to manually upload the material as soon as the researcher connected to the internet, was initially considered. However, the risk of errors being uploaded to the web-enabled version was thought to be too high. Delaying this upload allows researchers to check the data and make amendments as they move through the data input process.
28Although Microsoft Access can be used to present data online, a more powerful relational database management system (RDBMS) was required for the needs of AveTransRisk.
29The web interface was written in Python using the Django web framework. Python is a powerful programming language whose importable code packages allow easy implementation of features including search engines, data export, and mathematical functions. Using pre-processing scripts written in Python, data can be collated and analysed automatically to present a statistical display to the user. The Django framework uses a variant of the popular Model-View-Controller (MVC) approach, which Django’s own documentation describes as Model-Template-View, to separate the data structure, code, and web template, effectively encapsulating each section. This encourages cleaner code and a more scalable program. While both the Postgres and MySQL database systems complement Python/Django solutions, the University of Exeter Digital Humanities department has a dedicated MySQL database server, so this system was chosen.
30The Web User Interface (WUI) presents the data in a format which is attractive for users. The jQuery library was used to add interactivity to the pages, particularly through the use of Google Maps. The Google Maps API is a simple yet powerful mapping tool which, when combined with Python, can be used to analyse the data by location, time and entity. The AveTransRisk maps show the location of ports visited by year, and colour-coded flags indicate whether each port was the origin of a voyage, a stop along the way or a destination. Events are also displayed using maps and can be isolated to show, for example, all weather events by year.
31To provide the most effective legacy data structure, each table in the database will be exported as a simple comma-separated values (CSV) file. CSV files are easily imported into many different data software systems, including spreadsheet applications such as Excel and database management systems such as Postgres, MySQL and MSSQL. Saving the data as simple CSV files removes the dependency on any particular software that may not be available in the future. Extensive notes will accompany the files to explain the structure and how to interpret the data and recreate the relationships between entities.
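A table-by-table export of this kind might look like the following sketch. It uses an in-memory SQLite database as a stand-in for the project’s MySQL server, and the table and column names are invented for illustration; only the general technique (one CSV file per table, with a header row) is taken from the description above.

```python
import csv
import io
import sqlite3


def export_tables_as_csv(conn: sqlite3.Connection) -> dict[str, str]:
    """Return {table_name: csv_text} for every user table in the database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    exports = {}
    for table in tables:
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        # Header row: column names, so the file is self-describing.
        writer.writerow([column[0] for column in cursor.description])
        writer.writerows(cursor)
        exports[table] = buffer.getvalue()
    return exports


# Hypothetical two-table database; the foreign key voyage_id is what the
# accompanying notes would need to document to recreate the relationship.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE voyage (voyage_id INTEGER PRIMARY KEY, vessel_name TEXT);
    CREATE TABLE cargo (
        cargo_id INTEGER PRIMARY KEY,
        voyage_id INTEGER REFERENCES voyage(voyage_id),
        description TEXT
    );
    INSERT INTO voyage VALUES (1, 'Tre Re');
    INSERT INTO cargo VALUES (10, 1, 'wheat');
    """
)
csv_files = export_tables_as_csv(conn)
```

Because the identifier columns are exported alongside the data, the relationships between entities can be rebuilt in any system that reads CSV.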
32Once in an accessible format, the data will be deposited into a research repository such as the UK Data Service29 or the University of Exeter’s own ORE repository30. Both systems will make the data available for download and create a unique DOI for referencing. Offering the data in its constituent parts allows easy access to each part of the research and allows other systems to incorporate the data into their own datasets.
33We spent more than four years designing and amending the database. All the above-mentioned “exceptions” and irregularities led us to design computational tools as wide and, at the same time, as specific as possible. AveTransRisk today offers three different tools: the map function, the simple search, and the advanced search. Since we specified the current country and the geographical coordinates for all ports registered in the database, they were easily converted into geopoints. We created two different map functions, one related to ports, and another related to events.
34Events in Average sources are usually described with reference to nearby landmarks such as islands or towers, sometimes with approximate distances in nautical miles (“we were 5 miles north-west of Elba island”). However, it is hard to determine exactly when a certain event took place, or for how long. For example, how should we locate a storm that started near Sicily, lasted 7 hours and ended near Sardinia? For this reason, the map of events tool is still a work in progress. Currently, only the voyages from 37 Tuscan sources have their events geographically located. An example of event visualisation is shown in Figure 3. The ship Tre Re, with the Flemish “Giovanni di Giovanni” as shipmaster, was freighted to transport wheat from Ancona and Senigallia to Livorno. A few days after setting out they encountered a storm near Cavo Festozzi with north-easterly winds. The crew were set to bailing out and they took in all sail to avoid being pushed onto the land. However, with the storm increasing and the ship threatening to capsize, the shipmaster ordered a jettison of a very large portion of the cargo after the necessary consultation with the crew. The location of event A, therefore, could not be defined with precise coordinates, as would be the case today with GPS.31
Figure 3. A voyage map with events (A and B)
Note. Red circles indicate uncertainty in event position.
35For similar reasons, we chose not to “draw random lines” to hypothesise the vessels’ routes. For example, although we know that a ship sailed from point A to point B, we cannot know for sure the route it followed. The voyage of the ship called Il Lauro, with a tonnage of 70 lasti (161 tonnes), is a clear example of the unpredictability of maritime routes.32 Shipmaster Simon Sverze of Emden (in modern-day Germany) sailed from Amsterdam to Genoa with a cargo of wheat and rye. Due to a storm, the ship ran aground on a sandbank between England and Ireland. Sverze ordered part of the cargo to be jettisoned to lighten the ship and resume the voyage to Genoa, where he arrived in February 1592. If it were not for the Average event, we would not have known that the ship took a route around Scotland and through the Irish Sea, perhaps because of unfavourable winds in the English Channel, which were difficult to manage for a vessel that had only square-rigged sails.
36The map of ports, therefore, only shows information that is actually reported in the sources. The number of ports discovered by archive is shown below (Table 2). This is broken down into ports where the location is known, unknown or unsure.
Table 2. Number of ports discovered by archive
Source | Total ports | Location unknown | Location unsure | Georeferenced location
Tuscany | 173 | 15 | 15 | 143
Genoa | 414 | 36 | 40 | 338
Total (unique*) | 498 | 50 | 51 | 397
Note. Since the same port can occur in multiple sources, the numbers shown in the “Total (unique*)” row are not sums of the values above them but the number of unique ports.
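The difference between a simple sum and a count of unique ports can be made explicit with a short sketch. The assignment of ports to archives in the first part is purely illustrative; the second part uses the totals from Table 2 and assumes that Tuscany and Genoa are the only two sources counted.

```python
# Illustrative port sets only: not the actual contents of either archive.
tuscany_ports = {"Livorno", "Ancona", "Senigallia"}
genoa_ports = {"Genoa", "Amsterdam", "Livorno"}

unique_ports = tuscany_ports | genoa_ports          # set union removes duplicates
naive_sum = len(tuscany_ports) + len(genoa_ports)   # counts Livorno twice

# Applying the same reasoning to the totals in Table 2:
# |A and B| = |A| + |B| - |A or B|
ports_in_both_sources = 173 + 414 - 498
```

On that assumption, 89 ports appear in both the Tuscan and the Genoese sources.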
37To use the map of ports, the user first chooses a year range between 1500 and 1900. The second option relates to the source material: a specific source location can be selected, such as Genoa or Tuscany, in addition to the “all source” option. At this point, the map function offers an optional filter based on the reason the ports were visited by vessels in the selected year range. Currently, there are five reasons:
- voyage origin: the first place touched by the vessel. Usually this is the port of loading;
- scheduled stop: any port or place – an island, for example – where the vessel stopped intentionally. Usually, the purpose of this stop was to meet with a merchant who communicated the destination port, or to load more goods;
- forced stop due to an event: any port or place where the vessel wouldn’t have stopped if not for an unpredictable event. Classic examples are voyages in which vessels stopped in a port to take shelter from a storm;
- destination: the last place touched by the vessel. Usually this is the port of unloading;
- unknown reason: all ports or places where vessels stopped but the shipmasters did not explain why; this is a rare occurrence.
38The different reasons for a stop in a port are represented on a map using different colours, as shown in Figure 4. If a place was touched several times for different reasons, the pin on the map will be split between the relevant colours. The map can be visualised both with and without port names. Finally, a list of all the places not shown on the map can be found below the results.
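The filtering steps described above can be sketched as follows. The function name, the record layout, and the sample rows are assumptions made for illustration and are not taken from the AveTransRisk code base; the reason labels follow the list above.

```python
from collections import defaultdict

# (port, year, reason) -- illustrative rows only
visits = [
    ("Livorno", 1600, "destination"),
    ("Livorno", 1605, "voyage origin"),
    ("Ancona", 1600, "voyage origin"),
    ("Genoa", 1592, "destination"),
]


def reasons_by_port(visits, start_year, end_year, reasons=None):
    """Group the visit reasons per port for a year range.

    `reasons` is an optional set of reason labels; None means no filter.
    A port with more than one reason would be drawn with a split marker.
    """
    grouped = defaultdict(set)
    for port, year, reason in visits:
        in_range = start_year <= year <= end_year
        wanted = reasons is None or reason in reasons
        if in_range and wanted:
            grouped[port].add(reason)
    return {port: sorted(found) for port, found in grouped.items()}
```

In this sketch, a call such as `reasons_by_port(visits, 1600, 1610)` returns two reasons for Livorno, which corresponds to a marker split between two colours.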
Figure 4. Example of map of ports function
39The simple search is a full-text search that looks through all the text entries in the database. It is a basic computational tool that can be used for quick surveys. Since there was no standardised spelling in the early modern period, the search also returns results with similar spellings, using the built-in features of the Apache Solr search platform. The notaries and officials who wrote the documents uploaded in AveTransRisk translated foreign names of both people and vessels into seventeenth-century local vernacular Italian. Juan Sanchez, for instance, becomes Giovanni Sances, while the vessel Merchant of Dover becomes Il Mercante di Dover. There is no correct spelling, and we cannot always be sure of the original name or pronunciation. This is why we adopted the names given by the sources and allowed the simple search to look for similar spellings.
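The principle of matching similar spellings can be illustrated in a few lines of Python. This sketch does not use Apache Solr: it substitutes the standard-library `difflib` module, whose similarity measure differs from Solr’s, and the function name and the list of recorded surnames are invented for the example.

```python
import difflib

# Spellings as they might appear in the sources (illustrative list).
recorded_surnames = ["Sances", "Sverze", "Giovanni"]


def similar_spellings(query, candidates, cutoff=0.6):
    """Return candidates whose spelling is close to the query.

    Matching is case-insensitive; `cutoff` is difflib's similarity
    threshold between 0 and 1 (0.6 is the library default).
    """
    lowered = {candidate.lower(): candidate for candidate in candidates}
    hits = difflib.get_close_matches(
        query.lower(), list(lowered), n=5, cutoff=cutoff
    )
    return [lowered[hit] for hit in hits]
```

With this threshold, a search for the Spanish form “Sanchez” retrieves the Italianised “Sances” recorded by the notary, without either spelling being treated as the correct one.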
40The last computational tool offered by AveTransRisk is the Advanced Search function based around a query builder. Users can simply choose the desired database fields that they wish to search and enter the criteria. Every piece of information recorded in the database (the vessel’s value, a cargo item, the freights, a damage or an Average, etc.) has its own ID. This made it easier to separate them and allow for specific searches. Each voyage has a macro-ID that includes all the minor ones, as can be seen in the following relationship diagram in Figure 5.
Figure 5. AveTransRisk entity relationship diagram
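The idea of a voyage identifier that groups all subordinate records can be sketched with a small relational example. The table and column names are hypothetical and far simpler than the structure in Figure 5, SQLite stands in for the project’s MySQL server, and the Average type assigned to the sample voyage is illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE voyage (voyage_id INTEGER PRIMARY KEY, vessel_name TEXT);
    CREATE TABLE cargo_item (
        cargo_id INTEGER PRIMARY KEY,
        voyage_id INTEGER NOT NULL REFERENCES voyage(voyage_id),
        description TEXT
    );
    CREATE TABLE average (
        average_id INTEGER PRIMARY KEY,
        voyage_id INTEGER NOT NULL REFERENCES voyage(voyage_id),
        average_type TEXT
    );
    INSERT INTO voyage VALUES (1, 'Il Lauro');
    INSERT INTO cargo_item VALUES (10, 1, 'wheat'), (11, 1, 'rye');
    INSERT INTO average VALUES (20, 1, 'GA');
    """
)


def records_for_voyage(voyage_id):
    """Collect every record that carries the given voyage identifier."""
    cargo = conn.execute(
        "SELECT cargo_id, description FROM cargo_item "
        "WHERE voyage_id = ? ORDER BY cargo_id",
        (voyage_id,),
    ).fetchall()
    averages = conn.execute(
        "SELECT average_id, average_type FROM average "
        "WHERE voyage_id = ? ORDER BY average_id",
        (voyage_id,),
    ).fetchall()
    return {"cargo": cargo, "averages": averages}
```

Each record keeps its own identifier, so it can be searched individually, while the shared `voyage_id` allows everything belonging to one voyage to be reassembled.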
41Various methods of filtering the results are available, such as “CONTAINS”, “GREATER THAN” (>), “LESS THAN” (<) and “IS EMPTY”/“IS NOT EMPTY”. Brackets can also be used to split the query into smaller parts. There are two types of search fields: text and choice. Text fields allow users to type in the year, word, or phrase to search for. Choice fields allow users to select a value from the options presented in a drop-down list, for example “vessel type” or “cargo type”. Users can build a query around any combination of text and choice fields and add more fields of the desired type. We provided a list of all fields and their field type in the user manual.33
- 34 See A. Iodice & L. Oddo, 2022; A. Iodice & L. Piccinno, 2021.
- 35 According to Douglass North’s definition, transaction costs are the costs of specifying and enforc (...)
42With the advanced search function, the database can be used to investigate, for example, trends in wheat prices, or administrative costs paid by vessels in Genoa for their Average procedures.34 Such costs have recently been studied as a proxy for the overall transaction costs incurred by vessels involved in maritime trade during the so-called “Northern Invasion” at the end of the sixteenth century.35 The following query, for example, also shown in Figure 6, could be used to investigate the evolution of sugar prices and the types of sugar arriving in Genoa between 1590 and 1700, as they appear in Genoese Average sources:
- [choice field] Archival Source = Genova (Republic of Genoa) AND
- [choice field] Average type = Grossa/Comune AND
- [choice field] Risk: Cargo type = Sugar AND
- [text field] Calculus written date >= Year: 1590 AND
- [text field] Calculus written date <= Year: 1700
Figure 6. Example of advanced search function
43Unfortunately, the advanced search requires some user training to be fully exploited. A simple mistake, for example selecting “=” in the last two fields of the query above, could significantly alter the results. At the same time, the database contains so much data that computational tools of this kind are essential. The user manual included on the site gives guidance on these issues.
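The effect of such a mistake can be shown with a small sketch of a query evaluated as a list of conditions joined by AND. The records, field names, and the `run_query` helper are hypothetical and do not reflect how the site’s query builder is implemented.

```python
import operator

OPERATORS = {"=": operator.eq, ">=": operator.ge, "<=": operator.le}

# Illustrative records only.
records = [
    {"source": "Genova", "average_type": "Grossa/Comune", "cargo_type": "Sugar", "year": 1595},
    {"source": "Genova", "average_type": "Grossa/Comune", "cargo_type": "Sugar", "year": 1650},
    {"source": "Genova", "average_type": "Grossa/Comune", "cargo_type": "Wheat", "year": 1650},
    {"source": "Genova", "average_type": "Grossa/Comune", "cargo_type": "Sugar", "year": 1710},
]


def run_query(records, conditions):
    """Keep the records that satisfy every (field, operator, value) condition."""
    return [
        record
        for record in records
        if all(OPERATORS[op](record[field], value) for field, op, value in conditions)
    ]


base = [
    ("source", "=", "Genova"),
    ("average_type", "=", "Grossa/Comune"),
    ("cargo_type", "=", "Sugar"),
]

# Intended query: a year range from 1590 to 1700.
intended = run_query(records, base + [("year", ">=", 1590), ("year", "<=", 1700)])

# Mistaken query: "=" in both date fields. No year equals 1590 and 1700 at once.
mistaken = run_query(records, base + [("year", "=", 1590), ("year", "=", 1700)])
```

In this sketch the intended query returns the two sugar records dated within the range, while the mistaken one returns nothing at all, because the two equality conditions cannot both be true.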
44The search results can be downloaded in Excel. A successful advanced search is only the first step. The results must be downloaded, cleaned, and integrated with the existing literature or other sources. In some cases, a direct search through the original sources may be necessary. An example is the analysis of administrative costs visualised in Figure 7.
Figure 7. Pisan and Genoese averages administrative costs 1599-1670
Note. Pisan total number of calculations = 33; average expenses’ value = 298 Genoese lire; Genoese total number of calculations = 71; average expenses’ value = 62.5 Genoese lire.
45We compared administrative costs linked to GA procedures between Genoa and Livorno from 1599 to 1670. This analysis provided us with an important parameter to assess the institutional efficiency in GA handling between the two rival seaports. What emerged is that Genoese institutions managed to keep their administrative costs consistently low, probably because of the institutional specialisation of the calculators, which allowed some price stability. Things changed only in the second half of the century, but this convergence is still being evaluated. To reach the visualisation of Figure 7 we had to carry out two separate advanced searches: one on Genoese and one on Tuscan GA administrative costs. We then converted Tuscan results into Genoese lire. We chose Genoese lire rather than the silver prices conversion provided by AveTransRisk because the sources in Genoa and Livorno usually provide direct exchange rates, which we recorded in the metrological and monetary equivalences field. Similar analyses could be carried out for the values of certain commodities between different ports (sugar, wheat, etc.), damages or ships’ values, and so on.
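The conversion step can be sketched in a few lines. All amounts and exchange rates below are invented for illustration and are not values from the sources; the sketch assumes that each Tuscan record carries the direct exchange rate stated in its own document.

```python
from statistics import mean


def to_genoese_lire(amount, rate_to_genoese_lire):
    """Convert an amount using the exchange rate recorded in the source."""
    return amount * rate_to_genoese_lire


# (amount in the Tuscan unit of account, recorded rate) -- hypothetical values
tuscan_costs = [(100.0, 1.5), (200.0, 1.5), (300.0, 2.0)]

converted = [to_genoese_lire(amount, rate) for amount, rate in tuscan_costs]
average_cost = mean(converted)
```

Converting each record with its own recorded rate, rather than with a single rate for the whole period, keeps the comparison tied to the equivalences actually given in the documents.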
46The process of creating the database demonstrated the enormous challenge of developing a single data model for procedures unfolding in different geographical locations over a large time period, resulting in different rules, conventions, and workarounds. Inevitably, and despite the overall success of the database in providing a framework capable of housing GA records produced almost anywhere, its realisation involved trade-offs and choices that future developers and historians should be aware of when embarking upon similar projects.
47First of all, the guiding principles of completeness and data cleanliness increase the demands on inputters. Data cleanliness should, of course, be a sine qua non for all historical projects in order to avoid contaminating the analytic “food chain” at its source. Stefano Fenoaltea, in one of his last essays after a lifetime spent in cliometrics, lamented the tendency of historians to neglect the unglamorous data-collection stage in which vital analytical decisions are taken.36 The result is castles built on clouds, wide-ranging conclusions and impressive-looking econometric analyses based on faulty numbers. However, it should also be recognised that clean data capture usually goes hand-in-hand with more entry fields, as several of the examples related above demonstrate. This does more than simply increase the inputter’s workload. Every additional field increases the complexity of the database and hence the learning curve required for prospective inputters to become familiar with the entry form. This could reduce the pool of potential collaborators. It may also, ultimately, prove self-defeating, as the increased complexity of the entry form leads inputters to make mistakes, adopt their own idiomatic conventions, or just ignore large sections of the entry form. Lengthy instruction manuals are not an effective safeguard against this kind of human error.
48Since data cleanliness is (or should be) non-negotiable, the trade-off is effectively between going deep (a source-oriented database which harvests all potentially useful information from a necessarily limited number of documents) and going wide (a method- or model-oriented database capable of harvesting limited information from many documents). In this respect, database designers would do well to think early about whether their project is going to be an open project to which future researchers may add more data, or a project-specific research tool, whose more restricted design can be guided by research questions (or even whether it is primarily a teaching and dissemination tool). Designers should take account of the resources they have available and pick their strategy accordingly. Slave Voyages is a successful example of a relatively well-funded project that appeals to a large scholarly and public user base and can thus realistically focus on a large number of variables.37 The new Risky Business database, on the other hand, which aims to become a repository for data on historical marine insurance, has wisely decided to focus on just a few variables, notably the insurance policy premium rate, a ubiquitous and analytically useful variable that does not rely on converting currencies.38 The AveTransRisk database could be opened to further collaboration in the future.
49To a certain extent, these inputting challenges could be offset with technological solutions. The AveTransRisk database already includes some safeguards that direct the inputter towards good practice. Certain fields in the database, for example, will not allow the inputter to insert free text into fields like “port name”: the user must instead choose from a list of ports which have been previously inserted into a master list of place names along with geographical coordinates and alternative names and spellings. This both allows the map functions (which rely on coordinates being entered) to be used for all entries and also helps ensure uniformity across the database (modern place names and standardised spelling are adopted throughout). Additional development could further help to reduce human error: one example might be checking to see if a port name has already been entered in order to avoid duplication; or refusing to allow an entry to be uploaded unless archival reference information is present. The user could even be guided step-by-step through the complete entry process in order to reduce the overwhelming impact of the database and to minimise cognitive load on the inputter. Whatever the potential of these techniques, however, they are no substitute for good strategic planning in the design phases, which assesses trade-offs with reference to the project’s overall aims. Funding bodies should also bear these considerations in mind when deciding to award money to projects with a digital database, assessing the extent to which questions about audience, purpose, and available resources have been seriously considered in light of the difficulties presented by researching the premodern period.
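Safeguards of the kind discussed here, both the existing master list and the suggested additional checks, could take a form similar to the following sketch. The master list contents, the function names, and the entry layout are hypothetical, and the coordinates are approximate values given for illustration only.

```python
# Hypothetical master list: standard name -> alternative names and coordinates.
PORT_MASTER_LIST = {
    "Livorno": {"alternative_names": {"Leghorn"}, "coordinates": (43.55, 10.31)},
    "Genoa": {"alternative_names": {"Genova"}, "coordinates": (44.41, 8.93)},
}


def resolve_port(name):
    """Return the standard port name for a recorded spelling, or None."""
    wanted = name.strip().casefold()
    for standard, info in PORT_MASTER_LIST.items():
        known = {standard} | info["alternative_names"]
        if wanted in {spelling.casefold() for spelling in known}:
            return standard
    return None


def validate_entry(entry):
    """List the problems that should block an upload (empty list = acceptable)."""
    errors = []
    if resolve_port(entry.get("port", "")) is None:
        errors.append("port not in master list")
    if not entry.get("archival_reference", "").strip():
        errors.append("missing archival reference")
    return errors
```

Resolving alternative spellings to a single standard name also addresses the duplication problem, since “Leghorn” and “Livorno” would map to the same master-list entry rather than creating two ports.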
50Given that many research projects are now looking to create large online databases, it is important that we also ask ourselves about sustainability (whether the database can and will be used in the future rather than being a short-term gimmick or novelty) and interoperability (the potential to combine the datasets with other related datasets). Funding bodies are attracted to the “useful for future researchers” argument because it promises a compounding effect and a more productive investment of resources. Such promise is unlikely to be realised, however, if the database does not receive the technical and institutional support needed to maintain and run it. Even if such technical resources are forthcoming, the database needs to consistently appeal to a wide user base to have a long and productive lifespan beyond the horizons of a research project. It is here that we see how strategic decisions about completeness, cleanliness, sustainability, and interoperability are linked. First of all, cleanliness is a paramount consideration for sustainability. If historians realise that even a small part of the data is compromised and has contaminated the rest, it becomes unusable. Completeness is also a consideration in terms of attracting a user base. If temporal coverage is limited, or if there are not enough entries for serious statistical analysis, that will also reduce the number of potential users.
51Interoperability aids sustainability because larger datasets are more likely to attract users. Consolidation also allows datasets to be maintained more efficiently. Unfortunately, the AveTransRisk experience demonstrates quite how difficult the creation of such “master databases” might prove to be. The particularities of GA in different early modern ports necessitated the creation of many fields in order to keep data clean, and to deal with various regional and procedural idiosyncrasies. The prospect of simply merging databases – with, for example, the “Portic” project (Ports, and Information and Communication Sciences and Technology: Querying and Visualizing eighteenth-century shipping and trade dynamics in the digital era) – would clearly be a delicate and complex operation rather than a simple case of pouring two datasets into the same digital trough.39
52More promising in this respect could be a kind of “modular interoperability”, whereby projects draw upon commonly available building blocks that could reduce the time involved in constructing a database and provide a basis for some limited data comparison. The website GeoNames, for example, provides a common fund of coordinates and alternative names that could be drawn upon by future historical databases: rather than creating a port list from scratch, inputters could simply include a reference to GeoNames, allowing for the automatic importing of this data.40 This would also help to reduce human error. Similar online repositories dealing with currencies, premodern weights and measures, or calendars could likewise aid database construction by outsourcing many of the difficulties which are currently being confronted “in-house”. A relational structure between online databases may thus be the key to larger datasets in future. It will not, however, provide a substitute for historians making interpretative choices about their data and the way it is presented, or seeking to fully understand the choices that have been made by others in producing large, publicly available datasets.