4. Backup and Dissemination of Data
p. 99-101
In this section we present the methodology developed during our Master’s research for backing up and disseminating raw data and results. The backup rests on two principles: long-lasting, sustainable storage of the data, and reproducibility of the results. With MicMac it is quite simple to save the essential data in sustainable formats: JPG, TIFF, TXT and TFW. With other software, saving the processing means keeping files in a proprietary format, which may become unusable should the programme cease to exist. This is not the case here.
First of all, there is no point in keeping every photo taken during acquisition: only those used in processing need to be saved. Each processing run is stored in a dedicated, named sub-directory, and these sub-directories are stored in a general directory specific to the acquisition campaign. Each sub-directory therefore holds only the photos in JPG format, the topography file in TXT, the processing procedure in TXT, the final orthophotomosaic in TIFF with its accompanying TFW, and a sketch of the scene in JPG, which makes it possible to locate the GCPs on the photos. The raw topography file is stored in the general directory, along with a plan view of the site on which each surveyed scene is located and labelled with its name (or code). It is also useful to save the Residus.xml file generated by Tapas, located in the directory Ori-CAL, which allows a third party to check the quality of the processing. With all of this in place, anybody can easily review the data later when needed.
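The content expected in each scene sub-directory can be checked automatically. The following is a minimal sketch rather than part of the original workflow: the function names and the minimum counts per extension are assumptions derived from the list above (at least one photo plus the sketch in JPG, the topography and processing files in TXT, one orthophotomosaic in TIFF and its TFW).

```python
from collections import Counter
from pathlib import Path

# Assumed minimum number of files per extension in one scene sub-directory:
# photos + sketch (JPG), topography + processing (TXT), orthophotomosaic
# (TIFF) and its georeferencing file (TFW).
EXPECTED_MINIMUM = {".jpg": 2, ".txt": 2, ".tif": 1, ".tfw": 1}


def normalise_extension(name):
    """Return the lower-case extension, merging .jpeg/.jpg and .tiff/.tif."""
    ext = Path(name).suffix.lower()
    return {".jpeg": ".jpg", ".tiff": ".tif"}.get(ext, ext)


def missing_items(file_names):
    """Return {extension: number of files still missing} for a list of names."""
    counts = Counter(normalise_extension(name) for name in file_names)
    return {
        ext: minimum - counts[ext]
        for ext, minimum in EXPECTED_MINIMUM.items()
        if counts[ext] < minimum
    }


def check_scene_directory(directory):
    """Apply missing_items() to the files found in a scene sub-directory."""
    names = [p.name for p in Path(directory).iterdir() if p.is_file()]
    return missing_items(names)
```

An empty result means that the sub-directory holds at least the expected kinds of file; it says nothing about their content, which still has to be reviewed by hand.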
For the sake of interoperability, file names must be standardised. Several criteria should be considered when deciding on a naming convention. Firstly, the file name must contain only alphanumeric characters, with the underscore as the sole separator. Special characters such as accents, spaces, commas and brackets are encoded differently depending on the system used, and this can cause files to be misread. Capitals are acceptable, but it is best to limit their use, since switching between upper and lower case invites mistakes and names written entirely in capitals are tiring to read. It is also important to find a balance between short, standardised, easily machine-readable names and names that are explicit enough to be intelligible to users. In practice, a name should not exceed 30 characters. This is partly a technical restriction: file paths are limited by default to about 260 characters in Windows, which is currently the most common operating system. It is also because a short name limits the risk of mistakes when typing file queries. It is also helpful to keep the same number of characters for the same type of file, at least in the file prefix. Once again, the aim is to facilitate the reuse of data by yourself or a third party, for example by allowing one or two characters to be changed quickly during iterative queries. Lastly, your file organisation system will not necessarily be that of new users. Every file name created must therefore be unique, independently of the directory in which it is held. To conclude, the naming convention is a decisive element of the file structure: with a set of logical codes and names, it enables humans as well as computers to get past the “storage” and focus on the actual use of the files.
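These criteria lend themselves to an automatic check. The sketch below is only one possible reading of them: it assumes that the underscore is allowed as a separator (as in the example names given in this section), that the 30-character limit includes the extension, and that uniqueness is tested without regard to case. The function names are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import PurePath

MAX_NAME_LENGTH = 30  # assumed to include the extension
# Letters, digits and underscores, then a dot and an alphanumeric extension.
NAME_PATTERN = re.compile(r"[A-Za-z0-9_]+\.[A-Za-z0-9]+")


def name_problems(file_name):
    """Return a list of the rules that file_name breaks (empty if none)."""
    problems = []
    if not NAME_PATTERN.fullmatch(file_name):
        problems.append("not 'name.extension' made of A-Z, a-z, 0-9 and '_'")
    if len(file_name) > MAX_NAME_LENGTH:
        problems.append(f"longer than {MAX_NAME_LENGTH} characters")
    return problems


def duplicated_names(paths):
    """Return {file name: [paths]} for names used in more than one place."""
    by_name = defaultdict(list)
    for path in paths:
        # Compare in lower case: some file systems ignore case differences.
        by_name[PurePath(path).name.lower()].append(str(path))
    return {name: found for name, found in by_name.items() if len(found) > 1}
```

Running both functions over a whole campaign directory before archiving gives a quick list of names to correct.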
We propose here an example of a naming convention suited to photogrammetry files created during archaeological research: “Specific operation authorisation number_site acronym_scene number_file type.extension format”. Each element can be described as follows:
- Specific operation authorisation number: This is the number of the order authorising the archaeological operation, whether prospection or excavation. This number can be read on the document of the same name provided by the official authorities to every operation manager, usually in the form “2022/385”. We advise using an eight-digit format, which, in this example, would appear as “20220385”. It is worth using a date as the initial key when relevant, since the documents can then easily be sorted in any sort of file management system.
- Site acronym: It is common in archaeology to use an acronym on, for example, photos as a quick way of identifying a site. We propose using such an identifier here, with four lower case letters, such as “vsld” for Verdun-sur-le-Doubs.
- Scene number: A site can be the subject of several photogrammetry acquisitions and procedures during an operation. We suggest grouping each acquisition and its processing per scene, numbered from 1 to n, in the form “sc001”, “sc002”, “sc003”, etc.
- File type: This code identifies the content of the file. In our case, this is “pho” for photos, “ort” for orthomosaic, “prc” for process, “gcp” for ground control points, “rsd” for residual error, “dia” for GCP location diagram, and “pla” for general site plan with the location of the different scenes. If a scene contains several files of the same type, such as photos, they can be numbered from 1 to n: “pho001”, “pho002”, etc.
Using this convention, the final orthomosaic of Verdun-sur-le-Doubs would be named “20220385_vsld_sc028_ort.tiff”.
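The convention can be applied mechanically. The following is a minimal sketch, not a tool used in the original work: the function names are hypothetical, and it assumes that the authorisation number always has the form “year/sequence” and that scenes and repeated files are numbered on three digits, as in the examples above.

```python
import re

# File-type codes listed in the convention above.
FILE_TYPES = {"pho", "ort", "prc", "gcp", "rsd", "dia", "pla"}


def authorisation_key(number):
    """Turn an authorisation number such as '2022/385' into eight digits."""
    year, sequence = number.split("/")
    return f"{int(year):04d}{int(sequence):04d}"


def build_name(authorisation, site, scene, file_type, extension, index=None):
    """Assemble a file name: authorisation_site_scene_type.extension."""
    if not re.fullmatch(r"[a-z]{4}", site):
        raise ValueError("site acronym must be four lower-case letters")
    if file_type not in FILE_TYPES:
        raise ValueError(f"unknown file type: {file_type}")
    # Several files of the same type are numbered: pho001, pho002, ...
    type_code = file_type if index is None else f"{file_type}{index:03d}"
    key = authorisation_key(authorisation)
    return f"{key}_{site}_sc{scene:03d}_{type_code}.{extension}"


print(build_name("2022/385", "vsld", 28, "ort", "tiff"))
# prints 20220385_vsld_sc028_ort.tiff
```

Generating names from their components, rather than typing them, keeps the length and the position of each code constant across a campaign.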
In order to disseminate photogrammetry data, it is advisable to store them within a reliable digital storage facility like Nakala, and to allow access via a single, sustainable link, such as a digital object identifier (DOI), a Handle or an ARK. However, if the data are to be published on paper, we suggest the following layout, which keeps the presentation clear by limiting the plates to the raw data of each processing run. First of all, a page is dedicated to the complete raw topographic data, since these are thereafter truncated for processing. The general site plan, on which the different scenes are located, can then be added. Finally, for each elevation, as in the preparation for processing, one should present the contact sheet(s) of the corresponding photo set, the related truncated topography file, the complete processing procedure archived in text format, the average residual calculated by the tool Tapas and the sketch plotting the markers. The contact sheets can be generated automatically using the tool PanelIm, which is built into MicMac. The command mm3d PanelIm ./ ".*JPG" automatically generates a TIF named Panel.tif directly in the work directory. The command mm3d PanelIm -help displays the list of arguments that can be used to change the result settings, such as image size, spacing, the number of lines, the background colour or the scale of the thumbnails. Otherwise, it is possible to use third-party photo editing software. In GIMP the operation is done using an installed script: http://tounoki.org/documents/planche_contact.scm.
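When many scenes have to be documented, the PanelIm command quoted above can be launched from a script. The sketch below is an illustration under assumptions: it supposes that mm3d is installed and available on the system path, and the function names are hypothetical. It uses only the command and the output name given in the text.

```python
import subprocess
from pathlib import Path


def panelim_command(pattern=".*JPG"):
    """Return the PanelIm command quoted in the text as an argument list."""
    return ["mm3d", "PanelIm", "./", pattern]


def make_contact_sheet(work_dir, pattern=".*JPG"):
    """Run PanelIm inside work_dir and return the expected output path.

    Assumes mm3d is on the system path; raises CalledProcessError if the
    command fails.
    """
    subprocess.run(panelim_command(pattern), cwd=work_dir, check=True)
    # According to the text, the result is written as Panel.tif
    # in the work directory.
    return Path(work_dir) / "Panel.tif"
```

Passing the command as a list avoids shell quoting problems with the pattern, which is handed to mm3d exactly as written.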
The text alone may be used under the OpenEdition Books licence. All other elements (illustrations, imported files) are “All rights reserved”, unless otherwise stated.