File-naming convention for BADC instrumental and other datafiles
The BADC holds a wide range of instrumental and model datasets of interest to the scientific community. From the point of view of
data access, it is highly desirable to adhere to common file formats and file-naming conventions for all the data produced under
the various projects. A well thought out and organised file-naming convention allows quick data access and avoids the user having to
read the file to find out what it contains. Using this convention will save time and resources when setting up data
management for each individual project. It will also allow greater analysis and manipulation of the data by software at the BADC.
The file-naming convention for instrumental (and other) datasets uses long file names since these indicate significant information
about the contents of the file without having to read the file or refer to the directory structure. Important attributes in a file
name include INSTRUMENT, LOCATION and TIME.
Please note that FAAM file names expand the convention below by allowing
three [_extra] fields, two of which are mandatory for data
collected on board the FAAM aircraft (for details, please refer to the
FAAM Filename Convention).
Participants in FAAM campaigns may feel free to generalise this rule to all
data collected during FAAM campaigns, and use up to 3 extra fields separated by
underscore signs, if they wish to do so.
The chosen convention is as follows:
instrument_location_YYYYMMDD[hh][mm][ss][_extra].ext
instrument - is the instrument name (full or shortened) or model name. When the same
instrument is used by a number of groups, the instrument name should be prefixed with the institute name/code and a hyphen, for
example uea-ptrms and york-ptrms.
location - is the location name (full or shortened). This refers to the location of the
observation and not the institute or location of the participating scientist/group. This field could be used for a range of items
such as a site, a station, a platform, an institute or a university.
YYYYMMDD - is the date on which measurements were taken.
If a data file spans more than one day then this field should represent the first day during which data was recorded. The year is given
as four digits with month and day as two digits each.
[hh][mm][ss] - is the time of day (optional). Hours, minutes and seconds are
represented as two digits each. Hours may be used alone, hours and minutes together, or all three fields together. However,
minutes or seconds cannot be used without the preceding time unit (i.e. the minute field is not allowed without the hour field).
[_extra] - this field allows an additional code to define such things as different range
resolutions. It could also be used for version numbers, etc.
.ext - will normally be .nc (NetCDF) or .na (NASA Ames), although occasionally other formats
will be used, in particular .png and .gif for image files.
Filenames should contain only the characters [-_.a-z0-9].
Spaces are forbidden and upper case characters should be avoided.
The underscore "_" character should only be used as a separator between fields.
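The rules above can be checked mechanically. The following is a minimal sketch only, not a BADC tool: the function name parse_filename, the max_extra parameter and the location "weybourne" are hypothetical, and it assumes the fields appear in the order instrument, location, date and time, extra, extension, separated by underscores, with no dots inside a field. Setting max_extra=3 is one way to accommodate the FAAM variant mentioned earlier.

```python
import re
from datetime import datetime

# One field: lower-case letters, digits and hyphens (no underscore, no dot).
_FIELD = r"[a-z0-9-]+"

# Assumed field order: instrument_location_YYYYMMDD[hh][mm][ss][_extra].ext
_PATTERN = re.compile(
    rf"^(?P<instrument>{_FIELD})_(?P<location>{_FIELD})_"
    r"(?P<date>\d{8})(?P<time>(?:\d{2}){0,3})"
    rf"(?P<extra>(?:_{_FIELD})*)"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def parse_filename(name, max_extra=1):
    """Return the fields of a conforming file name as a dict, or None."""
    match = _PATTERN.match(name)
    if match is None:
        return None

    # Split "_a_b" into ["a", "b"]; an absent extra section gives [].
    extras = [part for part in match.group("extra").split("_") if part]
    if len(extras) > max_extra:
        return None

    # Reject impossible dates and times such as month 13 or hour 25.
    # Each strptime directive is two characters long, so slicing by the
    # number of time digits yields "", "%H", "%H%M" or "%H%M%S".
    time_digits = match.group("time")
    stamp_format = "%Y%m%d" + "%H%M%S"[: len(time_digits)]
    try:
        datetime.strptime(match.group("date") + time_digits, stamp_format)
    except ValueError:
        return None

    fields = match.groupdict()
    fields["extra"] = extras
    return fields


if __name__ == "__main__":
    for candidate in (
        "uea-ptrms_weybourne_20050617.na",
        "uea-ptrms_weybourne_2005061712_v1.nc",
        "UEA-ptrms_weybourne_20050617.na",  # upper case, rejected
    ):
        print(candidate, "->", parse_filename(candidate))
```

Because the minute and second digits can only follow the hour digits in the pattern, a minute field without an hour field cannot be expressed, which matches the rule given for [hh][mm][ss].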
File-naming for non-standard data (e.g. model, trajectory data)
Some projects will also generate model data, flight data, data recorded at sea (stationary and in transit),
trajectories and other non-standard data types. It is suggested that the above format be adapted in the following ways:
- Data recorded on board moving craft
When data is recorded on a moving craft the varying spatial location should not be recorded in the filename. Instead, the location
field in the filename should include a name (or code) for the vessel and optionally the flight/voyage code/number.
- Trajectory data
Calculated trajectory data is similar to data recorded on a moving craft. The varying spatial location should not be recorded in the
filename. Instead, the location field in the filename should include a relevant code for the trajectory.
- Model data
In the case of model data, the instrument field in the filename should instead be used for a model code
(indicating the type, version, etc. of the model). For box models running at one location only, the location
field should give that location. However, models that output data over a grid can use appropriate codes to represent this.
- Use of the [_extra] additional information field
The [_extra] field is unlikely to be used in most cases but is provided as an option for exceptional cases where the
data producer wishes to include some additional information not otherwise catered for. One such use might be in
forecast files, where the date and time provide the start time whilst the [_extra] field provides the time of the
actual forecast. Care should be taken not to overload this field.
- Use of the [hh][mm][ss] time options
The [hh][mm][ss] options are included for occasions where data is produced at such a high frequency that storing it
in multiple files per day, hour or minute becomes appropriate. This is unlikely to be commonplace but is available for special cases.
- Image files
Text files (.txt) may be included to describe image data. Apart from the file name extension (last field), files
containing images and their associated metadata should have the same name.
When data exist both in the form of NASA Ames formatted fields and images, the files also have the same name, except for the file name extension.
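The adaptations above can be sketched as a small file-name builder. This is an illustrative sketch under stated assumptions, not part of the convention: the helper names build_stem and with_extension, the precision keywords and the vessel code "shipx-v012" are all hypothetical, and joining a vessel code and voyage number with a hyphen is one possible choice given that the underscore is reserved as the field separator.

```python
from datetime import datetime

# Date/time layouts for the optional [hh][mm][ss] part (keywords are hypothetical).
_TIME_FORMATS = {
    "day": "%Y%m%d",
    "hour": "%Y%m%d%H",
    "minute": "%Y%m%d%H%M",
    "second": "%Y%m%d%H%M%S",
}


def build_stem(instrument, location, start, precision="day", extra=None):
    """Join the fields with underscores; everything before the extension."""
    parts = [instrument, location, start.strftime(_TIME_FORMATS[precision])]
    if extra:
        parts.append(extra)
    # Upper case should be avoided, so the result is lower-cased.
    return "_".join(parts).lower()


def with_extension(stem, ext):
    """Attach an extension, so that related files share the same stem."""
    return f"{stem}.{ext.lstrip('.').lower()}"


if __name__ == "__main__":
    # Moving craft: the location field carries a vessel and voyage code
    # rather than a position; the first time of the file is given to the minute.
    stem = build_stem(
        "york-ptrms",
        "shipx-v012",
        datetime(2005, 6, 17, 14, 30),
        precision="minute",
    )
    # Data, image and descriptive text files differ only in the extension.
    for ext in ("na", "png", "txt"):
        print(with_extension(stem, ext))
```

Building the stem once and attaching extensions afterwards keeps image files and their metadata files identically named apart from the last field.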
Standardising common names in the naming convention
In order to standardise the names used within the file-naming convention the BADC will need to collate those currently used by the
community and publish them via our website. This can be regularly extended to include new locations, instruments, models, etc.
Interaction with instrument scientists and modellers will be essential to achieve this aim successfully.