Storage location and infrastructure
Excerpt from "Guidelines on Digital Research Data at TU Darmstadt"
"[...] This concerns, for example, [...] measures to maintain the integrity and authenticity of the data, as well as information on confidentiality [...]" (Guideline 1)
"[...] all project participants are obliged to ensure compliance with good scientific practice and long-term archiving, [...] In doing so, they take ethical, data protection and copyright aspects into account." (Guideline 2)
Storage location and service hosting
Different storage options come with their own advantages and drawbacks. Moving your data from one location to another can involve significant work and a temporary interruption of data access. It is therefore best to evaluate the storage options and select a suitable one before you start generating data. This evaluation can be done, for example, when preparing a data management plan.
Things to consider in this evaluation include:
- How much data will I need to store?
- Are there special requirements, e.g. very fast storage?
- How much will the data storage cost?
- How is the data backed up?
- In case of a disaster:
  - What is the maximum acceptable amount of lost data?
  - What is the maximum acceptable time for recovery?
- Whom do I trust to store my data?
- Who will need to access the data?
- Is the level of protection against unauthorized access sufficient for the sensitivity of the data?
- Do I want to rely primarily on the file system for data organization or will I apply specialized software?
You can refer to the sections below for more insights into several of the issues raised here.
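The two disaster questions in the list above can be turned into a rough calculation. The following Python sketch assumes that, in the worst case, everything changed since the last backup is lost, and that a restore is limited by a sustained transfer rate. The function names, the decimal units, and the figures in the usage note are illustrative assumptions, not values published by the University Computing Centre.

```python
def worst_case_loss_hours(backup_interval_hours):
    """Changes made since the last backup are lost, so the worst case is one full interval."""
    return backup_interval_hours


def restore_time_hours(data_gb, throughput_mb_per_s):
    """Hours needed to copy data_gb back at a sustained rate (decimal units, 1 GB = 1000 MB)."""
    seconds = data_gb * 1000 / throughput_mb_per_s
    return seconds / 3600


def meets_targets(backup_interval_hours, data_gb, throughput_mb_per_s,
                  max_loss_hours, max_recovery_hours):
    """Check a backup setup against the acceptable data loss and recovery time."""
    loss_ok = worst_case_loss_hours(backup_interval_hours) <= max_loss_hours
    recovery_ok = restore_time_hours(data_gb, throughput_mb_per_s) <= max_recovery_hours
    return loss_ok and recovery_ok
```

For example, under these assumptions a daily backup of 360 GB restored at 100 MB/s would satisfy a target of at most 24 hours of lost work and 4 hours of recovery time, whereas a weekly backup of the same data would not. Real restores also include time for locating media, reinstalling software, and checking the result, so the estimate is a lower bound.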
If you don't want to take care of data storage yourself, the University Computing Centre offers a file service that provides your research group with a shared network drive of up to 2 TB. The free storage space of the Hessenbox (100 GB for employees) is also a good option in certain situations, as outlined in the respective sections, but it is limited by being tied to an individual TU-ID. Please note that Hessenbox data is deleted one month after the end of the employment relationship (§10 Nutzungsbedingungen Hessenbox-DA, version of 03.09.2020).
If you decide to run your own instances of software that helps you track your research (e.g. electronic lab notebooks, see below) or organize your data, we recommend the virtual servers offered by the University Computing Centre. You can choose between Windows and Linux as the operating system. See the service website for details. The storage that can be allocated per virtual server is limited to approx. 500 GB.
At the moment there is no service at the University Computing Centre for storing large amounts of data.
If you are planning your own storage solution, it should comply with BSI Grundschutz (SYS.1.8 Speicherlösungen).
Data sharing and collaborative work
You will want your data to be stored securely and protected from unauthorized access. On the other hand, discussing and sharing data with your colleagues and collaborators is often essential for bringing a project to a successful conclusion. For these activities, we recommend not relying on (free) commercial providers where it is unclear or even problematic how the data is protected, where the servers are located, and how the services are actually financed. Instead, you should prefer services offered by TU Darmstadt or other public institutions in Germany or the European Union. If commercial providers are used, the legal framework must be carefully examined.
If you want secure storage whose access is restricted to selected individuals via their TU-ID, your research group should opt for the file service provided by the University Computing Centre. However, this service has no functionality for sharing data externally, except via a guest TU-ID.
Hessenbox, in contrast, which TU Darmstadt provides free of charge to all individuals with a TU-ID, allows data synchronization across multiple devices, access for external researchers upon invitation, and advanced functions such as working on documents collaboratively. However, its limit on overall storage space and its assignment of file ownership to a single individual might preclude its use in certain settings.
If your storage does not allow sharing itself, GigaMove, offered in cooperation with RWTH Aachen, is a good tool for transferring large files (up to 100 GB) to other researchers. Files are uploaded and made available to others via a unique download link, which can be protected with a password if necessary.
Backup and recovery
Losing important research data is one of the major catastrophes that can happen in a research project, as recreating the data might require significant effort or, in some cases, not be possible at all. Taking the necessary precautions against data loss is therefore a major task in research data management. Backups of your data should be available at all times. Your recovery strategy should be documented, not only for others but also so that you have something to refer to in the future.
Your backup concept should be in accordance with BSI Grundschutz (CON.3 Datensicherungskonzept).
There should always be at least three copies of your data: the local copy you work with and two copies for safekeeping. One copy should be kept at an off-site location, which nowadays is typically a cloud copy, for example on the Hessenbox (see below). Redundant disks (RAID) or snapshots on your server cannot replace a backup.
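The rule above can be written down as a small check. The following Python sketch is only an illustration: the `Copy` class, its field names, and the example locations are assumptions made for this example, and deciding whether a copy is truly independent remains a judgment you have to make yourself.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    location: str       # free-text label, e.g. "laptop" or "Hessenbox"
    off_site: bool      # True if the copy is kept away from the working location
    independent: bool   # False for RAID mirrors or snapshots on the same server


def check_copies(copies):
    """Return a list of problems; an empty list means the rule is satisfied."""
    problems = []
    independent = [c for c in copies if c.independent]
    if len(independent) < 3:
        problems.append(
            f"only {len(independent)} independent copies, at least 3 are needed"
        )
    if not any(c.off_site for c in independent):
        problems.append("no off-site copy")
    return problems


plan = [
    Copy("laptop", off_site=False, independent=True),
    Copy("RAID mirror in the same server", off_site=False, independent=False),
    Copy("Hessenbox", off_site=True, independent=True),
]
print(check_copies(plan))
# -> ['only 2 independent copies, at least 3 are needed']
```

The RAID mirror is deliberately not counted, so this plan still needs one more copy, for example on an external disk or the group's network drive.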
Please keep in mind: "Data without a backup can be lost without any warning at every given moment. But if you can afford losing the data at any time: Why store it at all?"
For backup procedures to be effective, they have to be executed on a regular schedule. This way, up-to-date copies of your data are always available, and creating them becomes routine if you do it manually. The more frequently the data changes, the more frequently manual backups need to be created. Test your recovery regularly to ensure that it is complete and working.
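One way to test a recovery is to restore the backup into a scratch directory and compare it file by file with the working copy. The following Python sketch does this with SHA-256 checksums from the standard library; the function names and the directory layout are assumptions for illustration, and the comparison only makes sense if the working copy has not changed since the backup was taken.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(original, restored):
    """Return relative paths of files that are missing or differ in the restored tree."""
    original, restored = Path(original), Path(restored)
    mismatches = []
    for src in sorted(original.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original)
        dst = restored / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            mismatches.append(rel.as_posix())
    return mismatches
```

An empty result from `compare_trees("project_data", "restore_test")` (both directory names are placeholders) indicates that every file in the working copy was restored with identical content. The sketch does not report extra files that exist only in the restored tree.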
If possible, automate your backup procedures: the backups most reliably created are those you don't have to create yourself. The University Computing Centre offers a professional backup service on a for-fee basis to organizational units. Please see the service description website for further information. Files stored using the Computing Centre file service are also backed up automatically.
Each individual TU-ID comes with storage space on the Hessenbox. This service allows for use as an automated cloud backup of files on your local hard drive as it offers automated file synchronization as well as storing multiple versions of the same file. The storage size limit (100 GB for employees and 30 GB for students), together with the additional space requirements of keeping multiple file versions if intended, restricts its use in very data-intensive settings.
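Whether your data fits into the Hessenbox quota once file versions are kept can be estimated roughly. The following Python sketch uses the two quotas stated above and a deliberately simple model: each retained older version is assumed to store a full copy of the files that changed. How Hessenbox actually stores versions is not described here, so treat the function names, the parameters, and the model itself as assumptions.

```python
# Quotas as stated in this guide, in GB.
HESSENBOX_QUOTA_GB = {"employee": 100, "student": 30}


def required_gb(data_gb, versions_kept=1, changed_fraction=1.0):
    """Estimate the space needed for the current data plus older file versions.

    versions_kept counts the current version; changed_fraction is the share of
    the data (by size) assumed to change between versions.
    """
    if versions_kept < 1:
        raise ValueError("versions_kept must be at least 1")
    older_versions = versions_kept - 1
    return data_gb * (1 + changed_fraction * older_versions)


def fits_quota(data_gb, role, versions_kept=1, changed_fraction=1.0):
    """Check the estimate against the quota for 'employee' or 'student'."""
    return required_gb(data_gb, versions_kept, changed_fraction) <= HESSENBOX_QUOTA_GB[role]
```

Under this model, 20 GB of data with three versions kept and half of the data changing between versions needs about 40 GB, which fits the employee quota but not the student quota.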