Note: RAnsrID is still under heavy development! Some of the features
described here may not work yet.
RAnsrID was first presented at LinuxTag 2010. Slides are available on my publications page.
Upstream code is available on gitorious.org, project ransrid.
Get the source with
git clone git://gitorious.org/ransrid/ransrid.git
Patches are welcome, as always :-) My TODO list is still rather long...
RAnsrID is a RAID-lookalike multi-disk block device for storing large amounts
of data in a highly redundant way. It differs from regular RAIDs in a number of
properties that make it the better-suited solution for some applications,
mostly related to home use and small business cases. Additionally, it adds a
few features that regular software RAID implementations do not have.
Expected use cases include, but are not limited to:
RAnsrID is not a reasonable block device to use as a system or main
data disk! In fact, it is recommended to use a RAID 1 array for its
journal (e.g. the system disk(s)). Don't mistake this for a filesystem
journal: RAnsrID needs its own journal to get rid of the so-called "RAID write
hole" (see Wikipedia's article about RAID).
Also, even a highly redundant disk system does not protect against user errors
(e.g. file deletions), so it is not an alternative to a backup.
However, if the size of the data makes backups prohibitively expensive, RAnsrID
used with a filesystem that provides cheap snapshots may be considered
safe enough for non-mission-critical data.
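The role of the journal mentioned above can be illustrated with a minimal sketch. It is not RAnsrID's actual journal format: the names `struct stripe`, `struct journal`, `journaled_write` and `journal_replay` are hypothetical, a single XOR parity block stands in for the Reed-Solomon redundancy, and the `crash_after` parameter only simulates a power loss. The idea is that the new data and the new redundancy are logged before either disk is touched, so an interrupted write can be completed on restart instead of leaving data and redundancy out of sync.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NDATA 2   /* data disks in this toy stripe (assumption) */
#define BLOCK 4   /* tiny block size, for illustration only */

struct stripe {
    uint8_t data[NDATA][BLOCK];
    uint8_t parity[BLOCK];       /* XOR of all data blocks */
};

struct journal {                 /* would live on separate, mirrored storage */
    int      pending;            /* 1 while a logged write is unfinished */
    unsigned disk;               /* data disk being written */
    uint8_t  new_data[BLOCK];
    uint8_t  new_parity[BLOCK];
};

/* Return 1 if the parity block matches the data blocks. */
static int stripe_consistent(const struct stripe *s)
{
    for (size_t i = 0; i < BLOCK; i++) {
        uint8_t x = 0;
        for (unsigned d = 0; d < NDATA; d++)
            x ^= s->data[d][i];
        if (x != s->parity[i])
            return 0;
    }
    return 1;
}

/* crash_after: 0 = none, 1 = after journal write, 2 = after data write. */
static void journaled_write(struct stripe *s, struct journal *j,
                            unsigned disk, const uint8_t *buf, int crash_after)
{
    assert(disk < NDATA);

    /* Step 1: log the complete intended result before touching any disk. */
    j->disk = disk;
    for (size_t i = 0; i < BLOCK; i++) {
        j->new_data[i] = buf[i];
        j->new_parity[i] = s->parity[i] ^ s->data[disk][i] ^ buf[i];
    }
    j->pending = 1;
    if (crash_after == 1)
        return;

    /* Step 2: write the data block. */
    memcpy(s->data[disk], j->new_data, BLOCK);
    if (crash_after == 2)
        return;                  /* the write hole: parity is now stale */

    /* Step 3: write the parity block, then retire the journal entry. */
    memcpy(s->parity, j->new_parity, BLOCK);
    j->pending = 0;
}

/* On restart, redo an unfinished write; redoing it twice is harmless. */
static void journal_replay(struct stripe *s, struct journal *j)
{
    if (!j->pending)
        return;
    memcpy(s->data[j->disk], j->new_data, BLOCK);
    memcpy(s->parity, j->new_parity, BLOCK);
    j->pending = 0;
}
```

Calling `journaled_write` with `crash_after` set to 2 leaves the stripe inconsistent; a following `journal_replay` restores consistency with the new data in place. Because the journal itself must survive the crash, keeping it on mirrored storage is the natural choice.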
RAnsrID is implemented as a multi-threaded network block device (nbd) server in
user space, written in C. The main reason for this model was ease of
implementation, as debugging in user space is much easier, and the
Reed-Solomon algorithm and journaling technique used are difficult beasts.
The algorithms have been written from scratch so that they fit RAnsrID's use
case best (it's also good to write the algorithms personally in order to
understand the characteristics of Galois field mathematics completely, and a
mild form of NIH syndrome is involved, too). Additionally, I still have little
experience hacking the Linux kernel (I had more experience hacking Solaris in
the past). This model has one big disadvantage for RAnsrID at the moment: the
block device server and the block device user must run on separate hosts for
read/write mounts, otherwise the machine can (and probably will) deadlock upon
writes. I will experiment with a virtual file server host in the future and
verify whether that is enough.
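To give a feeling for the Galois field arithmetic behind Reed-Solomon redundancy, here is a minimal sketch. It is not taken from the RAnsrID sources. It assumes the field GF(2^8) with the reducing polynomial 0x11d (the one Linux RAID 6 uses) and a simple Vandermonde-style coefficient `coeff(red, disk) = (2^red)^disk`; all function names are hypothetical. With this choice, redundancy disk 0 is plain XOR parity and redundancy disk 1 matches the RAID 6 "Q" syndrome. For larger numbers of redundancy disks this simple coefficient choice is not guaranteed to keep every combination of lost disks recoverable, so a real implementation has to pick its matrix more carefully.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: x^8 + x^4 + x^3 + x^2 + 1, as used by Linux RAID 6. */
#define GF_POLY 0x11d

/* Multiply two elements of GF(2^8) by shift-and-add with reduction. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint16_t acc = 0, aa = a;

    while (b) {
        if (b & 1)
            acc ^= aa;           /* addition in GF(2^8) is XOR */
        aa <<= 1;
        if (aa & 0x100)
            aa ^= GF_POLY;       /* reduce back into 8 bits */
        b >>= 1;
    }
    return (uint8_t)acc;
}

/* Raise a field element to a non-negative integer power. */
static uint8_t gf_pow(uint8_t a, unsigned n)
{
    uint8_t r = 1;

    while (n--)
        r = gf_mul(r, a);
    return r;
}

/* Illustrative coefficient of data disk `disk` in redundancy disk `red`. */
static uint8_t coeff(unsigned red, unsigned disk)
{
    return gf_pow(gf_pow(2, red), disk);
}

/* Compute one redundancy block from scratch out of all data blocks. */
static void red_compute(unsigned red, const uint8_t *const *data,
                        unsigned ndata, uint8_t *out, size_t len)
{
    assert(ndata >= 1);
    memset(out, 0, len);
    for (unsigned d = 0; d < ndata; d++) {
        uint8_t c = coeff(red, d);
        for (size_t i = 0; i < len; i++)
            out[i] ^= gf_mul(c, data[d][i]);
    }
}

/* Update a redundancy block in place after one data block changed.
 * Only the old and new contents of that one data block are needed. */
static void red_update(unsigned red, unsigned disk, const uint8_t *old_blk,
                       const uint8_t *new_blk, uint8_t *out, size_t len)
{
    uint8_t c = coeff(red, disk);

    for (size_t i = 0; i < len; i++)
        out[i] ^= gf_mul(c, old_blk[i] ^ new_blk[i]);
}
```

Because the code is linear over the field, `red_update` yields the same block as a full `red_compute` over the changed data. That is why a write only needs to touch the selected data disk and the redundancy disks, while the other data disks can stay idle.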
So is hardware RAID, software RAID, or RAnsrID the best-suited solution for
you? The following table tries to summarize the boundary conditions;
the solution you actually purchase or use might have additional restrictions,
though.
System | HW RAID | SW RAID | RAnsrID |
---|---|---|---|
Implementation | Dedicated hardware | Kernel driver | User space network block device |
Visibility to user | One large device | One large device | One device with one partition per data disk |
Max. no. of disks | 257 (1) | 257 (1) | 256 (2) |
Max. no. of redundancy disks | Typ. 2 (RAID6) (3) | 2 (RAID6) | 255 (typically 16) (2) |
Live adding/removing of data disks | No (4) | No (4) | Yes (up to limit on creation time) |
Live adding/removing of redundancy disks | No | No | Yes (up to limit on creation time) |
Live attaching/detaching disks w/o rebuilding | No | No | Yes (write depending on policy) |
Possibility to mount detached disks solo | No | No | Yes (strict r/o mounting policy!) |
Write on (partially) broken arrays | Yes | Yes (?) | Yes (depending on policy) |
Cost | High | Low (5) | Low - Very low (6) |
Read performance | > 100 MB/s | ≈ 100 MB/s (5) | ≈ 30 MB/s (6) (7) |
Write performance | > 100 MB/s | ≈ 50-100 MB/s (5) | ≈ 10-20 MB/s (6) (7) |
CPU usage | Very low | Medium | High during writes |
Disk access during read | All disks | All disks | Only selected data disk |
Disk access during write | All disks | All disks | Selected data disk and all redundancy disks |
Power usage while idle | High | Medium-Low | Very low (up to < 0.1 W / disk) (6) |
Data integrity | Extremely High | Medium | Very high (8) |
Data integrity: Erasure resilience | Yes | Yes | Yes |
Data integrity: Error resilience | RAID6: Yes (full speed, 1 error) | RAID6: (?) | Yes (up to no. of redundancy disks / 2 errors) (9) |
Data integrity: Write hole | No (buffered cache) | Yes | No (write journal) |
(1) | RAID 5: Unlimited data disks, 1 redundancy disk. RAID 6: 255 data disks, 2 redundancy disks. |
(2) | Number of disks has to be split into maximum number of data and redundancy disks upon initial creation, e.g. 240/16. |
(3) | Some hardware RAIDs have additional (non-standard) levels with more than 2 redundancy disks. |
(4) | With the exception of RAID 4, which is considered dead nowadays. Also, it's typically possible to add/remove disks by triggering a live rebuild of the array. |
(5) | Assuming use with internal hard disks - performance numbers and general usage quality drop significantly with external disks. |
(6) | Assuming use with external USB2 hard disks. |
(7) | Numbers are estimates; no performance tests have been performed yet. Numbers also depend on CPU speed. Large block writes are supposed to be much faster than small block writes. |
(8) | Device state journaling, write journaling, rebuild journaling, add/remove journaling turned on. |
(9) | All disks have to run for error detection. This also potentially slows down reads (computational overhead). |
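The limits in notes (2) and (9) can be written down as a small sketch. The function names are hypothetical, and requiring at least one disk of each kind is an assumption. The combined rule `erasures + 2 * errors <= redundancy disks` is the standard bound for Reed-Solomon codes (an erasure is a disk known to be missing, an error is a disk silently returning wrong data); whether RAnsrID handles mixed cases exactly this way is not stated here.

```c
#include <assert.h>

#define MAX_DISKS 256  /* total disk limit from the table */
#define MAX_RED   255  /* redundancy disk limit from the table */

/* Check a data/redundancy split chosen at creation time, e.g. 240/16. */
static int split_is_valid(unsigned max_data, unsigned max_red)
{
    /* Assumption: at least one disk of each kind is required. */
    return max_data >= 1 && max_red >= 1 && max_red <= MAX_RED &&
           max_data + max_red <= MAX_DISKS;
}

/* Standard Reed-Solomon bound: each erasure costs one redundancy disk,
 * each error at an unknown position costs two. */
static int can_recover(unsigned red_disks, unsigned erasures, unsigned errors)
{
    return erasures + 2 * errors <= red_disks;
}
```

With 16 redundancy disks this allows, for example, 16 missing disks, or 8 silently corrupted disks, or 4 missing plus 6 corrupted ones.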