The Google File System

Boing Boing links to a paper on the design of the Google Filesystem, Google’s in-house redundant-array-of-inexpensive-PCs cluster filesystem.

It’s very, very nice — and full of interesting tidbits about Google’s architecture.

  • ‘the system must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file. Our files are often used as producer-consumer queues or for many-way merging. Hundreds of producers, running one per machine, will concurrently append to a file. Atomicity with minimal synchronization overhead is essential. The file may be read later, or a consumer may be reading through the file simultaneously.’
  • ‘The workloads also have many large, sequential writes that append data to files. Typical operation sizes are similar to those for reads. Once written, files are seldom modified again. Small writes at arbitrary positions in a file are supported but do not have to be efficient.’
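
To get a feel for the "many producers appending to one file" pattern in the first quote, here is a rough local analogy using POSIX append mode. It is only a sketch: it is not GFS's client API, the function names (`append_record`, `run_producers`, `read_records`) are made up for illustration, and it assumes a local filesystem where each small `O_APPEND` write lands at the end of the file as a unit.

```python
import os
import threading


def append_record(path: str, record: bytes) -> None:
    """Append one record to the shared file (hypothetical helper).

    O_APPEND asks the kernel to position every write at the current end
    of the file, so producers need no lock of their own.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, record)
    finally:
        os.close(fd)


def run_producers(path: str, n_producers: int, records_each: int) -> None:
    """Start several producer threads that all append to the same file."""

    def producer(pid: int) -> None:
        for seq in range(records_each):
            # One short, newline-terminated record per write call.
            append_record(path, f"producer={pid} seq={seq}\n".encode())

    threads = [
        threading.Thread(target=producer, args=(pid,))
        for pid in range(n_producers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def read_records(path: str) -> list[bytes]:
    """Consumer side: read the file back as a list of records."""
    with open(path, "rb") as f:
        return f.read().splitlines()
```

Records from different producers interleave in whatever order the writes happen to land, which is fine for a queue or a merge input as long as each record arrives whole.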

A perfect example of traditional UNIX system design!