Using the Gfarm File System as a POSIX Compatible Storage Platform for Hadoop MapReduce Applications (IEEE GRID 2011)

MapReduce is a promising parallel programming model for processing large data sets. Hadoop is an up-and-coming open-source implementation of MapReduce. It uses the Hadoop Distributed File System (HDFS) to store input and output data. Due to a lack …

Optimization Techniques at the I/O Forwarding Layer (IEEE Cluster 2010)

I/O is the critical bottleneck for data-intensive scientific applications on HPC systems and leadership-class machines. Applications running on these systems may encounter bottlenecks because the I/O systems cannot handle the overwhelming intensity …

Improving Parallel Write by Node-Level Request Scheduling (IEEE CCGRID 2009)

In a cluster of multiple processors or CPU cores, many processes may run on each compute node. Each process tends to issue contiguous I/O requests, for example for snapshots or checkpointing; however, if a large number of processes enter the I/O phase at the …

On-demand file staging system for Linux clusters (IEEE Cluster 2009)

An on-demand file staging system, Catwalk, is proposed. Catwalk is designed so that it can run on any Linux cluster without any special or additional hardware. By having hook functions on the system calls of file operations, a file staging system …

Parallel File System Architecture for the Multi-Core Clusters (SWoPP 2008)

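Multi-core CPUs are becoming widespread: commodity parts such as the Intel Core 2 Duo and AMD Athlon 64 X2 are now commonly used in clusters as well, so the number of compute processes running in a cluster is increasing. The amount of data that applications handle is also growing: greater computing power generates large-scale data, while disks remain very slow compared with CPU and memory speeds, so disk I/O becomes the bottleneck.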

Gather-arrange-scatter: Node-level request reordering for parallel file systems on multi-core clusters (IEEE Cluster 2008)

Multiple processors or multi-core CPUs are now common, and the number of processes running concurrently in a cluster is increasing. Each process issues contiguous I/O requests individually, but they can be interrupted by the requests of other …