MapReduce is an emerging programming model for data-intensive applications, proposed by Google, that has recently attracted a lot of attention. MapReduce borrows ideas from functional programming: the programmer defines Map and Reduce tasks to process large sets of distributed data.
In this talk, we propose an implementation of the MapReduce programming model. We present the architecture of the prototype, which is based on BitDew, a middleware for large-scale data management on Desktop Grids. We describe the set of features that makes our approach suitable for large-scale, loosely connected Internet Desktop Grids: massive fault tolerance, replica management, barrier-free execution, latency-hiding optimization, and distributed result checking. We also present a performance evaluation of the prototype against both micro-benchmarks and a real MapReduce application. The scalability test shows that we achieve linear speedup on the classic WordCount benchmark. Several scenarios involving lagging hosts and host crashes demonstrate that the prototype is able to cope with an experimental context similar to the real-world Internet.
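
For readers unfamiliar with the model, the WordCount benchmark mentioned above can be sketched in a few lines of Python. This is only a single-process illustration of the Map, shuffle, and Reduce steps; it does not use BitDew or reflect the prototype's actual API, and all function names (map_words, shuffle, reduce_counts, word_count) are hypothetical names chosen for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(chunk: str) -> Iterator[Tuple[str, int]]:
    """Map task: emit a (word, 1) pair for every word in one input chunk."""
    for word in chunk.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as done between Map and Reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce task: sum all the counts emitted for one word."""
    return word, sum(counts)


def word_count(chunks: Iterable[str]) -> Dict[str, int]:
    """Run Map over every chunk, shuffle by key, then Reduce each group.

    In a distributed setting each chunk would be processed on a different
    host; here everything runs sequentially in one process (an assumption
    made purely to keep the sketch short).
    """
    intermediate = (pair for chunk in chunks for pair in map_words(chunk))
    groups = shuffle(intermediate)
    return dict(reduce_counts(word, counts) for word, counts in groups.items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "the fox"]))
```

Because each Map call depends only on its own chunk and each Reduce call only on one key's values, the tasks can be distributed and re-executed independently, which is the property that fault-tolerant implementations of the model rely on.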