This paper aims to reduce the storage space required in a distributed storage environment while providing optimal repair bandwidth when a node failure occurs. Previous scientific literature suggests various approaches, such as replication, erasure codes, local reconstruction codes, and regenerating codes, to recover from system failures. These approaches are applied to archival storage, cloud storage, and similar systems to provide data availability and reliability. Although these approaches have proved efficient, each has its own strengths and weaknesses: some address storage efficiency, while others focus on providing an effective repair mechanism. In this paper, we present a new approach, Group Repair Codes, which provides optimal repair bandwidth by replicating nodes and computing parity nodes over smaller groups. Compared with approaches that provide optimal repair (hybrid and double coding), it utilizes less storage space. Moreover, it improves fault tolerance and reduces the disk reads and data transferred by the system when nodes fail. The current study considers various existing approaches, such as replication, erasure codes, LRC, hybrid coding, and double coding, that have been implemented to manage big data. The results reported in the paper demonstrate the suitability of our approach. We also discuss the significance of intelligent systems for the present study, and we intend to propose an intelligence-based system for Group Repair Codes in the near future. We believe that our research will benefit several communities, such as cloud storage, big data, and distributed storage.
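To make the idea of group-level parity concrete, the following minimal sketch partitions data blocks into small groups and attaches one XOR parity block per group, so a single lost block can be rebuilt by reading only the members of its own group rather than the whole stripe. This is an illustrative assumption on our part (the function names and the single-parity XOR scheme are simplifications), not the paper's actual Group Repair Codes construction:

```python
# Illustrative sketch: per-group XOR parity for local repair.
# Assumed simplification, not the paper's exact construction.

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

def make_groups(data_blocks, group_size):
    """Partition blocks into groups and attach a parity block to each."""
    groups = []
    for start in range(0, len(data_blocks), group_size):
        group = data_blocks[start:start + group_size]
        groups.append({"data": group, "parity": xor_blocks(group)})
    return groups

def repair(group, lost_index):
    """Rebuild one lost data block from the surviving group members."""
    survivors = [b for i, b in enumerate(group["data"]) if i != lost_index]
    return xor_blocks(survivors + [group["parity"]])
```

Because repair reads only `group_size` blocks instead of the full stripe, repair bandwidth and disk reads scale with the group size, which is the intuition behind grouping in the proposed approach.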