# Full text of "10 IJAERS NOV 2015 21 Parallel Graph Computation Using A Partition Aware Engine Review Paper"

## See other formats

International Journal of Advanced Engineering Research and Science (IJAERS) [Voi-2, Issue-11, Nov- 2015] ISSN: 2349-6495 Parallel Graph Computation using a Partition Aware Engine: Review Paper Mr. S. B. Shirsath 1 , Prof. S. R. Duragkar 2 'Department of Comp. Engg., SNDCOE & RC, Maharashtra, India 2 Head, Department Comp. Engg, SNDCOE & RC, Maharashtra, India Abstract — A Partition Aware Engine framework provides a major increase in compression with respect to all currently known techniques, both on web graphs and on social networks. These improvements make it possible to analyse in main memory significantly larger graphs. Graph partition quality affects the overall performance of parallel graph computation systems. The quality of a graph partition is measured by the balance factor and edge cut ratio. A balanced graph partition with small edge cut ratio is generally preferred since it reduces the expensive network communication cost. However, according to an empirical study on Giraph, the performance over well partitioned graph might be even two times worse than simple random partitions. This is because these systems only optimize for the simple partition strategies and cannot efficiently handle the increasing workload of local message processing when a high quality graph partition is used. In this paper, we propose a novel partition aware graph computation engine named PAGE, which equips a new message processor and a dynamic concurrency control model. In this paper, we propose a new paradigm, Partition Aware Engine framework that allows parametric control of asynchrony ranging from completely asynchronous execution to partially asynchronous execution to level- synchronous execution. Partial asynchrony is achieved by generalizing the BSP model to allow each super step to process up to k levels of the algorithm asynchronously. In the model, we studying on two heuristic rules to effectively extract the system characters and generate proper parameters. Keywords — Message Processing, Parallel Graph, Graph Partition, Graph Computation. I. INTRODUCTION Database Parallel graph algorithms are currently expressed in level synchronous or asynchronous paradigms. Level-synchronous paradigms iteratively process vertices of a graph level by level. This model guarantees the current level’s computation to have completed before starting the next one through the use of global synchronizations at the end of each level. Level- synchronous algorithms tend to perform well when the www.ijaers.com number of levels is small, but suffer from poor scalability when the number of levels is large. Bulk synchronous parallel (BSP) algorithms can naturally be expressed in this paradigm. The asynchronous paradigm replaces global synchronizations with point-to-point synchronizations, which can increase the degree of parallelism, but which may also require the completion of redundant work. For example, an asynchronous breadth- first search (BFS) may re-visit vertices multiple times as shorter paths are discovered. Choosing the right paradigm depends on the system, input graph, and algorithm. This implies different implementations and optimizations for algorithms, with no easy way to switch between them. II. SURVEY REVIEW 1. Harshvardhan show how common patterns in graph algorithms can be expressed in the KLA paradigm and provide techniques for determining k, the number of asynchronous steps allowed between global synchronizations. Results of an implementation of KLA in the staple Graph Library show excellent scalability on up to 96K cores and improvements of lOx or more over level synchronous and asynchronous versions for graph algorithms such as breadth-first search, PageRank, k-core decomposition and others on certain classes of real-world graphs. 2. Amr Ahmed proposes a framework for large-scale graph decomposition and inference. To resolve the scale, our framework is distributed so that the data are partitioned over a shared nothing set of machines. They propose a novel factorization technique that relies on partitioning a graph so as to minimize the number of neighbouring vertices rather than edges across partitions. System decomposition is based on a streaming algorithm. It is network-aware as it adapts to the network topology of the underlying computational hardware. 3. Nathan Backman present a framework that parallelizes and schedules workflows of stream operators, in real-time, to meet latency objectives. It supports data- and task-parallel processing of all workflow operators, by all computing nodes, while maintaining the ordering properties of sorted data Page | 53 International Journal of Advanced Engineering Research and Science (IJAERS) [Voi-2, Issue-11, Nov- 2015] ISSN: 2349-6495 streams. System show that a latency-oriented operator scheduling policy coupled with the diversification of computing node responsibilities encourages parallelism models that achieve end-to- end latency-minimization goals. 4. Lars Backstrom use decision-tree techniques to identify the most significant structural determinants of these properties. Also develop a novel methodology for measuring movement of individuals between communities, and show how such movements are closely aligned with changes in the topics of interest within the communities. 5. Paolo Boldi presents the compression techniques used in Web Graph, which are centred around referentiation and intervalisation (which in turn are dual to each other). Web Graph can compress the WebBase graph (118 Mnodes, 1 Glinks) in as little as 3.08 bits per link, and its transposed version in as little as 2.89 bits per link. 6. Marco Rosa proposed algorithm with the Web Graph compression framework provides a major increase in compression with respect to all currently known techniques, both on web graphs and on social networks. These improvements make it possible to analyse in main memory significantly larger graphs. III. PARALELL IMPLEMENTATION Layered label propagation lends itself naturally to the task-decomposition parallel-programming paradigm, which may dramatically improve performances on modern multicore architectures: since the update order is randomised, there is no obstacle in updating several nodes in parallel. Our implementation breaks the set of nodes into a very small number of tasks (in the order of thousands). A large number of threads picks up the first available task and solves it: as a result, we obtain a performance improvement that is linear in the number of cores. We are helped by Web Graph’s facility, which allows us to provide each thread with a lightweight copy of the graph that shares the bit stream and associated information with all other threads. IY. CHALLENGES Latent variable modelling is a promising technique for many analytics and predictive inference applications. However, parallelization of such models is difficult since many latent variable models require frequent synchronization of their state. The power law nature of such graphs makes it difficult to use chromatic scheduling. Furthermore, the bulk-synchronous processing paradigm of Map-Reduce does not afford low- enough latency for fast convergence: this has been reported, e.g. in comparisons between bulk-synchronous www.ijaers.com convergence and asynchronous convergence. Consequently there is a considerable need for algorithms which address the following issues when performing inference on large natural graphs: Graph Partitioning We need to find a communication efficient partitioning of the graph in such a manner as to ensure that the number of neighbouring vertices rather than the number of edges is minimized. This is relevant since latent variable models and their inference algorithms store and exchange parameters that are associated with vertices rather than edges. Network Topology In many graph-based applications the cost of communication (and to some extent also computation) the cost of storing data. Hence it is desirable to have an algorithm which is capable to layout data in a network-friendly fashion on the fly once we know the computational resources. Variable Replication While the problem of variable synchronization for statistical inference with regular structure is by now well understood, the problem for graphs is more complex: The state space is much larger (each vertex holds parts of a state), rendering synchronization much more costly - unlike in aspect models only few variables are global for all partitions. Asynchronous Communication Finally there is the problem of eliminating the synchronization step in traditional bulk-synchronous systems on graphs. More specifically, uneven load distribution can lead to considerable inefficiencies in the bulk synchronous setting. After all, it is the slowest machine that determines the runtime of each processing round (e.g. Map Reduce). Asynchronous schemes, on the other hand, are nontrivial to implement as they often require elaborate locking and scheduling strategies. V. DYNAMIC CONCURRENCY CONTROL MODEL The concurrency control problem can be modelled as a typical producer-consumer scheduling problem, where the computation phase generates messages as a producer, and message process units in the dual concurrent message processor are the consumers. Therefore, the producer- consumer constraints should be satisfied when solving the concurrency control problem. The concurrency of dual concurrent message processor heavily affects the performance. But it is expensive and also challenging to determine a reasonable concurrency ahead of real execution without any assumption. Therefore, PAGE needs a mechanism to adaptively tune the concurrency of the dual concurrent message processor. The mechanism is named Dynamic Concurrency Control Model, DCCM for short. Page | 54 International Journal of Advanced Engineering Research and Science (IJAERS) [Voi-2, Issue-11, Nov- 2015] ISSN: 2349-6495 For the PAGE situation, the concurrency control problem arises consumer constraints. Since the behaviour of producers is determined by the graph algorithms, PAGE only requires to adjust the consumers to satisfy the constraints (behaviour of graph algorithms), which are stated as follows. First, PAGE provides sufficient message process units to make sure that new incoming message blocks can be processed immediately and do not block the whole system. Meanwhile, no message process unit is idle. Second, the assignment strategy of these message process units ensures that each local/remote message process unit has balanced workload since the disparity can seriously destroy the overall performance of parallel processing. VI. CONCLUSION Finally we conclude that our study of the partition unaware problem in current graph computation systems and its severe drawbacks for efficient parallel large scale graphs processing. To address this problem, we proposed a partition aware graph computation engine named Parallel Graph Computation using a Partition Aware Engine that monitors three high-level key running metrics and dynamically adjusts the system configurations. REFERENCES [1] Yingxia Shao, Bin Cui, Senior, "PAGE: A Partition Aware Engine for Parallel Graph Computation", IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 27, NO. 2, FEBRUARY 2015. [2] A. Amr, S. Nino, N. Shravan, J. Vanja, and S. J. Alexander, “Distributed large scale natural graph factorization,” in Proc. 22nd Int. Conf. World Wide Web, 2013, pp. 37-18. [3] N. Backman, R. Fonseca, and U. C 4 et intemel, “Managing parallelism for stream processing in the cloud,” in Proc. 1st Int. Workshop Hot Topics Cloud Data Process., 2012, pp. 1 : 1—1:5. [4] L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan, “Group formation in large social networks: Membership, growth, and evolution,” in Proc. 12th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2006, pp. 44—54. [5] P. Boldi and S. Vigna, “The webgraph framework I: Compression techniques,” in Proc. 13th Int. Conf. World Wide Web, 2004, pp. 595- 602. [6] P. Boldi, M. Rosa, M. Santini, and S. Vigna, “Layered label propagation: A multire solution coordinate-free ordering for compressing social networks,” in Proc. 20th Int. Conf. World Wide Web, 2011, pp. 587-596. www.ijaers.com [7] S. Brin and L. Page, “The anatomy of a large-scale hyper textual web search engine,” in Proc. 7th Int. Conf. World Wide Web, 1998, pp. 107-117. [8] A. Chan, F. Dehne, and R. Taylor, “CGMGRAPH/CGMLIB: Implementing and testing CGM graph algorithms on PC clusters and shared memory machines,” J. High Perform. Comput. Appl., pp. 81-97, 2005. [9] G. Cong, G. Almasi, and V. Saraswat, “Fast PGAS implementation of distributed graph algorithms,” in Proc. Int. Conf. High Perform. Comput., Netw., Storage Anal., 2010, pp. 1-11. [10] J. Dean and S. Ghemawat, “MapReduce: Simplified data processing on large clusters,” in Proc. Operating Syst. Des. Implementation, 2004, pp. 107-113. [11] D. Gregor and A. Lumsdaine, “The parallel BGL: A generic library for distributed graph computations,” in Proc. Parallel Object-Oriented Sci. Comput., 2005, pp. 1-18. [12] C. A. R. Hoare, “Communicating sequential processes,” Commun. ACM, vol. 21, pp. 666-677, 1978. [13] U. Kang, C. E. Tsourakakis, and C. Faloutsos, “PEGASUS: A petascale graph mining system implementation and observations,” in Proc. IEEE 9th Int. Conf. Data Mining, 2009, pp. 229-238. [14] G. Karypis and V. Kumar, “Multilevel algorithms for multiconstraint graph partitioning,” in Proc. ACM/IEEE Conf. Supercomput., 1998, pp. 1-13. [15] A. Roy, I. Mihailovic, and W. Zwaenepoel, “X- stream: Edgecentric graph processing using streaming partitions,” in Proc. 24th ACM Symp. Operating Syst. Principles, 2013, pp. 472—188. [16] S. Salihoglu and J. Widom, “GPS: A graph processing system,” in Proc. 25th Int. Conf. Sci. Statist. Database Manage., 2013, pp. 22:1-22:12. [17] B. Shao, H. Wang, and Y. Li, “Trinity: A distributed graph engine on a memory cloud,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2013, pp. 505- 516. [18] H. Simonis and T. Cornelissens, “Modelling producer/consumerconstraints,” in Proc. 1st Int. Conf. Principles Practice Constraint Program., 1995, pp. 449—462. [19] I. Stanton and G. Kliot, “Streaming graph partitioning for large distributed graphs,” in Proc. 18th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2012, pp. 1222-1230. [20] C. Tsourakakis, C. Gkantsidis, B. Radunovic, and M. Vojnovic, “Fennel: Streaming graph partitioning for massive scale graphs,” in Proc. Int. ACM Conf. Web Search Data Mining, 2014, pp. 333-342. Page | 55