Brice Nédelec, Pascal Molli & Achour Mostefaoui

Presentation transcript:

Sequence CRDT: A Scalable Sequence Encoding for Massive Collaborative Editing Brice Nédelec, Pascal Molli & Achour Mostefaoui GDD – LINA – University of Nantes Workshop on Highly-Scalable Distributed Systems Wednesday 14 January 2015, Paris France.

Distributed Collaborative Editors: Distributed collaborative editors let people work together across space, time and organizations. Examples: Google Docs, Etherpad, Google Wave… Google Drive (which includes Google Docs) has 190M users.

Google Docs is great, but... Single point of failure: if the provider is down, there is no collaboration. Privacy and economic intelligence: what if Google searches for "ANR" on 15 October ;) ? Mass editing: Google limits the number of simultaneous editors (50); beyond 50, additional users are just readers.

Is it possible to build a fully decentralized editor that supports 1M simultaneous users? Why? Because it is hard ;) "We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard." (Kennedy, 1962). Because it can also be useful: mass collaboration for MOOCs, webinars and events; Google Wave has already been used like that…

Distributed Collaborative Editors: Principles (OT or CRDT). Based on optimistic replication algorithms. Operations are generated locally: no lock, no communication with other sites. They are then broadcast to the other sites; every operation is eventually delivered and re-executed when received. The system is correct if it ensures causality, convergence and "intention preservation" (OT definition), i.e. it preserves the partial orders in the sequence.
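The generate-locally, broadcast, re-execute cycle can be illustrated with a minimal sketch. It is not taken from the slides: the `Replica` class, its method names and the shape of the identifiers are hypothetical, and only insertions are modelled (deletions would additionally need causal delivery).

```python
class Replica:
    """Toy replica: stores elements keyed by a totally ordered, unique identifier."""

    def __init__(self):
        self.elements = {}  # identifier -> character

    def local_insert(self, ident, char):
        # Generate the operation locally, without locks or remote communication.
        op = ("insert", ident, char)
        self.deliver(op)
        return op  # the caller is expected to broadcast this to other sites

    def deliver(self, op):
        # Re-execute a received operation; inserting twice is harmless (idempotent).
        _, ident, char = op
        self.elements[ident] = char

    def text(self):
        # The visible sequence is the characters sorted by identifier.
        return "".join(char for _, char in sorted(self.elements.items()))


# Two sites edit concurrently, then exchange operations in different orders.
a, b = Replica(), Replica()
op1 = a.local_insert(((1,), "A", 1), "H")
op2 = b.local_insert(((2,), "B", 1), "i")
a.deliver(op2)
b.deliver(op1)
print(a.text(), b.text())
```

Because insertions on distinct identifiers commute, both replicas end with the same text whatever the delivery order.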

Principles of Sequence CRDT: Encode the order of the sequence in the ID of each element. Remember BASIC line numbers ;)

    10 LET B=A
    15 FOR I=1 TO 27
    20 LET A=A*A
    21 NEXT I

Arghh, I forgot LET B=B^2 before NEXT I, and there is no way to use line 20.5!

Insert alpha between p and q: create an ID (a path) for alpha that sorts between p and q, then create a disambiguator for alpha so that (path + disambiguator) is unique. The space and time complexity of a sequence CRDT is mainly decided here!
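A minimal sketch of this allocation step, under simplifying assumptions that are not from the slides: paths are lists of integers compared lexicographically, every level has the same hypothetical arity `BASE`, the sentinels `[0]` and `[BASE]` mark the beginning and end of the document, and p and q have distinct paths (the equal-path case arising from concurrent inserts is ignored).

```python
import random

BASE = 32  # hypothetical fixed number of slots per tree level


def alloc_path(p, q, rng, base=BASE):
    """Return a path strictly between paths p and q (requires p < q)."""
    path = []
    q_bounds = True  # becomes False once the new path has branched below q
    depth = 0
    while True:
        lo = p[depth] if depth < len(p) else 0
        hi = q[depth] if (q_bounds and depth < len(q)) else base
        if hi - lo > 1:
            # There is at least one free slot at this depth: pick one at random.
            return path + [rng.randint(lo + 1, hi - 1)]
        # No room here: follow p's branch and look one level deeper.
        path.append(lo)
        q_bounds = q_bounds and hi == lo
        depth += 1


def make_id(path, site, counter):
    """Attach a disambiguator (site, counter) so that the full ID is unique."""
    return (tuple(path), site, counter)


rng = random.Random(42)
alpha = alloc_path([5], [6], rng)   # no free slot between 5 and 6: goes one level deeper
print(make_id(alpha, "site-A", 1))
```

When no slot is free between p and q at a given depth, the path grows by one level, which is why the choice of slot drives the size of identifiers.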

Scientific problem: Design an ID allocation strategy for sequence elements that is independent of the insertion order. There are many ways to type "QWERTY"; how can we compute the smallest IDs for each character, whatever the insertion order?

Problem: Order of Insertions. Typed: Q;W;E;R;T;Y versus typed: Y;T;R;E;W;Q
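To see why the insertion order matters, here is a small sketch with a deliberately naive strategy: always take the leftmost free slot. The fixed arity `BASE`, the sentinels `[0]` and `[BASE]`, and the function names are illustrative assumptions, not the strategy of any particular CRDT.

```python
BASE = 32  # hypothetical fixed number of slots per tree level


def alloc_leftmost(p, q, base=BASE):
    """Return the path just after p that still sorts before q (requires p < q)."""
    path, q_bounds, depth = [], True, 0
    while True:
        lo = p[depth] if depth < len(p) else 0
        hi = q[depth] if (q_bounds and depth < len(q)) else base
        if hi - lo > 1:
            return path + [lo + 1]
        path.append(lo)                     # no room: go one level deeper
        q_bounds = q_bounds and hi == lo
        depth += 1


def type_chars(chars, at_front):
    """Insert chars one by one at the front or at the end; return [(path, char)]."""
    doc = [([0], ""), ([BASE], "")]          # begin and end sentinels
    for ch in chars:
        i = 0 if at_front else len(doc) - 2
        path = alloc_leftmost(doc[i][0], doc[i + 1][0])
        doc.insert(i + 1, (path, ch))
    return doc[1:-1]


forward = type_chars("QWERTY", at_front=False)   # typed Q;W;E;R;T;Y
backward = type_chars("YTREWQ", at_front=True)   # typed Y;T;R;E;W;Q
print(sum(len(p) for p, _ in forward), sum(len(p) for p, _ in backward))
```

Both runs produce the text "QWERTY", but with this strategy the left-to-right run uses 6 path elements in total while the right-to-left run uses 21, because every insertion at the front opens a new level.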

Combine Exponential tree & random allocation
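A hedged sketch of how these two ingredients might be combined: the arity doubles at each depth (exponential tree), and each depth randomly commits to allocating near the lower bound or near the upper bound. The constants `START_BITS` and `BOUNDARY`, the class name `Lseq` and the sentinel paths are assumptions made for illustration; the published LSEQ algorithm may differ in its details.

```python
import random

START_BITS = 5   # hypothetical: 2**5 slots at depth 0
BOUNDARY = 10    # hypothetical maximum distance from the chosen bound


def base(depth):
    """Arity of the tree at a given depth: doubles at each level."""
    return 2 ** (START_BITS + depth)


class Lseq:
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.strategy = {}  # depth -> True (stay near lower bound) / False (near upper)

    def alloc(self, p, q):
        """Return a path strictly between paths p and q (requires p < q)."""
        path, q_bounds, depth = [], True, 0
        while True:
            lo = p[depth] if depth < len(p) else 0
            hi = q[depth] if (q_bounds and depth < len(q)) else base(depth)
            free = hi - lo - 1
            if free >= 1:
                if depth not in self.strategy:
                    # Random allocation strategy, chosen once per depth.
                    self.strategy[depth] = self.rng.random() < 0.5
                step = self.rng.randint(1, min(BOUNDARY, free))
                digit = lo + step if self.strategy[depth] else hi - step
                return path + [digit]
            path.append(lo)                  # no room: go one level deeper
            q_bounds = q_bounds and hi == lo
            depth += 1


lseq = Lseq(seed=1)
doc = [[0], [base(0)]]                       # begin and end sentinels
for _ in range(200):                         # monotonic editing at the end
    doc.insert(len(doc) - 1, lseq.alloc(doc[-2], doc[-1]))
print(max(len(path) for path in doc))
```

The intuition is that a depth whose strategy suits the editing direction absorbs many insertions, while the growing arity limits the cost of a wrong guess.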

LSEQ Complexities: O((log n)^2) -> no need to rebalance IDs…
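A rough sketch of where a squared logarithm can come from, under two assumptions that are not spelled out on the slide: a path element at depth i is drawn from 2^(b+i) slots (so it costs b+i bits), and under monotonic editing the depth d of the tree grows as O(log n) because each new level doubles the available capacity.

```latex
% Assumption: an element at depth i costs (b + i) bits.
\[
  \mathrm{bits}(d) \;=\; \sum_{i=0}^{d-1} (b + i)
  \;=\; d\,b + \frac{d(d-1)}{2}
  \;=\; O(d^2)
\]
% Assumption: d = O(\log n) for n insertions under monotonic editing.
\[
  \mathrm{bits} \;=\; O\!\left((\log n)^2\right)
\]
```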

Experiments: We built the CRATE editor [1], with LSEQ for ID allocation, gossip for broadcast, anti-entropy for missed deliveries, and interval version vectors for causal reception [2].
[1] https://github.com/Chat-Wane/CRATE.git
[2] M. Mukund, G. Shenoy R., S. Suresh, Optimized or-sets without ordering constraints, in: M. Chatterjee, J.-n. Cao, K. Kothapalli, S. Rajsbaum (Eds.), Distributed Computing and Networking, Vol. 8314 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, 2014, pp. 227-241. doi:10.1007/978-3-642-45249-9_15.
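As an illustration of the last ingredient, here is a minimal sketch of a version vector that tracks, per site, the highest counter received plus the counters still missing below it. This is an assumed reading of "interval version vectors", not CRATE's actual code; the class and method names are hypothetical and counters are assumed to start at 1.

```python
class IntervalVersionVector:
    """Per-site record of which operation counters have been received."""

    def __init__(self):
        self.max_seen = {}  # site -> highest counter received so far
        self.missing = {}   # site -> counters below max_seen not yet received

    def record(self, site, counter):
        """Mark operation (site, counter) as received."""
        top = self.max_seen.get(site, 0)
        gaps = self.missing.setdefault(site, set())
        if counter > top:
            gaps.update(range(top + 1, counter))  # everything skipped is a gap
            self.max_seen[site] = counter
        else:
            gaps.discard(counter)                 # a late arrival fills a gap

    def contains(self, site, counter):
        """True if operation (site, counter) has already been received."""
        return (counter <= self.max_seen.get(site, 0)
                and counter not in self.missing.get(site, set()))


vv = IntervalVersionVector()
vv.record("A", 1)
vv.record("A", 4)            # operations 2 and 3 from site A are still missing
print(vv.contains("A", 4), vv.contains("A", 2))
vv.record("A", 2)            # e.g. recovered later through anti-entropy
print(vv.contains("A", 2), vv.contains("A", 3))
```

With such a structure, a replica could drop duplicates arriving through gossip (`contains` is already true) and hold back an operation until the operation it depends on has been recorded.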

1st Setup. Objective: validate the space complexity analysis of LSEQ. When the editing behaviour is monotonic, LSEQ has a polylogarithmic upper bound on space complexity with respect to the number of insert operations. When the editing behaviour is random, LSEQ has a logarithmic space complexity. Setup: a single machine with 2 peers; together, the peers produce 166 char/s to create a document of 500000 chars; monotonic editing behaviour.

Evaluation

2nd Setup. Objective: show that CRATE scales in terms of the number of peers. In other words, the size of the network does not impact the upper bound on the space complexity of messages. Setup: on Grid'5000, the number of peers grows from 2 to 450, with 166 char/s uniformly distributed among the peers.

3rd Setup. Objective: show that concurrency does not negatively impact the size of identifiers. Hence, scenarios without concurrency show the upper bound on the size of identifiers. Setup: a single machine emulates 10 peers using the CRATE application; 10000 chars are inserted at 3 ins/s, uniformly distributed among the peers; 5 runs with approximately the following latencies: 0: 02ms, 100ms, 500ms, 1s, and 10s.

Conclusions: LSEQ computes IDs for sequence CRDTs with an upper bound of O((log n)^2). The number of peers and concurrency do not negatively impact the performance of CRATE. One million users is reachable…
Nédelec, B., Molli, P., Mostefaoui, A., & Desmontils, E. (2013, September). LSEQ: an adaptive structure for sequences in distributed collaborative editing. In Proceedings of the 2013 ACM symposium on Document engineering (pp. 37-46). ACM.
Nédelec, B., Molli, P., Mostefaoui, A., & Desmontils, E. (2013). Concurrency Effects Over Variable-size Identifiers in Distributed Collaborative Editing. In Proceedings of the International workshop on Document Changes: Modeling, Detection, Storage and Visualization, Florence, Italy, September 10, 2013 (Vol. 1008, pp. 0-7).

Perspectives: Deploy a 1M-user editor on a network of browsers: 1M users editing 1M characters… and measure performance. In progress, nearly ready…