1 System installation & updates. A. Manabe (KEK), LSCCW, 23 May 2001

2 Installation & update
- System (SW) installation and update is boring and hard work for me.
- Question: how do you install or update the system on a cluster of more than 100 nodes?
- Question: have you ever postponed a system upgrade because the work was too much?

3 Installation & update methods
1. Pre-installed, pre-configured systems: you can postpone your work, but sooner or later...
2. Manual installation, one PC at a time: many operators working in parallel with many duplicated installation CDs.
   - Requires many CRTs, many days, and money (to hire the operators).
3. Network installation from an NFS/FTP server with an automated 'batch' installer.
   - The server gets too busy when installing to many nodes.
   - A lot of work still remains (utility SW installation, ...).

4 Installation & update methods
4. Duplicated disk images: attach many disks to one PC, duplicate the installed disk, then distribute the duplicated disks to the nodes.
   - The hardware work is hard (attaching/detaching disk units).
5. Diskless PCs: local disks are used only for swap and the /var directory; all other directories come from an NFS server.
   - A powerful server is necessary.
   - A node can do nothing on its own (troubleshooting may become difficult).

5 An idea
- Make one fully installed host and clone its disk image to the nodes over the network.
- Install 100 PCs in 10 minutes (target value).
- Keep the necessary operator intervention as small as possible.

6 Our planned method (1)
- Network disk cloning software: dolly+
  - For cloning the disk image to the nodes.
- Network booting: PXE (Preboot Execution Environment) with Intel NICs.
  - For starting an installer.
- Batch installer: modified RedHat kickstart.
  - For disk formatting, network setup, and starting the cloning software; it also creates the per-node /etc/fstab, /etc/sysconfig/network, etc. (see the sketch below).
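Each clone must end up with its own identity rather than the master's. As a minimal illustration of that post-install step, here is a short Python sketch that writes a per-node /etc/sysconfig/network; the hostname, gateway, and call site are assumptions for illustration, not the actual KEK kickstart code.

  # Hypothetical helper: give a freshly cloned node its own identity by
  # rewriting /etc/sysconfig/network (RedHat-style), instead of keeping
  # the master host's settings.  Hostname and gateway are illustrative.
  def write_network_config(hostname, gateway, path="/etc/sysconfig/network"):
      lines = [
          "NETWORKING=yes",
          "HOSTNAME=" + hostname,
          "GATEWAY=" + gateway,
      ]
      with open(path, "w") as f:
          f.write("\n".join(lines) + "\n")

  if __name__ == "__main__":
      # e.g. on node n001 (node names follow the config file on slide 10)
      write_network_config("n001", "192.168.1.1")

A per-node /etc/fstab would be produced in the same way, from the partition layout chosen in the kickstart file.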

7 Our method (2)
- Remote power controller: a network-controlled power tap (hardware).
  - For remote system reset (replaces pushing the reset buttons one by one).
- Console server, using the serial console feature of Linux.
  - For watching that everything is done well.

8 Dolly+: 100 PC installation in 10 min.
- Software to copy/clone files and/or disk images among many PCs through a network.
- Runs on Linux as a user program.
- Free software.
- Dolly was developed by the CoPs project at ETH (Switzerland).

9 Dolly+
- Sequential file and block (device) file transfer.
- Ring network connection topology.
- Pipeline mechanism.
- Fail recovery mechanism.

10 Config file
- Needed only on the server host (server = the host having the original images or files).

  iofiles 3
  /data/image_hda1 > /dev/hda1
  /data/image_hda5 > /dev/hda5
  /dev/hda6 > /dev/hda6
  server dcpcf001
  clients 10
  n001
  n002
  (listing of all nodes)
  endconfig

11 Ring topology
- Makes maximum use of the full-duplex ports of switches.
- Good for networks built from a complex of switches, because connections are only needed between adjacent nodes.
- S = server = the host having the original image.

12 One server - many clients topology
- Server bottleneck, both in the network and in the server itself.
- Broadcast or multicast UDP: difficulty in making a reliable transfer over multicast.
- S = server = the host having the original image.

13 Pipelining & multithreading
[Diagram: the disk image flows as a pipeline of 4 MB file chunks from the server through Node 1, Node 2, ... over the network; each node runs 3 threads in parallel, so it can receive a chunk from the previous node, write it to disk, and forward it to the next node at the same time (chunks 1..9 shown at different stages between BOF and EOF).]
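A minimal sketch of the pipelining idea in Python (my own illustration, not dolly+ source code): a node in the ring receives fixed-size chunks from its upstream neighbour while two worker threads write them to the local disk and forward them to the downstream neighbour. Host names, the port, and the output path are assumptions for illustration.

  # Sketch of ring pipelining: receive 4 MB chunks from upstream and, in
  # parallel, write them locally and forward them downstream.
  import socket, threading, queue

  CHUNK = 4 * 1024 * 1024  # 4 MB chunks, as in the slide

  def node(listen_port, next_host, next_port, out_path):
      to_disk, to_next = queue.Queue(8), queue.Queue(8)

      def writer():
          with open(out_path, "wb") as f:
              while (buf := to_disk.get()) is not None:
                  f.write(buf)

      def forwarder():
          with socket.create_connection((next_host, next_port)) as s:
              while (buf := to_next.get()) is not None:
                  s.sendall(buf)

      workers = [threading.Thread(target=writer), threading.Thread(target=forwarder)]
      for t in workers:
          t.start()

      srv = socket.socket()
      srv.bind(("", listen_port))
      srv.listen(1)
      conn, _ = srv.accept()
      while buf := conn.recv(CHUNK):      # recv may return less than CHUNK; fine for a sketch
          to_disk.put(buf)
          to_next.put(buf)
      to_disk.put(None)                   # end-of-file markers for the workers
      to_next.put(None)
      for t in workers:
          t.join()

The last node in the ring would simply omit the forwarder thread, and the server runs only the read-from-disk and send side.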

14 Performance (measured)
- 1 server, 1 node (Pentium III 500 MHz):
  - IDE disk / 100BaseT network: ~4 MB/s
  - SCSI U2W / 100BaseT network: ~9 MB/s
  - 4 GB image copy -> ~17 min (IDE), ~8 min (SCSI)
- 1 server, 7 nodes, IDE / 100BaseT:
  - 4 GB image copy -> ~17 min (only +8 s compared with one node), plus the time for the booting process.

15 Expected performance
- 1 server, 100 nodes:
  - IDE / 100BaseT: ~19 min (+2 min overhead)
  - SCSI / 100BaseT: ~9 min (+1 min overhead)
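These figures are consistent with a simple pipeline model: the total time is roughly the time to stream the whole image over one link, plus one extra chunk delay for every additional node in the ring. A back-of-the-envelope check in Python (my own model, not from the talk):

  # Rough pipeline model:
  #   time ~= image_size / bandwidth              (stream the image once)
  #         + (nodes - 1) * chunk / bandwidth     (one chunk delay per extra hop)
  def install_time_minutes(image_gb, mb_per_s, nodes, chunk_mb=4):
      stream = image_gb * 1024 / mb_per_s
      pipeline = (nodes - 1) * chunk_mb / mb_per_s
      return (stream + pipeline) / 60

  print(install_time_minutes(4, 4, 100))    # IDE, 100 nodes:  ~18.7 min (slide says ~19 min)
  print(install_time_minutes(4, 9, 100))    # SCSI, 100 nodes: ~8.3 min (slide says ~9 min)
  print(install_time_minutes(4, 4, 1000))   # IDE, 1000 nodes: ~34 min (cf. next slide)

The per-node boot time and dolly+ start-up overhead are not included in this model.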

16 How many minutes to install to 1000 nodes?
[Chart: estimated installation time as a function of the number of nodes, up to 1000 nodes, with curves labelled +50% and +100%.]

17 Fail recovery mechanism
- In my experience, ~2% of nodes have an initial hardware problem.
- Dolly+ provides an automatic 'short cut' mechanism when a node has a problem: after a time-out, the failing node is cut out of the ring.
- The ring topology makes this easy to implement.
[Diagram: S = server; a failing node is short-cut out of the ring after a time-out.]
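One way such a short cut could work (my own illustration; dolly+'s actual recovery logic may differ) is for the upstream node to skip any downstream node that does not answer within a time-out and connect to the next one in the list. The port number and time-out below are assumptions.

  # Illustrative short-cut logic (not dolly+'s implementation): skip dead
  # nodes so that one broken machine does not break the whole ring.
  import socket

  def connect_downstream(ring, my_index, port=9998, timeout_s=30):
      """Return a connection to the first reachable node after my_index."""
      for candidate in ring[my_index + 1:]:
          try:
              return socket.create_connection((candidate, port), timeout=timeout_s)
          except OSError:
              print(candidate, "not responding, short-cutting it")
      return None  # no downstream node left: we are the end of the ring

  # Example: node n002 looks for its downstream partner
  # (node names follow the config file on slide 10)
  ring = ["dcpcf001", "n001", "n002", "n003", "n004"]
  conn = connect_downstream(ring, ring.index("n002"))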

18 Cascade topology
- The server bottleneck could be overcome.
- Weak against a node failure: a failure spreads down the cascade and is difficult to recover from.

19 A beta version will be available from corvus.kek.jp/~manabe/pcf/dolly after this workshop.
