Vincenzo Vagnoni, LHCb Real Time Trigger Challenge Meeting, CERN, 24th February 2005.


1 Vincenzo Vagnoni, LHCb Real Time Trigger Challenge Meeting, CERN, 24th February 2005

2 Diskless booting

- 4 ways (to my knowledge) to operate a diskless Linux machine
  - Removable device booting (e.g. à la Knoppix on CD-ROM)
    - Not flexible enough, option discarded!
  - High-reliability Mini-Drive or Disk-On-Chip booting
    - Interesting… “firmware”-oriented approach, like modern X terminals
    - Never tried; adopting such a solution would depend on the availability of low-cost devices of this kind
    - Not a real option at the moment, but to be kept in mind…
  - “Classic” network boot with “root over NFS”
    - Used for example for old X terminals, for CETIA motherboards, etc.
    - Used in production for 4 years in Bologna for the data analysis farm, and for two years for the Bologna MC production farm
    - Works fine
  - Network boot with root filesystem on ramdisk
    - I’m not aware of other people using it apart from us in the Bologna L1&HLT testbed
    - Root filesystem downloaded at boot time together with the kernel via network
    - Application software directories mounted via NFS (or other network filesystem protocols)
    - Works fine

3 Classic network boot

- Requires few basic services
  - PXE, DHCP, TFTP servers on a control PC
  - Requires root directories to be exported by a control PC
    - Each machine will have its root filesystem as a specific directory on the control PC
  - Installation of a new node just requires the update of the DHCP configuration, the copy of a template directory and the export of the copied directory via the network filesystem
- One drawback
  - Reliability depends on the reliability of the network filesystem
    - If the network filesystem hangs, the root filesystem is frozen and the machine is no longer reachable in any way (the kernel freezes and waits for the root filesystem to come back online)
  - A potential problem?
    - For example, NFS at CNAF has shown some serious problems: in a complex network environment with large (unwanted!) variable latencies, deadlocks of the protocol (in its Linux implementation) showed up
    - However, we shouldn’t have such uncontrolled latencies in our online network, otherwise the trigger is dead…
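The three installation steps named on this slide (DHCP update, template copy, export) can be sketched as follows. This is only an illustration, not the procedure used in Bologna: it assumes an ISC dhcpd server, PXELINUX as the boot loader, and a Linux NFS server, and the node name, addresses, paths and export options are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def dhcp_host_entry(name, mac, ip, server_ip, root_dir):
    """Render an ISC dhcpd 'host' stanza for a node with its root over NFS."""
    return (
        f"host {name} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f"  next-server {server_ip};\n"        # TFTP server (the control PC)
        f'  filename "pxelinux.0";\n'          # assumed boot loader
        f'  option root-path "{server_ip}:{root_dir}";\n'
        f"}}\n"
    )


def exports_line(root_dir, ip):
    """Render an /etc/exports line exporting one root directory to one node."""
    # The export options are an assumption; adapt them to local policy.
    return f"{root_dir} {ip}(rw,sync,no_root_squash)\n"


def prepare_node(name, mac, ip, server_ip, template, roots):
    """Copy the template root and return the config text to append.

    Returns (dhcpd stanza, exports line); writing them into the real
    configuration files and reloading the services is left to the caller.
    """
    root_dir = Path(roots) / name
    shutil.copytree(template, root_dir, symlinks=True)
    return (
        dhcp_host_entry(name, mac, ip, server_ip, root_dir),
        exports_line(root_dir, ip),
    )


if __name__ == "__main__":
    # Demonstration in a scratch directory; nothing on the system is touched.
    with tempfile.TemporaryDirectory() as tmp:
        template = Path(tmp) / "template"
        (template / "etc").mkdir(parents=True)
        stanza, export = prepare_node(
            "node01", "00:11:22:33:44:55", "10.0.0.11", "10.0.0.1",
            template, Path(tmp) / "roots",
        )
        print(stanza + export, end="")
```

Keeping the rendering functions free of side effects makes it easy to review the generated text before it is appended to the live configuration.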

4 Ramdisk network boot

- Requires the same services as the “root over NFS”
  - PXE, DHCP, TFTP servers on a control PC
  - Installation of a new node just requires the update of the DHCP configuration
  - Kernel and ramdisk with root filesystem image are downloaded at boot time
  - The root filesystem is memory resident
- Advantages
  - The machine is always operative and reachable (unless the memory breaks or a bit flip is triggered by a cosmic ray… but ECC memories are protected against single bit flips)
  - Unwanted corruption (mistakes) of the filesystem files is automatically undone at reboot (changes to the filesystem are just temporary and lost)
- Drawbacks
  - The ramdisk eats memory, typically of the order of 200 MB for a “normal” root filesystem (not a real problem, however, to lose just 200 MB)
  - Requires a recompiled kernel with large ramdisk size (not a real problem anyway)
  - Application software is too large in any case and should be mounted via network
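As an illustration of what the boot-time download involves, the sketch below renders a per-node PXELINUX configuration that loads a kernel together with a root filesystem image as initial ramdisk. It assumes PXELINUX is the boot loader (the slides only say PXE); the file names `vmlinuz` and `rootfs.img.gz` are hypothetical. The `ramdisk_size` kernel parameter is given in KiB; whether it is sufficient on its own or the kernel must also be rebuilt with a larger default, as the slide states, depends on the kernel in use.

```python
def pxelinux_config_name(mac):
    """File name PXELINUX looks up under pxelinux.cfg/ for a given MAC.

    The name is the ARP hardware type (01 for Ethernet) followed by the
    MAC address in lowercase hex, with dashes as separators.
    """
    return "01-" + mac.lower().replace(":", "-")


def ramdisk_boot_entry(kernel, initrd, ramdisk_mb):
    """Render a PXELINUX entry that boots with the root filesystem in RAM."""
    ramdisk_kib = ramdisk_mb * 1024  # ramdisk_size is expressed in KiB
    return (
        "DEFAULT ramdisk\n"
        "LABEL ramdisk\n"
        f"  KERNEL {kernel}\n"
        f"  APPEND initrd={initrd} root=/dev/ram0 rw ramdisk_size={ramdisk_kib}\n"
    )


if __name__ == "__main__":
    print("pxelinux.cfg/" + pxelinux_config_name("00:11:22:AA:BB:CC"))
    print(ramdisk_boot_entry("vmlinuz", "rootfs.img.gz", 200), end="")
```

With the 200 MB figure quoted on the slide, the entry requests `ramdisk_size=204800`.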

5 Control PC

- Scalability of the network boot and of the NFS exports shouldn’t be an issue
  - A control PC serves just a few subfarms
  - However, it is better that the control PC doesn’t live too far (“networkly” speaking) from the served subfarms
- An issue will be to keep all the control PCs’ operating systems, the application software served, the operating system(s) served, etc. up to date and synchronized
  - The control PCs are “regular” “disked” machines, and have on their disks the core of the system
  - The way these PCs are managed is core business for an efficient, flawless, costless operation of the farm
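The slides do not say how the control PCs would be kept synchronized. One possible way to at least detect drift, shown here purely as a sketch, is to build a checksum manifest of the served directory tree on each control PC and compare it against a reference. The function names are hypothetical, and how the manifests are collected from the different machines is left open.

```python
import hashlib
from pathlib import Path


def tree_manifest(root):
    """Map each regular file under root (relative path) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        # Symbolic links are skipped so that only real file content is hashed.
        if path.is_file() and not path.is_symlink():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def manifest_diff(reference, other):
    """Compare two manifests and list the paths that differ."""
    ref_paths, other_paths = set(reference), set(other)
    return {
        "missing": sorted(ref_paths - other_paths),   # in reference only
        "extra": sorted(other_paths - ref_paths),     # in other only
        "changed": sorted(
            p for p in ref_paths & other_paths if reference[p] != other[p]
        ),
    }


if __name__ == "__main__":
    reference = {"etc/hosts": "aaa", "bin/sh": "bbb"}
    other = {"bin/sh": "ccc", "etc/motd": "ddd"}
    print(manifest_diff(reference, other))
```

An empty result in all three lists means the two trees hold the same files with the same content.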

6 Preparation for the RTTC

- Well in time for the RTTC, we should sit together and configure a testbed farm at CERN
  - Administrator(s) should gain experience with this configuration
- Several other issues should be addressed
  - VLANs, e.g., might interfere with network boot, for example when using unmanaged switches which dynamically learn the VLANs from the nodes… the node doesn’t know anything about the VLAN until it is booted with an “intelligent” operating system
  - Just an example, as we won’t use unmanaged switches to my knowledge; however, all the details, with occasional problems and solutions, should be settled in time
  - Thus, we should sit together and define in more detail the hardware/software infrastructure of the RTTC farm

7 Relevant tools

- All the operations to prepare a new node can in principle be done “by hand”
  - Of course, not feasible for a 1800-PC farm
- But feasible for the RTTC, as few nodes will be involved
  - However (see Gianluca’s talk) it would be nice to already have for the RTTC a GUI that automatically triggers the work to be done (update of the configuration files on the control PC, e.g. DHCP, etc.)
  - Needs some close interaction between administrator(s) and PVSS GUI developers
- Remote control of electrical power
  - Not really necessary for the RTTC, but still nice to have it in time
  - IPMI solution (see Gianluca’s talk) controlled via PVSS GUIs
  - Alternatively, remotely Ethernet-controlled power switches (e.g. those used at CNAF) controlled via PVSS GUIs
  - Of course, an essential issue for the final online farm
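For the IPMI option, remote power control can be sketched with the `ipmitool` command-line client. This is an assumption: the slides foresee control through PVSS GUIs and do not name a client tool. The host name and user below are hypothetical, and the `lanplus` interface (IPMI 2.0) may need to be replaced by `lan` on older hardware. The password is passed through the `IPMI_PASSWORD` environment variable (ipmitool's `-E` option) so that it does not appear in the process list.

```python
import os
import subprocess

POWER_ACTIONS = ("status", "on", "off", "cycle")


def ipmi_power_command(host, user, action, interface="lanplus"):
    """Build the ipmitool argument list for a chassis power action."""
    if action not in POWER_ACTIONS:
        raise ValueError(f"unsupported power action: {action!r}")
    return [
        "ipmitool", "-I", interface, "-H", host, "-U", user,
        "-E",  # read the password from the IPMI_PASSWORD environment variable
        "chassis", "power", action,
    ]


def run_power_command(host, user, password, action):
    """Run the power action against a node's management controller.

    Requires ipmitool to be installed and the controller to be reachable.
    """
    env = dict(os.environ, IPMI_PASSWORD=password)
    result = subprocess.run(
        ipmi_power_command(host, user, action),
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only print the command; nothing is executed here.
    print(" ".join(ipmi_power_command("node01-bmc", "admin", "status")))
```

Separating the command construction from its execution lets a GUI layer or a test inspect the command without touching any hardware.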

