Lecture One What’s a Data Centre and what are its components
Data Centre: what’s this?
Wikipedia: “A Data Centre is a facility used to house computer systems and associated components.”
… better for me: “A Data Centre is a facility used to house … data, stored in specific electronic devices and accessible by computer systems suitable for their update, processing and transmission.”
Data Centre: the components
- Site
- Hardware
- Network Connections
- Logical Security Systems
- System Software
- Application Software
- Data
- People
- Organisation
Data Centre components – Site (1/3)
“Site” = Building + “Building Plant”
The Building Plant (or Physical Plant) is the machinery used within the building. Generally:
- Power supply from external providers
- Stand-alone power generators and UPS systems
- HVAC (Heating, Ventilation and Air Conditioning) systems
- Hardware cooling systems (in case the HVAC is unsuitable)
- Fire protection systems
- Anti-intrusion systems
Data Centre components – Site (2/3)
Some good practices for the Building:
Location
- Out of the city centre (traffic, security, …)
- Easy access for transportation (big hardware devices)
- Preferably not a hot climate
- Good availability of commodities (power, water, …), possibly from different providers
Design (possibly ad hoc)
- Wide and flat structure (not more than 2-3 floors, possibly only one above ground)
- Three concentric layers: an external one for transit spaces (product delivery, visitors, …), a middle one for personnel, an inner one for hardware
- Wide rooms with few dividing walls; movable panels, security conditions permitting, for the first two layers
- Regular design (e.g. rectangular), possibly with no divisions at all, for the inner layer
Data Centre components – Site (3/3)
Some good practices for the Building Plant:
- More than one provider for the power supply. Keep Margins for Future Development (KMFD).
- Size the power generators and UPS systems as a function of the “stand-alone time” you need (enough for a regular shut-down of all systems). Consider your vital applications and the possibility of a “minimal sub-system continuity”. Consider the refuelling time of the power generators. KMFD.
- Possibly separate the HVAC serving personnel rooms from the one serving hardware rooms (different requirements). For huge Data Centres a specialised cooling system for the hardware could be required (e.g. a closed-circuit water flow); the same may hold for special devices. KMFD.
- Consider different fire protection systems, depending on the presence of personnel in a room and on the evacuation time (gas systems are toxic).
- Consider different levels of sophistication for the anti-intrusion systems in the three layers (e.g. security guards for the external one, badge-protected doors for the middle one, biometric systems for the inner rooms).
Data Centre components – Hardware (1/4)
In this section we consider, basically, only three “families” of components:
- Devices for data recording (STORAGE)
- Devices for data processing (COMPUTERS)
- Devices for data flow to/from storage, computers and the network (SAN/LAN SWITCHES)
The Data Centre houses many other hardware components (the Network and Security Systems we’ll see in the next sections, as well as systems for Building Plant automation, workstations for the personnel, …).
Data Centre components – Hardware (2/4)
The age, the past history and the “maturity level” of a Data Centre determine the “homogeneity level” of its HW components.
Big Data Centres are generally more than 20 years old and have experienced, in their past, numerous application merges, substitutions and re-engineerings. This history – not driven by a global initial design, but built through day-by-day needs – generally leads to very poor homogeneity of the HW components.
The lower the homogeneity level, the greater the effort (human, technical and economical as well) needed to manage the Data Centre.
Data Centre components – Hardware (3/4)
Generally, in a medium-to-large Data Centre, it’s possible to find (in different ratios) three “families” of computers:
- Mainframes
- Middle-range systems
- Intel-based systems (e.g. rack-mounted x86 servers)
Similarly, as storage systems, you can find:
- Disk subsystems (with different, but similar, architectures and functions)
- Tape subsystems (generally equipped with robotic libraries)
- Solid-state systems
From the size point of view, one of the main figures characterising a Data Centre is the pair “storage capacity” & “computing power”. However, it is generally impossible to represent this figure by a simple pair of numbers, precisely because a low homogeneity level requires many numbers and many descriptions for devices of different hardware architectures.
Data Centre components – Hardware (4/4)
In the last 10-15 years, hardware (and software) architectures have developed more and more sophisticated “Virtualisation” solutions, which allow virtual computer and storage systems to be mapped onto physical computer and storage devices.
With virtualisation, a computer or a storage device need not be dedicated to one specific application, but can be shared by different applications.
From the quality (maturity) point of view, one of the main characteristics of a Data Centre is the level of virtualisation achieved.
Try to decouple your applications as much as possible from any specific computer and/or storage: it will give you more degrees of freedom in technical terms and will relieve you from provider lock-in.
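The decoupling idea above can be illustrated with a toy sketch (purely hypothetical names, not a real hypervisor API): applications address a virtual machine, and the mapping of virtual machines onto physical hosts can change without the applications noticing.

```python
# Toy model of a virtualisation layer (hypothetical, for illustration
# only): applications bind to a virtual machine name, while the
# VM-to-physical-host mapping can be changed freely underneath.
class VirtualisationLayer:
    def __init__(self):
        self.placement = {}  # virtual machine -> physical host

    def place(self, vm, host):
        """Initial placement of a VM on a physical host."""
        self.placement[vm] = host

    def migrate(self, vm, new_host):
        """Move a VM: the application keeps addressing the same VM,
        only the underlying physical device changes."""
        self.placement[vm] = new_host

layer = VirtualisationLayer()
layer.place("billing-vm", "host-01")
layer.migrate("billing-vm", "host-02")  # invisible to the application
print(layer.placement["billing-vm"])    # host-02
```

The application never names "host-01" or "host-02" directly, which is exactly the degree of freedom (and protection from provider lock-in) the slide recommends.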
Data Centre components – Network (1/2)
The Network – by its nature – lies outside the Data Centre, as it is the infrastructure that connects the Data Centre with “the rest of the world”, both internal and external to the organisation the Data Centre operates for (so we can distinguish between an internal and an external network).
However, Data Centres contain a set of routers and switches that transport data traffic between the servers and to/from the outside world. The physical connection with the outside world is often provided by two or more (to maximise availability) upstream service providers.
Data Centre components – Network (2/2)
The network, today, is most often based on the IP protocol suite. So the Data Centre often contains servers running the basic Internet (external network) and Intranet (internal network) services needed by external and internal users (e.g. DNS servers). In other cases these services are kept out of the Data Centre and committed to the aforementioned service providers.
Also common are systems for controlling and monitoring the network. This set of systems is generally known as the NOC (Network Operations Centre). Sometimes the Data Centre contains the NOC. However, for Disaster Recovery purposes (see later), it is good practice to keep it far from the Data Centre, together with any other network servers not duplicated elsewhere.
Data Centre components – Security Systems
Logical Security Systems are systems dedicated to controlling access to data and applications through a complex set of authentication and authorisation rules.
As the great majority of unauthorised accesses come from the network (both internal and external), the Security Systems are lined up at the network input/output points and are mostly considered “network systems”.
The most common security systems are:
- Firewall: a system used to control incoming IP traffic at two levels: the IP packet level (protocol correctness) and the application level (content suitability)
- IDS (Intrusion Detection System): a system used to detect any unauthorised access
- IPS (Intrusion Prevention System): a system used to prevent any unauthorised access
Similarly to the NOC, a SOC (Security Operations Centre) is generally present, to control and monitor the security systems.
Data Centre components – Software & Data
Up to this point we talked about physical components (building, machinery, hardware). Now we deal with “less physical” items: Software and Data. Two considerations:
- Software is a special kind of Data: a set of “instructions” that can be interpreted by the computers and used to process other data. Software & Data, indeed, are recorded on storage and read into computer memory in very similar ways.
- Software & Data are “less” physical than the previous components (as they have no weight, no colour, no shape, …), but they are not merely “logical” components, as they exist as physical alterations of electronic circuits.
Data Centre components – System Software (1/2)
We can classify as “System Software” all the programs executing the “base” functions of the systems (computers, switches, storage systems, network and security systems), independently of the applications running on those systems.
Therefore it’s common to find identical System Software in Data Centres offering completely different application services (banks, airline companies, phone service providers, public administrations, …).
System Software is generally delivered by specialised software houses: IBM, Microsoft, Oracle, …
In recent years “Open System Software” (Linux based) has been gaining a wider and wider share of the market. Even though it is “open”, some companies have built a big business within this area (e.g. Red Hat).
Data Centre components – System Software (2/2)
As examples, some of the most common System Software products are:
- The Operating Systems (Windows, IBM OS, Linux, IBM AIX, iOS, …)
- The Virtualisation Systems (VMware, Linux V-Server, …)
- The Data Base Management Systems (Oracle, DB2, SQL Server, …)
- The so-called “Middleware” and all the software used to develop application servers, web servers, etc. (WebSphere, Apache, SAP, JBoss, …)
- The OLTP and Message Queuing systems (IMS, CICS, WebSphere MQ, …)
… and many more.
Data Centre components – Application Software (1/2)
Unlike System Software, Application Software is developed to deliver a specific service to a specific set of users.
Therefore it’s possible to find outwardly similar Application Software in Data Centres offering similar services (two different banks, two different public administrations, …), but it is almost certain that they are completely different:
- In many particular functions
- In the internal technical architecture
- In their performance
- … etc.
… and, generally, we’ll find completely different Application Software in Data Centres operating in different business areas (e.g. a phone service provider and an airline company), with the exception of limited functions (e.g. e-mail, personnel management, …).
Data Centre components – Application Software (2/2)
Unlike System Software, Application Software is not delivered in the same form to different customers by the same companies, because of the high degree of personalisation it requires.
Therefore Application Software is either:
- Developed by a software house for the single customer, with the personalised characteristics the customer requires, or …
- … “home-made” by the customer itself (possibly with the external help of a software house).
Data Centre components – Data (1/3)
Data are the most important component of the Data Centre.
They are absolutely unique to the organisation: everything else we saw above (building, HW, SW) can be replaced, re-bought, rebuilt … but that’s not true for the data. Or, at least, for the great majority of the data.
That’s the reason why (as we’ll see later) data play a lead role in the Disaster Recovery project.
Furthermore, data must be protected not only from the risk of loss, but also from the risk of unauthorised access, which is even more frequent and insidious.
Data Centre components – Data (2/3)
The data in a Data Centre may be categorised from different points of view:
- By the technical form of their internal structure (sequential, relational data base, unstructured, …)
- By the type of device they are stored on (disk, tape, …)
- By how the applications access them (read, read-write, online, batch, …)
- By the performance they must guarantee (how many R/W operations per second, …)
- By their level of relevance for the organisation (vital, critical, important, less important, …)
- By their life-cycle characteristics (how long they must be kept, whether security copies must exist, …)
… and by many other criteria.
Data Centre components – Data (3/3)
Generally the amount of data in a Data Centre grows year by year at a very impressive speed. A very common figure is 30%.
There are four main reasons for such dramatic growth:
- Applications become more and more sophisticated and require more and more data
- The nature of the data itself is becoming more and more space-consuming (voice, images, videos, …)
- The “cleaning” operations that remove obsolete data are generally not a high priority in Data Centre policy
- The cost of storage is not high and is continuously decreasing.
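A 30% yearly growth compounds quickly. The short sketch below (the 100 TByte starting capacity is an arbitrary assumption for illustration; the 30% rate is the figure cited above) shows that at that rate the stored volume more than triples in five years and doubles roughly every 2.6 years.

```python
# Compound storage growth at the ~30%/year rate cited in the text.
# The starting capacity (100 TByte) is a hypothetical example value.
import math

def capacity_after(years, start_tb=100.0, annual_growth=0.30):
    """Projected storage capacity (TByte) after compound growth."""
    return start_tb * (1 + annual_growth) ** years

# Time needed for the data volume to double at 30%/year.
doubling_time = math.log(2) / math.log(1.30)

print(round(capacity_after(5), 1))  # 371.3 -> 100 TB becomes ~371 TB
print(round(doubling_time, 2))      # 2.64  -> doubles every ~2.6 years
```

This is one reason why “Keep Margins for Future Development” recurs so often in the site-planning slides above.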
Data Centre components – People
People, for the Data Centre, are more important than data. And that’s true not only for “ethical” reasons. It can even be rationally justified, because of the “Knowledge” the people have of the Data Centre itself.
Knowledge of the Data Centre means knowing:
- What to do (services to users) and how (service levels)
- The mission (why the services must be delivered)
- The strategy (why the services must, and how they can, be improved)
- The means to operate with (the Data Centre components)
The knowledge cannot be bought. It comes from years and years of experience. It comes from people and enriches people, in a never-ending virtuous circle.
Data Centre components – Organisation (1/2)
We said that People hold the Knowledge of the Data Centre: that’s a strong but, at the same time, a weak point.
It’s a weak point because Data Centre People change. They change as some leave and others arrive. And even the people who stay often change “inside” (their skills, their will, their vigour change).
Therefore the Knowledge must be preserved regardless of People changes. This can be achieved through the Organisation, in other words through Procedures & Documentation.
Data Centre components – Organisation (2/2)
- Procedures must be defined to identify exactly “who does what, when and how”. Some “improvisation” is always unavoidable, but unexpected situations must be strongly limited
- All the defined Procedures must be well described in appropriate Documentation
- People must cooperate in designing the procedures and writing the documentation
- Documentation must be accessible to people, who must be trained on the procedures
- Procedures and documentation must be continuously adapted to the changes in the Data Centre.
Data Centres in the time of Web, Apps, Cloud … do they still make sense?
More and more “smart informatics” on consumer devices (PCs, tablets, smartphones, …), accessible through simple Apps and using data stored “who-knows-where” (the Cloud), leads us to consider Data Centres as “proto-industry products” …
… but “smart informatics” also needs data … indeed, more and more data (Big Data)! So we’ll plausibly need bigger and more solid Data Centres.
Some decrease in the number of medium/small private Data Centres would however be unsurprising, with their activity merged into fewer, bigger public Data Centres (the Cloud).
Data Centres actually …
- … a few numbers …
- Possible classifications of Data Centres
- The Data Centre costs
… and a few cases …
- 1st case: the Italian Public Administration Data Centres (survey)
- 2nd case: two merging national banks’ Data Centres (2007)
- 3rd case: Google’s Container Data Centre
Possible classifications of Data Centres (1/4)
Many different classifications of Data Centres may be defined:
Dimensional, by:
- computing or storage capacity
- site area extension
- number of involved people
- number of served users
- number of executed transactions per second
- costs
- … etc.
Qualitative, on the basis of:
- energy efficiency
- how up to date the technology is
- reliability level
- … etc.
Possible classifications of Data Centres (2/4) The great part of these classifications are context-conditioned (for example a dimensional classification may significantly differ considering the set of main 50 US public administrations and the set of main 50 automotive industries). So their interest is restricted to single specialized surveys However some classifications have a generalized interest and may be used as a standard. Some examples are: The Telecommunications Industry Association's TIA-942 Standard The metrics established by The Green Grid Consortium
Possible classifications of Data Centres (3/4)
The TIA-942 Standard defines the minimum requirements for Data Centre availability, setting four “Tiers”:
- Tier 1 = Non-redundant capacity components (single uplink and servers). The Data Centre must guarantee an overall availability of not less than 99.671%
- Tier 2 = Tier 1 + redundant capacity components. The guaranteed overall availability must be 99.741% or more
- Tier 3 = Tier 2 + dual-powered equipment and multiple uplinks. The guaranteed overall availability must be 99.982% or more
- Tier 4 = Tier 3 + all components fully fault-tolerant, including uplinks, storage, chillers, HVAC systems, servers etc. Everything is dual-powered. The guaranteed overall availability must be 99.995% or more
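The availability percentages above translate into concrete downtime budgets. A minimal sketch (assuming a 365-day year) converts each tier’s figure into the maximum allowed downtime per year:

```python
# Convert the TIA-942 tier availability percentages (from the slide
# above) into the maximum yearly downtime they allow.
HOURS_PER_YEAR = 365 * 24  # 8760, assuming a 365-day year

TIERS = {1: 99.671, 2: 99.741, 3: 99.982, 4: 99.995}

def max_downtime_hours(availability_pct):
    """Maximum yearly downtime (hours) for a given availability %."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR

for tier, pct in TIERS.items():
    print(f"Tier {tier}: {max_downtime_hours(pct):.1f} hours/year")
# Tier 1: 28.8 hours/year
# Tier 2: 22.7 hours/year
# Tier 3: 1.6 hours/year
# Tier 4: 0.4 hours/year
```

So a Tier 1 facility may be unavailable for almost 29 hours a year, while Tier 4 allows only about 26 minutes.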
Possible classifications of Data Centres (4/4)
The Green Grid is a non-profit industry consortium of end-users, technology providers and utility companies, collaborating to improve the resource efficiency of Data Centres.
The main metric defined by the consortium is the Power Usage Effectiveness (PUE). It measures how efficiently a Data Centre uses energy; specifically, how much of the energy is used by the computing equipment (in contrast to cooling and other overhead). PUE is the ratio of the total amount of energy used by the Data Centre as a whole to the energy delivered to the computing equipment. The ideal PUE is 1.0 (while some surveys, all over the world, have measured an average PUE of 1.8). Anything that isn’t a computing device in a Data Centre (e.g. lighting, cooling, etc.) falls into the category of facility energy consumption. PUE is the inverse of the Data Centre Infrastructure Efficiency (DCIE).
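The two metrics just defined can be sketched as follows; the energy figures in the example are hypothetical, chosen to match the ~1.8 average PUE cited above.

```python
# PUE and DCIE as defined by The Green Grid (example energy values
# are hypothetical, chosen to match the ~1.8 average PUE in the text).
def pue(total_facility_energy, it_equipment_energy):
    """Power Usage Effectiveness: total energy / IT-equipment energy."""
    return total_facility_energy / it_equipment_energy

def dcie(total_facility_energy, it_equipment_energy):
    """Data Centre Infrastructure Efficiency: inverse of PUE, as a %."""
    return 100 * it_equipment_energy / total_facility_energy

# A facility drawing 1800 kWh overall to deliver 1000 kWh to the IT
# equipment: 800 kWh go to cooling, lighting and other overhead.
print(pue(1800, 1000))            # 1.8
print(round(dcie(1800, 1000), 1)) # 55.6 (%)
```

An ideal facility, where all energy reaches the computing equipment, would have PUE = 1.0 and DCIE = 100%.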
The Data Centre costs
A Data Centre is a heap of high technology and highly skilled people, so it is a generator of significant costs.
The main costs come (in descending order) from:
1. People
2. Software
3. Hardware
4. Energy
Within the hardware the order is: servers, network, storage. The energy cost is usually higher than the server cost.
1st case: the Italian Public Administration Data Centres (survey)
The survey was carried out in 2013 and involved about 1,000 Data Centres.
A dimensional classification based on the site area showed that:
- Only 1% of the Data Centre sites occupy more than 1,000 m²
- 10% of them are between 100 and 1,000 m²
- The rest are smaller than 100 m²
Only 7% of the Data Centres are less than 3 years old; 57% were built before 2000.
From an architectural point of view the servers are mainly “rack-type” with Windows OS, followed by Linux systems. Other OS are a minority.
From the TIA-942 point of view, 65% of the Data Centres are Tier 1.
2nd case: two merging national banks’ Data Centres (2007) – (1/x)
First Bank – two main Data Centres:
- Monte Bianco DC
- Basson DC
2nd case: two merging national banks’ Data Centres (2007) – (2/x)
FIRST BANK – MAIN DATA CENTRE
Features:
- net surface: 3,000 sqm, 24 rooms
- just a Data Centre: no offices in the premises (only the Control Room)
- campus design: two distinct, completely independent “Half Campuses” with separate equipment (power, cooling, connections, …)
2nd case: two merging national banks’ Data Centres (2007) – (3/x)
FIRST BANK – a few figures:
Computers:
- 6 mainframes (4+2)
- 324 central processors
- more than 40,000 MIPS
- 1.9 TByte of central storage
- 560 TByte of disk storage
- 1,500 TByte of tape storage
Avg online transactions per day: 23 million
Data-base accesses per day: about 960 million
Transactions completed within 0.6 seconds: 97%
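As a rough cross-check of the workload figures above (the daily totals are taken from the slide), the average rates work out as follows:

```python
# Average rates implied by the First Bank daily figures quoted above.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

transactions_per_day = 23_000_000  # avg online transactions per day
db_accesses_per_day = 960_000_000  # data-base accesses per day

avg_tps = transactions_per_day / SECONDS_PER_DAY
accesses_per_txn = db_accesses_per_day / transactions_per_day

print(round(avg_tps))           # 266 -> ~266 transactions/second
print(round(accesses_per_txn))  # 42  -> ~42 DB accesses per transaction
```

Note these are flat daily averages; peak rates during business hours would be considerably higher.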
2nd case: two merging national banks’ Data Centres (2007) – (4/x)
Second Bank – three main Data Centres (A, B, C) + other minor ones
2nd case: two merging national banks’ Data Centres (2007) – (5/x)
Second Bank – three Data Centres (A, B, C) after the consolidation and DR projects
2nd case: two merging national banks’ Data Centres (2007) – (6/x)
SECOND BANK – a few figures:
Computers:
- 5 mainframes (3 active + 2 stand-by)
- more than 13,000 MIPS (active)
- 150 TByte of disk storage (active) + DR capacity in sites B & C
- 600 TByte of tape storage
Avg online transactions per day: 16 million
Transactions completed within 1.0 second: 95%
3rd case – Google’s Container Data Centre
An example of a CMDF (Containerised and Modular Data Centre Facility) platform: a repeatable, pre-engineered, prefabricated and quality-assured set of building blocks (containers) that together bring online the necessary amount of IT capacity (computing, storage, network capacity + power supply, cooling facilities, fire control).
(Video: “Google container data center tour”)