Note: Please add your comments/corrections to this page. It is an excerpt of the EIS Boot Disk Standard. It is being placed here so that it can be discussed/amended/ratified by a wider community. If you have any comments about this page, you can either leave comments directly on this page, or you can discuss on sysadmin-discuss@opensolaris.org. For more information about adding comments, pages, or sections, go back to the BigAdmin wiki home page.

Excerpts of the EIS Bootdisk Standard

Introduction

This EIS standard is intended to help installers (and those who prepare planning documentation for installations) to lay out their system disks in a consistent manner.

Scope

This standard is designed to cover installations on SPARC servers from Solaris 8 upwards. It also covers x86/x64 servers from Solaris 10 update 1 onwards.
This standard provides general guidelines for an installation by Sun Support Services (e.g. standard EIS installation delivery) and Sun Professional Services. It acts as a base for customer-specific layouts that may result from a PS engagement. The layouts and guidelines in this standard are preferred and will maintain consistency of support procedures.

General Notes

The following are assumed to be common knowledge (i.e. accepted):
(Yes, I know the 3rd is controversial, but if you look at the historical reasons for multiple slices, those reasons no longer exist!)

Hardware RAID Controllers

A number of the current servers supplied by Sun have hardware RAID controllers which enable the boot disk to be mirrored to a second disk. EIS recommends that H/W RAID 1 be used to mirror the boot disk wherever it is available.

Live Upgrade

Live Upgrade is a facility that provides one or more alternative boot slices in order to reduce downtime during patching and upgrades and to provide a quick rollback mechanism. The use of Live Upgrade requires that partitions be set aside to support it. We recommend the provision of a Live Upgrade partition at initial install time. To simplify the use of Live Upgrade, we have recommended the use of a SINGLE partition for the complete O/S. Should you choose to have multiple partitions, you will also need to create additional partitions for Live Upgrade use. (This pertains only to filesystems that contain O/S files such as /usr, /var, /opt etc.) Be aware that currently there are significant complexities to using Live Upgrade if the boot disk is encapsulated with VxVM. Refer to "Patching Mirrored Systems with the Solaris Live Upgrade Software" for guidance on planning and performing Solaris upgrades and patch management using the Live Upgrade partition.

Dedicated Dump Device

Solaris allows a Dedicated Dump Device (DDD) to be defined. This allows crash dumps to be collected without writing into the swap partition. The advantage of this is twofold. Firstly, it allows dumps to be taken of a running system. Secondly, it means that the primary swap partition does not need to be sized to accommodate a dump. The DDD has to be a raw device, and should NOT be mirrored.

Disk Sizing

Sizing for Solaris

Given that boot disks are currently much larger than required for the O/S, we have recommended a conservative upper limit for the size of the root partition.
The choice of this size is a balance between the actual space requirements, and the desire to reduce resyncing time between the root disk and its mirror in the event of a failure.
The remaining space on the disk may simply be left unused, or allocated to an otherwise usable partition, e.g. /export, /data, /apps etc.

Sizing SWAP

It is not strictly necessary to have a swap partition (although there should be one for crash dumps unless there is a separate dump device). Clearly, if a server starts to swap then the performance as seen by the clients will be degraded.
The following may help to define the TOTAL swap space requirements especially for SunFire domains:
Sizing the Dedicated Dump Device

The maximum theoretical size of a crash dump is the total of physical and virtual memory. However, this assumes that ALL the memory space was used for the kernel, which is highly unlikely. In general, a DDD size equal to physical memory should be sufficient.

Boot Disk Layout

Based on the above rules, here is a sample of root disk layouts:

Solaris Desktop

It is assumed that the disk of a desktop is not mirrored. The following layout is for disks with a capacity ≥36GB.
The slices shown above are the EIS-recommended standard. If required, and the disk has a higher capacity, you may increase the size of root and/or swap; however, slices 0 and 3 must be identical in size.

Boot Disk Layout for a Solaris Server (Not SW Mirrored)

It is expected that most servers will have a redundant boot device. In many cases this will use host-based mirroring, and these layouts are discussed in the next section. For the cases where there is no redundancy, or it is being provided by a hardware-based RAID device, the following layout is an example. The following layout is for disks with capacity ≥36GB.
The slices shown above are the EIS-recommended standard. If required, and the disk has a higher capacity, you may increase the size of root and/or swap; however, slices 0 and 3 must be identical in size. N.B. If the root disk is >36GB, then we recommend a / filesystem size of 18GB.

Mirrored Disk Layouts on Solaris Systems

Depending on the class of server, a variety of different system disk hardware tends to be available. On the smaller servers, the system disks tend to be internal, often on the same controller and limited to two disks. On the larger servers, boot storage tends to be external and split between devices, such as multiple S1s, D130s, or split D240s, and the number of disks available as "system" storage ranges from four to six (2-disk mirrored, 3-disk mirrored). However, some of these disks may be earmarked for customer system data such as application binaries and user data, and therefore it cannot be assumed that all the disks can be used for the system layout. We will focus on the worst-case (but most common) scenario of a pair of mirrored disks (1-disk mirrored) being dedicated to the boot environment, but will provide guidelines for making use of additional disks if they are available.

The root disk and its mirror should be on separate controllers and housed in separate devices if possible. Normally you should only mirror the boot disk to a "similar" disk. For configurations with boot disks on a single controller and external storage (SE33x0, SE35x0, SE6900 etc.) on another controller, the latter should probably not be used for mirroring.

Mirroring using SVM

In the following layouts we have left slice 7 free for the metaDBs on each disk. Just as with the boot disks themselves, these metaDBs should be spread across at least two controllers if at all possible. We recommend 32MB for the disk slice size, with each slice holding 3 metaDBs.
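The value of spreading replicas over more than two disks can be checked with simple arithmetic. The Python sketch below assumes the SVM rule that an unattended boot needs more than half of all state database replicas to be reachable. The disk names and both helper functions are hypothetical and exist only for illustration; they are not part of any Solaris tool.

```python
def replicas_available(replicas_per_disk, failed_disks):
    """Count the replicas held on disks that are still working."""
    return sum(count for disk, count in replicas_per_disk.items()
               if disk not in failed_disks)


def can_boot(replicas_per_disk, failed_disks):
    """True if strictly more than half of all replicas survive.

    Assumption: this models the boot-time majority rule only; it does not
    model the mirrored_root_flag override or any other tunable.
    """
    total = sum(replicas_per_disk.values())
    return 2 * replicas_available(replicas_per_disk, failed_disks) > total


# Hypothetical disk names, each with 3 replicas in slice 7.
two_disks = {"c0t0d0": 3, "c1t0d0": 3}
three_disks = {"c0t0d0": 3, "c1t0d0": 3, "c2t0d0": 3}

# Losing one of two disks leaves 3 of 6 replicas: exactly half, not a majority.
print(can_boot(two_disks, {"c1t0d0"}))
# Losing one of three disks leaves 6 of 9 replicas: a majority.
print(can_boot(three_disks, {"c1t0d0"}))
```

With two disks, a single failure always leaves exactly half of the replicas, however many are placed on each disk, so only a further replica location changes the outcome.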
When using SVM, if you have metaDBs on ONLY the root disk and its mirror, the system will fail to boot if one of the disks has failed, because it does not have access to a "majority" of metaDB replicas. For this reason it is highly recommended to create a 3rd metaDB location on a 3rd disk, preferably on a different controller. If there are only 2 disks in the system, the following line can be added to the file /etc/system, which allows the system to boot if 1 drive fails in a 2-drive (mirrored) scenario:

    set md:mirrored_root_flag=1

However, this method COULD cause data corruption if multiple reboots occur and the "failed" disk "unfails" itself, resulting in inconsistent mirrors. In many cases it may be preferable to not boot, and force user intervention. See InfoDocs 18280 & 70946 for more details.

Mirrored Boot Disk Layout

This is basically identical to the server layout above, with the addition of a 32MB metadb partition on slice 7.
(The above is mirrored to a second, preferably identical, disk.) The free space can be allocated to any of slices 4, 5 or 6 and used for customer data, or an (unmirrored) Dedicated Dump Device (although it is preferred that this be on additional disks).

Factory Boot Disk Layout

There is an increasing trend towards pre-loading Solaris at the factory. The intention is to reduce the time between delivery and production start. We assume a minimum boot disk size of 36GB for current systems (September 2007: most but not all systems have larger disks). Since one usually does not know the final configuration (even more so with volume products), the following disk layout has been chosen:
Slices 4, 5, 6 & 7 remain unallocated, as does the remaining space on the disk. This allows the customer to easily add additional slices and allocate space as required (e.g. /globaldevices and metadb), or even to remove the Live Upgrade partition if it is really not wanted. Whilst realising that this layout will not fit all installations, it is believed that it has a reasonable chance of being useful, especially with volume products.
Comments (6)
Feb 22, 2008
DavidTBullock says:
I'm relatively new to the disk administration scene (I am a humble x86 user), and don't know the historical reasons for having slices, so I was wondering if someone could summarise them somewhere on the Wiki?
Not that I want to dispute the conclusion, but that I want to understand the reasoning, so that I can evaluate these strong recommendations for my situation.
I recently had a lot of trouble with an overly small root filesystem (only 4GB, and not big enough to install all the patches, let alone Java Enterprise) and found that it was not possible to growfs it ... I wonder if that would be sufficient reason to put /usr and /opt on different filesystems/slices/disks? Or will 15-18GB be 'plenty enough for years to come, provided you don't install apps on it' (my server has disk-attached-storage, and is intended to be a general purpose server running zones).
Does ZFS have any place serving up any of the 'system paths' (/usr, etc)?
If we are not to put applications on the system disk, do we mount a new filesystem at /usr/local and take the time to wrestle with packages that choose a different installation default (blastwave, sunfreeware, etc), or do we go with the flow in those cases?
(Please delete this comment when/if the issues it raises are covered by the page contents).
Mar 04, 2008
mramchand says:
Hi David,
Much of this is discussed on the discussion group thread that prompted this posting.
http://mail.opensolaris.org/pipermail/sysadmin-discuss/2007-September/001640.html
In summary:
We used to have slices for 2 reasons: disks were small and couldn't fit the whole O/S, and the fear of corruption if the / filesystem filled up.
Things have changed, disks are huge, and Solaris copes with a full filesystem, so the traditional way we partition has to be rethought.
There are a few varied views in the e-mail thread, and they are all valid. I suggest you take the time to read through them.
Jul 23, 2008
TGalway says:
I have a few questions on some of the recommendations (I have many customers asking these questions constantly) so I have a vested interest in seeing a generalized standard available
(1) Hardware versus Software mirroring
One of the practices often advocated to sys admin's is to break a root mirror before patching, upgrading, etc. With SVM this is a trivial task and provides a great way to avert disaster. Hardware raid however is completely destructive, I cannot break a raidctl mirror and reattach. Even with Live Upgrade, which continues to have some issues for zones as an example, I'd still break a mirror before I will perform a Live Upgrade - just for safety's sake. While the EIS standard calls for hardware raid, I don't consider this prudent from an operational standpoint.
Is there any operational reason that hardware raid will be better?
(2) Sizing Swap
It is not completely clear as to the reason for sizing SWAP the way you have. Assuming that you use a dedicated dump device, you suggest a size of 4GB for swap - but what is this number based on? That number is then superseded by the recommendations for a SunFire enterprise class server (let us assume we are not using DR, since it is understandable why you size it larger for those systems that do use DR), where you suggest that swap should be equal to the system board with the most memory. Why size it that way?
Most of the servers for my clients now will ship with a minimum of 32G of memory and many with 64G. Assuming this server is used specifically for some application that should never run out of memory and require swap - what should the swap size be? Should it be 0? Assuming application issues aside, how should this be sized? (Customers have a hard time accepting they don't need swap.)
For myself, I am more concerned with the dump device which I fully agree should be based on phys mem - since I cannot afford to lose any important data in case of system crash - I also need to then size my var/crash to be the same size as my dump device which is not mentioned in the document.
Also, as a matter of course, since I do always use SVM for mirroring instead of hardware, another question I have from customers is whether they should mirror swap - and I always advise them to do so. This should probably be explicitly indicated in your document as well (a recommendation one way or the other)
(3) Default Disk Layout
I'd suggest that an additional partition be considered, a separate '/var/crash' or similar onto which savecore can write its dump contents. This to be separate from the root file system mostly because sys admin's often forget to clean it up and we don't want to fill the root file system.
Thanks for letting me ask some questions
-Tony
Aug 06, 2008
mramchand says:
Responding to each of your points in turn. I'm trying to summarise/recollect the discussions that took place to define the EIS standard. I can't guarantee that my memory is perfect.
1: LU is the preferred method for performing upgrades. If you have a separate pair of mirrored slices for the ABE, there is no need to break the mirror as during the LU procedure you don't touch/change the active BE. Operationally H/W Raid is better in almost all cases for x64 based servers as it is harder to program the BIOS to boot off the mirrored disk. Arguably, in the SPARC world there isn't that much of a difference.
2: Sizing swap is always fun. To be honest, the 4GB size came about because we simply wanted to make sure that we had a "good enough" size on the root disk which fit the most general cases. The SunFire recommendation is based on the assumption that we don't have a dedicated dump device. I think we should make that clear. Finally, you should ALWAYS mirror swap. I think the document does make that clear. Swap CAN be 0. However, given that disks are cheap, and that the boot disk is normally larger than it needs to be, there's no harm in giving it some. Remember by default /tmp is swap, so something hammering /tmp may well end up chewing up RAM, nice to be able to have some real disk allocated to swap.
3: A separate /var/crash is often asked for, based on the same reasons you quoted. Our feeling is that someone who really cares about capturing crash dumps won't let /var (or /) fill up. I know that's a kind of flippant answer, but in short, the complication of having an additional /var/crash slice, and the extra steps and considerations for creating LU ABEs with multiple boot slices, outweighed the potential benefits.
Hope these answers helped. Not saying they were right, but I hope it explains the collective thinking of the group that put together the EIS bootdisk standard.
Nov 11, 2008
tlhumbert says:
With the new zfs root capability, has that made any changes to the recommended layout?
Nov 12, 2008
mramchand says:
ZFS root COMPLETELY changes the above recommended layout.
With ZFS, you get a root pool, which should be a simple mirrored pool. The installer creates a few ZFS filesystems within that root pool: (rpool is the default name)
rpool/ROOT/<os> mounted on /
rpool/export
rpool/export/home
(you can optionally have a rpool/var as well)
and it creates 2 Volumes:
rpool/swap
rpool/dump
whose sizes can either be specified at install time, or will be based on some function of physical memory size if left as default.
The cool thing about ZFS is that I can resize these volumes on the fly, anytime I want:
zfs set volsize=2G rpool/dump
This makes the initial sizing much less critical.
ZFS root/boot is the way forward.
I guess the recommendations are:
1: Keep your root pool as simple as possible, i.e. give it 2 whole disks.
2: If possible restrict your rpool to system data, create additional pools for your data. (Obviously not possible on 1U 2 disk servers, in this case, create filesystems in the rpool for your non-OS related data.)
See http://docs.sun.com/app/docs/doc/819-5461/zfsboot-1?a=view for the official Sun Documentation, including rpool considerations, and how swap and dump are sized.