|
Key
This line was removed.
This word was removed. This word was added.
This line was added.
|
Comment:
Changes (1)
View page history... {anchor:top} h1. {anchor:CHP-IO} {{io}} Provider The {{io}} provider makes available probes related to disk input and output. The {{io}} provider enables quick exploration of behavior observed through I/O monitoring tools such as [{{iostat}}(1M)|http://docs.sun.com/doc/819-2240/iostat-1m?a=view]. For example, using the {{io}} provider, you can understand I/O by device, by I/O type, by I/O size, by process, by application name, by file name, or by file offset. [Top|#top] h2. {anchor:GELRK} Probes The {{io}} probes are described in [Table 27–1|#TBL-IO]. h6. {anchor:TBL-IO} {{io}} Probes ||Probe||Description|| |{{start}}|Probe that fires when an I/O request is about to be made either to a peripheral device or to an NFS server. The {{bufinfo_t}} corresponding to the I/O request is pointed to by {{args\[0\]}}. The {{devinfo_t}} of the device to which the I/O is being issued is pointed to by {{args\[1\]}}. The {{fileinfo_t}} of the file that corresponds to the I/O request is pointed to by {{args\[2\]}}. Note that file information availability depends on the filesystem making the I/O request. See [fileinfo_t|#chp-io-fileinfo] for more information.| |{{done}}|Probe that fires after an I/O request has been fulfilled. The {{bufinfo_t}} corresponding to the I/O request is pointed to by {{args\[0\]}}. The {{done}} probe fires after the I/O completes, but before completion processing has been performed on the buffer. As a result {{B_DONE}} is _not_ set in {{b_flags}} at the time the {{done}} probe fires. The {{devinfo_t}} of the device to which the I/O was issued is pointed to by {{args\[1\]}}. The {{fileinfo_t}} of the file that corresponds to the I/O request is pointed to by {{args\[2\]}}.| |{{wait-start}}|Probe that fires immediately before a thread begins to wait pending completion of a given I/O request. The [{{buf}}(9S)|http://docs.sun.com/doc/819-2257/buf-9s?a=view] structure corresponding to the I/O request for which the thread will wait is pointed to by {{args\[0\]}}. The {{devinfo_t}} of the device to which the I/O was issued is pointed to by {{args\[1\]}}. The {{fileinfo_t}} of the file that corresponds to the I/O request is pointed to by {{args\[2\]}}. Some time after the {{wait-start}} probe fires, the {{wait-done}} probe will fire in the same thread.| |{{wait-done}}|Probe that fires when a thread is done waiting for the completion of a given I/O request. The {{bufinfo_t}} corresponding to the I/O request for which the thread will wait is pointed to by {{args\[0\]}}. The {{devinfo_t}} of the device to which the I/O was issued is pointed to by {{args\[1\]}}. The {{fileinfo_t}} of the file that corresponds to the I/O request is pointed to by {{args\[2\]}}. The {{wait-done}} probe fires only after the {{wait-start}} probe has fired in the same thread.| Note that the {{io}} probes fire for all I/O requests to peripheral devices, and for all file read and file write requests to an NFS server. Requests for metadata from an NFS server, for example, do _not_ trigger {{io}} probes due to a [{{readdir}}(3C)|http://docs.sun.com/doc/819-2243/readdir-3c?a=view] request. |
| [Top|#top] h2. {anchor:CHP-IO-ARGS} Arguments |
| The argument types for the {{io}} probes are listed in [Table 27–2|#tbl-io-args]. 27–2|#TBL-IO-ARGS]. The arguments are described in [Table 27–1|#TBL-IO]. |
| h6. {anchor:TBL-IO-ARGS} {{io}} Probe Arguments ||Probe||{{args\[0\]}}||{{args\[1\]}}||{{args\[2\]}}|| |
... |{{start}}|{{struct buf *}}|{{devinfo_t *}}|{{fileinfo_t *}}| |{{done}}|{{struct buf *}}|{{devinfo_t *}}|{{fileinfo_t *}}| |{{wait-start}}|{{struct buf *}}|{{devinfo_t *}}|{{fileinfo_t *}}| |{{wait-done}}|{{struct buf *}}|{{devinfo_t *}}|{{fileinfo_t *}}| Each {{io}} probe has arguments consisting of a pointer to a [{{buf}}(9S)|http://docs.sun.com/doc/819-2257/buf-9s?a=view] structure, a pointer to a {{devinfo_t}}, and a pointer to a {{fileinfo_t}}. These structures are described in greater detail in this section. [Top|#top] h3. {anchor:CHP-IO-BUF} {{bufinfo_t}} structure The {{bufinfo_t}} structure is the abstraction that describes an I/O request. The buffer corresponding to an I/O request is pointed to by {{args\[0\]}} in the {{start}}, {{done}}, {{wait-start}}, and {{wait-done}} probes. The {{bufinfo_t}} structure definition is as follows: {noformat} typedef struct bufinfo { int b_flags; /* flags */ size_t b_bcount; /* number of bytes */ caddr_t b_addr; /* buffer address */ uint64_t b_blkno; /* expanded block # on device */ uint64_t b_lblkno; /* block # on device */ size_t b_resid; /* # of bytes not transferred */ size_t b_bufsize; /* size of allocated buffer */ caddr_t b_iodone; /* I/O completion routine */ dev_t b_edev; /* extended device */ } bufinfo_t; {noformat} The {{b_flags}} member indicates the state of the I/O buffer, and consists of a bitwise-or of different state values. The valid state values are in [Table 27–3|#TBL-IO-FLAGS]. h6. {anchor:TBL-IO-FLAGS} {{b_flags}} Values |{{B_DONE}}|Indicates that the data transfer has completed.| |{{B_ERROR}}|Indicates an I/O transfer error. It is set in conjunction with the {{b_error}} field.| |{{B_PAGEIO}}|Indicates that the buffer is being used in a paged I/O request. See the description of the {{b_addr}} field for more information.| |{{B_PHYS}}|Indicates that the buffer is being used for physical (direct) I/O to a user data area.| |{{B_READ}}|Indicates that data is to be read from the peripheral device into main memory.| |{{B_WRITE}}|Indicates that the data is to be transferred from main memory to the peripheral device.| |{{B_ASYNC}}|The I/O request is asynchronous, and will not be waited upon. The {{wait-start}} and {{wait-done}} probes don't fire for asynchronous I/O requests. Note that some I/Os directed to be asynchronous might not have {{B_ASYNC}} set: the asynchronous I/O subsystem might implement the asynchronous request by having a separate worker thread perform a synchronous I/O operation.| The {{b_bcount}} field is the number of bytes to be transferred as part of the I/O request.\\ \\The {{b_addr}} field is the virtual address of the I/O request, unless {{B_PAGEIO}} is set. The address is a kernel virtual address unless {{B_PHYS}} is set, in which case it is a user virtual address. If {{B_PAGEIO}} is set, the {{b_addr}} field contains kernel private data. Exactly one of {{B_PHYS}} and {{B_PAGEIO}} can be set, or neither flag will be set.\\ \\The {{b_lblkno}} field identifies which logical block on the device is to be accessed. The mapping from a logical block to a physical block (such as the cylinder, track, and so on) is defined by the device.\\ \\The {{b_resid}} field is set to the number of bytes not transferred because of an error.\\ \\The {{b_bufsize}} field contains the size of the allocated buffer.\\ \\The {{b_iodone}} field identifies a specific routine in the kernel that is called when the I/O is complete.\\ \\The {{b_error}} field may hold an error code returned from the driver in the event of an I/O error. {{b_error}} is set in conjunction with the {{B_ERROR}} bit set in the {{b_flags}} member.\\ \\The {{b_edev}} field contains the major and minor device numbers of the device accessed. Consumers may use the D subroutines {{getmajor}} and {{getminor}} to extract the major and minor device numbers from the {{b_edev}} field. [Top|#top] h3. {anchor:CHP-IO-DEVINFO} {{devinfo_t}} The {{devinfo_t}} structure provides information about a device. The {{devinfo_t}} structure corresponding to the destination device of an I/O is pointed to by {{args\[1\]}} in the {{start}}, {{done}}, {{wait-start}}, and {{wait-done}} probes. The members of {{devinfo_t}} are as follows: {noformat} typedef struct devinfo { int dev_major; /* major number */ int dev_minor; /* minor number */ int dev_instance; /* instance number */ string dev_name; /* name of device */ string dev_statname; /* name of device + instance/minor */ string dev_pathname; /* pathname of device */ } devinfo_t; {noformat} The {{dev_major}} field is the major number of the device. See [{{getmajor}}(9F)|http://docs.sun.com/doc/819-2256/getmajor-9f?a=view] for more information.\\ \\The {{dev_minor}} field is the minor number of the device. See [{{getminor}}(9F)|http://docs.sun.com/doc/819-2256/getminor-9f?a=view] for more information.\\ \\The {{dev_instance}} field is the instance number of the device. The instance of a device is different from the minor number. The minor number is an abstraction managed by the device driver. The instance number is a property of the device node. You can display device node instance numbers with [{{prtconf}}(1M)|http://docs.sun.com/doc/819-2240/prtconf-1m?a=view].\\ \\The {{dev_name}} field is the name of the device driver that manages the device. You can display device driver names with the \-D option to [{{prtconf}}(1M)|http://docs.sun.com/doc/819-2240/prtconf-1m?a=view].\\ \\The {{dev_statname}} field is the name of the device as reported by [{{iostat}}(1M)|http://docs.sun.com/doc/819-2240/iostat-1m?a=view]. This name also corresponds to the name of a kernel statistic as reported by [{{kstat}}(1M)|http://docs.sun.com/doc/819-2240/kstat-1m?a=view]. This field is provided so that aberrant {{iostat}} or {{kstat}} output can be quickly correlated to actual I/O activity.\\ \\The {{dev_pathname}} field is the full path of the device. This path may be specified as an argument to [{{prtconf}}(1M)|http://docs.sun.com/doc/819-2240/prtconf-1m?a=view] to obtain detailed device information. The path specified by {{dev_pathname}} includes components expressing the device node, the instance number, and the minor node. However, all three of these elements aren't necessarily expressed in the statistics name. For some devices, the statistics name consists of the device name and the instance number. For other devices, the name consists of the device name and the number of the minor node. As a result, two devices that have the same {{dev_statname}} may differ in {{dev_pathname}}. [Top|#top] h3. {anchor:CHP-IO-FILEINFO} {{fileinfo_t}} The {{fileinfo_t}} structure provides information about a file. The file to which an I/O corresponds is pointed to by {{args\[2\]}} in the {{start}}, {{done}}, {{wait-start}}, and {{wait-done}} probes. The presence of file information is contingent upon the filesystem providing this information when dispatching I/O requests. Some filesystems, especially third-party filesystems, might not provide this information. Also, I/O requests might emanate from a filesystem for which no file information exists. For example, any I/O to filesystem metadata will not be associated with any one file. Finally, some highly optimized filesystems might aggregate I/O from disjoint files into a single I/O request. In this case, the filesystem might provide the file information either for the file that represents the majority of the I/O or for the file that represents _some_ of the I/O. Alternately, the filesystem might provide no file information at all in this case.\\ \\The definition of the {{fileinfo_t}} structure is as follows: {noformat} typedef struct fileinfo { string fi_name; /* name (basename of fi_pathname) */ string fi_dirname; /* directory (dirname of fi_pathname) */ string fi_pathname; /* full pathname */ offset_t fi_offset; /* offset within file */ string fi_fs; /* filesystem */ string fi_mount; /* mount point of file system */ } fileinfo_t; {noformat} The {{fi_name}} field contains the name of the file but does not include any directory components. If no file information is associated with an I/O, the {{fi_name}} field will be set to the string {{<none>}}. In some rare cases, the pathname associated with a file might be unknown. In this case, the {{fi_name}} field will be set to the string {{<unknown>}}.\\ \\The {{fi_dirname}} field contains _only_ the directory component of the file name. As with {{fi_name}}, this string may be set to {{<none>}} if no file information is present, or {{<unknown>}} if the pathname associated with the file is not known.\\ \\The {{fi_pathname}} field contains the full pathname to the file. As with {{fi_name}}, this string may be set to {{<none>}} if no file information is present, or {{<unknown>}} if the pathname associated with the file is not known.\\ \\The {{fi_offset}} field contains the offset within the file , or -1 if either file information is not present or if the offset is otherwise unspecified by the filesystem. [Top|#top] h2. {anchor:CHP-IO-EXAMPLES} Examples The following example script displays pertinent information for every I/O as it's issued: {noformat} #pragma D option quiet BEGIN { printf("%10s %58s %2s\n", "DEVICE", "FILE", "RW"); } io:::start { printf("%10s %58s %2s\n", args[1]->dev_statname, args[2]->fi_pathname, args[0]->b_flags & B_READ ? "R" : "W"); } {noformat} The output of the example when cold-starting Acrobat Reader on an x86 laptop system resembles the following example: {noformat} # dtrace -s ./iosnoop.d DEVICE FILE RW cmdk0 /opt/Acrobat4/bin/acroread R cmdk0 /opt/Acrobat4/bin/acroread R cmdk0 <unknown> R cmdk0 /opt/Acrobat4/Reader/AcroVersion R cmdk0 <unknown> R cmdk0 <unknown> R cmdk0 <none> R cmdk0 <unknown> R cmdk0 <none> R cmdk0 /usr/lib/locale/iso_8859_1/iso_8859_1.so.3 R cmdk0 /usr/lib/locale/iso_8859_1/iso_8859_1.so.3 R cmdk0 /usr/lib/locale/iso_8859_1/iso_8859_1.so.3 R cmdk0 <none> R cmdk0 <unknown> R cmdk0 <unknown> R cmdk0 <unknown> R cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R cmdk0 <none> R cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R cmdk0 <unknown> R cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libreadcore.so.4.0 R cmdk0 <none> R cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libreadcore.so.4.0 R cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libreadcore.so.4.0 R cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libreadcore.so.4.0 R cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libreadcore.so.4.0 R cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libreadcore.so.4.0 R cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libreadcore.so.4.0 R cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libreadcore.so.4.0 R cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libreadcore.so.4.0 R cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R cmdk0 /opt/Acrobat4/Reader/intelsolaris/bin/acroread R cmdk0 <unknown> R cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libAGM.so.3.0 R cmdk0 <none> R cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libAGM.so.3.0 R cmdk0 /opt/Acrobat4/Reader/intelsolaris/lib/libAGM.so.3.0 R ... {noformat} The {{<none>}} entries in the output indicate that the I/O doesn't correspond to the data in any particular file: these I/Os are due to metadata of one form or another. The {{<unknown>}} entries in the output indicate that the pathname for the file is not known. This situation is relatively rare.\\ \\You could make the example script slightly more sophisticated by using an associative array to track the time spent on each I/O, as shown in the following example: {noformat} #pragma D option quiet BEGIN { printf("%10s %58s %2s %7s\n", "DEVICE", "FILE", "RW", "MS"); } io:::start { start[args[0]->b_edev, args[0]->b_blkno] = timestamp; } io:::done /start[args[0]->b_edev, args[0]->b_blkno]/ { this->elapsed = timestamp - start[args[0]->b_edev, args[0]->b_blkno]; printf("%10s %58s %2s %3d.%03d\n", args[1]->dev_statname, args[2]->fi_pathname, args[0]->b_flags & B_READ ? "R" : "W", this->elapsed / 10000000, (this->elapsed / 1000) % 1000); start[args[0]->b_edev, args[0]->b_blkno] = 0; } {noformat} The output of the above example while hot-plugging a USB storage device into an otherwise idle x86 laptop system is shown in the following example: {noformat} # dtrace -s ./iotime.d DEVICE FILE RW MS cmdk0 /kernel/drv/scsa2usb R 24.781 cmdk0 /kernel/drv/scsa2usb R 25.208 cmdk0 /var/adm/messages W 25.981 cmdk0 /kernel/drv/scsa2usb R 5.448 cmdk0 <none> W 4.172 cmdk0 /kernel/drv/scsa2usb R 2.620 cmdk0 /var/adm/messages W 0.252 cmdk0 <unknown> R 3.213 cmdk0 <none> W 3.011 cmdk0 <unknown> R 2.197 cmdk0 /var/adm/messages W 2.680 cmdk0 <none> W 0.436 cmdk0 /var/adm/messages W 0.542 cmdk0 <none> W 0.339 cmdk0 /var/adm/messages W 0.414 cmdk0 <none> W 0.344 cmdk0 /var/adm/messages W 0.361 cmdk0 <none> W 0.315 cmdk0 /var/adm/messages W 0.421 cmdk0 <none> W 0.349 cmdk0 <none> R 1.524 cmdk0 <unknown> R 3.648 cmdk0 /usr/lib/librcm.so.1 R 2.553 cmdk0 /usr/lib/librcm.so.1 R 1.332 cmdk0 /usr/lib/librcm.so.1 R 0.222 cmdk0 /usr/lib/librcm.so.1 R 0.228 cmdk0 /usr/lib/librcm.so.1 R 0.927 cmdk0 <none> R 1.189 ... cmdk0 /usr/lib/devfsadm/linkmod R 1.110 cmdk0 /usr/lib/devfsadm/linkmod/SUNW_audio_link.so R 1.763 cmdk0 /usr/lib/devfsadm/linkmod/SUNW_audio_link.so R 0.161 cmdk0 /usr/lib/devfsadm/linkmod/SUNW_cfg_link.so R 0.819 cmdk0 /usr/lib/devfsadm/linkmod/SUNW_cfg_link.so R 0.168 cmdk0 /usr/lib/devfsadm/linkmod/SUNW_disk_link.so R 0.886 cmdk0 /usr/lib/devfsadm/linkmod/SUNW_disk_link.so R 0.185 cmdk0 /usr/lib/devfsadm/linkmod/SUNW_fssnap_link.so R 0.778 cmdk0 /usr/lib/devfsadm/linkmod/SUNW_fssnap_link.so R 0.166 cmdk0 /usr/lib/devfsadm/linkmod/SUNW_lofi_link.so R 1.634 cmdk0 /usr/lib/devfsadm/linkmod/SUNW_lofi_link.so R 0.163 cmdk0 /usr/lib/devfsadm/linkmod/SUNW_md_link.so R 0.477 cmdk0 /usr/lib/devfsadm/linkmod/SUNW_md_link.so R 0.161 cmdk0 /usr/lib/devfsadm/linkmod/SUNW_misc_link.so R 0.198 cmdk0 /usr/lib/devfsadm/linkmod/SUNW_misc_link.so R 0.168 cmdk0 /usr/lib/devfsadm/linkmod/SUNW_misc_link.so R 0.247 cmdk0 /usr/lib/devfsadm/linkmod/SUNW_misc_link_i386.so R 1.735 ... {noformat} You can make several observations about the mechanics of the system based on this output. First, note the long time to perform the first several I/Os, which took about 25 milliseconds each. This time might have been due to the {{cmdk0}} device having been power managed on the laptop. Second, observe the I/O due to the [{{scsa2usb}}(7D)|http://docs.sun.com/doc/819-2254/scsa2usb-7d?a=view] driver loading to deal with USB Mass Storage device. Third, note the writes to {{/var/adm/messages}} as the device is reported. Finally, observe the reading of the device link generators (the files ending in {{link.so}}) , which presumably deal with the new device.\\ \\The {{io}} provider enables in-depth understanding of [{{iostat}}(1M)|http://docs.sun.com/doc/819-2240/iostat-1m?a=view] output. Assume you observe {{iostat}} output similar to the following example: {noformat} extended device statistics device r/s w/s kr/s kw/s wait actv svc_t %w %b cmdk0 8.0 0.0 399.8 0.0 0.0 0.0 0.8 0 1 sd0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 sd2 0.0 109.0 0.0 435.9 0.0 1.0 8.9 0 97 nfs1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 nfs2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 {noformat} You can use the {{iotime.d}} script to see these I/Os as they happen, as shown in the following example: {noformat} DEVICE FILE RW MS sd2 /mnt/archives.tar W 0.856 sd2 /mnt/archives.tar W 0.729 sd2 /mnt/archives.tar W 0.890 sd2 /mnt/archives.tar W 0.759 sd2 /mnt/archives.tar W 0.884 sd2 /mnt/archives.tar W 0.746 sd2 /mnt/archives.tar W 0.891 sd2 /mnt/archives.tar W 0.760 sd2 /mnt/archives.tar W 0.889 cmdk0 /export/archives/archives.tar R 0.827 sd2 /mnt/archives.tar W 0.537 sd2 /mnt/archives.tar W 0.887 sd2 /mnt/archives.tar W 0.763 sd2 /mnt/archives.tar W 0.878 sd2 /mnt/archives.tar W 0.751 sd2 /mnt/archives.tar W 0.884 sd2 /mnt/archives.tar W 0.760 sd2 /mnt/archives.tar W 3.994 sd2 /mnt/archives.tar W 0.653 sd2 /mnt/archives.tar W 0.896 sd2 /mnt/archives.tar W 0.975 sd2 /mnt/archives.tar W 1.405 sd2 /mnt/archives.tar W 0.724 sd2 /mnt/archives.tar W 1.841 cmdk0 /export/archives/archives.tar R 0.549 sd2 /mnt/archives.tar W 0.543 sd2 /mnt/archives.tar W 0.863 sd2 /mnt/archives.tar W 0.734 sd2 /mnt/archives.tar W 0.859 sd2 /mnt/archives.tar W 0.754 sd2 /mnt/archives.tar W 0.914 sd2 /mnt/archives.tar W 0.751 sd2 /mnt/archives.tar W 0.902 sd2 /mnt/archives.tar W 0.735 sd2 /mnt/archives.tar W 0.908 sd2 /mnt/archives.tar W 0.753 {noformat} This output appears to show that the file {{archives.tar}} is being read from {{cmdk0}} (in {{/export/archives}}), and being written to device {{sd2}} (in {{/mnt}}). This existence of two files named {{archives.tar}} that are being operated on separately in parallel seems unlikely. To investigate further, you can aggregate on device, application, process ID and bytes transferred, as shown in the following example: {noformat} #pragma D option quiet io:::start { @[args[1]->dev_statname, execname, pid] = sum(args[0]->b_bcount); } END { printf("%10s %20s %10s %15s\n", "DEVICE", "APP", "PID", "BYTES"); printa("%10s %20s %10d %15@d\n", @); } {noformat} Running this script for a few seconds results in output similar to the following example: {noformat} # dtrace -s ./whoio.d ^C DEVICE APP PID BYTES cmdk0 cp 790 1515520 sd2 cp 790 1527808 {noformat} This output shows that this activity _is_ a copy of the file {{archives.tar}} from one device to another. This conclusion leads to another natural question: is one of these devices faster than the other? Which device acts as the limiter on the copy? To answer these questions, you need to know the effective throughput of each device rather than the number of bytes per second each device is transferring. You can determine the throughput with the following example script: {noformat} #pragma D option quiet io:::start { start[args[0]->b_edev, args[0]->b_blkno] = timestamp; } io:::done /start[args[0]->b_edev, args[0]->b_blkno]/ { /* * We want to get an idea of our throughput to this device in KB/sec. * What we have, however, is nanoseconds and bytes. That is we want * to calculate: * * bytes / 1024 * ------------------------ * nanoseconds / 1000000000 * * But we can't calculate this using integer arithmetic without losing * precision (the denomenator, for one, is between 0 and 1 for nearly * all I/Os). So we restate the fraction, and cancel: * * bytes 1000000000 bytes 976562 * --------- * ------------- = --------- * ------------- * 1024 nanoseconds 1 nanoseconds * * This is easy to calculate using integer arithmetic; this is what * we do below. */ this->elapsed = timestamp - start[args[0]->b_edev, args[0]->b_blkno]; @[args[1]->dev_statname, args[1]->dev_pathname] = quantize((args[0]->b_bcount * 976562) / this->elapsed); start[args[0]->b_edev, args[0]->b_blkno] = 0; } END { printa(" %s (%s)\n%@d\n", @); } {noformat} Running the example script for several seconds yields the following output: {noformat} sd2 (/devices/pci@0,0/pci1179,1@1d/storage@2/disk@0,0:r) value ------------- Distribution ------------- count 32 | 0 64 | 3 128 | 1 256 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2257 512 | 1 1024 | 0 cmdk0 (/devices/pci@0,0/pci-ide@1f,1/ide@0/cmdk@0,0:a) value ------------- Distribution ------------- count 128 | 0 256 | 1 512 | 0 1024 | 2 2048 | 0 4096 | 2 8192 |@@@@@@@@@@@@@@@@@@ 172 16384 |@@@@@ 52 32768 |@@@@@@@@@@@ 108 65536 |@@@ 34 131072 | 0 {noformat} The output shows that {{sd2}} is clearly the limiting device. The {{sd2}} throughput is between 256K/sec and 512K/sec, while {{cmdk0}} is delivering I/O at anywhere from 8 MB/second to over 64 MB/second. The script prints out both the name as seen in {{iostat}}, and the full path of the device. To find out more about the device, you could specify the device path to {{prtconf}}, as shown in the following example: {noformat} # prtconf -v /devices/pci@0,0/pci1179,1@1d/storage@2/disk@0,0 disk, instance #2 (driver name: sd) Driver properties: name='lba-access-ok' type=boolean dev=(29,128) name='removable-media' type=boolean dev=none name='pm-components' type=string items=3 dev=none value='NAME=spindle-motor' + '0=off' + '1=on' name='pm-hardware-state' type=string items=1 dev=none value='needs-suspend-resume' name='ddi-failfast-supported' type=boolean dev=none name='ddi-kernel-ioctl' type=boolean dev=none Hardware properties: name='inquiry-revision-id' type=string items=1 value='1.04' name='inquiry-product-id' type=string items=1 value='STORAGE DEVICE' name='inquiry-vendor-id' type=string items=1 value='Generic' name='inquiry-device-type' type=int items=1 value=00000000 name='usb' type=boolean name='compatible' type=string items=1 value='sd' name='lun' type=int items=1 value=00000000 name='target' type=int items=1 value=00000000 {noformat} As the emphasized terms indicate, this device is a removable USB storage device.\\ \\The examples in this section have explored all I/O requests. However, you might only be interested in one type of request. The following example tracks the directories in which writes are occurring, along with the applications performing the writes: {noformat} #pragma D option quiet io:::start /args[0]->b_flags & B_WRITE/ { @[execname, args[2]->fi_dirname] = count(); } END { printf("%20s %51s %5s\n", "WHO", "WHERE", "COUNT"); printa("%20s %51s %5@d\n", @); } {noformat} Running this example script on a desktop workload for a period of time yields some interesting results, as shown in the following example output: {noformat} # dtrace -s ./whowrite.d ^C WHO WHERE COUNT su /var/adm 1 fsflush /etc 1 fsflush / 1 fsflush /var/log 1 fsflush /export/bmc/lisa 1 esd /export/bmc/.phoenix/default/78cxczuy.slt/Cache 1 fsflush /export/bmc/.phoenix 1 esd /export/bmc/.phoenix/default/78cxczuy.slt 1 vi /var/tmp 2 vi /etc 2 cat <none> 2 bash / 2 vi <none> 3 xterm /var/adm 3 fsflush /export/bmc 7 MozillaFirebird <none> 8 vim /export/bmc 9 MozillaFirebird /export/bmc 10 fsflush /var/adm 11 devfsadm /dev 14 ksh <none> 71 ksh /export/bmc 71 fsflush /export/bmc/.phoenix/default/78cxczuy.slt 119 MozillaFirebird /export/bmc/.phoenix/default/78cxczuy.slt 119 fsflush <none> 211 MozillaFirebird /export/bmc/.phoenix/default/78cxczuy.slt/Cache 591 fsflush /export/bmc/.phoenix/default/78cxczuy.slt/Cache 666 sched <none> 2385 {noformat} As the output indicates, virtually all writes are associated with the Mozilla Firebird cache. The writes labeled {{<none>}} are likely due to writes associated with the UFS log, writes that are themselves induced by other writes in the filesystem. See [{{ufs}}(7FS)|http://docs.sun.com/doc/819-2254/ufs-7fs?a=view] for details on logging. This example shows how to use the {{io}} provider to discover a problem at a much higher layer of software. In this case, the script has revealed a configuration problem: the web browser would induce much less I/O (and quite likely none at all) if its cache were in a directory in a [{{tmpfs}}(7FS)|http://docs.sun.com/doc/819-2254/tmpfs-7fs?a=view] filesystem.\\ \\The previous examples have used only the {{start}} and {{done}} probes. You can use the {{wait-start}} and {{wait-done}} probes to understand why applications block for I/O – and for how long. The following example script uses both {{io}} probes and {{sched}} probes (see [Chapter 26, sched Provider|sched Provider]) to derive CPU time compared to I/O wait time for the StarOffice software: {noformat} #pragma D option quiet sched:::on-cpu /execname == "soffice.bin"/ { self->on = vtimestamp; } sched:::off-cpu /self->on/ { @time["<on cpu>"] = sum(vtimestamp - self->on); self->on = 0; } io:::wait-start /execname == "soffice.bin"/ { self->wait = timestamp; } io:::wait-done /self->wait/ { @io[args[2]->fi_name] = sum(timestamp - self->wait); @time["<I/O wait>"] = sum(timestamp - self->wait); self->wait = 0; } END { printf("Time breakdown (milliseconds):\n"); normalize(@time, 1000000); printa(" %-50s %15@d\n", @time); printf("\nI/O wait breakdown (milliseconds):\n"); normalize(@io, 1000000); printa(" %-50s %15@d\n", @io); } {noformat} Running the example script during a cold start of the StarOffice software yields the following output: {noformat} Time breakdown (milliseconds): <on cpu> 3634 <I/O wait> 13114 I/O wait breakdown (milliseconds): soffice.tmp 0 Office 0 unorc 0 sbasic.cfg 0 en 0 smath.cfg 0 toolboxlayout.xml 0 sdraw.cfg 0 swriter.cfg 0 Linguistic.dat 0 scalc.cfg 0 Views.dat 0 Store.dat 0 META-INF 0 Common.xml.tmp 0 afm 0 libsimreg.so 1 xiiimp.so.2 3 outline 4 Inet.dat 6 fontmetric 6 ... libucb1.so 44 libj641si_g.so 46 libX11.so.4 46 liblng641si.so 48 swriter.db 53 libwrp641si.so 53 liblocaledata_ascii.so 56 libi18npool641si.so 65 libdbtools2.so 69 ofa64101.res 74 libxcr641si.so 82 libucpchelp1.so 83 libsot641si.so 86 libcppuhelper3C52.so 98 libfwl641si.so 100 libsb641si.so 104 libcomphelp2.so 105 libxo641si.so 106 libucpfile1.so 110 libcppu.so.3 111 sw64101.res 114 libdb-3.2.so 119 libtk641si.so 126 libdtransX11641si.so 127 libgo641si.so 132 libfwe641si.so 150 libi18n641si.so 152 libfwi641si.so 154 libso641si.so 173 libpsp641si.so 186 libtl641si.so 189 <unknown> 189 libucbhelper1C52.so 195 libutl641si.so 213 libofa641si.so 216 libfwk641si.so 229 libsvl641si.so 261 libcfgmgr2.so 368 libsvt641si.so 373 libvcl641si.so 741 libsvx641si.so 885 libsfx641si.so 993 <none> 1096 libsw641si.so 1365 applicat.rdb 1580 {noformat} As this output shows, much of the cold StarOffice start time is due to waiting for I/O. (13.1 seconds waiting for I/O as opposed to 3.6 seconds on CPU.) Running the script on a warm start of the StarOffice software reveals that page caching has eliminated the I/O time , as shown in the following example output: {noformat} Time breakdown (milliseconds): <I/O wait> 0 <on cpu> 2860 I/O wait breakdown (milliseconds): temp 0 soffice.tmp 0 <unknown> 0 Office 0 {noformat} The cold start output shows that the file {{applicat.rdb}} accounts for more I/O wait time than any other file. This result is presumably due to many I/Os to the file. To explore the I/Os performed to this file, you can use the following D script: {noformat} io:::start /execname == "soffice.bin" && args[2]->fi_name == "applicat.rdb"/ { @ = lquantize(args[2]->fi_offset != -1 ? args[2]->fi_offset / (1000 * 1024) : -1, 0, 1000); } {noformat} This script uses the {{fi_offset}} field of the {{fileinfo_t}} structure to understand which parts of the file are being accessed, at the granularity of a megabyte. Running this script during a cold start of the StarOffice software results in output similar to the following example: {noformat} # dtrace -s ./applicat.d dtrace: script './applicat.d' matched 4 probes ^C value ------------- Distribution ------------ count < 0 | 0 0 |@@@ 28 1 |@@ 17 2 |@@@@ 35 3 |@@@@@@@@@ 72 4 |@@@@@@@@@@ 78 5 |@@@@@@@@ 65 6 | 0 {noformat} This output indicates that only the first six megabytes of the file are accessed, perhaps because the file is six megabytes in size. The output also indicates that the entire file is not accessed. If you wanted to improve the cold start time of StarOffice, you might want to understand the access pattern of the file. If the needed sections of the file could be largely contiguous, one way to improve StarOffice cold start time might be to have a scout thread run ahead of the application, inducing the I/O to the file before it's needed. (This approach is particularly straightforward if the file is accessed using [{{mmap}}(2)|http://docs.sun.com/doc/819-2241/mmap-2?a=view].) However, the ~1.6 seconds that this strategy would gain in cold start time does not merit the additional complexity and maintenance burden in the application. Either way, the data gathered with the {{io}} provider allows a precise understanding of the benefit that such work could ultimately deliver. [Top|#top] h2. {anchor:CHP-IO-STABILITY} Stability The {{io}} provider uses DTrace's stability mechanism to describe its stabilities, as shown in the following table. For more information about the stability mechanism, see [Chapter 39, Stability|Stability]. ||Element||Name stability||Data stability||Dependency class|| |Provider|Evolving|Evolving|ISA| |Module|Private|Private|Unknown| |Function|Private|Private|Unknown| |Name|Evolving|Evolving|ISA| |Arguments|Evolving|Evolving|ISA| {excerpt:hidden=true}Converted by tech dogg's sgml2wiki on Tue 20 Nov 2007 at 9:24:36 PM{excerpt} |