Motivation
Every DBA having worked with I/O intensive databases has been in the situation to analyze and potentially reconfigure the I/O subsystem. Between application and disks multiple layers are involved during database write activity. This is one reason why there's lots of confusion in the field with respect to usage of certain words, the most popular ones being 'synchronous' and 'asynchronous'. Although they seem to be precise and intuitive, in fact they are not ! They are used in different contexts with different meanings, which is a great source of misunderstanding.
Let's try to go through the whole stack from top to the bottom.
Keywords
Sun Solaris Oracle SAP Performance synchronous asynchronous concurrent direct I/O filesystem UFS buffer cache O_DSYNC AIO LGWR DBWR commit
From application ....
In order not to make things look too complicated right at the beginning, let's just look at three basic SQL database operations (DML) being used by an application providing transactional integrity:
- reads
- writes (inserts or updates)
- commits
From the application point of view reads are kind of 'synchronous', because without having the required data available, it's hard to do anything meaningful.
| Analogy: You won't be able to write a summary of a book until you've read it (you're not cheating, right ?) |
In this context 'synchronous' means that processing can't be resumed before the required data has been read (aka: the application is blocked during DB read).
If the application issues an "UPDATE KNA1 SET name1='my favourite customer';" (don't do this, even if you love all your customers !!!), the corresponding database blocks will be changed within the buffer pool of Oracle's SGA. The application can continue processing without having to wait for the blocks to be written to disk. Many people call this behaviour 'asynchronous', others may call it 'non-blocking' or 'deferred write'.
| Analogy: You are buying food in your favourite store, you put all the things you like (and can afford) into your shopping cart without having to wait and pay for each single item. |
Once you're done with all the processing, you probably want to make your transaction persistent by sending a 'COMMIT' to the database. This time the application has to wait until the database completed writing the changes to the current Oracle online redolog. Many people refer to 'synchronous' behaviour here. Others may call it 'blocking'.
| Analogy: Having finished shopping, you have to proceed to the cashier and pay for all the stuff in your shopping cart. I highly recommend to wait there until payment has been completed - unless you don't mind going to jail. |
With respect to SAP, this scenario is really simplified, because SAP sometimes uses application level asynchronous updates (using SAP UPD and UP2 processes and VB*-tables). However, I just want to write a small article, not a book.
.... via database ....
Now it's time to go down the stack and talk about the database itself. One basic thing that you expect from a modern DBMS is consistency even after a server crash. Changes that have been committed successfully before the crash have to be reflected by the data after the crash. In order to achieve this, the DB has to make sure that writes are persisted on the storage subsystem - and not just to the volatile file system cache. The database can not rely on an OS process doing regular flushes from FS-cache to disk. Therefore Oracle uses the famous "O_DSYNC" option when opening data-, control- or logfiles.
| Note: Oracle will always use 'O_DSYNC' no matter if configured using asynchronous or synchronous I/O ! Otherwise it won't be able to ensure consistency. |
Just to be precise: In case your storage array uses non-volatile RAM (NVRAM) for caching, data doesn't have to be written to the physical disk before the I/O is marked complete. The NVRAM based cache is good enough because the data will survive an outage. Did I mention that this type of I/O is frequently called 'synchronous' ?
| Analogy: Assuming you are somewhat forgetful (like myself), your better half is well advised to provide you with a shopping list before sending you to buy the missing ingredients for dinner - at least as long as the meal is supposed to be great. |
Another characteristic related to server based databases is the ability to handle lots of requests / users concurrently. This can be achieved by either providing multiple instances of specific processes (e.g. the database writer DBWR or the log writer LGWR) where each process / thread works in a blocking (you are right, often called 'synchronous') mode, or by using non-blocking mechanisms so that the corresponding process is able to issue I/Os without having to wait for the previous I/O to complete. While the first approach provides a fixed degree of parallelism (DOP) and requires a lot of synchronization between those instances, the second approach using non-blocking asynchronous I/O (AIO) is very effective and usually the preferred way.
Analogy: no idea so far
| Note: Don't get me wrong: Oracle doesn't just fire I/Os and expects them to succeed, it certainly tracks their completion ! |
In Oracle the default behaviour is to use AIO, in 10g this can be configured by setting the parameter
filesystemio_options
to either 'setall' or 'asynch'. At this point, we should discuss the role of
.... via filesystems ....
The 'filesystemio_options' parameter allows us to enable additional features. The default value of 'setall' already suggests that additional things will happen behind the scenes. 'setall' also enables a filesystem feature called 'direct I/O'. As you've learned before, Oracle uses 'O_DSYNC' to make sure that data changes will be written to the storage. The filesystem cache (officially called 'buffer cache') therefore doesn't speedup things, in reality it makes writes a lot slower since the buffer cache has to be maintained / updated additionally. The only operation that may benefit from FS-level buffering are database reads. However, Oracle already has huge caches (buffer pools within SGA) to store frequently used data by using a least recently used (LRU) algorithm and tries to prevent 'cache wastage' caused by large sequential reads (aka full table scans) by reusing those blocks first. In most cases, there is little chance that a FS cache can contribute to reduce physical I/O (but there's a big chance to waste memory and CPU cycles). Therefore it sounds like a smart idea to really disable the filesystem cache for Oracle datafiles and redologs. In fact this is what 'direct I/O' provides while maintaining filesystem handling and semantics. When setting 'filesystemio_options=setall', Oracle will use the Solaris directio() function to access the filesystems bypassing the buffer cache. Other processes outside Oracle (i.e. backup tools) may access those filesystems actually through the buffer cache. This may be beneficial since the filesystem layer provides additional things like prefetching data during sequential access.
Another option to activate direct I/O for UFS (Unix File System) in Solaris is to use the mount option 'forcedirectio' in /etc/vfstab. Obviously this will enable direct I/O for all processes accessing those filesystems, not just Oracle.
Actually, that's not the end of the story: the full name of this feature is called 'concurrent direct I/O' (as of Solaris 8 Update 3). Earlier we talked about concurrency on the demand side (meaning on the Oracle side), but there's a problem to provide concurrent writes to the same file. The POSIX standard implies that at any time only one instance (process / thread) is allowed to modify a file. This is not just bad for the logwriter, when adding bulk data to one specific table you will most likely end up in writing to the same extent within the same file. This issue is actually addressed by 'concurrent direct I/O'. Since Oracle synchronizes all I/O between all processes / threads, concurrent writes do not break data consistency. As an additional benefit 'concurrent direct I/O' avoids breaking up large sequential writes as issued by LGWR.
.... back to ? What's left ?
Ok, the term 'synchronous' has been used multiple times in different context for different messaging. The same is true for 'direct I/O', however this is not so bad. 'direct I/O' is also used within the Oracle database terminology. To be precise it should be called 'direct path I/O'. It doesn't refer to UFS at all. In certain situations Oracle bypasses the Oracle SGA and lets the Oracle shadow / server process access datafiles directly. Direct path reads can usually be seen when reading large object style data (LOBs). Direct path writes usually happen when large datasets have to be sorted (and don't fit within the PGA sort_area_size) or when direct path operations are explicitly enforced (e.g. SQL*Loader with DIRECT=TRUE option). The intention of 'direct path I/O' is not to bypass the FS buffer cache but to bypass the SQL-engine and Oracle buffer pools within the SGA.
Comments (3)
Dec 08, 2008
fdurm says:
Woww, great article, thanks vwetter ! I read quite a lot of stuff on this subjec...Woww,
great article, thanks vwetter ! I read quite a lot of stuff on this subject but nothing made things clear to me. I really like the 'from Application down to disk' approach. Looking forward to your dtrace posting ! Keep up the good work.
Frank D.
Dec 09, 2008
vwetter says:
Frank, thanks for your feedback. Hopefully I will have finished the DTrace artic...Frank,
thanks for your feedback. Hopefully I will have finished the DTrace article by the end of next week.
Volker
Dec 17, 2008
Andrew_Clinton says:
Excellent article, Volker. I would be interested to know how or if this change...Excellent article, Volker.
I would be interested to know how or if this changes with ZFS..