Sun Studio March 2009 Express Release (3/09) Details
The following are new and changed features in the March 2009 Express (3/09) release as well as all previous Express releases since the release of Sun Studio 12. Note that most of these features might not yet be fully documented in the Sun Studio man pages in this build.
Some of these features are experimental and might not be available in future releases. Some of these features might change significantly in future releases. The documentation is also preliminary and might not reflect the full range of functionality or problems and workarounds.
Performance Analyzer
OpenMP Program Analysis
The Performance Analyzer introduces new features for analyzing OpenMP programs:
- A new collector for OpenMP 3.0 profiling is available, but not enabled by default. It can be enabled by setting the environment variable SP_COLLECTOR_NEWOMP, and works only for code compiled with the option -qoption iropt -Apcg:mfcxt. If you enable the new collector, the following features are available:
- The Analyzer GUI includes an OMP Parallel Regions tab and an OpenMP Tasks tab, which are available when OpenMP data is collected, and have entries for the source lines of each construct and metrics for those entries.
- Support has been added in the experiment format for OpenMP 3.0 profiling.
- Two new commands for OpenMP profiles, OMP_preg and OMP_task, have been added.
- User mode presentation of OpenMP profiles has been changed so that the parallel loop functions are no longer shown.
MPI Program Analysis
The Performance Analyzer introduces major enhancements for analyzing MPI programs:
- The collect command has a new -M on option that is used to collect data on MPI programs. The target program should be mpirun, from Sun HPC ClusterTools 8, although mpirun from other Open MPI implementations might also work.
- The Analyzer GUI includes the following new features:
- Context menus have been added to the MPI Timeline, MPI Chart, Functions, Callers-Callees, and Timeline tabs.
- Keyboard shortcuts have been added for navigation, selection, and zoom actions in the MPI Timeline, MPI Chart, and Timeline tabs.
- Navigation buttons have been added to the toolbars in the MPI Timeline and MPI Chart tabs.
- MPI Timeline tab, which shows a view of an MPI program's data over time. The MPI Timeline tab shows a set of bars, one for each MPI rank, indicating when each process is in user code and when it is in MPI code, and displaying messages sent among the processes.
- MPI Chart tab, which you can use to generate different displays of the data to analyze. The MPI Chart tab shows a variety of one- and two-dimensional plots of data and other aggregated data concerning MPI processing.
- Zooming and filtering capabilities to focus on particular aspects of the data in the MPI Timeline and MPI Chart tabs.
- With applications created using Sun HPC ClusterTools 8 Early Access 2, clock-profiling MPI experiments show two new metrics, MPI Work Time and MPI Wait Time.
You can find details about how to use the new features in the Sun Studio Performance Analyzer for MPI programs on the MPI Analyzer wiki page.
Additional Changes to the Performance Analyzer tools
- Changes to experiment format to support MPI experiments and to support MPI Work and MPI Wait metrics from Sun HPC ClusterTools 8 Early Access 2.
- Changes to the collect command (in addition to MPI changes):
- Hardware counter profiling on the Solaris OS accepts libcpc generic counters, and where libcpc recognizes them, numeric values for counter specifications.
- Hardware counter profiling on the Linux OS accepts numeric values for counter specifications.
- Hardware counter profiling is now supported for the Intel Nehalem processor.
- Hardware counter profiling is now supported for the Intel Atom processor on the Solaris OS.
- The -I and -N options are accepted when -c (count) profiling is specified.
- The -J option for Java profiling can support multiple arguments to the Java™ Virtual Machine (JVM) as a quoted string containing blank or tab separated arguments. (The terms "Java Virtual Machine" and "JVM" mean a Virtual Machine for the Java platform.)
- Hardware counter profiling has better support for Intel Core2 and AMD Family 10h processors.
- The collector command in the dbx debugger has better support for Intel Core2 and AMD Family 10h processors.
- Changes to the Performance Analyzer GUI (in addition to MPI and OpenMP changes):
- The Show/Hide dialog box now supports a third option, API-only.
- The tabs directive in an .er.rc file now specifies tab order as well as tab visibility.
- Changes to the er_print command:
- If you enable the new collector for OpenMP profiling by setting the SP_COLLECTOR_NEWOMP environment variable, the following new features are available:
- User mode presentation of OpenMP profiles has been changed so that the parallel loop functions are no longer shown.
- Two new commands for OpenMP profiles, OMP_preg and OMP_task, have been added.
- The er_print command has been ported to run in 64 bits, and will run the 64-bit version if the underlying system supports it, unless the SP_COMMAND_64 environment variable is set.
- A new object_api command enables API-only processing of call stacks from the named objects.
- You can include the object_show, object_hide, and object_api commands in .er.rc files.
- A new objects_default command resets the default for all shared objects. You cannot include this command in .er.rc files.
- The tabs directive in an .er.rc file now specifies tab order as well as tab visibility.
- With Sun HPC ClusterTools 8 Early Access 2, clock-profiling MPI experiments show two new metrics, MPI Work Time and MPI Wait Time.
- Compiler commentary controls, the cc, scc, and doc commands, have been extended to allow control over whether the compiler options used to build the object are shown at the bottom of the source display.
- Two APIs have been added to the Thread Analyzer:
- tha_check_datarace_mem instructs the Thread Analyzer to monitor or ignore accesses to a specified block of memory when doing data race detection.
- tha_check_datarace_thr instructs the Thread Analyzer to monitor or ignore memory accesses by one or more specified threads when doing data race detection.
IDE
The following features are carried over from the November 2008 Express release:
- Based on NetBeans IDE 6.5
- The new Memory window displays the contents of memory addresses currently used by the process being debugged.
- The new Call Graph window displays a tree view of either the functions called from a selected function, or the functions that call that function.
- Syntactic and semantic errors are highlighted as you type code.
- You can package completed applications as tar files, zip files, SVR4 packages, RPMs, or Debian packages.
- You can define remote hosts and use development tools on those hosts to build and run projects from your client system.
The following are carried over from the July 2008 Express release:
- The Include Hierarchy window lets you inspect the hierarchy of source and header files
- The Type Hierarchy window lets you inspect all supertypes and subtypes of a class
- A new toolbar button lets you toggle between corresponding source and header files
- Code completion now works for #include directives
- The new Go to Type menu item lets you find a type (class, struct, enum, or typedef) by its name or prefix
- The new Go to Include menu item lets you go directly to a file that is included in a source or header file
- The new Go to Function or Variable menu item lets you find a function or variable by its name or prefix
- Project dependencies can be created for projects from existing code
- A choice of formatting styles for your source code
- The new Threads window shows you all the threads in the current debugging session
- The new Disassembler window displays the assembly instructions for the current source file
- The new Usages window shows you everywhere a class (structure), function, variable, macro, or file is used in your project's source code
C, C++, and Fortran Compilers
- Object files created by the compilers on the Solaris OS on x86 platforms or the Linux OS are incompatible with previous compiler versions if the application code contains functions with parameters or return values using __m128/__m64 data types. Users with .il inline function files, assembler code, or asm inline statements calling these functions also need to be aware of this incompatibility.
- The -xtarget=woodcrest option expands to -xarch=ssse3 -xchip=core2 -xcache=32/64/8:4096/64/16.
- The -xtarget=sparc64vii option expands to -xarch=sparcima -xchip=sparc64vii -xcache=64/64/2:5120/256/10.
- The -xtarget=penryn option expands to -xarch=sse4_1 -xchip=penryn -xcache=32/64/8:6144/64/24.
- The -xtarget=nehalem option expands to -xarch=sse4_2 -xchip=nehalem -xcache=32/64/8:256/64/8:8192/64/16.
- The -xtarget=ultraT2plus option expands to -xarch=sparcvis2 -xcache=8/16/4:4096/64/16 -xchip=ultraT2plus.
- The -xprofile=collect and -xprofile=use options provide improved support for profiling multi-threaded, dynamically linked applications.
- The -xarch=ssse3 option adds the SSSE3 instruction set to the SSE3 instruction set.
- The -xarch=sse4_1 option compiles for the SSE4.1 ISA.
- The -xarch=sse4_2 option compiles for the SSE4.2 ISA.
- The -xarch=sparcima option compiles for the sparcima version of the SPARC-V9 instruction set, plus the UltraSPARC extensions.
- The -xchip=core2 option optimizes for the core2 processor.
- The -xchip=sparc64vii option optimizes for the Fujitsu SPARC64® VII processor.
- The -xchip=penryn option optimizes for the Intel® Penryn processor.
- The -xchip=nehalem option optimizes for the Intel® Core™ (Nehalem) processor.
- The -xchip=ultraT2plus option optimizes for the UltraSPARC® T2 Plus processor.
- The -xcrossfile=1 option becomes an alias of the -xipo=1 option. The -xcrossfile=0 option no longer has any effect. Specifically, -xcrossfile=1 and -xcrossfile=0 are equivalent to -xipo=1.
- On Solaris platforms, the -xpec[=yes|no] option generates a PEC binary that is recompilable for use with the Automatic Tuning System (ATS).
- The -Y option does not accept i as an argument.
- On SPARC® platforms, the -xdepend option is now implicitly enabled for optimization levels -xO3 or higher, and is no longer included in the expansion of the -fast option.
- Support for OpenMP 3.0 in this Express release includes a libmtsk library. OpenMP programs will link with this library by default instead of the libmtsk library in the Solaris OS.
- -xannotate[=yes|no] (SPARC platforms only) instructs the compiler to create binaries that can be transformed later by binary modification tools like binopt(1).
C Compiler
- The -Wi option is no longer accepted for passing arguments to the interprocedural optimizer.
- The -xsb and -xsbfast options are obsolete and have been removed.
- A new flag has been added to the -xcheck option, [no%]init_local.
- __FUNCTION__ is a predefined identifier that contains the name of the lexically-enclosing function. It is functionally equivalent to the C99 predefined identifier, __func__. On Solaris platforms, __FUNCTION__ is not available in -Xs and -Xc modes.
- In standard C, a case label in a switch statement can have only one associated value. The Sun Studio C compiler allows an extension found in some compilers known as case ranges.
- The second operand in a conditional expression can be omitted. If the first operand is then non-zero, the value of the conditional expression is that of the first operand.
- For the -features=[no%]conststrings option, which enables and disables string literal placement in read-only memory, the default is -features=conststrings, which replaces the deprecated -xstrconst option.
- The -include filename option for specifying preprocessor include files has been added.
- New behavior for preprocessor ## operator: The compiler now issues a warning diagnostic for an undefined ## operation (C standard, section 3.4.3), where undefined is a ## result that, when preprocessed, consists of multiple tokens rather than one single token (C standard, section 6.10.3.3(3)). The result of an undefined ## operation is now defined as the first individual token generated by preprocessing the string created by concatenating the ## operands.
- Global asm Statements (SPARC only)
A basic asm statement is expressed as asm(assembly code);
It emits the given assembler text directly into the assembly file. A basic asm statement declared at file scope, rather than function scope, is referred to as a "global asm statement". Other compilers refer to this as a "toplevel" asm statement.
Global asm statements are emitted in the order they are specified, that is, they retain their order relative to each other and maintain their position relative to surrounding functions.
At higher optimization levels, the compiler may remove functions that it believes are not referenced. Since the compiler will not know which functions are referenced from within global asms, it is possible that they may be removed inadvertently. To avoid this potential problem, the new attribute, "used", can be applied to the affected function(s).
Note that extended asm statements, those which provide a template and operand specifications, are not allowed to be global.
"__asm" and "__asm__" are synonyms for the "asm" keyword and can be used interchangeably.
Global asm statements are only available on SPARC platforms in this release.
C++ Compiler
- The -xia (interval arithmetic) option is now supported on the Solaris OS on x86 platforms.
- The -xipo archive option is now supported on the Solaris OS on x86 platforms and on the Linux OS on x86 platforms.
- The -Qoption option does not accept ube_ipa as an argument.
- The expansion of the -fast option now includes -D_MATHERR_ERRNO_DONTCARE.
- The -xvpara option, which shows parallelization warning messages, is now supported.
- The -sb, -sbfast, -xsb and -xsbfast options are obsolete and have been removed.
- The compiler now inlines code when you specify the -g option with any -O or -xO optimization values as long as you do not also specify +d.
- The pragma must_have_frame is now supported.
- In standard C++, a case label in a switch statement can have only one associated value. The Sun Studio C++ compiler allows an extension found in some compilers known as case ranges.
- The compiler normally creates temporary files in the /tmp directory. You can specify another directory by setting the TMPDIR environment variable.
- The following attributes of functions are now supported:
__attribute__((const))
__attribute__((constructor))
__attribute__((destructor))
- The following attribute of variables is now supported for struct and enum types only:
- Universal Character Names are now supported.
- Loop pragmas are now supported.
- User-defined names for macro variadic arguments are now supported.
- The -include filename option for specifying preprocessor include files has been added.
Fortran Compiler
- Quad precision (REAL*16) is implemented on x86 platforms. REAL*16 is 128-bit IEEE floating point.
- New -ext_names=fsecond-underscore appends two underscores to external names that contain an underscore, and a single underscore to those that do not. This option is equivalent to gfortran's -fsecond-underscore option. This option does not affect external symbols with the BIND(C) attribute.
- New IVDEP directive and new compiler option -xivdep
- The -Qoption option does not accept ube_ipa as an argument.
- The -xvpara option, which shows parallelization warning messages, is now supported.
- The -sb, -sbfast, -xsb and -xsbfast options are obsolete and have been removed.
- The compiler normally creates temporary files in the /tmp directory. You can specify another directory by setting the TMPDIR environment variable.
- The behavior of the cpu_time() Fortran 95 intrinsic routine is different between Solaris and Linux platforms.
- The Fortran 2003 IMPORT statement is implemented.
- Automatic insertion of DTrace probes: an article by Diane Meirowitz of the Sun Studio Fortran team describes a new feature of Sun Studio Fortran, the automatic insertion of DTrace static probes in optimized code. The Fortran -dtrace Qoption is now available as a technology preview. Please try it and give us your feedback.
Compilers and Assemblers on x86 Platforms
- New MOVBE assembly instruction for Intel Atom processor
- New Intel AES assembly instructions
- SSSE3 Assembly syntax/semantic and corresponding compiler intrinsics
- SSE4.1 Assembly syntax/semantic and corresponding compiler intrinsics (Rev 1.0)
- SSE4.2 Assembly syntax/semantic and corresponding compiler intrinsics
- Two new assembler options: -C and -a32
- The -b option, which generates extra symbol table information for the SourceBrowser, is now obsolete.
dbx Debugger
- Runtime checking (RTC) now gives information about array out-of-bounds access on the Solaris OS on x86 platforms.
- Runtime Checking (RTC) now supports access, leaks, and memuse checking on the following Linux platforms: SuSE Linux Enterprise Server 10, Red Hat Enterprise Linux 5.
- A new graphical user interface (GUI) for dbx, dbxtool, is included in this release. For information on invoking dbxtool, see the dbxtool(1) man page. dbxtool is a separate GUI from the Sun Studio IDE, but is also based on NetBeans IDE 6.5. dbxtool provides access to all the functionality of dbx. It also supports attaching to a process as it starts executing to begin debugging it immediately (see the ss_attach(1) man page), and fixing and continuing, which lets you relink source files after you make changes, without recompiling the entire program.
- dbx can now evaluate function parameters and local variables in optimized code when the code provides the needed debugging information. For more information, see Optimized Code Debugging With Sun Studio dbx.
Sun Performance Library
- Sun Studio software now includes the ScaLAPACK 1.8.0 high performance cluster library. This library works with Sun HPC ClusterTools 8.1 based on the OpenMPI 1.3 release. The reference implementation along with documentation can be found at http://www.netlib.org/scalapack/.
- The new Custom Library Tool provides the option to create scaled down versions of Sun Performance Library.
- Numerous performance improvements have been made for BLAS, LAPACK, and FFT routines.
- Support for Intel® Core™ i7 (Nehalem) and AMD Quad-Core Opteron™ (Shanghai) CPUs is available. To link with this library, use the following options:
-m64 -xlic_lib=sunperf (C and Fortran)
-m64 -library=sunperf (C++)
- Support for Fujitsu SPARC64-VII® CPUs is available. This version of Sun Performance Library uses the floating point multiply-add instruction to achieve the best performance possible. To link with this library, use the following options:
-xtarget=sparc64vii -fma=fused -xlic_lib=sunperf (C and Fortran)
-xtarget=sparc64vii -fma=fused -library=sunperf (C++)
- ZGEMM improvements for SPARC64-VI and SPARC64-VII
- LAPACK routines are updated to conform to the latest specification of LAPACK 3.1.1
- Support for Woodcrest CPUs is available.
- Support for SPARC64-VI CPUs is available.
OpenMP 3.0
Support for OpenMP 3.0 features in the C, C++, and Fortran compilers:
- libmtsk library: Sun Studio Express support for OpenMP 3.0 includes a libmtsk library. Users should be aware that their OpenMP programs will link with this library by default instead of the one included with Solaris.
- Tasking
- Loop collapse
- Runtime routines for nesting support
- Runtime routines for runtime schedule
- Environment variables OMP_STACKSIZE and OMP_WAIT_POLICY
- AUTO loop schedule
- Enhanced threadprivate support in C++
- Threadprivate static class member (C++)
- Unsigned int loop control variable (C and C++)
- New value for _OPENMP macro (200805L)
- Processor binding, specified by the SUNW_MP_PROCBIND environment variable, is available on Linux systems.
Added in the November 2008 Express Release:
- f90 allocatable arrays
- C++ iterator loops
- C++ pointer loops
DLight
DLight offers a variety of instrumentation that takes advantage of the Solaris™ Dynamic Tracing (DTrace) debugging and performance analysis functionality. In this Express release, DLight is supported on Linux platforms for two instruments: Clock Profiler (based on Performance Analyzer) and Java Ticker. For more information, see the DLight Tutorial.
DTrace GUI Plugin
The NetBeans™ DTrace GUI Plugin is a graphical user interface (GUI) for running DTrace scripts. The DTrace GUI includes Chime, a graphical tool for visualizing DTrace aggregations. For information on using the DTrace GUI, see NetBeans DTrace GUI Plug-in.