{panel}
This page gives greater detail on some of the new features in the current Express release of Sun Studio compilers and tools. Note that some of these features might not yet be documented in the Sun Studio man pages in this build.
Some of these features are experimental and might change significantly or be removed in future releases. The documentation is also preliminary and might not reflect the full range of functionality or problems and workarounds. (See the !pointer.gif! [Known Problems and Workarounds page|Sun Studio Express 11.08 Known Problems and Workarounds] for the latest news about this Express release.)
{panel}
{section}
{column}
----
h1. Details on Sun Studio Express Releases
{panel}
h2. Sun Studio November 2008 Express Release
The following are new features in the November 2008 Express release. More information can be found on the [Sun Studio Express November 2008 Release] wiki page.
h3. IDE
* Based on NetBeans IDE 6.5
* The new Memory window displays the contents of memory addresses currently used by the process being debugged.
* The new Call Graph window displays a tree view of either the functions called from a selected function, or the functions that call that function.
* Syntactic and semantic errors are highlighted as you type code.
* You can package completed applications as tar files, zip files, SVR4 packages, RPMs, or Debian packages.
* You can define remote hosts and use development tools on those hosts to build and run projects from your client system.
----
h3. Sun Performance Library
* ZGEMM improvements for SPARC64-VI and SPARC64-VII
----
h3. OpenMP 3.0
Additional functionality has been added to complete the implementation of the OpenMP 3.0 API specifications in the C, C++, and Fortran compilers. For more information about these features, please refer to the !pointer.gif! [OpenMP Specification Version 3.0|http://openmp.org] and the !pointer.gif! [Sun Studio OpenMP Wiki|https://wikis.sun.com/display/openmp/Sun+Studio+OpenMP].
----
h3. dbx
A new graphical user interface (GUI) for dbx, *dbxtool*, is included in this November 2008 Express release. For information on invoking dbxtool, see the {{dbxtool}} man page. dbxtool is a separate GUI from the Sun Studio IDE, but is also based on NetBeans IDE 6.5. {{dbxtool}} provides access to all the functionality of dbx. It also supports attaching to a process as it starts executing to begin debugging it immediately (see the {{ss_attach}} man page), and fixing and continuing, which lets you relink source files after you make changes, without recompiling the entire program.
----
h3. Performance Analyzer
MPI analysis has been tested with the following:
* Sun HPC ClusterTools 8.0 and 8.1 with:
** Sun Studio compilers on Solaris 10 11/06 Operating System or later
** gcc on Red Hat Enterprise Linux 5 or later
** gcc on SuSE Linux Enterprise Server 10 or later
* Sun HPC ClusterTools 7.0 and 7.1 with:
** Sun Studio compilers on Solaris 10 11/06 Operating System or later
** no Linux distributions
* Open MPI 1.2.7, MPICH2 1.0.7, and MVAPICH2 1.2rc2 built using shared libraries with:
** gcc on Solaris 10 11/06 or later
** gcc on
*** SuSE Linux Enterprise Server 10 or later
*** Red Hat Enterprise Linux 5 or later
Note that for these non-Sun MPI implementations, you must build the MPI distribution and your MPI application with the same compiler and shared libraries for successful collection of data for the MPI calls.
----
h3. DTrace GUI Plug-in
The NetBeans DTrace GUI plug-in, version 0.4, is a Graphical User Interface (GUI) for running DTrace scripts, even those that are embedded in shell scripts. In fact, the DTrace GUI plug-in runs all of the scripts that are packaged in the DTraceToolkit. The DTraceToolkit is a collection of useful documented scripts developed by the OpenSolaris DTrace community.
For documentation for the 0.4 version of the plugin that is included in this Express release, see [http://www.netbeans.org/kb/docs/ide/NetBeans_DTrace_GUI_Plugin_0_4.html]
{panel}
----
{panel}
h2. Sun Studio July 2008 Express Release
The following features and changes were introduced in the July 2008 Express release and are carried over into the current release.
----
h3. Performance Analyzer
The Performance Analyzer includes major enhancements for analyzing MPI programs:
* The collect command has a new {{\-M on}} option that is used to collect data on MPI programs. The target program should be {{mpirun}}, from Sun HPC ClusterTools 8, although {{mpirun}} from other Open MPI implementations might also work.
* The Analyzer GUI includes the following new features:
** MPI Timeline, which shows a view of an MPI program's data over time. The MPI Timeline tab shows a set of bars, one for each MPI rank, indicating when each process is in user code and when it is in MPI code, and displaying messages sent among the processes.
** MPI Charts, which you can use to generate different displays of the data to analyze. The MPI Charts tab shows a variety of one\- and two-dimensional plots of data and other aggregated data concerning MPI processing.
** Zooming and filtering capabilities to focus on particular aspects of the data in the MPI Timeline and MPI Charts tabs.
** With applications created using Sun HPC ClusterTools 8 Early Access 2, clock-profiling MPI experiments show two new metrics, MPI Work Time and MPI Wait Time.
You can find details about how to use the new features in the Sun Studio Performance Analyzer for MPI programs on the !pointer.gif! [MPI Analyzer wiki page|https://wikis.sun.com/display/MPIAnalyzer ].
h4. Experiment Format
The experiment format has been changed:
* To support MPI experiments. An MPI experiment consists of a founder experiment with sub-experiments for the processes instantiated for each rank of the MPI job. The founder experiment does not have real data on the mpirun executable; rather, it contains the aggregated trace of API calls across all the processes as generated with the open source Vampir Trace library.
* To support MPI Work metrics and MPI Wait metrics from Sun HPC ClusterTools 8 Early Access 2.
h4. Other changes to the collect command
* The \-I and \-N options are accepted when \-c (count) profiling is specified to better control which libraries are instrumented and where the instrumented libraries are put.
* The \-J option for Java profiling can support multiple arguments to the Java™ Virtual Machine (JVM) as a quoted string containing blank-separated or tab-separated arguments.
* Hardware counter profiling has better support for Intel Core2 and AMD Family 10h processors.
* The Sun Studio software installation no longer includes a Java™ 2 Software Development Kit (JDK). The {{collect}} command looks in the location specified by the JDK_HOME environment variable, the location specified by the JAVA_PATH environment variable, and /usr/java for a JVM to use in profiling a jar or class file.
h4. collector Command in dbx Debugger
The collector command in the dbx debugger has better support for Intel Core2 and AMD Family 10h processors.
h4. Other changes to the Performance Analyzer
* The Show/Hide dialog box now supports a third option, API-only, which truncates all call stacks at the first function from a shared object that has the option set, effectively suppressing all detail below that API function. The dialog also has a button to reset the defaults for all shared objects.
* The tabs directive in an .er.rc file now specifies tab order as well as tab visibility.
h4. er_print Command
* A new object_api command enables API-only processing of call stacks from the named objects, which truncates all call stacks at the first function from a shared object that has the option set, effectively suppressing all detail below that API function.
* You can include the object_show, object_hide, and object_api commands in .er.rc files.
* A new objects_default command resets the default for all shared objects. You cannot include this command in .er.rc files.
* The tabs directive in an .er.rc file now specifies tab order as well as tab visibility.
* With Sun HPC ClusterTools 8 Early Access 2, clock-profiling MPI experiments show two new metrics, MPI Work Time and MPI Wait Time.
* Compiler commentary controls, the cc, scc, and doc commands, have been extended to allow control over whether the compiler options used to build the object are shown at the bottom of the source display.
* er_print has been ported to 64 bits, and the 64-bit version runs on any system capable of supporting it. This enables er_print and Analyzer to read much larger experiments on such systems.
----
h3. Thread Analyzer
Two new interfaces have been added to the libtha API:
* tha_check_datarace_mem() instructs the Thread Analyzer to monitor or ignore accesses to a specified block of memory when doing data race detection.
* tha_check_datarace_thr() instructs the Thread Analyzer to monitor or ignore memory accesses by one or more specified threads when doing data race detection.
----
h3. IDE
The IDE in this Sun Studio Express release is based on NetBeans IDE 6.1, and includes the following new features:
* The Include-Hierarchy window lets you inspect the hierarchy of source and header files
* The Type-Hierarchy window lets you inspect all supertypes and subtypes of a class
* A new toolbar button lets you toggle between corresponding source and header files
* Code completion now works for #include directives
* A new Go to Type menu item lets you find a type (class, struct, enum, or typedef) by its name or prefix
* A new Go to Include menu item lets you go directly to a file that is included in a source or header file
* A new Go to Function or Variable menu item lets you find a function or variable by its name or prefix
* Project dependencies can be created for projects from existing code
* A choice of formatting styles for your source code
* A new Threads window shows you all the threads in the current debugging session
* A new Disassembler window displays the assembly instructions for the current source file
* A new Usages window shows you everywhere a class (structure), function, variable, macro, or file is used in your project's source code
h4. {{sunstudio}} Command Options
The following changes have been made to the options of the {{sunstudio}} command:
* The {{\--enablejava}} and {{\--disablejava}} options have been removed. You can now use the Plugin Manager (Tools > Plugins) to enable and disable the Java plugins.
* The format of the {{\--ui-classpath}} _path_ option, which appends the specified path to the IDE's classpath, has changed to {{\--cp:a}} _path_.
----
h3. C, C+\+ and Fortran compilers
* Option changes:
** The \-xarch=ssse3 option adds the SSSE3 instruction set to the SSE3 instruction set.
** The \-xarch=sse4_1 option compiles for the SSE4.1 ISA.
** The \-xarch=sse4_2 option compiles for the SSE4.2 ISA.
** The \-xarch=sparcima option compiles for the sparcima version of the SPARC-V9 ISA. This option enables the compiler to use instructions from the SPARC-V9 instruction set, plus the UltraSPARC extensions, including the Visual Instruction Set (VIS) version 1.0; the UltraSPARC-III extensions, including the Visual Instruction Set version 2.0; the SPARC64 VI extensions for floating-point multiply-add; and the SPARC64 VII extensions for integer multiply-add.
** The \-xchip=core2 option optimizes for the Intel Core2 processor.
** The \-xchip=sparc64vii option optimizes for the SPARC64 VII processor.
** The \-xchip=penryn option optimizes for the Intel Penryn processor.
** The \-xchip=nehalem option optimizes for the Intel Nehalem processor.
** The \-xcrossfile=1 option becomes an alias of the \-xipo=1 option. \-xcrossfile=0 no longer has any effect. Specifically, \-xcrossfile=1 \-xcrossfile=0 results in \-xipo=1.
** The {{\-xpec}}\[{{=yes}}\|{{no}}\] option generates a PEC binary that is recompilable for use with the Automatic Tuning System (ATS). This option is not supported on the Linux operating system.
** The \-xtarget=woodcrest option expands to \-xarch=ssse3 \-xchip=core2 \-xcache=32/64/8:4096/64/16.
** The \-xtarget=sparc64vii option expands to \-xarch=sparcima \-xchip=sparc64vii \-xcache=64/64/2:5120/256/10.
** The \-xtarget=penryn option expands to \-xarch=sse4_1 \-xchip=penryn \-xcache=32/64/8:6144/64/24.
** The \-xtarget=nehalem option expands to \-xarch=sse4_2 \-xchip=nehalem \-xcache=generic.
** The \-Y option doesn't accept i as an argument.
** On SPARC® platforms, the \-xdepend option is now implicitly enabled for optimization levels \-xO3 or higher, and is no longer included in the expansion of the \-fast option. An explicit use of the \-xdepend option always has precedence over an implicit enabling of the \-xdepend option.
* Support for OpenMP 3.0 in this Express release includes a libmtsk library. OpenMP programs will link with this library by default instead of the libmtsk library in the Solaris OS.
* Optimization using {{\-xannotate}}\[{{=yes}}\|{{no}}\]
(SPARC platforms only) Instructs the compiler to create binaries that can later be transformed by binary modification tools like binopt(1). Future binary analysis, code coverage, and memory error detection tools will also work with binaries built with this option.
Use the {{\-xannotate=no}} option to prevent the modification of the binary file by these tools.
The {{\-xannotate=yes}} option must be used with optimization level {{\-xO1}} or higher to be effective, and is effective only on systems with the new linker support library interface {{ld_open()}}. The new compiler support library libld_annotate.so uses this new interface. If the compiler is used on a system without this linker interface (for example, Solaris OS 9), it silently reverts to {{\-xannotate=no}}. The new linker interface is provided by the fix to bug 6479848. This fix is available in Solaris patch 127111-07 and in current versions of OpenSolaris.
The default is {{\-xannotate=yes}}, but if either of the above conditions is not met, the default reverts to {{\-xannotate=no}}.
* See Also [#SSSE3, SSE4.1, and SSE4.2 Intrinsics].
----
h3. C compiler
* Option changes:
** The {{\-W}} option doesn't accept {{i}} as an argument.
** The {{\-xsb}} and {{\-xsbfast}} options are obsolete and have been removed.
** A new flag, \[{{no%}}\]{{init_local}}, has been added to the {{\-xcheck}} option. If you do not specify the {{\-xcheck}} option, the default is {{no%init_local}}, which means local variables will not be initialized. If you specify the {{\-xcheck}} option without a value for this flag, the default is {{init_local}}, meaning that the compiler will generate code to set local variables to values that are likely to cause exceptions if a variable is used before it is assigned. This option does not affect global or static variables.
* {{\__FUNCTION_\_}} is a predefined identifier that contains the name of the lexically enclosing function. It is functionally equivalent to the C99 predefined identifier, {{\__func_\_}}. On Solaris platforms, {{\__FUNCTION_\_}} is not available in \-Xs and \-Xc modes.
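The following is a minimal sketch of how {{\__FUNCTION_\_}} can be used alongside {{\__func_\_}}. The function names {{current_function_name}} and {{names_match}} are hypothetical and chosen only for illustration; the sketch assumes a compilation mode in which both identifiers are available.

```c
#include <string.h>

/* Returns the name that the compiler records for this function. */
const char *current_function_name(void)
{
    return __FUNCTION__;
}

/* Returns 1 when __FUNCTION__ and __func__ hold the same name. */
int names_match(void)
{
    return strcmp(__FUNCTION__, __func__) == 0;
}
```

A typical use is to pass the identifier to a logging routine so that messages carry the name of the function that issued them.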
* In standard C, a case label in a switch statement can have only one associated value. The Sun Studio C compiler allows an extension found in some compilers known as case ranges. A case range specifies a range of values to associate with an individual case label. The syntax of a case range is: {{case low ... high :}}
A case range behaves exactly as if a case label had been specified for each value in the given range from low to high inclusive. If low and high are equal, the case range specifies only the one value. The lower and upper values must conform to the requirements of the C standard; that is, they must be valid integer constant expressions (C standard 6.8.4.2). You can freely intermix case ranges and case labels, and you can specify multiple case ranges within a switch statement.
The following code is a programming example of a case range:
{code}
enum kind { alpha, number, white, other };
enum kind char_class(char c)
{
enum kind result;
switch(c) {
case 'a' ... 'z':
case 'A' ... 'Z':
result = alpha;
break;
case '0' ... '9':
result = number;
break;
case ' ':
case '\n':
case '\t':
case '\r':
case '\v':
result = white;
break;
default:
result = other;
break;
}
return result;
}
{code}
If an endpoint of a case range is a numeric literal, leave white space around the ellipsis (...) to avoid having one of the dots treated as a decimal point. For example:
{code}
case 0...4:   // error
case 5 ... 9: // ok
{code}
* The second operand in a conditional expression can be omitted. In that case, if the first operand is non-zero, the value of the conditional expression is that of the first operand. For example, in the following expression, if {{x}} is non-zero, then the value of the expression is {{x}}. Otherwise, the value is {{y}}.
{code}
x ? : y
{code}
The expression is equivalent to the following, except that {{x}} is not evaluated a second time:
{code}
x ? x : y
{code}
By omitting the second operand, the already computed value of the first operand is reused. The omission of the second operand is a gcc extension to the C language, which is now supported by the Sun Studio C compiler.
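The single evaluation matters when the first operand has a side effect. The following sketch illustrates this under the assumption that the compiler accepts the omitted-operand extension; the helper {{next_value}} and the counter {{call_count}} are hypothetical names introduced only for this example.

```c
static int call_count = 0;

/* Hypothetical helper with a visible side effect: it counts its calls. */
static int next_value(int v)
{
    call_count++;
    return v;
}

/* Yields v when v is non-zero, otherwise fallback.
   next_value() is called exactly once per evaluation. */
int value_or_default(int v, int fallback)
{
    return next_value(v) ? : fallback;
}

/* Reports how many times next_value() has run so far. */
int calls_made(void)
{
    return call_count;
}
```

Written as {{next_value(v) ? next_value(v) : fallback}}, the helper would run twice whenever {{v}} is non-zero.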
* The {{\-features=}}\[{{no%}}\]{{conststrings}} flag enables or disables string literal placement in read-only memory. The default is \-features=conststrings, which replaces the deprecated \-xstrconst option. Programs attempting to write to a string literal now fail under the default compilation mode just as if \-xstrconst had been explicitly specified on the command line.
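Code that needs to modify the text of a string literal can copy it into a writable array first. The following is a minimal sketch of that pattern, not a statement of how any particular program must be changed; the function name {{capitalize_greeting}} is hypothetical.

```c
#include <string.h>

/* Copies a literal into a writable array, modifies the copy, and
   stores the result in out. Returns 0 on success, -1 if out is too small. */
int capitalize_greeting(char *out, size_t out_size)
{
    /* An array initialized from a literal is a writable copy. */
    char buffer[] = "hello";

    /* By contrast, writing through a pointer to the literal itself,
       for example  char *p = "hello"; p[0] = 'H';  is the kind of
       access that fails when literals are placed in read-only memory. */
    buffer[0] = 'H';

    if (out_size < sizeof buffer)
        return -1;
    memcpy(out, buffer, sizeof buffer);
    return 0;
}
```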
----
h3. C+\+ compiler
* Option changes:
** The {{\-xia}} option is now supported by the C+\+ compiler on the Solaris OS on x86 platforms. This option links the appropriate interval arithmetic libraries and sets a suitable floating-point environment. {{\-xia}} is a macro that expands to {{\-fsimple=0 \-ftrap=%none \-fns=no \-library=interval}}. This option also requires that you specify a value for the {{\-xarch}} option that supports SSE2 instructions, such as {{\-xarch=sse2}}. See the CC(1) man page for more information.
** The \-xipo_archive option is now supported on the Solaris OS on x86 platforms and on the Linux OS on x86 platforms. See the C+\+ User's Guide and the CC(1) man page for more information.
** The \-Qoption option doesn't accept {{ube_ipa}} as an argument.
** The expansion of the \-fast option now includes {{\-D_MATHERR_ERRNO_DONTCARE}}.
** The \-xvpara option, which issues warnings about potential parallel programming related problems that might cause incorrect results when using OpenMP or Sun parallel directives and pragmas, is now supported. Use this option with the \-xopenmp option and OpenMP API directives, or with the \-xexplicitpar option and MP parallelization directives. See the CC(1) man page for more information.
** The {{\-sb, \-sbfast, \-xsb, and \-xsbfast}} options are obsolete and have been removed.
* The C+\+ compiler now inlines code when you specify \-g with any \-O or \-xO value as long as you do not also specify \+d. In previous releases, \-g automatically specified \+d, but this is no longer the case.
* The pragma {{\#pragma must_have_frame(}}{_}list-of-function-names{_}{{)}} is now supported.
This pragma requests that the specified list of routines always be compiled to have a complete stack frame (as defined in the System V ABI). This pragma is permitted only after the prototypes for the specified functions are declared. The pragma must precede the end of the function.
If a function name is overloaded, the most recently declared function is chosen.
Using the pragma after the function prototype:
{code}
extern void foo(int);
extern void bar(int);
#pragma must_have_frame(foo, bar)
{code}
Using the pragma inside the function definition:
{code}
void foo(int) {
.
#pragma must_have_frame(foo)
.
return;
}
{code}
* In C++, a case label in a switch statement can have only one associated value. The Sun Studio C+\+ compiler allows an extension found in some compilers known as case ranges. A case range specifies a range of values to associate with an individual case label. The syntax of a case range is: {{case low ... high :}}
A case range behaves exactly as if a case label had been specified for each value in the given range from low to high inclusive. If low and high are equal, the case range specifies only the one value. The lower and upper values must be valid integer constant expressions. You can freely intermix case ranges and case labels, and you can specify multiple case ranges within a switch statement.
The following code is a programming example of a case range:
{code}
enum kind { alpha, number, white, other };
enum kind char_class(char c)
{
enum kind result;
switch(c) {
case 'a' ... 'z':
case 'A' ... 'Z':
result = alpha;
break;
case '0' ... '9':
result = number;
break;
case ' ':
case '\n':
case '\t':
case '\r':
case '\v':
result = white;
break;
default:
result = other;
break;
}
return result;
}
{code}
If an endpoint of a case range is a numeric literal, leave white space around the ellipsis (...) to avoid having one of the dots treated as a decimal point. For example:
{code}
case 0...4:   // error
case 5 ... 9: // ok
{code}
* The C+\+ compiler normally creates temporary files in the directory /tmp. You can specify another directory by setting the TMPDIR environment variable to the directory of your choice. However, if the directory to which you set the variable is not a valid directory, the compiler uses /tmp. The \-temp option has precedence over the TMPDIR environment variable.
* The following attributes of functions are now supported:
{code}
__attribute__((const))
__attribute__((constructor))
__attribute__((destructor))
{code}
For more information, see 5.33 Specifying Attributes of Types in the _GNU Manual_.
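The following sketch shows two of these attributes in use, assuming a compiler that accepts the GNU attribute syntax. It is written in the subset common to C and C+\+, and the names {{setup}}, {{square}}, and {{setup_has_run}} are hypothetical.

```c
static int setup_done = 0;

/* constructor: runs before main() when the attribute is honored. */
__attribute__((constructor))
static void setup(void)
{
    setup_done = 1;
}

/* const: the result depends only on the argument, so the compiler
   is free to merge repeated calls that pass the same value. */
__attribute__((const))
int square(int x)
{
    return x * x;
}

/* Reports whether the constructor has already run. */
int setup_has_run(void)
{
    return setup_done;
}
```

A function marked with the destructor attribute is declared the same way and runs after {{main()}} returns or {{exit()}} is called.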
* The following attribute of variables is now supported for struct and enum types only: {{\__attribute_\_((packed))}}
For more information, see [5.32 Specifying Attributes of Variables in the _GNU Manual_|http://gcc.gnu.org/onlinedocs/gcc-3.3.6/gcc/Variable-Attributes.html#Variable-Attributes]
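As an illustration of the effect of the packed attribute on a struct type, the following sketch compares a packed layout with the default layout. It assumes a compiler that accepts the GNU attribute syntax; the type and function names are hypothetical.

```c
#include <stddef.h>

/* Default layout: the compiler may insert padding after tag. */
struct plain_record {
    char tag;
    int  value;
};

/* Packed layout: members are stored without padding between them. */
struct packed_record {
    char tag;
    int  value;
} __attribute__((packed));

size_t plain_size(void)
{
    return sizeof(struct plain_record);
}

size_t packed_size(void)
{
    return sizeof(struct packed_record);
}
```

Packing reduces the size of the structure but can lead to misaligned member accesses, which may be slower on some platforms.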
* Universal Character Names are now supported. For more information, see paragraph 2 of section 2.2 Character Sets in the _Standard for Programming Language C+\+_. Copies of the standard are available from INCITS (www.incits.org), the body that charters the US C+\+ Committee, at their store: [http://www.techstreet.com/cgi-bin/detail?product_id=1143945] .
* Loop pragmas are now supported.
* User-defined names for macro variadic arguments are now supported. For example:
{code}
void foo(int, int);
#define M(args...) foo(args)
int main()
{
M(1, 2);
}
{code}
For more information, see section 3.6 Variadic Macros in _The C Preprocessor_.
----
h3. Fortran compiler
* Option changes:
** The \-Qoption option doesn't accept ube_ipa as an argument.
** The \-xvpara option is now supported. This option is a synonym for \-vpara.
** The \-sb, \-sbfast, \-xsb, and \-xsbfast options are obsolete and have been removed.
* The Fortran 95 compiler normally creates temporary files in the directory /tmp. You can specify another directory by setting the TMPDIR environment variable to the directory of your choice. However, if the directory to which you set the variable is not a valid directory, the compiler uses /tmp. The \-temp option has precedence over the TMPDIR environment variable.
* The behavior of the cpu_time() Fortran 95 intrinsic routine is different between Solaris and Linux platforms. On Solaris platforms, the cpu_time() routine is based on the gethrtime() routine, while on Linux platforms it is based on the getrusage() routine. The differences between the two implementations are:
** The Linux version measures CPU usage time of one thread; the Solaris version measures the wall clock time.
** The Linux version has lower resolution (around 1 msec); the Solaris version has higher resolution.
* The Fortran 2003 {{IMPORT}} statement is implemented.
----
h3. dbx debugger
* Runtime checking (RTC) now gives information about array out-of-bounds access on the Solaris OS on x86 platforms. Runtime Checking reports the following array out-of-bounds errors:
| rob | Read from array out-of-bounds memory |
| wob | Write to array out-of-bounds memory |
* Runtime Checking (RTC) now supports access, leaks, and memuse checking on the following Linux platforms: SLES10, RHEL5.
* dbx can now evaluate function parameters and local variables in optimized code when the code provides the needed debugging information. gcc compilers provide this information. Sun Studio compilers for SPARC platforms provide the information if you specify a new option (-Wc\,gen_loclist=1) when compiling. For more information, see [Optimized Code Debugging With Sun Studio dbx|http://developers.sun.com/sunstudio/documentation/techart/optimizedcode.html]
----
h3. Sun Performance Library
* LAPACK routines are updated to conform to the latest specification of LAPACK 3.1.1.
* Support for Woodcrest CPUs is available. To link with this library, use the following options:
** For C and Fortran: {{\-m64 \-xlic_lib=sunperf}}
** For C++: {{\-m64 \-library=sunperf}}
* Support for SPARC64-VI and SPARC64-VII CPUs is available. This version of Sun Performance Library uses the floating point multiply-add instruction to achieve the best performance possible on these CPUs. To link with this library, use the following options:
** For C and Fortran: {{\-xtarget=sparc64vi \-fma=fused \-xlic_lib=sunperf}}
** For C++: {{\-xtarget=sparc64vi \-fma=fused \-library=sunperf}}
----
h3. OpenMP 3.0
This Express release includes support for OpenMP 3.0 features in the C, C++, and Fortran compilers.
* Support for OpenMP 3.0 in this Express release includes a libmtsk library. OpenMP programs will link with this library by default instead of the libmtsk library in the Solaris OS.
* Tasking
* Loop collapse
* Runtime routines for nesting support
* Runtime routines for runtime schedule
* Environment variables OMP_STACKSIZE and OMP_WAIT_POLICY
* AUTO loop schedule
* Enhanced threadprivate support in C+\+
* Threadprivate static class member (C++)
* Unsigned int loop control variable (C and C++)
* New value for \_OPENMP macro (200805L)
For more information about these features, please refer to the !pointer.gif! [OpenMP Specification Version 3.0|http://openmp.org] and the !pointer.gif! [Sun Studio OpenMP Wiki|https://wikis.sun.com/display/openmp/Sun+Studio+OpenMP].
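As an illustration of two of the features listed above, tasking and loop collapse, the following is a minimal sketch rather than an excerpt from the Sun Studio documentation. The function names are hypothetical. When the code is compiled without OpenMP support, the pragmas are ignored and the functions run serially with the same results.

```c
/* Task-based Fibonacci: each recursive call becomes an OpenMP task. */
static long fib(int n)
{
    long a, b;

    if (n < 2)
        return n;
    #pragma omp task shared(a)
    a = fib(n - 1);
    #pragma omp task shared(b)
    b = fib(n - 2);
    /* Wait for both child tasks before combining their results. */
    #pragma omp taskwait
    return a + b;
}

/* Starts a parallel region in which one thread creates the root task. */
long parallel_fib(int n)
{
    long result = 0;

    #pragma omp parallel
    {
        #pragma omp single
        result = fib(n);
    }
    return result;
}

/* collapse(2) merges both loops into a single iteration space. */
long grid_sum(int rows, int cols)
{
    long total = 0;
    int i, j;

    #pragma omp parallel for collapse(2) reduction(+:total)
    for (i = 0; i < rows; i++)
        for (j = 0; j < cols; j++)
            total += (long)i * cols + j;
    return total;
}
```

Creating a task for every recursive call keeps the sketch short; practical code usually stops creating tasks below some problem size to limit overhead.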
----
h3. D-Light Tool
The objective of the D-Light tool is to make sophisticated application and system profiling accessible. There are many tools that profile applications and there are other tools that profile the system stack, but there are few tools that can join these views into an easy-to-use interface. For the first time, you can optimize your application and system environment by visualizing performance bottlenecks and resource contention up and down the application system stack.
Using an intuitive drag and drop interface, the D-Light tool provides an extensible library of instruments that represent the latest advances of profiling technology, including Solaris Dynamic Tracing (DTrace). With instruments like CPU accountant and Sampler, developers can use the interactive GUI to quickly profile and peer into the runtime behavior of their applications.
For more information on using the D-Light tool, refer to the Project D-Light Tutorial.
The D-Light Tool is now supported on Linux platforms for two instruments: Clock Profiler (based on Performance Analyzer) and Java Ticker.
----
h3. DTrace GUI Plug-in
The NetBeans DTrace GUI plug-in is a Graphical User Interface (GUI) for running DTrace scripts, even those that are embedded in shell scripts. In fact, the DTrace GUI plug-in runs all of the scripts that are packaged in the DTraceToolkit. The DTraceToolkit is a collection of useful documented scripts developed by the OpenSolaris DTrace community.
For documentation for the 0.2 version of the plugin that is included in this Express release, see [http://www.netbeans.org/kb/60/ide/NetBeans_DTrace_GUI_Plugin.html]
The 0.4 version of the plugin, which includes the Chime graphical tool for visualizing DTrace aggregations, is now available for download from the NetBeans Plugin Portal. To download and install this version, choose Tools->Plugins in the Sun Studio IDE, and select DTrace from the Available Plugins list.
----
h3. Automatic Tuning and Troubleshooting System (ats)
ats is a binary reoptimization and recompilation tool that can be used for tuning and troubleshooting applications. ats works by rebuilding the compiled PEC binary; the original source code is not required. Examples of what can be achieved using ats are:
* Find the compiler options that give the best performance
* Find the object file and the optimization flag that is causing a runtime problem
* Rebuild the application using new compiler options
There is an !pointer.gif! [ats(1) man page|https://wikis.sun.com/display/SunStudio/ats+man+page] and an !pointer.gif! [ATS Guide|https://wikis.sun.com/display/SunStudio/ATS+Guide].
In this Express release, ats is available for the Solaris OS on SPARC and x86/x64 platforms.
----
h3. Binary Improvement Tool (bit)
bit is a suite of tools for improving binaries. These tools are used via six subcommands:
* The instrument subcommand instruments a binary (the target) so that when the instrumented target is run, it creates an instrumentation data directory with information about the execution of the target.
* The analyze subcommand uses the instrumentation data to produce reports on instruction execution.
* The optimize subcommand uses the instrumentation data to optimize the target.
* The coverage subcommand uses the instrumentation data to produce a code coverage report.
* The collect subcommand combines an instrument subcommand, a target run, and an analyze subcommand.
* The check subcommand prints information about a target binary.
There is a !pointer.gif! [bit(1)|https://wikis.sun.com/display/SunStudio/bit+man+page] man page.
In this Express release, bit is only available for the Solaris OS on SPARC platforms.
----
h3. Discover
Sun Memory Error Discovery Tool (Discover) is a tool used to detect programming errors related to the allocation and use of program memory at runtime. Examples of errors detected by Discover include:
* Accessing uninitialized memory
* Reads from and writes to unallocated memory
* Accessing memory beyond allocated array bounds
* Use of freed memory
* Freeing wrong memory blocks
* Memory leaks
There is a !pointer.gif! [discover(1)|https://wikis.sun.com/display/SunStudio/discover+man+page] man page, and a !pointer.gif! [User Guide in PDF|https://wikis.sun.com/download/attachments/38211135/DISCOVER_users_guide.pdf].
In this Express release, Discover is available only for the Solaris OS on SPARC platforms.
----
h3. The Simple Performance Optimization Tool (SPOT)
The spot command runs a set of performance tools on the target application and renders the output as a set of hyperlinked web pages. The spot command can be used in two ways:
* Attaching to a running process and gathering data from the process using a variety of probes.
* Running an application multiple times, each time under a different probe.
There is a !pointer.gif! [spot(1)|spot man page] man page, and a User Guide on !pointer.gif! [docs.sun.com|http://docs.sun.com/app/docs/doc/820-5372]
In this Express release, the spot command is available only for the Solaris OS on SPARC platforms.
----
h3. SSSE3, SSE4.1, and SSE4.2 Intrinsics
h5. SSSE3
{noformat}
SSSE3 Assembler Syntax and Corresponding Compiler Intrinsics
PSIGNB, PSIGNW, PSIGND
- Syntax: psignb/psignw/psignd mem64/mmxreg, mmxreg
psignb/psignw/psignd mem128/xmmreg, xmmreg
- Semantic: Packed Sign
- Corresponding intrinsics:
extern __m64 _mm_sign_pi8 (__m64 p1, __m64 p2);
extern __m64 _mm_sign_pi16 (__m64 p1, __m64 p2);
extern __m64 _mm_sign_pi32 (__m64 p1, __m64 p2);
extern __m128i _mm_sign_epi8 (__m128i p1, __m128i p2);
extern __m128i _mm_sign_epi16 (__m128i p1, __m128i p2);
extern __m128i _mm_sign_epi32 (__m128i p1, __m128i p2);
PABSB, PABSW, PABSD
- Syntax: pabsb/pabsw/pabsd mem64/mmxreg, mmxreg
pabsb/pabsw/pabsd mem128/xmmreg, xmmreg
- Semantic: Packed Absolute Value
- Corresponding intrinsics:
extern __m64 _mm_abs_pi8 (__m64 p);
extern __m64 _mm_abs_pi16 (__m64 p);
extern __m64 _mm_abs_pi32 (__m64 p);
extern __m128i _mm_abs_epi8 (__m128i p);
extern __m128i _mm_abs_epi16 (__m128i p);
extern __m128i _mm_abs_epi32 (__m128i p);
PALIGNR
- Syntax: palignr imm, mem64/mmxreg, mmxreg
palignr imm, mem128/xmmreg, xmmreg
- Semantic: Packed Align Right
- Corresponding intrinsics:
extern __m64 _mm_alignr_pi8 (__m64 p1, __m64 p2, int immd);
extern __m128i _mm_alignr_epi8 (__m128i p1, __m128i p2, int immd);
PSHUFB
- Syntax: pshufb mem64/mmxreg, mmxreg
pshufb mem128/xmmreg, xmmreg
- Semantic: Packed Shuffle Bytes
- Corresponding intrinsics:
extern __m64 _mm_shuffle_pi8 (__m64 p1, __m64 p2);
extern __m128i _mm_shuffle_epi8 (__m128i p1, __m128i p2);
PMULHRSW
- Syntax: pmulhrsw mem64/mmxreg, mmxreg
pmulhrsw mem128/xmmreg, xmmreg
- Semantic: Packed Multiply High with Round and Scale
- Corresponding intrinsics:
extern __m64 _mm_mulhrs_pi16 (__m64 p1, __m64 p2);
extern __m128i _mm_mulhrs_epi16 (__m128i p1, __m128i p2);
PMADDUBSW
- Syntax: pmaddubsw mem64/mmxreg, mmxreg
pmaddubsw mem128/xmmreg, xmmreg
- Semantic: Multiply and Add Packed Signed and Unsigned Bytes
- Corresponding intrinsics:
extern __m64 _mm_maddubs_pi16 (__m64 p1, __m64 p2);
extern __m128i _mm_maddubs_epi16 (__m128i p1, __m128i p2);
PHSUBW, PHSUBD
- Syntax: phsubw/phsubd mem64/mmxreg, mmxreg
phsubw/phsubd mem128/xmmreg, xmmreg
- Semantic: Packed Horizontal Subtract
- Corresponding intrinsics:
extern __m64 _mm_hsub_pi16 (__m64 p1, __m64 p2);
extern __m64 _mm_hsub_pi32 (__m64 p1, __m64 p2);
extern __m128i _mm_hsub_epi16 (__m128i p1, __m128i p2);
extern __m128i _mm_hsub_epi32 (__m128i p1, __m128i p2);
PHSUBSW
- Syntax: phsubsw mem64/mmxreg, mmxreg
phsubsw mem128/xmmreg, xmmreg
- Semantic: Packed Horizontal Subtract and Saturate Words
- Corresponding intrinsics:
extern __m64 _mm_hsubs_pi16 (__m64 p1, __m64 p2);
extern __m128i _mm_hsubs_epi16 (__m128i p1, __m128i p2);
PHADDW, PHADDD
- Syntax: phaddw/phaddd mem64/mmxreg, mmxreg
phaddw/phaddd mem128/xmmreg, xmmreg
- Semantic: Packed Horizontal Add
- Corresponding intrinsics:
extern __m64 _mm_hadd_pi16 (__m64 p1, __m64 p2);
extern __m64 _mm_hadd_pi32 (__m64 p1, __m64 p2);
extern __m128i _mm_hadd_epi16 (__m128i p1, __m128i p2);
extern __m128i _mm_hadd_epi32 (__m128i p1, __m128i p2);
PHADDSW
- Syntax: phaddsw mem64/mmxreg, mmxreg
phaddsw mem128/xmmreg, xmmreg
- Semantic: Packed Horizontal Add and Saturate Words
- Corresponding intrinsics:
extern __m64 _mm_hadds_pi16 (__m64 p1, __m64 p2);
extern __m128i _mm_hadds_epi16 (__m128i p1, __m128i p2);
{noformat}
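To make the lane semantics of two of the SSSE3 operations listed above concrete, here is a minimal scalar sketch in portable C. It does not call the intrinsics themselves. The helper names {{model_sign_epi16}} and {{model_hadd_epi16}} are hypothetical and introduced only for illustration; they model the behavior of the 128-bit forms that correspond to {{_mm_sign_epi16}} and {{_mm_hadd_epi16}}, treating a register as eight 16-bit lanes.

```c
#include <stdint.h>

/* Scalar model of PSIGNW on one 16-bit lane (hypothetical helper):
   negate a when b is negative, zero it when b is zero, and pass it
   through unchanged when b is positive. The negation wraps, so
   -32768 stays -32768. */
int16_t model_sign_epi16(int16_t a, int16_t b)
{
    if (b < 0)
        return (int16_t)(uint16_t)(0u - (uint16_t)a);
    return (b == 0) ? 0 : a;
}

/* Scalar model of PHADDW on 128-bit operands (hypothetical helper),
   following the operand order of _mm_hadd_epi16(p1, p2): lanes 0-3
   receive the adjacent-pair sums of p1, lanes 4-7 those of p2.
   Sums wrap around; PHADDSW is the saturating variant. */
void model_hadd_epi16(const int16_t p1[8], const int16_t p2[8],
                      int16_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[i] = (int16_t)(uint16_t)((uint16_t)p1[2 * i] +
                                     (uint16_t)p1[2 * i + 1]);
        out[i + 4] = (int16_t)(uint16_t)((uint16_t)p2[2 * i] +
                                         (uint16_t)p2[2 * i + 1]);
    }
}
```

A model like this can serve as a reference when checking code that uses the real intrinsics, since it makes explicit which input lanes contribute to each output lane.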
h5. SSE4.1
{noformat}
SSE4.1 Assembler Syntax and Corresponding Compiler Intrinsics (Rev 1.0)
BLENDPD/BLENDPS
- Syntax: Blend packed double/single precision floating point values
blendpd/blendps $imm8, xmmreg/mem128, xmmreg
- Semantic: Copy elements from one location to another based on bits
of an immediate operand
- Corresponding intrinsics:
__m128d _mm_blend_pd(__m128d p1, __m128d p2, const int immd);
__m128 _mm_blend_ps(__m128 p1, __m128 p2, const int immd);
BLENDVPD/BLENDVPS
- Syntax: Variable blend double/single precision floating point values
blendvpd/blendvps xmmreg/mem128, xmmreg
blendvpd/blendvps XMMREG, xmmreg/mem128, xmmreg
- Semantic: Copy elements from one location to another based on bits
in register XMMREG
- Corresponding intrinsics:
__m128d _mm_blendv_pd(__m128d p1, __m128d p2, __m128d p3);
__m128 _mm_blendv_ps(__m128 p1, __m128 p2, __m128 p3);
DPPD/DPPS
- Syntax: Dot product of packed double/single precision floating
point values
dppd/dpps $imm8, xmmreg/mem128, xmmreg
- Semantic: Based on bits in the immediate operand to select which of the
entries in the input to multiply and accumulate, and to
select whether to put 0 or the dot-product in the correspondent
field of the result register
- Corresponding intrinsics:
__m128d _mm_dp_pd(__m128d p1, __m128d p2, const int immd);
__m128 _mm_dp_ps(__m128 p1, __m128 p2, const int immd);
EXTRACTPS
- Syntax: Extract packed single precision floating point value
extractps $imm8, xmmreg, reg32/mem32
extractps $imm8, xmmreg, reg64/mem64
- Semantic: Based on bits in the immediate operand to extract a field from
the source register and insert it into an x86 register or
memory address
- Corresponding intrinsics:
int _mm_extract_ps(__m128 p1, const int immd);
INSERTPS
- Syntax: Insert packed single precision floating point value
insertps $imm8, xmmreg/mem32, xmmreg
- Semantic: Load a floating point value from memory indicated by mem32
or based on bits in the immediate operand to select a single
precision floating point value from the source xmmreg and
insert it into the destination register also based on the
bits of the immediate operand
- Corresponding intrinsics:
__m128 _mm_insert_ps(__m128 p1, __m128 p2, const int immd);
MOVNTDQA
- Syntax: Load 16 bytes with non-temporal Aligned Hint
movntdqa mem128, xmmreg
- Semantic: Load from write-combining memory area into xmm register
- Corresponding intrinsics:
__m128i _mm_stream_load_si128(__m128i *p);
MPSADBW
- Syntax: Calculate multiple packed sums of absolute difference
mpsadbw $imm8, xmmreg/mem128, xmmreg
- Semantic: Based on bits in the immediate operand to select the
destination and source fields to be used, compute
eight offset sums of absolute differences for
(|x0-y0|+|x1-y1|+|x2-y2|+...)
- Corresponding intrinsics:
__m128i _mm_mpsadbw_epu8(__m128i p1, __m128i p2, const int immd);
PACKUSDW
- Syntax: Pack with Unsigned Saturation
packusdw xmmreg/mem128, xmmreg
- Semantic: Convert signed 4 bytes in source and destination operands
into unsigned 2 bytes with saturation
- Corresponding intrinsics:
__m128i _mm_packus_epi32(__m128i p1, __m128i p2);
PBLENDW
- Syntax: Blend packed 16-bit words
pblendw $imm8, xmmreg/mem128, xmmreg
- Semantic: Based on bits in the immediate operand, select 16-bit
values from the second and destination operands to be
stored into the destination operand
- Corresponding intrinsics:
__m128i _mm_blend_epi16(__m128i p1, __m128i p2, const int p3);
PCMPEQQ
- Syntax: Compare packed 64-bit values for equality
pcmpeqq xmmreg/mem128, xmmreg
- Semantic: Compare packed 64-bit values in source and destination
operand for equality. Set all 0s or all 1s in destination
register as result
- Corresponding intrinsics:
__m128i _mm_cmpeq_epi64(__m128i p1, __m128i p2);
PEXTRB/PEXTRW/PEXTRD/PEXTRQ
- Syntax: Extract byte/16-bit value/32-bit value/64-bit value
pextrb $imm8, xmmreg, reg32/mem8
pextrb $imm8, xmmreg, reg64/mem8
pextrw $imm8, xmmreg, reg32/mem16
pextrw $imm8, xmmreg, reg64/mem16
pextrd $imm8, xmmreg, reg32/mem32
pextrq $imm8, xmmreg, reg64/mem64
- Semantic: Based on bits in the immediate operand to select and
extract a 8/16/32/64-bit value from the xmmreg and
store into the destination operand
- Corresponding intrinsics:
int _mm_extract_epi8(__m128i p1, const int immd);
int _mm_extract_epi16(__m128i p1, const int immd);
int _mm_extract_epi32(__m128i p1, const int immd);
long long _mm_extract_epi64(__m128i p1, const int immd);
PHMINPOSUW
- Syntax: Packed horizontal 16-bit value minimum
phminposuw xmmreg/mem128, xmmreg
- Semantic: Find the minimum unsigned 16-bit value in the source operand
and place the value and its index in the destination register
- Corresponding intrinsics:
__m128i _mm_minpos_epu16(__m128i p1);
PINSRB/PINSRD/PINSRQ
- Syntax: Insert byte, 32-bit value, 64-bit value
pinsrb $imm8, reg32/mem8, xmmreg
pinsrd $imm8, reg32/mem32, xmmreg
pinsrq $imm8, reg64/mem64, xmmreg
- Semantic: Based on the bits in the immediate operand to insert
the byte/32-bit/64-bit value from the source operand into
the destination xmm register
- Corresponding intrinsics:
__m128i _mm_insert_epi8(__m128i p1, int p2, const int immd);
__m128i _mm_insert_epi32(__m128i p1, int p2, const int immd);
__m128i _mm_insert_epi64(__m128i p1, long long p2, const int immd);
PMAXSB/PMAXSD
- Syntax: Maximum of packed signed byte/32-bit integers
pmaxsb xmmreg/mem128, xmmreg
pmaxsd xmmreg/mem128, xmmreg
- Semantic: Compare the packed signed byte/32-bit values in the 2
operands and store the maximum packed values in the destination
register
- Corresponding intrinsics:
__m128i _mm_max_epi8(__m128i p1, __m128i p2);
__m128i _mm_max_epi32(__m128i p1, __m128i p2);
PMAXUW/PMAXUD
- Syntax: Maximum of packed unsigned 16-bit/32-bit integers
pmaxuw xmmreg/mem128, xmmreg
pmaxud xmmreg/mem128, xmmreg
- Semantic: Compare the packed unsigned 16-bit/32-bit values in the
2 operands and store the maximum packed values in the
destination register
- Corresponding intrinsics:
__m128i _mm_max_epu16(__m128i p1, __m128i p2);
__m128i _mm_max_epu32(__m128i p1, __m128i p2);
PMINSB/PMINSD
- Syntax: Minimum of packed signed byte/32-bit integers
pminsb xmmreg/mem128, xmmreg
pminsd xmmreg/mem128, xmmreg
- Semantic: Compare the packed signed byte/32-bit values in the
2 operands and store the minimum packed values in the
destination register
- Corresponding intrinsics:
__m128i _mm_min_epi8(__m128i p1, __m128i p2);
__m128i _mm_min_epi32(__m128i p1, __m128i p2);
PMINUW/PMINUD
- Syntax: Minimum of packed unsigned 16-bit/32-bit integers
pminuw xmmreg/mem128, xmmreg
pminud xmmreg/mem128, xmmreg
- Semantic: Compare the packed unsigned 16-bit/32-bit values in the 2 operands
and store the minimum packed values in the destination
register
- Corresponding intrinsics:
__m128i _mm_min_epu32(__m128i p1, __m128i p2);
__m128i _mm_min_epu16(__m128i p1, __m128i p2);
PMOVSXBW/PMOVSXBD/PMOVSXBQ/PMOVSXWD/PMOVSXWQ/PMOVSXDQ
- Syntax: Move packed values with sign extension
pmovsxbw xmmreg/mem64, xmmreg
pmovsxbd xmmreg/mem32, xmmreg
pmovsxbq xmmreg/mem16, xmmreg
pmovsxwd xmmreg/mem64, xmmreg
pmovsxwq xmmreg/mem32, xmmreg
pmovsxdq xmmreg/mem64, xmmreg
- Semantic: Sign extend 8/4/2 packed 8-bit values or 4/2 packed 16-bit
values or 2 packed 32-bit values in the source operand and
move it into 8/4/2 packed 16-bit/32-bit/64-bit values or
4/2 packed 32-bit/64-bit values or 2 packed 64-bit values
in the destination register respectively
- Corresponding intrinsics:
__m128i _mm_cvtepi8_epi16(__m128i p1);
__m128i _mm_cvtepi8_epi32(__m128i p1);
__m128i _mm_cvtepi8_epi64(__m128i p1);
__m128i _mm_cvtepi16_epi32(__m128i p1);
__m128i _mm_cvtepi16_epi64(__m128i p1);
__m128i _mm_cvtepi32_epi64(__m128i p1);
PMOVZXBW/PMOVZXBD/PMOVZXBQ/PMOVZXWD/PMOVZXWQ/PMOVZXDQ
- Syntax: Move packed values with zero extension
pmovzxbw xmmreg/mem64, xmmreg
pmovzxbd xmmreg/mem32, xmmreg
pmovzxbq xmmreg/mem16, xmmreg
pmovzxwd xmmreg/mem64, xmmreg
pmovzxwq xmmreg/mem32, xmmreg
pmovzxdq xmmreg/mem64, xmmreg
- Semantic: Zero extend 8/4/2 packed 8-bit values or 4/2 packed 16-bit
values or 2 packed 32-bit values in the source operand and
move it into 8/4/2 packed 16-bit/32-bit/64-bit values or
4/2 packed 32-bit/64-bit values or 2 packed 64-bit values
in the destination register respectively
- Corresponding intrinsics:
__m128i _mm_cvtepu8_epi16(__m128i p1);
__m128i _mm_cvtepu8_epi32(__m128i p1);
__m128i _mm_cvtepu8_epi64(__m128i p1);
__m128i _mm_cvtepu16_epi32(__m128i p1);
__m128i _mm_cvtepu16_epi64(__m128i p1);
__m128i _mm_cvtepu32_epi64(__m128i p1);
PMULLD/PMULDQ
- Syntax: Multiply packed signed 32-bit integers
pmulld xmmreg/mem128, xmmreg
pmuldq xmmreg/mem128, xmmreg
- Semantic: Multiply the packed signed 32-bit values in the 2
operands and store the low 32 bits of each product (pmulld), or
multiply the first and third signed 32-bit values and store the
two 64-bit products (pmuldq), in the destination register
- Corresponding intrinsics:
__m128i _mm_mullo_epi32(__m128i p1, __m128i p2);
__m128i _mm_mul_epi32(__m128i p1, __m128i p2);
PTEST
- Syntax: Logical compare
ptest xmmreg/mem128, xmmreg
- Semantic: Set the Z flag if the bitwise AND of the 2 operands is all 0s,
and the C flag if the bitwise AND of the source operand with the
complement of the destination operand is all 0s.
- Corresponding intrinsics:
int _mm_testz_si128(__m128i p1, __m128i p2);
int _mm_testc_si128(__m128i p1, __m128i p2);
int _mm_testnzc_si128(__m128i p1, __m128i p2);
ROUNDPS/ROUNDPD
- Syntax: Round packed single/double precision floating point values
roundps $imm8, xmmreg/mem128, xmmreg
roundpd $imm8, xmmreg/mem128, xmmreg
- Semantic: Based on the rounding mode in the immediate operand, round
the single/double precision packed values in the source
operand and place them in the destination register
- Corresponding intrinsics:
__m128 _mm_round_ps(__m128 p1, int immd);
__m128 _mm_floor_ps(__m128 p1);
__m128 _mm_ceil_ps(__m128 p1);
__m128d _mm_round_pd(__m128d p1, int immd);
__m128d _mm_floor_pd(__m128d p1);
__m128d _mm_ceil_pd(__m128d p1);
ROUNDSS/ROUNDSD
- Syntax: Round scalar single/double precision floating point values
roundss $imm8, xmmreg/mem32, xmmreg
roundsd $imm8, xmmreg/mem64, xmmreg
- Semantic: Based on the rounding mode in the immediate operand, round
the single/double precision scalar low value in the source
operand and place it in the destination register
- Corresponding intrinsics:
__m128 _mm_round_ss(__m128 p1, __m128 p2, int immd);
__m128 _mm_floor_ss(__m128 p1, __m128 p2);
__m128 _mm_ceil_ss(__m128 p1, __m128 p2);
__m128d _mm_round_sd(__m128d p1, __m128d p2, int immd);
__m128d _mm_floor_sd(__m128d p1, __m128d p2);
__m128d _mm_ceil_sd(__m128d p1, __m128d p2);
{noformat}
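The three PTEST intrinsics listed above differ only in which flag they report. The following is a minimal scalar sketch in portable C of how those results are derived. It does not use the intrinsics; the {{u128_model}} type and the {{model_test*}} function names are hypothetical stand-ins introduced for illustration, with a 128-bit value represented as two 64-bit halves.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit register value. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_model;

/* Models _mm_testz_si128(a, b): returns the Z flag, which is 1 when
   (a AND b) has no bit set. */
int model_testz(u128_model a, u128_model b)
{
    return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

/* Models _mm_testc_si128(a, b): returns the C flag, which is 1 when
   ((NOT a) AND b) has no bit set, that is, when every bit set in b
   is also set in a. */
int model_testc(u128_model a, u128_model b)
{
    return ((~a.lo & b.lo) | (~a.hi & b.hi)) == 0;
}

/* Models _mm_testnzc_si128(a, b): returns 1 only when both the Z flag
   and the C flag are clear. */
int model_testnzc(u128_model a, u128_model b)
{
    return !model_testz(a, b) && !model_testc(a, b);
}
```

In this model, {{model_testz}} answers "do the operands share no set bits?" and {{model_testc}} answers "is b a subset of a?", which is how the instruction is typically used for mask checks.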
h5. SSE4.2
{noformat}
SSE4.2 Assembler Syntax and Corresponding Compiler Intrinsics
CRC32
- Syntax: Accumulate CRC32 value
crc32 reg8/reg16/reg32/mem8/mem16/mem32, reg32
crc32 reg8/reg64/mem8/mem64, reg64
crc32b reg8/mem8, reg32
crc32b reg8/mem8, reg64
crc32w reg16/mem16, reg32
crc32l reg32/mem32, reg32
crc32q reg64/mem64, reg64
- Semantic: Accumulate CRC32C value using the polynomial 0x11edc6f41
- Corresponding intrinsics:
unsigned int _mm_crc32_u8(unsigned int crc, unsigned char data);
unsigned int _mm_crc32_u16(unsigned int crc, unsigned short data);
unsigned int _mm_crc32_u32(unsigned int crc, unsigned int data);
unsigned long long _mm_crc32_u64(unsigned long long crc,
unsigned long long data);
PCMPESTRI
- Syntax: Packed compare explicit length strings, return index
pcmpestri $imm8, xmmreg/mem128, xmmreg
- Semantic: Perform packed comparison of string data with explicit lengths,
generate an index stored to %ecx
- Corresponding intrinsics:
int _mm_cmpestri(__m128i a, int len_a, __m128i b, int len_b, const int imm);
int _mm_cmpestra(__m128i a, int len_a, __m128i b, int len_b, const int imm);
int _mm_cmpestrc(__m128i a, int len_a, __m128i b, int len_b, const int imm);
int _mm_cmpestro(__m128i a, int len_a, __m128i b, int len_b, const int imm);
int _mm_cmpestrs(__m128i a, int len_a, __m128i b, int len_b, const int imm);
int _mm_cmpestrz(__m128i a, int len_a, __m128i b, int len_b, const int imm);
PCMPESTRM
- Syntax: Packed compare explicit length strings, return mask
pcmpestrm $imm8, xmmreg/mem128, xmmreg
- Semantic: Perform packed comparison of string data with explicit lengths,
generate a mask stored to %xmm0
- Corresponding intrinsics:
__m128i _mm_cmpestrm(__m128i a, int len_a, __m128i b, int len_b, const int imm);
int _mm_cmpestra(__m128i a, int len_a, __m128i b, int len_b, const int imm);
int _mm_cmpestrc(__m128i a, int len_a, __m128i b, int len_b, const int imm);
int _mm_cmpestro(__m128i a, int len_a, __m128i b, int len_b, const int imm);
int _mm_cmpestrs(__m128i a, int len_a, __m128i b, int len_b, const int imm);
int _mm_cmpestrz(__m128i a, int len_a, __m128i b, int len_b, const int imm);
PCMPISTRI
- Syntax: Packed compare implicit length strings, return index
pcmpistri $imm8, xmmreg/mem128, xmmreg
- Semantic: Perform packed comparison of string data with implicit lengths,
generate an index stored to %ecx
- Corresponding intrinsics:
int _mm_cmpistri(__m128i a, __m128i b, const int imm);
int _mm_cmpistra(__m128i a, __m128i b, const int imm);
int _mm_cmpistrc(__m128i a, __m128i b, const int imm);
int _mm_cmpistro(__m128i a, __m128i b, const int imm);
int _mm_cmpistrs(__m128i a, __m128i b, const int imm);
int _mm_cmpistrz(__m128i a, __m128i b, const int imm);
PCMPISTRM
- Syntax: Packed compare implicit length strings, return mask
pcmpistrm $imm8, xmmreg/mem128, xmmreg
- Semantic: Perform packed comparison of string data with implicit lengths,
generate a mask stored to %xmm0
- Corresponding intrinsics:
__m128i _mm_cmpistrm(__m128i a, __m128i b, const int imm);
int _mm_cmpistra(__m128i a, __m128i b, const int imm);
int _mm_cmpistrc(__m128i a, __m128i b, const int imm);
int _mm_cmpistro(__m128i a, __m128i b, const int imm);
int _mm_cmpistrs(__m128i a, __m128i b, const int imm);
int _mm_cmpistrz(__m128i a, __m128i b, const int imm);
PCMPGTQ
- Syntax: Compare packed data for greater than
pcmpgtq xmmreg/mem128, xmmreg
- Semantic: Compare packed 64-bit values in xmmreg/mem128 with xmmreg. Set
corresponding data in destination register to all 1s or 0s
based on the result of the greater than compare.
- Corresponding intrinsics:
__m128i _mm_cmpgt_epi64(__m128i a, __m128i b);
POPCNT
- Syntax: Population count
popcnt reg16/mem16, reg16
popcnt reg32/mem32, reg32
popcnt reg64/mem64, reg64
- Semantic: Count the number of set bits in reg/mem
- Corresponding intrinsics:
int _mm_popcnt_u32(unsigned int a);
long long _mm_popcnt_u64(unsigned long long a);
{noformat}
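As an illustration of what the CRC32 accumulate step computes, here is a minimal bit-by-bit software sketch in portable C. It does not use the {{_mm_crc32_u8}} intrinsic; {{model_crc32_u8}} and {{crc32c}} are hypothetical helper names, and the all-ones initial value with final inversion in {{crc32c}} is a common CRC32C convention assumed here, not something the instruction applies on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold one byte into a running CRC32C value, one bit at a time
   (hypothetical helper modeling the byte form of the accumulate step).
   0x82F63B78 is the bit-reflected form of the CRC32C (Castagnoli)
   polynomial 0x11EDC6F41. */
uint32_t model_crc32_u8(uint32_t crc, uint8_t data)
{
    crc ^= data;
    for (int bit = 0; bit < 8; bit++)
        crc = (crc >> 1) ^ ((crc & 1u) ? 0x82F63B78u : 0u);
    return crc;
}

/* CRC32C of a buffer under the assumed convention: start from all
   ones, accumulate every byte, then invert the result. */
uint32_t crc32c(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;

    while (len--)
        crc = model_crc32_u8(crc, *p++);
    return crc ^ 0xFFFFFFFFu;
}
```

Under that convention, {{crc32c("123456789", 9)}} yields the standard CRC32C check value 0xE3069283, which gives a quick way to validate a hardware-accelerated version against this reference.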
----
h3. Assembler on x86 Platforms
Two new options:
* \-C: In general you do not need \-C to be GNU Assembler compatible; \-C is needed only in a few situations to make the semantics compatible. Refer to the changes below.
* \-a32: To allow 32-bit memory addresses in \-m64 64-bit mode.
Major areas of change:
* For mnemonics without a suffix, the presence of a register operand determines the suffix implicitly. If the size of the operation cannot be determined because there is no register operand, an error is issued if the \-C option was used; otherwise the suffix defaults to 'l'. For example:
{{mov $10, %ax}} can now be used for {{movw $10, %ax}}
For {{mov $10, mem}} the Sun Studio Assembler defaults to {{movl $10, mem}}
but gives an error if \-C is used, to be compatible with the GNU Assembler.
* All 16-bit instructions now accept 32-bit register operands, but a warning is issued if the \-C option is used.
* 32-bit addresses are now allowed in 64-bit mode with the new \-a32 option. For example, the command {{fbe \-m64 \-a32 file.s}} can assemble {{lea 123(%eax,%r10d),%eax}}
* A program can now have more than 10 local labels, and a local label can now be as large as a 32-bit integer.
* You can now place the lock/rep/repnz/repz/repe/repne prefix on the same line as the following instruction.
* GNU Assembler compatible instruction synonyms have been added:
cbw==cbtw, cwd==cwtd, cwde==cwtl, cdq==cltd, cdqe==cltq, cqo==cqto,
movzb==movzbl, sysret==sysretl
* GNU Assembler compatible assembler directives have been added: .p2align, .extern, .global.
* The \-b option, which generates extra symbol table information for the SourceBrowser, is now obsolete.
----
{panel}
{column}
{column:width=30%}
{panel}
h5. Contents
{toc:maxLevel=2|minLevel=1}
{panel}
{column}
{section}
----
This page gives greater detail on some of the new features in the current Express release of Sun Studio compilers and tools. Note that some of these features might not yet be documented in the Sun Studio man pages in this build.
Some of these features are experimental and might not be available in future releases, while some of these features might change significantly in future releases. The documentation is also preliminary and might not reflect the full range of functionality or problems and workarounds. (See the !pointer.gif! [Known Problems and Workarounds page|Sun Studio Express 11.08 Known Problems and Workarounds] for the latest updated news about this express release. )
{panel}
{section}
{column}
----
h1. Details on Sun Studio Express Releases
{panel}
h2. Sun Studio November 2008 Express Release
The following are new features in the November 2008 Express release. More information can be found on the [Sun Studio Express November 2008 Release] wiki page.
h3. IDE
* Based on NetBeans IDE 6.5
* The new Memory window displays the contents of memory addresses currently used by the process being debugged.
* The new Call Graph window displays a tree view of either the functions called from a selected function, or the functions that call that function.
* Syntactic and semantic errors are highlighted as you type code.
* You can package completed applications as tar files, zip files, SVR4 packages, RPMs, or Debian packages.
* You can define remote hosts and use development tools on those hosts to build and run projects from your client system.
----
h3. Sun Performance Library
* ZGEMM improvements for SPARC64-VI and SPARC64-VII
----
h3. OpenMP 3.0
Additional functionality has been added to complete the implementation of the OpenMP 3.0 API specifications in the C, C++, and Fortran compilers. For more information about these features, please refer to the !pointer.gif! [OpenMP Specification Version 3.0|http://openmp.org] and the !pointer.gif! [Sun Studio OpenMP Wiki|https://wikis.sun.com/display/openmp/Sun+Studio+OpenMP].
----
h3. dbx
A new graphical user interface (GUI) for dbx, *dbxtool*, is included in this November 2008 Express release. For information on invoking dbxtool, see the {{dbxtool}} man page. dbxtool is a separate GUI from the Sun Studio IDE, but is also based on NetBeans IDE 6.5. {{dbxtool}} provides access to all the functionality of dbx. It also supports attaching to a process as it starts executing to begin debugging it immediately (see the {{ss_attach}} man page), and fixing and continuing, which lets you relink source files after you make changes, without recompiling the entire program.
----
h3. Performance Analyzer
MPI analysis has been tested with the following:
* Sun HPC ClusterTools 8.0 and 8.1 with:
** Sun Studio compilers on Solaris 10 11/06 Operating System or later
** gcc on Red Hat Enterprise Linux 5 or later
** gcc on SuSE Linux Enterprise Server 10 or later
* Sun HPC ClusterTools 7.0 and 7.1 with:
** Sun Studio compilers on Solaris 10 11/06 Operating System or later
** no Linux distributions
* Open MPI 1.2.7, MPICH2 1.0.7, and MVAPICH2 1.2rc2 built using shared libraries with:
** gcc on Solaris 10 11/06 or later
** gcc on
*** SuSE Linux Enterprise Server 10 or later
*** Red Hat Enterprise Linux 5 or later
Note that for these non-Sun MPI implementations, you must build the MPI distribution and your MPI application with the same compiler and shared libraries for successful collection of data for the MPI calls.
----
h3. DTrace GUI Plug-in
The NetBeans DTrace GUI plug-in, version 0.4, is a Graphical User Interface (GUI) for running DTrace scripts, even those that are embedded in shell scripts. In fact, the DTrace GUI plug-in runs all of the scripts that are packaged in the DTraceToolkit. The DTraceToolkit is a collection of useful documented scripts developed by the OpenSolaris DTrace community.
For documentation for the 0.4 version of the plugin that is included in this Express release, see [http://www.netbeans.org/kb/docs/ide/NetBeans_DTrace_GUI_Plugin_0_4.html]
{panel}
----
{panel}
h2. Sun Studio July 2008 Express Release
The following features and changes were introduced in the July 2008 Express release and are carried over into the current release.
----
h3. Performance Analyzer
The Performance Analyzer includes major enhancements for analyzing MPI programs:
* The collect command has a new {{\-M on}} option that is used to collect data on MPI programs. The target program should be {{mpirun}}, from Sun HPC ClusterTools 8, although {{mpirun}} from other Open MPI implementations might also work.
* The Analyzer GUI includes the following new features:
** MPI Timeline, which shows a view of an MPI program's data over time. The MPI Timeline tab shows a set of bars, one for each MPI rank, indicating when each process is in user code and when it is in MPI code, and displaying messages sent among the processes.
** MPI Charts, which you can use to generate different displays of the data to analyze. The MPI Charts tab shows a variety of one\- and two-dimensional plots of data and other aggregated data concerning MPI processing.
** Zooming and filtering capabilities to focus on particular aspects of the data in the MPI Timeline and MPI Charts tabs.
** With applications created using Sun HPC ClusterTools 8 Early Access 2, clock-profiling MPI experiments show two new metrics, MPI Work Time and MPI Wait Time.
You can find details about how to use the new features in the Sun Studio Performance Analyzer for MPI programs on the !pointer.gif! [MPI Analyzer wiki page|https://wikis.sun.com/display/MPIAnalyzer].
h4. Experiment Format
The experiment format has been changed:
* To support MPI experiments. An MPI experiment consists of a founder experiment with sub-experiments for the processes instantiated for each rank of the MPI job. The founder experiment does not have real data on the mpirun executable; rather, it contains the aggregated trace of API calls across all the processes as generated with the open source Vampir Trace library.
* To support MPI Work metrics and MPI Wait metrics from Sun HPC ClusterTools 8 Early Access 2.
h4. Other changes to the collect command
* The \-I and \-N options are accepted when \-c (count) profiling is specified, to better control which libraries are instrumented and where the instrumented libraries are put.
* The \-J option for Java profiling can support multiple arguments to the Java(TM) Virtual Machine (JVM) as a quoted string containing blank- or tab-separated arguments.
* Hardware counter profiling has better support for Intel Core2 and AMD Family 10h processors.
* The Sun Studio software installation no longer includes a Java(TM) 2 Software Development Kit (JDK). The {{collect}} command looks in the location specified by the JDK_HOME environment variable, the location specified by the JAVA_PATH environment variable, and /usr/java for a JVM to use in profiling a jar or class file.
h4. collector Command in dbx Debugger
The collector command in the dbx debugger has better support for Intel Core2 and AMD Family 10h processors.
h4. Other changes to the Performance Analyzer
* The Show/Hide dialog box now supports a third option, API-only, which truncates all call stacks at the first function from a shared object that has the option set, effectively suppressing all detail below that API function. The dialog also has a button to reset the defaults for all shared objects.
* The tabs directive in an .er.rc file now specifies tab order as well as tab visibility.
h4. er_print Command
* A new object_api command enables API-only processing of call stacks from the named objects, which truncates all call stacks at the first function from a shared object that has the option set, effectively suppressing all detail below that API function.
* You can include the object_show, object_hide, and object_api commands in .er.rc files.
* A new objects_default command resets the default for all shared objects. You cannot include this command in .er.rc files.
* The tabs directive in an .er.rc file now specifies tab order as well as tab visibility.
* With Sun HPC ClusterTools 8 Early Access 2, clock-profiling MPI experiments show two new metrics, MPI Work Time and MPI Wait Time.
* Compiler commentary controls, the cc, scc, and dcc commands, have been extended to allow control over whether the compiler options used to build the object are shown at the bottom of the source display.
* er_print has been ported to 64 bits, and the 64-bit version runs on any system capable of supporting it. This enables er_print and Analyzer to read much larger experiments on such systems.
----
h3. Thread Analyzer
Two new interfaces have been added to the libtha API:
* tha_check_datarace_mem() instructs the Thread Analyzer to monitor or ignore accesses to a specified block of memory when doing data race detection.
* tha_check_datarace_thr() instructs the Thread Analyzer to monitor or ignore memory accesses by one or more specified threads when doing data race detection.
----
h3. IDE
The IDE in this Sun Studio Express release is based on NetBeans IDE 6.1, and includes the following new features:
* The Include-Hierarchy window lets you inspect the hierarchy of source and header files
* The Type-Hierarchy window lets you inspect all supertypes and subtypes of a class
* A new toolbar button lets you toggle between corresponding source and header files
* Code completion now works for #include directives
* A new Go to Type menu item lets you find a type (class, struct, enum, or typedef) by its name or prefix
* A new Go to Include menu item lets you go directly to a file that is included in a source or header file
* A new Go to Function or Variable menu item lets you find a function or variable by its name or prefix
* Project dependencies can be created for projects from existing code
* A choice of formatting styles for your source code
* A new Threads window shows you all the threads in the current debugging session
* A new Disassembler window displays the assembly instructions for the current source file
* A new Usages window shows you everywhere a class (structure), function, variable, macro, or file is used in your project's source code
h4. {{sunstudio}} Command Options
The following changes have been made to the options of the {{sunstudio}} command:
* The {{\--enablejava}} and {{\--disablejava}} options have been removed. You can now use the Plugin Manager (Tools > Plugins) to enable and disable the Java plugins.
* The format of the {{\--ui-classpath}} _path_ option, which appends the specified path to the IDE's classpath, has changed to {{\--cp:a}} _path_.
----
h3. C, C+\+ and Fortran compilers
* Option changes:
** The \-xarch=ssse3 option adds the SSSE3 instruction set to the SSE3 instruction set.
** The \-xarch=sse4_1 option compiles for the SSE4.1 ISA.
** The \-xarch=sse4_2 option compiles for the SSE4.2 ISA.
** The \-xarch=sparcima option compiles for the sparcima version of the SPARC-V9 ISA. This option enables the compiler to use instructions from the SPARC-V9 instruction set, plus the UltraSPARC extensions, including the Visual Instruction Set (VIS) version 1.0; the UltraSPARC-III extensions, including the Visual Instruction Set version 2.0; the SPARC64 VI extensions for floating-point multiply-add; and the SPARC64 VII extensions for integer multiply-add.
** The \-xchip=core2 option optimizes for the core2 processor.
** The \-xchip=sparc64vii option optimizes for the SPARC64 VII processor.
** The \-xchip=penryn option optimizes for the Intel Penryn processor.
** The \-xchip=nehalem option optimizes for the Intel Nehalem processor.
** The \-xcrossfile=1 option becomes an alias of the \-xipo=1 option. \-xcrossfile=0 no longer has any effect. Specifically, \-xcrossfile=1 \-xcrossfile=0 results in \-xipo=1.
** The {{\-xpec}}\[{{=yes}}\|{{no}}\] option generates a PEC binary that is recompilable for use with the Automatic Tuning System (ATS). This option is not supported on the Linux operating system.
** The \-xtarget=woodcrest option expands to \-xarch=ssse3 \-xchip=core2 \-xcache=32/64/8:4096/64/16.
** The \-xtarget=sparc64vii option expands to \-xarch=sparcima \-xchip=sparc64vii \-xcache=64/64/2:5120/256/10.
** The \-xtarget=penryn option expands to \-xarch=sse4_1 \-xchip=penryn \-xcache=32/64/8:6144/64/24.
** The \-xtarget=nehalem option expands to \-xarch=sse4_2 \-xchip=nehalem \-xcache=generic.
** The \-Y option doesn't accept i as an argument.
** On SPARC® platforms, the \-xdepend option is now implicitly enabled for optimization levels \-xO3 or higher, and is no longer included in the expansion of the \-fast option. An explicit use of the \-xdepend option always has precedence over an implicit enabling of the \-xdepend option.
* Support for OpenMP 3.0 in this Express release includes a libmtsk library. OpenMP programs will link with this library by default instead of the libmtsk library in the Solaris OS.
* Optimization using {{\-xannotate}}\[{{=yes}}\|{{no}}\]
(SPARC platforms only) Instructs the compiler to create binaries that can later be transformed by binary modification tools like binopt(1). Future binary analysis, code coverage, and memory error detection tools will also work with binaries built with this option.
Use the {{\-xannotate=no}} option to prevent the modification of the binary file by these tools.
The {{\-xannotate=yes}} option must be used with optimization level {{\-xO1}} or higher to be effective, and is effective only on systems with the new linker support library interface, {{ld_open()}}. The new compiler support library libld_annotate.so uses this new interface. If the compiler is used on a system without this linker interface (for example, Solaris OS 9), it silently reverts to {{\-xannotate=no}}. The new linker interface is provided by the fix to bug 6479848. This fix is available in Solaris patch 127111-07 and in current versions of OpenSolaris.
The default is {{\-xannotate=yes}}, but if either of the above conditions is not met, the default reverts to {{\-xannotate=no}}.
* See Also [#SSSE3, SSE4.1, and SSE4.2 Intrinsics].
----
h3. C compiler
* Option changes:
** The {{\-W}} option doesn't accept {{i}} as an argument.
** The {{\-xsb}} and {{\-xsbfast}} options are obsolete and have been removed.
** A new flag, \[{{no%}}\]{{init_local}}, has been added to the {{\-xcheck}} option. If you do not specify the {{\-xcheck}} option, the default is {{no%init_local}}, which means local variables will not be initialized. If you specify the {{\-xcheck}} option without a value for this flag, the default is {{init_local}}, meaning that the compiler will generate code to set local variables to values that are likely to cause exceptions if a variable is used before it is assigned. This option does not affect global or static variables.
* {{\__FUNCTION_\_}} is a predefined identifier that contains the name of the lexically-enclosing function. It is functionally equivalent to the C99 predefined identifier, {{\__func_\_}}. On Solaris platforms, {{\__FUNCTION_\_}} is not available in \-Xs and \-Xc modes.
* In standard C, a case label in a switch statement can have only one associated value. The Sun Studio C compiler allows an extension, found in some compilers, known as case ranges. A case range specifies a range of values to associate with an individual case label. The syntax of a case range is: {{case low ... high :}}
A case range behaves exactly as if a case label had been specified for each value in the given range from low to high inclusive. If low and high are equal, the case range specifies only the one value. The lower and upper values must conform to the requirements of the C standard; that is, they must be valid integer constant expressions (C standard 6.8.4.2). You can freely intermix case ranges and case labels, and you can specify multiple case ranges within a switch statement.
The following code is a programming example of a case range:
{code}
enum kind { alpha, number, white, other };
enum kind char_class(char c)
{
enum kind result;
switch(c) {
case 'a' ... 'z':
case 'A' ... 'Z':
result = alpha;
break;
case '0' ... '9':
result = number;
break;
case ' ':
case '\n':
case '\t':
case '\r':
case '\v':
result = white;
break;
default:
result = other;
break;
}
return result;
}
{code}
If an endpoint of a case range is a numeric literal, leave white space around the ellipsis (...) to avoid having one of the dots treated as a decimal point. For example:
{code}
case 0...4: // error
case 5 ... 9: // ok
{code}
* The second operand in a conditional expression can be omitted. If the first operand is then non-zero, the value of the conditional expression is that of the first operand. For example, in the following expression, if x is non-zero, then the value of the expression is x. Otherwise, the value is {{y}}.
{code}
x ? : y
{code}
The expression is equivalent to the following, except that x is evaluated only once:
{code}
x ? x : y
{code}
By omitting the second operand, the already computed value of the first operand is reused. The omission of the second operand is a gcc extension to the C language, which is now supported by the Sun Studio C compiler.
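The single-evaluation behavior is easiest to see when the first operand has a side effect. The following is a minimal sketch, assuming a compiler that accepts the omitted-operand extension; the names {{calls}}, {{next_id}}, and {{id_or_default}} are hypothetical and exist only for this illustration.

```c
/* Counts how many times next_id() has been evaluated (illustrative only). */
static int calls = 0;

/* Hypothetical helper with a side effect: each call bumps the counter. */
static int next_id(int seed)
{
    calls++;
    return seed;
}

/* With the second operand omitted, next_id() is evaluated exactly once.
   Writing next_id(seed) ? next_id(seed) : fallback would call it twice
   whenever the first call returns a non-zero value. */
int id_or_default(int seed, int fallback)
{
    return next_id(seed) ? : fallback;
}
```

Calling {{id_or_default(7, 42)}} yields 7 and increments the counter once; calling {{id_or_default(0, 42)}} yields the fallback value.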
* The {{\-features=}}\[{{no%}}\]{{conststrings}} flag enables or disables string literal placement in read-only memory. The default is \-features=conststrings, which replaces the deprecated \-xstrconst option. Programs attempting to write to a string literal now fail under the default compilation mode just as if \-xstrconst had been explicitly specified on the command line.
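Because string literals are placed in read-only memory by default, code that needs to modify string data should work on a writable copy instead of the literal itself. The following is a minimal sketch in standard C; the function name {{capitalize_greeting}} and the buffer handling are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Copies a literal into a caller-supplied writable buffer and modifies the
   copy. Writing through a pointer to the literal itself would fail at run
   time when literals are placed in read-only memory. */
char *capitalize_greeting(char *buf, size_t size)
{
    static const char greeting[] = "hello";   /* read-only source text */

    if (size < sizeof greeting)
        return NULL;                          /* buffer too small for the copy */
    memcpy(buf, greeting, sizeof greeting);   /* includes the terminating NUL */
    buf[0] = 'H';                             /* safe: buf is writable storage */
    return buf;
}
```

Declaring the data as an array ({{char text[] = "hello";}}) has the same effect, because the array is initialized from the literal and is itself writable.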
----
h3. C+\+ compiler
* Option changes:
** The {{\-xia}} option is now supported by the C+\+ compiler on the Solaris OS on x86 platforms. This option links the appropriate interval arithmetic libraries and sets a suitable floating-point environment. {{\-xia}} is a macro that expands to {{\-fsimple=0 \-ftrap=%none \-fns=no \-library=interval}}. This option also requires that you specify a value for the {{\-xarch}} option that supports SSE2 instructions, such as {{\-xarch=sse2}}. See the CC(1) man page for more information.
** The \-xipo_archive option is now supported on the Solaris OS on x86 platforms and on the Linux OS on x86 platforms. See the C+\+ User's Guide and the CC(1) man page for more information.
** The \-Qoption option doesn't accept {{ube_ipa}} as an argument.
** The expansion of the \-fast option now includes {{\-D_MATHERR_ERRNO_DONTCARE}}.
** The \-xvpara option, which issues warnings about potential parallel-programming problems that might cause incorrect results when using OpenMP or Sun parallelization directives and pragmas, is now supported. Use this option with the \-xopenmp option and OpenMP API directives, or with the \-xexplicitpar option and MP parallelization directives. See the CC(1) man page for more information.
** The {{\-sb, \-sbfast, \-xsb, and \-xsbfast}} options are obsolete and have been removed.
* The C+\+ compiler now inlines code when you specify \-g with any \-O or \-xO value as long as you do not also specify \+d. In previous releases, \-g automatically specified \+d, but this is no longer the case.
* The pragma {{\#pragma must_have_frame(}}{_}list-of-function-names{_}{{)}} is now supported.
This pragma requests that the specified list of routines always be compiled to have a complete stack frame (as defined in the System V ABI). This pragma is permitted only after the prototypes for the specified functions are declared. The pragma must precede the end of the function.
If a function name is overloaded, the most recently declared function is chosen.
Using the pragma after the function prototype:
{code}
extern void foo(int);
extern void bar(int);
#pragma must_have_frame(foo, bar)
{code}
Using the pragma inside the function definition:
{code}
void foo(int) {
.
#pragma must_have_frame(foo)
.
return;
}
{code}
* In C++, a case label in a switch statement can have only one associated value. The Sun Studio C+\+ compiler allows an extension, found in some compilers, known as case ranges. A case range specifies a range of values to associate with an individual case label. The syntax of a case range is: {{case low ... high :}}
A case range behaves exactly as if a case label had been specified for each value in the given range from low to high inclusive. If low and high are equal, the case range specifies only the one value. The lower and upper values must be valid integer constant expressions. You can freely intermix case ranges and case labels, and you can specify multiple case ranges within a switch statement.
The following code is a programming example of a case range:
{code}
enum kind { alpha, number, white, other };
enum kind char_class(char c)
{
enum kind result;
switch(c) {
case 'a' ... 'z':
case 'A' ... 'Z':
result = alpha;
break;
case '0' ... '9':
result = number;
break;
case ' ':
case '\n':
case '\t':
case '\r':
case '\v':
result = white;
break;
default:
result = other;
break;
}
return result;
}
{code}
If an endpoint of a case range is a numeric literal, leave white space around the ellipsis (...) to avoid having one of the dots treated as a decimal point. For example:
{code}
case 0...4: // error
case 5 ... 9: // ok
{code}
* The C+\+ compiler normally creates temporary files in the directory /tmp. You can specify another directory by setting the TMPDIR environment variable to the directory of your choice. However, if the directory to which you set the variable is not a valid directory, the compiler uses /tmp. The \-temp option has precedence over the TMPDIR environment variable.
* The following attributes of functions are now supported:
{code}
__attribute__((const))
__attribute__((constructor))
__attribute__((destructor))
{code}
For more information, see 5.33 Specifying Attributes of Types in the _GNU Manual_.
* The following attribute of variables is now supported for struct and enum types only: {{\__attribute_\_((packed))}}
For more information, see [5.32 Specifying Attributes of Variables in the _GNU Manual_|http://gcc.gnu.org/onlinedocs/gcc-3.3.6/gcc/Variable-Attributes.html#Variable-Attributes]
* Universal Character Names are now supported. For more information, see paragraph 2 of section 2.2 Character Sets in the _Standard for Programming Language C+\+_. Copies of the standard are available from INCITS (www.incits.org), the body that charters the US C+\+ Committee, at their store: [http://www.techstreet.com/cgi-bin/detail?product_id=1143945] .
* Loop pragmas are now supported.
* User-defined names for macro variadic arguments are now supported. For example:
{code}
void foo(int, int);
#define M(args...) foo(args)
int main()
{
M(1, 2);
}
{code}
For more information, see section 3.6 Variadic Macros in _The C Preprocessor_.
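The function attributes, the packed attribute, and named variadic macro arguments described above are GNU-style extensions, so they can be sketched together in a few lines. The example below is written in the subset common to C and C+\+ and assumes a compiler that accepts these extensions; all identifiers ({{ready}}, {{setup}}, {{is_ready}}, {{add2}}, {{packed_rec}}, {{SUM2}}, {{sum_example}}) are hypothetical.

```c
/* Set by the constructor before main() runs (illustrative flag). */
static int ready = 0;

/* constructor: the function is called automatically at program startup. */
static void setup(void) __attribute__((constructor));
static void setup(void)
{
    ready = 1;
}

int is_ready(void)
{
    return ready;
}

/* const: the result depends only on the arguments; no side effects. */
static int add2(int a, int b) __attribute__((const));
static int add2(int a, int b)
{
    return a + b;
}

/* packed: the members are laid out without padding between them. */
struct packed_rec {
    char tag;
    int value;
} __attribute__((packed));

/* Named variadic argument: "args" stands for everything passed to SUM2. */
#define SUM2(args...) add2(args)

int sum_example(void)
{
    return SUM2(1, 2);   /* expands to add2(1, 2) */
}
```

With the packed attribute, {{sizeof(struct packed_rec)}} equals the sum of the member sizes; without it, most ABIs would insert padding after {{tag}}.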
----
h3. Fortran compiler
* Option changes:
** The \-Qoption option doesn't accept ube_ipa as an argument.
** The \-xvpara option is now supported. This option is a synonym for \-vpara.
** The \-sb, \-sbfast, \-xsb, and \-xsbfast options are obsolete and have been removed.
* The Fortran 95 compiler normally creates temporary files in the directory /tmp. You can specify another directory by setting the TMPDIR environment variable to the directory of your choice. However, if the directory to which you set the variable is not a valid directory, the compiler uses /tmp. The \-temp option has precedence over the TMPDIR environment variable.
* The behavior of the cpu_time() Fortran 95 intrinsic routine is different between Solaris and Linux platforms. On Solaris platforms, the cpu_time() routine is based on the gethrtime() routine, while on Linux platforms it is based on the getrusage() routine. The differences between the two implementations are:
** The Linux version measures CPU usage time of one thread; the Solaris version measures the wall clock time.
** The Linux version has lower resolution (around 1 msec); the Solaris version has higher resolution.
* The Fortran 2003 {{IMPORT}} statement is implemented.
----
h3. dbx debugger
* Runtime checking (RTC) now gives information about array out-of-bounds access on the Solaris OS on x86 platforms. Runtime Checking reports the following array out-of-bounds errors:
| rob | Read from array out-of-bounds memory |
| wob | Write to array out-of-bounds memory |
* Runtime Checking (RTC) now supports access, leaks, and memuse checking on the following Linux platforms: SLES10, RHEL5.
* dbx can now evaluate function parameters and local variables in optimized code when the code provides the needed debugging information. gcc compilers provide this information. Sun Studio compilers for SPARC platforms provide the information if you specify a new option (-Wc\,gen_loclist=1) when compiling. For more information, see [Optimized Code Debugging With Sun Studio dbx|http://developers.sun.com/sunstudio/documentation/techart/optimizedcode.html]
----
h3. Sun Performance Library
* LAPACK routines are updated to conform to the LAPACK 3.1.1 specification.
* Support for Woodcrest CPUs is available. To link with this library, use the following options:
** For C and Fortran: {{\-m64 \-xlic_lib=sunperf}}
** For C++: {{\-m64 \-library=sunperf}}
* Support for SPARC64-VI and SPARC64-VII CPUs is available. This version of Sun Performance Library uses the floating point multiply-add instruction to achieve the best performance possible on these CPUs. To link with this library, use the following options:
** For C and Fortran: {{\-xtarget=sparc64vi \-fma=fused \-xlic_lib=sunperf}}
** For C++: {{\-xtarget=sparc64vi \-fma=fused \-library=sunperf}}
----
h3. OpenMP 3.0
This Express release includes support for OpenMP 3.0 features in the C, C++, and Fortran compilers.
* Support for OpenMP 3.0 in this Express release includes a libmtsk library. OpenMP programs will link with this library by default instead of the libmtsk library in the Solaris OS.
* Tasking
* Loop collapse
* Runtime routines for nesting support
* Runtime routines for runtime schedule
* Environment variables OMP_STACKSIZE and OMP_WAIT_POLICY
* AUTO loop schedule
* Enhanced threadprivate support in C+\+
* Threadprivate static class member (C++)
* Unsigned int loop control variable (C and C++)
* New value for \_OPENMP macro (200805L)
For more information about these features, please refer to the !pointer.gif! [OpenMP Specification Version 3.0|http://openmp.org] and the !pointer.gif! [Sun Studio OpenMP Wiki|https://wikis.sun.com/display/openmp/Sun+Studio+OpenMP].
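As an illustration of three of the features listed above (loop collapse, the AUTO schedule, and tasking), here is a minimal C sketch. The function names {{grid_sum}}, {{fib_task}}, and {{fib}} are hypothetical. The code gives the same results when OpenMP is not enabled, because the pragmas are then ignored and the code runs serially.

```c
/* Loop collapse with the AUTO schedule: both loops are merged into one
   iteration space, and the partial sums are combined by the reduction. */
long grid_sum(int rows, int cols)
{
    long total = 0;
    int i, j;

    #pragma omp parallel for collapse(2) reduction(+:total) schedule(auto)
    for (i = 0; i < rows; i++)
        for (j = 0; j < cols; j++)
            total += (long)i * cols + j;
    return total;
}

/* Tasking: each recursive call becomes a task; taskwait blocks until
   both child tasks have stored their results. */
static long fib_task(int n)
{
    long a, b;

    if (n < 2)
        return n;
    #pragma omp task shared(a)
    a = fib_task(n - 1);
    #pragma omp task shared(b)
    b = fib_task(n - 2);
    #pragma omp taskwait
    return a + b;
}

/* One thread creates the root task; the rest of the team executes tasks. */
long fib(int n)
{
    long result = 0;

    #pragma omp parallel
    {
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}
```

Creating a task per call is far too fine-grained for real workloads; the recursion is used here only because it shows task creation and {{taskwait}} in a few lines.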
----
h3. D-Light Tool
The objective of the D-Light tool is to make sophisticated application and system profiling accessible. There are many tools that profile applications, and there are other tools that profile the system stack, but there are few tools that can join these views into an easy-to-use interface. For the first time, you can optimize your application and system environment by visualizing performance bottlenecks and resource contention up and down the application system stack.
Using an intuitive drag and drop interface, the D-Light tool provides an extensible library of instruments that represent the latest advances of profiling technology, including Solaris Dynamic Tracing (DTrace). With instruments like CPU accountant and Sampler, developers can use the interactive GUI to quickly profile and peer into the runtime behavior of their applications.
For more information on using the D-Light tool, refer to the Project D-Light Tutorial.
The D-Light Tool is now supported on Linux platforms for two instruments: Clock Profiler (based on Performance Analyzer) and Java Ticker.
----
h3. DTrace GUI Plug-in
The NetBeans DTrace GUI plug-in is a Graphical User Interface (GUI) for running DTrace scripts, even those that are embedded in shell scripts. In fact, the DTrace GUI plug-in runs all of the scripts that are packaged in the DTraceToolkit. The DTraceToolkit is a collection of useful documented scripts developed by the OpenSolaris DTrace community.
For documentation for the 0.2 version of the plugin that is included in this Express release, see [http://www.netbeans.org/kb/60/ide/NetBeans_DTrace_GUI_Plugin.html]
The 0.4 version of the plugin, which includes the Chime graphical tool for visualizing DTrace aggregations, is now available for download from the NetBeans Plugin Portal. To download and install this version, choose Tools->Plugins in the Sun Studio IDE, and select DTrace from the Available Plugins list.
----
h3. Automatic Tuning and Troubleshooting System (ats)
ats is a binary reoptimization and recompilation tool that can be used for tuning and troubleshooting applications. ats works by rebuilding the compiled PEC binary; the original source code is not required. Examples of what can be achieved using ats are:
* Find the compiler options that give the best performance
* Find the object file and the optimization flag that is causing a runtime problem
* Rebuild the application using new compiler options
There is an !pointer.gif! [ats(1) man page|https://wikis.sun.com/display/SunStudio/ats+man+page] and an !pointer.gif! [ATS Guide|https://wikis.sun.com/display/SunStudio/ATS+Guide].
In this Express release, ats is available for the Solaris OS on SPARC and x86/x64 platforms.
----
h3. Binary Improvement Tool (bit)
bit is a suite of tools for improving binaries. These tools are used via six subcommands:
* The instrument subcommand instruments a binary (the target) so that when the instrumented target is run, it creates an instrumentation data directory with information about the execution of the target.
* The analyze subcommand uses the instrumentation data to produce reports on instruction execution.
* The optimize subcommand uses the instrumentation data to optimize the target.
* The coverage subcommand uses the instrumentation data to produce a code coverage report.
* The collect subcommand combines an instrument subcommand, a target run, and an analyze subcommand.
* The check subcommand prints information about a target binary.
There is a !pointer.gif! [bit(1)|https://wikis.sun.com/display/SunStudio/bit+man+page] man page.
In this Express release, bit is only available for the Solaris OS on SPARC platforms.
----
h3. Discover
Sun Memory Error Discovery Tool (Discover) is a tool used to detect programming errors related to the allocation and use of program memory at runtime. Examples of errors detected by Discover include:
* Accessing uninitialized memory
* Reads from and writes to unallocated memory
* Accessing memory beyond allocated array bounds
* Use of freed memory
* Freeing wrong memory blocks
* Memory leaks
There is a !pointer.gif! [discover(1)|https://wikis.sun.com/display/SunStudio/discover+man+page] man page, and a !pointer.gif! [User Guide in PDF|https://wikis.sun.com/download/attachments/38211135/DISCOVER_users_guide.pdf].
In this Express release, Discover is available only for the Solaris OS on SPARC platforms.
----
h3. The Simple Performance Optimization Tool (SPOT)
The spot command runs a set of performance tools on the target application and renders the output as a set of hyperlinked web pages. The spot command can be used in two ways:
* Attaching to a running process and gathering data from the process using a variety of probes.
* Running an application multiple times, each time under a different probe.
There is a !pointer.gif! [spot(1)|spot man page] man page, and a User Guide on !pointer.gif! [docs.sun.com|http://docs.sun.com/app/docs/doc/820-5372]
In this Express release, the spot command is available only for the Solaris OS on SPARC platforms.
----
h3. SSSE3, SSE4.1, and SSE4.2 Intrinsics
h5. SSSE3
{noformat}
SSSE3 Assembler Syntax and Corresponding Compiler Intrinsics
PSIGNB, PSIGNW, PSIGND
- Syntax: psignb/psignw/psignd mem64/mmxreg, mmxreg
psignb/psignw/psignd mem128/xmmreg, xmmreg
- Semantic: Packed Sign
- Corresponding intrinsics:
extern __m64 _mm_sign_pi8 (__m64 p1, __m64 p2);
extern __m64 _mm_sign_pi16 (__m64 p1, __m64 p2);
extern __m64 _mm_sign_pi32 (__m64 p1, __m64 p2);
extern __m128i _mm_sign_epi8 (__m128i p1, __m128i p2);
extern __m128i _mm_sign_epi16 (__m128i p1, __m128i p2);
extern __m128i _mm_sign_epi32 (__m128i p1, __m128i p2);
PABSB, PABSW, PABSD
- Syntax: pabsb/pabsw/pabsd mem64/mmxreg, mmxreg
pabsb/pabsw/pabsd mem128/xmmreg, xmmreg
- Semantic: Packed Absolute Value
- Corresponding intrinsics:
extern __m64 _mm_abs_pi8 (__m64 p);
extern __m64 _mm_abs_pi16 (__m64 p);
extern __m64 _mm_abs_pi32 (__m64 p);
extern __m128i _mm_abs_epi8 (__m128i p);
extern __m128i _mm_abs_epi16 (__m128i p);
extern __m128i _mm_abs_epi32 (__m128i p);
PALIGNR
- Syntax: palignr imm, mem64/mmxreg, mmxreg
palignr imm, mem128/xmmreg, xmmreg
- Semantic: Packed Align Right
- Corresponding intrinsics:
extern __m64 _mm_alignr_pi8 (__m64 p1, __m64 p2, int immd);
extern __m128i _mm_alignr_epi8 (__m128i p1, __m128i p2, int immd);
PSHUFB
- Syntax: pshufb mem64/mmxreg, mmxreg
pshufb mem128/xmmreg, xmmreg
- Semantic: Packed Shuffle Bytes
- Corresponding intrinsics:
extern __m64 _mm_shuffle_pi8 (__m64 p1, __m64 p2);
extern __m128i _mm_shuffle_epi8 (__m128i p1, __m128i p2);
PMULHRSW
- Syntax: pmulhrsw mem64/mmxreg, mmxreg
pmulhrsw mem128/xmmreg, xmmreg
- Semantic: Packed Multiply High with Round and Scale
- Corresponding intrinsics:
extern __m64 _mm_mulhrs_pi16 (__m64 p1, __m64 p2);
extern __m128i _mm_mulhrs_epi16 (__m128i p1, __m128i p2);
PMADDUBSW
- Syntax: pmaddubsw mem64/mmxreg, mmxreg
pmaddubsw mem128/xmmreg, xmmreg
- Semantic: Multiply and Add Packed Signed and Unsigned Bytes
- Corresponding intrinsics:
extern __m64 _mm_maddubs_pi16 (__m64 p1, __m64 p2);
extern __m128i _mm_maddubs_epi16 (__m128i p1, __m128i p2);
PHSUBW, PHSUBD
- Syntax: phsubw/phsubd mem64/mmxreg, mmxreg
phsubw/phsubd mem128/xmmreg, xmmreg
- Semantic: Packed Horizontal Subtract
- Corresponding intrinsics:
extern __m64 _mm_hsub_pi16 (__m64 p1, __m64 p2);
extern __m64 _mm_hsub_pi32 (__m64 p1, __m64 p2);
extern __m128i _mm_hsub_epi16 (__m128i p1, __m128i p2);
extern __m128i _mm_hsub_epi32 (__m128i p1, __m128i p2);
PHSUBSW
- Syntax: phsubsw mem64/mmxreg, mmxreg
phsubsw mem128/xmmreg, xmmreg
- Semantic: Packed Horizontal Subtract and Saturate Words
- Corresponding intrinsics:
extern __m64 _mm_hsubs_pi16 (__m64 p1, __m64 p2);
extern __m128i _mm_hsubs_epi16 (__m128i p1, __m128i p2);
PHADDW, PHADDD
- Syntax: phaddw/phaddd mem64/mmxreg, mmxreg
phaddw/phaddd mem128/xmmreg, xmmreg
- Semantic: Packed Horizontal Add
- Corresponding intrinsics:
extern __m64 _mm_hadd_pi16 (__m64 p1, __m64 p2);
extern __m64 _mm_hadd_pi32 (__m64 p1, __m64 p2);
extern __m128i _mm_hadd_epi16 (__m128i p1, __m128i p2);
extern __m128i _mm_hadd_epi32 (__m128i p1, __m128i p2);
PHADDSW
- Syntax: phaddsw mem64/mmxreg, mmxreg
phaddsw mem128/xmmreg, xmmreg
- Semantic: Packed Horizontal Add and Saturate Words
- Corresponding intrinsics:
extern __m64 _mm_hadds_pi16 (__m64 p1, __m64 p2);
extern __m128i _mm_hadds_epi16 (__m128i p1, __m128i p2);
{noformat}
h5. SSE4.1
{noformat}
SSE4.1 Assembler Syntax and Corresponding Compiler Intrinsics (Rev 1.0)
BLENDPD/BLENDPS
- Syntax: Blend packed double/single precision floating point values
blendpd/blendps $imm8, xmmreg/mem128, xmmreg
- Semantic: Copy elements from one location to another based on bits
of an immediate operand
- Corresponding intrinsics:
__m128d _mm_blend_pd(__m128d p1, __m128d p2, const int immd);
__m128 _mm_blend_ps(__m128 p1, __m128 p2, const int immd);
BLENDVPD/BLENDVPS
- Syntax: Variable blend double/single precision floating point values
blendvpd/blendvps xmmreg/mem128, xmmreg
blendvpd/blendvps XMMREG, xmmreg/mem128, xmmreg
- Semantic: Copy elements from one location to another based on bits
in register XMMREG
- Corresponding intrinsics:
__m128d _mm_blendv_pd(__m128d p1, __m128d p2, __m128d p3);
__m128 _mm_blendv_ps(__m128 p1, __m128 p2, __m128 p3);
DPPD/DPPS
- Syntax: Dot product of packed double/single precision floating
point values
dppd/dpps $imm8, xmmreg/mem128, xmmreg
- Semantic: Based on bits in the immediate operand to select which of the
entries in the input to multiply and accumulate, and to
select whether to put 0 or the dot-product in the correspondent
field of the result register
- Corresponding intrinsics:
__m128d _mm_dp_pd(__m128d p1, __m128d p2, const int immd);
__m128 _mm_dp_ps(__m128 p1, __m128 p2, const int immd);
EXTRACTPS
- Syntax: Extract packed single precision floating point value
extractps $imm8, xmmreg, reg32/mem32
extractps $imm8, xmmreg, reg64/mem64
- Semantic: Based on bits in the immediate operand to extract a field from
the source register and insert it into an x86 register or
memory address
- Corresponding intrinsics:
int _mm_extract_ps(__m128 p1, const int immd);
INSERTPS
- Syntax: Insert packed single precision floating point value
insertps $imm8, xmmreg/mem32, xmmreg
- Semantic: Load a floating point value from memory indicated by mem32
or based on bits in the immediate operand to select a single
precision floating point value from the source xmmreg and
insert it into the destination register also based on the
bits of the immediate operand
- Corresponding intrinsics:
__m128 _mm_insert_ps(__m128 p1, __m128 p2, const int immd);
MOVNTDQA
- Syntax: Load 16 bytes with non-temporal Aligned Hint
movntdqa mem128, xmmreg
- Semantic: Load from write-combining memory area into xmm register
- Corresponding intrinsics:
__m128i _mm_stream_load_si128(__m128i *p);
MPSADBW
- Syntax: Calculate multiple packed sums of absolute difference
mpsadbw $imm8, xmmreg/mem128, xmmreg
- Semantic: Based on bits in the immediate operand to select the
destination and source fields to be used, compute
eight offset sums of absolute differences for
(|x0-y0|+|x1-y1|+|x2-y2|+...)
- Corresponding intrinsics:
__m128i _mm_mpsadbw_epu8(__m128i p1, __m128i p2, const int immd);
PACKUSDW
- Syntax: Pack with Unsigned Saturation
packusdw xmmreg/mem128, xmmreg
- Semantic: Convert signed 4 bytes in source and destination operands
into unsigned 2 bytes with saturation
- Corresponding intrinsics:
__m128i _mm_packus_epi32(__m128i p1, __m128i p2);
PBLENDW
- Syntax: Blend packed 16-bit words
pblendw $imm8, xmmreg/mem128, xmmreg
- Semantic: Based on bits in the immediate operand, select 16-bit
values from the second and destination operands to be
stored into the destination operand
- Corresponding intrinsics:
__m128i _mm_blend_epi16(__m128i p1, __m128i p2, const int p3);
PCMPEQQ
- Syntax: Compare packed 64-bit values for equality
pcmpeqq xmmreg/mem128, xmmreg
- Semantic: Compare packed 64-bit values in source and destination
operand for equality. Set all 0s or all 1s in destination
register as result
- Corresponding intrinsics:
__m128i _mm_cmpeq_epi64(__m128i p1, __m128i p2);
PEXTRB/PEXTRW/PEXTRD/PEXTRQ
- Syntax: Extract byte/16-bit value/32-bit value/64-bit value
pextrb $imm8, xmmreg, reg32/mem8
pextrb $imm8, xmmreg, reg64/mem8
pextrw $imm8, xmmreg, reg32/mem16
pextrw $imm8, xmmreg, reg64/mem16
pextrd $imm8, xmmreg, reg32/mem32
pextrq $imm8, xmmreg, reg64/mem64
- Semantic: Based on bits in the immediate operand to select and
extract a 8/16/32/64-bit value from the xmmreg and
store into the destination operand
- Corresponding intrinsics:
int _mm_extract_epi8(__m128i p1, const int immd);
int _mm_extract_epi16(__m128i p1, const int immd);
int _mm_extract_epi32(__m128i p1, const int immd);
long long _mm_extract_epi64(__m128i p1, const int immd);
PHMINPOSUW
- Syntax: Packed horizontal 16-bit value minimum
phminposuw xmmreg/mem128, xmmreg
- Semantic: Find the minimum unsigned 16-bit value in the source operand
and place the value and its index in the destination register
- Corresponding intrinsics:
__m128i _mm_minpos_epu16(__m128i p1);
PINSRB/PINSRD/PINSRQ
- Syntax: Insert byte, 32-bit value, 64-bit value
pinsrb $imm8, reg32/mem8, xmmreg
pinsrd $imm8, reg32/mem32, xmmreg
pinsrq $imm8, reg64/mem64, xmmreg
- Semantic: Based on the bits in the immediate operand to insert
the byte/32-bit/64-bit value from the source operand into
the destination xmm register
- Corresponding intrinsics:
__m128i _mm_insert_epi8(__m128i p1, int p2, const int immd);
__m128i _mm_insert_epi32(__m128i p1, int p2, const int immd);
__m128i _mm_insert_epi64(__m128i p1, long long p2, const int immd);
PMAXSB/PMAXSD
- Syntax: Maximum of packed signed byte/32-bit integers
pmaxsb xmmreg/mem128, xmmreg
pmaxsd xmmreg/mem128, xmmreg
- Semantic: Compare the packed signed byte/32-bit values in the 2
operands and store the maximum packed values in the destination
register
- Corresponding intrinsics:
__m128i _mm_max_epi8(__m128i p1, __m128i p2);
__m128i _mm_max_epi32(__m128i p1, __m128i p2);
PMAXUW/PMAXUD
- Syntax: Maximum of packed unsigned 16-bit/32-bit integers
pmaxuw xmmreg/mem128, xmmreg
pmaxud xmmreg/mem128, xmmreg
- Semantic: Compare the packed unsigned 16-bit/32-bit values in the
2 operands and store the maximum packed values in the
destination register
- Corresponding intrinsics:
__m128i _mm_max_epu16(__m128i p1, __m128i p2);
__m128i _mm_max_epu32(__m128i p1, __m128i p2);
PMINSB/PMINSD
- Syntax: Minimum of packed signed byte/32-bit integers
pminsb xmmreg/mem128, xmmreg
pminsd xmmreg/mem128, xmmreg
- Semantic: Compare the packed signed byte/32-bit values in the
2 operands and store the minimum packed values in the
destination register
- Corresponding intrinsics:
__m128i _mm_min_epi8(__m128i p1, __m128i p2);
__m128i _mm_min_epi32(__m128i p1, __m128i p2);
PMINUW/PMINUD
- Syntax: Minimum of packed unsigned 16-bit/32-bit integers
pminuw xmmreg/mem128, xmmreg
pminud xmmreg/mem128, xmmreg
- Semantic: Compare the packed unsigned 16-bit/32-bit values in the 2 operands
and store the minimum packed values in the destination
register
- Corresponding intrinsics:
__m128i _mm_min_epu32(__m128i p1, __m128i p2);
__m128i _mm_min_epu16(__m128i p1, __m128i p2);
PMOVSXBW/PMOVSXBD/PMOVSXBQ/PMOVSXWD/PMOVSXWQ/PMOVSXDQ
- Syntax: Move packed values with sign extension
pmovsxbw xmmreg/mem64, xmmreg
pmovsxbd xmmreg/mem32, xmmreg
pmovsxbq xmmreg/mem16, xmmreg
pmovsxwd xmmreg/mem64, xmmreg
pmovsxwq xmmreg/mem32, xmmreg
pmovsxdq xmmreg/mem64, xmmreg
- Semantic: Sign extend 8/4/2 packed 8-bit values or 4/2 packed 16-bit
values or 2 packed 32-bit values in the source operand and
move it into 8/4/2 packed 16-bit/32-bit/64-bit values or
4/2 packed 32-bit/64-bit values or 2 packed 64-bit values
in the destination register respectively
- Corresponding intrinsics:
__m128i _mm_cvtepi8_epi16(__m128i p1);
__m128i _mm_cvtepi8_epi32(__m128i p1);
__m128i _mm_cvtepi8_epi64(__m128i p1);
__m128i _mm_cvtepi16_epi32(__m128i p1);
__m128i _mm_cvtepi16_epi64(__m128i p1);
__m128i _mm_cvtepi32_epi64(__m128i p1);
PMOVZXBW/PMOVZXBD/PMOVZXBQ/PMOVZXWD/PMOVZXWQ/PMOVZXDQ
- Syntax: Move packed values with zero extension
pmovzxbw xmmreg/mem64, xmmreg
pmovzxbd xmmreg/mem32, xmmreg
pmovzxbq xmmreg/mem16, xmmreg
pmovzxwd xmmreg/mem64, xmmreg
pmovzxwq xmmreg/mem32, xmmreg
pmovzxdq xmmreg/mem64, xmmreg
- Semantic: Zero extend 8/4/2 packed 8-bit values or 4/2 packed 16-bit
values or 2 packed 32-bit values in the source operand and
move it into 8/4/2 packed 16-bit/32-bit/64-bit values or
4/2 packed 32-bit/64-bit values or 2 packed 64-bit values
in the destination register respectively
- Corresponding intrinsics:
__m128i _mm_cvtepu8_epi16(__m128i p1);
__m128i _mm_cvtepu8_epi32(__m128i p1);
__m128i _mm_cvtepu8_epi64(__m128i p1);
__m128i _mm_cvtepu16_epi32(__m128i p1);
__m128i _mm_cvtepu16_epi64(__m128i p1);
__m128i _mm_cvtepu32_epi64(__m128i p1);
PMULLD/PMULDQ
- Syntax: Multiply packed signed 32-bit integers
pmulld xmmreg/mem128, xmmreg
pmuldq xmmreg/mem128, xmmreg
- Semantic: PMULLD multiplies the 4 packed signed 32-bit values in the 2
operands and stores the low 32 bits of each product. PMULDQ
multiplies the signed 32-bit values in the low half of each
64-bit element and stores the 2 full 64-bit products in the
destination register
- Corresponding intrinsics:
__m128i _mm_mullo_epi32(__m128i p1, __m128i p2);
__m128i _mm_mul_epi32(__m128i p1, __m128i p2);
PTEST
- Syntax: Logical compare
ptest xmmreg/mem128, xmmreg
- Semantic: Set the Z flag if the bitwise AND of the 2 operands is all 0s,
and the C flag if the bitwise AND of the source operand with the
complement of the destination register is all 0s.
- Corresponding intrinsics:
int _mm_testz_si128(__m128i p1, __m128i p2);
int _mm_testc_si128(__m128i p1, __m128i p2);
int _mm_testnzc_si128(__m128i p1, __m128i p2);
ROUNDPS/ROUNDPD
- Syntax: Round packed single/double precision floating point values
roundps $imm8, xmmreg/mem128, xmmreg
roundpd $imm8, xmmreg/mem128, xmmreg
- Semantic: Based on the rounding mode in the immediate operand, round
the single/double precision packed values in the source
operand and place them in the destination register
- Corresponding intrinsics:
__m128 _mm_round_ps(__m128 p1, int immd);
__m128 _mm_floor_ps(__m128 p1);
__m128 _mm_ceil_ps(__m128 p1);
__m128d _mm_round_pd(__m128d p1, int immd);
__m128d _mm_floor_pd(__m128d p1);
__m128d _mm_ceil_pd(__m128d p1);
ROUNDSS/ROUNDSD
- Syntax: Round scalar single/double precision floating point values
roundss $imm8, xmmreg/mem32, xmmreg
roundsd $imm8, xmmreg/mem64, xmmreg
- Semantic: Based on the rounding mode in the immediate operand, round
the single/double precision scalar low value in the source
operand and place it in the destination register
- Corresponding intrinsics:
__m128 _mm_round_ss(__m128 p1, __m128 p2, int immd);
__m128 _mm_floor_ss(__m128 p1, __m128 p2);
__m128 _mm_ceil_ss(__m128 p1, __m128 p2);
__m128d _mm_round_sd(__m128d p1, __m128d p2, int immd);
__m128d _mm_floor_sd(__m128d p1, __m128d p2);
__m128d _mm_ceil_sd(__m128d p1, __m128d p2);
{noformat}
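The flag behavior of PTEST is easy to misread, so the following is a minimal scalar sketch of what the three PTEST intrinsics report. It is a reading aid, not the compiler's implementation: the {{vec128}} type and the {{model_*}} function names are hypothetical stand-ins introduced here, and the code models a 128-bit value as two 64-bit halves in portable C rather than calling the intrinsics themselves.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit XMM value: two 64-bit halves. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} vec128;

/* Models _mm_testz_si128(a, b): returns the Z flag,
 * which is 1 when (a AND b) has no bits set. */
int model_testz(vec128 a, vec128 b) {
    return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

/* Models _mm_testc_si128(a, b): returns the C flag,
 * which is 1 when ((NOT a) AND b) has no bits set,
 * that is, every bit set in b is also set in a. */
int model_testc(vec128 a, vec128 b) {
    return ((~a.lo & b.lo) | (~a.hi & b.hi)) == 0;
}

/* Models _mm_testnzc_si128(a, b): 1 only when both the
 * Z flag and the C flag are clear. */
int model_testnzc(vec128 a, vec128 b) {
    return !model_testz(a, b) && !model_testc(a, b);
}
```

With this model, a mask {{b}} that shares no bits with {{a}} gives {{model_testz}} = 1, a mask that is fully contained in {{a}} gives {{model_testc}} = 1, and a mask that only partially overlaps {{a}} gives {{model_testnzc}} = 1.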
h5. SSE4.2
{noformat}
SSE4.2 Assembler Syntax and Corresponding Compiler Intrinsics
CRC32
- Syntax: Accumulate CRC32 value
crc32 reg8/reg16/reg32/mem8/mem16/mem32, reg32
crc32 reg8/reg64/mem8/mem64, reg64
crc32b reg8/mem8, reg32
crc32b reg8/mem8, reg64
crc32w reg16/mem16, reg32
crc32l reg32/mem32, reg32
crc32q reg64/mem64, reg64
- Semantic: Accumulate CRC32C value using the polynomial 0x11edc6f41
- Corresponding intrinsics:
unsigned int _mm_crc32_u8(unsigned int crc, unsigned char data);
unsigned int _mm_crc32_u16(unsigned int crc, unsigned short data);
unsigned int _mm_crc32_u32(unsigned int crc, unsigned int data);
unsigned long long _mm_crc32_u64(unsigned long long crc,
unsigned long long data);
PCMPESTRI
- Syntax: Packed compare explicit length strings, return index
pcmpestri $imm8, xmmreg/mem128, xmmreg
- Semantic: Perform packed comparison of string data with explicit lengths,
generate an index stored to %ecx
- Corresponding intrinsics:
int _mm_cmpestri(__m128i a, int len_a, __m128i b, int len_b, const int imm);
int _mm_cmpestra(__m128i a, int len_a, __m128i b, int len_b, const int imm);
int _mm_cmpestrc(__m128i a, int len_a, __m128i b, int len_b, const int imm);
int _mm_cmpestro(__m128i a, int len_a, __m128i b, int len_b, const int imm);
int _mm_cmpestrs(__m128i a, int len_a, __m128i b, int len_b, const int imm);
int _mm_cmpestrz(__m128i a, int len_a, __m128i b, int len_b, const int imm);
PCMPESTRM
- Syntax: Packed compare explicit length strings, return mask
pcmpestrm $imm8, xmmreg/mem128, xmmreg
- Semantic: Perform packed comparison of string data with explicit lengths,
generate a mask stored to %xmm0
- Corresponding intrinsics:
__m128i _mm_cmpestrm(__m128i a, int len_a, __m128i b, int len_b, const int imm);
int _mm_cmpestra(__m128i a, int len_a, __m128i b, int len_b, const int imm);
int _mm_cmpestrc(__m128i a, int len_a, __m128i b, int len_b, const int imm);
int _mm_cmpestro(__m128i a, int len_a, __m128i b, int len_b, const int imm);
int _mm_cmpestrs(__m128i a, int len_a, __m128i b, int len_b, const int imm);
int _mm_cmpestrz(__m128i a, int len_a, __m128i b, int len_b, const int imm);
PCMPISTRI
- Syntax: Packed compare implicit length strings, return index
pcmpistri $imm8, xmmreg/mem128, xmmreg
- Semantic: Perform packed comparison of string data with implicit lengths,
generate an index stored to %ecx
- Corresponding intrinsics:
int _mm_cmpistri(__m128i a, __m128i b, const int imm);
int _mm_cmpistra(__m128i a, __m128i b, const int imm);
int _mm_cmpistrc(__m128i a, __m128i b, const int imm);
int _mm_cmpistro(__m128i a, __m128i b, const int imm);
int _mm_cmpistrs(__m128i a, __m128i b, const int imm);
int _mm_cmpistrz(__m128i a, __m128i b, const int imm);
PCMPISTRM
- Syntax: Packed compare implicit length strings, return mask
pcmpistrm $imm8, xmmreg/mem128, xmmreg
- Semantic: Perform packed comparison of string data with implicit lengths,
generate a mask stored to %xmm0
- Corresponding intrinsics:
__m128i _mm_cmpistrm(__m128i a, __m128i b, const int imm);
int _mm_cmpistra(__m128i a, __m128i b, const int imm);
int _mm_cmpistrc(__m128i a, __m128i b, const int imm);
int _mm_cmpistro(__m128i a, __m128i b, const int imm);
int _mm_cmpistrs(__m128i a, __m128i b, const int imm);
int _mm_cmpistrz(__m128i a, __m128i b, const int imm);
PCMPGTQ
- Syntax: Compare packed data for greater than
pcmpgtq xmmreg/mem128, xmmreg
- Semantic: Compare packed 64-bit values in xmmreg/mem128 with xmmreg. Set
corresponding data in destination register to all 1s or 0s
based on the result of the greater than compare.
- Corresponding intrinsics:
__m128i _mm_cmpgt_epi64(__m128i a, __m128i b);
POPCNT
- Syntax: Population count
popcnt reg16/mem16, reg16
popcnt reg32/mem32, reg32
popcnt reg64/mem64, reg64
- Semantic: Count the number of set bits in reg/mem
- Corresponding intrinsics:
int _mm_popcnt_u32(unsigned int a);
long long _mm_popcnt_u64(unsigned long long a);
{noformat}
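To make the CRC32 entry above more concrete, the following is a portable software sketch of the accumulation that the instruction performs with the CRC32C (Castagnoli) polynomial 0x11EDC6F41. The function names {{crc32c_update_u8}} and {{crc32c}} are hypothetical and are not part of the compiler's headers. The sketch assumes the usual CRC-32C convention of an all-ones initial value and a final inversion, both of which are applied by the caller rather than by the instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* One-byte accumulation step, intended to mirror what
 * _mm_crc32_u8(crc, data) computes. The constant 0x82F63B78 is
 * 0x1EDC6F41 with its 32 bits reversed, because the instruction
 * processes data in bit-reflected order. */
uint32_t crc32c_update_u8(uint32_t crc, uint8_t data) {
    crc ^= data;
    for (int bit = 0; bit < 8; bit++) {
        /* If the low bit is set, shift and fold in the polynomial. */
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Conventional CRC-32C over a buffer: start from all ones,
 * accumulate each byte, then invert the result. */
uint32_t crc32c(const void *buf, size_t len) {
    const uint8_t *p = (const uint8_t *)buf;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc = crc32c_update_u8(crc, *p++);
    }
    return ~crc;
}
```

A routine like this can serve as a fallback on processors without SSE4.2, or as a cross-check for code that uses the {{_mm_crc32_u8}} family of intrinsics.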
----
h3. Assembler on x86 Platforms
Two new options:
* \-C: In general you do not need \-C to be GNU Assembler compatible; \-C is needed only in a few situations to make the semantics compatible. Refer to the changes below.
* \-a32: Allows 32-bit memory addresses in 64-bit mode (\-m64).
Major areas of change:
* For mnemonics without a suffix, a register operand determines the suffix implicitly. If the size of the operation cannot be determined because there is no register operand, an error is issued if the \-C option is used; otherwise the suffix defaults to 'l'. For example:
{{mov $10, %ax}} can now be used for {{movw $10, %ax}}
For {{mov $10, mem}} the Sun Studio Assembler defaults to {{movl $10, mem}}
but gives an error if \-C is used, to be compatible with the GNU Assembler.
* All 16-bit instructions now accept 32-bit register operands; a warning is issued if the \-C option is used.
* 32-bit addresses are allowed in 64-bit mode with the new \-a32 option. For example, the command {{fbe \-m64 \-a32 file.s}} can assemble {{lea 123(%eax,%r10d),%eax}}
* A program can now have more than 10 local labels, and a local label can now be as large as a 32-bit integer.
* You can now place the lock/rep/repnz/repz/repe/repne prefix on the same line as the following instruction.
* GNU Assembler compatible instruction synonyms have been added:
cbw==cbtw, cwd==cwtd, cwde==cwtl, cdq==cltd, cdqe==cltq, cqo==cqto,
movzb==movzbl, sysret==sysretl
* GNU Assembler compatible assembler directives have been added: .p2align, .extern, .global.
* The \-b option, which generates extra symbol table information for the SourceBrowser, is now obsolete.
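The suffix rule in the first item of the list above can be summarized as a small decision function. The following C sketch is only a model of the documented behavior, not the assembler's source: the function names are hypothetical, the register table covers just a few registers for illustration, the {{strict}} parameter stands in for the \-C option, and checking the destination operand before the source operand is an assumption made here.

```c
#include <stddef.h>
#include <string.h>

/* Returns the size suffix implied by a register operand, or 0 when
 * the operand is not one of the registers in this small sample table. */
char suffix_for_register(const char *operand) {
    static const struct { const char *name; char suffix; } table[] = {
        {"%al", 'b'}, {"%ax", 'w'}, {"%eax", 'l'}, {"%rax", 'q'},
        {"%bl", 'b'}, {"%bx", 'w'}, {"%ebx", 'l'}, {"%rbx", 'q'},
    };
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(operand, table[i].name) == 0) {
            return table[i].suffix;
        }
    }
    return 0;
}

/* Models the rule for a two-operand mnemonic written without a suffix.
 * A register operand decides the suffix. With no register operand,
 * the result is 0 (an error) when strict is nonzero, else 'l'. */
char infer_suffix(const char *src, const char *dst, int strict) {
    char s = suffix_for_register(dst);
    if (s == 0) {
        s = suffix_for_register(src);
    }
    if (s != 0) {
        return s;
    }
    return strict ? 0 : 'l';
}
```

For the examples given in the list, {{infer_suffix("$10", "%ax", 0)}} yields 'w', matching {{movw $10, %ax}}, while a memory destination yields 'l' without \-C and an error indication with it.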
----
{panel}
{column}
{column:width=30%}
{panel}
h5. Contents
{toc:maxLevel=2|minLevel=1}
{panel}
{column}
{section}
----