This page gives greater detail on some of the new features in the current Express release of Sun Studio compilers and tools. Note that some of these features might not yet be documented in the Sun Studio man pages in this build.
Some of these features are experimental and might not be available in future releases, and others might change significantly. The documentation is also preliminary and might not reflect the full range of functionality, known problems, or workarounds. (See the Known Problems and Workarounds page for the latest news about this Express release.)
Details on Sun Studio Express Releases
Sun Studio November 2008 Express Release
The following are new features in the November 2008 Express release. More information can be found on the Sun Studio Express November 2008 Release wiki page.
IDE
Sun Performance Library
OpenMP 3.0
Additional functionality has been added to complete the implementation of the OpenMP 3.0 API specifications in the C, C++, and Fortran compilers. For more information about these features, please refer to the
dbx
A new graphical user interface (GUI) for dbx, dbxtool, is included in this November 2008 Express release. For information on invoking dbxtool, see the dbxtool man page. dbxtool is a separate GUI from the Sun Studio IDE, but is also based on NetBeans IDE 6.5. dbxtool provides access to all the functionality of dbx. It also supports attaching to a process as it starts executing to begin debugging it immediately (see the ss_attach man page), and fixing and continuing, which lets you relink source files after you make changes, without recompiling the entire program.
Performance Analyzer
MPI analysis has been tested with the following:
Note that for these non-Sun MPI implementations, you must build the MPI distribution and your MPI application with the same compiler and shared libraries for successful collection of data for the MPI calls.
DTrace GUI Plug-in
The NetBeans DTrace GUI plug-in, version 0.4, is a graphical user interface (GUI) for running DTrace scripts, even those that are embedded in shell scripts. In fact, the DTrace GUI plug-in runs all of the scripts that are packaged in the DTraceToolkit. The DTraceToolkit is a collection of useful documented scripts developed by the OpenSolaris DTrace community. For documentation for the 0.4 version of the plugin that is included in this Express release, see http://www.netbeans.org/kb/docs/ide/NetBeans_DTrace_GUI_Plugin_0_4.html
Sun Studio July 2008 Express Release
The following features and changes were introduced in the July 2008 Express release and are carried over into the current release.
Performance Analyzer
The Performance Analyzer includes major enhancements for analyzing MPI programs:
You can find details about how to use the new features in the Sun Studio Performance Analyzer for MPI programs on the
Experiment Format
The experiment format has been changed:
Other changes to the collect command
collector Command in dbx Debugger
The collector command in the dbx debugger has better support for Intel Core2 and AMD Family 10h processors.
Other changes to the Performance Analyzer
er_print Command
Thread Analyzer
Two new interfaces have been added to the libtha API:
IDE
The IDE in this Sun Studio Express release is based on NetBeans IDE 6.1, and includes the following new features:
sunstudio Command Options
The following changes have been made to the options of the sunstudio command:
C, C++ and Fortran compilers
(SPARC platforms only) Instructs the compiler to create binaries that can later be transformed by binary modification tools such as binopt(1). Future binary analysis, code coverage, and memory error detection tools will also work with binaries built with this option. Use the -xannotate=no option to prevent these tools from modifying the binary file. The -xannotate=yes option must be used with optimization level -xO1 or higher to be effective, and it is effective only on systems with the new linker support library interface, ld_open(). The new compiler support library libld_annotate.so uses this interface. If the compiler is used on a system without this linker interface (for example, Solaris OS 9), it silently reverts to -xannotate=no. The new linker interface is provided by the fix to bug 6479848. This fix is available in Solaris patch 127111-07 and in current versions of OpenSolaris. The default is -xannotate=yes, but if either of the above conditions is not met, the default reverts to -xannotate=no.
C compiler
A case range behaves exactly as if a case label had been specified for each value in the given range from low to high inclusive. If low and high are equal, the case range specifies only the one value. The lower and upper values must conform to the requirements of the C standard; that is, they must be valid integer constant expressions (C standard 6.8.4.2). You can freely intermix case ranges and case labels, and you can specify multiple case ranges within a switch statement.
The following code is a programming example of a case range:
    enum kind { alpha, number, white, other };
    enum kind char_class(char c)
    {
        enum kind result;
        switch(c) {
        case 'a' ... 'z':
        case 'A' ... 'Z':
            result = alpha;
            break;
        case '0' ... '9':
            result = number;
            break;
        case ' ':
        case '\n':
        case '\t':
        case '\r':
        case '\v':
            result = white;
            break;
        default:
            result = other;
            break;
        }
        return result;
    }
If an endpoint of a case range is a numeric literal, leave white space around the ellipsis (...) to avoid having one of the dots treated as a decimal point. For example:
    case 0...4:   // error
    case 5 ... 9: // ok
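As a further illustration of the white-space rule for numeric endpoints, here is a minimal sketch. The function letter_grade and its grade boundaries are hypothetical and not part of the release; the sketch assumes a compiler that accepts case ranges (GCC and Clang also accept this syntax as an extension).

```c
/* Hypothetical helper: map a score in 0..100 to a letter grade.
 * Every numeric endpoint has white space around the ellipsis so that
 * no dot is parsed as part of a floating-point literal. */
char letter_grade(int score)
{
    switch (score) {
    case 90 ... 100:
        return 'A';
    case 80 ... 89:
        return 'B';
    case 70 ... 79:
        return 'C';
    case 0 ... 69:
        return 'F';
    default:            /* any value outside 0..100 */
        return '?';
    }
}
```

Calling letter_grade(85) selects the 80 ... 89 range, exactly as if ten separate case labels had been written for the values 80 through 89.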
C++ compiler
This pragma requests that the specified list of routines always be compiled to have a complete stack frame (as defined in the System V ABI). This pragma is permitted only after the prototypes for the specified functions are declared. The pragma must precede the end of the function. If a function name is overloaded, the most recently declared function is chosen.
Using the pragma after the function prototype:
    extern void foo(int);
    extern void bar(int);
    #pragma must_have_frame(foo, bar)
Using the pragma inside the function definition:
    void foo(int) {
        .
        #pragma must_have_frame(foo)
        .
        return;
    }
The following code is a programming example of a case range:
    enum kind { alpha, number, white, other };
    enum kind char_class(char c)
    {
        enum kind result;
        switch(c) {
        case 'a' ... 'z':
        case 'A' ... 'Z':
            result = alpha;
            break;
        case '0' ... '9':
            result = number;
            break;
        case ' ':
        case '\n':
        case '\t':
        case '\r':
        case '\v':
            result = white;
            break;
        default:
            result = other;
            break;
        }
        return result;
    }
If an endpoint of a case range is a numeric literal, leave white space around the ellipsis (...) to avoid having one of the dots treated as a decimal point. For example:
    case 0...4:   // error
    case 5 ... 9: // ok
For more information, see 5.33 Specifying Attributes of Types in the GNU Manual.
For more information, see 5.32 Specifying Attributes of Variables in the GNU Manual.
For more information, see section 3.6 Variadic Macros in The C Preprocessor.
Fortran compiler
dbx debugger
Sun Performance Library
OpenMP 3.0
This Express release includes support for OpenMP 3.0 features in the C, C++, and Fortran compilers.
For more information about these features, please refer to the
D-Light Tool
The objective of the D-Light tool is to make sophisticated application and system profiling accessible. There are many tools that profile applications, and there are other tools that profile the system stack, but there are few tools that can join these views into an easy-to-use interface. For the first time, you can optimize your application and system environment by visualizing performance bottlenecks and resource contention up and down the application system stack. Using an intuitive drag-and-drop interface, the D-Light tool provides an extensible library of instruments that represent the latest advances in profiling technology, including Solaris Dynamic Tracing (DTrace). With instruments like CPU accountant and Sampler, developers can use the interactive GUI to quickly profile and peer into the runtime behavior of their applications. For more information on using the D-Light tool, refer to the Project D-Light Tutorial. The D-Light tool is now supported on Linux platforms for two instruments: Clock Profiler (based on Performance Analyzer) and Java Ticker.
DTrace GUI Plug-in
The NetBeans DTrace GUI plug-in is a graphical user interface (GUI) for running DTrace scripts, even those that are embedded in shell scripts. In fact, the DTrace GUI plug-in runs all of the scripts that are packaged in the DTraceToolkit. The DTraceToolkit is a collection of useful documented scripts developed by the OpenSolaris DTrace community. For documentation for the 0.2 version of the plugin that is included in this Express release, see http://www.netbeans.org/kb/60/ide/NetBeans_DTrace_GUI_Plugin.html
The 0.4 version of the plugin, which includes the Chime graphical tool for visualizing DTrace aggregations, is now available for download from the NetBeans Plugin Portal. To download and install this version, choose Tools->Plugins in the Sun Studio IDE, and select DTrace from the Available Plugins list.
Automatic Tuning and Troubleshooting System (ats)
ats is a binary reoptimization and recompilation tool that can be used for tuning and troubleshooting applications. ats works by rebuilding the compiled PEC binary; the original source code is not required. Examples of what can be achieved using ats are:
There is an
In this Express release, ats is available for the Solaris OS on SPARC and x86/x64 platforms.
Binary Improvement Tool (bit)
bit is a suite of tools for improving binaries. These tools are used via six subcommands:
There is a
In this Express release, bit is only available for the Solaris OS on SPARC platforms.
Discover
Sun Memory Error Discovery Tool (Discover) is a tool used to detect programming errors related to the allocation and use of program memory at runtime. Examples of errors detected by Discover include:
There is a
Discover is only available on Solaris OS on SPARC platforms for Express.
The Simple Performance Optimization Tool (SPOT)
The spot command runs a set of performance tools on the target application and renders the output as a set of hyperlinked web pages. The spot command can be used in two ways:
There is a
In this Express release, the spot command is available only for the Solaris OS on SPARC platforms.
SSSE3, SSE4.1, and SSE4.2 Intrinsics
SSSE3
SSSE3 Assembler Syntax and Corresponding Compiler Intrinsics

PSIGNB, PSIGNW, PSIGND
- Syntax:
    psignb/psignw/psignd mem64/mmxreg, mmxreg
    psignb/psignw/psignd mem128/xmmreg, xmmreg
- Semantic: Packed Sign
- Corresponding intrinsics:
    extern __m64 _mm_sign_pi8 (__m64 p1, __m64 p2);
    extern __m64 _mm_sign_pi16 (__m64 p1, __m64 p2);
    extern __m64 _mm_sign_pi32 (__m64 p1, __m64 p2);
    extern __m128i _mm_sign_epi8 (__m128i p1, __m128i p2);
    extern __m128i _mm_sign_epi16 (__m128i p1, __m128i p2);
    extern __m128i _mm_sign_epi32 (__m128i p1, __m128i p2);

PABSB, PABSW, PABSD
- Syntax:
    pabsb/pabsw/pabsd mem64/mmxreg, mmxreg
    pabsb/pabsw/pabsd mem128/xmmreg, xmmreg
- Semantic: Packed Absolute Value
- Corresponding intrinsics:
    extern __m64 _mm_abs_pi8 (__m64 p);
    extern __m64 _mm_abs_pi16 (__m64 p);
    extern __m64 _mm_abs_pi32 (__m64 p);
    extern __m128i _mm_abs_epi8 (__m128i p);
    extern __m128i _mm_abs_epi16 (__m128i p);
    extern __m128i _mm_abs_epi32 (__m128i p);

PALIGNR
- Syntax:
    palignr imm, mem64/mmxreg, mmxreg
    palignr imm, mem128/xmmreg, xmmreg
- Semantic: Packed Align Right
- Corresponding intrinsics:
    extern __m64 _mm_alignr_pi8 (__m64 p1, __m64 p2, int immd);
    extern __m128i _mm_alignr_epi8 (__m128i p1, __m128i p2, int immd);

PSHUFB
- Syntax:
    pshufb mem64/mmxreg, mmxreg
    pshufb mem128/xmmreg, xmmreg
- Semantic: Packed Shuffle Bytes
- Corresponding intrinsics:
    extern __m64 _mm_shuffle_pi8 (__m64 p1, __m64 p2);
    extern __m128i _mm_shuffle_epi8 (__m128i p1, __m128i p2);

PMULHRSW
- Syntax:
    pmulhrsw mem64/mmxreg, mmxreg
    pmulhrsw mem128/xmmreg, xmmreg
- Semantic: Packed Multiply High with Round and Scale
- Corresponding intrinsics:
    extern __m64 _mm_mulhrs_pi16 (__m64 p1, __m64 p2);
    extern __m128i _mm_mulhrs_epi16 (__m128i p1, __m128i p2);

PMADDUBSW
- Syntax:
    pmaddubsw mem64/mmxreg, mmxreg
    pmaddubsw mem128/xmmreg, xmmreg
- Semantic:
Multiply and Add Packed Signed and Unsigned Bytes
- Corresponding intrinsics:
    extern __m64 _mm_maddubs_pi16 (__m64 p1, __m64 p2);
    extern __m128i _mm_maddubs_epi16 (__m128i p1, __m128i p2);

PHSUBW, PHSUBD
- Syntax:
    phsubw/phsubd mem64/mmxreg, mmxreg
    phsubw/phsubd mem128/xmmreg, xmmreg
- Semantic: Packed Horizontal Subtract
- Corresponding intrinsics:
    extern __m64 _mm_hsub_pi16 (__m64 p1, __m64 p2);
    extern __m64 _mm_hsub_pi32 (__m64 p1, __m64 p2);
    extern __m128i _mm_hsub_epi16 (__m128i p1, __m128i p2);
    extern __m128i _mm_hsub_epi32 (__m128i p1, __m128i p2);

PHSUBSW
- Syntax:
    phsubsw mem64/mmxreg, mmxreg
    phsubsw mem128/xmmreg, xmmreg
- Semantic: Packed Horizontal Subtract and Saturate Words
- Corresponding intrinsics:
    extern __m64 _mm_hsubs_pi16 (__m64 p1, __m64 p2);
    extern __m128i _mm_hsubs_epi16 (__m128i p1, __m128i p2);

PHADDW, PHADDD
- Syntax:
    phaddw/phaddd mem64/mmxreg, mmxreg
    phaddw/phaddd mem128/xmmreg, xmmreg
- Semantic: Packed Horizontal Add
- Corresponding intrinsics:
    extern __m64 _mm_hadd_pi16 (__m64 p1, __m64 p2);
    extern __m64 _mm_hadd_pi32 (__m64 p1, __m64 p2);
    extern __m128i _mm_hadd_epi16 (__m128i p1, __m128i p2);
    extern __m128i _mm_hadd_epi32 (__m128i p1, __m128i p2);

PHADDSW
- Syntax:
    phaddsw mem64/mmxreg, mmxreg
    phaddsw mem128/xmmreg, xmmreg
- Semantic: Packed Horizontal Add and Saturate Words
- Corresponding intrinsics:
    extern __m64 _mm_hadds_pi16 (__m64 p1, __m64 p2);
    extern __m128i _mm_hadds_epi16 (__m128i p1, __m128i p2);

SSE4.1
SSE4.1 Assembler Syntax and Corresponding Compiler Intrinsics (Rev 1.0)

BLENDPD/BLENDPS
- Syntax: Blend packed double/single precision floating point values
    blendpd/blendps $imm8, xmmreg/mem128, xmmreg
- Semantic: Copy elements from one location to another based on bits of an immediate operand
- Corresponding intrinsics:
    __m128d _mm_blend_pd(__m128d p1, __m128d p2, const int immd);
    __m128 _mm_blend_ps(__m128 p1, __m128 p2, const int immd);

BLENDVPD/BLENDVPS
- Syntax: Variable blend double/single precision floating point values
    blendvpd/blendvps xmmreg/mem128, xmmreg
    blendvpd/blendvps XMMREG, xmmreg/mem128, xmmreg
- Semantic: Copy elements from one location to another based on bits in register XMMREG
- Corresponding intrinsics:
    __m128d _mm_blendv_pd(__m128d p1, __m128d p2, __m128d p3);
    __m128 _mm_blendv_ps(__m128 p1, __m128 p2, __m128 p3);

DPPD/DPPS
- Syntax: Dot product of packed double/single precision floating point values
    dppd/dpps $imm8, xmmreg/mem128, xmmreg
- Semantic: Use bits in the immediate operand to select which input entries to multiply and accumulate, and whether to put 0 or the dot product in the corresponding field of the result register
- Corresponding intrinsics:
    __m128d _mm_dp_pd(__m128d p1, __m128d p2, const int immd);
    __m128 _mm_dp_ps(__m128 p1, __m128 p2, const int immd);

EXTRACTPS
- Syntax: Extract packed single precision floating point value
    extractps $imm8, xmmreg, reg32/mem32
    extractps $imm8, xmmreg, reg64/mem64
- Semantic: Use bits in the immediate operand to extract a field from the source register and insert it into an x86 register or memory address
- Corresponding intrinsics:
    int _mm_extract_ps(__m128 p1, const int immd);

INSERTPS
- Syntax: Insert packed single precision floating point value
    insertps $imm8, xmmreg/mem32, xmmreg
- Semantic: Load a floating point value from the memory location mem32, or use bits in the immediate operand to select a single precision floating point value from the source xmmreg, and insert it into the destination register at a position that is also selected by bits of the immediate operand
- Corresponding intrinsics:
    __m128 _mm_insert_ps(__m128 p1, __m128 p2, const int immd);

MOVNTDQA
- Syntax: Load 16 bytes with non-temporal aligned hint
    movntdqa mem128, xmmreg
- Semantic: Load from write-combining memory area into xmm register
- Corresponding intrinsics:
    __m128i _mm_stream_load_si128(__m128i *p);

MPSADBW
- Syntax: Calculate multiple packed sums of absolute difference
    mpsadbw $imm8, xmmreg/mem128, xmmreg
- Semantic: Use bits in the immediate operand to select the destination and source fields, then compute eight offset sums of absolute differences (|x0-y0|+|x1-y1|+|x2-y2|+...)
- Corresponding intrinsics:
    __m128i _mm_mpsadbw_epu8(__m128i p1, __m128i p2, const int immd);

PACKUSDW
- Syntax: Pack with Unsigned Saturation
    packusdw xmmreg/mem128, xmmreg
- Semantic: Convert signed 4-byte values in the source and destination operands into unsigned 2-byte values with saturation
- Corresponding intrinsics:
    __m128i _mm_packus_epi32(__m128i p1, __m128i p2);

PBLENDW
- Syntax: Blend packed 16-bit words
    pblendw $imm8, xmmreg/mem128, xmmreg
- Semantic: Based on bits in the immediate operand, select 16-bit values from the source and destination operands to be stored into the destination operand
- Corresponding intrinsics:
    __m128i _mm_blend_epi16(__m128i p1, __m128i p2, const int p3);

PCMPEQQ
- Syntax: Compare packed 64-bit values for equality
    pcmpeqq xmmreg/mem128, xmmreg
- Semantic: Compare packed 64-bit values in source and destination operand for equality.
Set all 0s or all 1s in destination register as result
- Corresponding intrinsics:
    __m128i _mm_cmpeq_epi64(__m128i p1, __m128i p2);

PEXTRB/PEXTRW/PEXTRD/PEXTRQ
- Syntax: Extract byte/16-bit value/32-bit value/64-bit value
    pextrb $imm8, xmmreg, reg32/mem8
    pextrb $imm8, xmmreg, reg64/mem8
    pextrw $imm8, xmmreg, reg32/mem16
    pextrw $imm8, xmmreg, reg64/mem16
    pextrd $imm8, xmmreg, reg32/mem32
    pextrq $imm8, xmmreg, reg64/mem64
- Semantic: Use bits in the immediate operand to select and extract an 8/16/32/64-bit value from the xmmreg and store it into the destination operand
- Corresponding intrinsics:
    int _mm_extract_epi8(__m128i p1, const int immd);
    int _mm_extract_epi16(__m128i p1, const int immd);
    int _mm_extract_epi32(__m128i p1, const int immd);
    long long _mm_extract_epi64(__m128i p1, const int immd);

PHMINPOSUW
- Syntax: Packed horizontal 16-bit value minimum
    phminposuw xmmreg/mem128, xmmreg
- Semantic: Find the minimum unsigned 16-bit value in the source operand and place the value and its index in the destination register
- Corresponding intrinsics:
    __m128i _mm_minpos_epu16(__m128i p1);

PINSRB/PINSRD/PINSRQ
- Syntax: Insert byte, 32-bit value, 64-bit value
    pinsrb $imm8, reg32/mem8, xmmreg
    pinsrd $imm8, reg32/mem32, xmmreg
    pinsrq $imm8, reg64/mem64, xmmreg
- Semantic: Use bits in the immediate operand to insert the byte/32-bit/64-bit value from the source operand into the destination xmm register
- Corresponding intrinsics:
    __m128i _mm_insert_epi8(__m128i p1, int p2, const int immd);
    __m128i _mm_insert_epi32(__m128i p1, int p2, const int immd);
    __m128i _mm_insert_epi64(__m128i p1, long long p2, const int immd);

PMAXSB/PMAXSD
- Syntax: Maximum of packed signed byte/32-bit integers
    pmaxsb xmmreg/mem128, xmmreg
    pmaxsd xmmreg/mem128, xmmreg
- Semantic: Compare the packed signed byte/32-bit values in the 2 operands and store the maximum packed values in the destination register
- Corresponding intrinsics:
    __m128i _mm_max_epi8(__m128i p1, __m128i p2);
    __m128i _mm_max_epi32(__m128i p1, __m128i p2);

PMAXUW/PMAXUD
- Syntax: Maximum of packed unsigned 16-bit/32-bit integers
    pmaxuw xmmreg/mem128, xmmreg
    pmaxud xmmreg/mem128, xmmreg
- Semantic: Compare the packed unsigned 16-bit/32-bit values in the 2 operands and store the maximum packed values in the destination register
- Corresponding intrinsics:
    __m128i _mm_max_epu16(__m128i p1, __m128i p2);
    __m128i _mm_max_epu32(__m128i p1, __m128i p2);

PMINSB/PMINSD
- Syntax: Minimum of packed signed byte/32-bit integers
    pminsb xmmreg/mem128, xmmreg
    pminsd xmmreg/mem128, xmmreg
- Semantic: Compare the packed signed byte/32-bit values in the 2 operands and store the minimum packed values in the destination register
- Corresponding intrinsics:
    __m128i _mm_min_epi8(__m128i p1, __m128i p2);
    __m128i _mm_min_epi32(__m128i p1, __m128i p2);

PMINUW/PMINUD
- Syntax: Minimum of packed unsigned 16-bit/32-bit integers
    pminuw xmmreg/mem128, xmmreg
    pminud xmmreg/mem128, xmmreg
- Semantic: Compare the packed unsigned 16-bit/32-bit values in the 2 operands and store the minimum packed values in the destination register
- Corresponding intrinsics:
    __m128i _mm_min_epu16(__m128i p1, __m128i p2);
    __m128i _mm_min_epu32(__m128i p1, __m128i p2);

PMOVSXBW/PMOVSXBD/PMOVSXBQ/PMOVSXWD/PMOVSXWQ/PMOVSXDQ
- Syntax: Move packed values with sign extension
    pmovsxbw xmmreg/mem64, xmmreg
    pmovsxbd xmmreg/mem32, xmmreg
    pmovsxbq xmmreg/mem16, xmmreg
    pmovsxwd xmmreg/mem64, xmmreg
    pmovsxwq xmmreg/mem32, xmmreg
    pmovsxdq xmmreg/mem64, xmmreg
- Semantic: Sign extend 8/4/2 packed 8-bit values, 4/2 packed 16-bit values, or 2 packed 32-bit values in the source operand and move them into 8/4/2 packed 16-bit/32-bit/64-bit values, 4/2 packed 32-bit/64-bit values, or 2 packed 64-bit values in the destination register, respectively
- Corresponding intrinsics:
    __m128i _mm_cvtepi8_epi16(__m128i p1);
    __m128i _mm_cvtepi8_epi32(__m128i p1);
    __m128i _mm_cvtepi8_epi64(__m128i p1);
    __m128i _mm_cvtepi16_epi32(__m128i p1);
    __m128i _mm_cvtepi16_epi64(__m128i p1);
    __m128i _mm_cvtepi32_epi64(__m128i p1);

PMOVZXBW/PMOVZXBD/PMOVZXBQ/PMOVZXWD/PMOVZXWQ/PMOVZXDQ
- Syntax: Move packed values with zero extension
    pmovzxbw xmmreg/mem64, xmmreg
    pmovzxbd xmmreg/mem32, xmmreg
    pmovzxbq xmmreg/mem16, xmmreg
    pmovzxwd xmmreg/mem64, xmmreg
    pmovzxwq xmmreg/mem32, xmmreg
    pmovzxdq xmmreg/mem64, xmmreg
- Semantic: Zero extend 8/4/2 packed 8-bit values, 4/2 packed 16-bit values, or 2 packed 32-bit values in the source operand and move them into 8/4/2 packed 16-bit/32-bit/64-bit values, 4/2 packed 32-bit/64-bit values, or 2 packed 64-bit values in the destination register, respectively
- Corresponding intrinsics:
    __m128i _mm_cvtepu8_epi16(__m128i p1);
    __m128i _mm_cvtepu8_epi32(__m128i p1);
    __m128i _mm_cvtepu8_epi64(__m128i p1);
    __m128i _mm_cvtepu16_epi32(__m128i p1);
    __m128i _mm_cvtepu16_epi64(__m128i p1);
    __m128i _mm_cvtepu32_epi64(__m128i p1);

PMULLD/PMULDQ
- Syntax: Multiply packed signed 32-bit integers
    pmulld xmmreg/mem128, xmmreg
    pmuldq xmmreg/mem128, xmmreg
- Semantic: pmulld multiplies the packed signed 32-bit values in the 2 operands and stores the low 32 bits of each product in the destination register; pmuldq multiplies the signed 32-bit values in the low half of each 64-bit element and stores the 64-bit products in the destination register
- Corresponding intrinsics:
    __m128i _mm_mullo_epi32(__m128i p1, __m128i p2);
    __m128i _mm_mul_epi32(__m128i p1, __m128i p2);

PTEST
- Syntax: Logical compare
    ptest xmmreg/mem128, xmmreg
- Semantic: Set the Z flag if the bitwise AND of the 2 operands is all zeros, and set the C flag if the bitwise AND of the source operand with the complement of the destination operand is all zeros.
- Corresponding intrinsics:
    int _mm_testz_si128(__m128i p1, __m128i p2);
    int _mm_testc_si128(__m128i p1, __m128i p2);
    int _mm_testnzc_si128(__m128i p1, __m128i p2);

ROUNDPS/ROUNDPD
- Syntax: Round packed single/double precision floating point values
    roundps $imm8, xmmreg/mem128, xmmreg
    roundpd $imm8, xmmreg/mem128, xmmreg
- Semantic: Based on the rounding mode in the immediate operand, round the single/double precision packed values in the source operand and place them in the destination register
- Corresponding intrinsics:
    __m128 _mm_round_ps(__m128 p1, int immd);
    __m128 _mm_floor_ps(__m128 p1);
    __m128 _mm_ceil_ps(__m128 p1);
    __m128d _mm_round_pd(__m128d p1, int immd);
    __m128d _mm_floor_pd(__m128d p1);
    __m128d _mm_ceil_pd(__m128d p1);

ROUNDSS/ROUNDSD
- Syntax: Round scalar single/double precision floating point values
    roundss $imm8, xmmreg/mem32, xmmreg
    roundsd $imm8, xmmreg/mem64, xmmreg
- Semantic: Based on the rounding mode in the immediate operand, round the single/double precision scalar low value in the source operand and place it in the destination register
- Corresponding intrinsics:
    __m128 _mm_round_ss(__m128 p1, __m128 p2, int immd);
    __m128 _mm_floor_ss(__m128 p1, __m128 p2);
    __m128 _mm_ceil_ss(__m128 p1, __m128 p2);
    __m128d _mm_round_sd(__m128d p1, __m128d p2, int immd);
    __m128d _mm_floor_sd(__m128d p1, __m128d p2);
    __m128d _mm_ceil_sd(__m128d p1, __m128d p2);

SSE4.2
SSE4.2 Assembler Syntax and Corresponding Compiler Intrinsics

CRC32
- Syntax: Accumulate CRC32 value
    crc32 reg8/reg16/reg32/mem8/mem16/mem32, reg32
    crc32 reg8/reg64/mem8/mem64, reg64
    crc32b reg8/mem8, reg32
    crc32b reg8/mem8, reg64
    crc32w reg16/mem16, reg32
    crc32l reg32/mem32, reg32
    crc32q reg64/mem64, reg64
- Semantic: Accumulate CRC32C value using the polynomial 0x11edc6f41
- Corresponding intrinsics:
    unsigned int _mm_crc32_u8(unsigned int crc, unsigned char data);
    unsigned int _mm_crc32_u16(unsigned int crc, unsigned short data);
    unsigned int _mm_crc32_u32(unsigned int crc, unsigned int data);
    unsigned long long _mm_crc32_u64(unsigned long long crc, unsigned long long data);

PCMPESTRI
- Syntax: Packed compare explicit length strings, return index
    pcmpestri $imm8, xmmreg/mem128, xmmreg
- Semantic: Perform packed comparison of string data with explicit lengths, generate an index stored to %ecx
- Corresponding intrinsics:
    int _mm_cmpestri(__m128i a, int len_a, __m128i b, int len_b, const int imm);
    int _mm_cmpestra(__m128i a, int len_a, __m128i b, int len_b, const int imm);
    int _mm_cmpestrc(__m128i a, int len_a, __m128i b, int len_b, const int imm);
    int _mm_cmpestro(__m128i a, int len_a, __m128i b, int len_b, const int imm);
    int _mm_cmpestrs(__m128i a, int len_a, __m128i b, int len_b, const int imm);
    int _mm_cmpestrz(__m128i a, int len_a, __m128i b, int len_b, const int imm);

PCMPESTRM
- Syntax: Packed compare explicit length strings, return mask
    pcmpestrm $imm8, xmmreg/mem128, xmmreg
- Semantic: Perform packed comparison of string data with explicit lengths, generate a mask stored to %xmm0
- Corresponding intrinsics:
    __m128i _mm_cmpestrm(__m128i a, int len_a, __m128i b, int len_b, const int imm);
    int _mm_cmpestra(__m128i a, int len_a, __m128i b, int len_b, const int imm);
    int _mm_cmpestrc(__m128i a, int len_a, __m128i b, int len_b, const int imm);
    int _mm_cmpestro(__m128i a, int len_a, __m128i b, int len_b, const int imm);
    int _mm_cmpestrs(__m128i a, int len_a, __m128i b, int len_b, const int imm);
    int _mm_cmpestrz(__m128i a, int len_a, __m128i b, int len_b, const int imm);

PCMPISTRI
- Syntax: Packed compare implicit length strings, return index
    pcmpistri $imm8, xmmreg/mem128, xmmreg
- Semantic: Perform packed comparison of string data with implicit lengths, generate an index stored to %ecx
- Corresponding intrinsics:
    int _mm_cmpistri(__m128i a, __m128i b, const int imm);
    int _mm_cmpistra(__m128i a, __m128i b, const int imm);
    int _mm_cmpistrc(__m128i a, __m128i b, const int imm);
    int _mm_cmpistro(__m128i a, __m128i b, const int imm);
    int _mm_cmpistrs(__m128i a, __m128i b, const int imm);
    int _mm_cmpistrz(__m128i a, __m128i b, const int imm);

PCMPISTRM
- Syntax: Packed compare implicit length strings, return mask
    pcmpistrm $imm8, xmmreg/mem128, xmmreg
- Semantic: Perform packed comparison of string data with implicit lengths, generate a mask stored to %xmm0
- Corresponding intrinsics:
    __m128i _mm_cmpistrm(__m128i a, __m128i b, const int imm);
    int _mm_cmpistra(__m128i a, __m128i b, const int imm);
    int _mm_cmpistrc(__m128i a, __m128i b, const int imm);
    int _mm_cmpistro(__m128i a, __m128i b, const int imm);
    int _mm_cmpistrs(__m128i a, __m128i b, const int imm);
    int _mm_cmpistrz(__m128i a, __m128i b, const int imm);

PCMPGTQ
- Syntax: Compare packed data for greater than
    pcmpgtq xmmreg/mem128, xmmreg
- Semantic: Compare packed 64-bit values in xmmreg/mem128 with xmmreg. Set corresponding data in destination register to all 1s or 0s based on the result of the greater than compare.
- Corresponding intrinsics:
    __m128i _mm_cmpgt_epi64(__m128i a, __m128i b);

POPCNT
- Syntax: Population count
    popcnt reg16/mem16, reg16
    popcnt reg32/mem32, reg32
    popcnt reg64/mem64, reg64
- Semantic: Count the number of set bits in reg/mem
- Corresponding intrinsics:
    int _mm_popcnt_u32(unsigned int a);
    long long _mm_popcnt_u64(unsigned long long a);

Assembler on x86 Platforms
Two new options:
Major areas of change:
For mov $10, mem the Sun Studio Assembler defaults to movl $10, mem