Intel® VTune™ Amplifier 2018 Help

Running Command Line Analysis

This topic describes how to run an Intel® VTune™ Amplifier analysis from the command line with the amplxe-cl tool.

Command Line Analysis Workflow

To run an analysis from the command line, do the following:

  1. Choose an appropriate analysis type: predefined analysis or custom analysis.
  2. Configure analysis options.

Choosing an Analysis Type

For a list of available predefined analysis types, enter:

amplxe-cl -help collect

Intel® VTune™ Amplifier displays all collection options and lists the available predefined analysis types:

Analysis Type

Description

hotspots

Analyze application flow and identify sections of code that take a long time to execute (hotspots).

advanced-hotspots

Extend the hotspots analysis by collecting call stacks, context switch and statistical call count data as well as analyzing the CPI (Cycles Per Instruction) metric.

concurrency

Collect data on how an application is using available logical CPU cores, discover where parallelism is incurring synchronization overhead, and identify potential candidates for parallelization.

locksandwaits

Identify where an application is waiting on synchronization objects or I/O operations, and discover how these waits affect the application performance.

hpc-performance

Identify opportunities to optimize CPU, memory, and FPU utilization for compute-intensive or throughput applications. The HPC Performance Characterization analysis type is a starting point for understanding the performance landscape of your application. Use this analysis type to improve application performance by increasing the number of floating-point operations per second (GFLOPS) and reducing the overall application run time. The analysis collects data related to CPU, memory, and FPU utilization. Additional scalability metrics are available for applications that use OpenMP* or MPI runtime libraries.

general-exploration

Collect hardware events for analyzing a typical client application. This analysis calculates a set of predefined ratios used for the metrics and facilitates identifying hardware-level performance problems.

memory-access

Identify memory-related issues, such as NUMA problems and bandwidth-limited accesses, and attribute performance events to memory objects (data structures). This attribution relies on instrumenting memory allocations and de-allocations and on reading static and global variables from symbol information.

sgx-hotspots

Analyze hotspots inside security enclaves for systems with the Intel Software Guard Extensions (Intel SGX) feature enabled. This analysis type uses the INST_RETIRED.PREC_DIST hardware event that emulates precise clockticks and helps identify performance-critical program units inside enclaves.

tsx-exploration

Collect events that help understand Intel Transactional Synchronization Extensions (Intel TSX) behavior and causes of transactional aborts.

tsx-hotspots

Monitor the UOPS_RETIRED.ALL_PS hardware event that emulates precise clockticks and identify performance-critical program units inside transactions.

cpugpu-concurrency

Explore code execution on the various CPU and GPU cores in your system, correlate CPU and GPU activity and identify whether your application is GPU or CPU bound.

gpu-hotspots

Identify Graphics Processing Unit (GPU) tasks with high GPU utilization and estimate the effectiveness of this utilization. This analysis type is intended for analysis of applications that use a GPU for rendering, video processing, and computations with explicit support of Intel® Media SDK and OpenCL™ software technology.

gpu-profiling

Use GPU In-kernel Profiling to analyze GPU kernel execution per code line and identify performance issues caused by memory latency or inefficient kernel algorithms.

disk-io

Monitor utilization of the disk subsystem, CPU, and processor buses. This analysis type uses hardware event-based sampling together with system-wide Ftrace* collection (for Linux* and Android* targets) or ETW collection (for Windows* targets). The result is a consistent view of the storage subsystem combined with hardware events, and an easy-to-use method to match user-level source code with I/O packets executed by the hardware.

Note

This is a PREVIEW FEATURE on Windows* OS. A preview feature may or may not appear in a future production release. It is available for your use in the hopes that you will provide feedback on its usefulness and help determine its future. Data collected with a preview feature is not guaranteed to be backward compatible with future releases. Please send your feedback to parallel.studio.support@intel.com or to intelsystemstudio@intel.com.

system-overview

Evaluate general behavior of Linux* or Android* target systems and correlate power and performance metrics with IRQ handling.

For a list of available custom analysis types, enter:

amplxe-cl -help collect-with

Intel® VTune™ Amplifier displays all collection options and provides a list of available collection types that can be used for custom analysis:

Collector

Description

runsa

Profile your application using the counter overflow feature of the Performance Monitoring Unit (PMU).

runss

Profile the application execution and take snapshots of how that application utilizes the processors in the system. The collector interrupts a process, collects the value of all active instruction addresses and captures a calling sequence for each of these samples.

Running Analysis

To run a predefined performance analysis from the command line, enter:

amplxe-cl -collect <analysis_type> [-knob <knobName=knobValue>] [--] <target>

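where

  <analysis_type> is one of the predefined analysis types listed above, for example, hotspots.
  -knob <knobName=knobValue> is an optional configuration option for the selected analysis type.
  <target> is the path and name of the application to analyze. The optional -- separates the amplxe-cl options from the target and its arguments.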
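As a minimal sketch of the syntax above, the following script assembles predefined-analysis command lines and prints them. The target ./my_app is hypothetical, and the knob name sampling-interval is an assumption used for illustration; check the help output of amplxe-cl for the knobs your analysis type actually accepts. The collection itself runs only if amplxe-cl and the target are both present.

```shell
#!/bin/sh
# Build a predefined-analysis command line from the syntax above.
# Arguments: <analysis_type> <target> [knobName=knobValue]
build_collect_cmd() {
    analysis_type=$1
    target=$2
    knob=${3:-}
    if [ -n "$knob" ]; then
        echo "amplxe-cl -collect $analysis_type -knob $knob -- $target"
    else
        echo "amplxe-cl -collect $analysis_type -- $target"
    fi
}

# ./my_app is a hypothetical target; sampling-interval is an assumed knob name.
build_collect_cmd hotspots ./my_app
build_collect_cmd advanced-hotspots ./my_app sampling-interval=1

# Run the first command only when the tool and the target are both present.
if command -v amplxe-cl >/dev/null 2>&1 && [ -x ./my_app ]; then
    amplxe-cl -collect hotspots -- ./my_app
fi
```

Printing the command before running it makes it easy to review the analysis type and knob values in a batch script.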

To run a custom analysis from the command line, enter:

amplxe-cl -collect-with <collection_type> [-knob <knobName=knobValue>] [--] <target>

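where

  <collection_type> is one of the collectors listed above: runsa or runss.
  -knob <knobName=knobValue> is an optional configuration option for the selected collector.
  <target> is the path and name of the application to analyze. The optional -- separates the amplxe-cl options from the target and its arguments.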
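A custom analysis with the runsa collector might look like the sketch below. The event-config knob and the two event names are assumptions for illustration, since available hardware events depend on the processor, and ./my_app is a hypothetical target. The script prints the command and runs it only when amplxe-cl and the target exist.

```shell
#!/bin/sh
# Event names are platform-specific; these two are assumptions for illustration.
EVENTS="CPU_CLK_UNHALTED.THREAD,INST_RETIRED.ANY"
TARGET="./my_app"   # hypothetical target application

# runsa is the PMU-based collector described above.
CUSTOM_CMD="amplxe-cl -collect-with runsa -knob event-config=$EVENTS -- $TARGET"
echo "$CUSTOM_CMD"

# Run the collection only when the tool and the target are both present.
if command -v amplxe-cl >/dev/null 2>&1 && [ -x "$TARGET" ]; then
    amplxe-cl -collect-with runsa -knob "event-config=$EVENTS" -- "$TARGET"
fi
```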

Next Steps

After collecting performance results for your target, you can view the results in the GUI or generate a formatted analysis report.
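For the report path, a minimal sketch follows. It assumes the result was written to a directory named r000hs and uses the -report action with the summary report type and the -result-dir option, none of which are described on this page, so treat them as assumptions to verify against the tool's help. The script skips the report when the tool or the result directory is missing.

```shell
#!/bin/sh
RESULT_DIR="r000hs"   # assumed result directory name

# Generate a summary report only if the tool and the result directory exist.
if command -v amplxe-cl >/dev/null 2>&1 && [ -d "$RESULT_DIR" ]; then
    amplxe-cl -report summary -result-dir "$RESULT_DIR"
    STATUS="reported"
else
    echo "amplxe-cl or $RESULT_DIR not found; skipping report" >&2
    STATUS="skipped"
fi
```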
