JTAG debugging can perform a valuable role in multicore development and improve the edit-compile-debug time when integrated with a standards-based integrated development environment such as Eclipse.
The ideal is a single JTAG debugger that uses the IEEE 1149.1 JTAG standard in a daisy-chain and employs JTAG acceleration to improve throughput and performance.
Multicore technology delivers significant advantages to hardware and software developers by providing higher processor performance, more effective power usage, and a smaller physical footprint for embedded devices. Multicore solutions are often implemented in tandem with multiprocessing, in which multiple processors are used on a single board or in an integrated system. Unlocking the power of multicore and multiprocessing solutions takes more than just silicon; it takes a new approach to debugging that allows software and hardware developers to debug an entire system and optimise the edit-compile-debug process.
JTAG debugging has traditionally been used for hardware bring-up and more recently to augment agent-based debugging. However, on-chip debugging plays a more significant role in multicore and multiprocessing debugging, helping to debug the operating system and middleware by isolating complex interactions between the software running on one or more cores.
A multicore processor is a single chip that contains multiple distinct processing engines. This design provides increased CPU performance, functional specification, and partitioning options. Multicore is typically implemented in one of two configurations:
1. Symmetric multiprocessing (SMP): a single operating environment in which an SMP operating system abstracts the hardware from the developer and decides which core to use for each task. This scenario has homogeneous cores with shared memory, and a single operating system runs across multiple cores.
2. Asymmetric multiprocessing (AMP): a collection of interacting but independent operating environments, with a separate instance of an operating system running on each core. This independence means that the environment can be either homogeneous (all processors the same, one OS type) or heterogeneous (multiple processor types or operating systems).
The added complexity of the multicore environment requires a robust tool-chain for debugging operating systems that run on multiple cores, as well as the hardware related to those cores.
While the conventional definition of multicore is multiple cores on a single die, the real-world use of multicore and its debugging challenges represents a specific instance of the more general multiprocessing case and extends beyond single-die debugging. Developers are also taking that single die and building solutions that use multiple CPUs, each with one or more cores, on a single board. In highly complex systems, developers are writing software that runs across multiple CPU boards in a system using multicore and multiprocessing technology.
The convergence of multicore and multiprocessing technology is introducing new debugging challenges based on growing system complexity and the requirement to realise the inherent performance potential of multicore through optimised hardware and software development. Specific challenges include:
* How to effectively manage shared resources such as memory and peripherals.
* How to debug OS and application code over multiple cores, boards, and systems.
* How to optimise the JTAG interface and fully utilise the JTAG bandwidth.
* How to debug homogeneous and heterogeneous cores on a single die and then coordinate the debugging over an entire system.
* How to effectively use JTAG debugging with agent-based debug and ensure a smooth handoff between different debug tasks.
* How to ensure synchronisation when debugging applications over multiple cores.
There are three primary technology options for multicore JTAG debugging:
1. A debugger that supports all cores through a single JTAG interface.
2. JTAG muxing (multiplexing) using independent debuggers at a single JTAG debug interface.
3. JTAG linkers or addressable scan ports.
These approaches deal with a central issue in debugging multiple cores through JTAG: the limited number of JTAG interfaces provided by SoC vendors. To save on costs, many SoC vendors provide only a single JTAG interface on a die, regardless of the number of cores. The challenge for developers is how to use that interface cost-effectively to synchronise the debugging of multiple cores and multiple processors.
The single debugger method uses the IEEE 1149.1 standard daisy-chain methodology. The JTAG interface has four wires: TDI, TDO, TCK, and TMS. For the purpose of connecting to the JTAG interface in multicore debugging, the relevant wires are TDI and TDO. In daisy-chaining, the output (TDO) of the first core is connected to the input (TDI) of the second core, and so on through the last core in the chain. The daisy-chain methodology is standards-based, widely used, and works in all the multicore debugging scenarios: single die, multiple CPUs on a board, and complex systems. It also works well in a heterogeneous environment in which more than one processor family and operating system is used for development, such as in mobile handsets and consumer electronics devices. In the daisy-chain method, the JTAG debuggers use the software interface of a JTAG server, which manages the addressing of the individual cores, regardless of location, via a single JTAG interface. This solves the problem of the limited JTAG connections often found in multicore environments.
The JTAG server also enables the developer to synchronise cores, start and stop processes on the same JTAG clock, and add or remove a connection without affecting any other microprocessor or device on the scan-chain. This method maintains accurate clocking and facilitates the debugging of different operating systems across multiple cores, or of different processes in the same operating system running across multiple cores. The key objections to the daisy-chain method are performance and JTAG bandwidth utilisation.
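As a rough illustration of the daisy-chain wiring, the following Python sketch models each device only as its 1-bit BYPASS register, with the TDO of one device feeding the TDI of the next. The class and function names are hypothetical and do not come from any vendor tool, and the TMS-driven TAP state machine is deliberately left out.

```python
class BypassDevice:
    """One device on the chain, modelled only as its 1-bit BYPASS register."""

    def __init__(self):
        self.bit = 0

    def clock(self, tdi):
        """On one TCK pulse, present the stored bit on TDO and capture TDI."""
        tdo = self.bit
        self.bit = tdi
        return tdo


def shift_through_chain(devices, bits_in):
    """Shift bits into the first device's TDI; collect the last device's TDO."""
    bits_out = []
    for bit in bits_in:
        for dev in devices:  # TDO of each device feeds the next device's TDI
            bit = dev.clock(bit)
        bits_out.append(bit)
    return bits_out


chain = [BypassDevice() for _ in range(3)]
pattern = [1, 0, 1, 1]
# Pad with zeros so the pattern has time to emerge from the far end.
out = shift_through_chain(chain, pattern + [0] * len(chain))
print(out)
```

In this model the pattern reappears on the final TDO delayed by one TCK per device in the chain, which is the cost the debugger has to account for when addressing a single core.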
The issue with daisy-chained JTAG is that the amount of data to transmit in the Shift-IR state depends on the number of devices on the scan-chain as well as the instruction register (IR) length of each device. As an example, it takes 24 bits of data to access the 8-bit IR of one device in a daisy-chain containing three such devices. The same problem exists for the data register (DR), but it is reduced by the fact that a device in bypass mode requires only one bit in the Shift-DR state.
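The arithmetic can be sketched in a few lines of Python. The function names and the 32-bit target data register are assumptions chosen for illustration; only the three 8-bit instruction registers come from the example in the text.

```python
def shift_ir_bits(ir_lengths):
    """Bits shifted in Shift-IR: every device's IR sits in the scan path."""
    return sum(ir_lengths)


def shift_dr_bits(target_dr_length, devices_in_bypass):
    """Bits shifted in Shift-DR: the target's DR plus 1 bit per bypassed device."""
    return target_dr_length + devices_in_bypass


# Three devices with 8-bit IRs, as in the example in the text.
ir_total = shift_ir_bits([8, 8, 8])

# Hypothetical 32-bit DR on the target, with the other two devices in bypass.
dr_total = shift_dr_bits(32, 2)

print(ir_total, dr_total)
```

The IR cost grows with the combined IR length of the whole chain, while the DR cost grows by only one bit for each additional bypassed device.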
When the JTAG server is designed properly, as in the JTAG solution offered by Wind River, there is little performance degradation. Wind River has JTAG accelerator and server technology that optimises the JTAG bandwidth by reducing the idle time between JTAG sequence packets, using virtually 100% of the available JTAG bandwidth.
The other concern with the JTAG server is support for additional debugging capabilities, such as the ability to use a stop request signal to stop a core immediately, or a stop indication signal to stop one core and then synchronise the stopping of all the others. Like all limitations, this one depends on the vendor implementation. For example, the Wind River Workbench on-chip debugging solution can start and stop multiple cores simultaneously.
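To see why idle time between packets matters, consider this minimal sketch. It treats utilisation as the share of time spent shifting bits; the function name, packet size, TCK rate and idle gaps are illustrative assumptions, not measured figures for any product.

```python
def jtag_utilisation(bits_per_packet, tck_hz, idle_us_per_packet):
    """Fraction of elapsed time spent shifting bits rather than idling."""
    shift_us = bits_per_packet * 1_000_000 / tck_hz
    return shift_us / (shift_us + idle_us_per_packet)


# Hypothetical numbers: 1000-bit packets on a 10 MHz TCK.
with_gaps = jtag_utilisation(1000, 10_000_000, idle_us_per_packet=100)
no_gaps = jtag_utilisation(1000, 10_000_000, idle_us_per_packet=0)

print(f"{with_gaps:.0%} with idle gaps, {no_gaps:.0%} without")
```

Under these assumptions, an idle gap as long as the packet itself halves the usable bandwidth, so shrinking that gap is what moves utilisation towards 100%.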
Vendors such as Wind River offer JTAG solutions (Workbench On-Chip Debugging) that centralise the multicore and multiprocessing debug function. This solution can simultaneously debug up to eight cores in a single scan-chain regardless of whether those cores are on a single die, board, or in a system configuration.
With the Wind River multicore solution, the developer can stop and start cores simultaneously and set breakpoints, including conditional breakpoints, on one or more cores. In addition, the availability of the Workbench Eclipse framework and agent-based debugging enables developers to manage multicore/multiprocessor projects from a single console. The developer has the flexibility to use the JTAG connection for hardware bring-up, kernel, middleware, and other application functions and then move seamlessly to agent-based debugging when appropriate, all within the same debug application. These capabilities increase collaboration between different developers and improve time to problem resolution.
The dominant alternative to the single debugger is JTAG multiplexing. This technique extends the IEEE JTAG specification to support the use of an independent debugger for every core connected through a shared JTAG interface. The mux technology enables the developer to access multiple discrete cores on a single die by registering, through a single JTAG interface, the core to be debugged. The main advantage of this solution is its connection and debugging performance. Because the mux connects to each core individually, it does not have the bit-shifting overhead of the daisy-chaining method and provides relatively good performance on a single die. The other advantage is that this solution does not require any modifications to the tool, enabling it to be used effectively across multiple projects.
The main issue with the mux approach is the inability to start and stop cores simultaneously in order to synchronise applications across multiple cores during debugging. With a mux, stopping all the cores requires the developer to stop each core sequentially, introducing a delay called skid. The problem with introducing delay in the debug process is that it becomes more difficult to locate problems with the OS, middleware, and application across cores, especially if the parts of the application running on the different cores depend on each other. For example, if the developer has a product with a DSP and an ARM 9 core, with the DSP streaming video and the ARM 9 core providing the file system, synchronising the start and stop of the cores is critical. If there is a large delay (skid) between the two cores stopping during debugging, the DSP's video stream can quickly overrun the ARM file buffer and video traffic will be dropped, making it challenging to analyse the problem. Moreover, the muxing process introduces a new variable that the developer has to measure and account for during troubleshooting, dramatically increasing the debugging cycle time.
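The effect of skid can be estimated with a small sketch. All names and figures here are hypothetical (a 5 ms per-core stop latency, a 2 MB/s video stream and an 8 KB buffer) and serve only to show how sequential stopping turns into lost data.

```python
def stop_times_us(num_cores, per_core_latency_us, simultaneous):
    """Time at which each core halts, measured from the stop request."""
    if simultaneous:
        return [per_core_latency_us] * num_cores
    # A mux stops the cores one after another, so the latencies accumulate.
    return [per_core_latency_us * (i + 1) for i in range(num_cores)]


def skid_us(times):
    """Gap between the first and the last core actually stopping."""
    return max(times) - min(times)


def bytes_streamed(stream_bytes_per_s, duration_us):
    """Data a still-running producer emits during the skid window."""
    return stream_bytes_per_s * duration_us // 1_000_000


BUFFER_BYTES = 8192          # hypothetical free space in the ARM file buffer
STREAM_RATE = 2_000_000      # hypothetical DSP video rate in bytes per second

mux_skid = skid_us(stop_times_us(2, 5000, simultaneous=False))
sync_skid = skid_us(stop_times_us(2, 5000, simultaneous=True))

mux_overrun = bytes_streamed(STREAM_RATE, mux_skid) > BUFFER_BYTES
sync_overrun = bytes_streamed(STREAM_RATE, sync_skid) > BUFFER_BYTES
print(mux_skid, mux_overrun, sync_skid, sync_overrun)
```

With these assumed figures, the sequential stop leaves the producer running long enough to overflow the buffer, while a simultaneous stop leaves no window at all.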
The final issue occurs in heterogeneous environments when debugging cores from different vendors, such as a processor from one vendor and a DSP from another. In this case, muxing is more complex and, if the instrumentation is not uniformly available, impossible to carry out. This issue is compounded further when multiple cores run across a system. In that case, muxing alone will not solve the problem; developers will also need to use an addressable scan port.
The last technology is the addressable scan port. This architecture requires the use of very specialised components. These components allow the developer to partition the JTAG scan-chain into functional groups, with each group accessed by a unique address. This is a multidrop architecture and is often used in backplanes, where a separate addressable scan-chain is routed across the backplane so that each board in a rack has a dedicated scan-chain. This architecture is limited in speed by the speed of the addressable scan port itself, typically 25 MHz.
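The partitioning idea can be pictured with a simplified model in which each board sits behind its own scan port address and only the selected board's devices are in the scan path. The `Backplane` class, the addresses and the IR lengths are all hypothetical; real addressable scan ports use a device-specific selection protocol that is not modelled here.

```python
class Backplane:
    """Multidrop bus: each addressable scan port fronts one board's chain."""

    def __init__(self, boards):
        # boards maps a port address to the IR lengths of that board's devices
        self.boards = dict(boards)
        self.selected = None

    def select(self, address):
        """Connect the shared bus to the board at the given address."""
        if address not in self.boards:
            raise KeyError(f"no scan port at address {address:#x}")
        self.selected = address

    def shift_ir_bits(self):
        """IR bits to shift for the currently selected board only."""
        if self.selected is None:
            raise RuntimeError("no scan port selected")
        return sum(self.boards[self.selected])


rack = Backplane({0x01: [8, 8], 0x02: [5], 0x03: [8, 4, 4]})
rack.select(0x03)
bits = rack.shift_ir_bits()
print(bits)
```

In this model the debugger pays only for the devices on the addressed board, rather than for every device in the rack.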
JTAG debugging can perform a valuable role in multicore development and improve the edit-compile-debug time when integrated with a standards-based integrated development environment such as Eclipse. The best technology solution is a single JTAG debugger that uses the IEEE 1149.1 JTAG standard in a daisy-chain and employs JTAG acceleration to improve throughput and performance. Vendors such as Wind River offer on-chip debugging capabilities that integrate effectively with agent-based debugging to improve debugging performance in even the most complex environments.
For more information contact Andrew Palmer, Embedded Industrial Solutions, +27 (0)12 547 6071.