NEC ELECTRONICS GLOBAL

Execution Time

Contents

    
FAQ-ID = v85exec-nnnn
0001: Required clocks for V850 instructions
0002: Pipeline and instruction execution speed [V850E/IA1]
0003: States for bit manipulation instructions [V850/SA1]
0004: Execution speed [All V850]
0005: MIPS value calculation
0006: Programming area that allows the fastest processing
0007: Instruction execution in one clock through pipeline control
v85exec-0001
Required clocks for V850 instructions
Q1
In which manual are the required clocks for each instruction described?
A1
The execution time of each instruction is described in the "V850 Family Architecture" manual.
Note that the execution time of instructions varies depending on the pipeline status.

v85exec-0002
Pipeline and instruction execution speed [V850E/IA1]
Q1
Does an external memory access take three clocks + the number of waits, regardless of whether it is a memory access for instruction fetch or not?
And does an internal memory access take one clock, regardless of whether it is a memory access for instruction fetch or not?
A1
No.
As described in 4.5.1 "Number of access clocks" in the V850E/IA1 Hardware User's Manual, a fetch from internal RAM conflicting with data access requires 2 clocks, and a data access from internal ROM requires 5 clocks.
Q2
Assuming that rewrite to RAM is performed in the MEM stage, the execution cycle of this instruction is:
   IF (lower) + IF (higher) + ID + EX1 + MEM
IF + IF: Because these are external RAM fetches, each takes 2 clocks, for a total of 4 clocks.
ID: Because the next instruction is fetched, 2 clocks are required.
EX1 + MEM: Because no instruction is fetched and these are accesses to internal RAM, each takes 1 clock, for a total of 2 clocks.

Overall, a total of 8 clocks are required. Is this right?
A2
No.
Instructions in external memory are fetched into the look-ahead queue whenever the bus is free.
Because these instructions are transferred from the look-ahead queue to the CPU pipeline, the IF to ID stages do not each require 2 clocks.

On the other hand, the SET1 and CLR1 instructions may take longer because their stages run consecutively, as EX1-MEM-EX2-MEM. Considering these points, at least 7 clocks (including WB) are required.

However, because the look-ahead queue is unlikely to stay full, it is better to assume that 2 or 3 more clocks are required.
Because this behavior depends on the preceding instructions, the number of required clocks cannot be defined precisely.

v85exec-0003
States for bit manipulation instructions [V850/SA1]
Q1
Do bit access instructions perform a "read, modify, and write" operation?
Also, does any interrupt occur during such processing?
A1
Any CPU requires more clocks for bit manipulation instructions than for other instructions.
To account for these extra clocks, a 3-clock idle (for the V850/SA1) or a 2-clock idle (for the V850E/MS1) is inserted between IF and ID.
For the detailed operation in this case, refer to the section "Bit Manipulation Instructions" in Chapter 8 "Pipeline" in the Architecture manual.
No interrupt is inserted between read, modify, and write.

v85exec-0004
Execution speed [All V850]
Q1
The following program takes about 200 ns for the P1 output to change from high to low to high.
According to the description "The ST.B instruction is executed in 1 clock" in the User's Manual, the high-low-high sequence should take about 40 ns.
Why is the execution speed actually so slow?
    MOV     0xFF,r10
    ST.B    r10,-0xBFE[r0]
    MOV     0,r10
    ST.B    r10,-0xBFE[r0]
    MOV     0xFF,r10
    ST.B    r10,-0xBFE[r0]
A1
This is because the program accesses an internal peripheral (P1), and that access takes at least 7 clocks.
The execution time is lengthened by this access time.
Instructions that do not access an internal peripheral do not take this long.
Q2
In the V850E/MA1, how many clock cycles are needed for each I/O access? [V850E/MA1]
A2
The access time to the I/O ports (peripheral I/O registers) is determined by settings in the VSWC register, and six clock cycles are required when using the recommended value (12H) at 50 MHz.
Read-modify-write operation is performed for bit manipulation instructions, in which case access requires 13 clock cycles (two access cycles plus operation cycles).
Q3
For example, how many clock cycles are needed when writing "1, 0, 1, 0" to an output port? [V850E/MA1]
A3
When outputting just 0, 1 using consecutive instructions, the number of clock cycles is 12.
For consecutive bit manipulation, 26 clock cycles are required, since it is necessary to read the port.
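The figures in A2 and A3 can be turned into elapsed time with simple arithmetic. The following is a minimal sketch in C, assuming the values quoted above (6 clocks per port access with VSWC = 12H at 50 MHz, 13 clocks per bit manipulation instruction). The helper names port_cycles and cycles_to_ns are hypothetical and do not come from any NEC library.

```c
#include <assert.h>

/* Figures quoted in A2 for the V850E/MA1 with VSWC = 12H at 50 MHz. */
enum {
    PORT_ACCESS_CYCLES = 6,  /* one peripheral I/O register access */
    BIT_MANIP_CYCLES   = 13  /* read-modify-write: two accesses plus operation */
};

/* Total clock cycles for n_ops consecutive port operations (hypothetical helper). */
static unsigned long port_cycles(unsigned n_ops, unsigned cycles_per_op)
{
    return (unsigned long)n_ops * cycles_per_op;
}

/* Convert clock cycles to nanoseconds for a CPU clock given in MHz. */
static double cycles_to_ns(unsigned long cycles, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return (double)cycles * 1000.0 / clock_mhz;
}
```

With these assumptions, port_cycles(2, PORT_ACCESS_CYCLES) gives the 12 clocks stated in A3, which cycles_to_ns(12, 50.0) converts to 240 ns. Two bit manipulation instructions give 26 clocks, or 520 ns. Actual timing still depends on the VSWC setting and on the surrounding instructions.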
Q4
The following type of program was executed while running the V850/SA1 at 20 MHz.
The waveform at P20 was observed, but it was inverted every 400 ns rather than every 50 ns as expected.
The disassembly results show that P2.0 = ~P2.0 is converted to a single NOT1 instruction and nothing extra is executed. [V850/SA1]

    while(1)
    {
         P2.0 = ~P2.0;
             :
             :
         P2.0 = ~P2.0;
    }
A4
There are two reasons for this. One reason is that the NOT1 instruction itself combines three operations (read, modify, and write) for port 2 when it is executed. The second reason is that three clock cycles are required to access port 2.
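To relate the observed waveform to clock counts, the interval can be converted into CPU clocks. The sketch below is a rough illustration in C. The helper name ns_to_clocks is hypothetical, and the only inputs are the 20 MHz clock and the 50 ns and 400 ns intervals from the question.

```c
#include <assert.h>

/* Number of CPU clocks that elapse in interval_ns at clock_mhz (integer math). */
static unsigned long ns_to_clocks(unsigned long interval_ns, unsigned long clock_mhz)
{
    assert(clock_mhz > 0);
    return interval_ns * clock_mhz / 1000UL;
}
```

At 20 MHz, ns_to_clocks(50, 20) is 1 clock, which is what the questioner expected per inversion, while ns_to_clocks(400, 20) is 8 clocks per NOT1 instruction. The sketch only converts units. It does not break the 8 clocks down into individual pipeline stages.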

Remark
In almost all V850 Series products, the internal memory access speed (number of clock cycles) is noted at the start of the "Bus Access" section in the User's Manual.
And for some devices, the speed for accessing on-chip peripherals is also described in the User's Manual.
Some devices include a VSWC register that is used to specify wait times when accessing on-chip peripherals, in which case the number of waits must be set according to the operating frequency.
(2006/04)

v85exec-0005
MIPS value calculation
Q1
How can I calculate a MIPS value?
A1
This conversion (calculation) is not simple.
Execute the benchmark program with an in-circuit emulator, etc., and estimate the MIPS value based on the execution results.

The Dhrystone program can be found at the following URL.
http://www.gpul.org/ftp/lang/benchmark/aburto/dhrystone/
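Once a benchmark has been run, the estimate itself is a short calculation. The following is a minimal sketch in C with hypothetical helper names. It assumes that the instruction count and elapsed time are measured with the emulator, and it uses the common convention (not stated in this FAQ) that Dhrystone MIPS is the Dhrystones-per-second score divided by 1757, the VAX 11/780 reference score.

```c
#include <assert.h>

/* Native MIPS: millions of instructions executed per second of run time. */
static double mips_from_run(double instructions_executed, double elapsed_seconds)
{
    assert(elapsed_seconds > 0.0);
    return instructions_executed / elapsed_seconds / 1.0e6;
}

/* Dhrystone MIPS: Dhrystones per second divided by 1757 (VAX 11/780 reference). */
static double dhrystone_mips(double dhrystones_per_second)
{
    return dhrystones_per_second / 1757.0;
}
```

For example, a run that executes 50 million instructions in one second yields mips_from_run(50.0e6, 1.0) = 50 MIPS. The result is only as representative as the benchmark, because memory placement and pipeline behavior change the instruction rate.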
(2006/04)

v85exec-0006
Programming area that allows the fastest processing
Q1
Which area allows the fastest program execution?
A1
The fastest processing can be achieved when a program is located in internal ROM (including flash memory).
From the internal memory, a 32-bit instruction can be fetched in 1 clock at the fastest.
However, when fetching an instruction from an external memory, it cannot be fetched in 1 clock because the data bus is 16 bits wide.

Remark
The V850E/ME2 does not include any internal ROM (including flash memory), but it can include large-capacity RAM that can fetch a 32-bit instruction in one clock cycle.
It also features a 32-bit external bus width.
(2006/04)

v85exec-0007
Instruction execution in one clock through pipeline control
Q1
I measured the execution time by actually executing the program.
However, the results differ from the time stated in the manual.
According to the manual, the CPU executes "almost all the instructions in one clock cycle."
Is this actually true?
A1
When an instruction is fetched from internal ROM and executed without pipeline hazards, almost all instructions can be executed in 1 clock.
However, if an instruction is fetched from external memory, at least 4 clocks are required to fetch a one-word (4-byte) instruction.

The fetch therefore takes at least four times as long as a fetch from internal ROM. In addition, some instructions may actually require 2 clocks instead of 1 clock.

If the execution result of an instruction is referenced by the next instruction, extra execution clocks may be required (for example, when mov 3,r10 is immediately followed by st.b r10,11[r29]).
Because such delay factors are mixed together, the instruction execution time becomes longer.
For details of these behaviors, refer to 5.4 "Number of Instruction Execution Clock Cycles" in the Architecture User's Manual.
Note that it is very difficult to figure out the behavior of the pipeline in detail.
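The fetch figures above give a rough lower bound on execution time. The sketch below, in C with hypothetical names, assumes only the two values stated in this answer: 1 clock per instruction fetched from internal ROM in the best case, and at least 4 clocks per one-word (4-byte) instruction fetched from external memory. It ignores pipeline hazards, wait states, and the look-ahead queue, so real code will take longer.

```c
#include <assert.h>

/* Best-case fetch costs quoted in this FAQ entry. */
enum {
    INTERNAL_FETCH_CLOCKS = 1, /* instruction fetched from internal ROM */
    EXTERNAL_FETCH_CLOCKS = 4  /* minimum for a 4-byte instruction, external memory */
};

/* Fetch-bound lower limit, in clocks, for a straight run of instructions. */
static unsigned long fetch_bound_clocks(unsigned long n_instructions,
                                        unsigned clocks_per_fetch)
{
    assert(clocks_per_fetch > 0);
    return n_instructions * clocks_per_fetch;
}
```

For 100 one-word instructions, the bound is 100 clocks from internal ROM and 400 clocks from external memory, which matches the factor of four described above.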
© 1995-2008 NEC Electronics Corporation