CPU board bring-up

Photo of the top of an SFWA CPU board

After a six-week lead time, we received the three prototype CPU boards on 13 November. The supplier Daniel used handled PCB fabrication as well as assembly and QC/X-ray inspection. That was just as well, since the BOM has 135 line items covering 618 individual components per board: mostly 0402 resistors and capacitors, but also a bunch of QFNs, 0.5mm-pitch MBGAs and other tiny things. The PCBs are densely packed, double-sided 90×51mm boards with 12 layers and HDI, so their fabrication was the dominant factor in the lead time.

The most important component on the board is the TMS320C6657 DSP, a dual-core part running at 1.0GHz and capable of 16GFLOPS per core in single precision and 4GFLOPS per core in double precision. Peak power consumption in our application will be under 5W, and average consumption (assuming we can meet our CPU usage targets in the NMPC code) should be about 3.5W.

Support chips for the DSP include an Altera MAX II EPM1270 CPLD, which handles power/clock sequencing and I/O routing; a UCD9222 programmable power controller, which runs the DSP's 1.0V and variable-voltage core supplies; and a CDCE62002 programmable PLL/clock generator, which provides the 100MHz DSP core clock input and the 50MHz DDR3 clock input (converted to 1.0GHz and 667MHz respectively by the DSP's internal PLLs).
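Those clock figures imply simple integer ratios: 100MHz × 10 = 1.0GHz, and 50MHz × 40/3 ≈ 666.67MHz (the nominal "667MHz" DDR3 clock). The sketch below brute-forces the smallest multiplier/divider pair for a target frequency. It assumes an idealised PLL whose output is f_ref × M / D with integer M and D; the real C6657 PLL registers and the CDCE62002's dividers have their own encodings and VCO range limits, which this hypothetical `pll_ratio` helper ignores.

```python
from fractions import Fraction


def pll_ratio(f_ref_mhz, f_target_mhz, max_m=64, max_d=64):
    """Find integer (M, D) so that f_ref * M / D is as close as possible to f_target.

    Idealised model (an assumption): no VCO range limits, no register encoding.
    Ties are broken in favour of the smallest D, then the smallest M.
    """
    f_ref = Fraction(f_ref_mhz)
    f_target = Fraction(f_target_mhz)
    best = None
    for d in range(1, max_d + 1):
        for m in range(1, max_m + 1):
            f_out = f_ref * m / d
            err = abs(f_out - f_target)
            # Strict "<" keeps the first (smallest D, then M) of any equal-error pairs.
            if best is None or err < best[0]:
                best = (err, m, d, f_out)
    _, m, d, f_out = best
    return m, d, f_out


if __name__ == "__main__":
    print(pll_ratio(100, 1000))              # DSP core: 100MHz -> 1.0GHz
    print(pll_ratio(50, Fraction(2000, 3)))  # DDR3: 50MHz -> ~666.67MHz
```

Exact `Fraction` arithmetic avoids floating-point noise when comparing candidate ratios, which matters for targets like 2000/3 MHz that have no finite decimal representation.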

The DSP and DDR3 also require 1.8V, 1.5V, and 0.75V supplies, which are handled by more pedestrian regulators without features like integrated PMBus or SPI support. (While I initially thought the power and clock setup for the C6657 was insanely over-engineered, the UCD9222's PMBus interface turns out to be very useful, because TI's Fusion Digital Power Designer tool can plot its current, voltage and temperature readings in real time. PMBus does make the UCD9222 a bit of a pain to configure on board, but it's a worthwhile trade-off. The CDCE62002's SPI configuration interface is less useful, since you can't read anything from it and the parameters are pretty arcane.)
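Reading that telemetry without Fusion Digital Power Designer means decoding PMBus numeric formats yourself. Per the PMBus specification, READ_IOUT (0x8C) and READ_TEMPERATURE_1 (0x8D) typically return LINEAR11 words (5-bit signed exponent, 11-bit signed mantissa), while READ_VOUT (0x8B) returns a LINEAR16 mantissa whose exponent comes from VOUT_MODE (0x20). The minimal sketch below decodes both. That the UCD9222 uses exactly these formats for each command is an assumption to check against its datasheet, the I2C/PMBus transport is left out, and the example raw words are made up.

```python
def twos_complement(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a signed two's-complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def decode_linear11(word: int) -> float:
    """LINEAR11: bits 15..11 = signed exponent N, bits 10..0 = signed mantissa Y; value = Y * 2**N."""
    exponent = twos_complement(word >> 11, 5)
    mantissa = twos_complement(word, 11)
    return mantissa * 2.0 ** exponent


def decode_linear16(word: int, vout_mode: int) -> float:
    """LINEAR16: unsigned 16-bit mantissa, exponent taken from the low 5 bits of VOUT_MODE."""
    if ((vout_mode >> 5) & 0x7) != 0:
        raise ValueError("VOUT_MODE does not select linear format")
    exponent = twos_complement(vout_mode, 5)
    return (word & 0xFFFF) * 2.0 ** exponent


if __name__ == "__main__":
    # Hypothetical raw readings, not captured from our board.
    print(decode_linear11(0xF032))        # exponent -2, mantissa 50 -> 12.5
    print(decode_linear16(0x0200, 0x17))  # exponent -9, mantissa 512 -> 1.0
```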

Thankfully, from both a cost and a timing perspective, Daniel's DSP hardware design works perfectly: after a day and a half of debugging CPLD power-sequencing issues and trying endless combinations of CDCE62002 register settings, we were able to boot the DSP and run a 10-second test of our UKF code, which completed with the correct results in the correct time.

CPU board top silkscreen with PCB dimensions

According to the memory test built into the EVM6657's GEL file (which we used with minor modifications to disable interfaces we haven't exposed), the DDR3 implementation is also working correctly, which is a relief to Daniel since the DDR3 trace layout was an absolute pain. We have 256MB on the board; at this stage we don't know whether we'll need all of it, but it's better to have memory and not need it than the reverse.
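For readers bringing up a similar board without a ready-made GEL script, the sketch below shows two classic checks: a walking-ones test that exercises each data line in turn, and an address-in-address test that catches open or shorted address lines by writing each word's own index into it. It runs against a hypothetical `SimulatedMemory` model with injectable stuck-at-0 faults rather than real hardware, and it is not a reproduction of the EVM6657 GEL test.

```python
class SimulatedMemory:
    """Word-addressed RAM model with optional stuck-at-0 address or data lines (hypothetical faults)."""

    def __init__(self, size_words, stuck_address_bit=None, stuck_data_bit=None):
        self.cells = [0] * size_words
        self.stuck_address_bit = stuck_address_bit
        self.stuck_data_bit = stuck_data_bit

    def _decode(self, index):
        # A stuck-low address line aliases two words onto the same cell.
        if self.stuck_address_bit is not None:
            index &= ~(1 << self.stuck_address_bit)
        return index

    def write(self, index, value):
        value &= 0xFFFFFFFF
        # A stuck-low data line silently clears that bit on every write.
        if self.stuck_data_bit is not None:
            value &= ~(1 << self.stuck_data_bit)
        self.cells[self._decode(index)] = value

    def read(self, index):
        return self.cells[self._decode(index)]


def walking_ones_test(mem, index=0, width=32):
    """Data-bus test: write each single-bit pattern to one word; return the data bits that fail."""
    failures = []
    for bit in range(width):
        pattern = 1 << bit
        mem.write(index, pattern)
        if mem.read(index) != pattern:
            failures.append(bit)
    return failures


def address_test(mem, size_words):
    """Address-bus test: store each word's index in itself, then read back; return failing indices."""
    for i in range(size_words):
        mem.write(i, i)
    return [i for i in range(size_words) if mem.read(i) != i]
```

Writing all addresses before reading any back is what exposes aliasing: a word whose address differs from another only in a dead address line gets overwritten by its alias, so its read-back no longer matches.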

Despite the (bold, italic) warnings in the TI datasheet, the C6657 is actually pretty tolerant of completely incorrect power-up sequences. The canonical sequence is either (CVDD, CVDD10, DVDD18, DVDD15, PLL lock, end reset) or (DVDD18, CVDD, CVDD10, DVDD15, PLL lock, end reset), with the whole process completing in under 100ms. While sorting out the CPLD, however, we at various points had DVDD15 high without DVDD18 for several minutes, all supplies high but no clock, and a clock without any supplies. None of that did any detectable damage to the DSP, so it's nowhere near as fragile as I'd expected.
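The two canonical orderings are easy to check mechanically, for example against timestamps captured with a logic analyzer while debugging the sequencer. A minimal sketch follows; the event names, the `(name, time_ms)` log format and measuring the 100ms budget from the first rail to reset release are assumptions for illustration, and this is not the CPLD's Verilog sequencer.

```python
# Hypothetical event names, following the rail naming used in this post.
CANONICAL_SEQUENCES = (
    ("CVDD", "CVDD10", "DVDD18", "DVDD15", "PLL_LOCK", "RESET_RELEASE"),
    ("DVDD18", "CVDD", "CVDD10", "DVDD15", "PLL_LOCK", "RESET_RELEASE"),
)
MAX_SEQUENCE_MS = 100.0


def check_power_up(events):
    """Check a logged power-up against the canonical orders.

    `events` is a list of (name, time_ms) tuples in the order they were observed.
    Returns (ok, reason).
    """
    names = tuple(name for name, _ in events)
    if names not in CANONICAL_SEQUENCES:
        return False, "event order matches neither canonical sequence"
    times = [t for _, t in events]
    if any(later < earlier for earlier, later in zip(times, times[1:])):
        return False, "timestamps are not monotonic"
    # Budget measured from the first rail coming up to reset release (an assumption).
    if times[-1] - times[0] >= MAX_SEQUENCE_MS:
        return False, "sequence did not complete within 100ms"
    return True, "ok"


if __name__ == "__main__":
    log = [("CVDD", 0.0), ("CVDD10", 5.0), ("DVDD18", 10.0),
           ("DVDD15", 15.0), ("PLL_LOCK", 40.0), ("RESET_RELEASE", 60.0)]
    print(check_power_up(log))
```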

We haven’t yet validated all of the DSP peripherals—the NET2272 USB chip may be the source of some leakage into the DSP’s 1.8V rail, and I haven’t yet written drivers for the UART on the DSP’s EMIF port—but we’ve tested enough to confirm that our FCS code has the hardware it needs to run on board our UAV.

Since the three major risks with our DIY approach were UKF accuracy, sufficiently performant hardware, and NMPC feasibility, and we've now resolved the first two without running into any major problems, we're confident we'll have time to give the control implementation the focus it needs.

The in-progress FCS code (including Verilog for the CPLD) is available at sfwa/fcs. Note, however, that the DSP firmware is not going to compile, let alone run, for some time.

(Aside: in case you're wondering why we're not running our FCS on one of the recent fast ARM chips designed for consumer mobile devices, such as the quad-core 1.7GHz Exynos 4412 that will run our comms and image recognition code, there are two main factors:

  1. ARM floating-point performance is much lower than the DSP's. On a per-core basis, the Exynos 5250 at 1.7GHz achieves 30% lower peak performance than the C6657 at 1.0GHz, and although the DSP is slower on general-purpose code, for loop-heavy floating-point code it's easier to approach peak performance on the DSP than with ARM NEON.
  2. The C6657’s steady-state power consumption is likely to be somewhat lower than an ARM part running the same math-intensive code, and its thermal design allows it to sustain high performance for long periods of time without running into thermal throttling issues (particularly as most compact high-performance ARM boards use PoP DRAM). Since we’re aiming for a constant 50% CPU usage on both cores (although NMPC might end up higher) and everything is time-critical, we can’t risk CPU-induced throttling.
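The throughput figures in this aside work out as follows, using only numbers quoted in this post; the Exynos 5250 line is derived from the "30% lower" claim rather than from an ARM datasheet, and all values are single-precision peaks:

```latex
\begin{aligned}
\text{C6657, per core: } & \frac{16\ \text{GFLOPS}}{1.0\ \text{GHz}} = 16\ \text{FLOPs/cycle} \\
\text{Exynos 5250, per core: } & 0.7 \times 16\ \text{GFLOPS} = 11.2\ \text{GFLOPS} \approx 6.6\ \text{FLOPs/cycle at } 1.7\ \text{GHz} \\
\text{50\% load on both C6657 cores: } & 2 \times 0.5 \times 16\ \text{GFLOPS} = 16\ \text{GFLOPS of peak capacity in use}
\end{aligned}
```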

It seems likely that mobile-targeted ARM parts will close the performance gap within the next year or two, probably with the Cortex-A57 architecture, but it's likely to take a while longer before those architectures become available in parts designed for embedded use [i.e., more predictable performance, longer lifecycles, more open documentation, and the ability to order in quantities of less than one million]. Of course, barring a switch from UKF to MHE, we're unlikely to need much more CPU power than we have now, so at that point it's probably irrelevant.)

github.com/sfwa twitter.com/sfwa_uav youtube.com/user/sfwavideo