Embedded Software Testing: Practical Continuous Integration with Hardware in the Loop (Part 2)
November 03, 2024
Written by Leonardo Held, Torizon Developer
This is the second installment in the embedded software testing series. This one is about the architecture of a system to test embedded software. I'll give you a practical example of a test for an Embedded Device, integrated with a CI/CD system. We'll briefly cover what is usually called a "dual-targeting" approach as well.
Theory
In the previous installment we talked about the different levels of software testing, and the last one was "end-to-end testing": treating the Device Under Test (DUT) like a black box, poking it with certain stimuli (inputs), seeing how it reacts (outputs) and comparing the results with known expected values.
So essentially we need something to actually do the poking, watch the outputs and compare them with the known-good values. Practically speaking, that means hooking up UARTs, displays and whatever else to an embedded system and writing some software to do that automatically for us.
```
+--------------------+              +-----------------------+
|                    |              |                       |
|  Testing Software  |              |   Device Under Test   |
|   (Controller)     |              |         (DUT)         |
|                    |              |      [Black Box]      |
|                    |    poking    |                       |
| - Sends Inputs     | -----------> | - Processes Inputs    |
|   via UART,        |              |                       |
|   Display, etc.    |              |                       |
|                    |   reacting   |                       |
| - Receives         | <----------- | - Generates Outputs   |
|   Outputs          |              |   via UART, etc.      |
|   via UART,        |              |                       |
|   Other I/O        |              |                       |
+--------------------+              +-----------------------+
```
Practice
A Simple Testing Architecture
So, as an example let’s assume you need to monitor whether your Linux box boots. The manual steps to achieve this are:
- Reset the board from whatever state it was
- Wait for the bootloader, kernel and userspace to boot
- See the login prompt
Great, now we've reduced a requirement (the board must boot) to a set of manual execution steps. But you don't have just one board; you have around 100 different ones, and every single one of them must be tested every time your Continuous Integration system builds a new OS for the board. Thinking about it in terms of automation, we can reduce this to:
- Connect the board’s UART to a “Controller” - simply put, hooking up the board to a computer
- Write some software that parses the logs from the UART
- If it detects our login prompt, consider the board fully booted
As a practical example, I have a Verdin iMX8MP SoM connected to a Dahlia carrier board. Power is connected as is the UART to my Controller, which in this case is my computer.
```
+-------------------+          UART Cable           +------------------------+
|                   | <---------------------------> |                        |
|    Controller     |                               |  Dahlia Carrier Board  |
|   (A Computer)    |                               |  + Verdin iMX8MP SoM   |
|                   |                               |                        |
+-------------------+                               +------------------------+
                                                                |
                                                                |
                                                         +------------+
                                                         |   Power    |
                                                         |   Supply   |
                                                         +------------+
```
A note on the "Power Supply": I'll assume here that a power supply (PSU, or sometimes PDU for "Power Distribution Unit") is always connected to the board. In reality, PSUs used for this kind of application generally have built-in relays, or interface with external ones, to cut or supply power to devices, and specific software running on the Controller drives the PSU (for example, to reboot boards or put them into recovery mode).
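To make that concrete, here is a minimal sketch of what "software on the Controller drives the PSU" might look like. Everything here is an assumption for illustration: the `send` callable stands in for whatever transport your PDU actually speaks (HTTP, telnet, SNMP, a serial relay board), and the command strings are invented; check your PSU's real protocol.

```python
import time


class PduOutlet:
    """Sketch of driving one outlet of a hypothetical network-controlled PDU.

    `send` is a callable taking a command string; it stands in for the
    PDU's real transport. The command names are made up for illustration.
    """

    def __init__(self, send, outlet):
        self.send = send
        self.outlet = outlet

    def power_off(self):
        self.send(f"off {self.outlet}")

    def power_on(self):
        self.send(f"on {self.outlet}")

    def power_cycle(self, settle_seconds=2.0):
        # Cut power, wait for the board to fully discharge, power back up.
        self.power_off()
        time.sleep(settle_seconds)
        self.power_on()
```

A test harness would call `power_cycle()` in its setup step to guarantee the board starts from a cold boot.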
Implementation of a Prompting “Expect” in Python
Then it's a matter of writing some software that runs on the Controller and waits for the login prompt.
You can write tests in whatever language you feel comfortable with, but I'd highly suggest looking into Python, Perl or Tcl. Interpreted, dynamic languages are generally better for these tasks because you want to churn out tests quickly. It also makes sense to use a language that is familiar to everyone on your team.
Whipping up some Python code, I got this:
```python
import unittest
import serial
import time
import xmlrunner


class SerialPromptTest(unittest.TestCase):
    SERIAL_PORT = "/dev/tty.usbserial-110085063"
    BAUD_RATE = 115200
    TIMEOUT = 40
    PROMPT = "verdin-imx8mp-06817296 login:"

    def test_wait_for_PROMPT(self):
        try:
            with serial.Serial(
                self.SERIAL_PORT, self.BAUD_RATE, timeout=self.TIMEOUT
            ) as ser:
                time.sleep(1)
                ser.reset_input_buffer()

                start_time = time.time()
                received_data = ""

                while time.time() - start_time < self.TIMEOUT:
                    if ser.in_waiting > 0:
                        chunk = ser.read(ser.in_waiting).decode(errors="ignore")
                        print(chunk, end="")
                        received_data += chunk

                        if self.PROMPT in received_data:
                            print(f"\nFound expected prompt '{self.PROMPT}'")
                            return
                    else:
                        time.sleep(0.1)

                self.fail(
                    f"Expected prompt '{self.PROMPT}' not found within {self.TIMEOUT} seconds."
                )

        except serial.SerialException as e:
            self.fail(f"Serial connection failed: {e}")


if __name__ == "__main__":
    with open("serial_test_results.xml", "wb") as output:
        unittest.main(
            testRunner=xmlrunner.XMLTestRunner(output=output), exit=False
        )
```
The `test_wait_for_PROMPT` method is quite simple and can be reused many times. The basic gist is that we keep reading data chunks coming from the UART, decoding them and waiting until our desired prompt (`verdin-imx8mp-06817296 login:`) shows up.
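Since this read-accumulate-match loop reappears in every variant of the test, it can be factored into a transport-agnostic helper. A minimal sketch, where `read_chunk` is my own stand-in name for whatever callable returns the next piece of decoded output (wrapping a serial port, a telnet connection, or anything else):

```python
import time


def wait_for_prompt(read_chunk, prompt, timeout, poll_interval=0.1):
    """Generic expect loop: accumulate decoded output until `prompt`
    appears or `timeout` seconds elapse.

    read_chunk is a callable returning the next available piece of text,
    or "" when nothing is pending.
    """
    deadline = time.time() + timeout
    received = ""
    while time.time() < deadline:
        chunk = read_chunk()
        if chunk:
            received += chunk
            if prompt in received:
                return True
        else:
            time.sleep(poll_interval)
    return False
```

The serial and telnet tests then differ only in how `read_chunk` is implemented.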
Human-readable Test Results
Because this test is encapsulated in a module that calls `unittest.main`, it can output the test results in a standardized format, JUnit XML, which can be interpreted by many services such as GitLab, GitHub and others.
I recommend setting up some visualization of test results early in the process, because as you add tests you will find yourself digging through logs and other artifacts instead of just glancing at a screen telling you which tests failed and which passed.
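Even without dedicated tooling, the JUnit XML report is trivial to summarize with the standard library. A sketch (`summarize_junit` is my own helper name; the `tests`, `failures` and `errors` attributes are part of the common JUnit XML schema):

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text):
    """Return (total, failed) test counts from a JUnit XML report."""
    root = ET.fromstring(xml_text)
    total = failed = 0
    # iter() handles both a <testsuites> wrapper and a bare <testsuite> root.
    for suite in root.iter("testsuite"):
        total += int(suite.get("tests", 0))
        failed += int(suite.get("failures", 0)) + int(suite.get("errors", 0))
    return total, failed
```

A tiny script like this can print a one-line pass/fail summary at the end of a CI job log.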
Another option is the handy `junit2html` program, which renders the test results as easily digestible HTML pages that can be viewed in a web browser.
Decoupling the Controller from the Hardware
The tests we executed might not seem like… a lot, right? But one can easily take this further by using `ser2net`, which takes a serial port and exposes it as a telnet connection for a distributed testing environment.
To use `ser2net`, I simply installed it on my Debian machine with `apt install ser2net` and ran it manually with
```shell
# ser2net -n -c /etc/ser2net.conf -P /run/ser2net.pid
```
where `/etc/ser2net.conf` contains the following:
```
9000:raw:0:/dev/ttyUSB0:115200 8DATABITS NONE 1STOPBIT
```
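Before pointing remote tests at the Controller, it's worth checking that ser2net is actually up and listening. A small standard-library sketch (host and port here are whatever you configured above):

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0):
    """Poll until a TCP port (e.g. ser2net's) accepts connections."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)
    return False
```

Calling `wait_for_port("192.168.15.3", 9000)` at the start of a test run gives a clear "infrastructure down" failure instead of a confusing test timeout.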
With that, from anywhere in the world, I can connect to my Verdin’s UART by using telnet. The diagram looks a bit like this now:
```
+-------------------+       TCP/IP Connection        +-------------------+
|                   | <----------------------------> |                   |
| Testing Software  |                                |    Controller     |
| (running anywhere)|                                |   (A Computer)    |
|                   |                                |                   |
+-------------------+                                |  +-------------+  |
                                                     |  |   ser2net   |  |
                                                     |  +-------------+  |
                                                     +---------|---------+
                                                               |
                                                           UART Cable
                                                               |
                                                      +-----------------+
                                                      |                 |
                                                      | Embedded Device |
                                                      | (Dahlia Carrier |
                                                      |  Board + Verdin |
                                                      |   iMX8MP SoM)   |
                                                      |                 |
                                                      +-----------------+
                                                               |
                                                               |
                                                        +------------+
                                                        |   Power    |
                                                        |   Supply   |
                                                        +------------+
```
ser2net will complain about the old configuration format, but I’m not ready to move to YAML yet. You should!
```
ser2net:WARNING: Using old config file format, this will go away
soon. Please switch to the yaml-based format.
```
So, moving over to my other computer, I can grab the IP address of the Controller running ser2net and modify the previous test code a bit to use that telnet connection instead of the local UART:
```python
import unittest
import telnetlib3
import asyncio
import time
import xmlrunner


class TelnetPromptTest(unittest.TestCase):
    TELNET_HOST = "192.168.15.3"
    TELNET_PORT = 9000
    TIMEOUT = 40
    PROMPT = "verdin-imx8mp-06817296 login:"

    async def check_prompt(self):
        reader, writer = await telnetlib3.open_connection(self.TELNET_HOST, self.TELNET_PORT)
        start_time = time.time()
        received_data = ""

        try:
            while time.time() - start_time < self.TIMEOUT:
                chunk = await reader.read(1024)
                if chunk:
                    print(chunk, end="")
                    received_data += chunk

                    if self.PROMPT in received_data:
                        print(f"\nFound expected prompt '{self.PROMPT}'")
                        return True
                else:
                    await asyncio.sleep(0.1)

            self.fail(f"Expected prompt '{self.PROMPT}' not found within {self.TIMEOUT} seconds.")
        finally:
            writer.close()

    def test_wait_for_PROMPT(self):
        asyncio.run(self.check_prompt())


if __name__ == "__main__":
    with open("telnet_test_results.xml", "wb") as output:
        unittest.main(testRunner=xmlrunner.XMLTestRunner(output=output), exit=False)
```
which yields the same results as before.
Integration with a CI/CD System
Because the script can now run anywhere, integrating everything with a CI/CD system should be fairly straightforward.
Let's do it with Jenkins, a popular and mature CI/CD system. For that, I'll write a simple Jenkinsfile that will execute the tests and parse the results for us:
```groovy
pipeline {
    agent any

    stages {
        stage('Setup') {
            steps {
                echo 'Setting up Python environment'
                sh 'python3 -m venv venv'
                sh '. venv/bin/activate && pip install -r requirements.txt'
            }
        }

        stage('Run Telnet Test') {
            steps {
                echo 'Running Telnet Test'
                sh '. venv/bin/activate && python3 boot-test-telnet.py'
            }
        }
    }

    post {
        always {
            // Publish the JUnit results no matter the outcome; this is
            // also what marks the build unstable when tests fail.
            junit allowEmptyResults: true, testResults: 'telnet_test_results.xml'
            echo 'Cleaning up virtual environment'
            sh 'rm -rf venv'
        }

        success {
            echo 'Tests ran successfully'
        }

        failure {
            echo 'Tests failed, please check the results.'
        }
    }
}
```
Which results in this nice CI job testing whether our Verdin iMX8MP boots properly! Cool, right?
Of course, you're free to use CircleCI, GitLab, GitHub or whatever other service to do this. The point is, if you're doing Embedded at any scale, this architecture scales very well.
Having an automated service building and testing your hardware for every new `git push` (CI tools can automatically trigger jobs after new commits are added, or even earlier, during merge requests) allows you to safely push new code faster, and as I hinted in the first article, it's more important than ever to be able to safely create new patch releases due to cybersecurity regulations.
I have to note that during these tests I was manually rebooting the board. Most setups hook the Controller up to relays or even other boards so that the Controller itself resets and/or flashes the board at the beginning of a test. Note that tests should start from a known state: if you're testing software version 7.0.0, that version has to be installed when the tests run.
This pattern of putting the board into a known-good state is generally called Setup, and the process of cleaning up whatever mess your test left behind, so that the next test can still bring the board into a known state, is called Teardown.
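In `unittest` terms, Setup and Teardown map directly onto the `setUp` and `tearDown` hooks, which run before and after every test method. A sketch of the pattern, where the `psu` callable and its command strings are stand-ins I invented in place of real bench control:

```python
import unittest


class BoardBootTest(unittest.TestCase):
    """Sketch of the Setup/Teardown pattern for a bench test."""

    def setUp(self):
        # Setup: bring the DUT to a known state before every test.
        self.commands = []
        self.psu = self.commands.append  # stand-in for real PSU control
        self.psu("power-cycle")          # e.g. toggle the PDU relay

    def tearDown(self):
        # Teardown: leave the bench clean for whatever test runs next.
        self.psu("power-off")

    def test_board_was_reset(self):
        self.assertIn("power-cycle", self.commands)
```

In a real harness, `setUp` would also flash the image under test, so every test method starts from the exact software version being validated.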
Repeating the Pattern
This example was quite simple, but it would already be invaluable for a project of any size. If you're a contractor, having tests like this is extremely helpful because you have a way to prove certain criteria were met during the project's development: hard proof that your work was fully completed.
Other, more significant tests simply repeat the pattern: want to test whether the buses are working correctly? Hook the board up to a logic analyzer, the logic analyzer to the Controller, and write some logic to check whether your software is properly communicating with the hardware.
A common case, which we'll talk about in the next installment, is GUI validation tests. As a spoiler: you could, for example, set up a test where a screenshot is taken from within the board, moved over to the runner job using `scp`, and evaluated with image comparison techniques to check whether the image is rendered properly. Tests are mostly limited by the ingenuity of whoever is writing them.
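To give a flavor of the image comparison step: real setups would typically reach for Pillow or OpenCV, but the core idea fits in a dependency-free sketch over flat lists of grayscale pixel values (the function names and the tolerance value are mine, for illustration):

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute per-pixel difference between two equal-sized
    grayscale frames (flat sequences of 0-255 values); 0.0 == identical."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same size")
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def frames_match(frame_a, frame_b, tolerance=2.0):
    """Pass/fail check with a small tolerance for rendering noise."""
    return mean_abs_diff(frame_a, frame_b) <= tolerance
```

A GUI test would decode the screenshot and a golden reference image into pixel buffers and assert `frames_match` on them.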
Dual-targeting with QEMU
Dual-targeting is a technique that involves isolating a given layer of a system (generally the application layer) in order to develop and test it on a second target that is not the one actually being deployed.
Why do this? Well, testing with hardware in the loop is quite time consuming, and developing with it even more so. We want fast feedback, and today a lot is heavily abstracted between the hardware, OS and application layers, meaning that if we're developing an application or a non-hardware-bound kernel feature, we can use something better and faster.
There are a few strategies for this. Let's say you're developing a microcontroller-based system where you need to push images to a display. At the end of the day, what you'll be doing is writing to a framebuffer that gets processed by the display controller: it shouldn't take long to develop a visualizer in JavaScript or some other high-level language that ingests, in real time, the framebuffer written by your custom soon-to-be-embedded application and shows something on the screen. Actually, that's a pretty common thing to do nowadays1.
The whole idea is: “how much can I abstract the hardware away so it’s easier for me to test code that only contains business logic?”.
For Linux, we're well served by QEMU. Dual-targeting to QEMU is especially nice because you can run the exact same kernel in QEMU if you can use an upstream one. Even better if you only care about the application layer: just target your Yocto build at a different machine, like `qemuarm64`, and you can easily write tests that verify whether all the dependencies for your application are correctly installed, with no need for hardware.
Torizon OS can also dual-target fairly easily with Docker. Building Docker containers for x86 is just a matter of passing another flag, `--platform linux/amd64`, and quite a few of our customers run their applications on the desktop as a development environment before pushing them to hardware.
To wrap up, here's an example similar to the one I showed you before, where we made sure our board was booting properly. This time I built the exact same image, but instead of selecting `MACHINE="verdin-imx8"` in Yocto I chose `MACHINE="qemuarm64"` and launched it as follows:
```shell
qemu-system-aarch64 \
    -M virt \
    -cpu cortex-a72 \
    -m 2048 \
    -nographic \
    -drive if=none,file=torizon-docker-qemuarm64-20241103175530.wic,id=hd0,format=raw \
    -device virtio-blk-device,drive=hd0 \
    -device virtio-net-device,netdev=net0 \
    -netdev user,id=net0,hostfwd=tcp::2223-:22 \
    -bios u-boot.bin \
    -serial pty
```
QEMU will redirect the serial output to a virtual serial port; in this case it tells me to look at `/dev/ttys004`:
```
char device redirected to /dev/ttys004 (label serial0)
```
Changing the expected PROMPT in the first iteration of our script and pointing it at the new virtual serial port:
```python
...
    SERIAL_PORT = "/dev/ttys004"
    TIMEOUT = 40
    PROMPT = "qemuarm64-36267642 login:"

    def test_wait_for_PROMPT(self):
...
```
yields the exact same result as before: we have successfully dual-targeted a test with hardware in the loop!
Again, why is this useful at all? Imagine you have an issue on your board but QEMU works fine. You've immediately narrowed the issue down to something specific to your hardware rather than the build itself, which can save you a whole lot of trouble.
Closing Remarks
I hope this was useful. This is something that a lot of companies do but not many people talk about, and I'm happy to get the information out! Note that in reality there are whole teams dedicated to this process and to maintaining the infrastructure to run tests2; what I've done here is just give you a general idea.
Next installment we’ll talk about test scheduler architectures and how we’ve developed our own Torizon Testing Framework that anyone can easily use, plus how we use it to test core projects inside the Torizon team.
Please send questions about this article to my e-mail, leonardo.held [at] toradex.com. I'll be happy to answer them.
1. An example of such software that runs in the browser: https://wokwi.com/projects/355043425307837441 ↩︎
2. The BSP team at Toradex had a great talk going over some of our LAVA (an open-source testing orchestrator) setup, available here: https://www.youtube.com/watch?v=4nRQMXfj4u4 ↩︎