
RFC: Twister v2 – a general rework of the testing framework #42458

Closed
PerMac opened this issue Feb 3, 2022 · 13 comments
Assignees
Labels
area: Test Framework (issues related not to a particular test, but to the framework instead), area: Twister, RFC (Request For Comments: want input from the community)

Comments

@PerMac
Member

PerMac commented Feb 3, 2022

Why?

Twister was developed and evolved in a “reactive” manner, with new features added as they were needed. In its current state, Twister lacks a coherent overall architecture. Many parts are written procedurally, with a single function responsible for too many operations (e.g. the handle() method, which locks the device a test will run on, creates a command for a runner, flashes, measures execution time, sets test statuses, and unlocks the device). This makes the framework hard to maintain and extend.
Several places in the code share similar responsibilities (e.g. each report format has its own decision tree for handling test statuses), which makes maintenance very error-prone: a change in one place requires high vigilance to keep all the other parts in sync.
In addition, transparency, reliability, and traceability of test specifications/results are hard to obtain with the current design. This is a major concern of test managers.
The current state of Twister makes many developers reluctant to contribute to the source code, which is harmful for an open-source project. Our goal is to make the software more transparent and modular, flattening the learning curve and making the code more approachable for potential contributors.
With Quality Assurance at heart, it seems we are at a point where a general rework of Twister is the necessary path to follow.

How?

Twister will get a grand rework. We will follow ISTQB (International Software Testing Qualifications Board) standards for test architecture design. The idea is to start with a generic Test Automation Architecture (from the ISTQB Test Automation Engineer syllabus, section 3.1) and adapt it to our needs. The core principles to follow when designing Twister v2 are:

  • Single responsibility: each component has exactly one responsibility,
  • Extension not modification: components open for extension, but closed for modification,
  • Replaceability: components replaceable without affecting the overall behavior,
  • Component segregation: many specific components are preferable to a few multi-purpose ones,
  • Dependency inversion: components depend on abstractions rather than on low-level details.
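
To make these principles concrete, here is a minimal, hypothetical sketch (the class and method names are invented for illustration, not the final design) of how the monolithic handle() described above could be split into single-responsibility components that depend on abstractions:

```python
import time
from abc import ABC, abstractmethod

# Illustrative interfaces only -- not the final Twister v2 design.
class DeviceLock(ABC):
    @abstractmethod
    def acquire(self, device_id: str) -> None: ...
    @abstractmethod
    def release(self, device_id: str) -> None: ...

class Flasher(ABC):
    @abstractmethod
    def flash(self, device_id: str, image_path: str) -> None: ...

class ResultCollector(ABC):
    @abstractmethod
    def record(self, test_id: str, status: str, duration_s: float) -> None: ...

class TestExecutor:
    """Orchestration only; locking, flashing, and reporting live in
    replaceable collaborators (dependency inversion), so each can be
    extended without modifying the executor."""
    def __init__(self, lock: DeviceLock, flasher: Flasher, results: ResultCollector):
        self._lock, self._flasher, self._results = lock, flasher, results

    def run(self, test_id: str, device_id: str, image_path: str) -> None:
        self._lock.acquire(device_id)
        try:
            start = time.monotonic()
            self._flasher.flash(device_id, image_path)
            # ... run the test and read the device output here ...
            self._results.record(test_id, "passed", time.monotonic() - start)
        finally:
            self._lock.release(device_id)
```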

The new design will also enforce higher QA standards at its core. New components will be designed according to requirements and developed alongside corresponding tests that verify their behavior.

Overall requirements:

  • Full backward compatibility is not strictly required, but some parts must be adapted to use the existing resources:
    • Compatible with the existing tests (however, individual tests might need to be improved to fulfill the new standards)
    • Maybe we will be able to keep most of the existing CLI args?
  • Twister operation will be divided into 3 major, separate stages: test plan (specification), test session (execution), and test results (reporting).
    • One must be able to run each block individually and to start the workflow from any point between them (e.g. start with an already existing test plan, or run just the results stage to create a different report format for already existing results).
    • The above will be achieved by enforcing serialization of test data (configurations); see the sketch after this list.
  • Reliability will be enforced by matching results (after test execution) against the specification and requiring a 1:1 relation (no configurations in the specification without results, no results without a matching entry in the specification).
  • Traceability of tests (being able to easily match a given test with the corresponding software component) has to be improved (TBD, e.g. an improved hierarchy of tests and some verification process).
  • Functional tests have to be delivered and pass for each component before it is added to the framework.
  • Be able to conduct complex test scenarios involving multiple devices and multiple images (e.g. bootloader tests where a bootloader, an image, and its upgrade are consecutively flashed on a single device and the output is verified after each step; Bluetooth tests with multiple devices; or tests requiring more than just flashing an image, like resetting a device or sending data to it).
  • Coding standards will be enforced (which ones, and how, is still to be defined).
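
As a minimal sketch of the serialization and 1:1 matching requirements above (the file name and field names are hypothetical, chosen only for illustration):

```python
import json

# Hypothetical serialized test plan -- field names are illustrative only.
plan = [
    {"id": "kernel.timer.basic", "platform": "qemu_x86"},
    {"id": "kernel.timer.basic", "platform": "nrf52840dk_nrf52840"},
]
with open("testplan.json", "w") as f:
    json.dump(plan, f)  # stage 1 output; stage 2 (execution) can start from this file

def check_one_to_one(plan, results):
    """Enforce the 1:1 specification<->results relation."""
    planned = {(p["id"], p["platform"]) for p in plan}
    reported = {(r["id"], r["platform"]) for r in results}
    missing = planned - reported  # configurations without results
    unknown = reported - planned  # results without a matching specification entry
    if missing or unknown:
        raise RuntimeError(f"plan/result mismatch: {missing=} {unknown=}")

results = [dict(p, status="passed") for p in plan]  # stand-in for real results
check_one_to_one(plan, results)
```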

The above requirements were gathered during testing WG meetings and discussions with colleagues involved in testing (at various levels). To stress this: nothing is set in stone. I would really appreciate any suggestions and/or additional requirements.

The process for developing Twister v2 will be transparent, and contributors will be encouraged to take part in it. I volunteer to create and maintain a GitHub "project" for this process. I will split the framework into major modules (e.g. test configuration, reporting, execution, etc.) and gather/provide a separate high-level description and requirements for each of them. The idea is to be able to implement them in parallel. I also hope that having modules, components, and corresponding tasks divided into smaller chunks and easily visible on a kanban-style board will attract contributors. I would also like to try using the good first issue and help wanted labels on easier and smaller tasks.

PoC (pytest)

Goal
Show the concept behind pytest and its major benefits. Show that the framework can be compatible with Zephyr and its tests.
Requirements:
Show how plugins/fixtures/hooks work and how a framework can be built from blocks -> benefits from standardized APIs and specs (a fixture sketch follows after this list):

  • Show that we can reproduce the already existing tests based on the twister v1 workflow, but with the current methods divided into smaller blocks according to the pytest schema. Can start with the existing test runner code "wrapped" in pytest.
  • Show that the test specification, test execution, and test results blocks can be called individually.
  • Show how test identification can be handled, reusing the existing schemas or some other simple YAML-based parser -> test configuration (suite) level. Weak req: show test case level (parsing C code).
  • Show that both QEMU and hardware tests are working and providing results.
  • Show a simplified example of report generation at the test configuration (suite) level. Weak: also at the test case level.
  • Show an example of test coverage for a framework component (e.g. some simplistic new test and how code coverage is calculated for a specific component).
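
To illustrate the "built from blocks" idea, here is a minimal, hypothetical pytest sketch; the option and fixture names (--platform, flashed_device) are invented for illustration:

```python
# conftest.py -- hypothetical sketch of composing a flash-and-run flow
# from pytest fixtures and hooks; names are invented, not an actual API.
import pytest

def pytest_addoption(parser):
    parser.addoption("--platform", default="qemu_x86",
                     help="target platform for this session")

@pytest.fixture(scope="session")
def platform(request):
    return request.config.getoption("--platform")

@pytest.fixture
def flashed_device(platform):
    # A real plugin would lock the device, flash it, and open its console.
    device = {"platform": platform, "console": ["*** Booting Zephyr OS ***"]}
    yield device
    # Teardown (unlock/cleanup) runs here, cleanly separated from the test body.

def test_boot_banner(flashed_device):
    assert any("Booting Zephyr OS" in line
               for line in flashed_device["console"])
```

Each block (option handling, device setup/teardown, the test body) is then an independently replaceable piece, which is exactly the standardized-API benefit named above.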

ETA:

Hard to estimate yet.

I will keep editing this entry, by adding more graphs, requirements, and details.

@PerMac PerMac added the RFC label Feb 3, 2022
@PerMac PerMac self-assigned this Feb 3, 2022
@PerMac PerMac changed the title Twister v2 – a general rework of the testing framework RFC: Twister v2 – a general rework of the testing framework Feb 3, 2022
@gmarull
Member

gmarull commented Feb 3, 2022

Random thoughts: we could consider re-using the pytest ecosystem; see for example https://github.com/pytest-dev/pytest-cpp (not tested, though)

@stephanosio stephanosio added the area: Test Framework and area: Twister labels Feb 4, 2022
@PerMac
Member Author

PerMac commented Feb 14, 2022

After several discussions, I have started to become convinced that pytest might be a good framework to build twister around. I've heard several times: "Why do you want to reinvent the wheel instead of taking an existing framework, where your requirements are already supported, and which is in common usage? Why not pytest?" From my initial pre-study, using pytest looks really tempting. We will collect and present more information about it and its benefits. We will also present a Proof of Concept for the pytest-based twister. Below is my idea for the PoC requirements. I also added them to the RFC text. Of course, they are subject to discussion:

PoC

Goal
Show the concept behind pytest and its major benefits. Show that the framework can be compatible with Zephyr and its tests.
Requirements:
Show how plugins/fixtures/hooks work and how a framework can be built from blocks -> benefits from standardized APIs and specs:

  • Show that we can reproduce the already existing tests based on the twister v1 workflow, but with the current methods divided into smaller blocks according to the pytest schema. Can start with the existing test runner code "wrapped" in pytest.
  • Show that the test specification, test execution, and test results blocks can be called individually.
  • Show how test identification can be handled, reusing the existing schemas or some other simple YAML-based parser -> test configuration (suite) level. Weak req: show test case level (parsing C code).
  • Show that both QEMU and hardware tests are working and providing results.
  • Show a simplified example of report generation at the test configuration (suite) level. Weak: also at the test case level.
  • Show an example of test coverage for a framework component (e.g. some simplistic new test and how code coverage is calculated for a specific component).

@trond-snekvik
Contributor

I just wanted to chime in and mention IDE support. VS Code introduced a test runner API a couple of months ago that allows listing and running tests through a UI in the sidebar. Microsoft's Python extension already supports this API and will populate the test list with tests discovered using pytest: https://code.visualstudio.com/docs/python/testing. They're also working on a similar API for code coverage, but that's not in the public release yet.

If pytest is a viable solution, Twister will essentially get a VS Code UI for free. I'm not sure how debugging would work for Twister tests, but it's likely possible to implement through Zephyr-side Python code.

I haven't tried it myself, but CLion also appears to support this.

Regardless, we'd like to create a Twister user experience with Nordic's VS Code extensions, either as a new standalone Twister extension or as part of the main extension. For this to work out, we'll need to be able to:

  • List test scenarios in the workspace in a machine-readable manner
  • Run test scenarios through an API-locked command line interface ("porcelain")
  • Read out test outcomes through result files, preferably in a common, existing format like junit (see the sketch after this list)
  • Have a clear set of permutations of board types and toolchains for each test case, so we can condense the possible test cases to a single button and then discover what permutations are available for each test suite. This could potentially be some sort of runner profile the user sets up (e.g. "all tests available on QEMU_x86"). Listing, running, and discovering test cases matching these profiles would need a clear mechanism. Some way of listing the potential permutations, or the limitations of each test case, would also be very helpful. Currently, it can be hard to discover how you can even run a test case, given all the toolchain, board, and tag parameters.
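
For the junit point above, a minimal sketch of reading outcomes from a junit XML report with only the Python standard library (the default file name here is just an example):

```python
# Hypothetical sketch: extract per-test outcomes from a junit XML file.
import xml.etree.ElementTree as ET

def read_outcomes(path="twister_report.xml"):
    outcomes = {}
    for case in ET.parse(path).getroot().iter("testcase"):
        name = f'{case.get("classname")}.{case.get("name")}'
        if case.find("failure") is not None or case.find("error") is not None:
            outcomes[name] = "failed"
        elif case.find("skipped") is not None:
            outcomes[name] = "skipped"
        else:
            outcomes[name] = "passed"
    return outcomes
```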

@torsteingrindvik
Contributor

I'd like to add a minor point about junit.
Currently, the generated XML is displayed like this in Jenkins:

[Screenshot: a flat Jenkins test-result list for samples under sdk-nrf/samples/crypto/*]

I suggest that for Twister v2 the output XML be adjusted so that tests/samples in Jenkins end up browsable in a layout similar to the actual filesystem layout.

So the above screenshot (which shows samples in sdk-nrf/samples/crypto/*) would become something closer to:

sdk-nrf/
    samples/
        crypto/
            aes_cbc/
                nrf52840dk_nrf52840/
                <other boards>/
            aes_ccm/
                nrf52840dk_nrf52840/
                <other boards>/
    tests/
        ...
zephyr/
    samples/
        ...
    tests/
        ...

This will make the UI less noisy and make it more obvious where to locate tests.

I believe this should be as simple as rearranging the components of the test names; a sketch follows below.
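
A minimal sketch of that rearrangement, assuming Jenkins nests junit results by the dots in the classname attribute (the helper name is invented):

```python
# Hypothetical sketch: encode the filesystem path into the junit
# "classname" so Jenkins renders the tree layout shown above.
def to_junit_identifiers(test_path: str, board: str) -> tuple[str, str]:
    # "sdk-nrf/samples/crypto/aes_cbc" -> "sdk-nrf.samples.crypto.aes_cbc"
    return test_path.strip("/").replace("/", "."), board

assert to_junit_identifiers("sdk-nrf/samples/crypto/aes_cbc",
                            "nrf52840dk_nrf52840") == \
    ("sdk-nrf.samples.crypto.aes_cbc", "nrf52840dk_nrf52840")
```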

@nordicjm
Collaborator

Be able to conduct complex test scenarios involving multiple devices and multiple images (e.g. bootloader tests where a bootloader, an image, and its upgrade are consecutively flashed on a single device and the output is verified after each step; Bluetooth tests with multiple devices; or tests requiring more than just flashing an image, like resetting a device or sending data to it).

For this, at least from an mcumgr standpoint, it's vital that there is also support for using and interacting with an external utility (likely Python-based, but any type of external script/application would be preferable), so that it can send/receive commands to/from real devices (QEMU devices could also be considered), and for using the exit code of the process to influence whether a test is counted as successful.

@PerMac
Member Author

PerMac commented Feb 24, 2022

@lairdjm Could you rephrase this a bit? I'm not sure I follow. What would be a use case? I understand it as being able to use twister as a middleman between an external utility and the hardware/emulator. Is that the case?

@nordicjm
Collaborator

So for testing mcumgr: the code runs on the module, and it needs to communicate with an external utility via serial, Bluetooth, UDP, or another transport method; as part of that, the test needs to set the tool up and run it. An example would be getting a file from the filesystem of a device: the Zephyr application gets loaded to the target board and runs, an application/script then runs on the test PC, communicates with the device to download the file, and checks that the commands work and that the received file is what was expected. A sketch of such a test step follows.
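
As a minimal sketch of what such a test step could look like (the tool name, command line, and paths are invented for illustration; this is not a real mcumgr invocation):

```python
# Hypothetical sketch: a pytest test drives an external utility and uses
# its exit code plus the downloaded file to decide pass/fail.
import hashlib
import subprocess

def test_file_download(tmp_path):
    out_file = tmp_path / "downloaded.bin"
    proc = subprocess.run(
        ["my_transport_tool", "download", "/lfs/data.bin", str(out_file)],
        capture_output=True, timeout=60,
    )
    assert proc.returncode == 0, proc.stderr  # the exit code drives the verdict
    # Compare against the checksum of the file originally put on the device.
    expected_sha256 = "<known checksum>"  # placeholder for the reference value
    assert hashlib.sha256(out_file.read_bytes()).hexdigest() == expected_sha256
```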

@henrikbrixandersen
Member

So for testing mcumgr: the code runs on the module, and it needs to communicate with an external utility via serial, Bluetooth, UDP, or another transport method; as part of that, the test needs to set the tool up and run it. An example would be getting a file from the filesystem of a device: the Zephyr application gets loaded to the target board and runs, an application/script then runs on the test PC, communicates with the device to download the file, and checks that the commands work and that the received file is what was expected.

Similar to what you can do using the pytest twister harness today?

@nashif
Member

nashif commented Mar 1, 2022

For this, at least from a mcumgr standpoint, it's vital that there is also support for being able to use and interact with an external utility (likely python based, but any type of external script/application would be preferred)

for example this: #10112 and this #5325

@nashif
Member

nashif commented Mar 8, 2022

@PerMac is there an update on the pytest PoC? We need to decide very soon whether this is the way to go or if we have to implement things differently.

@PerMac
Member Author

PerMac commented Mar 8, 2022

The status is that my colleague and I will work on a prototype for the pytest-based runner. Some initial work has already been done, but it is too early to show anything meaningful yet. We had to get an internal green light for the commitment and its scope, and it was granted recently.
I started calling it a prototype after seeing that there are some differences between a PoC and a prototype, as a PoC tends to focus more on the theoretical side. We now believe that pytest can be used for embedded, based on e.g. Espressif's approach: https://github.com/espressif/pytest-embedded. So now it is more about showing how we can achieve what we want and need.
I am working on a more detailed plan and schedule for the prototype. It will be divided into major components (e.g. test identification, execution, etc.) and into phases, with the functionality needed from each component defined for every stage. We will implement features incrementally: e.g. during phase 1, test execution will be handled by reading a static txt file with mock output; reading hw/qemu output will be added in phase 2; and phase 3 will add parallelization. A sketch of this layering follows.
I plan to finish this proposal before our next WG meeting, so we can discuss the approach during the meeting.
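
A minimal sketch of that layering, with invented interface and class names: the result evaluation depends only on an output-reader abstraction, so the phase-1 static-file mock can later be swapped for real hw/qemu readers without touching the rest.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Iterator

class OutputReader(ABC):
    @abstractmethod
    def lines(self) -> Iterator[str]: ...

class MockFileOutputReader(OutputReader):
    """Phase 1: replay a static txt file with mock device output."""
    def __init__(self, path: Path):
        self._path = path
    def lines(self) -> Iterator[str]:
        yield from self._path.read_text().splitlines()

def evaluate(reader: OutputReader) -> str:
    # Minimal stand-in for parsing Zephyr test output.
    return ("passed" if any("PROJECT EXECUTION SUCCESSFUL" in line
                            for line in reader.lines()) else "failed")
```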

@nashif
Copy link
Member

nashif commented Jul 28, 2023

can we close this please?

@PerMac
Member Author

PerMac commented Jul 28, 2023

This RFC is closed, since twister v2 won't be introduced; the strategy of improving the existing framework instead was adopted. Working on v2 brought an understanding of how to integrate pytest with twister, allowing the introduction of the more complex tests postulated in this RFC. Such tests can now be executed within the current framework: https://docs.zephyrproject.org/latest/develop/test/pytest.html

More context:
We managed to get quite far with v2, i.e. twister rewritten to work as a plugin to pytest. We were able to execute the majority of the existing tests on QEMU, native posix, and hardware. However, we realized that getting the same performance and user experience out of pytest would require a lot of bending, further complicating the whole framework. As an example, we found a major issue with memory consumption during parallel test execution: xdist, the pytest library responsible for test parallelization, must repeat the collection of test instances in every spawned worker. The Zephyr test base is huge and there are ~600 platforms; executing "all on all" creates ~1,000,000 test instances, consuming ~1 GB of RAM. In pytest, this number has to be multiplied by the number of parallel workers, which crushes the operation (a back-of-envelope sketch follows below). The current framework instead collects all the test instances once, in advance, and only then parallelizes the execution with a custom scheduler. Some workarounds could be adopted to mitigate this issue in pytest, but they would make the framework ever more complex and move it further from standard pytest behavior.
We decided to descope the project. We abandoned the idea of reimplementing what already works into pytest and focused instead on extracting the code we developed for the new type of tests and integrating it with the existing twister. This resulted in a pytest plugin which allows executing tests written in pytest, where interactions with devices are possible. In the current workflow, whenever twister detects that a test requires the pytest harness, it calls pytest as a subprocess, passing all the required information, and then parses the results from pytest.
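
A back-of-envelope illustration of the blow-up, using the numbers reported above (the parametrization itself is schematic):

```python
# Schematic of the collection explosion: parametrizing every test over
# every platform multiplies the collected items, and pytest-xdist repeats
# this collection in each worker process.
import pytest

PLATFORMS = [f"platform_{i}" for i in range(600)]  # ~600 Zephyr platforms

@pytest.mark.parametrize("platform", PLATFORMS)
def test_build(platform):
    pass

# ~1,000,000 instances over ~600 platforms is roughly 1,700 configurations
# each; if the full collection costs ~1 GB of RAM, N xdist workers need
# ~N GB before a single test has run.
```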

@PerMac PerMac closed this as completed Jul 28, 2023