Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

twister: ignore serial device SerialException. #44976

Merged
merged 1 commit into from May 16, 2022

Conversation

chen-png
Copy link
Collaborator

@chen-png chen-png commented Apr 20, 2022

twister: ignore serial device SerialException.

For some boards, like up_squared and ehl_crb, they need do a
power off/on operation and use bios to load and execute zephyr
test binary, during the power off, it may meet this SerialException,
but the serial object still works well after power on, we can continue
to get all results from serial, so we don't need to close it,
it's ok to ignore this exception.

BTW, even if we ignore this exception here, it doesn't influence twister
function, because if it was really caused by disconnection, the test would
still be judged as failed due to timeout and serial would be closed later
in main thread, it doesn't impact results.
On the contrary, close serial directly in the monitor serial thread, it
will cause later tests failed due to mismatch errors.

fixes #45009

Signed-off-by: Chen Peng1 peng1.chen@intel.com

@chen-png chen-png requested a review from nashif as a code owner April 20, 2022 10:30
@github-actions github-actions bot added the area: Twister Twister label Apr 20, 2022
@chen-png chen-png changed the title twister: add an option to make device serial opened. twister: monitor device serial after running flash process. Apr 20, 2022
@zephyrbot zephyrbot requested a review from galak April 20, 2022 12:04
scripts/twister Outdated
@@ -608,6 +608,9 @@ structure in the main Zephyr tree: boards/<arch>/<board_name>/""")
(twister.csv and twister.xml).
""")

parser.add_argument("--monitor-serial-later", action="store_true",
help="Monitor device serial after running flash process.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not explain well what is going on here, the option itself is also awkward. Can you please explain the situation when this is needed and why is it needed, i.e. why do we need a special option when it works without the option most of time?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nashif This was due to that I got many "mismatch errors" on a x86 platform due to recent twister change "verify ztest suite name in output", but this is a positive change, because it did find some potential issues, like below:

when running test A, it opened serial and started to monitor it, but the monitor thread met a serial exception "device reports readiness to read but returned no data (device disconnected or multiple access on port?)" during flash process,
then the current test A will be judged as failed immediately without waiting because the monitor thread was already exited due to serial exception with ser.close(), the handler time was only 3.135s, like below:

DEBUG   - Flash command: ['west', 'flash', '--skip-rebuild', '-d', '/home/chenp1/github-zephyr-intel/zephyr/twister-out/board/tests/kernel/fatal/exception/kernel.common.stack_sentinel', '--', '/home/chenp1/test.sh']
DEBUG   - DEVICE: 
DEBUG   - serial exception
DEBUG   - -- west flash: using runner misc-flasher
Test binary updated successfully!
DEBUG - exception: device reports readiness to read but returned no data (device disconnected or multiple access on port?)
DEBUG   - run status: board/tests/kernel/fatal/exception/kernel.common.stack_sentinel failed
INFO    - 163/168 board                   tests/kernel/fatal/exception/kernel.common.stack_sentinel FAILEDUnknown (device 3.135s)

then when running next test B, after it started the monitor thread, it would catch the output form previous test A, finished the test judgement and did ser.close(), because all tests' pass/fail judgement were same, it won't wait flash process to update the test binary on board, thus causing a mismatch error, like below,
we could see that the monitor thread already finished before doing "west flash" command.

DEBUG   - DEVICE: *** Booting Zephyr OS build zephyr-v3.0.0-2764-ge295b9643f0a  ***
DEBUG   - DEVICE: Running test suite obj_tracking
DEBUG   - DEVICE: 
DEBUG   - DEVICE: START - test_obj_tracking_sanity
DEBUG   - DEVICE: PASS - test_obj_tracking_sanity in 0.1 seconds
DEBUG   - DEVICE: ===================================================================
DEBUG   - DEVICE: Test suite obj_tracking succeeded
DEBUG   - DEVICE: ===================================================================
DEBUG   - DEVICE: RunID: 12ddf21c1b4d4f960be9d9d1c91e9582
DEBUG   - DEVICE: PROJECT EXECUTION SUCCESSFUL
DEBUG   - -- west flash: using runner misc-flasher
Test binary updated successfully!
DEBUG   - Test suite names were not printed or some of them in output do not correspond with expected: ['tickless_concept']
DEBUG   - run status: board/tests/kernel/tickless/tickless_concept/kernel.tickless.concept failed
INFO    - 130/168 board                    tests/kernel/tickless/tickless_concept/kernel.tickless.concept FAILED RunID mismatch (device 18.446s)

and all tests later after this error would be judged as failed with "mismatch error".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I wanted to monitor thread after running flash process, because I could make sure the binary updated successfully during flash process, so the current test B won't be influenced by previous test A even if met this serial exception.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, I am curious why we need to invoke ser.close() when met a serial exception "device reports readiness to read but returned no data (device disconnected or multiple access on port?)", it seems that this exception maybe caused by other reasons, not only serial disconnected.

 try:
            serial_line = ser.readline()
            except TypeError:
                pass
            except serial.SerialException:
                ser.close()
                break

because I found I could still got all expected results from serial even I did change as below:

            except serial.SerialException:
                pass

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in NXP platform, this will introduce another issue i.e. missing leading log.

@chen-png
Copy link
Collaborator Author

this is mainly caused by SerialException, I opened an issue #45009 to give more details.

@chen-png chen-png changed the title twister: monitor device serial after running flash process. twister: ignore all exceptions from pyserial. Apr 21, 2022
@chen-png
Copy link
Collaborator Author

@hakehuang @PerMac @gopiotr Could you help take a look of this? thanks.

PerMac
PerMac previously requested changes Apr 26, 2022
Copy link
Member

@PerMac PerMac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the most generic Exception and then ignoring it with pass looks like an example from "mistakes to avoid" guide. I read you description of the issue, but couldn't fully follow on what is the root of it. Could you try to refine the except part to something more reasonable?

@chen-png
Copy link
Collaborator Author

@PerMac I think no matter what exceptions we got from serial, we shouldn't close serial directly in the monitor serial thread, it will be closed by main thread anyway finally,

if ser.isOpen():
ser.close()

because if we closed serial in monitor serial thread when met an exception, but now only got half results of test instance A, the main thread won't wait at t.join(self.timeout) and it will run next test instance B immediately, then when the serial is opened again, it will catch another half results of previous test instance A.

@PerMac
Copy link
Member

PerMac commented Apr 27, 2022

I think I start following. So the issue is that we start serial connection at higher level than monitoring serial thread. Therefore serial connection should be managed on this higher level, and shouldn't be closed inside the thread. Shouldn't then instead of ignoring the serial exception, have the exception passed to the higher level and make it be handled there?
I see more places in monitor_serial where the connection can get close so I guess this also might be problematic.

TBH I have some problems understanding the logic of the twister code responsible for serial connection. I am worried it might require some more thoughtful refactoring. E.g. I believe that serial connection should be open using context menager (with statement) which makes sure it is closed at finally.

@chen-png Will you join the upcoming testing wg meeting? Maybe we could discuss the issue there?
Anyway, I will dismiss my request for change. The code there is already very "hacky", and I could imagine, that your proposal might be the easiest solution to the existing problem. However, I am not convinced on giving +1, since I cannot understand the implication of your change.

@PerMac PerMac self-requested a review April 27, 2022 09:25
For some boards, like up_squared and ehl_crb, they need do a
power off/on operation and use bios to load and execute zephyr
test binary, during the power off, it may meet this SerialException,
but the serial object still works well after power on, we can continue
to get all results from serial, so we don't need to close it,
it's ok to ignore this exception.
BTW, even if we ignore this exception here, it doesn't influence twister
function, because if it was really caused by disconnection, the test would
still be judged as failed due to timeout and serial would be closed later
in main thread, it doesn't impact results.
On the contrary, close serial directly in the monitor serial thread, it
will cause later tests failed due to mismatch errors.

Signed-off-by: Chen Peng1 <peng1.chen@intel.com>
@chen-png chen-png changed the title twister: ignore all exceptions from pyserial. twister: ignore serial device SerialException. May 6, 2022
@nashif
Copy link
Member

nashif commented May 12, 2022

original issue for this code in twister: #13541

@hakehuang hakehuang self-requested a review May 12, 2022 14:51
@nashif nashif dismissed PerMac’s stale review May 16, 2022 11:49

said 'Anyway, I will dismiss my request for change. ', but never happened

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

twister: many tests failed with "mismatch error" after met a SerialException.
6 participants