255 migrate the so3 build system to infrabase and move to the new so3 logo#256
Open
daniel-rossier wants to merge 110 commits into
added 30 commits
June 12, 2026 12:14
Re-sync the build system from the edgemtech Infrabase tree (without
torizon and e1c), nest the SO3 sources under so3/ to match the Infrabase
per-OS layout, and build so3/usr-so3/rootfs-so3/avz in-tree.
- build/: Infrabase meta-layers re-synced from edgemtech; torizon, e1c
and verdin removed; new meta-toolchain layer (musl-cross-make recipe
building the aarch64/arm musl user-space toolchain into build/tmp)
- SO3 sources nested under so3/{so3,usr,rootfs,target}; recipe paths and
.gitignore updated for the new layout (artifacts re-ignored)
- in-tree recipes: so3 (6.2.0), usr-so3, rootfs-so3, avz (no github
fetch); u-boot fetched+patched (2022.04, aligned with edgemtech)
- deploy via unprivileged bitbake + sudo -n (meta-filesystem)
- bsp-so3 builds, deploys and boots to so3% standalone (virt64) and as
an AVZ guest (virt64_avz_so3 ITS, EL2)
The old manual qemu/ mechanism (fetch.sh + qemu.patch) is superseded by
the meta-qemu recipe: it fetches the same QEMU 8.2.2 and applies the same
hw/arm/virt.{c,h} patches (CLCD/KMI/PS2). Verified that build.sh -x qemu
rebuilds an equivalent qemu-system-aarch64. qemu/ stays gitignored and is
regenerated on demand.
Revert the avz recipe to fetching SO3 from upstream at a pinned SRCREV and building the hypervisor (EL2) from it, instead of the in-tree so3/ sources. AVZ is thereby decoupled from the in-tree SO3, which is the guest/capsule (EL1) under development. Verified: bitbake avz fetches, attaches into avz/, configures virt64_avz_defconfig and builds avz/so3.bin.
The do_build make invocation relied on a CROSS_COMPILE inherited from the
caller's shell, which broke virt32 (arm) builds when the shell had an
aarch64 CROSS_COMPILE set (cc1: unknown value 'generic-armv7-a' for
-mtune). Pass CROSS_COMPILE=${IB_TOOLCHAIN}- explicitly so virt32 uses
arm-linux-gnueabihf- and virt64 uses aarch64-none-linux-gnu-, matching
atf.bbclass.
Drop the usr/lib/lvgl git submodule (.gitmodules removed) and go back to the original meta-usr strategy: lvgl is fetched at build time by the meta-usr lvgl bbappend, gated on the :lvgl OVERRIDE. usr-so3 re-enables do_fetch/unpack/attach so the bbappend pulls lvgl into usr/lib/lvgl (do_patch stays noexec — the slv/lvgl integration patches are already baked into the in-tree usr/). The lvgl bbappend now mkdir's lib/lvgl (no longer pre-created by the submodule). usr/lib/lvgl is gitignored; meta-usr otherwise realigned with edgemtech.
The bbclass selects the current platform's target (QEMU_TARGET: arm-softmmu for virt32, aarch64-softmmu for virt64) and, when reconfiguring, appends any other arch already built under qemu/build so meson does not drop it. Thus building arm-softmmu then aarch64-softmmu (or vice-versa, e.g. switching IB_PLATFORM between so3 standalone/avz/capsule) keeps both qemu-system-* binaries instead of wiping the previous one. do_configure is nostamp so the accumulation re-evaluates each build.
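The accumulation rule described above can be sketched as follows. This is an illustrative Python model, not the bbclass itself: the function name `accumulate_targets` is hypothetical, and it assumes the set of previously built targets has already been discovered under qemu/build by some other means.

```python
def accumulate_targets(current_target, already_built):
    """Return a comma-separated target list for reconfiguration.

    The current platform's target comes first; any other target that
    was already built is appended so a reconfigure does not drop it.
    (Hypothetical helper; names are not taken from the bbclass.)
    """
    targets = [current_target]
    for target in sorted(already_built):
        if target not in targets:
            targets.append(target)
    return ",".join(targets)
```

With `arm-softmmu` already built, switching to virt64 yields a list containing both targets, which is what keeps both qemu-system-* binaries alive.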
The SO3 kernel is built in place, so switching IB_PLATFORM between virt64 and virt32 (aarch64<->arm) leaves a stale .config and object files behind, producing a wrong-arch kernel. Track the last built arch in a .ib_last_arch marker and run 'make distclean' only when it changes, keeping same-arch rebuilds incremental.
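A minimal sketch of the marker logic, assuming the marker holds just the arch name and that a missing marker (first build) does not count as a change. The function name `needs_distclean` is hypothetical; only the `.ib_last_arch` file name comes from the commit message.

```python
import tempfile
from pathlib import Path


def needs_distclean(src_dir, arch, marker_name=".ib_last_arch"):
    """Record `arch` in the marker and report whether it changed.

    Assumption: no marker means a first build, so nothing stale can
    be lying around and no distclean is requested.
    """
    marker = Path(src_dir) / marker_name
    previous = marker.read_text().strip() if marker.exists() else None
    marker.write_text(arch + "\n")
    return previous is not None and previous != arch


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for arch in ("aarch64", "aarch64", "arm"):
            print(arch, needs_distclean(d, arch))
```

Same-arch rebuilds return False and stay incremental; only an actual aarch64<->arm switch requests the clean.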
Two arch-switch bugs surfaced when building SO3 for virt32 (arm) after
virt64 (aarch64):
1. 'OVERRIDES += ":so3"' inserts a leading space, so OVERRIDES became
"...:arm :so3" and the CPU token parsed as "arm " (trailing space).
:<cpu> overrides such as IB_MUSL_TARGET:arm then never collapsed, so
the user-space cmake build got a literal ${IB_MUSL_TARGET} on PATH and
could not find arm-linux-musleabihf-gcc. Switch to OVERRIDES:append in
all five SO3 recipes (no inserted space).
2. The usr-so3 cmake build dir caches the toolchain in CMakeCache.txt, so
switching arch kept emitting aarch64 binaries (an aarch64 init.elf on a
32-bit kernel -> prefetch abort at boot). Wipe so3/usr/build when the
arch changes, tracked via a .ib_last_arch marker at the usr/ root.
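The first bug can be reproduced with a small string model. This is a sketch of the two BitBake operators' joining behaviour only ("+=" appends with a space, ":append" concatenates verbatim); the starting value "virt32:arm" is a made-up example, not the real OVERRIDES content.

```python
def plus_equals(value, suffix):
    """Model of BitBake '+=': append with a single space separator."""
    return value + " " + suffix


def append_op(value, suffix):
    """Model of BitBake ':append': append verbatim, no space."""
    return value + suffix


def override_tokens(overrides):
    """OVERRIDES is a colon-separated list of override names."""
    return overrides.split(":")


base = "virt32:arm"  # hypothetical starting value
broken = override_tokens(plus_equals(base, ":so3"))
fixed = override_tokens(append_op(base, ":so3"))
```

In `broken` the CPU token carries a trailing space, so a lookup for "arm" fails; in `fixed` it matches.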
Both QEMU launch scripts only handled virt64, so with IB_PLATFORM=virt32 they printed the MAC/GDB lines and exited without starting QEMU. Select QEMU_BIN per platform (qemu-system-arm for virt32) and add a virt32 branch booting U-Boot directly (-M virt -cpu cortex-a15 -kernel u-boot/u-boot, sdcard.img.virt32). stg.sh keeps the virtio GPU/keyboard/mouse + SDL window; the virt64-only guard is widened to accept virt32.
u-boot is built from the meta-uboot recipe (github 2022.04 @ pinned SRCREV + the SO3 patch set), which fetches and attaches it, backing any prior copy up to u-boot.back. The committed in-tree u-boot/ was therefore obsolete and was clobbered on every build, producing a huge spurious diff. Remove all 18k files from tracking and gitignore /u-boot/, matching how qemu/ and avz/ are already handled.
The patch set was inherited wholesale from the edgemtech recipe and had
never been regenerated by do_updiff in this repo. It carried two classes
of cruft:
* duplicate chains — the same source file patched twice (e.g. board.c
in 0004 and 0077, setexpr.c in 0008/0081, the tools/boot/*.c and the
defconfigs each appearing in two generations with ./ vs b/ labels),
the residue of repeated append-only updiff runs across a label-format
change;
* build artifacts frozen as patches — hello_world.srec, autoconf.mk,
autoconf.mk.dep, include/config/uboot.release, include/generated/*
(dt.h, *_autogenerated.h), lib/efi_selftest/efi_miniapp_*.h.
Regenerated from scratch: diff the pristine fetch against the working
tree (do_diffcompose), drop the old numbered set, promote the staged
one-patch-per-file result (do_updiff). 64 messy patches -> 54 clean,
consolidated, git-labelled patches. e1c_boot.c is kept (compiled but
unused) per decision. Verified: a clean fetch+unpack+patch+build applies
all 54 and produces a working u-boot.
Also completed the do_diffcompose artifact exclude-list in patch.bbclass
(autoconf.mk, autoconf.mk.dep, *.srec, efi_miniapp_*.h) so future updiff
runs stay clean.
ls sets CLOEXEC via fcntl(). arm64 musl issues this as fcntl (NR 25), which SO3 handles; arm32 (virt32) musl issues the same call as fcntl64 (NR 221), which syscall.tbl never registered -> 'unhandled syscall: 221' warning and a silently-failing -ENOSYS. Map fcntl64 to the existing __sys_fcntl handler so virt32 behaves like virt64.
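The effect of the missing table entry can be modelled as below. This is a toy dispatcher in Python, not the SO3 syscall path; the syscall numbers 25 and 221 are the ones quoted above, while `dispatch`, `sys_fcntl` and the table layout are hypothetical.

```python
import errno


def sys_fcntl(fd, cmd, arg):
    """Stand-in for the kernel's fcntl handler; always succeeds here."""
    return 0


def dispatch(table, nr, *args):
    """Look up syscall `nr`; unregistered numbers return -ENOSYS."""
    handler = table.get(nr)
    if handler is None:
        return -errno.ENOSYS
    return handler(*args)


arm64_table = {25: sys_fcntl}          # fcntl on arm64
arm32_before = {}                      # fcntl64 (221) never registered
arm32_after = {221: sys_fcntl}         # fcntl64 mapped to the same handler
```

Mapping both numbers to one handler is what makes the two architectures behave alike for this call.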
Killing a process whose spawned thread was blocked in the kernel hit 'BUG in kernel/thread.c:105' (discard_tcb_in_pcb: WAITING 'not handled yet'). A sleeping thread sits in __sleep() with a struct timer on its own kernel stack, so it cannot just be freed: the pending timer would dangle and later fire on freed memory.
Handle it cooperatively: add a tcb->killed flag; discard_tcb_in_pcb() flags+wakes WAITING threads (instead of BUG()) and waits for them via the existing threads_active completion, reaping them afterwards. A woken thread resumes in __sleep(), stops its own timer, sees killed and self-terminates with thread_exit(), entirely in kernel, never returning to the (already-released) user pages. READY threads are still force-freed (they must not resume into freed user space).
Verified: Ctrl-C of lvgl_demo stress (whose slv tick thread loops in usleep) no longer panics.
Limitation: only the __sleep() wait is instrumented. A thread killed while blocked on a futex/mutex would not yet self-terminate; that needs the same killed-check added to those wait paths.
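The flag-and-wake handshake can be illustrated with a user-space analogy. This is a Python threading sketch, not kernel code: `Tcb`, `sleep_in_kernel` and `discard_waiting` are hypothetical names, and an Event stands in for both the sleep timer wake-up and the completion the killer waits on.

```python
import threading


class Tcb:
    """Toy thread control block with the cooperative 'killed' flag."""

    def __init__(self):
        self.killed = False
        self.wake = threading.Event()      # stands in for the timer wake-up
        self.exited = threading.Event()    # stands in for the completion
        self.returned_to_user = False


def sleep_in_kernel(tcb, timeout_s):
    """Sleeper side: block, then check 'killed' before going any further."""
    tcb.wake.wait(timeout_s)               # blocked, like a sleeping thread
    if tcb.killed:
        tcb.exited.set()                   # self-terminate, never "return"
        return
    tcb.returned_to_user = True
    tcb.exited.set()


def discard_waiting(tcb):
    """Killer side: flag the thread, wake it, and wait for it to exit."""
    tcb.killed = True
    tcb.wake.set()
    tcb.exited.wait()


tcb = Tcb()
sleeper = threading.Thread(target=sleep_in_kernel, args=(tcb, 30))
sleeper.start()
discard_waiting(tcb)
sleeper.join()
```

The point of the design is that the sleeper cleans up its own stack-resident state before disappearing, so the killer never frees memory that something still points into.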
The 128 KB lvgl heap is too small to build lv_demo_widgets (lv_conf.h's own note flags this), so the widget tree failed to allocate, nothing rendered, and the main thread spun in lv_timer_handler() without reaching a syscall boundary — making Ctrl-C undeliverable. 4 MB fits the demo comfortably; it is BSS (zero-init) so the .elf on disk is unchanged.
A diagnostic that bypasses LVGL: opens /dev/fb, queries geometry via the same ioctls slv uses, mmap()s the VRAM and draws colour bars + an animated square straight into it. Lets us tell apart a broken display pipeline (PL111 CLCD -> QEMU SDL) from an LVGL-side problem. Ctrl-C to quit.
fb_mmap() mapped the CLCD VRAM cacheable, which is wrong for a framebuffer: on real hardware the CPU writes linger in the data cache and never reach the scanout buffer. Map it non-cacheable (nocache=true). (Under QEMU/TCG it is cosmetic since the cache is not modelled, but it is required on real targets.)
SO3 drives the PL111 CLCD + PL050 keyboard/mouse that the so3 QEMU patch wires unconditionally into '-M virt'; it has no virtio-gpu driver, so the virtio-gpu/keyboard/mouse devices only added a competing blank console. More importantly the SDL backend did not present the PL111 console's surface at all (verified: pl110 renders the framebuffer into the surface - monitor 'screendump' shows it - yet the SDL window stayed black). Switching to '-display gtk' shows the panel correctly (and its View menu lists every console). Drop the virtio-gpu/keyboard/mouse devices.
Paint the colour-bar background once, then per frame only restore the square's previous rows and redraw it, instead of memcpy-ing the whole 3 MB framebuffer every frame.
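A sketch of the partial-redraw idea, under simplifying assumptions: one byte per pixel, and a background of vertical bars so that every row is identical and a single saved row is enough to restore from. The function names are hypothetical and the real test program works on the mmap()ed VRAM instead of a bytearray.

```python
def make_bar_row(width, colours):
    """One row of vertical colour bars (1 byte per pixel for simplicity)."""
    return bytes(colours[x * len(colours) // width] for x in range(width))


def paint_background(height, row):
    """Paint the whole framebuffer once from the bar row."""
    return bytearray(row * height)


def move_square(fb, width, row, size, old, new, colour):
    """Restore the rows under the old square, then draw it at `new`.

    `old` and `new` are (x, y) top-left corners. Returns the number of
    bytes written, to compare against a full-frame copy.
    """
    written = 0
    _, old_y = old
    for y in range(old_y, old_y + size):
        fb[y * width:(y + 1) * width] = row
        written += width
    new_x, new_y = new
    for y in range(new_y, new_y + size):
        start = y * width + new_x
        fb[start:start + size] = bytes([colour]) * size
        written += size
    return written


WIDTH, HEIGHT, SIZE = 16, 8, 2
row = make_bar_row(WIDTH, [1, 2, 3, 4])
fb = paint_background(HEIGHT, row)
move_square(fb, WIDTH, row, SIZE, (0, 0), (0, 0), 9)
written = move_square(fb, WIDTH, row, SIZE, (0, 0), (4, 3), 9)
```

Per frame the cost scales with the square's height rather than with the framebuffer size.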
The serial IRQ delivered SIGINT to current() - whatever thread happened to be running when the Ctrl-C key arrived. A foreground app asleep in a syscall (e.g. usleep) is not the running thread (the idle thread is, with pcb==NULL), so Ctrl-C was silently dropped; it only worked for CPU-busy apps. And at the shell prompt the prompt was never reprinted. Two parts:
1. Track the foreground console process. Add a global fg_pcb, set by sys_do_wait4() to the child a process blocks waiting on (the shell's foreground job) and restored to the waiter when it exits. The serial IRQ now targets fg_pcb (fallback: current()), so SIGINT reaches the foreground app even while it sleeps.
2. Cancel the line at the prompt instead of signalling the shell. When a console read is in progress (read_lock held), the IRQ sets serial_intr; pl011_get_byte returns ETX and console_getc discards the typed line and returns an empty line, so the shell's fgets returns and it reprints the prompt once. This avoids musl's sticky-EOF on a 0-byte read and a siglongjmp-through-fgets file-lock leak. Matches the driver's existing read_lock design comment.
Relies on the cooperative WAITING-thread teardown for the kill path.
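The foreground-tracking part can be modelled in a few lines. This is a Python sketch of the bookkeeping only; the class `Console` and its method names are hypothetical, with `fg_pcb` being the one name taken from the commit message. Processes are represented by plain strings.

```python
class Console:
    """Toy model of foreground-process tracking for SIGINT delivery."""

    def __init__(self):
        self.fg_pcb = None

    def begin_wait(self, waiter, child):
        """The waiter blocks on `child`: the child becomes foreground."""
        self.fg_pcb = child

    def end_wait(self, waiter):
        """The child exited: foreground goes back to the waiter."""
        self.fg_pcb = waiter

    def sigint_target(self, current):
        """Prefer the foreground process; fall back to the running one."""
        return self.fg_pcb if self.fg_pcb is not None else current


console = Console()
```

The key property is that the target no longer depends on which thread happens to be running when the interrupt arrives.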
Mirror the virt32 graphical fix onto the virt64 branch: SO3 drives the same PL111 CLCD + PL050 (virt64.dts has clcd@08800000 / pl050 nodes), has no virtio-gpu driver, and the SDL backend does not present the PL111 console. Switch to '-display gtk' and drop the virtio-gpu/keyboard/mouse devices. The flash0.img AVZ-vs-U-Boot boot heuristic is unchanged. Untested (no virt64 graphical run this session) but the framebuffer path is identical to virt32; the kernel-side fixes (non-cacheable fb, Ctrl-C) are arch-shared.
An interrupted task (e.g. Ctrl-C during a clean) can leave a recipe WORKDIR that exists but lacks its temp/ subdir. bitbake then cannot create that task's fifo and fails with do_clean: [Errno 2] No such file or directory: .../temp/fifo.NNNN (hit on 'build.sh -ca bsp-so3'). Before clean/build, scan tmp/work and remove any workdir missing temp/ (it holds nothing useful) so bitbake recreates it cleanly.
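The scan can be sketched as follows. This is an illustrative Python version, assuming for simplicity that recipe workdirs sit directly under the scanned directory; the real tmp/work layout may nest them deeper, and the function name `prune_broken_workdirs` is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def prune_broken_workdirs(work_root):
    """Remove every workdir under `work_root` that lacks a temp/ subdir.

    Returns the names of the removed directories, sorted.
    Assumption: workdirs are the immediate children of `work_root`.
    """
    removed = []
    for workdir in sorted(Path(work_root).iterdir()):
        if workdir.is_dir() and not (workdir / "temp").is_dir():
            shutil.rmtree(workdir)
            removed.append(workdir.name)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        (Path(root) / "usr-so3" / "temp").mkdir(parents=True)
        (Path(root) / "bsp-so3").mkdir()
        print(prune_broken_workdirs(root))
```

Intact workdirs are left alone, so the scan is safe to run before every clean or build.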
'build.sh -ca bsp-so3' failed with
usr-so3 do_clean: [Errno 2] No such file or directory: .../temp/fifo.NNNN
Root cause: the lvgl bbappend's shell do_clean:append ran
'rm -rf ${WORKDIR}/*', which deleted the running clean task's own temp/
(holding its fifo + run script) mid-execution, leaving an empty workdir.
The next clean then could not create its fifo there and failed.
Fix: make do_clean a Python task (usr.bbclass) plus Python do_clean:append
in the usr-so3 recipe and the lvgl bbappend. Python tasks create their
temp dir themselves and use no fifo, so they are robust when the workdir
is fresh/empty. The lvgl append no longer touches WORKDIR (bitbake owns
it); it only purges the fetched lvgl tree (in-tree usr/lib/lvgl, src/lib,
${S}/lib/lvgl). Verified: fresh clean, repeat clean, full 'bsp-so3 -c
clean', and clean->rebuild (aarch64) all succeed.
…2 entries
Remove the IB_PLATFORM:so3 override: SO3 now always builds for the main IB_PLATFORM. That override was referenced only here and resolved via OVERRIDES, so a value diverging from IB_PLATFORM silently built SO3 for the wrong arch. The standalone / AVZ-guest / capsule contexts are not distinct platforms - they differ only by IB_CONFIG:so3 / IB_TARGET_ITS:so3 (e.g. capsule = virt64_capsule_defconfig + virt64_capsule), which are independent of the platform variable. Also add the virt32 counterparts that were missing (PREFERRED_VERSION_so3, IB_CONFIG:so3, IB_TARGET_ITS:so3, IB_STORAGE_MODE) and default IB_PLATFORM to virt64.
AVZ is an EL2 hypervisor. The virt64 launcher only enabled EL2 (virtualization=on) when filesystem/flash0.img was present (ATF chain); booting AVZ via the ITS without ATF used plain -M virt,gic-version=2 (EL1), so AVZ faulted on its first EL2 system-register write (Synchronous Abort -> reset). Detect the selected so3 ITS from local.conf and, when it is an avz ITS, add virtualization=on with -kernel u-boot (virt64_defconfig is EL2-aware). Verified: AVZ now boots.
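The detection step can be sketched like this. It is a Python illustration rather than the launcher's actual shell logic, and it assumes the ITS is assigned in local.conf as IB_TARGET_ITS:so3 = "..." and that an AVZ ITS can be recognised by "avz" in its name; both function names are hypothetical.

```python
import re


def selected_its(local_conf_text):
    """Return the first IB_TARGET_ITS:so3 value found, or None."""
    match = re.search(
        r'^\s*IB_TARGET_ITS:so3\s*=\s*"([^"]*)"', local_conf_text, re.M
    )
    return match.group(1) if match else None


def machine_arg(its):
    """Build the -M argument, enabling EL2 only for an AVZ ITS."""
    base = "virt,gic-version=2"
    if its and "avz" in its:
        return base + ",virtualization=on"
    return base


conf = 'IB_PLATFORM = "virt64"\nIB_TARGET_ITS:so3 = "virt64_avz_so3"\n'
```

Keeping the decision keyed on the ITS rather than on the presence of flash0.img is what lets the ITS-without-ATF boot path reach EL2.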
added 9 commits
June 17, 2026 14:18
The **/build .gitignore rule (.gitignore:17) silently keeps recipe source patches out of the index; they must be force-added like the 187 already tracked. 25 patches referenced by committed recipe metadata (atf, linux, buildroot, lvgl) were never added, so the clean CI checkout failed parsing with "Unable to get checksum for ... SRC_URI entry". Force-add them.
**/build silently ignored the whole /build metadata tree, so recipe patches had to be force-added; a forgotten one broke CI parsing (fixed in a1624e7). Un-ignore /build, re-ignore output one level down, re-include conf + meta-* source dirs. Anchor the bare 'atf' rule to /atf so it stops matching the meta-atf recipe dir. Ignore generated build/conf/auto.conf and *.orig/*.rej.
do_build compiles gcc 12.4.0 with in-tree mpfr/gmp/mpc. When the source tree has inconsistent timestamps (configure.ac newer than configure, as on a fresh copy that doesn't preserve mtimes), make tries to regenerate the autotools files and invokes automake-1.17/autoconf. The so3-env CI image ships no autotools (and Ubuntu 24.04 has automake 1.16, not the 1.17 mpfr wants), so do_build died with 'automake-1.17: command not found'. --disable-maintainer-mode turns the regen rules into no-ops, making the toolchain build environment- and timestamp-independent. Verified by reproducing the exact failure in the so3-env container and confirming the flag builds gcc past the mpfr stage.
Temporary diagnostic: the toolchain build fails only on GitHub-hosted runners (passes locally and on a self-hosted box with the same image and commit), and the inner log.do_build is never shown in the CI console. Print nproc/df/free and tail the failing toolchain log so we can see the actual error. To be reverted once diagnosed.
The CI failure on 32852c3 was transient: the same build logic passed on re-run (and passes locally + on a self-hosted box). The runner had ample disk (85G) and RAM (16G), so the cause was a flaky mirror download: musl-cross-make fetches tarballs from ftpmirror.gnu.org during do_build with a no-retry 'wget -c -O'. Override DL_CMD with --tries/--waitretry/--timeout so a single bad mirror recovers instead of failing the toolchain build. Also revert the temporary build.yml diagnostic (cedbc42) now that the root cause is understood; the workflow is back to its clean form.
Reproduces .github/workflows/build.yml without pushing: exports the git-tracked tree into a throwaway dir under $HOME (snap/rootless Docker cannot bind-mount /tmp) and runs the exact 'build.sh -k so3' + 'build.sh -x usr-so3' in the so3-env image, per platform. Mounting only tracked files means untracked-but-referenced sources fail locally exactly as in CI, and build/tmp is excluded so the toolchain builds from scratch. Use -r <ref> for an exact committed state.
The toolchain build failed intermittently in CI (FAIL/FAIL/PASS/PASS/FAIL across runs), always at musl-toolchain do_build, very early and with no build output — i.e. a download failure. musl-cross-make's default GNU_SITE is ftpmirror.gnu.org, which 302-redirects to a random mirror; incomplete mirrors 404 and wget --tries just re-hits the same redirect. Pin GNU_SITE to the canonical https://ftp.gnu.org/gnu (complete, no random mirror); keep the wget retries as a safety net. Also keep a minimal on-failure dump of the toolchain do_build log in CI so any residual download flake is diagnosable without a separate commit.
The Check Code Style workflow was red (pre-existing): after the Infrabase migration the SO3 sources moved under so3/, so check-path 'so3' swept in vendored code (micropython, libxml2) and check-path 'usr/src' (no longer a real dir) silently fell back to scanning the whole repo. clang-format also flagged genuinely-misformatted first-party files.
- Point check-paths at the real nested dirs: so3/so3 (kernel) and so3/usr (user space); exclude vendored trees (micropython, libxml2, usr/lib/linux, lvgl).
- Reformat the 13 tracked first-party files that violated the repo's own .clang-format (5 kernel, 7 usr/lib/slv, fb_test.c).
Verified by replicating the action's exact logic (find + exclude regex, clang-format 19, --style=file) over the tracked tree: both jobs report 0 failures.
Make the generic build/ files byte-identical to edgem1 where they should be (meld-minimal), while keeping the torizon/e1c separation intact:
- restore the EDGEMTech copyright headers on the generic layer files (meta-so3/meta-qemu/meta-rootfs/meta-filesystem/meta-uboot layer.conf, avz/so3 bbclass, bsp-so3, rootfs-so3, so3_6.2.0)
- drop the dead utils_restore_user_ownership() call in usr-so3 (undefined, error-path only)
- drop a stray whitespace line in rootfs-linux
4934926 to
97585d5
@AndreCostaaa @clemdiep You can proceed with the review :-) Thanks.
added 18 commits
June 17, 2026 21:09
Rewrite the landing README around the three build modes (standalone / AVZ / SO3 capsule), supported targets, and a clear pointer to the published documentation as the source of truth. Remove the no-longer-current discourse.heig-vd.ch discussion-forum link and the obsolete in-tree CI-patch and ./st/./stv/./stg run notes (all covered by doc/ now).
The discourse.heig-vd.ch forum no longer exists. Remove the 'Discussion forum' section from the index (keeping the sponsor acknowledgement and the HEIG-VD/REDS logo) and the forum link from the LVGL page; questions now go through GitHub issues / the maintainer (see the README).
Add a proper 'Welcome to SO3' opening and a dedicated section explaining SO3's defining trait — polymorphism: one source tree built into a standalone OS (EL1), the AVZ hypervisor (EL2), or an SO3 capsule (S3C) on top of AVZ beside a Linux agency.
The source IB_TARGET/fs is the rootfs image loop-mounted as root, and the ext4 rootfs partition needs ownership/perms/symlinks preserved. Replace the unprivileged non-preserving `cp -rv` (which aborts on root-owned files) with `sudo cp -av`. Keeps this recipe identical to the edgem1 tree.
-k so3 -> -x so3, -f -> -x filesystem (the -k/-b/-r/-f options were removed when build.sh/deploy.sh were reduced to -a/-x).
No description provided.