
fetch_schema generates syntactically invalid osw/model/entity.py (RiskAssessmentProcess default_factory -> SyntaxError) #125

Description

@simontaurus

Summary

Runtime schema generation via OSW.fetch_schema can write a syntactically invalid osw/model/entity.py and then importlib.reload() it, raising SyntaxError and breaking the process (and every subsequent import osw.core, since osw.core imports osw.model.entity).

The generated offending line:

risk_assessment: RiskAssessmentProcess | None = Field(
    default_factory=lambda :RiskAssessmentProcess.parse_obj(<object object at 0x000001...>),
    options={'$comment': 'hide inherited property', 'hidden': True},
)

<object object at 0x...> is the repr() of a plain Python sentinel object (as produced by object()) pasted into the source code. It is not a valid Python expression.
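A minimal illustration, independent of osw: if a code generator interpolates repr(default) into source text, a literal default such as a dict still parses, but an object() sentinel does not. The helper renders_to_valid_python and the Model.parse_obj(...) template are hypothetical, chosen only to mirror the shape of the generated line above.

```python
import ast


def renders_to_valid_python(default: object) -> bool:
    """Return True if interpolating repr(default) into a parse_obj call parses.

    Hypothetical helper that mimics rendering a default as source text; it is
    not part of osw or datamodel-code-generator.
    """
    line = f"x = Model.parse_obj({default!r})"
    try:
        ast.parse(line)
    except SyntaxError:
        return False
    return True


print(renders_to_valid_python({"a": 1}))  # True: a dict literal round-trips
print(renders_to_valid_python(object()))  # False: '<object object at 0x...>' is not an expression
```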

How it is triggered

The bug is triggered during lazy relation resolution (oold), not only by explicit load_entity calls:

  1. An entity with a relation (e.g. a Process with tool/input/output) is loaded.
  2. Accessing the relation attribute triggers oold __getattribute__ -> _resolve -> the OSW resolver backend OswDefaultBackend.resolve (osw/core.py).
  3. That backend calls osw_obj.load_entity(OSW.LoadEntityParam(titles=request.iris)) without model_to_use and with autofetch_schema defaulting to True.
  4. In load_entity, for each category it checks if not hasattr(model, cls_name) and, when param.autofetch_schema is set, calls self.fetch_schema(...); here model is osw.model.entity.
  5. fetch_schema -> _fetch_schema regenerates osw/model/entity.py and importlib.reload(model) -> SyntaxError.

Because the resolver passes no model_to_use, any class that is registered in a different module (e.g. a separately published cloud.* model package) is not found via hasattr(osw.model.entity, cls_name), so resolution always falls into the regeneration path.
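To make the lookup gap concrete, here is a sketch of a class lookup that checks several candidate modules before concluding that a schema fetch is needed. find_model_class and the stub module names are hypothetical; according to this issue, osw's load_entity only checks hasattr on the single model module.

```python
import types
from typing import Iterable, Optional


def find_model_class(cls_name: str, modules: Iterable[types.ModuleType]) -> Optional[type]:
    """Return the first class named cls_name found in any of modules, else None.

    Hypothetical helper for illustration; not part of osw.
    """
    for module in modules:
        candidate = getattr(module, cls_name, None)
        if isinstance(candidate, type):
            return candidate
    return None


# Stand-ins for osw.model.entity and a separately published model package.
entity = types.ModuleType("entity_stub")
external = types.ModuleType("external_models_stub")


class Process:  # a class that only exists in the "external" module
    pass


external.Process = Process

print(hasattr(entity, "Process"))                                   # False: the single-module check misses it
print(find_model_class("Process", [entity, external]) is Process)  # True: found without regenerating
```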

Stacktrace

oold/model/v1/__init__.py:569 in __getattribute__  -> self._resolve(iris)
oold/model/v1/__init__.py:632 in _resolve          -> resolver.resolve(...)
osw/core.py:138 in resolve                          -> osw_obj.load_entity(OSW.LoadEntityParam(titles=request.iris))
osw/core.py:1204 in load_entity                     -> self.fetch_schema(...)
osw/core.py:464 -> 1004 in _fetch_schema            -> importlib.reload(model)   # osw.model.entity
importlib ... source_to_code
  File ".../osw/model/entity.py", line 7429
    risk_assessment: RiskAssessmentProcess | None = Field(default_factory=lambda :RiskAssessmentProcess.parse_obj(<object object at 0x...>), options={'$comment': 'hide inherited property', 'hidden': True})
SyntaxError: invalid syntax

datamodel-code-generator also warns while emitting it:

datamodel_code_generator/parser/base.py: UserWarning: Failed to format code:
InvalidInput("Cannot parse for target version Python 3.10: ... RiskAssessmentProcess.parse_obj(<object object at 0x...>) ... ParseError: bad input"). Emitting unformatted output.

In other words, the formatter cannot parse the generated code, yet the generator writes it out unformatted anyway, and osw then imports it.
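Until the generator fails fast on its own, one possible caller-side mitigation is to escalate this specific UserWarning into an exception around the generation call using the standard warnings module. In the sketch, generate is a hypothetical stand-in for whatever code invokes datamodel-code-generator; the message prefix "Failed to format code" is taken from the warning quoted above.

```python
import warnings
from typing import Callable


def run_generation_strict(generate: Callable[[], str]) -> str:
    """Call generate() and raise if it emits the 'Failed to format code' warning.

    generate is a hypothetical stand-in for the code that runs
    datamodel-code-generator and returns the generated source.
    """
    with warnings.catch_warnings():
        # Escalate only this warning (message is matched as a regex at the start).
        warnings.filterwarnings("error", message="Failed to format code", category=UserWarning)
        return generate()


def fake_generate() -> str:
    # Simulates the generator warning about unformattable output, then returning it.
    warnings.warn("Failed to format code: InvalidInput(...)", UserWarning)
    return "x = Model.parse_obj(<object object at 0x0>)"


try:
    run_generation_strict(fake_generate)
except UserWarning as exc:
    print("generation aborted:", exc)
```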

Root cause (likely)

A hidden, inherited property (risk_assessment: RiskAssessmentProcess, with options {'hidden': True}) carries a default that is a sentinel object. When the model is regenerated, that sentinel is rendered as default_factory=lambda: RiskAssessmentProcess.parse_obj(<object object>) instead of as a valid expression. The result is written to osw/model/entity.py even though the formatter raised InvalidInput.
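A sketch of what safe rendering could look like, assuming the generator has a step that turns a schema default into source text: emit a parse_obj default_factory only when the default's repr is a plain literal, and fall back to None otherwise. render_default and the ast.literal_eval check are illustrative assumptions, not datamodel-code-generator's actual code.

```python
import ast


def render_default(type_name: str, default: object) -> str:
    """Render a Field default as source text, never emitting an unparseable repr.

    Hypothetical helper; it mirrors the shape of the generated line in this
    issue but is not taken from datamodel-code-generator.
    """
    text = repr(default)
    try:
        # Only plain literals (dicts, lists, strings, numbers, None, ...) pass.
        ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Sentinels such as object() repr as '<object object at 0x...>': use None.
        return "default=None"
    if isinstance(default, dict):
        return f"default_factory=lambda: {type_name}.parse_obj({text})"
    return f"default={text}"


print(render_default("RiskAssessmentProcess", object()))       # default=None
print(render_default("RiskAssessmentProcess", {"name": "x"}))  # default_factory=lambda: RiskAssessmentProcess.parse_obj({'name': 'x'})
```

Falling back to None fits here because the field is already typed RiskAssessmentProcess | None.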

Suggestions

  • Don't write generated source if formatting/parse fails (fail fast instead of emitting unparseable code), or validate with ast.parse before replacing entity.py.
  • Render sentinel/undefined defaults for hidden properties safely (e.g. None) rather than repr'ing the sentinel object.
  • Possibly skip fetch_schema regeneration when a usable model class is already importable from a registered external module.
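The validate-before-replace idea could look roughly like this: parse the new source with ast.parse before touching entity.py, then swap the file in with os.replace, so a failed generation leaves the previous, importable module in place. write_generated_module is a hypothetical name; osw's actual _fetch_schema is structured differently.

```python
import ast
import os
import tempfile
from pathlib import Path


def write_generated_module(path: Path, source: str) -> None:
    """Write generated source to path only if it parses as Python.

    Hypothetical helper, not osw's API: validate first, then swap the file in
    with a single rename so a bad generation never replaces a working module.
    """
    # Raises SyntaxError (with the offending line number) before any file I/O.
    ast.parse(source, filename=str(path))
    # Temp file in the same directory so os.replace is a same-volume rename.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(source)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise


# Usage: a valid module is written; an invalid one is rejected and the old file survives.
target = Path(tempfile.mkdtemp()) / "entity.py"
write_generated_module(target, "x = 1\n")
try:
    write_generated_module(target, "y = Model.parse_obj(<object object at 0x0>)\n")
except SyntaxError as exc:
    print("rejected:", exc.msg)
print(target.read_text(encoding="utf-8"))  # still "x = 1"
```

With this ordering, a subsequent importlib.reload(model) only ever sees source that has already parsed.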

Environment

  • osw 1.1.2
  • datamodel-code-generator 0.51.0
  • pydantic 2.13.4
  • Python 3.11.6 (Windows)

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
