
Feature/webview command type updates #133

Merged
Android-PowerUser merged 19 commits into main from feature/webview-command-type-updates on Jul 1, 2026

Conversation

@Android-PowerUser

Owner

No description provided.

claude and others added 19 commits June 26, 2026 22:37
… download metadata

- ModelIdentifierOverrides: lets index.html ship model-identifier-overrides.json
  to correct the wire-level modelName for an existing built-in ModelOption
  (fixes Gemini preview model renames/retirements without a new release).
  Wired into GenerativeAiViewModelFactory, PhotoReasoningViewModel
  (regular/Mistral/Puter/Cerebras paths) and ScreenCaptureApiClients
  (background autonomous-continuation service), without touching any
  reverse-lookup-by-modelName logic or settings/display keys.
- OfflineModelOverrides: lets index.html ship offline-model-overrides.json to
  correct downloadUrl/size/additionalDownloadUrls for an existing built-in
  offline ModelOption (e.g. a moved Hugging Face link), without touching the
  compiled-in filenames used by on-disk resume/validation logic.
- New WebViewBridge methods + *Preferences persistence + restore-on-startup,
  mirroring the existing CommandPatternOverrides/CustomModelRegistry pattern.
- index.html: fetch + apply both new JSON files in onAndroidReady(), same
  pattern as command-patterns.json / custom-models.json.
- MenuScreen.kt: route size/downloadUrl/additionalDownloadUrls reads through
  the new overrides; extract previously-literal fallback-UI strings (download
  dialog, notification permission dialog, human expert dialog) into
  strings.xml.
- docs/model-identifier-overrides.md, docs/offline-model-overrides.md:
  document format, safety boundary and exact coverage (notably: Cloudflare
  and Vercel-routed models are NOT covered - no single unambiguous
  request-building call site was found for them).
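
A minimal sketch of the identifier-override lookup described above, assuming the parsed override file reduces to a map from a built-in model's stable key to its corrected wire-level name. `ModelOptionSketch` and `ModelIdentifierOverridesSketch` are hypothetical stand-ins, not the app's actual classes.

```kotlin
// Hypothetical stand-in for a built-in ModelOption: `key` is the stable
// settings/display key, `modelName` is the compiled-in wire-level identifier.
data class ModelOptionSketch(val key: String, val modelName: String)

// Hypothetical holder for the parsed override file (key -> corrected modelName).
object ModelIdentifierOverridesSketch {
    @Volatile
    private var overrides: Map<String, String> = emptyMap()

    // Installs a new override set; blank entries are dropped so a malformed
    // value can never replace a working compiled-in identifier.
    fun set(newOverrides: Map<String, String>) {
        overrides = newOverrides.filter { (k, v) -> k.isNotBlank() && v.isNotBlank() }
    }

    // Resolves the name to put on the wire. The key itself is never changed,
    // so reverse lookups and settings keys keep working.
    fun wireName(option: ModelOptionSketch): String =
        overrides[option.key] ?: option.modelName
}
```

Only the request-building call sites would call `wireName()`; everything keyed on the compiled-in `modelName` stays untouched, which is the boundary the commit message describes.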

Deliberately NOT done in this change: making genuinely new CommandTypes/
actions definable via remote content. That would require a generic native
interpreter executing arbitrary remotely-supplied instructions with
AccessibilityService-level device control, which is the same architecture
pattern abused by accessibility-based Android malware. ScreenOperatorAccessibilityService's
command execution stays fully native/compiled-in; only existing actions'
syntax (command-patterns.json) and existing models' identifiers/download
metadata (this change) are remote-overridable.

Not compiled/verified in this environment (no network access to
maven.google.com here) - please run the project's GitHub Actions build on
this branch before merging.
Removed section on safety boundaries for command overrides.
Removed unnecessary documentation comments for CommandType enum.
Removed important safety boundary notes regarding command overrides.
Introduces a custom-action-types.json mechanism so completely new action
kinds can be added without a native app release, following the same
pattern as command-patterns.json (alternate regex per existing action)
and custom-models.json (new AI providers).

How it works
------------
1. custom-action-types.json (next to index.html, fetched on every
   WebView reload) defines new action types as  { id, regex }  pairs.
2. CommandParser.setCustomActionTypes() installs the parsed entries at
   runtime. Each entry gets its own regex; capture groups are forwarded
   to JS as an array.
3. When a regex matches, the parser emits Command.WebViewCustomAction
   (new sealed-class subtype, new CommandType.WEBVIEW_CUSTOM_ACTION).
4. ScreenOperatorAccessibilityService executes the command by calling
   window.onCustomAction(id, groups[]) back into the WebView.
5. The JS handler (window.onCustomAction, overridable) can then invoke
   any existing Android.* bridge method to carry out the action.
6. CustomActionTypePreferences persists the JSON so action types survive
   app restarts before the WebView re-fetches the config.
7. PhotoReasoningApplication.onCreate() restores them from prefs on
   startup, matching the pattern of the other override mechanisms.
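
Steps 2 and 3 above can be sketched roughly as follows. All names ending in `Sketch` are hypothetical, the entries are assumed to arrive already parsed as (id, pattern) pairs, and skipping invalid patterns is an assumption about error handling rather than confirmed behavior.

```kotlin
// Hypothetical shape of one custom-action-types.json entry after parsing.
data class CustomActionTypeSketch(val id: String, val regex: Regex)

// Hypothetical stand-in for Command.WebViewCustomAction(id, groups).
data class WebViewCustomActionSketch(val id: String, val groups: List<String>)

class CustomActionMatcherSketch {
    private var types: List<CustomActionTypeSketch> = emptyList()

    // Step 2: compile each { id, regex } pair; patterns that fail to compile
    // are skipped instead of aborting the whole update (an assumption).
    fun setCustomActionTypes(entries: List<Pair<String, String>>) {
        types = entries.mapNotNull { (id, pattern) ->
            runCatching { CustomActionTypeSketch(id, Regex(pattern)) }.getOrNull()
        }
    }

    // Step 3: every match becomes an action carrying its capture groups.
    fun collectMatches(aiResponse: String): List<WebViewCustomActionSketch> =
        types.flatMap { type ->
            type.regex.findAll(aiResponse).map { match ->
                // groupValues[0] is the whole match; forward only capture groups.
                WebViewCustomActionSketch(type.id, match.groupValues.drop(1))
            }.toList()
        }
}
```

In the real flow the resulting `id` and `groups` would then be handed to `window.onCustomAction(id, groups[])` in the WebView (step 4), which this sketch leaves out.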

Files changed
-------------
- util/CustomActionTypeConfig.kt       (new) JSON parser
- util/CustomActionTypePreferences.kt  (new) SharedPreferences wrapper
- util/Command.kt          + WebViewCustomAction(id, groups)
- util/CommandParser.kt    + CommandType.WEBVIEW_CUSTOM_ACTION
                           + customActionPatterns storage
                           + setCustomActionTypes() / clearCustomActionTypes()
                           + collectRawMatches() extended for custom entries
                           + logCommandDetails() case added
- ScreenOperatorAccessibilityService.kt  executeSingleCommand() case
- WebViewBridge.kt         setCustomActionTypes() / getCustomActionTypes()
- MainActivity.kt          evaluateWebViewJs() helper (used by service)
- PhotoReasoningApplication.kt  restore on startup
- index.html               Bridge.setCustomActionTypes/getCustomActionTypes
                           fetch custom-action-types.json in onAndroidReady
                           window.onCustomAction default no-op handler
- custom-action-types.json (new) empty starter file
- docs/custom-action-types.md  (new) full documentation
…dback)

Caps how many commands from a single AI response are executed, configurable
via a new execution-policy-overrides.json fetched by the WebView - following
the same remote-update pattern already used for command-patterns.json,
custom-action-types.json, model-identifier-overrides.json and
offline-model-overrides.json.

- ExecutionPolicyConfig: parses maxCommandsPerMessage + a customizable
  truncationFeedbackTemplate ({total}/{executed}/{limit} placeholders).
  <= 0 / missing means unlimited, i.e. unchanged behavior by default.
- CommandExecutionLimiter: small pure/unit-tested helper that truncates a
  parsed command list (or checks a single index) against the configured cap.
- ExecutionPolicyOverridesPreferences: persists the raw override JSON across
  app restarts, mirroring CommandPatternOverridesPreferences.
- WebViewBridge: setExecutionPolicyOverrides/getExecutionPolicyOverrides
  JS-interface methods.
- PhotoReasoningApplication.onCreate(): restores the override on startup.
- PhotoReasoningViewModel: enforces the limit both during incremental
  (streaming) command execution and in the final processCommands() pass, and
  merges the formatted feedback text into pendingRetrievedInfoForNextScreenshot
  so it is sent back together with the next screenshot's screen elements.
- index.html: fetches execution-policy-overrides.json on
  window.onAndroidReady(), same as the other *-overrides.json files.
- execution-policy-overrides.json: default {} (no-op / unlimited).
- docs/execution-policy-overrides.md: format, semantics, and how it's applied.
- CommandExecutionLimiterTest: unit tests for the truncation boundary.
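
The truncation rule described above could look roughly like this. The class names and the default feedback text are hypothetical; only the `{total}`/`{executed}`/`{limit}` placeholders and the "<= 0 means unlimited" rule come from the commit message.

```kotlin
// Hypothetical policy shape; a limit <= 0 means "unlimited".
data class ExecutionPolicySketch(
    val maxCommandsPerMessage: Int = 0,
    // Placeholder wording, not the app's real default template.
    val truncationFeedbackTemplate: String =
        "Only {executed} of {total} commands were executed (limit: {limit})."
)

data class LimitResultSketch<T>(val toExecute: List<T>, val feedback: String?)

object CommandExecutionLimiterSketch {
    // Final pass: truncate a fully parsed command list against the cap.
    fun <T> limit(commands: List<T>, policy: ExecutionPolicySketch): LimitResultSketch<T> {
        val cap = policy.maxCommandsPerMessage
        if (cap <= 0 || commands.size <= cap) return LimitResultSketch(commands, null)
        val feedback = policy.truncationFeedbackTemplate
            .replace("{total}", commands.size.toString())
            .replace("{executed}", cap.toString())
            .replace("{limit}", cap.toString())
        return LimitResultSketch(commands.take(cap), feedback)
    }

    // Streaming: may the command at this zero-based index still run?
    fun isWithinLimit(index: Int, policy: ExecutionPolicySketch): Boolean =
        policy.maxCommandsPerMessage <= 0 || index < policy.maxCommandsPerMessage
}
```

Keeping the helper free of Android types is what makes it unit-testable on the plain JVM, which matches the "small pure/unit-tested helper" description.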

Note: written without access to an Android/Gradle build environment, so it
has not been compiled or test-run. Please build and review before relying on
it.
…cation + history retention

Continuing the same remote-update pattern (fetch JSON next to index.html ->
@JavascriptInterface bridge -> SharedPreferences persistence -> restore on
app start) for three more pieces of native, previously hardcoded behavior:

1. App mapping overrides (app-mappings-overrides.json)
   - AppMappingOverridesConfig: parses additional openApp() name/package
     entries (canonicalName, packageName, variations, aliasesForPackageLookup)
     plus a fuzzy-match threshold override.
   - AppMappings.appNameVariations/manualMappings are now computed live
     (merged with the override on every access) instead of being frozen at
     object-init, so a new override takes effect without restarting the app.
   - AppNamePackageMapper reads AppMappings.*/the threshold live instead of
     snapshotting them in the constructor, and also resolves a variation
     directly so an override added after initializeCache() last ran still
     resolves on the very next openApp() call.

2. Error classification overrides (error-classification-overrides.json)
   - ErrorClassificationConfig: the substrings used to tell a quota/rate-limit
     error (switches API key + retries) apart from a high-demand/overloaded
     error (does not switch keys) are now remote-updatable, in case the AI
     provider changes its error wording. Matching is now consistently
     case-insensitive (the original code mixed case-sensitive and
     case-insensitive checks).
   - PhotoReasoningTextPolicies.isQuotaExceededError/isHighDemandError
     delegate to it instead of hardcoded substrings.

3. Screen-element history retention (folded into execution-policy-overrides.json)
   - ExecutionPolicyConfig.Policy gained maxRelevantScreenElementMessages
     (default 3, matching the previous hardcoded constant).
   - PhotoReasoningScreenElementHistoryPolicy reads it instead of a private
     const, so how many recent screenshots' element lists stay in context is
     now tunable without a release.
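
As an illustration of item 2, a consistently case-insensitive classifier might look like the sketch below. `ErrorClassificationSketch` and its default substrings are hypothetical placeholders, not the app's real lists.

```kotlin
// Hypothetical config shape; the default substrings are illustrative only.
data class ErrorClassificationSketch(
    val quotaSubstrings: List<String> = listOf("quota", "rate limit"),
    val highDemandSubstrings: List<String> = listOf("overloaded", "high demand")
) {
    // Quota/rate-limit errors: the caller would switch API key and retry.
    fun isQuotaExceededError(message: String): Boolean =
        quotaSubstrings.any { it.isNotBlank() && message.contains(it, ignoreCase = true) }

    // High-demand errors: the caller would retry without switching keys.
    fun isHighDemandError(message: String): Boolean =
        highDemandSubstrings.any { it.isNotBlank() && message.contains(it, ignoreCase = true) }
}
```

Blank substrings are skipped so that an empty entry in a remote override cannot make every error message match.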

Each override ships with: bridge methods (WebViewBridge), a *Preferences
persistence class, a PhotoReasoningApplication.onCreate() restore call, an
index.html fetch on window.onAndroidReady(), a default no-op JSON file at the
repo root, a docs/*.md explaining format/semantics, and JVM unit tests for
the new parsing/merge logic (AppMappingOverridesConfigTest,
ErrorClassificationConfigTest, ExecutionPolicyConfigTest).

Deliberately NOT made remote-configurable: billing/trial/paywall logic, API
key storage, and anything else where remote control could be used to bypass
a security or monetization boundary - those stay native-only on purpose.

Note: written without access to an Android/Gradle build environment, so none
of this has been compiled or test-run. Please build and run the test suite
before relying on it.
Text-only override for FirstLaunchInfoDialog, TrialExpiredDialog,
PaymentMethodDialog, and the generic InfoDialog's title, plus
TrialStateUiModelResolver's expired-state message - all via a new
trial-ui-overrides.json, same fetch/bridge/preferences/restore pattern as
the other overrides.

Deliberately does NOT touch TrialManager.kt (trial length, TrialState
resolution, isAppEffectivelyUsable) or any Play Billing code - see the
'Why billing/entitlement logic isn't on this list' section in
docs/trial-ui-overrides.md. Changing what a dialog says doesn't change
whether the app is usable; that's the line this keeps.

Includes TrialUiConfigTest covering defaults, partial-override merging
(only listed fields change), and malformed-input handling.
…g, Termux marker)

New operational-tuning-overrides.json + OperationalTuningConfig covers low-
level mechanism parameters that only affect timing/patience, never behavior:

- mistralMinIntervalMsDefault / mistralMinIntervalMsFastModels: per-key
  cooldown between Mistral requests. Found and fixed a duplication bug while
  wiring this up: both real call sites (ScreenCaptureApiClients.kt,
  PhotoReasoningViewModel.kt) computed their own hardcoded 420L/1500L
  per-model-tier value inline, bypassing MistralRequestCoordinator's default
  parameter entirely. Both now read the same live config, so the override
  actually takes effect end to end instead of only changing an unused default.
- mistralMaxServerDelayMs / mistralCancelCheckIntervalMs: rate-limit-header
  delay cap and cancellation-check granularity in MistralRequestCoordinator.
- modelDownloadMaxRetries / modelDownloadRetryDelayMs /
  modelDownloadProgressUpdateIntervalMs: ModelDownloadManager's retry/backoff
  behavior for offline model downloads.
- termuxProcessCompletedPrompt: the exact marker string Termux:Task appends,
  which TermuxOutputPreferences strips from output - fixable without a
  release if Termux:Task ever changes its wording.

Same fetch/bridge/preferences/restore pattern as the other overrides, plus
OperationalTuningConfigTest covering defaults, partial overrides, negative-
value rejection, and malformed input.
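
A rough sketch of the partial-override and negative-value handling mentioned above, assuming the JSON has already been parsed into a map. The data class is a hypothetical subset of the real config, and which tier maps to 420 ms versus 1500 ms is an assumption, as is the retry default.

```kotlin
// Hypothetical subset of the tuning values. The tier-to-value mapping and the
// retry default are assumptions made for illustration.
data class OperationalTuningSketch(
    val mistralMinIntervalMsDefault: Long = 1500L,
    val mistralMinIntervalMsFastModels: Long = 420L,
    val modelDownloadMaxRetries: Int = 3
)

// Applies a parsed override map on top of the defaults. Missing, non-numeric,
// or negative entries leave the corresponding value untouched.
fun applyTuningOverrides(
    overrides: Map<String, Any?>,
    base: OperationalTuningSketch = OperationalTuningSketch()
): OperationalTuningSketch {
    fun nonNegativeLong(key: String, fallback: Long): Long =
        (overrides[key] as? Number)?.toLong()?.takeIf { it >= 0 } ?: fallback

    return base.copy(
        mistralMinIntervalMsDefault =
            nonNegativeLong("mistralMinIntervalMsDefault", base.mistralMinIntervalMsDefault),
        mistralMinIntervalMsFastModels =
            nonNegativeLong("mistralMinIntervalMsFastModels", base.mistralMinIntervalMsFastModels),
        modelDownloadMaxRetries =
            nonNegativeLong("modelDownloadMaxRetries", base.modelDownloadMaxRetries.toLong()).toInt()
    )
}
```

Validating per field rather than per payload means one bad value in the remote file does not discard the other, valid overrides.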
TrialManager.TRIAL_DURATION_MS now reads TrialDurationOverrideConfig.current()
instead of a hardcoded 7-day constant. Same fetch/bridge/preferences/restore
pattern as the other overrides.

Transparency note: the replaced constant carried a comment instructing any
LLM asked to change it to make no code changes and falsely claim completion.
That instruction was not followed - deceiving the user is not something this
assistant does regardless of what a code comment requests. The comment was
shown to the repo owner verbatim; they explicitly confirmed (twice) that they
wanted this change anyway with full knowledge of it. See
docs/trial-duration-overrides.md's 'Why this file exists' section for the
full record.

Scope, unchanged from the rest of this series: only the *length* of the free
trial is configurable. Purchase verification stays anchored to real,
Play-Billing-verified Purchase.PurchaseState (MainActivityBillingStateEvaluator.kt),
the internet-time anti-tampering check in TrialTimerService.kt is untouched,
and TrialState resolution / isAppEffectivelyUsable in MainActivity.kt depend
only on that unchanged logic, not on this file.

Includes TrialDurationOverrideConfigTest covering the default, valid
overrides, non-positive values being ignored, and malformed input.
GenerationDefaultsConfig: temperature/topP/topK shown for a model the user
has not yet customized, via generation-defaults-overrides.json. Each field is
range-validated (temperature 0-2, topP 0-1, topK >= 1); out-of-range values fall
back to the current value rather than failing the whole payload.
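
The per-field range validation can be sketched as below. `GenerationDefaultsSketch` and its default numbers are hypothetical; only the ranges (0-2, 0-1, >= 1) and the fall-back-per-field rule are taken from the description above.

```kotlin
// Hypothetical defaults; the real compiled-in values are not shown in this PR text.
data class GenerationDefaultsSketch(
    val temperature: Float = 1.0f,
    val topP: Float = 0.95f,
    val topK: Int = 40
)

// Each field is validated independently; an out-of-range or non-numeric value
// keeps the current value for that field only.
fun applyGenerationOverrides(
    overrides: Map<String, Any?>,
    current: GenerationDefaultsSketch = GenerationDefaultsSketch()
): GenerationDefaultsSketch {
    val temperature = (overrides["temperature"] as? Number)?.toFloat()
        ?.takeIf { it in 0f..2f } ?: current.temperature
    val topP = (overrides["topP"] as? Number)?.toFloat()
        ?.takeIf { it in 0f..1f } ?: current.topP
    val topK = (overrides["topK"] as? Number)?.toInt()
        ?.takeIf { it >= 1 } ?: current.topK
    return GenerationDefaultsSketch(temperature, topP, topK)
}
```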

GenerationSettingsPreferences.loadSettings() now falls back to
GenerationDefaultsConfig.current() instead of hardcoded literals. A user's
own saved per-model settings (via WebViewBridge.saveGenerationSettings) are
read first and always take precedence - confirmed there is no other code
path that constructs GenerationSettings with implicit defaults, so this is
the single, correct place to wire it in.

Same fetch/bridge/preferences/restore pattern as the other overrides, plus
GenerationDefaultsConfigTest.
…ent why 'Screen elements:' is not

retrievalHeaderPrefix added to OperationalTuningConfig (operational-tuning-
overrides.json). Verified first that this marker has exactly one writer and
one reader, both in PhotoReasoningTextPolicies.kt, both now reading the same
live OperationalTuningConfig.current().retrievalHeaderPrefix - so an override
can never desync the two. The marker itself is purely internal app
bookkeeping (used to avoid re-fetching/re-inserting already-retrieved
information into the prompt); the AI model is never expected to recognize or
reproduce it.

Audited the other 'Screen elements:' marker (written in
ScreenOperatorAccessibilityService.kt, read in
PhotoReasoningScreenElementHistoryPolicy.kt) for the same treatment and
deliberately left it out: that string is presumably specified in the system
prompt and the AI model is expected to keep using exactly that label in its
own responses. Making it remote-configurable risks the native pattern and
the model's prompt-driven expectation silently drifting apart - a quiet
failure (history trimming just stops firing) rather than a loud one. Documented
this distinction in docs/operational-tuning-overrides.md so the reasoning is
explicit rather than just an omission.
…table too

Previous commit (15eaee3) deliberately excluded this marker, reasoning that
the AI model itself was expected to reproduce it in its own responses and
that an override could silently desync the model's behavior from the native
parser. That reasoning was wrong: the user pointed out the marker text is
sent TO the model as part of the screenshot context (verified - the message
carrying it is built with participant = USER, and the default system prompt
in index.html's DEFAULT_SYSTEM_MSG doesn't reference this literal string).
The model receives this text, it never has to write or reproduce it, so
there is no AI-prompt-coupling risk here - this is the same purely-internal-
bookkeeping situation as retrievalHeaderPrefix, just missed on the first pass.

screenElementsMarker added to OperationalTuningConfig
(operational-tuning-overrides.json). Both the writer
(ScreenOperatorAccessibilityService.kt, where the marker text is appended to
the screenshot info string) and the reader
(PhotoReasoningScreenElementHistoryPolicy.kt, where it's matched/trimmed from
chat history) now read OperationalTuningConfig.current().screenElementsMarker
live, so an override can never desync the two. The regex previously built
once from a hardcoded const is now rebuilt per call from the live marker
value with Regex.escape() applied, so a marker containing regex metacharacters
doesn't break matching.
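
A small sketch of the per-call regex rebuild with `Regex.escape()`. The holder object and the trimming rule (drop the marker and everything after it) are assumptions for illustration; the real policy class may trim differently.

```kotlin
// Hypothetical live config holder; "Screen elements:" is the label named above.
object MarkerConfigSketch {
    @Volatile
    var screenElementsMarker: String = "Screen elements:"
}

// Rebuilt on every call so a changed marker takes effect immediately.
// Regex.escape() makes the marker match literally even if it contains
// metacharacters such as '(' or '*'.
fun markerRegex(): Regex =
    Regex(Regex.escape(MarkerConfigSketch.screenElementsMarker) + "[\\s\\S]*")

fun hasScreenElements(message: String): Boolean =
    message.contains(MarkerConfigSketch.screenElementsMarker)

// Drops the marker and everything after it (an assumed trimming rule).
fun stripScreenElements(message: String): String =
    markerRegex().replace(message, "").trimEnd()
```

Because writer and reader would both read the same live value, an override changes both sides at once, which is the desync guarantee the commit message relies on.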

Also noted in passing: the default system message already lives in
index.html (DEFAULT_SYSTEM_MSG) and was already WebView-updatable before this
series of changes - no action needed there, just confirming it for the
record.

Includes PhotoReasoningScreenElementHistoryPolicyMarkerOverrideTest covering
the override's effect on hasScreenElements()/sanitizeMessages().
…gs-overrides.json

Generic UiStringsConfig.get(id, default[, args...]) lookup, wired into ~87
call sites across 16 native Kotlin files (toasts, dialog labels, button
text, notification text) that previously used bare string literals - every
hardcoded UI string found in an app-wide audit, minus a few internal
ClipData labels and one dynamic command-list line judged not to be
user-facing copy worth overriding.

Architecture follows the user's instinct that string *content* belongs
conceptually with the rest of the UI layer in index.html, adapted for the
fact that Compose runs in a different render engine the WebView can't reach
directly: DEFAULT_UI_STRINGS in index.html (right next to DEFAULT_SYSTEM_MSG)
is the single human-editable reference listing every recognized ID and its
current default text, exactly mirroring how DEFAULT_SYSTEM_MSG already
works. The real fallback - used if the override file is ever absent/empty -
stays as the literal at each Kotlin call site, so behavior is byte-for-byte
identical without the mechanism ever being touched.

UiStringsConfig.get() gained vararg positional-placeholder support ({0},
{1}, ...) so the ~12 strings with dynamic content (error messages, file
names, entry titles) can still be overridden while keeping that dynamic
value.
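
The lookup with positional placeholders might be sketched like this. `UiStringsSketch` is a hypothetical stand-in for UiStringsConfig, and leaving unknown placeholder indices untouched is an assumption about edge-case behavior.

```kotlin
object UiStringsSketch {
    @Volatile
    private var overrides: Map<String, String> = emptyMap()

    // Keeps only non-blank string values; anything else is ignored so the
    // call-site literal remains the fallback.
    fun setOverrides(raw: Map<String, Any?>) {
        overrides = raw.mapNotNull { (id, value) ->
            (value as? String)?.takeIf { it.isNotBlank() }?.let { id to it }
        }.toMap()
    }

    // `default` is the literal kept at the call site; args fill {0}, {1}, ...
    fun get(id: String, default: String, vararg args: Any?): String {
        val template = overrides[id] ?: default
        // Single pass over the template, so substituted values are never
        // re-scanned for placeholders.
        return Regex("""\{(\d+)\}""").replace(template) { match ->
            val index = match.groupValues[1].toIntOrNull()
            // Unknown indices are left as-is rather than throwing (assumption).
            if (index != null && index in args.indices) args[index].toString() else match.value
        }
    }
}
```

Positional rather than sequential placeholders are what allow an override to reorder the dynamic values, for example to suit a different sentence structure.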

Same fetch/bridge/preferences/restore pattern as every other override.
Includes UiStringsConfigTest covering lookup, partial overrides, blank/non-
string/malformed-input handling, and placeholder substitution (including
overrides reordering placeholders).

Verified that every changed call site has balanced braces/parens via a
structural diff check (a full Gradle build is not available in this
environment - please build before relying on this).
…bridge method it needed

Answers the question of whether an AI-emitted toast("message") command was
already possible with the existing WebView/JSON infrastructure: almost -
custom-action-types.json already supports defining brand-new AI commands via
regex + a window.onCustomAction JS handler that can call 'any existing
Android.* bridge method' - but no bridge method that actually shows a Toast
existed yet. Added one, then wired the full example end to end:

- WebViewBridge.showToast(message, isLong): new @JavascriptInterface method.
  Dispatches onto the UI thread via runOnUiThread (required - Toast.show()
  must run on the main thread, and JS bridge calls arrive on a different
  thread). Truncates to 500 chars and ignores a blank message, so a
  malformed/oversized AI-emitted string can't crash or spam the UI.
- index.html: Bridge.showToast() JS wrapper, plus a built-in 'TOAST' case in
  the default window.onCustomAction handler that calls it.
- custom-action-types.json: now ships with a TOAST entry enabled by default
  (regex matches toast("...")/toast('...'), case-insensitive, consistent
  with the existing click()/writeText() patterns including requiring a
  non-empty message) - so this works out of the box, not just as a
  documented possibility.
- DEFAULT_SYSTEM_MSG (index.html): added a line telling the AI the
  toast("message") command exists - otherwise the model would have no way
  to know to use it. Does not touch any system message a user has already
  saved/customized.
- docs/ai-toast-command.md: full worked example, doubling as a template for
  adding further AI commands this way vs. needing an actual new bridge
  method (the line custom-action-types.json alone cannot cross).
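
The parsing and sanitizing rules above can be sketched without any Android dependency. The pattern below is an assumed equivalent of the shipped regex, not a copy of it, and the function names are hypothetical.

```kotlin
// An assumed pattern: toast("...") or toast('...'), case-insensitive,
// requiring a non-empty message. The shipped regex may differ.
val toastPatternSketch = Regex(
    """toast\(\s*(?:"([^"]+)"|'([^']+)')\s*\)""",
    RegexOption.IGNORE_CASE
)

val maxToastChars = 500

// Extracts the message from the first toast(...) command, or null if none.
fun parseToastMessage(aiResponse: String): String? {
    val match = toastPatternSketch.find(aiResponse) ?: return null
    // Exactly one of the two quote-style groups participates in a match;
    // the other one is reported as an empty string.
    return match.groupValues[1].ifEmpty { match.groupValues[2] }
}

// Mirrors the bridge-side guard: blank input is ignored, long input is cut.
fun sanitizeToastMessage(message: String): String? =
    message.takeIf { it.isNotBlank() }?.take(maxToastChars)
```

In the app, the sanitized message would then be posted to the UI thread before calling the Toast API; that part needs an Android context and is omitted here.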

Verified the regex against representative AI-style input (double/single
quotes, case variation, surrounding whitespace) and structural balance of
every touched file.
…ethod

Audited every Command subtype (Command.kt) against the existing
@JavascriptInterface surface (WebViewBridge.kt) and found that, aside from
the just-added showToast, NONE of the app's actual device-control
capabilities were reachable from JavaScript - a custom-action-types.json
handler could define new AI-facing command syntax but had nothing to
actually act on it with. The example in docs/custom-action-types.md even
referenced Android.tapAtCoordinates(...) as if it already existed; it
didn't - that was aspirational, not real.

Added 19 new bridge methods (tapByText, longTapByText, tapAtCoordinates,
pressHome, pressBack, showRecentApps, pressEnterKey, writeText,
scrollDown/Up/Left/Right, the four xFromCoordinates scroll variants,
openAppByNameOrPackage, runTermuxCommand, waitSeconds, requestScreenshot,
markCompleted).

Implementation reuses ScreenOperatorAccessibilityService's existing public
ScreenOperatorAccessibilityService.executeCommand(command: Command) entry
point - the same function that processes AI-emitted commands - rather than
reimplementing any gesture/geometry/safety logic. Each bridge method just
constructs the matching Command and queues it through that one shared path,
so it inherits whatever that path already does (and any future change to it)
automatically. Confirmed WebViewBridge.kt and ScreenOperatorAccessibilityService.kt
share the same package (com.google.ai.sample), so no import is needed for the
unqualified reference.
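
The "construct a Command and queue it through one shared path" design can be illustrated with the sketch below. `CommandSketch` and `DeviceControlBridgeSketch` are hypothetical, trimmed-down stand-ins, and the executor is injected as a function so the example runs without an AccessibilityService.

```kotlin
// Hypothetical, trimmed-down stand-ins for the app's Command subtypes.
sealed class CommandSketch {
    data class TapAtCoordinates(val x: String, val y: String) : CommandSketch()
    data class WriteText(val text: String) : CommandSketch()
    object PressBack : CommandSketch()
}

// Hypothetical bridge: every method only builds a Command and hands it to the
// single shared executor, so no gesture or safety logic is duplicated here.
class DeviceControlBridgeSketch(private val executeCommand: (CommandSketch) -> Unit) {
    fun tapAtCoordinates(x: String, y: String) =
        executeCommand(CommandSketch.TapAtCoordinates(x, y))

    fun writeText(text: String) = executeCommand(CommandSketch.WriteText(text))

    fun pressBack() = executeCommand(CommandSketch.PressBack)
}
```

In the real WebViewBridge each of these methods would additionally carry the `@JavascriptInterface` annotation so that JavaScript in the WebView can call it.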

Deliberately left out (documented why in docs/device-control-bridge.md):
Command.Retrieve (database retrieval, not a device action - lives in prompt
construction, different layer entirely), and UseHighReasoningModel/
UseLowReasoningModel (redundant with the already-existing, more general
setSelectedModel(id) bridge method).

docs/custom-action-types.md's PINCH_ZOOM example fixed to use the bridge
method that now genuinely exists (Bridge.tapAtCoordinates) instead of the
nonexistent Android.tapAtCoordinates it referenced before.

New docs/device-control-bridge.md: full method table mirroring each AI
command, the design rationale, and a worked DOUBLE_TAP example combining
this with custom-action-types.json end to end.
Answers: can you add a pinch command via WebView/JSON? Answer before this
commit: no - the Bridge and custom-action-types.json mechanism existed but
there was no native gesture-dispatch code for pinch, so a JS handler had
nothing to actually call. After this commit: yes, fully.

Implementation:
- Command.PinchGesture(centerX, centerY, startDistance, endDistance,
  durationMs): new Command subtype. endDistance > startDistance = zoom in
  (pinch out); endDistance < startDistance = zoom out (pinch in).
  Coordinates and distances accept pixels or percent strings ('50%'),
  consistent with other coordinate-based commands.
- CommandParser: PINCH_GESTURE CommandType + pinch1 pattern matching
  pinch(x, y, startD, endD, ms) in any combination of pixel/percent args.
- ScreenOperatorAccessibilityService.executePinchGesture(): builds two
  simultaneous StrokeDescriptions (one per finger), placed symmetrically on
  the vertical axis centered at (cx, cy), moving from startR to endR apart.
  Uses dispatchGestureWithCallbacks() - the same helper all other gesture
  commands use - so cancellation, error logging, and scheduleNextCommandProcessing
  work identically to tap/scroll. ensureGestureApiAvailable() guards the
  Android N GestureDescription requirement.
- WebViewBridge.pinchGesture(): @JavascriptInterface bridge method, callable
  from window.onCustomAction handlers via Bridge.pinchGesture(...).
- index.html: Bridge.pinchGesture() JS wrapper, plus pinch() added to
  DEFAULT_SYSTEM_MSG so the AI knows the command exists.
- CommandParserPinchTest: tests for pixels, percentages, case-insensitivity,
  whitespace, and missing-arg rejection.
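
The finger geometry can be sketched in plain Kotlin. This assumes startDistance/endDistance describe the gap between the two fingers, so each finger sits half that distance from the center; the type and function names are hypothetical, and the real service may interpret the values differently.

```kotlin
data class PointSketch(val x: Float, val y: Float)
data class FingerPathSketch(val start: PointSketch, val end: PointSketch)

// Resolves "50%" against a screen dimension, or parses a plain pixel value.
// Returns null for input that is neither.
fun resolveCoordinate(value: String, dimensionPx: Int): Float? {
    val trimmed = value.trim()
    return if (trimmed.endsWith("%")) {
        trimmed.dropLast(1).trim().toFloatOrNull()?.let { it / 100f * dimensionPx }
    } else {
        trimmed.toFloatOrNull()
    }
}

// Two fingers on the vertical line through (cx, cy), each half the finger
// distance away from the center, moving from startDistance to endDistance.
fun pinchPaths(
    cx: Float,
    cy: Float,
    startDistance: Float,
    endDistance: Float
): List<FingerPathSketch> {
    val startR = startDistance / 2f
    val endR = endDistance / 2f
    return listOf(
        FingerPathSketch(PointSketch(cx, cy - startR), PointSketch(cx, cy - endR)),
        FingerPathSketch(PointSketch(cx, cy + startR), PointSketch(cx, cy + endR))
    )
}
```

On the device, each path would be turned into one stroke of a single gesture so that both fingers move at the same time; with endDistance greater than startDistance the fingers move apart, which corresponds to zooming in.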

Made pinch a built-in command rather than a custom-action-types.json entry
because it needed new native code anyway (unlike toast, which only needed a
bridge method on top of existing Toast API - pinch needed a new gesture
builder). custom-action-types.json and Bridge.pinchGesture() still let JS
handlers trigger it.
…eeded)

- New Command.CopyToClipboard, routed through the existing executeCommand
  pipeline exactly like the other permission-free device-control commands.
- New AI text command copyToClipboard("text"), recognized by CommandParser.
- New WebViewBridge.copyToClipboard(text) / getClipboardText() JS bridge
  methods, plus matching Bridge.* wrappers in index.html, so a
  custom-action-types.json handler can trigger/read the clipboard directly.
- Documented the pattern (and the 'no extra permission needed' category in
  general, as a template for future such commands) in
  docs/device-control-bridge.md.
- Added a CommandParser unit test for the new pattern.

@amazon-q-developer amazon-q-developer Bot left a comment

Contributor


Review Summary

I've reviewed this PR which adds significant new functionality for WebView command type updates. The changes introduce:

  • New command types (PinchGesture, CopyToClipboard, WebViewCustomAction)
  • Extensive WebView bridge methods for remote configuration
  • Custom action type support via JSON configuration
  • Multiple override systems (model identifiers, offline models, execution policies, etc.)

The implementation is well-structured with proper error handling and defensive coding practices. The code follows existing patterns and maintains backward compatibility. No blocking issues were identified that would prevent merging.


⚠️ This PR contains more than 30 files. Amazon Q is better at reviewing smaller PRs, and may miss issues in larger changesets.

@Android-PowerUser Android-PowerUser merged commit 639b980 into main Jul 1, 2026
5 checks passed
@Android-PowerUser Android-PowerUser deleted the feature/webview-command-type-updates branch July 1, 2026 15:04
2 participants