Feature/webview command type updates by Android-PowerUser · Pull Request #133 · Android-PowerUser/ScreenOperator

Android-PowerUser · 2026-07-01T14:59:08Z

No description provided.

… download metadata

- ModelIdentifierOverrides: lets index.html ship model-identifier-overrides.json to correct the wire-level modelName for an existing built-in ModelOption (fixes Gemini preview model renames/retirements without a new release). Wired into GenerativeAiViewModelFactory, PhotoReasoningViewModel (regular/Mistral/Puter/Cerebras paths) and ScreenCaptureApiClients (background autonomous-continuation service), without touching any reverse-lookup-by-modelName logic or settings/display keys.
- OfflineModelOverrides: lets index.html ship offline-model-overrides.json to correct downloadUrl/size/additionalDownloadUrls for an existing built-in offline ModelOption (e.g. a moved Hugging Face link), without touching the compiled-in filenames used by on-disk resume/validation logic.
- New WebViewBridge methods + *Preferences persistence + restore-on-startup, mirroring the existing CommandPatternOverrides/CustomModelRegistry pattern.
- index.html: fetch + apply both new JSON files in onAndroidReady(), same pattern as command-patterns.json / custom-models.json.
- MenuScreen.kt: route size/downloadUrl/additionalDownloadUrls reads through the new overrides; extract previously-literal fallback-UI strings (download dialog, notification permission dialog, human expert dialog) into strings.xml.
- docs/model-identifier-overrides.md, docs/offline-model-overrides.md: document format, safety boundary and exact coverage (notably: Cloudflare and Vercel-routed models are NOT covered - no single unambiguous request-building call site was found for them).

Deliberately NOT done in this change: making genuinely new CommandTypes/actions definable via remote content. That would require a generic native interpreter executing arbitrary remotely-supplied instructions with AccessibilityService-level device control, which is the same architecture pattern abused by accessibility-based Android malware. ScreenOperatorAccessibilityService's command execution stays fully native/compiled-in; only existing actions' syntax (command-patterns.json) and existing models' identifiers/download metadata (this change) are remote-overridable.

Not compiled/verified in this environment (no network access to maven.google.com here) - please run the project's GitHub Actions build on this branch before merging.
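The identifier override described above could be sketched roughly as follows. This is a minimal illustration, not the PR's code: `ModelOption`'s fields, the `ModelIdentifierOverridesSketch` object and the choice of keying overrides by a stable option id are all assumptions.

```kotlin
// Hypothetical stand-in for a built-in model entry; field names are assumptions.
data class ModelOption(val id: String, val modelName: String)

// Holds remote overrides keyed by the built-in option's id (assumed key choice).
object ModelIdentifierOverridesSketch {
    @Volatile
    private var overrides: Map<String, String> = emptyMap()

    // Install a freshly parsed override map. Blank names are dropped so a
    // malformed entry cannot blank out a working identifier.
    fun set(newOverrides: Map<String, String>) {
        overrides = newOverrides.filterValues { it.isNotBlank() }
    }

    // Resolve the wire-level name only where a request is built. Settings and
    // display keys would keep using the compiled-in values unchanged.
    fun wireModelName(option: ModelOption): String =
        overrides[option.id] ?: option.modelName
}
```

Keeping the lookup confined to request-building call sites is what would leave reverse lookups by modelName and settings keys untouched.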

Removed section on safety boundaries for command overrides.

Removed unnecessary documentation comments for CommandType enum.

Removed important safety boundary notes regarding command overrides.

Introduces a custom-action-types.json mechanism so completely new action kinds can be added without a native app release, following the same pattern as command-patterns.json (alternate regex per existing action) and custom-models.json (new AI providers).

How it works
------------
1. custom-action-types.json (next to index.html, fetched on every WebView reload) defines new action types as { id, regex } pairs.
2. CommandParser.setCustomActionTypes() installs the parsed entries at runtime. Each entry gets its own regex; capture groups are forwarded to JS as an array.
3. When a regex matches, the parser emits Command.WebViewCustomAction (new sealed-class subtype, new CommandType.WEBVIEW_CUSTOM_ACTION).
4. ScreenOperatorAccessibilityService executes the command by calling window.onCustomAction(id, groups[]) back into the WebView.
5. The JS handler (window.onCustomAction, overridable) can then invoke any existing Android.* bridge method to carry out the action.
6. CustomActionTypePreferences persists the JSON so action types survive app restarts before the WebView re-fetches the config.
7. PhotoReasoningApplication.onCreate() restores them from prefs on startup, matching the pattern of the other override mechanisms.

Files changed
-------------
- util/CustomActionTypeConfig.kt (new): JSON parser
- util/CustomActionTypePreferences.kt (new): SharedPreferences wrapper
- util/Command.kt:
  + WebViewCustomAction(id, groups)
- util/CommandParser.kt:
  + CommandType.WEBVIEW_CUSTOM_ACTION
  + customActionPatterns storage
  + setCustomActionTypes() / clearCustomActionTypes()
  + collectRawMatches() extended for custom entries
  + logCommandDetails() case added
- ScreenOperatorAccessibilityService.kt: executeSingleCommand() case
- WebViewBridge.kt: setCustomActionTypes() / getCustomActionTypes()
- MainActivity.kt: evaluateWebViewJs() helper (used by service)
- PhotoReasoningApplication.kt: restore on startup
- index.html:
  Bridge.setCustomActionTypes/getCustomActionTypes
  fetch custom-action-types.json in onAndroidReady
  window.onCustomAction default no-op handler
- custom-action-types.json (new): empty starter file
- docs/custom-action-types.md (new): full documentation
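Steps 1 to 3 above can be sketched in plain Kotlin. The types below are hypothetical mirrors of the { id, regex } entries and of Command.WebViewCustomAction(id, groups); the case-insensitive flag and the skip-on-invalid-regex behavior are assumptions, not confirmed details of CommandParser.

```kotlin
// Hypothetical mirror of one { id, regex } entry from custom-action-types.json.
data class CustomActionType(val id: String, val pattern: Regex)

// Hypothetical mirror of Command.WebViewCustomAction(id, groups).
data class WebViewCustomAction(val id: String, val groups: List<String>)

// Compile raw (id, regex) pairs, skipping entries whose regex does not compile
// so that one bad remote entry cannot disable the rest (assumed policy).
fun compileCustomActionTypes(raw: List<Pair<String, String>>): List<CustomActionType> =
    raw.mapNotNull { (id, source) ->
        runCatching { CustomActionType(id, Regex(source, RegexOption.IGNORE_CASE)) }.getOrNull()
    }

// Scan an AI response and emit one action per match. Capture groups
// (without group 0, the whole match) are forwarded as a list.
fun matchCustomActions(text: String, types: List<CustomActionType>): List<WebViewCustomAction> =
    types.flatMap { type ->
        type.pattern.findAll(text)
            .map { match -> WebViewCustomAction(type.id, match.groupValues.drop(1)) }
            .toList()
    }
```

In the real flow the resulting id and groups would then be handed to window.onCustomAction(id, groups[]) in the WebView, which this sketch does not model.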

…dback)

Caps how many commands from a single AI response are executed, configurable via a new execution-policy-overrides.json fetched by the WebView - following the same remote-update pattern already used for command-patterns.json, custom-action-types.json, model-identifier-overrides.json and offline-model-overrides.json.

- ExecutionPolicyConfig: parses maxCommandsPerMessage + a customizable truncationFeedbackTemplate ({total}/{executed}/{limit} placeholders). <= 0 / missing means unlimited, i.e. unchanged behavior by default.
- CommandExecutionLimiter: small pure/unit-tested helper that truncates a parsed command list (or checks a single index) against the configured cap.
- ExecutionPolicyOverridesPreferences: persists the raw override JSON across app restarts, mirroring CommandPatternOverridesPreferences.
- WebViewBridge: setExecutionPolicyOverrides/getExecutionPolicyOverrides JS-interface methods.
- PhotoReasoningApplication.onCreate(): restores the override on startup.
- PhotoReasoningViewModel: enforces the limit both during incremental (streaming) command execution and in the final processCommands() pass, and merges the formatted feedback text into pendingRetrievedInfoForNextScreenshot so it is sent back together with the next screenshot's screen elements.
- index.html: fetches execution-policy-overrides.json on window.onAndroidReady(), same as the other *-overrides.json files.
- execution-policy-overrides.json: default {} (no-op / unlimited).
- docs/execution-policy-overrides.md: format, semantics, and how it's applied.
- CommandExecutionLimiterTest: unit tests for the truncation boundary.

Note: written without access to an Android/Gradle build environment, so it has not been compiled or test-run. Please build and review before relying on it.
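A truncation helper of the kind described above might look like the sketch below. The names `ExecutionPolicy`, `LimitResult` and `limitCommands`, and the default template wording, are assumptions for illustration; only the field names and the {total}/{executed}/{limit} placeholders come from the commit message.

```kotlin
// Hypothetical policy holder; the default template text is an assumption.
data class ExecutionPolicy(
    val maxCommandsPerMessage: Int = 0, // <= 0 means unlimited
    val truncationFeedbackTemplate: String =
        "Only {executed} of {total} commands were executed (limit: {limit})."
)

// Commands to run plus optional feedback text for the next model turn.
data class LimitResult<T>(val toExecute: List<T>, val feedback: String?)

// Pure helper: cap the list and format feedback only when something was cut.
fun <T> limitCommands(commands: List<T>, policy: ExecutionPolicy): LimitResult<T> {
    val limit = policy.maxCommandsPerMessage
    if (limit <= 0 || commands.size <= limit) return LimitResult(commands, null)
    val feedback = policy.truncationFeedbackTemplate
        .replace("{total}", commands.size.toString())
        .replace("{executed}", limit.toString())
        .replace("{limit}", limit.toString())
    return LimitResult(commands.take(limit), feedback)
}
```

Returning the feedback alongside the truncated list keeps the helper free of side effects, which is what makes the truncation boundary easy to unit-test.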

…cation + history retention

Continuing the same remote-update pattern (fetch JSON next to index.html -> @JavascriptInterface bridge -> SharedPreferences persistence -> restore on app start) for three more pieces of native, previously hardcoded behavior:

1. App mapping overrides (app-mappings-overrides.json)
   - AppMappingOverridesConfig: parses additional openApp() name/package entries (canonicalName, packageName, variations, aliasesForPackageLookup) plus a fuzzy-match threshold override.
   - AppMappings.appNameVariations/manualMappings are now computed live (merged with the override on every access) instead of being frozen at object-init, so a new override takes effect without restarting the app.
   - AppNamePackageMapper reads AppMappings.*/the threshold live instead of snapshotting them in the constructor, and also resolves a variation directly so an override added after initializeCache() last ran still resolves on the very next openApp() call.

2. Error classification overrides (error-classification-overrides.json)
   - ErrorClassificationConfig: the substrings used to tell a quota/rate-limit error (switches API key + retries) apart from a high-demand/overloaded error (does not switch keys) are now remote-updatable, in case the AI provider changes its error wording. Matching is now consistently case-insensitive (the original code mixed case-sensitive and case-insensitive checks).
   - PhotoReasoningTextPolicies.isQuotaExceededError/isHighDemandError delegate to it instead of hardcoded substrings.

3. Screen-element history retention (folded into execution-policy-overrides.json)
   - ExecutionPolicyConfig.Policy gained maxRelevantScreenElementMessages (default 3, matching the previous hardcoded constant).
   - PhotoReasoningScreenElementHistoryPolicy reads it instead of a private const, so how many recent screenshots' element lists stay in context is now tunable without a release.

Each override ships with: bridge methods (WebViewBridge), a *Preferences persistence class, a PhotoReasoningApplication.onCreate() restore call, an index.html fetch on window.onAndroidReady(), a default no-op JSON file at the repo root, a docs/*.md explaining format/semantics, and JVM unit tests for the new parsing/merge logic (AppMappingOverridesConfigTest, ErrorClassificationConfigTest, ExecutionPolicyConfigTest).

Deliberately NOT made remote-configurable: billing/trial/paywall logic, API key storage, and anything else where remote control could be used to bypass a security or monetization boundary - those stay native-only on purpose.

Note: written without access to an Android/Gradle build environment, so none of this has been compiled or test-run. Please build and run the test suite before relying on it.
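The consistently case-insensitive error classification from item 2 can be illustrated with a small sketch. The `ErrorClassification` holder and its default substrings are placeholders chosen for the example; the actual substrings used by the app are not given in this PR text.

```kotlin
// Hypothetical config holder; the default substrings are illustrative only.
data class ErrorClassification(
    val quotaSubstrings: List<String> = listOf("quota", "rate limit"),
    val highDemandSubstrings: List<String> = listOf("overloaded", "high demand")
)

// Quota/rate-limit errors: the caller would switch API key and retry.
fun isQuotaExceededError(
    message: String,
    config: ErrorClassification = ErrorClassification()
): Boolean = config.quotaSubstrings.any { message.contains(it, ignoreCase = true) }

// High-demand errors: the caller would NOT switch keys.
fun isHighDemandError(
    message: String,
    config: ErrorClassification = ErrorClassification()
): Boolean = config.highDemandSubstrings.any { message.contains(it, ignoreCase = true) }
```

Routing both checks through one `contains(..., ignoreCase = true)` call is one way to avoid the mixed case handling the commit message says the original code had.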

Text-only override for FirstLaunchInfoDialog, TrialExpiredDialog, PaymentMethodDialog, and the generic InfoDialog's title, plus TrialStateUiModelResolver's expired-state message - all via a new trial-ui-overrides.json, same fetch/bridge/preferences/restore pattern as the other overrides. Deliberately does NOT touch TrialManager.kt (trial length, TrialState resolution, isAppEffectivelyUsable) or any Play Billing code - see the 'Why billing/entitlement logic isn't on this list' section in docs/trial-ui-overrides.md. Changing what a dialog says doesn't change whether the app is usable; that's the line this keeps. Includes TrialUiConfigTest covering defaults, partial-override merging (only listed fields change), and malformed-input handling.

…g, Termux marker)

New operational-tuning-overrides.json + OperationalTuningConfig covers low-level mechanism parameters that only affect timing/patience, never behavior:

- mistralMinIntervalMsDefault / mistralMinIntervalMsFastModels: per-key cooldown between Mistral requests. Found and fixed a duplication bug while wiring this up: both real call sites (ScreenCaptureApiClients.kt, PhotoReasoningViewModel.kt) computed their own hardcoded 420L/1500L per-model-tier value inline, bypassing MistralRequestCoordinator's default parameter entirely. Both now read the same live config, so the override actually takes effect end to end instead of only changing an unused default.
- mistralMaxServerDelayMs / mistralCancelCheckIntervalMs: rate-limit-header delay cap and cancellation-check granularity in MistralRequestCoordinator.
- modelDownloadMaxRetries / modelDownloadRetryDelayMs / modelDownloadProgressUpdateIntervalMs: ModelDownloadManager's retry/backoff behavior for offline model downloads.
- termuxProcessCompletedPrompt: the exact marker string Termux:Task appends, which TermuxOutputPreferences strips from output - fixable without a release if Termux:Task ever changes its wording.

Same fetch/bridge/preferences/restore pattern as the other overrides, plus OperationalTuningConfigTest covering defaults, partial overrides, negative-value rejection, and malformed input.
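To illustrate why reading the interval from live config matters, here is a rough sketch of a per-key cooldown whose interval is supplied by a lambda on every call. `PerKeyCooldown` and `reserve` are hypothetical names, and this is not how MistralRequestCoordinator is actually written.

```kotlin
// Hypothetical per-key cooldown. The interval is read through a lambda on every
// call, so a remote override applies to the very next request.
class PerKeyCooldown(private val minIntervalMs: () -> Long) {
    private val lastRequestAtMs = HashMap<String, Long>()

    // Returns how long the caller should wait before sending, and records the
    // time at which the request will then go out.
    @Synchronized
    fun reserve(apiKey: String, nowMs: Long): Long {
        val interval = minIntervalMs().coerceAtLeast(0L)
        val last = lastRequestAtMs[apiKey]
        val waitMs = if (last == null) 0L else (last + interval - nowMs).coerceAtLeast(0L)
        lastRequestAtMs[apiKey] = nowMs + waitMs
        return waitMs
    }
}
```

If each call site instead computed its own constant inline, changing the shared default would have no effect, which is the duplication bug the commit message describes.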

TrialManager.TRIAL_DURATION_MS now reads TrialDurationOverrideConfig.current() instead of a hardcoded 7-day constant. Same fetch/bridge/preferences/restore pattern as the other overrides. Transparency note: the replaced constant carried a comment instructing any LLM asked to change it to make no code changes and falsely claim completion. That instruction was not followed - deceiving the user is not something this assistant does regardless of what a code comment requests. The comment was shown to the repo owner verbatim; they explicitly confirmed (twice) that they wanted this change anyway with full knowledge of it. See docs/trial-duration-overrides.md's 'Why this file exists' section for the full record. Scope, unchanged from the rest of this series: only the *length* of the free trial is configurable. Purchase verification stays anchored to real, Play-Billing-verified Purchase.PurchaseState (MainActivityBillingStateEvaluator.kt), the internet-time anti-tampering check in TrialTimerService.kt is untouched, and TrialState resolution / isAppEffectivelyUsable in MainActivity.kt depend only on that unchanged logic, not on this file. Includes TrialDurationOverrideConfigTest covering the default, valid overrides, non-positive values being ignored, and malformed input.

GenerationDefaultsConfig: temperature/topP/topK shown for a model the user has never customized yet, via generation-defaults-overrides.json. Range-validated per field (temp 0-2, topP 0-1, topK >= 1); out-of-range values fall back to the current value rather than failing the whole payload.

GenerationSettingsPreferences.loadSettings() now falls back to GenerationDefaultsConfig.current() instead of hardcoded literals. A user's own saved per-model settings (via WebViewBridge.saveGenerationSettings) are read first and always take precedence - confirmed there is no other code path that constructs GenerationSettings with implicit defaults, so this is the single, correct place to wire it in.

Same fetch/bridge/preferences/restore pattern as the other overrides, plus GenerationDefaultsConfigTest.
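The per-field validation described above could be sketched like this. `GenerationDefaults` and `applyGenerationOverrides` are hypothetical names; the ranges are the ones stated in the commit message, and a null argument stands for a field missing from the override file.

```kotlin
// Hypothetical value holder for the three defaults.
data class GenerationDefaults(val temperature: Double, val topP: Double, val topK: Int)

// Apply each override independently: a missing (null) or out-of-range field
// keeps the current value instead of rejecting the whole payload.
fun applyGenerationOverrides(
    current: GenerationDefaults,
    temperature: Double? = null,
    topP: Double? = null,
    topK: Int? = null
): GenerationDefaults = GenerationDefaults(
    temperature = temperature?.takeIf { it in 0.0..2.0 } ?: current.temperature,
    topP = topP?.takeIf { it in 0.0..1.0 } ?: current.topP,
    topK = topK?.takeIf { it >= 1 } ?: current.topK
)
```

For example, an override with temperature 3.0 and topP 0.9 would change only topP, since 3.0 falls outside the 0-2 range.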

…ent why 'Screen elements:' is not

retrievalHeaderPrefix added to OperationalTuningConfig (operational-tuning-overrides.json). Verified first that this marker has exactly one writer and one reader, both in PhotoReasoningTextPolicies.kt, both now reading the same live OperationalTuningConfig.current().retrievalHeaderPrefix - so an override can never desync the two. The marker itself is purely internal app bookkeeping (used to avoid re-fetching/re-inserting already-retrieved information into the prompt); the AI model is never expected to recognize or reproduce it.

Audited the other 'Screen elements:' marker (written in ScreenOperatorAccessibilityService.kt, read in PhotoReasoningScreenElementHistoryPolicy.kt) for the same treatment and deliberately left it out: that string is presumably specified in the system prompt and the AI model is expected to keep using exactly that label in its own responses. Making it remote-configurable risks the native pattern and the model's prompt-driven expectation silently drifting apart - a quiet failure (history trimming just stops firing) rather than a loud one. Documented this distinction in docs/operational-tuning-overrides.md so the reasoning is explicit rather than just an omission.

…table too

Previous commit (15eaee3) deliberately excluded this marker, reasoning that the AI model itself was expected to reproduce it in its own responses and that an override could silently desync the model's behavior from the native parser. That reasoning was wrong: the user pointed out the marker text is sent TO the model as part of the screenshot context (verified - the message carrying it is built with participant = USER, and the default system prompt in index.html's DEFAULT_SYSTEM_MSG doesn't reference this literal string). The model receives this text, it never has to write or reproduce it, so there is no AI-prompt-coupling risk here - this is the same purely-internal-bookkeeping situation as retrievalHeaderPrefix, just missed on the first pass.

screenElementsMarker added to OperationalTuningConfig (operational-tuning-overrides.json). Both the writer (ScreenOperatorAccessibilityService.kt, where the marker text is appended to the screenshot info string) and the reader (PhotoReasoningScreenElementHistoryPolicy.kt, where it's matched/trimmed from chat history) now read OperationalTuningConfig.current().screenElementsMarker live, so an override can never desync the two. The regex previously built once from a hardcoded const is now rebuilt per call from the live marker value with Regex.escape() applied, so a marker containing regex metacharacters doesn't break matching.

Also noted in passing: the default system message already lives in index.html (DEFAULT_SYSTEM_MSG) and was already WebView-updatable before this series of changes - no action needed there, just confirming it for the record.

Includes PhotoReasoningScreenElementHistoryPolicyMarkerOverrideTest covering the override's effect on hasScreenElements()/sanitizeMessages().
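The Regex.escape() point above can be shown with a short sketch. The helper names are hypothetical, and the trimming rule used here (drop everything from the marker to the end of the message) is an assumption made for illustration, not the policy class's confirmed behavior.

```kotlin
// Rebuild the pattern from the live marker on each call. Regex.escape() quotes
// the marker, so metacharacters in an override are matched literally.
fun screenElementsRegex(marker: String): Regex =
    Regex(Regex.escape(marker) + ".*", RegexOption.DOT_MATCHES_ALL)

// Plain substring check; no regex needed to detect the marker.
fun hasScreenElements(message: String, marker: String): Boolean =
    message.contains(marker)

// Assumed trimming rule: remove the marker and everything after it.
fun stripScreenElements(message: String, marker: String): String =
    screenElementsRegex(marker).replace(message, "").trimEnd()
```

Without the escape, a marker such as `[UI].* (v2)` would be interpreted as a pattern and would no longer match its own literal text.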

…gs-overrides.json

Generic UiStringsConfig.get(id, default[, args...]) lookup, wired into ~87 call sites across 16 native Kotlin files (toasts, dialog labels, button text, notification text) that previously used bare string literals - every hardcoded UI string found in an app-wide audit, minus a few internal ClipData labels and one dynamic command-list line judged not to be user-facing copy worth overriding.

Architecture follows the user's instinct that string *content* belongs conceptually with the rest of the UI layer in index.html, adapted for the fact that Compose runs in a different render engine the WebView can't reach directly: DEFAULT_UI_STRINGS in index.html (right next to DEFAULT_SYSTEM_MSG) is the single human-editable reference listing every recognized ID and its current default text, exactly mirroring how DEFAULT_SYSTEM_MSG already works. The real fallback - used if the override file is ever absent/empty - stays as the literal at each Kotlin call site, so behavior is byte-for-byte identical without the mechanism ever being touched.

UiStringsConfig.get() gained vararg positional-placeholder support ({0}, {1}, ...) so the ~12 strings with dynamic content (error messages, file names, entry titles) can still be overridden while keeping that dynamic value.

Same fetch/bridge/preferences/restore pattern as every other override. Includes UiStringsConfigTest covering lookup, partial overrides, blank/non-string/malformed-input handling, and placeholder substitution (including overrides reordering placeholders).

Verified every changed call site compiles to balanced braces/parens via a structural diff check (full Gradle build not available in this environment - please build before relying on this).
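A lookup with positional placeholders, as described above, might be sketched as follows. `UiStringsSketch` is a hypothetical stand-in for UiStringsConfig; leaving an out-of-range placeholder untouched is an assumed policy, and the string id in the usage note is invented for the example.

```kotlin
// Hypothetical stand-in for UiStringsConfig.
object UiStringsSketch {
    private val placeholder = Regex("""\{(\d+)\}""")

    @Volatile
    private var overrides: Map<String, String> = emptyMap()

    // Blank overrides are ignored so the call-site literal stays the fallback.
    fun setOverrides(map: Map<String, String>) {
        overrides = map.filterValues { it.isNotBlank() }
    }

    // Use the override when present, else the literal passed by the call site,
    // then substitute {0}, {1}, ... with the positional arguments.
    fun get(id: String, default: String, vararg args: Any?): String {
        val template = overrides[id] ?: default
        return placeholder.replace(template) { match ->
            val index = match.groupValues[1].toIntOrNull()
            if (index != null && index in args.indices) args[index].toString() else match.value
        }
    }
}
```

Because substitution is positional, an override such as `{1} while downloading {0}` can reorder the dynamic values without any change at the call site.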

…bridge method it needed

Answers the question of whether an AI-emitted toast("message") command was already possible with the existing WebView/JSON infrastructure: almost - custom-action-types.json already supports defining brand-new AI commands via regex + a window.onCustomAction JS handler that can call 'any existing Android.* bridge method' - but no bridge method that actually shows a Toast existed yet. Added one, then wired the full example end to end:

- WebViewBridge.showToast(message, isLong): new @JavascriptInterface method. Dispatches onto the UI thread via runOnUiThread (required - Toast.show() must run on the main thread, JS bridge calls don't). Truncates to 500 chars and ignores a blank message, so a malformed/oversized AI-emitted string can't crash or spam the UI.
- index.html: Bridge.showToast() JS wrapper, plus a built-in 'TOAST' case in the default window.onCustomAction handler that calls it.
- custom-action-types.json: now ships with a TOAST entry enabled by default (regex matches toast("...")/toast('...'), case-insensitive, consistent with the existing click()/writeText() patterns including requiring a non-empty message) - so this works out of the box, not just as a documented possibility.
- DEFAULT_SYSTEM_MSG (index.html): added a line telling the AI the toast("message") command exists - otherwise the model would have no way to know to use it. Does not touch any system message a user has already saved/customized.
- docs/ai-toast-command.md: full worked example, doubling as a template for adding further AI commands this way vs. needing an actual new bridge method (the line custom-action-types.json alone cannot cross).

Verified the regex against representative AI-style input (double/single quotes, case variation, surrounding whitespace) and structural balance of every touched file.
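The input handling described for the toast command can be sketched without any Android dependencies. The regex below is an illustrative guess at a pattern with the stated properties (single or double quotes, case-insensitive, non-empty message), not the shipped TOAST entry; `sanitizeToastMessage` and `parseToast` are hypothetical helpers, and trimming whitespace before the blank check is an assumption.

```kotlin
const val MAX_TOAST_CHARS = 500

// Returns null for a blank message, otherwise the message cut to 500 chars.
fun sanitizeToastMessage(message: String?): String? {
    val trimmed = message?.trim().orEmpty()
    if (trimmed.isEmpty()) return null
    return trimmed.take(MAX_TOAST_CHARS)
}

// Illustrative pattern: toast("...") or toast('...'), case-insensitive,
// requiring at least one character between the quotes.
val TOAST_PATTERN = Regex("""toast\(\s*(?:"([^"]+)"|'([^']+)')\s*\)""", RegexOption.IGNORE_CASE)

// Extract the message from the first toast(...) call, or null if none matches.
fun parseToast(text: String): String? {
    val match = TOAST_PATTERN.find(text) ?: return null
    // Exactly one of the two capture groups is non-empty on a match.
    return match.groupValues[1].ifEmpty { match.groupValues[2] }
}
```

In the app, the sanitized text would still need to be posted to the main thread before calling Toast.show(), which this sketch leaves out.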

…ethod

Audited every Command subtype (Command.kt) against the existing @JavascriptInterface surface (WebViewBridge.kt) and found that, aside from the just-added showToast, NONE of the app's actual device-control capabilities were reachable from JavaScript - a custom-action-types.json handler could define new AI-facing command syntax but had nothing to actually act on it with. The example in docs/custom-action-types.md even referenced Android.tapAtCoordinates(...) as if it already existed; it didn't - that was aspirational, not real.

Added 19 new bridge methods (tapByText, longTapByText, tapAtCoordinates, pressHome, pressBack, showRecentApps, pressEnterKey, writeText, scrollDown/Up/Left/Right, the four xFromCoordinates scroll variants, openAppByNameOrPackage, runTermuxCommand, waitSeconds, requestScreenshot, markCompleted).

Implementation reuses ScreenOperatorAccessibilityService's existing public ScreenOperatorAccessibilityService.executeCommand(command: Command) entry point - the same function that processes AI-emitted commands - rather than reimplementing any gesture/geometry/safety logic. Each bridge method just constructs the matching Command and queues it through that one shared path, so it inherits whatever that path already does (and any future change to it) automatically. Confirmed WebViewBridge.kt and ScreenOperatorAccessibilityService.kt share the same package (com.google.ai.sample), so no import is needed for the unqualified reference.

Deliberately left out (documented why in docs/device-control-bridge.md): Command.Retrieve (database retrieval, not a device action - lives in prompt construction, different layer entirely), and UseHighReasoningModel/UseLowReasoningModel (redundant with the already-existing, more general setSelectedModel(id) bridge method).

docs/custom-action-types.md's PINCH_ZOOM example fixed to use the bridge method that now genuinely exists (Bridge.tapAtCoordinates) instead of the nonexistent Android.tapAtCoordinates it referenced before. New docs/device-control-bridge.md: full method table mirroring each AI command, the design rationale, and a worked DOUBLE_TAP example combining this with custom-action-types.json end to end.
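The "construct a Command and queue it through the one shared path" design can be illustrated with a stripped-down sketch. The `Command` subtypes and their fields below are hypothetical, and the shared entry point is modeled as a plain function parameter rather than the real ScreenOperatorAccessibilityService.executeCommand; the real methods would also carry @JavascriptInterface.

```kotlin
// Hypothetical subset of the Command hierarchy; names and fields are assumptions.
sealed class Command {
    data class TapAtCoordinates(val x: String, val y: String) : Command()
    data class WriteText(val text: String) : Command()
    object PressBack : Command()
}

// Each bridge method only builds the matching Command and hands it to the single
// shared entry point, so no gesture or safety logic is duplicated here.
class DeviceControlBridgeSketch(private val executeCommand: (Command) -> Unit) {
    fun tapAtCoordinates(x: String, y: String) = executeCommand(Command.TapAtCoordinates(x, y))
    fun writeText(text: String) = executeCommand(Command.WriteText(text))
    fun pressBack() = executeCommand(Command.PressBack)
}
```

Injecting the entry point also makes the bridge testable in isolation: a test can pass a lambda that records the commands it receives.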

Answers: can you add a pinch command via WebView/JSON? Answer before this commit: no - the Bridge and custom-action-types.json mechanism existed but there was no native gesture-dispatch code for pinch, so a JS handler had nothing to actually call. After this commit: yes, fully.

Implementation:
- Command.PinchGesture(centerX, centerY, startDistance, endDistance, durationMs): new Command subtype. endDistance > startDistance = zoom in (pinch out); endDistance < startDistance = zoom out (pinch in). Coordinates and distances accept pixels or percent strings ('50%'), consistent with other coordinate-based commands.
- CommandParser: PINCH_GESTURE CommandType + pinch1 pattern matching pinch(x, y, startD, endD, ms) in any combination of pixel/percent args.
- ScreenOperatorAccessibilityService.executePinchGesture(): builds two simultaneous StrokeDescriptions (one per finger), placed symmetrically on the vertical axis centered at (cx, cy), moving from startR to endR apart. Uses dispatchGestureWithCallbacks() - the same helper all other gesture commands use - so cancellation, error logging, and scheduleNextCommandProcessing work identically to tap/scroll. ensureGestureApiAvailable() guards the Android N GestureDescription requirement.
- WebViewBridge.pinchGesture(): @JavascriptInterface bridge method, callable from window.onCustomAction handlers via Bridge.pinchGesture(...).
- index.html: Bridge.pinchGesture() JS wrapper, plus pinch() added to DEFAULT_SYSTEM_MSG so the AI knows the command exists.
- CommandParserPinchTest: tests for pixels, percentages, case-insensitivity, whitespace, and missing-arg rejection.

Made pinch a built-in command rather than a custom-action-types.json entry because it needed new native code anyway (unlike toast, which only needed a bridge method on top of existing Toast API - pinch needed a new gesture builder). custom-action-types.json and Bridge.pinchGesture() still let JS handlers trigger it.
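The geometry of the pinch can be sketched independently of the Android gesture API. The sketch assumes startDistance and endDistance are the full separation between the two fingers, so each finger sits at half that distance from the center; the PR text does not settle this, and `Point`, `FingerPath`, `resolveCoordinate` and `pinchPaths` are hypothetical names.

```kotlin
data class Point(val x: Float, val y: Float)
data class FingerPath(val start: Point, val end: Point)

// Resolve "50%" against a screen dimension, or parse a plain pixel value.
// Returns null for unparseable input.
fun resolveCoordinate(value: String, dimensionPx: Int): Float? {
    val v = value.trim()
    return if (v.endsWith("%")) {
        v.dropLast(1).trim().toFloatOrNull()?.let { it / 100f * dimensionPx }
    } else {
        v.toFloatOrNull()
    }
}

// Two finger paths placed symmetrically on the vertical axis through (cx, cy).
// Assumption: the distances are full finger separation, hence the division by 2.
fun pinchPaths(cx: Float, cy: Float, startDistance: Float, endDistance: Float): Pair<FingerPath, FingerPath> {
    val startR = startDistance / 2f
    val endR = endDistance / 2f
    val upper = FingerPath(Point(cx, cy - startR), Point(cx, cy - endR))
    val lower = FingerPath(Point(cx, cy + startR), Point(cx, cy + endR))
    return upper to lower
}
```

In the service, each `FingerPath` would become one stroke of the same duration, with both strokes dispatched in a single gesture so the fingers move simultaneously.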

…eeded)

- New Command.CopyToClipboard, routed through the existing executeCommand pipeline exactly like the other permission-free device-control commands.
- New AI text command copyToClipboard("text"), recognized by CommandParser.
- New WebViewBridge.copyToClipboard(text) / getClipboardText() JS bridge methods, plus matching Bridge.* wrappers in index.html, so a custom-action-types.json handler can trigger/read the clipboard directly.
- Documented the pattern (and the 'no extra permission needed' category in general, as a template for future such commands) in docs/device-control-bridge.md.
- Added a CommandParser unit test for the new pattern.

amazon-q-developer

Review Summary

I've reviewed this PR which adds significant new functionality for WebView command type updates. The changes introduce:

- New command types (PinchGesture, CopyToClipboard, WebViewCustomAction)
- Extensive WebView bridge methods for remote configuration
- Custom action type support via JSON configuration
- Multiple override systems (model identifiers, offline models, execution policies, etc.)

The implementation is well-structured with proper error handling and defensive coding practices. The code follows existing patterns and maintains backward compatibility. No blocking issues were identified that would prevent merging.

⚠️ This PR contains more than 30 files. Amazon Q is better at reviewing smaller PRs, and may miss issues in larger changesets.

claude and others added 19 commits June 26, 2026 22:37

Delete safety boundary section from documentation

3e58714

Removed section on safety boundaries for command overrides.

Remove comments from CommandType enum

85e3f9a

Removed unnecessary documentation comments for CommandType enum.

Remove safety boundary notes from CommandPatternConfig

6d766b1

Removed important safety boundary notes regarding command overrides.

Fix return type of executePinchGesture to return Boolean

3ced4ca

amazon-q-developer Bot reviewed Jul 1, 2026

Android-PowerUser merged commit 639b980 into main Jul 1, 2026
5 checks passed

Android-PowerUser deleted the feature/webview-command-type-updates branch July 1, 2026 15:04

Android-PowerUser merged 19 commits into main from feature/webview-command-type-updates
