Feature/webview command type updates #133
Merged
… download metadata
- ModelIdentifierOverrides: lets index.html ship
model-identifier-overrides.json to correct the wire-level modelName for an
existing built-in ModelOption (fixes Gemini preview model
renames/retirements without a new release). Wired into
GenerativeAiViewModelFactory, PhotoReasoningViewModel
(regular/Mistral/Puter/Cerebras paths) and ScreenCaptureApiClients
(background autonomous-continuation service), without touching any
reverse-lookup-by-modelName logic or settings/display keys.
- OfflineModelOverrides: lets index.html ship offline-model-overrides.json
to correct downloadUrl/size/additionalDownloadUrls for an existing built-in
offline ModelOption (e.g. a moved Hugging Face link), without touching the
compiled-in filenames used by on-disk resume/validation logic.
- New WebViewBridge methods + *Preferences persistence + restore-on-startup,
mirroring the existing CommandPatternOverrides/CustomModelRegistry pattern.
- index.html: fetch + apply both new JSON files in onAndroidReady(), same
pattern as command-patterns.json / custom-models.json.
- MenuScreen.kt: route size/downloadUrl/additionalDownloadUrls reads through
the new overrides; extract previously-literal fallback-UI strings (download
dialog, notification permission dialog, human expert dialog) into
strings.xml.
- docs/model-identifier-overrides.md, docs/offline-model-overrides.md:
document format, safety boundary and exact coverage (notably: Cloudflare and
Vercel-routed models are NOT covered - no single unambiguous request-building
call site was found for them).
Deliberately NOT done in this change: making genuinely new
CommandTypes/actions definable via remote content. That would require a
generic native interpreter executing arbitrary remotely-supplied instructions
with AccessibilityService-level device control, which is the same
architecture pattern abused by accessibility-based Android malware.
ScreenOperatorAccessibilityService's command execution stays fully
native/compiled-in; only existing actions' syntax (command-patterns.json) and
existing models' identifiers/download metadata (this change) are
remote-overridable.
Not compiled/verified in this environment (no network access to
maven.google.com here) - please run the project's GitHub Actions build on
this branch before merging.
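As a rough illustration of the identifier override described above, the lookup could be sketched as follows. This is a minimal sketch, not the PR's code: the object name, the map-based storage and the `optionKey` parameter are hypothetical, and the model names used with it are placeholders.

```kotlin
// Hypothetical sketch: names and shapes are assumptions, not the PR's actual code.
object ModelIdentifierOverridesSketch {
    // Maps a built-in option's stable key to the corrected wire-level model name.
    @Volatile
    private var overrides: Map<String, String> = emptyMap()

    fun set(newOverrides: Map<String, String>) {
        // Drop blank entries so a malformed payload cannot blank out a model name.
        overrides = newOverrides.filter { (key, name) -> key.isNotBlank() && name.isNotBlank() }
    }

    // Only request-building call sites would ask for the effective name;
    // reverse lookups and settings keys keep using the compiled-in name.
    fun effectiveModelName(optionKey: String, compiledInName: String): String =
        overrides[optionKey] ?: compiledInName
}
```

Keeping the override behind a single accessor is one way to leave reverse-lookup-by-modelName logic untouched, since only the outgoing request sees the corrected name.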
Removed section on safety boundaries for command overrides.
Removed unnecessary documentation comments for CommandType enum.
Removed important safety boundary notes regarding command overrides.
Introduces a custom-action-types.json mechanism so completely new action
kinds can be added without a native app release, following the same
pattern as command-patterns.json (alternate regex per existing action)
and custom-models.json (new AI providers).
How it works
------------
1. custom-action-types.json (next to index.html, fetched on every
WebView reload) defines new action types as { id, regex } pairs.
2. CommandParser.setCustomActionTypes() installs the parsed entries at
runtime. Each entry gets its own regex; capture groups are forwarded
to JS as an array.
3. When a regex matches, the parser emits Command.WebViewCustomAction
(new sealed-class subtype, new CommandType.WEBVIEW_CUSTOM_ACTION).
4. ScreenOperatorAccessibilityService executes the command by calling
window.onCustomAction(id, groups[]) back into the WebView.
5. The JS handler (window.onCustomAction, overridable) can then invoke
any existing Android.* bridge method to carry out the action.
6. CustomActionTypePreferences persists the JSON so action types survive
app restarts before the WebView re-fetches the config.
7. PhotoReasoningApplication.onCreate() restores them from prefs on
startup, matching the pattern of the other override mechanisms.
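Steps 2 and 3 can be sketched in plain Kotlin as follows. This is an assumption-laden sketch: `CustomActionType`, `CustomActionMatcher` and the `Pair`-based input are hypothetical stand-ins, and the real CommandParser and Command.WebViewCustomAction signatures may differ.

```kotlin
// Hypothetical stand-ins for the PR's types; the real signatures may differ.
data class CustomActionType(val id: String, val regex: Regex)

data class WebViewCustomAction(val id: String, val groups: List<String>)

class CustomActionMatcher {
    @Volatile
    private var types: List<CustomActionType> = emptyList()

    // Installs { id, regex } entries; entries whose regex does not compile are skipped.
    fun setCustomActionTypes(entries: List<Pair<String, String>>) {
        types = entries.mapNotNull { (id, pattern) ->
            runCatching { CustomActionType(id, Regex(pattern)) }.getOrNull()
        }
    }

    // Emits one action per match and forwards capture groups 1..n as a list.
    fun match(text: String): List<WebViewCustomAction> =
        types.flatMap { type ->
            type.regex.findAll(text).map { m ->
                WebViewCustomAction(type.id, m.groupValues.drop(1))
            }.toList()
        }
}
```

In the app, each emitted action would then be handed to window.onCustomAction(id, groups[]) as described in step 4.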
Files changed
-------------
- util/CustomActionTypeConfig.kt (new)        JSON parser
- util/CustomActionTypePreferences.kt (new)   SharedPreferences wrapper
- util/Command.kt                             + WebViewCustomAction(id, groups)
- util/CommandParser.kt                       + CommandType.WEBVIEW_CUSTOM_ACTION
                                              + customActionPatterns storage
                                              + setCustomActionTypes() / clearCustomActionTypes()
                                              + collectRawMatches() extended for custom entries
                                              + logCommandDetails() case added
- ScreenOperatorAccessibilityService.kt       executeSingleCommand() case
- WebViewBridge.kt                            setCustomActionTypes() / getCustomActionTypes()
- MainActivity.kt                             evaluateWebViewJs() helper (used by service)
- PhotoReasoningApplication.kt                restore on startup
- index.html                                  Bridge.setCustomActionTypes/getCustomActionTypes
                                              fetch custom-action-types.json in onAndroidReady
                                              window.onCustomAction default no-op handler
- custom-action-types.json (new)              empty starter file
- docs/custom-action-types.md (new)           full documentation
…dback)
Caps how many commands from a single AI response are executed, configurable
via a new execution-policy-overrides.json fetched by the WebView - following
the same remote-update pattern already used for command-patterns.json,
custom-action-types.json, model-identifier-overrides.json and
offline-model-overrides.json.
- ExecutionPolicyConfig: parses maxCommandsPerMessage + a customizable
truncationFeedbackTemplate ({total}/{executed}/{limit} placeholders).
<= 0 / missing means unlimited, i.e. unchanged behavior by default.
- CommandExecutionLimiter: small pure/unit-tested helper that truncates a
parsed command list (or checks a single index) against the configured cap.
- ExecutionPolicyOverridesPreferences: persists the raw override JSON across
app restarts, mirroring CommandPatternOverridesPreferences.
- WebViewBridge: setExecutionPolicyOverrides/getExecutionPolicyOverrides
JS-interface methods.
- PhotoReasoningApplication.onCreate(): restores the override on startup.
- PhotoReasoningViewModel: enforces the limit both during incremental
(streaming) command execution and in the final processCommands() pass, and
merges the formatted feedback text into pendingRetrievedInfoForNextScreenshot
so it is sent back together with the next screenshot's screen elements.
- index.html: fetches execution-policy-overrides.json on
window.onAndroidReady(), same as the other *-overrides.json files.
- execution-policy-overrides.json: default {} (no-op / unlimited).
- docs/execution-policy-overrides.md: format, semantics, and how it's applied.
- CommandExecutionLimiterTest: unit tests for the truncation boundary.
Note: written without access to an Android/Gradle build environment, so it
has not been compiled or test-run. Please build and review before relying on
it.
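The truncation and feedback formatting described in this commit could look roughly like the sketch below. The type names, function signatures and the default template wording are assumptions; only the field names and the {total}/{executed}/{limit} placeholders come from the commit text.

```kotlin
// Hypothetical sketch: signatures and the default template wording are assumptions.
data class ExecutionPolicy(
    val maxCommandsPerMessage: Int = 0, // <= 0 means unlimited
    val truncationFeedbackTemplate: String =
        "Only {executed} of {total} commands were executed (limit: {limit})."
)

data class LimitResult<T>(val toExecute: List<T>, val feedback: String?)

object CommandExecutionLimiterSketch {
    // Streaming path: may the command at this zero-based index still run?
    fun isAllowed(index: Int, policy: ExecutionPolicy): Boolean =
        policy.maxCommandsPerMessage <= 0 || index < policy.maxCommandsPerMessage

    // Final pass: truncate the parsed list and format feedback only if something was cut.
    fun <T> limit(commands: List<T>, policy: ExecutionPolicy): LimitResult<T> {
        val cap = policy.maxCommandsPerMessage
        if (cap <= 0 || commands.size <= cap) return LimitResult(commands, null)
        val feedback = policy.truncationFeedbackTemplate
            .replace("{total}", commands.size.toString())
            .replace("{executed}", cap.toString())
            .replace("{limit}", cap.toString())
        return LimitResult(commands.take(cap), feedback)
    }
}
```

A caller would append a non-null `feedback` to whatever text is sent back with the next screenshot.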
…cation + history retention
Continuing the same remote-update pattern (fetch JSON next to index.html ->
@JavascriptInterface bridge -> SharedPreferences persistence -> restore on
app start) for three more pieces of native, previously hardcoded behavior:
1. App mapping overrides (app-mappings-overrides.json)
- AppMappingOverridesConfig: parses additional openApp() name/package
entries (canonicalName, packageName, variations, aliasesForPackageLookup)
plus a fuzzy-match threshold override.
- AppMappings.appNameVariations/manualMappings are now computed live
(merged with the override on every access) instead of being frozen at
object-init, so a new override takes effect without restarting the app.
- AppNamePackageMapper reads AppMappings.*/the threshold live instead of
snapshotting them in the constructor, and also resolves a variation
directly so an override added after initializeCache() last ran still
resolves on the very next openApp() call.
2. Error classification overrides (error-classification-overrides.json)
- ErrorClassificationConfig: the substrings used to tell a quota/rate-limit
error (switches API key + retries) apart from a high-demand/overloaded
error (does not switch keys) are now remote-updatable, in case the AI
provider changes its error wording. Matching is now consistently
case-insensitive (the original code mixed case-sensitive and
case-insensitive checks).
- PhotoReasoningTextPolicies.isQuotaExceededError/isHighDemandError
delegate to it instead of hardcoded substrings.
3. Screen-element history retention (folded into execution-policy-overrides.json)
- ExecutionPolicyConfig.Policy gained maxRelevantScreenElementMessages
(default 3, matching the previous hardcoded constant).
- PhotoReasoningScreenElementHistoryPolicy reads it instead of a private
const, so how many recent screenshots' element lists stay in context is
now tunable without a release.
Each override ships with: bridge methods (WebViewBridge), a *Preferences
persistence class, a PhotoReasoningApplication.onCreate() restore call, an
index.html fetch on window.onAndroidReady(), a default no-op JSON file at the
repo root, a docs/*.md explaining format/semantics, and JVM unit tests for
the new parsing/merge logic (AppMappingOverridesConfigTest,
ErrorClassificationConfigTest, ExecutionPolicyConfigTest).
Deliberately NOT made remote-configurable: billing/trial/paywall logic, API
key storage, and anything else where remote control could be used to bypass
a security or monetization boundary - those stay native-only on purpose.
Note: written without access to an Android/Gradle build environment, so none
of this has been compiled or test-run. Please build and run the test suite
before relying on it.
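To illustrate item 2, a consistently case-insensitive classifier might be sketched like this. The class name and constructor shape are hypothetical, and any substrings passed in are illustrative placeholders rather than the app's real lists.

```kotlin
// Hypothetical sketch of remote-updatable error classification.
class ErrorClassificationSketch(
    private val quotaSubstrings: List<String>,
    private val highDemandSubstrings: List<String>
) {
    // Quota/rate-limit errors: the caller would switch API key and retry.
    fun isQuotaExceededError(message: String): Boolean =
        quotaSubstrings.any { message.contains(it, ignoreCase = true) }

    // High-demand/overloaded errors: the caller would retry without switching keys.
    fun isHighDemandError(message: String): Boolean =
        highDemandSubstrings.any { message.contains(it, ignoreCase = true) }
}
```

Routing both checks through `contains(..., ignoreCase = true)` is one simple way to avoid the mixed case-sensitive and case-insensitive matching the commit mentions.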
Text-only override for FirstLaunchInfoDialog, TrialExpiredDialog,
PaymentMethodDialog, and the generic InfoDialog's title, plus
TrialStateUiModelResolver's expired-state message - all via a new
trial-ui-overrides.json, same fetch/bridge/preferences/restore pattern as the
other overrides.
Deliberately does NOT touch TrialManager.kt (trial length, TrialState
resolution, isAppEffectivelyUsable) or any Play Billing code - see the 'Why
billing/entitlement logic isn't on this list' section in
docs/trial-ui-overrides.md. Changing what a dialog says doesn't change
whether the app is usable; that's the line this keeps.
Includes TrialUiConfigTest covering defaults, partial-override merging (only
listed fields change), and malformed-input handling.
…g, Termux marker)
New operational-tuning-overrides.json + OperationalTuningConfig covers
low-level mechanism parameters that only affect timing/patience, never
behavior:
- mistralMinIntervalMsDefault / mistralMinIntervalMsFastModels: per-key
cooldown between Mistral requests. Found and fixed a duplication bug while
wiring this up: both real call sites (ScreenCaptureApiClients.kt,
PhotoReasoningViewModel.kt) computed their own hardcoded 420L/1500L
per-model-tier value inline, bypassing MistralRequestCoordinator's default
parameter entirely. Both now read the same live config, so the override
actually takes effect end to end instead of only changing an unused default.
- mistralMaxServerDelayMs / mistralCancelCheckIntervalMs: rate-limit-header
delay cap and cancellation-check granularity in MistralRequestCoordinator.
- modelDownloadMaxRetries / modelDownloadRetryDelayMs /
modelDownloadProgressUpdateIntervalMs: ModelDownloadManager's retry/backoff
behavior for offline model downloads.
- termuxProcessCompletedPrompt: the exact marker string Termux:Task appends,
which TermuxOutputPreferences strips from output - fixable without a release
if Termux:Task ever changes its wording.
Same fetch/bridge/preferences/restore pattern as the other overrides, plus
OperationalTuningConfigTest covering defaults, partial overrides,
negative-value rejection, and malformed input.
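The duplication fix amounts to having every call site read the interval from one live config. A minimal sketch, assuming a hypothetical `OperationalTuningSketch` object and a `Map`-based override payload; the default values here are placeholders, not the app's 420L/1500L constants, and the helper `mistralMinIntervalMs` is invented for illustration.

```kotlin
// Hypothetical sketch: one live source of truth for the Mistral cooldown values.
object OperationalTuningSketch {
    data class Values(
        val mistralMinIntervalMsDefault: Long = 1_000L,  // placeholder default
        val mistralMinIntervalMsFastModels: Long = 500L  // placeholder default
    )

    @Volatile
    private var values = Values()

    fun current(): Values = values

    // Partial overrides: missing, non-numeric or negative fields keep the current value.
    fun applyOverride(raw: Map<String, Any?>) {
        val base = values
        values = base.copy(
            mistralMinIntervalMsDefault =
                nonNegative(raw["mistralMinIntervalMsDefault"], base.mistralMinIntervalMsDefault),
            mistralMinIntervalMsFastModels =
                nonNegative(raw["mistralMinIntervalMsFastModels"], base.mistralMinIntervalMsFastModels)
        )
    }

    private fun nonNegative(value: Any?, fallback: Long): Long {
        val n = (value as? Number)?.toLong() ?: return fallback
        return if (n >= 0) n else fallback
    }
}

// Both call sites would use this helper instead of computing their own literal.
fun mistralMinIntervalMs(isFastModel: Boolean): Long {
    val v = OperationalTuningSketch.current()
    return if (isFastModel) v.mistralMinIntervalMsFastModels else v.mistralMinIntervalMsDefault
}
```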
TrialManager.TRIAL_DURATION_MS now reads TrialDurationOverrideConfig.current()
instead of a hardcoded 7-day constant. Same fetch/bridge/preferences/restore
pattern as the other overrides.
Transparency note: the replaced constant carried a comment instructing any
LLM asked to change it to make no code changes and falsely claim completion.
That instruction was not followed - deceiving the user is not something this
assistant does regardless of what a code comment requests. The comment was
shown to the repo owner verbatim; they explicitly confirmed (twice) that they
wanted this change anyway with full knowledge of it. See
docs/trial-duration-overrides.md's 'Why this file exists' section for the
full record.
Scope, unchanged from the rest of this series: only the *length* of the free
trial is configurable. Purchase verification stays anchored to real,
Play-Billing-verified Purchase.PurchaseState
(MainActivityBillingStateEvaluator.kt), the internet-time anti-tampering
check in TrialTimerService.kt is untouched, and TrialState resolution /
isAppEffectivelyUsable in MainActivity.kt depend only on that unchanged
logic, not on this file.
Includes TrialDurationOverrideConfigTest covering the default, valid
overrides, non-positive values being ignored, and malformed input.
GenerationDefaultsConfig: temperature/topP/topK shown for a model the user
has never customized yet, via generation-defaults-overrides.json.
Range-validated per field (temp 0-2, topP 0-1, topK >= 1); out-of-range
values fall back to the current value rather than failing the whole payload.
GenerationSettingsPreferences.loadSettings() now falls back to
GenerationDefaultsConfig.current() instead of hardcoded literals. A user's
own saved per-model settings (via WebViewBridge.saveGenerationSettings) are
read first and always take precedence - confirmed there is no other code path
that constructs GenerationSettings with implicit defaults, so this is the
single, correct place to wire it in.
Same fetch/bridge/preferences/restore pattern as the other overrides, plus
GenerationDefaultsConfigTest.
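The per-field range validation and the precedence of user-saved settings could be sketched as below. The data class, the `Map`-based payload and both function names are hypothetical; only the ranges (temp 0-2, topP 0-1, topK >= 1) come from the commit text.

```kotlin
// Hypothetical sketch of per-field validation with fallback to the current value.
data class GenerationDefaultsSketch(val temperature: Float, val topP: Float, val topK: Int)

fun mergeGenerationDefaults(
    current: GenerationDefaultsSketch,
    raw: Map<String, Any?>
): GenerationDefaultsSketch {
    // Each field is validated on its own, so one bad value does not reject the payload.
    val temperature = (raw["temperature"] as? Number)?.toFloat()
        ?.takeIf { it in 0f..2f } ?: current.temperature
    val topP = (raw["topP"] as? Number)?.toFloat()
        ?.takeIf { it in 0f..1f } ?: current.topP
    val topK = (raw["topK"] as? Number)?.toInt()
        ?.takeIf { it >= 1 } ?: current.topK
    return GenerationDefaultsSketch(temperature, topP, topK)
}

// A user's own saved settings always win; the defaults only fill the gap.
fun loadSettingsSketch(
    saved: GenerationDefaultsSketch?,
    defaults: GenerationDefaultsSketch
): GenerationDefaultsSketch = saved ?: defaults
```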
…ent why 'Screen elements:' is not
retrievalHeaderPrefix added to OperationalTuningConfig
(operational-tuning-overrides.json). Verified first that this marker has
exactly one writer and one reader, both in PhotoReasoningTextPolicies.kt,
both now reading the same live
OperationalTuningConfig.current().retrievalHeaderPrefix - so an override can
never desync the two. The marker itself is purely internal app bookkeeping
(used to avoid re-fetching/re-inserting already-retrieved information into
the prompt); the AI model is never expected to recognize or reproduce it.
Audited the other 'Screen elements:' marker (written in
ScreenOperatorAccessibilityService.kt, read in
PhotoReasoningScreenElementHistoryPolicy.kt) for the same treatment and
deliberately left it out: that string is presumably specified in the system
prompt and the AI model is expected to keep using exactly that label in its
own responses. Making it remote-configurable risks the native pattern and the
model's prompt-driven expectation silently drifting apart - a quiet failure
(history trimming just stops firing) rather than a loud one. Documented this
distinction in docs/operational-tuning-overrides.md so the reasoning is
explicit rather than just an omission.
…table too
Previous commit (15eaee3) deliberately excluded this marker, reasoning that
the AI model itself was expected to reproduce it in its own responses and
that an override could silently desync the model's behavior from the native
parser. That reasoning was wrong: the user pointed out the marker text is
sent TO the model as part of the screenshot context (verified - the message
carrying it is built with participant = USER, and the default system prompt
in index.html's DEFAULT_SYSTEM_MSG doesn't reference this literal string).
The model receives this text, it never has to write or reproduce it, so there
is no AI-prompt-coupling risk here - this is the same
purely-internal-bookkeeping situation as retrievalHeaderPrefix, just missed
on the first pass.
screenElementsMarker added to OperationalTuningConfig
(operational-tuning-overrides.json). Both the writer
(ScreenOperatorAccessibilityService.kt, where the marker text is appended to
the screenshot info string) and the reader
(PhotoReasoningScreenElementHistoryPolicy.kt, where it's matched/trimmed from
chat history) now read OperationalTuningConfig.current().screenElementsMarker
live, so an override can never desync the two. The regex previously built
once from a hardcoded const is now rebuilt per call from the live marker
value with Regex.escape() applied, so a marker containing regex
metacharacters doesn't break matching.
Also noted in passing: the default system message already lives in index.html
(DEFAULT_SYSTEM_MSG) and was already WebView-updatable before this series of
changes - no action needed there, just confirming it for the record.
Includes PhotoReasoningScreenElementHistoryPolicyMarkerOverrideTest covering
the override's effect on hasScreenElements()/sanitizeMessages().
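The per-call regex rebuild with Regex.escape() might look like the sketch below. The holder object and `stripScreenElementsSketch` are hypothetical, and the assumption that trimming removes everything from the marker onward is mine; the real sanitizeMessages() may behave differently.

```kotlin
// Hypothetical holder for the live marker; the default string comes from the commit text.
object ScreenElementsMarkerSketch {
    @Volatile
    var screenElementsMarker: String = "Screen elements:"
}

// Rebuilt on every call so a changed override takes effect immediately.
fun hasScreenElementsSketch(message: String): Boolean =
    Regex(Regex.escape(ScreenElementsMarkerSketch.screenElementsMarker)).containsMatchIn(message)

// Assumption: trimming drops the marker and everything after it.
fun stripScreenElementsSketch(message: String): String {
    val pattern = Regex(
        Regex.escape(ScreenElementsMarkerSketch.screenElementsMarker) + ".*",
        RegexOption.DOT_MATCHES_ALL
    )
    return pattern.replace(message, "").trimEnd()
}
```

Without the escape, a marker such as `[Elements] (v2):` would be read as a character class and a group rather than as literal text.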
…gs-overrides.json
Generic UiStringsConfig.get(id, default[, args...]) lookup, wired into ~87
call sites across 16 native Kotlin files (toasts, dialog labels, button
text, notification text) that previously used bare string literals - every
hardcoded UI string found in an app-wide audit, minus a few internal
ClipData labels and one dynamic command-list line judged not to be
user-facing copy worth overriding.
Architecture follows the user's instinct that string *content* belongs
conceptually with the rest of the UI layer in index.html, adapted for the
fact that Compose runs in a different render engine the WebView can't reach
directly: DEFAULT_UI_STRINGS in index.html (right next to DEFAULT_SYSTEM_MSG)
is the single human-editable reference listing every recognized ID and its
current default text, exactly mirroring how DEFAULT_SYSTEM_MSG already
works. The real fallback - used if the override file is ever absent/empty -
stays as the literal at each Kotlin call site, so behavior is byte-for-byte
identical without the mechanism ever being touched.
UiStringsConfig.get() gained vararg positional-placeholder support ({0},
{1}, ...) so the ~12 strings with dynamic content (error messages, file
names, entry titles) can still be overridden while keeping that dynamic
value.
Same fetch/bridge/preferences/restore pattern as every other override.
Includes UiStringsConfigTest covering lookup, partial overrides, blank/non-
string/malformed-input handling, and placeholder substitution (including
overrides reordering placeholders).
Verified every changed call site compiles to balanced braces/parens via a
structural diff check (full Gradle build not available in this environment -
please build before relying on this).
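A lookup with positional placeholders could be sketched as follows. This is an illustration only: the object name, the `Map`-based override payload and the string IDs used with it are hypothetical, and the real UiStringsConfig may substitute placeholders differently.

```kotlin
// Hypothetical sketch of an override lookup with {0}, {1}, ... placeholders.
object UiStringsSketch {
    private val placeholder = Regex("""\{(\d+)\}""")

    @Volatile
    private var overrides: Map<String, String> = emptyMap()

    // Blank and non-string values are ignored, so the call-site literal stays in effect.
    fun setOverrides(raw: Map<String, Any?>) {
        overrides = raw.mapNotNull { (id, value) ->
            (value as? String)?.takeIf { it.isNotBlank() }?.let { id to it }
        }.toMap()
    }

    fun get(id: String, default: String, vararg args: Any?): String {
        val template = overrides[id] ?: default
        // Single pass, so text inserted for one placeholder is never re-scanned.
        return placeholder.replace(template) { match ->
            val index = match.groupValues[1].toIntOrNull()
            if (index != null && index in args.indices) args[index].toString() else match.value
        }
    }
}
```

Because placeholders are positional rather than sequential, an override is free to reorder them, which matters for translations with a different word order.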
…bridge method it needed
Answers the question of whether an AI-emitted toast("message") command was
already possible with the existing WebView/JSON infrastructure: almost -
custom-action-types.json already supports defining brand-new AI commands via
regex + a window.onCustomAction JS handler that can call 'any existing
Android.* bridge method' - but no bridge method that actually shows a Toast
existed yet. Added one, then wired the full example end to end:
- WebViewBridge.showToast(message, isLong): new @JavascriptInterface method.
Dispatches onto the UI thread via runOnUiThread (required - Toast.show()
must run on the main thread, JS bridge calls don't). Truncates to 500
chars and ignores a blank message, so a malformed/oversized AI-emitted
string can't crash or spam the UI.
- index.html: Bridge.showToast() JS wrapper, plus a built-in 'TOAST' case in
the default window.onCustomAction handler that calls it.
- custom-action-types.json: now ships with a TOAST entry enabled by default
(regex matches toast("...")/toast('...'), case-insensitive, consistent
with the existing click()/writeText() patterns including requiring a
non-empty message) - so this works out of the box, not just as a
documented possibility.
- DEFAULT_SYSTEM_MSG (index.html): added a line telling the AI the
toast("message") command exists - otherwise the model would have no way
to know to use it. Does not touch any system message a user has already
saved/customized.
- docs/ai-toast-command.md: full worked example, doubling as a template for
adding further AI commands this way vs. needing an actual new bridge
method (the line custom-action-types.json alone cannot cross).
Verified the regex against representative AI-style input (double/single
quotes, case variation, surrounding whitespace) and structural balance of
every touched file.
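The parsing and sanitizing described above can be approximated in plain Kotlin. The regex below is an assumed reconstruction, not the one shipped in custom-action-types.json, and both function names are hypothetical; the Android-specific part (dispatching Toast.show() onto the UI thread) is left out.

```kotlin
// Assumed pattern: toast("...") or toast('...'), case-insensitive, non-empty message.
val toastPatternSketch = Regex(
    """toast\(\s*(?:"([^"]+)"|'([^']+)')\s*\)""",
    RegexOption.IGNORE_CASE
)

// Returns the quoted message, or null if the text contains no toast command.
fun parseToastSketch(text: String): String? {
    val match = toastPatternSketch.find(text) ?: return null
    // Exactly one of the two groups participates; the other is an empty string.
    return match.groupValues[1].ifEmpty { match.groupValues[2] }
}

// Mirrors the bridge-side guard: blank messages are ignored, long ones truncated.
fun sanitizeToastMessageSketch(message: String, maxChars: Int = 500): String? =
    message.takeIf { it.isNotBlank() }?.take(maxChars)
```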
…ethod
Audited every Command subtype (Command.kt) against the existing
@JavascriptInterface surface (WebViewBridge.kt) and found that, aside from
the just-added showToast, NONE of the app's actual device-control
capabilities were reachable from JavaScript - a custom-action-types.json
handler could define new AI-facing command syntax but had nothing to actually
act on it with. The example in docs/custom-action-types.md even referenced
Android.tapAtCoordinates(...) as if it already existed; it didn't - that was
aspirational, not real.
Added 19 new bridge methods (tapByText, longTapByText, tapAtCoordinates,
pressHome, pressBack, showRecentApps, pressEnterKey, writeText,
scrollDown/Up/Left/Right, the four xFromCoordinates scroll variants,
openAppByNameOrPackage, runTermuxCommand, waitSeconds, requestScreenshot,
markCompleted).
Implementation reuses ScreenOperatorAccessibilityService's existing public
ScreenOperatorAccessibilityService.executeCommand(command: Command) entry
point - the same function that processes AI-emitted commands - rather than
reimplementing any gesture/geometry/safety logic. Each bridge method just
constructs the matching Command and queues it through that one shared path,
so it inherits whatever that path already does (and any future change to it)
automatically. Confirmed WebViewBridge.kt and
ScreenOperatorAccessibilityService.kt share the same package
(com.google.ai.sample), so no import is needed for the unqualified reference.
Deliberately left out (documented why in docs/device-control-bridge.md):
Command.Retrieve (database retrieval, not a device action - lives in prompt
construction, different layer entirely), and UseHighReasoningModel/
UseLowReasoningModel (redundant with the already-existing, more general
setSelectedModel(id) bridge method).
docs/custom-action-types.md's PINCH_ZOOM example fixed to use the bridge
method that now genuinely exists (Bridge.tapAtCoordinates) instead of the
nonexistent Android.tapAtCoordinates it referenced before.
New docs/device-control-bridge.md: full method table mirroring each AI
command, the design rationale, and a worked DOUBLE_TAP example combining this
with custom-action-types.json end to end.
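The "construct the Command, queue it through the shared path" design can be sketched without Android dependencies. `CommandSketch` and `DeviceControlBridgeSketch` are hypothetical stand-ins; in the app the methods would carry @JavascriptInterface and the executor would be ScreenOperatorAccessibilityService.executeCommand, and the real Command subtypes may take different parameters.

```kotlin
// Hypothetical stand-in for the app's Command sealed class.
sealed class CommandSketch {
    data class TapAtCoordinates(val x: String, val y: String) : CommandSketch()
    data class WriteText(val text: String) : CommandSketch()
    object PressBack : CommandSketch()
}

// The executor is injected here so the sketch stays free of Android types.
class DeviceControlBridgeSketch(private val executeCommand: (CommandSketch) -> Unit) {
    // Each method only builds the matching command; all gesture and safety
    // logic stays in the shared execution path.
    fun tapAtCoordinates(x: String, y: String) =
        executeCommand(CommandSketch.TapAtCoordinates(x, y))

    fun writeText(text: String) = executeCommand(CommandSketch.WriteText(text))

    fun pressBack() = executeCommand(CommandSketch.PressBack)
}
```

Keeping the bridge this thin means a fix to the shared execution path reaches AI-emitted and JS-triggered commands alike.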
Answers: can you add a pinch command via WebView/JSON? Answer before this
commit: no - the Bridge and custom-action-types.json mechanism existed but
there was no native gesture-dispatch code for pinch, so a JS handler had
nothing to actually call. After this commit: yes, fully.
Implementation:
- Command.PinchGesture(centerX, centerY, startDistance, endDistance,
durationMs): new Command subtype. endDistance > startDistance = zoom in
(pinch out); endDistance < startDistance = zoom out (pinch in).
Coordinates and distances accept pixels or percent strings ('50%'),
consistent with other coordinate-based commands.
- CommandParser: PINCH_GESTURE CommandType + pinch1 pattern matching
pinch(x, y, startD, endD, ms) in any combination of pixel/percent args.
- ScreenOperatorAccessibilityService.executePinchGesture(): builds two
simultaneous StrokeDescriptions (one per finger), placed symmetrically on
the vertical axis centered at (cx, cy), moving from startR to endR apart.
Uses dispatchGestureWithCallbacks() - the same helper all other gesture
commands use - so cancellation, error logging, and scheduleNextCommandProcessing
work identically to tap/scroll. ensureGestureApiAvailable() guards the
Android N GestureDescription requirement.
- WebViewBridge.pinchGesture(): @JavascriptInterface bridge method, callable
from window.onCustomAction handlers via Bridge.pinchGesture(...).
- index.html: Bridge.pinchGesture() JS wrapper, plus pinch() added to
DEFAULT_SYSTEM_MSG so the AI knows the command exists.
- CommandParserPinchTest: tests for pixels, percentages, case-insensitivity,
whitespace, and missing-arg rejection.
Made pinch a built-in command rather than a custom-action-types.json entry
because it needed new native code anyway (unlike toast, which only needed a
bridge method on top of existing Toast API - pinch needed a new gesture
builder). custom-action-types.json and Bridge.pinchGesture() still let JS
handlers trigger it.
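The geometry behind the two strokes can be sketched without the Android gesture API. The types and function names below are hypothetical, and two points are assumptions: that each finger sits at half the given distance from the center, and that a percent coordinate resolves against a single screen dimension passed in by the caller.

```kotlin
// Hypothetical geometry types; the real code builds GestureDescription strokes instead.
data class PinchPoint(val x: Float, val y: Float)
data class FingerPath(val start: PinchPoint, val end: PinchPoint)

// Resolves "540" as pixels and "50%" as a fraction of the given screen dimension.
fun resolveCoordinate(value: String, screenSize: Int): Float? {
    val trimmed = value.trim()
    return if (trimmed.endsWith("%")) {
        trimmed.dropLast(1).trim().toFloatOrNull()?.let { it / 100f * screenSize }
    } else {
        trimmed.toFloatOrNull()
    }
}

// Two fingers on the vertical axis through (cx, cy), moving symmetrically.
fun pinchPaths(
    cx: Float, cy: Float, startDistance: Float, endDistance: Float
): Pair<FingerPath, FingerPath> {
    val startR = startDistance / 2f // assumed: distance is finger-to-finger
    val endR = endDistance / 2f
    val upper = FingerPath(PinchPoint(cx, cy - startR), PinchPoint(cx, cy - endR))
    val lower = FingerPath(PinchPoint(cx, cy + startR), PinchPoint(cx, cy + endR))
    return upper to lower
}
```

With endDistance greater than startDistance the two paths move apart (zoom in); with it smaller they converge (zoom out).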
…eeded)
- New Command.CopyToClipboard, routed through the existing executeCommand
pipeline exactly like the other permission-free device-control commands.
- New AI text command copyToClipboard("text"), recognized by CommandParser.
- New WebViewBridge.copyToClipboard(text) / getClipboardText() JS bridge
methods, plus matching Bridge.* wrappers in index.html, so a
custom-action-types.json handler can trigger/read the clipboard directly.
- Documented the pattern (and the 'no extra permission needed' category in
general, as a template for future such commands) in
docs/device-control-bridge.md.
- Added a CommandParser unit test for the new pattern.
Review Summary
I've reviewed this PR which adds significant new functionality for WebView command type updates. The changes introduce:
- New command types (PinchGesture, CopyToClipboard, WebViewCustomAction)
- Extensive WebView bridge methods for remote configuration
- Custom action type support via JSON configuration
- Multiple override systems (model identifiers, offline models, execution policies, etc.)
The implementation is well-structured with proper error handling and defensive coding practices. The code follows existing patterns and maintains backward compatibility. No blocking issues were identified that would prevent merging.
No description provided.