GPT-4 Vision & DALL·E 3 in ToolVox AI Shortcut
Shortcut Update with DALL·E 3, GPT-4 Vision, and More. ToolVox AI Interacts with OpenAI and OpenRouter Models, Providing Presets of Roles and Prompts for Quick AI Tasks
I didn't want 2023 to end without sharing the latest update to ToolVox AI, my Shortcut for interacting with OpenAI and OpenRouter LLMs.
When I first developed this Shortcut months ago, I realized that keeping up with the changes in OpenAI and its API would soon become difficult. I've watched many apps come and go, with some failing to update their models or keep pace with what's possible. Since ChatGPT's official app arrived, the average user's needs seem well covered, but what the API can do still goes beyond that.

I've quietly continued working on ToolVox AI, shaping it to fit my everyday needs. I built it with flexibility in mind, so I can easily add or remove models and create or modify presets, prompts, or characters. When OpenAI opened up access to DALL·E and its vision model, for example, I incorporated them into ToolVox AI. I've also been able to incorporate "functions," which opens up a whole new layer of possibilities, such as creating plugins.
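For readers unfamiliar with OpenAI-style "functions": a plugin is essentially a JSON schema you declare alongside the chat request, which the model can then choose to invoke. Here's a minimal sketch in Python of what such a request body looks like; the function name and parameters are hypothetical, purely for illustration, and ToolVox AI's actual plugin definitions may differ.

```python
import json

# Hypothetical tool declaration in the OpenAI function-calling format.
browse_tool = {
    "type": "function",
    "function": {
        "name": "browse_web",
        "description": "Fetch a web page and return its text content",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "Page to fetch"},
            },
            "required": ["url"],
        },
    },
}

# The tool list rides along with an otherwise normal chat request body.
request_body = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Summarize example.com"}],
    "tools": [browse_tool],
}
print(json.dumps(request_body, indent=2))
```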
I'm afraid I haven't managed to explain clearly how to set this up, or to explore everything this Shortcut has to offer, which may be why ToolVox AI has had limited reach despite the significant time I've invested in it. As a result, I continue to build primarily for myself. If anyone is interested in learning more about how to use it, please let me know and I'll gladly record a screen walkthrough discussing it in more depth.
Here are some of the new features:
Web browsing is now a plugin activated with the •wp• flag. This allows users to enable web browsing functionality either on a preset or per-interaction basis, saving tokens.
DALL·E 3 is also available as a plugin, activated with the •dp• flag. See the included presets in the default setup for an example.
These plugins, and any future ones, require a model that accepts OpenAI's function schema. They won't work with OpenRouter models, but this may change in the future.
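To illustrate what "accepting the function schema" means in practice: when a model supports it, its reply can come back as a structured tool call rather than plain text, which the caller then parses and executes. Below is a sketch of that response shape per the OpenAI chat completions format; the function name and argument values are made up for illustration.

```python
import json

# Sketch of the assistant message an OpenAI model returns when it
# decides to invoke a declared function (hypothetical example values).
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_123",
            "type": "function",
            "function": {
                "name": "browse_web",
                "arguments": "{\"url\": \"https://example.com\"}",
            },
        }
    ],
}

# The caller parses the JSON-encoded arguments, runs the function, and
# feeds the result back to the model as a "tool" role message.
call = assistant_message["tool_calls"][0]
args = json.loads(call["function"]["arguments"])
print(call["function"]["name"], args["url"])
```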
Users can activate the Web Browsing or DALL·E plugins in the middle of a conversation with the prompt commands "Wp" or "Dp" respectively.
Regardless of the current conversation or active plugins, users can use the prompt command "D" to create an image with DALL·E 3. The resulting image is automatically saved to the camera roll.
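Under the hood, a DALL·E 3 generation boils down to a small request body POSTed to OpenAI's images endpoint. Here's a minimal sketch; the prompt is just an example, and the exact parameters ToolVox AI sends are my assumption.

```python
import json

# Minimal DALL-E 3 request body per the OpenAI images API.
image_request = {
    "model": "dall-e-3",
    "prompt": "A watercolor painting of a grand piano in a forest",
    "n": 1,
    "size": "1024x1024",
}

# POSTed to https://api.openai.com/v1/images/generations with an
# "Authorization: Bearer <API key>" header; the response contains an
# image URL (or base64 data) that a Shortcut can then save to the
# camera roll.
print(json.dumps(image_request))
```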
Users can trigger a GPT-4 Vision request by using "@@". For example, asking "What's this @@" will prompt the user to take or choose a photo. OCR can also be performed this way.
When making a GPT-4 Vision request, you don't need to have the GPT-4 Vision model selected. However, if the Vision model isn't active, the image sent through the API will only persist for that single interaction.
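For the curious, a vision request sends the image as a content part alongside the text in a chat message, typically as a base64 data URL. A minimal sketch, assuming the then-current model name and placeholder photo bytes:

```python
import base64

# Placeholder bytes standing in for the photo the user takes or picks.
photo_bytes = b"...jpeg bytes from the chosen photo..."
data_url = "data:image/jpeg;base64," + base64.b64encode(photo_bytes).decode()

# Chat message mixing text and image content parts, per the OpenAI
# vision message format.
vision_request = {
    "model": "gpt-4-vision-preview",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's this?"},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }
    ],
    "max_tokens": 300,
}
```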
Some flags have been removed, and a few others have been added. Please refer to the ToolVox AI original post documentation for details.
The "simple presets" aspect of ToolVox AI has been deprecated. I've found that the Text Workflow app handles text transformations in a simple and convenient enough way, so there's no longer a need for this feature in ToolVox AI.
Minor fixes have been made regarding how Bear saves and opens ToolVox AI conversations. The Bear integration with the Shortcut is being reconsidered since I barely ever use it. I find Bookmarks (with prompt command “B”) a much more convenient feature.
The BetterTouchTool preset and PopClip extension will no longer be updated. The Alfred workflow has been rewritten from the ground up. There are different actions to perform on selected text, and multiple options for how to present the results. I personally use this in conjunction with Karabiner, and I'd be glad to share more details if anyone is interested, but there are also hotkeys that can be assigned to each and every available option.
As part of the Alfred workflow update, users can now turn any typed text into a ChatGPT request by typing ::ai at the end of it, using the snippet functionality built into Alfred.
Please find the download link in the original post. Make sure to also read the previous article, which covers the features that came just before this release; it will give you a better idea of how to use it. Still, if you'd like to know more, or if you find this useful, I'd be grateful if you let me know.
If you liked this you may also enjoy some content I have up on my YT Channel! I don’t hang around social media a lot, but when I do I’m on IG or Twitter. You can also check out some of my online classes, listen to my music, or in case you haven’t already, subscribe to my weekly newsletter. Thank you for reading!
Hi, this sounds amazing, but I would like to use only the OpenRouter API and not OpenAI. So at setup I didn't fill in an OpenAI key, only the OpenRouter key, and then I put one OpenRouter model in the GTModel dict. Then I downloaded your shortcut, but it doesn't work: it asks me to send OpenRouter a strange dictionary containing pricing for OpenRouter models (it looks like OpenRouter's own dict of models) and then exits with an error… In a word, it seems like amazing work, but very difficult to understand how to use :)