AI-Powered Dictation: A Comprehensive Guide and Tool Comparison
Top Apps & Tools, Tips for Efficiency and Productivity Using This Technology, and a Free Alternative Using Shortcuts
In the last few weeks, I've been trying out some popular AI dictation tools. I planned to make a short video with some thoughts, but it ended up being longer than I expected. I think there are some really cool uses for these tools that are worth talking about. Learning about this has been super useful to me, and I’m happy to share some of this with all of you. The video covers everything more into detail, but you can also read below for some quick key points, and to grab a handy Shortcut for you to explore AI dictation for yourself.
If you've been following me for a while, you might know about my Alfred workflow, Kiki, or ToolVox, an AI Shortcut I shared last year. Both have some integration with voice and AI but they are too complex for someone just getting started out with this. It’s really cool that we're starting to see tools that are much easier to use than my hacky solutions.
While I think typing on your keyboard or writing isn't going away anytime soon, getting to know these tools and technology can help you work faster and better. They can really be a productivity booster.
AI What?
AI dictation, as I want to cover here, isn't just your regular voice-to-text. It's got two main parts:
Transcription: This is where the speech-to-text AI model (like Whisper AI) takes your voice and transforms it into text. It's super accurate and works in a lot of languages.
Language Model Processing: After the AI transcribes your dictated audio, it sends the text through a Large Language Model (like ChatGPT). This is where things get really interesting. This model can fix your grammar, organize/format your text, translate, summarize, and a LOT more.
By putting these two parts together, not only can we get near-perfect & fast dictation/transcription that you can quickly insert anywhere as if you were typing, but you also have access to this AI Assistant is always a simple click away.
Popular AI Dictation Apps
In the past year, many AI applications have come and gone, but a few have stuck around and proven to be solid choices. I'll give you a quick rundown of three apps that I believe are currently the top contenders in the field. If you're interested in learning more details and seeing them in action, don't forget to check out the video I've linked at the top.
Mac Whisper
Mac Whisper reat for transcription of individual audio files. In this area I don’t think there’s anything quite like MacWhisper out there, especially when you look at all the advanced settings and models you can download. They've just added dictation, so what I’m talking here is pretty new and I’m sure it will be improved. Here's what's good and not so good about the dictation feature:
Pros:
Dictation combined with different system prompts allow for many different use cases
You can use dictation with popular cloud models (by using your API tokens), and you can use it with offline models as well
One-time payment
Very responsive developer
Cons:
The user experience around the interface can be a bit tricky
Limited ways to control the app outside the custom set Shortcuts
No history feature yet (I believe it’s in the works)
If you are using local models, the model stays loaded on your RAM which can take up considerable system resources
The app offers no way to interact with written text or use context, which means it’s very basic
Wispr Flow
Wispr Flow has been gaining popularity recently, and it's easy to see why. It's an incredibly user-friendly application, perfect for those new to AI dictation or anyone seeking a straightforward, no-fuss solution. While the application itself feels amazing and powerful, there are some concerns regarding privacy and transparency of the developers team. The company behind Wispr Flow hasn't been very clear about how much user context is gathered or how it's processed. This lack of transparency, combined with the fact that it's not a native app and uses significant system resources even when running in the background, makes it difficult to wholeheartedly recommend. Still, I think I have to mention it because it really is a very good app by itself. Users just need to know what they are getting into. If the app eventually allows the use of local models and can work without the Internet, most privacy concerns would be cleared on my side.
Pros:
Very user-friendly interface
Highly accurate, and dictated text is cleanly formatted depending on intended use (which means it has a very effective—though secret—system prompt)
Very fast (faster than MacWhisper)
Can do both transcription and AI assistant tasks
Can perform system actions, like opening web searches by voice only
Developed by a team, which could mean faster updates and better support
Can understand selected text and context
Advanced voice control capabilities accessible in a VERY intuitive way
Cons:
Only works online
Not clear about how much data is captured and how is it used
Adds itself as a login item each time you open it
Uses a lot of system resources even when not in use
Subscription-only app
Not many customization options, which may be limiting for advanced users
Privacy concerns due to lack of transparency
Super Whisper
Super Whisper stands out as my top personal recommendation, offering a lot of features and incredible flexibility. It's designed with voice dictation as its core function but extends far beyond that. This app can match most capabilities of MacWhisper and Wispr Flow, while offering even more advanced options for power users. Note that my recommendation comes as an advanced user that is also looking for AI assistant features to use along dictation. For something more basic or simple, you may be okay with one of the other options.
Pros:
Extensive customization options and settings
Compatible with both cloud and local models
Incredible and intuitive implementation of different modes with different custom settings that can be used for different tasks
Very fast, on par with Wispr Flow
Excellent for both dictation and as an AI assistant
Can understand context from screen, selected text, or user’s clipboard using the accessibility API (developer is very transparent about this, and you can opt out of these features if not comfortable)
Efficient system resources management with custom model timeout
Can be optionally used with a separate window, which expands the possibilities
Lifetime payment option available
Cons:
Lifetime option is relatively expensive (currently $250)
Abundance of settings might overwhelm some users
Limited documentation
Currently there’s a single developer handling all aspects, which could be challenging as the app grows
Lacks some advanced voice control features found in Wispr Flow
A Free Alternative: Aiko + Shortcuts
If you want a free way to do something similar, you can use Aiko (a free transcription app) with Shortcuts and LM Studio. If you have an OpenAI API Token, you can simply use the Shortcut I’m linking with Whisper’s API and one of the available cloud models.
The Shortcut does the following:
Records your audio
Your audio is transcribed either locally with Aiko or Online with Whisper AI API
Your transcribed text is then passed to either LM Studio (which should have its local server active) or OpenAI chosen model
The result is copied to your Clipboard and will popup in a small window
How to Customize the Shortcut
You will need to dive into the Shortcut to set it up, especially if you want to change the System Prompt or use it with Cloud Models. I’ve made sure to add comments to help you out.
If you want the result pasted directly instead of showing in a popup, you can use AppleScript. If that’s the case, remove the last “Show Result” action and add a “Run AppleScript” action. Then you can simply add:
tell application "System Events" to keystroke "v" using command down
Consider that in Shortcuts you can also save the dictated text to a note, send it to an email, or a lot more. Take this chance to explore!
Expanding Functionality
I am providing a more advanced version of the Shortcut (less prone to errors) that works together with a Keyboard Maestro macro. In this case you would trigger the KM macro which—in turn—triggers the Shortcut. The LLM processing occurs on on Keyboard Maestro. For transcribing individual audio files, these would still need to be shared to this version of the Shortcut. No need to use the all-in-one Shortcut if you go this route. More details are given in the video.
Both Keyboard Maestro and Shortcuts have text replacement features, which can work like a dictionary for personal words.
You can duplicate the Shortcut and use system prompts for different tasks, trigger them with different keyboard shortcuts, etc.
Triggering the Shortcut
You can use tools like Keyboard Maestro, BetterTouchTool, Alfred, or Shortery to assign a keyboard shortcut to trigger the all-in-one Shortcut anywhere on your system.
For a more advanced setup similar to the paid apps I mentioned above (press to start, press again to stop), you can use Keyboard Maestro or Better Touch Tool to recognize screen areas and simulate clicks.
Downloads!
The all-in-one AI Dictation Shortcut
(Last updated on 10/05/2024)
The Keyboard Maestro Alternative Shortcut
(Last updated on 10/05/2024)
(Last updated on 10/05/2024)
Closing Thoughts
I hope this article or video helps you understand more about AI dictation. If you're curious to explore further, you might want to check out ToolVox AI, which as I have mentioned I created last year. It still works (barely), but it all started with a simple idea that grew into something quite complex. While it became a bit hard to manage, it can still teach you a lot about how the Shortcuts app work and how to use it for different AI related tasks. It started as a text tool but there’s a Whisper AI integration in there.
If you're interested in something a bit more advanced, you can also take a look at Kiki, my Alfred workflow that works with many different popular cloud models, it also can run with local models. It was my go-to for any AI related tasks before switching to Super Whisper. Actually, I still use it for some text-only tasks. It also can use Whisper AI to do similar things as SuperWhisper does with modes, but to be honest the one-time payment option and great models available in SuperWhisper won me over, so that's what I’m using most times. This could change in the future. Kiki is still powerful though, and you can find all the documentation and download it on GitHub.
Even the Keyboard Maestro setup I'm sharing might give you some ideas for your own projects. There are so many possibilities with these automation tools, and anyone can start exploring. If you don't have time for all that, you can always simply go for one of the apps I mentioned earlier. I really hope to see some on Wispr Flow in the areas I’ve mentioned—it has so much potential. We'll see what happens.
Thanks for reading, everyone. I'm not affiliated with any of the apps I talked about, and no one is paying me for the time spent on this, so if you found this info helpful, feel free to buy me a coffee. Your support is much appreciated!
If you liked this you may also enjoy some content I have up on my YT Channel! I don’t hang around social media a lot, but when I do I’m on IG or Twitter. You can also check out some of my online classes, listen to my music, or in case you haven’t already, subscribe to my weekly newsletter. Thank you for reading!