One day in late December, I downloaded a program called Whisper.cpp onto my laptop, hoping to use it to transcribe an interview I’d done. I fed it an audio file and, every few seconds, it produced one or two lines of eerily accurate transcript, writing down exactly what had been said with a precision I’d never seen before. As the lines piled up, I could feel my computer getting hotter. This was one of the few times in recent memory that my laptop had actually computed something complicated; mostly I just use it to browse the Web, watch TV, and write. Now it was running cutting-edge A.I.

Despite being one of the more sophisticated programs ever to run on my laptop, Whisper.cpp is also one of the simplest. If you showed its source code to A.I. researchers from the early days of speech recognition, they might laugh in disbelief, or cry: it would be like revealing to a nuclear physicist that the process for achieving cold fusion can be written on a napkin. It’s rare for modern software in that it has virtually no dependencies; in other words, it works without the help of other programs. Instead, it is ten thousand lines of stand-alone code, most of which does little more than fairly complicated arithmetic. It was written in five days by Georgi Gerganov, a Bulgarian programmer who, by his own admission, knows next to nothing about speech recognition. Gerganov adapted it from a program called Whisper, released in September by OpenAI, the same organization behind ChatGPT and DALL-E. Whisper transcribes speech in more than ninety languages. In some of them, the software is capable of superhuman performance; that is, it can actually parse what somebody’s saying better than a human can.

What’s so unusual about Whisper is that OpenAI open-sourced it, releasing not just the code but a detailed description of its architecture. They also included the all-important “model weights”: a giant file of numbers specifying the synaptic strength of every connection in the software’s neural network. In so doing, OpenAI made it possible for anyone, including an amateur like Gerganov, to modify the program. Gerganov converted Whisper to C++, a widely supported programming language, to make it easier to download and run on practically any device.

This sounds like a logistical detail, but it’s actually the mark of a wider sea change. Until recently, world-beating A.I.s like Whisper were the exclusive province of the big tech firms that developed them. They existed behind the scenes, subtly powering search results, recommendations, chat assistants, and the like. If outsiders have been allowed to use them directly, their usage has been metered and controlled.

There have been a few other open-source A.I.s in the past few years, but most of them have been developed by reverse engineering proprietary projects. LeelaZero, a chess engine, is a crowdsourced version of DeepMind’s AlphaZero, the world’s best computer player. Because DeepMind didn’t release AlphaZero’s model weights, LeelaZero had to be trained from scratch by individual users, a strategy that was only workable because the program could learn by playing chess against itself. Similarly, Stable Diffusion, which conjures images from descriptions, is a hugely popular clone of OpenAI’s DALL-E and Google’s Imagen, but trained with publicly available data. Whisper is the first A.I. in this class that was simply gifted to the public. In an era of cloud-based software, when all of our programs are essentially rented from the companies that make them, I find it somewhat electrifying that, now that I’ve downloaded Whisper.cpp, no one can take it away from me, not even Gerganov. His little program has transformed my laptop from a device that accesses A.I. to something of an intelligent machine in itself.

There was a time when researchers believed that human-level speech recognition might be “A.I.-hard,” their way of describing a problem that was so difficult it might only fall when computers possessed general intelligence.
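
To make the idea of “model weights” concrete, here is a minimal sketch, not taken from Whisper or Whisper.cpp, of how a list of numbers can act as the connection strengths of a tiny neural-network layer. The function name `dense_layer`, the weight values, and the choice of a simple clipping rule (ReLU) are all illustrative assumptions; a real speech model has vastly more weights and a more elaborate structure, but the core operation is still multiplying and adding.

```python
def dense_layer(inputs, weights, biases):
    """Compute one fully connected layer: a weighted sum per output unit,
    followed by ReLU (negative totals are clipped to zero)."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        total = bias
        for x, w in zip(inputs, unit_weights):
            total += x * w  # each weight scales one incoming connection
        outputs.append(max(0.0, total))
    return outputs


# Hypothetical weights for a layer with 3 inputs and 2 output units.
# In a released model, numbers like these are what the weights file stores.
WEIGHTS = [[0.5, -1.0, 0.25],
           [1.0, 1.0, 1.0]]
BIASES = [0.0, -1.0]

activations = dense_layer([1.0, 0.5, 2.0], WEIGHTS, BIASES)
print(activations)  # [0.5, 2.5]
```

Changing any number in `WEIGHTS` changes what the layer computes, which is why publishing the weights, and not only the code, is what lets outsiders run and modify a trained model.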