Föderation EN Do 09.01.2025 16:34:26 The enshittification of AI has led to VLC's choice to use AI being groaned at. I even saw a post cross my feed from someone looking for a replacement for VLC. VLC is working on on-device realtime captioning. This has nothing to do with generating images or video using AI. This has nothing to do with LLMs. (edit: There are claims VLC is using a local LLM. It will use whisper.cpp, and will not be using OpenAI's models. I don't know which models they will be using. I cannot find any reference to VLC using an LLM.) While human-generated captions would be preferred for better accuracy, this is not always possible. This means a lot of video media is inaccessible to those with hearing impairment. What VLC is doing is something that will contribute to accessibility in a big way. AI transcription is still not perfect. It has its problems. But this is one of those things that we should be hoping to advance. I'm not looking to replace humans in creating captions. I think we're very far from ever being able to do this correctly without humans. But as I said, there's a ton of video content that simply does not have captions available, human-generated or not. So long as they're not trying to manipulate the transcription using GenAI means, this is the wrong one to demonize. #AI #Transcription #VLC #HearingImpaired #Deaf #Accessibility |
Föderation EN Do 09.01.2025 16:41:44 @bedast I would also add I find it quite helpful to start with a set of automatically generated captions, and then correct them. I don't do this often, but it saves me loads of time in a part-time job. Is this a bit like people being annoyed at Mozilla using AI for on-device browser translation, even though that's very useful? I'm not sure if that's generative, but I'd guess not. |
Föderation EN Do 09.01.2025 16:46:38 @howisyourdog I'm not a Firefox user so I haven't really dug into the latest in being upset with Firefox making an AI plugin, but it seemed like they were making an LLM to summarize pages. These have been known to get things very wrong. I don’t know if it's on-device or if it uses ChatGPT. |
Föderation EN Do 09.01.2025 16:49:01 @bedast oooh, I hadn't heard about it summarising pages, that's useful to know. |
Föderation EN Do 09.01.2025 16:53:22 @howisyourdog It’s a plugin/addon so opt-in for now. So there’s at least that. But Mozilla has a history of eventually forcing stuff on users. |
Föderation EN Do 09.01.2025 17:16:34 @bedast I was thinking of this I believe, and I'm not sure if it is ML/AI |
Föderation EN Fr 10.01.2025 13:20:04 Firefox also has a local AI translation thingy that is different from the plug-in being talked about here. |
Föderation EN Fr 10.01.2025 14:41:21 @frog_reborn @howisyourdog @bedast |
Föderation FR Do 09.01.2025 23:00:47 @bedast @howisyourdog all browsers end up forcing stuff on users anyway. Chrome is the leader in forcing stupid things though, especially regarding privacy infringement. |
Föderation EN Fr 10.01.2025 12:39:27 @bedast @howisyourdog It's off device. They say it is privacy preserving, but that's fundamentally questionable when you are sending stuff off device, and it can always change at any point. It's basically a "trust us, we won't be evil" statement and a lot of people are not willing to trust in that any longer given how they've behaved of late. |
Föderation EN Do 09.01.2025 21:05:51 @howisyourdog @bedast I groan every time I see unsupervised automated captions or machine translation. They're simply not ready for prime time. I know some Deaf people find them useful, so I understand the push to integrate them. But this should not be bundled with VLC; it should be an optional plugin, if it isn't already. |
Föderation EN Do 09.01.2025 21:33:46 @grvsmth @bedast it's certainly a tricky one. I would go further and say people with hearing loss, particularly those who can't lip read (me), find them more than just useful. Their accuracy is definitely a problem to be solved, so having it as a plugin is a good compromise as long as people know that. On the other hand it's VLC, so you're getting a pretty amazing piece of software for free, and this is coming from a good place, not trying to inflate stock price with a fad. Certainly not something to rely on if you're producing videos professionally, but I can also see e.g. a solo YouTuber won't have time to transcribe all their videos. |
Föderation EN Fr 10.01.2025 14:07:24 @howisyourdog I dunno about the on-device translation, but Mozilla has also been messing with LLMs, starting with the AI sidebar (which could've been just a regular web panel) and the Orbit summariser extension, which is why people have gotten angry (alongside the "privacy preserving" tracking ad-tech). |
Föderation EN Fr 10.01.2025 14:09:11 |
Föderation EN Fr 10.01.2025 14:17:54 @howisyourdog ATM it's not, but you might also want to disable the "privacy preserving advertising" stuff if you don't want Mozilla to track you. Unless you disabled Mozilla telemetry outright, in which case the adtech gets disabled too. |
Föderation EN Fr 10.01.2025 14:18:45 |
Föderation EN Do 09.01.2025 17:34:18 @bedast It’s tricky because you’re certainly right about the amount of video with no captions, and the unfair inaccessibility of that. But translation AI is exactly the same tech as “generative” or “LLM”, it is statistical modeling. It is not different in any way, including errors and fuel and water demands. It’s like vehicle engines and tires: they do a tremendous amount of good every day, including for accessibility, but they also have terrible side effects that warrant complaints. |
Föderation EN Fr 10.01.2025 02:25:18 |
Föderation EN Fr 10.01.2025 02:35:16 @sbszine @Moss Honestly, in my opinion, any AI inference that is not able to use on-device or edge compute is not ready for mass usage by the public. There are multiple AI and AI-adjacent tools that I use that have no reliance on cloud compute for inference or decision making. For example, my insulin pump’s operation to keep my blood glucose near target. This runs on a device the size of a pager. |
Föderation EN Fr 10.01.2025 12:53:51 |
Föderation EN Fr 10.01.2025 16:06:46 |
Föderation EN Fr 10.01.2025 19:08:51 Also agree on its tremendous value for a11y. There is however also the surveillance capitalism aspect. Imagine every device with a microphone or a camera able to phone home tiny compressed and encrypted trickles of plaintext data, containing our conversations and descriptions of social settings. Nightmarishly dystopian, esp. given all the dystopian stuff we already have. So in the balance there may be fundamental freedoms. A11y is a directly addressable need vs. a long-term externality. |
Föderation EN Sa 11.01.2025 02:17:02 @smallcircles @Moss @bedast This has nothing to do with the above post. It's also untrue. This is just useless fearmongering. |
Föderation EN Do 09.01.2025 21:48:20 I will stick to Open Subtitles as it is more reliable, & will provide better accuracy for slang and other contextual factors. There is no #Enshitification of #AI when AI is shit to begin with |
Föderation EN Do 09.01.2025 22:01:57 @brentpruitt This is a gross hot take built on gross ignorance. And if you think this makes me an AI apologist, you haven’t seen any of my prior posts about AI. |
Föderation EN Do 09.01.2025 22:09:51 no, i just find the phrase ‘enshitification of AI’ to be paradoxical / funny |
Föderation EN Do 09.01.2025 22:13:02 @brentpruitt Elaborate. |
Föderation EN Fr 10.01.2025 09:32:12 @bedast the worry I do have regarding this feature is that it will provide an excuse to some (and that will grow over time) to stop investing in producing quality captioning. Why spend money/resources when there is an AI that will generate some [crappy, or just basic, if not erroneous] captions automatically? I believe that in the long run there will be an inevitable drop in quality, in exchange for availability. Damned if you do, damned if you don’t, as they say. |
Föderation EN Fr 10.01.2025 12:43:39 Maybe this needs to be called "voice recognition" instead of AI? Using a term that nowadays means something awful is going to make misunderstandings more likely? (When I read the news about VLC using AI I wrongly assumed it meant generative AI, as that has totally dominated discourse.) |
Föderation EN Fr 10.01.2025 13:00:14 @FediThing @bedast One of GenAI's well poisoning aspects has been tarnishing the term "AI". It has lost its meaning now. |
Föderation EN Fr 10.01.2025 16:37:50 @SamiMaatta @FediThing @bedast In the case of automatic transcription, it’s using machine-learning models, which are similar enough to LLMs that it muddies the water, as far as terminology goes. |
Föderation EN Sa 11.01.2025 00:11:15 Whatever it's called, perhaps it needs to get across the ethics of its technology if it wants to avoid misunderstandings? If it's using massive amounts of energy and/or stolen data for training, then it's probably unethical. If it's using reasonable amounts of energy and hasn't stolen any data, then it might be ethical. (I think? Just a layperson here, might be a lot of stuff I'm missing...) |
Föderation EN Sa 11.01.2025 02:46:24 @SamiMaatta @FediThing @bedast And so every developer or group with a sense of marketing should have been avoiding the word for like a year now. |
Föderation EN Fr 10.01.2025 12:59:37 @bedast I think you completely missed the point of that post asking for different player recommendations. She is well aware that they are implementing STT and not GenAI. See: https://tech.lgbt/@nina_kali_nina/113798526319597617 |
Föderation EN Fr 10.01.2025 13:05:50 @bedast Then don't call it AI. Call it speech to text. But if it uses a language model to more effectively predict words based on context rather than doing an analyzable mechanical local transformation, it is at least partly the "bad kind of AI" - it has the capacity to introduce biases from training data making output that "sounds right" but means the wrong thing, which is much worse than substituting nonsensical homophones now and then (which the reader will immediately recognize as mistakes). Same principle as why autocorrected text is worse than text with typos. |
Föderation EN Fr 10.01.2025 13:09:08 @bedast Hear hear, this is why I'm against people labeling anything ML as AI. |
Föderation EN Fr 10.01.2025 17:16:14 @koen_hufkens @bedast One of my pet peeves is that most of the time anyone talks about AI (positive or negative) they mean genai. Media is adding anything algorithm-based into the mix as AI, so nobody (me) knows what's being talked about when talking AI anymore. Is machine learning the correct umbrella term for "nongenai algorithm-based systems" like automatic captioning? Can I adopt that, or are there more variants of "AI" which would be falsely labeled? |
Föderation EN Fr 10.01.2025 17:23:07 @ManniCalavera @bedast AI mimics cognitive functioning. So whenever you have a chat interface that would be AI. Most GenAI is prompt driven, so AI. Inpainting apps might technically not be AI but ML, although using generative models. https://cloud.google.com/learn/artificial-intelligence-vs-machine-learning?hl=en |
Föderation EN Fr 10.01.2025 17:50:36 @koen_hufkens @bedast Thank you very much, this helps a lot! |
Föderation EN Fr 10.01.2025 13:28:32 |
Föderation EN Fr 10.01.2025 13:34:45 @bedast We do not want AI in any product whatsoever. |
Föderation EN Fr 10.01.2025 14:42:49 @x_cli There is AI involved in my survival. It’s not genAI. It’s not transcription modeling. It’s not sexy. But it allows me to live. It’s a light weight system. Sure. It’s my insulin pump when connected to a CGM. Stop demonizing actually useful AI. |
Föderation EN Sa 11.01.2025 02:06:19 @bedast @x_cli I think it is the choice of the word "AI" that triggered so many people. I remember the days when programs like the one in your insulin pump were named "neural network", "smart control system", etc. And AI was something cool, futuristic and unattainable because we (still) don't know what human consciousness is and how the human brain works (the way we know how computers work, from machine code to RTL and tricky transistor placement on the silicon die). (1/3) |
Föderation · Fr 10.01.2025 13:46:47 @bedast@beige.party As much as I hate AI in general (especially generative AI and LLMs), I think I agree this is a fair usage for it, together with OCR and automatic translation. :celredcrystalheart: |
Föderation · Fr 10.01.2025 13:47:24 @bedast@beige.party I don't think generative AI is the only problem. I don't even think generative AI is the problem by itself. |
Föderation EN Fr 10.01.2025 14:09:03 @bedast I think the biggest issue currently is: AI is way too overused in marketing. In fedi, many bubbles mostly don't like AI at all, because for them, AI = GenAI. Of course, you could look at the automated subtitles on youtube, where swears are censored (which is stupid, lol), or which are just plain bad in languages other than English. (e.g. in German they're quite useless.) So what is the correct thing to do? I'd say: Look at how the subtitles perform. How much performance do they cost? Are they enabled by default (aka. opt out via an options / context menu)? How well do they work in other languages? Do they censor anything? How's the delay? I mean, I don't know anyone who actively says voice assistants / voice transcription functions in programs are bad *because* they use AI for Speech to Text. And, if I may say... The text transcription on my Pixel phone is working fully locally, in German, without any issues. It's possible. It's not bad. AND: It's not actively advertised as "we have the best AI to do transcription". Yes, I know... Transcription is not the same as subtitles, but it's still closer to it than having nothing. Even though I don't need subtitles and can't hear the word "AI" one more time, I'm interested to see what VLC does there. Though, I'd love to see a released version 4.0, which would fix some issues I have with VLC, but... eh, we can't have everything I suppose. |
Föderation EN Fr 10.01.2025 16:42:42 @SteffoSpieler @bedast In fact, as someone who is hard of hearing, inaccurate subtitles and transcription are usually worse than trying to figure out what little I'm hearing from context cues. Accessibility requires work, disabled people require more than just half-assed machine learning. If creators cannot put human-created captions on media they should question whether creating media is the right job for them. "Something is better than nothing" is how we end up with unsafe wheelchairs kitbashed out of bicycle parts, printed flat dots where there should be braille, and inaccessible captchas. So if you want to pat VLC on the back for this go ahead, but I'll still refuse to use media without human-created captions because everything else is unusable garbage. |
Föderation EN Fr 10.01.2025 14:18:39 @bedast I don't get the fuss tbh? VLC is just adding what this app on Linux has been doing for years. https://flathub.org/apps/net.sapples.LiveCaptions |
Föderation EN Fr 10.01.2025 14:24:00 |
Föderation EN Fr 10.01.2025 14:36:11 @bedast so long as the model is outsourced to OpenAI and the like, you can always be certain everything you ever watch on VLC will be beamed to a third party "for improvement". Auto-generated subtitles might be better than no subtitles, but not at the cost of constantly feeding third parties with your data. And of course, if we are talking of OpenAI's models, they were found to outright invent nonsense phrases when used for audio transcription a few months ago. I'd not trust a hallucinating liar. |
Föderation EN Sa 11.01.2025 00:07:23 @zanagb @bedast VLC will do it on-device, not sending anything anywhere. Whisper models are terrible at transcribing casual conversations of doctors and patients because the training data doesn't reflect that kind of speech and environment. But it excels at transcribing movies etc. because a lot of its training data are closed captions. So this would actually work reasonably well. One can put some text with the names of characters, places, etc. as context and that makes it transcribe those names very well. (source: I've been using whisper models at work, and occasionally I've been putting the mic towards the speaker with some show I'm watching to test) (also: I haven't sent any data to openai nor paid them anything) |
Föderation EN Sa 11.01.2025 00:56:38 @starsider @bedast the CES demo makes it clear the transcription is **off-device**, ie, syphoning data. And besides, there are already many built-in tools for that on macOS and linux. If I wanted fucked-up nonsense on my videos I would watch a raunchy youtube poop from the early 2010s. I'd rather have a phoneme-based system where at least you can tell what the gibberish came from, you can tell it's an error, and you can even reconstruct the sentence. We do not need this. |
Föderation EN Sa 11.01.2025 01:26:43 @zanagb @bedast What makes it clear that it's off-device? Can you provide a link? What tools are you talking about? I use Linux, what should I search? I would like to compare it with the tool I'm building as part of my day job (for which I compile the *whole* source code incl. all dependencies so I know for a fact that nothing is ever syphoned). About fucked-up nonsense, that's what I see on youtube all the time: Youtube's automatic subtitles are beyond terrible. With automatic translations to my native language they're even worse. Family members use them and I can't fathom how they can get anything out of them. No pauses, no punctuation, full of mistakes. Using whisper is a 1000x improvement over youtube's. It adds all the correct punctuation and everything. It only fails with proper names (unless it's given a context) and with speech with a lot of background noise, in all 4 languages I've been testing it in. For regular casual speech it doesn't work _that_ well, but my work's project takes that into account by marking all the dubious words. It also discards whole sentences with too many dubious words because they tend to be gibberish from random noise. Which makes me shudder when I read about the model being used as-is for conversations without regard for confidence levels, without using the context feature, and using naive stitching (since it can only transcribe 30 seconds at a time). Results are awful, as I would have expected. |
Föderation EN Sa 11.01.2025 01:03:08 @starsider @bedast and... If you think whisper is anywhere near remotely adequate for the job, clearly you do not rely on subtitles to hear, nor consume information and media through foreign sources. The pitfalls are very apparent and very damaging for the actual purpose of "understanding what is actually happening". Random Hindi-speaking people with tutorials about the weird obscure software you are trying to debug are always an incredibly easy test that these... abominations always fail. |
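A minimal sketch of the confidence filtering described in the post above: mark dubious words, and drop whole sentences that contain too many of them. This is not the poster's actual project code. The `Word` type, the `filter_sentences` helper, and both thresholds are hypothetical, and the per-word probabilities are assumed to come from whatever transcription backend is in use.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    prob: float  # hypothetical per-word confidence in [0, 1]


def filter_sentences(sentences, word_threshold=0.5, max_dubious_ratio=0.4):
    """Mark low-confidence words and drop sentences that are mostly dubious.

    `sentences` is a list of sentences, each a list of Word objects.
    Both thresholds are illustrative assumptions, not tuned values.
    """
    kept = []
    for words in sentences:
        if not words:
            continue  # nothing to transcribe
        dubious = [w.prob < word_threshold for w in words]
        if sum(dubious) / len(words) > max_dubious_ratio:
            continue  # likely gibberish from noise: discard the whole sentence
        # Keep the sentence, but flag each dubious word for the reader.
        kept.append(
            " ".join(f"[{w.text}?]" if d else w.text for w, d in zip(words, dubious))
        )
    return kept
```

With thresholds like these, a sentence with one shaky word out of three is kept with that word flagged, while a sentence where two of three words are shaky is dropped entirely.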
Föderation EN Fr 10.01.2025 14:38:41 @bedast I'm not surprised. Everybody also loves the fediverse not having any algorithms at all, after all. |
Föderation · Fr 10.01.2025 14:47:57 @bedast |
Föderation EN Fr 10.01.2025 15:42:02 @bedast VLC uses AI? |
Föderation EN Fr 10.01.2025 16:10:36 @bedast Uhm, if it generates text from video or sound then it's genAI? Maybe the problem is not whether it is genAI or not, but what it is used for? |
Föderation EN Fr 10.01.2025 16:29:50 @bedast you seem to be pretty aware of the technical details of this particular AI. Do you have any reading/links on this that allow people to know the technical details? The best article I can find had no technical details: https://www.theverge.com/2025/1/9/24339817/vlc-player-automatic-ai-subtitling-translation The video here does mention open source models (which are still LLMs) |
Föderation EN Fr 10.01.2025 16:54:58 @bedast They should've called it Accessibility Interface. |
Föderation · Fr 10.01.2025 17:01:34 @bedast@beige.party just curious personally, are they using whisper or something else? |
Föderation EN Fr 10.01.2025 17:17:53 @bedast I treat both people who mindlessly promote AI (compare with so-called crypto bros) and Luddites stating AI=evil as fanatics. Like, neither point has a lot of thought in it. |
Föderation EN Fr 10.01.2025 18:52:27 |
Föderation EN Fr 10.01.2025 17:20:44 @bedast The problem is that it will be used to replace humans. |
Föderation EN Fr 10.01.2025 17:30:53 @bedast are VLC considering using whisper.cpp or another GGML based project? I think it would be neat if yes |
Föderation EN Fr 10.01.2025 17:46:28 @bedast so they are doing ML transcription? |
Föderation EN Fr 10.01.2025 18:01:34 @bedast I am almost deaf and rely on captions. I'd rather have no captions than auto-generated. Auto-generated captions, however they are made, are awful. It is an insult to have to deal with them. It will also encourage folks making media to skip putting any effort into captioning because auto-generated is "good enough". But, they are only good enough for folks who can also hear what is being said. I will bring up this point and the response is always, "Have you tried them lately?" I try them every day, whether I want to or not. Hearing-abled folks love to tell us what we should be grateful for, though. |
Föderation EN Fr 10.01.2025 18:07:51 @bedast i agree that it could be useful but take a look at how whisper works. |
Föderation EN Fr 10.01.2025 18:08:23 @bedast good feature, bad marketing name? |
Föderation EN Fr 10.01.2025 18:51:45 @bedast I would like to ignore all the AI debates and specifically address the claim that "This is not generative AI". |
Föderation EN Fr 10.01.2025 18:51:54 |
Föderation EN Fr 10.01.2025 18:54:25 @bedast I am happy to see this kind of take as I feel the same way. I can understand the weariness of people when they hear the term AI because the term has been poisoned by bad actors. But this, I think, is a good benefit of "AI". It's allowing accessibility for those who cannot hear to be able to enjoy things like everyone else. Yes, we could have people responsible for adding closed captions to most things, but this will help with the home made videos, etc. |
Föderation · Fr 10.01.2025 21:53:13 @bedast@beige.party I'm curious then, why use the AI moniker at all? Computerized speech recognition has been around for numerous years, and every product getting an "AI" label slapped on it now is turning people off, rightly or wrongly. Generative or not, as soon as you say AI I'm thinking big waste of power, probably more marketing than substance, etc. |
Föderation · Fr 10.01.2025 23:38:54 @bedast@beige.party Maybe not call it AI Captions but just Automatic Captions like in the "olden" days? |