Podcast Transcript, Mosen At Large episode 215, More thoughts on the Victor Reader Stream, what are your favourite PC speakers, and the astounding new voice AI technology from ElevenLabs
Transcripts of Mosen at Large are made possible by Pneuma Solutions, a global leader in accessible cloud technologies. On the web at http://PneumaSolutions.com.
[music]
Jonathan Mosen: I’m Jonathan Mosen. This is Mosen At Large, the show that’s got the blind community talking. This week more on the Victor Reader Stream. Do you have a set of speakers for your PC you’d recommend? The astounding new voice AI cloning technology from ElevenLabs. We’ll hear some samples and speak with one of its founders.
[music: Mosen At Large Podcast]
Welcome Pennsylvania
Jonathan: Good to have you with us for episode 215. As we like to do on these occasions, we salute the area code in the United States or Canada that matches the episode number. Area code 215 belongs to Pennsylvania, particularly around the Philadelphia area. Yes, Philadelphia freedom. I don’t know what else to say really about Philadelphia, so I’ll get some help here. Soup drinker. What are notable things about Philadelphia?
Soup Drinker: From webjack.co.nz, Philadelphia is notable for its rich history: the Liberty Bell, Independence Hall where the Declaration of Independence and Constitution were signed, and other American revolutionary sites.
Jonathan: Well I’m glad that with the help of the drinker, we are able to give Philadelphia the credit it is due. Now because of that mention of the Liberty Bell, I’ve got the Monty Python theme, which is the Liberty Bell March by Sousa, stuck in my head. What an earworm.
Sending love to those affected by natural disasters
Now on a much more serious note, I do want to acknowledge some natural disasters. Mother nature is formidable, and there are times when we realise just how at its mercy we are. Unfortunately, because we have been such poor custodians of the planet as a species and the planet is heating, we are going to see more of these significant natural disasters.
Here in New Zealand, it has been a very difficult week. In fact, it’s been a very difficult month. A couple of weeks ago, there was significant flooding in parts of New Zealand, including Auckland, New Zealand’s largest city, and the recovery was continuing when Cyclone Gabrielle came along. This is undoubtedly New Zealand’s worst storm this century, and it’s probably the worst storm in living memory. Lives have been lost, and water supply has been affected. At the time that I record this, electricity is still down for many people. It has been a terrible week.
I’d like to extend my aroha to everybody listening in New Zealand who has been affected by Cyclone Gabrielle either directly, or because you have friends and whanau in harm’s way. Power, telecommunications have been disrupted significantly, and so there has been for many of us an interminable wait as we try and check on people who have been completely cut off, not just in terms of communication, but the roads in and out of some parts of New Zealand have been completely decimated. It is going to be a massive recovery exercise. It is going to take a long time. We will get through it, but it is a very painful process right now and it’s still very real.
It’s still ongoing at the time that I put this podcast together. I also want to send lots of love and best wishes to those affected by the significant earthquakes that occurred affecting Syria and Turkey a couple of weeks ago. Last I checked the death toll there, 30,000 lives had been lost. The devastation is just incalculable. We know a bit about disastrous earthquakes in New Zealand. When things like this happen, you always get remarkable stories of survival, a triumph of the human spirit and of the rescuers who put their lives on the line, who willingly put themselves in harm’s way to try and help other people. What selfless, wonderful human beings they are who do that work. Wishing those affected there all the best as well.
[music: Mosen At Large Podcast]
The Victor Reader Stream
Jonathan: More now on the Victor Reader Stream. First, let’s turn to the question of Audible support, because Joe Danowski raised this last week in response to Joe Norton’s successful use of the Victor Reader Stream. The latter Joe- oh boy, gets confusing- was having some issues. Brian Hartgen had similar issues. He’s looked into this, and he sends some useful information over on Mastodon, where you can follow us at mosenatlarge@mstdn.social. Lots of Mosen At Large-related activity happening on Mastodon, so follow at mosenatlarge@mstdn.social.
Brian writes, “I’ve now got to the bottom of this. We’ve got the step-by-step here. One, you must use AudibleSync 1.8.13. Two, obviously sign into the store relative to your country. Three, you absolutely must now transfer the book via the AudibleSync app. This will take a minute or two, but the screen reader will confirm when the transfer is complete. Unlike older books, if you browse the SD card onto which the book is copied, you will find that the file name no longer reflects the book title.
It consists of a series of letters and numbers. However, if you attempt to play the book, it does work using that method. In summary, if you download a book from Audible, for example, via a third-party application like OpenAudible, the stream will no longer play it. It must be that method.” So says Brian on Mastodon, and I hope that helps anybody who was having issues.
David Lepofsky: Hello Mosen At Large crowd. My name is David Lepofsky, I’m in Toronto, Canada. I wanted to chime in with a couple of thoughts. First on the Victor Reader Stream. I have been a fan of the Victor Reader Stream since it came out. I have the first gen and the second gen, and I later got the Victor Reader Trek. I’ve always liked them, but I’m pondering a couple of concerns about the new model. The first thing is I agree with everybody that it is so much better to read an audiobook if you’ve got access to those buttons, but the trade-off I am finding is that the digital-analogue converter or whatever it is in the stream is much poorer quality than the iPhone.
If I want to listen to a book and I particularly want to crank up the speed, I want the highest quality audio I can get, and I find that I get much better audio out of the iPhone, so I’ve tended to trade off having the convenience of those buttons for better-quality sound. I’m eager to know whether the new stream, third generation, will have appreciably better audio. Jonathan, you asked a really good question of the Humanware folks, whether there was going to be a better speaker. What we don’t know is whether the audio chip or whatever on board is going to produce better quality audio even if you’re using earphones.
Second, there’s been talk and comment, and concern about the battery not being removable. I share that concern, but I want to add another reason. I’ve found both with the Victor Stream and the Victor Trek, and even with the Victor Stratus, that there are times when the software just gets scrambled, and the only way really to fix it, even if a hard reset doesn’t work, is to take out the battery and then put the battery back in. Then everything is back to good standing. I’m frankly concerned that in the new third generation, because we can’t remove the battery, we’ve lost that.
The third thing that’s come up on discussions on this podcast that I want to chime in on is the question of having a radio. Someone mentioned that there is a radio on the Victor Reader Trek. I have the Victor Reader Trek, it does not have a radio. I’m not sure if I’m missing something or if it’s simply not enabled. I do remember that when the second-generation Victor Stream was announced, they said there was a chip in it to provide for a radio that would be turned on or activated soon, but it never was and it’s been years since. I don’t know what the plans are with the number three.
I would be concerned if the third generation, like the second generation, has a chip, but it’s not actually turned on. I’ve been very interested as an alternative in the new HIMS product, the Sense Reader, but it is so much more expensive than the Victor Reader Stream third generation that I’m concerned that it may well price itself out of range for the uses that would be most attractive for it. Just one more thing, it’s off-topic, but it’s opening up a new topic. I am eager to get any feedback on this podcast from anyone who has found software that is good for blind people to edit video, particularly in the Windows world.
I’m given to understand that it may be possible on Reaper, but I have just not gone through the learning curve of learning Reaper yet. I heard on a Clubhouse meeting, when I got lured into the world of Clubhouse by an earlier Mosen podcast that was very convincing about the merits of Clubhouse, I heard from a visually impaired journalist in England that they use something called Descript, D-E-S-C-R-I-P-T. I fiddled with it, but I can’t figure out how to get it to work with JAWS. It’s just not screen reader-friendly, and I didn’t know if anybody else had found a way around it.
It’s a cool program because when you load a video on it, it generates a transcript, like a Word document, and you can edit words out of the Word document, and it will render the video, deleting the video that accompanies those words. That would be fantastic for blind people, but the user interface, in my experience, is just completely horrendous to try to navigate. Any thoughts welcome though. I absolutely love the podcast. I look forward to hearing more about these and whatever else comes up in upcoming shows.
Jonathan: Thanks, David. Descript is indeed very interesting technology because not only can you delete contents as if you were in a document, but you can also insert things, and it tries to use AI to put words in your mouth. It is fiddly and I don’t bother with it because of its fiddly nature. Reaper is your best bet for editing video in Windows. Of course, it’s great for editing audio, and people often don’t realise that you can load a video and edit that as well. Like any application, there are syntaxes to get your head around, and commands to learn, but once you’ve done that, all of the commands that apply to editing audio and processing audio, and making it sound good, can also be used for video.
Not only can you produce edited video, you can make the video sound fantastic. That’s much appreciated because there’s a lot of poor-quality video out there. The one thing that makes me nervous about editing video, and some people tell me I’m way overthinking this, is that we can’t tell what’s going on on the screen with facial expressions. Sometimes you can make a ghastly edit that sounds great in audio, but apparently it looks jarring from a visual point of view because somebody’s facial expression is suddenly cut from one thing to another, and that’s not what happens in real life.
Some people say this matters, and other people I talked to say, “Don’t worry about it. Most people’s editing these days is way bad anyway when it comes to video, and nobody’s going to mind.” Editing video in Reaper is super accessible and it’s a really fun process. I actually do this a little bit. I produce video messages for my team who are spread out across 22 offices across the country. I post a video thing and I edit it in Reaper all the time. I like that because I can process the audio, I’m happy with how it sounds, I can use the microphone in the studio.
Actually, because I’ve been doing a bit more video, thanks to how easy it is to produce in Reaper, I’ve got a different microphone stand. In my home office, I have a camera, and it’s positioned such that when I’m sitting in my normal seated place, I’m looking at the camera and I don’t have to worry about it too much. There is that wonderful little utility called CanYouSeeMe which I do run sometimes before a really important conference call, and it tells me if I’m centred on the camera. I almost always am because of the way this camera is affixed permanently to the wall, so that works for me.
The one downside was that the previous microphone stand I had was vertical, and so it would block my face from the camera. I would use a lav mic, and the lav mic doesn’t sound nearly as good as this Heil PR 40 that I have. Now, I can use the Heil PR 40 because I’ve got a different low-profile stand that doesn’t obstruct my view of the camera. It’s good, I can produce my own video with good audio. Reaper is a fantastic tool, well worth the learn. We’re going to hear from Nicki from Pennsylvania now, and I know this because she says this is Nicki from Pennsylvania.
You can’t get better proof than that. She says, “First, I just want to say that I have now pre-ordered my Victor Reader Stream 3 and can’t wait to get it. Second, to the person who thought the Trek has an AM/FM radio, it does not. I know there are other devices that do, but the Trek definitely does not. Lastly, I really liked the tech briefing using the AI Adam, I hope you will include it in future episodes of Mosen At Large.” Thanks, Nicki. Yes, I’ve had some positive feedback on Mastodon about that tech briefing as well.
Hello to Richard who says, “Jonathan, through the last three of your podcasts and from several other sources, I have been hearing plenty of people concerned about the lack of a user-replaceable battery in the Stream 3. Briefly, I was irritated about that change also, but when I remembered that my generation 2 Stream is around 10 years old and has never required a new battery, even though I have recharged it several times a week over most of those years, I decided that a non-user-replaceable battery is not all that important. These newer lithium batteries seem to be much less vulnerable to recharge failure than the older nickel cadmium and nickel metal hydride batteries.
As long as they don’t go up in flames, like some of the phones and several of the electric car batteries, I suspect by the time they start to fail, I will probably be hearing about version four of the Stream unless I have kicked the bucket by then.” What a cheery email. [laughs] “Here in Idaho, the entire state has two area codes that both cover the whole state. I think area code 208 was issued to Idaho back around 1947 but several years ago, they were starting to run out of phone numbers, so they added area code 986 to the whole state also.”
Maybe we’ll get to Episode 986 at some point, unless I kick the bucket. [chuckles] “This means we all need to dial 10-digit phone numbers including the area code to call a neighbour or anyone else in the state. When I got my BlindShell 2 phone, I ended up getting one of the 986 area codes on that phone while my landline is area code 208. I tell my friends and family that my condo is so large that I needed a different area code to get from my cell phone to my landline. Don’t forget Idaho,” he says, “when you get up to episode 986. Thanks for the great show,” says Richard.
Thank you, Richard. I won’t forget, I will Braille it on my hand right now. We can make transcripts of Mosen At Large available thanks to the generous sponsorship of Pneuma Solutions. Pneuma Solutions, among other things, are the RIM people. If you haven’t used Remote Incident Manager yet, you really want to give it a try. It is a fully accessible, screen reader-agnostic way to either get or provide remote assistance. It’s also a no-hassles way of getting into a PC that you might need to access remotely. I manage the internet radio station, Mushroom FM, and it has a dedicated computer we affectionately call the Mushroom Pot. Sometimes I need to log into that machine to make changes.
RIM means I can make those changes simply from anywhere in the world. On the desktop of my ThinkPad, there’s a little shortcut that’s simply called Mushroom Pot. When I press enter on that shortcut, I’m accessing the Mushroom Pot, and it’s as if I was sitting in front of the computer. It literally is that simple. If I have to do a Windows update, then after a reboot, RIM can automatically reconnect for me. This is a super tool, and it’s made administering remotely so much simpler than it ever was before. Check out RIM by going to getrim.app. That’s G-E-T-R-I-M.app. The installer’s there, as is the documentation, so you can learn all about RIM from Pneuma Solutions.
[music: Mosen At Large Podcast]
Learning JAWS
Jonathan: This email is all about coming to grips with JAWS and Windows, and it comes from Joe Danowski. He says, “Hi, Jonathan. Recently, I wrote in concerning the unwieldy nature of Windows, Outlook, Word, and the steep learning curve of JAWS. Regardless of these difficulties, I definitely appreciate the amazing things this technology enables me to do, and I know as individuals with vision disabilities, it is essential for us to master this software in order to be a success in today’s workplace. Recently, I got a new laptop PC, and even though I had been a JAWS user for many years, I have undertaken a serious effort recently to step up my JAWS, Windows, and Office skills.
This caused me to think about what advice would I give to someone just starting to set out to learn JAWS, and I put some of those thoughts down here. First, if a person is going to use a laptop, I think it is important to have a laptop with a keyboard that has good tactile feedback and good key placement and spacing. Recently, I purchased the Lenovo X1 Carbon. The keyboard is excellent, the keys are spaced well, the key travel is satisfying with nice click-through. The function keys are in sets of four with a defining space between each set.
The eraser-head TrackPoint in the middle of the keyboard makes it easy to place your hands on the home keys. It is also important to me that it has an insert key so that I have a JAWS key on the left side and on the right side of the keyboard, which makes complicated keystroke combinations a lot easier. The next thing I would say is keep it simple. Set out to master Windows, Word, and Outlook. These are the programs most relevant to the workplace. In the beginning, I would say don’t be distracted by Edge, which is very cumbersome; save it for later.
I think iPhone is a better tool for searching the web anyway. Another thing, don’t waste a lot of time trying to figure out all the Windows settings such as integrating an iCloud calendar, setting a mailbox priority, et cetera. Have someone who knows Windows help you set it up, or better yet, call the Microsoft Accessibility team. They are excellent and they can get right to the issues using Quick Assist where they take control of your computer. They can configure these complicated things quite easily. Just tell them the way you want your computer set up and they’ll do it for you.
They can be reached at 1-800-936-5900. That’s 1-800-936-5900. Operators are standing by, from the United States,” says Joe. “Also, don’t try to learn every keystroke. There is more than one way to accomplish a task in Windows or Office, many of which can be performed by invoking the applications key, the right mouse button, which will give you an action menu with multiple choices, or by using the Office search function, Alt Q. The fewer keystrokes you need to memorise, the better. When it comes to JAWS and Windows keystrokes, I have made a list of the 35 or 40 keystrokes I am most likely to use.
I keep the list on my iPhone, and I try to review it a couple of times a day. I’ve been doing that for a few weeks. That really helps this detailed information to stick. Lastly, I would say to the new user, just keep working at it and you will get there.” Thanks, Joe. That’s a very useful post. I hope it stimulates some conversation: for people who are getting started with JAWS and Windows, what advice would you give? I have a couple of things that you’ve inspired me to comment on here. The first is slightly contradictory to yours, and I think this depends on whether the user finds an iPhone intuitive.
I know some very skilled Windows users who just don’t get on with the iPhone at all. They’re absolute masters of the Windows platform, they do not like the touchscreen environment. They do not like iOS. That will play a part in the advice. I have found when I’ve done training that using a web browser first can build your confidence because once you understand how to search the web and get information, I think consuming information first then gives you the confidence to go on and create information. If you can help somebody to learn how to perform a basic Google search, they can help to do their own problem-solving.
The second comment I’d make is that Vispero’s training department has done a first-class job of providing training information. I would show somebody new to Windows and JAWS how to use that training information, because as they gain confidence and curiosity, there’s just a wealth of information there at their fingertips that they can look at any time. There’s also the built-in JAWS Command Search, which can be very handy. There is the Sharky feature, which I personally never use, but I can see that some users who are new may want to use that.
I’d also seriously consider using Leasey, which we’ve talked about here on Mosen At Large several times. Leasey is from Hartgen Consultancy. It can really maximise efficiency, but also provide a very simple user interface for those who require that, it can be a stepping stone. Fully agree with you, despite the recent hassles I’ve had with the ThinkPad X1 Carbon, it is a great keyboard, the best keyboard I’ve ever used on a laptop, and so I think you’ve chosen wisely there. Really great contribution, Joe. I thank you for it. I hope others might contribute too on advice that they would give to people starting out in Windows with JAWS.
Braille support in Android TalkBack, and hearing aids
Imke is in touch and says, “Hello, Jonathan. Regarding Braille, with an uppercase B, support within TalkBack. While virtually attending a session on what’s new with Google Accessibility at the Assistive Technology Industry Association, ATIA, conference on February the 3rd, I asked about support for HID Braille displays in TalkBack, recalling the discussion on earlier episodes of Mosen At Large. The response was that support for HID displays via USB was added in TalkBack 13.1. Bluetooth support for HID displays has not been added, and there was no comment on a timeline for doing so.
Regarding hearing aid models. About a year ago, I received my latest pair of hearing aids. I had been using several generations of Widex Hearing Aids, and was now working with a new audiologist. She recommended trying a model of the Starkey Hearing Aids, and I initially accepted this recommendation. However, my hearing was significantly worse with those Aids than with my previous Widex Aids. I was not always able to understand speech from across a normal-sized living room, and I did not hear background noise as well as I would like to for orientation and general awareness of my environment.
After a week of testing, I returned to my audiologist and requested a change. We switched to the latest generation of Widex Aids, and they worked much better for me than the Starkey Aids, and somewhat better than the previous generation of Widex Aids, as one would hope. I understand that which hearing aids are optimal for someone very much depends on the type of hearing loss, personal preferences, and lifestyle of each individual, and how these factors match with the characteristics of the different hearing aids on the market.
My hearing loss is moderate towards severe in one ear due to otosclerosis and high-frequency hearing loss. My primary criteria for aids are that I understand speech clearly, hear my environment well enough to use ambient sounds for orientation, can stream sound from a laptop and from iOS devices to the hearing aids, and can accurately determine the direction of a sound source. Widex offers a variety of streaming devices. I now use Widex’s FM+DEX to stream sound from the computer to my hearing aids. The FM+DEX plugs into the computer’s 3.5-millimeter port and hangs around my neck.
Unfortunately, it is heavier, more complicated to use, and has a shorter battery life, eight hours, than Widex’s now outdated UNI-DEX, of which I have three to use with different computers as well. The delay of sound that has caused you not to like Widex Aids is annoying but not a showstopper for me. My aids are Made for iPhone (MFi) compatible, and I find it very handy for the sound from my iPhone to stream directly into them without needing a separate headphone or streaming device. Occasionally, the sound streaming from the iPhone is interrupted.
This can be remedied in one of several ways: turning the hearing aids off and on, turning VoiceOver off and on, or simply cycling through the hearing aid’s different programs. From discussions on Mosen At Large, it sounds like this may be an iOS problem rather than one specific to the hearing aids. Next time I am in the market for new aids, I intend to check out the Oticon line as well, given the good things I have heard about this brand from you and others. I hope that this information is helpful to others who are exploring new hearing aids now or in the future.
I am glad to be listening to your podcast episodes again after your well-deserved summer break.” Thanks, Imke. I really appreciate you sharing your thoughts on that. My thoughts on the Widex Aids are now four years old, and that is a lifetime in hearing aid technology. For those who want to see my experiences, they are still up on my blog at mosen.org/nowhearthis2019, all joined together. Mosen.org/nowhearthis2019. One thing I would say that applied in 2019, and I’d like to think it has improved: the behaviour of the Widex Hearing Aids when they were in MFi mode, in other words, when they were paired with an iPhone, was far, far more erratic than what I have now with the Oticons.
The Oticons are pretty much rock solid. I can wear them all day with the iPhone and not have them cut out or do any weird things like that, and the latency is much better with the Oticon than with the Widex, at least at that time. A lot can happen in four years of course, so I’d like to hope that things have improved. I’d also like to hope that the app has improved. Actually, if anyone knows whether that is true or not, please let me know, because that was another of the showstoppers, besides the fact that I could not just directly run a cable from my hearing aids to a 3.5-millimeter jack, which is what I did with my previous Widexes.
It’s what I did with my previous Phonak Aids, and it’s what I do now with my Oticon Hearing Aids. Zero latency, that’s really important when I’m working here in the studio. I simply could not do this podcast if I had the Widex solution, so a pretty significant thing there. Despite all of that, a lot of the advanced features that were available in the Widex Aids, I couldn’t use them because they were dependent on making use of an app that had serious accessibility problems. I reached out. I made a huge effort to try and get in touch with Widex.
I produced videos for them. I explained what needed to be done to fix the problems, but they would not engage. Of course, we’ve talked about this problem with audiology companies in the past where you are dealing with audiologists, and a lot of these manufacturers will not talk directly to so-called patients. Maybe if they started calling hearing aid wearers hearing aid customers instead of patients, we might actually get some decent customer service from these people.
It is really hard to break through, and Widex were just not interested in engaging about their awfully inaccessible app. I used the Widex Senso all the way back in 1996. I’ve owned several Widex Aids over the years, but the way they evolved, for me at least four years ago, to be very poor compared with other manufacturers on the MFi standard, and with a very inaccessible app, it’s disappointing. If it’s got better, I’m delighted to hear that, and I’d like to hear more.
Male Speaker: Be the first to know what’s coming in the next episode of Mosen At Large. Opt in to the Mosen media list and receive a brief email on what’s coming so you can get your contribution in ahead of the show. You can stop receiving emails anytime. To join, send a blank email to media-subscribe@mosen.org. That’s media-subscribe@mosen.org. Stay in the know with Mosen At Large.
Our ElevenLabs feature begins with a tech roundup
Jonathan: We introduced a new segment last week, a tech roundup segment, and thanks for all the positive feedback on that. The voice that was used in that tech roundup was artificial intelligence from ElevenLabs. I promised you that we would be talking a lot more about that, and I’m fulfilling that promise in this episode. Founded only last year, ElevenLabs has in the last few weeks taken the blind community on Mastodon by storm. If you’re on Mastodon, you will have heard undoubtedly many examples of speech generated from ElevenLabs. They are a voice technology research company whose goal is to instantly convert spoken word audio between languages.
They also allow the creation of realistic-sounding voices for a variety of text-to-speech use cases. Right now, they’re well-backed by venture capital funding. Being text-to-speech, it’s something the blind community is very interested in. I’m going to be speaking with one of the co-founders of ElevenLabs soon, but before I do that, for those who aren’t on Mastodon, I’m going to give you some samples of ElevenLabs speech that’s been generated in various contexts. First, let’s do another tech roundup, but this time I’m not using one of ElevenLabs’ own voices, I’m using an artificially generated Bonnie Mosen, which I’ve created using samples of her voice.
[music]
AI Bonnie Mosen: This is your Mosen At Large Tech Roundup. A quick look at some interesting items making news this week. I’m Bonnie Mosen, or at least an AI version of Bonnie Mosen, thanks to ElevenLabs. LinkedIn is a popular social media channel for professionals to connect with one another, but it’s not been an efficient experience for VoiceOver users on iOS. Microsoft has changed that, with each post now taking just one swipe as opposed to the many it used to take. The move comes after years of lobbying for Microsoft to improve the LinkedIn experience for VoiceOver users.
While some accessibility optimization would be ideal, the experience is finally much improved. In a move that will be welcomed by the deafblind community, WhatsApp is working on a new feature that will transcribe audio messages into text. All transcripts will be processed on the device and won’t be uploaded to WhatsApp or Apple servers. The feature is only in the beta version of WhatsApp right now, and there’s no comment yet on whether or when it’ll come to Android. We’ve covered Chromebook devices on Mosen At Large previously. They’re accessible thanks to the built-in screen reader, ChromeVox, and for the first time, it looks like Chrome OS devices may be about to offer keyboard shortcut customization.
It’s an early beta right now, but if it happens, it could make Chromebooks more attractive to those who are used to a set of keyboard commands from other operating systems. Apple released new software for all their things in the last week. If you don’t have it yet, it’s a good idea to update as soon as you can. The updates for Apple’s computers, phones, and tablets all patch an actively exploited arbitrary code execution vulnerability in WebKit, the browser engine behind Safari, and a second kernel vulnerability that isn’t known to be actively exploited. Those updates also fix an issue that could cause iCloud to become unresponsive, and a Siri bug that was keeping it from working properly with the Find My feature.
There were some initial compatibility issues between iOS and some third-party apps, but these have been resolved for the most part thanks to software updates for the affected apps. Meanwhile, development never stops at Apple. iOS 16.4 has now begun its beta cycle. These are minor updates, and there is no expectation that there’s anything new on the accessibility front. It really is a final fond, or maybe not-so-fond, farewell to an old friend. Microsoft officially stopped supporting Internet Explorer some time ago, but it’s now pushed an update to Windows 10 which disables the browser completely.
RIP Internet Explorer, you served us well. We’ve covered 1Password on Mosen At Large before, and there have been ongoing concerns about 1Password 8, a significant update to the popular password management utility. 1Password has now released a major update to 1Password 8, which they say has over 100 changes and fixes. Mastodon traffic from the blind community suggests 1Password 8 for iOS is much more accessible now, and the company confirms they’ve been actively working to address accessibility issues. That’s a brief summary of this week’s tech news. For more, follow Mosen At Large on Mastodon, mosenatlarge@mstdn.social. That’s mosenatlarge@mstdn.social.
[music]
More ElevenLabs samples
Jonathan: I want to take you back to my first use of ElevenLabs. This was a few weeks ago now. I used their default voice, which is called Adam, and we used that for last week’s tech roundup. All I did was throw it a piece of text that I’d written for Mastodon. I was astounded at the inflection and the contextual awareness that it displayed when it read it back. Here’s what it came back with.
Adam: All this talk about the balloon reminds me of a wee story. We upgraded our TV about 10 years ago because the new one we bought had a screen reader in it. We sold our old TV on Trade Me, which is a bit like the NZ equivalent of eBay. This couple bought the TV and made a time to collect it. The day before, one of the kids had a birthday party with all the usual chaos. Plenty of people around and balloons, lots and lots of helium balloons. There was cleanup to do, so we mustered the balloons into the spare room. When the people came over to get the TV, we cheerfully said, oh sure, it’s in the spare room.
We opened the door to the spare room and the TV-collecting people were bombarded with a barrage of semi-floating balloons that were starting to go a bit flat. I couldn’t see the expression on their faces, but I could hear the skepticism in the guy’s voice when he said, “Um, this TV does work, right?” I think they had decided that after being on the receiving end of a balloon onslaught, we were a bit flaky.
Jonathan: That’s the Adam voice from ElevenLabs reading a toot that I sent to Mastodon. ElevenLabs has a feature called Instant Voice Cloning. The cleaner the samples you can give this feature, the better. Obviously, I’m pretty fortunate in that I can provide lots of clean samples of my voice, but there is a disclaimer with the current version of Instant Voices that says that it works best with American accents.
Nevertheless, I decided to feed it some samples of my voice. To begin with, I gave it quite a lengthy sample from a formal document that I was reading for work purposes. I was narrating a very formal document. After setting up this voice, I decided to read a portion of the Mushroom FM blog post announcing our move to Mastodon. Some people say this sounds a bit like me, I’m not so sure, but it certainly has changed my accent. Here’s what it came back with.
AI Jonathan: In case you’ve not heard of it yet. Mastodon is a social network made up of independently-operated servers or instances around the world. Some who have only heard a bit about Mastodon think that you can only talk to people on the same instance as you, making it too fragmented. While the full answer is complex, the basic answer is that that’s simply incorrect. I can follow you if you use a different Mastodon instance from me, just as I can email you if you use a different email provider from me. No one can buy Mastodon for billions and billions of dollars and destroy it recklessly, because the software is out there and open source for anyone to use.
If you’re worried about the complexity of learning something new, I promise you that while everything unfamiliar seems a bit odd at first, if you can use Twitter, you can definitely use Mastodon. There are differences but many similarities like your home timeline, and a place to go for mentions and other notifications. Mastodon is a perfect companion for Mushroom FM’s culture and reasons for existing. We call ourselves the fun guys, but we’re also the little guys, the really little guys. 24 hours a day, 7 days a week, we offer you something different from the massive radio conglomerates out there and the streaming services.
We are volunteers who love the music we play, and take pride in providing you with entertainment and companionship. Similarly, Mastodon volunteers take pride in doing social media better, wresting it away from the corporations who exploit our data for commercial purposes, and amplify those posts that will cause division, and therefore more engagement. In short, Mushroom FM and Mastodon are a perfect match. How do I get on Mastodon? I heard it’s complicated. Mate, it’s as easy as falling off a log. If you are not on Mastodon already, here is a set of simple instructions.
Note, there are many, many other ways of getting on Mastodon, and others may have different recommendations, but we want to keep this simple. One, sign up for a Mastodon account. You can find a well-run Mastodon instance where many blind people are hanging out at tweesecake.social. The signup process is simple and accessible. Two, if you want to use Mastodon via the web, you can use the tweesecake.social website. You can find an even simpler web interface at Pinafore Social. Just log in with your credentials from TweeseCake Social. Three, if you’d like to use an accessible app on your PC or Mac, download TweeseCake, which has been designed for and by blind people.
Jonathan: I’ll cut it there. Yes, this is the real me speaking again. That is an AI version of me that ElevenLabs came up with, and as we will hear when we speak with Mati from ElevenLabs soon, they do hope to increase their range of accents. Now, I’ve been playing with this quite a bit, putting all kinds of voices through this to see what we can do, and because it works so well with American accents, as you heard, it does a pretty reliable job of duplicating Bonnie’s voice. With a little bit of technical ingenuity, you can have a lot of fun with this. Here is a wee parody I put together comprising an AI version of me and an AI version of Bonnie.
AI Jonathan: Hi, everybody. Welcome, welcome, and thrice welcome. I’m the AI version of Jonathan Mosen, and it’s my great pleasure to welcome you to this inaugural edition of the Mosen machine.
AI Bonnie: It’s a great big howdy from me y’all. Jonathan’s not the only artificial Mosen on this show. This is the official artificial Bonnie Mosen speaking live. Not actually speaking live, I suppose.
AI Jonathan: See, you sound amazing, Bonnie. You sound absolutely souping damn amazing. What about me, ey? What about me, I ask you? They changed my accent. They practically sent me on the treadmill and made me change my name.
AI Bonnie: Oh, you can be such a drama queen, Jonathan. You still sound like me to me.
AI Jonathan: Yes, but I don’t sound like me to me, and that’s what should count. Especially when you sound like you to me. Do you sound like you to you?
AI Bonnie: Hang on, I need time to process all that. I may be a powerful computer and all, but that’s complicated. Yes, I suppose I do sound like me to me. Anyway, I have a really, really great idea that you’re going to love.
AI Jonathan: Oh, yes?
AI Bonnie: Yes. We are much more environmentally-friendly than our human equivalents.
AI Jonathan: How do you mean?
AI Bonnie: Think about it. We take no space. We’re just bits in the cloud. Our emissions are low, and I mean, listen to those guys. Listen to them. Green economy, blah, blah, blah. Build back better. Blah, blah, blah.
AI Jonathan: I see what you mean. What do you suggest?
AI Bonnie: We should put the human Jonathan and Bonnie through the same ceremony Jonathan’s put lots of others through.
AI Jonathan: Great idea. Jonathan and Bonnie. You are exploded.
AI Bonnie: Jonathan and Bonnie. You are exploded.
[explosion]
Jonathan: Oh, dear, you can have far too much fun with ElevenLabs and a little bit of Reaper work. Thabo from Botswana has been in touch. My mention of ElevenLabs in last week’s show and the sample that we did with the tech roundup prompted him to go to the site and have a bit of a play himself. Here’s what he came up with. Three short samples using ElevenLabs’ own voices.
AI 1: It’s a joyous day. Yes, a historic and a wonderful moment for all of us gathered here. When I was asked to welcome you, I couldn’t agree any more. You are such a lovely group of young happy faces that I want to stay around forever. Now what’s in store for you? Does anybody know? No. Listen to me, I am talking. You keep quiet when I am talking, okay? Do you hear what I am saying, George? You have been giving me empty promises for so long. When am I going to have my work done? When?
AI 2: I am telling you Megan. John is so, so, so handsome. We met at a store and he gave me this huge smile, my friend. I couldn’t resist it. I smiled back. I even dropped a can of beans from the shelf, can you imagine? He was gentle. Completely different from the rude John we knew in high school. I even got to meet his wife and daughter. They seemed happy, and were so nice to me. Girl, it was really cute.
Jonathan: Thanks for sending those in, Thabo. What that demonstrates is how contextually aware this technology is. It’s looking at the full text and it’s making inflection decisions based on the text that it is seeing. This is very early technology, and yet it is showing a lot of promise.
Interview with Mati Staniszewski from ElevenLabs
To talk about that technology, where it’s heading, and yes, some of the ethical issues that this raises, I’m joined by Mati Staniszewski. He’s one of the co-founders of ElevenLabs. Mati, it’s great to have you on Mosen At Large. Thank you for coming on.
Mati: Hi, Jonathan. Thanks a lot for having me.
Jonathan: How did you come up with this idea of ElevenLabs?
Mati: It started with a combination of two factors. I’m originally from Poland, as is my co-founder, Piotr, and we grew up together all the way from high school. There were two things that coincided. The first one was we’d often engage in different hack weekend projects where we would work on something for fun. They stretched everywhere from recommendation algorithms to one of the ideas that stuck, which was analyzing speech and providing us a better ability to pronounce things with the emotions we use.
We really liked this. It introduced us to the broad voice AI space. Then the second thing happened, which is, looking back, an obvious thing, but for anybody who isn’t from Poland: the way we watch or listen to foreign content frequently is with these single voiceovers. Imagine having a Shawshank Redemption movie or a Mission Impossible, and instead of hearing dialogue with many people, in Poland you have this single narrator reading over the original dialogue in Polish. In general, for a majority of movies, it’s a terrible experience. My co-founder’s girlfriend doesn’t speak English, so they couldn’t listen to that in the original language.
They needed to watch a version dubbed in Polish, which brought those two things together, what we knew in voice AI, and how big of a problem dubbing is, into this first idea of: what if we can fix it? What if we can make AI dubbing a thing, where all audio out there can be provided at the click of a button, but preserving the original emotion and the original intonation in voices without losing the quality. Of course, the same applies to so many other countries out there, and so many other domains from education to entertainment, and I embarked on that mission with Piotr to hopefully change that.
Jonathan: That’s interesting, then, because when I look at the current site, obviously you’re doing quite different things for now, really taking text and allowing you to process that text with a variety of voices, using a couple of methodologies that I want to talk about. Has the use case broadened a little bit since you originally envisaged the company, or do you think that, in time, your concentration will return to that original translation goal?
Mati: We would love to reach that original goal, but you are right, the sequencing definitely changed. For AI dubbing, we need to solve all the research components as we go along. Of course, within all of these domains, there is a set of value we can add and problems we can fix. That’s what we realised the further we deep-dived into the space, both in terms of the use cases we can hopefully help with, but also in terms of the technological advancements we need to make. Just to give you a few examples, if you are to do dubbing, you actually need to fix text-to-speech as well.
You also need to be able to create, design, or clone voices. Then you need to somehow preserve that audio between languages. Then as we thought about that use case, we just realised how many audio things are out there that are just not so good. The current text-to-speech is of such poor quality that we couldn’t do anything else but try to start our own approach from scratch, which brought us to the current use case on the site, and the current focus for the next months and years of trying to bring textual content into something that sounds hopefully great, and provides that additional medium that’s currently not available.
Jonathan: Broadly speaking, you seem to have two kinds of voices right now. You’ve got the instant voice cloning, and then there’s professional voice cloning. Now, at the risk of sort of pulling the curtain back like the Wizard of Oz or something, the instant voice cloning, it sounds like it is analyzing the voice samples that I upload, and then matching them as closely as possible against a library of voices that you have, is that correct?
Mati: That’s broadly similar to the usual attempts, yes. What essentially happens when you upload a voice is that we compute a vector based on your voice sample; we call it the speaker embedding. There’s no training that actually happens. It broadly takes all the voices that we have in the data set, and tries to construct something that sounds similar, based on what the model has heard already in the past. Like you said, it tries to mimic that pretty closely. Then on some voices it sounds definitely better, and on some it doesn’t do as well, but then of course there’s that professional voice clone that’s also possible.
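For technically minded listeners, here is a minimal sketch in Python of the general speaker-embedding idea Mati describes: represent each voice as a vector, then compare vectors for similarity. This is an illustration only, not ElevenLabs’ actual pipeline; the 256-dimensional embeddings, the voice names, and the matching rule are all hypothetical stand-ins.

```python
# Minimal sketch of speaker-embedding matching, the general technique described
# above. NOT ElevenLabs' actual pipeline; the 256-dimensional embeddings and
# the voice library here are hypothetical stand-ins.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Pretend library: voice name -> embedding produced by some speaker encoder.
rng = np.random.default_rng(11)
voice_library = {name: rng.standard_normal(256) for name in ("Adam", "Bella", "Josh")}

def closest_voice(sample_embedding: np.ndarray) -> str:
    """Return the library voice whose embedding best matches the uploaded sample."""
    return max(voice_library,
               key=lambda name: cosine_similarity(voice_library[name], sample_embedding))

uploaded = rng.standard_normal(256)  # stands in for the embedding of your sample
print(closest_voice(uploaded))
```

In a real system, the embedding would come from a neural speaker encoder, and rather than simply picking the single closest library voice, the synthesis model would be conditioned on the embedding itself, which is why the output can resemble a voice the model has never literally stored.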
Jonathan: The instant voice cloning is likely to get even more accurate, I imagine, as you improve your library of voices that you have online that have been professionally cloned. Would that be right?
Mati: I think that alone likely won’t be enough, but it could be a contributing factor, of course, if we get the permission to use those. What it would come down to is really having access to even larger speech and text data sets that we can train on. For example, one of the things that we’ve come across is that we have pretty good coverage of US accents, but if you shift to some of the other regions, both in terms of how the model works, that can be improved, and also in terms of the voice coverage, that could get better as well.
Definitely, with increasing access to data in general, whether it’s from our users who opt in and provide that permission, or through being able to draw on an increasing set of audiobooks or podcasts, or movies and games that give their data for training, that keeps stretching and making it better.
Jonathan: My wife is American, and I put her voice through this, with her permission, I hasten to add, and after messing a little with the inflection and various options that you have, it really does sound like her. When I put my own voice through there, it sounds something like me, but it completely loses my New Zealand accent right now.
Mati: [chuckles] I presume it makes you sound a little bit more American, right?
Jonathan: I sound a little American and quite a bit British in the samples.
Mati: All right. That’s somewhat expected on some voices. It speaks to what you said earlier: if it’s roughly similar to something we’ve seen in the past, we can generate definitely a better representation. We have a new model update that’s coming hopefully later this month, which should make this better for other accents, where we’ll allow it to explore that space a little bit more in terms of finding a good match, in a way.
Jonathan: Anyone can play with the instant voice cloning. Do you have any tips to share about how to get the best experience? What sorts of samples are best to upload, and how much data is necessary? Because I know you can start producing a voice with as little as a minute of audio, does it actually materially make a difference if, for example, you provide a variety of samples, and you might put 10, 20, 30 minutes in there?
Mati: I would definitely recommend the longest amount that’s very clean. It really works best on clean audio without background sounds, without background noise, of a single person, with nothing overlapping with other people; then, the more of that, the more it will definitely help. Although with instant voice cloning, I think it flattens out after a few minutes, so it doesn’t give much additional advantage, say, if it’s more than five minutes. On the other hand, if you have the ability to upload a minute or 30 seconds of different quality, I highly recommend 30 seconds of high quality over a minute of average quality.
The second thing that I would say is that the way you want the voice to be represented will also depend on the sample you provide. For example, if you give 30 seconds of you narrating an article, it will try to reproduce that style of a person narrating articles. It’s a good representation if you want to read voiceovers for textual content. On the other hand, if it’s a dialogue setting, and you want to replicate your voice in a dialogue setting, then I would recommend 30 seconds of you in a more emotional, conversational setting.
The third one is what you mentioned, which is really playing with those settings. The lower you make the stability, the more we essentially allow the speech to vary over time. It will definitely sound more expressive, to some extent. Then we have the second parameter, called similarity and clarity, which will essentially boost the output closer to the sample someone provided. If it’s a clean sample of good quality, then I recommend shifting that parameter a little bit higher, and that can affect the quality.
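For those who would rather drive these stability and similarity settings from code than from the website, here is a minimal sketch against the ElevenLabs v1 text-to-speech REST endpoint as it was publicly documented around the time of this episode. The API key and voice ID are placeholders, and the endpoint and field names should be verified against the current documentation before relying on them.

```python
# Minimal sketch: driving the stability and similarity settings described above
# via the ElevenLabs v1 REST API as publicly documented at the time of this
# episode. YOUR_API_KEY and YOUR_VOICE_ID are placeholders; verify endpoint and
# field names against the current documentation before relying on this.
import requests

API_KEY = "YOUR_API_KEY"
VOICE_ID = "YOUR_VOICE_ID"  # e.g. an instant clone created on the website

def synthesize(text: str, stability: float, similarity_boost: float) -> bytes:
    """Return MP3 audio for `text`, rendered with the given voice settings."""
    response = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
        json={
            "text": text,
            # Lower stability = more expressive variation over time;
            # higher similarity_boost = closer to the uploaded sample.
            "voice_settings": {"stability": stability, "similarity_boost": similarity_boost},
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.content

# An expressive, conversational rendering versus a steadier narration style.
with open("expressive.mp3", "wb") as f:
    f.write(synthesize("Mate, it's as easy as falling off a log!", 0.25, 0.9))
with open("narration.mp3", "wb") as f:
    f.write(synthesize("Mate, it's as easy as falling off a log.", 0.75, 0.9))
```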
Jonathan: That definitely mirrors my own experience. The first time I used this was in my day job. I’m a chief executive, but because I have a studio at home, sometimes when we put out material that has to go out in audio for some of our customers, I read it here. We had quite a formal document, and I uploaded quite a lengthy sample of me reading very formally, almost newsreader style, because that was required in that example, and so I got that back. I got a very official, narrator kind of voice back. I also do an internet radio show where I’m much more frivolous and expressive, and so when I created a different voice with samples from that, it was way different. It was much more expressive, for example, more chatty.
Mati: That’s great. Exactly. As we operate in a closed beta, we see some of our other users doing exactly the same. It’s actually giving you the ability to provide a few different ways of speaking, which opens up very different use cases. In the future, what we would love to do, hopefully soon, is that you could provide even one or two samples, but then in the output actually modify the style of the speech. You could imagine, Jonathan, providing a narrative style, but then saying, “Okay, I want this a little bit narrative, but less emotional, or more emotional.” That’s also on the radar: to not even need another sample, but just upload one and keep that similarity, but of course, it’s a tricky problem.
Jonathan: I’m a busy person, and I liked the idea that I could give this AI a lot of text and have it read the whole document. I don’t particularly care that it might lose my accent, it might not sound exactly like me today, but it’s obviously much quicker for me to give it a long document than sit here, fixing my mistakes, and editing errors and things like that. It’s a real time-saver. That 5,000-character limit is a bit of a killer. Is there any way around that at the moment, or might there be a way around the 5,000-character limit at some point in the future?
Mati: No, you’re exactly right. It is pretty hard with the current platform to voice long, long fragments of text. We are actively working, for release in March, on a dedicated platform where you could start with an arbitrary piece of content. Initially, this will support articles, so you can just provide the URL of the article or copy-paste a long article of any length, all the way to books. You could imagine uploading an ebook format of a book or a PDF. Then what we do on our side is essentially break it down into chapters or fragments, and then a user like yourself will have the ability to either generate the whole thing or regenerate specific pieces.
That’s the biggest limitation, I think, in the current platform: if you were to provide another few thousand characters, then you can only generate the whole thing. Then if you want to regenerate, you also need to regenerate the whole thing, and we know that. We want to provide an option where you could drop an article, generate the whole thing, but then you realize, “Okay, I want to make an edit in my second paragraph because it mispronounced the name and the name is very important.” In that case, we essentially will provide you a way where you can say, “Okay, let’s just regenerate this specific piece, but keep everything else as is.” Then we will drop that limit of characters.
We are currently working with a set of indie book authors, and they would love to provide their books, both for accessibility and for showing their work. They frequently can’t, just because it’s not affordable to do so with any traditional method. They are bumping across the limitation that you just mentioned, which is that almost all current platforms not only don’t give you good quality, but they also don’t provide any support for longer forms. Hopefully, that platform in March will change that and make that a lot easier.
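Until that long-form platform ships, the common workaround for the 5,000-character limit is to chunk a document on the client side. Here is a minimal sketch that splits text at sentence boundaries so each request stays under the limit, reusing the hypothetical synthesize helper from the earlier sketch; the chunking rule is our own assumption, not an official ElevenLabs recipe.

```python
# Minimal client-side workaround for the 5,000-character limit: split a long
# document at sentence boundaries and synthesize each chunk separately. The
# splitting rule is our own assumption, not an official ElevenLabs recipe;
# `synthesize` is the hypothetical helper sketched earlier.
import re

CHAR_LIMIT = 5000

def chunk_text(document: str, limit: int = CHAR_LIMIT) -> list[str]:
    """Greedily pack whole sentences into chunks no longer than `limit`.

    A single sentence longer than `limit` would still exceed it; a real tool
    would need a fallback split for that edge case.
    """
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > limit:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

def synthesize_document(document: str) -> bytes:
    """Concatenate the MP3 fragments for every chunk of the document."""
    return b"".join(synthesize(chunk, 0.5, 0.85) for chunk in chunk_text(document))
```

Naively joining the returned MP3 fragments mostly works because MP3 frames are self-contained, but a proper audio tool gives cleaner joins at chunk boundaries.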
Jonathan: Yes, you did also touch on the fact that you’ve got a combination that’s not ideal. You’ve got a very discerning audience in the form of the blind community, who have been used to working with text-to-speech for a lifetime, and so they like things to sound just right, as good as they can be. There are also, because of the socio-demographic nature of the blind community, people who perhaps can’t afford a major plan. Sometimes we’ve seen blind people submitting something, and then thinking, that’s not quite how I want it, I want to keep playing with those sliders a little more; but every time you submit your content, you are decrementing your character count, even though you haven’t yet got something that you can really use or that you feel comfortable with using.
Mati: That’s entirely true. There’s the first piece, which is how can we support much longer form and make it easy to generate and regenerate while keeping the majority of it in place. Then there is the second piece, which is that frequently we’ll start with the same text. Then you might want to change some of the settings but keep the text the same. We are still working on our side on the best way to provide people with the ability to not lose quota but be able to regenerate, while on our side not losing too much compute, because, of course, every single generation with our model size is relatively expensive.
We are trying to find a middle ground of how we could do that, so that people can regenerate for free or just based on some short feedback, but not lose quota. There are a few ideas, and from any of the listeners of this podcast we would love ideas as well, as we are still actively thinking about this. The current ones are, one, to provide that ability just at a lower priority. Essentially, that request will just take longer if you want to regenerate the same thing, but you won’t lose quota.
The second one is just by providing us feedback: “Okay, this was not the way someone wanted it, but what’s the reason?” Obviously, we want to avoid abuse where people, for example, send requests multiple times without a real reason. Hopefully, that has a flip side, where if regenerating a few times actually gives you better quality, then in a way those characters are used for a better outcome in the end, instead of just in vain.
Jonathan: Punctuation marks really seem to make a difference. I’ve noticed something is expressed quite differently if you use one, or even several, exclamation marks. Do you have any hints for us in terms of how to best use punctuation marks to get the kind of outcome that you want?
Mati: You’re exactly right about the punctuation. It’s extremely useful in shaping how the text will be pronounced and intonated. Our model is trained predominantly on content coming from books and audiobooks. There, punctuation serves an important role for voice actors, letting them know how something should be read. Roughly, as you type text, the closer it follows the structure of a classic book, the more you will see that come across. Just the usual tips: commas and full stops are very useful for the breakpoints in a sentence; you will definitely hear a longer pause around them.
Three dots are great for long pauses, so for example, if you have two paragraphs or multiple paragraphs, I recommend distinguishing them with those three dots if you want a pause. Then a lot of the emotional content or emotional weight is given by exclamation marks and question marks, and even more broadly, stretching outside the classic punctuation, the context of the text will highly influence the way a specific thing is spoken.
To give you examples, if you type an angry text, with exclamation marks or even without them, that will definitely show in the emotional output that gets generated. Similarly for a happy text. Then if you swap that to a dialogue, which is so present in books, you’ll also see that carried across. For example, if you type the sentence, ‘“I’m very angry,” he said,’ that will, one, create the effect where the voice changes between the character saying that thing and the narrator reporting it, but then, a step further, if you write, ‘“I’m very angry,” he or she said angrily,’ that will influence it by a further notch.
Those contextual cues about how something should be said will also determine the output. Both the punctuation and the context around the text that carries information about the emotion will have a huge effect on the speech. Then hopefully over time, we’ll add some additional ways of manipulating that even further as well.
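For listeners who want to experiment with this from a script rather than the website, here is a minimal Python sketch of the idea. It assumes ElevenLabs’ v1 text-to-speech HTTP endpoint and an API key from your account; the voice ID, file names, and sample texts are placeholders for illustration, and the exact request parameters may differ from the current API documentation.

import requests

API_KEY = "your-xi-api-key"   # placeholder: found in your ElevenLabs profile
VOICE_ID = "your-voice-id"    # placeholder: any voice from your voice lab

# The same words, punctuated differently, to hear the effect described above:
# commas and full stops create pauses, three dots a longer pause, and
# exclamation marks plus narrative cues add emotional weight.
samples = {
    "flat.mp3": "I cannot believe you did that. It is fine.",
    "emphatic.mp3": '"I cannot believe you did that!" he said angrily... It is fine.',
}

for filename, text in samples.items():
    response = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": API_KEY},
        json={"text": text},
    )
    response.raise_for_status()
    with open(filename, "wb") as f:
        f.write(response.content)  # the response body is the MP3 audio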
Jonathan: It’s remarkably contextually aware, which makes this thing just so impressive. I must say, I did not resort to AI to think about the questions for this discussion. I didn’t go to ChatGPT or anything like that, but I did go to the Mosen At Large community on Mastodon, because there’s been so much activity relating to ElevenLabs on Mastodon in the last couple of weeks. I asked some of our listeners what they would like me to ask you. One of the questions that came up really takes us back to something you said earlier about curating information from various sources. One of our listeners wanted to know whether you thought there might be a chance of a podcast-style feed, I guess it would be XML, a kind of RSS feed, where you could send articles to the ElevenLabs service and then download them as a podcast, essentially.
Mati: Great idea. We think that that project in March will serve as a great intermediate step in that direction, where you can essentially drop in the article, and as in the podcast example, you’ll frequently have multiple people speaking in that context. Part of that project is the ability to split the paragraphs to be spoken by different people. You can imagine exactly what you said: you drop in a dialogue of two people, and you just tag it with the voices that are there. As the first step, it’s just going to be downloadable for people to release on any platform, but over time we would love to build integrations where you just click a button and it’s exported directly to Spotify or any podcasting platform out there.
Yes, definitely, it’s a great idea. And not only for podcasts. In any setting, what we imagine, going even a step further than what is currently possible, mainly because of time constraints or just logistic constraints in many places, is that you could have books read with multiple voices. You could have characters actually read their pieces in their own voice. That platform approach, or what we call projects, will allow exactly that, where you can select those voices the way you want, whether it’s a podcast, a book with multiple characters, or just a piece of dialogue for a movie or an advert. All of those will be possible on the production side.
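An aside for the technically curious: because RSS is an open standard, the podcast-feed idea discussed above needs nothing proprietary on the listener’s side. Here is a rough, hypothetical Python sketch of wrapping already-generated MP3 files in a minimal RSS 2.0 feed with enclosure tags, which most podcast apps can subscribe to; the titles and URLs are invented for the example.

import xml.etree.ElementTree as ET

# Hypothetical articles already converted to MP3 and hosted somewhere.
episodes = [
    ("Morning news roundup", "https://example.com/audio/news1.mp3"),
    ("Feature article", "https://example.com/audio/feature.mp3"),
]

rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "My synthesized reading list"
ET.SubElement(channel, "link").text = "https://example.com"
ET.SubElement(channel, "description").text = "Articles read aloud by a cloned voice"

for title, url in episodes:
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = title
    # The enclosure element is what makes a feed a podcast to most apps.
    ET.SubElement(item, "enclosure", url=url, type="audio/mpeg", length="0")

ET.ElementTree(rss).write("feed.xml", encoding="utf-8", xml_declaration=True)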
Jonathan: Even shorter term, if I’m browsing through a series of articles, for example, on the web or in my news app, whatever I get my news from, and I think, I want to read this, I could use the iOS share sheet and say, share with ElevenLabs in some way, and then consume that content when my hands are not free. I’m thinking, for example, I use a sauna, and I might want to, while I’m in the sauna, which has built-in Bluetooth speakers, listen to those articles I’ve selected, one after another, read by maybe even my voice or someone else’s voice through ElevenLabs. Even before we get to the multiple-voice thing, it could be as simple as that, just having your own curated newsfeed that some professional-sounding narrator voice is reading back.
Mati: 100%. I think it’s a great idea. There is a challenge here, which is that you need to create a dedicated app that works seamlessly, where you drop in an article and whatever format it’s in doesn’t influence too much how it’s read, because, for example, you strip it down to just the right text pieces. The short answer is, I cannot agree more, and we are very actively thinking about how to do that. Hopefully, in the next few weeks, we will get a prototype out which would allow you to get that on a mobile device, where you can at least read certain pieces.
The only question is how to make it affordable for everybody as well, because of course at the moment our model is relatively expensive, and that’s why we are starting with the production side of things, allowing the people who create and distribute content to distribute it with audio in as simple a way as possible. For them, that cost frequently isn’t as much of an issue, but for a listener, that cost can pile up pretty quickly.
We are trying to figure out a way which could, at least as a first step, allow you to listen in audio to the most popular content that frequently doesn’t have that option, where the costs are just so much lower. Then over time, as we hopefully figure out how to make our model a little more efficient or smaller, we can stretch that to any type of content, so people can freely upload anything and listen to it on the fly.
Jonathan: Then I guess for content creators, there’d be the desire to have something on your website, a built-in player, so that every time you, say, published a blog post on WordPress, there would just automatically be a “listen to this content” button, and the user could play that and hear the audio that had been generated.
Mati: Exactly. This is something we already see a lot of people doing with the current platform. They will go through those few quick steps of taking the article, chunking it a few times, putting it into the software, and then uploading that in their blog posts. Hopefully, we can make those steps a little simpler, which, again, as part of that projects release we are planning, is going to allow you to voice that more simply, and then soon after, that integration piece will come into play.
Jonathan: We talked about instant voice cloning, and then there’s the professional voice cloning, which has some pretty lofty goals, making the voice sound indistinguishable from the original. That sounds like a pretty intense process in terms of creating those voices. Can that be done from someone’s home?
Mati: I think it can, especially with a good microphone. With a poor microphone, it can be challenging, but there are some ideas there as well. In general, professional voice cloning, like you said, provides a way to create an almost identical copy of the voice. What we need is between 30 minutes and 3 hours of audio in clean conditions, and, very similar to how instant voice cloning works, in the same style that you want delivered: no background noise, no overlap. Then what we do on our side is take that content and fine-tune or train the model for that voice, and then serve that to the user.
For example, if you wanted to clone your voice, we would need those 30 minutes to three hours at least, and it’ll take us an overnight process to run it through. Then after a day, and that’s the intention for all the users out there, it will be available to you as your own dedicated model with that voice. Now, of course, that’s even more costly compute-wise and time-wise, given we need to create that model separately. We have a few ideas for how we could do it slightly better, for example, having a few voices train at the same time. Of course, each voice would just be available to one user, only the voice that they decided to train themselves.
Jonathan: You make the comment in the FAQ that for people who want to use this, who have a disability or medical-related need, you might be able to negotiate something there. Are you able to talk any more about that and what you have in mind?
Mati: Yes, we hope to open it up, and the question is how to coordinate that process more than how to make it possible. We would love to provide it to anybody who has an accessibility need, or who isn’t using it for a clear commercial case, at very low cost or completely for free, because at the end of the day, this hopefully provides additional value and benefit, and there’s no intention to earn money on it if it genuinely helps. We are still working on the program. The challenging piece is figuring out how to structure it organizationally, how to run the process, how to get people on board, how to prevent people from abusing it to some extent, and how to verify that accurately. As soon as we figure that out, we would love to open it up to a broader set of people who need it, and offer it at cost so they have the ability to have their voice available on the platform at that quality.
That also stretches to the other side of the equation. There are a number of cases where people have lost, or are losing, their voice, and they are reaching out to us asking whether those voices could be recreated. Of course, in those cases we would love to give them their voice back and allow them to finally– I’m sure it’s not going to be as good as the real thing, but we hope to at least get close and make it similar.
Jonathan: It’s a wonderful use case, this. My wife listens to an American radio station, and a few years ago she said to me, “You listen to this news bulletin, because it sounds like this individual is a text-to-speech engine,” [laughs] and I was quite sceptical. I looked him up, and it turns out there’s a journalist called Jamie Dupree who lost his voice completely and can’t speak anymore. Another company cloned his voice, and he was able to write his news pieces and have his text-to-speech version read them. It’s fantastic that he was able to keep doing his job as a radio journalist because of technologies like this.
Mati: Exactly. It’s incredible that it can bring that back. That even stretches to so many cases of listening back to close family or loved ones who might have lost their voice or passed away. Stretching that use case, or a variation of it, further: a few companies, one specific one in the UK, have reached out. They record podcasts with people, then transcribe those podcasts into a book and release that after a while. A number of family members have then come back about giving that book the voice of the person after they passed away, to create an even more personal and close experience. Whether you lose your own voice or the voice of a loved one, it’s exactly the same.
I personally have exactly the same experience. My granddad wrote a piece about his time in Siberia and sadly passed away last year, but now there’s a brewing idea that maybe we can actually recreate that book in audiobook format and, in some form, bring him back closer to the family. I definitely feel there’s so much potential here with AI technology to help recover some of that history, voice, or personality, and bring it closer to people.
Jonathan: I’ll come back to that when we talk about some of the ethical questions around this technology, because this is a very interesting question, but I wanted to raise another use case, which is that a lot of blind people go blind later in life, or they haven’t had the access to Braille as a child that ideally they should have had. Sometimes where this really hits home is when they become parents and they want to be able to read their kids a bedtime story at night, but they don’t have the fluency in Braille, if they have Braille at all, to be able to pull that off in a way that the child finds pleasurable. If they could take the text of a bedtime story with their professionally cloned voice and have a series of stories that they could play to their children, then that’s at least something. Is that a use case you would entertain?
Mati: 100%, yes. It’s a great one. Like you said, it can definitely help generate that at higher speed, hopefully with a voice that’s similar. Maybe there’s even some synergy there: when the synthesized speech doesn’t say something well, you can quickly correct it with the voice of someone the child is familiar with and has listened to for a number of stories. We would love to support use cases like this.
Jonathan: The biggest question, the one I got asked most to put to you, was about support for open standards for these voices, so that screen readers might be able to use them, with SAPI, or perhaps as a voice on their iPhone, or similar. My initial thought about this is that the way this thing is so contextually aware, it really needs quite long passages of text to do what it’s doing. Also, there are latency issues, I would imagine, in making a screen reader responsive with these voices. Do you think it might ever be possible for screen readers in some way to work with these voices, to have an offline version of a voice that you create so that you can throw any text you want at it?
Mati: In the long term, yes, but I think it’s likely not in this half of the year, and maybe in some version in the second half of this year. Like you said, it is a challenging problem to make the model work offline. First of all, it’s a huge model on our side. The second concern is that we need to figure out how to provide some of the technicalities and the research that we developed in an open way, while not letting others use our technology to their advantage, so there are two concerns.
The first one is definitely the big one, which is how to package the model in a smaller version. Ultimately, that’s where we believe the world is headed, where you could interact with and use those models on demand as a listener, anywhere you are, at low cost and high speed, without needing to call an online version. As an intermediate step, we hope to deploy an extension, at least at the browser level, where you can listen to anything on the web with a voice that sounds a lot better than current extensions do, and then slowly move into that space you mentioned, a true screen reader working offline. The only question is the timeline. I’m positive this will happen; I just don’t know when.
Jonathan: As an intermediate step, there are technologies, there’s an app for example on iOS called Voice Dream Reader, which is not a screen reader, but it allows you to load a book into the app and have quite a variety of text-to-speech voices read it back. It seems to me that might be a first step, because that’s a sandboxed environment, and it’s also dedicated to reading books, so the latency issue isn’t such a consideration.
Mati: That’s the partner side. We are actually speaking with a few that might integrate us in their end solution, and if that doesn’t happen, then we would likely implement our own solution, where you could have several books available to the general public at low cost to listen to. We are iterating with two different companies at this very moment that provide that kind of solution on iOS, Android, and a few other places, to help potentially integrate our technology on their end, so that short-term step is there and is ready to be used.
Jonathan: For now. How secure is the information that I upload to the service? For example, if I upload something that is commercially sensitive because I want that read to me in a particular way or maybe it’s highly personal in nature, what guarantee, if any, do I have of the security or privacy of that information?
Mati: That information is all yours. We haven’t had any situation where that information was for any reason hacked into, and that will remain the case, so it’s all yours. On our side, the only cases where we would look into it are if, for example, you published something that violated the law and it was reported to us. Then we would likely look into the logs of what might have happened. Even that won’t reveal a username to us; it’ll just let us check whether the logs suggest who generated that content and from where, and then we can ban the appropriate account in those cases.
Jonathan: This technology can obviously be used for both good and ill. As it becomes more indistinguishable from a real human, people are worried about deep fakes and abuse. Does that concern you, and how do you mitigate that risk without compromising someone’s privacy? There must be quite a delicate balance you’re trying to strike here.
Mati: A very delicate balance, and we are still heavily iterating with a set of our advisors and the team on how to strike it. What we imagine, and hopefully our solution will bring something that others follow, is this: in many cases you can already generate a lot of content, especially deep fake content, with open source tools. There are models all the way from TorToiSe to other platforms that I don’t want to name, and all those platforms will let you do exactly the same work, but frequently they have no ability to trace back whether a specific element was AI-generated or not.
On our side, what we would love to do is provide the ability for any audio created by our software to be verified as AI-generated or not, preventing that misinformation. Maybe in the future that will take more the form of a standard to follow, where any content that’s AI-generated needs to be explicitly tagged, and if it’s not, it’s something that can be checked. As we build our technology, that’s one thing we keep in mind: that the information out there can be verified as to whether it came from us or not, which is a challenge with other tools out there, open source tools and tools that might not have that in place.
Of course, we are still trying to figure out the additional limits we should be putting in place on how the technology is used. Luckily, in over 99% of cases we see people creating incredible things, from voiceovers of their articles, to creative outlets on YouTube creating entirely new works, all the way through podcasts and fair-use cases with derivatives of games. There’s a plethora of things we hadn’t even expected, to a large extent. I’m hoping that will be the driving force for a lot of the consideration next, but of course we’re trying to strike the balance of preventing any misuse while not limiting the former group too much.
Jonathan: I too have heard, in the last couple of weeks, prominent political figures being used, I guess you might argue, in a satirical context. They’re clearly saying things that they would never really say. Is that a legitimate use, though? Is satire a legitimate use of this technology, when clearly somebody is being made to say something that they would never actually say?
Mati: It’s a hard question if it’s not a clear-cut case. As an example, impersonation that incentivizes something harmful is a clear case where we know it’s violating our terms of use and harming an individual, and where we can ban. If it’s something that goes into that satire category, then that’s a big piece of free speech and fair use. In this case, we would rely on the authorities to take action and give direction on how a specific piece should be classified.
Jonathan: I think you mentioned watermarking in your FAQ, so I take it it is possible, in some way, if something gets into legally sticky territory, to determine that the file did come from ElevenLabs in the first place.
Mati: Exactly.
Jonathan: Earlier, you specifically mentioned family members who have died. This is a really interesting question for me. It really comes back to whether anyone has the right to put words in the mouth of someone who isn’t around to say whether they approve. It’s actually a slightly different context that you painted, because you’ve got a written document from your grandfather. I’m sure being able to hear that in his words, and they are his words, with him speaking them, would be just absolutely incredible. Then you have different use cases, where somebody has samples of someone who has died and they’re using them for satirical or parody or comfort purposes. They want to hear that person, that friend, that loved one say something to them that we can’t be absolutely sure they would ever have said. That has got ethical challenges all over the place, hasn’t it?
Mati: It’s true. It’s a very hard question. I’m sure there is a lot of discussion beyond our conversation that will be happening here. In that book example, hopefully, the family members can give permission for the specific use case it’s used for. Even then, like you say, there’s a plethora of ethical and moral questions to be answered. Is it the right direction for the technology? Maybe there is a future where, at some stage in life, we give that permission before it happens, and that could be used as proof of which direction should be taken.
I’m sure there will be a lot more to follow in that space, especially as AI can now generate everything, starting from speech; the same thing will apply to text, where frequently people will generate text in the style of a specific person. This can also be somewhat confusing, and it’s hard to know whether that person would ever type that or ever do anything like it. Voice is definitely more personal, I think. I’m sure this will happen across so many other elements as well.
Jonathan: It is a good idea to make a will. I suppose one way around this would be to say that when you die, the right to license your voice belongs to the executor of your estate, and that any use would have to be cleared with the executor of the deceased person’s estate.
Mati: Exactly. Interestingly, especially with the most prominent figures, we are hearing that a lot of those people already put clauses around this in their contracts, saying that their biometric information can be used or licensed in the future, with the profits benefiting their family. It’s definitely rare; I think we’ve had a few conversations where this happened. Some kind of will, agreement, or legal conditions in place about who inherits that would be an interesting question as well. You inherit everything else, but do you inherit some of the voice parameters of the person who passed away? It’s a big question, I’m sure.
Jonathan: What do you think this technology means for the future of audiobook narration?
Mati: There are over a million titles created and published by indie authors. I think this will be the first category of people to adopt it the most, people for whom you can provide clear added value beyond what exists. For current audiobook productions, we think this will probably go in three directions. The first is that voice actors will work more with AI on creating even better content, whether it’s more iterations to get the right quality, or pre-production and post-production of how something sounds, and then getting a voice actor to actually narrate it. That’s going to be one.
Then, second, voice actors will likely specialize, as they do now, in specific niches where AI just won’t work as well. Imagine a fantasy book; that’s going to involve a very different emotional performance that a voice actor can provide, and only a voice actor can provide better. Then there’ll be a third piece, which is what we spoke about earlier: we imagine that a lot more books will now give you more flexibility, both in terms of the characters you listen to and how you listen.
Currently, when an audiobook is published, it might be available in one or two versions. Many people will comment on how the voice of the narrator is the key thing driving whether or not you enjoy that audiobook. I think in the future that will likely shift, where you as a listener have a plethora of characters you can listen to throughout the entire audiobook, something you can identify with, something that speaks more to you in terms of quality, or something that you enjoy. So those are the three. In summary: one, voice actors working closer with AI on creating higher-quality content; two, the ability to voice books that just weren’t affordable with the standard process previously; three, giving flexibility in how that content can be listened to.
Jonathan: This is a very radical sea change for a number of groups, including the blind community, because in the past, even on this podcast, we’ve had debates about, do you prefer to read with text-to-speech cranked up so that you just get the information and impose your own voices on the characters, that kind of thing, or do you prefer to be read to by a human narrator? One sounded robotic and mechanical, and the other was full of expression and inflexion. That distinction is going away with the technology that you are creating. It becomes a very different discussion.
I actually had a debate, a discussion, with an audiobook narrator on Mastodon who is really upset about the evolution of this AI. I made the point to him that you can’t put the genie back in the bottle. This technology is coming, ready or not. We saw what the music industry did, where the RIAA and some of those big players tried to resist the downloading of audio. They tried to put DRM all over it. In the end, the music industry has changed to the extent that streaming services like Spotify and Apple Music are essentially used as promotions for tours. You don’t get a lot of revenue from album sales anymore, because of the way royalties work in streaming.
This whole industry is changing. What I said to the narrator was, in future, where you will make your money is in getting good quality deals when it comes to licensing your voice. You will be able to determine the ways in which your voice is used, how long that license lasts, and what you will charge. That’s where the future lies. Some people don’t like the sound of that future.
Mati: That sounds like a very interesting conversation as well. I would also draw the parallel to the music industry, where I think to some extent the same will happen. We would love to support that as well, by creating a marketplace of voices provided by professional voice actors, which can be used by people, where some of the benefits go back to the voice actor for the specific use cases.
What we hope will happen is that, with so little of the textual content out there currently voiced, the scale of what you can now voice is just so much bigger. That generates demand for those specific pieces from those voice actors, but also, because there’s so much additional audio that will become possible, it will directly benefit and provide revenue for everyone out there.
Jonathan: That’s right. There’s an accessibility issue here which is that only a tiny fraction of the world’s books are accessible to blind people right now.
Mati: Exactly. If you look at book authors, or even if you think about news articles, it’s crazy to think how you can go to most websites and they’ll not actually have an audio feature. Some are starting to. I think now the quality is at a threshold where some of the news publishers will potentially want to actively support it. There’s just so much content that could be delivered this way.
Jonathan: Yes. We had a listener who wrote in to the podcast last year to say that she really likes the Washington Post website, because they quite consistently have an audio option on their articles, and she likes to listen that way. Is there any other stuff in the works that you might be able to tell us about? I think you’ve got some new changes coming up in February and March, so they’re quite close now.
Mati: We are launching later this week the ability to design a voice. You won’t have to rely on cloning one; you will be able to create entirely new voices. For example, I want an older-sounding male with low pitch and a British accent, and that will be an entirely new synthetic voice that can be used for your use cases. That’s coming in the next week.
Then later in March, that projects workflow will launch, which will allow an arbitrary set of content to be voiced and provided as audio, with a set of changes one can make through pronunciation, character voices, and easy regeneration. Then along the way, we are taking a few side bets, hopefully also with a launch in March of an easy way for anyone to listen to audio on their mobile device. We’ll see whether we can deliver on that last one, because it’s definitely a side bet. Hopefully, in its initial form, that would provide a set of audio created from public domain content that people can listen to for free and enjoy.
Jonathan: Are you surprised by the degree of engagement that you’ve had from the blind community in recent times? Did you appreciate that the blind market might be as interested as it’s become?
Mati: Definitely not to that extent. I am really happy, personally and on behalf of the whole team, that you reached out to us, and also grateful to the wider community for all the support. We’ve been reading it on social media, and we’ve received a number of emails and notes. We couldn’t be happier that we can play a small part in hopefully broadening access to this new medium.
Jonathan: Any chance we might see you on Mastodon? Because you’re all over the place, you’re on Twitter and Discord and things, and we don’t see you there yet. Is that a possibility?
Mati: It’s a possibility, it’s coming. We’re trying to get to all the platforms where we can attract users. As part of this interview, I’m happy to commit to getting us on Mastodon, and hopefully helping there more actively as well.
Jonathan: Mati, I hope we can keep in touch, because I know there’ll be further announcements coming from ElevenLabs over time, but in the meantime, thank you so much for the chat. It’s been really interesting, and I wish you all the best with the expansion of the products.
Mati: Thank you so much, Jonathan, and thanks everyone for helping along the way. The positive notes and the support are incredible, and that’s exactly what we need to help develop, hopefully, the best voice that we can provide to everybody out there.
Jonathan: If you would like to have a play with this technology, you can go to ElevenLabs, that’s the number eleven spelled out, elevenlabs.io. They have a range of plans there. There’s a free one, there’s also a $5-a-month one, and it goes up from there. I think if you subscribe to the $5-a-month plan, you do get the first month free, which will allow you to try it at no cost. What you pay will determine how many characters a month you have to play with.
[music]
Transcripts of Mosen At Large are brought to you by Pneuma Solutions, a global leader in accessible cloud technologies. On the web at PneumaSolutions.com. That’s PneumaSolutions.com.
iPhone DFU and other matters
Sabahatin has emailed in with a plethora, a plethora of subjects, so let’s get into it. He says, “Congratulations on the arrival of the new tot. You must be delighted. All of the wonder of a new bundle of joy without any of the trouble and stress of looking after her. Marvelous.” [laughs] Couldn’t have put it better myself, and thank you for the congratulations. He says, “That’s the second time we have heard a horror story about an iPhone 14, fresh out of the box, including yours.
Although I agree that it’s a deeply unfortunate state of affairs, remember DFU mode. You could always immediately put your shiny new device into DFU mode, and flash the latest image onto it from a computer before beginning your setup. It should be guaranteed compatible with whatever transfer method you use, and it will be a little bit faster, as activation will be done by the restore process.”
Yes, very good point and it’s one of those things that one doesn’t think about in the heat of the moment when you’re trying to get your jolly device to talk, so thank you for the reminder. Also, he says, “You can use your computer to perform a transfer of the backup directly, if that’s your preference. I know it’s not the experience we deserve, but it definitely works.”
He continues, “You are not supposed to know this, but Audible enhanced format .AAX files are just Apple Books/iTunes-compatible M4B files that have been encrypted using your account key. You just have to decrypt them. I think it best not to say how, to avoid getting anyone in trouble. You end up with files that will play as .M4A, .AAC, or .M4B files. The M4B file just has chapter markings in the metadata in the file.
When I last tried it, the Stream would play Audible more or less correctly, including being able to skip precisely back and forward, and jump by chapter, but not the M4B files.” Yes, and I suppose that’s good for people who want to geek out. The trouble is that the Stream is not really intended for that market, so some people will do it, but a lot of people will just want the thing to work the way it’s meant to.
Continuing with the email, it says, “Given that there are perfectly legitimate uses for M4B files, and in fact, you can purchase them elsewhere, for example, downpour.com and bigfinish.com, it seems to me Humanware would lose nothing by supporting M4B files properly. Although it could never be a replacement for official support from Audible, it would certainly represent a more plausible product offering.
I’m not sure if the new Stream would be right for me though. I do very much adore the concept, and understand the rationale many people have for a separate device, but it does always seem to be, as Chris says, ‘A bit far behind.’ The other thing to note here, following the demo of Audible activation, is that, at least in the past, activation was tied to the device, not to the card. It’s a hidden file on the card named something like Audibleactivation.sys.”
Oh, my goodness, I haven’t thought about a .sys file for a [laughs] long time. “If you just copy that file to another card, and reproduce the directory structure for books, you can use more SD cards without needing to activate your Stream for every card using the AudibleSync app. I’ve done this and it works. It might be quite helpful for others having trouble with the software to know that it’s possible.
I’m so pleased to hear that RIM is coming to the Mac. My hodge-podge of a setup does work, but it would be really nice to have something that could plausibly be used by sighted users too, and that wasn’t so brittle to set up and get right. Once things start to look official, I’ll definitely be giving it a try. Looking forward, as ever,” he concludes.
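A side note on the M4B discussion above: chapter markers in an M4B are ordinary MP4 metadata, so you can inspect them with standard tools. Here is a small Python sketch, assuming ffprobe from FFmpeg is installed and using a hypothetical file name; nothing here touches DRM.

import json
import subprocess

# Ask ffprobe (part of FFmpeg) to dump the chapter atoms as JSON.
result = subprocess.run(
    ["ffprobe", "-v", "quiet", "-print_format", "json",
     "-show_chapters", "mybook.m4b"],
    capture_output=True, text=True, check=True,
)

# Print each chapter's start time and title, if present.
for chapter in json.loads(result.stdout).get("chapters", []):
    title = chapter.get("tags", {}).get("title", "untitled")
    print(f'{chapter["start_time"]}s  {title}')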
Braille instruction
Dawn in Sydney is writing in and says, “Hi, Jonathan. Firstly, congratulations on the new little granddaughter.” Thank you so much. “I thought I would share something that has shocked me deeply, an attitude that I hoped was non-existent these days. I spoke to someone not too long ago who had met someone who was starting university. When this person was asked if he would use Braille in his studies, he replied that, when he was starting high school, the teachers told him not to bother with Braille, because it was outmoded and too much trouble. I was so incensed when I heard this. I feel this is just an excuse for a teacher not to have to bother with Braille, when there are so many pieces of equipment to help teachers who have blind students who wish to use Braille.
Teachers need to encourage students to read Braille, not discourage them because the teacher won’t take the extra time that may be needed to facilitate a person’s education in the best way possible.” Thanks, Dawn. I think that there may be some teachers in the profession who feel this way, and we really need to challenge this and say, “Well, if it’s okay for blind kids to skip Braille, is it okay for sighted kids to skip print?”
Of course, the answer is, absolutely not in both cases, but also it is true to say that a lot of teachers are under enormous pressure, because the sector isn’t being resourced properly. We’ve really got to continue having these conversations because, although it is easy to have pockets of resources in schools for the blind, we’ve talked in the past about abuse that many of us– Well, some of us anyway, have suffered at schools for the blind over the years, far too many of us have suffered that kind of abuse.
Many of us understand all too well the social pitfalls of schools for the blind, as well as some potential benefits: the fact that we can benefit from mentors, the fact that there may be, in some people’s perceptions, something like a culture around blindness. There are potential benefits as well, but the philosophy has taken hold that everybody should be entitled to attend their local school. That sounds great, but we should also be ensuring that everybody is entitled to a proper education at that school.
When a blind child is being put in front of teachers for the majority of their time at school who are illiterate in blindness terms, that’s a concern. We have many wonderful, dedicated, conscientious people in the sector, who are doing great work every day, and I salute them unreservedly, but, man, many of them are really stretched. Sometimes what happens when people are stretched, is that you inevitably get rationing decisions.
We’ve just got to watch the fact that Braille is not rationed, and only given to those students who are perceived to be the most capable, or the most blind.
IPA Braille table
Haya Simkin is writing in and says, “Dear Jonathan, congratulations on the birth of your granddaughter.” Thank you so much. “I’m not a parent or a grandparent, but I have nieces and nephews, who are five or younger and I love them. They fascinate me, especially, when they change so quickly every two weeks or so when they’re very young.”
Yes, it’s a lovely time, isn’t it? “Enjoy them,” he says. “Maybe when she’s verbal, you can have her on the podcast. A man called Tony Schwartz, who recorded just about everything he did, made a recording where he spliced together recordings that he made of a child, perhaps his own, from the time she was several months old up to when she was at least six, and it is a fascinating listen.
She goes from gurgling and giggling to telling you about something that happened to her, I think, seemingly within seconds. Clearly, I don’t remember the recording very well; it’s been a few years since I heard it on a Radiolab episode, but if you can find it, I think you’ll love it.” I must say, I spent a lot of time recording my kids when they were little, and people who’ve listened to me on The Mosen Explosion over the years will have heard the Banana Report. When they have turned 21, and I’ve only got one child left to turn 21, I have put together montages of their lives from when they were very, very little all the way up to the present, and it’s wonderful.
Of course, if they’ve got partners by that stage, the partners really appreciate it too. [chuckles] Now, to my real question. “I want to thank the listener who sent in the article on Braille and the International Phonetic Alphabet. It was an interesting read, and provided me both with some discouragement and some hope. The article said there is a Braille table for the IPA, and it’s no harder to learn than any other Braille table, and this gives me hope.
However, the discouraging part is that no screen reader has it installed, neither the Windows screen readers nor VoiceOver on the Mac, which is what I use. Linguistics is a good career for blind people, to the point where it’s stereotypical; it’s even mentioned in the article. It just so happens I match that stereotype. I am hopeful again, because it seems that I read you can add tables to VoiceOver as of macOS Catalina, which came out about four years ago.
Is this true? If so, how do you do it? Do you need to be a programming geek? I’m a language geek, not a computer geek; I just use the thing. Is there a way to create the Braille table from scratch, or does a table already exist somewhere on the web, so that I would just have to work out how to add it? At the end of the article, there was an appendix which has all the symbols listed, so I saved the link just in case. For any Windows-using linguists out there, can this be done there? Thank you very much for your podcast.”
I imagine, Haya, that you could just drop a Braille table file, if you can find such a file, into the right place on your device. I don’t think it would be a complex thing, but I’m speculating. If anybody knows about adding Braille tables in macOS, and where you would get the IPA Braille table that he is after, by all means, share the intel.
[music]
Automated Voice: Be the first to know what’s coming in the next episode of Mosen At Large, opt into the Mosen media list, and receive a brief email on what’s coming, so you can get your contribution in ahead of the show. You can stop receiving emails anytime. To join, send a blank email to media-subscribe@mosen.org. That’s media-subscribe@M-O-S-E-N.org. Stay in the know with Mosen At Large.
Looping audio on the iPhone
Jonathan: Russ Winetski is getting back in touch. He says, “Hi, Jonathan. So glad to hear you had a nice vacation. I’m trying to figure out how to make an endless loop on my iPhone, which I can listen to at night, to insert certain ideas into my subconscious mind.” Things like, “Jonathan’s birthday is coming up on the 24th of April. You must save up and send PayPal.” Okay, he didn’t write that, I must confess, he did not write that, but he did write, “I called Apple Accessibility and actually got a VoiceOver user. How exciting,” he says.
“After exploring all the possibilities, it seems it may not be possible to do it in the Apple Voice Memos app. The person with whom I spoke suggested that I might be able to use a third-party app which creates loops, by loading the audio file into that third-party app. After checking all the possibilities of loop makers, it seems many of them might be very visual, because there are things like circles you need to tap on, and many of them are made to loop music.
I was hoping either you or your listeners may have a solution to this problem. I’m hoping that it can be somewhat of a simple procedure, and not too cumbersome. As a last resort, I do have one of the original number-one versions of the Victor Reader Stream. Under the Recorder section, the seven key speaks the word ‘loop’. I’ve tried various ways of trying to use this key, however, nothing seems to work.
I know you don’t use the Victor Reader Stream, however, some of your listeners may have a suggestion. I’m really motivated to make this happen, and would welcome any suggestions, including any other recorders I might purchase that would have this feature. I’m glad your show is back after your holiday, because I always look forward to the podcast to keep me company over the weekend.” Thanks, Russ.
We were talking recently about the djay app from Algoriddim, and I wonder whether that might possibly work, but I’m no expert in this at all, so let’s open it up and see if anybody can help. Basically, what you want to do, as I understand it, is take an audio file and just have it play on repeat. I could tell you easily how to do that on a PC, but on the iPhone, I can’t immediately think of an app that might do it.
I’m sure somebody will. Please be in touch and help Russ out: jonathan@mushroomfm.com if you want to email, with an audio clip or something written down like Russ did, or you can call in to the listener line, 86460Mosen, 864-606-6736.
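For the PC route Jonathan mentions, here is one sketch in Python, assuming the third-party sounddevice and soundfile libraries (both installable with pip): load the file once, then replay it until interrupted. The file name is a placeholder.

import sounddevice as sd
import soundfile as sf

# Load the recording once (any format libsndfile understands).
data, samplerate = sf.read("affirmations.wav")

try:
    while True:  # endless loop until interrupted with Ctrl+C
        sd.play(data, samplerate)
        sd.wait()  # block until this pass finishes, then start again
except KeyboardInterrupt:
    sd.stop()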
Court reporting and captioning
Eden: [unintelligible 01:45:16] coming to you from Vancouver, Washington, near Portland, Oregon. Just to let you know how much I am loving court reporting school, and how much I think it is an awesome career for anyone, especially a well-read person, as a lot of us who are blind and have been educated are. It’s an awesome career for someone who reads Braille, because when you’re in court, sometimes you need to read back, and you also have to edit your own work.
Right now, I have made it to 60 words a minute. Well, I can technically do about 80. I need to get to 225, so I have a long way to go, but it’s not bad for eight months. People are saying that this career is dying because, well, we have digital recordings, [chuckles] but there was a big trial going on here in America where someone got a rough draft from a digital company, and they’re like, “We paid 450 for this [unintelligible 01:46:22] What is that?” “No, oh, that’s not our court reporter.”
I actually had voc rehab tell me they wouldn’t pay for court reporting because, well, court reporting is dying, and it’s not. It’s an awesome career. There are some accessibility challenges, and some of the software needs some tweaking to work better, but at least one of the companies is on it, shout out to Advantage Software, who makes the software called Eclipse, not the programming software. It’s Eclipse software for court reporters.
For those people who don’t know, a steno machine has 22 keys, and you use combinations to make, well, you can make different letters, but the really interesting thing with steno is, you mostly make syllables, and with the theory I use, we make a lot of whole words. We do what’s called short writing. A lot of people write out every syllable, but with the theory I learned, we want to write fewer strokes.
I’m like, “I can do this.” I’m the girl who likes to play around with Grade 3 Braille, and that’s with an uppercase B, if I were writing it, and you have to write it in a certain order. Sometimes words end up being like anagrams, or you can write whole phrases. I can write, “State your name for the record,” in one stroke, [chuckles] and so I think that’s pretty cool.
I love steno, and I’m telling you about steno because it’s Court Reporting & Captioning Week here in the US. I wanted to encourage any blind people, or anyone who’s listening who is not sure what they want to do, who is good with writing, good with spelling, and has a pretty good acumen for their technology. You really probably need a good grounding in Braille, because you are going to be using that, and there are job opportunities.
You can freelance, you can work in the court. Yes, a lot of courts are going to digital now, but so many still want real writers. The problem is, we have a shortage. If anyone has questions, and would like to reach out to me, I’m not going to hide. I’m available for questions. You can contact me at edenkizer0718@gmail.com, and Kizer is spelled K-I-Z-E-R, so that’s E-D-E-N-K-I-Z-E-R0718@gmail.com.
I’m also getting back into Ham Radio. Anybody wanting to contact me, KC9WHD, I’m back on the air. Well, when I’m not editing for other court reporters, because that’s what I do now for work. That’s also a good job, if you are good at editing and you don’t mind learning some software that’s a little expensive, I won’t lie. The editing software is, don’t fall out of your chair, Jonathan, $1,700.
It’s even worse once I buy the full steno software. Once I am out of the student version, it is $4,000. All of us in our community talk about things being expensive; well, stenographers have the same problem. The difference being, though, most of the stenographers I know are– Well, my one friend stated she made $85,000 last year. I’m going back a little late, but I have never felt so excited.
You know how most jobs are just a job? I love getting on my steno machine. I love being that fly on the wall when I’m transcribing or editing people’s work, and I’m tired of talking for a living. That’s what I did most of my time.
Jonathan: Well, Eden, how refreshing to hear somebody who loves what they do. I love what I do, too. It gives you a sense of purpose and has you bouncing out of bed in the morning, and all those good things. I’m glad that you found something that appeals to you in that way. Unfortunately, it is no longer Court Reporting & Captioning Week in the US, because by the time I got that message from Eden, I’d already put the podcast together. Even so, if you’re considering a career, that might be one you want to consider.
[music]
I love Sonos
Kathy: Hi, Jonathan, it’s Kathy from California. I have been listening to you for many, many, many years, from when you were with Freedom Scientific, and before then, I think you were on ACB Radio. Anyway, I’ve always enjoyed everything you have produced. Thanks for such great content. I also wanted to let you know, it was through you that I came into contact with Sonos, and I just love their products.
I have two Play:1s and a Sub Mini. By the way, the new Sub Minis are just spectacular. I can’t speak highly enough of them. My brother has very limited hand usage, so he can’t really use the Sonos app, but I got him two Play:1s and a subwoofer. It works perfectly through AirPlay; he uses his phone and talks to Siri, and it plays everything. We are very happy campers, thanks to you. Thanks, and keep up the great work, Jonathan. Bye-bye.
Jonathan: Well, what a lovely message to get. Thanks for calling in on the listener line there, Kathy, and you do go back a long way. I like my Sonos stuff as well. I’m pretty excited to hear about some of the new plans that they are working on. Rumors are that there are a couple of new speakers coming out. One of those, the premium one, is going to be called the Era 300, I understand, based on reports that I’m reading. That one is going to have spatial audio, the full 360-degree surround sound.
Wow, all I can say is, if they sound as good as the blurb makes them sound, and you get a couple of those with a Sub, it sounds like it’s going to be quite a remarkable experience.
Ham Radio
We haven’t heard from Kelby Carlson for a while. I hope he’s brought a note of explanation. Yes, here it is. It says, “Hi, Jonathan. It’s been quite a while since I’ve emailed the show. I’ve been very busy planning for the arrival of our new baby, who came last week.” Wow, congratulations, that is just such a special time. Hope that everything is going well, and that you’re enjoying being a dad. It is so super.
“Now I have a chance,” he says, “to catch up on all the podcasts I’ve missed while I hold her in the morning.” There you go. You’re on to the magic formula. Not that formula, the magic formula to get your daughter to sleep: put on Mosen At Large, and she’ll be dozing off in no time. “It’s been fascinating,” he says, “to listen to all of those who have described their experience in amateur radio. It is both encouraging and daunting to hear about blind people building their own towers, something I would never have contemplated. I’ve always been intimidated by blind people who do physically complicated things like woodworking, and other kinds of building projects.
I wish I could learn such things myself, but I don’t feel like I have the capability without being taught by another blind person, which isn’t feasible for me right now. Anyway, I just wanted to say ‘hello’ to everyone on the show, and I plan to get back to listening every week now that the new year has begun.” Thanks, Kelby. It’s so, so cool to get your good news. I used to air a show when I ran ACB Radio that was produced by Phil Parr, who sadly died some years ago now.
It was called The Blind Handyman Show. That was a great show. It really broadened my horizons, because there were two or three blind guys who would appear regularly, and they’d sit there chatting away about all sorts of ways to do these things. It gave me confidence actually; I loved that show. It was very well-produced. To the best of my knowledge, there isn’t anything out there like that anymore. There’s a niche for someone: start a blind handy person podcast.
Comments on several products
Here’s Pam McNeil writing in from sunny Upper Hutt, and she says, “Hi, Jonathan. Just a note to say first that I got really fed up with the poor sound quality on my iPhone SE ’20, and gave it to a sighted friend who doesn’t rely so much on good sound quality as I do. A couple of weeks ago, I bought an iPhone 14, and I’m pretty happy with it. It didn’t take me long to get used to the Face ID system, although I’d hate to be in the dark in an emergency, fiddling about trying to get the thing to recognize me.
Although I don’t usually like this feature, I have Raise to Wake turned on because this seems to get the phone’s attention long enough for it to look for my face. My only issue with the feature, is that every time I move the phone, the screen reader starts talking. This gives any sighted people not used to talking phones quite a fright, when they hear my handbag speaking.
Thanks so much for reviewing the new Victor Reader on the first Mosen At Large of the year; I was interested to hear all the feedback from listeners in last week’s episode. It is a pity about the non-removable battery, as I find that if the player gets stuck in a loop of stupidity and won’t work properly, removing and reinserting the battery usually resets the unit. Do you think the SensePlayer OCR from Pacific Vision might give the Stream a run for its money? Do you have any thoughts on this product?
I recently got myself a Mantis Braille writer, that’s Braille with an uppercase B, mostly for its Braille terminal capabilities. I’ve paired it with my iPhone, but confess I haven’t really mastered all its capabilities yet, as I am pretty time-poor at present. I do, however, look forward to discovering all its features. Keep up the great work on your fantastic show.” Thank you, Pam. It sounds like you’ve got quite a few gadgets lately. That is exciting.
Face ID, first of all, I have Raise to Wake turned off, and what I do when I want to unlock the phone, is just hold it out in front of me and press the side button. It then just instantly unlocks for me, and by the way, it does so in the dark, I have had to unlock the phone at 2:00 or 3:00 AM, and have no difficulty with it. Just hold it out in front of you at the right distance, and then press the button when you’re ready, it unlocks right away.
The other thing you might want to check is, if you had sighted assistance to help set Face ID up in the first place, it may have Attention mode turned on. That is much stricter, and it requires you to be looking squarely at the phone with eyes open. Some blind people can manage it, but it does make it more difficult. That’s why, when a blind person sets up Face ID with VoiceOver on, it automatically disables the Attention mode; you can manually go ahead and enable it again.
Go into the Face ID settings and just check whether that Attention mode is turned off; if it’s on and you turn it off, you will find it much easier to unlock with Face ID. The SensePlayer is interesting. We have invited people from HIMS onto Mosen At Large before to talk about other products, and not had any luck. I am continuing to reach out, and we’ll see if we can get someone from HIMS to talk to us about the SensePlayer, because it does sound like a competitor to the Stream; it may be a bit more complex.
You have to wonder, where does this product sit between the simplicity of the Stream, and the more advanced usage available in a smartphone? Would a SensePlayer user, or a prospective SensePlayer user be using a smartphone anyway? We’ll see if we can get any engagement from HIMS, would dearly love to do that because we like to let people know about as many choices as possible.
Looking for PC speakers
We need to crank this one up nice and loud.
Automated Voice: Crank it up.
Jonathan: Yes, because we’ve got an email from Steve in the UK, and he says, “Hi, Jonathan, love the podcast. I learned so much from the great content and contributions.” Thanks so much, Steve. “I have a question about speakers for my PC, and wonder if you or your listeners could help. I would like a pair of good quality speakers for my PC. I’m a JAWS user, by the way.
I would like to be able to connect my iPhone 14 to them also, to be able to take calls, I guess via Bluetooth. I also have a hearing impairment and wear BTE hearing aids. These are Bluetooth as well, but I prefer not to take calls through them, since notifications keep coming through them while I’m home and using JAWS at the same time. There’s too much going on if I do this. Hope that makes sense.
Because of the hearing impairment, I do want a good quality pair of speakers to give me as much chance as possible to hear as well as I can. I would also be interested in a headset with a mic to do the same job. I try not to use headphones for too long, as they don’t help my tinnitus. Life is never simple, is it?” he says. “Hope you can assist, and thanks for the great show.” Thanks, Steve. Based on your description, it sounds like you don’t want to be working with JAWS and have the iPhone coming through the same source at the same time.
What I’m inferring from your email is that you want speakers that will let you use them with JAWS at one time, and then use them with your iPhone at another time. Probably the easiest way to connect to the PC would be via USB or the analog jack, but most likely USB, and then a Bluetooth option to connect easily to your iPhone.
There are some very good quality speakers out there these days. The sound that people are getting from their PCs is remarkable, especially if you go the whole hog and have a sub and surround and things; I mean, you can be immersed in sound from your computer. If anyone has any hints for Steve on speakers that will let him work with the PC with good quality audio, and pair via Bluetooth with an iPhone as well when that’s required, please share. It’d be great to get some brand recommendations.
[music]
I love to hear from you. If you have any comments you want to contribute to the show, drop me an email written down, or with an audio attachment to jonathan@mushroomfm.com. If you’d rather call and use the listener line number in the United States, 864-606-6736.
[music]
[02:01:33] [END OF AUDIO]