Podcast Transcript: Mosen At Large episode 130, you’re invited to participate in the SONAAR research project to improve the quality of image descriptions on the web
Jonathan Mosen: I’m Jonathan Mosen and this is Mosen At Large, the show that’s got the blind community talking. On the show today: just by installing a Chrome extension or Android app, you can help with a research study that seeks to spread quality image descriptions across the internet. Meet SONAAR.
Jonathan: We’ve all been there, we visit social networking platforms that are fully accessible but the degree to which we can enjoy them is impeded by the platform being full of images without any meaningful description. Some social networks have sought to automate the image description process but too often, those processes don’t deliver useful information. SONAAR seeks to help. It’s a Chrome extension and an Android app, and they’re looking for testers right now, people like us. To tell us about the project, I’m joined in Lisbon by Carlos Duarte. Carlos, it’s really great to have you here. Thank you so much.
Carlos Duarte: Thank you so much, Jonathan. Thank you for having me.
Jonathan: Tell me a bit about yourself. What’s your background and how did you become involved in this project?
Carlos: I’m a computer science professor here at the faculty of sciences of the University of Lisbon. I also do my research here. I started getting into the accessibility field, I guess, almost 20 years ago. This was when I started my PhD. Initially, I didn’t have in mind getting into accessibility. I was working on adaptive systems, systems that learned from their users and changed the way in which they answered their users, but I needed a domain in which to try out my ideas.
Through a series of events, we had this project going on that was supposed to work with digital talking books. That’s where I got into the accessibility field. That was my first venture into accessibility. I did try out my adaptive ideas on this domain. I built a digital talking book reader, and I guess from then onwards, I’ve been working on accessibility in different research projects in different ways, targeting different user groups.
This led us to SONAAR. It started in February last year, when there was a call from the European Commission that wanted to fund research work on the accessibility of user authoring tools. We don’t build authoring tools ourselves, but most of the content that’s generated nowadays on the web comes from users. It doesn’t come from designers or professional content authors. It’s users on their social networks who are in fact generating a large amount of content.
We thought: this content, especially visual content, most of it is inaccessible. If there’s an open call for projects that target content authors, let’s take this idea and propose a project that would try to increase awareness of authoring accessible content on social networks. This was really a long shot for us; we knew that this wasn’t what the European Commission was looking for. They were looking for projects that targeted professional content authoring platforms and such, but it stuck with them, and they decided to fund us.
We have this funding; it started in February 2020 and it will last until next July, so it’s almost over. We do want to explore, I guess, two main ideas in this project. The first one: we know that most people are not aware of the need to create accessible content. That is the first barrier we need to overcome. Second, we know that among people who are aware, most of the time they don’t know how to author accessible content on social networks, Facebook, Twitter, Instagram, whatever. Not only do they not know where to do this, they don’t know how to do it. How should I write a description for an image? What should I include? Is it supposed to be short or long?
Do I need to describe everything that’s in the image, or can I just focus on a specific aspect of it? These are the questions that go through the minds of people who try to author content in an accessible way. As you said, we were aware at the time that Facebook, Instagram, and other platforms were also looking into automatically creating descriptions for images.
Jonathan: Let’s say that somebody is on Twitter, which actually has got a pretty good user interface now for captioning content accessibly because there’s this lovely big field where you can type quite an extensive description, but as you say, perhaps somebody who’s inexperienced, just doesn’t really know what to type to make the content meaningful. How does your application assist them to type the right thing to give us something that we can use?
Carlos: We look at different sources for a possible caption for an image. The one that we put most of our faith in is previous captions that other users have already entered for that image. If you are posting an image on Twitter or on Facebook and you are a user of SONAAR, and you have agreed to allow us to look into your posting process, we collect your image and your caption, your description of the image, and we store that without any way to identify where the image came from or who posted it. We just store the image and the caption.
Later, when another user posts exactly the same image, or the image with a very small modification, perhaps a bit of a crop or a small change to it, we detect that it’s the same image, and we have the description for that image stored. This is what we hope to be the best suggestion that we have for such an image. When SONAAR detects that the same image is being posted, it will suggest a previous caption of that image.
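Recognizing "the same image with a very small modification" is commonly done with perceptual hashing, where images that look alike produce hashes differing in only a few bits. The interview doesn’t specify how SONAAR’s matching service actually works, so the following is only an illustrative sketch: an average hash over an 8x8 grayscale thumbnail, with an invented distance threshold.

```javascript
// Sketch of average-hash matching for near-duplicate image detection.
// `pixels` is assumed to be an 8x8 grayscale thumbnail (values 0-255);
// a real implementation would first decode and downscale the image.

function averageHash(pixels) {
  const flat = pixels.flat();
  const mean = flat.reduce((a, b) => a + b, 0) / flat.length;
  // One bit per pixel: 1 if brighter than the mean, else 0.
  return flat.map((p) => (p > mean ? 1 : 0));
}

function hammingDistance(hashA, hashB) {
  // Count the positions where the two bit arrays differ.
  return hashA.reduce((acc, bit, i) => acc + (bit ^ hashB[i]), 0);
}

// Treat images as "the same" when few bits differ; the threshold of 5
// (out of 64 bits) is an illustrative choice, not SONAAR's parameter.
function isSameImage(pixelsA, pixelsB, threshold = 5) {
  return hammingDistance(averageHash(pixelsA), averageHash(pixelsB)) <= threshold;
}
```

A small crop or recompression changes only a few pixels relative to the mean, so the hashes stay within the threshold, while a genuinely different image flips many bits.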
If we don’t have that, we use an external service, not developed by us, that can identify what’s in the image, similar to what the automated mechanisms of Facebook and Instagram do, and we can suggest a set of concepts. We don’t fill it in automatically for the user. We say: you are posting an image, here is a suggestion of a description for that image.
You can use it, but we recommend improving it, because each user has their own context and the description should reflect the specific context in which the image is being posted. We don’t do what Facebook or Instagram does, which is to place it there automatically without the users even being notified about it. We show the user, “Okay, you are posting this. You should have a description for that image; here is a suggestion of a description that you should improve upon.”
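The suggestion priority Carlos lays out, prefer a stored human caption, fall back to machine-identified concepts, and never auto-fill, can be sketched in a few lines. The function name and the phrasing of the fallback text are illustrative choices, not SONAAR’s actual code:

```javascript
// Illustrative sketch of SONAAR's suggestion priority. Prefer a caption a
// human previously wrote for this image; otherwise fall back to concepts
// from an image-recognition service; otherwise offer nothing and simply
// prompt the user to write their own description.

function buildSuggestion(storedCaption, concepts) {
  if (storedCaption) {
    return storedCaption; // best case: reuse a human-written description
  }
  if (concepts && concepts.length > 0) {
    // Machine concepts are only a starting point the user should improve.
    return `This image may contain: ${concepts.join(", ")}`;
  }
  return null; // no suggestion available; the user is still prompted
}
```

Either way, the result is only ever shown as a suggestion the user can edit, never silently written into the post.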
Jonathan: I’m not sure if your extension does this already, but one of the things that happens too is that even people who have the best of intentions sometimes forget to add a text description in the heat of the moment. They attach a photo, maybe one they have taken, and they just hit send, having forgotten to add a text description. Do you think there’s a case, and I’d love to see Twitter add this, for example, where it prompts you and says, “You are about to send an inaccessible tweet, are you sure you want to do this?”
Could SONAAR, do you think do that to actually encourage people to attach something meaningful by way of a text description before they hit send?
Carlos: In fact, we do that already because it’s part of the process of posting a tweet with an image. When we detect that an image has been uploaded, immediately we say, “Okay, you are posting this, you should have a description for it. Here is a suggestion.” We don’t prevent the user from hitting send. We could do that but I think that would be taking too much of the control from the user.
Jonathan: A bit punitive.
Carlos: Exactly. Every time that an image is being posted, before pressing send, there will be a notice saying to the user, “You are posting an image, please make sure you have an alt text, and here’s a suggestion for you.”
Jonathan: Do you have to have the extension at both ends for this to work, so the sender, the creator of the content has to have it, and also, the end-user has to have it as well?
Carlos: No. In fact, SONAAR works at both ends, but you don’t have to have it at both ends. If you have SONAAR installed, you will get this every time you try to post an image on Twitter or on Facebook. We just support those two social networks at the moment. If you are on the web, on any page, and you come across an image that does not have a description, then you can ask SONAAR to get you a description of that image, and the process is similar.
We search our image database. If we find the image there with a description attached to it, then we change the page to include that description for the image, and a screen reader user can listen to the description of the image. On the Android platform, what the user is required to do is share the image with SONAAR. When the image is shared with SONAAR, we send the description of the image back to the user.
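On the consumption side, a content script along these lines could implement what Carlos describes: walk the page’s images, look each one up in a description store, and add alt text (plus a tab stop) where it’s missing. The plain-object `images` array and the `Map` store below are stand-ins; the real extension queries a remote image database and operates on `document.images`:

```javascript
// Illustrative sketch of filling in missing alt text from a store of
// crowdsourced descriptions. `images` stands in for document.images;
// each entry just needs `src` and `alt` properties.

function fillMissingDescriptions(images, descriptionStore) {
  let updated = 0;
  for (const img of images) {
    const hasAlt = img.alt && img.alt.trim().length > 0;
    const known = descriptionStore.get(img.src);
    if (!hasAlt && known) {
      img.alt = known;  // expose the description to screen readers
      img.tabIndex = 0; // make the image a tab stop, as SONAAR does
      updated += 1;
    }
  }
  return updated; // number of images that received a description
}
```

Note that images which already have an author-supplied description are left untouched, matching SONAAR’s behavior of never overwriting an original description.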
Jonathan: It sounds quite similar to a feature that is built into JAWS, which of course not everybody has, and it’s a paid product. They have a feature called Picture Smart where you can select an image from social media, and it sends it to various image description services. I think Microsoft offers one, Google offers another, and it gives you a range of descriptions. So it’s similar functionality, but because it’s in a Chrome extension or an app, it’s not dependent on the screen reader you use.
Carlos: Exactly. Once again, I would like to highlight one of the differences compared to what those services offer. The automated descriptions, I believe they will get there, but they haven’t gotten there yet; they can’t really create descriptions with enough quality. Since we store descriptions generated by users, not just machine-generated descriptions, we store what users write, and we do hope that our descriptions have higher quality than those that are just machine-generated.
Jonathan: That’s the thing, isn’t it? I mean, you can give Facebook an A for effort, I guess, but then you get so many descriptions that say “image may contain outdoors, footwear”, or something really nebulous like that. It tells you nothing; it doesn’t add any value at all to what you’re reading.
Carlos: Yes, exactly. That was our major motivation to try and store what users are in fact writing, instead of relying solely on machine-generated descriptions.
Jonathan: Is the success of this dependent, though, on getting uptake from social media creators? If people won’t install the extension, then presumably the project is stalled, isn’t it?
Carlos: Yes. I think one of the major challenges that we have going forward is to try and get a large enough user base using SONAAR that we have enough descriptions to make it useful. There are other challenges, like one that we became aware of very recently. We currently support SONAAR in two languages, English and Portuguese. When the initial Portuguese and Brazilian users started trying out SONAAR, they said, okay, there are a lot of descriptions in English, but these aren’t really useful for us if we don’t know English. It’s about getting a consistent user base across many languages, or trying to come up with a solution that uses machine translation, so that something that’s been written in English can also be used for other languages, for instance.
Jonathan: I invited you onto the show now because I understand that you’re looking for people to participate in a study at the moment. What criteria must people meet to be of value to you as a participant in the study?
Carlos: What we’re looking for are screen reader users who are active on a social platform, Twitter or Facebook, because those are the ones that we support, and who are also able to convince some of their friends on these social platforms to also install and use SONAAR.
Jonathan: How do people express their interest in being a part of the study?
Carlos: I think there are two ways to do that. One is to get in touch with us, drop us an email and you can reach out to us at SONAAR, S-O-N-A-A-R@fc.ul.pt.
Jonathan: We’ll put that email address in the show notes so people can refer to that.
Carlos: Okay, great. The other way is to install SONAAR and use the application to report the interest in participating in the study.
Jonathan: Yes. I saw the Chrome extension in the Chrome Web Store, and it will work with any Chromium browser that has support for the store. For example, I can use it on Microsoft Edge, the new one, and other Chromium-based browsers as well.
Jonathan: Once it’s installed, what’s the user experience like for screen reader users? What happens? How will they notice that it’s installed?
Carlos: They can activate it whenever they need a description of an image. It has a keyboard shortcut that allows a screen reader user to request descriptions for, in fact, all the images that are present on a page. This is the content consumption experience. After requesting the descriptions for all the images on the page, the screen reader user can then navigate through all the images. We not only add the description to the image, we also make the image tabbable. You can use Tab to navigate to the images, which should allow you to navigate through all the images more quickly if required, if you’re not already using the screen reader to do that.
For the content authoring experience, as soon as we detect that an image is being posted, we present a notification that’s also screen reader accessible. It lets the user know that an image is being posted and that a description should be added to the image, which screen reader users may not need to be told, but it also carries the suggested description for the image, which is automatically copied to the clipboard and can then be pasted into the alternative text or description field.
Jonathan: Does this work only on a handful of sites that you specify for the extension, or if for example, somebody is on an online shopping site where there’s a lot of images there, can they use it on that site too?
Carlos: They can. What we call the content consumption can be used on any website, or, in the case of the Android application, in any application. On a website, you just trigger the extension. As I mentioned, it populates the alt text fields with textual descriptions for all images on the page. On the Android version, you need to share each individual image you need a description for with SONAAR.
Jonathan: This is really interesting. By installing that Chrome extension, spelled S-O-N-A-A-R, people are participating in the study and helping the product.
Carlos: Just by installing it, yes, they are participating. We will be conducting different phases of this study. In this initial phase, the one that I described, we are looking for screen reader users who can also bring with them a few of their social network friends, 3, 4, 5 of them. Later, we will have a second phase where everyone who has installed it is in fact contributing to the study, and hopefully SONAAR can get a good user base and keep growing. Just by using it and providing some descriptions, you are already contributing to improving the quality of descriptions for other users elsewhere on the internet.
Jonathan: Is there a finite period for this phase of the project? One of the things that could happen, I suppose, is that funding runs out or other projects are given priority; people could quite like this and then find that it’s gone away when the research phase is over.
Carlos: Unfortunately, I would say that’s a real possibility. We will do our best not to let that happen, but the funding will run out in July. That’s the duration of the project. We will keep working on SONAAR after the funding ends, and hopefully we won’t have to pull the plug. So far, the expenses for storing the images and using the service that compares images are pretty manageable, but of course this is something that will grow as the user base grows. In fact, I would say the best solution for SONAAR would be for the social networks themselves to have this kind of service and support it directly in their services, hopefully sharing descriptions between themselves.
This could also take care of another of the challenges that we face, which is the very frequent updates to the interfaces of Twitter and Facebook, because we need to detect when a user is posting an image. Every time the interface changes, we need to change something in SONAAR to be able to keep detecting that specific part of the tweeting or posting process. Having the social networks themselves integrate what we have been doing would be the best way to make sure that SONAAR stays useful for a long time.
Jonathan: Right. I can see that’s a real possibility because the European Commission has not been afraid to regulate. I actually think we’re going to see some very significant regulations coming with respect to antitrust and social media generally. Perhaps, eventually, after they’ve done all of this inquiring, they may well insist that social networks take this issue a lot more seriously than they have been.
Carlos: That would be brilliant but, in my opinion, I don’t think that’s going to happen. There have been some major steps with regard to the accessibility of digital content in Europe. The European Accessibility Act did in fact bring this to the table, but it still only applies to public bodies. It’s great that public bodies are required to have an accessibility statement, to provide accessible websites, and, in a couple of years, to provide accessible mobile applications, but, as I said, currently this applies only to public bodies. I do hope that there will come a time when this needs to apply also to private companies and their online presence, but we’re still not there, unfortunately.
Jonathan: Will this come to iOS at any time, do you think?
Carlos: We would love to, but I guess at this point it’s a matter of not having enough resources to support it on both platforms.
Jonathan: Of course, in Europe in particular, Android is way dominant, isn’t it?
Carlos: It is, but if we look at the blind and visually impaired community, I guess iOS, even in Europe, is still the major platform.
Jonathan: That’s SONAAR. It has a lot of potential, because if somebody takes the time to add a good quality description to an image and that image is coming up all over the web, then why not take advantage of that good quality description for us all to share? It makes a lot of sense. It’s crowdsourcing the describing of images in a meaningful way. I’ve got Google Chrome up at the moment, and I have a page loaded that has been put together by Carlos which shows us how SONAAR can work and what a difference it can make.
The reason why I have Google Chrome up and running is that when I installed this extension in Microsoft Edge to record this demo, I suddenly realized that there’s a keyboard conflict at the moment. Ctrl+Shift+S is the key which we will use in just a moment to invoke the SONAAR extension, but in Microsoft Edge, which is a Chromium browser and should therefore support this extension, that shortcut is actually used for Edge’s web capture mode.
This is a work in progress. There have been a few versions of the extension as I’ve been researching this SONAAR program, which have been updated to reflect keyboard changes and other work that is still going on with this project. If you go into the Chrome Web Store and you search for SONAAR, S-O-N-A-A-R, you’ll get the latest extension. Once you have it installed, the extension does update in real time, and we will provide a link to the extension in the show notes. I’m going to perform a JAWS key with T to read the window’s title. I’m using JAWS for this demo.
Automated Voice: Example page for SONAAR Google Chrome.
Jonathan: We’re in the sample page for SONAAR and we’ll explore this page just so we understand what’s on it.
I’ll first perform a say all and we’ll listen to what’s here.
Automated Voice: Example page for SONAAR. This page has four pictures. The first two do not have a description. The last two have a description, but one that is not really descriptive of the image. The page can be used to demonstrate features of SONAAR. You can browse the page to check the existing descriptions left-paren or lack of descriptions right-paren. Then activate SONAAR by pressing Ctrl+Shift+S.
After SONAAR finishes executing, the description will have been updated or augmented. All images that have had their descriptions modified by SONAAR will also be made a tab stop. Next, there is a picture of a dog without a description. This picture is not on SONAAR’s database. SONAAR will add a description based on concepts identified in the image. To get missing image descriptions, open the context menu on label graphic.
Next, there is a picture of a cat without a description. This picture is already on SONAAR’s database. SONAAR will add a description based on the descriptions stored in the database. Next, there is the same picture of a dog but now with a description. This picture is not on SONAAR’s database. SONAAR will not update the original description but if you enable SONAAR to query for additional descriptions of images that have an original description, then it will augment the image with extra descriptions based on the concepts found in the image. To access those descriptions, you can tab to the image and press Ctrl+Shift+D.
Picture graphic. Next, there is the same picture of a cat but now with a description, this picture is already on SONAAR’s database. SONAAR will not update the original description but if you enable SONAAR to query for additional descriptions of images that have an original description, then it will augment the image with extra descriptions
based on the descriptions found in the database. To access those descriptions, you can tab to the image and press Ctrl+Shift+D. Image graphic image.
Jonathan: On this page, then, we’ve got some images, and the most important thing to note is that right now they are not at all accessible. They just say “image.” Google Chrome now offers the ability to get text descriptions of images, some of which are more helpful than others, and you will hear in some instances Chrome offering to make use of that feature. I have the SONAAR extension for Google Chrome installed now. Just as the instructions say, I’m going to press Ctrl+Shift+S to activate it. You heard that beep; that tells me that it is scanning the pictures on the page. That second beep tells me that scanning is complete. It didn’t take too long. Now we’re going to go to the top of the page-
Automated Voice: Example.
Jonathan: -and I’m going to press the Tab key.
Automated Voice: Example page for SONAAR, this image may contain “dog, pet, K9, grass, puppy, cute, tongue, animal, mammal, portrait, pedigree, adorable, funny” graphic.
Jonathan: Because SONAAR has done its scan, now that I’ve pressed the Tab key, it has taken me straight to that image and described it. You’ll recall from our reading of this page that this first image’s description has been added by SONAAR, but it is not a human-generated description. It’s one of those machine-generated ones, but it is still pretty extensive. It’s similar to the kind of description you might hear if you use the JAWS Picture Smart feature. Now I’ll press Tab, and if I remember correctly, the second image’s description comes from a human image describer.
Automated Voice: A cat looking at the camera while resting; a cat looking at the camera while resting graphic.
Jonathan: That’s a good description there. If I press Tab again.
Automated Voice: Toolbar.
Jonathan: The reason why we’re not getting the other pictures described is, as the page indicated, these ones require you to get the full image description by pressing Ctrl+Shift+D, the describe hotkey. At the moment, the version of the Chrome extension that I’m recording with has not enabled that feature yet. They’re uploading an updated version to the Chrome Web Store that gets rid of a keyboard conflict. By the time you install this extension and have a play with it, you will be able to use Ctrl+Shift+D to get that additional description.
That’s the kind of difference that SONAAR can make. You can try this on various websites if you have the extension installed and see what a difference it makes by just going to the website and pressing Ctrl+Shift+S, and it will perform a scan. You’ll hear the first beep to say the scan has started; the second beep indicates that the scan is completed. Do check the show notes for more information about how you can get this extension and help them out by participating in this research study.
To contribute to Mosen At Large, you can email Jonathan, that’s J-O-N-A-T-H-A-N@mushroomfm.com by writing something down or attaching an audio file. Or you can call our listener line, it’s a US number 86460Mosen. That’s 864-606-6737.
[00:29:21] [END OF AUDIO]