Visually impaired Hungarians are helping Ergomania develop voice user interfaces for all mankind
At the end of 2020, Ergomania embarked on the next phase of a larger project. Since 2013, we have been developing solutions in the field of voice control and voice navigation, so we sought out blind and visually impaired people to aid our research on voice control. The visually impaired play an increasingly important role in VUI (voice user interface) design worldwide, and Ergomania’s primary concern is user experience design, especially for online interfaces.
VUI makes the use of objects more natural
Although it may sound strange at first, especially in our world full of keyboards, mice, buttons and touchscreens, we have already crossed the threshold that will define how we handle objects in the near future. Years earlier, we were among the first to recognize what most of the world was forced to realize during the coronavirus pandemic: voice-based, contactless interaction is literally a vital element of many services.
The share of voice-based use is rising steadily
It can also be observed that the number and proportion of voice-based searches are constantly increasing: in 2020, 20% of searches were voice-based, and 58% of mobile users had tried the feature at least once. In addition, as the number of households with a digital assistant keeps growing (at least 40 million in America alone), so does voice-based usage.
Aside from the fact that certain devices are primarily designed for voice-based use, and that voice-based use is the only possibility in certain instances, there is another serious reason why more and more people are talking to their smart devices.
Objects are controlled manually, but we give voice commands to animals
The answer is essentially the same one that explains why so many people are reluctant to control a smartphone or a vending machine by voice, and yet why it feels perfectly natural for others to google something with voice commands.
Over millennia, humanity has grown accustomed to handling inanimate objects by hand, whereas animals, which we also regard as somewhat tool-like but endowed with limited cognitive abilities, are controlled by voice. Whether these “tools” are farm animals receiving short commands or pets such as cats and dogs hearing more coherent sentences, voice is the natural way to interact with them. In short, most people find manual control natural for inanimate objects and voice for living things.
Therefore, while a simple mobile phone or computer is still regarded as an object (even though both can be controlled by voice), smartphones and digital assistants are already moving into the “animal” category and are seen as a personal butler, an attentive maid or even a subordinate.
Are digital assistants becoming people?
Digital assistants that can book appointments, order food or handle other tasks while imitating a person are increasingly perceived as human-like by many users.
Some of these users said they cannot treat their digital assistants simply as machines, and one philosopher even argues that the way we treat artificial intelligence reveals a lot about us, although basically we are only talking about machines. As people increasingly treat digital assistants and smart devices as living things, we can expect businesses and economic actors to follow suit sooner rather than later.
In our experience, VUI solutions are also gaining popularity among our clients. We therefore decided to involve in VUI design the very people for whom voice-based control is (almost) the only way to use smart devices anyway.
Ergomania keeps the visually impaired in mind
Our goal remains to better understand their needs, so that we can design the most usable interfaces and voice navigation possible. Beyond helping our fellow human beings, it is at least as important to create a more livable environment for as many people as possible. According to official data, at least thirty thousand legally blind people live in Hungary, and if we include all visually impaired people, we are talking about more than two hundred thousand, a significant proportion of whom lose their sight partially or completely due to conditions that often come with advanced age. Globally, the number of blind people is close to 40 million, and the number of visually impaired people is 250 million.
We were looking for visually impaired people who were happy to talk to us about the subject for an hour. The interviews were conducted by Tóth Rózsa Vanda, and we structured them so that we could learn about and understand as fully as possible the challenges, opportunities and desires that make up the daily reality of blind and partially sighted people.
Eventually, six of them took part in the interviews. (For privacy reasons, we disclose only their first names.) Anita is visually impaired and works in marketing. Sándor, Anita’s husband, is completely blind. Szilvi is visually impaired, and voice navigation is of utmost importance to her. Edit was born blind and graduated from a college of theology. Szilvia has also been blind since birth and went to law school. Mihály is completely blind and works as an IT specialist. Peter is visually impaired and works as a special education teacher.
What visually impaired Hungarians talked about: how the interviews were structured
During the interviews, our main focus was on how the participants live their daily lives, and especially which (smart) tools they use and how. We specifically covered voiceover software, computers and digital assistants. We wanted to know what worked well, what worked badly and, in general, what causes the greatest difficulty in using these tools.
On software use, we looked for answers to questions such as what additional software helps them in their everyday lives, and where possible we asked them to demonstrate it. We considered it important to find out what they saw as the biggest shortcoming or advantage of this software. We also covered how well it recognizes icons, buttons and images, and whether they have encountered any problems in this regard.
Regarding internet use, we were primarily interested in which matters they handle online and in which cases and situations they prefer online administration; whether they do all this on a computer or a smart device; how they use the internet; and which tools and software help them.
How visually impaired Hungarians use software in everyday life
In general, the visually impaired use smart devices and the services they provide through voiceover software (practically speaking, sound-based interfaces). Szilvia, for example, who has been visually impaired since childhood and graduated in law, used a laptop in high school (a brand new thing in Hungary in 2000), and at university she took notes on a portable computer using a screen reader called Jaws.
Before the advent of smartphones, those who dabbled in computer technology almost all used personal computers, but now they increasingly use phones, which they find easier to operate. Some suspect that certain pages block the screen reader, but visually impaired users have little way of finding out whether this is true. It is a possible bug that is definitely worth investigating on the developers’ side. As Szilvia put it, the screen reader sometimes simply stays silent, and the user has no clue what is happening.
Mihály uses Profibox, a Hungarian speech synthesizer developed a decade ago by staff of the BME Department of Telecommunications and Telematics and the Phonetics Laboratory of the Hungarian Academy of Sciences. When he wants to read an audiobook, Mihály turns to Dex, an e-book and conversion program with built-in voiceover capabilities.
The team behind nytud.hu also deserves a mention: they develop Hungarian-language NLU (natural language understanding) and NLP (natural language processing) and have been working on solutions for visually impaired people for years.
In Mihály’s opinion, the spoken-word side of the technology is still underdeveloped in Hungarian, whereas developers in other countries take a keen interest in using pleasant, friendly voices in their applications. If the intonation and sound are not native Hungarian, listening becomes tiring in the long run.
Phones with keypads have an important advantage
Mihály and Sándor are old-fashioned users who believe that a telephone is made for making calls and a computer for surfing the internet; Mihály, for example, still uses a Nokia E51 with physical buttons. For a blind person, physical buttons are far better because they are easier to use. As Mihály puts it, he misses a phone that has both a touch screen and physical keys: when you want to do something quickly, you need the keys, because writing text messages and emails is faster with them, while the touch screen would be for navigation.
The younger generations of the visually impaired have the upper hand
The current generation grew up with easily accessible computing and embraced the technological advances that make their lives markedly easier, but as Szilvia put it, some visually impaired people approach all this warily. She cited the example of an older lady who went blind at the age of 70 and finds it very difficult to learn anything to do with modern technology.
What young and middle-aged visually impaired people value most is that many services that used to be unthinkable are now accessible to them. This gives them a great sense of security at school or work, on public transport, or simply when going out to have fun. Edit also acknowledges that talking computers and phones are popular among the visually impaired, although her own view is a minority one. Edit has been blind since birth. She graduated from a college of theology and has been volunteering at a foundation for 15 years, visiting many countries during this time. Consumer society and the modern world itself are very far from her personal beliefs.
Edit is a firm believer in Braille displays and would rather travel to the other side of the country (or of the world, for that matter) than go online; she is adamant about protecting her privacy. Braille-based technologies, incidentally, continue to generate growing market activity. Among the classic solutions are Braille watches, which Mihály uses, but more and more innovative products are appearing on the market, such as Braille book readers.
When it comes to screen readers, Jaws or the built-in VoiceOver still come to mind first
As a rule of thumb, too many pages are still inaccessible to the visually impaired; moreover, earlier versions of Jaws do not recognize text written on images as something that needs to be read. Still, Jaws remains the most widely used screen reader because of its compatibility with Microsoft Office.
Although there are other solutions (such as NVDA or Windows’ own Narrator), Szilvia is used to Jaws and thinks its functionality and voice quality are second to none in its field. Peter, who works as a special education teacher and medical masseur, also highlighted that it is much easier to edit text on a computer. He would also like to use Apple computers, although he is used to the Microsoft environment. As Peter put it, “if I had the opportunity to get a Mac, I would love to learn how to use it. I know it’s a lot more accessible, and dictation works on it just as well as on iPhone devices.”
Anita, on the other hand, had a different experience with Jaws. She found that NVDA sounded better: its voices were less mechanical and therefore easier to understand. The platform matters too: a website may be difficult to use, while many of its features are easily available through the corresponding application.
As Anita put it, using Gmail in a browser on the iPhone is quite complicated, but the application has been “done well”, meaning it is easy to use.
On phones, the screen reader is a given; that is how Szilvia uses her iPhone, for example. It lets users learn and work, but Szilvia, for one, is not thrilled with the voices. What kind of voice would she be content with? That is easier to say by comparison, but the sound of reading software is now getting closer to the human voice. Listening to the first readers, as Szilvia put it, was like listening to a frog.
A common pattern among the participants was that they chose Apple products for their mobile devices, while their laptops ran some version of Microsoft Windows. Peter, for example, used a Nokia phone with its own text reader in the past, but had to install Nuance separately for a proper experience, because the phone could voice only a few functions. Some visually impaired people still use these older devices.
Peter summed up the situation: “By now Microsoft machines also include Narrator, but if you want a more customizable, more professional reader program, you need to install Jaws or NVDA. With the advent of Windows 10, Narrator has improved a lot, and it was released in Hungary with a Hungarian speech synthesizer called Microsoft Szabolcs. In earlier Windows editions, there was only Microsoft Sam, an English-language speech synthesizer.”
Those who have some vision (like Szilvi) use a text-enlarging app, or simply a physical magnifier if needed. ZoomText, not to be confused with the popular Zoom video chat app, can magnify text and read it aloud at the same time. Mihály and some others noted that Hungarian dictation does not work on Windows for the time being.
Internet usage experience
The internet is still dominated by visual content, although the visually impaired would use everything if it were accessible. The good news is that mobile banking, online ticketing and, as Szilvia reported, law libraries are now accessible.
Social media sites are used less often, while chat programs (Messenger, Skype, Viber) are used all the more. Searching is a mixed experience. YouTube’s voice search is hit-and-miss; as Peter put it, “there are times when it hits the mark, other times it does not.”
Online administration is far from hassle-free
One experience, however, was shared by all: the visually impaired generally run into difficulties when accessing official websites. The general government portal used for all kinds of e-transactions is not accessible to them at all, and even the tax office website is difficult to navigate.
Online banking and payment are hindered by the abundance of fraudulent sites. Scams and phishing represent major problems for the visually impaired, since even sighted people often fall for professional fraudsters who send emails on behalf of, for example, a public utility company.
Visually impaired people prefer telebanking services, where, for the time being, they are still identified by a live clerk. In netbanking and online administration in general, desktops and laptops still have an advantage, because the visually impaired find them far easier to use.
Images with text are still causing problems
Although Jaws has better features and can now “read” images, on pages containing lots of images or graphics the screen reader slows down, and at times freezes to the extent that the computer needs to be restarted.
Sometimes there are errors in the UI (user interface): certain buttons have no name, so the screen reader can only identify them as buttons without saying anything about their function. Admittedly, this is more of a front-end issue, but most visually impaired users operate user interfaces primarily through voiceover. When navigating by sound, another problem is that sighted users can see a small microphone icon, but visually impaired users do not always find the dictation button.
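Unnamed buttons are one of the few problems mentioned in the interviews that can be caught automatically. The snippet below is a minimal sketch, not a production audit tool: it scans static HTML with Python’s standard `html.parser` and flags `<button>` elements that would most likely be announced simply as “button”. It assumes a deliberately simplified notion of a button’s name (visible text, an `<img>` with non-empty `alt`, or a non-empty `aria-label`, `aria-labelledby` or `title` attribute). This is not the full W3C accessible-name computation that browsers and screen readers apply, and it ignores `role="button"` elements and `<input>` buttons. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class UnnamedButtonFinder(HTMLParser):
    """Flags <button> elements with no (simplified) accessible name.

    Simplifying assumption: a button counts as named if it has a non-empty
    aria-label, aria-labelledby or title attribute, an <img> child with
    non-empty alt text, or any non-whitespace text inside it. This is NOT
    the full W3C accessible-name computation.
    """

    def __init__(self):
        super().__init__()
        self.unnamed = []  # (line, column) of each flagged <button>
        self._open = None  # state of the <button> currently being parsed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # valueless attributes map to None
        if tag == "button":
            has_attr_name = any(
                (attrs.get(key) or "").strip()
                for key in ("aria-label", "aria-labelledby", "title")
            )
            self._open = {"pos": self.getpos(), "named": has_attr_name}
        elif tag == "img" and self._open is not None:
            # An image inside the button names it only if its alt text is non-empty.
            if (attrs.get("alt") or "").strip():
                self._open["named"] = True

    def handle_data(self, data):
        # Any visible text inside the button gives it a name.
        if self._open is not None and data.strip():
            self._open["named"] = True

    def handle_endtag(self, tag):
        if tag == "button" and self._open is not None:
            if not self._open["named"]:
                self.unnamed.append(self._open["pos"])
            self._open = None


def find_unnamed_buttons(html):
    """Return (line, column) positions of buttons a screen reader can't name."""
    finder = UnnamedButtonFinder()
    finder.feed(html)
    finder.close()
    return finder.unnamed
```

For example, an icon-only `<button><svg></svg></button>` is flagged, while `<button aria-label="Start dictation"><svg></svg></button>` and `<button>Add to cart</button>` pass. A real audit would run against the rendered page, for instance with an accessibility testing tool in the browser, rather than against raw HTML.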
It may be worth considering how people living with multiple disabilities (for example, visual impairment combined with a physical disability) use IT tools. There are already devices on the market that act as communicators. Such users used to communicate with cards, but nowadays, with the help of communicators, a TTS (text-to-speech) synthesizer can also read aloud their requests and thoughts expressed with pictures. For example, selecting a glass and apple juice means “I want a glass of apple juice”.
Online shopping is still a pretty bumpy ride
Online shopping is particularly complicated for the visually impaired, especially navigating within a huge webshop. As our interview participants put it, “It’s very complicated to shop, we waste a lot of time doing it. Most people, when their patience runs out, just click on the first item that comes up.”
Nor does it help to ask the webshop to list the products on sale if it then displays too many items at once. What is confusing for a sighted person is especially mind-boggling for the visually impaired. If they have to wade through hundreds of products one by one, they won’t even start, because they soon get bored.
Szilvia also complained that by introducing two-factor authentication, banks have made online shopping even more difficult for blind and partially sighted people.
Getting by with smart devices
Thanks to voice-based UX options, a variety of navigation applications are also available to the visually impaired: BlindSquare, Google Maps, or even the iOS Maps application. However, the experience is mixed. As Szilvia said, it’s great that she can type in a location and the phone shows the distance and gives different sets of instructions for drivers and pedestrians. Another plus is the step-by-step physical directions (go left, turn right, etc.), and the applications even announce when the user is near their destination or what shopping opportunities are around them.
Anita and Sándor’s experience also supports what the interviews revealed: they find the iPhone more accessible than Android. Still, there are problems: the companies do not update their maps often enough, and more often than not places no longer exist (shops have closed permanently, etc.). Sometimes the navigation app mixes up house numbers, too.
Several people (including Anita and her husband, Sándor) mentioned that malls should offer some kind of overall assistance, such as a guide track leading to shop entrances or information points with a VUI (voice user interface). Sighted people can easily use digital maps, but these are useless for a blind or partially sighted person.
Public transport has room for improvement
GPS-based navigation is also a great help on public transport, because the passenger information systems on buses do not always announce the names of the stops. For a sighted passenger this is a minor concern, since the route is posted inside the bus, but the visually impaired have no way of reading it. If they turn on their map application, however, it will announce the upcoming stops.
Ticket machines also still cause difficulty. What can someone do if they cannot operate the ticket machine and cannot buy a ticket from the driver either? Also, without up-to-date passenger information you need to know the schedule by heart, and a smart device is a great solution to this problem as well.
Buying train tickets online is already smooth, but when ordering a taxi, phoning is still the best bet, because most of the applications on the market are not accessible to the visually impaired.
However, some visually impaired people don’t like using apps and devices: they need a human voice. Of course, this is also true of a significant proportion of people in general: we navigate byzantine telephone customer service menus just to reach a live clerk, rather than have the machine tell us exactly the same information.
The visually impaired and digital assistance
Digital assistance covers not only digital assistants such as Siri or Alexa, but essentially any software that performs assistant (in older terms, secretarial) duties. On smartphones, participants usually use the dictation function, although some occasionally control their phone with a separate Bluetooth keyboard. Many choose the iPhone, for example, because Apple emphasizes its accessibility.
Moreover, Apple solved what the state did not: it made banknote recognition accessible. Although there was an attempt to put Braille symbols on banknotes (the very first 5000 HUF note had them), it never came to fruition. The iPhone, on the other hand, has a built-in money recognition function, so even a visually impaired person can tell relatively easily which banknote they are holding.
Language is usually what hinders the use of digital assistants, but even where English is no obstacle, the lack of touch can cause aversion. Szilvia, for example, knows she can handle Siri, but she doesn’t really like using it; as she put it, “I don’t know why I’m averse to her.”
Two main obstacles: language and price
The two biggest obstacles to the mass use of smart devices and digital assistants by the visually impaired are price, since blind and partially sighted people rarely have jobs that let them afford such devices, and language: as Peter put it, the assistants cannot speak Hungarian, so most visually impaired people cannot use them.
In the contest between Microsoft and Apple, the latter wins: Peter estimated that Siri achieves nearly 80% accuracy even with a speaker who has a speech impediment. “In Microsoft programs, if I tap on dictation, I get an error message saying ‘this feature is not available in your language’ and the application exits. Siri also works in Hungarian, and in 90% of cases it manages to pronounce more complex Hungarian contact names.”
One of the key concerns mentioned by the participants was that each of these digital assistants has a learning curve, starting with which button to press to activate it.
Participants also indicated that overlapping sounds can be problematic: on smartphones, for example, if set up correctly, the screen reader stays quiet while voice navigation is active.
Szilvia mentioned that if voice navigation turns off the reading software while she has eight pages open, she cannot navigate. It would be really annoying if switching to another tab required a restart. And what if you want to jump between two applications?
The biggest difficulty in using Siri is that it sometimes does not clearly understand what users need. One must first learn what Siri is capable of and act accordingly, but most users don’t know exactly what Siri or other digital assistants can do, especially in Hungarian.
Reading software and privacy concerns
Another aspect of using reading software was highlighted by one of our interview participants. Edit said that although screen reading is a good thing, it raises serious privacy concerns if someone else can listen in on her private conversations or correspondence, especially sensitive information such as bank account details.
Szilvia voiced similar concerns. If she wanted to initiate a wire transfer by voice, for example, how would the banking application know that the command was issued by the account holder? Customer services release confidential information such as the account balance once the user has identified themselves, yet the data needed for identification may also be known to other parties.
A CAPTCHA is also a virtually insurmountable obstacle for a blind person if the puzzle relies solely on visual recognition.
A possible future for VUI
During the interviews, we also asked which tools the visually impaired would be happy to see equipped with voice control. Everybody would benefit if, for example, we no longer had to push buttons or turn dials on the washing machine and could simply issue a voice command to start the proper washing cycle. The same goes for ovens: heating to the appropriate temperature, starting baking, cooking, etc.
Of course, getting household appliances to “talk”, that is, to provide information by simulating human speech rather than simple beeps, is again a development opportunity from which everyone could benefit.
Overall, therefore, involving the visually impaired in the design of voice-based interfaces is essential, because they, more than anyone, can tell you how usable a VUI (voice user interface) really is.