AI Prototype Development


To make sure that the robot was able to give the correct responses, an audio model was produced using Teachable Machine to distinguish the different tones in people's voices. The categories consisted of Happy, Sad and Anger at first.

First AI test
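For anyone wanting to try something similar, this is roughly what using an exported Teachable Machine audio model looks like in TypeScript with the speech-commands library. It is only a sketch: the model URL below is a placeholder, as Teachable Machine gives you your own link when you export, and the threshold values are just assumptions to tune.

```typescript
// Side-effect import: speech-commands needs a TensorFlow.js backend loaded.
import "@tensorflow/tfjs";
import * as speechCommands from "@tensorflow-models/speech-commands";

// Placeholder URL — replace with the link Teachable Machine gives you
// when you export your trained audio model.
const MODEL_URL = "https://teachablemachine.withgoogle.com/models/XXXX/";

async function startListening(): Promise<void> {
  // BROWSER_FFT listens to the microphone and computes FFT frames,
  // which is the input format Teachable Machine audio models expect.
  const recognizer = speechCommands.create(
    "BROWSER_FFT",
    undefined,
    MODEL_URL + "model.json",
    MODEL_URL + "metadata.json"
  );
  await recognizer.ensureModelLoaded();

  // e.g. ["Happy", "Sad", "Anger"] — whichever classes were trained.
  const labels = recognizer.wordLabels();

  recognizer.listen(
    async (result) => {
      // One probability per class, in the same order as `labels`.
      const scores = result.scores as Float32Array;
      labels.forEach((label, i) =>
        console.log(`${label}: ${(scores[i] * 100).toFixed(1)}%`)
      );
    },
    {
      probabilityThreshold: 0.75, // ignore low-confidence frames (assumed value)
      overlapFactor: 0.5,         // how often predictions fire (assumed value)
    }
  );
}

startListening();
```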

Then I noticed that, as there was no neutral setting, talking normally into the program would default to the anger setting, so a neutral category was needed for when people were just talking normally.

Second AI test

Then, after testing the AI, I realised that using key words to highlight the emotions didn't work, as the AI was responding to the words spoken and not the actual tone of the voices themselves. Because of this I created the AI from scratch again, this time speaking full sentences which were different every time, to attempt to create an AI that was more accurate than the last one. Below is a video of the AI working as I talk over it to test how well it is working.

As you can see here, the AI was not very accurate when it came to identifying the tones and was jumping around between them without settling on a specific one. In order to improve this, more audio samples were needed so that the program could detect each tone more easily.

This time the AI was able to identify the tones that I was speaking in at the start; however, when running continuously it lost the ability to identify what was being said effectively and started mixing all of the tones together. Again, more audio files were gathered to try and improve the software.

As you can see in the video above, the AI is better at distinguishing the tones as individual answers and doesn't mix them together as a combination of roughly 30% of all three. It was able to distinguish both happy and sad very well after processing what had been said; however, it still had some issues identifying both angry and neutral tones.
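One simple way to stop the output hovering at around 30% for every class is to only accept the top class when its probability is clearly above the rest. This is just a sketch of that idea, reusing the labels and scores from the recognizer callback above; the minimum confidence value is an assumption to tune.

```typescript
// Pick the most likely tone, or return null if nothing is confident
// enough — e.g. when the scores are spread roughly evenly.
function pickTone(
  labels: string[],
  scores: Float32Array,
  minConfidence = 0.6 // assumed threshold, tune against your own tests
): string | null {
  let best = 0;
  for (let i = 1; i < scores.length; i++) {
    if (scores[i] > scores[best]) best = i;
  }
  return scores[best] >= minConfidence ? labels[best] : null;
}

// Example: scores like [0.3, 0.3, 0.3, 0.1] return null (undecided),
// while [0.05, 0.85, 0.05, 0.05] return the second label.
```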

As you can see above, the AI is now able to distinguish the tone of voice much better than it was able to previously. This is due to obtaining even more audio samples, as well as increasing the number of epochs that the learning software undergoes. One epoch means that every sample has been fed through the training model once, so with two hundred epochs each sample has been fed through two hundred times.
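Teachable Machine handles the training itself, but underneath it runs on TensorFlow.js, where the epochs setting corresponds roughly to the epochs option of model.fit. The sketch below is only an illustration of what that setting means, not Teachable Machine's actual training code; xs and ys stand in for the recorded audio samples and their labels.

```typescript
import * as tf from "@tensorflow/tfjs";

// Illustrative only: shows what "epochs" controls during training.
// xs/ys are placeholders for the prepared samples and their labels.
async function train(model: tf.Sequential, xs: tf.Tensor, ys: tf.Tensor) {
  await model.fit(xs, ys, {
    epochs: 200, // every sample passes through the model 200 times
    callbacks: {
      onEpochEnd: (epoch, logs) =>
        console.log(`epoch ${epoch + 1}: loss=${logs?.loss}`),
    },
  });
}
```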

You can train your own machine learning model here: https://teachablemachine.withgoogle.com/train

