Busy busy busy!

Back to work after maternity leave doesn’t leave me much time to keep the blog up to date! But, I’ve also been busy on a couple of other articles.

The first, over at Statistics Views, is an introduction to the role of statistics in speech recognition.

The second, over at the Software Sustainability Institute, is about my latest project - Cambridge Women and Tech - as part of their blog about women in technology.

Posted in Uncategorized

Her: fact vs. fiction

Thanks to the fantastic institution that is the Big Scream, I got to see Her at the cinema while on maternity leave. There are spoilers below, so go and see the film before you read on!

At its heart, Her is about the developing relationship between Theo, a recent divorcee, and his intelligent operating system Samantha. The film is set in a not-too-distant future where technology is smart but small enough to recede into the background. People interact with their computers mainly using voice, though gesture is in there too, most notably for gaming. Emails are listened to via discreet earpieces and people are comfortable enough to talk naturally to their computers, as if they were talking to an old friend.

Samantha’s voice recognition skills are near perfect, far better than anything that exists now. Yet we forget just how bad humans can be at speech recognition sometimes. People mishear and mispronounce all the time, but we’re really good at combining all our knowledge to seamlessly recover a conversation gone wrong. Much of the time we don’t even realise that we’re doing it. Today’s speech recognition systems are worse than humans, though operating at error rates of around 20% (that’s 1 in 5 words wrongly transcribed) is useful enough for applications like Siri and Google Now.
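To make the "1 in 5 words" figure concrete, here is a minimal sketch of how a word error rate is usually worked out: count the fewest word substitutions, deletions and insertions needed to turn the transcript into what was really said, then divide by the number of words actually spoken. The function name and the example sentences are my own inventions for illustration, not taken from any real recogniser.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] = fewest edits to turn the reference words seen so far into hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Made-up example: ten words spoken, two of them transcribed wrongly.
reference = "can you set an alarm for half past seven tomorrow"
hypothesis = "can you set and alarm for half past eleven tomorrow"
print(word_error_rate(reference, hypothesis))  # prints 0.2, i.e. 20%
```

Two wrong words out of ten doesn't sound like much, but in this example it is the difference between an alarm at half past seven and one at half past eleven.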

The biggest difference between current dialogue systems and Samantha is the naturalness of the dialogue between her and Theo. Samantha is able to hit the correct emotional note, talk back at appropriate points with no (unintended) awkward silences, and to keep track of complex conversations. These are things that our current state-of-the-art systems are not yet capable of.

Of these three active research areas, detecting and synthesising emotion is perhaps the most difficult to define. Identifying emotion is something that humans don’t even agree that well on, and collecting data for research purposes often means relying on acted emotion. Current research tends to focus on a small subset of easily identifiable emotions like happy, sad, angry and excited, ignoring many more nuances.

Our current spoken dialogue systems are also not great at knowing when they should speak, leading to unnatural conversations that don’t flow well. Typically, current systems have to wait for half a second or so after the other party has finished speaking, to be sure that they’re not going to say anything else. In contrast, humans are really good at jumping in, sometimes even before the previous speaker has finished, to minimise the total amount of silence in a conversation.
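The "wait for half a second" strategy can be sketched in a few lines. This is only an illustration under my own assumptions: the audio has already been chopped into 100 ms frames, and something else (a voice activity detector, not shown) has marked each frame as speech or silence. The function name and the frame values are hypothetical.

```python
def end_of_turn_frame(is_speech, frame_ms=100, timeout_ms=500):
    """Return the index of the frame at which the system decides the
    speaker has finished, or None if it never becomes sure."""
    frames_needed = timeout_ms // frame_ms
    heard_speech = False
    silent_run = 0
    for index, speech in enumerate(is_speech):
        if speech:
            heard_speech = True
            silent_run = 0  # any speech resets the silence timer
        elif heard_speech:
            silent_run += 1
            if silent_run >= frames_needed:
                return index
    return None


# Hypothetical input: speech, a short pause, more speech, then silence.
frames = [True] * 4 + [False] * 2 + [True] * 3 + [False] * 6
decided = end_of_turn_frame(frames)
print(decided)  # prints 13: the last speech frame was index 8
```

The short pause in the middle doesn't trigger a response, which is the point of the timeout, but the price is that the system always sits through the full half second of silence before replying.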

Furthermore, we’re only at the beginning of solving the problem of keeping track of a conversation. Most deployed dialogue systems use a really simple set of handwritten rules to decide how to respond to a person. Such rules can only capture a small subset of human behaviour and conversation topics, and it takes a huge amount of work to write down those rules. For computers to have realistic conversations, we need new models of dialogue that are easily extendable without human intervention. This is the focus of the work done in the dialogue systems group at Cambridge.
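To give a flavour of what handwritten rules look like, here is a deliberately tiny sketch. The patterns and replies are invented for illustration, and real deployed systems are far larger, but the shape is the same: a list of patterns, each paired with a canned response.

```python
import re

# Each rule is a (pattern, reply) pair, checked in order. All invented examples.
RULES = [
    (re.compile(r"\b(hello|hi)\b"), "Hello! How can I help?"),
    (re.compile(r"\bweather\b"), "Which city would you like the weather for?"),
    (re.compile(r"\b(bye|goodbye)\b"), "Goodbye!"),
]
FALLBACK = "Sorry, I didn't understand that."


def respond(utterance: str) -> str:
    """Return the reply for the first rule that matches, else the fallback."""
    text = utterance.lower()
    for pattern, reply in RULES:
        if pattern.search(text):
            return reply
    return FALLBACK


print(respond("Hi there"))
print(respond("What's the weather like?"))
print(respond("Book me a table for two"))  # no rule covers this, so we get the fallback
```

Every new topic needs someone to sit down and write another rule, and anything the rule writer didn't anticipate falls straight through to the fallback.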

The film makes the point that the smart technology is far more advanced than us mere mortals can ever dream of being. At one point, Samantha drives this home by confessing that she’s talking to more than 8,000 people at the same time. As the story unfolds, Samantha becomes gradually more and more self-aware, eventually getting bored of Theo, until she (and all the other smart operating systems) leaves. In the end, this is another in a long line of sci-fi films to rely on the age-old idea of intelligent machines becoming self-aware and deciding to rise above us humans (though without the usual killer robots trying to wipe out humanity).

Posted in Technology

Getting started: data science with Python

The purpose of this post is to collect together online resources for anyone who wants to learn how to do machine learning (data science) in Python, starting from scratch. Some of these sites I’ve used, and others I’ve only glanced at, but I hope they let you get started no matter what your level. I’ll add new stuff as I come across it, but let me know if you have any useful resources to add!

If you’re new to programming, the first step is to get started! Codecademy has an introduction to Python tutorial which will get you started with some basic concepts. Google’s tutorial is a bit more advanced, but should be do-able once you have an understanding of variables, conditionals and loops.

Now you have a basic understanding of Python, install some of the libraries that are useful – numpy, scipy, pandas and scikit-learn. A great place to get a set of useful libraries is Anaconda.

With the tools in place, the best thing to do is dive in. A great place to start is Kaggle. They have some tutorial tasks to get started on, including one from Data Science London. This is a binary supervised classification task so you’ll want to read up on how that works, but it’s essentially about deciding whether a data example is from one class or another. ‘Class’ in this context can be things like:

  • Is an email spam or not?
  • Is a credit card transaction fraudulent or not?
  • Does some audio contain speech or noise?

You can use scikit-learn to get started without knowing too much about what’s going on under the hood. Perhaps the most important thing to get to grips with is the use of training/dev/test data, cross-validation and generalisation. But, if you want to really get a good understanding, then Coursera’s Machine Learning course covers a lot of the basics of machine learning, with some practical tasks to complete.
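If training/test splits and cross-validation are new to you, it can help to see them written out by hand once. The sketch below uses only the Python standard library, a made-up two-class dataset and about the simplest classifier there is (pick the class whose average point is nearest). All the function names and numbers are my own choices for illustration and not part of the Kaggle task.

```python
import random


def make_data(n_per_class=100, seed=0):
    """Synthetic two-class data: (features, label) pairs from two invented clusters."""
    rng = random.Random(seed)
    data = []
    for label, centre in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            data.append(([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)], label))
    rng.shuffle(data)
    return data


def train(examples):
    """'Training' here just means computing the mean point (centroid) of each class."""
    centroids = {}
    for label in (0, 1):
        points = [x for x, y in examples if y == label]
        centroids[label] = [sum(dim) / len(points) for dim in zip(*points)]
    return centroids


def predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    def squared_distance(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def accuracy(centroids, examples):
    return sum(predict(centroids, x) == y for x, y in examples) / len(examples)


def cross_validate(examples, k=5):
    """Hold out each of k folds in turn, train on the rest, score on the held-out fold."""
    folds = [examples[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        training = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        scores.append(accuracy(train(training), held_out))
    return scores


data = make_data()
train_dev, test = data[:160], data[160:]  # the test set is put aside until the very end
scores = cross_validate(train_dev)
final_model = train(train_dev)
print("cross-validation accuracies:", scores)
print("held-out test accuracy:", accuracy(final_model, test))
```

The cross-validation scores tell you how well the approach generalises to data it wasn't trained on, and the test set gives one final check on data that played no part in any decision. In scikit-learn, `train_test_split` and `cross_val_score` do this bookkeeping for you.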

Finally, be aware of common pitfalls.

If you can build a classifier to work on the Data Science London Kaggle challenge, and understand how it works, then you’re well on the way to learning about more advanced stuff. But that’s a topic for another post!

Posted in Machine Learning, Technology

How to learn a new programming language

I first learned to write code at university, where we used a number of obscure (Oberon, anyone?) and not-so-obscure languages. The different languages were used in different courses – Haskell for functional programming, Oberon for object-oriented programming, C for signal processing etc. – so we learnt them alongside the rest of the theory. Since then, I’ve picked up other languages at places I’ve worked, mostly by having to extend code in a new language.

I’ve not yet worked out the best way to learn a language that I don’t have an immediate work-related reason to use. I’ve heard a lot of talk about Ruby recently, and wanted to know what it was all about. But finding out about Ruby proved to be a challenge!

Obviously the best way to learn a new language is to write code in it. But if you don’t have a project to work on, or the time to spare, then it can be difficult to start something complex enough to learn from.

I find online tutorials much too simple as they’re normally aimed at novices, and really only cover the basics. Places like Codecademy may be great for beginners, but the basics only really differ in syntax between languages.

Another technique I’ve read about is to port an existing project from one language to another. To me, this sounds like a potentially bad way of learning as you could easily just end up trying to emulate one language in another without understanding anything of the intricacies and complexities of the language you’re trying to learn. I can imagine that it’s really easy to directly port a project in C to Python without really learning anything about what more Python can offer.

In the end, I was pointed at Ruby Koans by my husband, which are inspired by the Zen Koans. These are a series of almost 40 files that you work through. Each contains tests that fail, and by correcting them you can learn something about the language. Every corrected test takes you one step closer to enlightenment. There are no explanations, so you can google as little or as much as you need, without having to wade through explanations of things you already know. I found these to be a good way to learn; I could easily complete lots of the tasks with a little thought, but every so often had to stop and google an answer to find out the theory. Apparently there are F# koans too, which might be my next challenge!
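For anyone who hasn't seen a koan, the idea looks roughly like this. This is my own sketch of the style, written in Python and not copied from the Ruby Koans; the placeholder `__` and the function names are made up for illustration.

```python
__ = object()  # placeholder marking a blank that the learner has to fill in


def koan_slicing_unsolved():
    letters = ["a", "b", "c", "d"]
    assert letters[1:3] == __  # fails until __ is replaced with the right value


def koan_slicing_solved():
    letters = ["a", "b", "c", "d"]
    assert letters[1:3] == ["b", "c"]  # the learner has worked out how slicing behaves


def run(koan):
    """Run one koan and report whether its assertion holds yet."""
    try:
        koan()
    except AssertionError:
        return "not yet enlightened"
    return "passed"


print(run(koan_slicing_unsolved))  # prints "not yet enlightened"
print(run(koan_slicing_solved))    # prints "passed"
```

Filling in the blank forces you to predict what the language will do, which is where the learning happens.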

Posted in Technology

Why is speech recognition hard?

Not so long ago, automatic speech recognition was something of a niche technology, which very few people used. That changed in 2011 with the release of Siri and Google Voice Search, with both Apple and Google making speech technology a key feature of smartphones. Despite recent improvements, speech recognition remains a difficult problem to solve, and even the best speech recognition technology makes errors.

It’s hard to appreciate why this is when understanding speech is something that even young kids can do with relative ease. Though, if you know any young kids, you’ll know that they don’t always let you know that they understand!

Here are just a few of the things that make speech recognition hard for computers:

  • Peopledonotleavegapsbetweenwords. While your brain hears speech as a series of discrete words, closer listening to the audio shows that the boundary between words is fuzzy. Just try listening to someone speaking a foreign language to hear how difficult it is to pick out the individual words.
  • People speak sloppily and ungrammatically. They slur words together, mispronounce, stop part-way through words to correct what they’ve said, and don’t always speak in coherent sentences.
  • Building a speech recognition system needs hours of transcribed audio to create a model of the acoustics of speech. This data is expensive to obtain, and even expert transcribers make mistakes.
  • Today’s speech recognisers need to cope with lots of different speakers, each of whom has their own accent, speaking rate and style.
  • Background noise obscures speech, making it difficult to recognise. Different types of noise have different characteristics – for example street noise is very different from the noise inside a car – and a computer must be able to cope with lots of different types of noise.
  • Our smartphones and devices all have different makes and models of microphone, each of which distorts your speech in a subtly different way.
  • Homophones, or words/phrases that sound the same, are impossible to tell apart without more information, e.g.
    • ‘Their’ vs. ‘There’
    • ‘Build’ vs. ‘Billed’
    • ‘I scream’ vs. ‘Ice cream’
  • New words come into use all the time, so speech recognition systems need to be continually updated. E.g. ‘Selfie’, the OED’s word of the year for 2013, is estimated to have increased in usage by 17,000% in 2013.

The human ear is really good at filtering out noise and the subtle changes from using different microphones. We’re also great at quickly adapting to how a new speaker talks, and using context and real world knowledge to disambiguate things that we haven’t heard properly. On the other hand, we need to explicitly tell computers how to do these things, which means inventing and trialling a bunch of ways to find out what works and what doesn’t.
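As one small example of telling a computer how to use context, take the sound-alike words in the list above. A common trick is to count how often each candidate follows the previous word in a large body of text, and pick the more frequent one. The counts below are invented purely for illustration (a real system would gather them from a large corpus), and the function and variable names are hypothetical.

```python
# Invented counts of how often the second word follows the first.
BIGRAM_COUNTS = {
    ("over", "there"): 120, ("over", "their"): 2,
    ("lost", "their"): 90, ("lost", "there"): 1,
    ("they", "build"): 40, ("they", "billed"): 15,
    ("was", "billed"): 60, ("was", "build"): 1,
}

# Groups of words that sound the same.
HOMOPHONES = [{"their", "there"}, {"build", "billed"}]


def pick_word(previous_word, heard):
    """Choose the sound-alike of `heard` that most often follows `previous_word`."""
    candidates = next((group for group in HOMOPHONES if heard in group), {heard})
    # sorted() keeps the choice deterministic when two candidates tie
    return max(sorted(candidates), key=lambda w: BIGRAM_COUNTS.get((previous_word, w), 0))


print(pick_word("over", "their"))  # prints "there"
print(pick_word("lost", "there"))  # prints "their"
print(pick_word("was", "build"))   # prints "billed"
```

Looking at one previous word is about the crudest form of context there is, but it already sorts out a lot of cases that the audio alone cannot.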

Posted in Technology

SyncDevelopHER Norwich

Last week I spoke at SyncDevelopHER in Norwich, a series of events to promote gender equality in the tech industry. The baby came too, and the talk was on machine learning and speech technology. It was great fun to meet a bunch of enthusiastic people, and a bit strange to be in a room that wasn’t completely dominated by men!

Posted in Technology

Ethics and artificial intelligence

Until now, artificial intelligence has largely avoided questions of ethics by focusing research on interesting yet uncontroversial problems. While the technology has under-delivered, people have been unconcerned by the ethical implications of intelligent machines. Yet, the recent boom in machine learning and data science has led to intelligent technology becoming tightly integrated with our daily lives, while accompanying issues like privacy and accountability have slipped under the radar.

Companies like Google and Facebook have spent the past years accumulating a large amount of data about each and every one of us. Aside from the privacy and security issues, there is the interesting question of how far we trust these companies not to use that data for underhand purposes. As in this recent Slate article: if people empathise with an intelligent machine, what happens if that machine then manipulates users? It’s not too great a leap to imagine that smartphones of the future may try to build rapport with us, then try and sell us stuff – an advertiser’s dream!

Despite anthropomorphising anything remotely human-like, we are not completely gullible. People, even small babies, know the difference between an outwardly sentient robot and another human. Yet, in limited interaction it can be very easy to fool people, and current research directions involve making interaction more natural and intelligent machines more human-like.

“You can fool all the people some of the time and some of the people all of the time, but you cannot fool all the people all the time” – Abraham Lincoln

It will be a long time before conversational interfaces are human enough to fool us even some of the time. But when they are, the implications will certainly be interesting.

Posted in Technology