Minds in the Making, Xiao Xiao theremin

Xiao Xiao: Modular Thinking and The Melody of Language

This interview came about by accident. A colleague, Anna Gargarian, curator and co-founder of the InSitu art agency, was looking for recommendations on Facebook:

“Dear musician friends in Armenia. Do any of you know where to rent a Theremin in Armenia? Amazing artist Xiao Xiao finds herself (happily) stuck in Armenia while en route to Paris from Siberia, and is in need of an instrument to play! Any advice?”

Wow! “What a story Mark”©, I thought. I checked her profile and was impressed: a technologist and musician from MIT. I PMed Xiao and asked for an interview, and we started to chat. She and her husband had left Paris and gone to Lake Baikal (Russia). They were supposed to return to Paris, but the first wave of the pandemic took Europe by storm. So they decided to wait in Armenia, which was not a red zone. Plus, they had heard great things about the food and nature.

So here we are. Beijing-born, New Orleans-raised, Course 6 @MIT, @MediaLab Ph.D., post-doctoral researcher at Université Sorbonne Nouvelle, Paris, technologist, pianist, and thereminist Xiao Xiao.


Our ordinary conceptual system, in terms of which we both think and act, is fundamentally metaphorical in nature.

George Lakoff and Mark Johnson, “Metaphors We Live By”.


I finished my Ph.D. at the MIT Media Lab in 2016. Afterward, I was working on interactive installations for museums and for private clients. I just started a post-doc with a language learning lab in Paris, to develop a new way of learning the prosody of foreign languages with the help of a vocal synthesizer. Prosody is the musical dimension of speech, including the rhythm, the melodic contour, and accent patterns.

Why did you decide to study at the MIT Media Lab?

I studied Computer Science as an undergrad, and I started working at the Media Lab while I was still an undergraduate student. I was interested in art and design and wanted a way to combine that with my technical studies.

Are there any role models who motivated and inspired you?

I’m really inspired by the work of Marvin Minsky. I was lucky enough to have gotten to know Marvin when I was at the Media Lab. He’s most famous as one of the founding fathers of artificial intelligence, but what’s less known about him is his musical side. He taught himself how to improvise fugues in the style of Bach as a way to understand the workings of the mind. I wrote an essay about this a few years ago. My attempt to teach myself to play the theremin is directly inspired by Marvin’s musical explorations.

Another big inspiration is my mentor Donal Fox, who is a composer/pianist/improviser. I met Donal when he was a visiting artist at MIT and ended up taking lessons with him for about a decade. Much of my Ph.D. work at the Media Lab was directly inspired by conversations with him. Even though my goal was not to become a professional pianist, what I learned about learning in the context of the piano was a treasure trove for work in other domains, both for problem-solving and for creative intention.

What is your main focus nowadays?

Broadly, I’m interested in learning, because in this day and age one of the most important skills that anyone can have is the ability to learn new things quickly and effectively (and enjoyably). More specifically, I’m interested in higher-level strategies of learning across different subjects, and what artistic disciplines can teach us about better ways to learn. Right now, I just started a post-doc in language learning, focusing on the acquisition of intonation for non-native speakers. A lot of what I’m thinking about draws directly from my experiences in music.

Live coding teaches the artist to be aware of the process, showing people what the artist is doing. It might sound like a machine, it might sound robotic, but at its core it’s human. Do you follow any specific pattern during your live performances, and is it any different from the algorithms that machines would otherwise run without you?

For me, there’s the Art that expresses an idea or creates an experience, and then there’s my personal Art. The first is in the form of interactive installations, starting from my work at the Media Lab. These all make use of technology. I use technology because it enables interactivity.

The narrative and language of the medium have evolved quite a bit over the past decades. For many artists, “performer” is a common term to use. Can you describe which art form best describes what you do, and where the line is at which technology transforms into art?

On the personal side, Art is exactly a reflection of the process, as you said. I draw and paint a lot and the act of laying gestures to paper is an opportunity to think about the making of the artwork. Same with practicing music.

Can you share your ideas on AI-based art? Who is the author? Or rather, is the algorithm a subject and not an object/tool?

I really like Scott McCloud’s framework from Understanding Comics (picture below). I think that algorithms can play a role in different layers of the apple. For example, on the surface level, they can be used to generate textures for slick-looking graphics. On the other end of the spectrum, I think that algorithmic art is often a meditation on the nature of art, and also on the nature of human perception.


Scott McCloud's framework from Understanding Comics

Please elaborate a little on AI authorship. Where is that thin line where you suddenly lose authorship and turn into a co-creator, a mediator of technology?

Ross Goodwin’s 1 the Road project is quite interesting. Ross drove a car from New York to New Orleans, and the AI generated Kerouac-style poetry based on location data and sensors. The finished book was apparently published, with the AI as the author. In this case, I think that the artist plays a mediating role.

I’d like you to talk a little about the theremin. Why did you choose it over the piano?

I started the theremin as an experiment to better understand learning. It’s a relatively young instrument and is often considered diabolically difficult to play. I wanted to show that it’s not as difficult as people think if you approach it with the right mindset and the right learning strategies. Of course, getting to the level of a top master still takes many years of practice, but I wanted to show that you can reach a point where you can start to have fun and make music much earlier than most people think.

Part of the learning strategy involves adapting ways of thinking from other domains. For me, those other domains are computer programming, the piano, drawing, and language learning.
Of course, it helped a lot to have some fundamentals in music from many years of classical piano lessons. For instance, knowing what intervals are, what scales and arpeggios are, knowing how to read music, having some melodies in my head already.

So what is the common ground for computer programming, piano, drawing, and language learning?

Let me first talk about these things one at a time. First, computer programming – how to break down something big and complex into smaller, simpler parts, and how to approach knowledge in a modular way and always try to reuse what you already know. And my favorite: how to debug.

Oftentimes in music, you have this “no pain, no gain” mentality where you are told to just practice, practice, practice, and if you work hard enough maybe one day you will get results. Of course, it’s important to practice, but it’s also important to debug when you have a problem. When something is difficult for you, you try to figure out the root of the difficulty and design ways to fix it. It involves a lot of creative thinking, actually.

So let’s say I’m playing a piece and it just doesn’t sound good. One approach is just to be like “I need more practice” and spend hours repeating that piece, but sometimes that doesn’t help, and sometimes it actually makes things worse because it makes whatever problem you had even more ingrained.

First, I try to isolate exactly where the problem is. Let’s say there is this one measure that consistently sounds out of tune. Maybe there are several measures that are out of tune, but I’ll focus on one at a time and think about why it’s out of tune. So it could be that it’s an interval that I don’t know as well. I try to sing the interval to see if I understand it in my ears. If I can’t even sing it, it means I need to work on my auditory notion of that interval.
If I can sing it just fine, it’s probably due to something in my technique. Maybe the gesture that I am using is awkward. I try to isolate the gesture and see if I can improve it.
Once it starts to sound better, I add a little more around it.

So basically you turn “method” into “equation” and try to debug it, that is, solve it. You locate the problem and solve it like a software engineer.

To me, this is very similar to what I do when a piece of code isn’t working right. Isolate where the problem is. Come up with hypotheses about the root cause. Implement a solution, and then test it to make sure it actually works.

For language learning, there’s an element of debugging also, but my takeaways there toward the theremin are more along the lines of modular thinking and inspiration for expression.

Regarding modular thinking – a language is made up of words, phrases, grammatical constructs that get reused in different circumstances. It’s also a set of sounds that get reused. Each sound is a gesture that your vocal apparatus learns to make.
This might sound a bit abstract, but reminding myself that playing the theremin consists of knowing a toolbox of gestures has been helpful. Modular thinking is also important in computer programming. It’s good to write modular code so that you can reuse things easily in different projects. Regarding “inspiration for the expression” – this is more about the musicality of language. For my current research, I am studying the prosody of different languages.


Can we describe it as syntactic coding?

For every language that you speak, you never speak in…terms…of…in…di…vi…dual…syl…la…bles. The syllables are grouped as words. Words are grouped as phrases. Phrases form sentences. So there’s a hierarchical structure once again.
And you hear it in the rhythm and the melody of any language.
And actually, if you ignore the words of a language, the changes in melody form contours that are really well suited for the theremin.
I think playing the theremin has helped my ears improve their ability to hear sliding pitches, which is useful for my research. And sometimes when I open a door and it creaks, I hear a melody.

The AI-based jukebox can also imitate the style, the pattern of a specific composer.

I think it’s an interesting mirror and it’s a good way to reflect on what “style” means.

As a researcher who deals with interconnected disciplines, how would you evaluate the impact of digitalization of culture in general?

On the one hand, of course, digitization makes certain parts of life a lot easier, but on the other hand, a lot of activities that were a lot more physical are now pushed behind glass screens with very limited modes of interaction. My Ph.D. advisor Hiroshi Ishii used to talk about the fight against the “pixel empire,” which isn’t about going back to a past where we didn’t have digital technologies but is more about how to make interactions with the digital more tangible, more human friendly. I think I’m very much influenced by this line of thinking.

What does “human friendly” mean?

The current interfaces with computers (keyboard, mouse, screens) use only a limited subset of human sensory-motor capabilities. People often get into awkward postures that are unhealthy in the long term. Whereas when interacting with the physical world, people use a lot more of their senses and physical capabilities.

That’s especially true in nature, and I think a lot of people feel good in nature. In contrast, people often suffer from back pain, hand pain, or headaches after using computers for too long.

Technology is faster than evolution, and our bodies, motor skills, and psyche are slow to adapt. But I believe that is true only for early adopters. Virtual reality, on the other hand, provides a huge testing ground to shape those skills.

That’s an interesting point. Our bodies have evolved over millions of years, and some people argue that it will take longer than a couple of generations for us to get adapted to new technologies. But I feel like the brain is more plastic than many people assume. We can get used to a lot of things faster than people think.

How do you see the evolution of language (NLP) and more sophisticated ways of communication, like neurosensory contact? How will that impact our society?

I think that science is figuring out more and more of the mechanisms of the brain, and it’s super exciting. Over 10 years ago, researchers already enabled a monkey to control a robotic arm with a brain implant. Regarding the impact on our society, it’s really hard to say. The pioneers of the personal computer in the 1950s, 1960s, and 1970s had a beautiful vision of computers enhancing human knowledge. To a certain extent what we have today goes beyond what they dreamed of, but it’s also far less utopian than what they had imagined. There’s a film coming out that explores these topics. I feel like with even more advanced future technologies, it will be a similar story. As much as we designers like to imagine beautiful, optimistic futures, it’s also important to think about the flip side of how technology can impact society.

Another point is about inequality. Cutting-edge technologies require special resources to develop, and of course in the beginning they are not for everyone. But as any technology begins to be adopted in society, it starts to create bubbles of privilege. We see this for communication technologies especially. One question has to do with who has access and who doesn’t. For instance, people who have access to the internet, no matter how remote they are, have access to a lot more free knowledge that can potentially help them. Another question has to do with different “lanes” of access. Facebook is available to everyone, but its algorithms inevitably separate people into their own bubbles, with limited awareness of what is going on outside.

Honestly, after spending 4 months in the beautiful nature of Armenia, I’m more enchanted with mountains and plants than computing technology. And I think a bigger challenge for our generation isn’t to build the next computational marvel but to figure out how to save our planet.

Do Androids Dream of Electric Sheep?

If they are built to emulate real humans, I think they would dream of normal sheep. But then again, I’ve never dreamed of sheep. Have you?

Minds in the Making Xiao Xiao Panopticon Interview

For more info visit: