Leashing the sequences
What do dogs have in common with DNA sequences?
Don't worry, this question isn't like "why is a raven like a writing desk?" - there is an answer, hidden away in the depths of a foreign language. Before I tell you the answer, here's a cute picture of a dog to settle your nerves. I imagine it's feeling about as bemused as you are right now...
Wissenschaft!
During my undergraduate degree, I was fortunate enough to spend a year abroad in the biology department at the Technical University of Kaiserslautern (TU KL), Germany. The Erasmus experience was invaluable to me for many reasons, not least the opportunity to improve my German language skills. Now, around the world, there is a permanent debate in scientific environments about what language to use. (This will come up a lot on this blog, by the way.)
English obviously has its place. But I hadn't traveled nearly 1000km just to help my host institution's professors & students to practice their English. So, whenever possible, I asked the locals to speak German with me. After all, in my social life I mostly seemed to be bluffing my way through Spanish.
In fact, it was exactly like the film L'Auberge Espagnole, except the weather wasn't as nice.
Sequenzen!
As it happens, my main interests that year lined up neatly with the strengths of the research groups at the TU KL. (Coincidence? Probably not...) I signed up for as many practical research projects as possible, wanting to get my hands dirty and gain technical skills in the lab. Well, as dirty as you can get them in an RNase-free lab. (That is to say, spotless, and in gloves.)
A lot of the work involved aligning genetic sequences. Genetic sequences are the series of letters which represent the "bases", the basic units that make up DNA (OK, there are even smaller units if you go down to atoms and beyond, but by that point they are no longer specific to DNA). To compare two (or more) sequences, we have to line them up. How can we say what the differences are between these two:
Sequence 1: ATCGATTGC
and
Sequence 2: GATCCATTC
if we don't know where they start or stop?
By aligning the sequences, we give numbers to the positions, and we can say how many positions are the same in both sequences, and how many are different. Sometimes, a position is different because a letter has changed. (This is called a substitution. There are two types: transversion or transition, depending on which letters have changed to which.) Other times, letters have been inserted or removed (insertions or deletions). All these changes are collectively known as "mutations".
In the example above, there have been three changes:
- Sequence 2 has an extra G inserted at the start.
- The first G in sequence 1 has been substituted by a C in sequence 2.
- The second G in sequence 1 has been deleted.
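For the programmers among you, the two types of substitution can be told apart in a few lines of Python. This is only a minimal sketch: the function name classify_substitution is made up for illustration. The rule itself is the standard one: A and G are purines, C and T are pyrimidines, a swap within one group is a transition, and a swap between the groups is a transversion.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(old: str, new: str) -> str:
    """Return 'transition' or 'transversion' for a single-base DNA change.

    Hypothetical helper for illustration only.
    """
    old, new = old.upper(), new.upper()
    bases = PURINES | PYRIMIDINES
    if old not in bases or new not in bases:
        raise ValueError("expected DNA bases A, C, G or T")
    if old == new:
        raise ValueError("no substitution: both bases are the same")
    # Both bases in the same chemical group means a transition.
    same_group = {old, new} <= PURINES or {old, new} <= PYRIMIDINES
    return "transition" if same_group else "transversion"


print(classify_substitution("G", "C"))  # transversion
print(classify_substitution("A", "G"))  # transition
```

So the G-to-C substitution in the list above counts as a transversion.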
So if we look at the two sequences above, we get the following alignment:
Position   | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Sequence 1 | x | A | T | C | G | A | T | T | G | C
Sequence 2 | G | A | T | C | C | A | T | T | x | C
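Once two sequences are lined up like this, reading off the changes is mechanical. Here is a rough sketch in Python, assuming the same convention as the table (an "x" marks a position where one sequence has no letter). The function name describe_alignment is invented for this example, and insertions and deletions are described from the point of view of sequence 2.

```python
GAP = "x"  # same gap marker as in the alignment table


def describe_alignment(seq1: str, seq2: str) -> list[tuple[int, str]]:
    """List the differences between two already-aligned sequences.

    Hypothetical helper: positions are 1-based, as in the table.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    changes = []
    for position, (a, b) in enumerate(zip(seq1, seq2), start=1):
        if a == b:
            continue  # identical column, nothing to report
        if a == GAP:
            changes.append((position, f"insertion of {b} in sequence 2"))
        elif b == GAP:
            changes.append((position, f"deletion of {a} in sequence 2"))
        else:
            changes.append((position, f"substitution {a} -> {b}"))
    return changes


for position, change in describe_alignment("xATCGATTGC", "GATCCATTxC"):
    print(position, change)
# 1 insertion of G in sequence 2
# 5 substitution G -> C
# 9 deletion of G in sequence 2
```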
Of course, you could argue that there are far more than just the three changes. Line the two sequences up letter for letter, with no gaps, and 7 of the 9 positions differ. While that's not impossible, it's worth remembering that mutations are (generally speaking) rare events. So when we align sequences, trying to make a judgement on which positions correspond to each other, we have to be parsimonious. As in all science, we have to think critically. What is more likely: 3 mutations, or 7?
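To put numbers on the parsimony argument, here is a small Python sketch comparing the two readings of the example. Counted strictly, the gap-free, letter-for-letter comparison gives 7 mismatches. The second function computes the edit (Levenshtein) distance, the smallest number of substitutions, insertions and deletions that turns one string into the other. Both function names are my own choices for illustration. Treating every kind of change as equally costly is a simplifying assumption; real alignment tools use more elaborate scoring schemes.

```python
def ungapped_mismatches(a: str, b: str) -> int:
    """Count differing positions when comparing letter for letter, no gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Fewest substitutions, insertions and deletions turning a into b."""
    # previous[j] holds the distance between the prefix of a seen so far
    # and the first j letters of b.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x from a
                current[j - 1] + 1,          # insert y into a
                previous[j - 1] + (x != y),  # keep a match, or substitute
            ))
        previous = current
    return previous[-1]


seq1, seq2 = "ATCGATTGC", "GATCCATTC"
print(ungapped_mismatches(seq1, seq2))  # 7
print(edit_distance(seq1, seq2))        # 3
```

The minimum of 3 matches the one insertion, one substitution and one deletion described above.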
Sequences of the haemagglutinin protein, found in blood, across a number of species. Protein alignments follow similar principles to DNA alignments.
Now, once we have an alignment of two sequences, we can tell how closely related the organisms they came from are. Generally speaking, sequences which are more similar to each other come from more closely related species than very dissimilar sequences do. There are a host of additional complications, which I spent the best part of a year learning about, in German. Learning in a foreign language forces you to listen better, so I think I absorbed a lot more of the information I was given on my year abroad than during my other student years.
Denglisch!
Ein typisches Example von Denglisch und Freutsch.
On one particular day when I was listening closely, I overheard a PhD student talking to the professor (I wasn't eavesdropping, we shared an office). They were having trouble with a particular alignment. (It's almost never as simple as the example above, because real-world sequences are typically thousands of letters long...) The professor was wondering how the alignment had been made; she asked something along the lines of:
Wie hast du das Alignment gemacht? ("How did you make the alignment?")
Now, normally I would flinch at this kind of Denglisch (using English words in otherwise entirely German contexts), although for technical, recently-coined vocabulary I don't mind so much. Besides, what came next was far more jarring. The student replied he had aligned them "so":
Ich habe die Sequenzen so angeleint, dass... [blah blah blah, I forget the details because my ears had latched onto "angeleint"]
Those of you familiar with German will now see the connection with dogs I alluded to above. In German, the verb anleinen (past tense angeleint haben) means "to leash", as in to tie up a dog before a walk, or to a post. You might even say anleinen is a "false friend" for English speakers. What the student would normally have said, in proper Denglisch, is:
Ich habe die Sequenzen so aligned, dass...
The trouble is that there is no single German word for "to align" - it depends on how you are using it. So koordinieren, anpassen, justieren or even (*shudder*) alignieren could be used, but none of them has quite the same meaning as the English term.
So, in answer to my original question, what do dogs have in common with DNA sequences? Now we know, they can both be angeleint, apparently.
Over to you
Thanks to those of you who answered my teaser Tweet yesterday, trying to guess the connection between dogs & DNA sequences. There were some amusing and unexpected results, including:
DNA replication relies on fidelity; fidelity from fido, a popular dogs' name
Quite honestly, I'm loving the level of interaction I'm getting through this blog, both on Twitter and in the comments section. (I mean, just check out those on my last post! I had no idea that idiot and idiom sound so similar because of perceptions of selfishness in Ancient Greece!)
So, in the spirit of keeping the conversation going, here are a couple of questions for you:
- What do you think about the use of Denglisch, or Franglais, or other mish-mash languages, when it comes to science? Should we embrace these new words, or should we seek alternatives in the "true" forms of languages?
- Do you have any other examples of this kind of thing? What are the other "sequence leashes" of the world?





There is a worse use of align / alignment in German from my point of view. In business Denglish it is also used with respect to "align with corporate goals" - so that everybody is "on message". Probably I am biased but I think using English terms that denote something very specific in science is OK - I had once "sputtered" thin films by "laser ablation" because the German terms are artificial and hardly ever used (Laser-Verdampfung?). But saying something like: Wir müssen alle "on message" bleiben und auf das Alignment achten is ridiculous.
Thanks for your comment, Elke.
The nuance between justifiable, technical Denglisch and ridiculous management-speak Denglisch seems subtle - not something I had considered before, thanks for illustrating it here!
If we talk about the mish-mash of languages in science, then what about "gestalt" and "eigenvalue"? Are there proper English words with exactly the same meaning? I think in many cases it is easier to use the scientific term which was first established or used in a key publication (or which is generally used based on a convention). Then however you should either conjugate verbs according to their original language (aligned) or fully incorporate them into your language (alignieren, aligniert), the same rules should apply for every word of another language. I would argue that it is much more parsimonious to simply stick to the grammar of the original language a word comes from. One problem is that not every person who uses this word is familiar with the grammatical rules of its original language... I guess there is no optimal solution, even people in ancient Rome had similar problems with incorporating Greek words!
Especially in science, English is so common that people rarely use their own native language to communicate with each other about work. I even think about work in English and it's hard for me to explain something in German (I'm a native German speaker). This is sad because it makes it more complicated to talk about science with non-scientists - but within the scientific world everyone can understand everyone else and it doesn't matter where you come from and what your native language is. And isn't this the spirit of science?
By the way: I think it is TU KL instead of TUK
Thanks for the thoughtful comment, Meike - you've touched on several issues I'm going to be exploring in this blog!
I'll fix the TUK/TU KL thing now.
I work with several Croatians, and hearing them discuss work topics in Croatian is always amusing because the otherwise-incomprehensible conversation is scattered with words like "transcriptome", "RNA-seq", etc. I think some of these concepts are so recent and specialised that there aren't terms for them in languages other than the language of publication (i.e. English).
Countries where the local language is spoken in the lab, rather than English, probably come up with these terms more quickly - is that the case where you were in Germany?
Thanks for your comment Liz. I've heard little bits of Croatian here and there along my way, but never had a chance to learn it properly so far.
In the lab in Germany there were some words in English (like Alignment), whereas others had German equivalents (like Sequenzen). It was also tricky to learn species names - "common" or everyday species often had names in German (or even Pfälzisch, the local dialect), but more obscure kinds were given their Latin monikers.
It's also interesting to see the different uses of initialisms/acronyms in other languages. In France (at school at least, I can't speak for research labs), we always called DNA & RNA ADN & ARN, for acide (désoxy)ribonucléique, but terms less commonly used outside of science would even have a French name and an English initialism! For example, luteinizing hormone was called hormone lutéinisante, but shortened to LH (and pronounced "el-ash").
OK, in Serbian, DNA/RNA are DNK/RNK because 'acid' is 'kiselina'.
Also, 'ravno' means 'straight' in Serbian but 'right' in Slovakian ('pravo' means straight in both languages), which was really bad one day many years ago, when we were trying to give directions to a driver in Bratislava! We kept irritating him by telling him to go straight then no, no, no, we meant right!
Thanks for your comment, Bora!
Your "right"/"straight" confusion reminds me of directions in French, where tout droit (silent "t") means "straight on", but à droite (pronounce the "t") means "to the right".
For my part, I've not come across Serbian very much (maybe 1 friend on Facebook), though I had a friend at school who spoke Slovakian.
[...] Science spans countries and languages. When different scientists from different countries converse in different languages, you’re bound to hit a few walls. Last week, new blogger Alex Brown, realised that dogs and DNA sequences have something in common while not eavesdropping on his lab supervisors’ conversations. And it was hilarious. [...]