Translation quality and errors
Yes, our system makes mistakes. Not only that, it sometimes generates ridiculous translations that have little or nothing to do with your input sentence. Other translation systems, especially the ones using neural networks, behave the same way (e.g. try translating "Paneb puusse." into English; the correct translation is "screws up"). On the other hand, the system often works amazingly well (e.g. try "It reports remarkable achievements in tackling neglected tropical diseases since 2007.").
The reason for this behaviour is that neural MT systems primarily learn to generate output text, with only the option of looking at the input text. Whenever the input is unexpected to the system in any way, or the stars have apparently aligned the wrong way, it can generate text that is at best very weakly (at worst not at all) related to the input sentence. We are looking for ways to improve this.
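To make this concrete, here is a toy sketch of that failure mode. Everything in it (the vocabulary, the probabilities, the `familiarity` knob) is invented for illustration and has nothing to do with our actual model; real systems learn these distributions with neural networks. The point is only that the decoder mixes a fluent target-language prior with a source-conditioned signal, and when the source looks unfamiliar the signal flattens out and the prior takes over.

```python
# Toy model of "decoder as a language model that may consult the source".
# All words and numbers here are invented for illustration only.

VOCAB = ["the", "report", "shows", "good", "results", "banana", "tree", "."]

# Fluent target-side prior P(next | previous); input-blind by construction.
LM_PRIOR = {
    None:      {"the": 0.9, "banana": 0.1},
    "the":     {"report": 0.8, "tree": 0.2},
    "report":  {"shows": 1.0},
    "shows":   {"good": 0.9, "results": 0.1},
    "good":    {"results": 1.0},
    "results": {".": 1.0},
    "banana":  {"tree": 1.0},
    "tree":    {".": 1.0},
}

def source_signal(source_words, familiarity):
    """Source-conditioned preference over the target vocabulary.
    The less familiar the input, the closer this gets to uniform,
    i.e. the less it constrains the decoder."""
    uniform = 1.0 / len(VOCAB)
    pointed = {w: (1.0 if w in source_words else 0.01) for w in VOCAB}
    total = sum(pointed.values())
    return {w: familiarity * (pointed[w] / total) + (1.0 - familiarity) * uniform
            for w in VOCAB}

def decode(source_words, familiarity, max_len=8):
    out, prev = [], None
    for _ in range(max_len):
        prior = LM_PRIOR.get(prev, {".": 1.0})
        signal = source_signal(source_words, familiarity)
        # Each step mixes fluency (prior) with source relevance (signal).
        scores = {w: prior.get(w, 1e-6) * signal[w] for w in VOCAB}
        prev = max(scores, key=scores.get)
        out.append(prev)
        if prev == ".":
            break
    return " ".join(out)

# Familiar input: the source signal steers the fluent prior correctly.
print(decode({"banana", "tree"}, familiarity=0.9))    # -> banana tree .
# Unfamiliar input: the signal is flat, so the prior generates fluent
# text that has nothing to do with the source (a "hallucination").
print(decode({"paneb", "puusse"}, familiarity=0.05))  # -> the report shows good results .
```

Run it and the familiar input decodes to text about its source, while the unfamiliar one decodes to a perfectly fluent sentence that ignores the source entirely.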
Despite the errors it makes, the system can still be used, for example, for post-editing (manually fixing the automatic translation). This works especially well in strict or limited text domains like legal texts, technical manuals and subtitles, where post-editing can be 20-30% faster than translating manually from scratch.
As for overall quality, here are the general statistics collected from your clicks on the best translations under the "Play" button:
[Chart: share of best-translation clicks for our system, Google and Tilde over time]
Well, bummer. In the beginning we were rated as good as Google and ahead of Tilde, but not anymore. We are working on it!
Of course this is very general and does not give any details. Here are the same statistics broken down by input sentence length:
[Chart: the same click statistics split by input sentence length]
As you can see, Google really beats the other systems on very short sentences, Tilde is better on long sentences, while our system does best on medium-length sentences.
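For the curious, length-bucketed numbers like these could be computed from a click log roughly as follows; the record format, the bucket boundaries and the sample data are all hypothetical, made up just to show the aggregation.

```python
# Hypothetical sketch of aggregating "best translation" clicks per system,
# bucketed by input sentence length (fields, buckets and data are invented).
from collections import Counter

clicks = [
    # (system the user clicked as best, input length in words)
    ("google", 3), ("ours", 12), ("tilde", 34), ("ours", 15),
    ("google", 4), ("tilde", 40), ("ours", 11), ("google", 2),
]

def bucket(length):
    if length <= 5:
        return "short (<=5 words)"
    if length <= 20:
        return "medium (6-20 words)"
    return "long (>20 words)"

# Count wins per (bucket, system) and totals per bucket to get win rates.
wins = Counter((bucket(n), system) for system, n in clicks)
totals = Counter(bucket(n) for _, n in clicks)

for (b, system), count in sorted(wins.items()):
    print(f"{b:22s} {system:7s} {count / totals[b]:.0%}")
```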