Hermann Maurer on some aspects of Informatics close to his heart#
My life with Informatics, and what I have learnt from it#
In this short essay I want to describe my CV from an unusual angle: what have I learnt (or think that I have learnt) about Informatics along the way.I started to study mathematics at the University of Vienna in 1959. My favourite subject and later thesis topic was in number theory. Number theory was considered by most (including myself) as something very beautiful, yet also fairly useless, like abstract paintings.
Even those who have never worked in this area, I guess you know that a number n is prime if it has only the trivial factors 1 and n. That there are infinitely many primes has such a beautiful and simple proof (known since ancient times) that I may as well repeat it: suppose there are only finitely many primes and let P be the largest one. Consider Q=P!+1 , where ! denotes the factorial as usual. Now Q>P cannot be a prime. So consider the smallest nontrivial factor x of Q. Clearly x>P, since x<=P would imply that x is factor of P! and therefore cannot be factor of Q. Thus, P<x<Q; x cannot be a prime, so it must have a nontrivial factor y <x. But then y is a factor of Q, hence x is not the smallest factor of Q, a contradiction.
Of course you know that the sum of all 1/i (i all natural numbers) is infinite. One can also show that the sum of all 1/p (p all primes) is infinite. However, surprise, surprise, the sum of all 1/t, where t is a prime-twin is finite (indeed known to be <3). (Note: t is a prime-twin if either t+2 or t-2 is also a prime, e.g. 9,11 or 17,19 are prime twins). Despite this fact, it is still an open question whether infinitely prime twins exist. Some prime twins are very large: the largest known prime-twin has more than 100.000 decimal places. It was discovered in 2009 and is 65 516 468 355 * 2**333333 +/-1, where ** means „to the power of“.
Now why do I bother to mention such things? Because they look like fun, but also seem to be useless. Well, to not just my surprise, prime numbers, factoring of large numbers etc. started to be exceedingly important when Diffie and Hellmann introduced public key cryptography, and when the first working “asymmetric encryption algorithm” was based on important results in number theory! A good part of the interest in Quantum computers is based on the fact that according to Shor’s quantum algorithm if it is possible to build big quantum computers (a question still open) then factorization of large numbers becomes easy: this would endanger most encryption algorithms in use today! (See the recent paper by D.Bacon and W.van Dam in C.ACM vol.52, no.2 (February 2010), pp.84-93 for a good survey of new results on quantum computers).
Anyway, this development of number theory from an esoteric discipline to something of much practical interest was Lesson 1 for me: Some theory that may look like a useless undertaking can prove to be very important at some later stage. Let me follow this up with two more facts that I believe in because of my career in Informatics. Lesson 2: Students who are good in theory are usually also good in other task, but not conversely (my test-bed: 35.000 students, over 500 M.Sc. thesis’ and some 50 Ph.D.s). Lesson 3: Some of the most important break-throughs in Informatics are due to deep theoretical and mathematical results. Examples that emerged over the last years are areas such as e.g. compression techniques for pictures, video and audio. The mp3 encoding that you are using for music is due to a major theoretical breakthrough, just to mention one example.
During my study of mathematics in Vienna I started to attend lectures on computers as they were emerging, nothing solid and coherent yet, but with the smell of something in the making, not just in electrical engineering!
At some stage I was lucky to run into one of the leading Canadian researcherss in Informatics of those days: John Peck from Calgary. He was member of the IFIP 2.1 working group that had developed Algol 60, and started to work on a successor to Algol 60. Peck invited me as graduate assistant to Calgary, and I was (a) thrown into deep water by teaching my first Informatics classes in fall 1962 and (b) being confronted with Lesson 4: Even powerful programming languages can have simple and unambiguous specifications. Surely Algol 60 is the superb example of a very powerful language whose syntax, i.e. what constitutes a valid program, is clearly specified by what theoreticians would call a context-free grammar. The technique had already been used for the definition of FORTRAN by Backus and Naur, leading to the terminology BNF (Backus-Naur-Form). For me it was clear (through Peck’s influence) that it is not sufficient to describe what a valid program looks like, but also what it does (its semantics). As much as I believed in this principle no good way of specifying the semantics of programming languages existed at that point (in 1963). The attempt to model part of it in Algol 68 was brave, but eventually a failure. Yet it did implant the idea into my mind that the right way to program was not to specify an algorithm, but to specify what the starting situation is, and what outcome is wanted, but have the algorithm deduced automatically form the specification of the problem.
Again, by pure luck, I ended up after a year in Calgary with an extended seven months summer job with the Government of Sakskatchewan where I was an apprentice in a large undertaking: to implement the informational underpinning for general medicare in that province. Some small parts of the task I was to write in RPG, Report Program Generator. This system had a limited range of applications. However, where it did fit, you just specified inputs and the relationships to the desired outputs, the rest was done automatically. In that sense RPG was probably the earliest specification language.
When returning to Vienna to finish my Ph.D. I happened to bump into a friend who worked at the Vienna IBM Research Lab. He suggested I join. This Lab had been founded by IBM due to the enormous achievement of Heinz Zemanek, who had managed to build the first transistorized computer in Europe. It was on this machine on which intellectual giants like Peter Lucas developed the first (or at least one of the first) full Algol 60 compilers. And where a bit later the work on the formal definition of PL/I started: the language PL/I was messy, redundant, stupid in many ways, so to try to specify all functionality in a rigorous form seemed beyond human power for me. I started to concentrate more on the basics, the study of formal languages and automata, and took on a position in Calgary in 1966. If not for the presence of John Peck in Calgary who continued to be a guiding light and first contacts with some of the then heroes in formal languages like Ginsburg, Ullian, Greibach and Salomaa, this decision might have been a serious mistake: the team in Vienna, basically consisting of Lucas, Bekic and Walk. managed an incredible job: they ended up with the Vienna Definition Language, that today has evolved into Formal Methods Europe, a technique that allows to exactly specify even very large software-undertakings.
Lesson 5: Formal languages, formal specifications, and formal models still remain at the heart of Software Engineering. Thus, when I was appointed full professor for Informatics in Karlsruhe, Germany, in 1971, I put into the name of the institute I founded formal specifications and applications, and tried to combine the two aspects as much as possible. The Institute in Karlsruhe has not changed its name from Institute fuer Angewandte Informatik und Formale Beschreibungsverfahren since then; I believe Lesson 6 is that this area remains as important as ever.
During my years in Karlsruhe my interest started to slowly shift from just formal description methods to the development of efficient algorithms and data-structures. I took this emphasis with me when accepting the founding chair of Informatics in Graz, Austria.
I also learned in Karlsruhe Lesson 7: Make sure to collaborate with top people in the field. The list of great researchers from all over who I had the joy to host and work with, first in Karlsruhe and then in Graz, is too long to mention here, but let me mention the first, Seymour Ginsburg, then the one who guided and influenced me most, Arto Salomaa, and my good friends Derik Wood and Karel Culik II, and later geniuses like Grzegorz Rozenberg, John Bentley or John Preparata. But a look at my publication list between say 1975 and 1990 will show to you what I mean immediately.
Confronted with the job to build up a substantial Informatics group in Graz I believe I also stuck to Lesson 8: Make sure to identify clever young people already in their first years and start giving them challenging tasks very soon. I guess that early students who turned into leaders world wide in their field like Thomas Ottmann (Freiburg), Jürgen Albert (Wuerzburg), Hans-Peter Kriegel (Munich), Herbert Edelsbrunner (Duke/IST-A), Emo Welzl (ETH Zurich), Dieter Fellner (Fraunhofer Darmstadt), etc., did prove me right.
Lesson 9 is equally important: Hire the best you can, don’t be afraid of colleagues who are more clever than yourself. Thus, I am proud that I had the chance to hire a Reinhard Posch, a Wolfgang Maass, a Franz Leberl, etc.
I think the next Lesson 10 does not apply to all, but did apply to me and the situation in Graz: Look for industry-support! When the importance of networks started to become clearer and clearer it was Posch and me who built one of the first microcomputer networks in the world, based on a simple colour-graphic computer whose hardware was developed by Posch and his group, while some of the software was the doing of my group.
The intelligent videotext network which in many ways was a precursor of the WWW, and more clever than the WWW is even today, was a big stimulus and gave us much industry backing, including the possibility to found a number of start-ups (I am counting about 40 by now, but I guess I am missing some). It also allowed us to branch out into new areas such as computer graphics, computer security, and myself into applications of networked multimedia, applications to learning, to digital libraries (www.jucs.org), digital encyclopaedias (www.aeiou.at or www.austria-forum.org) , to museums, to doing the server on which you are currently reading this, etc.
A early patent on a fast-access storage device in 1982 also caused some ripples. It is shown below combined with a videtotex system as mentioned above.
We are being confronted with many new challenges all the time. Let me state the major challenges that I see as Lesson 11: there are four streams that will influence the future development in Informatics most. The first three have been formulated very often, so it is almost boring to do so: (a) the confluence of mobile devices, networks and computers (ubiqitous computing); (b) the combination of intelligent algorithms with the wisdom of communities; (c) the need for more powerful techniques in computer graphics/ computer visions; and (d), and this is the one I feel is the most important one, yet has not received the attention it should, is the importance of new similarity detection techniques.
I feel I have to explain this a bit. Similarity detection applies to all kinds of data and comes in many flavours.
The most obvious application is similarity detection of textual documents, or parts of them. This is important to be able to discover earlier applicable research, to find relevant patents, to detect plagiarized material, to cluster documents dealing with the same subject, to automatically create links in digital libraries, to detect experts and expertise in a certain area,
and much more.
Not surprising, similarity detection plays also a similarly important role when pictures, music, or videos are at issue. In this and the former case questions of metadata, ontologies, and tags to ease the determination of similarities are areas wide open for further research.
However, one of the most important areas is finding similarities or clusters in high dimensional large data-sets. There is almost no area of human endeavour that would not profit from much better techniques. Rather than giving you abstract examples let me give you two (over-simplified) examples from completely different fields.
Suppose we have collected ECG of hundreds of thousands of persons world wide between 1995 and 2005, call this set of ECGs X. Now let us re-examine as many of those persons to find out who had cardiac difficulties between 2005 and 2010: let the set of all those people be Y. With good clustering techniques it may be possible to find out a set of parameters of members in X that differs from parameters of members of Y-X. The implications of this would be enormous: if a person goes to a physician and the ECG is made, it can be compared with the parameters defining X. If p is detected to match those parameters, a warning “you have to change your style of life or you will have cardiac related difficulties” is in place, otherwise p can walk away relieved that the likelihood of some heart related disease is small over the next few years.
Consider an environmental example. Suppose a number of villages in the Himalayas have been destroyed by mud avalanches (caused by: rise of the permafrost zone, de-forestation, overbuilding, etc.): are there parameters characteristic for those villages? If yes, apply them to other villages allowing you to predict that likelihood of mud avalanches threatening some of those other villages.
The list of applications is huge. It is a fact (known e.g. from BCI interfaces) that often data-sets that do not seem to have any similarity do have some similarity, if one looks at them the right way. I thus consider it one of the main research challenges to find new and more and ever more powerful techniques for similarity recognition / clustering in large data-sets. Note that some of the data-mining going on in the WWW covers small aspects of this.
Let me conclude this with a Lesson 12: There are many attempts to try to improve teaching (knowledge distribution) by using new multi-media technology. This may be an important area, although “e-Learning” has had many cycles of hype in the past. Thus, we should not expect miracles in this area. However, the most important fact is often overlooked: maybe it is not so important HOW we teach, but rather WHAT we teach and WHEN we teach it. After all, we are more and more “externalizing” knowledge to the computer: we do not know to calculate any more, we leave spelling to spell checkers, finding our way to GPS routing systems, instead of learning facts we google for them in often unreliable sources, etc. Since Informatics is responsible for this new phenomenon of a growing symbiosis of humans and computers, it is also the responsibility of Informatics to investigate, how far this symbiosis should go, and what it means in terms of what we have to teach or learn in the future. It seems to me that this responsibility is taken much too lightly by today’s Informatics researchers.
P.S.: I have NOT tried to write an objective report on important areas of Informatics, but rather presented some rather personal views. I hope other members of the Informatics section will do the same so that at the end we have a collection of statements whose diversity will show one thing for sure: that Informatics will influence how we live and work much more dramatically than most persons think. The Informatics revolution has not happened yet, it is just in its very infancy.
Back to the Informatics Section