The Social Factor of Open Science
Abstract Increasing visibility on the Internet is a key success factor for all stakeholders in the online world. Skyrocketing online marketing budgets of companies, as well as the increasing personal resources that private individuals invest in systematic ''self-marketing'', are a consequence of this. The same holds true for the world of science and knowledge creation: here, too, visibility is a key success factor, and we are currently witnessing the systematic exploitation of online marketing channels by scientists and research institutes. A theoretical basis for this novel interest in science marketing is provided herein by transferring concepts from the non-science online marketing world to the special situation of science marketing. The article also points toward the most promising practical approaches. The theoretical basis is derived from considerations in the field of scale-free networks, in which connectivity, rather than quality alone, is the predominant success factor.
New aspects of Web 2.0, together with those that are already familiar, are about to revolutionize the world of academic publishing. The gradual removal of access barriers to publications, such as logins or unavailability in libraries, will increase transparency and, accordingly, the use of Web 2.0 elements. We can envisage evaluation and recommendation systems for literature based on factors such as relevance and reputation, along lines similar to search engines. Conversely, while it is conceivable that networking systems and search engine technology will become increasingly prone to manipulation, this will at the same time be preventable up to a certain point.
The Transition from Science to Open Science
Admittedly some way behind the consumer sector, the field of science is now beginning to grow accustomed to the Internet. Certain ingrained dogmas that previously stood in the way, such as a general sense of apprehension on the grounds of the Internet being ''unscientific'', are slowly but surely being cast aside in favor of innovative concepts, even though this is proving to be a hesitant process. Just how hesitant becomes clear when we look at the use of Wikipedia, by way of an example. Wikipedia is admittedly not a primary source of information and calls for considerable caution when quoting, but the same caution applies no less to seemingly objective conventional science. This is not meant as a disparaging remark or an insult to the significance of scientific relevance but merely serves to point out comparable dangers: we can no more judge the objectivity of the unofficial editors who contribute to the Social Internet on a voluntary basis than we can assess the source of the funds used to sponsor an academic study. In any event, it is wise to exercise a basic level of caution when employing either for any objective purpose.
Anyone, including scientists, who acquainted himself or herself with the new media early on is already at a great advantage, even now. The earliest pioneers were able to send treatises back and forth far more frequently via email than with the traditional postal system, which considerably shortened editing cycles; and whoever dared to throw caution to the wind and post his or her text on the Internet, despite any fears of data theft, was rewarded with tangibly higher citation rates. The reasons for this are intuitively plausible to anyone who has ever carried out research work him- or herself: we only quote what we find. No matter how brilliant an unavailable text may be, if it is unknown, nobody will cite it. This conclusion can be drawn from the work of de Solla Price (1976), who analyzed cumulative advantage processes using the example of paper citations. By showing that the Science Citation Index only considered 1,573 sources out of 26,000 journals of interest, he calculated a possible total reach of 72 % through access to only the first 6 % of journals. He also found that a longer presence in the archives increased citation rates as a result of higher potential availability. To put de Solla Price's findings in other words, it is not the content alone that leads to a text's subsequent utilization in academic circles but also the extent of its reach. This correlation can also be expressed in the form of a citation conversion equation, by dividing the number of quotes (k) by the number of times a text is read (n):

C = k / n
Factor C is used here to denote the citation conversion rate, a coefficient already familiar from conventional web analysis. The conversion rate refers to the ratio between the number of orders placed with online shops and the overall amount of web traffic, or the number of baskets/shopping carts filled without proceeding to the check-out. It serves as an indication of the quality of the website and, by analogy, of the quality of an academic publication. Although the concept of the quality of academic publications has been around for decades, a reliable evaluation has yet to be realized, because no one so far has managed to track the frequency with which publications are read. State-of-the-art performance assessment of academic articles is largely restricted to the quantitative number of citations; very occasionally, the publication venue and the circulation of the journals and books can, with considerable limitations, provide some clues as to distribution. Acceptance for publication is, however, much more subjective than academia would like to admit. Publication frequently depends on personal contacts, political considerations or the scientific expertise of small editing committees, who may not necessarily recognize the significance of a treatise that is genuinely new from an academic point of view.
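The conversion-rate analogy can be sketched in a few lines of Python. All figures below are invented purely for illustration, and the function name is ours, not an established metric:

```python
def citation_conversion_rate(citations: int, reads: int) -> float:
    """Citation conversion rate C = k / n: citations gained per read."""
    if reads == 0:
        raise ValueError("an unread text cannot convert readers into citations")
    return citations / reads

# Invented figures: an openly accessible paper read 1,000 times and
# cited 20 times, versus a paywalled paper read 100 times and cited 10 times.
open_access = citation_conversion_rate(citations=20, reads=1000)  # C = 0.02
paywalled = citation_conversion_rate(citations=10, reads=100)     # C = 0.10
# The paywalled text converts its readers better, yet the accessible
# text ends up with twice as many citations in absolute terms.
```

This is exactly the asymmetry the chapter describes: a high conversion rate is of little use if hardly anyone reads the text in the first place.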
By contrast, academics who post their publications on the Internet in an open, search-friendly format stand a better chance of being quoted, which, given de Solla Price's findings, creates a self-perpetuating effect with every additional citation. A text that can be accessed is bound to be read more frequently; even where its quality is inferior, it may be quoted less often in relative terms, but more often in absolute terms, owing to its greater number of readers. Since the absolute citation rate is currently one of the key indicators of quality, a citation volume that merely reflects high readership figures leads to a real perception of quality, possibly even when there are better articles on the same topic.
In their own interests, anyone who has grasped this concept is hardly likely to hide the treatises they have written behind the log-ins of a personal homepage, for which there may be a charge, or behind the internal domains of publishing companies' websites. Academic publications are not a mass product, and anyone who wants to earn money on the basis of the print run would be better off with books dealing with sexuality, entertainment or automotive topics, as these subjects regularly attain publication runs of a million copies or more. Insofar as academics ever pursue their field of interest for money, it is earned indirectly through lectures, consultation fees or application products derived from scientific research, to which the publication itself contributes only the legitimizing reputation. Browsing the Internet with the help of Google Scholar, Google Books or any other search engine, while restricting one's search to documents with the ending .pdf, will nowadays turn up a large number of academic publications whose full texts are highly specific and whose perusal is accordingly highly productive for one's own publications. Some publishers and authors even go as far as placing long passages of their books verbatim on online book portals like Amazon, specifically for search purposes, which a good many academics use to include books that might not otherwise have been found for citation. There is undoubtedly a conflict of goals here, and it is proving to be a problem for quite a number of researchers today: the more pages of a book or publication can be read in the public domain, the fewer copies are going to be sold. If we regard the turnover achieved with the product and the number of book sales as the yardstick for measuring success, then this viewpoint is justified. If we go one step further, however, to the level of reach or prominence gained, the number of sales is of entirely secondary importance.
Barabási and Albert provide the next level of scientific explanation for this correlation with their analysis of networks, which can be regarded as the indisputable standard in network research, at least in terms of the number of recorded citations (cf. Barabási and Albert 1999). At the time of writing this article, Barabási had been cited by more than 12,000 other scientists, according to Google Scholar. The article deals with the clustering behavior of nodes in scale-free networks. The designation ''scale-free'' refers to networks whose link distribution lacks a characteristic scale: the distribution looks the same whether enlarged or decreased in scale. Accordingly, a defining feature of scale-free networks is a power-law distribution in the number of links per node. Conventional random networks, by contrast, typically display a bell-shaped curve in their distribution function, according to which most nodes have a similar number of links, while the boundary areas of the bell-shaped curve contain the few nodes that deviate from this mean (cf. Erdős and Rényi 1960).
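The difference between the two distribution types can be made concrete with a toy calculation, assuming a mean degree of 4 for the random network and the exponent γ = 3 of the idealized Barabási–Albert model; the normalization constant of the power law is omitted, since only the orders of magnitude matter here:

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """P(K = k) for a Poisson link distribution (random network)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def power_law(k: int, gamma: float = 3.0) -> float:
    """Unnormalized P(K = k) ~ k^-gamma (scale-free network)."""
    return k ** -gamma

# A hub with 100 links is effectively impossible in a random network
# with mean degree 4, but merely rare in a scale-free one.
print(poisson_pmf(100, mean=4.0))  # on the order of 1e-100
print(power_law(100))              # 1e-6
```

The heavy tail of the power law is what permits the extreme hubs that Barabási and Albert observed; the Poisson bell curve forbids them.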
Examples of conventional random networks include transport or electricity networks, as depicted in Figs. 1 and 2 for the European high-speed railway network. During their investigation, Barabási and Albert (1999) looked at Internet links and initially assumed a random network of the kind introduced by Erdős and Rényi (1960). They were, however, surprised to discover a power-law distribution function rather than a bell-shaped curve. This contained a small number of websites (nodes) which were linked to an extremely large number of other
Fig. 1 The high-speed railway network in Europe. It represents a random network as proposed by Erdős and Rényi. (Source and copyright of the diagram: Akwa, Bernese media and BIL, under a Creative Commons licence)
Fig. 2 The frequency distribution of the links between the nodes of the European high-speed railway network. The density function superimposed on the dots illustrates the underlying Poisson distribution
websites, and an extremely large number of websites to which just a very few other sites pointed. Taking their research further, they came across the same power-law probability mass function in other scale-free networks: well-attached nodes were easier to find and were therefore linked to even more often. They called this phenomenon ''Preferential Attachment'' and succeeded in proving that clustering dynamics of this kind give rise to ever-increasing imbalances in the network over longer periods of time, so that well-attached nodes accumulate further links whereas less well-attached nodes attract even fewer new links. Barabási later coined the very apt designation ''The Rich get Richer Phenomenon'' to describe this effect (cf. Barabási and Bonabeau 2003, p. 64f).
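Preferential attachment is straightforward to simulate. The sketch below grows a network in which each new node attaches to one existing node with probability proportional to that node's current number of links; this is a minimal, one-edge-per-node variant of the Barabási–Albert model, not the authors' original code:

```python
import random

def preferential_attachment(n_nodes: int, seed: int = 42) -> list[int]:
    """Grow a network node by node; each newcomer links to an existing
    node chosen with probability proportional to its degree."""
    rng = random.Random(seed)
    degree = [1, 1]            # start from a single edge between nodes 0 and 1
    # A node appears in this pool once per link it holds, so a uniform
    # draw from the pool is a degree-proportional draw over nodes.
    attachment_pool = [0, 1]
    for new_node in range(2, n_nodes):
        target = rng.choice(attachment_pool)
        degree.append(1)       # the newcomer arrives with its single link
        degree[target] += 1
        attachment_pool.extend([new_node, target])
    return degree

degrees = preferential_attachment(10_000)
degrees.sort(reverse=True)
print(degrees[:5])                          # a few heavily linked hubs
print(sum(1 for d in degrees if d == 1))    # the long tail of single-link nodes
```

Run over ten thousand nodes, the simulation reproduces the rich-get-richer effect: a handful of early, well-attached hubs accumulate a large share of all links, while the majority of nodes never attract a single additional one.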
Applied to science, this investigation confirms the hypothesis proposed at the beginning of this treatise: reach, rather than quality, can determine what comes to be accepted as true, particularly where low visibility makes it less likely that the better essay will be linked at all. Both de Solla Price and Barabási/Albert identified power-law distribution functions in scientific citations; one called the phenomenon ''Cumulative Advantage Processes'', the other ''Preferential Attachment''.
There are nevertheless far-reaching discussions currently in progress in academic circles regarding free access to publications, intellectual property rights and the supposed protection of scientific independence. The conflicting goals of scientists and publishers might be interpreted as the reason behind these discussions. Scientific advancement does not necessarily have to be the main priority of the publishing companies, seeing as they finance an infrastructure and must pay dividends to its owners. Since they themselves have no share in the indirect income raised through the reputation of the scientist concerned, the publishers' main interest must lie in the direct generation of revenue, while intellectual property rights and limited access safeguard their influence. The researcher, by contrast, has an interest in barrier-free accessibility and is therefore trying to break free of the publishers' sphere of influence, even though publishers could offer him or her the benefit of an Open Science network. The fact that discussions of this nature are already in progress could be construed as the friction of a paradigm shift: if science were organized properly and supported by the rigorous use of certain aspects of the social web, it could become more objective, more transparent and consequently more democratic, offering impartial benefits to researchers and research outcomes alike. To ensure that publishers do not come away empty-handed from a turn-around of this kind, they would be well advised to grasp the mechanisms behind this trend, the sooner the better, and to help shape the framework in which science will operate in the future through timely investments in the infrastructure of this Science Web 2.0.
The fact that those who get noticed early on benefit from long-term reach applies to publishers as well: a publishing company's reputation, and accordingly being published within its infrastructure, already attracts the better scientists, and an innovative form of publication will make no difference to the fundamental research results of de Solla Price and Barabási/Albert. The early bird catches the worm, regardless of whether that bird is a publisher or a researcher.