Parallels to Search Engines
Expressed in simplified terms, a search engine consists of a crawler and an indexer. The crawler visits websites, records their contents and proceeds to the next website via the links it finds. A so-called scheduler determines the order in which subsequent sites are visited by arranging the links according to priority. The indexer organizes the recorded contents and assigns them a priority for display in the result lists. The precise technical details are of secondary importance here, but a general outline is useful for drawing the analogy. In short, a search engine arranges results according to their relevance and reputation. It is this principle that will be crucial for a suggestion system in Open Science, too; such a system will likewise need a crawler, an indexer and a scheduler, which reveals numerous parallels between search engines and science on the social web.
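To make this division of labour concrete, the following is a minimal sketch in Python of the crawler, scheduler and indexer roles. The miniature set of pages, the link structure and the priority function are invented placeholders; a real engine would fetch pages over HTTP and persist its index.

```python
import heapq
from collections import defaultdict

# Hypothetical miniature "web": URL -> (page text, outgoing links).
PAGES = {
    "a.example": ("open science reputation metrics", ["b.example", "c.example"]),
    "b.example": ("search engine ranking and relevance", ["c.example"]),
    "c.example": ("citation networks in open science", ["a.example"]),
}

def crawl(seed, priority):
    """Crawler records contents; the scheduler (priority queue) decides what to visit next."""
    queue = [(-priority(seed), seed)]          # max-heap via negated priority
    seen, documents = set(), {}
    while queue:
        _, url = heapq.heappop(queue)
        if url in seen or url not in PAGES:
            continue
        seen.add(url)
        text, links = PAGES[url]
        documents[url] = text                  # record the page contents
        for link in links:                     # enqueue outgoing links by priority
            heapq.heappush(queue, (-priority(link), link))
    return documents

def build_index(documents):
    """Indexer: inverted index mapping each term to the URLs that contain it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for term in text.split():
            index[term].add(url)
    return index

# Prioritize pages with many outgoing links (an arbitrary stand-in for importance).
docs = crawl("a.example", priority=lambda url: len(PAGES.get(url, ("", []))[1]))
print(sorted(build_index(docs)["science"]))    # pages whose text mentions "science"
```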
The background is that, even when contents are equally relevant, one or more additional coefficients are needed to determine which results appear at the top of the list. These might be dimensions such as the frequency of citation, the number of favorable comments, or perhaps comments posted by other highly rated scientists. These dimensions may vary and, in the interests of maintaining a high standard of output, be subject to dynamic change. Because top-ranked, and thus implicitly more relevant, results are very likely to be cited more often, scientists will strive to optimize their own placement.
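The sketch below illustrates how such coefficients might break ties between equally relevant results. The coefficients themselves (citations, favorable comments, comments by highly rated scientists) come from the description above, but the field names, weights and scoring formula are assumptions for illustration, not a prescribed model.

```python
from dataclasses import dataclass

@dataclass
class Article:
    title: str
    relevance: float          # topical match with the query, in [0, 1]
    citations: int            # how often the article is cited
    positive_comments: int    # favorable comments from readers
    expert_comments: int      # comments left by highly rated scientists

def reputation(a: Article, weights=(0.5, 0.3, 0.2)) -> float:
    """Weighted reputation score; the weights could be tuned dynamically."""
    w_cit, w_pos, w_exp = weights
    return w_cit * a.citations + w_pos * a.positive_comments + w_exp * a.expert_comments

def rank(articles):
    # Primary key: relevance; secondary key: reputation breaks ties
    # between equally relevant results.
    return sorted(articles, key=lambda a: (a.relevance, reputation(a)), reverse=True)

results = rank([
    Article("A", relevance=0.9, citations=12, positive_comments=3, expert_comments=1),
    Article("B", relevance=0.9, citations=4,  positive_comments=9, expert_comments=0),
])
print([a.title for a in results])   # -> ['A', 'B'] with the default weights
```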
Similar Problems to Those of Search Engines
The history of search engines is dotted with attempts to influence this process, initially through the frequent repetition of keywords taken from the body of the text. Since this was easy for an author to manipulate, an assessment based primarily on this factor was fairly meaningless. For this reason, external criteria such as the number of links from other websites were added, but these too were easily influenced by means of self-built link networks. Over the past 15 years we have observed a kind of cat-and-mouse game between search engines and so-called search engine optimizers (SEOs). The SEOs began by inserting large numbers of keywords into their websites, which led the search engines to introduce a kind of maximum quota: everything above that quota was classified as spam, and greater weight was given to the number of incoming links. The SEOs then built their own website structures that pointed to the target sites to be optimized. Search engines consequently began to evaluate the number of distinct IP addresses as well, so the SEOs retaliated by setting up different servers whose sites linked to the target sites in star- or ring-shaped patterns. Many more examples could be added to this list. Similar developments are to be expected in the scientific sphere, particularly as the building of citation networks is nothing unusual even in traditional academia. The problem that needs to be solved is how to prevent cartels of this kind, and it is essential to learn as much as possible from past experience with search engine optimization.
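As an illustration of the "maximum quota" idea, the sketch below flags a text whose keyword density exceeds a threshold as likely keyword stuffing. The threshold value and the sample texts are invented; real spam classifiers combine many more signals.

```python
from collections import Counter

def keyword_density(text: str, keyword: str) -> float:
    """Share of the words in the text that equal the keyword."""
    words = text.lower().split()
    return Counter(words)[keyword.lower()] / max(len(words), 1)

def looks_stuffed(text: str, keyword: str, max_density: float = 0.15) -> bool:
    """Flag the text as likely keyword stuffing if it exceeds the quota."""
    return keyword_density(text, keyword) > max_density

honest = "open science changes how research results are shared reviewed and reused"
stuffed = "open science open science open science best open science portal open science"
print(looks_stuffed(honest, "science"), looks_stuffed(stuffed, "science"))  # False True
```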
Similar Solutions to Those of Search Engines
Solutions in the field of search engine technology increasingly draw on network science. The analysis of typical and atypical link patterns has advanced to the point where it can determine with reasonable probability whether a more or less naturally grown link network lies behind a website or whether many of its links were bred on search engine optimizers' own farms. The problem is not yet fully solved, but the number of very crude manipulations has receded noticeably in recent years, since Google and other search engines downgrade the rankings of sites they detect manipulating. Similar developments are to be anticipated in the academic sphere of Open Science: where network references are concentrated in markedly denser clusters than the subject matter would normally justify, an algorithm can reduce the reputation factor to a natural size. Search engines meanwhile go one step further and remove excessively optimized sites from the index altogether, a move that is only reversed once the link cartel is dismantled or the manipulation stops. Whereas search engines, which long operated anonymously, are only just beginning to identify users in their registered domains and to incorporate their search and surfing patterns into the reputation assessment, this has been common practice for scientific publications on the social web from the start, thanks to the clear authentication system described above. This has the added advantage that commenting and rating behavior, and possibly even the time spent on a page of a treatise, can be included in an article's reputation assessment. The full range of potential manipulations cannot yet be foreseen, and a certain amount of reciprocal technological upgrading is also to be anticipated in academic circles: in the interests of unbiased, relevant results on the one hand, and driven by the desire for top placements, which promise additional citations, on the other.
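In its simplest form, the dampening of unnaturally dense citation clusters could look something like the sketch below: where the internal citation density of a group far exceeds an expected value, the reputation credit conveyed by those citations is scaled back. The graph data, the expected density and the scaling rule are assumptions for illustration only.

```python
def internal_density(citations: dict, group: set) -> float:
    """Fraction of possible ordered citation pairs inside the group that actually occur."""
    possible = len(group) * (len(group) - 1)
    if possible == 0:
        return 0.0
    actual = sum(1 for a in group for b in group
                 if a != b and b in citations.get(a, set()))
    return actual / possible

def damped_reputation(base: float, density: float, expected: float = 0.1) -> float:
    """Scale reputation back in proportion to how far a cluster exceeds the expected density."""
    if density <= expected:
        return base
    return base * expected / density

# Three papers that all cite one another: a maximally dense cluster.
citations = {
    "paper1": {"paper2", "paper3"},
    "paper2": {"paper1", "paper3"},
    "paper3": {"paper1", "paper2"},
}
group = {"paper1", "paper2", "paper3"}
density = internal_density(citations, group)       # 1.0: everyone cites everyone
print(round(damped_reputation(10.0, density), 2))  # reputation shrunk from 10.0 to 1.0
```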