The Imitation Game
It is striking that the gender test described in our course draws, in many ways, on Turing’s own thinking.
Many readers will recall that the recent film mentioned earlier, which addresses both Turing’s life and his work in breaking the Enigma code during World War II, was called The Imitation Game. Turing published an extremely important article, “Computing Machinery and Intelligence,” in the October 1950 issue of Mind. More to the point, the section of that paper in which he first introduced his Turing test was titled “The Imitation Game,” which became the title of the biographical film.
It is significant, in our view, that explaining to a 1950s audience how to establish whether an entity possessed intelligence could not be done in terms of a human and a machine: such a description would have been incomprehensible to most of his audience, since in 1950 there were only a handful of computers in existence.
Consequently, in introducing the nature of his test, he described it as a way of determining gender, as follows:
I propose to consider the question, ‘Can machines think?’ This should begin with definitions of the meaning of the terms ‘machine’ and ‘think’. The definitions might be framed so as to reflect so far as possible the normal use of the words, but this attitude is dangerous. If the meaning of the words ‘machine’ and ‘think’ are to be found by examining how they are commonly used it is difficult to escape the conclusion that the meaning and the answer to the question, ‘Can machines think?’ is to be sought in a statistical survey such as a Gallup poll. But this is absurd. Instead of attempting such a definition I shall replace the question by another, which is closely related to it and is expressed in relatively unambiguous words.
The new form of the problem can be described in terms of a game which we call the ‘imitation game’. It is played with three people, a man (A), a woman (B), and an interrogator (C) who may be of either sex. The interrogator stays in a room apart from the other two. The object of the game for the interrogator is to determine which of the other two the man is and which is the woman. He knows them by labels X and Y, and at the end of the game he says either ‘X is A and Y is B’ or ‘X is B and Y is A’. The interrogator is allowed to put questions to A and B thus:
C: Will X please tell me the length of his or her hair?
We thus view the initiative that we have developed from the Behavioral Cybersecurity course as a descendant in some small way of Turing’s proposition.
To understand how people interpret written text and try to assign a gender to its author (in effect a version of the gender Turing test, henceforth GTT, described by Turing in the paper cited earlier), the test in question was given to a number of individuals of varying backgrounds, genders, ages, first languages, countries, and professions.
There were 55 subjects completing this test, and a description of their demographics is as follows.
The participants were volunteers, recruited primarily at occasions where the first author was giving a presentation. No restrictions were placed on the selection of volunteer respondents, nor was any effort made to balance participation according to any demographic objective.
The voluntary subjects were (except on one occasion) given no information about the purpose of the test and were also guaranteed anonymity in the processing of the test results. There was no limit on the time to take the test, but most observed respondents seemed to complete the test in about 15 minutes.
Summary of Results
The responses were scored in two ways. First, the number of correct identifications of the student author’s gender was divided by the total number of questions (24) in the complete test. Alternatively, the number of correct identifications was divided by the number of questions the respondent actually attempted. Since the difference between the two scores exceeded 2% in only 2 of 19 instances, it was decided to use the second set of scores.
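To make the two scoring rules concrete, the following is a minimal sketch in Python. It assumes that an unanswered question is recorded as None and that "the number of attempts" means the number of questions a respondent actually answered. The function names and the sample data are hypothetical illustrations, not the scoring procedure actually used in the study.

```python
from typing import Optional, Sequence

TOTAL_QUESTIONS = 24  # number of items in the complete test, per the text


def score_by_total(guesses: Sequence[Optional[str]], truth: Sequence[str]) -> float:
    """Correct answers divided by all questions; unanswered items count as wrong."""
    correct = sum(1 for g, t in zip(guesses, truth) if g == t)
    return correct / len(truth)


def score_by_attempts(guesses: Sequence[Optional[str]], truth: Sequence[str]) -> float:
    """Correct answers divided by the number of questions actually attempted."""
    attempted = [(g, t) for g, t in zip(guesses, truth) if g is not None]
    if not attempted:
        return 0.0  # assumption: a blank test scores zero
    correct = sum(1 for g, t in attempted if g == t)
    return correct / len(attempted)


# Hypothetical respondent: 12 correct, 8 wrong, 4 left blank.
truth = ["F", "M"] * (TOTAL_QUESTIONS // 2)
wrong = ["M" if t == "F" else "F" for t in truth[12:20]]
guesses = truth[:12] + wrong + [None] * 4

print(f"{score_by_total(guesses, truth):.1%}")     # 50.0%
print(f"{score_by_attempts(guesses, truth):.1%}")  # 60.0%
```

The two scores coincide whenever a respondent answers every question, which is consistent with the small differences reported above.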
The responses of the 55 participants in this study, and the very diverse experiences they brought to the test, raise some interesting questions to ponder:
First, female respondents were more accurate than male respondents in identifying the gender of the students, by a margin of 56.89% to 51.02%.
Next, older respondents were more accurate in their identification than younger respondents by a similar margin of 57.6% to 51.77%. This might be a more surprising result, since for the most part the older respondents were not as technically experienced in computer science or cybersecurity matters as the younger respondents, who for the most part were students themselves.
One very clear difference is that Eastern European respondents scored far higher in their correct identification of the students’ gender, averaging 66.67%, with the next-highest regional group fully 10% lower. The number of respondents from Eastern Europe was very small, so generalizations would be risky. Notably, the Eastern Europeans (from Romania and Russia) were not first-language speakers of English, although they were quite fluent in the language. Each of them also tied for the highest percentage of correct answers among all 55 respondents.
Among the respondents from the various disciplines, the linguists, anthropologists, engineers, and psychologists all fared better than the computer scientists; lowest of all were the students who took the test (as opposed to the students who wrote the original answers).
It is possible, of course, to view the entire data set of responses to this test as a matrix of dimensions 24 × 55, in which the students who wrote the original exam (and thus, in effect, created the GTT) represent the rows, and the gender classifications by the 55 responders are the columns. If we examine the matrix row by row, we learn about the writing styles of the original test takers and their ability to deceive the respondents (although inadvertently, because no one other than the first author planned that the writings would be used to identify the gender of the writer).
Thus, perhaps more informative than the respondents’ ability to determine the gender of the test takers is the observation that several of the original test takers were able, unconsciously, to deceive over two-thirds of the respondents. Fully one-quarter (6 of 24) of the students reached this level of deception. Of these six “high deceivers,” three were female and three were male.
At the other end of the spectrum, one-third of the students were not very capable of deception, fooling fewer than one-third of the respondents. Of these eight students, six were male and only two were female. On average, the female students were able to deceive 52.5% of the respondents, whereas the male students were able to deceive only 42.2%.
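The row-wise reading of the response matrix can be sketched as follows. This is an illustrative Python fragment under stated assumptions: each row holds one writer’s guesses from all respondents, a writer’s "deception rate" is taken to be the fraction of respondents who guessed that writer’s gender incorrectly, and the thresholds of two-thirds and one-third follow the text. The function names and the toy data (3 writers, 6 respondents) are hypothetical and are not the study’s data.

```python
from typing import Dict, List, Sequence, Tuple


def deception_rates(truth: Sequence[str], guesses: Sequence[Sequence[str]]) -> List[float]:
    """guesses[i][j] is respondent j's guess for writer i.
    Returns, per writer, the fraction of respondents who guessed wrongly."""
    rates = []
    for actual, row in zip(truth, guesses):
        wrong = sum(1 for g in row if g != actual)
        rates.append(wrong / len(row))
    return rates


def classify(rates: Sequence[float], high: float = 2 / 3,
             low: float = 1 / 3) -> Tuple[List[int], List[int]]:
    """Indices of writers who fooled more than `high` or fewer than `low` of respondents."""
    high_deceivers = [i for i, r in enumerate(rates) if r > high]
    low_deceivers = [i for i, r in enumerate(rates) if r < low]
    return high_deceivers, low_deceivers


def mean_rate_by_gender(truth: Sequence[str], rates: Sequence[float]) -> Dict[str, float]:
    """Average deception rate over the writers of each gender."""
    means = {}
    for gender in sorted(set(truth)):
        vals = [r for t, r in zip(truth, rates) if t == gender]
        means[gender] = sum(vals) / len(vals)
    return means


# Toy data: rows are writers, columns are respondents.
truth = ["F", "M", "F"]
guesses = [list("MMMMMF"), list("MMMMMF"), list("FMFMFM")]

rates = deception_rates(truth, guesses)
print(classify(rates))  # ([0], [1])
print(mean_rate_by_gender(truth, rates))
```

With the full 24 × 55 matrix, the same three functions would produce the counts of high and low deceivers and the per-gender averages discussed above.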
All of the respondents described earlier had simply been given a test with only the simple instruction described in the attachment, without any prior preparation or understanding on the part of the respondent as to possible techniques for identifying the gender of a writer or author.
Consequently, we determined that it would be useful to see if persons could be given some training in order to try to improve their results on the GTT. We attempted to identify a number of keys that would assist a reader in trying to improve their scores on the GTT or related tests.
Our next objective was to see whether a subject could improve at distinguishing the gender of a writer by looking for certain describable clues. The following techniques for identifying the gender of an author were described for use in analyzing the questions in the original GTT:
- 1. Examine how many pronouns are being used. Female writers tend to use more pronouns (I, you, she, their, myself).
- 2. What types of noun modifiers are being used by the author? (A noun can modify another noun by coming immediately before it.) Males prefer words that identify or determine nouns (a, the, that) and words that quantify them (one, two, more).
- 3. Subject matter/style: the topic dealt with or the subject represented in a debate, exposition, or work of art. “Women have a more interactive style,” according to Shlomo Argamon, a computer scientist at the Illinois Institute of Technology in Chicago.
- 4. Be cognizant of word usage and how it may reveal gender. Some possible feminine keywords include with, if, not, where, be, should. Some of the other masculine keywords include around, what, are, as, it, said. This suggests that language tends to encode gender in very subtle ways.
- 5. “Women tend to have a more interactive style,” said Shlomo Argamon, a computer scientist at the Illinois Institute of Technology in Chicago (Argamon et al., 2003). “They want to create a relationship between the writer and the reader.”
Men, on the other hand, use more numbers, adjectives, and determiners—words such as “the,” “this,” and “that”— because they apparently care more than women do about conveying specific information.
- 6. Pay attention to the way writers refer to the gender they are discussing. For example, a female may refer to her own gender by saying “woman” rather than “girl.”
- 7. Look at the examples that they give. Would you see a male or female saying this phrase?
- 8. A male is more likely to use an example that describes how a male feels.
- 9. Women tend to use better grammar and better sentence structure than males.
- 10. When a person of one gender is describing the feelings/thoughts of the opposite gender, they tend to draw conclusions that make sense to them but will not provide actual data.
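Several of the clues above (items 1, 2, and 4) amount to counting particular words. The following Python sketch simply tallies the cue words listed in those items for a given text. It is an illustration only: the word lists are copied from the items above, the category names and the function cue_counts are hypothetical, and no weighting or classification rule is implied. In particular, it is not the method of Argamon et al. (2003).

```python
import re
from typing import Dict

# Cue words taken from items 1, 2, and 4 in the list above.
CUES = {
    "pronouns": {"i", "you", "she", "their", "myself"},
    "determiners": {"a", "the", "that"},
    "quantifiers": {"one", "two", "more"},
    "feminine_keywords": {"with", "if", "not", "where", "be", "should"},
    "masculine_keywords": {"around", "what", "are", "as", "it", "said"},
}


def cue_counts(text: str) -> Dict[str, int]:
    """Count how many tokens of the text fall in each cue-word category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = {name: 0 for name in CUES}
    for tok in tokens:
        for name, words in CUES.items():
            if tok in words:
                counts[name] += 1
    counts["tokens"] = len(tokens)  # total length, useful for normalizing
    return counts


sample = "I think the answer is not simple, and you should ask her."
print(cue_counts(sample))
```

For a short sample such as this one the counts are very small, which illustrates why brief messages give a reader, or a program, little evidence to work with.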
It should be noted that prior work includes an application available on the Internet, Gender Guesser, developed by Neal Krawetz (Krawetz, 2018) and described at http://hackerfactor.com/GenderGuesser.php.
This application seems to depend on the length of the text being analyzed. In comparison with our human responders it does not perform as well, because it normally indicates that the text is too short for a successful determination of gender.
This limitation matters because the overall objective of this research is to determine whether a GTT can be used in a cybersecurity context, where an attacker or hacker is likely to provide only very short messages: for example, a troll on the Internet trying to mask his or her identity in order to build a relationship with, say, an underage potential victim.
The questions that have been raised by this research have also opened the potential of devising other such tests to determine other characteristics of an author, such as age, profession, geographic origin, or first language. In addition, given that the initial respondents to the test as described earlier are themselves from a wide variety of areas of expertise, nationality, and first language, a number of the prior participants have indicated interest in participating in future research in any of these aforementioned areas.
Problems
- 1. Read Turing’s article in Mind. How would a reader respond (a) in 1950; (b) today?
- 2. Try Eliza. How many questions did you ask before Eliza repeated?
- 3. Construct three questions for Jeopardy! that the human contestants would probably answer faster or more correctly than Watson.
- 4. Give examples of two encryptions of the same message with different keys. Use the encryption method of your choice.
- 5. Consider developing your own gender Turing test. List five sample questions that might differentiate between a female or male respondent.
- 6. Consider developing an age Turing test. List five sample questions that might differentiate between a younger or older respondent.
- 7. Take the “Who Answered These Questions” gender Turing test. To find out your score, email your responses to waynep97@gmail.com. Send a two-line email. The first line will have “F: x1, x2, x3, ...” and the second line “M: y1, y2, y3, ....” You will receive an email response.
- 8. Comment in the context of 2019, from Turing’s article in Mind, on interrogator C’s question in order to determine the gender of the hidden subject.
- 9. Run the Gender Guesser on (A) your own writing and (B) the “Test Bed for the Questionnaire” in the following.
References
Argamon, S., Koppel, M., Fine, J., & Shimoni, A. R. 2003. Gender, genre, and writing style in formal written texts. Text, 23(3), 321-346.
Baker, S. 2012. Final Jeopardy: The Story of Watson, the Computer That Will Transform Our World. Houghton Mifflin Harcourt, Boston, MA.
Krawetz, N. 2018. Gender Guesser. http://hackerfactor.com/GenderGuesser.php.
National Centers of Academic Excellence in Information Assurance Education (NCAEIAE). 2018. National Security Agency, https://www.nsa.gov/ia/academic_outreach/nat_cae/index.shtml.
Patterson, W. & Winston-Proctor, C. 2019. Behavioral Cybersecurity. CRC Press, Boca Raton, FL.
Pedersen, J. (ed.). 2011. Peter Hilton: Codebreaker and mathematician (1923-2010). Notices of the American Mathematical Society, 58(11), 1538-1551.
Sony Pictures Releasing. 2014. The Imitation Game (film).
Turing, A. M. 1950. Computing machinery and intelligence. Mind: A Quarterly Review of Psychology and Philosophy, LIX(236), 433-460.
Weizenbaum, J. 1966. ELIZA—A computer program for the study of natural language communication between man and machine. Communications of the Association for Computing Machinery, 9, 36-45.