We began by assembling our corpus. Most of the song lyrics were taken from the websites we judged to be the most accurate. Five songs did not have lyrics available online and were transcribed by ear. We then tagged the songs for processing. We first used regular expressions for basic structural markup, such as preserving lines and stanzas. We also tagged musical elements in the songs, such as instrumentation and scale. We then turned to the more complex political ideas we wanted to tag.
We used XML tagging on the Turkish songs to mark up a variety of elements and attributes that occurred within the songs. Each song was validated against our RELAX NG schema to ensure that the markup followed the same conventions.
We tagged an "address" element whenever the narrator called out to a "you." For the "formality" attribute on address, we marked "formal" for the formal "you" and "informal" for the familiar "you". We also included an "addressee" attribute to indicate whether a particular party was being addressed, along with "mood", "tense", "aspect", and "voice" to capture more of the grammar. The address tag is often used for tagging commands.
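A minimal sketch of this kind of structural pass, written in Python for illustration. The element names "song", "stanza", and "line" and the helper name mark_up_lyrics are assumptions, as is the convention that stanzas are separated by blank lines. They are not necessarily the names or expressions used in the project.

```python
import re
from xml.sax.saxutils import escape


def mark_up_lyrics(raw):
    """Wrap plain-text lyrics in hypothetical <stanza> and <line> elements."""
    # Assumption: stanzas are separated by one or more blank lines.
    stanzas = re.split(r"\n\s*\n", raw.strip())
    out = ["<song>"]
    for stanza in stanzas:
        out.append("  <stanza>")
        for line in stanza.splitlines():
            line = line.strip()
            if line:
                # Escape &, <, and > so the lyric text stays well-formed XML.
                out.append("    <line>" + escape(line) + "</line>")
        out.append("  </stanza>")
    out.append("</song>")
    return "\n".join(out)


if __name__ == "__main__":
    print(mark_up_lyrics("first line\nsecond line\n\nthird line"))
```

Keeping the structural pass separate from the thematic tagging means the line and stanza boundaries are fixed before any interpretive markup is added.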
ex:<address formality="informal" pos="verb" mood="imp"> <nature>Asmadan</nature> gel <nature>asmadan</nature> gel <nature>asmadan</nature> </address>
This element contains any references specifically to God.
ex: <allah>Tanrım</allah>
This element contains any reference to a folksinger or the act of playing folk music.
ex: <aşık>Karacoğlan</aşık>
This indicates a musical break in a song. If a song is an instrumental, it will contain only a "break" element. A break can also have the attributes "instr", which lists the prominent instruments in the break or song, "scale", which will be either major or minor, and "solo", which indicates that one or more instruments have an extensive solo.
ex: <break instr="electro bağlama, kaval" scale="major"/>
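As an illustration of how the "instr" attribute can be read back out of the markup, the following Python sketch parses a break element and splits its instrument list. The helper name break_instruments is hypothetical, and splitting on commas is an assumption based on the example above.

```python
import xml.etree.ElementTree as ET


def break_instruments(xml_fragment):
    """Return the instruments named in a <break> element's "instr" attribute."""
    elem = ET.fromstring(xml_fragment)
    # A break without an "instr" attribute yields an empty list.
    instr = elem.get("instr", "")
    # Assumption: multiple instruments are separated by commas.
    return [name.strip() for name in instr.split(",") if name.strip()]


if __name__ == "__main__":
    fragment = '<break instr="electro bağlama, kaval" scale="major"/>'
    print(break_instruments(fragment))
```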
Any references to social and economic class are tagged with "class." This element contains the attribute "classtype," whose value is either "upper," "middle," "lower," or "working." The tag can also contain elements "place," "class," "narrator," and/or "violence."
ex:<class type="working">işçiden</class>
This element indicates a mention of death. It can contain the element "narrator," to indicate a shift in who is narrating the song.
ex:<death>Vay gurban</death>
This element indicates economic problems or injustice. It can contain the element "place" to indicate a specific boundary.
ex:<econ>Borç ödeyince</econ>
This element contains any references to family. It can also include the element "narrator."
ex:<family>anam sevgi</family>
This element contains any references to friendship, often the word "dost".
ex:<friendship>dostlar</friendship>
This element indicates references and attitudes toward the future. It can contain the elements "place," "nature," "nationalism," "addressee," "narrator," "poverty," and/or "wealth."
ex: <econ> <place where="foreign">Libya'ya</place> gidecek olanlara</econ>
This element contains references to the practice of Islam. It must contain the attribute "i", whose value is the specific school or type of Islam being sung about.
ex: <islam i="sufism">Nasıl çıkar karanlıklar aydınlığa</islam>
This element contains references to love. It must contain the attribute "l", which has a value of "romantic", "platonic", or "religious" to denote the type of love being expressed. The element can also contain the elements "narrator", "address", and/or "family."
ex: <love l="romantic">yare kavuşayım</love>
This element indicates the narrator of the song. It can have the attribute "number", with "1" meaning a singular narrator and "2" indicating a collective narrator.
ex:<narrator>biz erkeklerin</narrator>
Here we tagged any references to Turkish nationalism and used the attribute "approve" to indicate whether the lyrics approved of the nationalistic sentiment.
ex:<nationalism approve="yes">birarada</nationalism>
In this element, we tagged references to the natural world or any natural imagery. This element may also contain the elements "family", "place", "address", or "wealth."
ex:<nature>kara topraklar</nature>
We used this element to highlight lyrics about the neglect of the state. Elements "family," "death," and "place" could be inside this element.
ex:<neglect> <family>Babamın</family> elini öpmeden</neglect>
This element indicates references to the past. It can be used with the attribute "when" and with the elements "nature," "love," and "place."
ex:<past when="foundingOfTurkey">1923'ün ılık bir ekim sabahında</past>
We tagged any real or metaphorical place, and indicated what kind of place the words referred to with a "where" attribute. This element may also contain the elements "neglect", "poverty", "nature", another "place", "class", or "family."
ex:<place where="rural"> <nature>dağın başına</nature> </place>
This element indicates any word from the list of politically charged words and phrases that the Turkish government published, such as "peace", "concede land", and "remove troops." This element contains the attribute "rl", which has the value of either "right" or "left" depending on which party's view the context of the word seems to point to.
ex: <poliWord rl="r">sağı</poliWord>
This element contains any references to poverty. It can also contain the element "nature."
ex:<poverty>Bir Yoksulluk</poverty>
We tagged any references to threatened or actual violence here. We used the attribute "target" to indicate who the violence was directed against.
ex:<violence>Bir gürledimi yer yerinden oynar</violence>
This element can contain any type of warning or threat that isn't explicitly violent.
ex:<warning>Sanma faşist olandan</warning>
This element contains any references to wealth.
ex:<wealth>bereket dolu</wealth>
This element contains any references to youth and the ideas that surround it.
ex: <youth>gençlerimiz</youth>
To sort through our data, we wrote XQuery scripts that query an eXide database containing the tagged XML files.
Here is our public GitHub repository, which contains all queries, XML, schemas, HTML/CSS, and transformations.
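The kind of question these queries answer can be illustrated outside XQuery. The following Python sketch is a stand-in, not one of our XQuery scripts: it counts how often each element occurs across a set of tagged songs. The function name count_tags and the root element "song" are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def count_tags(xml_documents):
    """Count occurrences of every element name across tagged song documents."""
    counts = Counter()
    for doc in xml_documents:
        root = ET.fromstring(doc)
        # iter() walks the root and all nested elements, so a <nature>
        # inside a <place> is counted as well.
        for elem in root.iter():
            counts[elem.tag] += 1
    return counts


if __name__ == "__main__":
    songs = [
        # "song" is an assumed wrapper element for this example.
        '<song><nature>kara topraklar</nature>'
        '<place where="rural"><nature>dağın başına</nature></place></song>',
    ]
    for tag, n in count_tags(songs).most_common():
        print(tag, n)
```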