This text is a practical guide for linguists and programmers who work with data in multilingual computational environments. We introduce the basic concepts needed to understand how writing systems and character encodings function, and how they work together at the intersection of the Unicode Standard and the International Phonetic Alphabet (IPA). Although these standards are often met with frustration by users, they provide language researchers and programmers with the consistent computational architecture needed to process, publish and analyze lexical data from the world's languages. We therefore bring to light common, but not always transparent, pitfalls that researchers face when working with Unicode and IPA. Having identified and overcome the pitfalls involved in making writing systems and character encodings syntactically and semantically interoperable (to the extent that they can be), we created a suite of open-source Python and R tools for working with languages through orthography profiles, which describe author- or document-specific orthographic conventions. In this cookbook we give a formal specification of orthography profiles and provide recipes, using open-source tools, that show how users can segment text, analyze it, identify errors, and transform it into different written forms for comparative linguistics research.
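As a rough illustration of the segmentation idea mentioned above, the sketch below splits a string into the graphemes listed in a toy orthography profile. It is not the book's own tooling: the `segment` function, the greedy longest-match strategy, and the example profile are assumptions made for this illustration, and it relies only on Python's standard `unicodedata` module.

```python
import unicodedata


def segment(text, profile):
    """Split `text` into graphemes from `profile` (hypothetical helper).

    Both the text and the profile are normalized to NFD first, so that
    precomposed and decomposed spellings of a character compare equal.
    """
    text = unicodedata.normalize("NFD", text)
    # Try longer graphemes first so that a digraph wins over its parts.
    graphemes = sorted(
        (unicodedata.normalize("NFD", g) for g in profile if g),
        key=len,
        reverse=True,
    )
    segments = []
    i = 0
    while i < len(text):
        for g in graphemes:
            if text.startswith(g, i):
                segments.append(g)
                i += len(g)
                break
        else:
            # Code point not covered by the profile: flag it so that
            # gaps in the profile are easy to spot.
            segments.append("\ufffd")
            i += 1
    return segments


# Toy profile (an assumption for this sketch): one digraph, one nasal vowel.
toy_profile = ["t", "\u0283", "t\u0283", "a", "a\u0303", "n"]

# The input uses precomposed U+00E3; NFD turns it into "a" + U+0303.
print(" ".join(segment("t\u0283\u00e3n", toy_profile)))
```

Normalizing both sides before matching is the design choice that matters here: without it, a visually identical character typed in a different encoding form would be reported as an error.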
This open book is licensed under a Creative Commons License (CC BY). You can download The Unicode Cookbook for Linguists ebook for free in PDF format (1.0 MB).
Table of Contents
The Unicode approach
The International Phonetic Alphabet
IPA meets Unicode
The Unicode Cookbook for Linguists
Language Science Press