---
layout: strapless
menu_item: home
---

<section class="section-alt pad-top">
    <div class="container">
        <h1 class="text-center headline">Glottobank</h1>

    </div>
</section>

<section class="section-main">
    <div class="container pad-top">
        <div class="service-flexrow pad-top">
            <div class="column-66">
                <div class="gray-box">
                    <p class="lead">
                        Glottobank is an international research consortium established to
                        document and
                        understand the world’s linguistic diversity. Glottobank team
                        members are
                        pursuing this goal on two fronts. First, we have established five
                        global
                        databases documenting variation in language structure
                        (<a href="#grambank">Grambank</a>), 
                        lexicon (<a href="#lexibank">Lexibank</a>), paradigm systems 
                        (<a href="#parabank">Parabank</a>), numerals
                        (<a href="#numeralbank">Numeralbank</a>), and phonetic changes
                        (<a href="#phonobank">Phonobank</a>). 
                        In doing so, we seek to develop new methods in language
                        documentation, compile 
                        data on the world’s languages and make this data accessible and
                        useful. Second,
                        we are developing methods to use this data to make inferences
                        about human 
                        prehistory, relationships between languages and processes of
                        language change. We anticipate data will begin to become available
                        in 2022.
                    </p>
                </div>
            </div>
            <div class="column-33">
                <div>
                    <img src="images/glottobank_all.jpg" alt="Glottobank logo"
                         class="img-responsive">
                </div>
            </div>
        </div>
        <div class="container pad-top">
            <div class="service-flexrow pad-top">
                <div class="column-100">
                    <h2 id="grambank">Grambank</h2>
                    <p>
                        Grambank is a database of structural (typological) features of
                        language. It
                        consists of 195 logically independent features (most of them
                        binary) spanning
                        all subdomains of morphosyntax. The Grambank feature questionnaire
                        has been
                        filled in, based on reference grammars, for more than 2,000 languages.
                        The aim is to
                        eventually reach as many as 3,000 languages. The database can be
                        used to
                        investigate language prehistory, the geographical distribution of
                        features, language universals and the functional interaction of
                        structural
                        features.
                    </p>
                </div>
            </div>
            <!--
            To find out more, visit the Grambank website.
            -->
            <div class="service-flexrow pad-top">
                <div class="column-100">
                    <h2 id="lexibank">Lexibank</h2>
                    <p>
                        Lexibank is a <a href="https://github.com/lexibank/lexibank-analysed/">public database and repository</a> for lexical data from
                        the languages of the
                        world. Currently, Lexibank contains lexemes and cognate judgments
                        from approximately 2,500 languages
                        spanning Africa, Europe, Asia, the Pacific, and the Americas. The
                        database will be used to
                        refine cognate judgments, infer language relationships, construct
                        language phylogenies,
                        test hypotheses about language history, investigate factors that
                        affect the mode and
                        tempo of language evolution, model sound change, and facilitate
                        quantitative comparisons
                        with other types of linguistic data. The initial focus of Lexibank
                        will be on compiling
                        basic or core vocabulary, but ultimately the database will be
                        expanded to cover the full
                        range of the lexicon from all the world’s languages.
                        <!--
                        For more information on Lexibank and how to use or submit data please see the project
                        website.
                        -->
                    </p>
                </div>
            </div>
            <div class="service-flexrow pad-top">
                <div class="column-100">
                    <h2 id="parabank">Parabank</h2>
                    <p>
                        Parabank is a large database of selected paradigmatic structures
                        found in the world’s
                        languages, focusing on the patterning of formal similarities and
                        identities (or
                        <i>syncretisms</i>) between cells in these paradigms (cf. <i>I</i> vs. <i>me</i>
                        but <i>you</i> vs. <i>you</i>). It is
                        motivated by the observation that different languages and language
                        families have
                        significantly different patterns in their syncretisms and that at
                        least some of these are
                        stable through time. In addition, information arranged in matrices
                        gains additional power
                        because of the large number of values that can be calculated by
                        comparing every cell with
                        every other cell.
                    </p>
                    <p>
                        Because the paradigms we explore are ubiquitous across the world’s
                        languages, our working
                        hypothesis is that paradigmatic syncretisms can provide
                        a significant signal of linguistic
                        relationships through time, and the database is designed to allow the
                        systematic
                        exploration of morphosyntactic features by linguistic typologists
                        and evolutionary
                        biologists. Additionally, Parabank will be an important resource
                        for identifying and quantifying some of the key mechanisms by
                        which the design space of language evolves. Initially, the
                        database will assemble paradigms of free
                        pronouns, verb agreement, and a subset of kin terms, with
                        subsequent plans to incorporate
                        demonstratives, interrogatives, indefinite pronouns, negative
                        pronouns, numeral systems, and
                        other promising linguistic subsystems with paradigmatic structure.
                    </p>
                    <p>
                        Parabank will be led by Nick Evans, Simon Greenhill and Kyla
                        Quinn, all based at the
                        Australian Research Council Centre of Excellence for the Dynamics
                        of Language (CoEDL), at
                        the Australian National University (ANU). The project welcomes the
                        participation of any interested
                        researcher. Funding will primarily come from the CoEDL.
                        <!--
                        To find out more, click here.
                        -->
                    </p>
                </div>
            </div>
            <div class="service-flexrow pad-top">
                <div class="column-100">
                    <h2 id="numeralbank">Numeralbank</h2>
                    <p>
                        Numeralbank is a public database and repository on numeral systems
                        in the world’s languages. It is motivated by the idea that number
                        words do not just form an important part of most languages, but
                        constitute systems that serve as essential tools at the
                        intersection of culture, language, and cognition. Numeralbank can
                        be used to classify numeral systems according to their properties,
                        to document the geographical distribution of system types, to
                        investigate commonalities and differences in system properties
                        across languages, to reconstruct the most likely ancestral states,
                        and to explore possible limits to and constraints on the striking
                        diversity in how people count. Initially, the database will allow
                        for analyses within and across systems, but the ultimate goal is
                        to support tests of hypotheses on linguistic, cognitive, and
                        cultural factors that may drive the emergence and evolution of
                        numeral systems.
                    </p>
                    <p>
                        Entries in Numeralbank are largely based on data collected by
                        Eugene Chan as part of the long-running project "Numeral Systems
                        of the World's Languages" that was hosted at the former Department
                        of Linguistics at the MPI for Evolutionary Anthropology in
                        Leipzig. The data is now hosted at the Department of Linguistic and
                        Cultural Evolution at the MPI for Evolutionary Anthropology
                        in Leipzig. The Numeralbank database is designed and maintained by
                        Hans-Jörg Bibiko. The Numeralbank team consists of (in
                        alphabetical order) 
                        <a href="http://www.uib.no/en/persons/Andrea.Bender">Andrea Bender</a>,
                        <a href="http://www.shh.mpg.de/employees/42541/55811">Hans-Jörg Bibiko</a>,
                        <a href="https://www.eva.mpg.de/linguistic-and-cultural-evolution/staff/robert-forkel/">Robert Forkel</a>,
                        <a href="http://www.shh.mpg.de/2923/russellgray">Russell Gray</a>,
                        <a href="http://www.shh.mpg.de/employees/48696/55811">Simon Greenhill</a>,
                        <a href="http://www.shh.mpg.de/employees/48214/55811">Harald
                        Hammarström</a>, <a href="http://www.bristol.ac.uk/school-of-arts/people/fiona-m-jordan/">Fiona
                        Jordan</a>,
                        and <a href="http://www.shh.mpg.de/employees/48689/25522">Annemarie
                        Verkerk</a>.
                    </p>
                </div>
            </div>
            <div class="service-flexrow pad-top">
                <div class="column-100">
                    <h2 id="phonobank">Phonobank</h2>
                    <p>
                        Phonobank aims to establish a cross-linguistic comparative
                        database of sound patterns,
                        sound correspondences, and sound shifts. Our starting point is
                        collections of multiple
                        phonetic alignments of cognate sets in language families. All
                        sounds are linked to a
                        cross-linguistic phonetic alphabet that provides distinctive
                        features and segment
                        descriptions. The ultimate goals of the database are to support
                        the computational
                        linguistic comparison of word forms and to serve as a basis for
                        improving the methods of
                        computer-assisted cognate detection, sound reconstruction and
                        building linguistic
                        phylogenies from sound correspondences.
                    </p>
                </div>
            </div>
            <div class="service-flexrow pad-top">
                <div class="column-100">
                    <h2>Methods and Tools</h2>
                    <p>
                        The Glottobank team is developing a suite of methods and tools for
                        analysing comparative
                        linguistic data. For example, using the <a href="http://www.beast2.org">BEAST
                        2</a> software
                        platform, we have created a Bayesian framework for
                        <a href="http://language.cs.auckland.ac.nz/">phylogeographic inference of language expansion in space and
                        time</a>.
                        <a href="https://github.com/lmaurits/BEASTling">BEASTling</a> is a program
                        designed
                        to help linguists easily prepare Bayesian phylogenetic analyses of
                        linguistic data using the BEAST 2 platform.  It automates many
                        tedious
                        data-preparation tasks, features close integration with the
                        <a href="http://glottolog.org">Glottolog language catalog</a>, and strives to
                        follow established best
                        practices for computational linguistic phylogenetics.
                        <a href="http://lingpy.org">LingPy</a> is a Python library for quantitative
                        tasks in historical
                        linguistics. It offers state-of-the-art algorithms for pairwise
                        and multiple phonetic
                        alignment analyses, automatic cognate detection, and various tools
                        to explore and curate
                        lexical data. Finally, the <a href="http://cldf.clld.org">Cross-Linguistic Data Formats (CLDF)</a>
                        and associated standards
                        aim to provide an interface between databases and tools,
                        enabling easier
                        sharing of data and code.
                    </p>
                </div>
            </div>
            <div class="service-flexrow pad-top">
                <div class="column-100">
                    <h2>Funding</h2>
                    <p>
                        In addition to the time and energy of members of the consortium,
                        Glottobank is supported
                        by the Max Planck Institute for the Science of Human History,
                        a Royal Society of New Zealand Marsden Grant (grant #13-UOA-121)
                        and
                        the ARC Centre of Excellence for the Dynamics of Language.
                    </p>
                </div>
            </div>
        </div>
    </div>
</section>