vignettes/covariates-distill.Rmd
In this problem, we illustrate fitting multiple covariate distance sampling (MCDS) models to point transect data using a bird survey from Hawaii: data on an abundant species, the Hawaii amakihi (Hemignathus virens), are used. This practical makes use of the Distance R package described by Miller et al. (2019), duplicating the analysis in Marques et al. (2007). For basic information regarding analysis of point transect data, consult the point transect example.
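The rows shown below were presumably produced by loading the package and data and printing the first few records; a minimal sketch of that step (the original code chunk is not shown in this rendering):

```r
# Attach the Distance package, load the amakihi point transect data and
# display the first three rows (matching the output shown below)
library(Distance)
data("amakihi")
head(amakihi, n=3)
```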
##   Region.Label Area Sample.Label Effort object distance Month OBs   Sp MAS HAS
## 1          792    0            1      1      1       40     0 TJS COAM  50   1
## 2          792    0            1      1      2       60     0 TJS COAM  50   1
## 3          792    0            1      1      3       45     0 TJS COAM  50   1
##   Study.Area
## 1       Kana
## 2       Kana
## 3       Kana
These data include:

- Region.Label - survey dates (month and year, e.g. 792 is July 1992), which are used as 'strata'
- Area - study area size (not used, set to 0); only density estimates, not abundance, will be produced
- Sample.Label - point transect identifier (41 transects)
- Effort - survey effort (1 for all points because each point was visited once)
- distance - radial distance of detection from observer (meters)
- Month
- OBs - initials of the observer
- Sp - species code (COAM)
- MAS - minutes after sunrise
- HAS - hour after sunrise
- Study.Area - name of the study area (Kana)

Note that the Area column is always zero; hence, detection functions can be fitted to the data, but bird abundance cannot be estimated. The covariates to be considered for possible inclusion in the detection function are OBs, MAS and HAS.
It is important to gain an understanding of the data prior to fitting detection functions. With this in mind, preliminary analysis of distance sampling data involves examining the distribution of the detection distances and exploring the candidate covariates.
We begin by assessing the distribution of distances to decide on a truncation distance (Figure 1).
hist(amakihi$distance, main="Radial distances", xlab="Distance (m)")

Figure 1: Distribution of radial distances of amakihi.
To see if there are differences in the distribution of distances recorded by the different observers and in each hour after sunrise, boxplots can be used. Note how the ~ symbol is used to define the discrete groupings (i.e. observer and hour) (Figure 2).
boxplot(amakihi$distance~amakihi$OBs, xlab="Observer", ylab="Distance (m)")
boxplot(amakihi$distance~amakihi$HAS, xlab="Hour", ylab="Distance (m)")

Figure 2: Visual assessment of effect of observer and hour since sunrise upon detection.
The components of the boxplot are: the thick line is the median; the box spans the interquartile range (25th to 75th percentiles); the whiskers extend to the most extreme observations within 1.5 times the interquartile range of the box; and points beyond the whiskers are plotted individually as potential outliers.
For minutes after sunrise (a continuous variable), we create a scatterplot of MAS (on the \(x\)-axis) against distances (on the \(y\)-axis). The plotting symbol (or character) is selected with the argument pch (Figure 3).
scatter.smooth(amakihi$MAS, amakihi$distance, family="gaussian", pch=20, cex=.9, lpars=list(lwd=3),
               xlab="Minutes after sunrise", ylab="Distance (m)")

Figure 3: Visualisation of detectability as a function of minutes since sunrise.
This figure of the radial distance distribution clearly shows room for right truncation. Subsequent detection function fitting will use the truncation argument in ds() to exclude the largest 15% of the detection distances.
You may also want to think about potential collinearity (linear relationship) between the covariates - if collinear variables are included in the detection function, they will explain some of the same variation in the distances, and this will reduce their importance as potential covariates. How might you investigate the relationship between HAS and MAS? One possibility is sketched below.
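A quick way to look at that relationship (a sketch, not necessarily how the original analysis examined it; HAS is still numeric at this point) is to plot minutes after sunrise within each hour after sunrise and to compute their correlation:

```r
# MAS grouped by HAS -- the two covariates measure similar quantities,
# so a strong positive association is expected
boxplot(amakihi$MAS ~ amakihi$HAS,
        xlab="Hours after sunrise", ylab="Minutes after sunrise")
cor(amakihi$HAS, amakihi$MAS, use="complete.obs")
```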
From these plots, infer whether any of the covariates will be useful in explaining the distribution of detection distances.
We would like to treat OBs and HAS as factor variables, as in the original analysis; OBs is, by default, treated as a factor variable because it consists of characters rather than numbers. HAS, on the other hand, consists of numbers and so by default would be treated as a continuous (i.e. non-factor) variable. That is fine if we want the effect of HAS to be monotonic (i.e. detectability either increases or decreases as a function of HAS). If we want HAS to have a non-linear effect on detectability, then we need to tell R to treat it as a factor, as shown below.
amakihi$HAS <- factor(amakihi$HAS)
One other, more subtle, adjustment is a transformation of the continuous covariate MAS. We are considering three possible covariates in our detection function: OBs, HAS and MAS. The first two, OBs and HAS, are both factor variables, so essentially we can think of them as taking on values between 1 and 3 in the case of OBs, and 1 to 6 in the case of HAS. However, MAS can take on values from -18 (detections before sunrise) to >300, and the disparity in scales of measure between MAS and the other candidate covariates can lead to difficulties in the performance of the optimizer fitting the detection functions in R. The solution to the difficulty is to scale MAS so that it is on a scale (approximately 1 to 5) comparable with the other covariates.
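A minimal sketch of such a rescaling, assuming division by the standard deviation (the exact transformation used in the original analysis may differ); the column name MAS_scaled is introduced here only for illustration, so the raw MAS values used in the models below remain untouched:

```r
# Rescale MAS so its magnitude is comparable with the factor levels of OBs and HAS;
# na.rm=TRUE because some records have missing MAS values
amakihi$MAS_scaled <- amakihi$MAS / sd(amakihi$MAS, na.rm=TRUE)
summary(amakihi$MAS_scaled)
```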
With three potential covariates, there are 8 possible models for the detection function: no covariates, each covariate alone (OBs, HAS or MAS), each pair of covariates (OBs+HAS, OBs+MAS, HAS+MAS), and all three covariates together (OBs+HAS+MAS).
Even without considering covariates, there are several possible key function/adjustment term combinations available; if all key function/covariate combinations are considered, the number of potential models is large. Note that covariates are not allowed if a uniform key function is chosen, and if covariate terms are included, adjustment terms are not allowed. Even with these restrictions, it is not best practice to take a scattergun approach to detection function model fitting. Buckland et al. (2015) considered 13 combinations of key functions and covariates. Here, we look at a subset of these.
Fit a hazard rate model with no covariates or adjustment terms and make a note of the AIC. Note that the largest 15% of distances are truncated - you may have decided on a different truncation distance.
conversion.factor <- convert_units("meter", NULL, "hectare")
amak.hr <- ds(amakihi, transect="point", key="hr", truncation="15%",
              adjustment=NULL, convert_units = conversion.factor)
Now fit a hazard rate model with OBs as a covariate in the detection function and make a note of the AIC. Has the AIC been reduced by including a covariate?
amak.hr.obs <- ds(amakihi, transect="point", key="hr", formula=~OBs,
                  truncation="15%", convert_units = conversion.factor)
Fit a hazard rate model with OBs and MAS in the detection function:
amak.hr.obs.mas <- ds(amakihi, transect="point", key="hr", formula=~OBs+MAS,
                      truncation="15%", convert_units = conversion.factor)
Try fitting other possible formulae and decide which model is best in terms of AIC (one further candidate is sketched after the AIC table below). To quickly compare AIC values from different models, use the AIC command as follows (note that only models with the same truncation distance can be compared):
AIC(amak.hr, amak.hr.obs, amak.hr.obs.mas)
##                 df      AIC
## amak.hr          2 11400.47
## amak.hr.obs      4 11368.20
## amak.hr.obs.mas  5 11365.96
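As suggested above, other candidate formulae are fitted in the same way; one further example (a sketch, with the hypothetical object name amak.hr.obs.has, and requiring that HAS has already been converted to a factor as shown earlier):

```r
# Observer and hour after sunrise as covariates in the hazard rate detection function
amak.hr.obs.has <- ds(amakihi, transect="point", key="hr", formula=~OBs+HAS,
                      truncation="15%", convert_units = conversion.factor)
AIC(amak.hr.obs.has)
```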
Another useful function is summarize_ds_models - this has the advantage of ordering the models by AIC (smallest to largest).
knitr::kable(summarize_ds_models(amak.hr, amak.hr.obs, amak.hr.obs.mas), digits=3,
             caption="Model selection table for Hawaiian amakihi.")

|   | Model           | Key function | Formula    | C-vM p-value | \(\hat{P_a}\) | se(\(\hat{P_a}\)) | \(\Delta\)AIC |
|---|-----------------|--------------|------------|--------------|---------------|-------------------|---------------|
| 3 | amak.hr.obs.mas | Hazard-rate  | ~OBs + MAS | 0.170        | 0.302         | 0.022             | 0.000         |
| 2 | amak.hr.obs     | Hazard-rate  | ~OBs       | 0.116        | 0.297         | 0.022             | 2.249         |
| 1 | amak.hr         | Hazard-rate  | ~1         | 0.144        | 0.308         | 0.022             | 34.516        |
Examine the shape of the preferred detection function (including covariates observer and minutes after sunrise) (Figure 4).
plot(amak.hr.obs.mas, pdf=TRUE, main="Hazard rate with observer and minutes after sunrise.", showpoints=FALSE)
sfzero <- data.frame(OBs="SGF", MAS=0)
sf180  <- data.frame(OBs="SGF", MAS=180)
t1zero <- data.frame(OBs="TJS", MAS=0)
t1180  <- data.frame(OBs="TJS", MAS=180)
t2zero <- data.frame(OBs="TKP", MAS=0)
t2180  <- data.frame(OBs="TKP", MAS=180)
add_df_covar_line(amak.hr.obs.mas, data=sfzero, lty=1, lwd=2, col="blue", pdf=TRUE)
add_df_covar_line(amak.hr.obs.mas, data=sf180,  lty=2, lwd=2, col="blue", pdf=TRUE)
add_df_covar_line(amak.hr.obs.mas, data=t1zero, lty=1, lwd=2, col="darkorange", pdf=TRUE)
add_df_covar_line(amak.hr.obs.mas, data=t1180,  lty=2, lwd=2, col="darkorange", pdf=TRUE)
add_df_covar_line(amak.hr.obs.mas, data=t2zero, lty=1, lwd=2, col="violet", pdf=TRUE)
add_df_covar_line(amak.hr.obs.mas, data=t2180,  lty=2, lwd=2, col="violet", pdf=TRUE)
legend("topright", legend=c("SF, minutes=0",
                            "SF, minutes=180",
                            "TS, minutes=0",
                            "TS, minutes=180",
                            "TP, minutes=0",
                            "TP, minutes=180"),
       title="Covariate combination: observer and minutes",
       lty=rep(c(1,2), times=3), lwd=2, col=rep(c("blue","darkorange","violet"), each=2))
Figure 4: PDF of best fitting model, including effects of observer and minutes after sunrise.
There were three observers involved in the survey. One observer made ~80% of the detections, with a second observer responsible for a further 15% and the third observer 5%.
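Those proportions can be checked with a quick tabulation (a sketch):

```r
# Proportion of detections attributed to each observer
round(prop.table(table(amakihi$OBs)), 2)
```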
vignettes/lines-distill.Rmd
In this exercise, we use R (R Core Team, 2019) and the Distance package (Miller, Rexstad, Thomas, Marshall, & Laake, 2019) to fit different detection function models to line transect survey data of winter wren (Troglodytes troglodytes) density and abundance. These data were part of a study described by Buckland (2006).
Nineteen line transects were walked twice (Figure 1).
Figure 1: Montrave study area; diagonal lines indicate line transects walked to generate these data.
The fields of the wren_lt data set are:

- Region.Label - identifier of the stratum (a single region, Montrave)
- Area - size of the study area (hectares)
- Sample.Label - line transect identifier
- Effort - length of the transect (kilometres)
- object - unique identifier of each detection
- distance - perpendicular distance of the detection from the line (metres)
- Study.Area - name of the study area
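The loading step itself is not shown in this rendering; it would be along these lines, assuming the wren_lt object shipped with the Distance package:

```r
# Attach the Distance package and load the winter wren line transect data
library(Distance)
data("wren_lt")
```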
Loading the data in this way assumes that the Distance package has been installed on your computer. The R workspace wren_lt contains detections of winter wrens from the line transect surveys of Buckland (2006).
The effort, or transect length, has been adjusted to recognise that each transect was walked twice. Examine the first few rows of wren_lt using the function head().
head(wren_lt)
##   Region.Label Area Sample.Label Effort object distance Study.Area
## 1     Montrave 33.2            1  0.416      5       15 Montrave 4
## 2     Montrave 33.2            1  0.416      6       80 Montrave 4
## 3     Montrave 33.2            1  0.416      7       35 Montrave 4
## 4     Montrave 33.2            1  0.416      8       55 Montrave 4
## 5     Montrave 33.2            1  0.416     12       12 Montrave 4
## 6     Montrave 33.2            1  0.416     13       75 Montrave 4
The object wren_lt is a data frame made up of rows and columns.
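The count reported below was presumably produced by tallying the non-missing detection distances, for example:

```r
# Number of rows with a recorded (non-missing) perpendicular distance
sum(!is.na(wren_lt$distance))
```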
## [1] 156
The code above determines the number of detection distances that are not missing. Why might there be rows in our data where detection distance is missing? Distance would have to be recorded as missing for rows representing transects on which there were no detections. The transect and its effort would need to appear in the data, but without detections, the perpendicular distance would be recorded as missing (NA).
Gain familiarity with the perpendicular distance data using the hist() function (Figure 2).
hist(wren_lt$distance, xlab="Distance (m)", main="Winter wren line transects")

Figure 2: Distribution of perpendicular distances for winter wren from Buckland (2006).
Note that there appear to be too few detections between 0 and 20m, and too many detections between 20m and 40m. This may be evidence of evasive movement by winter wrens; see further discussion of this below.
> A guaranteed way to produce incorrect results from your analysis is to misspecify the units in which distances are measured. The ds function has an argument convert_units where the user provides a value so that density is reported in the proper units. Providing an incorrect value will result in estimates that are out by orders of magnitude.
We can choose the units in which winter wren density is to be reported; given the conversion used below, density will be reported per hectare. How do we transmit this information to the ds function?
The answer is another function, convert_units. Its arguments are:

- distance_units - units in which the perpendicular distances were measured
- effort_units - units in which the transect lengths (effort) were measured
- area_units - units in which the study area size is measured (and hence the units of the reported density)
Specify the correct arguments to this function for the winter wren data set. Note: units are specified as quoted strings, singular rather than plural; e.g. “meter” rather than “meters”
conversion.factor <- convert_units("meter", "kilometer", "hectare")
Detection functions are fitted using the ds function, and this function requires the data frame to have a column called distance. We have this in our wren_lt data, so we can simply supply the name of the data frame to the function along with additional arguments.
Details about the arguments for this function:

- key="hn" - fit a half-normal key function
- adjustment=NULL - do not include adjustment terms
- convert_units=conversion.factor - supply the unit conversion created above
wren.hn <- ds(data=wren_lt, key="hn", adjustment=NULL, convert_units=conversion.factor)
On calling the ds function, information is provided to the screen reminding the user what model has been fitted and the associated AIC value. More information is supplied by applying the summary() function to the object created by ds().
summary(wren.hn)
##
## Summary for distance analysis
## Number of observations : 156
## Distance range         : 0 - 100
##
## Model        : Half-normal key function
## AIC          : 1418.188
## Optimisation : mrds (nlminb)
##
## Detection function parameters
## Scale coefficient(s):
##             estimate        se
## (Intercept) 4.105816 0.1327744
##
##                        Estimate          SE         CV
## Average p              0.685037  0.05678821 0.08289802
## N in covered region  227.724931 21.47275208 0.09429250
##
## Summary statistics:
##     Region Area CoveredArea Effort   n  k       ER    se.ER      cv.ER
## 1 Montrave 33.2       193.2   9.66 156 19 16.14907 1.226096 0.07592366
##
## Abundance:
##   Label Estimate       se        cv     lcl     ucl       df
## 1 Total 39.13286 4.399007 0.1124121 31.3023 48.9223 74.24595
##
## Density:
##   Label Estimate        se        cv       lcl      ucl       df
## 1 Total   1.1787 0.1325002 0.1124121 0.9428403 1.473563 74.24595
Examining the output produced by summary(wren.hn), note the details of the fitted model (half-normal key, its AIC and scale coefficient), the estimated average detection probability, and the resulting abundance and density estimates with their measures of precision.
Visually inspect the fitted detection function with the plot() function, specifying the histogram cutpoints with the argument breaks (Figure 3):
cutpoints <- c(0,5,10,15,20,30,40,50,65,80,100)
plot(wren.hn, breaks=cutpoints, main="Half normal model, wren line transects")

Figure 3: Fit of half-normal detection function to wren data. Note the large number of break points specified at small distances.
Continue to note the presence of evasive movement in this plot of the fit of the detection function to the observed data.
Detection function forms and shapes are specified by changing the key and adjustment arguments.

The options available for key detection functions are:

- half-normal (key="hn") - default
- hazard rate (key="hr")
- uniform (key="unif")

The options available for adjustment terms are:

- no adjustment terms (adjustment=NULL)
- cosine (adjustment="cos") - default
- Hermite polynomial (adjustment="herm")
- simple polynomial (adjustment="poly")

To fit a uniform key function with cosine adjustment terms, use the command:
wren.unif.cos <- ds(wren_lt, key="unif", adjustment="cos", convert_units=conversion.factor)
When this line of code is executed, multiple models are fitted, successively adding additional adjustment terms. When the model with four adjustment terms is fitted, an error message is returned, but a uniform key with 3 cosine adjustments is fitted and contained in the returned object.
AIC model selection will be used to fit adjustment terms of up to order 5.
To fit a hazard rate key function with simple polynomial adjustment terms, use the command:
wren.hr.poly <- ds(wren_lt, key="hr", adjustment="poly", convert_units=conversion.factor)
Each fitted detection function produces a different estimate of winter wren abundance and density; the estimate depends upon the model chosen. The model selection tool for distance sampling data is AIC.
AIC(wren.hn, wren.hr.poly, wren.unif.cos)
##               df      AIC
## wren.hn        1 1418.188
## wren.hr.poly   2 1412.133
## wren.unif.cos  3 1416.430
The df column in the AIC table indicates the number of parameters associated with each model.
In addition to the relative ranking of models provided by AIC, it is also important to know whether the selected model(s) actually fit the data. The model is the basis of inference, so it is dangerous to make inference from a model that does not fit the data. Goodness of fit is assessed using the function gof_ds. By default, this function reports the goodness of fit assessed by the Cramér-von Mises test, along with a quantile-quantile (Q-Q) plot showing locations of deviations from good fit. Optionally, a \(\chi^2\) goodness of fit test and a bootstrap version of the Kolmogorov-Smirnov goodness of fit test can be performed. Using the function defaults, we see results only of the Cramér-von Mises test along with the Q-Q plot (Figure 4).
gof_ds(wren.hr.poly)

Figure 4: Q-Q plot of hazard rate key function fitted to wren line transect data.
##
## Goodness of fit results for ddf object
##
## Distance sampling Cramer-von Mises test (unweighted)
## Test statistic = 0.249897 p-value = 0.188501
Even though there may have been evasive movement, the goodness of fit statistics indicate that the detection function models remain adequate for inference.
The function summarize_ds_models combines the work of AIC and gof_ds to produce a table of fitted models and summary statistics.
knitr::kable(summarize_ds_models(wren.hn, wren.hr.poly, wren.unif.cos), digits=3,
             caption="Model comparison table for wren line transect data, Montrave.")

|   | Model         | Key function                                         | Formula | C-vM p-value | \(\hat{P_a}\) | se(\(\hat{P_a}\)) | \(\Delta\)AIC |
|---|---------------|------------------------------------------------------|---------|--------------|---------------|-------------------|---------------|
| 2 | wren.hr.poly  | Hazard-rate                                          | ~1      | 0.189        | 0.844         | 0.025             | 0.000         |
| 3 | wren.unif.cos | Uniform with cosine adjustment terms of order 1,2,3 | NA      | 0.198        | 0.757         | 0.140             | 4.298         |
| 1 | wren.hn       | Half-normal                                          | ~1      | 0.077        | 0.685         | 0.057             | 6.055         |
The AIC model selection tools suggest the hazard rate key function is the preferred model. However, examine the shape of the hazard rate detection function in contrast to the uniform cosine fitted detection function (Figure 5).
plot(wren.hr.poly, breaks=cutpoints, main="Hazard rate")
plot(wren.unif.cos, breaks=cutpoints, main="Uniform cosine")

Figure 5: Possible evidence of evasive movement of wrens. Note left figure (hazard rate) with implausible perfect detectability out to 70m, then precipitous decline.
The researcher who gathered these data (Prof. Buckland) maintained that the shape of the fitted hazard rate detection function is not plausible. Instead, he chose the uniform key with cosine adjustments for making inference (Buckland, 2006, p. 352):
> Common Chaffinch and Winter Wren showed some evidence of observer avoidance. For 2 of the 12 data sets, this resulted in a fitted hazard rate detection function with certain detection out to ∼60 m, with an implausibly rapid fall-off beyond 70 m. In these two analyses, a model with a slightly higher AIC value and a more plausible fit to the detection function was selected.

This is an example of moderating objective model selection tools with common sense and understanding of field procedures.
vignettes/species-covariate-distill.Rmd
Sometimes the focal species of a distance sampling survey is quite rare, so rare that it is difficult to accumulate sufficient detections to fit a detection function for the species in question. At the same time, it is common for other species to be detected during the survey for the focal species. Could the detections of the other species be useful in estimating a detection function for the focal species?
One approach might be to treat the species as "strata" and proceed to analyse the data as if they were from a stratified survey; see the example for stratified survey analysis. However, if a pooled detection function (one that combines data from multiple species) is fitted, it would be dubious to apply this pooled detection function to data at a lower level of aggregation (the species level). Applying the pooled detection function would lead to a biased estimate of abundance for the rare species.
Instead of treating species as strata, an alternative form of analysis is to treat species as a covariate in the modelling of the detection function (Marques & Buckland, 2003). The principle is that the general key function is shared across species, but the scale parameter (\(\sigma\)) differs between species. In this way, information from the detections of all species is shared, so that estimation of the detection function for the rare species is bolstered by information from the other species; yet the rare species receives its own unique detection function, so bias is not induced in the abundance estimate for that species.
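Concretely, for the half-normal key used below, the species covariate enters through the scale parameter; a sketch of the standard covariate formulation, with \(\beta_s\) the coefficient estimated for species \(s\):

\[
g(x; s) = \exp\!\left(-\frac{x^2}{2\sigma_s^2}\right), \qquad \sigma_s = \exp(\beta_0 + \beta_s),
\]

so all species share the half-normal shape, while each species has its own effective scale and hence its own detection probability.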
To demonstrate such an analysis, the Montrave songbird study conducted by Buckland (2006) is used. The species covariate approach to analysis of the snapshot point count version of his survey is described in the book by Buckland et al. (2015, sec. 5.3.2.2). The Distance R package (Miller, Rexstad, Thomas, Marshall, & Laake, 2019) is used to analyse the line transect survey Buckland conducted. Results are compared with estimates presented by Buckland (2006).
The data are available online at a website that serves as a companion to Buckland et al. (2015). The data set can be read into R directly from the URL.
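The reading step might look like the sketch below; the URL shown is a placeholder rather than the actual companion-site address, so substitute the address given on that website.

```r
# A minimal sketch, assuming a placeholder URL -- substitute the address of the
# Montrave line transect data from the Buckland et al. (2015) companion website.
birds <- read.csv(file = "https://example.org/montrave-line-transects.csv")
head(birds, n = 3)
```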
Only one slight modification to the data is needed before they can be analysed. Buckland (2006) made two transits of each transect, so the line transect effort needs to be adjusted to reflect the repeat visits.
+
+birds$Effort <- birds$Effort * birds$repeats # two visits
+library(Distance)
convunit <- convert_units("meter", "kilometer", "hectare")

In Buckland's (2006) line transect survey, three of the four songbird species (coded c = chaffinch, g = great tit, r = robin, w = winter wren) were detected in sufficient numbers that sample size is not an issue. However, the great tit was detected only 32 times, making the support for this species open to question.
| Var1 | Freq |
|------|------|
| c    | 73   |
| g    | 32   |
| r    | 82   |
| w    | 156  |
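The tallies shown above could be reproduced with a call along the following lines (a sketch; it assumes the species codes are held in a column named species, as in the detection function formula used below).

```r
# Sketch: tally detections by species code
table(birds$species)
```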
As mentioned in the Background, we could fit a pooled detection function across species and with species as a stratification criterion produce species-specific density estimates using the pooled detection function in conjunction with species-specific encounter rates. However that would be using the wrong detection function for every species. We take the alternative analysis route and incorporate species into the detection function.
Inclusion of species as a covariate in the detection function is simple using the formula= argument in ds(). Note that the species names are coded as letters; R will automatically treat a variable containing letters as a factor covariate. If numbers had been used to code species, as.factor() would need to be employed.
+all.birds <- ds(data = birds, key="hn", convert_units = convunit,
                formula=~species, truncation = 95)

The CvM goodness of fit test indicates this model adequately fits the data (W = 0.401, P = 0.072).
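The statistic quoted above can be checked with the gof_ds() function in the Distance package; a minimal sketch (Q-Q plot suppressed) follows.

```r
# Sketch: Cramer-von Mises goodness of fit test for the fitted model
gof_ds(all.birds, plot = FALSE)
```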
+The shape of the species-specific detection functions can be seen by using the plotting function provided below.
+
+plot(all.birds, showpoints=FALSE, main="Montrave line transects\nspecies as covariate")
+add.df.covar.line(all.birds, data=data.frame(species="c"), lwd=3, lty=1, col="blue")
+add.df.covar.line(all.birds, data=data.frame(species="g"), lwd=3, lty=1, col="darkgreen")
+add.df.covar.line(all.birds, data=data.frame(species="r"), lwd=3, lty=1, col="brown")
+add.df.covar.line(all.birds, data=data.frame(species="w"), lwd=3, lty=1, col="salmon")
+legend("topright", legend=c("chaffinch", "great tit", "robin", "winter wren"),
+ lwd=3, lty=1, col=c("blue", "darkgreen", "brown", "salmon"))
+Figure 1: Species-specific detection functions. +
Density estimates for each species can be produced using the dht2 function, which has an argument strat_formula used to specify the levels at which stratum-specific estimates are requested. The stratification argument ensures the correct measures of precision are associated with the species-specific density estimates. The value "object" indicates this analysis is a form of post-stratification, rather than a geographic stratification criterion that would have been known prior to gathering the data.
+bird.ests <- dht2(ddf=all.birds, flatfile=birds,
+ strat_formula = ~species, convert_units = convunit,
                  stratification = "object")

| species | n   | Density | Density_CV | LCI   | UCI   |
|---------|-----|---------|------------|-------|-------|
| c       | 73  | 0.641   | 0.170      | 0.456 | 0.900 |
| g       | 32  | 0.251   | 0.242      | 0.156 | 0.406 |
| r       | 80  | 0.769   | 0.149      | 0.572 | 1.032 |
| w       | 155 | 1.149   | 0.113      | 0.918 | 1.437 |
The density estimates for chaffinch and great tit match those reported by Buckland (2006) almost exactly. The congruence between the estimates produced by this analysis and those reported by Buckland is less good for the robins and winter wrens.
+
+Figure 2: Reproduction of Table 2 of Buckland (2006). +
As described by Buckland (2006), there was some reason to believe evasive movement took place on the part of robins and winter wrens. Conceivably, this could be accommodated by using a hazard rate key function for those two species. This would lead to a more complex analysis in which the data set is divided in two: a chaffinch/great tit portion modelled with a half-normal key and a species covariate, and a robin/winter wren portion modelled with a hazard rate key and a species covariate.
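A sketch of how such a split might be set up is shown below; the subset() step is an assumption about how the data would be divided, while the key functions, covariate and truncation follow the description above.

```r
# Sketch of the "two model" approach (original results not shown in this document)
cg.birds <- subset(birds, species %in% c("c", "g"))  # chaffinch / great tit
rw.birds <- subset(birds, species %in% c("r", "w"))  # robin / winter wren
cg.hn <- ds(data = cg.birds, key = "hn", formula = ~species,
            convert_units = convunit, truncation = 95)
rw.hr <- ds(data = rw.birds, key = "hr", formula = ~species,
            convert_units = convunit, truncation = 95)
```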
Indeed, the goodness of fit for this more complex analysis (full results not shown) indicates a better fit for the "two model" approach:
| analysis | CvM.W | P.value |
|----------|-------|---------|
| Single analysis | 0.401 | 0.072 |
| HN key | 0.068 | 0.762 |
| HR key | 0.222 | 0.228 |
vignettes/web-only/CTDS/camera-distill.Rmd
A distance sampling approach to the analysis of camera trapping data offers the potential advantage that individual animal identification is not required. However, accurate animal-to-camera detection distances are required. This requires calibration prior to the survey, with images of objects taken at known distances from the camera. See Howe, Buckland, Després-Einspenner, & Kühl (2017) for a description of the field work and data analysis. Here we present an analysis of data from Howe et al. (2017) using the R package Distance (Miller, Rexstad, Thomas, Marshall, & Laake, 2019).
Heat- and motion-sensitive camera traps detect only moving animals within the range of the sensor and the field of view of the camera. Animals are therefore unavailable for detection by camera traps when they are stationary, and when they are above (e.g., semi-arboreal species) or below (e.g., semi-fossorial species) the range of the sensor or the camera, regardless of their distance from the camera in two dimensions. This temporally limited availability for detection must be accounted for to avoid negative bias in estimated densities. When data are abundant, researchers may choose to include only data from times when 100% of the population can be assumed to be active within the vertical range of camera traps (Howe et al., 2017). However, for rarely detected species or surveys with lower effort, it might be necessary to include most or all observations of distance. In these situations, survey duration (\(T_k\)) might be 12 or 24 hours per day, and it becomes necessary to estimate the proportion of time included in \(T_k\) when animals were available for detection. Methods for estimating this proportion directly from camera trap data have been described (Rowcliffe, Kays, Kranstauber, Carbone, & Jansen, 2014), and it can be included in analyses to estimate density (Bessone et al., 2020), for example as another multiplier, potentially with an associated standard error.
Times of independent camera triggering events for the period 28 June to 21 September 2014 at 23 cameras are recorded in a file described in the data repository of Howe, Buckland, Després-Einspenner, Kühl, & Buckland (2018). Download the file from Dryad, save it to your local drive, and read it with the following code:
+
trigger.events <- read.table(file="VideoStartTimes_FullDays.txt", header=TRUE)

The format of the trigger.events data frame is adjusted to create a datetime field for use in the activity package (Rowcliffe, 2021).
+trigger.events$date <- paste("2014",
+ sprintf("%02i", trigger.events$month),
+ sprintf("%02i", trigger.events$day),
+ sep="/")
+trigger.events$time <- paste(sprintf("%02i", trigger.events$hour),
+ sprintf("%02i", trigger.events$minute),
+ sep=":")
trigger.events$datetime <- paste(trigger.events$date, trigger.events$time)

activity package
+We will employ two functions from the activity package. First, convert the time of day of a camera triggering event into the fraction of the 24hr cycle when the event took place, measured in radians. In other words, an event occurring at midday is recorded as \(\pi\) and an event occurring at midnight is recorded as 2\(\pi\).
+library(activity)
+trigger.events$rtime <- gettime(trigger.events$datetime,
+ tryFormats = "%Y/%m/%d %H:%M",
                                scale = "radian")

With the radian conversion of the camera triggering times, the distribution of triggering-event times is smoothed using a kernel smoother in the function fitact. This function estimates the proportion of time (in a 24-hr day) animals were active. In addition, the triggering time data can be resampled to provide a measure of uncertainty in the point estimate of the activity proportion.
act_result <- fitact(trigger.events$rtime, sample="data", reps=100)

A plot of the histogram of triggering times (Figure 1), along with the fitted smooth, is produced by a plot function applied to the object returned by fitact.
+plot(act_result)
+Figure 1: Fitted smooth to histogram of camera triggering times for Maxwell’s duiker data. +
+The value computed by the smooth through the activity histogram can be extracted from the object created by fitact. The extraction reaches into the object to look at the slot called act. The uncertainty around the point estimate is derived from resampling that takes place within fitact. The slot will display the point estimates, standard error and confidence interval bounds.
print(act_result@act)

##        act         se   lcl.2.5%  ucl.97.5%
## 0.33463831 0.02096859 0.30195769 0.37801207
+The output above would be used to adjust density estimates for temporal activity if the cameras were in operation 24hrs per day. However, in this study, cameras were only active for 11.5 hours per day (0630-1800).
We use the temporal availability information to create a multiplier. Our multiplier must be defined as

> proportion of the camera operation time animals were available to be detected
This is not equivalent to the value produced by the fitact function; that value is the proportion of the 24-hr day during which animals were available to be detected. The availability multiplier must therefore be adjusted for the daily camera operation period. Uncertainty in this proportion is also carried through our computations.
The point estimate and standard error are pulled from the fitact object, adjusted for daily camera operation time and placed into a data frame named creation in a named list, specifically in the fashion shown.
+camera.operation.per.day <- 11.5
+prop.camera.time <- camera.operation.per.day / 24
+avail <- list(creation=data.frame(rate = act_result@act[1]/prop.camera.time,
                                 SE = act_result@act[2]/prop.camera.time))

A more robust way of incorporating uncertainty in the temporal availability estimate will be described later.
Detection distances for the full daytime data set are also available from Howe et al. (2018). Download the file from Dryad; it is read in the code chunk below:
+
+DuikerCameraTraps <- read.csv(file="DaytimeDistances.txt", header=TRUE, sep="\t")
+DuikerCameraTraps$Area <- DuikerCameraTraps$Area / (1000*1000)
+DuikerCameraTraps$object <- NA
DuikerCameraTraps$object[!is.na(DuikerCameraTraps$distance)] <- 1:sum(!is.na(DuikerCameraTraps$distance))

The data file records the study area size in square metres; the second line above converts this to square kilometres. The remaining lines create an object field that uniquely identifies each observation.
A quick summary of the data set: how many camera stations there are and how many detections in total.

## [1] 11180
+
+table(DuikerCameraTraps$Sample.Label)##
+## A1 A2 A3 A4 B1 B2 B3 B4 C1 C2 C3 C4 C5 C6 D3 D4
+## 388 66 988 420 3 1951 73 208 52 195 767 153 41 2682 342 193
+## D5 E3 E4 E5 E6
+## 524 518 1 375 1241
+Note, three sampling stations (B1, C5, E4) had no detections. The one record for each of those stations has distance recorded as NA, but the record is important because it contains effort information.
Examine the distribution of detection distances; note the bespoke cutpoints, with distance bins that are narrow out to 8m and then increase in width up to the maximum detection distance of 21m (Figure 2).
+
+breakpoints <- c(seq(0, 8, 1), 10, 12, 15, 21)
+hist(DuikerCameraTraps$distance, breaks=breakpoints, main="Peak activity data set",
+ xlab="Radial distance (m)")
+Figure 2: Distribution of detection distances during peak activity period. +
+As described by Howe et al. (2017):
> a paucity of observations between 1 and 2 m but not between 2 and 3 m, so we left-truncated at 2 m. Fitted detection functions and probability density functions were heavy-tailed when distances >15 m were included, so we right truncated at 15 m.
+
The conversion factor must be included both in the call to ds() and the call to bootdht().
Candidate models considered here differ from the candidate set presented in Howe et al. (2017). The set fitted below comprises the uniform key with cosine adjustments, the half-normal key with cosine adjustments, and the hazard rate key with simple polynomial adjustments. The maximum number of parameters in any model within the candidate set is three.
+
+library(Distance)
+trunc.list <- list(left = 2, right = 15)
+mybreaks <- c(seq(2, 8, 1), 10, 12, 15)
+conversion <- convert_units("meter", NULL, "square kilometer")
+uni1 <- ds(DuikerCameraTraps, transect = "point", key = "unif", adjustment = "cos",
+ nadj = 1, convert_units = conversion,
+ cutpoints = mybreaks, truncation = trunc.list)## Warning in create_bins(data, cutpoints): Some distances were outside bins and
+## have been removed.
+
+uni2 <- ds(DuikerCameraTraps, transect = "point", key = "unif", adjustment = "cos",
+ nadj = 2, convert_units = conversion,
+ cutpoints = mybreaks, truncation = trunc.list)## Warning in create_bins(data, cutpoints): Some distances were outside bins and
+## have been removed.
+
+uni3 <- ds(DuikerCameraTraps, transect = "point", key = "unif", adjustment = "cos",
+ nadj = 3, convert_units = conversion,
+ cutpoints = mybreaks, truncation = trunc.list)## Warning in create_bins(data, cutpoints): Some distances were outside bins and
+## have been removed.
+
+hn0 <- ds(DuikerCameraTraps, transect = "point", key = "hn", adjustment = NULL,
+ convert_units = conversion, cutpoints = mybreaks, truncation = trunc.list)## Warning in create_bins(data, cutpoints): Some distances were outside bins and
+## have been removed.
+
+hn1 <- ds(DuikerCameraTraps, transect = "point", key = "hn", adjustment = "cos",
+ nadj = 1, convert_units = conversion,
+ cutpoints = mybreaks, truncation = trunc.list)## Warning in create_bins(data, cutpoints): Some distances were outside bins and
+## have been removed.
+
+hn2 <- ds(DuikerCameraTraps, transect = "point", key = "hn", adjustment = "cos",
+ nadj = 2, convert_units = conversion,
+ cutpoints = mybreaks, truncation = trunc.list)## Warning in create_bins(data, cutpoints): Some distances were outside bins and
+## have been removed.
+## Warning in check.mono(result, n.pts = control$mono.points): Detection function
+## is greater than 1 at some distances
+## Warning in check.mono(result, n.pts = control$mono.points): Detection function
+## is greater than 1 at some distances
+## Warning in mrds::check.mono(model, n.pts = 10): Detection function is greater
+## than 1 at some distances
+
+hr0 <- ds(DuikerCameraTraps, transect = "point", key = "hr", adjustment = NULL,
+ convert_units = conversion, cutpoints = mybreaks, truncation = trunc.list)## Warning in create_bins(data, cutpoints): Some distances were outside bins and
+## have been removed.
+
+hr1 <- ds(DuikerCameraTraps, transect = "point", key = "hr", adjustment = "poly",
+ nadj = 1, convert_units = conversion,
+ cutpoints = mybreaks, truncation = trunc.list)## Warning in create_bins(data, cutpoints): Some distances were outside bins and
+## have been removed.
+We do not present the density estimates produced from the fitted detection function models because a) we have not chosen a preferred model and b) the density estimates have not been adjusted for viewing angle and temporal availability.
Overdispersion causes AIC to select overly complex models, so analysts should specify the number/order of adjustment terms manually when fitting distance sampling models to data from camera traps, rather than allowing automated selection using AIC. Howe, Buckland, Després-Einspenner, & Kühl (2019) describe two methods for performing model selection of distance sampling models in the face of overdispersion. Here we provide R functions to perform the first of these methods. The first method of Howe et al. (2019) employs a two-step process. First, an overdispersion factor \((\hat{c})\) is computed for each key function family from the most complex model in that family; \(\hat{c}\) is the \(\chi^2\) goodness of fit test statistic divided by its degrees of freedom, \(\hat{c} = \chi^2/\mathrm{df}\). This results in an adjusted AIC score for each model in the key function family:
\[QAIC = -2 \left\{ \frac{\log\left(\mathcal{L}(\hat{\theta})\right)}{\hat{c}} \right\} + 2K\]
+Code to perform this QAIC computation is found in the function QAIC in the Distance package, and produces the following results:
Tables of QAIC values for each key function family are shown below (code for kable() calls suppressed for easier readability of results).
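The suppressed calls likely resemble the following sketch, one QAIC() call per key function family (kable() formatting omitted); QAIC() is supplied by the Distance package and the fitted models come from the chunks above.

```r
# Sketch of the QAIC computations underlying the tables below
QAIC(uni1, uni2, uni3)   # uniform key with 1, 2, 3 cosine adjustments
QAIC(hn0, hn1, hn2)      # half-normal key with 0, 1, 2 cosine adjustments
QAIC(hr0, hr1)           # hazard rate key with 0, 1 polynomial adjustments
```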
|      | df | QAIC     |
|------|----|----------|
| uni1 | 1  | 2825.316 |
| uni2 | 2  | 2826.822 |
| uni3 | 3  | 2823.921 |

|      | df | QAIC     |
|------|----|----------|
| hn0  | 1  | 2296.204 |
| hn1  | 2  | 2292.252 |
| hn2  | 3  | 2294.096 |

|      | df | QAIC     |
|------|----|----------|
| hr0  | 2  | 2465.086 |
| hr1  | 3  | 2466.362 |
From this first pass of model selection based on QAIC values, the preferred model from the uniform key function family has three cosine adjustment terms, the preferred model from the half-normal key function family has one cosine adjustment term, and the preferred model from the hazard rate key function family has no adjustment terms.
+The second step of model selection ranks the models by their \(\hat{c}\) values.
+
+chats <- chi2_select(uni3, hn1, hr0)$criteria
+modnames <- unlist(lapply(list(uni3, hn1, hr0), function(x) x$ddf$name.message))
+results <- data.frame(modnames, chats)
+results.sort <- results[order(results$chats),]
+knitr::kable(results.sort, digits=2, row.names = FALSE,
+ caption="Compare with Table S5 of Howe et al. (2018)") %>%
+ kable_paper(full_width = FALSE) %>%
  row_spec(1, bold=TRUE, background = "#4da6ff")

| modnames | chats |
|----------|-------|
| uniform key function with cosine(1,2,3) adjustments | 15.63 |
| half-normal key function with cosine(2) adjustments | 16.54 |
| hazard-rate key function | 17.21 |
For this data set, the model chosen by this algorithm that adjusts for overdispersion is the same model (uniform key with three cosine adjustments) as would have been chosen by conventional model selection; but again, not the model selected by Howe et al. (2017) because of the differing candidate model sets.
+As a check of the detection function vis-a-vis Howe et al. (2017), the paper reports the effective detection radius (\(\rho\)) to be 9.4m for the peak activity data set. Howe et al. (2017) employed a different candidate model set, resulting in the unadjusted hazard rate model as the preferred model. Here we present the estimated effective detection radius from the selected uniform key function with three cosine adjustment terms.
+The effective detection radius can be derived from \(\hat{P_a}\) as reported by the function ds as
\[\hat{\rho} = \sqrt{\hat{P_a} \cdot w^2}\]
\(\hat{P_a}\) is estimated to be 0.329, resulting in an estimate of \(\hat{\rho}\) of 7.457.
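A short sketch of this computation is given below; the extraction of \(\hat{P_a}\) assumes the mrds summary structure, and the truncation width is the right minus the left truncation distance.

```r
# Sketch: effective detection radius from the selected model
p_a <- summary(uni3$ddf)$average.p   # estimated average detectability (~0.329)
w <- 15 - 2                          # truncation width in metres
(rho <- sqrt(p_a * w^2))             # effective detection radius (~7.5 m)
```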
Figure 3 shows the detection function and the probability density function of the selected model.
+
+plot(uni3, main="Daytime activity", xlab="Distance (m)",
+ showpoints=FALSE, lwd=3, xlim=c(0, 15))
+plot(uni3, main="Daytime activity", xlab="Distance (m)", pdf=TRUE,
+ showpoints=FALSE, lwd=3, xlim=c(0, 15))

+Figure 3: Detection function and probability density function of the selected detection function model. +
+The camera traps do not view the entire area around them, as would be the case with simple point transect sampling. The portion of the area sampled needs to be incorporated in the estimation of abundance. The data file contains a column multiplier that represents the proportion of the circle sampled. Howe et al. (2017) notes the camera angle of view (AOV) of 42\(^{\circ}\). The proportion of the circle viewed is this value over 360\(^{\circ}\).
An argument to dht2 is sample_fraction, an obvious place to include this quantity. We also add the multiplier for temporal availability described in the section on temporal availability. With this call, the dht2 function will produce analytical measures of precision.
+viewangle <- 42 # degrees
+samfrac <- viewangle / 360
+peak.uni.dens <- dht2(uni3, flatfile=DuikerCameraTraps, strat_formula = ~1,
+ sample_fraction = samfrac, er_est = "P2", multipliers = avail,
+ convert_units = conversion)
+print(peak.uni.dens, report="density")## Density estimates from distance sampling
+## Stratification : geographical
+## Variance : P2, n/L
+## Multipliers : creation
+## Sample fraction : 0.1166667
+##
+##
+## Summary statistics:
+## .Label Area CoveredArea Effort n k ER se.ER cv.ER
+## Total 40.37 2596.317 31483179 10284 21 0 0 0.268
+##
+## Density estimates:
+## .Label Estimate se cv LCI UCI df
+## Total 17.2357 4.801 0.279 9.7947 30.3297 23.239
+##
+## Component percentages of variance:
+## .Label Detection ER Multipliers
+## Total 2.17 92.77 5.06
+To produce a more reliable estimate of the precision of the point estimate, produce bootstrap estimates using bootdht. The user needs to create a function and another named list to facilitate use of the bootstrap: a summary function to extract information from each replicate and a multiplier list describing how temporal availability is being derived.
As constructed, mysummary will keep the density estimate produced by each bootstrap replicate and the stratum (if any) to which the estimate pertains.
+mysummary <- function(ests, fit){
+ return(data.frame(Label = ests$individuals$D$Label,
+ Dhat = ests$individuals$D$Estimate))
}

This rather complex list makes use of make_activity_fn, a function in the Distance package used to call the fitact function from the activity package. The user's responsibility is to provide three arguments to this function: the triggering times in radians, the sample argument passed on to fitact, and the daily camera operation duration.
+mult <- list(availability= make_activity_fn(trigger.events$rtime, sample="data",
                                            detector_daily_duration=camera.operation.per.day))

Bootstrap analyses of camera trap data can be quite slow. In general, camera traps produce a large amount of distance sampling data, and in addition these data tend to be "overdispersed", meaning (in this case) that there are many observations with the same distances. Together, this can cause analyses to run slowly, and this is especially true of bootstrap analyses for variance estimation.
+One way to speed up the bootstrap is to run multiple analyses in parallel, using multiple cores of your computer. You can achieve this using the cores argument to bootdht - for fastest results set this to the number of cores on your machine minus 1 (best to leave 1 free for other things). You can find the number of cores by calling parallel::detectCores() and we do this in the code below.
Another possible speed-up is to set starting values - but this is quite an advanced technique and so we come back to this later in this document.
+bootdht
Just as with dht2, there are arguments for the model, flatfile, sample_fraction, convert_units and multipliers (although for bootdht, multipliers uses a function rather than a single value). The only novel arguments to bootdht are resample_transects, indicating camera stations are to be resampled with replacement, and nboot, the number of bootstrap replicates.
+n.cores <- parallel::detectCores()
+daytime.boot.uni <- bootdht(model=uni3, flatfile=DuikerCameraTraps,
+ resample_transects = TRUE, nboot = 500,
+ cores = n.cores - 1,
+ summary_fun=mysummary, sample_fraction = samfrac,
+ convert_units = conversion, multipliers=mult)## Bootstrap results
+##
+## Boostraps : 500
+## Successes : 498
+## Failures : 2
+##
+## median mean se lcl ucl cv
+## Dhat 18.01 19.13 7.85 7.62 37.79 0.44
+
+hist(daytime.boot.uni$Dhat, breaks = 20,
+ xlab = "Estimated density", main = "D-hat estimates bootstraps")
+abline(v = quantile(daytime.boot.uni$Dhat, probs = c(0.025,0.975), na.rm = TRUE), lwd = 2, lty = 2, col = "red")
+abline(v = c(peak.uni.dens$LCI/peak.uni.dens$Area, peak.uni.dens$UCI/peak.uni.dens$Area), lwd = 2, lty = 2, col = "grey")
+Figure 4: Distribution of density estimates from bootstrap replicates. Red dashed lines indicate bootstrap 95% confidence intervals (obtained using the quantile method); grey dashed lines indicate the analytical 95% confidence intervals obtained earlier. +
+The confidence interval derived from the bootstrap is wider than the confidence interval derived from analytical methods (Figure 4).
+Feel free to skip this unless you’re a fairly advanced user!
+In some cases, it may be necessary to set starting values for the detection function optimization, to help it converge. This can be achieved using the initial_values argument of the ds function. As an example, say we want to use the fitted values from the uniform + 2 cosine function uni2 as starting values for the first two parameters of the uniform + 3 cosine function fitting (and 0 for the third parameter). The following code does this:
+uni3.with.startvals <- ds(DuikerCameraTraps, transect = "point", key="unif", adjustment = "cos",
+ nadj=3,
+ cutpoints = mybreaks, truncation = trunc.list,
+ initial_values = list(adjustment = c(as.numeric(uni2$ddf$par), 0)))## Warning in create_bins(data, cutpoints): Some distances were outside bins and
+## have been removed.
What about when it comes to bootstrapping for variance estimation? You can pass this model in to bootdht with no problems, so long as you don't set cores to more than 1. If you do set cores to more than 1 it won't work, returning 0 successful bootstraps. Why? Because uni2$ddf$par is not passed along to all those parallel cores. To fix this you have to hard-code the starting values. So, in this example, we see that the values are
+print(uni2$ddf$par)## [1] 0.97177303 0.03540654
+and so we use
+
+uni3.with.startvals <- ds(DuikerCameraTraps, transect = "point", key="unif", adjustment = "cos",
+ nadj=3,
+ cutpoints = mybreaks, truncation = trunc.list,
+ initial_values = list(adjustment = c(0.97177303, 0.03540654, 0)))## Warning in create_bins(data, cutpoints): Some distances were outside bins and
+## have been removed.
+and this will work fine in bootdht.
A final tip is that setting starting values can sometimes speed up the bootstrap (as optimization is faster if it starts from a good initial spot), so you might want to pass in the starting values from uni3 to your bootstrap routine - something like the following, which we found nearly halved the run time on our test machine. Note, this code is set not to run in this examples file - just here to show what you might use.
+print(uni3$ddf$par)
+uni3.with.startvals <- ds(DuikerCameraTraps, transect = "point", key="unif", adjustment = "cos",
+ nadj=3,
+ cutpoints = mybreaks, truncation = trunc.list,
+ optimizer = "MCDS",
+ initial_values = list(adjustment = c(0.93518220, -0.05345965, -0.08073799)))
+daytime.boot.uni <- bootdht(model=uni3.with.startvals, flatfile=DuikerCameraTraps,
+ resample_transects = TRUE, nboot = 500,
+ cores = n.cores - 1,
+ summary_fun=mysummary, sample_fraction = samfrac,
                            convert_units = conversion, multipliers=mult)

vignettes/web-only/alt-optimise/mcds-dot-exe.Rmd
Here we demonstrate the use of the alternative optimization engine MCDS.exe in the Distance and mrds packages. This engine was introduced with Distance version 1.0.8 and mrds version 2.2.9 to provide an alternative to the built-in optimizer in our R packages – but subsequent improvements in the built-in optimizer implemented in Distance 2.0.0 and mrds 3.0.0 mean that we no longer recommend use of MCDS.exe. Nevertheless, the option to use MCDS.exe remains open, and may be useful for some users, so we have retained this vignette. We may deprecate the option in future releases.
Note also that this vignette is designed for use within the Microsoft Windows operating system – the MCDS.exe engine has only experimental support for MacOS and Linux (see the MCDS.exe help page within the mrds package for more information).
The Distance package is designed to provide a simple way to fit detection functions and estimate abundance using conventional distance sampling methodology (i.e., single observer distance sampling, possibly with covariates, as described by Buckland et al. (2015)). The main function is ds. Underlying Distance is the package mrds – when the function ds is called it does some pre-processing and then calls the function ddf in the mrds package to do the work of detection function fitting. mrds uses maximum likelihood to fit the specified detection function model to the distance data using a built-in algorithm written in R.
An alternative method for analyzing distance sampling data is the Distance for Windows software (Thomas et al., 2010). This software also uses maximum likelihood to fit the detection function models, and relies on software written in the programming language FORTRAN to do the fitting. The filename of this software is MCDS.exe.
In a perfect world, both methods would produce identical results given the same data and model specification, since the likelihood has only one maximum. However, the likelihood surface is sometimes complex, especially when monotonicity constraints are used (which ensures the estimated detection probability is flat or decreasing with increasing distance when adjustment terms are used) or with “overdispersed” or “spiked” data (see Figure 2 in Thomas et al. (2010)), and so in some (rare) cases one or other piece of software fails to find the maximum. Note that in our tests, we have found this to be extremely rare from Distance version 2.0.0 and mrds version 3.0.0 onwards. Nevertheless, to counteract this, it is possible to run both the R-based optimizer and MCDS.exe from the ds function within the Distance package or the ddf function within mrds package.
Another historical motivation for using the MCDS.exe software from within R was that the R-based optimizer was sometimes slow to converge and so using MCDS.exe in place of the R-based optimizer can then save significant time, particularly when doing a nonparametric bootstrap for large datasets. However, from Distance 2.0.0 and mrds 3.0.0 the R-based optimizer is no longer generally slower.
This vignette demonstrates how to download and then use the MCDS.exe software from within the Distance and mrds packages. For more information, see the MCDS.exe help page within the mrds package.
The program MCDS.exe does not come automatically with the Distance or mrds packages, to avoid violating CRAN rules, so you must first download it from the distance sampling website.
+#Unload Distance package if it's already loaded in R
+if("Distance" %in% (.packages())){
+ detach("package:Distance", unload=TRUE)
+}
+
+#Download MCDS.exe
+download.file("http://distancesampling.org/R/MCDS.exe", paste0(system.file(package="mrds"),"/MCDS.exe"), mode = "wb")## Warning in download.file("http://distancesampling.org/R/MCDS.exe",
+## paste0(system.file(package = "mrds"), : URL
+## http://distancesampling.org/R/MCDS.exe: cannot open destfile
+## 'C:/Users/erexs/Documents/R/win-library/4.1/mrds/MCDS.exe', reason 'Permission
+## denied'
+## Warning in download.file("http://distancesampling.org/R/MCDS.exe",
+## paste0(system.file(package = "mrds"), : download had nonzero exit status
+
+## Loading required package: mrds
+## This is mrds 3.0.0
+## Built: R 4.4.1; ; 2024-10-23 19:10:34 UTC; windows
+##
+## Attaching package: 'Distance'
+## The following object is masked from 'package:mrds':
+##
+## create.bins
+Now that this software is available, both it and the R optimizer will be used by default for each analysis; you can also choose to use just one or the other, as shown below.
This example (of golf tee data, using only observer 1) is taken from the R help for the ds function: (There is a warning about cluster sizes being coded as -1 that can be ignored.)
+#Load data
+data(book.tee.data)
+tee.data <- subset(book.tee.data$book.tee.dataframe, observer==1)
+#Fit detection function - default is half-normal with cosine adjustments
+ds.model <- ds(tee.data, truncation = 4)## Starting AIC adjustment term selection.
+## Fitting half-normal key function
+## AIC= 311.138
+## Fitting half-normal key function with cosine(2) adjustments
+## AIC= 313.124
+##
+## Half-normal key function selected.
+## No survey area information supplied, only estimating detection function.
+
+summary(ds.model)##
+## Summary for distance analysis
+## Number of observations : 124
+## Distance range : 0 - 4
+##
+## Model : Half-normal key function
+## AIC : 311.1385
+## Optimisation: mrds (nlminb)
+##
+## Detection function parameters
+## Scale coefficient(s):
+## estimate se
+## (Intercept) 0.6632435 0.09981249
+##
+## Estimate SE CV
+## Average p 0.5842744 0.04637627 0.07937412
+## N in covered region 212.2290462 20.85130344 0.09824906
+Assuming you have MCDS.exe installed, the default is that both it and the R-based optimizer are run. Both give the same result in this example, and when this happens the result from the R-based optimizer is used. You can see this from the line of summary output:
Optimisation: mrds (nlminb)
where mrds is the R package that the Distance package relies on, and nlminb is the R-based optimizer.
You can see the process of both optimizers being used by setting the debug_level argument of the ds function to a value larger than the default of 0 and then examining the output:
+ds.model <- ds(tee.data, truncation = 4, debug_level = 1)## Starting AIC adjustment term selection.
+## Fitting half-normal key function
+## DEBUG: initial values = -0.1031529
+## Running MCDS.exe...
+## Command file written to C:\Users\erexs\AppData\Local\Temp\RtmpsdWor7\cmdtmp46cc6da77768.txt
+## Stats file written to C:\Users\erexs\AppData\Local\Temp\RtmpsdWor7\stat46cc26507b7a.txt
+## DEBUG: initial values = 0.6632378
+##
+## DEBUG: Convergence!
+## Iteration 0.0
+## Converge = 0
+## nll = 154.5692
+## parameters = 0.6632378
+## MCDS.exe log likehood: -154.5697
+## MCDS.exe pars: 1.941067
+## mrds refitted log likehood: -154.5692276
+## mrds refitted pars: 0.6632378
+##
+## DEBUG: Convergence!
+## Iteration 0.0
+## Converge = 0
+## nll = 154.5692
+## parameters = 0.6632435
+## AIC= 311.138
+## Fitting half-normal key function with cosine(2) adjustments
+## DEBUG: initial values = -0.1031529 0
+## Running MCDS.exe...
+## Command file written to C:\Users\erexs\AppData\Local\Temp\RtmpsdWor7\cmdtmp46cc19ae5459.txt
+## Stats file written to C:\Users\erexs\AppData\Local\Temp\RtmpsdWor7\stat46cc5f227f6c.txt
+## DEBUG: initial values = 0.6606793 -0.0159333
+##
+## DEBUG: Convergence!
+## Iteration 0.0
+## Converge = 0
+## nll = 154.5619
+## parameters = 0.6606793, -0.0159333
+## MCDS.exe log likehood: -154.5624
+## MCDS.exe pars: 1.936107, -0.0159333
+## mrds refitted log likehood: -154.5619307
+## mrds refitted pars: 0.6606793, -0.0159333
+## iteration: 1
+## f(x) = 243.539291
+## iteration: 2
+## f(x) = 164.079444
+## iteration: 3
+## f(x) = 156.273060
+## iteration: 4
+## f(x) = 155.340034
+## iteration: 5
+## f(x) = 154.684098
+## iteration: 6
+## f(x) = 154.571590
+## iteration: 7
+## f(x) = 154.562292
+## iteration: 8
+## f(x) = 154.561975
+## iteration: 9
+## f(x) = 154.561931
+##
+## DEBUG: Convergence!
+## Iteration 0.0
+## Converge = 0
+## nll = 154.5619
+## parameters = 0.6606883, -0.0159336
+## DEBUG: MCDS lnl = -154.5619 mrds lnl = 154.5619
+## AIC= 313.124
+##
+## Half-normal key function selected.
+## No survey area information supplied, only estimating detection function.
First the half-normal with no adjustments is run; for this model the MCDS.exe software is run first, followed by the R-based (mrds) optimizer. Both converge and both give the same nll (negative log-likelihood) of 154.5692, giving an AIC of 311.138. The model with a half-normal key and a cosine adjustment of order 2 is then fitted to the data, first with the MCDS.exe optimizer and then with the R-based optimizer. Again both give the same result: an nll of 154.5619 and an AIC of 313.124. This is higher than the AIC with no adjustments, so the half-normal with no adjustments is chosen.
In this case, both optimizers produced the same result, so there was no benefit to running MCDS.exe.
As we said earlier, the default behaviour when MCDS.exe has been downloaded is to run both MCDS.exe and the R-based optimizer. However, the optimizer argument can be used to specify which to use – either both, R or MCDS. Here is an example with just the MCDS.exe optimizer:
+ds.model <- ds(tee.data, truncation = 4, optimizer = "MCDS")## Starting AIC adjustment term selection.
+## Fitting half-normal key function
+## AIC= 311.138
+## Fitting half-normal key function with cosine(2) adjustments
+## AIC= 313.124
+##
+## Half-normal key function selected.
+## No survey area information supplied, only estimating detection function.
+
+summary(ds.model)##
+## Summary for distance analysis
+## Number of observations : 124
+## Distance range : 0 - 4
+##
+## Model : Half-normal key function
+## AIC : 311.1385
+## Optimisation: MCDS.exe
+##
+## Detection function parameters
+## Scale coefficient(s):
+## estimate se
+## (Intercept) 0.6632378 0.09981136
+##
+## Estimate SE CV
+## Average p 0.5842718 0.04637577 0.07937362
+## N in covered region 212.2300013 20.85133459 0.09824876
+The summary output now says Optimisation: MCDS.exe.
ddf in mrds package
+Here we demonstrate using both optimizers in the ddf function, rather than via ds.
+#Half normal detection function
+ddf.model <- ddf(dsmodel = ~mcds(key = "hn", formula = ~1), data = tee.data, method = "ds",
+ meta.data = list(width = 4))
+#Half normal with cos(2) adjustment
+ddf.model.cos2 <- ddf(dsmodel = ~mcds(key = "hn", adj.series = "cos", adj.order = 2, formula = ~1),
+ data = tee.data, method = "ds", meta.data = list(width = 4))
+#Compare with AIC
+AIC(ddf.model, ddf.model.cos2)## df AIC
+## ddf.model 1 311.1385
+## ddf.model.cos2 2 313.1239
+
+#Model with no adjustment term has lower AIC; show summary of this model
+summary(ddf.model)##
+## Summary for ds object
+## Number of observations : 124
+## Distance range : 0 - 4
+## AIC : 311.1385
+## Optimisation : mrds (nlminb)
+##
+## Detection function:
+## Half-normal key function
+##
+## Detection function parameters
+## Scale coefficient(s):
+## estimate se
+## (Intercept) 0.6632435 0.09981249
+##
+## Estimate SE CV
+## Average p 0.5842744 0.04637627 0.07937412
+## N in covered region 212.2290462 20.85130344 0.09824906
+As an exercise, fit using just the MCDS.exe optimizer:
+ddf.model <- ddf(dsmodel = ~mcds(key = "hn", adj.series = "cos", adj.order = 2,
+ formula = ~1), data = tee.data, method = "ds",
+ meta.data = list(width = 4),
+ control = list(optimizer = "MCDS"))
+summary(ddf.model)##
+## Summary for ds object
+## Number of observations : 124
+## Distance range : 0 - 4
+## AIC : 313.1239
+## Optimisation : MCDS.exe
+##
+## Detection function:
+## Half-normal key function with cosine adjustment term of order 2
+##
+## Detection function parameters
+## Scale coefficient(s):
+## estimate se
+## (Intercept) 0.6606782 0.1043327
+##
+## Adjustment term coefficient(s):
+## estimate se
+## cos, order 2 -0.01593274 0.1351281
+##
+## Estimate SE CV
+## Average p 0.5925856 0.08165144 0.1377884
+## N in covered region 209.2524623 31.22790760 0.1492356
This is an example of point transect data for a bird (the winter wren), from Buckland (2006). In this case one of the optimizers fails to correctly constrain the detection function so that the probability of detection remains positive at all distances, and so we use the other optimizer for inference.
+We load the wren 5 minute example dataset and define cutpoints for the distances (they were collected in intervals).
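A hedged sketch of that setup is shown below; the dataset and cutpoints object names are taken from the ds() call that follows, but the cutpoint values shown here are placeholders rather than the original interval endpoints.

```r
# Sketch of the elided setup (assumptions flagged in the comments)
data(wren_5min)                                           # assumed to ship with the Distance/dsdata packages
bin.cutpoints.100m <- c(0, 10, 20, 30, 40, 60, 80, 100)   # placeholder cutpoints out to 100 m
```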
The following call to ds gives several warnings. Some warnings are about the detection function being less than zero at some distances. There is also a warning about the Hessian (which is used for variance estimation), but this relates to the Hermite(4, 6) model (i.e., two Hermite adjustment terms, of orders 4 and 6), which is not chosen using AIC, and so this warning can be ignored.
+wren5min.hn.herm.t100 <- ds(data = wren_5min, key = "hn", adjustment = "herm",
+ transect = "point", cutpoints = bin.cutpoints.100m)## Warning in create_bins(data, cutpoints): Some distances were outside bins and
+## have been removed.
+## Starting AIC adjustment term selection.
+## Fitting half-normal key function
+## AIC= 427.471
+## Fitting half-normal key function with Hermite(4) adjustments
+## Warning in check.mono(result, n.pts = control$mono.points): Detection function
+## is less than 0 at some distances
+## Warning in check.mono(result, n.pts = control$mono.points): Detection function
+## is less than 0 at some distances
+## AIC= 422.228
+## Fitting half-normal key function with Hermite(4,6) adjustments
+## Warning: First partial hessian is singular and second-partial hessian is NULL, no hessian
+## Warning: Detection function is less than 0 at some distances
+## Warning: Detection function is less than 0 at some distances
+## AIC= 423.255
+##
+## Half-normal key function with Hermite(4) adjustments selected.
+## Warning in mrds::check.mono(model, n.pts = 10): Detection function is less than
+## 0 at some distances
+
+summary(wren5min.hn.herm.t100)##
+## Summary for distance analysis
+## Number of observations : 132
+## Distance range : 0 - 100
+##
+## Model : Half-normal key function with Hermite polynomial adjustment term of order 4
+##
+## Strict monotonicity constraints were enforced.
+## AIC : 422.2284
+## Optimisation: MCDS.exe
+##
+## Detection function parameters
+## Scale coefficient(s):
+## estimate se
+## (Intercept) 12.08697 1e+05
+##
+## Adjustment term coefficient(s):
+## estimate se
+## herm, order 4 0.5723854 0.07888508
+##
+## Estimate SE CV
+## Average p 0.4399177 0.0253475 0.05761875
+## N in covered region 300.0561563 26.0944820 0.08696533
+##
+## Summary statistics:
+## Region Area CoveredArea Effort n k ER se.ER cv.ER
+## 1 Montrave 33.2 2010619 64 132 32 2.0625 0.1901692 0.09220324
+##
+## Abundance:
+## Label Estimate se cv lcl ucl df
+## 1 Total 0.004954625 0.0005386969 0.1087261 0.003988075 0.006155428 57.83608
+##
+## Density:
+## Label Estimate se cv lcl ucl df
+## 1 Total 0.0001492357 1.622581e-05 0.1087261 0.0001201227 0.0001854045 57.83608
The MCDS.exe optimizer is the chosen one (see the 'Optimisation' line of the output).
The warnings persist if only the MCDS.exe optimizer is used:
+wren5min.hn.herm.t100.mcds <- ds(data = wren_5min, key = "hn", adjustment = "herm",
+ transect = "point", cutpoints = bin.cutpoints.100m,
+ optimizer = "MCDS")## Warning in create_bins(data, cutpoints): Some distances were outside bins and
+## have been removed.
+## Starting AIC adjustment term selection.
+## Fitting half-normal key function
+## AIC= 427.471
+## Fitting half-normal key function with Hermite(4) adjustments
+## Warning in check.mono(result, n.pts = control$mono.points): Detection function
+## is less than 0 at some distances
+## Warning in check.mono(result, n.pts = control$mono.points): Detection function
+## is less than 0 at some distances
+## AIC= 422.228
+## Fitting half-normal key function with Hermite(4,6) adjustments
+## Warning: First partial hessian is singular and second-partial hessian is NULL, no hessian
+## Warning: Detection function is less than 0 at some distances
+## Warning: Detection function is less than 0 at some distances
+## AIC= 423.255
+##
+## Half-normal key function with Hermite(4) adjustments selected.
+## Warning in mrds::check.mono(model, n.pts = 10): Detection function is less than
+## 0 at some distances
+Looking at a plot of the fitted object (Figure 1), it seems that the evaluated pdf is less than 0 at distances close to the truncation point (approx. 95m and greater):
+
+plot(wren5min.hn.herm.t100.mcds, pdf = TRUE)
+Figure 1: PDF of fitted model with MCDS optimizer. +
+What appears to be happening here is a failure of the optimization routine to appropriately constrain the model parameters so that the detection function is valid. This happens on occasion (the routines aren’t perfect!) and where it does we recommend trying the other optimization routine. Here we use the R-based optimizer:
+wren5min.hn.herm.t100.r <- ds(data=wren_5min, key="hn", adjustment="herm",
+ transect="point", cutpoints=bin.cutpoints.100m,
+ optimizer = "R")## Warning in create_bins(data, cutpoints): Some distances were outside bins and
+## have been removed.
+## Starting AIC adjustment term selection.
+## Fitting half-normal key function
+## AIC= 427.471
+## Fitting half-normal key function with Hermite(4) adjustments
+## AIC= 422.73
+## Fitting half-normal key function with Hermite(4,6) adjustments
+## AIC= 424.717
+##
+## Half-normal key function with Hermite(4) adjustments selected.
+Here the fitted AIC for the chosen model (half normal with one Hermite adjustment of order 4) is 422.73, higher than that with the MCDS.exe optimizer (which was 422.23), which explains why the MCDS.exe optimizer fit was chosen when we allowed ds to choose freely. However, the detection function fit from MCDS.exe was invalid, because it went lower than 0 at about 95m, while the fit with the R-based optimizer looks valid (Figure 2):
+plot(wren5min.hn.herm.t100.r, pdf = TRUE)
+Figure 2: PDF of fitted model with R-based optimizer. +
+Hence in this case, we would use the R-based optimizer’s fit.
For this example, it helps if you are familiar with the Analysis of camera trapping data vignette on the distance sampling web site.
You also need to download the detection distances for the full daytime data set from the Dryad data repository and then read them in with the code below:
+
+#Read in data and set up data for analysis
+DuikerCameraTraps <- read.csv(file="DaytimeDistances.txt", header=TRUE, sep="\t")
+DuikerCameraTraps$Area <- DuikerCameraTraps$Area / (1000*1000)
+DuikerCameraTraps$object <- NA
+DuikerCameraTraps$object[!is.na(DuikerCameraTraps$distance)] <- 1:sum(!is.na(DuikerCameraTraps$distance))
+
+#Specify breakpoints and truncation
+trunc.list <- list(left=2, right=15)
+mybreaks <- c(seq(2, 8, 1), 10, 12, 15)Then we fit the detection function selected in the camera trap vignette, uniform plus 3 cosine adjustment terms, and time how long the fitting takes:
+
+start.time <- Sys.time()
+uni3.r <- ds(DuikerCameraTraps, transect = "point", key="unif", adjustment = "cos",
+ nadj=3, cutpoints = mybreaks, truncation = trunc.list, optimizer = "R")## Warning in create_bins(data, cutpoints): Some distances were outside bins and
+## have been removed.
+## Fitting uniform key function with cosine(1,2,3) adjustments
+## AIC= 44012.238
+
+##
+## Summary for distance analysis
+## Number of observations : 10284
+## Distance range : 2 - 15
+##
+## Model : Uniform key function with cosine adjustment terms of order 1,2,3
+##
+## Strict monotonicity constraints were enforced.
+## AIC : 44012.24
+## Optimisation: mrds (slsqp)
+##
+## Detection function parameters
+## Scale coefficient(s):
+## NULL
+##
+## Adjustment term coefficient(s):
+## estimate se
+## cos, order 1 0.93529846 0.01504415
+## cos, order 2 -0.05342058 0.02438026
+## cos, order 3 -0.08069257 0.01557680
+##
+## Estimate SE CV
+## Average p 3.290022e-01 1.349494e-02 0.04101777
+## N in covered region 3.125815e+04 1.306764e+03 0.04180555
+##
+## Summary statistics:
+## Region Area CoveredArea Effort n k ER se.ER cv.ER
+## 1 Tai 40.37 21858518573 31483179 10284 21 0.0003266506 8.763252e-05 0.268276
+##
+## Abundance:
+## Label Estimate se cv lcl ucl df
+## 1 Total 5.772996e-05 1.566754e-05 0.2713936 3.315829e-05 0.0001005103 20.94597
+##
+## Density:
+## Label Estimate se cv lcl ucl df
+## 1 Total 1.430021e-06 3.880986e-07 0.2713936 8.213597e-07 2.489727e-06 20.94597
+Fitting takes 10 secs. (Note, in versions of Distance before 2.0.0 this was a much higher number!) Here we try the MCDS.exe optimizer:
+start.time <- Sys.time()
+uni3.mcds <- ds(DuikerCameraTraps, transect = "point", key="unif", adjustment = "cos",
+ nadj=3, cutpoints = mybreaks, truncation = trunc.list, optimizer = "MCDS")## Warning in create_bins(data, cutpoints): Some distances were outside bins and
+## have been removed.
+## Fitting uniform key function with cosine(1,2,3) adjustments
+## AIC= 44012.211
+
+##
+## Summary for distance analysis
+## Number of observations : 10284
+## Distance range : 2 - 15
+##
+## Model : Uniform key function with cosine adjustment terms of order 1,2,3
+##
+## Strict monotonicity constraints were enforced.
+## AIC : 44012.21
+## Optimisation: MCDS.exe
+##
+## Detection function parameters
+## Scale coefficient(s):
+## NULL
+##
+## Adjustment term coefficient(s):
+## estimate se
+## cos, order 1 0.93518220 0.01504583
+## cos, order 2 -0.05345965 0.02438049
+## cos, order 3 -0.08073799 0.01557817
+##
+## Estimate SE CV
+## Average p 3.290679e-01 1.349917e-02 0.04102246
+## N in covered region 3.125191e+04 1.306645e+03 0.04181008
+##
+## Summary statistics:
+## Region Area CoveredArea Effort n k ER se.ER cv.ER
+## 1 Tai 40.37 21858518573 31483179 10284 21 0.0003266506 8.763252e-05 0.268276
+##
+## Abundance:
+## Label Estimate se cv lcl ucl df
+## 1 Total 5.771844e-05 1.566445e-05 0.2713943 3.315164e-05 0.0001004903 20.94619
+##
+## Density:
+## Label Estimate se cv lcl ucl df
+## 1 Total 1.429736e-06 3.880222e-07 0.2713943 8.21195e-07 2.489232e-06 20.94619
+This took a little less time: 9 secs. Hence, for some datasets, it may be quicker to use the MCDS.exe optimizer. This could make a significant difference if using the nonparametric bootstrap to estimate variance. However, after making improvements to the optimizer in mrds 3.0.0 and Distance 2.0.0 the difference is generally small, and in many cases the R optimizer is faster than MCDS.exe so this is likely not a productive avenue to pursue in general.
We have shown how to fit distance sampling detection functions (for single platform data) using either the R-based optimizer built into the ddf function (via calling ddf or, more likely, calling the ds function in the Distance package) or the MCDS.exe analysis engine used by Distance for Windows. In the vast majority of cases both fitting methods give the same result, so there is no need to use both; the only downside of doing so is that fitting takes longer, as each optimizer is called in turn. If you have downloaded the MCDS.exe file and want to speed things up, you can use just the R-based optimizer by specifying optimizer = "R" in the call to ds or ddf, or just the MCDS.exe optimizer with optimizer = "MCDS".
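For example, restricting the golf tee analysis from earlier in this vignette to the R-based optimizer only might look like this:

```r
# Sketch: fit with only the built-in R optimizer (assumes tee.data from earlier)
ds.model.r <- ds(tee.data, truncation = 4, optimizer = "R")
summary(ds.model.r)
```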
Some situations where the two may produce different results are given below. Note that in each case we give an update related to new algorithms developed and used in mrds 3.0.0.
Detection functions that are close to non-monotonic or close to zero at some distances. When adjustment terms are used in the detection function, constraints are required to prevent the fitted function from having "bumps" where detection probability increases with increasing distance, and also to prevent detection probability from becoming less than zero. The former are called monotonicity constraints and are set using the monotonicity argument in ds or in the meta.data argument in ddf; monotonicity is set on by default. In practice, monotonicity and values less than zero are monitored at a finite set of distances between 0 and the right truncation point, and (for historical reasons) this set of distances is different for the R-based and MCDS.exe optimizers. This typically makes no difference to the optimization, but particularly in borderline cases it can result in different fitted functions. Plotting the fitted functions (as we did in the wren example above) can reveal when there is an issue with a fitted function, and if this occurs the associated optimizer should not be used. In the future we plan to bring the two into line so they use the same distances for checking.
Update: in mrds 3.0.0 and Distance 2.0.0 these are now aligned, so this difference should have gone away.

Detection functions with many adjustment terms. The two optimizers use different algorithms for optimization: the R-based optimizer uses a routine called nlminb while MCDS.exe uses a nonlinear constrained optimizer routine produced by the IMSL group. In cases where there are multiple adjustment terms, and hence several parameters to estimate (which are often correlated), the likelihood maximization is harder, and one or other routine can sometimes fail to find the maximum. In this case, choosing the routine with the higher likelihood (i.e., lower negative log-likelihood, or equivalently lower AIC) is the right thing to do, and this is the default behaviour of the software.
Update: in mrds 3.0.0 we now use a Sequential Least Squares Programming (SLSQP) algorithm from the 'nloptr' package in the R-based optimizer (rather than the old solnp algorithm). The old algorithm can be accessed from the ds() function in Distance using the argument mono_method = "solnp", or with the ddf() function in mrds using control = list(mono.method = "solnp") (see the sketch after this list). However, the new algorithm shows improved performance in our testing, and so we do not recommend using the old one except for reasons of backwards compatibility.
Detection functions that are "overdispersed" or with a "spike" in the detection function close to zero distance. Similarly to the above, the detection function can then be hard to maximize, and hence one or other optimizer can fail to find the maximum. The solution is as above. Overdispersed data are common in camera trap distance sampling because many detections can be generated by the same individual crossing in front of the camera.
Update is as above.
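A sketch of requesting the older monotonicity algorithm mentioned in the update to the second item is given below (it assumes the tee.data object from the golf tee example earlier in this vignette).

```r
# Sketch: refit using the old 'solnp' monotonicity algorithm for comparison
ds.solnp <- ds(tee.data, truncation = 4, mono_method = "solnp")
```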
If you are interested in seeing more comparisons of the optimizers on various datasets, we maintain a test suite of both straightforward and challenging datasets together with test code to run and compare the two optimizers – this is available at the MCDS_mrds_compare repository.
If you encounter difficulties when using both optimizers, one possible troubleshooting step is to run the analysis first choosing one optimizer (e.g., specifying the argument optimizer = "MCDS") and then choosing the other (optimizer = "R"). This allows you to see clearly what the output of each optimizer is (including any error messages) and facilitates their comparison.
One other criterion to favour one optimizer over the other is speed. We found that for large datasets the MCDS.exe optimizer was quicker, but as of Distance 2.0.0 and mrds 3.0.0 this is no longer necessarily the case.
One thing to note is that the MCDS.exe file will get deleted each time you update the mrds package, so you’ll need to re-download the file if you want to continue using the MCDS.exe optimizer. As shown above, this only requires running one line of code.
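That single line is the same download call used earlier in this vignette:

```r
# Re-download MCDS.exe into the mrds package directory after updating mrds
download.file("http://distancesampling.org/R/MCDS.exe",
              paste0(system.file(package = "mrds"), "/MCDS.exe"), mode = "wb")
```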
vignettes/web-only/cues/cuecounts-distill.Rmd
In this exercise, we use R (R Core Team, 2019) and the Distance package (Miller, Rexstad, Thomas, Marshall, & Laake, 2019) to fit different detection function models to point transect cue count survey data, estimating winter wren (Troglodytes troglodytes) density and abundance. These data were part of a study described by Buckland (2006).
Each of the 32 point count stations was visited twice. During each visit, the observer recorded distances to all songs detected during a 5-minute sampling period (Figure 1).
Figure 1: Montrave study area; white circles are point count stations.
+In addition, 43 male winter wrens were observed and their rate of song production was measured. The mean cue rate, along with its standard error (between individuals) was calculated and included in the data set to serve as a multiplier.
The fields of the wren_cuecount data set are: Region.Label, Area, Sample.Label, Cue.rate, Cue.rate.SE, object, distance, Study.Area and Search.time.
Distance package and cue count data
This command assumes that the dsdata package has been installed on your computer. The wren_cuecount data set contains detections of winter wren songs from the cue count survey of Buckland (2006).
Examine the first few rows of wren_cuecount using the function head()
+head(wren_cuecount)## Region.Label Area Sample.Label Cue.rate Cue.rate.SE object distance
+## 1 Montrave 33.2 1 1.4558 0.2428 38 50
+## 2 Montrave 33.2 1 1.4558 0.2428 39 55
+## 3 Montrave 33.2 1 1.4558 0.2428 40 55
+## 4 Montrave 33.2 1 1.4558 0.2428 41 55
+## 5 Montrave 33.2 1 1.4558 0.2428 46 50
+## 6 Montrave 33.2 1 1.4558 0.2428 47 50
+## Study.Area Search.time
+## 1 montrave 3 10
+## 2 montrave 3 10
+## 3 montrave 3 10
+## 4 montrave 3 10
+## 5 montrave 3 10
+## 6 montrave 3 10
Note there is no field in the data to indicate sampling effort. With line transects, the lengths of each transect were provided to measure effort. For point transects, the number of visits to each station was specified. In this data set, all that is specified is Search.time, the length of time each station was sampled. Note each station was visited twice and sampling lasted 5 minutes on each visit; hence Search.time is recorded as 10. Note also the units of measure of Search.time must be consistent with the units of measure of cue rate.
Gain familiarity with the radial distance data using the hist() function (Figure 2).
+hist(wren_cuecount$distance, xlab="Distance (m)", main="Song detection distances")
Figure 2: Radial detection distances of winter wren song bursts.
Note the long right tail, which we will cut off with the truncation argument to ds().
ds
+As noted above, Effort is missing from the data. With cue count surveys, effort is measured in time rather than length or number of visits. Therefore we define a new field Effort and set it equal to the Search.time field.
Note: no conversion.factor is specified in the call to ds() because only the detection function is of interest at this step of the analysis, not density or abundance.
+conversion.factor <- convert_units("meter", NULL, "hectare")
+wren_cuecount$Effort <- wren_cuecount$Search.time
+wrensong.hr <- ds(wren_cuecount, transect="point", key="hr", adjustment=NULL,
+ truncation=100)Visually inspect the fitted detection function with the plot() function, specifying the cutpoints histogram with argument breaks (Figure 3).
+cutpoints <- c(0,5,10,15,20,30,40,50,65,80,100)
+plot(wrensong.hr, breaks=cutpoints, pdf=TRUE, main="Hazard rate function fit to winter wren song counts.")
Figure 3: Fit of hazard rate detection function to winter wren song detection distances.
Do not examine the abundance or density estimates produced by summary(wrensong.hr) because the results it contains are nonsense. These summary values do not properly recognise that the unit of effort is time rather than visits for the point count survey. This additional component of the analysis is provided in the next step.
dht2
+The function dht2 provides additional capacity for providing density or abundance estimates in novel situations such as cue counts where multipliers need to be incorporated.
The argument multipliers in dht2 provides the mechanism whereby the cue production rate and its uncertainty are incorporated into the analysis.
To properly perform the calculations that convert song density to bird density, we enlist the aid of the function dht2. The additional information about cue rates and their variability is provided in a list. The multiplier in the list is required to have the name creation and contains both the cue rate point estimate and its associated measure of precision.
+cuerate <- unique(wren_cuecount[ , c("Cue.rate","Cue.rate.SE")])
+names(cuerate) <- c("rate", "SE")
+(mult <- list(creation=cuerate))## $creation
+## rate SE
+## 1 1.4558 0.2428
Additional arguments are also passed to dht2. flatfile is the name of the data set and strat_formula contains information about stratification that might exist in the survey design. The Montrave study had no stratification; inference was only for the 33-hectare woodland, so strat_formula here is simply the constant ~1.
The overall winter wren density estimate is provided by the print method, specifying report="density". The alternative for the report argument is report="abundance".
+wren.estimate <- dht2(wrensong.hr, flatfile=wren_cuecount, strat_formula=~1,
+ multipliers=mult, convert_units=conversion.factor)
+print(wren.estimate, report="density")## Density estimates from distance sampling
+## Stratification : geographical
+## Variance : P2, n/L
+## Multipliers : creation
+## Sample fraction : 1
+##
+##
+## Summary statistics:
+## .Label Area CoveredArea Effort n k ER se.ER cv.ER
+## Total 33.2 1005.31 320 771 32 2.409 0.236 0.098
+##
+## Density estimates:
+## .Label Estimate se cv LCI UCI df
+## Total 1.2018 0.238 0.198 0.8172 1.7674 520.679
+##
+## Component percentages of variance:
+## .Label Detection ER Multipliers
+## Total 4.83 24.38 70.79
+We assess the goodness of fit of the hazard rate model to the winter wren cue count data (Figure 4).
+
+gof_ds(wrensong.hr)
Figure 4: Q-Q plot of hazard rate model to winter wren radial detection distances.
+##
+## Goodness of fit results for ddf object
+##
+## Distance sampling Cramer-von Mises test (unweighted)
+## Test statistic = 1.69439 p-value = 6.24759e-05
Note the distinct lack of fit to the song data. This is because there are many detections at identical distances, from birds that are stationary while singing. This induces a phenomenon known as overdispersion.
This vignette uses the function dht2 because that function knows how to incorporate multipliers such as cue rates and propagate the uncertainty in cue rate into overall uncertainty in density and abundance. Because there is uncertainty coming not only from encounter rate variability and uncertainty in detection function parameters, but also from cue rate variability, the relative contribution of each source of uncertainty is tabulated. This is the last table produced by printing the wren.estimate object. For the Montrave winter wren data, only 4% of the uncertainty in the density estimate is attributable to the detection function, 24% to encounter rate variability and 71% to between-individual variability in call rate.
This insight suggests that if this survey was to be repeated, exerting more effort in measuring between-individual variation in call rate would likely yield the most benefits in tightening the precision in density estimates.
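As a quick illustration of that point (a sketch only; halving the cue-rate standard error is hypothetical), the analysis can be re-run with a tighter multiplier to see how much the overall CV shrinks:

mult.tighter <- list(creation=data.frame(rate=cuerate$rate, SE=cuerate$SE/2))  # hypothetical improved precision
wren.estimate2 <- dht2(wrensong.hr, flatfile=wren_cuecount, strat_formula=~1,
                       multipliers=mult.tighter, convert_units=conversion.factor)
print(wren.estimate2, report="density")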
Also note the poor fit of the model to the data; the P-value for the Cramér-von Mises test is well below 0.05. This is caused by overdispersion in the distribution of detected call distances. A single individual may sit on a tree branch and emit many song bursts, leading to a jagged distribution of call distances that is not well fitted by a smooth detection function. That overdispersion will not, however, bias the density estimates.
+vignettes/web-only/differences/differences.Rmd
Often ecological questions extend beyond simply wanting an estimate of density in a study region. It is common for inference to extend to differences in density over time or space.
In Buckland et al. (2001, Sect. 3.6.5) methods are described to produce tests of significance based on the t-test. That section presents formulas for comparing two density estimates under two scenarios: when the two estimates are entirely independent, and when they share a detection function pooled across both.
The situation that Buckland et al. (2001) does not consider is that in which the two estimates are linked via a covariate in the detection function. Because the t-test framework cannot deal with this intermediate situation, an alternative approach based on the bootstrap can be used. The bootstrap provides the added advantage that no parametric assumptions (a t-distribution) need to be invoked when making inference.
+A function in the Distance R package (Miller, Rexstad, Thomas, Marshall, & Laake, 2019) exists for computing uncertainty in density estimates via bootstrapping. This vignette demonstrates a function that harnesses the bootdht function to produce a sampling distribution of the difference between pairs of density estimates embedded as strata within a data set.
Recognise that strata can represent not only geographic divisions of a study area, but potentially also a survey of the same study area at another time. If a data set is organised in this manner, then the assessment of differences between strata would be an assessment of the possible change in density over time. Furthermore, as shown in the example of multi-species surveys, species could serve as strata. In this context, assessing the difference in stratum-specific density would examine the difference in density between species.
+
+#' @title differences.bootstrap
+#'
+#' @description Test for pairwise density differences between strata
+#'
+#' Test is performed by producing replicate stratum-specific estimates and calculating
+#' differences of each replicate. Differencing is done for all pairs of strata in
+#' the survey, e.g. if there are 4 strata there are \code{choose(4,2)=6} pairwise
+#' comparisons computed.
+#'
+#' Histograms are produced for each comparison, designating the median of the distribution
+#' and a percentile-based 95% confidence interval from the sampling distribution
+#'
+#' Difficulties can arise from very long left or right tails of the distribution
+#' resulting from awkward bootstrap replicates. The limits of the histogram are
+#' cut off at 5*median so histogram shape does not appear degenerate. Code presumes differences will be positive.
+#'
+#' @param dsobj dsmodel object generated by \code{ds}
+#' @param flatfile flatfile of survey data analysed by \code{ds}
+#' @param nboot number of bootstrap replicates to compute
+#'
+#' @return Histogram showing sampling distribution of differences plus named list
+#' \itemize{
+#' \item medians - median of sampling distribution
+#' \item ps - P-value for two-tailed test that difference is zero
+#' \item thematrix - Matrix of replicate pairwise differences
+#' }
+#' @importFrom Distance bootdht
+#' @export
+#'
+#' @examples
+#' library(Distance)
+#' data(minke)
+#' hn.pooled <- ds(minke) # pooled detection function with hn key
+#' result <- differences.bootstrap(hn.pooled, minke, nboot=100)
+differences.bootstrap <- function(dsobj, flatfile, nboot) {
+
+ num.strata <- length(dsobj$dht$individuals$D$Estimate) - 1
+ stopifnot(
+ 'first argument is not a dsmodel object' = class(dsobj) == 'dsmodel',
+ 'study area must have >1 stratum' = num.strata > 1,
+ 'specified flatfile object is not a data.frame ' = class(flatfile) == 'data.frame'
+ )
+ d.point.ests <- dsobj$dht$individuals$D$Estimate
+ strata.names <- dsobj$dht$individuals$D$Label
+# Following function used by bootdht to collect density point estimates
+# from each bootstrap replicate
+ pullout.D <- function(ests, fit) {
+ bill <- ests$individuals$D$Estimate
+ extract <- data.frame(t(bill))
+ colnames(extract) <- ests$individuals$D$Label
+ return(extract)
+ }
+ outcome <- bootdht(dsobj, flatfile=flatfile, cores=10,
+ summary_fun=pullout.D, nboot=nboot)
+# Having run the bootstrap, calculate number of pairwise comparisons btwn strata
+# create objects to receive the replicate-wise differences for each comparison
+# median differences are reported and empirical P-value computed for each comparison
+# histograms of sampling distribution for differences are shown with CIs
+# allstrata <- complete.cases(outcome)
+ num.compare <- choose(num.strata, 2)
+ pairs <- t(combn(1:num.strata, 2))
+ result.matrix <- matrix(data=NA, nrow=nrow(outcome), ncol=num.compare)
+ themedian <- array(data=NA, dim=num.compare)
+ pvalue <- array(data=NA, dim=num.compare)
+ par(mfrow=c(num.compare, 1))
+ for (i in 1:num.compare) {
+ result.matrix[,i] <- mapply('-', outcome[pairs[i,2]], outcome[pairs[i,1]])
+ themedian[i] <- median(result.matrix[,i], na.rm=TRUE)
+ pvalue[i] <- ifelse(themedian[i]>0,
+ sum(result.matrix[,i]<0, na.rm=TRUE) / sum(!is.na(result.matrix[ ,i])),
+ sum(result.matrix[,i]>0, na.rm=TRUE) / sum(!is.na(result.matrix[ ,i])))
+ tmp <- result.matrix[ ,i]
+ hist(tmp[abs(tmp)<5*abs(themedian[i])],
+ breaks=30, xlab="Estimated difference",
+ main=paste("Bootstrap test of equality of two density estimates",
+ "\nMedian difference=", round(themedian[i],4),
+ " Two-tailed P-value=", round(2*pvalue[i],4)))
+ abline(v=themedian[i])
+ abline(v=quantile(result.matrix[,i], probs = c(0.025, 0.975), na.rm=TRUE), lty=3)
+ first <- pairs[i, 1]
+ second <- pairs[i, 2]
+ line1 <- bquote(hat(D)[.(strata.names[first])] == .(round(d.point.ests[first], 4)))
+ line2 <- bquote(hat(D)[.(strata.names[second])] == .(round(d.point.ests[second], 4)))
+ legend("topleft", legend=as.expression(c(line1, line2)))
+ }
+ par(mfrow=c(1,1))
+ return(list(medians=themedian, ps=2*pvalue, thematrix=result.matrix))
+}Several examples of the use of differences.bootstrap are provided. They make use of data sets that are included in the Distance package.
The simplest example uses the minke data set that consists of two geographic strata (North and South). A model that can be fitted to these data assumes the two strata share a common detection function
+library(Distance)
+data(minke)
+hr.pooled <- ds(minke, key="hr", truncation=1.5)
+result <- differences.bootstrap(hr.pooled, flatfile=minke, nboot=250)
Figure 1: Strata share a pooled detection function.
Output from the function consists primarily of a histogram of the replicate density differences. This approximates the sampling distribution of the estimated density difference. A solid vertical line depicts the median of that distribution (medians are less influenced by outliers than are means). Dotted vertical lines depict the 2.5th and 97.5th percentiles (a 95% interval) around the estimated difference. The two-tailed P-value is presented in the histogram main title. In the legend box are presented the density estimates from the two strata, labelled using the Region.Label values found in the dsmodel object passed to the function.
Working with the same minke data set, we present an alternative analysis in which stratum-specific detection functions are derived using stratum as a covariate in the detection function. Having fitted that detection function model to the data, the comparison of the densities in the strata are performed using the same function.
+hr.covar <- ds(minke, key="hr", truncation=1.5, formula=~Region.Label)
+resultcovar <- differences.bootstrap(hr.covar, flatfile=minke, nboot=250)
Figure 2: Two strata with Region.Label as a covariate in detection function.
The evidence that densities differ in the two strata appears stronger in this analysis because the dependence between the two estimates is reduced when stratum-specific detection functions are used. Of course, inference would not be drawn from two different analyses of the same data set; this is merely to demonstrate the use of the function. If we were to perform model selection upon the two detection function models fitted to the minke data, we would find the model with stratum as a covariate preferable, and our inference should be based upon this second analysis.
+Another data set, Savannah_sparrow_1980, is derived from a point transect survey of a study area with three strata. We will fit a model with stratum as a covariate and send the result to our function to assess whether there are differences between the three strata.
+data("Savannah_sparrow_1980")
+hn.sparrow <- ds(Savannah_sparrow_1980, transect="point", key="hn", truncation="10%",
+ convert_units=convert_units("meter", NULL, "hectare"), formula=~Region.Label)
+resultsparrow <- differences.bootstrap(hn.sparrow,
+ flatfile=Savannah_sparrow_1980,
+ nboot=250)
Figure 3: Three strata with Region.Label as a covariate in the detection function.
+Note here, when there are three strata, there are three pairwise comparisons. The function can cope with any number of strata, but recognise the number of comparisons (hence number of histograms) grows rapidly when the number of strata exceeds roughly 5.
+This function cannot compute significance of density estimate differences when estimation is carried out via multiple calls to ds(), as would be the case when analysing data from different study areas residing in different data files. However, based upon the provided code, it should be clear how to produce replicate density estimates via bootdht() and then difference them with a single line of code. Depending upon circumstances, it might also be possible to combine the two data sets into a single data file and treat them as strata which could allow use of the provided function.
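A minimal sketch of that approach (the object names fit1, fit2, data1 and data2 are placeholders for two separate ds() analyses, each of a single stratum):

# Summary function mirroring pullout.D above: extract the density point estimate
pull.D <- function(ests, fit) data.frame(D = ests$individuals$D$Estimate[1])
reps1 <- bootdht(fit1, flatfile=data1, summary_fun=pull.D, nboot=250)
reps2 <- bootdht(fit2, flatfile=data2, summary_fun=pull.D, nboot=250)
diffs <- reps2$D - reps1$D                     # the single line of differencing
quantile(diffs, probs=c(0.025, 0.5, 0.975), na.rm=TRUE)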
vignettes/web-only/groupsize/Remedy-size-bias-for-dolphin-surveys.Rmd
In this example we have a sample of sightings data for eastern tropical Pacific (ETP) offshore spotted dolphins, collected by observers aboard tuna vessels (the data were made available by the Inter-American Tropical Tuna Commission, IATTC). More details about surveys of dolphins in the ETP can be found in T. Gerrodette & Forcada (2005) and Tim Gerrodette (2008). In the ETP, schools of yellowfin tuna commonly associate with schools of certain species of dolphins, and so vessels fishing for tuna often search for dolphins in the hopes of also locating tuna. For each school detected by the tuna vessels, the observer records the species, sighting angle and distance (later converted to perpendicular distance and truncated at 5 nautical miles), school size, and a number of covariates associated with each detected school.
+A variety of search methods were used to find the dolphins from these tuna vessels. The coding in the data set is shown below.
| Method | Code |
|---|---|
| Crows nest | 0 |
| Bridge | 2 |
| Helicopter | 3 |
| Radar | 5 |
Some of these methods may have a wider range of search than the others, and so it is possible that the detection function varies according to the method being used.
+For each sighting the initial cue type is recorded. This may be birds flying above the school, splashes on the water, floating objects such as logs, or some other unspecified cue.
| Cue | Code |
|---|---|
| Birds | 1 |
| Splashes | 2 |
| Unspecified cue | 3 |
| Floating objects | 4 |
Another covariate that potentially affects the detection function is sea state. Beaufort levels are grouped into two categories, the first including Beaufort values ranging from 0 to 2 (coded as 1) and the second containing values from 3 to 5 (coded as 2).
+The sample data encompasses sightings made over a three month summer period.
| Month | Code |
|---|---|
| June | 6 |
| July | 7 |
| August | 8 |
As described, there are a number of potential covariates that might influence dolphin detectability. Rather than throw all covariates into detection function models, examine the distribution of detection distances (y-axis of figure below) as a function of the plausible factor covariates.
+
Figure 1: Exploratory data analysis using violin plots. Prepared using the vioplot package. Number of detections shown above plots.
From Fig. 1 there are several decisions to be made concerning the remaining analysis; in particular, sightings made from the helicopter platform are excluded from further analyses.
+Size bias (Buckland et al., 2001) can be examined by plotting distribution of group size as a function of detection distances.
+
Figure 2: Box plot of observed group sizes by perpendicular distance band. Outliers are not shown; notches indicate a discernible difference in mean group size at 2 nm.
Fig. 2 indicates a difference in observed mean group size at 2 nm, with average group size being distinctly larger at distances greater than 2 nm. Hence, average group size in the sample is an overestimate of average group size in the population. Our modelling of the detection function will need to counteract this bias by including group size in the detection function.
Before creating a host of candidate models, we should address the question of the appropriate key function for these data. Recall we are not including sightings made from the helicopter platform in our analyses.
Fit models with a half-normal key function without adjustments, both with and without Search.method as a covariate:
+hn <- ds(nochopper, key="hn", adjustment = NULL)
+hn.method <- ds(nochopper, key="hn", formula = ~factor(Search.method))
+par(mfrow=c(1,2))
+gof_ds(hn, main="HN key, no adj", cex=0.5)##
+## Goodness of fit results for ddf object
+##
+## Distance sampling Cramer-von Mises test (unweighted)
+## Test statistic = 0.656421 p-value = 0.0162635
+
+gof_ds(hn.method, main="HN key + method", cex=0.5)
Figure 3: Q-Q goodness of fit plots for the half-normal key function without adjustments (left) and with search method as a covariate (right).
+##
+## Goodness of fit results for ddf object
+##
+## Distance sampling Cramer-von Mises test (unweighted)
+## Test statistic = 0.672219 p-value = 0.0148816
+
The goodness of fit results indicate a lack of fit of the half-normal key function models. After some rounding to the trackline, the detection function maintains a shoulder before falling away quite rapidly. Even taking into consideration that the sample size is very large (n=961), making the goodness of fit test quite powerful, there is some doubt that the half-normal key function is appropriate for these data. We will remove the half-normal from further modelling; the hazard rate will serve our purposes, as the hazard rate without adjustments or covariates adequately fits the data.
+
+hr <- ds(nochopper, key="hr")## Starting AIC adjustment term selection.
+## Fitting hazard-rate key function
+## AIC= 2920.797
+## Fitting hazard-rate key function with cosine(2) adjustments
+## Warning in check.mono(result, n.pts = control$mono.points): Detection function
+## is greater than 1 at some distances
+## Warning in check.mono(result, n.pts = control$mono.points): Detection function
+## is greater than 1 at some distances
+## AIC= 2922.8
+##
+## Hazard-rate key function selected.
+
+gof_ds(hr, plot=FALSE)##
+## Goodness of fit results for ddf object
+##
+## Distance sampling Cramer-von Mises test (unweighted)
+## Test statistic = 0.130299 p-value = 0.455606
+Conducting our modeling using the hazard rate key function, we turn our attention to incorporating group size into the detection function. The way to counteract the effect of size bias is to include group size in the detection function.
+
+hr.size <- ds(nochopper, key="hr", formula = ~size)## Model contains covariate term(s): no adjustment terms will be included.
+## Fitting hazard-rate key function
+## AIC= 2919.357
+It is a disappointment to learn that a model including group size as a covariate fails to converge. There are numerical difficulties associated with a covariate that spans three orders of magnitude. For more about fitting issues with covariates, consult the covariate example with amakihi.
+The distribution of group sizes is strongly skewed to the right, with a very long right tail. A transformation by natural logs will both reduce the range of log(size) to one order of magnitude and shift the centre of the distribution of the covariate (Fig. 4).

Figure 4: Effect of log transformation upon distribution of observed group sizes.
+The convergence problems associated with using size as a covariate in the detection function are alleviated as a result of the transformation.
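The call that produced the output below is not shown in the rendered document; presumably it takes a form such as (the object name hr.logsize is illustrative):

hr.logsize <- ds(nochopper, key="hr", formula = ~log(size))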
## Model contains covariate term(s): no adjustment terms will be included.
+## Fitting hazard-rate key function
+## AIC= 2904.307
+Having successfully incorporated group size into the detection function, we proceed to examine the consequence of using Search.method as a covariate and a model incorporating both covariates.
+hr.method <- ds(nochopper, key="hr", formula = ~factor(Search.method))
hr.clus.method <- ds(nochopper, key="hr", formula = ~log(size) + factor(Search.method))

| Model | Key function | Formula | C-vM p-value | \(\hat{P_a}\) | se(\(\hat{P_a}\)) | \(\Delta\)AIC |
|---|---|---|---|---|---|---|
|  | Hazard-rate | ~log(size) | 0.465 | 0.551 | 0.035 | 0.000 |
|  | Hazard-rate | ~log(size) + factor(Search.method) | 0.458 | 0.547 | 0.036 | 2.604 |
|  | Hazard-rate | ~1 | 0.456 | 0.564 | 0.036 | 16.490 |
|  | Hazard-rate | ~factor(Search.method) | 0.463 | 0.553 | 0.037 | 18.232 |
All of the fitted models using the hazard rate as the key function fit the data. In addition, note the estimates of \(\widehat{P_a}\) for all four models. Inclusion of covariates has a negligible effect upon estimated detection probability. Despite a \(\Delta\)AIC value > 15, the model without covariates produces a virtually identical estimate of detection probability. This is another example of the remarkable property of pooling robustness of distance sampling estimators (Rexstad, Buckland, Marshall, & Borchers, 2023).
+We discuss estimates of group and individual density from this data set. However, this data set does not accurately reflect survey effort. The Effort column is filled with 1 and there is only a single transect labelled in the data. Hence, the density estimates do not reflect biological reality; nevertheless the comparisons between models are legitimate. Variability between transects is also not properly incorporated into this analysis, so I won’t present measures of precision associated with any of the following point estimates.
This slight variation in \(\widehat{P_a}\) among the hazard rate candidate models is reflected in the equally similar estimates of dolphin pod density among the competing models. The model with the largest \(\widehat{P_a}\) produces the lowest estimate of \(\widehat{D_s}\) (170.5); while the model with the smallest \(\widehat{P_a}\) produces the largest estimate of \(\widehat{D_s}\) (175.8).
However, the most important consideration in the analysis of this data set is proper treatment of size bias. The hazard rate models without group size in the detection function estimate average group size in the population to be 515, whereas the model incorporating group size in the detection function estimates average group size in the population to be 408. Based on the evidence presented in Fig. 2, there is reason to believe that estimating average group size without incorporating group size in the detection function results in a positively biased estimate of group size in the population. From the group size estimates under the two models, it appears the magnitude of that positive size bias in this data set is about 26% ((515-408)/408).
+This difference in estimated average group size is magnified in the estimates of individual density \(\widehat{D_I}\). The model without covariates estimates \(\widehat{D_I}\) = 87805 while the model with group size as a covariate estimates \(\widehat{D_I}\) to be 71150.
+Take home points:
+vignettes/web-only/multipliers/multipliers-distill.Rmd
We consider indirect methods to estimate abundance and hence include multipliers in the abundance calculations. The first problem uses data from a dung survey of deer and there are two levels of multipliers that need to be incorporated in the analysis (dung production rate and dung decay rate).
+The objectives of this exercise are to
- use the dht2 function to obtain animal abundances.

The question is how to estimate the density of sika deer in a number of woodlands in the Scottish Borders (Marques et al., 2001). These animals are shy and will be aware of the presence of an observer before the observer detects them, making surveys of this species challenging. As a consequence, indirect estimation methods have been applied to this problem. In this manner, an estimate of density is produced for some sign generated by deer (in this case, faecal or dung pellets) and this estimate is transformed to density of deer (\(D_{\textrm{deer}}\)) by
\[ \hat D_{\textrm{deer}} = \frac{\textrm{dung deposited daily}}{\textrm{dung production rate (per animal)}} \]
where the dung deposited daily is given by
\[ \textrm{dung deposited daily} = \frac{\hat D_{\textrm{pellet groups}}}{\textrm{mean time to decay}} \]
Hence, we use distance sampling to produce a pellet group density estimate, then adjust it accordingly to account for the production and decay processes operating during the time the data were being acquired. We will also take uncertainty in the dung production and decay rates into account in our final estimate of deer density.
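As a worked example of these equations (the pellet group density used here is purely illustrative; the production and decay rates match the multipliers used below):

\[ \hat D_{\textrm{deer}} = \frac{85{,}000 / 163}{25} \approx \frac{521}{25} \approx 21 \textrm{ deer km}^{-2} \]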
+Data from 9 woodlands (labelled A-H and J) were collected according to the survey design (Figure 1) but note that data from block D were not included in this exercise.
Figure 1: Location of sika deer survey in southern Scotland and the survey design (from Marques et al., 2001). Note the differing amounts of effort in different woodlands based on information derived from pilot surveys.
+In addition to these data, we also require estimates of the production rate. From a literature search, we learn that sika deer produce 25 pellet groups daily but this source did not provide a measure of variability of this estimate. During the course of our surveys we also followed the fate of some marked pellet groups to estimate the decay (disappearance) rates of a pellet group. A thorough discussion of methods useful for estimating decay rates and associated measures of precision can be found in Laing et al. (2003).
There are many factors that might influence both production and decay rates, and for purposes of this exercise we will make the simplifying assumption that decay rate is homogeneous across these woodlands, with a mean time to decay of 163 days and a standard error of 13 days. (If you were to conduct a survey such as this, you would want to investigate this assumption more thoroughly.)
These data (called sikadeer) are available in the Distance package. Detection of deer dung takes place at small spatial scales; perpendicular distances are measured in centimeters. But transects were long, measured in kilometers, and deer densities are customarily reported in numbers per square kilometer.
+library(Distance)
+data(sikadeer)
conversion.factor <- convert_units("centimeter", "kilometer", "square kilometer")

Fit the usual series of models (i.e. half-normal, hazard rate, uniform) to the distances to pellet groups and decide on a detection function. This detection function (Figure 2) will be used to obtain \(\hat D_{\textrm{pellet groups}}\).
+
+deer.df <- ds(sikadeer, key="hn", truncation="10%", convert_units = conversion.factor)
+plot(deer.df, main="Half normal detection function")
Figure 2: Simple detection function to deer pellet line transect data.
+
+print(deer.df$dht$individuals$summary)## Region Area CoveredArea Effort n k ER se.ER cv.ER
+## 1 A 13.9 0.005950 1.70 1217 13 715.88234 119.918872 0.1675120
+## 2 B 10.3 0.003850 1.10 396 10 359.99999 86.859289 0.2412758
+## 3 C 8.6 0.001575 0.45 17 3 37.77778 8.521202 0.2255612
+## 4 E 8.0 0.002975 0.85 30 5 35.29412 16.568939 0.4694533
+## 5 F 14.0 0.000700 0.20 29 1 145.00000 0.000000 0.0000000
+## 6 G 15.2 0.001400 0.40 32 3 80.00000 39.686269 0.4960784
+## 7 H 11.3 0.000700 0.20 3 1 15.00000 0.000000 0.0000000
+## 8 J 9.6 0.000350 0.10 7 1 70.00000 0.000000 0.0000000
+## 9 Total 90.9 0.017500 5.00 1731 37 201.90876 0.000000 0.0000000
+Have a look at the Summary statistics for this model - note some woodlands have but a single transect of effort allocated.
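Although only the half-normal fit is shown above, the other candidate key functions could be fitted and compared in the same way; a minimal sketch (object names are illustrative):

deer.hr <- ds(sikadeer, key="hr", truncation="10%", convert_units=conversion.factor)
deer.unif <- ds(sikadeer, key="unif", adjustment="cos", truncation="10%",
                convert_units=conversion.factor)
summarize_ds_models(deer.df, deer.hr, deer.unif)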
The next step is to create an object which contains the multipliers we wish to use. We already have estimates of dung production rates but need similar information on dung decay (or persistence) rate. Analysis is based upon methods presented in Laing et al. (2003).
Data to calculate dung persistence have been collected in the file dung_persistence.csv. The following code is from Meredith (2017).
+
+MIKE.persistence <- function(DATA) {
+
+# Purpose: calculate mean persistence time (mean time to decay) for dung/nest data
+# Input: data frame with at least two columns:
+# DAYS - calendar day on which dung status was observed
+# STATE - dung status: 1-intact, 0-decayed
+# Output: point estimate, standard error and CV of mean persistence time
+#
+# Attribution: code from Mike Meredith website:
+# http://www.mikemeredith.net/blog/2017/Sign_persistence.htm
+# Citing: CITES elephant protocol
+# https://cites.org/sites/default/files/common/prog/mike/survey/dung_standards.pdf
+
+ ## Fit logistic regression model to STATE on DAYS, extract coefficients
+ dung.glm <- glm(STATE ~ DAYS, data=DATA, family=binomial(link = "logit"))
+ betas <- coefficients(dung.glm)
+ ## Calculate mean persistence time
+ mean.decay <- -(1+exp(-betas[1])) * log(1+exp(betas[1])) / betas[2]
+ ## Calculate the variance of the estimate
+ vcovar <- vcov(dung.glm)
+ var0 <- vcovar[1,1] # variance of beta0
+ var1 <- vcovar[2,2] # variance of beta1
+ covar <- vcovar[2,1] # covariance
+ deriv0 <- -(1-exp(-betas[1]) * log(1+exp(betas[1])))/betas[2]
+ deriv1 <- -mean.decay/betas[2]
+ var.mean <- var0*deriv0^2 + 2*covar*deriv0*deriv1 + var1*deriv1^2
+ ## Calculate the SE and CV and return
+ se.mean <- sqrt(var.mean)
+ cv.mean <- se.mean/mean.decay
+ out <- c(mean.decay, se.mean, 100*cv.mean)
+ names(out) <- c("Mean persistence time", "SE", "%CV")
+ plot(decay$DAYS, jitter(decay$STATE, amount=0.10), xlab="Days since initiation",
+ ylab="Dung persists (yes=1)",
+ main="Eight dung piles revisited over time")
+ curve(predict(dung.glm, data.frame(DAYS=x), type="resp"), add=TRUE)
+ abline(v=mean.decay, lwd=2, lty=3)
+ return(out)
+}
+decay <- read.csv("dung_persistence.csv")
+persistence.time <- MIKE.persistence(decay)
Figure 3: Logistic curve fitted to pellet persistence survey data. Vertical line represents day at which 50% of pellets have decayed to non-detectable.
+
+print(persistence.time)## Mean persistence time SE %CV
+## 163.396748 14.226998 8.707026
Running the above command should have produced a plot of dung persistence versus days since initiation, with a fitted logistic regression (this is like a simple linear regression but restricts the response to values between 0 and 1). Note the points can in reality take only the values 0 or 1, but for the purposes of plotting they have been 'jittered' to avoid over-plotting.
+An estimate of mean persistence time and measure of variability are also provided - make a note of these as they will be required below. Dotted vertical line indicates the time at which the estimated probability of persistence is 0.5.
+As stated above, we want an object which contains information on the dung production rate (and standard error) and dung decay rate (and standard error). The following command creates a list containing two data frames:
- creation contains estimates of the dung production rate and its associated standard error
- decay contains the dung decay rate and its associated standard error, using the estimates obtained from the dung decay rate analysis above.
+# Create list of multipliers
+mult <- list(creation = data.frame(rate=25, SE=0),
+ decay = data.frame(rate=163, SE=14.2))
+print(mult)## $creation
+## rate SE
+## 1 25 0
+##
+## $decay
+## rate SE
+## 1 163 14.2
+The final step is to use these multipliers to convert \(\hat D_{\textrm{pellet groups}}\) to \(\hat D_{\textrm{deer}}\) (as in the equations above) - for this we need to employ the dht2 function. In the command below the multipliers= argument allows us to specify the rates and standard errors. There are a couple of other function arguments that need some explanation:
- strat_formula=~Region.Label is specified to take into account the design (i.e. different woodlands or blocks).
- stratification="geographical" is specified because we want to produce an overall density estimate that is the mean of the woodland-specific densities weighted by the area of each block.
- deer.df is the detection function you have fitted.
+deer.ests <- dht2(deer.df, flatfile=sikadeer, strat_formula=~Region.Label,
+ convert_units=conversion.factor, multipliers=mult,
+ stratification="geographical")## Warning in dht2(deer.df, flatfile = sikadeer, strat_formula = ~Region.Label, :
+## One or more strata have only one transect, cannot calculate empirical encounter
+## rate variance
+
+print(deer.ests, report="density")## Density estimates from distance sampling
+## Stratification : geographical
+## Variance : R2, n/L
+## Multipliers : creation, decay
+## Sample fraction : 1
+##
+##
+## Summary statistics:
+## Region.Label Area CoveredArea Effort n k ER se.ER cv.ER
+## A 13.9 0.005950 1.70 1217 13 715.882 119.919 0.168
+## B 10.3 0.003850 1.10 396 10 360.000 86.859 0.241
+## C 8.6 0.001575 0.45 17 3 37.778 8.521 0.226
+## E 8.0 0.002975 0.85 30 5 35.294 16.569 0.469
+## F 14.0 0.000700 0.20 29 1 145.000 0.000 0.000
+## G 15.2 0.001400 0.40 32 3 80.000 39.686 0.496
+## H 11.3 0.000700 0.20 3 1 15.000 0.000 0.000
+## J 9.6 0.000350 0.10 7 1 70.000 0.000 0.000
+## Total 90.9 0.017500 5.00 1731 37 346.200 68.158 0.197
+##
+## Density estimates:
+## Region.Label Estimate se cv LCI UCI df
+## A 73.9167 14.248 0.193 49.6889 109.9576 21.037
+## B 37.1709 9.643 0.259 21.3191 64.8093 12.031
+## C 3.9007 0.955 0.245 1.7460 8.7142 2.779
+## E 3.6442 1.746 0.479 1.0713 12.3958 4.337
+## F 14.9716 1.428 0.095 12.4246 18.0407 63231.773
+## G 8.2602 4.173 0.505 1.2114 56.3218 2.151
+## H 1.5488 0.148 0.095 1.2853 1.8663 63231.773
+## J 7.2277 0.689 0.095 5.9981 8.7093 63231.773
+## Total 20.8476 3.011 0.144 15.5123 28.0180 25.610
+##
+## Component percentages of variance:
+## Region.Label Detection ER Multipliers
+## A 4.05 75.53 20.43
+## B 2.23 86.49 11.28
+## C 2.51 84.84 12.65
+## E 0.66 96.04 3.31
+## F 16.54 0.00 83.46
+## G 0.59 96.44 2.97
+## H 16.54 0.00 83.46
+## J 16.54 0.00 83.46
+## Total 3.73 96.27 0.00
Stratification choices with dht2
This example of sika deer on different hunting estates uses geographical stratification. There is also the option of using replicate for the stratification argument. This is useful when there are repeated surveys in a geographic area; the average abundance is computed and the variance reflects variability between surveys. Alternatively, effort_sum can be used with replicate surveys when there are only a few replicates. Finally, stratification="object" can be used when detections are made of different species, sexes or ages of animals. This option will produce species-specific abundance estimates as well as an abundance estimate over all species, properly calculating the variance of total abundance. More information is available in this diagrammatic comparison as well as in the help file for ?dht2.
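Sketches of these alternative calls (not run; the model object fit, data frame dat and the column names visit and species are placeholders):

# Repeated surveys of one area: density averaged over surveys, variance from between-survey variability
dht2(fit, flatfile=dat, strat_formula=~visit, stratification="replicate")
# Detections of multiple species: species-specific and overall estimates
dht2(fit, flatfile=dat, strat_formula=~species, stratification="object")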
The function dht2 also provides information on the components of variance. Make a note of these (the contributions of the detection function, encounter rate and decay rate; and what happened to the production rate component?) in each stratum.
In woodland A, there were 13 transects on which over 1,200 pellet groups were detected: uncertainty in the estimated density (measured by CV) was 19% and the variance components were apportioned as detection probability 4%, encounter rate 76% and multipliers 20%.
+In woodland E, there were 5 transects and 30 pellet groups resulting in a coefficient of variation (CV) of 48%: the variance components were apportioned as detection probability 0.7%, encounter rate 96% and multipliers 3%.
+The CV of the abundance estimates for blocks F, H and J are identical (9%) because a pooled detection function was used across all blocks and the dung deposition and decay rates were not block-specific. The only element of the computation remaining that is block-specific is the encounter rate; and for these three blocks there was but a single transect per block, meaning the encounter rate variance could not be computed and was set to zero.
+The estimated abundance across all blocks had a CV of 14%. But far and away, the greatest contribution to this uncertainty was encounter rate variance–differences in pellet encounters between transects. In the context of distance sampling, the uncertainty in the parameter estimates of the detection function accounts for <1% in the total estimate of deer abundance across the blocks.
+vignettes/web-only/multispecies/multispecies-multioccasion-analysis.Rmd
It is increasingly common for investigators to conduct surveys in which multiple species are detected and density estimates for several species are of interest. There are many ways of analysing such data sets, but care must be taken. Not all approaches will produce correct density estimates. To demonstrate one of the ways to produce incorrect estimates, we will use the line transect survey data reported in Buckland (2006). This survey (and data file) recorded detections of four species of songbirds. We conduct an analysis of chaffinch (Fringilla coelebs) (coded c in the data file), but similar results would arise with the other species.
Begin by reading the flat file in comma-delimited format. Note the URL for the data file is very long; double-check that you can read the URL, including the GitHub token.
Buckland's design consisted of visiting each of the 19 transects in his study twice. To examine some of the errors that can arise from improper analysis, I choose to treat the two visits as strata for the express purpose of generating stratum- (visit-) specific density estimates. Density estimates reported in Buckland (2006) are in units of birds \(\cdot hectare^{-1}\).
+birds$Region.Label <- birds$visit
+cu <- convert_units("meter", "kilometer", "hectare")The direct approach to producing a density estimate for the chaffinch would be to subset the original data frame and use the species-specific data frame for analysis. Begin by performing the subset operation.
+
+chaf <- birds[birds$species=="c", ]When the data are subset, the integrity of the survey design is not preserved. A simple frequency table of the species-specific data frame flags up a number of transect/visit combinations where no chaffinches were detected. The result is that the subset data frame suggests 3 of the 19 transects lacked chaffinch detections on the first visit and one of the 19 transects lacked chaffinch detections on the second visit. This revelation, in itself, causes no problems for our estimate of density of chaffinches.
+
+detects <- table(chaf$Sample.Label, chaf$visit)
+detects <- as.data.frame(detects)
+names(detects) <- c("Transect", "Visit", "Detections")
+detects$Detections <- cell_spec(detects$Detections,
+ background = ifelse(detects$Detections==0, "red", "white"))
+knitr::kable(detects, escape=FALSE) %>%
  kable_paper(full_width=FALSE)

| Transect | Visit | Detections |
|---|---|---|
| 1 | 1 | 3 |
| 2 | 1 | 3 |
| 3 | 1 | 4 |
| 4 | 1 | 3 |
| 5 | 1 | 5 |
| 6 | 1 | 4 |
| 7 | 1 | 2 |
| 8 | 1 | 0 |
| 9 | 1 | 1 |
| 10 | 1 | 1 |
| 11 | 1 | 0 |
| 13 | 1 | 1 |
| 14 | 1 | 1 |
| 15 | 1 | 3 |
| 16 | 1 | 2 |
| 17 | 1 | 3 |
| 18 | 1 | 3 |
| 19 | 1 | 0 |
| 1 | 2 | 1 |
| 2 | 2 | 4 |
| 3 | 2 | 3 |
| 4 | 2 | 2 |
| 5 | 2 | 4 |
| 6 | 2 | 3 |
| 7 | 2 | 3 |
| 8 | 2 | 1 |
| 9 | 2 | 0 |
| 10 | 2 | 2 |
| 11 | 2 | 1 |
| 13 | 2 | 1 |
| 14 | 2 | 1 |
| 15 | 2 | 1 |
| 16 | 2 | 1 |
| 17 | 2 | 1 |
| 18 | 2 | 4 |
| 19 | 2 | 1 |
However, there is a problem hidden within the table above. Transect 12 does not appear in the table because there were no detections of chaffinches on either visit. Consequently, there were 4 transects without chaffinches on the first visit and 2 transects without chaffinches on the second visit, rather than the 3 transects and 1 transect you might mistakenly conclude do not have chaffinch detections if you relied completely upon the table.
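A quick way to flag transects that disappear entirely from the subset is to compare the transect labels in the full data set with those remaining after subsetting (here this would return transect 12):

setdiff(unique(birds$Sample.Label), unique(chaf$Sample.Label))  # transects with no chaffinch detections on either visit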
+Let’s see what the ds() function thinks about the survey effort using information from the species-specific data frame.
+chaf.wrong <- ds(chaf, key="hn", convert_units = cu, truncation=95, formula = ~Region.Label)
+knitr::kable(chaf.wrong$dht$individuals$summary) %>%
+ kable_paper(full_width=FALSE) %>%
+ column_spec(6, background="salmon") %>%
  column_spec(7, background="steelblue")

| Region | Area | CoveredArea | Effort | n | k | ER | se.ER | cv.ER |
|---|---|---|---|---|---|---|---|---|
| 1 | 33.2 | 82.061 | 4.319 | 39 | 15 | 9.029868 | 1.1159303 | 0.1235821 |
| 2 | 33.2 | 83.562 | 4.398 | 34 | 17 | 7.730787 | 0.9798153 | 0.1267420 |
| Total | 66.4 | 165.623 | 8.717 | 73 | 32 | 8.380327 | 0.7425191 | 0.0886026 |
Examine the column labelled k (the number of transects) for each of the visits. Rather than the 19 transects that were surveyed on each visit, the ds() function erroneously believes there were only 15 transects surveyed on the first visit and 17 transects surveyed on the second visit.
Note also the number of detections per kilometer; roughly 9 on the first visit and 7.7 on the second visit. These encounter rates exclude kilometers of effort on transects where there were no detections. We will return to this comparison later.
Additional arguments can be passed to ds() to resolve this problem; consult the ds() documentation for the region_table and sample_table arguments.
ds
The analysis that produced erroneous results can be remedied by explicitly letting the ds() function know about the study design; specifically, how many strata there are and the number of transects within each stratum (and their associated lengths).
Construct the region table and sample table showing the two strata with equal areas and each labelled transect (of given length) repeated twice.
+birds.regiontable <- data.frame(Region.Label=as.factor(c(1,2)), Area=c(33.2,33.2))
+birds.sampletable <- data.frame(Region.Label=as.factor(rep(c(1,2), each=19)),
+ Sample.Label=rep(1:19, times=2),
+ Effort=c(0.208, 0.401, 0.401, 0.299, 0.350,
+ 0.401, 0.393, 0.405, 0.385, 0.204,
+ 0.039, 0.047, 0.204, 0.271, 0.236,
+ 0.189, 0.177, 0.200, 0.020))The chaffinch analysis is performed again, this time supplying the region_table and sample_table information to ds(). The correct number of transects (19) sampled on both visits (even though chaffinch was not detected on 4 transects on visit 1 and 2 transects on visit 2) is now recognised. Hence, the use of region table and sample table solves the problem of effort miscalculation if a species is not detected on all transects.
+tr <- 95 # as per Buckland (2006)
+onlycf <- ds(data=birds[birds$species=="c", ],
+ region_table = birds.regiontable,
+ sample_table = birds.sampletable,
+ trunc=tr, convert_units=cu, key="hn", formula = ~Region.Label)
+knitr::kable(onlycf$dht$individuals$summary) %>%
+ kable_paper(full_width=FALSE) %>%
+ column_spec(6, background="salmon") %>%
  column_spec(7, background="steelblue")

| Region | Area | CoveredArea | Effort | n | k | ER | se.ER | cv.ER |
|---|---|---|---|---|---|---|---|---|
| 1 | 33.2 | 91.77 | 4.83 | 39 | 19 | 8.074534 | 1.2196305 | 0.1510465 |
| 2 | 33.2 | 91.77 | 4.83 | 34 | 19 | 7.039338 | 1.0612781 | 0.1507639 |
| Total | 66.4 | 183.54 | 9.66 | 73 | 38 | 7.556936 | 0.8083641 | 0.1069698 |
To drive home the consequence of failing to properly specify the survey effort, contrast the encounter rates for the two visits from the incorrect calculations above (9.0 and 7.7, respectively) with the correct calculations (8.1 and 7.0, respectively). The number of transects is incorrect, with the knock-on effect of effort being incorrect. If effort is incorrect then so too is covered area.
+The ripple effect from incomplete information about the survey design results in positively biased estimates of density.
vignettes/web-only/points/pointtransects-distill.Rmd
In this exercise, we use R (R Core Team, 2019) and the Distance package (Miller, Rexstad, Thomas, Marshall, & Laake, 2019) to fit different detection function models to point transect survey data and estimate savanna sparrow (Passerculus sandwichensis) density and abundance. These data were part of a study examining the effect of livestock grazing upon vegetation structure and consequently upon the avian community described by Knopf et al. (1988).
Steps in this analysis are similar to the steps taken in the line transect analysis of winter wren data.
+ds functionA total of 373 point transects were placed in three pastures in the Arapaho National Wildlife Refuge in Colorado (Figure 1). Elevation of these pastures was ~2500m. We will not deal with pasture-level analysis of these data in this vignette and will alter the data to remove the strata designations.
Figure 1: Summer grazed pastures along Illinois River Arapaho National Wildlife Refuge, Colorado. Figure from Knopf et al. (1988).
+The fields of the Savannah_sparrow_1980 data set are:
This command assumes that the dsdata package has been installed on your computer. The R workspace Savannah_sparrow_1980 contains detections of savanna sparrows from point transect surveys of Knopf et al. (1988).
+library(Distance)
+data(Savannah_sparrow_1980)
+# remove pasture-level identifier in Region.Label
+Savannah_sparrow_1980$Region.Label <- "Single_stratum"The code above overwrites the strata designations in the original data to make it appear that all data were derived from a single stratum. This makes the analysis simpler to perform. There are examples of analysis of stratified data in another vignette.
+Examine the first few rows of Savannah_sparrow_1980 using the function head()
+head(Savannah_sparrow_1980)## Region.Label Area Sample.Label Effort object distance Study.Area
+## 1 Single_stratum 1 POINT 1 1 NA NA SASP 1980
+## 2 Single_stratum 1 POINT 2 1 NA NA SASP 1980
+## 3 Single_stratum 1 POINT 3 1 NA NA SASP 1980
+## 4 Single_stratum 1 POINT 4 1 NA NA SASP 1980
+## 5 Single_stratum 1 POINT 5 1 NA NA SASP 1980
+## 6 Single_stratum 1 POINT 6 1 NA NA SASP 1980
+The object Savannah_sparrow_1980 is a dataframe object made up of rows and columns. In contrast to the Montrave winter wren line transect data used in the previous vignette, savannah sparrows were not detected at all point transects. Radial distances receive the value NA for transects where there were no detections. To determine the number of detections in this data set, we total the number of values in the distance field that are not NA
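The call producing the count below is not shown in the rendered output; one way to compute it is:

sum(!is.na(Savannah_sparrow_1980$distance))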
## [1] 276
+Gain familiarity with the radial distance data using the hist() function (Figure 2).
+hist(Savannah_sparrow_1980$distance, xlab="Distance (m)",
+ main="Savannah sparrow point transects")
Figure 2: Histogram of radial distances of savannah sparrows across all pastures.
+Note the shape of the radial distance histogram does not resemble the shape of perpendicular distances gathered from line transect sampling (Buckland, Rexstad, Marques, & Oedekoven, 2015, sec. 1.3).
+With point transects, there are only units of measure associated with the size of the study area and the radial distance measures, because effort is measured in number of visits, rather than distance.
The units of measure for effort are therefore set to NULL in the call to convert_units below. Note also that the Area field in this data set is set to 1, resulting in abundance and density being equal.
+conversion.factor <- convert_units("meter", NULL, "hectare")ds
Detection functions are fitted using the ds function, which requires a data frame with a column called distance. We have this in the Savannah_sparrow_1980 data, therefore we can simply supply the name of the data frame to the function along with additional arguments.
Details about the arguments for this function:
+key="hn"
+adjustment=NULL
+transect="point"
+convert_units=conversion.factor
+truncation="5%"
+As is customary, right truncation is employed to remove 5% of the observations most distant from the transects, as detections at these distances contain little information about the shape of the fitted probability density function near the point.
+
+sasp.hn <- ds(data=Savannah_sparrow_1980, key="hn", adjustment=NULL,
+ transect="point", convert_units=conversion.factor, truncation="5%")On calling the ds function, information is provided to the screen reminding the user what model has been fitted and the associated AIC value. More information is supplied by applying the summary() function to the object created by ds().
+summary(sasp.hn)##
+## Summary for distance analysis
+## Number of observations : 262
+## Distance range : 0 - 51.025
+##
+## Model : Half-normal key function
+## AIC : 2021.776
+## Optimisation: mrds (nlminb)
+##
+## Detection function parameters
+## Scale coefficient(s):
+## estimate se
+## (Intercept) 3.044624 0.04270318
+##
+## Estimate SE CV
+## Average p 0.321125 0.02296165 0.07150378
+## N in covered region 815.881752 71.61153776 0.08777196
+##
+## Summary statistics:
+## Region Area CoveredArea Effort n k ER se.ER
+## 1 Single_stratum 1 305.0877 373 262 373 0.7024129 0.04726421
+## cv.ER
+## 1 0.06728836
+##
+## Abundance:
+## Label Estimate se cv lcl ucl df
+## 1 Total 2.674253 0.2625745 0.09818612 2.206266 3.241509 598.5905
+##
+## Density:
+## Label Estimate se cv lcl ucl df
+## 1 Total 2.674253 0.2625745 0.09818612 2.206266 3.241509 598.5905
Visually inspect the fitted detection function with the plot() function, specifying the histogram cutpoints with the breaks argument. Add the argument pdf so the plot shows the probability density function rather than the detection function. The probability density function is preferred for assessing model fit because the PDF incorporates information about the availability of animals to be detected. There are few animals available to be detected at small distances, therefore lack of fit at small distances is not as consequential for points as it is for lines (Figure 3).
+cutpoints <- c(0,5,10,15,20,30,40,max(Savannah_sparrow_1980$distance, na.rm=TRUE))
+plot(sasp.hn, breaks=cutpoints, pdf=TRUE, main="Savannah sparrow point transect data.")
Figure 3: Fit of half normal detection function to savannah sparrow data.
Detection function forms and shapes are specified by changing the key and adjustment arguments.
The options available for the key and adjustment arguments are:

Key functions:
- half-normal (key="hn") - default
- hazard-rate (key="hr")
- uniform (key="unif")

Adjustment terms:
- no adjustments (adjustment=NULL)
- cosine (adjustment="cos") - default
- Hermite polynomial (adjustment="herm")
- simple polynomial (adjustment="poly")

To fit a uniform key function with cosine adjustment terms, use the command:
+
+sasp.unif.cos <- ds(Savannah_sparrow_1980, key="unif", adjustment="cos",
+ transect="point", convert_units=conversion.factor, truncation="5%")To fit a hazard rate key function with simple polynomial adjustment terms, then use the command:
+
+sasp.hr.poly <- ds(Savannah_sparrow_1980, key="hr", adjustment="poly",
+ transect="point", convert_units=conversion.factor, truncation="5%")## Warning in ddf.ds(dsmodel = dsmodel, data = data, meta.data = meta.data, :
+## Estimated hazard-rate scale parameter close to 0 (on log scale). Possible
+## problem in data (e.g., spike near zero distance).
+## Warning in ddf.ds(dsmodel = dsmodel, data = data, meta.data = meta.data, :
+## Estimated hazard-rate scale parameter close to 0 (on log scale). Possible
+## problem in data (e.g., spike near zero distance).
+Each fitted detection function produces a different estimate of Savannah sparrow abundance and density. The estimate depends upon the model chosen. The model selection tool for distance sampling data is AIC.
+
+AIC(sasp.hn, sasp.hr.poly, sasp.unif.cos)## df AIC
+## sasp.hn 1 2021.776
+## sasp.hr.poly 3 2024.785
+## sasp.unif.cos 1 2023.178
+In addition to the relative ranking of models provided by AIC, it is also important to know whether selected model(s) actually fit the data. The model is the basis of inference, so it is dangerous to make inference from a model that does not fit the data. Goodness of fit is assessed using the function gof_ds (Figure 4).
+gof_ds(sasp.hn)
Figure 4: Q-Q plot of half normal detection function to savannah sparrow data.
+##
+## Goodness of fit results for ddf object
+##
+## Distance sampling Cramer-von Mises test (unweighted)
+## Test statistic = 0.0835959 p-value = 0.671325
+The function summarise_ds_models combines the work of AIC and gof_ds to produce a table of fitted models and summary statistics.
+knitr::kable(summarize_ds_models(sasp.hn, sasp.hr.poly, sasp.unif.cos),digits=3,
  caption="Model selection summary of savannah sparrow point transect data.")

|   | Model | Key function | Formula | C-vM p-value | \(\hat{P_a}\) | se(\(\hat{P_a}\)) | \(\Delta\)AIC |
|---|---|---|---|---|---|---|---|
| 1 |  | Half-normal | ~1 | 0.671 | 0.321 | 0.023 | 0.000 |
| 3 |  | Uniform with cosine adjustment term of order 1 | NA | 0.364 | 0.350 | 0.015 | 1.402 |
| 2 |  | Hazard-rate with simple polynomial adjustment term of order 4 | ~1 | 0.904 | 0.295 | 0.053 | 3.009 |
Key differences between analysis of line transect data and point transect data
+transect in ds() must be set to "point",pdf=TRUE argument added to plot() function,vignettes/web-only/pointtransects-distill.Rmd
+In this exercise, we use R (R Core Team, 2019) and the Distance package (Miller, Rexstad, Thomas, Marshall, & Laake, 2019) to fit different detection function models to point transect survey data and to estimate the density and abundance of Savannah sparrows (Passerculus sandwichensis). These data were part of a study examining the effect of livestock grazing upon vegetation structure, and consequently upon the avian community, described by Knopf et al. (1988).
Steps in this analysis are similar to the steps taken in the line transect analysis of winter wren data.
+A total of 373 point transects were placed in three pastures in the Arapaho National Wildlife Refuge in Colorado (Figure 1). Elevation of these pastures was ~2500m. We will not deal with pasture-level analysis of these data in this vignette and will alter the data to remove the strata designations.
+![Figure 1: Summer grazed pastures along Illinois River, Arapaho National Wildlife Refuge, Colorado. Figure from Knopf et al. (1988).](arapaho.jpg)
+The fields of the Savannah_sparrow_1980 data set are:
+- Region.Label - stratum (pasture) label,
+- Area - size of the stratum,
+- Sample.Label - point transect identifier,
+- Effort - number of visits to the point,
+- object - unique identifier for each detection,
+- distance - radial distance (m) to each detection,
+- Study.Area - name of the study.
This command assumes that the dsdata package has been installed on your computer. The R workspace Savannah_sparrow_1980 contains detections of savanna sparrows from point transect surveys of Knopf et al. (1988).
+library(Distance)
+data(Savannah_sparrow_1980)
+# remove pasture-level identifier in Region.Label
+Savannah_sparrow_1980$Region.Label <- "Single_stratum"
+The code above overwrites the strata designations in the original data to make it appear that all data were derived from a single stratum. This makes the analysis simpler to perform. There are examples of analysis of stratified data in another vignette.
+Examine the first few rows of Savannah_sparrow_1980 using the function head().
+head(Savannah_sparrow_1980)
+## Region.Label Area Sample.Label Effort object distance Study.Area
+## 1 Single_stratum 1 POINT 1 1 NA NA SASP 1980
+## 2 Single_stratum 1 POINT 2 1 NA NA SASP 1980
+## 3 Single_stratum 1 POINT 3 1 NA NA SASP 1980
+## 4 Single_stratum 1 POINT 4 1 NA NA SASP 1980
+## 5 Single_stratum 1 POINT 5 1 NA NA SASP 1980
+## 6 Single_stratum 1 POINT 6 1 NA NA SASP 1980
+The object Savannah_sparrow_1980 is a data frame made up of rows and columns. In contrast to the Montrave winter wren line transect data used in the previous vignette, Savannah sparrows were not detected at every point transect. Radial distances receive the value NA for transects where there were no detections. To determine the number of detections in this data set, we total the number of values in the distance field that are not NA.
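+The count reported below was presumably produced along these lines (a minimal sketch):
+
+sum(!is.na(Savannah_sparrow_1980$distance))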
## [1] 276
+Gain familiarity with the radial distance data using the hist() function (Figure 2).
+hist(Savannah_sparrow_1980$distance, xlab="Distance (m)",
+ main="Savannah sparrow point transects")
+Figure 2: Histogram of radial distances of savannah sparrows across all pastures. +
+Note that the shape of the radial distance histogram does not resemble the shape of a histogram of perpendicular distances gathered from line transect sampling (Buckland, Rexstad, Marques, & Oedekoven, 2015, sec. 1.3).
+With point transects, the only units of measure are those associated with the size of the study area and the radial distance measurements, because effort is recorded as the number of visits rather than a distance:
+- radial distances are measured in metres,
+- there is no transect length, so the second argument of convert_units is NULL for point transects,
+- the Area field is set to 1, resulting in abundance and density being equal.
+
+conversion.factor <- convert_units("meter", NULL, "hectare")
+Detection functions are fitted using the ds function, which requires a data frame with a column called distance. The Savannah_sparrow_1980 data contain this column, so we can simply supply the name of the data frame to the function along with additional arguments.
Details about the arguments for this function:
+- key="hn" - fit a half-normal key detection function,
+- adjustment=NULL - do not include adjustment terms,
+- transect="point" - analyse the data as point transects,
+- convert_units=conversion.factor - apply the unit conversion defined above,
+- truncation="5%" - right truncate the 5% of detections at the largest distances.
+As is customary, right truncation is employed to remove the 5% of observations most distant from the points, because detections at these distances contain little information about the shape of the fitted probability density function near the point.
+
+sasp.hn <- ds(data=Savannah_sparrow_1980, key="hn", adjustment=NULL,
+ transect="point", convert_units=conversion.factor, truncation="5%")On calling the ds function, information is provided to the screen reminding the user what model has been fitted and the associated AIC value. More information is supplied by applying the summary() function to the object created by ds().
+summary(sasp.hn)
+##
+## Summary for distance analysis
+## Number of observations : 262
+## Distance range : 0 - 51.025
+##
+## Model : Half-normal key function
+## AIC : 2021.776
+## Optimisation: mrds (nlminb)
+##
+## Detection function parameters
+## Scale coefficient(s):
+## estimate se
+## (Intercept) 3.044624 0.04270318
+##
+## Estimate SE CV
+## Average p 0.321125 0.02296165 0.07150378
+## N in covered region 815.881752 71.61153776 0.08777196
+##
+## Summary statistics:
+## Region Area CoveredArea Effort n k ER se.ER
+## 1 Single_stratum 1 305.0877 373 262 373 0.7024129 0.04726421
+## cv.ER
+## 1 0.06728836
+##
+## Abundance:
+## Label Estimate se cv lcl ucl df
+## 1 Total 2.674253 0.2625745 0.09818612 2.206266 3.241509 598.5905
+##
+## Density:
+## Label Estimate se cv lcl ucl df
+## 1 Total 2.674253 0.2625745 0.09818612 2.206266 3.241509 598.5905
+Visually inspect the fitted detection function with the plot() function, specifying the histogram cutpoints with the argument breaks. Add the argument pdf so the plot shows the probability density function rather than the detection function. The probability density function is preferred for assessing model fit because the PDF incorporates information about the availability of animals to be detected. There are few animals available to be detected at small distances, therefore lack of fit at small distances is not as consequential for points as it is for lines (Figure 3).
+cutpoints <- c(0,5,10,15,20,30,40,max(Savannah_sparrow_1980$distance, na.rm=TRUE))
+plot(sasp.hn, breaks=cutpoints, pdf=TRUE, main="Savannah sparrow point transect data.")
+Figure 3: Fit of half normal detection function to savannah sparrow data. +
+Detection function forms and shapes are specified by changing the key and adjustment arguments. The options available for the key and adjustment arguments of the detection function are:
+- half-normal key (key="hn") - default,
+- hazard-rate key (key="hr"),
+- uniform key (key="unif"),
+- no adjustment terms (adjustment=NULL),
+- cosine adjustments (adjustment="cos") - default,
+- Hermite polynomial adjustments (adjustment="herm"),
+- simple polynomial adjustments (adjustment="poly").
+To fit a uniform key function with cosine adjustment terms, use the command:
+
+sasp.unif.cos <- ds(Savannah_sparrow_1980, key="unif", adjustment="cos",
+ transect="point", convert_units=conversion.factor, truncation="5%")To fit a hazard rate key function with simple polynomial adjustment terms, then use the command:
+
+sasp.hr.poly <- ds(Savannah_sparrow_1980, key="hr", adjustment="poly",
+ transect="point", convert_units=conversion.factor, truncation="5%")Each fitted detection function produces a different estimate of Savannah sparrow abundance and density. The estimate depends upon the model chosen. The model selection tool for distance sampling data is AIC.
+
+AIC(sasp.hn, sasp.hr.poly, sasp.unif.cos)
+## df AIC
+## sasp.hn 1 2021.776
+## sasp.hr.poly 3 2024.785
+## sasp.unif.cos 1 2023.178
+In addition to the relative ranking of models provided by AIC, it is also important to know whether the selected model(s) actually fit the data. The model is the basis of inference, so it is dangerous to make inference from a model that does not fit the data. Goodness of fit is assessed using the function gof_ds (Figure 4).
+gof_ds(sasp.hn)
+Figure 4: Q-Q plot of half normal detection function to savannah sparrow data. +
+##
+## Goodness of fit results for ddf object
+##
+## Distance sampling Cramer-von Mises test (unweighted)
+## Test statistic = 0.0835959 p-value = 0.671325
+The function summarize_ds_models combines the work of AIC and gof_ds to produce a table of fitted models and summary statistics.
+knitr::kable(summarize_ds_models(sasp.hn, sasp.hr.poly, sasp.unif.cos), digits=3,
+             caption="Model selection summary of savannah sparrow point transect data.")
+
|   | Model | Key function | Formula | C-vM p-value | \(\hat{P_a}\) | se(\(\hat{P_a}\)) | \(\Delta\)AIC |
|---|---|---|---|---|---|---|---|
| 1 | sasp.hn | Half-normal | ~1 | 0.671 | 0.321 | 0.023 | 0.000 |
| 3 | sasp.unif.cos | Uniform with cosine adjustment term of order 1 | NA | 0.364 | 0.350 | 0.015 | 1.402 |
| 2 | sasp.hr.poly | Hazard-rate with simple polynomial adjustment term of order 4 | ~1 | 0.904 | 0.295 | 0.053 | 3.009 |
Key differences between analysis of line transect data and point transect data:
+- transect in ds() must be set to "point",
+- the pdf=TRUE argument is added to the plot() function.
+
+vignettes/web-only/strata/strata-distill.Rmd
+In this exercise, we use R (R Core Team, 2019) and the Distance package (Miller, Rexstad, Thomas, Marshall, & Laake, 2019) to fit different detection function models to point transect survey data and to estimate the density and abundance of Savannah sparrows (Passerculus sandwichensis). These data were part of a study examining the effect of livestock grazing upon vegetation structure, and consequently upon the avian community, described by Knopf et al. (1988). This data set was also used to demonstrate point transect analysis in the previous vignette.
A total of 373 point transects were placed in three pastures in the Arapaho National Wildlife Refuge in Colorado (Figure 1). Elevation of these pastures was ~2500m. In this example, we will perform pasture-level analysis of these data.
+Figure 1: Summer grazed pastures along Illinois River, Arapaho National Wildlife Refuge, Colorado. Figure from (Knopf et al., 1988).
+The fields of the Savannah_sparrow_1980 data set are Region.Label, Area, Sample.Label, Effort, object, distance and Study.Area, as described in the point transect vignette.
This command assumes that the dsdata package has been installed on your computer. The R workspace Savannah_sparrow_1980 contains detections of savanna sparrows from point transect surveys of Knopf et al. (1988).
+library(Distance)
+data(Savannah_sparrow_1980)
+conversion.factor <- convert_units("meter", NULL, "hectare")The simplest way to fit pasture-specific detection functions is to subset the data. This could be done at the time the ds() function is called, but we perform the step here as a data preparation step.
Fit half-normal key functions without adjustments to each pasture separately after performing 5% right truncation.
+
+past1.hn <- ds(data=sasp.past1, key="hn", adjustment=NULL,
+ transect="point", convert_units=conversion.factor, truncation="5%")
+past2.hn <- ds(data=sasp.past2, key="hn", adjustment=NULL,
+ transect="point", convert_units=conversion.factor, truncation="5%")
+past3.hn <- ds(data=sasp.past3, key="hn", adjustment=NULL,
+ transect="point", convert_units=conversion.factor, truncation="5%")The total AIC for the model that fits separate detection functions to each pasture is the sum of the AICs for the individual pastures.
+ +This model is much simpler to fit because there is only a single call to ds() using the original data.
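+One way the two AIC quantities used below might be computed (a sketch; model.pooled.hn is an assumed name for the pooled half-normal fit, not part of the original code):
+
+# AIC of the stratified model is the sum of the pasture-specific AICs
+model.separate.AIC <- AIC(past1.hn)$AIC + AIC(past2.hn)$AIC + AIC(past3.hn)$AIC
+# pooled model: one half-normal detection function fitted to all pastures combined
+model.pooled.hn <- ds(data=Savannah_sparrow_1980, key="hn", adjustment=NULL,
+                      transect="point", convert_units=conversion.factor, truncation="5%")
+model.pooled.AIC <- AIC(model.pooled.hn)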
+cat(paste("Stratum-specific detection AIC", round(model.separate.AIC),
+ "\nCommon detection function AIC", round(model.pooled.AIC$AIC)), sep=" ")## Stratum-specific detection AIC 2007
+## Common detection function AIC 2022
+Because the AIC for the model with stratum-specific detection functions (2007) is smaller than the AIC for the model with a pooled detection function (2022), we base our inference upon the stratum-specific detection function model (depicted in Figure 2).
+
+cutpoints <- c(0,5,10,15,20,30,40,53)
+par(mfrow=c(1,3))
+plot(past1.hn, breaks=cutpoints, pdf=TRUE, main="Pasture 1")
+plot(past2.hn, breaks=cutpoints, pdf=TRUE, main="Pasture 2")
+plot(past3.hn, breaks=cutpoints, pdf=TRUE, main="Pasture 3")
+Figure 2: Pasture-specific detection functions based upon half-normal key function. +
+It is always best to check the fit of the preferred model to the data.
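+The three goodness of fit results printed below presumably come from calls of this form (a sketch):
+
+gof_ds(past1.hn)
+gof_ds(past2.hn)
+gof_ds(past3.hn)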
+##
+## Goodness of fit results for ddf object
+##
+## Distance sampling Cramer-von Mises test (unweighted)
+## Test statistic = 0.0939637 p-value = 0.615284
+##
+## Goodness of fit results for ddf object
+##
+## Distance sampling Cramer-von Mises test (unweighted)
+## Test statistic = 0.0478577 p-value = 0.889162
+##
+## Goodness of fit results for ddf object
+##
+## Distance sampling Cramer-von Mises test (unweighted)
+## Test statistic = 0.0402974 p-value = 0.931609
+Further exploration of analyses involving stratification can be found in the example of dung survey analysis.
+Note there is a difference of 14 AIC units between the model using stratum-specific detection functions and the model using a pooled detection function, with the stratum-specific detection function model being preferable. To be thorough, absolute goodness of fit for the three stratum-specific detection functions is checked, and all models fit the data adequately.
+This vignette focuses upon the use of stratum-specific detection functions as a model selection exercise. Consequently, it does not examine stratum-specific abundance or density estimates; those can easily be produced by continuing the analysis begun here.
+vignettes/web-only/variance/variance-distill.Rmd
+Continuing with the Montrave winter wren line transect data from the line transect vignette, we focus upon producing robust estimates of precision in our point estimates of abundance and density. The analysis in R (R Core Team, 2019) makes use of the Distance package (Miller, Rexstad, Thomas, Marshall, & Laake, 2019).
The R workspace wren_lt contains detections of winter wrens from the line transect surveys of S. T. Buckland (2006).
The function names() allows you to see the names of the columns of the data frame wren_lt. Definitions of those fields were provided in the line transect vignette.
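+A minimal sketch of loading and inspecting the data (assuming wren_lt is available from the Distance package):
+
+library(Distance)
+data(wren_lt)
+names(wren_lt)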
+The effort, or transect length, has been adjusted to recognise that each transect is walked twice.
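+That adjustment presumably took a form like the following (an assumption; only apply it if Effort still holds the single-visit transect lengths):
+
+wren_lt$Effort <- wren_lt$Effort * 2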
+
+conversion.factor <- convert_units("meter", "kilometer", "hectare")Rather than refitting models used in the line transect vignette, we move directly to the model selected by S. T. Buckland (2006).
+
+wren.unif.cos <- ds(wren_lt, key="unif", adjustment="cos",
+                    convert_units=conversion.factor)
+Based upon experience in the field, the uniform cosine model was used for inference.
+Looking at the density estimates from the uniform cosine model:
+
+print(wren.unif.cos$dht$individuals$D)
+## Label Estimate se cv lcl ucl df
+## 1 Total 1.066101 0.2126892 0.1995019 0.7218009 1.574632 168.204
+The coefficient of variation (CV) is 0.2, and confidence interval bounds are (0.72 - 1.57) birds per hectare. The coefficient of variation is based upon a delta-method approximation of the uncertainty in both the parameters of the detection function and the variability in encounter rates between transects.
+\[[CV(\hat{D})]^2 = [CV(\frac{n}{L})]^2 + [CV(\hat{P_a})]^2\]
+where \(\frac{n}{L}\) is the encounter rate (number of detections per unit of line length) and \(\hat{P_a}\) is the probability of detection estimated from the detection function.
+These confidence interval bounds assume the sampling distribution of \(\hat{D}\) is log-normal (S. Buckland, Rexstad, Marques, & Oedekoven, 2015, sec. 6.2.1).
+Rather than relying upon the delta-method approximation that assumes independence between uncertainty in the detection function and variability in encounter rate, a bootstrap procedure can be employed. Resampling with replacement of the transects produces replicate samples with which a sampling distribution of \(\hat{D}\) is approximated. From that sampling distribution, the percentile method is used to produce confidence interval bounds respecting the shape of the sampling distribution (S. Buckland et al., 2015, sec. 6.3.1.2).
+The function bootdht_Nhat_summarize is included in the Distance package. It is used to extract information from the object created by bootdht. I will modify it slightly so as to extract the density estimates rather than the abundance estimates.
+bootdht_Dhat_summarize <- function(ests, fit) {
+ return(data.frame(D=ests$individuals$D$Estimate))
+}
+After the summary function is defined, the bootstrap procedure can be performed. Arguments here are the name of the fitted object, the object containing the data, conversion factor and number of bootstrap replicates. Here, I use the cores= argument to use multiple cores to process the bootstraps in parallel. If you do not have this many cores in your computer, you will need to reduce or remove the argument.
+nboots <- 300
+est.boot <- bootdht(model=wren.unif.cos, flatfile=wren_lt,
+ summary_fun=bootdht_Dhat_summarize,
+                    convert_units=conversion.factor, nboot=nboots, cores=10)
+The object est.boot contains a data frame of \(\hat{D}\) estimates, one per bootstrap replicate, as specified in bootdht_Dhat_summarize. This data frame can be processed to produce a histogram (Fig. 1) representing the sampling distribution of the estimated parameter, as well as the percentile confidence interval bounds.
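+The percentile interval reported below can be obtained from the bootstrap replicates with a quantile computation of this form (a sketch; bootci is the name used in the plotting code further down):
+
+bootci <- quantile(est.boot$D, probs=c(0.025, 0.975), na.rm=TRUE)
+bootci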
## 2.5% 97.5%
+## 0.7940937 1.4088653
+
+hist(est.boot$D, nc=30,
+ main="Distribution of bootstrap estimates\nwithout model uncertainty",
+ xlab="Estimated density")
+abline(v=bootci, lwd=2, lty=2)
+Figure 1: Sampling distribution of \(\hat{D}\) approximated from bootstrap. +
+The argument model in bootdht can be a single model, as shown above, or a list of models. In the latter instance, all models in the list are fitted to each bootstrap replicate and model selection based on AIC is performed for each replicate. The consequence is that model uncertainty is incorporated into the resulting estimate of precision (Fig. 2).
+wren.hn <- ds(wren_lt, key="hn", adjustment="cos",
+              convert_units=conversion.factor)
+## Warning in check.mono(result, n.pts = control$mono.points): Detection function
+## is not strictly monotonic!
+## Warning in check.mono(result, n.pts = control$mono.points): Detection function
+## is not strictly monotonic!
+
+wren.hr.poly <- ds(wren_lt, key="hr", adjustment="poly",
+ convert_units=conversion.factor)
+est.boot.uncert <- bootdht(model=list(wren.hn, wren.hr.poly, wren.unif.cos),
+ flatfile=wren_lt,
+ summary_fun=bootdht_Dhat_summarize,
+                           convert_units=conversion.factor, nboot=nboots, cores=10)
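+As before, the percentile interval below can be obtained from the replicate estimates (a sketch; modselci is the name used in the plotting code that follows):
+
+modselci <- quantile(est.boot.uncert$D, probs=c(0.025, 0.975), na.rm=TRUE)
+modselci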
+## 2.5% 97.5%
+## 0.8080775 1.3620822
+
+hist(est.boot.uncert$D, nc=30,
+ main="Distribution of bootstrap estimates\nincluding model uncertainty",
+ xlab="Estimated density")
+abline(v=modselci, lwd=2, lty=2)
+Figure 2: Sampling distribution of \(\hat{D}\) approximated from bootstrap including model uncertainty. +
+Recognise that producing bootstrap estimates of precision is computer-intensive. In this example we have created only 300 bootstrap replicates in the interest of computation time. For the inference you ultimately wish to draw, you will likely want to increase the number of bootstrap replicates to 999 or more.
+For this data set, the bootstrap confidence interval is narrower than the delta-method interval, so the bootstrap estimate of precision is greater. In addition, incorporating model uncertainty into the estimate of precision for density changes the precision estimate very little: the confidence interval width without model uncertainty is 0.615, while the width including model uncertainty is 0.554, a change of roughly -10% due to uncertainty regarding the best model for these data.
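+Those interval widths can be checked directly from the quantile objects defined above (a sketch using the bootci and modselci names introduced earlier):
+
+diff(bootci)     # width without model uncertainty, roughly 0.615
+diff(modselci)   # width including model uncertainty, roughly 0.554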
+