Ads.txt crawler and parser implemented in Go, based on the IAB Ads.txt Specification Version 1.0.1
This library provides a mechanism for fetching and parsing Ads.txt files from remote websites, or for parsing a local copy of an Ads.txt file.
There are some nice online tools for crawling and validating Ads.txt files (for example, the Ads.txt validator from AppNexus or the Ads.txt Validator by AdReform) that follow the IAB Ads.txt Specification Version 1.0.1. However, you cannot easily use those tools for large-scale site scanning, since they do not provide a free API.
There are also a few open-source projects I've found on GitHub for scanning Ads.txt files, but at least the ones I've tried were not fully compatible with the latest Ads.txt spec. You can use them, of course, and they will do a decent job, but they lack a good validation mechanism to ensure that the Ads.txt format is correct and follows the official spec.
This Ads.txt library supports large-scale site crawling, and follows the IAB Ads.txt Specification Version 1.0.1 to validate that an Ads.txt file is well formed.
See the examples folder for short examples of the adstxt library's three main methods: adstxt.Get fetches and parses a single Ads.txt file from a remote host, adstxt.GetMultiple fetches and parses multiple Ads.txt files from different hosts, and adstxt.ParseBody parses the content of a local Ads.txt file.
```go
req, err := adstxt.NewRequest("example.com")
if err != nil {
	log.Fatal(err)
}
res, err := adstxt.Get(req)
if err != nil {
	log.Fatal(err)
}
// res now holds the Ads.txt DataRecords and Variables,
// plus any Warnings raised while parsing
for _, r := range res.DataRecords { ... }
for _, v := range res.Variables { ... }
for _, w := range res.Warnings { ... }
```
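For context, here is the same flow as a complete, runnable program. It is a minimal sketch that only prints counts, since the fields inside each data record, variable, and warning are not covered in this README; the import path is the one used throughout this document.

```go
package main

import (
	"fmt"
	"log"

	"github.com/tzafrirben/go-adstxt-crawler/adstxt"
)

func main() {
	req, err := adstxt.NewRequest("example.com")
	if err != nil {
		log.Fatal(err)
	}
	res, err := adstxt.Get(req)
	if err != nil {
		log.Fatal(err)
	}
	// Summarize the parse result without assuming anything about the
	// internal structure of records, variables, or warnings.
	fmt.Printf("records: %d, variables: %d, warnings: %d\n",
		len(res.DataRecords), len(res.Variables), len(res.Warnings))
}
```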
Or fetch Ads.txt files from multiple hosts simultaneously:
```go
// define a handler function to process each Ads.txt response
h := func(req *adstxt.Request, res *adstxt.Response, err error) {
	if err != nil {
		log.Println(err) // fetch or parse failed for this host
		return
	}
	for _, r := range res.DataRecords { ... }
	for _, v := range res.Variables { ... }
	for _, w := range res.Warnings { ... }
}
// collection of domains to validate
domains := []string{
	"http://example.com",
	"http://test.com",
}
requests := make([]*adstxt.Request, len(domains))
for index, d := range domains {
	r, _ := adstxt.NewRequest(d)
	requests[index] = r
}
adstxt.GetMultiple(requests, adstxt.HandlerFunc(h))
```
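For scanning a large batch of sites, the same pattern extends naturally to a domain list loaded from disk. The sketch below assumes a hypothetical domains.txt file with one domain per line, and assumes adstxt.GetMultiple does not return until every handler has been invoked; it uses only the calls shown above and tallies parse warnings across all hosts.

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"sync/atomic"

	"github.com/tzafrirben/go-adstxt-crawler/adstxt"
)

func main() {
	f, err := os.Open("domains.txt") // hypothetical input file, one domain per line
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	var requests []*adstxt.Request
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		r, err := adstxt.NewRequest(scanner.Text())
		if err != nil {
			log.Printf("skipping %q: %v", scanner.Text(), err)
			continue
		}
		requests = append(requests, r)
	}

	var warnings int64
	h := func(req *adstxt.Request, res *adstxt.Response, err error) {
		if err != nil {
			log.Printf("fetch failed: %v", err)
			return
		}
		// Handlers may run concurrently, so update the counter atomically.
		atomic.AddInt64(&warnings, int64(len(res.Warnings)))
	}
	adstxt.GetMultiple(requests, adstxt.HandlerFunc(h))
	fmt.Printf("total parse warnings across %d hosts: %d\n",
		len(requests), atomic.LoadInt64(&warnings))
}
```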
You can also parse a local Ads.txt file in a similar way:
```go
body, err := ioutil.ReadFile("/<path_to>/ads.txt")
if err != nil {
	log.Fatal(err)
}
rec, err := adstxt.ParseBody(body)
if err != nil {
	log.Fatal(err)
}
// rec now holds the Ads.txt DataRecords and Variables,
// plus any Warnings raised while parsing
for _, r := range rec.DataRecords { ... }
for _, v := range rec.Variables { ... }
for _, w := range rec.Warnings { ... }
```
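The Warnings slice is where spec validation surfaces. Since the concrete fields of a warning are not documented here, a sketch like the following reports each one with Go's generic %+v formatting, which works regardless of the underlying struct:

```go
// Report every parse warning; %+v prints whatever fields the
// warning type carries without assuming their names.
for _, w := range rec.Warnings {
	log.Printf("ads.txt warning: %+v", w)
}
```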
import "github.com/tzafrirben/go-adstxt-crawler/adstxt" and you can use adstxt library in your code
- The robots.txt file on the remote host is ignored by the crawler; a good practice would be to check this file first, as recommended in the Ads.txt specification (see the sketch below).
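A deliberately naive sketch of such a pre-check follows: it fetches http://host/robots.txt and looks for a blanket Disallow of /ads.txt or of the whole site. It ignores User-agent sections and wildcard rules, so a production crawler should use a proper robots.txt parser instead; the adsTxtAllowed helper is hypothetical and not part of this library.

```go
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"strings"
)

// adsTxtAllowed performs a very rough robots.txt check: it treats the host
// as crawlable unless a line disallows /ads.txt or everything ("/").
// It deliberately ignores User-agent sections and wildcard rules.
func adsTxtAllowed(host string) bool {
	resp, err := http.Get("http://" + host + "/robots.txt")
	if err != nil {
		return true // unreachable robots.txt: assume crawling is permitted
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return true // no robots.txt served: nothing disallows the crawl
	}
	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return true
	}
	for _, line := range strings.Split(string(body), "\n") {
		rule := strings.TrimSpace(strings.ToLower(line))
		if rule == "disallow: /" || strings.HasPrefix(rule, "disallow: /ads.txt") {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(adsTxtAllowed("example.com"))
}
```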
MIT
Tzafrir Ben Ami (a.k.a. tzafrirben)