1. Fixed single column matrix acccidentally being converted down to a vector. 2. Added feature keepHTMLTITLEtext #7

AndreMikulec · 2013-10-13T16:59:31Z

TheSystematicInvestor@gmail

[Pull request] SIT File: fundamental.data.r R function: fund.data

Michael Kapler,

The following are the

reasons
code

supporting

fixes
enhancement

File: fundamental.data.r
R function: fund.data

PROBLEM #1

The situation exists that

    AAOI only has one quarter of information on the web page

When I run

outp <- fund.data31("AAOI", 20)

The following code is (eventually) executed,

# remove empty columns
all.data = all.data[, colSums(nchar(trim(all.data))) > 0]

if( ncol(all.data) > n ) {

Code
all.data[, colSums(nchar(trim(all.data))) > 0]

accidentally converts the single column matrix
( because only one column of information on the web page ) into a vector.

So next

    if( ncol(all.data) > n ) {

fails with error

     Downloading http://uk.advfn.com/p.php?pid=financials&symbol=AAOI&btn=quarterly_reports
     Error in if (ncol(all.data) > n) { : argument is of length zero

This is because, "vectors" do not have a "ncol" attribute

This is the fix

# remove empty columns
all.data = all.data[, colSums(nchar(trim(all.data))) > 0]
# if converted to a vector, then make it a matrix again
if (is.vector(all.data)) {
    all.data.temp <- matrix(all.data,nrow=length(all.data))
    rownames(all.data.temp) <- names(all.data)
    colnames(all.data.temp) <- all.data.temp[1, ]
    all.data <- all.data.temp
    all.data.temp <- matrix(nrow=0, ncol=0)
}

if( ncol(all.data) > n ) {  
    all.data <- all.data[, (ncol(all.data) - n + 1):ncol(all.data)]
    # if converted to a vector, then make it a matrix again
    if (is.vector(all.data)) {
        all.data.temp <- matrix(all.data,nrow=length(all.data))
        rownames(all.data.temp) <- names(all.data)
        colnames(all.data.temp) <- all.data.temp[1, ]
        all.data <- all.data.temp
        all.data.temp <- matrix(nrow=0, ncol=0)
    }

PROBLEM #2 ( RELATED TO PROBLEM #1 )

The situation exists that

            I may only want one quarter of information

When I run

            outp <- fund.data31("THM", 1)

When requesting only one quarter of information ( using n=1 )
again,

This Code

           all.data = all.data[, colSums(nchar(trim(all.data))) > 0]

accidentally converts the single column matrix into a vector

It is tested by

           if( ncol(all.data) > n ) {

Instead it is directly returned from the function by

    return(all.data)

as a vector ( but this should be a matrix )

This is the fix

# remove empty columns
all.data = all.data[, colSums(nchar(trim(all.data))) > 0]
# if converted to a vector, then make it a matrix again
if (is.vector(all.data)) {
    all.data.temp <- matrix(all.data,nrow=length(all.data))
    rownames(all.data.temp) <- names(all.data)
    colnames(all.data.temp) <- all.data.temp[1, ]
    all.data <- all.data.temp
    all.data.temp <- matrix(nrow=0, ncol=0)
}

if( ncol(all.data) > n ) {  
    all.data <- all.data[, (ncol(all.data) - n + 1):ncol(all.data)]
    # if converted to a vector, then make it a matrix again
    if (is.vector(all.data)) {
        all.data.temp <- matrix(all.data,nrow=length(all.data))
        rownames(all.data.temp) <- names(all.data)
        colnames(all.data.temp) <- all.data.temp[1, ]
        all.data <- all.data.temp
        all.data.temp <- matrix(nrow=0, ncol=0)
    }

ENHANCEMENT

One may want to capture the page title for reasons of

make visually matching symbols to companies easier
compare and contrast errors between the page titles and the in-page company names

This is the enhancement

    if ( keepHTMLTITLEtext == TRUE ) {
        # extract title from this page
        HTMLOPENTITLETAGposStart    <- regexpr(pattern="<title>", txt,ignore.case=TRUE)[1]
        HTMLCLOSETITLETAGposStart   <- regexpr(pattern="</title>",txt,ignore.case=TRUE)[1]
        HTMLOPENTITLETAGlength      <- nchar("<title>")
        HTMLTITLEtext <- substr(txt, HTMLOPENTITLETAGposStart  + HTMLOPENTITLETAGlength , HTMLCLOSETITLETAGposStart - 1 )
            }
    ...

            if ( keepHTMLTITLEtext == TRUE ) {
        # add a row of the HTMLTITLEtext values
        all.data <- rbind(all.data, rep( HTMLTITLEtext, ncol(all.data) ) )
        # to the new 'added row' name it "HTMLTITLEtext"
        rownames(all.data)[nrow(all.data)] <- "HTMLTITLEtext"
    }
    return(all.data)
} else {
            if ( keepHTMLTITLEtext == TRUE ) {
        # add a row of the HTMLTITLEtext values
        all.data <- rbind(all.data, rep( HTMLTITLEtext, ncol(all.data) ) )
        # to the new 'added row' name it "HTMLTITLEtext"
        rownames(all.data)[nrow(all.data)] <- "HTMLTITLEtext"
    }
    return(all.data)
}

}

SIMPLE METHOD TO TEST

BELOW fund.data41 is actually fund.data function of
https://github.com/AndreMikulec/SIT/blob/hotfix/hotfix-vector_null_HTMLTitle/R/fundamental.data.r
AndreMikulec@0a14fea

library(SIT)

fund.data41 <- function
(
Symbol, # ticker
n=10, # number of periods
mode=c('quarterly','annual'), # periodicity
max.attempts=5, # maximum number of attempts to download before exiting
keepHTMLTITLEtext = FALSE # last row includes HTML TITLE text
)
{
all.data = c()
option.value = -1

start_date = spl('istart_date,start_date')
    names(start_date) = spl('quarterly,annual')

repeat {
    # download Quarterly Financial Report data
    if(option.value >= 0) {
        url = paste('http://uk.advfn.com/p.php?pid=financials&symbol=', Symbol, '&btn=', mode[1], '_reports&', start_date[mode[1]], '=', option.value, sep = '')    
    } else {
        url = paste('http://uk.advfn.com/p.php?pid=financials&symbol=', Symbol, '&btn=', mode[1], '_reports', sep = '')
    }

    cat('Downloading', url, '\n')

    #txt = join(readLines(url))     
    for(iattempt in 1:max.attempts) { 
        flag = T
        tryCatch({
            txt = join(readLines(url))
        }, interrupt = function(ex) {
            flag <<-  F
            Sys.sleep(0.1)
        }, error = function(ex) {
            flag <<-  F
            Sys.sleep(0.1)
        }, finally = {
            if(flag) break
        })
    }

if( len(grep('INDICATORS', txt, ignore.case = T)) == 0 ) {
cat('No Data Found for', Symbol, '\n')
return(all.data)
}

    if ( keepHTMLTITLEtext == TRUE ) {
        # extract title from this page
        HTMLOPENTITLETAGposStart    <- regexpr(pattern="<title>", txt,ignore.case=TRUE)[1]
        HTMLCLOSETITLETAGposStart   <- regexpr(pattern="</title>",txt,ignore.case=TRUE)[1]
        HTMLOPENTITLETAGlength      <- nchar("<title>")
        HTMLTITLEtext <- substr(txt, HTMLOPENTITLETAGposStart  + HTMLOPENTITLETAGlength , HTMLCLOSETITLETAGposStart - 1 )
            }

    # extract table from this page
    data = extract.table.from.webpage(txt, 'INDICATORS', hasHeader = T)
        colnames(data) = data[1,]
        rownames(data) = data[,1]
        data = data[,-1,drop=F]

    # only add not already present data
    add.index = which( is.na(match( colnames(data), colnames(all.data) )) )         
    all.data = cbind(data[,add.index,drop=F], all.data)

    # check if it is time to stop
    if(ncol(all.data) >= n) break
    if(option.value == 0)  break

    # extract option value to go to the next page
    temp = gsub(pattern = '<option', replacement = '<tr>', txt, perl = TRUE)
    temp = gsub(pattern = '</option>', replacement = '</tr>', temp, perl = TRUE)    
    temp = extract.table.from.webpage(temp, 'All amounts', hasHeader = T)

    temp = apply(temp,1,join)
    index.selected = grep('selected', temp)
    option.value = 0
    if( len(index.selected) )
        option.value = as.double( gsub('.*value=\'([0-9]*).*', '\\1', temp[index.selected]) ) 

    if(option.value > 0) {
        # can only get 5 time periods at a time
        option.value = option.value - 5
        option.value = max(0, option.value)     
    } else {
        break
    }
}

# remove empty columns
all.data = all.data[, colSums(nchar(trim(all.data))) > 0]
# if converted to a vector, then make it a matrix again
if (is.vector(all.data)) {
    all.data.temp <- matrix(all.data,nrow=length(all.data))
    rownames(all.data.temp) <- names(all.data)
    colnames(all.data.temp) <- all.data.temp[1, ]
    all.data <- all.data.temp
    all.data.temp <- matrix(nrow=0, ncol=0)
}

if( ncol(all.data) > n ) {  
    all.data <- all.data[, (ncol(all.data) - n + 1):ncol(all.data)]
    # if converted to a vector, then make it a matrix again
    if (is.vector(all.data)) {
        all.data.temp <- matrix(all.data,nrow=length(all.data))
        rownames(all.data.temp) <- names(all.data)
        colnames(all.data.temp) <- all.data.temp[1, ]
        all.data <- all.data.temp
        all.data.temp <- matrix(nrow=0, ncol=0)
    }
            if ( keepHTMLTITLEtext == TRUE ) {
        # add a row of the HTMLTITLEtext values
        all.data <- rbind(all.data, rep( HTMLTITLEtext, ncol(all.data) ) )
        # to the new 'added row' name it "HTMLTITLEtext"
        rownames(all.data)[nrow(all.data)] <- "HTMLTITLEtext"
    }
    return(all.data)
} else {
            if ( keepHTMLTITLEtext == TRUE ) {
        # add a row of the HTMLTITLEtext values
        all.data <- rbind(all.data, rep( HTMLTITLEtext, ncol(all.data) ) )
        # to the new 'added row' name it "HTMLTITLEtext"
        rownames(all.data)[nrow(all.data)] <- "HTMLTITLEtext"
    }
    return(all.data)
}

}

library(SIT)

BLUE: CONTROL ( NEVER DID 'NOT WORK' )

outp <- fund.data41("THM", 10 )

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "auditors report"
2002/09 2002/12 2003/03 2003/06 2003/09 2003/12 2004/03 2004/06 2004/09
"" "" "" "" "" "" "" "" ""

RED: A PROBLEM ( FIXED )

outp <- fund.data41("AAOI", 10)

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "% of leverage-to-industry"
[1] "0.0"

NEVER A PROBLEM

outp <- fund.data41("THM", 2)

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "auditors report"
2004/06 2004/09
"" ""

A PROBLEM ( FIXED )

outp <- fund.data41("THM", 1)

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "auditors report"
[1] ""

library(SIT)

BLUE: CONTROL ( NEVER DID 'NOT WORK' )

outp <- fund.data41("THM", 10, keepHTMLTITLEtext = TRUE)

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "HTMLTITLEtext"
2002/09
"Thomas Equipment Company Financial Information"
2002/12
"Thomas Equipment Company Financial Information"
2003/03
"Thomas Equipment Company Financial Information"
2003/06
"Thomas Equipment Company Financial Information"
2003/09
"Thomas Equipment Company Financial Information"
2003/12
"Thomas Equipment Company Financial Information"
2004/03
"Thomas Equipment Company Financial Information"
2004/06
"Thomas Equipment Company Financial Information"
2004/09
"Thomas Equipment Company Financial Information"

RED: A PROBLEM ( FIXED )

outp <- fund.data41("AAOI", 10, keepHTMLTITLEtext = TRUE)

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "HTMLTITLEtext"
[1] "APPLIED OPTOELECTRONICS, INC. Company Financial Information"

NEVER A PROBLEM

outp <- fund.data41("THM", 2, keepHTMLTITLEtext = TRUE)

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "HTMLTITLEtext"
2004/06
"Thomas Equipment Company Financial Information"
2004/09
"Thomas Equipment Company Financial Information"

A PROBLEM ( FIXED )

outp <- fund.data41("THM", 1, keepHTMLTITLEtext = TRUE)

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "HTMLTITLEtext"
[1] "Thomas Equipment Company Financial Information"

AndreMikulec added 2 commits October 11, 2013 21:03

single vector to matrix fix ...plus ... extract html title

8b7cfc5

added feature keepHTMLTITLEtext

0a14fea

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

1. Fixed single column matrix acccidentally being converted down to a vector. 2. Added feature keepHTMLTITLEtext #7

1. Fixed single column matrix acccidentally being converted down to a vector. 2. Added feature keepHTMLTITLEtext #7

AndreMikulec commented Oct 13, 2013

1. Fixed single column matrix acccidentally being converted down to a vector. 2. Added feature keepHTMLTITLEtext #7

Are you sure you want to change the base?

1. Fixed single column matrix acccidentally being converted down to a vector. 2. Added feature keepHTMLTITLEtext #7

Conversation

AndreMikulec commented Oct 13, 2013

PROBLEM #1

PROBLEM #2 ( RELATED TO PROBLEM #1 )

ENHANCEMENT

SIMPLE METHOD TO TEST

BLUE: CONTROL ( NEVER DID 'NOT WORK' )

RED: A PROBLEM ( FIXED )

NEVER A PROBLEM

A PROBLEM ( FIXED )

BLUE: CONTROL ( NEVER DID 'NOT WORK' )

RED: A PROBLEM ( FIXED )

NEVER A PROBLEM

A PROBLEM ( FIXED )