Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1. Fixed single column matrix acccidentally being converted down to a vector. 2. Added feature keepHTMLTITLEtext #7

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

AndreMikulec
Copy link

TheSystematicInvestor@gmail

[Pull request] SIT File: fundamental.data.r R function: fund.data

Michael Kapler,

The following are the

  1. reasons
  2. code

supporting

  1. fixes
  2. enhancement

File: fundamental.data.r
R function: fund.data

PROBLEM #1

The situation exists that

    AAOI only has one quarter of information on the web page

When I run

outp <- fund.data31("AAOI", 20)

The following code is (eventually) executed,

# remove empty columns
all.data = all.data[, colSums(nchar(trim(all.data))) > 0]

if( ncol(all.data) > n ) {

Code
all.data[, colSums(nchar(trim(all.data))) > 0]

accidentally converts the single column matrix
( because only one column of information on the web page ) into a vector.

So next

    if( ncol(all.data) > n ) {

fails with error

     Downloading http://uk.advfn.com/p.php?pid=financials&symbol=AAOI&btn=quarterly_reports
     Error in if (ncol(all.data) > n) { : argument is of length zero

This is because, "vectors" do not have a "ncol" attribute

This is the fix

# remove empty columns
all.data = all.data[, colSums(nchar(trim(all.data))) > 0]
# if converted to a vector, then make it a matrix again
if (is.vector(all.data)) {
    all.data.temp <- matrix(all.data,nrow=length(all.data))
    rownames(all.data.temp) <- names(all.data)
    colnames(all.data.temp) <- all.data.temp[1, ]
    all.data <- all.data.temp
    all.data.temp <- matrix(nrow=0, ncol=0)
}

if( ncol(all.data) > n ) {  
    all.data <- all.data[, (ncol(all.data) - n + 1):ncol(all.data)]
    # if converted to a vector, then make it a matrix again
    if (is.vector(all.data)) {
        all.data.temp <- matrix(all.data,nrow=length(all.data))
        rownames(all.data.temp) <- names(all.data)
        colnames(all.data.temp) <- all.data.temp[1, ]
        all.data <- all.data.temp
        all.data.temp <- matrix(nrow=0, ncol=0)
    }

PROBLEM #2 ( RELATED TO PROBLEM #1 )

The situation exists that

            I may only want one quarter of information

When I run

            outp <- fund.data31("THM", 1) 

When requesting only one quarter of information ( using n=1 )
again,

This Code

           all.data = all.data[, colSums(nchar(trim(all.data))) > 0]

accidentally converts the single column matrix into a vector

It is tested by

           if( ncol(all.data) > n ) {

Instead it is directly returned from the function by

    return(all.data)

as a vector ( but this should be a matrix )

This is the fix

# remove empty columns
all.data = all.data[, colSums(nchar(trim(all.data))) > 0]
# if converted to a vector, then make it a matrix again
if (is.vector(all.data)) {
    all.data.temp <- matrix(all.data,nrow=length(all.data))
    rownames(all.data.temp) <- names(all.data)
    colnames(all.data.temp) <- all.data.temp[1, ]
    all.data <- all.data.temp
    all.data.temp <- matrix(nrow=0, ncol=0)
}

if( ncol(all.data) > n ) {  
    all.data <- all.data[, (ncol(all.data) - n + 1):ncol(all.data)]
    # if converted to a vector, then make it a matrix again
    if (is.vector(all.data)) {
        all.data.temp <- matrix(all.data,nrow=length(all.data))
        rownames(all.data.temp) <- names(all.data)
        colnames(all.data.temp) <- all.data.temp[1, ]
        all.data <- all.data.temp
        all.data.temp <- matrix(nrow=0, ncol=0)
    }

ENHANCEMENT

One may want to capture the page title for reasons of

  1. make visually matching symbols to companies easier
  2. compare and contrast errors between the page titles and the in-page company names

This is the enhancement

    if ( keepHTMLTITLEtext == TRUE ) {
        # extract title from this page
        HTMLOPENTITLETAGposStart    <- regexpr(pattern="<title>", txt,ignore.case=TRUE)[1]
        HTMLCLOSETITLETAGposStart   <- regexpr(pattern="</title>",txt,ignore.case=TRUE)[1]
        HTMLOPENTITLETAGlength      <- nchar("<title>")
        HTMLTITLEtext <- substr(txt, HTMLOPENTITLETAGposStart  + HTMLOPENTITLETAGlength , HTMLCLOSETITLETAGposStart - 1 )
            }
    ...

            if ( keepHTMLTITLEtext == TRUE ) {
        # add a row of the HTMLTITLEtext values
        all.data <- rbind(all.data, rep( HTMLTITLEtext, ncol(all.data) ) )
        # to the new 'added row' name it "HTMLTITLEtext"
        rownames(all.data)[nrow(all.data)] <- "HTMLTITLEtext"
    }
    return(all.data)
} else {
            if ( keepHTMLTITLEtext == TRUE ) {
        # add a row of the HTMLTITLEtext values
        all.data <- rbind(all.data, rep( HTMLTITLEtext, ncol(all.data) ) )
        # to the new 'added row' name it "HTMLTITLEtext"
        rownames(all.data)[nrow(all.data)] <- "HTMLTITLEtext"
    }
    return(all.data)
}

}

SIMPLE METHOD TO TEST

BELOW fund.data41 is actually fund.data function of
https://github.com/AndreMikulec/SIT/blob/hotfix/hotfix-vector_null_HTMLTitle/R/fundamental.data.r
AndreMikulec@0a14fea

library(SIT)

fund.data41 <- function
(
Symbol, # ticker
n=10, # number of periods
mode=c('quarterly','annual'), # periodicity
max.attempts=5, # maximum number of attempts to download before exiting
keepHTMLTITLEtext = FALSE # last row includes HTML TITLE text
)
{
all.data = c()
option.value = -1

start_date = spl('istart_date,start_date')
    names(start_date) = spl('quarterly,annual')

repeat {
    # download Quarterly Financial Report data
    if(option.value >= 0) {
        url = paste('http://uk.advfn.com/p.php?pid=financials&symbol=', Symbol, '&btn=', mode[1], '_reports&', start_date[mode[1]], '=', option.value, sep = '')    
    } else {
        url = paste('http://uk.advfn.com/p.php?pid=financials&symbol=', Symbol, '&btn=', mode[1], '_reports', sep = '')
    }

    cat('Downloading', url, '\n')

    #txt = join(readLines(url))     
    for(iattempt in 1:max.attempts) { 
        flag = T
        tryCatch({
            txt = join(readLines(url))
        }, interrupt = function(ex) {
            flag <<-  F
            Sys.sleep(0.1)
        }, error = function(ex) {
            flag <<-  F
            Sys.sleep(0.1)
        }, finally = {
            if(flag) break
        })
    }

if( len(grep('INDICATORS', txt, ignore.case = T)) == 0 ) {
cat('No Data Found for', Symbol, '\n')
return(all.data)
}

    if ( keepHTMLTITLEtext == TRUE ) {
        # extract title from this page
        HTMLOPENTITLETAGposStart    <- regexpr(pattern="<title>", txt,ignore.case=TRUE)[1]
        HTMLCLOSETITLETAGposStart   <- regexpr(pattern="</title>",txt,ignore.case=TRUE)[1]
        HTMLOPENTITLETAGlength      <- nchar("<title>")
        HTMLTITLEtext <- substr(txt, HTMLOPENTITLETAGposStart  + HTMLOPENTITLETAGlength , HTMLCLOSETITLETAGposStart - 1 )
            }

    # extract table from this page
    data = extract.table.from.webpage(txt, 'INDICATORS', hasHeader = T)
        colnames(data) = data[1,]
        rownames(data) = data[,1]
        data = data[,-1,drop=F]

    # only add not already present data
    add.index = which( is.na(match( colnames(data), colnames(all.data) )) )         
    all.data = cbind(data[,add.index,drop=F], all.data)

    # check if it is time to stop
    if(ncol(all.data) >= n) break
    if(option.value == 0)  break

    # extract option value to go to the next page
    temp = gsub(pattern = '<option', replacement = '<tr>', txt, perl = TRUE)
    temp = gsub(pattern = '</option>', replacement = '</tr>', temp, perl = TRUE)    
    temp = extract.table.from.webpage(temp, 'All amounts', hasHeader = T)

    temp = apply(temp,1,join)
    index.selected = grep('selected', temp)
    option.value = 0
    if( len(index.selected) )
        option.value = as.double( gsub('.*value=\'([0-9]*).*', '\\1', temp[index.selected]) ) 

    if(option.value > 0) {
        # can only get 5 time periods at a time
        option.value = option.value - 5
        option.value = max(0, option.value)     
    } else {
        break
    }
}

# remove empty columns
all.data = all.data[, colSums(nchar(trim(all.data))) > 0]
# if converted to a vector, then make it a matrix again
if (is.vector(all.data)) {
    all.data.temp <- matrix(all.data,nrow=length(all.data))
    rownames(all.data.temp) <- names(all.data)
    colnames(all.data.temp) <- all.data.temp[1, ]
    all.data <- all.data.temp
    all.data.temp <- matrix(nrow=0, ncol=0)
}

if( ncol(all.data) > n ) {  
    all.data <- all.data[, (ncol(all.data) - n + 1):ncol(all.data)]
    # if converted to a vector, then make it a matrix again
    if (is.vector(all.data)) {
        all.data.temp <- matrix(all.data,nrow=length(all.data))
        rownames(all.data.temp) <- names(all.data)
        colnames(all.data.temp) <- all.data.temp[1, ]
        all.data <- all.data.temp
        all.data.temp <- matrix(nrow=0, ncol=0)
    }
            if ( keepHTMLTITLEtext == TRUE ) {
        # add a row of the HTMLTITLEtext values
        all.data <- rbind(all.data, rep( HTMLTITLEtext, ncol(all.data) ) )
        # to the new 'added row' name it "HTMLTITLEtext"
        rownames(all.data)[nrow(all.data)] <- "HTMLTITLEtext"
    }
    return(all.data)
} else {
            if ( keepHTMLTITLEtext == TRUE ) {
        # add a row of the HTMLTITLEtext values
        all.data <- rbind(all.data, rep( HTMLTITLEtext, ncol(all.data) ) )
        # to the new 'added row' name it "HTMLTITLEtext"
        rownames(all.data)[nrow(all.data)] <- "HTMLTITLEtext"
    }
    return(all.data)
}

}


library(SIT)

BLUE: CONTROL ( NEVER DID 'NOT WORK' )

outp <- fund.data41("THM", 10 )

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "auditors report"
2002/09 2002/12 2003/03 2003/06 2003/09 2003/12 2004/03 2004/06 2004/09
"" "" "" "" "" "" "" "" ""

RED: A PROBLEM ( FIXED )

outp <- fund.data41("AAOI", 10)

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "% of leverage-to-industry"
[1] "0.0"

NEVER A PROBLEM

outp <- fund.data41("THM", 2)

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "auditors report"
2004/06 2004/09
"" ""

A PROBLEM ( FIXED )

outp <- fund.data41("THM", 1)

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "auditors report"
[1] ""


library(SIT)

BLUE: CONTROL ( NEVER DID 'NOT WORK' )

outp <- fund.data41("THM", 10, keepHTMLTITLEtext = TRUE)

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "HTMLTITLEtext"
2002/09
"Thomas Equipment Company Financial Information"
2002/12
"Thomas Equipment Company Financial Information"
2003/03
"Thomas Equipment Company Financial Information"
2003/06
"Thomas Equipment Company Financial Information"
2003/09
"Thomas Equipment Company Financial Information"
2003/12
"Thomas Equipment Company Financial Information"
2004/03
"Thomas Equipment Company Financial Information"
2004/06
"Thomas Equipment Company Financial Information"
2004/09
"Thomas Equipment Company Financial Information"

RED: A PROBLEM ( FIXED )

outp <- fund.data41("AAOI", 10, keepHTMLTITLEtext = TRUE)

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "HTMLTITLEtext"
[1] "APPLIED OPTOELECTRONICS, INC. Company Financial Information"

NEVER A PROBLEM

outp <- fund.data41("THM", 2, keepHTMLTITLEtext = TRUE)

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "HTMLTITLEtext"
2004/06
"Thomas Equipment Company Financial Information"
2004/09
"Thomas Equipment Company Financial Information"

A PROBLEM ( FIXED )

outp <- fund.data41("THM", 1, keepHTMLTITLEtext = TRUE)

is.matrix(outp)
[1] TRUE
rownames(outp)[nrow(outp)];outp[nrow(outp),]
[1] "HTMLTITLEtext"
[1] "Thomas Equipment Company Financial Information"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant