Add automatic time-series conversion to read_table #238
Comments
Some ideas on coding.
# converting a column of strings to type Calendar
time_array = Calendar.parse("yyyy-MM-dd", string_array[:,1])
# above line taking the format argument (probably called fmt) and column to convert (timecol)
time_array = Calendar.parse(fmt, string_array[:,timecol])
# build the DataFrame called df here
# check the order is ascending
if time_array[1] > time_array[2]
    flipud!(df)
end
|
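For readers on current Julia, the idea above can be sketched with the standard-library Dates module standing in for the Calendar package. This is only a minimal sketch under stated assumptions: the helper name `parse_time_column` and the sample data are hypothetical, and plain vectors stand in for the DataFrame columns. Note that Dates format strings use lowercase `mm` for months.

```julia
using Dates

# Hypothetical helper: parse a vector of strings into Dates with a format string.
# In Dates format strings, "mm" is the month ("MM" would mean minutes).
parse_time_column(strings::AbstractVector{<:AbstractString}, fmt::AbstractString) =
    [Date(s, DateFormat(fmt)) for s in strings]

# Sample data in descending order (assumed for illustration only).
dates_raw = ["2013-04-08", "2013-04-07", "2013-04-05"]
vals      = [3.0, 2.0, 1.0]

time_array = parse_time_column(dates_raw, "yyyy-mm-dd")

# If the first timestamp is later than the second, assume descending order
# and reverse every column so the series ends up ascending.
if length(time_array) > 1 && time_array[1] > time_array[2]
    reverse!(time_array)
    reverse!(vals)
end
```

Comparing only the first two entries mirrors the check in the comment above; a stricter version could test `issorted(time_array)` instead.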
What are the implications for dependencies on other packages? I'd generally prefer using kwargs for something like this, instead of On Sun, Apr 7, 2013 at 1:55 PM, milktrader [email protected] wrote:
|
|
I have this working in a DataFrames branch with the following code (and alias). I didn't add kwargs to the method (still need to see that implemented). This is from the io.jl file.
#function read_table{T <: String}(filename::T)
function read_table{T<:String}(filename::T, fmt::String, indexby::Int)
    # Do inference for missing configuration settings
    separator = determine_separator(filename)
    quotation_character = DEFAULT_QUOTATION_CHARACTER
    missingness_indicators = DEFAULT_MISSINGNESS_INDICATORS
    header = true
    nrows = determine_nrows(filename, header)
    io = open(filename, "r")
    column_names = determine_column_names(io, separator, quotation_character, header)
    df = read_table(io,
                    separator,
                    quotation_character,
                    missingness_indicators,
                    header,
                    column_names,
                    nrows)
    close(io)
    #############################
    within!(df, quote
        Date = parse_date($fmt, $df[:,$indexby])
    end)
    df["Date"] = IndexedVector(df["Date"])
    if df[1,1] > df[2,1]
        flipud!(df)
    end
    #############################
    return df
end
read_table(filename) = read_table(filename, "yyyy-MM-dd", 1)
This also requires |
After looking at it, I think I'd prefer that this be in a separate function. Although I'd use this functionality, I think it clutters
|
What about putting a |
That's probably a good idea. Also, a function that operates on an existing On Sun, Apr 7, 2013 at 10:11 PM, milktrader [email protected]:
|
Alright, I'd be willing to do that. I'll just copy and paste the io.jl code (I think it was John's) and adapt it for time series. I'm not sure how it does what it does, but I can sort out the pieces. I'll leave this open for someone else to close, but I think the Calendar dependency will convince most people that this is the best course. Also, if anyone wants to write it before I do, please feel free. I'm glad to add any contributors. |
Do you really have to adapt the io.jl code? I was thinking you might just On Mon, Apr 8, 2013 at 7:49 AM, milktrader [email protected] wrote:
|
Tom has a point: why can't you build up a DataFrame and then handle the conversion to time series columns after the fact? |
Okay, much better. Thanks. |
Well it turns out I implemented this already for testing, so just cleaning it up and now I'll export |
Now that we have This does require a dependency on |
Okay, I said a one-liner (I think you can still do this but why) and here are a few lines that would recognize and convert a
for col in colnames(x)
    ismatch(r"(?i)date", col) ?
        x[col] = Date[date(d) for d in x[col]] :
        Nothing
end
h/t @dcjones for the constructor comprehension. |
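The same name-based conversion can be sketched in current Julia, where `occursin` replaces the older `ismatch`. This is a hedged illustration only: a `Dict` of column vectors stands in for the DataFrame, and the column names, sample values, and date format are assumptions.

```julia
using Dates

# Stand-in for a table: column name => column values (hypothetical sample data).
table = Dict{String,Vector}(
    "Date"  => ["2013-04-05", "2013-04-08"],
    "Close" => [10.5, 10.9],
)

# Convert every column whose name contains "date" (case-insensitive);
# all other columns are left untouched.
for col in collect(keys(table))
    if occursin(r"date"i, col)
        table[col] = [Date(d, dateformat"yyyy-mm-dd") for d in table[col]]
    end
end
```

Matching on the column name alone is a heuristic: a column called "update_count" would also match, so a real implementation would probably want an explicit opt-in as well.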
Added a pull request, #386. Maybe the conversation about this idea can move over there? I'll close this one since it's pretty old (it should still have a reference on the pull request). |
There are two ways to go about this. One is to add kwargs to readtable and the other is to add a new method, call it readtimetable or some such.
This is important because it will decide how DataFrames prefers to handle time series. For some background, here is how R's read.zoo is designed.
If readtable is enhanced, the kwarg time=true can be added. I think the important args that should be adopted include format, which tells Calendar.parse how to treat the string; tz, which allows users to override their local setting; and index.column, which defines which column is the time series (in most cases it's the first column).
This references a separate issue on the Quandl package, which opens up an API to open source time-based data in a wide variety of disciplines.
milktrader/Quandl.jl#1 (comment)
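To make the keyword-argument option concrete, here is a minimal sketch of what a separate reader could look like in current Julia. Everything in it is an assumption for illustration: the keyword names `format`, `index_column`, and `separator`, the returned named tuple, and the use of the Dates standard library instead of Calendar. The `tz` argument is left out because a plain `Date` carries no time zone.

```julia
using Dates

# Hypothetical reader: the function signature and return shape are assumptions,
# not the DataFrames API.
function readtimetable(io::IO; format::AbstractString = "yyyy-mm-dd",
                       index_column::Int = 1, separator::Char = ',')
    df = DateFormat(format)
    # The first line is treated as the header row.
    header = split(readline(io), separator)
    # Remaining non-empty lines are split into fields.
    rows = [split(line, separator) for line in eachline(io) if !isempty(line)]
    # Parse the chosen column as the time index.
    times = [Date(row[index_column], df) for row in rows]
    # Reorder the rows so that the time index is ascending.
    order = sortperm(times)
    return (header = header, times = times[order], rows = rows[order])
end

csv = IOBuffer("Date,Close\n2013-04-08,10.9\n2013-04-05,10.5\n")
tbl = readtimetable(csv; format = "yyyy-mm-dd", index_column = 1)
```

Keeping this in its own function leaves the plain reader free of any date-handling dependency, which is the trade-off discussed in the comments.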