Thank you!
Hi there,
I can’t say I’ve worked with data sets containing millions of records and thousands of variables, but I can say that ff should be able to handle it. Like SAS, ff lets you analyze data without loading all of it into memory at once, so I would recommend it for your needs. Just remember that when working with ff, your objects need to stay in ff form: new vectors have to be declared explicitly (e.g. as.ff(ifelse(x[, "something"] == "blah", 1, 0))), and not every R function behaves the same way once your data is stored as ff objects.
Good luck!
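To make that concrete, here is a minimal sketch of the kind of workflow I mean. The file name and column names are placeholders, and for columns too large for RAM you would want a chunked approach (e.g. via chunk()) rather than materializing the whole column:

```r
library(ff)  # memory-mapped, on-disk vectors and data frames

# Read a large CSV into an on-disk ffdf; ff reads it in chunks,
# so the whole file never has to fit in memory at once.
df <- read.csv.ffdf(file = "big.csv", header = TRUE)

# New columns must themselves be ff vectors. Note the [] here pulls
# the column into RAM to evaluate ifelse(), which is fine for a
# single column but should be chunked for truly huge data.
df$flag <- as.ff(ifelse(df[, "something"][] == "blah", 1L, 0L))

# Save the ffdf (and its backing files) so a later session can
# pick it up with ffload() without re-reading the CSV.
ffsave(df, file = "mydata")
```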
Is this kind of data similar to what you analyze? I have been struggling to find a consistent workflow for analyzing large datasets (not “big” data) using open-source tools like R. I need to read in csv files too big to fit in memory, explore the data, create new column vectors, and store it all on disk somehow. Would you recommend the ff package as a solution that can handle this?
Cheers, and thanks for the interesting blog posts!
Thanks for your comment. I sent an email to the address associated with the account you’ve posted from. Is that the right email to reach you at?
Hi there,
I wish it were as easy as just analyzing the English translations, but if I did that, the results would probably say a lot more about the person doing the translating than about the original text.