
Hadoop research question


  • Hadoop research question

    Our company currently uses Amazon S3 to archive large amounts of log file data. The data is zipped with WinZip, and each zip file contains comma-separated (CSV) files. I have been asked to research what Hadoop could offer us. We often need to go back to this archived data and extract a particular set of records, for example because of a device recall or for troubleshooting.
    My question is: can someone confirm whether it would be possible to query this huge data store, given that the data is zipped?
    Also, assuming this is possible, am I on the right track in thinking that it would be done with the Pig query tool and/or the Hue user interface?
    Thank you for your input.
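
    To make the kind of lookup I mean concrete, here is a minimal single-machine sketch in Python. It is not Hadoop or Pig code and says nothing about how Hadoop handles zip archives; it only shows reading CSV members directly out of a zip file and keeping the rows that match one value. The function name `rows_matching` and the column name `device_id` are made up for illustration, and the sketch assumes each CSV has a header row and is UTF-8 encoded.

```python
import csv
import io
import zipfile


def rows_matching(zip_source, column, wanted):
    """Yield (member_name, row_dict) for CSV rows whose `column` equals `wanted`.

    zip_source may be a file path or a binary file-like object.
    Assumes every CSV member has a header row (hypothetical layout).
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip members that are not CSV files
            with archive.open(name) as raw:
                # Wrap the binary member stream so the csv module gets text.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                for row in csv.DictReader(text):
                    if row.get(column) == wanted:
                        yield name, row
```

    For example, `list(rows_matching("logs.zip", "device_id", "D2"))` would return the matching rows together with the name of the CSV member each one came from (`logs.zip` is a placeholder file name).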


  • #2
    Thanks for the response. It sounds like unzipping the files is probably inevitable. What if we set aside a section of disk space and extracted the contents of all the zipped files there? That would basically leave us with a bunch of .csv files. Do you see any problem with Hadoop and Pig working with that data?
    Reading about Hadoop makes me think of it as a big "grep" tool that somehow formats the results into a dataset. What do you think of that analogy? Is that roughly how it works, or is it way off base? Have you ever used Hadoop and Pig? If so, what did you use them for?
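
    To spell out the "grep" analogy as I currently understand it, here is a toy single-process sketch in Python of a filter step followed by a grouping step. It is not Hadoop or Pig code and may well oversimplify what they do; the function names and the assumption that the first CSV field identifies the device are hypothetical.

```python
from collections import defaultdict


def map_phase(lines, needle):
    """Grep-like step: emit (key, 1) for each CSV line containing `needle`."""
    for line in lines:
        if needle in line:
            key = line.split(",")[0]  # assumption: first field is the device ID
            yield key, 1


def reduce_phase(pairs):
    """Group the emitted pairs by key and sum their counts."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)
```

    Given lines such as `"D1,ERROR,x"` and `"D2,OK,y"`, calling `reduce_phase(map_phase(lines, "ERROR"))` gives a count of matching lines per device, which is the "formats the results into a dataset" part of the analogy.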