HDFS data and Pig

    I have set up a Hadoop 2.7 cluster on an Ubuntu server PC with a 1 TB disk (hdfs-site.xml has dfs.datanode.data.dir set to /hadoopdata) and copied a 300 GB data file into HDFS as follows:

    hdfs dfs -mkdir /hadoopdata/sales
    hdfs dfs -put /tmp/annual.dat /hadoopdata/sales
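
    For completeness, the relevant hdfs-site.xml entry looks like this (a sketch; the property name is the standard Hadoop 2.x one, and /hadoopdata is the local disk path from my setup):

    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/hadoopdata</value>
    </property>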

    I can list the file with the hdfs dfs -ls command.
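
    For example, this shows annual.dat as expected:

    hdfs dfs -ls /hadoopdata/sales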

    I can also run classic MapReduce Java jobs (mapper, reducer, driver) against the file to process the data.
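
    For reference, a minimal sketch of the kind of job that works for me; the class names, the comma delimiter, and the count logic are placeholders, only the input path is the real one:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SalesJob {
        // Mapper: emit (first field of each comma-separated record, 1)
        public static class SalesMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
            private static final LongWritable ONE = new LongWritable(1);
            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                ctx.write(new Text(value.toString().split(",")[0]), ONE);
            }
        }

        // Reducer: sum the counts per key
        public static class SalesReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> vals, Context ctx)
                    throws IOException, InterruptedException {
                long sum = 0;
                for (LongWritable v : vals) sum += v.get();
                ctx.write(key, new LongWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration(); // picks up the *-site.xml files from the classpath
            Job job = Job.getInstance(conf, "sales");
            job.setJarByClass(SalesJob.class);
            job.setMapperClass(SalesMapper.class);
            job.setReducerClass(SalesReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            // The same HDFS path that Pig later fails to find
            FileInputFormat.addInputPath(job, new Path("/hadoopdata/sales/annual.dat"));
            FileOutputFormat.setOutputPath(job, new Path("/hadoopdata/sales/out"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }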

    But when I use Pig to run the equivalent script, the LOAD command fails:

    data = LOAD '/hadoopdata/sales/annual.dat' AS (c1, c2, ...);

    The error says it cannot find the file /hadoopdata/sales/annual.dat (which is present in HDFS).

    The error also prefixes the file name with hdfs://localhost:8020/. Looking at the Pig examples, it seems I would need to create a directory under that localhost prefix. This is where I am confused: the data is already in HDFS, so why can't Pig see it? I don't want to make another copy of a 300 GB file.
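
    In case it matters, here is how I launch the script, plus the check I use to see which filesystem URI the cluster config actually points at (the conf directory and script name are placeholders for my install):

    hdfs getconf -confKey fs.defaultFS                    # prints the namenode URI from core-site.xml
    export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop   # placeholder: wherever the cluster's *-site.xml files live
    pig -x mapreduce sales.pig                            # sales.pig is a placeholder name for the script above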

    Maybe I am missing something; any pointers will help.
