WordFile Java Hadoop


  • WordFile Java Hadoop

    Hello, I am a beginner with Hadoop, and I'd like to split a file made up of multi-line records, where each record is enclosed by tag lines.
    Example:

    # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
    id1: 1 result
    id2: Results2
    id3: results3
    # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
    (1st block)

    # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #
    id1: result 11
    id2: resultats265
    id3: resultats3655
    # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #

    I would like to extract the information from the file and write it to a final output file:

    Res.txt

    1 result, Results2, results3
    result 11, resultats265, resultats3655
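The transformation described above can be sketched in plain Java, without Hadoop, to pin down the parsing logic first. This is only a minimal sketch under a few assumptions: the class name `BlockRecordParser` and its `parse` method are hypothetical, any line starting with `#` is treated as a record delimiter, and only lines of the form `id: value` contribute to a record. In a real MapReduce job the same logic would also need an input format that keeps a whole record together across split boundaries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): turns "#"-delimited blocks
// of "id: value" lines into one comma-separated line per block.
class BlockRecordParser {

    static List<String> parse(List<String> lines) {
        List<String> output = new ArrayList<>();
        List<String> current = new ArrayList<>();

        for (String raw : lines) {
            String line = raw.trim();
            if (line.startsWith("#")) {
                // Assumption: a delimiter line closes the record collected so far.
                if (!current.isEmpty()) {
                    output.add(String.join(", ", current));
                    current.clear();
                }
            } else {
                // Keep only the text after the first colon; other lines are ignored.
                int colon = line.indexOf(':');
                if (colon > 0) {
                    current.add(line.substring(colon + 1).trim());
                }
            }
        }
        // Flush a trailing record that has no closing delimiter.
        if (!current.isEmpty()) {
            output.add(String.join(", ", current));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> sample = new ArrayList<>();
        sample.add("# # # # #");
        sample.add("id1: result 11");
        sample.add("id2: resultats265");
        sample.add("id3: resultats3655");
        sample.add("# # # # #");
        for (String row : parse(sample)) {
            System.out.println(row);
        }
    }
}
```

Each returned string corresponds to one line of Res.txt.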

  • #2
    FileInputFormat splits only those files that are larger than an HDFS block. The split size can be controlled through several Hadoop properties. The relevant ones are:

    Property name          Type  Default value                         Description
    mapred.min.split.size  int   1                                     Smallest valid size in bytes for a file split
    mapred.max.split.size  long  Long.MAX_VALUE (9223372036854775807)  Largest valid size in bytes for a file split
    dfs.block.size         long  64 MB                                 Size of a block in HDFS

    To calculate the split size, use this formula (see the computeSplitSize() method in FileInputFormat):

    max(minimumSize, min(maximumSize, blockSize))

    By default:

    minimumSize < blockSize < maximumSize

    so the split size is blockSize.
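The formula above can be tried out with a few lines of standalone Java. This is a sketch that only mirrors the formula as quoted; the class name `SplitSizeSketch` is hypothetical and the code does not call Hadoop itself. The values used are the defaults from the table, plus two overrides chosen for illustration.

```java
// Hypothetical standalone class: reproduces max(minimumSize, min(maximumSize, blockSize)).
class SplitSizeSketch {

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb; // default dfs.block.size from the table

        // Defaults: minimum 1 byte, maximum Long.MAX_VALUE, so the block size wins.
        System.out.println(computeSplitSize(blockSize, 1L, Long.MAX_VALUE));

        // Illustrative override: a minimum above the block size forces larger splits.
        System.out.println(computeSplitSize(blockSize, 128 * mb, Long.MAX_VALUE));

        // Illustrative override: a maximum below the block size forces smaller splits.
        System.out.println(computeSplitSize(blockSize, 1L, 32 * mb));
    }
}
```

So a split is larger than a block only when the minimum is raised above the block size, and smaller only when the maximum is lowered below it.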
    For more information about Hadoop, kindly go through the link.
    Last edited by orbrey; 09-24-2014, 09:37 AM.
