
Running nerd-stats on part of the Wikipedia dump #14

Open
Nasreddine opened this issue Dec 6, 2014 · 4 comments
@Nasreddine

I tried running the nerd-stats.pig script locally on part of the Wikipedia dump (http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles1.xml-p000000010p000010000.bz2) and got the intended statistics. But when I ran the same script on another part of the above wiki dump, no results were produced.
Does the script require a minimum amount of data?

@tilneyyang

I'm stuck on the same problem with the named-entity extraction script. Everything runs fine under Hadoop, but the output folder is never created...

@tilneyyang

It might be a problem with the OUTPUT path you set. I tried a local directory with no luck, then an HDFS path, and got the final results. I'm not familiar with Hadoop or Pig; I hope someone can figure it out...

@Nasreddine
Author

In my case the output folder is created, but it contains no results.
How did you set the HDFS path?

@tilneyyang

The full HDFS path: /user/username/outputdir. Also, please try Hadoop 0.20.0 with the script; otherwise there may be unexpected problems caused by the Hadoop version.
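A minimal sketch of the workaround described above. The parameter name `OUTPUT` and the paths here are assumptions for illustration; check the actual script for how it names its output location.

```shell
# Hypothetical session; parameter name and paths are assumed, not confirmed here.
# Run in MapReduce mode so the output path resolves inside HDFS,
# not on the local filesystem:
pig -x mapreduce -param OUTPUT=/user/username/outputdir nerd-stats.pig

# Pig fails if the output directory already exists, so remove any
# leftover directory from a previous run first (Hadoop 0.20 syntax):
#   hadoop fs -rmr /user/username/outputdir

# Inspect the results once the job finishes:
hadoop fs -cat /user/username/outputdir/part-* | head
```

Note that when Pig runs in local mode (`-x local`), a bare path like `outputdir` is written to the local filesystem instead, which may explain why results appear in one mode and not the other.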
