
Running nerd-stats on part of the Wikipedia dump #14

Open
Nasreddine opened this issue Dec 6, 2014 · 4 comments
@Nasreddine

I tried running the nerd-stats.pig script locally on part of the Wikipedia dump (http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles1.xml-p000000010p000010000.bz2) and got the intended statistics. But when I ran the same script on another part of the above wiki dump, no results were produced.
Does the script require a minimum amount of data?

@tilneyyang

I'm stuck on the same problem with the named-entity extraction script. Everything runs fine under Hadoop, but the output folder is never created...

@tilneyyang

It might be a problem with the OUTPUT path you set. I tried a local directory with no luck, then an HDFS path, and got the final results. I'm not familiar with Hadoop or Pig; I hope someone can figure it out...

@Nasreddine
Author

In my case the output folder is created, but it contains no results.
How did you set the HDFS path?

@tilneyyang

The full HDFS path: /user/username/outputdir. Also, please try Hadoop 0.20.0 with the script; otherwise there may be unexpected problems caused by the Hadoop version.
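A minimal sketch of the workaround described above. The parameter name `OUTPUT` and the paths here are assumptions for illustration; check the actual script for how it names its output location.

```shell
# Hypothetical session; parameter name and paths are assumed, not confirmed here.
# Run in MapReduce mode so the output path resolves inside HDFS,
# not on the local filesystem:
pig -x mapreduce -param OUTPUT=/user/username/outputdir nerd-stats.pig

# Pig fails if the output directory already exists, so remove any
# leftover directory from a previous run first (Hadoop 0.20 syntax):
#   hadoop fs -rmr /user/username/outputdir

# Inspect the results once the job finishes:
hadoop fs -cat /user/username/outputdir/part-* | head
```

Note that when Pig runs in local mode (`-x local`), a bare path like `outputdir` is written to the local filesystem instead, which may explain why results appear in one mode and not the other.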
