sebastian-nagel

Note: execution of the WAT/WET extractor in Hadoop local mode failed for me:

$> java -cp .../ia-hadoop-tools-jar-with-dependencies.jar org.archive.hadoop.jobs.WEATGenerator \
         -Dmapreduce.framework.name=local -strictMode batch-id-1 .../warc/my.warc.gz
Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.

sebastian-nagel force-pushed the 4-skip-unneeded-functionality-dependencies branch from dc957bb to 993204b on October 1, 2024 16:29
- skip unneeded functionality and dependencies: Pig, Cassandra, Petabox, server
- in order to address #4 (failed to download dependencies
  from builds.archive.org)
sebastian-nagel force-pushed the 4-skip-unneeded-functionality-dependencies branch from 993204b to fb0b053 on October 7, 2024 13:38
sebastian-nagel marked this pull request as ready for review on October 7, 2024 13:40
@sebastian-nagel
Author

Successfully tested on a single-node Hadoop cluster; see the instructions in the discussion of #10.

@sebastian-nagel
Author

... also tested on a production cluster.

sebastian-nagel merged commit f5027e9 into master on Oct 18, 2024
sebastian-nagel deleted the 4-skip-unneeded-functionality-dependencies branch on October 18, 2024 13:32