diff --git a/.dev/build-docs.sh b/.dev/build-docs.sh
new file mode 100644
index 0000000000..c4297b4177
--- /dev/null
+++ b/.dev/build-docs.sh
@@ -0,0 +1,24 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+docker run \
+  -e HOST_UID=$(id -u) \
+  -e HOST_GID=$(id -g) \
+  --mount type=bind,source="$PWD",target="/spark-website" \
+  -w /spark-website \
+  docs-builder:latest \
+  /bin/bash -c "sh .dev/run-in-container.sh"
diff --git a/.dev/run-in-container.sh b/.dev/run-in-container.sh
new file mode 100644
index 0000000000..1ba306d629
--- /dev/null
+++ b/.dev/run-in-container.sh
@@ -0,0 +1,35 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# 1.Set env variable.
+export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-arm64
+export PATH=$JAVA_HOME/bin:$PATH
+
+# 2.Install bundler.
+gem install bundler -v 2.4.22
+bundle install
+
+# 3. Create a user matching the host UID/GID
+groupadd -g $HOST_GID docuser
+useradd -u $HOST_UID -g $HOST_GID -m docuser
+
+# We need this link to make sure `python3` points to `python3.11` which contains the prerequisite packages.
+ln -s "$(which python3.11)" "/usr/local/bin/python3"
+
+# Build docs
+rm -rf .jekyll-cache
+su docuser -c "bundle exec jekyll build"
diff --git a/README.md b/README.md
index 7d051f074a..2e3e003dc6 100644
--- a/README.md
+++ b/README.md
@@ -3,31 +3,23 @@
 In this directory you will find text files formatted using Markdown, with an `.md` suffix.
 
 Building the site requires [Ruby 3](https://www.ruby-lang.org), [Jekyll](http://jekyllrb.com/docs), and
-[Rouge](https://github.com/rouge-ruby/rouge).
-The easiest way to install the right version of these tools is using
-[Bundler](https://bundler.io/) and running `bundle install` in this directory.
-
-See also [https://github.com/apache/spark/blob/master/docs/README.md](https://github.com/apache/spark/blob/master/docs/README.md)
-
-A site build will update the directories and files in the `site` directory with the generated files.
-Using Jekyll via `bundle exec jekyll` locks it to the right version.
-So after this you can generate the html website by running `bundle exec jekyll build` in this
-directory. Use the `--watch` flag to have jekyll recompile your files as you save changes.
-
-In addition to generating the site as HTML from the Markdown files, jekyll can serve the site via
-a web server. To build the site and run a web server use the command `bundle exec jekyll serve` which runs
-the web server on port 4000, then visit the site at http://localhost:4000.
-
-Please make sure you always run `bundle exec jekyll build` after testing your changes with
-`bundle exec jekyll serve`, otherwise you end up with broken links in a few places.
-
-## Updating Jekyll version
-
-To update `Jekyll` or any other gem please follow these steps:
-
-1. Update the version in the `Gemfile`
-1. Run `bundle update` which updates the `Gemfile.lock`
-1. Commit both files
+[Rouge](https://github.com/rouge-ruby/rouge). The most reliable way to ensure a compatible environment
+is to use the official Docker build image from the Apache Spark repository.
+
+If you haven't already, clone the [Apache Spark](https://github.com/apache/spark) repository. Navigate to
+the Spark root directory and run the following command to create the builder image:
+```
+docker build \
+  --tag docs-builder:latest \
+  --file dev/spark-test-image/docs/Dockerfile \
+  dev/spark-test-image-util/docs/
+```
+
+Once the image is built, navigate to the `spark-website` root directory, run the script which processes
+the Markdown files in the Docker container.
+```
+SPARK_WEBSITE_PATH="/path/to/spark-website" sh .dev/build-docs.sh
+```
 
 ## Docs sub-dir
 
diff --git a/developer-tools.md b/developer-tools.md
index bce821d8c6..0908cef343 100644
--- a/developer-tools.md
+++ b/developer-tools.md
@@ -352,17 +352,32 @@ By default, this script will format files that differ from git master. For more
Make sure you have a clean start before setting up the IDE: use a clean git clone of the Spark repo and install the latest
+version of the IDE.
+
+If something goes wrong, clear the build outputs by `./build/sbt clean` and `./build/mvn clean`, clear the m2
+cache by `rm -rf ~/.m2/repository/*`, re-import the project into the IDE cleanly and try again.
While many of the Spark developers use SBT or Maven on the command line, the most common IDE we
-use is IntelliJ IDEA. You can get the community edition for free (Apache committers can get
-free IntelliJ Ultimate Edition licenses) and install the JetBrains Scala plugin from Preferences > Plugins.
Preferences > Plugins.
+
+Due to the complexity of the Spark build, please modify the following global settings of IntelliJ IDEA:
+
+- In `Settings -> Build, Execution, Deployment -> Build Tools -> Maven -> Importing`, make sure you
+choose “Detect automatically” for `Generated source folders`, and choose “generate sources” for
+`Phase to be used for folders update`.
+- In `Settings -> Build, Execution, Deployment -> Compiler -> Scala Compiler -> Scala Compiler Server`,
+pick a large enough number for `Maximum heap size, MB`, such as “4000”.

To create a Spark project for IntelliJ:
-File -> Import Project, locate the spark source directory, and select “Maven Project”.
+File -> Import Project, locate the spark source directory, and select “Maven Project”. It’s important to
+pick Maven instead of sbt here, as Spark has complicated build logic that is implemented for sbt using Scala code
+in SparkBuild.scala, and IntelliJ IDEA cannot understand it well.
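
The recovery steps listed earlier for a failed IDE setup (clear the sbt and Maven build outputs, then clear the m2 cache) could be wrapped in a small helper like the one below. This is only a sketch and not part of the change: `SPARK_HOME`, `DRY_RUN`, `run`, and `clean_spark_build` are hypothetical names introduced for illustration, and only the three clean-up commands come from the text. It defaults to a dry run because the last command deletes the whole local Maven cache.

```shell
#!/bin/sh
# Sketch only. SPARK_HOME, DRY_RUN, run and clean_spark_build are hypothetical
# names introduced for illustration; the three clean-up commands come from the text.

SPARK_HOME="${SPARK_HOME:-$PWD}"   # assumed location of the Spark checkout
DRY_RUN="${DRY_RUN:-1}"            # 1 = only print the commands, 0 = execute them

# Print a command and, unless in dry-run mode, execute it.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

clean_spark_build() {
  cd "$SPARK_HOME" || return 1
  run ./build/sbt clean                    # clear sbt build outputs
  run ./build/mvn clean                    # clear Maven build outputs
  run sh -c 'rm -rf ~/.m2/repository/*'    # clear the local m2 cache
}

clean_spark_build
```

Run as-is, the script only prints the three commands; setting `DRY_RUN=0` executes them. Re-importing the project into the IDE remains a manual step.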