Once you have created your XSEDE account, set up DUO authentication, and verified that you can connect to the Single Sign On hub (https://portal.xsede.org/single-sign-on-hub), follow the instructions at

  https://portal.xsede.org/psc-bridges

together with the augmented instructions below to generate your personal virtual three-node Hadoop cluster. These are the steps your students will most likely follow as well. Stay tuned for official instructions. Let me know if you do not get an account set up and active by Monday, and we will set up another time.

To start a Hadoop session with a persistent HDFS:
 

1. Start an interactive job with at least 3 nodes:

interact -N 3


If you attempt to access Bridges outside of the time slots reserved for your course, it may take several minutes before you are assigned a virtual cluster, depending on the time of day and the number of jobs running on Bridges.
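Once the job starts, a quick sanity check on the allocation can help. This sketch assumes Bridges runs interact jobs under SLURM, which exports the node count to the job environment:

```shell
# Inside the interactive session: report how many nodes were allocated.
# SLURM_JOB_NUM_NODES is set by SLURM inside a job; elsewhere it is unset,
# so the "unknown" fallback keeps this safe to run anywhere.
nodes="${SLURM_JOB_NUM_NODES:-unknown}"
echo "nodes allocated: $nodes"
```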

2. Load the Hadoop module when the job starts:

module load hadoop

WARNING! Once the module is loaded, it will instruct you to run "start-hadoop.sh". Do not run it: that script creates a NON-PERSISTENT HDFS (filesystem), and nothing will be saved.

-----------------------
Almost there... one more step.
Now please run
    start-hadoop.sh
to set up your environment
-----------------------

3. Instead, for a persistent file system, set up your environment with the following script:

Note: The directory in which you run this command becomes the home of the file system. When you run the command again in a later session, it looks for an existing file system in the current directory; if you are not in the directory where you generated the file system in previous sessions, it will generate a new one. For consistency, you may want to always run this command from the root of your home directory.

/opt/packages/hadoop-testing/persist-start.sh

This command creates a persistent HDFS file system that can hold about 5 GB. The file system will remain intact for the duration of the course, after which it may be cleared.
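The "reuse if present, else create here" rule described in the note above can be illustrated with a small sketch. This is not the real persist-start.sh, and the hadoop_fs directory name is made up; the sketch only shows why the working directory matters:

```shell
# Illustrative only: mimic the lookup rule the note describes. The real
# script manages a full HDFS; this just demonstrates the directory logic.
find_or_create_fs() {
    fs_root="$PWD/hadoop_fs"          # hypothetical on-disk location
    if [ -d "$fs_root" ]; then
        echo "reusing $fs_root"       # same directory: same file system
    else
        mkdir -p "$fs_root"
        echo "created $fs_root"       # new directory: brand-new file system
    fi
}
```

Running it twice from the same directory finds the same file system; running it from a different directory creates a new one, which is why starting from your home directory every time is the safe habit.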

4. Load the Hive module:

source /opt/packages/hive/hive.sh

5. Run Hive:

hive

If you see Java error messages, wait a minute or two and try running hive again.
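The whole sequence can be recapped as an annotated transcript. This is not a batch script: interact opens a new shell on the compute nodes, and the remaining commands are typed inside that shell.

```shell
interact -N 3                                   # 1. three-node interactive job

# --- inside the interactive session ---
module load hadoop                              # 2. Hadoop module (ignore its
                                                #    start-hadoop.sh hint)
cd "$HOME"                                      # keep the persistent HDFS
                                                # rooted in one place
/opt/packages/hadoop-testing/persist-start.sh   # 3. persistent (~5 GB) HDFS
source /opt/packages/hive/hive.sh               # 4. Hive environment
hive                                            # 5. Hive shell
```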

For more information on Bridges see https://portal.xsede.org/psc-bridges.