Contributing


This repository is at https://github.com/ustun/storm-getting-started/wiki . You can edit the page, make suggestions from the Issues tab and add pull requests.

Getting Started with Storm

Installation


0- You need to install Java, Leiningen or Maven for project management (optional, but recommended), and Git for pulling repositories (optional, but recommended).

1- Install Java. I am not sure there are any reliable deb repos for this for Ubuntu, so I downloaded from the JDK from Oracle's web site.

2- Install either Leiningen or Maven for project management.

Maven is a Java tool for downloading dependencies and running tasks on a project so that you don't need an IDE or custom compilation scripts. It requires a pom.xml file that declares the dependencies of a project. It is convention based, so that you don't need to specify the location of your source files; i.e. they need to be in a folder called source/main/java. The command to use Maven is mvn.

To install Maven, again either use the one in the repo (not recommended) or download manually and add the bin directory to your path.

  wget http://www.bizdirusa.com/mirrors/apache/maven/maven-3/3.0.4/binaries/apache-maven-3.0.4-bin.tar.gz
  tar xf apache-maven-3.0.4-bin.tar.gz
  export PATH=$PATH:/home/vagrant/apache-maven-3.0.4/bin

Leiningen is a tool that uses Maven internally for dependency management, but it is more suited for Clojure projects (Part of Storm was written in Clojure, a Lisp that runs on JVM, and there is a Clojure DSL for defining topologies. I suggest you to learn it and try to use it as it seems to be the project author's ([Nathan Marz](https://twitter.com/nathanmarz)) choice too. Note that you *can* compile and run your Java project with Lein, it is not required that you write your code in Clojure to be able to use it.) The command to use Leiningen is lein.

Its installation is simpler, but Storm examples currently use an older Leiningen version, so beware. The latest version is 2.0, but you need to install 1.7.1 for the moment.

  mkdir -p ~/bin; cd ~/bin;
  wget https://raw.github.com/technomancy/leiningen/1.7.1/bin/lein
  chmod +x lein

Now, add the ~/bin directory on your PATH and execute the lein command. On first run, it will download the required libraries for its own use.

  lein

3- Install Git.

  sudo apt-get install git

4- Clone the `storm-starter` repository on Github.

  git clone https://github.com/nathanmarz/storm-starter.git

5- Follow the instructions at https://github.com/nathanmarz/storm-starter

 For Leiningen, to run a Java project:
  lein deps
  lein compile
  java -cp $(lein classpath) storm.starter.ExclamationTopology

To run a Clojure project,

  lein deps
  lein compile
  lein run -m storm.starter.clj.word-count

For Maven, to run a Java project,

  mvn -f m2-pom.xml compile exec:java -Dexec.classpathScope=compile -Dexec.mainClass=storm.starter.WordCountTopology

Note that instead of pom.xml here we forced Maven to use the m2-pom.xml file instead. In your own projects, you could simply rename this pom.xml and get rid of the `-f m2-pom.xml` section.

6- The example projects in the previous step were all run in local mode of Storm. Storm has another mode called remote mode, which is used in production, where you package your storm project, and send it to remote workers for execution. See https://github.com/nathanmarz/storm/wiki/Running-topologies-on-a-production-cluster .I will add information about this later on.

General Overview


There are a few fundamental concepts in Storm:

- Spout (musluk): Unit of data source, where the data stream is output

- Bolt (yildirim ya da civata anlaminda): Units of execution, where the output

from a spout or bolt is transformed.

- Topology: A network of spouts and bolts. Here we define which bolt is

connected to which spout, and how. How the output of a spout is distributed to
connected bolts is via declaring the grouping. For example, if there are 3
bolts consuming the output of spout, we could arrange each output of spout to
be consumed randomly by each bolt, which is the default shuffle grouping.

- Cluster: A realization of a topology, can be local or remote. The master node

runs a daemon called Nimbus, it manages the worker nodes who run daemons
called Supervisor. The coordination between the master and workers is done via
Zookeeper, which keeps all the state of workers, hence the system can recover
in the event of a failure on a worker node.

Links


Here are a few pointers:

- http://storm-project.net/ - https://github.com/nathanmarz/storm - https://github.com/nathanmarz/storm/wiki - https://github.com/nathanmarz/storm-starter - http://www.infoq.com/presentations/Storm - http://www.infoq.com/presentations/Storm-Distributed-and-Fault-tolerant-Real-time-Computation - http://engineering.twitter.com/2011/08/storm-is-coming-more-details-and-plans.html