
There have been a few articles floating around online over the last few years about building a cluster of computers at home - here for example.

The Pi strikes me as a good platform for building and testing this kind of thing due to its low cost; should "generic" guides like this transfer over to the Pi pretty easily, or is there anything specific I should be aware of when attempting such a project?

Alex Chamberlain
berry120
  • Related forum thread: http://www.raspberrypi.org/phpBB3/viewtopic.php?p=18356#p18356 – finnw Jun 12 '12 at 21:50
  • I've added the tag "bramble" as this is what these are named. Can't give a proper answer beyond "put a bunch together and run hadoop or something like that" - indeed, I've favved the question to watch for answers myself! – winwaed Jun 12 '12 at 23:48
  • Other Pi users also don't know about the magic word "bramble"; searching the Foundation's forum might have helped but it's often slow to respond so I won't down-vote you. – mlp Jun 14 '12 at 00:57
  • University of Southampton have produced steps to make a 64-Pi cluster (or "supercomputer"): http://www.southampton.ac.uk/~sjc/raspberrypi/pi_supercomputer_southampton_web.pdf – Alex L Sep 12 '12 at 13:53

7 Answers


I suggest looking at dispy, a Python module for distributed computation.

To run a program on a number of Raspberry Pis (nodes) from a PC (the server - assume its IP is 192.168.0.100):

  • Install an operating system on each RasPi

  • Attach each RasPi to your network. Find the IP (if dynamic), or set up static IPs.
    (Let's assume that you have three nodes, and their IPs are 192.168.0.50-52)

  • Set up Python (if not already installed), install dispy, then run dispynode.py -i 192.168.0.100 on each RasPi. This tells dispynode to receive job information from the server.

  • On the PC (the server), install dispy, then run the following python code:

#!/usr/bin/env python
import dispy
# Create a cluster: dispy transfers '/some/program' to each node and runs submitted jobs there
cluster = dispy.JobCluster('/some/program', nodes=['192.168.0.50', '192.168.0.51', '192.168.0.52'])

You can also replace /some/program with a Python function, e.g. compute - see the sketch below.

You can also include dependencies such as Python objects, modules and files (which dispy will transfer to each node) by adding depends=[ClassA, moduleB, 'file1'].
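As an illustration, here's a minimal, untested sketch of distributing a Python function rather than a binary; the compute function, node IPs, and job arguments below are placeholders rather than anything from the original answer:

#!/usr/bin/env python
import dispy

def compute(n):
    # Runs on a node; import inside the function so each node resolves it locally
    import socket
    return (socket.gethostname(), n * n)

# depends=[...] could list objects, modules or files for dispy to ship to the nodes
cluster = dispy.JobCluster(compute, nodes=['192.168.0.50', '192.168.0.51', '192.168.0.52'])
jobs = [cluster.submit(i) for i in range(10)]
for job in jobs:
    host, result = job()  # calling the job waits for it and returns its result
    print('%s computed %s' % (host, result))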

Alex L
  • I'd love to hear if someone has done this - please leave a comment letting me know if it works! – Alex L Jun 15 '12 at 09:48
  • In my case (openelec + python 2.7) I get this error "File "./Lib/multiprocessing/init.py", line 84, in , ImportError: /usr/lib/python2.7/lib-dynload/_multiprocessing.so: undefined symbol: SemLockType". – Guido Mar 13 '13 at 21:40
  • But dispy3-3.6 runs with wheezy + python3 ! 2013-03-13 23:01:30,664 - dispynode - serving 1 cpus at 192.168.1.34:51348. When you launch a task (i.e. /bin/ls) the node receives the task, moves the executable to /tmp, but something goes wrong "Executing ['/tmp/dispy/b7e04cb4a1e144e1/ls'] failed with (<class 'OSError'>, OSError(8, 'Exec format error'), <traceback object at 0x16f2580>)" – Guido Mar 13 '13 at 23:45

You should be aware of the work that has already been done - there's even a name for a cluster of RasPi boxen. The Embedded Linux Wiki says a Bramble is defined as "a Beowulf cluster of Raspberry Pi devices". Raspberry Pi Homebrew has a number of posts about Brambles, and see also the Foundation's own forum.

mlp
  • I do not think telling someone to google the answer is good for this site. It is useful to know the name but consider adding some content and reference links to your answer. – Joe Jun 12 '12 at 23:47
  • Content and links added, @Joe. It would be nice if the downvoters now reassessed their opinions ... – mlp Jun 14 '12 at 00:59
  • I would if I actually downvoted you... – Joe Jun 14 '12 at 03:00
  • I phrased it very carefully to avoid insinuating that you had, Joe. Perhaps they don't realise that votes here can be undone by re-clicking the same button, not just reversed by clicking the opposite button. – mlp Jun 15 '12 at 00:56

Some guys at Southampton Uni have put together a cluster and written a detailed overview of their work at http://www.southampton.ac.uk/~sjc/raspberrypi/.

Alex Chamberlain

It is completely possible, but the biggest problem is getting hold of the hardware. It is an idea I would think not only workable but useful, as you could take it in the direction of portable parallel computing. As far as specifics go, languages like FORTRAN and C++ will do best.

Look at beowulf.org for more on cluster computing.

Alex Chamberlain
Kaminara

This is a reply to Guido Garcia's post above regarding 'dispy' - I can't figure out how to reply to his post.

When a program ('/bin/ls') is distributed with 'dispy' for parallel execution, that program is transferred from the client machine to each of the nodes (to '/tmp'). This way a user-developed program on the client is transferred without needing NFS or a shared directory. With binary programs this only works when the node and client architectures are compatible. In your case, I am guessing that the client architecture differs from that of the remote nodes, so a node can't execute the binary '/bin/ls' transferred from the client. If you want to list a directory on each node, it may be easier to write a Python function or program that prints the directory contents (e.g., using os.listdir) and distribute that instead of the binary executable, as sketched below.
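For example, a minimal sketch of that workaround (the node IP and path are illustrative):

import dispy

def list_dir(path):
    # Executed on the node itself, so there is no binary/architecture mismatch
    import os
    return os.listdir(path)

cluster = dispy.JobCluster(list_dir, nodes=['192.168.0.50'])
job = cluster.submit('/tmp')
print(job())  # waits for the job and prints the node's directory listing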


There's also http://pi.interworx.com if you want a full-featured control panel with it. They have instructions up on this page on how to replicate it, but you'll have to be patient, as that subdomain itself is running from a Raspberry Pi cluster. Here's a photo in case it goes down:

http://www.facebook.com/photo.php?fbid=596262440393836&set=a.244167858936631.60071.170159826337435&type=1

Corey

The main options I see for cluster management on Raspberry Pi are Docker Swarm, k3s and microk8s. I found Docker Swarm the easiest to set up and work with (using RPi 3Bs), and adequate for my purposes. The Kubernetes options were also fairly straightforward to set up, though.

Setup guides:

Docker Swarm

k3s

microk8s

Another "vanilla" way of doing it would be to create a script that scp copies dockerfiles then ssh connects to the hosts in turn and then runs them in turn.


Watch-it:

I found I needed to install extra kernel modules (on Ubuntu 22, 3B) to get swarm mode to work seamlessly:

sudo apt install linux-modules-extra-raspi
Lee