U
s

D
a

b

a

A
R
R
A
A

K
F
M
M
I

1

i
t
m
m
w
c
f
c
r
g
t
m
s
h
t
a
t
l
b
m
o
a

1
d

Journal of Computational Science 2 (2011) 279– 285

Contents lists available at ScienceDirect

Journal  of  Computational Science

jo ur nal homepage: www.elsev ier .com/ l ocate / jocs

sing  a  custom-FPGA  architecture  to  extend  the  scale  of  atomistic  magnetic  spin
imulations

avid  Cortiea,b,∗,  John  Pillansb

Institute for Superconducting and Electronic Materials, University of Wollongong, Wollongong, New South Wales, Australia
Australian Nuclear Science and Technology Organisation, New Illawara Road, Sydney, New South Wales, Australia

 r  t  i  c  l  e  i  n  f  o

rticle history:
eceived 16 February 2011
eceived in revised form 21 April 2011

a  b  s  t  r  a  c  t

New  computational  solutions  are  required  to  understand  how  atomic-scale  properties  affect  magnetic
behaviour  at  micrometer  dimensions.  We  describe  a field-programmable-gate-array  (FPGA)  based  simu-
lation  of a dilute  antiferromagnet  with  a large  number  of  Ising  spins  using  Glauber  dynamics.  The simple
ccepted 27 April 2011
vailable online 19 May 2011

eywords:
ield-programmable-gate-array
agnetism

atomic  model  qualitatively  reproduces  experimental  findings  when  scaled  up  to sufficiently  large  spatial
dimensions,  and  provides  insight  into  the finite  size  thresholds  separating  nanoscale  from  microscale
behaviour.  A  real-time  visualisation  module  was  used  to  study  the dynamics  of  the  fractal  domain  struc-
ture and  non-exponential  relaxation  mechanism.  A performance  comparison  with  contemporary  GPU
and CPU  implementations  suggests  that  a  FPGA  route  is  a competitive  alternative.
odelling
sing model

. Introduction

Historically the study of magnetism in solids has motivated
nnovative approaches in mathematics, physics and computa-
ional science because simple experimental facts have complicated

icroscopic origins. For example, the origin of a permanent
agnet’s properties was only satisfactorily understood in 1944
hen Onsager proved that atomic spins with Ising-behaviour

ould undergo a disorder–order phase transition into a stable
erromagnetic state [17]. To this day, significant gaps remain in
ontemporary scientific knowledge about magnetism, particularly
egarding the properties of magnetically ordered materials that are
rown into artificial geometries such as nano-wires or atomic-layer
hin films. Computational modeling has proved to be a vital instru-

ent for resolving many questions, but the diverse range of time
cales (picoseconds to years) and spatial scales (nm to cm)  [19]
as necessitated multi-scale modeling which relies on a ladder of
echniques [10]. For example, there is a fissure between discrete
tomistic and continuous micromagnetic modeling which hinders
he development of spintronic technology operating between these
ength scales. Large scale atomistic simulations are highly desirable,

ut the computational costs for systems that reach from atomic to
icrometer scales are unattainable for first-principal calculations

r classical Landau–Lifshitz–Gilbert (LLG) dynamics for discrete
tomic spins. Slow creep dynamics of large spin systems have

∗ Corresponding author. Tel.: +61 2 9717 3600.
E-mail address: dcr@ansto.gov.au (D. Cortie).

877-7503/$ – see front matter. Crown Copyright ©  2011 Published by Elsevier B.V. All ri
oi:10.1016/j.jocs.2011.04.001
Crown Copyright ©  2011 Published by Elsevier B.V. All rights reserved.

been studied using the Monte Carlo formalism derived from sta-
tistical mechanics, but the computational cost is prohibitive [15].
To increase the number of spins that can be simulated in a rea-
sonable time frame, a series of computational approaches have
been explored including custom-built computers [4],  supercom-
puters [15] and, more recently, the use of graphical-process units
(GPU’s) [1,2,18]. Here we  discuss an alternative solution using
a field-programmable-gate-array (FPGA) implementation of the
Ising Model with a Monte Carlo update algorithm.

2. FPGA implementations of scientific models

The field-programmable-gate-array is a compromise between
the speed of an application specific integrated circuit (ASIC) and
the re-programmable nature of software running on a CPU or
GPU. Unlike an ASIC, the underlying FPGA architecture is hardware
re-programmable allowing the reconfiguration of logic gates for
specific computational tasks. Due to this unique capability, FPGA’s
are currently topical in the field of adaptive computation, cryp-
toanalysis and neural networks [20] but they have seldom been
applied to scientific simulations of magnetism [6].  They are ideally
suited to Ising Model Monte Carlo programs owing to their built-
in parallelism and the binary nature of Ising spins. In this work
we developed a single-chip FPGA architecture to study the Ising

spin system dynamics on a simple cubic lattice using parallelised
calculations. Large 3D lattice sizes could be simulated including
32 × 32 × 32, 128 × 128 × 128 and 512 × 512 × 128 but increasing
lattice size introduced constraints on the number of parallel pro-
cesses which could be run. A simple system, parameterised in the

ghts reserved.


2 mputational Science 2 (2011) 279– 285

d
w
r
r
o
p
n
O
2
t
t
c
a
l
F
d

3

m
I
v
a
b
b
s
s
l
s
s
e
[

H

A
a
i
o
p
k
H

F
t

80 D. Cortie, J. Pillans / Journal of Co

imensions of the array, was developed using a low level hard-
are development language which allowed efficient utilisation of

esources within the inexpensive Xilinx Spartan 3E 1200 FPGA. A
eal-time visualisation was included in the system without impact
n computational performance, and several metrics were com-
uted and streamed to an ordinary personal computer. To ensure
umerical accuracy, the FPGA-simulation was first checked against
nsager’s exact solution which analytically describes the simpler
D Ising ferromagnet. We  then applied the computational system to
he case of a diluted antiferromagnet (DAFF) system which is a pro-
otypical example of a random-field antiferromagnet. This material
lass displays a rich set of phenomena which cannot be solved
nalytically or described in a mean-field approach, but are techno-
ogically important for exchange bias layers in spin valves [14]. The
PGA architecture allowed the simulation to access larger spatial
imensions and time scales than have previously been studied.

. Implementation

A Monte Carlo procedure with a heat-bath algorithm for deter-
ining spin flips was used to study the ensemble properties of an

sing spin system as a function of temperature and field. Initially the
irtual spins are placed at regular points on a cubic lattice and given

 random direction. To simulate dilution, certain spins are replaced
y non-magnetic defect sites. For the FPGA model, we  implement a
itwise representation so that 01 represents spin up, 00 represents
pin down and 10 represents a defect site. The array of spin data is
tored in a single long shift register, thereby mapping the 3D cubic
attice of spins into a 1D linear array. At each time step (Monte Carlo
tep), a sweep is made through the entire array of spins. During the
weep, the energy is calculated for each spin using the Hamiltonian
nergy equation for the dilute antiferromagnet in an external field
15]:

 = −J
∑

ij

εiεj�i�j −
∑

i

�Bzεi�i (1)

A detailed explanation of the terms above is given in Appendix
. On an FPGA, the latter energy calculation can be conceptualised
s a 3D convolution. Each convolution unit taps the spin shift reg-

ster at appropriate points to feed a cubic kernel so that the values
f nearest-neighbor spins arrive synchronously to be processed in
arallel by the kernel as illustrated schematically in Fig. 1. The
ernel unit handles the mathematical operations defined by the
amiltonian by calculating the energy of a spin using the values

ig. 2. The kernel performs the mathematical operations defined by the Hamiltonian by m
he  effect of an external magnetic field bias. The temperature parameter alters the probab
Fig. 1. The convolution unit employs delay lines to synchronously feed the data
from the 6 nearest-neighbors to the kernel which processes the Hamiltonian energy
calculation.

of cubic-nearest neighbor spins and the external field, as shown in
Fig. 2. In this bitwise representation, the multiplications of neigh-
boring spins is equivalent to the fundamental logic operation of
eXclusive OR, of which the six resultants are then summed as either
positive or negative one, and then summed with the bias field
multiplied by the central cell. To avoid unnecessary calculation,
we choose natural units with J = −1, and b = �Bz, so that the first
term of the Hamiltonian gives an integer in the range [−6, 6] . Once
the energy contribution of a spin in its current configuration (E)
is known, a Heat-Bath algorithm is used to stochastically deter-
mine if the spin flips to the opposite state of energy (E′′) using the
probability:

Pflip = 1
1 + exp(E�/kBT)

(2)

′′
where E� = E − E, i.e. the energy difference between the flipped
(final) and unflipped (initial) calculated from Eq. (1).  To reduce
numerical calculations, we employ the standard result that:

E� = −2E

ultiplaying the spin at the origin with the six nearest-neighbor values and adding
ility for accepting a trial move which is read from a look-up table.


D. Cortie, J. Pillans / Journal of Computa

Fig. 3. The addition of convolution units allows for parallel calculation of spin
u
a
t

a
e
t
h
a
o
r
T
r
v
p
o
t
w
e
t
r
b
t
t
[
w
r
t
c
f
T
f
a
o
n

4

w
c
s
w
p
f
i
n
i

lation by a single grid point creates a distinct out-of-phase pattern,
but translating a second grid point in any direction restores the
initial phase. On an FPGA, this domain image processing was han-
pdates at independent points in the array. Single bit wide paths are shown in blue
nd  scalar paths are shown in red. (For interpretation of the references to colour in
his figure legend, the reader is referred to the web  version of the article.)

s E′′ = −E which can be shown by letting �i → − �i in Eq. (1). This
liminates the need for 6 XOR gates and associated adders. Fur-
her optimisations are found in the probability function of the
eat-bath algorithm. It comprises an exponential and division oper-
tion which would require significant resources to implement, but
nly depends on two variables: the energy shift (E�), and the
educed temperature (t = kBT) where kB is the Boltzmann constant.
hese two variables are therefore absorbed into a look-up table
esident in read only memory (ROM) which has pre-calculated
alues of the probability function for a discrete range of tem-
eratures and fields. The spin-flip probability given by the ROM
utput is then compared against a Linear Feedback Shift Regis-
er which generates pseudo-random numbers in the range [0,1)
ith a period of 232. If the pseudo-random number is less or

qual to the spin-flip probability, the flip is accepted, otherwise
he spin remains in its initial state. Since, like the Metropolis algo-
ithm, the heat-bath algorithm obeys the principal of detailed
alance, the stochastic generation of new states eventually forces
he ensemble to converge to the true statistical distribution despite
he fact that the partition function is never explicitly calculated
12]. Defect cells representing dilute, non-magnetic impurities
ere included in our final program by duplicating the entire shift

egister loop with a second parallel set of shift registers storing
he defect state (defect/not). The kernel required only minimal
hanges to incorporate this defect information, inhibiting the adder
or each side of the cubic kernel that has a defect present on it.
he completed kernel module is shown in Fig. 2. It is fed data
rom a string of shift registers as in the naive implementation of

 convolution, with the output of the kernel being independent
f the input allowing additional pipeline stages to be added if
eeded.

. Parallelization

The notable thing about the FPGA implementation is the ease
ith which it can be parallelized simply by the addition of more

onvolution units, limited only by the available resources of the
ystem. This can be viewed as a hardware analogue to a soft-
are approach that uses multi-threaded processes, where it is
ossible for each convolution unit/core to simultaneously per-

orm calculations on independent sections of the spin lattice. This
s schematically shown in Fig. 3. In this work we  explored run-
ing between 1 and 24 convolution units simultaneously on our

nexpensive system. The number of convolution units was lim-
tional Science 2 (2011) 279– 285 281

ited by the spin lattice size: 32 × 32 × 32 (24 cores); 64 × 64 × 64
(8 cores); 128 × 128 × 128 (2 cores); 512 × 512 × 128 (1 core). By
convention, a Monte Carlo step is defined as sufficient iterations
of the main algorithm so that, on average, every spin in the array
receives a single opportunity to alter its value. For this reason,
the performance in Monte Carlo steps per second is inversely pro-
portional to the volume (length) of the array. Larger lattices are
therefore more costly. The embarrassingly parallel nature of the
problem can be exploited to reduce the cost by chaining multiple
convolution units, linearly increasing performance by the number
of parallel processes. Each convolution unit operates at a con-
stant rate of one spin update per clock cycle. Beyond our current
implementation on a low cost development board with a single
FPGA and external memory element, there are further opportu-
nities for scaling the system both in array size and performance.
The parallel implementation of the convolution units and their low
bandwidth interconnect allows linear scaling of speed by adding
additional devices/silicon. Our implementation operates with a
100 MHz  clock while recently released devices surpass 400 MHz.
Technical details regarding device resource usage are given in
Fig. 9 in Appendix A.

4.1. Visualisation

Rather than having to share read access to the dataset in a sin-
gle memory location, as in processor based implementations, both
the visualisation and stastistics modules duplicate the memory
write operations of the storage array to their own local memo-
ries. This allows them to process every spin of the lattice at each
time step for implicit real-time presentation of the generated data
without reducing the performance of the simulation, in contrast
to other contemporary solutions [1].  To visualise the domains in
the dilute antiferromagnet, it is insufficient to simply render the
values of the spins at each site as this leads to a confusing image
of many inter-penetrating checkerboards. Instead in this work we
deploy a simple post-processing algorithm on the spin lattice image
to distinguish between antiferromagnetic domains. The boundary
between domains designates a point where two  spins lie parallel
to each other, separating larger clusters in which neighboring spins
lie antiparallel to one aother. Past work has relied on unnecessarily
complicated cluster methods to differentiate between antiferro-
magnetic domains in an Ising model [16]. In this work we  used
a simpler method to colour spins based on whether or not they
obeyed the relation below:

�x,y,z = (−1)x+y+z

where x, y and z are the Cartesian indexes of the spin in the
cubic lattice. Green was  used to colour spins which obeyed the
above relation. Blue spins did not. Note, after this visualisation
post-processing, solid blue (or green) regions actually contain Ising
spins with both +1 and −1 values. Importantly, inside each of
these regions antiferromagnetic ordering is unbroken so that per-
fect long-range ordering would correspond to a solid colour. Our
method implicitly relies on the fact that only two types of domains
(out of phase by 180◦) are possible in a cubic Ising antiferromag-
net. This fact can be easily checked conceptually by considering the
translation properties of an infinite 2D or 3D checkerboard. Trans-
dled by the XOR of the spin with the least significant bits of its
indices in the 3D array. Defect spins were rendered separately and
represented as red pixels. Generally we examined only a 2D plane
representing a single layer of the lattice.


282 D. Cortie, J. Pillans / Journal of Computational Science 2 (2011) 279– 285

1.0

0.5

0.0
43210

 FPGA Monty Carlo Solutio n
 Onsager Analytical Solution

F nctio
b ne).

5

5
I

f
s
p
s
l
w
s
a
5
c
A
n

〈

s
t
T
e
a

F
m

ig. 4. The FGPA Monte Carlo numerical result for the average spin value (M)  as a fu
y  Onsager. Computational simulation (circles) and analytical calculation (dotted li

. Results

.1. Numerical accuracy: ordering temperature of undiluted 2D
sing model

A test of the numerical accuracy of the simulation was per-
ormed by comparing the solution against Onsager’s analytical
olution for a 2D Ising ferromagnet [17]. To facilitate direct com-
arison with the simpler case that can be exactly solved, a lattice
ize of 128 × 128 × 1 with J = 1 and b = 0 and zero defects was  simu-
ated. Either fixed or periodic boundary conditions could be applied

ith no discernable effect on the interior ensemble averages. The
ystem was placed in a fully saturated state at zero temperature,
nd then sequentially heated to higher temperatures allowing for
00 steps per temperature interval, and 5000 steps near the criti-
al temperature to compensate for the slowing down of dynamics.
fter stability was reached at each temperature, the ensemble mag-
etisation was calculated from the average value of all N spins:

M
〉

=
i≤N∑

i

�i

N
(3)

Fig. 4 shows that the average
〈

M
〉

calculated from the FPGA

imulation agrees well with the analytical solution giving a critical
emperature Tc ≈ 2.4. More precise analysis of numerical error in
c requires the use of Binder accumulants [1] to avoid finite size
ffects, but this approach is of less interest here due to the obvious
greement for our larger simulation volumes. Smaller simulations,

ig. 5. A 2D plane extracted from the 3D lattice after post-processing shows sub-region
agnitude of external field during cooling (left) b = 0.5 (center) b = 2.0 (right) b = 4.0.
n of temperature in a 2D Ising ferromagnet agrees with the exact solution provided

or simulations with insufficient MC  steps, are well known to deviate
from the exact result [18].

5.2. Meta-stable fractal domain state in bulk dilute
antiferromagnet

The addition of random fields, due to disorder, in the Ising mag-
net results in the formation of frozen domains. This has been proven
experimentally [5,9,13] and theoretically [3,15,16]. We  used our
simulation to investigate this feature. Following an argument sim-
ilar to the original Imry-Ma argument [21], it can be predicted
that domains should be sensitive to field cooling values but should
be frozen at sufficiently low temperatures. We  find that meta-
stable domains are formed in the Ising simulation, as visualised
in Fig. 5, which have a strong dependence on the degree of non-
magnetic impurities and the magnitude of the cooling field. This
behaviour is different from the larger domains that form in pure
ferromagnet or antiferromagnet Ising magnets since these tend to
be highly unstable and transient. Fig. 6 shows the surface layer
of a 6 percent diluted DAFF field cooled from t = 6.0 to t = 0.4 for
500 MC’s per 0.2 temperature interval in three different fields.
Stronger external fields lead to the formation of a smaller and more
chaotic domain structure [16]. Increasing dilution has a similar
effect. The application of the zero or opposite field did not alter

the domain distribution once the domains were formed, imply-
ing the structure is frozen at sufficiently low temperatures, and
mirroring magnetic memory effects known to occur experimen-
tally. Sufficiently high temperature removed the domain memory
by forcing the ordered spins back into a random, paramagnetic

s of antiferromagnetically ordered spins appear. The domain size depends on the


D. Cortie, J. Pillans / Journal of Computa

10-4

10-3

10-2

10-1

100

M
(1

/M
sa

t)

102 10 3 10 4 10 5

Time (MC Steps)

 Ferromagnet
 Antiferromagnet
 Dilute Antiferromagnet

Fig. 6. After a large saturating field is suddenly turned off, the average value of spin
behaves very differently in pure and dilute Ising magnets. For the ferromagnetic
case, no relaxation occurs and a remnant magnetic state is preserved even in zero
field which is the defining characteristic of a permanent magnet. For the antiferro-
magnet, the magnetisation quickly approaches zero as two  antiparallel sub-lattices
f
c
p
t

s
d
f
c
w
o
f
a
t

5
a

t
I
I
l
s
a
c
f
w
p
c
r
f
r
a
fl

F
s

orm providing perfect cancellation. In the dilute anti ferromagnet, such sub-lattice
ompensation is prevented by disorder so that a small parasitic magnetisation is
reserved for long time periods and undergoes gradual non-exponential decay. Fits
o  the log–log relationship yield an exponent of  ̨ = 10.4.

tate. These features are in qualitative agreement with neutron
iffraction experiments on FeZnxF2 − x [5] which showed that anti-
erromagnet domain size was highly dependent on field cooling
onditions and remained frozen below a critical temperature even
hen the field was removed. Based on these experimental and the-

retical findings, an obvious point to make is that there must be a
undamental limit on the size of antiferromagnetic particle where

 domain can form which is dependent on the field and dilution of
he particle. This is discussed further in Section 5.4.

.3. Altered relaxation Glauber dynamics in dilute
ntiferromagnet compared with ferromagnet

Although real spins have more complex dynamic behaviour
han idealised Ising spins, Glauber pioneered a link between the
sing model and real dynamics showing that, in certain cases, the
sing model reveals insights into how systems approach equi-
ibrium [8].  In this work, we use the FPGA implementation to
how that the diluted antiferromagnet has highly distinct relax-
tion dynamics from the pure ferromagnetic or antiferromagnetic
ase. Each of the 3 different types of magnets was field cooled
rom t = 6.0 to t = 0.4, reducing temperature in 0.2 intervals while
aiting 500 MC’s per step, in an external field of b = 2.0. At this
oint the external field was switched off (b = 0.0) and data was
ollected regarding the system’s reaction. Fig. 7 shows that the fer-

omagnet remained as a permanent magnet where local exchange
orced spins to remain aligned. Meanwhile, spins in the antifer-
omagnet spontaneously and abruptly reordered into antiparallel
lignments so that

〈
M

〉
was reduced rapidly to the level of thermal

uctuations in less than 100 MC  steps. However, the diluted antifer-

ig. 7. Domain images for different lattices dimensions with 12.5 dilution and b = 0.5 a
imulation sizes. This suggests that domain behaviour is inhibited in antiferromagnetic n
tional Science 2 (2011) 279– 285 283

romagnet showed a qualitatively different behaviour undergoing a
rapid drop in magnetisation to about 10 percent polarisation fol-
lowed by slow relaxation dynamics which follows a power-law
shown by the straight line in the log–log plots. The fitted expo-
nent is in good agreement with past theoretical findings [15].
This implies that the domains in the diluted antiferromagnet are
capable of retaining some trapped magnetisation which is slowly
destroyed by thermal activation leading to domain wall migration.
This agrees qualitatively with the slow non-exponential decay law
found experimentally [9,13].

5.4. Finite size effects: cross-over from nanoparticle to bulk
behaviour

It has been noted by previous authors that domains do not form
in small simulations [16], necessitating expensive large scale simu-
lations, or unphysical field cooling values to create artificially small
domains. This is noted even for simulations where periodic bound-
ary conditions (mimicking an infinite lattice) are applied. Here we
show that changing the boundary condition to a vacuum boundary
(zero spin) accentuates this phenomenon so that domains are not
formed for simulations of smaller than 64 × 64 × 64 spins at a cool-
ing field of b = 0.5 with 12.5 percent dilution. Although transient
domains appear (as they do for the undiluted Ising model), these
completely disappear between 100 and 5000 Monte Carlo steps
resulting in single-domain states for smaller simulations in Fig. 7.
Larger simulations begin to show the formation of domains for
identical field and dilution values, and these domains persist well
beyond 100,000 time steps with little change. The exact thresh-
old point where a domain state can be frozen into the particle
depends on three parameters: cooling field, dilution and size of
simulation. We argue that the latter parameter is not merely a com-
putational artifact but reflects a fundamental change in behaviour
from nanocrystallites or nanoparticles to the case of bulk materi-
als with vastly increased number of spins. If we  assume a lattice
constant of ≈0.5–1.0 nm which is typical for cubic Ising antiferro-
magnets (CoO or FeF2) then, based on these preliminary results,
we suggest that the critical grain size is between 30 and 60 nm for
a multidomain state to form. More rigorous computational work
based on more quantitative models would assist in determining
the precise threshold points. However this qualitative feature is
strongly suggested using the simple Ising model. Such a threshold
would certainly explain puzzling discrepancies in various theories
of magnetic exchange bias which disagree about whether domains
play an important role [7,14].

6. Discussion and summary

We  used a novel computational method to study the proper-
ties of the diluted Ising antiferromagnet. This class of material is

fundamentally different from that of the pure ferromagnetic or
antiferromagnetic version. The distinguishing features are the for-
mation of a meta-stable domain structure and the possibility of
a weakly decaying parasitic magnetisation in zero field. Notably
domains only form for at least 64 × 64 × 64 spin volumes (for b = 0.5

fter 1000 Monte Carlo steps, showing that multidomain states vanish for smaller
anoparticles or nanocrystalline solids beneath a critical size threshold.


284 D. Cortie, J. Pillans / Journal of Computational Science 2 (2011) 279– 285

10 -3

10 -2

10 -1

10 0

10 1

10 2

10 3

10 4

10 5

M
on

te
 C

ar
lo

 S
te

ps
 P

er
 S

ec
on

d

104 105 106 107 108 109 1010

Total number of spins in cubic volume

N
an

om
et

er
 S

ca
le

s

M
ic

ro
m

et
er

 S
ca

le
s

Acceptable Total Simulation TimeI) II)

III)III) IV)

M
onths

D
ay

s

 CPU
 FPGA 
 GPU 

F ngle C
1 d 819
C lation
s

a
a
t
r
t
t
t
s
a
a
p
i
t
s
a
c
i
o
d
t
o
f
t
t
s
i
c
o
t
m

7

t
3
f
n
d
t
a

H = −J

ij

εiεj�i�j −
i

�Bzεi�i

The first term describes Ising spins where �i is the value of a
spin at a lattice point which can only take two values (+1 or −1).
ig. 8. Numerical benchmark of our single FPGA Ising model with contemporary si
200  with 100 MHz  clock. CPU was a Intel Xeon X5560 at a clock rate of 2.80 GHz an
1060.  All existing solutions fail to cross the threshold where a full micrometer simu
imulation will typically need 100,000 Monte Carlo steps.

nd low dilution of 12.5%) and so large scale atomistic simulations
re essential to study these features. We  tentatively suggest that
he latter finite size effect is not merely a computational artifact but
epresents a fundamental change in behaviour from the nanoscale
o the microscale. More rigorous theory and experiment should aim
o certify this. Based on this preliminary result, if we estimate that
he spins are 0.5–1 nm apart, the model predicts a critical length
ize where antiferromagnetic domains no longer form in the region
round 30–60 nm.  This threshold has technological consequences
s this length scale is the typical grain size of nanocrystallites in
olycrystalline thin films [7] suggesting that small grains may  be

ncapable of forming multidomain states. The enhanced computa-
ional performance, and flexibility of the FPGA will enable future
tudy of the effects of different geometric particle shapes, dilution
nd cooling field on this threshold point. Fig. 8 shows a performance
omparison of our FPGA method with contemporary GPU and CPU
mplementations of the Ising model. Ostensibly the algorithm used
n our single-chip FPGA is better for smaller systems (≤ 643) but
oes not scale as well as the GPU implementations. We  remark that
he exploratory work presented in this paper was performed on an
lder FPGA device (pre 2008), which, including the visual inter-
ace has less than 1Mb  RAM. Extrapolating the performance figures
o a target such as the SciEngines RIVYERA platform should place
he developed algorithm to be competitive with the more expen-
ive GPU systems even to much higher length scales. Fig. 8 also
llustrates that all existing computational platform become unac-
eptably slow once micrometer scales are reached. Both a cluster
f GPU’s or a multi-chip FPGA supercomputer should soon be able
o breach this limit, but the GPU version would consume orders of

agnitude more power and space than an FPGA based system [11].

. Conclusions

New computational techniques should aim to allow spin models
o cross from the micrometer to the nanometer scales. The 2D and
D Ising models were successfully implemented on a FPGA chip

or the ferromagnetic, antiferromagnet and dilute antiferromag-
etic cases. The computational performance assisted our study of
istinct long time period behaviour of the three cases at large spa-
ial scales approaching a micrometer. We  showed that the FPGA
pproach was useful for running the large scale Monte Carlo sim-
PU and and single-die GPU. Data taken from Ref. [2].  FGPGA was Xilinx Spartan 3E
2 kB cache. The ‘multi-GPU’ implementation was  on a CUDA enabled NVIDIA Tesla

 can be run in less than 7 days. To determine this threshold we assumed a complete

ulations required to observe distinct features which do not appear
for small simulations of a dilute antiferromagnet. FPGA implemen-
tations of the Ising model may  have useful applications outside the
field of magnetism. Further innovation in software, hardware and
computational techniques is required to allow existing spin models
to extend from the nanometer to micrometer range.

Acknowledgments

D.C. would like to express his sincere thanks for the financial
support of the Australian Institute of Nuclear Science and Engi-
neering(AINSE) and to Prof. R.L. Stamps and Dr. Klose for helpful
discussions.

Appendix A.
∑ ∑
Fig. 9. FPGA hardware synthesis results for the 128 × 128 × 128 spin lattice.


mputa

T
r
t
i
b
m
a
n
p
o
s
t
d
t
m
fi

R

[

[

[

[

[

[

[

[

[

[

[

[

ciplinary teams, John Pillans brings a broad background of experience to his
D. Cortie, J. Pillans / Journal of Co

he binary nature of the Ising model represents a magnetic mate-
ial with a large (infinite) crystalline anisotropy which forces spins
o align along one Cartesian axis. This approximates highly uniax-
al magnets such as CoO or FeF2. Certain Ising spins are replaced
y non-magnetic defects so that εi = 0 for a defect and εi = 1 for a
agnetic spin. This means that Ising spins do not directly inter-

ct with defects, but the defects reduce the number of magnetic
eighbors. The defect spins are randomly positioned and the total
ercentage of defect sites is termed the dilution �. The second term
f the Hamiltonian describes the Zeeman coupling of individual
pins with an externally applied magnetic field which tends to align
hem in the direction of the field. The magnetic moment per atom is
escribed by �. This term is responsible for paramagnetism above
he critical temperature, and, in the case of the ferromagnetic Ising

odel, causes ferromagnetic domains to align with the external
eld below the critical temperature.

eferences

[1] K.A. Hawick, A. Leist, D.P. Playne, Interactive visualisation of spins and clusters
in  regular and small-world Ising models with CUD A on GPUs, J. Comput. Sci. 1
(2010) 33–40.

[2] B. Block, P. Virnau, T. Preis, Multi-GPU accelerated multi-spin Monte Carlo sim-
ulation of the 2D Ising model, Comput. Phys. Commun. 181 (2010) 1549–1556.

[3]  D. Chowdhury, S. Kumar, Domain growth in the three dimensional dilute Ising
model, J. Stat. Phys. 49 (1987) 855–858.

[4] J.H. Condon, A.T. Ogielski, Fast special purpose computer for Monte Carlo sim-
ulations in statistical physics, Rev. Sci. Instrum. 56 (1985) 1691–1696.

[5] A.R. King, D.P. Belanger, V. Jaccarino, Neutron scattering study of hysteresis near
Tc in d = 3 random field Ising system, Solid State Commun. 54 (1985) 79–81.

[6]  F. Belletti, et al., Simulating spin systems on IANUS, an FPGA-based computer,
Comput. Phys. Commun. 178 (2008) 208–216.

[7] K. O’ Grady, et al., A new paradigm for exchange bias in polycrystalline thin

films, J. Magn. Magn. Mater. 322 (2010) 883–899.

[8]  R.J. Glauber, Time-dependent statistics of the Ising model, J. Math. Phys. 4
(1963) 295–307.

[9] S.-J. Han, D.P. Belanger, Relaxation of the excess magnetization of random-field-
induced metastable domains in FeZnF2, Phys. Rev. B 45 (1992) 972879–972881.
tional Science 2 (2011) 279– 285 285

10] N. Kazantseva, D. Hinzke, U. Nowak, R.W. Chantrell, U. Atxitia, O.  Chubykalo-
Fesenko, Towards multiscale modeling of magnetic materials: Simulations of
FePt, Phys. Rev. B 77 (2008) 184428.

11] S. Kestur, J.D. Davis, O. Williams, BLAS Comparison on FPGA, CPU and GPU 2010,
IEEE Annu. Symp. VLSI 1 (2010) 883–899.

12] D. Loison, C.L. Qin, K.D. Schotte, X.F. Jin, Canonical local algorithms for spin
systems: heat bath and Hasting’s methods, Eur. Phys. J. B 41 (2004) 395–
405.

13] R. Bruinsma, J. Hammann, M.  Lederman, J.V. Selinger, R. Orbach, Low-
temperature dynamics of a diluted Ising antiferromagnet, Phys. Rev. Lett. 68
(1992) 2086–2089.

14] P. Miltenyi, M.  Gierlings, J. Keller, B. Beschoten, G. Guntherodt, Diluted antifer-
romagnets in exchange bias: proof of the domain state model, Phys. Rev. Lett.
84 (2000) 0031–9007.

15] U. Nowak, K.D. Usadel, Nonexponential relaxation of diluted antiferromagnets,
Phys. Rev. B 43 (1991) 851–853.

16] U. Nowak, K.D. Usadel, Structure of domains in random Ising magnets, Phys.
Rev.  B 46 (1992) 8329–8335.

17] L. Onsager, Crystal Statistics: a two-dimensional model with an order-disorder
transition, Phys. Rev. 65 (1944) 117–149.

18] T. Preis, P. Virnau, W.  Paul, J.J. Schneider, GPU accelerated Monte Carlo sim-
ulation of the 2D and 3D Ising model, Comput. Phys. Commun. 228 (2009)
44684477.

19] R.L. Stamps, Dynamic magnetic properties of films, multilayers and patterned
elements, Adv. Funct. Mater. 20 (2010) 2380–2394.

20] T. Guneysu, C. Paar, J. Pelzl, G. Pfeiffer, M.  Schimmler, C. Schleif-fer, Parallel
computing with low-cost FPGA’s: a framework for COPA-COBANA, Adv. Parallel
Comput. 15 (2008) 741–748.

21] U. Nowak, K.D. Usadel, Domain state model for EB. I. theory, Phys. Rev. B. 66
(2002) 014430.

David Cortie is a PhD student with the Australian Nuclear Science and Technology
Organization and the University of Wollongong, studying the physics of future spin-
tronic materials. His research interests include computational physics, frustrated
interactions in condensed matter and nanomagnetic systems studied with polarised
neutron scattering.

An early career professional engineer working across diverse areas in multidis-
role  as systems engineer. Having taught digital design at the university level, he
brings a contemporary knowledge to FPGA projects, and is an evangelist for novel
applications of the technology. He is currently in a research role at a prominent
multinational technology and will be innovating in less visible environments.