U s D a b a A R R A A K F M M I 1 i t m m w c f c r g t m s h t a t l b m o a 1 d Journal of Computational Science 2 (2011) 279– 285 Contents lists available at ScienceDirect Journal of Computational Science jo ur nal homepage: www.elsev ier .com/ l ocate / jocs sing a custom-FPGA architecture to extend the scale of atomistic magnetic spin imulations avid Cortiea,b,∗, John Pillansb Institute for Superconducting and Electronic Materials, University of Wollongong, Wollongong, New South Wales, Australia Australian Nuclear Science and Technology Organisation, New Illawara Road, Sydney, New South Wales, Australia r t i c l e i n f o rticle history: eceived 16 February 2011 eceived in revised form 21 April 2011 a b s t r a c t New computational solutions are required to understand how atomic-scale properties affect magnetic behaviour at micrometer dimensions. We describe a field-programmable-gate-array (FPGA) based simu- lation of a dilute antiferromagnet with a large number of Ising spins using Glauber dynamics. The simple ccepted 27 April 2011 vailable online 19 May 2011 eywords: ield-programmable-gate-array agnetism atomic model qualitatively reproduces experimental findings when scaled up to sufficiently large spatial dimensions, and provides insight into the finite size thresholds separating nanoscale from microscale behaviour. A real-time visualisation module was used to study the dynamics of the fractal domain struc- ture and non-exponential relaxation mechanism. A performance comparison with contemporary GPU and CPU implementations suggests that a FPGA route is a competitive alternative. odelling sing model . Introduction Historically the study of magnetism in solids has motivated nnovative approaches in mathematics, physics and computa- ional science because simple experimental facts have complicated icroscopic origins. For example, the origin of a permanent agnet’s properties was only satisfactorily understood in 1944 hen Onsager proved that atomic spins with Ising-behaviour ould undergo a disorder–order phase transition into a stable erromagnetic state [17]. To this day, significant gaps remain in ontemporary scientific knowledge about magnetism, particularly egarding the properties of magnetically ordered materials that are rown into artificial geometries such as nano-wires or atomic-layer hin films. Computational modeling has proved to be a vital instru- ent for resolving many questions, but the diverse range of time cales (picoseconds to years) and spatial scales (nm to cm) [19] as necessitated multi-scale modeling which relies on a ladder of echniques [10]. For example, there is a fissure between discrete tomistic and continuous micromagnetic modeling which hinders he development of spintronic technology operating between these ength scales. Large scale atomistic simulations are highly desirable, ut the computational costs for systems that reach from atomic to icrometer scales are unattainable for first-principal calculations r classical Landau–Lifshitz–Gilbert (LLG) dynamics for discrete tomic spins. Slow creep dynamics of large spin systems have ∗ Corresponding author. Tel.: +61 2 9717 3600. E-mail address: dcr@ansto.gov.au (D. Cortie). 877-7503/$ – see front matter. Crown Copyright © 2011 Published by Elsevier B.V. All ri oi:10.1016/j.jocs.2011.04.001 Crown Copyright © 2011 Published by Elsevier B.V. All rights reserved. been studied using the Monte Carlo formalism derived from sta- tistical mechanics, but the computational cost is prohibitive [15]. To increase the number of spins that can be simulated in a rea- sonable time frame, a series of computational approaches have been explored including custom-built computers [4], supercom- puters [15] and, more recently, the use of graphical-process units (GPU’s) [1,2,18]. Here we discuss an alternative solution using a field-programmable-gate-array (FPGA) implementation of the Ising Model with a Monte Carlo update algorithm. 2. FPGA implementations of scientific models The field-programmable-gate-array is a compromise between the speed of an application specific integrated circuit (ASIC) and the re-programmable nature of software running on a CPU or GPU. Unlike an ASIC, the underlying FPGA architecture is hardware re-programmable allowing the reconfiguration of logic gates for specific computational tasks. Due to this unique capability, FPGA’s are currently topical in the field of adaptive computation, cryp- toanalysis and neural networks [20] but they have seldom been applied to scientific simulations of magnetism [6]. They are ideally suited to Ising Model Monte Carlo programs owing to their built- in parallelism and the binary nature of Ising spins. In this work we developed a single-chip FPGA architecture to study the Ising spin system dynamics on a simple cubic lattice using parallelised calculations. Large 3D lattice sizes could be simulated including 32 × 32 × 32, 128 × 128 × 128 and 512 × 512 × 128 but increasing lattice size introduced constraints on the number of parallel pro- cesses which could be run. A simple system, parameterised in the ghts reserved. 2 mputational Science 2 (2011) 279– 285 d w r r o p n O 2 t t c a l F d 3 m I v a b b s s l s s e [ H A a i o p k H F t 80 D. Cortie, J. Pillans / Journal of Co imensions of the array, was developed using a low level hard- are development language which allowed efficient utilisation of esources within the inexpensive Xilinx Spartan 3E 1200 FPGA. A eal-time visualisation was included in the system without impact n computational performance, and several metrics were com- uted and streamed to an ordinary personal computer. To ensure umerical accuracy, the FPGA-simulation was first checked against nsager’s exact solution which analytically describes the simpler D Ising ferromagnet. We then applied the computational system to he case of a diluted antiferromagnet (DAFF) system which is a pro- otypical example of a random-field antiferromagnet. This material lass displays a rich set of phenomena which cannot be solved nalytically or described in a mean-field approach, but are techno- ogically important for exchange bias layers in spin valves [14]. The PGA architecture allowed the simulation to access larger spatial imensions and time scales than have previously been studied. . Implementation A Monte Carlo procedure with a heat-bath algorithm for deter- ining spin flips was used to study the ensemble properties of an sing spin system as a function of temperature and field. Initially the irtual spins are placed at regular points on a cubic lattice and given random direction. To simulate dilution, certain spins are replaced y non-magnetic defect sites. For the FPGA model, we implement a itwise representation so that 01 represents spin up, 00 represents pin down and 10 represents a defect site. The array of spin data is tored in a single long shift register, thereby mapping the 3D cubic attice of spins into a 1D linear array. At each time step (Monte Carlo tep), a sweep is made through the entire array of spins. During the weep, the energy is calculated for each spin using the Hamiltonian nergy equation for the dilute antiferromagnet in an external field 15]: = −J ∑ ij εiεj�i�j − ∑ i �Bzεi�i (1) A detailed explanation of the terms above is given in Appendix . On an FPGA, the latter energy calculation can be conceptualised s a 3D convolution. Each convolution unit taps the spin shift reg- ster at appropriate points to feed a cubic kernel so that the values f nearest-neighbor spins arrive synchronously to be processed in arallel by the kernel as illustrated schematically in Fig. 1. The ernel unit handles the mathematical operations defined by the amiltonian by calculating the energy of a spin using the values ig. 2. The kernel performs the mathematical operations defined by the Hamiltonian by m he effect of an external magnetic field bias. The temperature parameter alters the probab Fig. 1. The convolution unit employs delay lines to synchronously feed the data from the 6 nearest-neighbors to the kernel which processes the Hamiltonian energy calculation. of cubic-nearest neighbor spins and the external field, as shown in Fig. 2. In this bitwise representation, the multiplications of neigh- boring spins is equivalent to the fundamental logic operation of eXclusive OR, of which the six resultants are then summed as either positive or negative one, and then summed with the bias field multiplied by the central cell. To avoid unnecessary calculation, we choose natural units with J = −1, and b = �Bz, so that the first term of the Hamiltonian gives an integer in the range [−6, 6] . Once the energy contribution of a spin in its current configuration (E) is known, a Heat-Bath algorithm is used to stochastically deter- mine if the spin flips to the opposite state of energy (E′′) using the probability: Pflip = 1 1 + exp(E�/kBT) (2) ′′ where E� = E − E, i.e. the energy difference between the flipped (final) and unflipped (initial) calculated from Eq. (1). To reduce numerical calculations, we employ the standard result that: E� = −2E ultiplaying the spin at the origin with the six nearest-neighbor values and adding ility for accepting a trial move which is read from a look-up table. D. Cortie, J. Pillans / Journal of Computa Fig. 3. The addition of convolution units allows for parallel calculation of spin u a t a e t h a o r T r v p o t w e t r b t t [ w r t c f T f a o n 4 w c s w p f i n i lation by a single grid point creates a distinct out-of-phase pattern, but translating a second grid point in any direction restores the initial phase. On an FPGA, this domain image processing was han- pdates at independent points in the array. Single bit wide paths are shown in blue nd scalar paths are shown in red. (For interpretation of the references to colour in his figure legend, the reader is referred to the web version of the article.) s E′′ = −E which can be shown by letting �i → − �i in Eq. (1). This liminates the need for 6 XOR gates and associated adders. Fur- her optimisations are found in the probability function of the eat-bath algorithm. It comprises an exponential and division oper- tion which would require significant resources to implement, but nly depends on two variables: the energy shift (E�), and the educed temperature (t = kBT) where kB is the Boltzmann constant. hese two variables are therefore absorbed into a look-up table esident in read only memory (ROM) which has pre-calculated alues of the probability function for a discrete range of tem- eratures and fields. The spin-flip probability given by the ROM utput is then compared against a Linear Feedback Shift Regis- er which generates pseudo-random numbers in the range [0,1) ith a period of 232. If the pseudo-random number is less or qual to the spin-flip probability, the flip is accepted, otherwise he spin remains in its initial state. Since, like the Metropolis algo- ithm, the heat-bath algorithm obeys the principal of detailed alance, the stochastic generation of new states eventually forces he ensemble to converge to the true statistical distribution despite he fact that the partition function is never explicitly calculated 12]. Defect cells representing dilute, non-magnetic impurities ere included in our final program by duplicating the entire shift egister loop with a second parallel set of shift registers storing he defect state (defect/not). The kernel required only minimal hanges to incorporate this defect information, inhibiting the adder or each side of the cubic kernel that has a defect present on it. he completed kernel module is shown in Fig. 2. It is fed data rom a string of shift registers as in the naive implementation of convolution, with the output of the kernel being independent f the input allowing additional pipeline stages to be added if eeded. . Parallelization The notable thing about the FPGA implementation is the ease ith which it can be parallelized simply by the addition of more onvolution units, limited only by the available resources of the ystem. This can be viewed as a hardware analogue to a soft- are approach that uses multi-threaded processes, where it is ossible for each convolution unit/core to simultaneously per- orm calculations on independent sections of the spin lattice. This s schematically shown in Fig. 3. In this work we explored run- ing between 1 and 24 convolution units simultaneously on our nexpensive system. The number of convolution units was lim- tional Science 2 (2011) 279– 285 281 ited by the spin lattice size: 32 × 32 × 32 (24 cores); 64 × 64 × 64 (8 cores); 128 × 128 × 128 (2 cores); 512 × 512 × 128 (1 core). By convention, a Monte Carlo step is defined as sufficient iterations of the main algorithm so that, on average, every spin in the array receives a single opportunity to alter its value. For this reason, the performance in Monte Carlo steps per second is inversely pro- portional to the volume (length) of the array. Larger lattices are therefore more costly. The embarrassingly parallel nature of the problem can be exploited to reduce the cost by chaining multiple convolution units, linearly increasing performance by the number of parallel processes. Each convolution unit operates at a con- stant rate of one spin update per clock cycle. Beyond our current implementation on a low cost development board with a single FPGA and external memory element, there are further opportu- nities for scaling the system both in array size and performance. The parallel implementation of the convolution units and their low bandwidth interconnect allows linear scaling of speed by adding additional devices/silicon. Our implementation operates with a 100 MHz clock while recently released devices surpass 400 MHz. Technical details regarding device resource usage are given in Fig. 9 in Appendix A. 4.1. Visualisation Rather than having to share read access to the dataset in a sin- gle memory location, as in processor based implementations, both the visualisation and stastistics modules duplicate the memory write operations of the storage array to their own local memo- ries. This allows them to process every spin of the lattice at each time step for implicit real-time presentation of the generated data without reducing the performance of the simulation, in contrast to other contemporary solutions [1]. To visualise the domains in the dilute antiferromagnet, it is insufficient to simply render the values of the spins at each site as this leads to a confusing image of many inter-penetrating checkerboards. Instead in this work we deploy a simple post-processing algorithm on the spin lattice image to distinguish between antiferromagnetic domains. The boundary between domains designates a point where two spins lie parallel to each other, separating larger clusters in which neighboring spins lie antiparallel to one aother. Past work has relied on unnecessarily complicated cluster methods to differentiate between antiferro- magnetic domains in an Ising model [16]. In this work we used a simpler method to colour spins based on whether or not they obeyed the relation below: �x,y,z = (−1)x+y+z where x, y and z are the Cartesian indexes of the spin in the cubic lattice. Green was used to colour spins which obeyed the above relation. Blue spins did not. Note, after this visualisation post-processing, solid blue (or green) regions actually contain Ising spins with both +1 and −1 values. Importantly, inside each of these regions antiferromagnetic ordering is unbroken so that per- fect long-range ordering would correspond to a solid colour. Our method implicitly relies on the fact that only two types of domains (out of phase by 180◦) are possible in a cubic Ising antiferromag- net. This fact can be easily checked conceptually by considering the translation properties of an infinite 2D or 3D checkerboard. Trans- dled by the XOR of the spin with the least significant bits of its indices in the 3D array. Defect spins were rendered separately and represented as red pixels. Generally we examined only a 2D plane representing a single layer of the lattice. 282 D. Cortie, J. Pillans / Journal of Computational Science 2 (2011) 279– 285 1.0 0.5 0.0 43210 FPGA Monty Carlo Solutio n Onsager Analytical Solution F nctio b ne). 5 5 I f s p s l w s a 5 c A n 〈 s t T e a F m ig. 4. The FGPA Monte Carlo numerical result for the average spin value (M) as a fu y Onsager. Computational simulation (circles) and analytical calculation (dotted li . Results .1. Numerical accuracy: ordering temperature of undiluted 2D sing model A test of the numerical accuracy of the simulation was per- ormed by comparing the solution against Onsager’s analytical olution for a 2D Ising ferromagnet [17]. To facilitate direct com- arison with the simpler case that can be exactly solved, a lattice ize of 128 × 128 × 1 with J = 1 and b = 0 and zero defects was simu- ated. Either fixed or periodic boundary conditions could be applied ith no discernable effect on the interior ensemble averages. The ystem was placed in a fully saturated state at zero temperature, nd then sequentially heated to higher temperatures allowing for 00 steps per temperature interval, and 5000 steps near the criti- al temperature to compensate for the slowing down of dynamics. fter stability was reached at each temperature, the ensemble mag- etisation was calculated from the average value of all N spins: M 〉 = i≤N∑ i �i N (3) Fig. 4 shows that the average 〈 M 〉 calculated from the FPGA imulation agrees well with the analytical solution giving a critical emperature Tc ≈ 2.4. More precise analysis of numerical error in c requires the use of Binder accumulants [1] to avoid finite size ffects, but this approach is of less interest here due to the obvious greement for our larger simulation volumes. Smaller simulations, ig. 5. A 2D plane extracted from the 3D lattice after post-processing shows sub-region agnitude of external field during cooling (left) b = 0.5 (center) b = 2.0 (right) b = 4.0. n of temperature in a 2D Ising ferromagnet agrees with the exact solution provided or simulations with insufficient MC steps, are well known to deviate from the exact result [18]. 5.2. Meta-stable fractal domain state in bulk dilute antiferromagnet The addition of random fields, due to disorder, in the Ising mag- net results in the formation of frozen domains. This has been proven experimentally [5,9,13] and theoretically [3,15,16]. We used our simulation to investigate this feature. Following an argument sim- ilar to the original Imry-Ma argument [21], it can be predicted that domains should be sensitive to field cooling values but should be frozen at sufficiently low temperatures. We find that meta- stable domains are formed in the Ising simulation, as visualised in Fig. 5, which have a strong dependence on the degree of non- magnetic impurities and the magnitude of the cooling field. This behaviour is different from the larger domains that form in pure ferromagnet or antiferromagnet Ising magnets since these tend to be highly unstable and transient. Fig. 6 shows the surface layer of a 6 percent diluted DAFF field cooled from t = 6.0 to t = 0.4 for 500 MC’s per 0.2 temperature interval in three different fields. Stronger external fields lead to the formation of a smaller and more chaotic domain structure [16]. Increasing dilution has a similar effect. The application of the zero or opposite field did not alter the domain distribution once the domains were formed, imply- ing the structure is frozen at sufficiently low temperatures, and mirroring magnetic memory effects known to occur experimen- tally. Sufficiently high temperature removed the domain memory by forcing the ordered spins back into a random, paramagnetic s of antiferromagnetically ordered spins appear. The domain size depends on the D. Cortie, J. Pillans / Journal of Computa 10-4 10-3 10-2 10-1 100 M (1 /M sa t) 102 10 3 10 4 10 5 Time (MC Steps) Ferromagnet Antiferromagnet Dilute Antiferromagnet Fig. 6. After a large saturating field is suddenly turned off, the average value of spin behaves very differently in pure and dilute Ising magnets. For the ferromagnetic case, no relaxation occurs and a remnant magnetic state is preserved even in zero field which is the defining characteristic of a permanent magnet. For the antiferro- magnet, the magnetisation quickly approaches zero as two antiparallel sub-lattices f c p t s d f c w o f a t 5 a t I I l s a c f w p c r f r a fl F s orm providing perfect cancellation. In the dilute anti ferromagnet, such sub-lattice ompensation is prevented by disorder so that a small parasitic magnetisation is reserved for long time periods and undergoes gradual non-exponential decay. Fits o the log–log relationship yield an exponent of ̨ = 10.4. tate. These features are in qualitative agreement with neutron iffraction experiments on FeZnxF2 − x [5] which showed that anti- erromagnet domain size was highly dependent on field cooling onditions and remained frozen below a critical temperature even hen the field was removed. Based on these experimental and the- retical findings, an obvious point to make is that there must be a undamental limit on the size of antiferromagnetic particle where domain can form which is dependent on the field and dilution of he particle. This is discussed further in Section 5.4. .3. Altered relaxation Glauber dynamics in dilute ntiferromagnet compared with ferromagnet Although real spins have more complex dynamic behaviour han idealised Ising spins, Glauber pioneered a link between the sing model and real dynamics showing that, in certain cases, the sing model reveals insights into how systems approach equi- ibrium [8]. In this work, we use the FPGA implementation to how that the diluted antiferromagnet has highly distinct relax- tion dynamics from the pure ferromagnetic or antiferromagnetic ase. Each of the 3 different types of magnets was field cooled rom t = 6.0 to t = 0.4, reducing temperature in 0.2 intervals while aiting 500 MC’s per step, in an external field of b = 2.0. At this oint the external field was switched off (b = 0.0) and data was ollected regarding the system’s reaction. Fig. 7 shows that the fer- omagnet remained as a permanent magnet where local exchange orced spins to remain aligned. Meanwhile, spins in the antifer- omagnet spontaneously and abruptly reordered into antiparallel lignments so that 〈 M 〉 was reduced rapidly to the level of thermal uctuations in less than 100 MC steps. However, the diluted antifer- ig. 7. Domain images for different lattices dimensions with 12.5 dilution and b = 0.5 a imulation sizes. This suggests that domain behaviour is inhibited in antiferromagnetic n tional Science 2 (2011) 279– 285 283 romagnet showed a qualitatively different behaviour undergoing a rapid drop in magnetisation to about 10 percent polarisation fol- lowed by slow relaxation dynamics which follows a power-law shown by the straight line in the log–log plots. The fitted expo- nent is in good agreement with past theoretical findings [15]. This implies that the domains in the diluted antiferromagnet are capable of retaining some trapped magnetisation which is slowly destroyed by thermal activation leading to domain wall migration. This agrees qualitatively with the slow non-exponential decay law found experimentally [9,13]. 5.4. Finite size effects: cross-over from nanoparticle to bulk behaviour It has been noted by previous authors that domains do not form in small simulations [16], necessitating expensive large scale simu- lations, or unphysical field cooling values to create artificially small domains. This is noted even for simulations where periodic bound- ary conditions (mimicking an infinite lattice) are applied. Here we show that changing the boundary condition to a vacuum boundary (zero spin) accentuates this phenomenon so that domains are not formed for simulations of smaller than 64 × 64 × 64 spins at a cool- ing field of b = 0.5 with 12.5 percent dilution. Although transient domains appear (as they do for the undiluted Ising model), these completely disappear between 100 and 5000 Monte Carlo steps resulting in single-domain states for smaller simulations in Fig. 7. Larger simulations begin to show the formation of domains for identical field and dilution values, and these domains persist well beyond 100,000 time steps with little change. The exact thresh- old point where a domain state can be frozen into the particle depends on three parameters: cooling field, dilution and size of simulation. We argue that the latter parameter is not merely a com- putational artifact but reflects a fundamental change in behaviour from nanocrystallites or nanoparticles to the case of bulk materi- als with vastly increased number of spins. If we assume a lattice constant of ≈0.5–1.0 nm which is typical for cubic Ising antiferro- magnets (CoO or FeF2) then, based on these preliminary results, we suggest that the critical grain size is between 30 and 60 nm for a multidomain state to form. More rigorous computational work based on more quantitative models would assist in determining the precise threshold points. However this qualitative feature is strongly suggested using the simple Ising model. Such a threshold would certainly explain puzzling discrepancies in various theories of magnetic exchange bias which disagree about whether domains play an important role [7,14]. 6. Discussion and summary We used a novel computational method to study the proper- ties of the diluted Ising antiferromagnet. This class of material is fundamentally different from that of the pure ferromagnetic or antiferromagnetic version. The distinguishing features are the for- mation of a meta-stable domain structure and the possibility of a weakly decaying parasitic magnetisation in zero field. Notably domains only form for at least 64 × 64 × 64 spin volumes (for b = 0.5 fter 1000 Monte Carlo steps, showing that multidomain states vanish for smaller anoparticles or nanocrystalline solids beneath a critical size threshold. 284 D. Cortie, J. Pillans / Journal of Computational Science 2 (2011) 279– 285 10 -3 10 -2 10 -1 10 0 10 1 10 2 10 3 10 4 10 5 M on te C ar lo S te ps P er S ec on d 104 105 106 107 108 109 1010 Total number of spins in cubic volume N an om et er S ca le s M ic ro m et er S ca le s Acceptable Total Simulation TimeI) II) III)III) IV) M onths D ay s CPU FPGA GPU F ngle C 1 d 819 C lation s a a t r t t t s a a p i t s a c i o d t o f t t s i c o t m 7 t 3 f n d t a H = −J ij εiεj�i�j − i �Bzεi�i The first term describes Ising spins where �i is the value of a spin at a lattice point which can only take two values (+1 or −1). ig. 8. Numerical benchmark of our single FPGA Ising model with contemporary si 200 with 100 MHz clock. CPU was a Intel Xeon X5560 at a clock rate of 2.80 GHz an 1060. All existing solutions fail to cross the threshold where a full micrometer simu imulation will typically need 100,000 Monte Carlo steps. nd low dilution of 12.5%) and so large scale atomistic simulations re essential to study these features. We tentatively suggest that he latter finite size effect is not merely a computational artifact but epresents a fundamental change in behaviour from the nanoscale o the microscale. More rigorous theory and experiment should aim o certify this. Based on this preliminary result, if we estimate that he spins are 0.5–1 nm apart, the model predicts a critical length ize where antiferromagnetic domains no longer form in the region round 30–60 nm. This threshold has technological consequences s this length scale is the typical grain size of nanocrystallites in olycrystalline thin films [7] suggesting that small grains may be ncapable of forming multidomain states. The enhanced computa- ional performance, and flexibility of the FPGA will enable future tudy of the effects of different geometric particle shapes, dilution nd cooling field on this threshold point. Fig. 8 shows a performance omparison of our FPGA method with contemporary GPU and CPU mplementations of the Ising model. Ostensibly the algorithm used n our single-chip FPGA is better for smaller systems (≤ 643) but oes not scale as well as the GPU implementations. We remark that he exploratory work presented in this paper was performed on an lder FPGA device (pre 2008), which, including the visual inter- ace has less than 1Mb RAM. Extrapolating the performance figures o a target such as the SciEngines RIVYERA platform should place he developed algorithm to be competitive with the more expen- ive GPU systems even to much higher length scales. Fig. 8 also llustrates that all existing computational platform become unac- eptably slow once micrometer scales are reached. Both a cluster f GPU’s or a multi-chip FPGA supercomputer should soon be able o breach this limit, but the GPU version would consume orders of agnitude more power and space than an FPGA based system [11]. . Conclusions New computational techniques should aim to allow spin models o cross from the micrometer to the nanometer scales. The 2D and D Ising models were successfully implemented on a FPGA chip or the ferromagnetic, antiferromagnet and dilute antiferromag- etic cases. The computational performance assisted our study of istinct long time period behaviour of the three cases at large spa- ial scales approaching a micrometer. We showed that the FPGA pproach was useful for running the large scale Monte Carlo sim- PU and and single-die GPU. Data taken from Ref. [2]. FGPGA was Xilinx Spartan 3E 2 kB cache. The ‘multi-GPU’ implementation was on a CUDA enabled NVIDIA Tesla can be run in less than 7 days. To determine this threshold we assumed a complete ulations required to observe distinct features which do not appear for small simulations of a dilute antiferromagnet. FPGA implemen- tations of the Ising model may have useful applications outside the field of magnetism. Further innovation in software, hardware and computational techniques is required to allow existing spin models to extend from the nanometer to micrometer range. Acknowledgments D.C. would like to express his sincere thanks for the financial support of the Australian Institute of Nuclear Science and Engi- neering(AINSE) and to Prof. R.L. Stamps and Dr. Klose for helpful discussions. Appendix A. ∑ ∑ Fig. 9. FPGA hardware synthesis results for the 128 × 128 × 128 spin lattice. mputa T r t i b m a n p o s t d t m fi R [ [ [ [ [ [ [ [ [ [ [ [ ciplinary teams, John Pillans brings a broad background of experience to his D. Cortie, J. Pillans / Journal of Co he binary nature of the Ising model represents a magnetic mate- ial with a large (infinite) crystalline anisotropy which forces spins o align along one Cartesian axis. This approximates highly uniax- al magnets such as CoO or FeF2. Certain Ising spins are replaced y non-magnetic defects so that εi = 0 for a defect and εi = 1 for a agnetic spin. This means that Ising spins do not directly inter- ct with defects, but the defects reduce the number of magnetic eighbors. The defect spins are randomly positioned and the total ercentage of defect sites is termed the dilution �. The second term f the Hamiltonian describes the Zeeman coupling of individual pins with an externally applied magnetic field which tends to align hem in the direction of the field. The magnetic moment per atom is escribed by �. This term is responsible for paramagnetism above he critical temperature, and, in the case of the ferromagnetic Ising odel, causes ferromagnetic domains to align with the external eld below the critical temperature. eferences [1] K.A. Hawick, A. Leist, D.P. Playne, Interactive visualisation of spins and clusters in regular and small-world Ising models with CUD A on GPUs, J. Comput. Sci. 1 (2010) 33–40. [2] B. Block, P. Virnau, T. Preis, Multi-GPU accelerated multi-spin Monte Carlo sim- ulation of the 2D Ising model, Comput. Phys. Commun. 181 (2010) 1549–1556. [3] D. Chowdhury, S. Kumar, Domain growth in the three dimensional dilute Ising model, J. Stat. Phys. 49 (1987) 855–858. [4] J.H. Condon, A.T. Ogielski, Fast special purpose computer for Monte Carlo sim- ulations in statistical physics, Rev. Sci. Instrum. 56 (1985) 1691–1696. [5] A.R. King, D.P. Belanger, V. Jaccarino, Neutron scattering study of hysteresis near Tc in d = 3 random field Ising system, Solid State Commun. 54 (1985) 79–81. [6] F. Belletti, et al., Simulating spin systems on IANUS, an FPGA-based computer, Comput. Phys. Commun. 178 (2008) 208–216. [7] K. O’ Grady, et al., A new paradigm for exchange bias in polycrystalline thin films, J. Magn. Magn. Mater. 322 (2010) 883–899. [8] R.J. Glauber, Time-dependent statistics of the Ising model, J. Math. Phys. 4 (1963) 295–307. [9] S.-J. Han, D.P. Belanger, Relaxation of the excess magnetization of random-field- induced metastable domains in FeZnF2, Phys. Rev. B 45 (1992) 972879–972881. tional Science 2 (2011) 279– 285 285 10] N. Kazantseva, D. Hinzke, U. Nowak, R.W. Chantrell, U. Atxitia, O. Chubykalo- Fesenko, Towards multiscale modeling of magnetic materials: Simulations of FePt, Phys. Rev. B 77 (2008) 184428. 11] S. Kestur, J.D. Davis, O. Williams, BLAS Comparison on FPGA, CPU and GPU 2010, IEEE Annu. Symp. VLSI 1 (2010) 883–899. 12] D. Loison, C.L. Qin, K.D. Schotte, X.F. Jin, Canonical local algorithms for spin systems: heat bath and Hasting’s methods, Eur. Phys. J. B 41 (2004) 395– 405. 13] R. Bruinsma, J. Hammann, M. Lederman, J.V. Selinger, R. Orbach, Low- temperature dynamics of a diluted Ising antiferromagnet, Phys. Rev. Lett. 68 (1992) 2086–2089. 14] P. Miltenyi, M. Gierlings, J. Keller, B. Beschoten, G. Guntherodt, Diluted antifer- romagnets in exchange bias: proof of the domain state model, Phys. Rev. Lett. 84 (2000) 0031–9007. 15] U. Nowak, K.D. Usadel, Nonexponential relaxation of diluted antiferromagnets, Phys. Rev. B 43 (1991) 851–853. 16] U. Nowak, K.D. Usadel, Structure of domains in random Ising magnets, Phys. Rev. B 46 (1992) 8329–8335. 17] L. Onsager, Crystal Statistics: a two-dimensional model with an order-disorder transition, Phys. Rev. 65 (1944) 117–149. 18] T. Preis, P. Virnau, W. Paul, J.J. Schneider, GPU accelerated Monte Carlo sim- ulation of the 2D and 3D Ising model, Comput. Phys. Commun. 228 (2009) 44684477. 19] R.L. Stamps, Dynamic magnetic properties of films, multilayers and patterned elements, Adv. Funct. Mater. 20 (2010) 2380–2394. 20] T. Guneysu, C. Paar, J. Pelzl, G. Pfeiffer, M. Schimmler, C. Schleif-fer, Parallel computing with low-cost FPGA’s: a framework for COPA-COBANA, Adv. Parallel Comput. 15 (2008) 741–748. 21] U. Nowak, K.D. Usadel, Domain state model for EB. I. theory, Phys. Rev. B. 66 (2002) 014430. David Cortie is a PhD student with the Australian Nuclear Science and Technology Organization and the University of Wollongong, studying the physics of future spin- tronic materials. His research interests include computational physics, frustrated interactions in condensed matter and nanomagnetic systems studied with polarised neutron scattering. An early career professional engineer working across diverse areas in multidis- role as systems engineer. Having taught digital design at the university level, he brings a contemporary knowledge to FPGA projects, and is an evangelist for novel applications of the technology. He is currently in a research role at a prominent multinational technology and will be innovating in less visible environments.