PTX support, take 2
- You can generate PTX using `--emit=asm` and the right (custom) target, which
  you can then run on an NVIDIA GPU.
- You can compile `core` to PTX. [Xargo] also works, and it can compile some
  other crates like `collections` (though I doubt all of those make sense on a GPU).
[Xargo]: https://github.com/japaric/xargo
- You can create "global" functions, which can be "called" by the host, using
the `"ptx-kernel"` ABI, e.g. `extern "ptx-kernel" fn kernel() { .. }`. Every
other function is a "device" function and can only be called by the GPU.
- Intrinsics like `__syncthreads()` and `blockIdx.x` are available as
  `"platform-intrinsics"`. These intrinsics are *not* in the `core` crate, but
  any Rust user can create "bindings" to them using an `extern
  "platform-intrinsic"` block. See the example at the end.
- Trying to emit PTX with `-g` (debuginfo) currently produces an LLVM error. But
  I don't think PTX can contain debuginfo anyway, so `-g` should be ignored and a
  warning printed ("`-g` doesn't work with this target" or something).
- "Single source" support. You *can't* write a single source file that contains
both host and device code. I think that should be possible to implement that
outside the compiler using compiler plugins / build scripts.
- The equivalent of CUDA's `__shared__`, which is used to declare memory that's
  shared between the threads of the same block. This could be implemented using
  attributes, e.g. `#[shared] static mut SCRATCH_MEMORY: [f32; 64]`, but hasn't
  been implemented yet.
- Built-in targets. This PR doesn't add targets to the compiler just yet, but one
  can create custom targets to be able to emit PTX code (see the example at the
  end). The idea is to have people experiment with this feature before
  committing to it (built-in targets are "insta-stable").
- All functions must be "inlined". IOW, the `.rlib` must always contain the LLVM
  bitcode of all the functions of the crate it was produced from. Otherwise, you
  end up with "undefined references" in the final PTX code, but you won't get
  *any* linker error because no linker is involved. IOW, you'll hit a runtime
  error when loading the PTX into the GPU. The workaround is to use `#[inline]`
  on non-generic functions and to never use `#[inline(never)]` (see the sketch
  after this list), but this may not always be possible because e.g. you could
  be relying on third party code.
- Should `--emit=asm` generate a `.ptx` file instead of a `.s` file?
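As an illustration of the inlining workaround mentioned in the list above, here's a
sketch (not code from this PR) of a hypothetical helper crate that a kernel crate
might depend on:

``` rust
// Hypothetical helper crate used from a kernel crate. Without `#[inline]`,
// calling this non-generic function from another crate would leave an
// undefined reference in the final PTX, and nothing catches it until the
// module is loaded onto the GPU.
#[inline]
pub fn clamp(x: f32, lo: f32, hi: f32) -> f32 {
    if x < lo { lo } else if x > hi { hi } else { x }
}

// Generic functions are not a problem: their LLVM bitcode is always stored in
// the `.rlib` so downstream crates can instantiate (and codegen) them.
pub fn square<T: Copy + core::ops::Mul<Output = T>>(x: T) -> T {
    x * x
}
```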
TL;DR Use Xargo to turn a crate into a PTX module (a `.s` file). Then pass that
PTX module, as a string, to the GPU and run it.
The full code is in [this repository]. This section gives an overview of how to
run Rust code on a NVIDIA GPU.
[this repository]: https://github.com/japaric/cuda
- Create a custom target. Here's the 64-bit NVPTX target (NOTE: the comments
are not valid because this is supposed to be a JSON file; remove them before
you use this file):
``` js
// nvptx64-nvidia-cuda.json
{
    "arch": "nvptx64",                                  // matches LLVM
    "cpu": "sm_20",                                     // "oldest" compute capability supported by LLVM
    "data-layout": "e-i64:64-v16:16-v32:32-n16:32:64",
    "llvm-target": "nvptx64-nvidia-cuda",
    "max-atomic-width": 0,                              // LLVM errors with any other value :-(
    "os": "cuda",                                       // matches LLVM
    "panic-strategy": "abort",
    "target-endian": "little",
    "target-pointer-width": "64",
    "target-vendor": "nvidia"                           // matches LLVM -- not required
}
```
(There's a 32-bit target specification in the linked repository)
- Write a kernel
``` rust
extern "platform-intrinsic" {
fn nvptx_block_dim_x() -> i32;
fn nvptx_block_idx_x() -> i32;
fn nvptx_thread_idx_x() -> i32;
}
/// Copies an array of `n` floating point numbers from `src` to `dst`
pub unsafe extern "ptx-kernel" fn memcpy(dst: *mut f32,
src: *const f32,
n: usize) {
let i = (nvptx_block_dim_x() as isize)
.wrapping_mul(nvptx_block_idx_x() as isize)
.wrapping_add(nvptx_thread_idx_x() as isize);
if (i as usize) < n {
*dst.offset(i) = *src.offset(i);
}
}
```
- Emit PTX code
```
$ xargo rustc --target nvptx64-nvidia-cuda --release -- --emit=asm
Compiling core v0.0.0 (file://..)
(..)
Compiling nvptx-builtins v0.1.0 (https://github.com/japaric/nvptx-builtins)
Compiling kernel v0.1.0
$ cat target/nvptx64-nvidia-cuda/release/deps/kernel-*.s
//
// Generated by LLVM NVPTX Back-End
//
.version 3.2
.target sm_20
.address_size 64
// .globl memcpy
.visible .entry memcpy(
    .param .u64 memcpy_param_0,
    .param .u64 memcpy_param_1,
    .param .u64 memcpy_param_2
)
{
    .reg .pred %p<2>;
    .reg .s32 %r<5>;
    .reg .s64 %rd<12>;

    ld.param.u64 %rd7, [memcpy_param_2];
    mov.u32 %r1, %ntid.x;
    mov.u32 %r2, %ctaid.x;
    mul.wide.s32 %rd8, %r2, %r1;
    mov.u32 %r3, %tid.x;
    cvt.s64.s32 %rd9, %r3;
    add.s64 %rd10, %rd9, %rd8;
    setp.ge.u64 %p1, %rd10, %rd7;
    @%p1 bra LBB0_2;
    ld.param.u64 %rd3, [memcpy_param_0];
    ld.param.u64 %rd4, [memcpy_param_1];
    cvta.to.global.u64 %rd5, %rd4;
    cvta.to.global.u64 %rd6, %rd3;
    shl.b64 %rd11, %rd10, 2;
    add.s64 %rd1, %rd6, %rd11;
    add.s64 %rd2, %rd5, %rd11;
    ld.global.u32 %r4, [%rd2];
    st.global.u32 [%rd1], %r4;
LBB0_2:
    ret;
}
```
- Run it on the GPU
``` rust
// `kernel.ptx` is the `*.s` file we got in the previous step
const KERNEL: &'static str = include_str!("kernel.ptx");
driver::initialize()?;
let device = Device(0)?;
let ctx = device.create_context()?;
let module = ctx.load_module(KERNEL)?;
let kernel = module.function("memcpy")?;
let h_a: Vec<f32> = /* create some random data */;
let h_b = vec![0.; N];
let d_a = driver::allocate(bytes)?;
let d_b = driver::allocate(bytes)?;
// Copy from host to GPU
driver::copy(h_a, d_a)?;
// Run `memcpy` on the GPU
kernel.launch(d_b, d_a, N)?;
// Copy from GPU to host
driver::copy(d_b, h_b)?;
// Verify
assert_eq!(h_a, h_b);
// `d_a`, `d_b`, `h_a`, `h_b` are dropped/freed here
```
---
cc @alexcrichton @brson @rkruppe
> What has changed since #34195?
- `core` can now be compiled into PTX, which makes it very easy to turn `no_std`
  crates into "kernels" with the help of Xargo.
- There's now a way, the `"ptx-kernel"` ABI, to generate "global" functions. The
  old PR required a manual step (it was a hack) to "convert" "device" functions
  into "global" functions. (Only "global" functions can be launched by the host.)
- Everything is unstable. There are no "insta-stable" built-in targets this
  time (\*). Users have to use a custom target to experiment with this
  feature. Also, PTX intrinsics, like `__syncthreads` and `blockIdx.x`, are now
  implemented as `"platform-intrinsics"` so they no longer live in the `core`
  crate.
(\*) I'd actually like to have in-tree targets because that makes this target
more discoverable, removes the need to lug around .json files, etc.
However, bundling a target with the compiler immediately puts it on the path
towards stabilization, which gives us just two cycles to find and fix any
problems with the target specification. Afterwards, it becomes hard to tweak
the specification because that could be a breaking change.
A possible solution could be "unstable built-in targets". Basically, to use an
unstable target, you'll have to also pass `-Z unstable-options` to the compiler.
And unstable targets, being unstable, wouldn't be available on stable.
> Why should this be merged?
- To let people experiment with the feature out of tree. Having easy access to
  the feature (in every nightly) allows this. I also think that, as it is, it
  should be possible to start prototyping type-safe single source support using
  build scripts, macros and/or plugins (see the build-script sketch after this
  list).
- It's a straightforward implementation. No different than adding support for
  any other architecture.
- `--emit=asm --target=nvptx64-nvidia-cuda` can be used to turn a crate
into a PTX module (a `.s` file).
- intrinsics like `__syncthreads` and `blockIdx.x` are exposed as
`"platform-intrinsics"`.
- "cabi" has been implemented for the nvptx and nvptx64 architectures.
i.e. `extern "C"` works.
- a new ABI, `"ptx-kernel"`, that can be used to generate "global"
  functions. Example: `extern "ptx-kernel" fn kernel() { .. }`. All
  other functions are "device" functions.
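As mentioned above, here's the kind of build-script glue I have in mind for the
out-of-tree "single source" prototyping. This is only a sketch and not part of this
PR; it assumes Xargo is installed, that the device code lives in a sibling `kernel`
crate, and that the custom target JSON is reachable (e.g. via `RUST_TARGET_PATH`):

``` rust
// build.rs of the (hypothetical) host crate
use std::process::Command;

fn main() {
    // Compile the device crate to PTX using the custom target shown earlier.
    let status = Command::new("xargo")
        .args(&["rustc",
                "--manifest-path", "kernel/Cargo.toml",
                "--release",
                "--target", "nvptx64-nvidia-cuda",
                "--", "--emit=asm"])
        .status()
        .expect("failed to run xargo");
    assert!(status.success(), "PTX build failed");

    // The host crate can then locate the generated
    // `kernel/target/nvptx64-nvidia-cuda/release/deps/*.s` file and
    // `include_str!` it, as in the example above.
    println!("cargo:rerun-if-changed=kernel/src/lib.rs");
}
```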
initial SPARC support
### UPDATE
Can now compile `no_std` executables with:
```
$ cargo new --bin app && cd $_
$ edit Cargo.toml && tail -n2 $_
[dependencies]
core = { path = "/path/to/rust/src/libcore" }
$ edit src/main.rs && cat $_
#![feature(lang_items)]
#![no_std]
#![no_main]

#[no_mangle]
pub fn _start() -> ! {
    loop {}
}

#[lang = "panic_fmt"]
fn panic_fmt() -> ! {
    loop {}
}
$ edit sparc-none-elf.json && cat $_
{
    "arch": "sparc",
    "data-layout": "E-m:e-p:32:32-i64:64-f128:64-n32-S64",
    "executables": true,
    "llvm-target": "sparc",
    "os": "none",
    "panic-strategy": "abort",
    "target-endian": "big",
    "target-pointer-width": "32"
}
$ cargo rustc --target sparc-none-elf -- -C linker=sparc-unknown-elf-gcc -C link-args=-nostartfiles
$ file target/sparc-none-elf/debug/app
app: ELF 32-bit MSB executable, SPARC, version 1 (SYSV), statically linked, not stripped
$ sparc-unknown-elf-readelf -h target/sparc-none-elf/debug/app
ELF Header:
Magic: 7f 45 4c 46 01 02 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, big endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Sparc
Version: 0x1
Entry point address: 0x10074
Start of program headers: 52 (bytes into file)
Start of section headers: 1188 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 2
Size of section headers: 40 (bytes)
Number of section headers: 14
Section header string table index: 11
$ sparc-unknown-elf-objdump -Cd target/sparc-none-elf/debug/app
target/sparc-none-elf/debug/app: file format elf32-sparc
Disassembly of section .text:
00010074 <_start>:
10074: 9d e3 bf 98 save %sp, -104, %sp
10078: 10 80 00 02 b 10080 <_start+0xc>
1007c: 01 00 00 00 nop
10080: 10 80 00 02 b 10088 <_start+0x14>
10084: 01 00 00 00 nop
10088: 10 80 00 00 b 10088 <_start+0x14>
1008c: 01 00 00 00 nop
```
---
Someone wants to attempt launching some Rust [into space](https://www.reddit.com/r/rust/comments/5h76oa/c_interop/) but their platform is based on the SPARCv8 architecture. Let's not block them: let's enable LLVM's SPARC backend.
Something very important that they'll also need is the "cabi" stuff, as they'll be embedding some Rust code into a bigger C application (i.e. heavy use of `extern "C"`). The question there is what name(s) we should use for `target_arch`, as the "cabi" implementation [varies according to that parameter](https://github.com/rust-lang/rust/blob/1.13.0/src/librustc_trans/abi.rs#L498-L523).
AFAICT, SPARCv8 is a 32-bit architecture and SPARCv9 is a 64-bit architecture. LLVM uses `sparc`, `sparcv9` and `sparcel` for the architecture part of the triple (see `include/llvm/ADT/Triple.h`), so perhaps we should use `target_arch = "sparc"` (32-bit) and `target_arch = "sparcv9"` (64-bit) as well.
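For illustration, the kind of code the "cabi" work has to get right is a plain C-ABI boundary like the one below (a sketch, not code from this PR; the function is hypothetical):

``` rust
// Rust side, compiled for the SPARC target and linked into the larger C
// application. `#[no_mangle]` + `extern "C"` make it callable from C, and the
// "cabi" code decides how the arguments and return value are passed.
#[no_mangle]
pub extern "C" fn average(samples: *const i16, len: usize) -> i32 {
    let mut sum: i32 = 0;
    for i in 0..len {
        sum += unsafe { *samples.offset(i as isize) } as i32;
    }
    if len == 0 { 0 } else { sum / len as i32 }
}

// C side, for reference:
//
//     int32_t average(const int16_t *samples, size_t len);
```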
r? @alexcrichton This PR only enables this LLVM backend when rustbuild is used. Do I also need to implement this for the old Makefile-based build system? Or are all our nightlies now being generated using rustbuild?
cc @brson
Add new #[target_feature = "..."] attribute.
This commit adds a new attribute that instructs the compiler to emit
target specific code for a single function. For example, the following
function is permitted to use instructions that are part of SSE 4.2:
#[target_feature = "+sse4.2"]
fn foo() { ... }
In particular, use of this attribute does not require setting the
-C target-feature or -C target-cpu options on rustc.
This attribute does not have any protections built into it. For example,
nothing stops one from calling the above `foo` function on hosts without
SSE 4.2 support. Doing so may result in a SIGILL.
I've also expanded the x86 target feature whitelist.
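As a slightly fuller usage sketch (hypothetical code, not from this commit; it uses the attribute syntax introduced here and may additionally require a nightly feature gate):

``` rust
// `foo` is allowed to use SSE 4.2 instructions even though the rest of the
// crate is compiled with the default x86_64 feature set.
#[target_feature = "+sse4.2"]
fn foo(haystack: &[u8], needle: &[u8]) -> bool {
    // LLVM may now pick SSE 4.2 instructions when compiling this body.
    haystack.windows(needle.len()).any(|w| w == needle)
}

fn main() {
    // Nothing checks the host CPU here: calling `foo` on a machine without
    // SSE 4.2 support may raise SIGILL.
    assert!(foo(b"hello world", b"world"));
}
```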
In LLVM 4.0, this enum becomes an actual type-safe enum, which breaks
all of the interfaces. Introduce our own copy of the bitflags that we
can then safely convert to the LLVM one.
The expanded x86 target feature whitelist includes lzcnt, popcnt and
sse4a. Namely, lzcnt and popcnt have their own CPUID bits, but were
introduced with SSE4.
[LLVM 4.0] Use llvm::Attribute APIs instead of "raw value" APIs
The latter will be removed in LLVM 4.0 (see 4a6fc8bacf).
The librustc_llvm API remains mostly unchanged, except that llvm::Attribute is no longer a bitflag but represents only a *single* attribute.
The ability to store many attributes in a small number of bits and modify them without interacting with LLVM is only used in rustc_trans::abi and closely related modules, and only attributes for function arguments are considered there.
Thus rustc_trans::abi now has its own bit-packed representation of argument attributes, which are translated to rustc_llvm::Attribute when applying the attributes.
cc #37609
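To make the design concrete, here's a much-simplified, hypothetical sketch of the idea (the type and method names below are made up, not the actual rustc_trans / rustc_llvm items): keep a cheap bit-packed set on the rustc side and only expand it into individual attributes at the point where they are applied.

``` rust
// Hypothetical single-attribute enum, standing in for rustc_llvm::Attribute.
#[derive(Clone, Copy, Debug)]
enum Attribute {
    NoAlias = 0,
    NonNull = 1,
    ReadOnly = 2,
}

// Hypothetical bit-packed set of argument attributes, standing in for the
// representation kept in rustc_trans::abi. Copying and modifying it never
// touches LLVM.
#[derive(Clone, Copy, Default)]
struct ArgAttributes {
    bits: u16,
}

impl ArgAttributes {
    fn set(mut self, attr: Attribute) -> ArgAttributes {
        self.bits |= 1 << attr as u16;
        self
    }

    fn contains(self, attr: Attribute) -> bool {
        self.bits & (1 << attr as u16) != 0
    }

    // Translation boundary: walk the set bits and hand each one, as a single
    // `Attribute`, to whatever applies it to the LLVM argument.
    fn apply<F: FnMut(Attribute)>(self, mut apply_one: F) {
        for &attr in &[Attribute::NoAlias, Attribute::NonNull, Attribute::ReadOnly] {
            if self.contains(attr) {
                apply_one(attr);
            }
        }
    }
}

fn main() {
    let attrs = ArgAttributes::default()
        .set(Attribute::NoAlias)
        .set(Attribute::NonNull);
    attrs.apply(|attr| println!("apply {:?} to the LLVM argument", attr));
}
```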
This adds a new MSP430 target to let people experiment with it out of tree.
The MSP430 architecture is used in 16-bit microcontrollers commonly used
in Digital Signal Processing applications.
A new target, `s390x-unknown-linux-gnu`, has been added to the compiler
and can be used to build no_core/no_std Rust programs.
Known limitations:
- librustc_trans/cabi_s390x.rs is missing. This means no support for
`extern "C" fn`.
- No support for this arch in libc. This means std can't be cross compiled
  for this target.
Compute `target_feature` from LLVM
This is a work-in-progress fix for #31662.
The logic that computes the target features from the command line has been replaced with queries to the `TargetMachine`.
When reusing a definition across codegen units, we obviously cannot use
internal linkage, but using external linkage means that we can end up
with multiple conflicting definitions of a single symbol across
multiple crates. Since the definitions should all be semantically
equal, we can use weak_odr linkage to resolve the situation.
Fixes #32518
We use a 64-bit integer to pass the set of attributes that is to be
removed, but the called C function expects a 32-bit integer. On most
platforms this doesn't cause any problems other than being unable to
unset some attributes, but on ARM even the lower 32 bits aren't handled
correctly, because the 64-bit value is passed in different registers, so
the C function actually sees random garbage.
So we need to fix the relevant functions to use 32-bit integers instead.
Additionally, we need an implementation that actually accepts 64-bit
integers, because some attributes can only be unset that way.
Fixes #32360
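To illustrate the kind of mismatch being described (the function below is hypothetical, not an actual rustc_llvm / LLVM-C declaration):

``` rust
use std::os::raw::{c_uint, c_void};

extern "C" {
    // Hypothetical C function: `void example_remove_attrs(void *fn, unsigned attrs);`
    // (`unsigned` is 32 bits on every platform rustc supports).
    //
    // WRONG: declaring `attrs` as `u64` compiles, but the call then follows
    // the 64-bit argument-passing rules. On ARM the value lands in a
    // different register pair than the one the callee reads, so it sees garbage.
    #[link_name = "example_remove_attrs"]
    fn example_remove_attrs_wrong(func: *mut c_void, attrs: u64);

    // FIXED: the Rust declaration matches the C prototype exactly.
    #[link_name = "example_remove_attrs"]
    fn example_remove_attrs_fixed(func: *mut c_void, attrs: c_uint);
}
```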
`fast`, a.k.a. UnsafeAlgebra, is the flag for enabling all "unsafe"
(according to LLVM) float optimizations.
See the LangRef for more information: http://llvm.org/docs/LangRef.html#fast-math-flags
Providing these operations with less precise associativity rules (for
example) is useful to numerical applications.
For example, the summation loop:

    let mut sum = 0.;
    for element in data {
        sum += *element;
    }

Using the default floating point semantics, this loop expresses that the
floats must be added in sequence, one after another. This constraint
is usually completely unintended, and it means that no autovectorization
is possible.
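For example, assuming the fast variants end up exposed as intrinsics (e.g. `fadd_fast` behind the unstable `core_intrinsics` feature gate), the loop above could opt in explicitly; this is only a sketch:

``` rust
#![feature(core_intrinsics)]

use std::intrinsics::fadd_fast;

fn sum_fast(data: &[f64]) -> f64 {
    let mut sum = 0.;
    for element in data {
        // Unlike `+`, this addition carries the `fast` flag, so LLVM may
        // reassociate the operations and autovectorize the loop.
        sum = unsafe { fadd_fast(sum, *element) };
    }
    sum
}

fn main() {
    println!("{}", sum_fast(&[1.0, 2.0, 3.0, 4.0]));
}
```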
Hopefully the author caught all the cases. For the mir_dynamic_drops_3 test case, the ratio of
memsets to other instructions is 12%. On the other hand, we no longer double drop in MIR, at
least for the test cases provided.
Have all Cargo-built crates pass `--cfg cargobuild` and then add appropriate
`#[cfg]` definitions to all crates to avoid linking anything if this cfg is passed.
This should help allow libstd to compile with both the makefiles and Cargo.
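A hedged sketch of the pattern (hypothetical crate snippet, not an actual diff from this change):

``` rust
// When the makefiles drive the build, link the native library directly from
// the source. When Cargo drives it (`--cfg cargobuild`), the build script
// emits `cargo:rustc-link-lib=...` instead, so this block is compiled out and
// nothing is linked from here.
#[cfg(not(cargobuild))]
#[link(name = "examplenative")] // hypothetical native library name
extern "C" {}
```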