GPU Banking Fraud Analytics, 10M Accounts, 48M Transactions

GPU-accelerated fraud and influence analytics on a 10M-account banking network with 48M transactions: PageRank, Louvain, betweenness, triangle count, and GPU MATCH expansion.

Category: graph

Syntax

-- Force GPU on a simple count
USE external.gpu_finance_network.gpu_finance_network
ON GPU
MATCH (a)-[r]->(b)
RETURN count(r) AS total_transactions;

-- GPU PageRank
USE external.gpu_finance_network.gpu_finance_network
ON GPU
CALL algo.pageRank({dampingFactor: 0.85, iterations: 20})
YIELD node_id, score, rank
RETURN node_id, score, rank
ORDER BY score DESC LIMIT 25;

-- GPU connected components (single giant component of 10M accounts)
USE external.gpu_finance_network.gpu_finance_network
ON GPU
CALL algo.connectedComponents()
YIELD node_id, component_id
RETURN component_id, count(*) AS community_size
ORDER BY community_size DESC LIMIT 25;

-- GPU Louvain (non-deterministic)
USE external.gpu_finance_network.gpu_finance_network
ON GPU
CALL algo.louvain({resolution: 1.0})
YIELD node_id, community_id
RETURN community_id, count(*) AS size
ORDER BY size DESC LIMIT 25;

-- Approximate betweenness via sampling (gatekeeper / mule detection)
USE external.gpu_finance_network.gpu_finance_network
ON GPU
CALL algo.betweenness({samplingSize: 1000})
YIELD node_id, centrality, rank
RETURN node_id, centrality, rank
ORDER BY centrality DESC LIMIT 25;

-- Triangle count: dense cycle detection for circular transaction rings
USE external.gpu_finance_network.gpu_finance_network
ON GPU
CALL algo.triangleCount()
YIELD node_id, triangle_count
RETURN node_id, triangle_count
ORDER BY triangle_count DESC LIMIT 25;

-- GPU MATCH filtered to cross-bank transactions
USE external.gpu_finance_network.gpu_finance_network
ON GPU
MATCH (a)-[r]->(b)
WHERE a.bank <> b.bank
RETURN a.bank AS from_bank, b.bank AS to_bank,
       count(r) AS connections,
       avg(r.weight) AS avg_strength
ORDER BY connections DESC
LIMIT 30;

-- GPU + streaming for property-heavy algorithms
USE external.gpu_finance_network.gpu_finance_network
ON GPU STREAMING CACHE 5000000
CALL algo.pageRank({dampingFactor: 0.85, iterations: 5})
YIELD node_id, score, rank
RETURN node_id, score, rank ORDER BY score DESC LIMIT 25;

-- Threshold-based fallback: this graph has 10M nodes, below the 20M threshold, so the query runs on CPU
USE external.gpu_finance_network.gpu_finance_network
ON GPU THRESHOLD 20000000
CALL algo.connectedComponents()
YIELD node_id, component_id
RETURN component_id, count(*) AS community_size
ORDER BY community_size DESC LIMIT 25;
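
The `THRESHOLD` query above falls back to CPU because 10M nodes is below the 20M threshold. The rule can be modelled in a few lines of Python. This is a minimal sketch of the documented behaviour, not engine code: `choose_backend` is a hypothetical helper, and treating a missing threshold as "always GPU" is an assumption based on the plain `ON GPU` queries.

```python
from typing import Optional


def choose_backend(node_count: int, threshold: Optional[int] = None) -> str:
    """Hypothetical model of ON GPU [THRESHOLD <N>].

    Assumption: without a threshold the GPU path is forced; with one,
    the GPU is used only if the graph has at least `threshold` nodes.
    """
    if threshold is None:
        return "gpu"
    return "gpu" if node_count >= threshold else "cpu"


ACCOUNTS = 10_000_000  # node count of the demo graph

forced = choose_backend(ACCOUNTS)                   # plain ON GPU
fallback = choose_backend(ACCOUNTS, 20_000_000)     # ON GPU THRESHOLD 20000000
accelerated = choose_backend(ACCOUNTS, 5_000_000)   # a lower, illustrative threshold

print(forced, fallback, accelerated)
```

Because the fallback is silent, a model like this is mainly useful for predicting which path a query will take before you run it.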

Description

## When to Use

Use this demo when your graph is large enough that CPU algorithms take minutes to hours and you need GPU parallelism to answer fraud, AML (anti-money-laundering), and influence questions in seconds. It is the applied fraud-detection showcase: 10,000,000 bank accounts across 30 banks and 25 cities, linked by 48,099,998 directed transactions spanning 18 transaction types (wire-transfer, card-payment, advisory, etc.). Every query uses the `ON GPU` hint to force the GPU execution path so you can verify correctness at financial-services scale. Run this on a host with 24 GB+ addressable RAM; the algorithm executors for betweenness and Louvain peak at ~10 GB resident during execution.

## What You Will Learn

1. `USE <graph> ON GPU ...` routes a Cypher query (MATCH or CALL algo.*) to the GPU kernel for that graph. The five GPU-accelerable algorithms are pageRank, connectedComponents, louvain, betweenness, and triangleCount.
2. `ON GPU STREAMING CACHE <N>` streams node properties in batches of N while the algorithm runs. This is essential when the property columns don't fit in GPU memory alongside the CSR topology.
3. `ON GPU THRESHOLD <N>` uses the GPU only if the graph has at least N nodes; below the threshold the engine silently falls back to CPU. Useful for cost control on small subgraphs.
4. GPU MATCH expansion: `ON GPU MATCH (a)-[r]->(b) WHERE ... RETURN ...` runs single-hop pattern matching on the GPU via edge scatter/gather, matching CPU results bit-for-bit on exact COUNT queries.
5. Applied fraud patterns: triangle counting detects dense transaction rings (circular money movement), betweenness centrality identifies gatekeeper accounts that sit on many shortest paths (mule candidates), and Louvain finds transaction clusters that may correspond to fraud rings.
6. Correctness verification: the golden totals (48,099,998 transactions; 5,500,000 advisory edges; single connected component of 10M accounts) are asserted after each algorithm to prove GPU output matches CPU semantics.

## Prerequisites

- A host with a CUDA-capable GPU and the DeltaForge GPU graph feature enabled.
- 24 GB+ host memory (peak ~10 GB RSS during the heaviest algorithms).
- HTTP client timeout raised to 1800s+. The first query cold-loads the CSR over 48M edges, which takes several minutes on first run; subsequent queries read the cached .dcsr sidecar in ~200 ms.
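
To make the triangle-counting fraud pattern (item 5 under What You Will Learn) concrete, here is a small CPU-side sketch on a toy graph. It is not the GPU kernel: `triangle_counts` is a hypothetical helper, the account names are made up, and treating transactions as undirected edges is an assumption about how triangles are defined.

```python
from itertools import combinations


def triangle_counts(edges):
    """Count, per node, the triangles it participates in.

    Assumption: direction is ignored, so A->B->C->A and A->B, A->C, B->C
    both count as one triangle.
    """
    adjacency = {}
    for src, dst in edges:
        if src == dst:
            continue  # self-loops cannot form triangles
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set()).add(src)

    counts = {node: 0 for node in adjacency}
    for node, neighbours in adjacency.items():
        # A triangle through `node` is a pair of its neighbours that are
        # themselves connected.
        for u, v in combinations(sorted(neighbours), 2):
            if v in adjacency[u]:
                counts[node] += 1
    return counts


# Toy data: A -> B -> C -> A is a circular ring; D and E only touch A.
transactions = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "A"), ("A", "E")]
counts = triangle_counts(transactions)
print(sorted(counts.items()))
```

The three ring members each get a count of 1 while the peripheral accounts get 0, which is the signal the `ORDER BY triangle_count DESC` query surfaces at scale.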

Pitfalls

Only five algorithms have GPU kernels: pageRank, connectedComponents, louvain, betweenness, triangleCount. `ON GPU` on any other algorithm (shortestPath, mst, knn, closeness, bfs, dfs, scc) silently falls back to CPU.
GPU connected components runs on the directed graph but requires a reverse CSR to compute weakly connected components correctly. Ensure the graph has both forward and reverse CSRs built, or the GPU CC shader will produce incorrect partitions. CPU CC computes reverse topology on the fly and doesn't have this issue.
Betweenness centrality on large graphs uses sampling (`samplingSize` parameter), not the exact Brandes algorithm. Results are approximate; ranks of the top-K gatekeepers are stable but raw centrality values vary run-to-run. Use `ASSERT WARNING` for value checks.
`algo.*` always runs on the full graph; any `relationshipTypes` / `nodeLabels` filters you pass are ignored. To run an algorithm on a subgraph, build a second graph definition over a filtered view of the edge/vertex tables.
First cold load of the CSR from 48M edges takes several minutes. Raise your HTTP client timeout (`DF_HTTP_TIMEOUT_SECS=0` or at least 1800) so the client does not abort before the topology is built. Subsequent queries read the cached .dcsr sidecar in ~200 ms.
GPU betweenness and Louvain peak at ~10 GB RSS on this 10M-node graph. On hosts with less than 24 GB, expect OOM; drop to the smaller 1M-node stress test or fall back to CPU for those algorithms.
Louvain is non-deterministic on both CPU and GPU. Assert only bounds on community counts (`ASSERT WARNING ROW_COUNT >= 2`), not specific community sizes.
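
The connected-components pitfall above is easier to see on a three-node example. The sketch below uses generic min-label propagation, which is an assumption for illustration and not a description of the actual GPU CC shader; `propagate_labels` and its `use_reverse` flag are hypothetical names. With forward edges only, labels cannot flow back from a destination to its source, so a weakly connected graph splits into several partitions.

```python
def propagate_labels(node_count, edges, use_reverse):
    """Min-label propagation over directed edges until nothing changes.

    use_reverse=False models having only the forward topology;
    use_reverse=True models having forward and reverse topology available.
    """
    labels = list(range(node_count))
    changed = True
    while changed:
        changed = False
        for src, dst in edges:
            # Forward direction: the source's label flows to the destination.
            if labels[src] < labels[dst]:
                labels[dst] = labels[src]
                changed = True
            # Reverse direction: the destination's label flows back.
            if use_reverse and labels[dst] < labels[src]:
                labels[src] = labels[dst]
                changed = True
    return labels


# Node 2 pays nodes 0 and 1, so all three are weakly connected.
edges = [(2, 0), (2, 1)]

forward_only = propagate_labels(3, edges, use_reverse=False)
with_reverse = propagate_labels(3, edges, use_reverse=True)

print(len(set(forward_only)), "partitions with forward edges only")
print(len(set(with_reverse)), "partition with forward and reverse edges")
```

Forward-only propagation leaves three partitions where there should be one, which is the kind of incorrect result the golden "single connected component" check is designed to catch.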
