The conversation about ethical artificial intelligence in targeting has matured from abstract principle-setting to hard questions about verification, test and evaluation, and operational governance. As of mid-July 2024, the Department of Defense continues to rely on a small set of cross-cutting policy instruments rather than a single targeting-specific rulebook. The five DoD AI Ethical Principles remain the anchor for decisions about the use of algorithmic tools in targeting: Responsible, Equitable, Traceable, Reliable, and Governable.
Those principles were operationalized into the Department’s Responsible Artificial Intelligence Strategy and Implementation Pathway in June 2022, which defines lines of effort for governance, warfighter trust, acquisition lifecycle integration, requirements validation, ecosystem development, and workforce. The pathway sets expectations for documentation, lifecycle assurance, and risk management, but it does not substitute for targeting-specific doctrine or a dedicated, publicly available targeting assurance standard.
On autonomy in weapon systems, the Department updated DoD Directive 3000.09 on January 25, 2023. That directive continues to require that commanders and operators be able to exercise appropriate levels of human judgment over the use of force, and it imposes senior-level review and testing requirements for systems that perform autonomous target selection or engagement. In practice, this means the Department treats autonomy and algorithmic assistance in targeting as high risk, subject to extra scrutiny before development and fielding.
Despite these foundations, there is no single, public DoD issuance narrowly and exhaustively describing how algorithmic outputs may be used as inputs to targeting decisions in the tactical kill chain. What exists is a layered governance model: ethical principles, the RAI implementation pathway for lifecycle practices, and DoDD 3000.09 for autonomy in weapons. Recent practical steps toward operational assurance include crowdsourced red teaming and industry engagement to surface failure modes in models and data pipelines. For example, the Chief Digital and Artificial Intelligence Office (CDAO) launched an AI Bias Bounty for large language models in January 2024 to discover unknown risks and bias vectors that could affect downstream applications.
What this architecture buys and what it leaves exposed
- Strengths. The DoD has durable, public ethical principles and an implementation pathway that require lifecycle attention to reliability and traceability. The autonomy directive forces senior review for systems that shift discretion to machines. These elements together create a framework that ties technical assurance to policy-level accountability.
- Gaps. The principal shortfall is operational specificity for the targeting context. Targeting is a low-latency, high-consequence function with legal obligations under the law of armed conflict that hinge on discrimination, proportionality, and military necessity. The RAI pathway and DoDD 3000.09 require assurance and human judgment, but they do not provide a technical specification for when an AI-derived recommendation may be acted upon, what statistical performance guarantees are required under what environmental conditions, or how to instrument and audit human-machine decisions in contested networks.
Four technical priorities the DoD must deliver to make ethical AI meaningful in targeting
1) Test and evaluation baselines tied to operational risk categories. A targeting assurance standard must define scenario-based performance thresholds, environmental covariates, and failure tolerances. Those baselines should be expressed as conditional metrics, for example: precision and recall for object classification given sensor degradations; bounded false positive rates under spoofing; latency distributions under contested comms. These are not one-off tests; they must feed continuous monitoring after fielding through telemetry and anomaly detection. The RAI pathway commits the Department to lifecycle assurance but does not specify the concrete baselines that make a targeting recommendation legally and ethically actionable.
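To make "conditional metrics" concrete, the sketch below stratifies detection results by an environmental covariate and checks each stratum against a per-condition performance floor. It is a minimal illustration in Python; the record fields, condition labels, and threshold values are assumptions for demonstration, not values drawn from any published DoD baseline.

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class EvalRecord:
    """One scored detection from a test run (illustrative fields)."""
    condition: str    # environmental covariate, e.g. "clear", "fog", "gps_spoofing"
    predicted: bool   # model flagged an object of interest
    actual: bool      # ground truth from the test range

# Hypothetical per-condition floors; real values would come from a published
# targeting assurance standard, which does not yet exist.
BASELINES = {
    "clear":        {"precision": 0.95, "recall": 0.90},
    "fog":          {"precision": 0.90, "recall": 0.80},
    "gps_spoofing": {"precision": 0.85, "recall": 0.70},
}

def conditional_metrics(records):
    """Compute precision and recall stratified by environmental condition."""
    tallies = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for r in records:
        t = tallies[r.condition]
        if r.predicted and r.actual:
            t["tp"] += 1
        elif r.predicted and not r.actual:
            t["fp"] += 1
        elif not r.predicted and r.actual:
            t["fn"] += 1

    results = {}
    for cond, t in tallies.items():
        precision = t["tp"] / (t["tp"] + t["fp"]) if (t["tp"] + t["fp"]) else 0.0
        recall = t["tp"] / (t["tp"] + t["fn"]) if (t["tp"] + t["fn"]) else 0.0
        floor = BASELINES.get(cond, {})
        results[cond] = {
            "precision": precision,
            "recall": recall,
            "meets_baseline": precision >= floor.get("precision", 1.0)
                              and recall >= floor.get("recall", 1.0),
        }
    return results
```

The same stratified computation, run continuously on fielded telemetry rather than on test-range data, is what turns one-off certification into the post-fielding monitoring the paragraph above calls for.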
2) Mandatory provenance, explainability, and audit logs for the entire data and model pipeline. Traceability is a DoD principle. In a targeting context, that principle must map to machine-readable provenance records that capture sensor lineage, model version, confidence intervals, decision thresholds, and the human action taken at each node. Audit logs must be tamper-resistant and must survive loss of connectivity so that after-action reviews can reconstruct the human-machine decision chain.
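One way to make such records tamper-evident is to chain entries by hash, so that any retroactive edit invalidates every later entry even when the log is kept locally during a connectivity outage. The sketch below assumes JSON-serializable entries; the field names are illustrative and do not reflect any published DoD schema.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass
class ProvenanceRecord:
    """Illustrative audit-log entry; fields are assumptions, not a DoD schema."""
    timestamp_utc: str
    sensor_lineage: list[str]   # e.g. ["EO-sensor-12", "fusion-node-3"]
    model_version: str
    confidence: float
    decision_threshold: float
    human_action: str           # e.g. "approved", "vetoed", "deferred"
    prev_hash: str              # hash of the previous entry, chaining the log

    def entry_hash(self) -> str:
        """Hash the canonical JSON form, making retroactive edits detectable."""
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

def append_entry(log: list[ProvenanceRecord], entry: ProvenanceRecord) -> None:
    """Link the new entry to the tail of the log before appending."""
    entry.prev_hash = log[-1].entry_hash() if log else "genesis"
    log.append(entry)

def verify_chain(log: list[ProvenanceRecord]) -> bool:
    """Recompute the hash chain; tampering with an earlier entry breaks it."""
    expected = "genesis"
    for entry in log:
        if entry.prev_hash != expected:
            return False
        expected = entry.entry_hash()
    return True
```

An after-action review would run `verify_chain` over the recovered log before trusting its reconstruction of who saw what, and when.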
3) Human-machine interface prescriptions tied to cognitive workload and tempo. “Appropriate levels of human judgment” is a policy phrase; translate it into human factors requirements. Which human role holds veto authority, which monitors model health, and which authorizes lethal outcomes must be mapped to doctrine, training standards, and measurable reaction-time windows. For time-critical intercepts, exceptions may be necessary, but those exceptions must be formally approved and constrained by pre-authorized rules and automated abort criteria. DoDD 3000.09 frames senior review for autonomy, but the Department must further codify operator roles and decision authority in doctrine.
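As an illustration of how “pre-authorized rules and automated abort criteria” might be encoded, the sketch below maps hypothetical engagement classes to a required role, a reaction-time window, and a timeout behavior. The role names, engagement classes, and time budgets are placeholders chosen for demonstration, not doctrinal values.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Role(Enum):
    """Hypothetical operator roles; actual role names would come from doctrine."""
    ENGAGEMENT_AUTHORITY = "engagement_authority"  # authorizes lethal outcomes
    VETO_OPERATOR = "veto_operator"                # can abort a recommendation
    MODEL_MONITOR = "model_monitor"                # watches model-health telemetry

@dataclass
class DecisionWindow:
    """Time budget for a human decision, with an automated fallback."""
    required_role: Role
    max_response_seconds: float   # measurable reaction-time window
    abort_on_timeout: bool        # if True, the system aborts rather than proceeds

# Illustrative mapping of engagement classes to decision windows.
DECISION_POLICY = {
    "deliberate_strike":       DecisionWindow(Role.ENGAGEMENT_AUTHORITY, 300.0, True),
    "time_critical_intercept": DecisionWindow(Role.VETO_OPERATOR, 8.0, True),
}

def resolve_action(engagement_class: str,
                   human_response: Optional[str],
                   elapsed_seconds: float) -> str:
    """Apply the pre-authorized rule: proceed only on explicit, timely approval."""
    window = DECISION_POLICY[engagement_class]
    if elapsed_seconds > window.max_response_seconds or human_response is None:
        return "abort" if window.abort_on_timeout else "escalate"
    return "proceed" if human_response == "approve" else "abort"
```

Encoding the policy this way makes the reaction-time window and the abort behavior testable artifacts rather than paragraphs of doctrine, which is the point of translating “appropriate human judgment” into human factors requirements.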
4) Red teaming, continuous operational validation, and a public catalog of known failure modes. The Department has begun crowdsourced bias bounties and similar assurance experiments. Scale that approach into an institutional program of independent testing with public summaries of generic failure modes. Public transparency about high-level risks supports allied interoperability and helps industry harden supply chains, while preserving classified specifics. The May 22, 2024 DoD request for public comment on enabling the Defense Industrial Base to adopt AI shows that the Department is seeking industry input on supply chain, workforce, and acquisition barriers. That RFI could also be used to solicit standards for assurance testing and responsible deployment practices.
Practical recommendations for immediate policy action
- Publish an unclassified targeting assurance baseline document that maps performance thresholds to operational conditions and defines required telemetry fields for audit. This does not reveal tactics. It gives engineers and contractors concrete targets to design toward.
- Require model and dataset manifests during acquisition. Contracts should mandate provenance artifacts and independent verification reports before any fielding decision; a sketch of what such a manifest might contain follows this list.
- Fund joint DoD-industry test and evaluation, verification and validation (TEVV) toolchains. A shared test harness, including adversarial scenario libraries and contested-comms injects, will accelerate certification while reducing redundant effort across the services.
- Formalize red team exercises and bounty programs as recurring inputs into acquisition milestones, not as ad hoc experimentation. The CDAO bias bounty is a model that should be expanded into domain-specific exercises for sensors, computer vision pipelines, and fusion stacks.
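As a sketch of the manifest recommendation above, the following shows one possible shape for model and dataset provenance artifacts delivered as contract data. The field names are assumptions chosen for illustration; an actual manifest format would be specified in acquisition guidance rather than by this example.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetManifest:
    """Illustrative dataset provenance artifact; not a mandated format."""
    name: str
    version: str
    collection_sources: list[str]   # sensors, test ranges, or synthetic generators
    collection_dates: str           # e.g. "2023-01 to 2023-09"
    labeling_method: str            # e.g. "dual-annotator with adjudication"
    known_gaps: list[str] = field(default_factory=list)  # e.g. "no night maritime imagery"

@dataclass
class ModelManifest:
    """Illustrative model provenance artifact attached to a fielding decision package."""
    model_name: str
    model_version: str
    training_datasets: list[DatasetManifest]
    evaluation_report_uri: str      # pointer to the independent verification report
    intended_conditions: list[str]  # conditions the T&E baselines actually covered
    excluded_conditions: list[str]  # conditions the model was not validated for
```

Requiring artifacts of roughly this shape as contract deliverables gives acquisition officials something concrete to verify before fielding, and gives after-action reviewers the lineage information the audit-log recommendation depends on.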
Concluding assessment
By July 16, 2024, the DoD had built the ethical scaffolding and taken initial practical steps to stress-test AI, but the Department had not yet closed the gap between principle and a targeting-specific assurance standard that a program manager or commander can rely on when lives and legal obligations are at stake. The policy architecture is promising. The test will be whether the DoD moves from describing “appropriate levels of human judgment” and lifecycle assurance to publishing the technical baselines, audit primitives, and human factors prescriptions that convert ethical commitments into verifiable practice. If those elements appear in upcoming guidance, they will materially reduce risk in the field and make compliance auditable without slowing necessary innovation.