E-2D Large Language Model Entity (ELLMENT)

Navy Phase I SBIR Topic: DON26BZ01-NV010
Naval Air Systems Command (NAVAIR)
Pre-release 4/13/26   Opens to accept proposals 5/6/26   Closes 6/3/26 12:00pm ET

DON26BZ01-NV010; TITLE: E-2D Large Language Model Entity (ELLMENT)

COMPONENT TECHNOLOGY PRIORITY AREA(S): Advanced Computing and Software; Trusted AI and Autonomy

PROJECTED CMMC LEVEL REQUIREMENT: Level 2 (Self)

The technology within this topic is restricted under the International Traffic in Arms Regulation (ITAR), 22 CFR Parts 120-130, which controls the export and import of defense-related material and services, including export of sensitive technical data, or the Export Administration Regulation (EAR), 15 CFR Parts 730-774, which controls dual use items. Offerors must disclose any proposed use of foreign nationals (FNs), their country(ies) of origin, the type of visa or work permit possessed, and the statement of work (SOW) tasks intended for accomplishment by the FN(s) in accordance with the Announcement. Offerors are advised foreign nationals proposed to perform on this topic may be restricted due to the technical data under US Export Control Laws.

OBJECTIVE: Develop and implement a traceable, explainable, referenced, and reasoned Large Language Model (LLM) that functions as an on-demand Natural Language Processing (NLP) decision-support assistant for Naval Flight Officers (NFOs) and mission crew aboard a carrier-based, all-weather, tactical battle management, airborne early warning, and command and control aircraft.

DESCRIPTION: Artificial Intelligence/Machine Learning (AI/ML) technologies are transforming how complex data is understood and acted upon in operational environments. This SBIR topic seeks to explore the development of a domain-specific LLM system to support rapid insight generation from structured and unstructured documents (e.g., Tactics, Techniques, and Procedures [TTPs], mission logs, communications, and other high-volume data sources relevant to tactical operations).

The goal is to deliver a modular, self-contained AI/NLP solution that can assist NFOs and mission crew by summarizing, reasoning over, and extracting meaning from dense operational material in real time. This LLM must be specifically designed to operate in a stand-alone configuration in accordance with information assurance policies, with mechanisms for traceability (where the information came from and how it connects to the goal), source attribution, and model transparency. The system must also support future extensibility to multi-modal data ingestion.
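As an illustration only, the traceability and source-attribution mechanisms described above could be captured in a per-response provenance record. The sketch below is a minimal example under stated assumptions: the class names `SourceRef` and `TracedAnswer`, the field names, and the JSON layout are hypothetical and are not prescribed by this topic.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class SourceRef:
    """One cited passage (hypothetical schema, not a Government format)."""
    doc_id: str      # identifier of the source document
    passage: str     # text the answer relied on (where the information came from)
    rationale: str   # how the passage connects to the operator's goal


@dataclass
class TracedAnswer:
    """An answer bundled with the sources that support it."""
    question: str
    answer: str
    model_id: str
    sources: list[SourceRef] = field(default_factory=list)

    def manifest(self) -> str:
        """Return a JSON provenance manifest with a hash of each cited passage."""
        record = asdict(self)  # recursively converts nested dataclasses to dicts
        for src in record["sources"]:
            digest = hashlib.sha256(src["passage"].encode("utf-8")).hexdigest()
            src["sha256"] = digest  # lets an auditor verify the passage was not altered
        return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Placeholder content invented for this example.
    example = TracedAnswer(
        question="What does the checklist say about handoffs?",
        answer="Log the time and frequency of each handoff.",
        model_id="local-model-0",
        sources=[SourceRef("DOC-003", "Log every handoff with time and frequency.",
                           "Directly states the logging requirement asked about.")],
    )
    print(example.manifest())
```

Hashing each cited passage is one design choice among several; it allows a later audit to confirm that the referenced text matches what the model was shown.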

Work produced in Phase II may become classified. Note: The prospective contractor(s) must be U.S. owned and operated with no foreign influence as defined by 32 C.F.R. § 2004.20 et seq., National Industrial Security Program Executive Agent and Operating Manual, unless acceptable mitigating procedures can and have been implemented and approved by the Defense Counterintelligence and Security Agency (DCSA), formerly the Defense Security Service (DSS). The selected contractor must be able to acquire and maintain a secret-level facility and Personnel Security Clearances. This will allow contractor personnel to perform on advanced phases of this project as set forth by DCSA and NAVAIR in order to gain access to classified information pertaining to the national defense of the United States and its allies; this will be an inherent requirement. The selected company will be required to safeguard classified material during the advanced phases of this contract in accordance with the National Industrial Security Program Operating Manual (NISPOM), which can be found at Title 32, Part 2004.20 of the Code of Federal Regulations.

PHASE I: Define and develop the foundational architecture and baseline capability for implementing Large Language Model Operations (LLMOps) in support of mission decision-aid tools for the E-2D platform, as outlined by Gallagher et al. [Ref 2].

1. Security, Ethics, and Data Governance Planning

• The small business will collaborate with relevant Navy civilian representatives—such as TPOCs and PMA-231 S&T leads—to:

• Establish appropriate data classification levels for training and deployment environments

• Define a cybersecurity framework aligned with DOW and platform-specific requirements

• Incorporate an ethical AI governance structure, including bias mitigation and auditability provisions

2. LLM Selection and Mission Alignment

• An appropriate LLM architecture will be selected based on mission-specific demands of the aircraft operator, with consideration for:

• Performance in tactical and technical language domains

• Model transparency and explainability

• Compatibility with in-theater deployment constraints

3. Corpus Curation and Model Training

• The selected LLM will be trained on an aircraft-relevant corpus, including, but not limited to, mission-specific Tactics, Techniques, and Procedures (TTPs), doctrine documents, and communication logs. Training methodologies will include:

• Prompt engineering

• Fine-tuning with Navy-specific linguistic patterns and use cases

• Retrieval-Augmented Generation (RAG) to support on-demand referencing of large knowledge bases
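For illustration, the retrieval step of a Retrieval-Augmented Generation (RAG) pipeline with on-demand referencing can be sketched as follows. This is a minimal sketch under stated assumptions, not a required design: it uses a simple bag-of-words cosine similarity in place of a learned embedding model, the passages in `CORPUS` are invented surrogate text (not real doctrine), and the function names `retrieve` and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter

# Invented surrogate passages keyed by a hypothetical document identifier.
CORPUS = {
    "DOC-001": "Check fuel state before extending the surveillance orbit.",
    "DOC-002": "Report radar contacts to the controlling agency using the standard format.",
    "DOC-003": "Log every communication handoff with time and frequency.",
}


def tokens(text: str) -> list[str]:
    """Lowercase alphanumeric tokens (a stand-in for a real tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[tuple[str, float]]:
    """Return up to k (doc_id, score) pairs with non-zero similarity, best first."""
    q = Counter(tokens(query))
    scored = [(cosine(q, Counter(tokens(text))), doc_id) for doc_id, text in corpus.items()]
    scored.sort(reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k] if score > 0]


def build_prompt(query: str, corpus: dict[str, str], k: int = 2) -> str:
    """Assemble a prompt whose context lines carry their source identifiers."""
    hits = retrieve(query, corpus, k)
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id, _ in hits)
    return f"Answer using only the cited passages.\n{context}\nQuestion: {query}"


if __name__ == "__main__":
    print(build_prompt("How should radar contacts be reported?", CORPUS))
```

Because each context line keeps its document identifier, the generated answer can cite the passages it relied on. An empty retrieval result can also serve as a signal to abstain rather than answer.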

4. Evaluation and Output Validation

• Model performance will be assessed using a comprehensive metrics suite, as recommended by Díaz-de-Arcaya et al. (2024) and Gallagher et al. (2023), including:

• Response accuracy and relevance

• Appropriateness and alignment with operational context

• Bias detection and mitigation

• Trustworthiness

• Independent Subject Matter Expert (SME) evaluation

5. Deployment Pathways and Phase II Readiness

• As part of final Phase I efforts, the small business will:

• Evaluate and down-select hardware and software deployment options (e.g., computer architecture, human-machine interface designs)

• Develop a baseline implementation roadmap for transitioning to Phase II prototype construction and TRL advancement

PHASE II: The developed LLM will be deployed to a stand-alone laboratory environment for rigorous evaluation in an Operator-in-the-Loop (OITL) configuration. In this setup, NFOs and mission operators will engage with the LLM across representative command and control mission scenarios to assess its efficacy as a real-time natural language decision-support system.

Subject Matter Experts (SMEs) in specific operations and AI/ML will conduct structured evaluations using predefined metrics identified in Phase I, including response accuracy, contextual relevance, trustworthiness, and bias sensitivity. Iterative testing cycles will drive continuous refinement of the model’s behavior and performance, where performance refers to how the system’s suggestions compare to SME suggestions.

To support future scale-up, candidate computing architectures will be assessed, including emerging platforms such as quantum-accelerated processing (e.g., D-Wave). These evaluations will focus on increasing operational capacity, expanding conversational memory (buffer length), and handling larger mission datasets in constrained compute environments.
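To make the notion of conversational memory (buffer length) concrete, the sketch below shows one simple way a fixed token budget could bound the dialogue history passed to a model. It is a minimal sketch under stated assumptions: tokens are approximated by whitespace-separated words, and the class name `ConversationBuffer` is hypothetical.

```python
from collections import deque


class ConversationBuffer:
    """Keep the most recent turns that fit within a fixed token budget."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens
        self.turns: deque[tuple[str, str, int]] = deque()
        self.used = 0

    @staticmethod
    def count_tokens(text: str) -> int:
        """Approximate token count by whitespace splitting (an assumption)."""
        return len(text.split())

    def add(self, role: str, text: str) -> None:
        """Append a turn, then evict the oldest turns while over budget."""
        n = self.count_tokens(text)
        self.turns.append((role, text, n))
        self.used += n
        # Always keep at least the newest turn, even if it alone exceeds the budget.
        while self.used > self.max_tokens and len(self.turns) > 1:
            _, _, dropped = self.turns.popleft()
            self.used -= dropped

    def context(self) -> str:
        """Render the retained turns as prompt context."""
        return "\n".join(f"{role}: {text}" for role, text, _ in self.turns)


if __name__ == "__main__":
    buf = ConversationBuffer(max_tokens=5)
    buf.add("operator", "one two three")
    buf.add("assistant", "four five six")
    print(buf.context())
```

A larger `max_tokens` value retains more history at the cost of more memory and compute per request, which is the trade-off a constrained compute environment would need to weigh.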

A lifecycle monitoring framework will also be established to operationalize the LLMOps strategy introduced in Phase I. This includes procedures for tracking long-term model performance, retraining triggers, audit logs, output traceability, and alignment with evolving mission requirements.
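As a rough illustration of such a framework, the sketch below records an audit-log entry for each evaluated response and raises a retraining flag when the rolling mean of recent scores falls below a threshold. The window size, the threshold value, the log fields, and the class name `LifecycleMonitor` are all assumptions made for this example.

```python
import json
from collections import deque
from datetime import datetime, timezone


class LifecycleMonitor:
    """Track recent evaluation scores, keep an audit log, and flag retraining."""

    def __init__(self, window: int = 5, threshold: float = 0.7) -> None:
        self.scores: deque[float] = deque(maxlen=window)  # rolling window of scores
        self.threshold = threshold
        self.audit_log: list[str] = []  # JSON lines; a real system would persist these

    def needs_retraining(self) -> bool:
        """True once the window is full and its mean score is below the threshold."""
        full = len(self.scores) == self.scores.maxlen
        return full and sum(self.scores) / len(self.scores) < self.threshold

    def record(self, query_id: str, source_ids: list[str], score: float) -> dict:
        """Log one evaluated response together with the sources it cited."""
        self.scores.append(score)
        entry = {
            "time": datetime.now(timezone.utc).isoformat(),
            "query_id": query_id,
            "sources": list(source_ids),  # supports output traceability
            "score": score,
            "retrain": self.needs_retraining(),
        }
        self.audit_log.append(json.dumps(entry, sort_keys=True))
        return entry


if __name__ == "__main__":
    monitor = LifecycleMonitor(window=3, threshold=0.7)
    for i, s in enumerate([0.9, 0.6, 0.5]):
        print(monitor.record(f"q{i}", ["DOC-002"], s)["retrain"])
```

Here the score could be, for example, an SME agreement rating; the trigger rule is deliberately simple and would likely be replaced by criteria agreed upon during Phase I.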

Work in Phase II may become classified. Please see note in the Description section.

PHASE III DUAL USE APPLICATIONS: Upon successful completion of final verification and validation (V&V) testing, the developed system will be authorized for transition to designated operational platforms and associated industry partners, in alignment with established Navy acquisition and technology transition procedures.

In parallel, the capability has garnered interest from additional mission-critical stakeholders—specifically ONR Code 32, in connection with Anti-Submarine Warfare (ASW) mission domains. This cross-domain interest highlights the system’s adaptability and potential for broader operational utility beyond its original use case, further enhancing the value and return on investment for the Department of the Navy.

The development and refinement of such an LLM pushes the boundaries of AI and NLP, contributing to the overall advancement of these technologies. The need for traceable, referenced data management promotes innovation in data governance, lineage tracking, and knowledge management, which are valuable for private sector organizations dealing with large datasets.

Examples of Dual-Use Applications include:

• Predictive Maintenance: Predicting equipment failures and optimizing maintenance schedules

• Supply Chain Optimization: Optimizing supply chain logistics

• Threat Detection: Identifying and responding to cyber threats

• Security Auditing: Automating security audits

Overall, this approach has a more focused and specialized domain than current commercial applications. While LLMs such as Gemini and ChatGPT focus on a cloud-based approach, the proposed LLM suggests that a targeted local-network approach can be a forward-looking design for specific problems. Some examples of alternatives to cloud-based approaches could include neuromorphic computing, local LLMs, LLMs on edge devices, and Small Language Models (SLMs).

REFERENCES:

  1. Díaz-de-Arcaya, J.; López-de-Armentia, J.; Miñón, R.; Ojanguren, I.L. and Torre-Bastida, A.I. "Large language model operations (LLMOps): Definition, challenges, and lifecycle management." 2024 9th International Conference on Smart and Sustainable Technologies (SpliTech), Bol and Split, Croatia. https://doi.org/10.23919/SpliTech61897.2024.10612341
  2. Gallagher, S.; Mellinger, A.O.; Ratchford, J.; Winski, N.; Brooks, T.; Heim, E.; VanHoudnos, N.M.; Rallapalli, S.; Nichols, W.; Brown, B.; McDowell, A. and Barmer, H. "A Retrospective in Engineering Large Language Models for National Security." Carnegie Mellon University, Software Engineering Institute, 2023. https://insights.sei.cmu.edu/library/a-retrospective-in-engineering-large-language-models-for-national-security/
  3. "SavantX Seeker." SavantX Research Center, 2024. https://www.savantx.com/seeker
  4. "National Industrial Security Program Executive Agent and Operating Manual (NISP)." 32 C.F.R. § 2004.20 et seq., 1993. https://www.ecfr.gov/current/title-32/subtitle-B/chapter-XX/part-2004

KEYWORDS: Large language model; LLMs; Natural Language Processing; NLP; Multi-modal approaches


Topic Q & A

5/18/26  Q. For the traceability and source-attribution mechanisms required by the topic, does the Government have a preferred artifact format (e.g., per-response provenance manifest, audit-log schema, model card disclosures) that Phase I should produce to demonstrate compliance with the 'transparent and auditable methodologies' criterion?
   A. There is no preference for artifact format, as long as transparency and explainability are part of the format.
5/18/26  Q. Among the Phase I evaluation criteria - response accuracy and relevance, operational-context appropriateness, bias mitigation, trustworthiness, and independent SME evaluation - does the Government weight these equally, or does any dimension take precedence when scoring Phase I feasibility outcomes against the OITL transition gate?
   A. The evaluation criteria described are acceptable. At Phase I, priority would be distributed equally across them. For Phase II, priorities might change after testing.
5/18/26  Q. For hallucination and abstention testing, should the Phase I proposal include classification-boundary or adversarial prompts as tabletop cases even if no classified or controlled Navy data is ingested?
   A. Strictly no classified information will be provided in Phase I. However, the rule on aggregation of data might still come up. For Phase I, treating data as unclassified is acceptable. Abstention testing is also an acceptable design.
5/18/26  Q. For the Phase I evaluation plan, what specific evaluation-set composition does the Government prefer - e.g., minimum question count, distribution across mission scenarios (TTPs, mission logs, comms), and proportion of adversarial vs. nominal prompts? Is an SME-validated gold-answer set the expected scoring substrate, or does the Government expect a different evaluation methodology?
   A. The methodology described is acceptable.
5/18/26  Q. Is a CLI/API prototype acceptable for Phase I demonstration, or should the Phase I scope include an operator-facing human-machine interface mockup?
   A. Phase I demonstration is sufficient.
5/14/26  Q. Are there any specific examples of "Navy-specific Linguistic Patterns" that could be used for fine-tuning?
   A. There are no specific examples of Navy-specific Linguistic Patterns; however, the use of public and available data is acceptable for fine-tuning at this point in time.
5/14/26  Q. Are there specific targeted NFO roles (e.g., Air Battle Managers, Weapons Director, Surveillance Officer, etc.) that this solution should address in Phase I?
   A. Any or all of the NFO roles specified would be sufficient for a solution in Phase I.
5/10/26  Q. Are there any specific types of multi-modal data that have already been identified that could be considered in the architecture planning?
   A. Nothing specific at this point in time.
5/10/26  Q. The topic references an aircraft-relevant corpus that may include mission-specific TTPs, doctrine documents, communication logs, mission logs, and other high-volume operational data sources. Could the Government clarify whether any Government-furnished operational datasets, communications data, mission logs, TTPs, doctrine, or tactical corpora are expected to be made available to performers during Phase I or Phase II? If such data will not be available during Phase I, should offerors assume that Phase I feasibility work may use surrogate, proxy, publicly available, internally developed, or commercially available datasets to demonstrate architecture, traceability, referencing, reasoning, and evaluation methods?
   A. Yes. It is acceptable to use surrogate, proxy, publicly available, internally developed, or commercially available datasets to demonstrate architecture, traceability, referencing, reasoning, and evaluation methods.
5/10/26  Q. The Phase I description states that the selected LLM will be "trained" on an aircraft-relevant corpus and lists prompt engineering, fine-tuning, and Retrieval-Augmented Generation as possible methodologies. Because these activities represent different levels of technical implementation, could the Government clarify whether Phase I requires modification of model weights through fine-tuning, or whether the Phase I "training" expectation may be satisfied through corpus preparation, retrieval configuration, prompt engineering, model selection, and evaluation of outputs without model-weight fine-tuning?
   A. Yes, the latter: the Phase I "training" expectation may be satisfied through corpus preparation, retrieval configuration, prompt engineering, model selection, and evaluation of outputs without model-weight fine-tuning.
5/12/26  Q. The topic lists Projected CMMC Level Requirement: Level 2 (Self). Two questions:
1. Is a CMMC Level 2 (Self) status posting (SPRS score and affirmation reflecting full or conditional implementation of NIST SP 800-171 R2) a condition of Phase I contract award, or does the projected level apply only to Phase II?

2. If conditional status is acceptable at Phase I award, what is the minimum SPRS score and affirmation posture required?
   A. The projected CMMC Level 2 (Self) applies at time of Phase I contract award.
5/11/26  Q. There are some minor discrepancies between the Technical Volume proposal instructions in the DOW 2026 SBIR BAA solicitation and the DON-provided template for the Technical Volume II (provided via URL in the DON component instructions). Specifically, the BAA requires twelve (12) items whereas the DON template includes eight (8) items. The items included in the DOW BAA that are not included in the DON template are: #5 - Relationship with Future Research or Research and Development; #8 - Foreign Citizens; #10 - Subcontractors/Consultants; #11 - Prior, Current, or Pending Support of Similar Proposals or Awards; and #12 - Identification and Assertion of Restrictions on the Government's Use, Release, or Disclosure of Technical Data or Computer Software. The DON template has one item not in the BAA Tech Volume instructions, which is 5.0 Letters of Support. The DOW BAA provides the option to provide Letters of Support in Volume 5, not counting against page limits, whereas the DON template requires them within Volume II, where they do count against page limits. To ensure Offerors are able to provide a compliant proposal, can the Government please indicate the specific requirements for Volume II for this topic?
   A. Thank you for your message. As indicated in the Navy's 26.BZ Release 1 instruction document, in the IMPORTANT box on page 1, the information provided in the DON Proposal Submission Instructions takes precedence over the DoW Instructions posted for this BAA.

To respond to this topic you will follow the Phase I Conventional Topics technical proposal template that can be downloaded from our website: https://www.navysbir.com/links_forms.htm
5/8/26  Q. Topic states the technology is restricted under ITAR (22 CFR 120-130). Is DDTC registration required at Phase I proposal submission or award, or is registration deferred to Phase II contingent on classified work and technical data handling?
   A. This will not be required at time of proposal submission. If selected for award, a Contracts Specialist will coordinate with you for further requirements.
05/02/2026  Q. Question 1: Hardware and Compute Deployment
"The solicitation specifies a 'stand-alone configuration' for the E-2D and mentions 'constrained compute environments' in Phase II. What are the specific hardware targets or Size, Weight, and Power (SWaP) benchmarks (e.g., GB of RAM, GPU availability) the Navy expects for this local deployment? Additionally, should Phase I research prioritize optimization for the quantum-accelerated processing mentioned as a potential Phase II candidate architecture?"

Question 2: Phase I Fidelity and Logic Requirements
"Phase I focuses on foundational architecture and LLMOps, while Phase II moves to Operator-in-the-Loop (OITL) evaluation. Regarding the 'fidelity' of the system in Phase I, is the priority on the model's reasoning and NLP accuracy relative to SME decision-making, or is the Navy seeking integration with high-fidelity, physics-based tactical mission simulations during the feasibility stage?"

Question 3: Mission Scenario Scope and Application
"The topic identifies the E-2D platform as the primary focus but mentions expansion into Anti-Submarine Warfare (ASW) and other mission-critical stakeholders for Phase III. For the Phase I feasibility study, is it preferable to demonstrate the LLM's reasoning across a broad spectrum of the scenarios listed (TTPs, mission logs, comms), or should the effort focus on deep optimization for a single, specific mission set like AEW Command and Control?"

Are you planning to emphasize a specific architectural approach, such as Retrieval-Augmented Generation (RAG) or Small Language Models (SLMs), in your response to the hardware constraints?
   A. Question 1: Hardware and Compute Deployment
Defining and evaluating hardware/software deployment options and tradeoffs is part of the Phase I effort. Compatibility with in-theater deployment constraints, including Size, Weight, and Power (SWaP), should be kept in mind, but there are no exact hardware or compute limits available at this time. The use of quantum-accelerated processing is an acceptable research direction as long as it prioritizes the constraints listed in Phase I. This LLM must be specifically designed to operate in a stand-alone configuration in accordance with information assurance policies, with mechanisms for traceability, source attribution, and model transparency. The system must also support future extensibility to multi-modal data ingestion.
Question 2: Phase I Fidelity and Logic Requirements
Great questions. The priority is on the model's reasoning and NLP accuracy relative to SME decision-making.
Question 3: Mission Scenario Scope and Application
It is preferable to demonstrate the LLM's reasoning across a broad spectrum of the scenarios listed (TTPs, mission logs, comms).

As long as the LLM achieves traceability and explainability, and is referenced and reasoned, no specific architectural approach is preferred. If Retrieval-Augmented Generation (RAG) or Small Language Models (SLMs) can achieve these goals, then either could be an acceptable implementation.
04/27/2026  Q. Are there existing decision-support or knowledge management tools currently in use aboard the E-2D that this LLM would need to integrate with or replace, or is this envisioned as a net-new standalone capability?
   A. The topic envisions the LLM as a net-new, stand-alone capability for the E-2D platform, designed to operate independently in accordance with information assurance policies. While future integration with mission systems is anticipated, Phase I is scoped to a stand-alone NLP capability and does not specify integration with or replacement of existing tools.
04/27/2026  Q. The topic references independent SME evaluation of model outputs. Will SMEs be made available to offerors during Phase I for iterative feedback, or will evaluation occur only at the conclusion of Phase I deliverables?
   A. The topic states that model performance will be assessed using a comprehensive metrics suite and independent SME evaluation.
There might be some collaboration with Navy civilian representatives (such as TPOCs and PMA-231 S&T leads) during Phase I. Some SME engagement and feedback may be available during Phase I with more engagement in higher level phases.
04/27/2026  Q. The topic mentions future extensibility to multi-modal data ingestion. Should Phase I address multi-modal architecture planning, or is the expectation that Phase I focuses exclusively on text-based NLP with multi-modal deferred entirely to Phase II?
   A. Phase I is focused on defining and developing the foundational architecture and baseline capability for LLMOps in support of mission decision-aid tools, with an emphasis on text-based NLP. The system must support future extensibility to multi-modal data ingestion but does not require multi-modal implementation in Phase I. Planning for extensibility may be appropriate, but the primary expectation is that Phase I deliverables focus on text-based NLP.
04/27/2026  Q. Is there a preference for open-source/open-weight LLM architectures (e.g., LLaMA, Mistral) versus commercially licensed models, given the stand-alone and security requirements? Are there any restrictions on model provenance that offerors should be aware of?
   A. There is no preference for open source versus commercially licensed LLM architecture. However, it does require that the solution be U.S. owned and operated with no foreign influence, and that it complies with information assurance and security requirements. Offerors should ensure that any model used meets these security and provenance requirements and is suitable for deployment in a classified or sensitive environment with a focus on model transparency and explainability.
04/27/2026  Q. The topic emphasizes a stand-alone configuration in accordance with information assurance policies. Are there specific hardware or compute constraints the LLM must operate within (e.g., memory limits, GPU availability, form factor restrictions), or is defining those constraints part of the Phase I effort?
   A. Defining and evaluating hardware/software deployment options and tradeoffs is part of the Phase I effort. Compatibility with in-theater deployment constraints, including Size, Weight, and Power (SWaP), should be kept in mind, but there are no exact hardware or compute limits available at this time.
4/9/25  Q. Will the Government provide access to the TTP documents, mission logs, and communication data referenced in the topic description during Phase I, or should offerors plan to demonstrate their approach using unclassified surrogate data? If surrogate data is expected, are there publicly available Navy doctrinal references you would consider representative?
   A. It is encouraged and acceptable to use unclassified surrogate data or publicly available sources for initial development and demonstration if government-furnished data is not provided. Any and all publicly available Navy doctrinal references are acceptable, as well as reports and other open literature.
4/17/26  Q. Hello! Will sample data be provided for the Phase I effort, or is it a requirement that proposers already have example data (e.g., Tactics, Techniques, and Procedures [TTPs], mission logs, communications, and other high-volume data sources relevant to tactical operations)? Can proposers use proxy documents in place of having a corpus of these specific types?
   A. Phase I will focus on defining and developing the foundational architecture and baseline capability for implementing Large Language Model Operations (LLMOps) in support of mission decision-aid tools for the E-2D platform. Part of Phase I is understanding what data will be needed. Example data is not required. Proxy documents and data are acceptable.
04/16/2026  Q. How many awards will be issued for the Phase I ELLMENT opportunity?
   A. In general, DON will select three Phase I proposals for award to a Conventional Topic.
04/16/2026  Q. Will the Navy provide relevant data for LLM training/tuning/testing during Phase I?
   A. Phase I will focus on defining and developing the foundational architecture and baseline capability for implementing Large Language Model Operations (LLMOps) in support of mission decision-aid tools for the E-2D platform. Part of Phase I is understanding what data will be needed.

** TOPIC NOTICE **

The Navy Topic above is an "unofficial" copy from the Navy Topics in the DoW FY-26 Release 1 SBIR BAA. Please see the official DoW Topic website at www.dodsbirsttr.mil/submissions/solicitation-documents/active-solicitations for any updates.

The DoW issued its Navy FY-26 Release 1 SBIR Topics pre-release on April 13, 2026, which opens to receive proposals on May 6, 2026, and closes June 3, 2026 (12:00pm ET).

Direct Contact with Topic Authors: During the pre-release period (April 13 through May 5, 2026), proposing firms have an opportunity to directly contact the Technical Point of Contact (TPOC) to ask technical questions about the specific BAA topic. The TPOC contact information is listed in each topic description. Once DoW begins accepting proposals on May 6, 2026, no further direct contact between proposers and topic authors is allowed unless the Topic Author is responding to a question submitted during the pre-release period.

DoD On-line Q&A System: After the pre-release period, until May 20, 2026, at 12:00 PM ET, proposers may submit written questions through the DoW On-line Topic Q&A at https://www.dodsbirsttr.mil/submissions/login/ by logging in and following instructions. In the Topic Q&A system, the questioner and respondent remain anonymous but all questions and answers are posted for general viewing.

DoW Topics Search Tool: Visit the DoW Topic Search Tool at www.dodsbirsttr.mil/topics-app/ to find topics by keyword across all DoW Components participating in this BAA.

Help: If you have general questions about the DoD SBIR program, please contact the DoD SBIR Help Desk via email at DoDSBIRSupport@reisystems.com

