Skip to content

Conversation

@jasonmolenda
Copy link
Collaborator

The ObjectFile plugin interface accepts an optional DataBufferSP argument. If the caller has the contents of the binary, it can provide this in that DataBufferSP. The ObjectFile subclasses in their CreateInstance methods will fill in the DataBufferSP with the actual binary contents if it is not set.
ObjectFile base class creates an ivar DataExtractor from the DataBufferSP passed in.

My next patch will be a caller that creates a VirtualDataExtractor with the binary data, and needs to pass that in to the ObjectFile plugin, instead of the bag-of-bytes DataBufferSP. It builds on the previous patch changing ObjectFile's ivar from DataExtractor to DataExtractorSP so I could pass in a subclass in the shared ptr. And it will be using the VirtualDataExtractor that Jonas added in #168802

No behavior is changed by the patch; we're simply moving the creation of the DataExtractor to the caller, instead of a DataBuffer that is immediately used to set up the ObjectFile DataExtractor. The patch is a bit complicated because all of the ObjectFile subclasses have to initialize their DataExtractor to pass in to the base class.

I ran the testsuite on macOS and on AArch64 Ubutnu. (btw David, I ran it under qemu on my M4 mac with SME-no-SVE again, Ubuntu 25.10, checked lshw(1) cpu capabilities, and qemu doesn't seem to be virtualizing the SME, that explains why the testsuite passes)

rdar://148939795

The ObjectFile plugin interface accepts an optional
DataBufferSP argument.  If the caller has the contents
of the binary, it can provide this in that DataBufferSP.
The ObjectFile subclasses in their CreateInstance methods
will fill in the DataBufferSP with the actual binary contents
if it is not set.
ObjectFile base class creates an ivar DataExtractor from the
DataBufferSP passed in.

My next patch will be a caller that creates a VirtualDataExtractor
with the binary data, and needs to pass that in to the ObjectFile
plugin, instead of the bag-of-bytes DataBufferSP.  It builds on
the previous patch changing ObjectFile's ivar from DataExtractor
to DataExtractorSP so I could pass in a subclass in the shared ptr.
And it will be using the VirtualDataExtractor that Jonas added in
llvm#168802

No behavior is changed by the patch; we're simply moving the creation
of the DataExtractor to the caller, instead of a DataBuffer that
is immediately used to set up the ObjectFile DataExtractor.  The
patch is a bit complicated because all of the ObjectFile subclasses
have to initialize their DataExtractor to pass in to the base class.

I ran the testsuite on macOS and on AArch64 Ubutnu.  (btw David, I
ran it under qemu on my M4 mac with SME-no-SVE again, Ubuntu 25.10,
checked lshw(1) cpu capabilities, and qemu doesn't seem to be
virtualizing the SME, that explains why the testsuite passes)

rdar://148939795
@llvmbot
Copy link
Member

llvmbot commented Dec 10, 2025

@llvm/pr-subscribers-lldb

Author: Jason Molenda (jasonmolenda)

Changes

The ObjectFile plugin interface accepts an optional DataBufferSP argument. If the caller has the contents of the binary, it can provide this in that DataBufferSP. The ObjectFile subclasses in their CreateInstance methods will fill in the DataBufferSP with the actual binary contents if it is not set.
ObjectFile base class creates an ivar DataExtractor from the DataBufferSP passed in.

My next patch will be a caller that creates a VirtualDataExtractor with the binary data, and needs to pass that in to the ObjectFile plugin, instead of the bag-of-bytes DataBufferSP. It builds on the previous patch changing ObjectFile's ivar from DataExtractor to DataExtractorSP so I could pass in a subclass in the shared ptr. And it will be using the VirtualDataExtractor that Jonas added in #168802

No behavior is changed by the patch; we're simply moving the creation of the DataExtractor to the caller, instead of a DataBuffer that is immediately used to set up the ObjectFile DataExtractor. The patch is a bit complicated because all of the ObjectFile subclasses have to initialize their DataExtractor to pass in to the base class.

I ran the testsuite on macOS and on AArch64 Ubutnu. (btw David, I ran it under qemu on my M4 mac with SME-no-SVE again, Ubuntu 25.10, checked lshw(1) cpu capabilities, and qemu doesn't seem to be virtualizing the SME, that explains why the testsuite passes)

rdar://148939795


Patch is 84.12 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/171574.diff

41 Files Affected:

  • (modified) lldb/include/lldb/Expression/ObjectFileJIT.h (+1-1)
  • (modified) lldb/include/lldb/Symbol/ObjectContainer.h (+2-2)
  • (modified) lldb/include/lldb/Symbol/ObjectFile.h (+3-3)
  • (modified) lldb/include/lldb/Utility/DataExtractor.h (+12)
  • (modified) lldb/include/lldb/lldb-private-interfaces.h (+1-1)
  • (modified) lldb/source/Core/Module.cpp (+4-2)
  • (modified) lldb/source/Expression/ObjectFileJIT.cpp (+3-2)
  • (modified) lldb/source/Plugins/ObjectContainer/BSD-Archive/ObjectContainerBSDArchive.cpp (+23-18)
  • (modified) lldb/source/Plugins/ObjectContainer/BSD-Archive/ObjectContainerBSDArchive.h (+9-6)
  • (modified) lldb/source/Plugins/ObjectContainer/Mach-O-Fileset/ObjectContainerMachOFileset.cpp (+7-7)
  • (modified) lldb/source/Plugins/ObjectContainer/Universal-Mach-O/ObjectContainerUniversalMachO.cpp (+4-4)
  • (modified) lldb/source/Plugins/ObjectFile/Breakpad/ObjectFileBreakpad.cpp (+16-10)
  • (modified) lldb/source/Plugins/ObjectFile/Breakpad/ObjectFileBreakpad.h (+10-7)
  • (modified) lldb/source/Plugins/ObjectFile/COFF/ObjectFileCOFF.cpp (+17-12)
  • (modified) lldb/source/Plugins/ObjectFile/COFF/ObjectFileCOFF.h (+11-8)
  • (modified) lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp (+22-12)
  • (modified) lldb/source/Plugins/ObjectFile/ELF/ObjectFileELF.h (+8-6)
  • (modified) lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.cpp (+23-16)
  • (modified) lldb/source/Plugins/ObjectFile/JSON/ObjectFileJSON.h (+11-8)
  • (modified) lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.cpp (+15-10)
  • (modified) lldb/source/Plugins/ObjectFile/Mach-O/ObjectFileMachO.h (+6-4)
  • (modified) lldb/source/Plugins/ObjectFile/Minidump/ObjectFileMinidump.cpp (+1-1)
  • (modified) lldb/source/Plugins/ObjectFile/Minidump/ObjectFileMinidump.h (+4-3)
  • (modified) lldb/source/Plugins/ObjectFile/PDB/ObjectFilePDB.cpp (+7-7)
  • (modified) lldb/source/Plugins/ObjectFile/PDB/ObjectFilePDB.h (+2-2)
  • (modified) lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp (+14-10)
  • (modified) lldb/source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.h (+8-5)
  • (modified) lldb/source/Plugins/ObjectFile/XCOFF/ObjectFileXCOFF.cpp (+17-10)
  • (modified) lldb/source/Plugins/ObjectFile/XCOFF/ObjectFileXCOFF.h (+6-4)
  • (modified) lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.cpp (+21-15)
  • (modified) lldb/source/Plugins/ObjectFile/wasm/ObjectFileWasm.h (+10-7)
  • (modified) lldb/source/Plugins/Process/gdb-remote/GDBRemoteRegisterContext.cpp (+2-1)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp (+3-3)
  • (modified) lldb/source/Plugins/SymbolVendor/ELF/SymbolVendorELF.cpp (+4-4)
  • (modified) lldb/source/Plugins/SymbolVendor/MacOSX/SymbolVendorMacOSX.cpp (+2-2)
  • (modified) lldb/source/Plugins/SymbolVendor/PECOFF/SymbolVendorPECOFF.cpp (+2-2)
  • (modified) lldb/source/Plugins/SymbolVendor/wasm/SymbolVendorWasm.cpp (+2-2)
  • (modified) lldb/source/Symbol/ObjectContainer.cpp (+5-3)
  • (modified) lldb/source/Symbol/ObjectFile.cpp (+34-21)
  • (modified) lldb/source/Symbol/SymbolVendor.cpp (+1-1)
  • (modified) lldb/source/Utility/DataExtractor.cpp (+9)
diff --git a/lldb/include/lldb/Expression/ObjectFileJIT.h b/lldb/include/lldb/Expression/ObjectFileJIT.h
index 7e50340ca84bd..ff584f15248f1 100644
--- a/lldb/include/lldb/Expression/ObjectFileJIT.h
+++ b/lldb/include/lldb/Expression/ObjectFileJIT.h
@@ -49,7 +49,7 @@ class ObjectFileJIT : public ObjectFile {
   }
 
   static lldb_private::ObjectFile *
-  CreateInstance(const lldb::ModuleSP &module_sp, lldb::DataBufferSP data_sp,
+  CreateInstance(const lldb::ModuleSP &module_sp, lldb::DataExtractorSP data_sp,
                  lldb::offset_t data_offset, const lldb_private::FileSpec *file,
                  lldb::offset_t file_offset, lldb::offset_t length);
 
diff --git a/lldb/include/lldb/Symbol/ObjectContainer.h b/lldb/include/lldb/Symbol/ObjectContainer.h
index 886ab46dc1e48..6f03c4d968707 100644
--- a/lldb/include/lldb/Symbol/ObjectContainer.h
+++ b/lldb/include/lldb/Symbol/ObjectContainer.h
@@ -97,7 +97,7 @@ class ObjectContainer : public PluginInterface, public ModuleChild {
   /// Attempts to parse the object header.
   ///
   /// This function is used as a test to see if a given plug-in instance can
-  /// parse the header data already contained in ObjectContainer::m_data. If
+  /// parse the header data already contained in ObjectContainer::m_data_sp. If
   /// an object file parser does not recognize that magic bytes in a header,
   /// false should be returned and the next plug-in can attempt to parse an
   /// object file.
@@ -140,7 +140,7 @@ class ObjectContainer : public PluginInterface, public ModuleChild {
   lldb::addr_t m_length;
 
   /// The data for this object file so things can be parsed lazily.
-  DataExtractor m_data;
+  lldb::DataExtractorSP m_data_sp;
 
 private:
   ObjectContainer(const ObjectContainer &) = delete;
diff --git a/lldb/include/lldb/Symbol/ObjectFile.h b/lldb/include/lldb/Symbol/ObjectFile.h
index 993650b8888f5..0c9bf8fc03c63 100644
--- a/lldb/include/lldb/Symbol/ObjectFile.h
+++ b/lldb/include/lldb/Symbol/ObjectFile.h
@@ -105,10 +105,10 @@ class ObjectFile : public std::enable_shared_from_this<ObjectFile>,
   /// more than one architecture or object.
   ObjectFile(const lldb::ModuleSP &module_sp, const FileSpec *file_spec_ptr,
              lldb::offset_t file_offset, lldb::offset_t length,
-             lldb::DataBufferSP data_sp, lldb::offset_t data_offset);
+             lldb::DataExtractorSP extractor_sp, lldb::offset_t data_offset);
 
   ObjectFile(const lldb::ModuleSP &module_sp, const lldb::ProcessSP &process_sp,
-             lldb::addr_t header_addr, lldb::DataBufferSP data_sp);
+             lldb::addr_t header_addr, lldb::DataExtractorSP extractor_sp);
 
   /// Destructor.
   ///
@@ -152,7 +152,7 @@ class ObjectFile : public std::enable_shared_from_this<ObjectFile>,
   static lldb::ObjectFileSP
   FindPlugin(const lldb::ModuleSP &module_sp, const FileSpec *file_spec,
              lldb::offset_t file_offset, lldb::offset_t file_size,
-             lldb::DataBufferSP &data_sp, lldb::offset_t &data_offset);
+             lldb::DataExtractorSP extractor_sp, lldb::offset_t &data_offset);
 
   /// Find a ObjectFile plug-in that can parse a file in memory.
   ///
diff --git a/lldb/include/lldb/Utility/DataExtractor.h b/lldb/include/lldb/Utility/DataExtractor.h
index db85b44debf43..56d4fd36198fe 100644
--- a/lldb/include/lldb/Utility/DataExtractor.h
+++ b/lldb/include/lldb/Utility/DataExtractor.h
@@ -109,6 +109,18 @@ class DataExtractor {
   DataExtractor(const lldb::DataBufferSP &data_sp, lldb::ByteOrder byte_order,
                 uint32_t addr_size, uint32_t target_byte_size = 1);
 
+  /// Constructure with shared data, but no byte order/addr size unspecified.
+  ///
+  /// Copies the data shared pointer which adds a reference to the contained
+  /// in \a data_sp. The shared data reference is reference counted to ensure
+  /// the data lives as long as anyone still has a valid shared pointer to the
+  /// data in \a data_sp.
+  ///
+  /// \param[in] data_sp
+  ///     A shared pointer to data.
+  DataExtractor(const lldb::DataBufferSP &data_sp,
+                uint32_t target_byte_size = 1);
+
   /// Construct with a subset of \a data.
   ///
   /// Initialize this object with a subset of the data bytes in \a data. If \a
diff --git a/lldb/include/lldb/lldb-private-interfaces.h b/lldb/include/lldb/lldb-private-interfaces.h
index 52806eea190a7..f7673b5dadc11 100644
--- a/lldb/include/lldb/lldb-private-interfaces.h
+++ b/lldb/include/lldb/lldb-private-interfaces.h
@@ -50,7 +50,7 @@ typedef size_t (*ObjectFileGetModuleSpecifications)(
     lldb::offset_t data_offset, lldb::offset_t file_offset,
     lldb::offset_t length, ModuleSpecList &module_specs);
 typedef ObjectFile *(*ObjectFileCreateInstance)(const lldb::ModuleSP &module_sp,
-                                                lldb::DataBufferSP data_sp,
+                                                lldb::DataExtractorSP data_sp,
                                                 lldb::offset_t data_offset,
                                                 const FileSpec *file,
                                                 lldb::offset_t file_offset,
diff --git a/lldb/source/Core/Module.cpp b/lldb/source/Core/Module.cpp
index da2c188899f03..3795321dd9025 100644
--- a/lldb/source/Core/Module.cpp
+++ b/lldb/source/Core/Module.cpp
@@ -1206,10 +1206,12 @@ ObjectFile *Module::GetObjectFile() {
         m_did_load_objfile = true;
         // FindPlugin will modify its data_sp argument. Do not let it
         // modify our m_data_sp member.
-        auto data_sp = m_data_sp;
+        DataExtractorSP extractor_sp;
+        if (m_data_sp)
+          extractor_sp = std::make_shared<DataExtractor>(m_data_sp);
         m_objfile_sp = ObjectFile::FindPlugin(
             shared_from_this(), &m_file, m_object_offset,
-            file_size - m_object_offset, data_sp, data_offset);
+            file_size - m_object_offset, extractor_sp, data_offset);
         if (m_objfile_sp) {
           // Once we get the object file, update our module with the object
           // file's architecture since it might differ in vendor/os if some
diff --git a/lldb/source/Expression/ObjectFileJIT.cpp b/lldb/source/Expression/ObjectFileJIT.cpp
index 46ceb75fbc721..431bcd9b5f198 100644
--- a/lldb/source/Expression/ObjectFileJIT.cpp
+++ b/lldb/source/Expression/ObjectFileJIT.cpp
@@ -41,7 +41,7 @@ void ObjectFileJIT::Terminate() {
 }
 
 ObjectFile *ObjectFileJIT::CreateInstance(const lldb::ModuleSP &module_sp,
-                                          DataBufferSP data_sp,
+                                          DataExtractorSP data_sp,
                                           lldb::offset_t data_offset,
                                           const FileSpec *file,
                                           lldb::offset_t file_offset,
@@ -70,7 +70,8 @@ size_t ObjectFileJIT::GetModuleSpecifications(
 
 ObjectFileJIT::ObjectFileJIT(const lldb::ModuleSP &module_sp,
                              const ObjectFileJITDelegateSP &delegate_sp)
-    : ObjectFile(module_sp, nullptr, 0, 0, DataBufferSP(), 0), m_delegate_wp() {
+    : ObjectFile(module_sp, nullptr, 0, 0, DataExtractorSP(), 0),
+      m_delegate_wp() {
   if (delegate_sp) {
     m_delegate_wp = delegate_sp;
     m_data_nsp->SetByteOrder(delegate_sp->GetByteOrder());
diff --git a/lldb/source/Plugins/ObjectContainer/BSD-Archive/ObjectContainerBSDArchive.cpp b/lldb/source/Plugins/ObjectContainer/BSD-Archive/ObjectContainerBSDArchive.cpp
index 6e5617664f7fe..50c67eea18200 100644
--- a/lldb/source/Plugins/ObjectContainer/BSD-Archive/ObjectContainerBSDArchive.cpp
+++ b/lldb/source/Plugins/ObjectContainer/BSD-Archive/ObjectContainerBSDArchive.cpp
@@ -68,21 +68,19 @@ void ObjectContainerBSDArchive::Object::Dump() const {
 ObjectContainerBSDArchive::Archive::Archive(const lldb_private::ArchSpec &arch,
                                             const llvm::sys::TimePoint<> &time,
                                             lldb::offset_t file_offset,
-                                            lldb_private::DataExtractor &data,
+                                            lldb::DataExtractorSP data,
                                             ArchiveType archive_type)
     : m_arch(arch), m_modification_time(time), m_file_offset(file_offset),
-      m_objects(), m_data(data), m_archive_type(archive_type) {}
+      m_objects(), m_data_sp(data), m_archive_type(archive_type) {}
 
 Log *l = GetLog(LLDBLog::Object);
 ObjectContainerBSDArchive::Archive::~Archive() = default;
 
 size_t ObjectContainerBSDArchive::Archive::ParseObjects() {
-  DataExtractor &data = m_data;
-
   std::unique_ptr<llvm::MemoryBuffer> mem_buffer =
       llvm::MemoryBuffer::getMemBuffer(
-          llvm::StringRef((const char *)data.GetDataStart(),
-                          data.GetByteSize()),
+          llvm::StringRef((const char *)m_data_sp->GetDataStart(),
+                          m_data_sp->GetByteSize()),
           llvm::StringRef(),
           /*RequiresNullTerminator=*/false);
 
@@ -221,7 +219,7 @@ ObjectContainerBSDArchive::Archive::shared_ptr
 ObjectContainerBSDArchive::Archive::ParseAndCacheArchiveForFile(
     const FileSpec &file, const ArchSpec &arch,
     const llvm::sys::TimePoint<> &time, lldb::offset_t file_offset,
-    DataExtractor &data, ArchiveType archive_type) {
+    DataExtractorSP data, ArchiveType archive_type) {
   shared_ptr archive_sp(
       new Archive(arch, time, file_offset, data, archive_type));
   if (archive_sp) {
@@ -368,16 +366,17 @@ ObjectContainerBSDArchive::~ObjectContainerBSDArchive() = default;
 
 bool ObjectContainerBSDArchive::ParseHeader() {
   if (m_archive_sp.get() == nullptr) {
-    if (m_data.GetByteSize() > 0) {
+    if (m_data_sp->GetByteSize() > 0) {
       ModuleSP module_sp(GetModule());
       if (module_sp) {
         m_archive_sp = Archive::ParseAndCacheArchiveForFile(
             m_file, module_sp->GetArchitecture(),
-            module_sp->GetModificationTime(), m_offset, m_data, m_archive_type);
+            module_sp->GetModificationTime(), m_offset, m_data_sp,
+            m_archive_type);
       }
-      // Clear the m_data that contains the entire archive data and let our
+      // Clear the m_data_sp that contains the entire archive data and let our
       // m_archive_sp hold onto the data.
-      m_data.Clear();
+      m_data_sp = std::make_shared<DataExtractor>();
     }
   }
   return m_archive_sp.get() != nullptr;
@@ -416,14 +415,18 @@ ObjectFileSP ObjectContainerBSDArchive::GetObjectFile(const FileSpec *file) {
               child_data_sp->GetByteSize() != object->file_size)
             return ObjectFileSP();
           lldb::offset_t data_offset = 0;
+          DataExtractorSP extractor_sp =
+              std::make_shared<DataExtractor>(child_data_sp);
           return ObjectFile::FindPlugin(
               module_sp, &child, m_offset + object->file_offset,
-              object->file_size, child_data_sp, data_offset);
+              object->file_size, extractor_sp, data_offset);
         }
         lldb::offset_t data_offset = object->file_offset;
+        DataExtractorSP extractor_sp =
+            std::make_shared<DataExtractor>(m_archive_sp->GetData());
         return ObjectFile::FindPlugin(
             module_sp, file, m_offset + object->file_offset, object->file_size,
-            m_archive_sp->GetData().GetSharedDataBuffer(), data_offset);
+            extractor_sp, data_offset);
       }
     }
   }
@@ -438,9 +441,10 @@ size_t ObjectContainerBSDArchive::GetModuleSpecifications(
   // We have data, which means this is the first 512 bytes of the file Check to
   // see if the magic bytes match and if they do, read the entire table of
   // contents for the archive and cache it
-  DataExtractor data;
-  data.SetData(data_sp, data_offset, data_sp->GetByteSize());
-  ArchiveType archive_type = ObjectContainerBSDArchive::MagicBytesMatch(data);
+  DataExtractorSP extractor_sp = std::make_shared<DataExtractor>();
+  extractor_sp->SetData(data_sp, data_offset, data_sp->GetByteSize());
+  ArchiveType archive_type =
+      ObjectContainerBSDArchive::MagicBytesMatch(*extractor_sp.get());
   if (!file || !data_sp || archive_type == ArchiveType::Invalid)
     return 0;
 
@@ -455,9 +459,10 @@ size_t ObjectContainerBSDArchive::GetModuleSpecifications(
     data_sp =
         FileSystem::Instance().CreateDataBuffer(file, file_size, file_offset);
     if (data_sp) {
-      data.SetData(data_sp, 0, data_sp->GetByteSize());
+      extractor_sp->SetData(data_sp);
       archive_sp = Archive::ParseAndCacheArchiveForFile(
-          file, ArchSpec(), file_mod_time, file_offset, data, archive_type);
+          file, ArchSpec(), file_mod_time, file_offset, extractor_sp,
+          archive_type);
     }
   }
 
diff --git a/lldb/source/Plugins/ObjectContainer/BSD-Archive/ObjectContainerBSDArchive.h b/lldb/source/Plugins/ObjectContainer/BSD-Archive/ObjectContainerBSDArchive.h
index fbecd1d27063e..1463c473a6d8b 100644
--- a/lldb/source/Plugins/ObjectContainer/BSD-Archive/ObjectContainerBSDArchive.h
+++ b/lldb/source/Plugins/ObjectContainer/BSD-Archive/ObjectContainerBSDArchive.h
@@ -13,7 +13,9 @@
 #include "lldb/Symbol/ObjectContainer.h"
 #include "lldb/Utility/ArchSpec.h"
 #include "lldb/Utility/ConstString.h"
+#include "lldb/Utility/DataExtractor.h"
 #include "lldb/Utility/FileSpec.h"
+#include "lldb/Utility/NonNullSharedPtr.h"
 
 #include "llvm/Object/Archive.h"
 #include "llvm/Support/Chrono.h"
@@ -112,7 +114,7 @@ class ObjectContainerBSDArchive : public lldb_private::ObjectContainer {
 
     Archive(const lldb_private::ArchSpec &arch,
             const llvm::sys::TimePoint<> &mod_time, lldb::offset_t file_offset,
-            lldb_private::DataExtractor &data, ArchiveType archive_type);
+            lldb::DataExtractorSP data, ArchiveType archive_type);
 
     ~Archive();
 
@@ -127,7 +129,7 @@ class ObjectContainerBSDArchive : public lldb_private::ObjectContainer {
     static Archive::shared_ptr ParseAndCacheArchiveForFile(
         const lldb_private::FileSpec &file, const lldb_private::ArchSpec &arch,
         const llvm::sys::TimePoint<> &mod_time, lldb::offset_t file_offset,
-        lldb_private::DataExtractor &data, ArchiveType archive_type);
+        lldb::DataExtractorSP data_sp, ArchiveType archive_type);
 
     size_t GetNumObjects() const { return m_objects.size(); }
 
@@ -154,7 +156,8 @@ class ObjectContainerBSDArchive : public lldb_private::ObjectContainer {
 
     bool HasNoExternalReferences() const;
 
-    lldb_private::DataExtractor &GetData() { return m_data; }
+    lldb_private::DataExtractor &GetData() { return *m_data_sp.get(); }
+    lldb::DataExtractorSP &GetDataSP() { return m_data_sp; }
 
     ArchiveType GetArchiveType() { return m_archive_type; }
 
@@ -166,9 +169,9 @@ class ObjectContainerBSDArchive : public lldb_private::ObjectContainer {
     lldb::offset_t m_file_offset;
     std::vector<Object> m_objects;
     ObjectNameToIndexMap m_object_name_to_index_map;
-    lldb_private::DataExtractor m_data; ///< The data for this object container
-                                        ///so we don't lose data if the .a files
-                                        ///gets modified
+    lldb::DataExtractorSP m_data_sp; ///< The data for this object container
+                                     /// so we don't lose data if the .a files
+                                     /// gets modified
     ArchiveType m_archive_type;
   };
 
diff --git a/lldb/source/Plugins/ObjectContainer/Mach-O-Fileset/ObjectContainerMachOFileset.cpp b/lldb/source/Plugins/ObjectContainer/Mach-O-Fileset/ObjectContainerMachOFileset.cpp
index 76141f7976413..7938a234e246f 100644
--- a/lldb/source/Plugins/ObjectContainer/Mach-O-Fileset/ObjectContainerMachOFileset.cpp
+++ b/lldb/source/Plugins/ObjectContainer/Mach-O-Fileset/ObjectContainerMachOFileset.cpp
@@ -197,24 +197,24 @@ bool ObjectContainerMachOFileset::ParseHeader() {
 
   std::lock_guard<std::recursive_mutex> guard(module_sp->GetMutex());
 
-  std::optional<mach_header> header = ParseMachOHeader(m_data);
+  std::optional<mach_header> header = ParseMachOHeader(*m_data_sp.get());
   if (!header)
     return false;
 
   const size_t header_size = MachHeaderSizeFromMagic(header->magic);
   const size_t header_and_lc_size = header_size + header->sizeofcmds;
 
-  if (m_data.GetByteSize() < header_and_lc_size) {
+  if (m_data_sp->GetByteSize() < header_and_lc_size) {
     ProcessSP process_sp(m_process_wp.lock());
     DataBufferSP data_sp =
         process_sp
             ? ObjectFile::ReadMemory(process_sp, m_memory_addr,
                                      header_and_lc_size)
             : ObjectFile::MapFileData(m_file, header_and_lc_size, m_offset);
-    m_data.SetData(data_sp);
+    m_data_sp->SetData(data_sp);
   }
 
-  return ParseFileset(m_data, *header, m_entries, m_memory_addr);
+  return ParseFileset(*m_data_sp.get(), *header, m_entries, m_memory_addr);
 }
 
 size_t ObjectContainerMachOFileset::GetModuleSpecifications(
@@ -282,11 +282,11 @@ ObjectContainerMachOFileset::GetObjectFile(const lldb_private::FileSpec *file) {
   if (!entry)
     return {};
 
-  DataBufferSP data_sp;
+  DataExtractorSP extractor_sp;
   lldb::offset_t data_offset = 0;
   return ObjectFile::FindPlugin(module_sp, file, m_offset + entry->fileoff,
-                                m_data.GetByteSize() - entry->fileoff, data_sp,
-                                data_offset);
+                                m_data_sp->GetByteSize() - entry->fileoff,
+                                extractor_sp, data_offset);
 }
 
 ObjectContainerMachOFileset::Entry *
diff --git a/lldb/source/Plugins/ObjectContainer/Universal-Mach-O/ObjectContainerUniversalMachO.cpp b/lldb/source/Plugins/ObjectContainer/Universal-Mach-O/ObjectContainerUniversalMachO.cpp
index 9af9c0120dd7d..b455fd34ec00b 100644
--- a/lldb/source/Plugins/ObjectContainer/Universal-Mach-O/ObjectContainerUniversalMachO.cpp
+++ b/lldb/source/Plugins/ObjectContainer/Universal-Mach-O/ObjectContainerUniversalMachO.cpp
@@ -74,10 +74,10 @@ ObjectContainerUniversalMachO::ObjectContainerUniversalMachO(
 ObjectContainerUniversalMachO::~ObjectContainerUniversalMachO() = default;
 
 bool ObjectContainerUniversalMachO::ParseHeader() {
-  bool success = ParseHeader(m_data, m_header, m_fat_archs);
+  bool success = ParseHeader(*m_data_sp.get(), m_header, m_fat_archs);
   // We no longer need any data, we parsed all we needed to parse and cached it
   // in m_header and m_fat_archs
-  m_data.Clear();
+  m_data_sp = std::make_shared<DataExtractor>();
   return success;
 }
 
@@ -177,11 +177,11 @@ ObjectContainerUniversalMachO::GetObjectFile(const FileSpec *file) {
     }
 
     if (arch_idx < m_header.nfat_arch) {
-      DataBufferSP data_sp;
+      DataExtractorSP extractor_sp;
       lldb::offset_t data_offset = 0;
       return ObjectFile::FindPlugin(
           module_sp, file, m_offset + m_fat_archs[arch_idx].GetOffset(),
-          m_fat_archs[arch_idx].GetSize(), data_sp, data_offset);
+          m_fat_archs[arch_idx].GetSize(), extractor_sp, data_offset);
     }
   }
   return ObjectFileSP();
diff --git a/lldb/source/Plugins/ObjectFile/Breakpad/ObjectFileBreakpad.cpp b/lldb/source/Plugins/ObjectFile/Breakpad/ObjectFileBreakpad.cpp
index 53ff6ef6613e9..079ca4adf62e2 100644
--- a/lldb/source/Plugins/ObjectFile/Breakpad/ObjectFileBreakpad.cpp
+++ b/lldb/source/Plugins/ObjectFile/Breakpad/ObjectFileBreakpad.cpp
@@ -57,29 +57,35 @@ void ObjectFileBreakpad::Terminate() {
   PluginManager::UnregisterPlugin(CreateInstance);
 }
 
-ObjectFile *ObjectFileBreakpad::CreateInstance(
-    const ModuleSP &module_sp, DataBufferSP data_sp, offset_t data_offset,
-    const FileSpec *file, offset_t file_offset, offset_t length) {
-  if (!data_sp) {
-    data_sp = MapFileData(*file, length, file_offset);
+ObjectFile *ObjectFileBreakpad::CreateInstance(const ModuleSP &module_sp,
+                                               DataExtractorSP extractor_sp,
+                                               offset_t data_offset,
+                                               const FileSpec *file,
+                                               offset_t file_offset,
+                                               offset_t length) {
+  if (!extractor_sp || extractor_sp->GetByteSize() == 0) {
+    DataBufferSP data_sp = MapFileData(*file, length, file_offset);
     if (!data_sp)
  ...
[truncated]

@DavidSpickett
Copy link
Collaborator

I ran the testsuite on macOS and on AArch64 Ubutnu. (btw David, I ran it under qemu on my M4 mac with SME-no-SVE again, Ubuntu 25.10, checked lshw(1) cpu capabilities, and qemu doesn't seem to be virtualizing the SME, that explains why the testsuite passes)

Ok good, the issues probably didn't make it into release versions then.

Copy link
Collaborator

@DavidSpickett DavidSpickett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since @JDevlieghere has more context for this effort, they can be the approver.

@jasonmolenda
Copy link
Collaborator Author

Since @JDevlieghere has more context for this effort, they can be the approver.

I didn't want to get too deep into details not directly related to this PR, but the overall project is a mac specific one. On Darwin systems we have a shared cache, which is the combination of all the system libraries in a single block. The sections of the binaries are rearranged -- so all of the texts, all of the datas, etc., keeping within pc-relative branch offsets. And the nlist table names (linkedit segment) are all combined into one, so duplicated symbol names can be combined. The discrete library binaries don't exist in the OS, only this shared cache binary blob that is mapped into memory in every process.

When lldb is debugging a process on the local computer, it checks that the UUID of the shared cache in the inferior is the same as the UUID of lldb's own shared cache, and if so, it creates ObjectFile's out of its own memory by creating a DataBufferUnowned in HostInfoMacOSX, and passing that DataBufferSP into the ObjectFile plugin finder.

The change I'll work on next is to use some API to map a binary from the shared cache blob on disk into lldb's address space, instead of using lldb's own in-memory mapping of it. mmapping the shared cache in its entirety can be tricky because it's a large address range and finding room for it can be a problem, so instead there are API to map the separate sections of a binary into lldb's address space. It can create a DataBuffer of those mapped segments and create the object file from them. The problem is that the sections are not in any particular order. Normally a Mach-O binary starts with text, then data, then linkedit, and all of our Mach-O parsing depends on this.

The VirutalDataExtractor allows me to create a remapping of the address where different sections got mapped into lldb's own address space into a "0-offset text" style DataBuffer, by providing a remapping, and parse it as if it's a normal Mach-O. So HostInfoMacOSX needs to create this VirtualDataExtractor with the correct remapping, pointing to the local address range for the mapped sections, and pass it to an ObjectFile plugin interface as a DataExtractorSP.

The old system of everyone passing in a DataBufferSP and then ObjectFile base class creating a DataExtractor was the problem; I needed to be able to pick a subclass of DataExtractor at the HostInfoMacOSX caller point.

Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Motivation for this patch aside, it seems equally reasonable to pass around and store a DataExtractorSP as it is to use a DataBufferSP. The patch looks mostly mechanical (though I know there were a few tricky edge cases).

General observations:

  1. It would be nice if we could consistently use extractor_sp for a DataExtractorSP and data_sp for a DataBufferSP. There's a few cases where you changed the type but not the member, presumably to limit the number of lines changed. I think it would be worth the churn to avoid confusion/surprises down the line.
  2. I'm not a huge fan of checking the ByteSize. I would prefer an operator bool opr a HasData() that checks that instead.

if (file_exists && module_sp->GetObjectName()) {
ObjectFileSP object_file_sp = CreateObjectFromContainer(
module_sp, file, file_offset, file_size, data_sp, data_offset);
module_sp, file, file_offset, file_size, DataBufferSP(), data_offset);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not extractor_sp?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This CreateObjectFromContainer takes a DataBufferSP(). This block opens with !extractor_sp || extractor_sp->GetByteSize() == 0 so it might be an empty DataExtractorSP; we can't pass extractor_sp->GetSharedDataBuffer() here.

jasonmolenda and others added 5 commits December 10, 2025 19:24
…rBSDArchive.h

Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
Co-authored-by: Jonas Devlieghere <jonas@devlieghere.com>
callers of GetByteSize()==0 to use it.
DataExtractos being "extractor", DataBuffers being "data"
in all the files being touched by this PR.
@jasonmolenda
Copy link
Collaborator Author

  1. It would be nice if we could consistently use extractor_sp for a DataExtractorSP and data_sp for a DataBufferSP. There's a few cases where you changed the type but not the member, presumably to limit the number of lines changed. I think it would be worth the churn to avoid confusion/surprises down the line.

Agreed, I had the same thought while I was finishing this patch up, but was reluctant to expand all the changes to address it. I added a commit to implement this naming convention.

  1. I'm not a huge fan of checking the ByteSize. I would prefer an operator bool opr a HasData() that checks that instead.

Added a DataExtractor::HasData(), switched callers checking for GetByteSize()==0 to use it.

@DavidSpickett
Copy link
Collaborator

I didn't want to get too deep into details not directly related to this PR, but the overall project is a mac specific one.

Thanks for explaining and now you've explained, I feel justified in getting Jonas to review :)

Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jasonmolenda jasonmolenda merged commit e4c83b7 into llvm:main Dec 11, 2025
10 checks passed
@jasonmolenda jasonmolenda deleted the objectfile-argument-change-to-dataextractor branch December 11, 2025 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants