Skip to content

Bug when using graph-convert to convert symmetric matrix market file to galois graph file #424

@xiaozxiong

Description

@xiaozxiong

Hi!
I am trying to run your BFS program and the dataset I used is hollywood-2009. I used the following command to perform conversion:

 ./tools/graph-convert/graph-convert --mtx2gr --edgeType=int32 hollywood-2009.mtx  hollywood-2009.gr

Then I started to run the bfs-cpu under lonestar/analytics/cpu/bfs directory, but the result was wrong:

# ./bfs-cpu --symmetricGraph  hollywood-2009.gr 
Running SyncTile algorithm with PARALLEL execution
Node 1 has distance 2147483646
INFO: # visited nodes is 1
INFO: Max distance is 0
INFO: Sum of visited distances is 0
1139904 unvisited nodes; this is an error if the graph is strongly connected
max dist: 0
Verification successful.

Finally, I found the problem locates in tools/graph-convert/graph-convert.cpp in which the conversion program is:

struct Mtx2Gr : public HasNoVoidSpecialization {
  template <typename EdgeTy>
  void convert(const std::string& infilename, const std::string& outfilename) {
    typedef galois::graphs::FileGraphWriter Writer;

    Writer p;
    uint32_t nnodes;
    size_t nedges;

    for (int phase = 0; phase < 2; ++phase) {
      std::ifstream infile(infilename.c_str());
      if (!infile) {
        GALOIS_DIE("failed to open input file");
      }

      // Skip comments
      while (infile) {
        if (infile.peek() != '%') {
          break;
        }
        skipLine(infile);
      }

      // Read header
      char header[256];
      infile.getline(header, 256);
      std::istringstream line(header, std::istringstream::in);
      std::vector<std::string> tokens;
      while (line) {
        std::string tmp;
        line >> tmp;
        if (line) {
          tokens.push_back(tmp);
        }
      }
      if (tokens.size() != 3) {
        GALOIS_DIE("unknown problem specification line: ", line.str());
      }
      // Prefer C functions for maximum compatibility
      // nnodes = std::stoull(tokens[0]);
      // nedges = std::stoull(tokens[2]);
      nnodes = strtoull(tokens[0].c_str(), NULL, 0);
      nedges = strtoull(tokens[2].c_str(), NULL, 0);

      // Parse edges
      if (phase == 0) {
        p.setNumNodes(nnodes);
        p.setNumEdges<EdgeTy>(nedges);
        p.phase1();
      } else {
        p.phase2();
      }

      for (size_t edge_num = 0; edge_num < nedges; ++edge_num) {
        // if ((nedges / 500 > 0) && (edge_num % (nedges / 500)) == 0) {
        //   printf("Phase %d: current edge progress %lf%%\n", phase,
        //          ((double)edge_num / nedges) * 100);
        // }
        uint32_t cur_id, neighbor_id;
        double weight = 1;

        infile >> cur_id >> neighbor_id >> weight;
        if (cur_id == 0 || cur_id > nnodes) {
          GALOIS_DIE("node id out of range: ", cur_id);
        }
        if (neighbor_id == 0 || neighbor_id > nnodes) {
          GALOIS_DIE("neighbor id out of range: ", neighbor_id);
        }

        // 1 indexed
        if (phase == 0) {
          p.incrementDegree(cur_id - 1);
        } else {
          if constexpr (std::is_void<EdgeTy>::value) {
            p.addNeighbor(cur_id - 1, neighbor_id - 1);
          } else {
            p.addNeighbor<EdgeTy>(cur_id - 1, neighbor_id - 1,
                                  static_cast<EdgeTy>(weight));
          }
          if(cur_id - 1 == 0){
            printf("%d - %d\n", cur_id - 1, neighbor_id - 1);
          }
        }

        skipLine(infile);
      }

      infile.peek();
      if (!infile.eof()) {
        GALOIS_DIE("additional lines in file");
      }
    }
    // this is for the progress print

    p.finish();

    p.toFile(outfilename);
    printStatus(p.size(), p.sizeEdges());
  }
};

This code can only be used to process directed graphs, leading to the issue for undirected graphs.

BTW, maybe it is the reason of #407 .

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions