
Input Wizard Toolset

Parisa Kianmajd edited this page Mar 4, 2015 · 5 revisions

Introduction

This toolset takes two or more taxonomies as input and helps the user create input files for the Euler/X toolkit from them. In particular, it suggests a set of ISA relationships for each taxonomy, as well as a set of articulations between the taxonomies, in csv format. Once the user has verified it, the output csv file can be converted to CleanTax and fed into Euler/X.

Components

  • addID - takes a csv file as input and adds an extra identifier column to it (if multiple files are given as input, it merges them and operates on them as a unified file)

  • addIsa - takes one (or more) input file(s), groups the rows by groupID, and outputs a file named "isa"+groupID for each group, containing a set of suggested ISA pairs for that group

  • addArt - takes one (or more) input file(s), and optionally one (or more) file(s) of suggested ISA pairs, and outputs a set of suggested articulation pairs between the groups' elements

  • p2c - generates a text file with CleanTax syntax from the csv file generated by addArt

  • c2csv - generates a 3-column csv file from a CleanTax file, which can then be used as test input for evaluating the Input Wizard Toolset

Synopsis

  • addID -i csv_file -i csv_file .... > csv_file

  • addIsa -i csv_file -i csv_file ....

  • addArt -i csv_file -i csv_file .... -t isa1_csv_file -t isa2_csv_file > csv_file

  • p2c csv_file -i csv_file > cleanTax_file

  • c2csv -i cleanTax_file > csv_file

Description

  • addID

    • Receives one (or more) csv file(s) with 3 columns <Simple_Name, According_To, Rank> (the first line being the header line) as input and adds a new ID column to each row.
    • The ID has the format TaxonID.Rank****No, where TaxonID is the taxonomy id (a number assigned to each According_To value), Rank is a pair produced by mapping the Rank column of the input to (one letter, number), with the number showing the position of this rank, and No is an incremental number assigned to the rows after sorting them by Rank, then Simple_Name.
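
The numbering described above can be sketched in Python. This is a minimal illustration, not the addID implementation: the rank table `RANK_CODES`, the function name `add_id_parts`, the choice to number taxonomies in order of first appearance, and the choice to number rows across all taxonomies together are all assumptions. The sketch returns the three ID components separately and does not try to reproduce the exact ID string layout.

```python
# Hypothetical rank table: rank name -> (one-letter code, position).
# The actual mapping used by addID is not documented on this page.
RANK_CODES = {"family": ("f", 1), "genus": ("g", 2), "species": ("s", 3)}


def add_id_parts(rows):
    """Attach TaxonID, RankCode and No to rows given as dicts with the
    keys Simple_Name, According_To and Rank."""
    # Assumption: each distinct According_To value gets the next number,
    # in order of first appearance.
    taxon_ids = {}
    for row in rows:
        taxon_ids.setdefault(row["According_To"], len(taxon_ids) + 1)

    # Sort by rank position, then by name, as the description states.
    ordered = sorted(
        rows,
        key=lambda r: (RANK_CODES[r["Rank"].lower()][1], r["Simple_Name"]),
    )

    result = []
    # Assumption: No runs over all rows, not separately per taxonomy.
    for no, row in enumerate(ordered, start=1):
        letter, position = RANK_CODES[row["Rank"].lower()]
        result.append({
            **row,
            "TaxonID": taxon_ids[row["According_To"]],
            "RankCode": f"{letter}{position}",
            "No": no,
        })
    return result
```

Calling `add_id_parts` on rows read with `csv.DictReader` would give each row the parts needed to assemble an ID. Ranks missing from the hypothetical table raise a `KeyError`, which makes unmapped ranks visible early.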
  • addIsa

    • Receives one (or more) csv file(s) with 4 columns <ID, Simple_Name, According_To, Rank> as input and, for each groupID, returns a csv file of suggested ISA pairs:
      <ID, TaxonID, Rank, No, Scientific_Name>, ISA, <ID, TaxonID, Rank, No, Scientific_Name>
      The user is then asked to verify these ISA relations and use the result as input for the next step.
    • The user can use p2c to convert the suggested ISA pairs to CleanTax format and visualize them with the Euler visualizer (i.e. use c2y to convert to yaml and y2d to convert to dot).
  • addArt

    • Receives one (or more) csv file(s) with 4 columns <ID, Simple_Name, According_To, Rank>, and optionally one (or more) file(s) of suggested ISA pairs, as input and returns a csv file of suggested articulation pairs:
      <ID, TaxonID, Rank, No, Scientific_Name>, ? , <ID, TaxonID, Rank, No, Scientific_Name>
      The user is then asked to revise these relations and replace each ? with the appropriate articulation.
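
Before handing a revised file to p2c, it can help to check that no ? placeholders remain. The sketch below is a hypothetical helper, not part of the toolset. It assumes the row layout shown above (five fields for the left taxon, the relation in the sixth field, five fields for the right taxon) and assumes the relation vocabulary is the five RCC-5 labels used in Euler/X articulations. Rows marked ISA are treated as already resolved.

```python
import csv
import io

# Assumed articulation vocabulary (RCC-5 labels); adjust if your
# CleanTax files use a different set.
ALLOWED = {"equals", "includes", "is_included_in", "overlaps", "disjoint"}

# Assumed position of the relation: after the five left-hand fields.
RELATION_COLUMN = 5


def unresolved_rows(csv_text, relation_column=RELATION_COLUMN):
    """Return (line_number, relation) for every row whose relation is
    still '?' or is not a recognized label."""
    problems = []
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    for line_number, row in enumerate(reader, start=1):
        if not row:
            continue  # ignore blank lines
        relation = ""
        if len(row) > relation_column:
            relation = row[relation_column].strip()
        if relation.upper() == "ISA":
            continue  # ISA pairs need no articulation label
        if relation not in ALLOWED:
            problems.append((line_number, relation))
    return problems
```

An empty result means every articulation row carries one of the assumed labels. Any entry in the result points to a line that still needs revision.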
  • p2c

    • Receives a csv file of revised articulations, ISA pairs, or both, together with the input file containing the name of each taxonomic group, and converts it to standard CleanTax format.

  • c2csv

    • Receives a CleanTax file as input and returns a 3-column <Name, Author, Rank> csv file that can be used as input for the Input Wizard.
    • The goal is to reverse engineer CleanTax files and compare the articulation pairs suggested by the Input Wizard with the correct ones to gauge its efficiency.
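
One way to make that comparison concrete is to measure how many suggested pairs appear in the reference file and how many reference pairs were found. The sketch below is an assumption about how such an evaluation could be done, not a description of the toolset's own procedure. The function name `pair_overlap` and the representation of a pair as a (left, right) tuple of taxon identifiers are hypothetical.

```python
def pair_overlap(suggested, reference):
    """Compare suggested articulation pairs with reference pairs.

    Both arguments are iterables of (left, right) tuples.
    Returns (precision, recall) as floats between 0 and 1.
    """
    suggested = set(suggested)
    reference = set(reference)
    hits = suggested & reference  # pairs present in both sets
    precision = len(hits) / len(suggested) if suggested else 0.0
    recall = len(hits) / len(reference) if reference else 0.0
    return precision, recall
```

Precision is the share of suggested pairs that are correct. Recall is the share of correct pairs that were suggested. Reporting both shows whether the wizard suggests too much or too little.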