Genesys PGR / validator.genesys-pgr.org, issue #11
Issue created Sep 20, 2017 by Matija Obreza (@mobreza, Owner)

Auto-detect CSV configuration

Add an "Autodetect" button to the CSV configuration section of the form. A JavaScript function inspects the text in the csvText text area and tries to detect the following automatically:

  1. separator char
  2. quote char
  3. escape char
  4. decimal mark

Sensible defaults to start from are tab, double quote ("), backslash (\) and period (.), respectively.
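A minimal sketch, assuming the defaults live in one object that later detection steps overwrite when they succeed. The names DEFAULT_CSV_CONFIG and sampleRows and the 10-row limit are hypothetical, not existing code in validator.genesys-pgr.org. Splitting on line breaks ignores quoted values that span several lines, which seems acceptable for a detection heuristic.

```javascript
// Hypothetical starting configuration, using the defaults listed in the issue.
const DEFAULT_CSV_CONFIG = {
  separator: "\t",
  quoteChar: '"',
  escapeChar: "\\",
  decimalMark: ".",
};

// Hypothetical helper: return up to `max` non-empty lines of the csvText content.
// Assumption: rows are separated by \n or \r\n, and multi-line quoted values are ignored.
function sampleRows(csvText, max = 10) {
  return csvText
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .slice(0, max);
}
```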

Separator char

Simple version: try tab, comma (,), space and pipe (|) in this order, and pick the first candidate that splits each of the first 10 rows (or fewer, if the text is shorter) into the same number of columns.
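The simple version could look roughly like this. Two assumptions are worth flagging: the third candidate is taken to be a space, and a candidate must produce more than one column, because otherwise a character that never occurs (every row splits into 1 column) would trivially "match". The function name is hypothetical; null means "keep the default".

```javascript
// Candidate order from the issue; the third entry is assumed to be a space.
const SEPARATOR_CANDIDATES = ["\t", ",", " ", "|"];

// Hypothetical simple detector: first candidate giving a consistent column count.
function detectSeparatorSimple(csvText, maxRows = 10) {
  const rows = csvText
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .slice(0, maxRows);
  for (const sep of SEPARATOR_CANDIDATES) {
    const counts = rows.map((row) => row.split(sep).length);
    // Assumption: require more than one column so absent characters do not win.
    if (counts.length > 0 && counts[0] > 1 && counts.every((c) => c === counts[0])) {
      return sep;
    }
  }
  return null; // no candidate matched; keep the default separator
}
```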

Text analysis version: build a map of character occurrences for each of the first few lines. Letters and digits ([a-zA-Z0-9]) are unlikely to be used as the separator, so they can be ignored. The character whose number of occurrences is most consistent across all rows is most likely the separator.
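One possible reading of the text analysis version, as a sketch: only characters present in every row are considered, and "most consistent" is measured as the spread (max minus min) of per-row counts. Ties are common (a quote char also appears equally often in every row), so this sketch breaks them with a preference order that is purely an assumption, including semicolon for European-style files. The function name is hypothetical.

```javascript
// Assumed tie-break order; any other character ranks after these.
const TIE_BREAK_ORDER = ["\t", ",", ";", "|", " "];

// Hypothetical frequency-based separator detector.
function detectSeparatorByFrequency(csvText, maxRows = 10) {
  const rows = csvText
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .slice(0, maxRows);
  if (rows.length === 0) return null;
  // One occurrence map per row, skipping letters and digits.
  const maps = rows.map((row) => {
    const counts = new Map();
    for (const ch of row) {
      if (/[a-zA-Z0-9]/.test(ch)) continue;
      counts.set(ch, (counts.get(ch) || 0) + 1);
    }
    return counts;
  });
  const candidates = [];
  for (const ch of maps[0].keys()) {
    const perRow = maps.map((m) => m.get(ch) || 0);
    if (perRow.some((n) => n === 0)) continue; // must occur in every row
    const spread = Math.max(...perRow) - Math.min(...perRow);
    const idx = TIE_BREAK_ORDER.indexOf(ch);
    const rank = idx >= 0 ? idx : TIE_BREAK_ORDER.length;
    candidates.push({ ch, spread, rank });
  }
  // Smallest spread wins; equal spreads fall back to the assumed preference order.
  candidates.sort((a, b) => a.spread - b.spread || a.rank - b.rank);
  return candidates.length > 0 ? candidates[0].ch : null;
}
```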

Quote char

After selecting the separator, extract all column values of the first 10 rows.

Simple version: test whether the double quote (") or the single quote (') appears as both the first and the last character of column values.
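A sketch of the simple quote check, assuming column values are obtained by a naive split on the separator (a separator inside a quoted value would break the split, which is tolerable for detection) and trimmed of surrounding whitespace. The function name is hypothetical; double quote is checked before single quote.

```javascript
// Hypothetical simple quote detector.
function detectQuoteSimple(csvText, separator, maxRows = 10) {
  const values = csvText
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .slice(0, maxRows)
    .flatMap((row) => row.split(separator)) // naive split, ignores quoting
    .map((v) => v.trim());
  for (const q of ['"', "'"]) {
    // A value needs at least 2 chars to both start and end with the quote.
    if (values.some((v) => v.length >= 2 && v[0] === q && v[v.length - 1] === q)) {
      return q;
    }
  }
  return null; // no quoting detected; keep the default
}
```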

Text analysis version: build an occurrence map of the characters that appear as both the first and the last character of a column value (the two must match). Again, [a-zA-Z0-9] are unlikely to be used as the quote char and can be ignored. The character that most often encloses column values is likely the quote char.
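The text analysis version might be sketched as follows, under the same naive-split and trimming assumptions as above. Only values whose first and last characters match are counted, letters and digits are skipped, and the most frequent enclosing character wins. The function name is hypothetical.

```javascript
// Hypothetical frequency-based quote detector.
function detectQuoteByFrequency(csvText, separator, maxRows = 10) {
  const values = csvText
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .slice(0, maxRows)
    .flatMap((row) => row.split(separator))
    .map((v) => v.trim());
  const counts = new Map();
  for (const v of values) {
    if (v.length < 2) continue;
    const first = v[0];
    const last = v[v.length - 1];
    // First and last must match, and letters/digits are not plausible quotes.
    if (first !== last || /[a-zA-Z0-9]/.test(first)) continue;
    counts.set(first, (counts.get(first) || 0) + 1);
  }
  let best = null;
  let bestCount = 0;
  for (const [ch, n] of counts) {
    if (n > bestCount) {
      best = ch;
      bestCount = n;
    }
  }
  return best;
}
```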

Escape char

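After selecting the separator and quote char, extract all column values of the first 10 rows that start and end with the quote char. Build an occurrence map of the character immediately preceding each occurrence of the quote char inside these values; the character that occurs most often there is likely the escape char (with RFC 4180-style doubling, it is the quote char itself).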
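A sketch of the escape-char step, assuming the naive split and trimming used above. Only quote chars strictly inside a value are examined, so the enclosing pair is skipped. Ignoring letters, digits and whitespace as "preceding" characters is an added assumption: without it, a value like "a""b" ties between a and the quote. The function name is hypothetical.

```javascript
// Hypothetical escape-char detector.
function detectEscapeChar(csvText, separator, quote, maxRows = 10) {
  const values = csvText
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .slice(0, maxRows)
    .flatMap((row) => row.split(separator))
    .map((v) => v.trim())
    .filter((v) => v.length >= 2 && v[0] === quote && v[v.length - 1] === quote);
  const counts = new Map();
  for (const v of values) {
    // Inner positions only: skip the opening (index 0) and closing quote.
    for (let i = 1; i < v.length - 1; i++) {
      if (v[i] !== quote) continue;
      const before = v[i - 1];
      if (/[a-zA-Z0-9\s]/.test(before)) continue; // assumption: not plausible escapes
      counts.set(before, (counts.get(before) || 0) + 1);
    }
  }
  let best = null;
  let bestCount = 0;
  for (const [ch, n] of counts) {
    if (n > bestCount) {
      best = ch;
      bestCount = n;
    }
  }
  return best; // null: no embedded quotes seen, keep the default
}
```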

Decimal mark

Note: the decimal mark appears at most once in a number.

Extract all column values of the first 10 rows that contain at least one digit. Build an occurrence map of their non-digit characters, ignoring +, - and letters ([a-zA-Z]). The decimal mark is the character that appears at most once in every one of these values.
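A sketch of the decimal-mark rule, again assuming a naive split on the separator. Any character seen more than once in a single value is disqualified. Because a thousands separator can legitimately appear only once (2,000.25), the sketch picks, among the remaining characters, the one found in the most values; that tie-break is an assumption beyond the issue. The function name is hypothetical.

```javascript
// Hypothetical decimal-mark detector.
function detectDecimalMark(csvText, separator, maxRows = 10) {
  const values = csvText
    .split(/\r?\n/)
    .filter((line) => line.length > 0)
    .slice(0, maxRows)
    .flatMap((row) => row.split(separator))
    .map((v) => v.trim())
    .filter((v) => /[0-9]/.test(v)); // only values containing a digit
  const valuesWith = new Map(); // char -> number of values containing it once
  const disqualified = new Set(); // chars seen more than once in some value
  for (const v of values) {
    const perValue = new Map();
    for (const ch of v) {
      if (/[0-9a-zA-Z+\-]/.test(ch)) continue; // ignore digits, letters, + and -
      perValue.set(ch, (perValue.get(ch) || 0) + 1);
    }
    for (const [ch, n] of perValue) {
      if (n > 1) disqualified.add(ch);
      else valuesWith.set(ch, (valuesWith.get(ch) || 0) + 1);
    }
  }
  let best = null;
  let bestCount = 0;
  for (const [ch, n] of valuesWith) {
    if (!disqualified.has(ch) && n > bestCount) {
      best = ch;
      bestCount = n;
    }
  }
  return best;
}
```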
