XML Developer's Guide

Contents

SneakyBrian/XML2JSON
sinelaw/xml-to-json
script to translate xml to json
6 Answers

XML2JSON is a command line tool to convert XML files to JSON

SneakyBrian/XML2JSON

XML2JSON is a simple command line tool for converting XML files to JSON. It uses the JSON.NET library.

XML2JSON.exe input.xml output.json

    Gambardella, Matthew  Computer  44.95  2000-10-01  An in-depth look at creating applications with XML.
    Ralls, Kim  Fantasy  5.95  2000-12-16  A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.

Fast & easy command line tool for converting XML files to JSON

sinelaw/xml-to-json

Fast & easy library & command line tool for converting XML files to JSON.

Heads up! The project has been split into two projects. See xml-to-json-fast for a low-memory-usage version that has fewer features.

xml-to-json converts XML to JSON. It includes a Haskell library and a command-line tool.

xml-to-json ships with two different executables:

xml-to-json-fast ("fast") uses a lot less memory, but you can’t control the output. Can be used on XML files of any size.
xml-to-json ("classic") provides some control over the JSON output format, but uses a lot of memory. Suitable for smaller files.

The so-called "fast" version (which uses a lot less memory) has been forked into a separate project, xml-to-json-fast.

The fully featured "classic" xml-to-json provides compact JSON output that’s designed to be easy to store and process using JSON-based databases, such as MongoDB or CouchDB. In fact, the original motivation for xml-to-json was to store and query a large (~10GB) XML-based dataset, using an off-the-shelf scalable JSON database.

When using "classic" xml-to-json, the input XML must be valid.

Currently, xml-to-json processes XML according to lossy rules designed to produce sensibly minimal output. If you need to convert without losing any information, consider something like the XSLT offered by the jsonml project. Unlike jsonml, xml-to-json produces JSON output similar (but not identical) to that of the xml2json-xslt project.

xml-to-json is implemented in Haskell.

As of this writing, xml-to-json uses hxt with the expat-based hxt-expat parser. The pure Haskell parsers for hxt all seem to have memory issues that hxt-expat doesn’t have.

You may need to install libcurl. For that, follow your platform’s instructions. On Debian/Ubuntu, you can use: sudo apt-get install libcurl4-openssl-dev

Can’t use Stack? Use cabal-install

Note for Windows users: Only local files, not URLs, are supported as command line arguments. This is because curl doesn’t compile on my (Windows + Cygwin) machine out-of-the-box.

To install the release version: Since xml-to-json is implemented in Haskell, "all you need to do" is install the latest cabal-install (you can try the Haskell platform) for your system, and then run:

    cabal update
    cabal install xml-to-json

To install from source: Clone this repository locally, and then (assuming you have the Haskell platform installed) run cabal install:

    cd xml-to-json
    cabal install

Run the tool with the filename as its single argument, and redirect stdout to a file or a pipe.

Classic xml-to-json: Advanced Usage

Use the --help option to see the full command line options.

Here’s a (possibly outdated) snapshot of the --help output:

    Usage: [OPTION...] files...
      -h      --help                      Show this help
      -t TAG  --tag-name=TAG              Start conversion with nodes named TAG (ignoring all parent nodes)
      -s      --skip-roots                Ignore the selected nodes, and start converting from their children
                                          (can be combined with the 'start-tag' option to process only children of the matching nodes)
      -a      --as-array                  Output the resulting objects in a top-level JSON array
      -m      --multiline                 When using 'as-array' output, print each of top-level json object on a seperate line.
                                          (If not using 'as-array', this option will be on regardless, and output is always line-seperated.)
              --no-collapse-text=PATTERN  For elements with tag matching regex PATTERN only:
                                          Don't collapse elements that only contain text into a simple string property.
                                          Instead, always emit '.value' properties for text nodes, even if an element contains only text.
                                          (Output 'schema' will be more stable.)
              --no-ignore-nulls           Don't ignore nulls (and do output them) in the top level of output objects

    Some simple text More text Just a dummy Xml file.

JSON output using default settings:

Formatted for readability (not the actual output):

Note that currently xml-to-json does not retain the order of elements / attributes.

The options let you control several aspects of the output, such as:

At which top-level nodes the conversion starts (-t)
Whether to wrap the output in a top-level JSON array (-a)
Whether or not to collapse simple string elements, such as the element in the example, into a simple string property (--no-collapse-text takes a regular expression pattern that decides which nodes are not collapsed)
And more.
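The collapsing behaviour described above can be illustrated with a small Python sketch. This is not the tool's Haskell implementation, and its output is not guaranteed to match what xml-to-json prints: the function name `element_to_json`, the `no_collapse` parameter, the `"value"` key and the sample element names are all assumptions made for illustration.

```python
import json
import re
import xml.etree.ElementTree as ET


def element_to_json(elem, no_collapse=None):
    """Convert an element to a JSON-ready value using simple lossy rules.

    Hypothetical helper: `no_collapse` is a regex on tag names, loosely
    modelled on the --no-collapse-text option; it is not the tool's code.
    """
    text = (elem.text or "").strip()
    keep_value = no_collapse is not None and re.search(no_collapse, elem.tag) is not None
    # A text-only element collapses to a plain string unless the pattern matches.
    if not elem.attrib and len(elem) == 0 and not keep_value:
        return text
    obj = dict(elem.attrib)
    if text:
        obj["value"] = text  # assumed key name for text nodes
    for child in elem:
        converted = element_to_json(child, no_collapse)
        if child.tag in obj:
            # Repeated tags are gathered into a list.
            if not isinstance(obj[child.tag], list):
                obj[child.tag] = [obj[child.tag]]
            obj[child.tag].append(converted)
        else:
            obj[child.tag] = converted
    return obj


# Made-up sample document; the element names are not from the README.
sample = ET.fromstring(
    '<Test Name="First">'
    "<SomeText>Some simple text</SomeText>"
    "<Item>a</Item><Item>b</Item>"
    "</Test>"
)
print(json.dumps(element_to_json(sample)))
print(json.dumps(element_to_json(sample, no_collapse="^SomeText$")))
```

With the pattern set, the matching element stays an object with a text property instead of becoming a bare string, which is what keeps the output "schema" stable across documents.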

Use the --help option to see a full list of options.

Performance of "classic" xml-to-json

The "classic" xml-to-json cannot operate on large files. However, it is fast when operating on multiple small files. For large XML files, the speed on a Core i5 machine is about 2MB of XML per second, with a 100MB XML file resulting in 56MB of JSON output. It took about 10 minutes to process 1GB of XML data. The main performance limit is memory: only one single-threaded process was running, since every large file (tens of megabytes) consumes a lot of memory, about 50 times the size of the file.

A few simple tests have shown this to be at least twice as fast as jsonml’s XSLT-based converter (however, the outputs are not similar, as stated above).

Currently the program processes files serially. If run in parallel on many small XML files (<5MB), the performance becomes CPU-bound and processing may be much faster, depending on the architecture. A simple test showed performance of about 5MB / sec (on the same Core i5).
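As a rough back-of-the-envelope check of the figures above, the following Python sketch turns the quoted throughput (about 2MB per second) and memory factor (about 50 times the file size) into estimates. The function name `estimate` and its parameters are made up for this illustration, and the numbers are only as good as the informal measurements they come from.

```python
def estimate(xml_size_mb, throughput_mb_per_s=2.0, memory_factor=50):
    """Return (seconds, peak_memory_mb) for a single XML file.

    Defaults are the approximate figures quoted in the text; they are
    informal measurements on one machine, not guarantees.
    """
    seconds = xml_size_mb / throughput_mb_per_s
    peak_memory_mb = xml_size_mb * memory_factor
    return seconds, peak_memory_mb


seconds, memory_mb = estimate(100)
print(f"100MB file: about {seconds:.0f} s and {memory_mb:.0f} MB of memory")

# 1GB of XML at the same rate, ignoring memory limits:
seconds_1gb, _ = estimate(1024)
print(f"1GB of XML: about {seconds_1gb / 60:.1f} minutes")
```

At 2MB per second, 1GB works out to a little under 9 minutes, which is in line with the "about 10 minutes" reported above.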

script to translate xml to json

What kind of assumptions can you make about the input? Are questions and answers always on separate lines? Are the tags always alone on a line? Could the text contain some encoding (CDATA, {, &eacute;, ...)? Could it contain double quotes or backslashes? Is the XML file encoded in UTF-8?
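These questions matter mostly for line-based text processing. As a minimal sketch, assuming Python 3 and an element named `que` (the sample text is made up), a real XML parser combined with a JSON serializer takes care of CDATA, character references, double quotes and backslashes without special handling. Named HTML entities such as `&eacute;` are a different matter: they are not predefined in XML, so a plain XML parser rejects them unless a DTD declares them.

```python
import json
import xml.etree.ElementTree as ET

# Made-up input mixing CDATA, an entity, a character reference,
# double quotes and a backslash.
xml_text = '<que><![CDATA[He said "hi" <now>]]> &amp; caf&#233; C:\\temp</que>'

elem = ET.fromstring(xml_text)
text = elem.text  # the parser decodes CDATA and references into plain text
encoded = json.dumps({"text": text})  # the serializer escapes quotes and backslashes
print(encoded)

# &eacute; is an HTML entity, not an XML one, so parsing fails without a DTD.
try:
    ET.fromstring("<que>caf&eacute;</que>")
    named_entity_ok = True
except ET.ParseError:
    named_entity_ok = False
print(named_entity_ok)
```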

6 Answers

In fact, you could do this without any Python programming, using just two Unix utilities:

jtm: converts between XML and JSON losslessly
jtc: manipulates JSON

Thus, assuming your xml is in file.xml , jtm would convert it to the following json:

    bash $ jtm file.xml
    [
       {
          "quiz": [
             { "que": "The question her" },
             { "ca": "text" },
             { "ia": "text" },
             { "ia": "text" },
             { "ia": "text" }
          ]
       }
    ]
    bash $

and then, by applying a series of JSON transformations, you can arrive at the desired result:

    bash $ jtm file.xml |
      jtc -w'l:[1:][-2]' -ei echo { '"answer[-]"': {} }\; -i'l:[1:]' |
      jtc -w'l:[-1][:][0]' -w'l:[-1][:]' -s |
      jtc -w'l:' -w'l:[0]' -s |
      jtc -w'l: <>v' -u'"text"'
    [
       {
          "answer1": "text",
          "answer2": "text",
          "answer3": "text",
          "answer4": "text",
          "text": "The question her"
       }
    ]
    bash $

However, because of the shell scripting involved (the echo command), it will be slower than the Python solution: for 5000 questions, I’d expect it to run for around a minute. (In a future version of jtc I plan to allow interpolation even in statically specified JSON, so that templating needs no external shell scripting; the operations will then be much faster.)

If you’re curious about the jtc syntax, you can find a user guide here: https://github.com/ldn-softdev/jtc/blob/master/User%20Guide.md

Your solution looks really good. I didn’t have time to check your concern regarding performance. But I’m curious: isn’t it possible to replace the multiple echo invocations with piping through sed?

The xq tool from https://kislyuk.github.io/yq/ turns your XML into

by just using the identity filter (xq . file.xml).

We can massage this into something closer to the form you want using

To fix up the answers bit so that you get your enumerated keys:

    xq '.quiz | { text: .que } + ( [ range(.ia|length) as $i | { key: "answer\($i+1)", value: .ia[$i] } ] | from_entries )' file.xml

This adds the enumerated answer keys with the values from the ia nodes by iterating over the ia nodes and manually producing a set of keys and values. These are then turned into real key-value pairs using from_entries and added to the proto-object we’ve created ({ text: .que }).

If your XML document contains multiple quiz nodes under some root node, then change .quiz in the jq expression above to .[].quiz[] to transform each of them, and you may want to put the resulting objects into an array:

    xq '.[].quiz[] | [ { text: .que } + ( [ range(.ia|length) as $i | { key: "answer\($i+1)", value: .ia[$i] } ] | from_entries ) ]' file.xml
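For readers more comfortable with Python than with jq, the same key/value construction can be sketched as follows. The shape of `doc` is an assumption about what xq produces for the sample quiz (the actual output is not shown here), and `quiz_to_flat` is a hypothetical helper name.

```python
import json


def quiz_to_flat(quiz):
    """Mirror the jq expression: { text: .que } plus enumerated answerN keys."""
    ia = quiz.get("ia", [])
    if not isinstance(ia, list):
        ia = [ia]  # assumed: a single <ia> element arrives as a scalar, not a list
    # Same intermediate form that from_entries consumes in jq.
    entries = [{"key": f"answer{i + 1}", "value": value} for i, value in enumerate(ia)]
    flat = {"text": quiz["que"]}
    flat.update({entry["key"]: entry["value"] for entry in entries})
    return flat


# Assumed shape of the xq output for the sample quiz.
doc = {"quiz": {"que": "The question her", "ca": "text", "ia": ["text", "text", "text"]}}
print(json.dumps(quiz_to_flat(doc["quiz"])))
```

Like the jq expression, this only enumerates the ia nodes; the ca node is left out.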

I assume your Ubuntu has Python installed.

    #!/usr/bin/python3
    import io
    import json
    import xml.etree.ElementTree

    # Sample input: one question followed by four answers
    d = """<quiz>
    <que>The question her</que>
    <ca>text</ca>
    <ia>text</ia>
    <ia>text</ia>
    <ia>text</ia>
    </quiz>
    """
    s = io.StringIO(d)
    # root = xml.etree.ElementTree.parse("filename_here").getroot()
    root = xml.etree.ElementTree.parse(s).getroot()
    out = {}
    i = 1
    for child in root:
        name, value = child.tag, child.text
        if name == 'que':
            name = 'question'
        else:
            # every other child is an answer; number them in document order
            name = 'answer%s' % i
            i += 1
        out[name] = value
    print(json.dumps(out))

Save it and chmod it to make it executable. You can easily modify it to take a file as input instead of just text.

EDIT: Okay, this is a more complete script:

    #!/usr/bin/python3
    import json
    import sys
    import xml.etree.ElementTree


    def read_file(filename):
        root = xml.etree.ElementTree.parse(filename).getroot()
        return root


    # assume we have a list of <quiz> elements, contained in some other element
    def parse_quiz(quiz_element, out):
        i = 1
        tmp = {}
        for child in quiz_element:
            name, value = child.tag, child.text
            if name == 'que':
                name = 'question'
            else:
                name = 'answer%s' % i
                i += 1
            tmp[name] = value
        out.append(tmp)


    def parse_root(root_element, out):
        for child in root_element:
            if child.tag == 'quiz':
                parse_quiz(child, out)


    def convert_xml_to_json(filename):
        root = read_file(filename)
        out = []
        parse_root(root, out)
        print(json.dumps(out))


    if __name__ == '__main__':
        if len(sys.argv) > 1:
            convert_xml_to_json(sys.argv[1])
        else:
            print("Usage: script filenamewithxml")

I made a file with the following content, which I named xmltest:

  The question her text text text text  Question number 1 blabla stuff

So you have a list of quiz inside some other container.

Now, I launch it like this: $ chmod u+x scratch.py, then ./scratch.py filenamewithxml
